ValueError on tsa with constant column #2043

Closed
DavideCanton opened this Issue Oct 13, 2014 · 16 comments

@DavideCanton

Hi, the following code:

__author__ = 'davide'

import datetime
import numpy as np

import pandas as pd
from statsmodels.tsa.vector_ar.var_model import VAR


def compute_coeffs(series, p, index=None):
    data = pd.DataFrame(series)

    if index is None:
        d = datetime.datetime.now()
        delta = datetime.timedelta(days=1)
        index = []
        for i in range(data.shape[0]):
            index.append(d)
            d += delta

    data.index = pd.DatetimeIndex(index)

    model = VAR(data)
    res = model.fit(p)
    return res.intercept, res.coefs


if __name__ == "__main__":
    series = np.array([[2., 2.], [1, 2.], [1, 2.], [1, 2.], [1., 2.]])
    coeffs = compute_coeffs(series, 1)

    print(coeffs)

gives me the following error:

ValueError: total size of new array must be unchanged

This happens because the add_constant function in tools.py skips adding a constant column when the data already contains one (as stated in the function's documentation). The problem is that I can't estimate the coefficients of a model whose data contains such a constant column. I commented out rows 290-292 in tools.py as a workaround, but I would like to know whether this is a feature or a bug. (At the very least, please change the error message, or catch the error and raise a more appropriate exception.)
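
A minimal sketch of the skip behaviour described above, assuming NumPy only. `add_constant_skip` is a hypothetical stand-in written for illustration, not the statsmodels code in tools.py; it only mimics the rule "add nothing if a constant column is already present", which is why the reported series ends up one column short of what the model expects.

```python
import numpy as np


def add_constant_skip(data):
    """Hypothetical stand-in for the 'skip' rule; NOT the statsmodels code."""
    data = np.asarray(data, dtype=float)
    # Treat a column as constant when its max equals its min.
    is_const = np.ptp(data, axis=0) == 0
    if is_const.any():
        return data  # returned unchanged, with no warning
    return np.column_stack([np.ones(len(data)), data])


# The series from the report: the second column is always 2.
series = np.array([[2., 2.], [1., 2.], [1., 2.], [1., 2.], [1., 2.]])
# Assumed comparison data with no constant column.
varying = np.array([[2., 3.], [1., 2.], [1., 5.], [1., 2.], [1., 4.]])

skipped = add_constant_skip(series)   # no intercept column added
added = add_constant_skip(varying)    # intercept column prepended
print(skipped.shape, added.shape)
```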

Bye

@psarka
psarka commented Nov 12, 2014

I was also surprised by add_constant's behavior when .predict() failed in the middle of cross-validation with ValueError: matrices are not aligned. Maybe add a flag for always adding a constant?

@jseabold
Member

Thanks for the report. Sorry I didn't see this before the release. I'll look into it.

@jseabold
Member

What do you think is a reasonable solution here? I'm leaning towards just raising an error about the constant already being present rather than trying to do anything fancy.

@jseabold
Member

@psarka Can you elaborate a little bit on your use case? It's not clear to me why I would want to force-add a perfectly collinear column to the data.

@DavideCanton

Maybe add a parameter to select the desired behaviour. Personally, I don't understand the rationale behind this choice...

@jseabold
Member

What choice do you mean? The choice not to allow two constants in a model? Or the choice (bug) to let this pass through without raising an error?

The reason not to allow it is that the effects of the two variables are not separately identified. The way it's implemented now, you get a (scaled) split of the constant that's more or less arbitrary across two variables just as a consequence of using the SVD to solve the least squares problem. Other approaches won't work with a RHS that's not full column rank.
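
The identification problem can be illustrated with a small NumPy sketch. The data and variable names here are assumptions made up for the example, not taken from the report. With two constant columns the design matrix loses a rank, and the SVD-based `np.linalg.lstsq` returns the minimum-norm solution, which splits the single intercept across both columns.

```python
import numpy as np

x = np.array([0., 1., 2., 3., 4.])
y = 3.0 + 2.0 * x  # exact line with intercept 3 and slope 2

# Two constant columns (all ones, all twos) plus the regressor.
X = np.column_stack([np.ones(5), np.full(5, 2.0), x])
rank = np.linalg.matrix_rank(X)  # 2 rather than 3: not full column rank

# lstsq picks the minimum-norm coefficients among all exact fits.
beta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)

# Any (a, b) with a + 2*b == 3 fits equally well, for example:
alt = np.array([3.0, 0.0, 2.0])

print(rank)
print(beta)                            # intercept split as a + 2*b = 3
print(np.allclose(X @ beta, X @ alt))  # identical fitted values
```

Only the combination `beta[0] + 2 * beta[1]` is pinned down by the data; the individual values are an artifact of the solver.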

@psarka
psarka commented Nov 13, 2014

@jseabold I ran into this problem while doing forward cross-validation for time series. I was splitting, training and testing, and that involved adding a constant to both the training and the test sets (unless I am doing something wrong). The test set of one of the folds was rather small (6 rows) and happened to have a constant column. There is nothing wrong with having proportional columns in the test set, so in this case I'd like add_constant to add a constant without checking for collinearity.

Now that is a tiny problem, solved by two lines, but it took me some time to realize what was going on. Raising an error fixes that.
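
The two lines themselves are not shown in the thread. One way to add an intercept column unconditionally, assuming plain NumPy arrays, might look like the sketch below; `always_add_constant` and the fold data are hypothetical names and values invented for illustration.

```python
import numpy as np


def always_add_constant(X):
    """Hypothetical helper: prepend a column of ones without inspecting X."""
    X = np.asarray(X, dtype=float)
    return np.column_stack([np.ones(X.shape[0]), X])


train = np.array([[1., 5.], [2., 3.], [3., 8.], [4., 1.]])
y_train = np.array([1., 2., 3., 4.])
# Small test fold whose second column happens to be constant.
test_fold = np.array([[5., 2.], [6., 2.], [7., 2.]])

X_train = always_add_constant(train)
X_test = always_add_constant(test_fold)

# Both matrices have the same number of columns, so prediction lines up.
params, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)
pred = X_test @ params
print(X_train.shape, X_test.shape, pred.shape)
```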

@jseabold
Member

Have a look at #2093. Basically just a better error message in this case.

@psarka
psarka commented Nov 16, 2014

Looks good to me, thanks!

@DavideCanton

I would prefer a flag to disable this behaviour.

@psarka
psarka commented Nov 17, 2014

So that's what has_constant='add' will do (that is, it will always add a constant).

@DavideCanton

Yes, and has_constant='none' (or None) should not add anything to the matrix.

@psarka
psarka commented Nov 17, 2014

I don't understand why you would want a function called add_constant with a flag to never add a constant. Just don't use the function?

Your code example makes no direct use of add_constant anyway, so you would not be able to provide the flag.

@DavideCanton

Sorry, there was a misunderstanding. I assumed you were talking about adding a flag to the model.fit() method, because that is the source of the exception in my code.

@jseabold
Member

There is already a flag to disable the behavior: pass trend='nc' to fit. I'm not sure I want to add another keyword that is there only to disable the behavior of another keyword. In general, I try to stay away from keywords that govern the behavior of other keywords. Can you just check for a constant beforehand and change the value of trend? Alternatively, you can use a try/except to catch the error.
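
A rough sketch of the "check beforehand" suggestion, assuming NumPy only. `pick_trend` is a hypothetical helper, not part of statsmodels, and the calls to `VAR(...).fit` appear only as comments so the block runs without statsmodels. The value 'nc' is the one named in this thread; other statsmodels versions may spell the no-constant option differently, so check the documentation for the installed version.

```python
import numpy as np


def pick_trend(data):
    """Hypothetical helper: 'nc' if any column is constant, otherwise 'c'."""
    data = np.asarray(data, dtype=float)
    has_const = bool((np.ptp(data, axis=0) == 0).any())
    return "nc" if has_const else "c"


# The series from the report (second column constant) and an assumed
# comparison array without a constant column.
series = np.array([[2., 2.], [1., 2.], [1., 2.], [1., 2.], [1., 2.]])
varying = np.array([[2., 3.], [1., 2.], [1., 5.], [1., 2.], [1., 4.]])

trend_series = pick_trend(series)
trend_varying = pick_trend(varying)
print(trend_series, trend_varying)

# With statsmodels installed, the result would be passed to fit, e.g.:
#     res = VAR(data).fit(1, trend=pick_trend(data))
# The try/except alternative mentioned above could look like:
#     try:
#         res = VAR(data).fit(1)
#     except ValueError:
#         res = VAR(data).fit(1, trend="nc")
```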

@DavideCanton

OK, so I guess a better error message would do the trick.

@jseabold jseabold closed this in 885009a Nov 17, 2014
@jseabold jseabold added a commit that referenced this issue Dec 2, 2014
@jseabold jseabold Backport PR #2093: Add user control over what happens if a constant is already present.

Fix for #2043.
b1a1d6a
@josef-pkt josef-pkt added this to the 0.6.1 milestone Feb 17, 2015