ARIMA fit failing for small set of data due to invalid maxlag #1146

Closed
feathj opened this issue Oct 24, 2013 · 15 comments · Fixed by #1149
Comments

feathj commented Oct 24, 2013

I am not sure if this is expected behavior or not, but ARIMA models whose endog has length less than or equal to 5 will always fail due to the maxlag < nobs check.

If not provided, maxlag is calculated with the following formula:

int(round(12*(nobs/100.)**(1/4.)))
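
Evaluating that formula for nobs = 5 shows why the check trips (a quick illustration, not code from statsmodels):

nobs = 5
maxlag = int(round(12 * (nobs / 100.) ** (1 / 4.)))
print(maxlag)  # 6, which is not < nobs == 5, so lagmat raises ValueError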

Since the call to the parent "fit" method does not explicitly pass the maxlag argument, it is calculated using the formula above and results in this error:

error with ARIMA model (4, 1, 2):  maxlag should be < nobs
Traceback (most recent call last):
  File "/Users/jfeatherstone/dev/deos/post_processing/market_data/mrforecaster.py", line 126, in createModel
    results = model.fit(disp=-1, exog=range(1,len(items)), maxiter=100)
  File "/Users/jfeatherstone/dev/virtualenvs/deos/lib/python2.7/site-packages/statsmodels/tsa/arima_model.py", line 1005, in fit
    full_output, disp, callback, **kwargs)
  File "/Users/jfeatherstone/dev/virtualenvs/deos/lib/python2.7/site-packages/statsmodels/tsa/arima_model.py", line 828, in fit
    start_params = self._fit_start_params((k_ar,k_ma,k), method)
  File "/Users/jfeatherstone/dev/virtualenvs/deos/lib/python2.7/site-packages/statsmodels/tsa/arima_model.py", line 453, in _fit_start_params
    start_params = self._fit_start_params_hr(order)
  File "/Users/jfeatherstone/dev/virtualenvs/deos/lib/python2.7/site-packages/statsmodels/tsa/arima_model.py", line 399, in _fit_start_params_hr
    armod = AR(endog).fit(ic='bic', trend='nc')
  File "/Users/jfeatherstone/dev/virtualenvs/deos/lib/python2.7/site-packages/statsmodels/tsa/ar_model.py", line 548, in fit
    k_ar = self.select_order(k_ar, ic, trend, method)
  File "/Users/jfeatherstone/dev/virtualenvs/deos/lib/python2.7/site-packages/statsmodels/tsa/ar_model.py", line 425, in select_order
    X = self._stackX(maxlag, trend) # sets k_trend
  File "/Users/jfeatherstone/dev/virtualenvs/deos/lib/python2.7/site-packages/statsmodels/tsa/ar_model.py", line 393, in _stackX
    X = lagmat(endog, maxlag=k_ar, trim='both')
  File "/Users/jfeatherstone/dev/virtualenvs/deos/lib/python2.7/site-packages/statsmodels/tsa/tsatools.py", line 305, in lagmat
    raise ValueError("maxlag should be < nobs")
ValueError: maxlag should be < nobs

Note that an arima() call in R with 5 observations completes successfully.

@jseabold (Member)

Can you post some example data that returns successfully in R?


feathj commented Oct 24, 2013

Given:
(2008, -1214.360173), (2009, -1848.209905), (2010, -2100.918158), (2011, -3647.483678), (2012, -4711.186773)
Order (2, 0, 3)

Forecast annually out to 2022 in R:
-4803.626759 -6027.364361 -7466.85036 -7631.179054 -8431.198696 -10081.71807 -10528.78902 -10929.02852 -12563.78066 -13421.09138
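
For reference, a minimal sketch of how to hit the same failure on this data with the statsmodels API from the traceback (illustrative only; the exog and maxiter arguments from the original call are omitted):

import numpy as np
from statsmodels.tsa.arima_model import ARIMA

y = np.array([-1214.360173, -1848.209905, -2100.918158,
              -3647.483678, -4711.186773])
model = ARIMA(y, order=(2, 0, 3))
results = model.fit(disp=-1)  # raises ValueError: maxlag should be < nobs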


feathj commented Oct 24, 2013

Apologies for the formatting

@josef-pkt (Member)

The forecasts in R are not stationary. Did you include a trend? Or does R not impose stationarity for ARIMA?

@jseabold (Member)

Yeah, I'm wary of a 'successful' completion here. That's why I asked to see the data. My guess is that R just dumped something, anything, back. That's been my experience in the past. You'll almost never get the same answer across stats packages for a problem like this. There's just not enough information to estimate anything meaningful here.

y <- c(-1214.360173, -1848.209905, -2100.918158, -3647.483678, -4711.186773)
arima(y, c(2,0,3))

Call:
arima(x = y, order = c(2, 0, 3))

Coefficients:
         ar1      ar2      ma1     ma2     ma3  intercept
      1.7587  -0.9983  -1.0965  0.4163  0.0255  -3767.755
s.e.  0.7167   0.0117   1.5237  1.0616  0.3522   3872.650

sigma^2 estimated as 87412:  log likelihood = -40.25,  aic = 94.5
Warning messages:
1: In log(s2) : NaNs produced
2: In log(s2) : NaNs produced
3: In log(s2) : NaNs produced
4: In log(s2) : NaNs produced

@jseabold (Member)

R does impose stationarity, but in my experience it doesn't guarantee that what you get back makes any sense.

@josef-pkt (Member)

However, adjusting the maxlag is still a valid bug/enhancement.

We might be able to estimate an ARMA(1,1) or ARMA(0,1) "reasonably ok" in this case.

@jseabold (Member)

gretl and X-12-ARIMA refuse to estimate the model. Stata estimates it (after going through 90 iterations with a lot of switches in the optimization algorithm and backing up), but warns that there is not sufficient data and is unable to calculate any inferential statistics for the constant or the sigma. The parameters it gives are different from R's (and are more or less meaningless).

            v1:       ARMA:       ARMA:       ARMA:       ARMA:       ARMA:      sigma:
                         L.         L2.          L.         L2.         L3.            
         _cons          ar          ar          ma          ma          ma       _cons
y1  -3475.7847   1.6594282  -.99558649  -1.9151185   1.9151354  -1.0000227    179.6545

I somewhat recall doing this exercise before and deciding it's better to just refuse to do any estimation here rather than try to give some garbage back. Thoughts?

@jseabold (Member)

I can certainly fix the maxlags issue though. Maybe make the error a little more informative if and when we can't estimate a model.
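
One possible shape of such a fix, purely as a sketch (the actual change landed in PR #1149 and may differ):

def default_maxlag(nobs):
    # Illustrative only: clamp the formula-based default so the lag
    # selection stays valid for very short series.
    formula_lag = int(round(12 * (nobs / 100.) ** (1 / 4.)))
    return max(1, min(formula_lag, nobs - 1))  # lagmat requires maxlag < nobs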


feathj commented Oct 24, 2013

Your comments make sense. I am developing a system that was prototyped using the R arima package, and I am seeing how far off we are from the original forecasting logic. I am not sure if returning a series (albeit not very accurate) or giving a better error message makes more sense, but either would really help anyone with a similar problem going forward. Thanks for your prompt response @jseabold @josef-pkt.

@josef-pkt (Member)

@feathj Please report back with any agreement or disagreement you find.

Regarding your original data, just so we have an idea about use cases: are you using ARIMA or time series analysis on very short samples, or was this just a test case?

There are several places where the very-small-sample behavior for time series analysis hasn't been checked, because most cases in our background (economics) have at least a hundred or a few thousand observations. For example, PR #1018 just adds the minimum nobs check to the grangercausalitytest.

@josef-pkt (Member)

I am not sure if returning a series (albeit not very accurate) or giving a better error message makes more sense

If I had to bet my money on the outcome of the ARIMA forecast in this case, I would refuse the bet.


feathj commented Oct 24, 2013

@josef-pkt Unfortunately, some of our datasets are that small. We are working with annual data and some of our sources do not have anything before 2008. Thankfully, human interaction occurs after the fact to fix most of the obvious issues.


feathj commented Oct 24, 2013

If I had to bet my money on the outcome of the ARIMA forecast in this case, I would refuse the bet.

Agreed :)

jseabold added a commit to jseabold/statsmodels that referenced this issue Oct 25, 2013
@jseabold (Member)

See #1149 for a discussion of the behavior I decided on. I'll wait for any comments on that PR.

jseabold added a commit that referenced this issue Nov 23, 2013
This closes #1146 and #1046.

1. We now check to make sure that we have at least one degree of freedom to estimate the problem. If so, then we try the estimation (see the sketch after this list).
   1. Most, if not all, of these estimations will return garbage. We have an extra check that we can estimate stationary initial params. Usually we can't in these cases, so the usual error will be raised asking the user to set start_params. This should be enough of a warning to the user that something is "odd." In the small chance that the estimation goes through for a model with 5 observations and 1 degree of freedom, it is then on the user to determine that the results are no good.
2. We now avoid maxlag >= nobs in the call to AR, which also resolves the problem of #1046 that presented itself as part of #1146.
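
A minimal sketch of the degrees-of-freedom check described in item 1 (the function name and exact condition are illustrative, not necessarily the statsmodels code):

def check_estimable(nobs, n_params):
    # Illustrative: refuse to attempt a fit unless at least one degree of
    # freedom remains after estimating the ARMA parameters.
    if nobs - n_params < 1:
        raise ValueError("Insufficient degrees of freedom to estimate the model")

For the 5-point series above, an ARMA(2, 3) with a constant already has 6 mean-equation parameters, so a check along these lines rejects the fit before any optimization is attempted.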
PierreBdR pushed a commit to PierreBdR/statsmodels that referenced this issue Sep 2, 2014