BUG : arima_model._get_predict_out_of_sample, ignores exogenous of there is no trend ? #1123

Closed
jsphon opened this Issue Oct 16, 2013 · 19 comments

Projects

None yet

2 participants

@jsphon
Contributor
jsphon commented Oct 16, 2013

Hello,

I have a feeling there is a problem with arima_model._get_predict_out_of_sample() if there is an exogenous variable, but no trend. Looking at the code, it just doesn't use exog if there is no trend.

I have produced a sample script to reproduce the problem.

In the sample, arma_mod2.fit() produces a forecast of zero. It produces a forecast of zero, whatever I set exog to be. I believe the forecast should be the 'my_forecast' variable.

Apologies in advance if I have misread / misunderstood the documentation or how it is supposed to work.

Example code:

import numpy as np
import pandas as pd

import statsmodels.api as sm
from statsmodels.tsa.arima_process import arma_generate_sample

np.random.seed(12345)

arparams = np.array([.75, -.25])
maparams = np.array([.65, .35])

arparam = np.r_[1, -arparams]
maparam = np.r_[1, maparams]

nobs = 20

dates = sm.tsa.datetools.dates_from_range('1980',length=nobs)

y  = arma_generate_sample(arparams, maparams, nobs)

df       = pd.DataFrame({'y':y},index=dates)
df['y1'] = df['y'].shift(1)

arma_mod2     = sm.tsa.ARMA( df['y'][1:] , order=(0,0), exog=df['y1'][1:] )
arma_res2     = arma_mod2.fit(trend='nc', disp=False)    
arma_forecast = arma_res2.forecast(exog = df['y'][-1])

my_forecast   = df['y'][-1] * arma_res2.params[0]

print(' y[-1]            : {0:0.4f}'.format( df['y'][-1] ) )
print( 'Exog coef        : {0:0.4f}'.format( arma_res2.params[0] ) )
print( 'Model Forecast   : {0:0.4f}'.format( arma_forecast[0] ) )
print( 'My Forecast      : {0:0.4f}'.format( my_forecast ) )
@jseabold
Member

What version are you using?

[Edit: nm. This problem exists in current master.]

@jseabold
Member

I think there is a problem here but this is an odd case. I'm not sure ARMA should support an order (0,0) fit. Many changes would have to be made.

@jseabold jseabold added a commit to jseabold/statsmodels that referenced this issue Oct 17, 2013
@jseabold jseabold BUG: Allow forecasting with trend=nc. Closes #1123. f012457
@jseabold
Member

#1124 should fix this problem (though it likely still doesn't work for a (0,0) order model).

@jsphon
Contributor
jsphon commented Oct 17, 2013

Thanks for fixing this! I am on version 0.6.0.dev-3b7082c which I downloaded from http://statsmodels.sourceforge.net/binaries/, though when I downloaded it a few days ago there was a version for python 3.3, which does not appear to be available now.

How do I install the latest version of the toolkit with this patch?

@jseabold
Member

It's not merged into master yet, so the fix won't be there, though you can apply the patch to the source yourself. I have tentative plans for a merge sprint this weekend.

@jseabold jseabold added a commit to jseabold/statsmodels that referenced this issue Oct 23, 2013
@jseabold jseabold BUG: Allow forecasting with trend=nc. Closes #1123. 82b89ec
@jseabold jseabold closed this in 82b89ec Oct 23, 2013
@jsphon
Contributor
jsphon commented Oct 23, 2013

Thanks for the update!

@jseabold
Member

Please try it out and let us know if you run into any difficulties.

@jsphon
Contributor
jsphon commented Oct 26, 2013

Hello,

I hope you are all having a great weekend. I have tried the update, but unfortunately it does not work for me.

To test the ARMA(0,0) model, I used an equivalent ARMA(1,0) model and a basic linear regression (which both agree). See the code at the bottom of this post.


The problem:
In arima_model.py,
Line 227 states:
#NOTE: you shouldn't have to give in-sample exog!
However, from stepping through the code, at this point exog contains the in-sample exog as well as the single exog that I have passed to it.
Also, my example code calls _get_predict_out_of_sample twice. Each time it requests a 1 step forecast. The first time, the returned mu is of length 1. The second time, the returned mu is of length 20. So I believe something is wrong here.


To explain my strange use case...
I would like to do time series analysis on a series of closing prices. e.g. Prices taken at end of every month.
However, I will only be able to act on the data on the 1st day of the next month.
So I want to predict price returns from 1st-last day of every month.
However, my historical monthly returns will be calculated from [last day of month] till the next [last day of month].
I am therefore using exogenous variables for [last day of month] to next [last day of month]. And using endogenous variables for [first to last day of month]
To see the code I have been using:

@jsphon
Contributor
jsphon commented Oct 26, 2013

import numpy as np
import pandas as pd

from sklearn import linear_model

import statsmodels.api as sm
from statsmodels.tsa.arima_process import arma_generate_sample

np.random.seed(12345)

arparams = np.array([.75, -.25])
maparams = np.array([.65, .35])

arparam = np.r_[1, -arparams]
maparam = np.r_[1, maparams]

nobs = 20

dates = sm.tsa.datetools.dates_from_range('1980',length=nobs)

y = arma_generate_sample(arparams, maparams, nobs)

df = pd.DataFrame({'y':y},index=dates)
df['y1'] = df['y'].shift(1)

print('data is : {0}'.format( y.str() ))
print('y[-1]={0}'.format(y[-1]))

arma_mod1 = sm.tsa.ARMA( df['y'], order=(1, 0) )
arma_res1 = arma_mod1.fit(trend='nc', disp=False, method='css' )
arma_fc1 = arma_res1.forecast( steps=1 )

arma_mod2 = sm.tsa.ARMA( df['y'][1:] , order=(0,0), exog=df['y1'][1:] )
arma_res2 = arma_mod2.fit(trend='nc', disp=False)
arma_fc2 = arma_res2.forecast( steps=1, exog=[ y[-1] ] )

clf1 = linear_model.LinearRegression( fit_intercept=False)
fX = [[v] for v in df['y1'][1:].values]
fy = df['y'][1:].values
clf1.fit(fX,fy)
clf_pred = clf1.predict(y[-1])

print( 'ARMA(1,0) coef : {0:0.4f}'.format( arma_res1.params[0] ) )
print( 'ARMA(0,0) coef : {0:0.4f}'.format( arma_res2.params[0] ) )
print( 'LinReg coef1 : {0:0.4f}'.format( clf1.coef_[0]) )

print( 'ARMA(1,0) FC : {0:0.4f}'.format( arma_fc1[0] ) )
print( 'ARMA(0,0) FC : {0:0.4f}'.format( arma_fc2[0][0] ) )
print( 'LinReg Pred : {0:0.4f}'.format( clf_pred[0] ) )

@jseabold jseabold reopened this Oct 26, 2013
@jseabold
Member

On Sat, Oct 26, 2013 at 11:37 AM, Barnyard notifications@github.com wrote:

Hello,
I hope you are all having a great weekend. I have tried the update, but
unfortunately it does not work for me.
To test the ARMA(0,0) model, I used an equivalent ARMA(1,0) model and a
basic linear regression (which both agree). See the code at the bottom of
this post.

The problem:
In arima_model.py,
Line 227 states:
#NOTE: you shouldn't have to give in-sample exog!
However, from stepping through the code, at this point exog contains the
in-sample exog as well as the single exog that I have passed to it.

Yes, this is intentional. I mean that the users shouldn't have to include
in-sample lagged exogenous variables when they're doing prediction, which
they did before.

@jseabold
Member

On Sat, Oct 26, 2013 at 11:37 AM, Barnyard notifications@github.com wrote:

Also, my example code calls getpredict_out_of_sample twice. Each time it requests a 1 step forecast. The first time, the returned mu is of length 1. The second time, the returned mu is of length 20. So I believe something is wrong here.

Can you post code that replicates this?

@jseabold
Member

On Sat, Oct 26, 2013 at 11:37 AM, Barnyard notifications@github.com wrote:

import numpy as np
import pandas as pd

from sklearn import linear_model

import statsmodels.api as sm
from statsmodels.tsa.arima_process import arma_generate_sample

np.random.seed(12345)

arparams = np.array([.75, -.25])
maparams = np.array([.65, .35])

arparam = np.r_[1, -arparams]
maparam = np.r_[1, maparams]

nobs = 20

dates = sm.tsa.datetools.dates_from_range('1980',length=nobs)

y = arma_generate_sample(arparams, maparams, nobs)

df = pd.DataFrame({'y':y},index=dates)
df['y1'] = df['y'].shift(1)

print('data is : {0}'.format( y.str() ))
print('y[-1]={0}'.format(y[-1]))

arma_mod1 = sm.tsa.ARMA( df['y'], order=(1, 0) )
arma_res1 = arma_mod1.fit(trend='nc', disp=False, method='css' )
arma_fc1 = arma_res1.forecast( steps=1 )

arma_mod2 = sm.tsa.ARMA( df['y'][1:] , order=(0,0), exog=df['y1'][1:] )
arma_res2 = arma_mod2.fit(trend='nc', disp=False)
arma_fc2 = arma_res2.forecast( steps=1, exog=[ y[-1] ] )

clf1 = linear_model.LinearRegression( fit_intercept=False)
fX = [[v] for v in df['y1'][1:].values]
fy = df['y'][1:].values
clf1.fit(fX,fy)
clf_pred = clf1.predict(y[-1])

print( 'ARMA(1,0) coef : {0:0.4f}'.format( arma_res1.params[0] ) )
print( 'ARMA(0,0) coef : {0:0.4f}'.format( arma_res2.params[0] ) )
print( 'LinReg coef1 : {0:0.4f}'.format( clf1.coef_[0]) )

print( 'ARMA(1,0) FC : {0:0.4f}'.format( arma_fc1[0] ) )
print( 'ARMA(0,0) FC : {0:0.4f}'.format( arma_fc2[0][0] ) )
print( 'LinReg Pred : {0:0.4f}'.format( clf_pred[0] ) )

What am I supposed to be looking at here?

Also note that we do not support an ARMA(0,0) model, as I mentioned before.
I'm going to change the code to raise an error on this.

@jsphon
Contributor
jsphon commented Oct 26, 2013

The first 3 lines of output from the above are:

ARMA(1,0) coef : 0.5417
ARMA(0,0) coef : 0.5417
LinReg coef1 : 0.5417

That shows that the 3 models have fitted the same parameter. Therefore they should give the same forecasts.

The next 3 lines are:

ARMA(1,0) FC : 0.3561
ARMA(0,0) FC : -0.0961
LinReg Pred : 0.3561

That shows that the ARMA(1,0) and Linear Regression models agree with each other. But the ARMA(0,0) doesn't agree, even though it has the same parameter.

To see where the code goes wrong, insert a break point just after line 247 of arima_model.py which is

endog, resid, mu = _get_predict_out_of_sample(endog, p, q, k_trend, k_exog,
                                               start, errors, trendparam,
                                               exparams, arparams,
                                               maparams, steps, method,
                                               exog)

The example code will call the above line twice. The first time from the arma(1,0) model:

arma_fc1 = arma_res1.forecast( steps=1 )

The 2nd time from

arma_fc2 = arma_res2.forecast( steps=1, exog=[ y[-1] ] )

In the first time, it correctly returns mu, of length 1. As it is the mu of the first out of sample forecast.

The second time, it returns mu of length 20, even though it only needs 1 mu for the first out of sample step. mu[0] is the mean for the first in-sample prediction. I believe that is where the problem is, as it should not be using the mu for an insample period.

@jseabold
Member

Yes, don't fit an ARMA(0,0) model. An ARMA(0,0) model isn't an ARMA model, and it's not supported, so I wouldn't expect it to do the right thing.

@jsphon
Contributor
jsphon commented Oct 26, 2013

OK. So an ARMA(0,0) model is not supported.

However, my example shows that any other ARMA model with exogenous variables will probably not work either.

I might be able to stick together an example of an ARMA(1,0) model with 1 exogenous variable not working a bit later, but have run out of development time this weekend. ;{

@jseabold
Member

How does your example show that? The reason that mu is of length 20 in your (0,0) example is because the code is written assuming that p and q are not both zero.

Other ARMA models with exogenous variables are tested, so I'm fairly certain they work fine, but if you show me an example that's broken I'll fix it.

@jsphon
Contributor
jsphon commented Oct 26, 2013

Ah. OK. I did not step back far enough in the code to see what p and q do. I will do some tinkering next week and hope that I am incorrect.

Thanks for all your patience! Sorry for asking all these awkward question!

@jseabold
Member

No it's great that you're banging on this. I want it to be robust. I found bugs in other software while I was writing this code, because no one had gone through the corner cases.

@jseabold jseabold closed this in 71a312a Oct 27, 2013
@PierreBdR PierreBdR pushed a commit to PierreBdR/statsmodels that referenced this issue Sep 2, 2014
@jseabold jseabold BUG: Allow forecasting with trend=nc. Closes #1123. db70060
@PierreBdR PierreBdR pushed a commit to PierreBdR/statsmodels that referenced this issue Sep 2, 2014
@jseabold jseabold ENH: Raise on 0,0 order models in AR(I)MA. Closes #1123 2bb21e1
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment