FAQ: ARIMAX versus Stata #2474

Open
josef-pkt opened this issue Jun 27, 2015 · 2 comments

Comments

@josef-pkt
Member

When estimating an ARIMAX(p, 1, q) model, Stata also differences the exog variables. The statsmodels version does not difference the exog.
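
As a rough sketch of the distinction for d = 1: the notation below is an assumption made for illustration and is not taken from either package's documentation. The constant is omitted, and u_t stands for an ARMA(p, q) error term.

```latex
\begin{aligned}
\text{Stata:}       \quad & \Delta y_t = \Delta x_t' \beta + u_t \\
\text{statsmodels:} \quad & \Delta y_t = x_t' \beta + u_t
\end{aligned}
```

Under this reading, passing the differenced exog to statsmodels makes the second equation coincide with the first.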

To replicate Stata's behavior, we need to difference the exog ourselves.
Note: When we difference, we need to preserve the initial observation, which is nan, since it will be truncated during estimation.

NumPy's np.diff drops the invalid initial observation. The pandas DataFrame diff method keeps the initial observation as missing.
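
A minimal sketch of that difference, using made-up toy data (the frame `levels` and its column names are hypothetical and not part of the original data set). It also shows one possible way to pad the NumPy result with a leading nan row so that it keeps the same length as the levels.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the exog variables in levels.
levels = pd.DataFrame({"x1": [1.0, 3.0, 6.0, 10.0],
                       "x2": [2.0, 2.5, 4.5, 5.0]})

# np.diff returns one row fewer: the first observation is dropped.
d_np = np.diff(levels.values, axis=0)

# DataFrame.diff keeps the original length; the first row is NaN.
d_pd = levels.diff()

# If only NumPy is used, prepend a NaN row to keep the original length.
nan_row = np.full((1, levels.shape[1]), np.nan)
d_np_padded = np.vstack([nan_row, d_np])

print(d_np.shape, d_pd.shape, d_np_padded.shape)
```

The padded array and the pandas result have the same shape and the same values after the first row, so either could be passed as exog in this setting.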

The following replicates the Stata results. (I'm using ndarrays in the model, but the same should work with pandas.)

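import numpy as np
import statsmodels.api as sm

# Difference the exog with pandas so that the initial nan observation is kept.
mod111 = sm.tsa.ARIMA(np.asarray(data_sample['loginv']), (1,1,1), 
                   #exog=np.asarray(data_sample[['loggdp', 'logcons']]))   # exog in levels
                   exog=np.asarray(data_sample[['loggdp', 'logcons']].diff()))

res111 = mod111.fit(disp=1, solver='bfgs', maxiter=5000)
# Predict using the differenced exog taken from the full data set.
exog_full_d = data[['loggdp', 'logcons']].diff()
res111.predict(start=197, end=202, exog=exog_full_d.values[197:])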
@josef-pkt
Member Author

@jseabold @ChadFulton Did you run into this before?

@jseabold
Member

jseabold commented Jun 27, 2015 via email
