why is it that in sarimax we only warn for non-invertible/stationary start params and in arima we fail? #6225
Comments
Because the starting-parameter estimators are usually consistent estimators of the true parameters (even if not efficient), starting parameters that suggest a non-stationary model likely indicate problems with the model specification. The original behavior of statsmodels was to raise an error in this case. However, there is nothing wrong with at least trying to fit a model in these cases, so using arbitrary stationary starting parameters (like all zeros) is a valid option. More recently, there has been a greater emphasis on model selection / automatic forecasting / cross-validation exercises, where a large number of model specifications are evaluated, and the error became cumbersome to work around. Because of this, we modified SARIMAX to only issue a warning, so that these tasks would be easier. We did not retrofit ARIMA because that model is essentially in a "maintenance-only" state. (Closing as answered, but feel free to follow up if you have questions or comments). |
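To illustrate the warn-and-fall-back behavior described above, here is a minimal sketch in plain NumPy. It is not the statsmodels implementation: the helper names `is_stationary` and `safe_start_params` and the warning text are hypothetical, and only the autoregressive part is checked.

```python
import warnings

import numpy as np


def is_stationary(ar_params):
    """Return True if all roots of 1 - phi_1 z - ... - phi_p z^p lie outside the unit circle."""
    ar_params = np.asarray(ar_params, dtype=float)
    # Coefficients in ascending powers of z: [1, -phi_1, ..., -phi_p]
    poly = np.concatenate([[1.0], -ar_params])
    # np.roots expects the highest power first, so reverse the array
    roots = np.roots(poly[::-1])
    return bool(np.all(np.abs(roots) > 1.0))


def safe_start_params(ar_params):
    """Hypothetical helper: warn and fall back to zeros instead of raising."""
    ar_params = np.asarray(ar_params, dtype=float)
    if is_stationary(ar_params):
        return ar_params
    warnings.warn("Non-stationary starting AR parameters; using zeros instead.")
    return np.zeros_like(ar_params)
```

A model-selection loop can then carry on with `safe_start_params(estimate)` for every candidate specification and only inspect the warnings afterwards, which is the convenience the warning-only behavior is meant to provide.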
thanks for your patience and quick response!!! several followup questions:
|
If you have If you set those equal to
We don't do anything special in this case, we just return the results as usual.
It's probably best to use SARIMAX, yes. |
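> it is not possible to get an non-stationary / non-invertible model
Can you please point me to a paper or to the code of how you're doing that? Are you replacing the bad roots with their reciprocals, or is it another procedure?
> There's nothing wrong with a non-stationary / non-invertible model
Why do you say so? In https://otexts.com/fpp2/arima-r.html they say that:
> Any roots close to the unit circle may be numerically unstable, and the corresponding model will not be good for forecasting
Doesn't that mean that non-stationary / non-invertible models are bad? Or is it only a problem if the roots are close to the unit circle, not if they are far inside/outside the circle? And if it's not a problem, then why did you say about the non-stationarity of the start_params that:
> if they suggest a non-stationary model then that likely indicates problems with the model specification
I don't understand why it's a problem there but not here.
> It's probably best to use SARIMAX, yes
Then maybe it's a good idea that the popular https://github.com/tgsmith61591/pmdarima package use SARIMAX instead of ARIMA... |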
The log likelihood is only defined for stationary time series when using
full MLE. This is how and why the model is restricted to be stationary.
Invertibility isn't a strict requirement but helps with point
identification.
Unstable when near the unit circle means that the parameters are not precisely
estimated. This leads to problems forecasting, since forecasts from 0.99,
0.975 and 0.9 look very different after a few steps.
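A small sketch of that last point, assuming a zero-mean AR(1) model so that the h-step-ahead forecast is simply `phi**h` times the last observation. The function name `ar1_forecast` and the starting value of 10 are illustrative choices, not anything from statsmodels.

```python
import numpy as np


def ar1_forecast(last_value, phi, steps):
    """h-step-ahead point forecasts of a zero-mean AR(1): last_value * phi**h."""
    horizons = np.arange(1, steps + 1)
    return last_value * phi ** horizons


last_value = 10.0
for phi in (0.99, 0.975, 0.9):
    path = ar1_forecast(last_value, phi, steps=20)
    # Compare how quickly each forecast path decays towards the mean of zero
    print(f"phi={phi}: h=5 -> {path[4]:.2f}, h=20 -> {path[-1]:.2f}")
```

Coefficients this close to one are hard to tell apart in estimation, yet the forecast paths they imply separate quickly as the horizon grows.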
|
As for pmdarima, you will need to discuss that in the pmdarima tracker
since that project is not affiliated with statsmodels.
|
We maximize the likelihood function numerically, and we do not consider parameter combinations that would lead to a non-stationary / non-invertible model (as long as Specifically, this is done by numerically maximizing over an unconstrained parameter space so that these parameters essentially describe partial autocorrelations. Then we convert these unconstrained (partial autocorrelation) parameters into the corresponding autoregressive or moving average components, which will be stationary / invertible by definition. The citation is: Monahan, John F. 1984. "A Note on Enforcing Stationarity in Autoregressive-moving Average Models." Biometrika 71 (2) (August 1): 403-404. |
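The reparameterization described above can be sketched in plain NumPy as follows. This is an illustration of the general idea (squash each unconstrained value into (-1, 1), treat the result as partial autocorrelations, and run the Durbin-Levinson recursion), not a copy of the statsmodels code; the function names are hypothetical and the sign convention may differ from the library's.

```python
import numpy as np


def unconstrained_to_ar(unconstrained):
    """Map any real vector to stationary AR coefficients via partial autocorrelations."""
    x = np.asarray(unconstrained, dtype=float)
    # Squash each unconstrained value into (-1, 1); these act as partial autocorrelations
    pacf = x / np.sqrt(1.0 + x**2)
    phi = np.zeros(0)
    for r_k in pacf:
        # Durbin-Levinson step: phi_{k,j} = phi_{k-1,j} - r_k * phi_{k-1,k-j}, phi_{k,k} = r_k
        phi = np.concatenate([phi - r_k * phi[::-1], [r_k]])
    return phi


def ar_is_stationary(phi):
    """Check that all roots of z^p - phi_1 z^(p-1) - ... - phi_p lie inside the unit circle."""
    roots = np.roots(np.concatenate([[1.0], -np.asarray(phi, dtype=float)]))
    return bool(np.all(np.abs(roots) < 1.0))


rng = np.random.default_rng(0)
draws = [unconstrained_to_ar(rng.normal(size=3)) for _ in range(5)]
print(all(ar_is_stationary(phi) for phi in draws))
```

Because the optimizer only ever sees the unconstrained vector, every candidate it evaluates corresponds to a stationary model, so no bad roots need to be repaired after the fact.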
Thank you very much for these insights; I'll be sure to check out this paper and try my best to understand the technique. When I asked you whether non-stationary fits from the start_params method (CSS or some other approximate estimator) are a problem, you said it's a big problem, and you even considered throwing an error:
but when I asked you about a non-stationary result of the fancy MLE estimation, you said that it's not really a problem:
And you're not at all worried about almost non-stationary results, which can happen and which Hyndman (https://otexts.com/fpp2/arima-r.html) suggests never using:
|
The answer to your question is that there are three different issues here:
For (1): If your data is integrated (non-stationary) and you select a SARIMAX model that enforces stationarity, you will not be able to recover the true data generating process. What will happen is that the parameter estimates will likely be very close to a non-stationary model, but as I mentioned above, they are constrained to be stationary. That is why if you select a model that enforces stationarity, but the estimated starting parameters are non-stationary, we issue the warning, so that you know that your model may be inappropriate for the data. In the page you linked to, Hyndman is describing a procedure to automatically determine a SARIMAX model specification, including the order of integration. He is using a heuristic procedure to do this, and so he has apparently made the choice that it is best to reject models that are very close to being non-stationary (I would guess in favor of an additional application of differencing). It is different here, though, because For (2): For (3): there are various known statistical issues with numerical stability around the boundaries of parameter constraints. We bound our parameters very slightly away from the boundary for this reason. Hyndman's concern, however, appears to me to be not so much about numerical stability as it is about finding a good heuristic method for automatically selecting model orders. |
In this commit: https://github.com/statsmodels/statsmodels/commit/d03474da1aae5dac54c2b4441311d01517cd2567 the SARIMAX model was changed to only warn on non-invertible / non-stationary start_params and use zeros as start params instead, while ARMA continues to fail when that happens.
Why? What's the reasoning behind not doing this automatically? Will putting zeros instead always lead to fitting a non-invertible/stationary model? Or is it bad otherwise?