why is it that in sarimax we only warn for non-invertible/stationary start params and in arima we fail? #6225

ihadanny · 2019-11-04T20:39:21Z

in this commit: https://github.com/statsmodels/statsmodels/commit/d03474da1aae5dac54c2b4441311d01517cd2567 the sarimax model was changed to only warn on non-invertible/stationary start_params and select "0" start params instead, while on ARMA we continue to fail if that happens.

Why? what's the reasoning behind not doing this automatically? will putting zeros instead always lead to fitting a non-invertible/stationary model? or is it bad otherwise?

ChadFulton · 2019-11-05T18:35:51Z

Because the starting parameters estimators are usually consistent estimators of the true parameters (even if not efficient), if they suggest a non-stationary model then that likely indicates problems with the model specification. The original behavior of statsmodels was to raise an error in this case. However, there is nothing wrong with at least trying to fit a model in these cases, so using arbitrary stationary starting parameters (like all zeros) is a valid option.

More recently, there has been a greater emphasis on model selection / automatic forecasting / cross validation type exercises, where a large number of model specifications are evaluated, and the error became cumbersome to work around. Because of this, we modified SARIMAX to only issue a warning, so that these tasks would be easier.

We did not retrofit ARIMA because that model is essentially in a "maintenance-only" state.

(Closing as answered, but feel free to follow up if you have questions or comments).

ihadanny · 2019-11-06T07:52:05Z

thanks for your patience and quick response!!! several followup questions:

once the SARIMAX fit is done, how are you handling a non-invertible/stationary result? do you warn about it? raise an error? are you trying to convert it to a invertible/stationary result by some method?
and what is the code doing if the SARIMAX fit is done and the result is almost non-invertible/stationary? (the roots are close to the unit circle)
ARIMA because that model is essentially in a "maintenance-only" state - so are you recommending that we'll use SARIMAX instead whenever possible? or is there an advantage in continuing to use ARIMA?

ChadFulton · 2019-11-06T15:03:58Z

once the SARIMAX fit is done, how are you handling a non-invertible/stationary result? do you warn about it? raise an error? are you trying to convert it to a invertible/stationary result by some method?

If you have enforce_stationarity=True and enforce_invertibility=True (the defaults), then it is not possible to get a non-stationary / non-invertible model.

If you set those equal to False, then we just return the results, whether or not they are stationary / invertible. There's nothing wrong with a non-stationary / non-invertible model.

and what is the code doing if the SARIMAX fit is done and the result is almost non-invertible/stationary? (the roots are close to the unit circle)

We don't do anything special in this case, we just return the results as usual.

ARIMA because that model is essentially in a "maintenance-only" state - so are you recommending that we'll use SARIMAX instead whenever possible? or is there an advantage in continuing to use ARIMA?

It's probably best to use SARIMAX, yes.

ihadanny · 2019-11-06T21:13:32Z

it is not possible to get an non-stationary / non-invertible model

can you please point me to a paper or to the code of how you're doing that? are you replacing the bad roots with their reciprocals? or is it another procedure?

There's nothing wrong with a non-stationary / non-invertible model

why do you say so? in https://otexts.com/fpp2/arima-r.html they say that:

Any roots close to the unit circle may be numerically unstable, and the corresponding model will not be good for forecasting

doesn't that mean that non-stationary / non-invertible models are bad? or is it only a problem if the roots are close to the unit circle, not if they are far inside/outside the circle? and if it's not a problem, then why did you say about the non-stationarity of the start_params that:

if they suggest a non-stationary model then that likely indicates problems with the model specification

I don't understand why it's a problem there but not here.

It's probably best to use SARIMAX, yes

Then maybe it's a good idea that the popular https://github.com/tgsmith61591/pmdarima package use SARIMAX instead of ARIMA...

bashtage · 2019-11-06T21:26:07Z

The log likelihood is only defined for stationary time series when using full MLE. This is how and why the model is restricted to be stationary. Invertibility isn't a strict requirement but helps with point identification. "Unstable" near the unit circle means that the parameters are not precisely estimated. This leads to problems in forecasting, since forecasts from 0.99, 0.975 and 0.9 look very different after a few steps.
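
To make the forecasting point concrete, here is a minimal sketch assuming a zero-mean AR(1), where the h-step-ahead point forecast is simply the coefficient raised to the power h times the last observation. The function name `ar1_forecast`, the last observed value of 10.0, and the 20-step horizon are choices made for the example only.

```python
def ar1_forecast(phi, last_value, steps):
    """Point forecasts of a zero-mean AR(1): phi**h * last_value for h = 1..steps."""
    return [last_value * phi ** h for h in range(1, steps + 1)]


last_value = 10.0  # hypothetical last observation
horizon = 20       # hypothetical forecast horizon

for phi in (0.99, 0.975, 0.9):
    path = ar1_forecast(phi, last_value, horizon)
    print(f"phi={phi}: 1-step={path[0]:.2f}, {horizon}-step={path[-1]:.2f}")
```

The one-step forecasts are nearly identical, while at 20 steps they differ by a wide margin, so a small estimation error in a coefficient near 1 translates into a large difference in longer-horizon forecasts.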


bashtage · 2019-11-06T21:26:51Z

As for pmdarima, you will need to discuss that in the pmdarima tracker since that project is not affiliated with statsmodels.


ChadFulton · 2019-11-07T02:25:35Z

can you please point me to a paper or to the code of how you're doing that? are you replacing the bad roots with their reciprocals? or is it another procedure?

We maximize the likelihood function numerically, and we do not consider parameter combinations that would lead to a non-stationary / non-invertible model (as long as enforce_stationarity=True and enforce_invertibility=True).

Specifically, this is done by numerically maximizing over an unconstrained parameter space so that these parameters essentially describe partial autocorrelations. Then we convert these unconstrained (partial autocorrelation) parameters into the corresponding autoregressive or moving average components, which will be stationary / invertible by definition. The citation is:

Monahan, John F. 1984. "A Note on Enforcing Stationarity in Autoregressive-moving Average Models." Biometrika 71 (2) (August 1): 403-404.
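
The idea of the transformation can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the statsmodels implementation: each unconstrained value is squashed into (-1, 1) with x / sqrt(1 + x^2) (one common choice of mapping) and treated as a partial autocorrelation, and the Durbin-Levinson recursion then turns the partial autocorrelations into AR coefficients. The function names are hypothetical, and sign conventions may differ from the library's.

```python
import math


def pacf_to_ar(pacf):
    """Durbin-Levinson recursion: partial autocorrelations -> AR coefficients.

    For each new partial autocorrelation r, the existing coefficients are
    updated as phi[i] - r * phi[k - 1 - i] and r becomes the last coefficient.
    """
    phi = []
    for k, r in enumerate(pacf):
        phi = [phi[i] - r * phi[k - 1 - i] for i in range(k)] + [r]
    return phi


def constrain_stationary(unconstrained):
    """Map any real-valued vector to coefficients of a stationary AR model."""
    # Squash each value into (-1, 1); this mapping is an assumption of the sketch.
    pacf = [x / math.sqrt(1.0 + x * x) for x in unconstrained]
    return pacf_to_ar(pacf)


# The optimizer can roam freely over the unconstrained values; the AR
# coefficients it implies always stay inside the stationary region.
for point in [(0.0, 0.0), (3.0, -2.0), (-5.0, 5.0)]:
    print(point, "->", constrain_stationary(point))
```

Because every partial autocorrelation lies strictly inside (-1, 1), the resulting AR polynomial is stationary, so no root replacement or rejection step is needed. The same mapping applied to the moving average part gives invertibility.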

ihadanny · 2019-11-07T12:18:29Z

thank you very much for these insights, I'll be sure to check out this paper and try my best to understand the technique.
But just to close this logic puzzle, there's one piece that's eluding me:

When I asked you if non-stationary fit in the start_params method (CSS or some other approximate estimator) are a problem you said it's a big problem, and you even considered throwing an error:

Because the starting parameters estimators are usually consistent estimators of the true parameters (even if not efficient), if they suggest a non-stationary model then that likely indicates problems with the model specification

but when I asked you about a non-stationary result of the fancy MLE estimation, you said that it's not really a problem:

There's nothing wrong with a non-stationary / non-invertible model.

And you're not at all worried about almost non-stationary results, which can happen and Hyndman (https://otexts.com/fpp2/arima-r.html) suggests never to use:

The auto.arima() function is even stricter, and will not select a model with roots close to the unit circle either

ChadFulton · 2019-11-07T16:29:40Z

The answer to your question is that there are three different issues here:

1. What is the correct order of integration of a time series
2. What constitutes a valid model for estimation
3. What parameter values are numerically stable

For (1): If your data is integrated (non-stationary) then if you select an SARIMAX model that enforces stationarity, you will not be able to recover the true data generating process. What will happen is that the parameter estimates will likely be very close to a non-stationary model, but as I mentioned above, they are constrained to be stationary.

That is why if you select a model that enforces stationarity, but the estimated starting parameters are non-stationary, we issue the warning, so that you know that your model may be inappropriate for the data.

In the page you linked to, Hyndman is describing a procedure to automatically determine an SARIMAX model specification, including the order of integration. He is using a heuristic procedure to do this, and so he has apparently made the choice that it is best to reject models that are very close to being non-stationary (I would guess in favor of an additional application of differencing).

It is different here, though, because SARIMAX requires that you specify the model you want to fit. It then finds the best parameters for the given specification. The closer analogue to SARIMAX in R is Arima() and not auto.arima().

For (2): SARIMAX is estimated by putting the model into state space form, and there is no theoretical problem with non-stationary state space models. The likelihood function is slightly different due to different initializations, but our model class can handle this case with no problem.

For (3): there are various known statistical issues with numerical stability around the boundaries of parameter constraints. We bound our parameters very slightly away from the boundary for this reason. Hyndman's concern, however, appears to me to be not so much about numerical stability as it is about finding a good heuristic method for automatically selecting model orders.
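
For readers who want a selection heuristic of the kind discussed above on top of a fitted model, a root check is easy to write by hand. The sketch below handles AR(1) and AR(2) only, using the characteristic polynomial 1 - phi1*z - phi2*z^2. The function names and the 0.01 margin are assumptions for illustration; the exact margin used by auto.arima() is not stated in this thread.

```python
import cmath


def ar_root_moduli(phi1, phi2=0.0):
    """Moduli of the roots of 1 - phi1*z - phi2*z**2 = 0 (AR(1) or AR(2))."""
    if phi2 == 0.0:
        # AR(1): single root at z = 1 / phi1 (no roots if phi1 is also zero).
        return [abs(1.0 / phi1)] if phi1 != 0.0 else []
    # AR(2): solve phi2*z**2 + phi1*z - 1 = 0 with the quadratic formula.
    disc = cmath.sqrt(phi1 ** 2 + 4.0 * phi2)
    roots = [(-phi1 + disc) / (2.0 * phi2), (-phi1 - disc) / (2.0 * phi2)]
    return [abs(z) for z in roots]


def classify(phi1, phi2=0.0, margin=0.01):
    """Label a model by how far its smallest root is from the unit circle.

    `margin` is a hypothetical tolerance chosen for this example.
    """
    moduli = ar_root_moduli(phi1, phi2)
    if not moduli:
        return "ok"
    smallest = min(moduli)
    if smallest <= 1.0:
        return "non-stationary"
    if smallest < 1.0 + margin:
        return "near unit circle"
    return "ok"


for coeffs in [(0.5,), (0.995,), (1.02,), (0.5, 0.3)]:
    print(coeffs, "->", classify(*coeffs))
```

A check like this belongs in the model selection loop rather than in the estimator: it decides which specification to keep, not how a given specification is fitted.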

ChadFulton added question comp-tsa labels Nov 5, 2019

ChadFulton closed this as completed Nov 5, 2019

ihadanny mentioned this issue Nov 7, 2019

always use SARIMAX instead of ARIMA, it is deprecated alkaline-ml/pmdarima#211

Closed

tgsmith61591 mentioned this issue Nov 14, 2019

[MRG+1] SARIMAX only alkaline-ml/pmdarima#219

Merged

10 tasks

bashtage added this to the 0.11 milestone Dec 18, 2019

tgsmith61591 mentioned this issue Mar 21, 2020

auto_arima's error_action="ignore" does not work when alternative training methods are specified alkaline-ml/pmdarima#312

Closed

aaronreidsmith mentioned this issue Jul 8, 2020

predict yields array of 0s. alkaline-ml/pmdarima#357

Closed

int-chaos mentioned this issue Oct 22, 2021

Integrate multivariate time series forecasting microsoft/FLAML#254

Merged
