ENH: ARMA order select convenience function #1334

Merged
jseabold merged 9 commits into statsmodels:master from jseabold:arma-order-select on Jan 28, 2014

3 participants
@jseabold
Member

commented Jan 27, 2014

Just a way to quickly aggregate results for consideration. Doesn't do any automatic selection. The start parameters check is kind of a hack and should be fixed with #1301.

@jseabold

Member Author

commented Jan 27, 2014

This is a little different from the method originally proposed by Hannan and Rissanen (1982) (later looked at by Box, Jenkins, and Reinsel, and by Choi). It sounds like their approach is mainly used for computational reasons. Maybe we should consider it? Their approach:

  1. Estimate an AR(p_e) series using Yule-Walker with p_e determined by the minimum AIC (BIC) according to HR (BJR/Choi).
  2. Obtain the residuals, e, from this series
  3. Estimate by OLS y ~ L(p)y + L(q)e for all p and q
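
The three steps above can be sketched with plain NumPy. This is only an illustration under stated assumptions, not the code in this PR: the function names, the `max_pe` cap on the long AR order, the use of a common estimation sample for every (p, q), and the Gaussian-style BIC in step 3 are all choices made here for the example.

```python
import numpy as np


def yule_walker_ar(y, p):
    """AR(p) coefficients and innovation variance from the Yule-Walker equations."""
    n = len(y)
    # Biased sample autocovariances r[0], ..., r[p] (y is assumed demeaned).
    r = np.array([y[:n - k] @ y[k:] / n for k in range(p + 1)])
    R = np.array([[r[abs(i - j)] for j in range(p)] for i in range(p)])
    phi = np.linalg.solve(R, r[1:])
    sigma2 = r[0] - phi @ r[1:]
    return phi, sigma2


def hannan_rissanen_bic(y, max_ar=4, max_ma=2, max_pe=10):
    """Return a (max_ar + 1, max_ma + 1) BIC table and the long AR order p_e."""
    y = np.asarray(y, dtype=float)
    y = y - y.mean()
    n = len(y)

    # Step 1: long AR(p_e) by Yule-Walker, p_e chosen by minimum AIC.
    best = None
    for p in range(1, max_pe + 1):
        phi, sigma2 = yule_walker_ar(y, p)
        aic = n * np.log(sigma2) + 2 * p
        if best is None or aic < best[0]:
            best = (aic, p, phi)
    _, p_e, phi = best

    # Step 2: residuals of the long AR fit (the first p_e entries are unused).
    e = np.zeros(n)
    y_lags = np.column_stack([y[p_e - k:n - k] for k in range(1, p_e + 1)])
    e[p_e:] = y[p_e:] - y_lags @ phi

    # Step 3: OLS of y on p lags of y and q lags of e, for every (p, q).
    # One common sample is used so the criteria are comparable across orders.
    start = max(max_ar, p_e + max_ma)
    target = y[start:]
    m = len(target)
    bic = np.full((max_ar + 1, max_ma + 1), np.nan)
    for p in range(max_ar + 1):
        for q in range(max_ma + 1):
            cols = [y[start - k:n - k] for k in range(1, p + 1)]
            cols += [e[start - k:n - k] for k in range(1, q + 1)]
            if cols:
                X = np.column_stack(cols)
                beta = np.linalg.lstsq(X, target, rcond=None)[0]
                resid = target - X @ beta
            else:
                resid = target
            sigma2 = resid @ resid / m
            bic[p, q] = m * np.log(sigma2) + (p + q) * np.log(m)
    return bic, p_e
```

Because step 3 is only a set of linear regressions, the whole grid costs far less than fitting each ARMA(p, q) by full maximum likelihood, which is presumably the computational appeal.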

Thoughts? Ours isn't terribly slow, and will be much faster when we merge the ARIMA speed-up changes.

https://support.sas.com/documentation/cdl/en/etsug/60372/HTML/default/viewer.htm#etsug_arima_sect030.htm

Hyndman and Khandakar are way fancier (auto.arima in the forecast package). I've also written wrappers for AUTOMDL in what is now X-13-arima. To be included when I get the API right and figure out how to handle the configuration for the dependence.

@josef-pkt

Member

commented Jan 27, 2014

The advantage is that this can be made very fast, especially if we use QR for the max_p * max_q regressions.

Because it needs a large initial AR(p), my guess is that it only works well if the time series is long enough or persistence is not very high. Maybe more of a complement to the full info MLE which should be more accurate in the smaller sample case.

Do AIC or BIC "unselect" near common roots?
From what I saw recently, I think Hyndman is checking and dropping nearly common roots.
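
A check for nearly common roots could look roughly like the sketch below. It is not taken from auto.arima or AUTOMDL; the function name, the tolerance, and the sign convention (AR polynomial 1 - phi_1 z - ... - phi_p z^p, MA polynomial 1 + theta_1 z + ... + theta_q z^q) are assumptions for illustration.

```python
import numpy as np


def has_near_common_roots(arparams, maparams, tol=0.05):
    """Return True if some AR root lies within `tol` of some MA root."""
    arparams = np.asarray(arparams, dtype=float)
    maparams = np.asarray(maparams, dtype=float)
    if arparams.size == 0 or maparams.size == 0:
        return False
    # np.roots expects the highest-degree coefficient first.
    ar_roots = np.roots(np.r_[-arparams[::-1], 1.0])
    ma_roots = np.roots(np.r_[maparams[::-1], 1.0])
    # Pairwise distances between AR and MA roots in the complex plane.
    dist = np.abs(ar_roots[:, None] - ma_roots[None, :])
    return bool((dist < tol).any())
```

For example, an ARMA(1, 1) with phi = 0.5 and theta = -0.5 has both roots at z = 2, so the AR and MA factors cancel and the model is effectively white noise.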

@jseabold

Member Author

commented Jan 27, 2014

I'm not sure about the common roots issue or the performance of AIC/BIC more generally in applications and in the literature. I doubt that it unselects these models or people would just use it alone and be done with it rather than the extensive model selection algorithms that are used. This is just a quick way to aggregate IC for a time-series, as far as I'm concerned.

I know AUTOMDL in x-13-arima checks for common roots, allows users to prefer balanced models where p + d == q, etc. I assume auto.arima does too. I'd save all of that for any kind of duplicate auto.arima function (which this isn't pretending to be). I'm skeptical about re-implementing these algorithms just to have them in Python though. I think wrapping AUTOMDL (or auto.arima) should be enough for almost any use case.

@jseabold

Member Author

commented Jan 27, 2014

The SAS documentation notes that a large negative BIC value can be used to diagnose a near perfect fit from using short series.
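
One way to see why, as a sketch assuming the usual Gaussian form of the criterion for a series of length n with residual variance estimate \hat\sigma^2_{p,q} (the exact scaling in the SAS documentation may differ):

```latex
\mathrm{BIC}(p, q) = n \ln \hat\sigma^2_{p,q} + (p + q) \ln n,
\qquad
\hat\sigma^2_{p,q} \to 0 \;\Longrightarrow\; \mathrm{BIC}(p, q) \to -\infty
```

With a short series and many parameters, the fitted residual variance can get close to zero, so the log term swamps the penalty term.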

@jseabold

Member Author

commented Jan 27, 2014

Thinking about just merging this as is now that there's an implementation note in the docs. If people want to talk about bells and whistles or approximation implementations, we can fix it up later.

@coveralls


commented Jan 27, 2014

Coverage Status

Coverage remained the same when pulling ca28977 on jseabold:arma-order-select into 96c410f on statsmodels:master.

for ar in ar_range:
    for ma in ma_range:
        if ar == 0 and ma == 0:
            results[:, ar, ma] = np.nan

@josef-pkt

josef-pkt Jan 27, 2014

Member

With Kevin's merge this would be OLS.

@jseabold

jseabold Jan 27, 2014

Author Member

Will fix when that's merged.

np.random.seed(2014)
y = arma_generate_sample(arparams, maparams, nobs)
res = arma_order_select_ic(y, ic=['aic', 'bic'], trend='nc')
res = arma_order_select_ic(y, ic='aic', trend='c')

@josef-pkt

josef-pkt Jan 27, 2014

Member

I would add a regression test that just compares the IC table to the current values, to catch future refactoring changes.

@jseabold

jseabold Jan 27, 2014

Author Member

That's why we have IC tests?

@josef-pkt

josef-pkt Jan 27, 2014

Member

The IC tests are to make sure the numbers are correct.
Here I'm just asking for "regression" tests, so we know when or how this function changes, and that it actually produces numbers and not just nans.
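
A minimal sketch of what such a regression check might look like. The helper name `check_ic_table` and the tolerance are hypothetical, and the expected values would have to be copied from an actual run of the function at the time the test is written; none are given here.

```python
import numpy as np


def check_ic_table(table, expected, rtol=1e-4):
    """Compare an IC table to stored values and guard against all-NaN output."""
    table = np.asarray(table, dtype=float)
    expected = np.asarray(expected, dtype=float)
    # The function must produce at least some real numbers, not just NaNs.
    assert np.isfinite(table).any(), "IC table contains no finite values"
    # Failed fits (NaN cells) must occur in the same places as before.
    assert np.array_equal(np.isnan(table), np.isnan(expected))
    # The remaining cells must match the stored values within tolerance.
    np.testing.assert_allclose(table, expected, rtol=rtol, equal_nan=True)
```

A loose `rtol` keeps the test from failing on optimizer noise while still flagging a change in estimation method or start parameters.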

@jseabold

jseabold Jan 27, 2014

Author Member

Ah, right. I misread. Hmm, if we change to using approximate estimation methods then we're not going to get the same IC, so I'm not sure how much use any regression tests will be.

@josef-pkt

josef-pkt Jan 27, 2014

Member

But if we change those, then it's also a change for the user, and they should be made aware of it, and the unit tests should point it out to us.
For the intended usage of this function, the precision doesn't matter much, and only the ranking is really relevant.
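
If only the ranking matters, a test could compare the ordering of the table rather than its values. A rough sketch, with hypothetical helper names and the assumption that the table is indexed as `table[ar, ma]`:

```python
import numpy as np


def ic_min_order(table):
    """Return the (ar, ma) index of the smallest IC value, ignoring NaNs."""
    table = np.asarray(table, dtype=float)
    flat = np.nanargmin(table)  # raises ValueError if every cell is NaN
    return tuple(int(i) for i in np.unravel_index(flat, table.shape))


def same_ranking(table_a, table_b):
    """Return True if both tables order the candidate models identically."""
    a = np.asarray(table_a, dtype=float).ravel()
    b = np.asarray(table_b, dtype=float).ravel()
    # argsort places NaN cells last; a stable sort keeps their relative order.
    order_a = np.argsort(a, kind="stable")
    order_b = np.argsort(b, kind="stable")
    return bool(np.array_equal(order_a, order_b))
```

Checking only `ic_min_order` is the weakest form of this test. `same_ranking` is stricter and will fail if any two candidate orders swap places.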

(And regression tests work partially as developer documentation. I was going back through the git/bzr history to figure out what the expected old behavior of HuberScale was after I started to make some changes. Unfortunately, the history search was unsuccessful. If I had had some regression numbers, I would have chased fewer dead-end guesses.)

@jseabold

jseabold Jan 27, 2014

Author Member

Fine, I'll do it, but I don't think we really gain anything other than more make-work right now, given that the numbers will not be, and aren't expected to be, the same after a refactor (I wouldn't even bet on preserved ranking in this case). We have release notes to document changes. We have long GitHub discussions on almost every merge. I'm not sure what a regression test for HuberScale would do when you can also just look at before/after your changes.

@josef-pkt

josef-pkt Jan 27, 2014

Member

ARMA is your code, and it's up to you.

I've just seen many refactoring victims, especially in my own sandbox code, where I didn't write a unit test because I didn't find a verifying example and I didn't want to add a regression test.
In HuberScale I was also investigating the previous refactoring from the mad/std_mad change.

It's often unrelated refactoring that causes problems, not the cases where we consciously change one part and its associated unit tests. For example, any change in the default start_params in ARMA might change the behavior of this function (try/except). But it might raise an exception, and then the smoke test is enough.

@josef-pkt

Member

commented Jan 27, 2014

> Thinking about just merging this as is now that there's an implementation note in the docs. If people want to talk about bells and whistles or approximation implementations, we can fix it up later.

I agree,
except I prefer a regression test with numbers, so we know what it did when it was written.

jseabold added a commit that referenced this pull request Jan 28, 2014

Merge pull request #1334 from jseabold/arma-order-select
ENH: ARMA order select convenience function

@jseabold jseabold merged commit 034f232 into statsmodels:master Jan 28, 2014

@jseabold jseabold deleted the jseabold:arma-order-select branch Jan 28, 2014

@jseabold jseabold referenced this pull request Feb 25, 2014

Open

ARIMA order selection #1224

PierreBdR pushed a commit to PierreBdR/statsmodels that referenced this pull request Sep 2, 2014

Merge pull request statsmodels#1334 from jseabold/arma-order-select
ENH: ARMA order select convenience function