ARIMA fit failing for small set of data due to invalid maxlag #1146
Comments
Can you post some example data that returns successfully in R?
Given: Forecasted annually to 2022 in R:
Apologies for the formatting
The forecasts in R are not stationary. Did you include a trend? Or does R not impose stationarity for ARIMA?
Yeah, I'm wary of a 'successful' completion here. It's why I ask to see data. My guess is that R just dumped something, anything back. That's been my experience in the past. You'll almost never get the same answer across stats packages for a problem like this. There's just not enough information to estimate anything meaningful here.
R does impose stationarity but it doesn't impose that what you get back is anything that makes sense IME.
However, adjusting the We might be able to estimate "reasonably ok" an ARMA(1,1) or ARMA(0,1) in this case.
gretl and X-12 ARIMA refuse to estimate the model. Stata estimates it (after going through 90 iterations with a lot of switches in the optimization algorithm and backing up) but warns that there is not sufficient data and is unable to calculate any inferential statistics for the constant or the sigma. The parameters it gives are different from R's (and are more or less meaningless).
I somewhat recall doing this exercise before and deciding it's better to just refuse to do any estimation here rather than try to give some garbage back. Thoughts?
I can certainly fix the maxlags issue though. Maybe make it a little more informative, if and when we can't estimate a model.
Your comments make sense. I am developing a system that was prototyped using the R arima package, and I am seeing how far off we are from the original forecasting logic. I am not sure whether returning a series (albeit not a very accurate one) or giving a better error message makes more sense, but I think that either would really help anyone with a similar problem going forward. Thanks for your prompt response @jseabold @josef-pkt
@feathj Please report back with any agreement or disagreement that you find. About your original data, just so we have an idea about use cases: are you using ARIMA or time series analysis on very short samples, or was this just a test case? There are several places where the very small sample behavior for time series analysis hasn't been checked, because most cases in our background (economics) have at least a hundred or a few thousand observations.
If I had to bet my money on the outcome of the ARIMA forecast in this case, I would refuse the bet.
@josef-pkt Unfortunately, some of our datasets are that small. We are working with annual data and some of our sources do not have anything before 2008. Thankfully, human interaction occurs after the fact to fix most of the obvious issues.
Agreed :)
See #1149 for a discussion of the behavior I decided on. I'll wait for any comments on that PR.
This closes #1146 and #1046.
1. We now check to make sure that we have at least one degree of freedom to estimate the problem. If so, then we try the estimation.
   1. Most / all of these estimations will return garbage. We have an extra check that we can estimate stationary initial params. Usually we can't in these cases, so the usual error will be raised here asking to set start_params. This should be enough of a warning to the user that this is "odd." If in the small chance the estimation goes through for a model with 5 observations and 1 degree of freedom, it's on the user then to determine things are no good.
2. We now avoid the problem of maxlag >= nobs happening in the call to AR so this avoids the problem of #1046 that also presented itself as part of #1146.
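For readers following along, the degrees-of-freedom check described above might look roughly like the sketch below. This is not the code from the PR: the function names are hypothetical, and the way parameters are counted (AR terms plus MA terms plus an optional constant, against the observations left after differencing) is an assumption made for illustration.

```python
def degrees_of_freedom(nobs, k_ar, k_ma, k_trend=1, d=0):
    """Observations left after differencing minus the parameters to estimate.

    Hypothetical helper: the parameter count used here is an assumption,
    not necessarily what the library counts internally.
    """
    effective_nobs = nobs - d          # each difference drops one observation
    k_params = k_ar + k_ma + k_trend   # AR terms + MA terms + constant
    return effective_nobs - k_params


def can_attempt_fit(nobs, k_ar, k_ma, k_trend=1, d=0):
    """Only attempt estimation if at least one degree of freedom remains."""
    return degrees_of_freedom(nobs, k_ar, k_ma, k_trend, d) >= 1


# Five annual observations, as in the report
print(can_attempt_fit(5, k_ar=1, k_ma=1))  # ARMA(1,1) with a constant
print(can_attempt_fit(5, k_ar=2, k_ma=2))  # ARMA(2,2) with a constant
```

Passing this check only means estimation is attempted; as noted above, the result for such short series will usually still be unusable.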
I am not sure if this is expected behavior or not, but ARIMA models which have endog of length less than or equal to 5 will always fail due to the maxlag < nobs check.
Maxlag is calculated with the following value if not provided:
Since the call to the parent "fit" method does not explicitly pass the maxlag argument, it is calculated using the formula above and results in this error:
Note that an arima call in "R" with 5 items completes successfully.
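The maxlag arithmetic behind this report can be sketched as follows. It assumes the default is the rule of thumb `round(12 * (nobs / 100) ** (1/4))` that `AR.fit` applied when no `maxlag` was passed; the helper names `default_maxlag` and `passes_maxlag_check` are hypothetical and exist only for this illustration.

```python
def default_maxlag(nobs):
    # Assumed default when maxlag is not provided:
    # round(12 * (nobs / 100) ** (1/4))
    return int(round(12 * (nobs / 100.0) ** 0.25))


def passes_maxlag_check(nobs):
    # The fit is rejected unless maxlag is strictly smaller than nobs
    return default_maxlag(nobs) < nobs


for nobs in (4, 5, 10, 100):
    print(nobs, default_maxlag(nobs), passes_maxlag_check(nobs))
```

Under that assumption, 5 observations give a default maxlag of 6, which can never be smaller than nobs, so the check fails before any estimation is attempted.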