Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: avoid silently dropping const column #5802

Merged
merged 1 commit into from Jun 5, 2019

Conversation

jbrockmendel
Copy link
Contributor

@josef-pkt
Copy link
Member

avoid unrelated cleanup changes and renamings in bug fix PRs

It's a pain to have to check what the actual substantive changes are and to see whether those make sense

try:
X = add_trend(X, prepend=True, trend=trend,
has_constant="raise")
except ValueError as err:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of catching and reraising to add to the exception message, we could just add an extra message as a keyword options.

only "Try specifying trend='nc'" is not generic, and the message can be issued by add_trend

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is not correct. The data doesn't actually have a constant column, it is just locally constant, and add_trend is doing the wrong thing. IMO the correct thing to do is to always add exactly the same trend that was in the model, so has_constant='add'

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just add an extra message as a keyword options.

Please no. Expanding the API of add_trend for a single use just adds complexity to that function with nearly no gain. IMO @jbrockmendel 's solution is the most standard Pythonic way to swap an error message (although I think the approach is not correct)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are many standard Pythonic ways that we avoid.
Reraising an exception just to add a few words to the exception message would be too much of an extra layer IMO.
We have control over both functions, while the "standard Pythonic way" is useful if we don't have control over the raising function.

(By the same idea we could catch a few hundred numpy, pandas and scipy exceptions and add more context information.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The data doesn't actually have a constant column, it is just locally constant

Fair enough; notwithstanding the other issues, the message could be made more accurate.

Please no. Expanding the API of add_trend for a single use just adds complexity to that function with nearly no gain

Agreed.

IMO the correct thing to do is to always add exactly the same trend that was in the model, so has_constant='add'

Just tried this. I expected the fit call to raise because the RHS had collinear regressors, but it looks like this inherits OLS's behavior and allows this. So the fit call goes through, but we end up with an ARResults object with bse that's all-NaN

If I tilt my head and squint a bit this behavior kind of makes sense, but I think raising would be more reasonable.

Two other problems with raising the way this PR currently does:

  • it could raise from within select_order, where the correct behavior would be to recover gracefully and go to a smaller order.
  • Suggesting trend="nc" makes sense of the user passed trend="c", but not if the user passed trend="ct"

@coveralls
Copy link

Coverage Status

Coverage increased (+0.003%) to 84.993% when pulling 8b07929 on jbrockmendel:arpredict into 321619e on statsmodels:master.

@coveralls
Copy link

coveralls commented May 28, 2019

Coverage Status

Coverage increased (+0.02%) to 85.032% when pulling 93a9708 on jbrockmendel:arpredict into 35b6aab on statsmodels:master.

@codecov
Copy link

codecov bot commented May 28, 2019

Codecov Report

Merging #5802 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #5802      +/-   ##
==========================================
+ Coverage   82.46%   82.46%   +<.01%     
==========================================
  Files         595      595              
  Lines       93776    93796      +20     
  Branches    10353    10353              
==========================================
+ Hits        77333    77353      +20     
  Misses      14046    14046              
  Partials     2397     2397
Impacted Files Coverage Δ
statsmodels/tsa/ar_model.py 91.39% <100%> (+0.12%) ⬆️
statsmodels/tsa/tsatools.py 89.9% <100%> (+0.09%) ⬆️
statsmodels/tsa/tests/test_ar.py 100% <100%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 321619e...8b07929. Read the comment docs.

@codecov
Copy link

codecov bot commented May 28, 2019

Codecov Report

Merging #5802 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #5802      +/-   ##
==========================================
+ Coverage   82.48%   82.48%   +<.01%     
==========================================
  Files         597      597              
  Lines       94088    94103      +15     
  Branches    10402    10402              
==========================================
+ Hits        77607    77622      +15     
+ Misses      14062    14061       -1     
- Partials     2419     2420       +1
Impacted Files Coverage Δ
statsmodels/tsa/ar_model.py 91.26% <100%> (ø) ⬆️
statsmodels/tsa/tsatools.py 89.9% <100%> (+0.09%) ⬆️
statsmodels/tsa/tests/test_ar.py 100% <100%> (ø) ⬆️
statsmodels/stats/descriptivestats.py 24.34% <0%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 35b6aab...93a9708. Read the comment docs.

@bashtage
Copy link
Member

bashtage commented May 28, 2019 via email

@jbrockmendel
Copy link
Contributor Author

Changed following @bashage's suggestion. Why the warning isn't showing up in a windows py36 build is a mystery to me.

@bashtage
Copy link
Member

I just looked at this and this fix is not the actually bug. This is the bug:

            maxlag = int(round(12*(nobs/100.)**(1/4.)))

This maxlag is wrong for such a short time series. This formualtion is trying to estimate a model with 3 observations and 7 lags and 1 constant. Makes no sense.

This rule doesn't work until you get a moderate number of observations. Below is the correct formula for this problem. I fixed a related issue in adffuller in b1cf266.

            maxlag = min(int(round(12*(nobs/100.)**(1/4.))),
                         self.endog.shape[0] // 2 - (trend == 'c'))

Every expression like int(round(12*(nobs/100.)**(1/4.)) is a bug unless it is wrapped with a min()

@bashtage
Copy link
Member

The min should also have a check that maxlag > 0, and raise if it 0 or negative.

@jbrockmendel
Copy link
Contributor Author

I just looked at this and this fix is not the actually bug. This is the bug:

I think it's fair to claim that this module has multiple bugs. The behavior this PR currently changes is that k_trend is silently getting changed in some cases. Would changing the default maxlags fix that?

w/r/t the default maxlag formula, slight variations of that expression show up in a bunch of places. There should be one canonical place, with a docstring telling the reader about Gwert (1989)

@bashtage
Copy link
Member

Probably multiple bugs. Having looked at it, I think the fixes are

  • fix maxlag
  • Check X for a constant-like column, and raise if one is found, and explain that the values, when lagged, produce a column that has 0 variance and so the model cannot be estiamted when a constant is included

@jbrockmendel
Copy link
Contributor Author

@josef-pkt are you on board with Kevin's suggestion here?

@jbrockmendel
Copy link
Contributor Author

@josef-pkt gentle ping; are you on board for the proposed plan of action for this bugfix?

@jbrockmendel
Copy link
Contributor Author

@bashtage I'm increasingly thinking the default maxlag is orthogonal. After all, the user could just pass maxlag=7 and get the same behavior even if the default is changed.

@bashtage
Copy link
Member

bashtage commented Jun 4, 2019

Ideallly the max should be fixed since sm is computing this incorrectly. The function should also raise ValueError if user requested lag is impossible.

@jbrockmendel
Copy link
Contributor Author

Edited to raise instead of adding the column and silently ignoring the colinearity.

Ideallly the max should be fixed since sm is computing this incorrectly.

I agree, and will make a PR to fix this (and collect the implementations in exactly one place) before long.

@bashtage bashtage merged commit 094d531 into statsmodels:master Jun 5, 2019
@jbrockmendel jbrockmendel deleted the arpredict branch June 5, 2019 16:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Missing trend in fitted AR model params? Erroe: fcast = mu + np.dot(arparams, endog[i:i+p])
4 participants