-
Notifications
You must be signed in to change notification settings - Fork 2.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: avoid silently dropping const column #5802
Conversation
avoid unrelated cleanup changes and renamings in bug fix PRs It's a pain to have to check what the actual substantive changes are and to see whether those make sense |
statsmodels/tsa/ar_model.py
Outdated
try: | ||
X = add_trend(X, prepend=True, trend=trend, | ||
has_constant="raise") | ||
except ValueError as err: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead of catching and reraising to add to the exception message, we could just add an extra message as a keyword options.
only "Try specifying trend='nc'"
is not generic, and the message can be issued by add_trend
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is not correct. The data doesn't actually have a constant column, it is just locally constant, and add_trend is doing the wrong thing. IMO the correct thing to do is to always add exactly the same trend that was in the model, so has_constant='add'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just add an extra message as a keyword options.
Please no. Expanding the API of add_trend for a single use just adds complexity to that function with nearly no gain. IMO @jbrockmendel 's solution is the most standard Pythonic way to swap an error message (although I think the approach is not correct)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are many standard Pythonic ways that we avoid.
Reraising an exception just to add a few words to the exception message would be too much of an extra layer IMO.
We have control over both functions, while the "standard Pythonic way" is useful if we don't have control over the raising function.
(By the same idea we could catch a few hundred numpy, pandas and scipy exceptions and add more context information.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The data doesn't actually have a constant column, it is just locally constant
Fair enough; notwithstanding the other issues, the message could be made more accurate.
Please no. Expanding the API of add_trend for a single use just adds complexity to that function with nearly no gain
Agreed.
IMO the correct thing to do is to always add exactly the same trend that was in the model, so has_constant='add'
Just tried this. I expected the fit
call to raise because the RHS had collinear regressors, but it looks like this inherits OLS
's behavior and allows this. So the fit
call goes through, but we end up with an ARResults
object with bse
that's all-NaN
If I tilt my head and squint a bit this behavior kind of makes sense, but I think raising would be more reasonable.
Two other problems with raising the way this PR currently does:
- it could raise from within
select_order
, where the correct behavior would be to recover gracefully and go to a smaller order. - Suggesting
trend="nc"
makes sense of the user passedtrend="c"
, but not if the user passedtrend="ct"
Codecov Report
@@ Coverage Diff @@
## master #5802 +/- ##
==========================================
+ Coverage 82.46% 82.46% +<.01%
==========================================
Files 595 595
Lines 93776 93796 +20
Branches 10353 10353
==========================================
+ Hits 77333 77353 +20
Misses 14046 14046
Partials 2397 2397
Continue to review full report at Codecov.
|
Codecov Report
@@ Coverage Diff @@
## master #5802 +/- ##
==========================================
+ Coverage 82.48% 82.48% +<.01%
==========================================
Files 597 597
Lines 94088 94103 +15
Branches 10402 10402
==========================================
+ Hits 77607 77622 +15
+ Misses 14062 14061 -1
- Partials 2419 2420 +1
Continue to review full report at Codecov.
|
There is no need to wait to start using good practices, like not adding an
input to a function to handle a single case that can be directly fixed in
the calling code.
…On Tue, May 28, 2019, 18:55 codecov[bot] ***@***.***> wrote:
Codecov
<https://codecov.io/gh/statsmodels/statsmodels/pull/5802?src=pr&el=h1>
Report
Merging #5802
<https://codecov.io/gh/statsmodels/statsmodels/pull/5802?src=pr&el=desc>
into master
<https://codecov.io/gh/statsmodels/statsmodels/commit/321619e4b930b74696164b008b25344fd78fc98a?src=pr&el=desc>
will *increase* coverage by <.01%.
The diff coverage is 100%.
[image: Impacted file tree graph]
<https://codecov.io/gh/statsmodels/statsmodels/pull/5802?src=pr&el=tree>
@@ Coverage Diff @@
## master #5802 +/- ##
==========================================
+ Coverage 82.46% 82.46% +<.01%
==========================================
Files 595 595
Lines 93776 93796 +20
Branches 10353 10353
==========================================
+ Hits 77333 77353 +20
Misses 14046 14046
Partials 2397 2397
Impacted Files
<https://codecov.io/gh/statsmodels/statsmodels/pull/5802?src=pr&el=tree> Coverage
Δ
statsmodels/tsa/ar_model.py
<https://codecov.io/gh/statsmodels/statsmodels/pull/5802/diff?src=pr&el=tree#diff-c3RhdHNtb2RlbHMvdHNhL2FyX21vZGVsLnB5> 91.39%
<100%> (+0.12%) ⬆️
statsmodels/tsa/tsatools.py
<https://codecov.io/gh/statsmodels/statsmodels/pull/5802/diff?src=pr&el=tree#diff-c3RhdHNtb2RlbHMvdHNhL3RzYXRvb2xzLnB5> 89.9%
<100%> (+0.09%) ⬆️
statsmodels/tsa/tests/test_ar.py
<https://codecov.io/gh/statsmodels/statsmodels/pull/5802/diff?src=pr&el=tree#diff-c3RhdHNtb2RlbHMvdHNhL3Rlc3RzL3Rlc3RfYXIucHk=> 100%
<100%> (ø) ⬆️
------------------------------
Continue to review full report at Codecov
<https://codecov.io/gh/statsmodels/statsmodels/pull/5802?src=pr&el=continue>
.
*Legend* - Click here to learn more
<https://docs.codecov.io/docs/codecov-delta>
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov
<https://codecov.io/gh/statsmodels/statsmodels/pull/5802?src=pr&el=footer>.
Last update 321619e...8b07929
<https://codecov.io/gh/statsmodels/statsmodels/pull/5802?src=pr&el=lastupdated>.
Read the comment docs <https://docs.codecov.io/docs/pull-request-comments>
.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#5802?email_source=notifications&email_token=ABKTSRMAUA6QU2PO3KAUOXDPXVW2JA5CNFSM4HQFC2T2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGODWM5X6Q#issuecomment-496622586>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/ABKTSRJMZO7ZNYKAIN5TCP3PXVW2JANCNFSM4HQFC2TQ>
.
|
Changed following @bashage's suggestion. Why the warning isn't showing up in a windows py36 build is a mystery to me. |
I just looked at this and this fix is not the actually bug. This is the bug: maxlag = int(round(12*(nobs/100.)**(1/4.))) This maxlag is wrong for such a short time series. This formualtion is trying to estimate a model with 3 observations and 7 lags and 1 constant. Makes no sense. This rule doesn't work until you get a moderate number of observations. Below is the correct formula for this problem. I fixed a related issue in adffuller in b1cf266. maxlag = min(int(round(12*(nobs/100.)**(1/4.))),
self.endog.shape[0] // 2 - (trend == 'c')) Every expression like |
The min should also have a check that maxlag > 0, and raise if it 0 or negative. |
I think it's fair to claim that this module has multiple bugs. The behavior this PR currently changes is that k_trend is silently getting changed in some cases. Would changing the default maxlags fix that? w/r/t the default maxlag formula, slight variations of that expression show up in a bunch of places. There should be one canonical place, with a docstring telling the reader about Gwert (1989) |
Probably multiple bugs. Having looked at it, I think the fixes are
|
@josef-pkt are you on board with Kevin's suggestion here? |
@josef-pkt gentle ping; are you on board for the proposed plan of action for this bugfix? |
@bashtage I'm increasingly thinking the default |
Ideallly the max should be fixed since sm is computing this incorrectly. The function should also raise ValueError if user requested lag is impossible. |
Edited to raise instead of adding the column and silently ignoring the colinearity.
I agree, and will make a PR to fix this (and collect the implementations in exactly one place) before long. |
NumPy's guide.