BUG: avoid silently dropping const column #5802

jbrockmendel · 2019-05-28T16:27:10Z

closes Missing trend in fitted AR model params? #5538, closes Erroe: fcast = mu + np.dot(arparams, endog[i:i+p]) #5258
tests added / passed.
code/documentation is well formatted.
properly formatted commit message. See NumPy's guide.

josef-pkt · 2019-05-28T16:31:29Z

avoid unrelated cleanup changes and renamings in bug fix PRs

It's a pain to have to check what the actual substantive changes are and to see whether those make sense

josef-pkt · 2019-05-28T16:36:22Z

statsmodels/tsa/ar_model.py

+            try:
+                X = add_trend(X, prepend=True, trend=trend,
+                              has_constant="raise")
+            except ValueError as err:


instead of catching and reraising to add to the exception message, we could just add an extra message as a keyword option.

only "Try specifying trend='nc'" is not generic, and the message can be issued by add_trend

I think this is not correct. The data doesn't actually have a constant column; it is just locally constant, and add_trend is doing the wrong thing. IMO the correct thing to do is to always add exactly the same trend that was in the model, so has_constant='add'.
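To make the has_constant distinction concrete, here is a toy, pure-Python model of what the three policies do when the design matrix already contains a locally constant column. `add_constant_sketch` is a hypothetical helper that only mimics the documented 'skip'/'raise'/'add' choices of add_trend with trend='c'; it is not the statsmodels implementation. Under 'skip' the constant silently disappears, which is how k_trend can change without warning.

```python
def add_constant_sketch(X, has_constant="skip"):
    """Toy model of add_trend(trend='c') when X may already hold a constant.

    X is a list of rows. Hypothetical helper, not the statsmodels code.
    """
    if has_constant not in ("skip", "raise", "add"):
        raise ValueError("has_constant must be 'skip', 'raise' or 'add'")
    ncols = len(X[0])
    # A column counts as "constant" if every row holds the same value.
    has_const_col = any(len({row[j] for row in X}) == 1 for j in range(ncols))
    if has_const_col and has_constant == "skip":
        return [list(row) for row in X]  # no constant added: trend silently lost
    if has_const_col and has_constant == "raise":
        raise ValueError("X already contains a constant column")
    # "add" (or no constant column present): always prepend the constant
    return [[1.0] + list(row) for row in X]


# A lagged regressor that happens to be locally constant
X = [[3.0], [3.0], [3.0]]
print(len(add_constant_sketch(X, "skip")[0]))  # 1: constant dropped
print(len(add_constant_sketch(X, "add")[0]))   # 2: trend kept as specified
```

With 'add', the number of trend columns always matches what the model was specified with, at the cost of a collinear design when a lag is constant.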

just add an extra message as a keyword option.

Please no. Expanding the API of add_trend for a single use just adds complexity to that function with nearly no gain. IMO @jbrockmendel's solution is the most standard Pythonic way to swap an error message (although I think the approach is not correct).

There are many standard Pythonic ways that we avoid.
Reraising an exception just to add a few words to the exception message would be too much of an extra layer IMO.
We have control over both functions, while the "standard Pythonic way" is useful if we don't have control over the raising function.

(By the same idea we could catch a few hundred numpy, pandas and scipy exceptions and add more context information.)
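For reference, the catch-and-reraise pattern under discussion looks roughly like the sketch below. `strict_add_constant` and `build_design` are hypothetical stand-ins for add_trend(..., has_constant="raise") and the AR calling code; in this sketch trend only affects the hint, which is appended only for trend='c' because suggesting trend='nc' makes little sense for other trends. `raise ... from err` keeps the original exception available as `__cause__`.

```python
def strict_add_constant(X):
    """Hypothetical stand-in: prepend a constant, raising if one already exists."""
    for j in range(len(X[0])):
        if len({row[j] for row in X}) == 1:
            raise ValueError(f"column {j} of X is constant.")
    return [[1.0] + list(row) for row in X]


def build_design(X, trend="c"):
    """Hypothetical caller that swaps in a more specific error message."""
    try:
        return strict_add_constant(X)
    except ValueError as err:
        # Only trend='c' has an obvious alternative to suggest.
        hint = " Try specifying trend='nc'." if trend == "c" else ""
        raise ValueError(str(err) + hint) from err
```

The trade-off debated above is whether this extra layer in the caller is preferable to teaching the raising function a new keyword.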

The data doesn't actually have a constant column, it is just locally constant

Fair enough; notwithstanding the other issues, the message could be made more accurate.

Please no. Expanding the API of add_trend for a single use just adds complexity to that function with nearly no gain

Agreed.

IMO the correct thing to do is to always add exactly the same trend that was in the model, so has_constant='add'

Just tried this. I expected the fit call to raise because the RHS had collinear regressors, but it looks like this inherits OLS's behavior and allows it. So the fit call goes through, but we end up with an ARResults object whose bse are all NaN.

If I tilt my head and squint a bit this behavior kind of makes sense, but I think raising would be more reasonable.
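The collinearity is easy to see on a small example: once the constant is added, a lagged regressor that is constant at value c is just c times the constant column, so X'X is singular and the coefficients are not identified. A minimal pure-Python check with made-up numbers (`gram_2x2` is a name chosen for this sketch):

```python
def gram_2x2(X):
    """Return X'X for a design matrix X with two columns (list of rows)."""
    a = sum(r[0] * r[0] for r in X)
    b = sum(r[0] * r[1] for r in X)
    d = sum(r[1] * r[1] for r in X)
    return [[a, b], [b, d]]


# Constant column plus a lag that is locally constant at 2.0
X = [[1.0, 2.0], [1.0, 2.0], [1.0, 2.0]]
G = gram_2x2(X)
det = G[0][0] * G[1][1] - G[0][1] * G[1][0]
print(G, det)  # [[3.0, 6.0], [6.0, 12.0]] 0.0
```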

Two other problems with raising the way this PR currently does:

it could raise from within select_order, where the correct behavior would be to recover gracefully and go to a smaller order.

Suggesting trend="nc" makes sense if the user passed trend="c", but not if the user passed trend="ct".
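One way to "recover gracefully" inside select_order is to treat an order whose fit raises as infeasible and keep searching the smaller orders. The sketch assumes a hypothetical callable `fit_ic(order)` that returns an information criterion or raises ValueError; `select_order_sketch` is illustrative only and is not the statsmodels select_order code.

```python
def select_order_sketch(fit_ic, maxlag):
    """Return the lag order with the lowest criterion, skipping infeasible ones."""
    best_order, best_ic = None, float("inf")
    for order in range(maxlag, 0, -1):
        try:
            ic = fit_ic(order)
        except ValueError:
            continue  # infeasible order: fall back to smaller orders
        if ic < best_ic:
            best_order, best_ic = order, ic
    if best_order is None:
        raise ValueError("No feasible lag order up to maxlag.")
    return best_order


def fake_ic(order):
    # Hypothetical criterion: orders above 3 cannot be estimated
    if order > 3:
        raise ValueError("not enough observations")
    return {1: 5.0, 2: 4.0, 3: 4.5}[order]


print(select_order_sketch(fake_ic, maxlag=7))  # 2
```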

coveralls · 2019-05-28T17:27:19Z

Coverage increased (+0.003%) to 84.993% when pulling 8b07929 on jbrockmendel:arpredict into 321619e on statsmodels:master.

coveralls · 2019-05-28T17:27:20Z

Coverage increased (+0.02%) to 85.032% when pulling 93a9708 on jbrockmendel:arpredict into 35b6aab on statsmodels:master.

codecov · 2019-05-28T17:55:41Z

Codecov Report

Merging #5802 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master    #5802      +/-   ##
==========================================
+ Coverage   82.46%   82.46%   +<.01%     
==========================================
  Files         595      595              
  Lines       93776    93796      +20     
  Branches    10353    10353              
==========================================
+ Hits        77333    77353      +20     
  Misses      14046    14046              
  Partials     2397     2397

Impacted Files	Coverage Δ
statsmodels/tsa/ar_model.py	`91.39% <100%> (+0.12%)`	⬆️
statsmodels/tsa/tsatools.py	`89.9% <100%> (+0.09%)`	⬆️
statsmodels/tsa/tests/test_ar.py	`100% <100%> (ø)`	⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 321619e...8b07929.

codecov · 2019-05-28T17:55:42Z

Codecov Report

Merging #5802 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master    #5802      +/-   ##
==========================================
+ Coverage   82.48%   82.48%   +<.01%     
==========================================
  Files         597      597              
  Lines       94088    94103      +15     
  Branches    10402    10402              
==========================================
+ Hits        77607    77622      +15     
+ Misses      14062    14061       -1     
- Partials     2419     2420       +1

Impacted Files	Coverage Δ
statsmodels/tsa/ar_model.py	`91.26% <100%> (ø)`	⬆️
statsmodels/tsa/tsatools.py	`89.9% <100%> (+0.09%)`	⬆️
statsmodels/tsa/tests/test_ar.py	`100% <100%> (ø)`	⬆️
statsmodels/stats/descriptivestats.py	`24.34% <0%> (ø)`	⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 35b6aab...93a9708.

bashtage · 2019-05-28T18:05:53Z

There is no need to wait to start using good practices, like not adding an input to a function to handle a single case that can be directly fixed in the calling code.


jbrockmendel · 2019-05-29T16:35:42Z

Changed following @bashtage's suggestion. Why the warning isn't showing up in a Windows py36 build is a mystery to me.

bashtage · 2019-05-29T16:55:29Z

I just looked at this, and this fix does not address the actual bug. This is the bug:

            maxlag = int(round(12*(nobs/100.)**(1/4.)))

This maxlag is wrong for such a short time series. This formulation is trying to estimate a model with 3 observations and 7 lags and 1 constant. That makes no sense.
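The rule quoted above can be evaluated directly to see how it behaves for short series. For example, with 10 observations it asks for 7 lags, leaving only 3 usable rows for 7 lag coefficients plus a constant. `schwert_maxlag` is just a name chosen for this sketch.

```python
def schwert_maxlag(nobs):
    # Rule of thumb from the thread: int(round(12 * (nobs / 100) ** (1 / 4)))
    return int(round(12 * (nobs / 100.0) ** (1 / 4.0)))


for nobs in (10, 100, 1000):
    maxlag = schwert_maxlag(nobs)
    # rows left after losing maxlag observations to lagging
    print(nobs, maxlag, nobs - maxlag)
```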

This rule doesn't work until you get a moderate number of observations. Below is the correct formula for this problem. I fixed a related issue in adfuller in b1cf266.

            maxlag = min(int(round(12*(nobs/100.)**(1/4.))),
                         self.endog.shape[0] // 2 - (trend == 'c'))

Every expression like int(round(12*(nobs/100.)**(1/4.))) is a bug unless it is wrapped with a min().

bashtage · 2019-05-29T16:56:22Z

The min should also have a check that maxlag > 0, and raise if it is 0 or negative.
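Combining the capped formula with the positivity check might look like the sketch below. It takes nobs in place of self.endog.shape[0], keeps the cap nobs // 2 - (trend == 'c') exactly as proposed (so only trend='c' lowers it), and the function name `capped_maxlag` is hypothetical.

```python
def capped_maxlag(nobs, trend="c"):
    """Schwert-style default maxlag, capped for short series (sketch)."""
    schwert = int(round(12 * (nobs / 100.0) ** (1 / 4.0)))
    # Cap proposed in the review; (trend == "c") counts as 1 or 0
    cap = nobs // 2 - (trend == "c")
    maxlag = min(schwert, cap)
    if maxlag <= 0:
        raise ValueError(
            f"Default maxlag is {maxlag} for nobs={nobs} and trend={trend!r}; "
            "the series is too short to estimate an AR model."
        )
    return maxlag


print(capped_maxlag(10), capped_maxlag(10, trend="nc"), capped_maxlag(100))
```

For 10 observations the cap, not the Schwert rule, decides the value; for 100 observations the cap is not binding.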

jbrockmendel · 2019-05-29T17:02:24Z

I just looked at this, and this fix does not address the actual bug. This is the bug:

I think it's fair to claim that this module has multiple bugs. The behavior this PR currently changes is that k_trend is silently getting changed in some cases. Would changing the default maxlags fix that?

w/r/t the default maxlag formula, slight variations of that expression show up in a bunch of places. There should be one canonical place, with a docstring telling the reader about Schwert (1989).

bashtage · 2019-05-29T17:09:41Z

Probably multiple bugs. Having looked at it, I think the fixes are:

1. Fix maxlag.
2. Check X for a constant-like column and raise if one is found, explaining that the values, when lagged, produce a column that has 0 variance, so the model cannot be estimated when a constant is included.
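The second fix could be sketched as follows: build the lagged regressors and, if the trend includes a constant, raise when any lag column has zero variance over the estimation sample. `lagged_columns` and `check_lagged_constant` are hypothetical helpers written for illustration, and treating 'c' and 'ct' as the trends that include a constant is an assumption of the sketch.

```python
def lagged_columns(endog, maxlag):
    """Column k-1 holds y[t-k] for t = maxlag, ..., nobs-1."""
    nobs = len(endog)
    return [[endog[t - k] for t in range(maxlag, nobs)]
            for k in range(1, maxlag + 1)]


def check_lagged_constant(endog, maxlag, trend="c"):
    """Raise ValueError if a lag column is constant and a constant is included."""
    if trend not in ("c", "ct"):  # assumption: only these include a constant
        return None
    for k, col in enumerate(lagged_columns(endog, maxlag), start=1):
        if not col:
            raise ValueError("maxlag leaves no observations for estimation.")
        if max(col) == min(col):
            raise ValueError(
                f"Lag {k} of endog is constant over the estimation sample, so it "
                "has zero variance and is collinear with the constant. The model "
                "cannot be estimated with this trend."
            )
    return None


check_lagged_constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], maxlag=2)  # passes
```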

jbrockmendel · 2019-05-29T17:28:26Z

@josef-pkt are you on board with Kevin's suggestion here?

jbrockmendel · 2019-06-03T20:08:47Z

@josef-pkt gentle ping; are you on board for the proposed plan of action for this bugfix?

jbrockmendel · 2019-06-04T15:14:56Z

@bashtage I'm increasingly thinking the default maxlag is orthogonal to this bug. After all, the user could just pass maxlag=7 and get the same behavior even if the default is changed.

bashtage · 2019-06-04T16:58:13Z

Ideally the max should be fixed since sm is computing this incorrectly. The function should also raise ValueError if the user-requested lag is impossible.
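A check on a user-supplied maxlag could count degrees of freedom: after lagging, nobs - maxlag rows remain, and the regression has maxlag + k_trend coefficients. The sketch below uses a hypothetical name, and requiring at least one residual degree of freedom is an assumed threshold, not something the thread specifies.

```python
def validate_maxlag(nobs, maxlag, k_trend):
    """Raise ValueError if an AR(maxlag) with k_trend trend terms cannot be fit."""
    if not isinstance(maxlag, int) or maxlag < 1:
        raise ValueError("maxlag must be a positive integer.")
    nobs_eff = nobs - maxlag    # rows available after lagging
    nparams = maxlag + k_trend  # lag coefficients plus trend terms
    if nobs_eff <= nparams:
        raise ValueError(
            f"maxlag={maxlag} leaves {nobs_eff} observations for "
            f"{nparams} parameters."
        )
    return maxlag
```

Under this rule, an explicit maxlag=7 with 10 observations and a constant is rejected, since only 3 rows remain for 8 parameters.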

jbrockmendel · 2019-06-04T19:42:14Z

Edited to raise instead of adding the column and silently ignoring the collinearity.

Ideally the max should be fixed since sm is computing this incorrectly.

I agree, and will make a PR to fix this (and collect the implementations in exactly one place) before long.

josef-pkt reviewed May 28, 2019

jbrockmendel force-pushed the arpredict branch from 8b07929 to 7e5cf5e Compare May 29, 2019 00:09

BUG: avoid silently dropping const column

93a9708

jbrockmendel force-pushed the arpredict branch from 7e5cf5e to 93a9708 Compare June 4, 2019 19:41

bashtage merged commit 094d531 into statsmodels:master Jun 5, 2019

bashtage added the type-bug label Jun 5, 2019

jbrockmendel deleted the arpredict branch June 5, 2019 16:29

jbrockmendel mentioned this pull request Jun 6, 2019

BUG: ARIMA fit with trend and constant exog #5833

Merged

4 tasks
