New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Markov-switching regression and autoregression models (non-state space) #2980

Merged
merged 57 commits into from Jun 20, 2016

Conversation

Projects
None yet
6 participants
@ChadFulton
Member

ChadFulton commented Jun 1, 2016

This is a cleaned up / rewritten version of the Markov switching models from my 2013 GSOC. Allows:

  • Markov regression models
  • Markov autoregression models
  • Fitting via EM algorithm as well as by scoring
  • Tests against Stata / Eviews

My thought is to finish developing this alongside @ValeryTyumen who is doing the state space Markov switching models so that we can maintain a constant interface, etc.

@coveralls

This comment has been minimized.

coveralls commented Jun 1, 2016

Coverage Status

Coverage increased (+0.0005%) to 85.506% when pulling 988623f on ChadFulton:markov-tsa into e62551c on statsmodels:master.

@josef-pkt

This comment has been minimized.

Member

josef-pkt commented Jun 4, 2016

partially in response to #2921 (comment)

@ChadFulton @ValeryTyumen Would it help to merge this into master?

Given that this has good test coverage and looks pretty complete, we could merge it, possibly with a Warning that this is currently under refactoring during GSOC.

On the other hand #2921 has another unmerged PR in between.

General policy could be to allow merges of master into a feature branch and remove it again during an interactive rebase before the final review and merge. The main problem for reviewing is that the merged code shows up as part of the PR, AFAIK and makes the change set less useful.

@ChadFulton

This comment has been minimized.

Member

ChadFulton commented Jun 7, 2016

Lots of improvements and bug fixes, along with example notebooks that can be found here:

With reference to the question about merging, this can probably be merged. The only features missing right now are predict and forecast, but otherwise they are pretty complete models.

@ChadFulton

This comment has been minimized.

Member

ChadFulton commented Jun 7, 2016

@ValeryTyumen Sorry about this, but due to a name change I suggested for you, I renamed the transition to regime_transition in these models, so the MarkovSwitchingParams class changed slightly (instead of transition, the transition probability-related parameters should be regime_transition now).

@ChadFulton ChadFulton force-pushed the ChadFulton:markov-tsa branch from 2e5d5b2 to 95714ca Jun 7, 2016

@ChadFulton

This comment has been minimized.

Member

ChadFulton commented Jun 7, 2016

Rebased on master.

@coveralls

This comment has been minimized.

coveralls commented Jun 7, 2016

Coverage Status

Coverage decreased (-1.1%) to 84.386% when pulling 2e5d5b2 on ChadFulton:markov-tsa into e62551c on statsmodels:master.

@coveralls

This comment has been minimized.

coveralls commented Jun 7, 2016

Coverage Status

Changes Unknown when pulling 95714ca on ChadFulton:markov-tsa into * on statsmodels:master*.

@ChadFulton ChadFulton force-pushed the ChadFulton:markov-tsa branch from 95714ca to 46f1acb Jun 18, 2016

@ChadFulton

This comment has been minimized.

Member

ChadFulton commented Jun 18, 2016

Rebased on master. Probably not ready to be merged in as "stable", although the only thing it is actually missing right now is out-of-sample forecasting.

It could be included in 0.8 if there was a way to indicate that it was still new / unstable.

@josef-pkt

This comment has been minimized.

Member

josef-pkt commented Jun 18, 2016

If it's safe enough, then it would be good to get it in.

We can add an "experimental" or similar status qualifier to the docstring.
I need to do this also for some other parts. I would also like to add a blanket warning to the release notes that newly merged code is often not stable or not guaranteed to be stable.

@ChadFulton

This comment has been minimized.

Member

ChadFulton commented Jun 18, 2016

I will plan to merge it this weekend then, but I will improve the test coverage first.

# Inherited parameters
param_names = np.array(
markov_regression.MarkovRegression.param_names.fget(self),
dtype=object)

This comment has been minimized.

@josef-pkt

josef-pkt Jun 18, 2016

Member

why an object array and not a list of strings?

This comment has been minimized.

@ChadFulton

ChadFulton Jun 18, 2016

Member

It's because below I use indexing with lists of integers (the self.parameters[i, 'autoregressive'])

attributes = ['regime_transition', 'initial_probabilities',
'conditional_likelihoods',
'predicted_joint_probabilities',

This comment has been minimized.

@josef-pkt

josef-pkt Jun 18, 2016

Member

we use tvalues in general even if the distribution is normal (use_t=False)

@josef-pkt

This comment has been minimized.

Member

josef-pkt commented Jun 18, 2016

This looks fine based on a very very brief skimming. I leave the details to you and @ValeryTyumen

You refer to EViews in some of the unit tests, but as far as I could see, there is no origin or source for the reference numbers mentioned in other tests. Can you add some comment, so we can find it again later?

@josef-pkt josef-pkt added this to the 0.8 milestone Jun 18, 2016

@josef-pkt

This comment has been minimized.

Member

josef-pkt commented Jun 18, 2016

This errors on appveyor but not on TravisCI

several files in tests/results cannot be found, e.g.
OSError: File b'C:\\projects\\statsmodels\\statsmodels\\tsa\\regime_switching\\tests\\results\\mar_filardo.csv' does not exist

(BTW: current master has the failing imputation test disabled, and PRs should run green after rebase.)

@coveralls

This comment has been minimized.

coveralls commented Jun 18, 2016

Coverage Status

Changes Unknown when pulling 404f065 on ChadFulton:markov-tsa into * on statsmodels:master*.

@coveralls

This comment has been minimized.

coveralls commented Jun 18, 2016

Coverage Status

Coverage decreased (-1.2%) to 84.302% when pulling 1d7fab8 on ChadFulton:markov-tsa into 03f2331 on statsmodels:master.

@coveralls

This comment has been minimized.

coveralls commented Jun 18, 2016

Coverage Status

Coverage decreased (-1.2%) to 84.302% when pulling 1d7fab8 on ChadFulton:markov-tsa into 03f2331 on statsmodels:master.

@ChadFulton

This comment has been minimized.

Member

ChadFulton commented Jun 19, 2016

Looks green to me, but I can't figure out how to interpret the coverage details; it doesn't seem like the coverage report has anything to do with the changes in this PR.

@josef-pkt

This comment has been minimized.

Member

josef-pkt commented Jun 19, 2016

@ChadFulton add a module __init__.py to the test directory.

I think TravisCI is not running any tests here.
clues:
when the csv files were missing, then only appveyor errored but not travis
comparing Ran 4667 tests which is the same as a previous run in master https://travis-ci.org/statsmodels/statsmodels/jobs/138652624

I don't know why appveyor and travis behave differently with respect to this.

@ChadFulton

This comment has been minimized.

Member

ChadFulton commented Jun 20, 2016

Well that would make sense. Thanks for catching that!

@josef-pkt

This comment has been minimized.

Member

josef-pkt commented Jun 20, 2016

I don't see why now travis complains about missing results/xxx.csv files. Those should be automatically included as datafiles in the source distribution (all csv, txt and dta files in results directories)
https://travis-ci.org/statsmodels/statsmodels/jobs/138804778

@josef-pkt

This comment has been minimized.

Member

josef-pkt commented Jun 20, 2016

@ChadFulton you could try locally setup.py sdist and look at what files are included in the archive file

I just realized that the travis error message has a b in front of the filename. Is pandas or your path composition using binary instead of string for the path/name ?

@classmethod
def setup_class(cls):
path = (current_path + os.sep + 'results' + os.sep +
'results_predict_rgnp.csv')

This comment has been minimized.

@josef-pkt

josef-pkt Jun 20, 2016

Member

try using os.path.join instead of explicit composition, which is cleaner anyway

@josef-pkt

This comment has been minimized.

Member

josef-pkt commented Jun 20, 2016

Since appveyor doesn't have problems, it's os (Linux) specific os.path issue, not a missing file in the sdist

@ChadFulton

This comment has been minimized.

Member

ChadFulton commented Jun 20, 2016

Thanks for debugging this, I haven't run into problems like this before.

@coveralls

This comment has been minimized.

coveralls commented Jun 20, 2016

Coverage Status

Coverage increased (+0.3%) to 85.78% when pulling 588b5f2 on ChadFulton:markov-tsa into 03f2331 on statsmodels:master.

@coveralls

This comment has been minimized.

coveralls commented Jun 20, 2016

Coverage Status

Coverage increased (+0.3%) to 85.767% when pulling 3c19b15 on ChadFulton:markov-tsa into 03f2331 on statsmodels:master.

@josef-pkt

This comment has been minimized.

Member

josef-pkt commented Jun 20, 2016

I don't know why we need the __init__.py also in results. If setup.py is correctly configured, then it shouldn't be necessary.

merging given it's all green

Thanks Chad

@josef-pkt josef-pkt merged commit 7f4786b into statsmodels:master Jun 20, 2016

3 checks passed

continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
coverage/coveralls Coverage increased (+0.3%) to 85.767%
Details

@ChadFulton ChadFulton deleted the ChadFulton:markov-tsa branch Jul 16, 2016

@WUhailing

This comment has been minimized.

WUhailing commented May 9, 2017

predict or forecast are not included in the package? I couldn't deploy neither of them after model fitting.

@ChadFulton

This comment has been minimized.

Member

ChadFulton commented May 9, 2017

Only in-sample predictions are available right now. Out-of-sample forecasting has not yet been added, although pull requests adding that functionality would be greatly appreciated.

@WUhailing

This comment has been minimized.

WUhailing commented May 9, 2017

The one-step forward forecast of time T+1 value is not a problem, but how to forecast the hidden status of the new observation? or any suggestions on deriving hidden status from time T+1 to T' of the new observations y_T+1,t_T+2,...y_T' after building Markov autoregression model on observations y_1,y_2,... y_T?

@ValeryTyumen

This comment has been minimized.

Contributor

ValeryTyumen commented May 9, 2017

I described and implemented this algorithm in my pull request - #2921 (comment).

@WUhailing

This comment has been minimized.

WUhailing commented May 9, 2017

Thanks. I have looked at the description, if I understand them correctly, the forecast algo is for unknown regimes and unknown observations, what if the new observations are known, can we still derive the hidden regimes?

@ValeryTyumen

This comment has been minimized.

Contributor

ValeryTyumen commented May 9, 2017

what if the new observations are known, can we still derive the hidden regimes?

Sounds like a usual Kim filter routine to me.

@WUhailing

This comment has been minimized.

WUhailing commented May 9, 2017

Ok, I will take a close look at Kim Filter later, still new to this.

@NowanIlfideme

This comment has been minimized.

NowanIlfideme commented Feb 16, 2018

Am also interested in out-of-sample forecasting, both conditional on regime/exogenous and with unknown regime.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment