Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Markov-switching regression and autoregression models (non-state space) #2980

Merged
merged 57 commits into from Jun 20, 2016

Conversation

ChadFulton
Copy link
Member

This is a cleaned up / rewritten version of the Markov switching models from my 2013 GSOC. Allows:

  • Markov regression models
  • Markov autoregression models
  • Fitting via EM algorithm as well as by scoring
  • Tests against Stata / Eviews

My thought is to finish developing this alongside @ValeryTyumen who is doing the state space Markov switching models so that we can maintain a constant interface, etc.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.0005%) to 85.506% when pulling 988623f on ChadFulton:markov-tsa into e62551c on statsmodels:master.

@josef-pkt
Copy link
Member

partially in response to #2921 (comment)

@ChadFulton @ValeryTyumen Would it help to merge this into master?

Given that this has good test coverage and looks pretty complete, we could merge it, possibly with a Warning that this is currently under refactoring during GSOC.

On the other hand #2921 has another unmerged PR in between.

General policy could be to allow merges of master into a feature branch and remove it again during an interactive rebase before the final review and merge. The main problem for reviewing is that the merged code shows up as part of the PR, AFAIK and makes the change set less useful.

@ChadFulton
Copy link
Member Author

Lots of improvements and bug fixes, along with example notebooks that can be found here:

With reference to the question about merging, this can probably be merged. The only features missing right now are predict and forecast, but otherwise they are pretty complete models.

@ChadFulton
Copy link
Member Author

@ValeryTyumen Sorry about this, but due to a name change I suggested for you, I renamed the transition to regime_transition in these models, so the MarkovSwitchingParams class changed slightly (instead of transition, the transition probability-related parameters should be regime_transition now).

@ChadFulton
Copy link
Member Author

Rebased on master.

@coveralls
Copy link

Coverage Status

Coverage decreased (-1.1%) to 84.386% when pulling 2e5d5b2 on ChadFulton:markov-tsa into e62551c on statsmodels:master.

@coveralls
Copy link

Coverage Status

Changes Unknown when pulling 95714ca on ChadFulton:markov-tsa into * on statsmodels:master*.

@ChadFulton
Copy link
Member Author

Rebased on master. Probably not ready to be merged in as "stable", although the only thing it is actually missing right now is out-of-sample forecasting.

It could be included in 0.8 if there was a way to indicate that it was still new / unstable.

@josef-pkt
Copy link
Member

If it's safe enough, then it would be good to get it in.

We can add an "experimental" or similar status qualifier to the docstring.
I need to do this also for some other parts. I would also like to add a blanket warning to the release notes that newly merged code is often not stable or not guaranteed to be stable.

@ChadFulton
Copy link
Member Author

I will plan to merge it this weekend then, but I will improve the test coverage first.

# Inherited parameters
param_names = np.array(
markov_regression.MarkovRegression.param_names.fget(self),
dtype=object)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why an object array and not a list of strings?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's because below I use indexing with lists of integers (the self.parameters[i, 'autoregressive'])

@josef-pkt
Copy link
Member

This looks fine based on a very very brief skimming. I leave the details to you and @ValeryTyumen

You refer to EViews in some of the unit tests, but as far as I could see, there is no origin or source for the reference numbers mentioned in other tests. Can you add some comment, so we can find it again later?

@josef-pkt josef-pkt added this to the 0.8 milestone Jun 18, 2016
@josef-pkt
Copy link
Member

This errors on appveyor but not on TravisCI

several files in tests/results cannot be found, e.g.
OSError: File b'C:\\projects\\statsmodels\\statsmodels\\tsa\\regime_switching\\tests\\results\\mar_filardo.csv' does not exist

(BTW: current master has the failing imputation test disabled, and PRs should run green after rebase.)

@coveralls
Copy link

coveralls commented Jun 18, 2016

Coverage Status

Changes Unknown when pulling 404f065 on ChadFulton:markov-tsa into * on statsmodels:master*.

@coveralls
Copy link

coveralls commented Jun 18, 2016

Coverage Status

Coverage decreased (-1.2%) to 84.302% when pulling 1d7fab8 on ChadFulton:markov-tsa into 03f2331 on statsmodels:master.

@coveralls
Copy link

coveralls commented Jun 18, 2016

Coverage Status

Coverage decreased (-1.2%) to 84.302% when pulling 1d7fab8 on ChadFulton:markov-tsa into 03f2331 on statsmodels:master.

@ChadFulton
Copy link
Member Author

Looks green to me, but I can't figure out how to interpret the coverage details; it doesn't seem like the coverage report has anything to do with the changes in this PR.

@josef-pkt
Copy link
Member

@ChadFulton add a module __init__.py to the test directory.

I think TravisCI is not running any tests here.
clues:
when the csv files were missing, then only appveyor errored but not travis
comparing Ran 4667 tests which is the same as a previous run in master https://travis-ci.org/statsmodels/statsmodels/jobs/138652624

I don't know why appveyor and travis behave differently with respect to this.

@ChadFulton
Copy link
Member Author

Well that would make sense. Thanks for catching that!

@josef-pkt
Copy link
Member

I don't see why now travis complains about missing results/xxx.csv files. Those should be automatically included as datafiles in the source distribution (all csv, txt and dta files in results directories)
https://travis-ci.org/statsmodels/statsmodels/jobs/138804778

@josef-pkt
Copy link
Member

@ChadFulton you could try locally setup.py sdist and look at what files are included in the archive file

I just realized that the travis error message has a b in front of the filename. Is pandas or your path composition using binary instead of string for the path/name ?

@classmethod
def setup_class(cls):
path = (current_path + os.sep + 'results' + os.sep +
'results_predict_rgnp.csv')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

try using os.path.join instead of explicit composition, which is cleaner anyway

@josef-pkt
Copy link
Member

Since appveyor doesn't have problems, it's os (Linux) specific os.path issue, not a missing file in the sdist

@ChadFulton
Copy link
Member Author

Thanks for debugging this, I haven't run into problems like this before.

@coveralls
Copy link

coveralls commented Jun 20, 2016

Coverage Status

Coverage increased (+0.3%) to 85.78% when pulling 588b5f2 on ChadFulton:markov-tsa into 03f2331 on statsmodels:master.

@coveralls
Copy link

coveralls commented Jun 20, 2016

Coverage Status

Coverage increased (+0.3%) to 85.767% when pulling 3c19b15 on ChadFulton:markov-tsa into 03f2331 on statsmodels:master.

@josef-pkt
Copy link
Member

I don't know why we need the __init__.py also in results. If setup.py is correctly configured, then it shouldn't be necessary.

merging given it's all green

Thanks Chad

@josef-pkt josef-pkt merged commit 7f4786b into statsmodels:master Jun 20, 2016
@ChadFulton ChadFulton deleted the markov-tsa branch July 16, 2016 17:17
@WUhailing
Copy link

predict or forecast are not included in the package? I couldn't deploy neither of them after model fitting.

@ChadFulton
Copy link
Member Author

Only in-sample predictions are available right now. Out-of-sample forecasting has not yet been added, although pull requests adding that functionality would be greatly appreciated.

@WUhailing
Copy link

The one-step forward forecast of time T+1 value is not a problem, but how to forecast the hidden status of the new observation? or any suggestions on deriving hidden status from time T+1 to T' of the new observations y_T+1,t_T+2,...y_T' after building Markov autoregression model on observations y_1,y_2,... y_T?

@ValeryTyumen
Copy link
Contributor

I described and implemented this algorithm in my pull request - #2921 (comment).

@WUhailing
Copy link

Thanks. I have looked at the description, if I understand them correctly, the forecast algo is for unknown regimes and unknown observations, what if the new observations are known, can we still derive the hidden regimes?

@ValeryTyumen
Copy link
Contributor

what if the new observations are known, can we still derive the hidden regimes?

Sounds like a usual Kim filter routine to me.

@WUhailing
Copy link

Ok, I will take a close look at Kim Filter later, still new to this.

@NowanIlfideme
Copy link

Am also interested in out-of-sample forecasting, both conditional on regime/exogenous and with unknown regime.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants