New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Statespace: Add multivariate models (VARMAX, Dynamic Factors) #2563

Merged
merged 60 commits into from Oct 5, 2015

Conversation

Projects
None yet
3 participants
@ChadFulton
Member

ChadFulton commented Jul 29, 2015

Right now this is a first draft. It needs at least:

  • Example notebooks
  • Unit tests
  • Specialized results classes?
@ChadFulton

This comment has been minimized.

Member

ChadFulton commented Jul 29, 2015

I guess I should say, it adds two multivariate models: Dynamic factors and VARMAX. A couple of notes:

  • The actual model class of the dynamic factor model is StaticFactors because it is estimating the so-called static form of the dynamic factor model. This seems potentially confusing, but it also seems like we would want to reserve the DynamicFactors class for the dynamic form. (Edit: this was changed in a later commit)
  • VARMAX(p,q) models are not identified without further restrictions, which this does not do. Thus right now it issues a warning if a VARMAX(p,q) is specified (i.e. no warning in either the VARX(p) or a VMAX(q) cases, but a warning is issued if both are specified).
@josef-pkt

This comment has been minimized.

Member

josef-pkt commented Jul 29, 2015

Very good, you could send a message to the mailing list when it's ready for users to try out.

@ChadFulton In general, I think the new models should mostly be ready to merge in the second half of August. I'll try to review it then, so far I only looked at the details of unobserved components.

about identification in VARMAX:
Do you have a specific reference for this? I'm not familiar with it.
Is there a set of identifying restrictions that we could add as default?
Is it easy to add different sets of identifying restrictions, either in the model or with subclasses? I guess there might be something like the restrictions on the error correlations or long run behavior as in Structural VAR (SVAR), is there?

@ChadFulton

This comment has been minimized.

Member

ChadFulton commented Jul 29, 2015

In general, I think the new models should mostly be ready to merge in the second half of August. I'll try to review it then, so far I only looked at the details of unobserved components.

That sounds good to me. These still have a ways to go, so that will give me time to polish things.

about identification in VARMAX:
Do you have a specific reference for this? I'm not familiar with it.
Is there a set of identifying restrictions that we could add as default?
Is it easy to add different sets of identifying restrictions, either in the model or with subclasses? I guess there might be something like the restrictions on the error correlations or long run behavior as in Structural VAR (SVAR), is there?

I've been using New Introduction to Multiple Time Series Analysis, Lütkepohl (2007) as a reference. Chapter 12 is where he talks about identification in VARMA models. See also http://faculty.chicagobooth.edu/ruey.tsay/teaching/mts/sp2011/lec6-11.pdf (this is from Ruey Tsay, who wrote the MTS package in R which you put up a note about recently).

The identification issue is basically the same as in the univariate case when there is cancellation of elements of, or factors of, the lag polynomial. Common factors in the univariate case would lead to non-identification there too, but it is a rare occurrence.

In the multivariate case, this cancellation is more subtle and problematic. Here is a quote:

"Hence, even if the orders of both [VAR and MA] operators cannot be reduced simultaneously by cancellation, it may still be possible to factor some operator from both A(L) (the VAR lag polynomial) and M(L) (the MA lag polynomial) without changing their general structure." (emphasis added)

The solution is to put the VARMA model in a particular form that guarantees a unique representation. "Echelon form" seems to be preferred (this is one way in which the MTS package allows specification of VARMA models), and the desired version can be specified by users via "Kronecker indices". In this form, the elements of the VAR and MA coefficient matrices are restricted (e.g. there are zeros, the same parameter appearing in multiple elements, etc.).

One of the downsides of this form is that I no longer know of a transformation to ensure stationarity / invertibility to prevent the optimizer from wandering.

These are my initial thoughts, I am not too familiar with these issues yet.

@ChadFulton

This comment has been minimized.

Member

ChadFulton commented Jul 29, 2015

I guess it is probably simpler to say: the identification problem is that if there no restrictions (like to Echelon form) then there are multiple sets of coefficient matrices that represent the same VARMA process.

@ChadFulton

This comment has been minimized.

Member

ChadFulton commented Jul 29, 2015

Side note: although Stata advertises VARMA models, in fact it does not have a function varma. What they mean is that you could use their sspace function to write your own VARMA state space model and estimate it. The one example they provide of a VARMA estimation (using sspace) actually uses a restricted form, although they don't discuss that.

Edit: Similarly Eviews (through current version) does not allow VARMA.

@ChadFulton

This comment has been minimized.

Member

ChadFulton commented Aug 6, 2015

Some notes:

  • The VAR tests are against Stata's dfactor command, since it is the only place Stata estimates a VAR by MLE via Kalman filter.
  • I have skipped the test for the standard errors in the dynamic factor model, because they do not match Stata's. They are as follows:
param Statsmodels Stata
y1.loading 0.001564 0.002006
y2.loading 0.000521 0.0012514
y3.loading 0.000526 0.0012128
y1.sigma2 0.000336 0.0003359
y2.sigma2 0.000020 0.0000184
y3.sigma2 0.000015 0.0000141
phi1.factor 0.130555 0.1196637
phi2.factor 0.129346 0.1218577

I don't know why they are different; they should be calculated exactly the same way in both Statsmodels and Stata as they are for the other state space models, which match (or at least match better than this) (SARIMAX, UnobservedComponents, VAR)

@ChadFulton ChadFulton force-pushed the ChadFulton:multivariate branch from 8970e3c to baba8f1 Aug 6, 2015

@ChadFulton

This comment has been minimized.

Member

ChadFulton commented Aug 6, 2015

Another note: the multivariate stationarity constraint can be somewhat time-consuming (inner loops with lots of matrix algebra), so I created a Cython version which speeds things up but requires the ?trmm BLAS functions so it's only available for scipy v0.14.0 onwards (see scipy/scipy#2135). So prior to scipy v0.14.0 the functionality exists, but it's slower.

@ChadFulton ChadFulton force-pushed the ChadFulton:multivariate branch 6 times, most recently from dc30b95 to 53ea11d Aug 6, 2015

@josef-pkt

This comment has been minimized.

Member

josef-pkt commented Aug 10, 2015

(looking in as cheerleader, I won't be looking at details at least for another week)

The commit messages sound very promising.
It looks like we are getting a great new set of features. Thanks.

@ChadFulton

This comment has been minimized.

Member

ChadFulton commented Aug 10, 2015

Thanks, I hope so. I'll finish up some results class-related things and then ping the mailing list to maybe get some additional testers. But it's shaping up to be done from my end (until bugs pop up) pretty soon, I think.

@josef-pkt

This comment has been minimized.

Member

josef-pkt commented Aug 10, 2015

We need some good advertising for this. And, I still recommend that you write an article, so you get a bit more academic credit for it.

@ChadFulton ChadFulton force-pushed the ChadFulton:multivariate branch from 9486282 to 7bc4d68 Aug 10, 2015

@ChadFulton

This comment has been minimized.

Member

ChadFulton commented Aug 16, 2015

Recent PRs include much-improved dynamic factor model which supports:

  • Dynamic factors
  • Static factors
  • Seemingly unrelated regression

All of these optionally with (vector or not) autoregressive errors and the first two can also accommodate exogenous regressors.

Also I have added two example notebooks:

  • Dynamic factor models - goes through an example of using DFM to construct a coincident index
  • VARMAX - stub notebook that just goes through syntax and shows three short examples. I will wait to do a more involved notebook here until we have more post-estimation results, in particular impulse response functions.

I don't have anything more planned to do in this PR, except for responding to anything that comes up.

@ChadFulton

This comment has been minimized.

Member

ChadFulton commented Aug 16, 2015

See:

@ChadFulton

This comment has been minimized.

Member

ChadFulton commented Aug 16, 2015

I'm going to ping the mailing list now.

@ChadFulton ChadFulton force-pushed the ChadFulton:multivariate branch from d41a28d to 24afab2 Sep 9, 2015

@ChadFulton

This comment has been minimized.

Member

ChadFulton commented Sep 9, 2015

Rebased on master.

@josef-pkt

This comment has been minimized.

Member

josef-pkt commented Sep 9, 2015

I just checked coveralls (and fixed the file path again)

varmax.py is at a good 92% coverage, but dynamic_factor has some gaps and is at 82% coverage.
https://coveralls.io/builds/3518370/source?filename=%2Fhome%2Ftravis%2Fminiconda%2Fenvs%2Fstatsmodels-test%2Flib%2Fpython2.7%2Fsite-packages%2Fstatsmodels-0.8.0-py2.7-linux-x86_64.egg%2Fstatsmodels%2Ftsa%2Fstatespace%2Fdynamic_factor.py

This is a huge PR, and I think I won't be able to do more than a very superficial review. Hopefully it will get enough exposure until an 0.8 release so we have time to discover potential issues.
I didn't try to follow much in this PR during GSOC.

Did you check how much overlap in the code and features there is with vector_ar VAR?
Some utility functions like pacf will also be very useful as general tools.

@josef-pkt

This comment has been minimized.

Member

josef-pkt commented Sep 27, 2015

@ChadFulton Can you rebase again, since now there are several other of your PRs in master.

I'm not quiet and patient enough to review this at the moment, so I would like to just merge it.

@josef-pkt

This comment has been minimized.

Member

josef-pkt commented Sep 28, 2015

related aside: https://mailman.stat.ethz.ch/pipermail/r-sig-finance/2015q3/013548.html (unsafe https)
a user announces recipes for statespace models in R

@ChadFulton ChadFulton force-pushed the ChadFulton:multivariate branch from 24afab2 to 5fe7f8f Oct 5, 2015

@ChadFulton

This comment has been minimized.

Member

ChadFulton commented Oct 5, 2015

Sorry this took a while. Rebased against master. Also I added additional coverage tests for dynamic factors, which I think puts the coverage up around 98-99%.

@josef-pkt

This comment has been minimized.

Member

josef-pkt commented Oct 5, 2015

@ChadFulton Thanks, (I'm still bouncing around across topics, with short attention span for anything.)

I'll merge when Travis is green, and we can send a message to the mailing list for users to try it out and use it, and provide feedback for it.

Do you know of any specific issues for the models in this PR that we should look at post-merge? (besides pandas wrapper code.) Is there a related todo list for future enhancements, independently of whether you will have time and interest for it?

This is the last PR in this series.

BTW: I went over some older PRs and didn't see any PRs from you first GSOC, the nonlinear time series models. Can you open PRs for those, with maybe a brief comment about whether you want to go back to it or whether they are up for grabs?
I was recently remembering the Andrew/Ploberger/Bai type structural change estimation.

josef-pkt added a commit that referenced this pull request Oct 5, 2015

Merge pull request #2563 from ChadFulton/multivariate
ENH: Statespace: Add multivariate models (VARMAX, Dynamic Factors)

@josef-pkt josef-pkt merged commit fcb31d8 into statsmodels:master Oct 5, 2015

2 checks passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
coverage/coveralls Coverage increased (+0.4%) to 84.933%
Details
@TomAugspurger

This comment has been minimized.

Contributor

TomAugspurger commented Oct 6, 2015

Are the results from the DynamicFactor.fitcall being wrapped correctly? Working through that notebook

res = mod.fit(initial_res.params)

returns a statsmodels.tsa.statespace.mlemodel.MLEResultsWrapper, not a DynamicFactorResults. Some of the methods in that notebook aren't working because of that. I haven't dug too deeply yet.

@ChadFulton

This comment has been minimized.

Member

ChadFulton commented Oct 7, 2015

Thanks @TomAugspurger! You're absolutely right, the multivariate models need to override the smooth method to return the right results class. I'll have a PR in a moment.

Thanks for taking a look.

@ChadFulton

This comment has been minimized.

Member

ChadFulton commented Oct 7, 2015

See #2643 for issue, #2644 for PR.

@ChadFulton ChadFulton deleted the ChadFulton:multivariate branch Jan 16, 2016

@josef-pkt josef-pkt referenced this pull request Feb 9, 2016

Closed

release 0.8 #2176

@josef-pkt josef-pkt added this to the 0.8 milestone Apr 27, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment