ENH: Statespace: Add multivariate models (VARMAX, Dynamic Factors) #2563
I guess I should say, it adds two multivariate models: Dynamic factors and VARMAX. A couple of notes:
Very good, you could send a message to the mailing list when it's ready for users to try out. @ChadFulton In general, I think the new models should mostly be ready to merge in the second half of August. I'll try to review it then; so far I have only looked at the details of unobserved components. About identification in VARMAX:
That sounds good to me. These still have a ways to go, so that will give me time to polish things.
I've been using New Introduction to Multiple Time Series Analysis, Lütkepohl (2007) as a reference. Chapter 12 is where he talks about identification in VARMA models. See also http://faculty.chicagobooth.edu/ruey.tsay/teaching/mts/sp2011/lec6-11.pdf (this is from Ruey Tsay, who wrote the MTS package in R which you put up a note about recently).
The identification issue is basically the same as in the univariate case when there is cancellation of elements of, or factors of, the lag polynomial. Common factors in the univariate case would lead to non-identification there too, but it is a rare occurrence. In the multivariate case, this cancellation is more subtle and problematic. Here is a quote:
"Hence, even if the orders of both [VAR and MA] operators cannot be reduced simultaneously by cancellation, it may still be possible to factor some operator from both A(L) (the VAR lag polynomial) and M(L) (the MA lag polynomial) without changing their general structure." (emphasis added)
The solution is to put the VARMA model in a particular form that guarantees a unique representation. "Echelon form" seems to be preferred (this is one way in which the MTS package allows specification of VARMA models), and the desired version can be specified by users via "Kronecker indices". In this form, the elements of the VAR and MA coefficient matrices are restricted (e.g. there are zeros, the same parameter appearing in multiple elements, etc.). One of the downsides of this form is that I no longer know of a transformation to ensure stationarity / invertibility to prevent the optimizer from wandering.
These are my initial thoughts, I am not too familiar with these issues yet.
I guess it is probably simpler to say: the identification problem is that if there are no restrictions (such as to echelon form), then there are multiple sets of coefficient matrices that represent the same VARMA process.
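To make the non-uniqueness concrete, here is a minimal sketch in plain Python. It is not code from this PR; the helper names (`ma_weights`, `nilpotent`) and the parameter values are made up for illustration. It takes a bivariate VARMA(1,1) process y_t = A y_{t-1} + u_t + M u_{t-1} where A and M each have a single nonzero entry in the upper-right corner, and computes the MA(infinity) matrices, which fully describe the process. Two different (A, M) pairs give identical matrices whenever the two corner entries have the same sum.

```python
def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def matadd(X, Y):
    """Elementwise sum of two matrices stored as lists of rows."""
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def ma_weights(A, M, n):
    """MA(infinity) matrices Psi_0..Psi_n of y_t = A y_{t-1} + u_t + M u_{t-1}.

    Psi_0 = I, Psi_1 = A + M, and Psi_i = A Psi_{i-1} for i >= 2.
    """
    k = len(A)
    identity = [[1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]
    psis = [identity]
    if n >= 1:
        psis.append(matadd(A, M))
    for _ in range(2, n + 1):
        psis.append(matmul(A, psis[-1]))
    return psis


def nilpotent(x):
    """2x2 matrix with x in the upper-right corner (illustrative choice)."""
    return [[0.0, x], [0.0, 0.0]]


# Two different parameterizations: (a, m) = (0.5, 0.0) and (0.25, 0.25).
weights_1 = ma_weights(nilpotent(0.5), nilpotent(0.0), 5)
weights_2 = ma_weights(nilpotent(0.25), nilpotent(0.25), 5)

# Both describe the same process: only a + m is identified.
same_process = weights_1 == weights_2
```

Here the first parameterization is a pure VAR(1) and the second a genuine VARMA(1,1), yet the data cannot distinguish them, which is why a restricted form is needed before estimation.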
Side note: although Stata advertises VARMA models, in fact it does not have a function Edit: Similarly Eviews (through current version) does not allow VARMA.
Some notes:
I don't know why they are different; they should be calculated exactly the same way in both Statsmodels and Stata as they are for the other state space models, which match (or at least match better than this) (SARIMAX, UnobservedComponents, VAR)
Another note: the multivariate stationarity constraint can be somewhat time-consuming (inner loops with lots of matrix algebra), so I created a Cython version which speeds things up but requires the ?trmm BLAS functions so it's only available for scipy v0.14.0 onwards (see scipy/scipy#2135). So prior to scipy v0.14.0 the functionality exists, but it's slower.
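For readers unfamiliar with this kind of constraint, here is a minimal sketch of the univariate analogue in plain Python. It is not the multivariate routine or the Cython code from this PR, and the function name `constrain_stationary` is hypothetical. The idea is the same, though: the optimizer works on unrestricted real numbers, each is squashed into (-1, 1) and treated as a partial autocorrelation, and a Durbin-Levinson recursion turns those into AR coefficients that are always stationary. In the multivariate case the scalars become matrices, which is where the heavy matrix algebra in the inner loops comes from.

```python
import math


def constrain_stationary(unconstrained):
    """Map any real numbers to coefficients of a stationary AR(p) process.

    Convention assumed here: y_t = phi_1 y_{t-1} + ... + phi_p y_{t-p} + e_t.
    """
    # Squash each value into (-1, 1); these act as partial autocorrelations.
    pacf = [x / math.sqrt(1.0 + x * x) for x in unconstrained]

    phi = []
    for r in pacf:
        # Durbin-Levinson step: extend the AR(k-1) coefficients to AR(k).
        phi = [phi[j] - r * phi[-(j + 1)] for j in range(len(phi))] + [r]
    return phi


# Even fairly extreme optimizer values give a stationary AR(2).
phi_1, phi_2 = constrain_stationary([5.0, -3.0])
```

For an AR(2) the result can be checked against the usual stationarity triangle: phi_1 + phi_2 < 1, phi_2 - phi_1 < 1 and |phi_2| < 1.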
dc30b95 to 53ea11d
(looking in as cheerleader, I won't be looking at details at least for another week) The commit messages sound very promising.
Thanks, I hope so. I'll finish up some results class-related things and then ping the mailing list to maybe get some additional testers. But it's shaping up to be done from my end (until bugs pop up) pretty soon, I think.
We need some good advertising for this. And, I still recommend that you write an article, so you get a bit more academic credit for it.
Recent PRs include much-improved dynamic factor model which supports:
All of these optionally with (vector or not) autoregressive errors and the first two can also accommodate exogenous regressors. Also I have added two example notebooks:
I don't have anything more planned to do in this PR, except for responding to anything that comes up.
See:
I'm going to ping the mailing list now.
Rebased on master.
I just checked coveralls (and fixed the file path again): varmax.py is at a good 92% coverage, but dynamic_factor has some gaps and is at 82% coverage. This is a huge PR, and I think I won't be able to do more than a very superficial review. Hopefully it will get enough exposure before a 0.8 release so we have time to discover potential issues. Did you check how much overlap in the code and features there is with vector_ar VAR?
@ChadFulton Can you rebase again, since several of your other PRs are now in master? I'm not quiet and patient enough to review this at the moment, so I would like to just merge it.
related aside: https://mailman.stat.ethz.ch/pipermail/r-sig-finance/2015q3/013548.html (unsafe https)
Useful when the model is subclassed and separate_params=True.
- Dynamic factors
- Static factors
- SUR
- (vector) autoregressive errors
Sorry this took a while. Rebased against master. Also I added additional coverage tests for dynamic factors, which I think puts the coverage up around 98-99%.
@ChadFulton Thanks. (I'm still bouncing around across topics, with a short attention span for anything.) I'll merge when Travis is green, and we can send a message to the mailing list for users to try it out, use it, and provide feedback. Do you know of any specific issues for the models in this PR that we should look at post-merge (besides pandas wrapper code)? Is there a related todo list for future enhancements, independently of whether you will have time and interest for it? This is the last PR in this series. BTW: I went over some older PRs and didn't see any PRs from your first GSOC, the nonlinear time series models. Can you open PRs for those, with maybe a brief comment about whether you want to go back to it or whether they are up for grabs?
Are the results from the res = mod.fit(initial_res.params) returns a |
Thanks @TomAugspurger! You're absolutely right, the multivariate models need to override the Thanks for taking a look.
Right now this is a first draft. It needs at least: