New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Statespace: Add unobserved components. #2432

Merged
merged 60 commits into from Sep 23, 2015

Conversation

Projects
None yet
3 participants
@ChadFulton
Member

ChadFulton commented Jun 2, 2015

PR corresponding to #2416.

This is a pretty good first draft, but it needs work in the following areas:

  • Example notebooks
  • Improve starting parameter estimation
  • Unit tests
  • Code cleanup / style improvements
  • Improve model specification if possible (e.g. allow string names for the trend component as described in the table in #2416)
  • Add results class with result options specific to the UC model framework (i.e. syntactic sugar for getting the level, trend, etc. components from the filtered_states matrix)
  • In Summary: change overall title to "Unobserved Components Results" so that the "Model:" line can be more specific (e.g. Deterministic level + Cycle, etc)

Here's a notebook showing an example usage (the Nile models considered there are from http://www.jstatsoft.org/v41/i02/paper):

http://nbviewer.ipython.org/gist/ChadFulton/9ec4aaa13841bbdbe4df/

@coveralls

This comment has been minimized.

coveralls commented Jun 2, 2015

Coverage Status

Coverage remained the same at 83.91% when pulling 1f921ea on ChadFulton:uc-models into 4b55fa4 on statsmodels:master.

@josef-pkt

This comment has been minimized.

Member

josef-pkt commented Jun 3, 2015

Notebook looks very nice. I haven't looked at the PR itself.

Is it possible to add an outlier dummy for the year 1913?
General question: Why is there little persistence in the filtered level? Break in level is abrupt, and the 1913 outlier doesn't seem to distort neighboring years.

@josef-pkt

This comment has been minimized.

Member

josef-pkt commented Jun 3, 2015

Nevermind, I shouldn't just look at pretty plots and read some of the code also.

@ChadFulton ChadFulton changed the title from ENH: Added unobserved components class. to ENH: Statespace: Add unobserved components. Jun 3, 2015

@josef-pkt josef-pkt added the comp-tsa label Jun 23, 2015

@josef-pkt josef-pkt modified the milestone: 0.8 Jun 23, 2015

@josef-pkt josef-pkt referenced this pull request Jun 24, 2015

Open

SUMM: statespace additions - GSOC 2015 #2467

3 of 7 tasks complete
@ChadFulton

This comment has been minimized.

Member

ChadFulton commented Jun 24, 2015

Question: in 1502e11 I added specification of the level / trend component via a string name; for example the deterministic trend model (see #2416) can be specified as:

mod = UnobservedComponents(endog, irregular=True, level=True, trend=True)

or as:

mod = UnobservedComponents(endog, 'dtrend')

The question I have is that I used the string specifications that Stata uses. Stata didn't come up with the names themselves, which are pretty standard in the UCM literature, but they did come up with these specific abbreviations.

There are other (maybe better) abbreviations (e.g. I personally like "local_level" more than "llevel"). So can we / should we use the Stata names, or should we use our own (possibly more verbose) names?

@ChadFulton

This comment has been minimized.

Member

ChadFulton commented Jun 24, 2015

There are other (maybe better) abbreviations (e.g. I personally like "local_level" more than "llevel"). So can we / should we use the Stata names, or should we use our own (possibly more verbose) names?

Oh, or we could support both.

@ChadFulton

This comment has been minimized.

Member

ChadFulton commented Jun 24, 2015

Oh, or we could support both.

I think I'll move forward with supporting them both, since there are a couple of specifications for which Stata does not have a name.

I plan to primarily use more verbose names ('llevel' -> 'local_level') but then also accept Stata's names.

@josef-pkt

This comment has been minimized.

Member

josef-pkt commented Jun 24, 2015

In general I also prefer names that are not easy to misspell. With abbreviations I often have to guess how the word is abbreviated.

question (I haven't looked at the details yet): Does this stay with trend is constant or linear trend or can this be extended to quadratic or higher order polynomial trend?
That's a question whether you want local_constant and local_trend as separate keywords, or similar to ARIMA, local_trend='c' or 'ct'.

@ChadFulton

This comment has been minimized.

Member

ChadFulton commented Jun 25, 2015

question (I haven't looked at the details yet): Does this stay with trend is constant or linear trend or can this be extended to quadratic or higher order polynomial trend?

Trend has a bit of a different meaning here, so the "level" refers to a possibly time-varying intercept, and the "trend" refers to a possibly time-varying slope, on time. If I think of the intercept as a "degree 1 trend" and the slope as a "degree 2 trend" then it is possible to have higher-order trends (and in fact R's package KFAS allows this), but I've never seen it in practice, and it isn't referenced in the standard texts on UC models.

@ChadFulton

This comment has been minimized.

Member

ChadFulton commented Jun 25, 2015

If you look at the trend component from #2416, the degree 3 trend would be allowing the degree 2 trend trend (beta) to have a new component in it (lets call it gamma) that evolves according to a random walk. And higher order trends are added in the same way.

@josef-pkt

This comment has been minimized.

Member

josef-pkt commented Jun 25, 2015

about #2416 in the table you have (local) trend and stochastic trend separately, but the 2 equations look to me like a stochastic trend. stochastic drift that is a random walk.

which is the deterministic trend and which is the stochastic trend ?
And how is a mean specified?

if beta = 0, then the mean mu is also zero
if beta = constant (sigma_beta = 0), then mu is a random walk with constant drift
if beta = random, then mu is a random walk with random walk drift

???

my initial analogy with local trend that I thought about was local polynomial regression (like in lowess or similar kernel regression)

@ChadFulton

This comment has been minimized.

Member

ChadFulton commented Jun 25, 2015

about #2416 in the table you have (local) trend and stochastic trend separately, but the 2 equations look to me like a stochastic trend. stochastic drift that is a random walk.

The distinction is whether the stochastic part is allowed or not. So, for example, if you had a trend but not a stochastic trend, that would mean that zeta_{t+1} = 0 for all t, or another way to put it is that sigma_{zeta}^2 = 0.

And how is a mean specified?

If beta = 0, then mu is not necessarily zero, although it is constant. This is how a mean component can be specified.

For example, take a look at the notebook I provided for the Kalman smoother: http://nbviewer.ipython.org/gist/ChadFulton/c3bc720c2fa31c564844/. If you look at the "Deterministic level + stochastic cycle" model, the smoothed level component is non-zero (although here there's also a regression effect, but nonetheless the mu term is non-zero).


Stata has a nice page in its ucm manual showing the formulas for several of the specifications. See http://www.stata.com/manuals13/tsucm.pdf, page 20

@josef-pkt

This comment has been minimized.

Member

josef-pkt commented Jun 25, 2015

Ok, I guess I understand the stochastic distinction
stochastic_level=False, stochastic_trend=False specifies whether the sigma is 0 or estimated, if the corresponding level or trend is True.
About the keywords: if I want stochastic_trend=True, do I have to set, level, trend also to true, and also stochastic_level to true if I also want sigma_eta to be nonzero? 4 Trues?

about the mean: if beta = 0, then the mean is equal to the initial condition.
Is the intitial value of mu, mu_0, estimated jointly with the other parameters?

@ChadFulton

This comment has been minimized.

Member

ChadFulton commented Jun 25, 2015

Ok, I guess I understand the stochastic distinction
stochastic_level=False, stochastic_trend=False specifies whether the sigma is 0 or estimated, if the corresponding level or trend is True.

Yes, that's right. There may be a better way to set this up? But maybe people will mostly just use the keywords anyway.

About the keywords: if I want stochastic_trend=True, do I have to set, level, trend also to true, and also stochastic_level to true if I also want sigma_eta to be nonzero? 4 Trues?

Right now:

  1. If you set stochastic_trend=True, that currently has no effect unless trend=True (and otherwise it "fails" silently)
  2. If you set trend=True with level=False, then it issues a warning "Trend component specified without level component; deterministic level component added.") and sets level=True and stochastic_level=False.
  3. However, you do not need to have stochastic_level=True in order to have a trend or a stochastic trend.

This behavior is not quite consistent, so maybe instead of (1), above, we should have:

1(b). If you set stochastic_trend=True, then it issues a warning "Stochastic trend component specified without trend component; trend component added" and sets trend=True

about the mean: if beta = 0, then the mean is equal to the initial condition.
Is the intitial value of mu, mu_0, estimated jointly with the other parameters?

If beta=0, then it is just an intercept estimated by recursive OLS. If level=True and trend=True but neither are stochastic, then it is an intercept and the slope of a time trend, both estimated by recursive OLS. The initialization is diffuse (mean=0, variance = 1e6) which allows it to pretty quickly pick out the OLS estimate (e.g. the red line settles down relatively quickly in the graph in the notebook I mentioned above).

You could also estimate the intercept (or even the slope) parameter by MLE (which is what we do in SARIMAX, which allows arbitrary time trends), but that is not typically done in the UC models because the stochastic extensions allow time-varying-parameters.

I suppose if you wanted arbitrary (non-time-varying) time-trends, you could add them as regressors in the exog argument.

@ChadFulton ChadFulton referenced this pull request Jun 29, 2015

Closed

SUMM: follow-up state space kalman filter merge #2252

3 of 10 tasks complete

@ChadFulton ChadFulton force-pushed the ChadFulton:uc-models branch from a53688a to 71a1dfc Jul 9, 2015

@ChadFulton

This comment has been minimized.

Member

ChadFulton commented Jul 10, 2015

I've started adding the results class. Here's a sample of what I currently have the summary looking like:

                               Unobserved Components Results                                
============================================================================================
Dep. Variable:                     US monetary base   No. Observations:                  241
Model:             local linear deterministic trend   Log Likelihood                 786.765
                          + damped stochastic cycle   AIC                          -1563.531
Date:                              Fri, 10 Jul 2015   BIC                          -1546.107
Time:                                      12:00:36   HQIC                         -1556.511
Sample:                                  01-01-1948                                         
                                       - 01-01-2008                                         
Covariance Type:                                opg                                         
====================================================================================
                       coef    std err          z      P>|z|      [95.0% Conf. Int.]
------------------------------------------------------------------------------------
sigma2.irregular  1.196e-12   6.26e-06   1.91e-07      1.000     -1.23e-05  1.23e-05
sigma2.level      7.201e-05   1.66e-05      4.349      0.000      3.96e-05     0.000
sigma2.cycle      5.057e-16   1.94e-05    2.6e-11      1.000     -3.81e-05  3.81e-05
frequency.cycle      0.1309      0.352      0.372      0.710        -0.559     0.821
damping.cycle        0.7866      0.157      5.021      0.000         0.480     1.094
====================================================================================

Warnings:
[1] Covariance matrix calculated using the outer product of gradients.

Here I allow multiple lines for the "Model:" specification, because the string specification is potentially very long. Does this seem like a reasonable approach?

@ChadFulton ChadFulton force-pushed the ChadFulton:uc-models branch 2 times, most recently from 401900d to 05e28c1 Jul 14, 2015

@ChadFulton

This comment has been minimized.

Member

ChadFulton commented Jul 14, 2015

The last commit (tentatively) adds a convenience function for plotting the components of the model. For example, the code:

y = sm.datasets.nile.load_pandas().data
y.index = pd.date_range('1871', '1970', freq='AS')

mod = structural.UnobservedComponents(y['volume'], 'random walk', cycle=True, damped_cycle=True, stochastic_cycle=True)
res = mod.fit(method='powell')
print res.summary()
res.plot_components(figsize=(13,5));

produces the following summary table:

                            Unobserved Components Results                            
=====================================================================================
Dep. Variable:                        volume   No. Observations:                  100
Model:                           random walk   Log Likelihood                -618.646
                   + damped stochastic cycle   AIC                           1245.292
Date:                       Tue, 14 Jul 2015   BIC                           1255.713
Time:                               15:12:07   HQIC                          1249.510
Sample:                           01-01-1871                                         
                                - 01-01-1970                                         
Covariance Type:                         opg                                         
===================================================================================
                      coef    std err          z      P>|z|      [95.0% Conf. Int.]
-----------------------------------------------------------------------------------
sigma2.level      676.0566    483.558      1.398      0.162      -271.699  1623.812
sigma2.cycle     1.702e+04   3053.499      5.574      0.000       1.1e+04   2.3e+04
frequency.cycle     0.5236      1.286      0.407      0.684        -1.997     3.044
damping.cycle       0.2257      0.147      1.533      0.125        -0.063     0.514
===================================================================================

and the following graph:

plot_components

@ChadFulton

This comment has been minimized.

Member

ChadFulton commented Jul 14, 2015

I'm still not sure about when adding convenience plotting functions is appropriate, so we can remove this if it is not in the right place.

@ChadFulton

This comment has been minimized.

Member

ChadFulton commented Jul 15, 2015

Failure due to time-out on coverage run.

@ChadFulton ChadFulton force-pushed the ChadFulton:uc-models branch from 73c86e7 to 85f5bc6 Jul 15, 2015

@ChadFulton

This comment has been minimized.

Member

ChadFulton commented Jul 16, 2015

With the last four commits, I believe I have exhausted everything I had planned to do with this model. A decision needs to be made on plot_components, but other than that I "think" it is pretty well ready to go.

@ChadFulton

This comment has been minimized.

Member

ChadFulton commented Jul 16, 2015

There is one other thing I wanted to check on here, actually: right now, as with all state space models, the default optimization method is l_bfgs. The unobserved components models don't always have the best starting parameters though (I think that start_params are actually pretty good a lot of the time even for complicated models with seasonals and cycles, but they're still not nearly generically as good relative to the optimum as something like ARMA).

For that reason, I have found that doing a first round with powell and then a second round with l_bfgs, starting from powell's parameters, does the best job of optimizing.

Is this something I should set up in the fit method, either as a default or an option? Or should I just make a note in the Notes section that sometimes powell can be helpful?

@josef-pkt

This comment has been minimized.

Member

josef-pkt commented Jul 17, 2015

about the two method optimization:

We have plans to add this more generally, Skipper has a PR to allow sequences in the optimization method.
My experience so far using "nm" as starting optimizer: nm is pretty slow and needs many iterations. In a large number of cases, either newton or bfgs work well, and using automatically nm for those cases would make it much slower. In those cases, I recommend that users use it if the default fails.

In GLM with the scipy optimizers, we use by default a few steps of IRLS that is controlled by a fit keyword max_start_irls=3, which produces in most cases good staring values for the optimizer.
(IRLS is less general and we need to set the number of iterations with it to zero for penalized estimation.)

Is this something I should set up in the fit method, either as a default or an option? Or should I just make a note in the Notes section that sometimes powell can be helpful?

That depends on the tradeoffs between speed in nice cases and how often a preliminary more robust optimizer is necessary.
How many iterations with powell do you usually need? (I didn't like powell because a few years ago it didn't work very well on Windows, but I haven't tried or used it much since then.)

You could, for example, add a max_start_powell option in fit.

It would also be useful to add a link to a difficult example case in #1649
Stata often uses a two methods optimization, and we still need better defaults in several cases, but so far we have mostly "anecdotal" evidence.

@ChadFulton ChadFulton force-pushed the ChadFulton:uc-models branch from e8a1767 to 277a1af Sep 21, 2015

@josef-pkt

This comment has been minimized.

Member

josef-pkt commented Sep 23, 2015

I didn't see (check) that you have rebased the branch.

Please add a comment, so it sends out a notification.

merging now

Thanks @ChadFulton

josef-pkt added a commit that referenced this pull request Sep 23, 2015

Merge pull request #2432 from ChadFulton/uc-models
ENH: Statespace: Add unobserved components.

@josef-pkt josef-pkt merged commit f307757 into statsmodels:master Sep 23, 2015

2 checks passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
coverage/coveralls Coverage increased (+0.07%) to 84.568%
Details

@ChadFulton ChadFulton deleted the ChadFulton:uc-models branch Jan 16, 2016

@josef-pkt josef-pkt referenced this pull request Feb 9, 2016

Closed

release 0.8 #2176

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment