
WIP/ENH: major upgrades to MixedLM including variance components #2363

Merged
merged 28 commits into from Jul 10, 2015

Conversation

kshedden
Contributor

This PR is a major overhaul of MixedLM. The main feature added here is support for a second type of random effect that we call a "variance component". This allows for many types of more complex models involving, for example, nested random effects that could not be fit before. The PR also includes many optimizations that should lead to substantially faster performance than what we had before.

There are likely still some issues to be resolved so I marked it WIP, but it has been pretty thoroughly tested. We may also need to discuss terminology and naming conventions. The design matrices for variance components are passed to init in a form that does not match a pattern used in any other model. I don't see an obviously better choice, but it is definitely worth thinking about this some more before committing to something. Using formulas is much easier than using the traditional "array interface".

This supersedes #2124, which I will close.

@kshedden
Contributor Author

Here is a notebook illustrating the use of the new type of random effect:

http://nbviewer.ipython.org/urls/umich.box.com/shared/static/bqw1k8q4nt4tc05yrtwypgqlf9c3zkud.ipynb

@kshedden
Contributor Author

There are some minor deprecation issues here. I took out the EM and steepest-ascent pre-optimizers since they were never useful and are hard to adapt to the new fitting procedure that profiles out the fixed effects parameters. The MixedLMParams object has slightly different behavior, but that is generally not a user-facing class (it is not private, since starting values can be passed in as a MixedLMParams instance).

@kshedden
Contributor Author

Possible issues or directions for further enhancement:

  • Speed is better than before, but still considerably slower than R (lme4). I haven't done careful benchmarking, but it appears to be maybe 3-5x slower. One remaining possibility for optimization would be to allow sparse matrices for the fe/re design matrices (which are often sparse in practice, especially for random intercepts). The basic issue is that there are a huge number of rather small linear algebra computations (e.g. 30K matrix inversions on a small-ish example that I profiled). We handle these efficiently using the SMW approach (see the sketch after this list), but there is still a lot of allocation and garbage collection of workspaces for small-scale linear algebra.
  • When using formulas, we call Patsy to process a formula on every group. This may be expensive. We can't process the formula just once for the whole data set because we want the random effects structure to be "reset" for each group. Maybe there is a better way.
  • We still don't support crossed random effects of any type, except through the inefficient approach of having a single group and specifying everything through the random effects matrices.
  • We have a method for constructing profile likelihoods for one parameter; it would be good to have a two-parameter version, like lmer's.
  • Besides crossed effects, there are two main extensions to the model structure. One is to allow time-series like covariances at the "top level" (e.g. the residual errors become an AR(1) process). This would not be extremely difficult to add. I think people would really like to have this because it is not supported in lme4 (it was in the older nlme). Another important extension would be to allow heteroscedasticity. I don't have a clear design in mind for this. Frequency weights for groups and/or individuals would also be good to have, probably not too difficult.
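
For reference, here is a minimal sketch of the SMW (Sherman-Morrison-Woodbury) solve mentioned in the first bullet above, applied to one group's marginal covariance V = scale*I + Z G Z'. It only illustrates the identity; it is not the PR's smw_solver, and the function name and arguments are hypothetical.

import numpy as np

def smw_solve(scale, Z, G, x):
    # Solve (scale*I + Z G Z') y = x without forming the n x n matrix.
    # Z is n x q with q small; G is the q x q random effects covariance.
    inner = np.linalg.inv(G) + Z.T @ Z / scale
    return x / scale - Z @ np.linalg.solve(inner, Z.T @ x) / scale ** 2

Because the inner matrix is only q x q, each of the many per-group solves stays cheap even when the groups themselves are large.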

@josef-pkt
Member

@saketkc this is interesting for you also

@kshedden
Contributor Author

A document describing the model and some notes on implementation:

https://umich.box.com/shared/static/t194ygil4bkiadot52q0nte6lmnqgdve.pdf

@josef-pkt
Member

The test failure looks like a nan, or possibly an inf, in the score calculations. It only happens with scipy 0.11 and 0.12. (Some versions of the LAPACK/BLAS libraries check for nan and inf; others crash or hang.)

@josef-pkt
Member

A large number of small linear algebra calculations forced the use of Cython and C pointers to LAPACK/BLAS in the statespace model and Kalman filter.
We might have to go to the same Cython approach here, or see whether we can combine many small operations into larger sparse operations (which makes things faster in Matlab, but I don't have much experience with this in Python).
I wouldn't go to Cython until the design is settled and the essential enhancements are in.
Looping over the groups is simple for parallel processing, but spreading it over multiple processes is only worth it if each job is large enough.

We still don't support crossed random effects of any type, except through the inefficient approach of having a single group and specifying everything through the random effects matrices.

Do you have an example for this? For example, with crossed variance components (IIUC).
It looks to me like that would be a good reference for the general crossed effects model, and we need to figure out the sparse handling within each group.

@josef-pkt
Member

from the notebook

vca = patsy.dmatrix("0 + C(classroom)", df, return_type='dataframe')
vcb = patsy.dmatrix("0 + C(classroom):priortest", df, return_type='dataframe')

IIUC, this could be combined into a single formula if patsy were to create the two full dummy terms
"0 + C(classroom) + C(classroom):priortest"

I talked with Nathaniel at PyCon about an option not to drop a reference category:
pydata/patsy#60

(But I haven't figured out exactly what the variance components are and how the nesting works.)
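
For readers following along: under the usual variance components formulation, each of these design matrices contributes its own variance times the outer product of its columns to a group's marginal covariance. A rough sketch based on this reading of the notebook rather than on code from this PR; sigma2_classroom, sigma2_slope and scale are hypothetical placeholders for the estimated variances.

import numpy as np

V = (sigma2_classroom * vca.values @ vca.values.T
     + sigma2_slope * vcb.values @ vcb.values.T
     + scale * np.eye(len(df)))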

@josef-pkt
Member

Is there a remarkable speed difference between REML and MLE?
Bates mentioned that he has MLE as default in his Julia version.

@josef-pkt
Member

OK, I think I understand the model and what the variance components mean (I got confused because they can also refer to slopes, not just dummies).

That looks good. That covers one case that I was trying to get at with examples around the time we merged MixedLM.
I'm also starting to see more clearly the connection to Bates/DebRoy and to the Stata description.

I haven't looked much at the code changes yet.

About performance: Does the current speed difference relative to R hurt us much? If there is no more low-hanging fruit for speed improvements, I would start to worry more about memory consumption in larger problems, when moving from many small individuals to a few large groups with many subgroups nested in them.

I think this could already handle pretty well the case with a single group but two factor variance components that both don't have too many levels (100 countries and six 5-year dummies, for example).

@saketkc
Contributor

saketkc commented Apr 19, 2015

@josef-pkt I checked out the PR locally. Playing around with it, a bit slowly though.

@kshedden
Contributor Author

Based on one example, ML is about 25% faster than REML.
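
For context, REML versus ML is selected with the reml flag on fit. A minimal sketch (the data frame and column names are hypothetical):

import statsmodels.api as sm

model = sm.MixedLM.from_formula("score ~ priortest", groups=df["school"], data=df)
result_reml = model.fit(reml=True)   # restricted maximum likelihood (the default)
result_ml = model.fit(reml=False)    # ordinary maximum likelihood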

@kshedden
Contributor Author

Crossed effects work better than I thought they would, at least for models with only a few parameters. Here is an example:

http://nbviewer.ipython.org/urls/umich.box.com/shared/static/rxdtbw8p3tyzmstsan2ic4hzawv5hbq5.ipynb

This model with n=1000 fits in around 3 seconds.

This type of model produces very sparse cov_re matrices so one route to optimization would be to exploit this somehow (currently everything is treated as dense).
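
Here is a hedged sketch of the single-group workaround for crossed effects described above, written against the formula interface with a vc_formula dictionary (an interface that may differ from what this PR exposes, where the variance component design matrices are passed to init directly). The data frame and column names are hypothetical:

import statsmodels.api as sm

# y is the response; worker and machine are two crossed factors.
df["one_group"] = 1   # a single group, so both factors enter as variance components
vcf = {"worker": "0 + C(worker)", "machine": "0 + C(machine)"}
model = sm.MixedLM.from_formula("y ~ 1", groups=df["one_group"], vc_formula=vcf,
                                re_formula="0", data=df)   # no ordinary random effects
result = model.fit()
print(result.summary())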

@kshedden
Contributor Author

A design comment for future reference:

The code attempts to separate the details of a specific model parameterization from the generic parts that would be present for any mixed model. At a high level, a mixed model involves a mean mu and a covariance matrix V for each group. The likelihood can be computed from mu and V without knowing any other model details. The core code in the loglike, score_full, and hessian methods handles these generic calculations.

But we also need to work with the specific model parameters, which is complicated by the fact that there are four different types of parameters (fixed effects, random effects, variance components, and the scale parameter) that each have their own idiosyncrasies. The function gen_dV_dPar handles this to some extent. It cycles through the random effects parameters and, for each one, returns the matrix of derivatives of the elements of V with respect to that parameter. Since these derivative matrices are usually low rank, we return them in factored form (so the derivative matrix is matl * matr'). In addition, we need V^{-1} * matl, etc., so we pre-compute these in the generator to avoid solving the same system multiple times.

If we ever want to extend the model, say by modeling heteroscedasticity, or allowing AR-type residual errors, a lot of the work would be confined to extending this function. In addition we would have to extend smw_solver to deal with a more general class of covariance matrices. Outside these two functions most of the core computational code would remain unchanged.
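
To make the factored form concrete: for the ordinary random effects part of the covariance, V = Z G Z' + scale*I, the derivative of V with respect to an off-diagonal element G[j, k] is z_j z_k' + z_k z_j', a rank-2 matrix that can be carried as a pair of thin factors rather than as a full n x n array. A hypothetical illustration, not the PR's gen_dV_dPar:

import numpy as np

def dV_dG_offdiag(Z, j, k):
    # Return (matl, matr) with dV/dG[j, k] = matl @ matr.T, for j != k.
    zj, zk = Z[:, [j]], Z[:, [k]]
    return np.hstack([zj, zk]), np.hstack([zk, zj])   # both factors are n x 2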

@coveralls

Coverage Status

Coverage increased (+0.02%) to 83.93% when pulling 536eb86 on kshedden:lme-profile-beta2 into a1f53b0 on statsmodels:master.

@coveralls

Coverage Status

Coverage increased (+0.03%) to 83.93% when pulling 3d9b922 on kshedden:lme-profile-beta2 into a1f53b0 on statsmodels:master.

@coveralls

Coverage Status

Coverage increased (+0.03%) to 83.93% when pulling 829a3a1 on kshedden:lme-profile-beta2 into a1f53b0 on statsmodels:master.

@kshedden
Contributor Author

smw_solver doesn't need B, so we don't need to create it in augment_cov_re. It's deep in some loops, so this may save a bit of time.

@josef-pkt josef-pkt modified the milestone: 0.7 Jun 23, 2015
@josef-pkt
Member

@kshedden I still have the 0.7 milestone for this.
Do you think we should merge this (or part of it) before tagging 0.7?

I'm planning to start merging for 0.8 within the next two weeks, assuming I have sufficient internet connection and a working computer in Europe. (I expect to have both.)

@kshedden
Contributor Author

kshedden commented Jul 3, 2015

@josef-pkt I think it's solid enough to go in. The test coverage is decent and I have been using it on several projects.


josef-pkt added a commit that referenced this pull request Jul 10, 2015
REF/ENH: major upgrades to MixedLM including variance components
@josef-pkt josef-pkt merged commit b56dc22 into statsmodels:master Jul 10, 2015
@kshedden kshedden mentioned this pull request Jun 20, 2016