WIP/ENH: major upgrades to MixedLM including variance components #2363
Here is a notebook illustrating the use of the new type of random effect: http://nbviewer.ipython.org/urls/umich.box.com/shared/static/bqw1k8q4nt4tc05yrtwypgqlf9c3zkud.ipynb
There are some minor deprecation issues here. I took out the EM and steepest ascent pre-optimizers since they were never useful and are hard to adapt to the new fitting procedure that profiles out the fixed effects parameters. The MixedLMParams object has slightly different behavior, but that's generally not a user-facing class (it is not private, though, since starting values can be passed in as a MixedLMParams instance).
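For reference, a minimal sketch of passing starting values this way, assuming MixedLMParams.from_components and the start_params argument to fit behave as in later released versions (the simulated data here is purely illustrative):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.regression.mixed_linear_model import MixedLMParams

# Simulated data: 2 fixed effects, a random intercept, 50 groups of 5.
rng = np.random.RandomState(0)
groups = np.repeat(np.arange(50), 5)
exog = np.column_stack([np.ones(250), rng.normal(size=250)])
endog = exog @ [1.0, 2.0] + rng.normal(size=50)[groups] + rng.normal(size=250)

# Pack starting values into a MixedLMParams instance and pass it to fit.
start = MixedLMParams.from_components(fe_params=np.zeros(2), cov_re=np.eye(1))
model = sm.MixedLM(endog, exog, groups)
result = model.fit(start_params=start)
```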
Possible issues or directions for further enhancement:
@saketkc this is interesting for you also
A document describing the model and some notes on implementation: https://umich.box.com/shared/static/t194ygil4bkiadot52q0nte6lmnqgdve.pdf
The test failure looks like a
A large number of small linear algebra calculations forced the use of Cython and C pointers to LAPACK/BLAS in the statespace model and Kalman filter.
Do you have an example for this? For example, with crossed variance components (IIUC).
From the notebook:
IIUC, this could be combined into a single formula if patsy were to create the two full dummy terms. I talked with Nathaniel at PyCon about an option not to drop a reference category. (But I haven't figured out exactly what the variance components are and how the nesting works.)
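For concreteness, patsy can already produce a full dummy set (no dropped reference level) by suppressing the intercept; what is being asked for is doing this for two terms inside one formula that keeps its intercept. A small illustration:

```python
import pandas as pd
from patsy import dmatrix

df = pd.DataFrame({"g": ["a", "b", "c", "a"]})

# Default treatment coding drops a reference level (intercept + 2 dummies).
print(dmatrix("C(g)", df))

# Suppressing the intercept yields one column per level of g,
# with no reference category dropped.
print(dmatrix("0 + C(g)", df))
```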
Is there a notable speed difference between REML and MLE?
Ok, I think I understand the model and what the variance components mean (I got confused because they can also refer to slopes, not just dummies). That looks good. That covers one case that I was trying to get at with examples around the time we merged MixedLM. I haven't looked much at the code changes yet.

About performance: does the current speed difference to R hurt us much? If there are no more low-hanging fruits for speed improvements, I would start to worry more about memory consumption in larger problems, when moving from many small individuals to a few large groups with many subgroups nested in them. I think this could already handle pretty well the case with a single group but two-factor variance components that both don't have too many levels (100 countries and 6 five-year dummies, for example).
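A sketch of how that case might be expressed, assuming the vc_formula and re_formula arguments to MixedLM.from_formula behave as in later released versions; all column names and the simulated data are illustrative:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Hypothetical panel: 100 countries observed over 6 five-year periods.
rng = np.random.RandomState(0)
df = pd.DataFrame({
    "country": np.repeat(np.arange(100), 6),
    "period": np.tile(np.arange(6), 100),
})
df["x"] = rng.normal(size=600)
df["y"] = df["x"] + rng.normal(size=600)
df["one_group"] = 1  # a single group holding all observations

# Two variance components (country and period dummies) within one group;
# re_formula="0" suppresses the usual per-group random intercept.
vcf = {"country": "0 + C(country)", "period": "0 + C(period)"}
model = sm.MixedLM.from_formula("y ~ x", groups="one_group",
                                re_formula="0", vc_formula=vcf, data=df)
result = model.fit()
```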
@josef-pkt I checked out the PR locally. Playing around with it, a bit slowly though. |
Based on one example, ML is about 25% faster than REML.
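For anyone benchmarking this themselves, the criterion is selected via the reml flag to fit (assuming the flag behaves as in released versions):

```python
# Continuing from a MixedLM instance `model` as in the sketches above.
result_reml = model.fit(reml=True)    # REML, the default
result_ml = model.fit(reml=False)     # ML; about 25% faster in one example
```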
Crossed effects work better than I thought they would, at least for models with only a few parameters. Here is an example: http://nbviewer.ipython.org/urls/umich.box.com/shared/static/rxdtbw8p3tyzmstsan2ic4hzawv5hbq5.ipynb This model with n=1000 fits in around 3 seconds. This type of model produces very sparse cov_re matrices, so one route to optimization would be to exploit this somehow (currently everything is treated as dense).
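To make the sparsity point concrete, here is a small sketch (not the PR's code) of why a sparse representation could pay off: with pure variance components, the implied random-effects covariance is diagonal, so a dense matrix wastes both storage and factorization time.

```python
import numpy as np
from scipy import sparse

# Hypothetical: two variance components with 500 levels each.
vcomp = np.repeat([1.5, 0.8], 500)    # one variance per component level
cov_dense = np.diag(vcomp)            # dense: 10**6 stored entries
cov_sparse = sparse.diags(vcomp)      # sparse: 1000 stored entries

# A solve against the diagonal form is O(k) instead of O(k**3).
rhs = np.ones(1000)
x = rhs / vcomp
assert np.allclose(x, np.linalg.solve(cov_dense, rhs))
```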
A design comment for future reference: the code attempts to separate the details of a specific model parameterization from the generic parts that would be present for any mixed model. At a high level, a mixed model involves a mean structure and a covariance matrix V. But we also need to work with the specific model parameters, which is complicated by the fact that there are 4 different types of parameters (fixed effects, random effects, variance components, and the scale parameter) that each have their own idiosyncrasies.

The function gen_dV_dPar handles this to some extent. It cycles through the random effects parameters and returns the derivative matrix of each element of V with respect to a given parameter. Since these derivative matrices are usually low rank, we return them in factored form (so the derivative matrix is matl * matr'). In addition, we need V^{-1} * matl, etc., so we pre-compute these in the generator to avoid solving the same system multiple times.

If we ever want to extend the model, say by modeling heteroscedasticity or allowing AR-type residual errors, a lot of the work would be confined to extending this function. In addition, we would have to extend smw_solver to deal with a more general class of covariance matrices. Outside these two functions, most of the core computational code would remain unchanged.
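As an illustration of the smw_solver idea (a minimal sketch, not the PR's implementation): the Sherman-Morrison-Woodbury identity lets us solve against V = s*I + A*B*A' using only a small k x k system when A is n x k with k much smaller than n.

```python
import numpy as np

def smw_solve(s, A, B, rhs):
    """Solve (s*I + A @ B @ A.T) @ x = rhs via Sherman-Morrison-Woodbury.

    s is a positive scalar, A is n x k, B is k x k and invertible, and
    rhs is an n-vector (or n x m matrix).  Only a k x k system is solved.
    """
    qmat = np.linalg.inv(B) + A.T @ A / s
    return rhs / s - A @ np.linalg.solve(qmat, A.T @ rhs) / s ** 2

# Quick check against a dense solve.
rng = np.random.RandomState(0)
n, k = 200, 3
A = rng.normal(size=(n, k))
B = np.eye(k)
rhs = rng.normal(size=n)
V = 1.5 * np.eye(n) + A @ B @ A.T
assert np.allclose(smw_solve(1.5, A, B, rhs), np.linalg.solve(V, rhs))
```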
@kshedden I still have the 0.7 milestone for this. I'm planning to start merging for 0.8 within the next two weeks, assuming I have sufficient internet connection and a working computer in Europe. (I expect to have both.)
@josef-pkt I think it's solid enough to go in. The test coverage is decent.
REF/ENH: major upgrades to MixedLM including variance components
This PR is a major overhaul of MixedLM. The main feature added here is support for a second type of random effect that we call a "variance component". This allows for many types of more complex models involving, for example, nested random effects that could not be fit before. The PR also includes many optimizations that should lead to substantially faster performance than what we had before.
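A sketch of the nested case the description mentions, assuming the vc_formula argument behaves as in later released versions (all names and data here are hypothetical): classrooms nested within schools, with a variance component for classroom.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Hypothetical nested design: 10 schools, 4 classrooms per school,
# 5 students per classroom.
rng = np.random.RandomState(0)
df = pd.DataFrame({
    "school": np.repeat(np.arange(10), 20),
    "classroom": np.tile(np.repeat(np.arange(4), 5), 10),
})
df["x"] = rng.normal(size=200)
df["y"] = df["x"] + rng.normal(size=200)

# Random intercept per school, plus a classroom variance component;
# the classroom dummies are expanded within each school group, which
# is what makes the effect nested.
model = sm.MixedLM.from_formula(
    "y ~ x", groups="school",
    vc_formula={"classroom": "0 + C(classroom)"},
    data=df)
result = model.fit()
```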
There are likely still some issues to be resolved, so I marked it WIP, but it has been pretty thoroughly tested. We may also need to discuss terminology and naming conventions. The design matrices for variance components are passed to __init__ in a form that does not match a pattern used in any other model. I don't see an obviously better choice, but it is definitely worth thinking about this some more before committing to something. Using formulas is much easier than using the traditional "array interface".
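For the record, a sketch of the shape being discussed, assuming the dict-of-dicts form that appeared in released versions (outer key: variance component name; inner key: group label; all names here are hypothetical):

```python
import numpy as np

# One variance component ("period") with 3 levels, two groups of 12 rows.
period_design = np.kron(np.eye(3), np.ones((4, 1)))  # 12 x 3 dummy matrix
exog_vc = {
    "period": {
        "g1": period_design,
        "g2": period_design,
    }
}
# model = sm.MixedLM(endog, exog, groups, exog_vc=exog_vc)
```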
This supersedes #2124, which I will close.