
WIP/ENH: major upgrades to MixedLM including variance components #2363

Merged
merged 28 commits into from Jul 10, 2015

Conversation

kshedden
Contributor

This PR is a major overhaul of MixedLM. The main feature added here is support for a second type of random effect that we call a "variance component". This allows for many types of more complex models involving, for example, nested random effects that could not be fit before. The PR also includes many optimizations that should lead to substantially faster performance than what we had before.

There are likely still some issues to be resolved so I marked it WIP, but it has been pretty thoroughly tested. We may also need to discuss terminology and naming conventions. The design matrices for variance components are passed to init in a form that does not match a pattern used in any other model. I don't see an obviously better choice, but it is definitely worth thinking about this some more before committing to something. Using formulas is much easier than using the traditional "array interface".

This supersedes #2124, which I will close.

@kshedden
Contributor Author

Here is a notebook illustrating the use of the new type of random effect:

http://nbviewer.ipython.org/urls/umich.box.com/shared/static/bqw1k8q4nt4tc05yrtwypgqlf9c3zkud.ipynb

@kshedden
Contributor Author

There are some minor deprecation issues here. I took out the EM and steepest-ascent pre-optimizers since they were never useful and are hard to adapt to the new fitting procedure that profiles out the fixed effects parameters. The MixedLMParams object has slightly different behavior, but that is generally not a user-facing class (it is not private, since starting values can be passed in as a MixedLMParams instance).

@kshedden
Contributor Author

Possible issues or directions for further enhancement:

  • Speed is better than before, but still considerably slower than R (lme4). I haven't done careful benchmarking, but it appears to be maybe 3-5x slower. One remaining possibility for optimization would be to allow sparse matrices for the fe/re design matrices (which are often sparse in practice, especially for random intercepts). The basic issue is that there are a huge number of rather small linear algebra computations (e.g. 30K matrix inversions on a small-ish example that I profiled). We handle these efficiently using the SMW approach (see the sketch after this list), but there is still a lot of allocation and garbage collection of workspaces for small-scale linear algebra.
  • When using formulas, we call Patsy to process a formula on every group. This may be expensive. We can't process the formula just once for the whole data set because we want the random effects structure to be "reset" for each group. Maybe there is a better way.
  • We still don't support crossed random effects of any type, except through the inefficient approach of having a single group and specifying everything through the random effects matrices.
  • We have a method for constructing profile likelihoods for one parameter; it would be good to have a two-parameter version, like lmer's.
  • Besides crossed effects, there are two main extensions to the model structure. One is to allow time-series like covariances at the "top level" (e.g. the residual errors become an AR(1) process). This would not be extremely difficult to add. I think people would really like to have this because it is not supported in lme4 (it was in the older nlme). Another important extension would be to allow heteroscedasticity. I don't have a clear design in mind for this. Frequency weights for groups and/or individuals would also be good to have, probably not too difficult.
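
For reference, here is a minimal sketch of the SMW (Sherman-Morrison-Woodbury) solve mentioned in the first bullet above, applied to one group's marginal covariance V = scale*I + Z G Z'. It only illustrates the identity; it is not the PR's smw_solver, and the function name and arguments are hypothetical.

import numpy as np

def smw_solve(scale, Z, G, x):
    # Solve (scale*I + Z G Z') y = x without forming the n x n matrix.
    # Z is n x q with q small; G is the q x q random effects covariance.
    inner = np.linalg.inv(G) + Z.T @ Z / scale
    return x / scale - Z @ np.linalg.solve(inner, Z.T @ x) / scale ** 2

Because the inner matrix is only q x q, each of the many per-group solves stays cheap even when the groups themselves are large.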

@josef-pkt
Member

@saketkc this is interesting for you also

@kshedden
Contributor Author

A document describing the model and some notes on implementation:

https://umich.box.com/shared/static/t194ygil4bkiadot52q0nte6lmnqgdve.pdf

@josef-pkt
Member

The test failure looks like a nan, or possibly an inf, in the score calculations. It only happens with scipy 0.11 and 0.12. (Some versions of the LAPACK/BLAS libraries check for nan and inf; others crash or hang.)

@josef-pkt
Member

A large number of small linear algebra calculations forced the use of Cython and C pointers to LAPACK/BLAS in the statespace model and Kalman filter.
We might have to go to the same Cython approach here, or see whether we can combine many small operations into larger sparse operations (which makes things faster in Matlab, but I don't have much experience with this in Python).
I wouldn't go to Cython until the design is settled and the essential enhancements are in.
Looping over the groups is simple for parallel processing, but spreading it over multiple processes is only worth it if each job is large enough.

We still don't support crossed random effects of any type, except through the inefficient approach of having a single group and specifying everything through the random effects matrices.

Do you have an example for this? For example, with crossed variance components (IIUC).
It looks to me like that would be a good reference for the general crossed effects model, and we need to figure out the sparse handling within each group.

@josef-pkt
Member

from the notebook

vca = patsy.dmatrix("0 + C(classroom)", df, return_type='dataframe')
vcb = patsy.dmatrix("0 + C(classroom):priortest", df, return_type='dataframe')

IIUC, this could be combined into a single formula if patsy were to create the two full dummy terms
"0 + C(classroom) + C(classroom):priortest"

I talked with Nathaniel at PyCon about an option not to drop a reference category:
pydata/patsy#60

(But I haven't figured out exactly what the variance components are and how the nesting works.)
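
For readers following along: under the usual variance components formulation, each of these design matrices contributes its own variance times the outer product of its columns to a group's marginal covariance. A rough sketch based on this reading of the notebook rather than on code from this PR; sigma2_classroom, sigma2_slope and scale are hypothetical placeholders for the estimated variances.

import numpy as np

V = (sigma2_classroom * vca.values @ vca.values.T
     + sigma2_slope * vcb.values @ vcb.values.T
     + scale * np.eye(len(df)))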

@josef-pkt
Member

Is there a remarkable speed difference between REML and MLE?
Bates mentioned that he has MLE as default in his Julia version.

@josef-pkt
Member

OK, I think I understand the model and what the variance components mean (I got confused because they can also refer to slopes, not just dummies).

That looks good. That covers one case that I was trying to get at with examples around the time we merged MixedLM.
I'm also starting to see more clearly the connection to Bates/DebRoy and to the Stata description.

I haven't looked much at the code changes yet.

About performance: Does the current speed difference relative to R hurt us much? If there is no more low-hanging fruit for speed improvements, I would start to worry more about memory consumption in larger problems, when moving from many small individuals to a few large groups with many subgroups nested in them.

I think this could already handle pretty well the case with a single group but two factor variance components that both don't have too many levels (100 countries and six 5-year dummies, for example).

@saketkc
Contributor

saketkc commented Apr 19, 2015

@josef-pkt I checked out the PR locally. Playing around with it, a bit slowly though.

@kshedden
Contributor Author

Based on one example, ML is about 25% faster than REML.
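
For context, REML versus ML is selected with the reml flag on fit. A minimal sketch (the data frame and column names are hypothetical):

import statsmodels.api as sm

model = sm.MixedLM.from_formula("score ~ priortest", groups=df["school"], data=df)
result_reml = model.fit(reml=True)   # restricted maximum likelihood (the default)
result_ml = model.fit(reml=False)    # ordinary maximum likelihood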

@kshedden
Contributor Author

Crossed effects work better than I thought they would, at least for models with only a few parameters. Here is an example:

http://nbviewer.ipython.org/urls/umich.box.com/shared/static/rxdtbw8p3tyzmstsan2ic4hzawv5hbq5.ipynb

This model with n=1000 fits in around 3 seconds.

This type of model produces very sparse cov_re matrices so one route to optimization would be to exploit this somehow (currently everything is treated as dense).
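
Here is a hedged sketch of the single-group workaround for crossed effects described above, written against the formula interface with a vc_formula dictionary (an interface that may differ from what this PR exposes, where the variance component design matrices are passed to init directly). The data frame and column names are hypothetical:

import statsmodels.api as sm

# y is the response; worker and machine are two crossed factors.
df["one_group"] = 1   # a single group, so both factors enter as variance components
vcf = {"worker": "0 + C(worker)", "machine": "0 + C(machine)"}
model = sm.MixedLM.from_formula("y ~ 1", groups=df["one_group"], vc_formula=vcf,
                                re_formula="0", data=df)   # no ordinary random effects
result = model.fit()
print(result.summary())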

@kshedden
Contributor Author

A design comment for future reference:

The code attempts to separate the details of a specific model parameterization from the generic parts that would be present for any mixed model. At a high level, a mixed model involves a mean mu and a covariance matrix V for each group. The likelihood can be computed from mu and V without knowing any other model details. The core code in the loglike, score_full, and hessian methods handles these generic calculations.

But we also need to work with the specific model parameters, which is complicated by the fact that there are four different types of parameters (fixed effects, random effects, variance components, and the scale parameter) that each have their own idiosyncrasies. The function gen_dV_dPar handles this to some extent. It cycles through the random effects parameters and, for each one, returns the matrix of derivatives of the elements of V with respect to that parameter. Since these derivative matrices are usually low rank, we return them in factored form (so the derivative matrix is matl * matr'). In addition, we need V^{-1} * matl, etc., so we pre-compute these in the generator to avoid solving the same system multiple times.

If we ever want to extend the model, say by modeling heteroscedasticity, or allowing AR-type residual errors, a lot of the work would be confined to extending this function. In addition we would have to extend smw_solver to deal with a more general class of covariance matrices. Outside these two functions most of the core computational code would remain unchanged.
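
To make the factored form concrete: for the ordinary random effects part of the covariance, V = Z G Z' + scale*I, the derivative of V with respect to an off-diagonal element G[j, k] is z_j z_k' + z_k z_j', a rank-2 matrix that can be carried as a pair of thin factors rather than as a full n x n array. A hypothetical illustration, not the PR's gen_dV_dPar:

import numpy as np

def dV_dG_offdiag(Z, j, k):
    # Return (matl, matr) with dV/dG[j, k] = matl @ matr.T, for j != k.
    zj, zk = Z[:, [j]], Z[:, [k]]
    return np.hstack([zj, zk]), np.hstack([zk, zj])   # both factors are n x 2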

@coveralls

Coverage Status

Coverage increased (+0.02%) to 83.93% when pulling 536eb86 on kshedden:lme-profile-beta2 into a1f53b0 on statsmodels:master.

@coveralls

Coverage Status

Coverage increased (+0.03%) to 83.93% when pulling 3d9b922 on kshedden:lme-profile-beta2 into a1f53b0 on statsmodels:master.

@coveralls

Coverage Status

Coverage increased (+0.03%) to 83.93% when pulling 829a3a1 on kshedden:lme-profile-beta2 into a1f53b0 on statsmodels:master.

@kshedden
Contributor Author

smw_solver doesn't need B, so we don't need to create it in augment_cov_re. It's deep in some loops, so this may save a bit of time.

@josef-pkt josef-pkt modified the milestone: 0.7 Jun 23, 2015
@josef-pkt
Member

@kshedden I still have the 0.7 milestone for this.
Do you think we should merge this (or part of it) before tagging 0.7?

I'm planning to start merging for 0.8 within the next two weeks, assuming I have sufficient internet connection and a working computer in Europe. (I expect to have both.)

@kshedden
Contributor Author

kshedden commented Jul 3, 2015

@josef-pkt I think it's solid enough to go in. The test coverage is decent and I have been using it on several projects.


josef-pkt added a commit that referenced this pull request Jul 10, 2015
REF/ENH: major upgrades to MixedLM including variance components
@josef-pkt josef-pkt merged commit b56dc22 into statsmodels:master Jul 10, 2015
@kshedden kshedden mentioned this pull request Jun 20, 2016