BUG: Make sure RE names are properly handled in MixedLM. #2097

Merged
merged 9 commits into from Nov 20, 2014

Projects

None yet

4 participants

@jseabold
Member

Fixes the issue mentioned on the mailing list about RE names. This needs tests. It would be great if someone else wrote those tests, because I don't know what the correct output is necessarily supposed to be. This matches the approach in the code before I refactored it so that the wrappers work, there was a thinko, and the stuff in from_formula needs to be handled differently for now unfortunately. MWE

import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

data = sm.datasets.get_rdataset('dietox', 'geepack').data

data['constant'] = 1
mod = sm.MixedLM(data['Weight'], data[['constant', 'Time']],
                 groups=data['Pig'], exog_re=data[['constant', 'Time']])
modf = mod.fit()
print modf.summary()
print modf.params


mod  = smf.mixedlm('Weight ~ Time', data, groups = 'Pig', re_formula = '~Time')
modf = mod.fit()
print modf.summary()
print modf.params

Closes #2099.

@jseabold jseabold changed the title from BUG: Make sure RE names are properly handled. to BUG: Make sure RE names are properly handled in MixedLM. Nov 17, 2014
@jseabold jseabold added this to the 0.6.1 milestone Nov 17, 2014
@josef-pkt
Member

You can add a call to summary() in one of the existing test cases as smoke test

I haven't gone through MixedLM yet to add all the generic tests AFAIR.

@jseabold
Member

I'm more concerned that the names are correct re: the parameters they line up with, and I don't want to/can't spend a ton of time figuring it out.

@jseabold
Member

That should do it for a cleanup, though there are still issues.

I added one of @kshedden's notebook to the repo bc. it doesn't work as is on the wiki, needed a cleanup to be a bit more Pythonic (I already applied these cleanups to the release notes and other examples), and I wanted to get an idea of the actual R syntax, so I could compare. I still can't replicate the profile likelihood example in the original notebook [1], but it doesn't work for me on the old code either. Failures are non-deterministic, so I guess the optimization just isn't that robust for this case. Maybe we should use a different example?

The two problems were 1) the dist_low resulted in a negative first leading principal minor. I added a check for this. and 2) it estimates non-PD cov_re which leads to NaNs for the likelihood.

[1] http://nbviewer.ipython.org/urls/umich.box.com/shared/static/mmgp8ztqnt1kvhmw1kd9.ipynb

@coveralls

Coverage Status

Coverage remained the same when pulling 28bbf96 on jseabold:fix-mixedlm into b88f500 on statsmodels:master.

@jseabold
Member

I guess I'm just going to remove the profile likelihood stuff from the notebook and merge this unless someone wants to look in to it.

@coveralls

Coverage Status

Coverage increased (+0.05%) when pulling 5762d79 on jseabold:fix-mixedlm into b88f500 on statsmodels:master.

@jseabold jseabold merged commit b878d72 into statsmodels:master Nov 20, 2014

1 check passed

continuous-integration/travis-ci The Travis CI build passed
Details
@jseabold jseabold deleted the jseabold:fix-mixedlm branch Nov 20, 2014
@jseabold jseabold added a commit that referenced this pull request Dec 2, 2014
@jseabold jseabold Backport PR #2097: BUG: Make sure RE names are properly handled in Mi…
…xedLM.

Fixes the issue mentioned on the mailing list about RE names. This needs tests. It would be great if someone else wrote those tests, because I don't know what the correct output is necessarily supposed to be. This matches the approach in the code before I refactored it so that the wrappers work, there was a thinko, and the stuff in from_formula needs to be handled differently for now unfortunately. MWE

    import pandas as pd
    import statsmodels.api as sm
    import statsmodels.formula.api as smf

    data = sm.datasets.get_rdataset('dietox', 'geepack').data

    data['constant'] = 1
    mod = sm.MixedLM(data['Weight'], data[['constant', 'Time']],
                     groups=data['Pig'], exog_re=data[['constant', 'Time']])
    modf = mod.fit()
    print modf.summary()
    print modf.params

    mod  = smf.mixedlm('Weight ~ Time', data, groups = 'Pig', re_formula = '~Time')
    modf = mod.fit()
    print modf.summary()
    print modf.params

Closes #2099.
e698f30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment