ENH: Glm add score_obs #1781

Merged
josef-pkt merged 8 commits into statsmodels:master from josef-pkt:glm_score on Jul 9, 2014

@josef-pkt
Member

So far this just adds analytic (generic GLM) score_factor and score_obs.

See #1775 for score_factor.
Usage: #1753 (score/LM test), #1738 (robust cov).

Related: #1726 (generic numerical derivatives in LikelihoodModel).

score_factor is the same as score residuals in Stata.
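A minimal NumPy sketch of the generic GLM formulas behind score_factor and score_obs, assuming unit prior weights and a given scale: the factor is (y - mu) / (V(mu) * g'(mu) * scale), and each row of score_obs is that factor times the corresponding exog row, so summing the rows gives the score. The helper names below are hypothetical and are not the statsmodels signatures; the Bernoulli/logit case is only a check, since there the factor reduces to y - mu.

```python
import numpy as np

# Hypothetical helpers for illustration, not the statsmodels API.
def score_factor(endog, mu, variance, link_deriv, scale=1.0):
    """Per-observation factor (y - mu) / (V(mu) * g'(mu) * scale)."""
    return (endog - mu) / (variance(mu) * link_deriv(mu) * scale)

def score_obs(exog, endog, mu, variance, link_deriv, scale=1.0):
    # Each row is one observation's contribution to the score vector.
    factor = score_factor(endog, mu, variance, link_deriv, scale)
    return factor[:, None] * exog

# Bernoulli family with logit link
def bernoulli_variance(mu):
    return mu * (1.0 - mu)

def logit_deriv(mu):
    # g(mu) = log(mu / (1 - mu))  =>  g'(mu) = 1 / (mu * (1 - mu))
    return 1.0 / (mu * (1.0 - mu))

rng = np.random.default_rng(0)
exog = np.column_stack([np.ones(50), rng.normal(size=50)])
endog = (rng.uniform(size=50) < 0.4).astype(float)
params = np.array([0.2, -0.5])
mu = 1.0 / (1.0 + np.exp(-exog @ params))

sobs = score_obs(exog, endog, mu, bernoulli_variance, logit_deriv)
score = sobs.sum(axis=0)  # total score = sum of per-observation contributions
```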

I checked some examples against Stata, but the unit tests compare against the discrete models and against OLS/explicit calculations.

Adding the Hessian, observed and expected (information matrix), should be relatively easy, but is not done yet. (Update: DONE)
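For the Hessian, a sketch of the expected version (the negative information matrix), assuming hypothetical V(mu) and g'(mu) callables and a Poisson/log-link example: the expected Hessian is -X' W X with weights 1 / (scale * V(mu) * g'(mu)**2). Because the log link is canonical for Poisson, the observed Hessian, approximated here by central differences of the score, coincides with the expected one; for non-canonical links the two differ by a term proportional to y - mu.

```python
import numpy as np

# Hypothetical helper for illustration, not the statsmodels API.
def hessian_expected(exog, mu, variance, link_deriv, scale=1.0):
    """Expected Hessian -X' W X with W = diag(1 / (scale * V(mu) * g'(mu)**2))."""
    w = 1.0 / (scale * variance(mu) * link_deriv(mu) ** 2)
    return -(exog * w[:, None]).T @ exog

# Poisson family with log link: V(mu) = mu, g(mu) = log(mu) so g'(mu) = 1 / mu
rng = np.random.default_rng(1)
exog = np.column_stack([np.ones(40), rng.normal(size=40)])
endog = rng.poisson(2.0, size=40).astype(float)
params = np.array([0.5, 0.1])

def score(p):
    # analytic Poisson/log-link score: X'(y - exp(X p))
    return exog.T @ (endog - np.exp(exog @ p))

mu = np.exp(exog @ params)
hess = hessian_expected(exog, mu, lambda m: m, lambda m: 1.0 / m)

# Observed Hessian by central differences of the score; with a canonical
# link it should match the expected Hessian.
eps = 1e-6
hess_num = np.column_stack([
    (score(params + eps * e) - score(params - eps * e)) / (2 * eps)
    for e in np.eye(2)
])
```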
Hackish: there are problems in varfunc and numdiff (I need to open two issues). The code uses numerical derivatives in varfunc and links; these should be replaced with analytical derivatives.
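To illustrate replacing numerical with analytic derivatives, a small sketch comparing a central-difference derivative of the logit link with its closed form g'(p) = 1 / (p (1 - p)). The function names are hypothetical and the step size eps is an arbitrary assumption; the analytic version needs no step choice and does not break near the boundaries of (0, 1), where p ± eps can leave the domain.

```python
import numpy as np

# Hypothetical helpers for illustration, not the statsmodels link classes.
def logit(p):
    return np.log(p / (1.0 - p))

def logit_deriv_analytic(p):
    # closed-form derivative of the logit link
    return 1.0 / (p * (1.0 - p))

def logit_deriv_numeric(p, eps=1e-6):
    # central difference, the kind of approximation the analytic version replaces
    return (logit(p + eps) - logit(p - eps)) / (2.0 * eps)

p = np.array([0.1, 0.5, 0.9])
err = np.abs(logit_deriv_numeric(p) - logit_deriv_analytic(p))
```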

I added score_test to the model, but not yet to the results; maybe it will stay only in the model.

@coveralls

Coverage increased (+0.02%) when pulling ebe888b on josef-pkt:glm_score into 7df5291 on statsmodels:master.

@josef-pkt
Member

@kshedden In my last commit I added a score_test.

I haven't verified it against R yet, but it should roughly be a pattern that works across models (at least in cases where we can get the score and Hessian for new exog without creating a new model).

It can either take a constrained parameter vector, or add extra exog variables to the score to test for omitted variables.
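A minimal sketch of the omitted-variable form of the score test, assuming a Bernoulli/logit model fit by a hand-written Newton-Raphson loop (fit_logit and score_test_omitted are hypothetical helpers, not the statsmodels score_test signature). Only the restricted model is estimated; the full-model score and expected information are evaluated at the restricted estimates padded with zeros, and the statistic s' I^{-1} s is compared with a chi-square distribution whose degrees of freedom equal the number of added columns. Like the score_test described in this PR, it returns (stat, pvalue, df).

```python
import numpy as np
from scipy import stats

# Hypothetical helpers for illustration, not the statsmodels API.
def fit_logit(exog, endog, maxiter=50, tol=1e-10):
    # Newton-Raphson for the Bernoulli/logit MLE
    params = np.zeros(exog.shape[1])
    for _ in range(maxiter):
        mu = 1.0 / (1.0 + np.exp(-exog @ params))
        score = exog.T @ (endog - mu)
        info = (exog * (mu * (1 - mu))[:, None]).T @ exog
        step = np.linalg.solve(info, score)
        params += step
        if np.max(np.abs(step)) < tol:
            break
    return params

def score_test_omitted(exog, exog_extra, endog):
    """LM test that the coefficients on exog_extra are zero; returns (stat, pvalue, df)."""
    params_r = fit_logit(exog, endog)  # only the restricted model is fit
    exog_full = np.column_stack([exog, exog_extra])
    params_full = np.concatenate([params_r, np.zeros(exog_extra.shape[1])])
    mu = 1.0 / (1.0 + np.exp(-exog_full @ params_full))
    score = exog_full.T @ (endog - mu)  # full-model score at restricted params
    info = (exog_full * (mu * (1 - mu))[:, None]).T @ exog_full
    stat = score @ np.linalg.solve(info, score)
    df = exog_extra.shape[1]
    return stat, stats.chi2.sf(stat, df), df

rng = np.random.default_rng(2)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
exog = np.column_stack([np.ones(n), x1])
prob = 1.0 / (1.0 + np.exp(-(0.3 + 0.8 * x1)))
endog = (rng.uniform(size=n) < prob).astype(float)
stat, pvalue, df = score_test_omitted(exog, x2[:, None], endog)
```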

@coveralls

Coverage increased (+0.04%) when pulling 9ffbcb6 on josef-pkt:glm_score into 7df5291 on statsmodels:master.

@kshedden
Contributor

Great. I will look this over. Hopefully we can make this (and your fit_constrained) work for MixedLM, GEE, and PHreg.

In the case of GEE, we would override the score test method because GEE is not an MLE (the score test still exists, but has a nonstandard form). We can use the same interface to make this transparent.

PHreg can probably use your score test code as-is. The wrinkle with PHreg is computing score_obs using the partial likelihood, but I have already done that to provide the robust covariance.
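For the robust covariance, a sketch of the sandwich H^{-1} (S'S) H^{-1} built from a matrix S of per-observation scores (score_obs) and the Hessian H; cov_sandwich is a hypothetical helper, not a statsmodels function. As a check case it uses OLS viewed as a Gaussian GLM with identity link, where the scale cancels and the result reduces to the HC0 heteroscedasticity-robust covariance.

```python
import numpy as np

# Hypothetical helper for illustration, not the statsmodels API.
def cov_sandwich(score_obs, hessian):
    """Robust covariance H^{-1} (S'S) H^{-1} from per-observation scores S."""
    hinv = np.linalg.inv(hessian)
    meat = score_obs.T @ score_obs
    return hinv @ meat @ hinv

# OLS as a Gaussian GLM with identity link and scale fixed at 1:
# score_obs rows are resid_i * x_i and the Hessian is -X'X.
rng = np.random.default_rng(3)
n = 100
exog = np.column_stack([np.ones(n), rng.normal(size=n)])
noise = rng.normal(size=n) * (1 + np.abs(exog[:, 1]))  # heteroscedastic errors
endog = exog @ np.array([1.0, 2.0]) + noise
params = np.linalg.lstsq(exog, endog, rcond=None)[0]
resid = endog - exog @ params

cov = cov_sandwich(resid[:, None] * exog, -exog.T @ exog)
```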

For simplicity, it would be nice to remove my constrained fitting code from GEE if it is redundant with what is now in base.

Constrained fitting in MixedLM is more complicated because there are "extra" (variance) parameters. However, your constrained fitting should still work if someone only wants to constrain the fixed effects coefficients.
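One common way to fit under linear constraints R params = q is reparametrization: write params = p0 + T b, with p0 any particular solution and T a basis for the null space of R, then fit the unconstrained model in b with exog @ p0 as an offset. A minimal sketch for a Poisson/log-link model follows; fit_poisson and fit_poisson_constrained are hypothetical helpers, this is not claimed to be how fit_constrained is implemented, and scipy.linalg.null_space requires SciPy 1.1 or later.

```python
import numpy as np
from scipy.linalg import null_space

# Hypothetical helpers for illustration, not the statsmodels API.
def fit_poisson(exog, endog, offset, maxiter=100, tol=1e-12):
    # Newton-Raphson for Poisson/log link with an offset in the linear predictor
    params = np.zeros(exog.shape[1])
    for _ in range(maxiter):
        mu = np.exp(offset + exog @ params)
        score = exog.T @ (endog - mu)
        info = (exog * mu[:, None]).T @ exog
        step = np.linalg.solve(info, score)
        params = params + step
        if np.max(np.abs(step)) < tol:
            break
    return params

def fit_poisson_constrained(exog, endog, R, q):
    """Maximize the Poisson likelihood subject to R @ params == q."""
    # p0 solves R p0 = q; columns of T span the null space of R,
    # so every feasible params vector is p0 + T @ b for unconstrained b.
    p0 = np.linalg.lstsq(R, q, rcond=None)[0]
    T = null_space(R)
    b = fit_poisson(exog @ T, endog, offset=exog @ p0)
    return p0 + T @ b

rng = np.random.default_rng(6)
n = 300
exog = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
endog = rng.poisson(np.exp(exog @ np.array([0.3, 0.4, 0.1]))).astype(float)

# Hypothetical constraint: params[1] + params[2] == 0.5
R = np.array([[0.0, 1.0, 1.0]])
q = np.array([0.5])
params_c = fit_poisson_constrained(exog, endog, R, q)
```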

(Replying by email to Josef Perktold's comment of Tue, Jun 24, 2014 at 11:57 PM.)

@josef-pkt
Member

We will work our way slowly toward a pattern that is generic enough, or toward a few different patterns, that apply to most models.

One extension for GLM will be to take into account different scale estimates (for over- or underdispersion), and the extra parameters of the negative binomial and gamma families.
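One common estimate of the dispersion, used for over- or underdispersion, is the Pearson chi-square divided by the residual degrees of freedom, sum((y - mu)**2 / V(mu)) / (n - k); the generic score_factor then divides by this scale. A minimal sketch with a hypothetical pearson_scale helper, using intercept-only Poisson fits (whose MLE of the mean is the sample mean) on simulated Poisson and negative binomial counts. NB(2, 0.4) in NumPy's parametrization has mean 3 and variance 7.5, so its estimate should sit well above 1, while the Poisson one should be near 1.

```python
import numpy as np

# Hypothetical helper for illustration, not the statsmodels API.
def pearson_scale(endog, mu, variance, k_params):
    """Pearson chi2 / df_resid estimate of the GLM dispersion parameter."""
    resid_pearson = (endog - mu) / np.sqrt(variance(mu))
    return np.sum(resid_pearson ** 2) / (len(endog) - k_params)

rng = np.random.default_rng(7)
n = 2000
y_pois = rng.poisson(3.0, size=n).astype(float)
y_nb = rng.negative_binomial(2, 0.4, size=n).astype(float)  # mean 3, variance 7.5

# Intercept-only Poisson fit (one parameter): fitted mean is the sample mean.
scale_pois = pearson_scale(y_pois, np.full(n, y_pois.mean()), lambda m: m, 1)
scale_nb = pearson_scale(y_nb, np.full(n, y_nb.mean()), lambda m: m, 1)
```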

Another extension is to use auxiliary regressions based on the residuals for score/LM diagnostic tests. I started to look at those for discrete.Poisson, but they will apply in the same or a similar way to GLM and other models.
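One standard auxiliary-regression form of the LM test is the outer-product-of-gradients (OPG) version: regress a column of ones on the per-observation scores of the full model, evaluated at the restricted estimate, and use n times the uncentered R-squared. A sketch under these assumptions, testing an omitted regressor in a Poisson model whose restricted fit is intercept-only, so the estimate has a closed form. This is just one variant; other residual-based auxiliary regressions, such as overdispersion tests, use different dependent variables and regressors.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n = 300
x = rng.normal(size=n)
endog = rng.poisson(np.exp(0.2 + 0.3 * x)).astype(float)

# Restricted model: intercept-only Poisson, with MLE mu = mean(y) for every observation.
mu_r = np.full(n, endog.mean())
exog_full = np.column_stack([np.ones(n), x])

# Per-observation scores of the full model at the restricted estimate
# (log link: the score factor is y - mu).
score_obs = (endog - mu_r)[:, None] * exog_full

# Auxiliary regression of ones on score_obs; LM = n * uncentered R^2.
ones = np.ones(n)
coef = np.linalg.lstsq(score_obs, ones, rcond=None)[0]
fitted = score_obs @ coef
lm_opg = n * (fitted @ fitted) / (ones @ ones)
pvalue = stats.chi2.sf(lm_opg, df=1)  # one restriction: coefficient on x is zero
```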

Now that the score and Hessian are available, another extension for GLM is to add scipy optimizers as fitting methods. Both SAS and Stata use Newton-Raphson as the default optimizer (I'm not completely sure whether it's the default in SAS or just an option).
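With score and Hessian available, a scipy optimizer can use them directly. A minimal sketch assuming a Poisson/log-link model: scipy.optimize.minimize with method="Newton-CG" (a truncated Newton method, not identical to plain Newton-Raphson) receives the negated log-likelihood, score and Hessian, and its result is compared with a hand-written Newton-Raphson loop. The data and starting values are arbitrary assumptions.

```python
import numpy as np
from scipy import optimize

rng = np.random.default_rng(5)
n = 200
exog = np.column_stack([np.ones(n), rng.normal(size=n)])
endog = rng.poisson(np.exp(exog @ np.array([0.5, -0.3]))).astype(float)

# Poisson/log-link log-likelihood (without the constant log(y!) term), score and Hessian.
def loglike(params):
    eta = exog @ params
    return np.sum(endog * eta - np.exp(eta))

def score(params):
    return exog.T @ (endog - np.exp(exog @ params))

def hessian(params):
    mu = np.exp(exog @ params)
    return -(exog * mu[:, None]).T @ exog

# scipy minimizes, so pass the negated log-likelihood and its derivatives.
res = optimize.minimize(lambda p: -loglike(p), np.zeros(2),
                        jac=lambda p: -score(p), hess=lambda p: -hessian(p),
                        method="Newton-CG")

# Plain Newton-Raphson for comparison.
params_nr = np.zeros(2)
for _ in range(50):
    step = np.linalg.solve(-hessian(params_nr), score(params_nr))
    params_nr = params_nr + step
    if np.max(np.abs(step)) < 1e-12:
        break
```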

My plan is to go back to robust covariances and finish connecting them to the models (now including GLM), after looking at some issues and PRs.

@coveralls

Coverage increased (+0.05%) when pulling b7e7f27 on josef-pkt:glm_score into 7df5291 on statsmodels:master.

@coveralls

Coverage increased (+0.05%) when pulling 3889442 on josef-pkt:glm_score into 7df5291 on statsmodels:master.

@coveralls

Coverage increased (+0.05%) when pulling 90b25bb on josef-pkt:glm_score into 7df5291 on statsmodels:master.

@josef-pkt
Member

I'm pretty much done here.

The only thing I'm still unsure about is whether scale is handled correctly in the score test; I only have Logit/Bernoulli as a test case against R.

Also, score_test returns (stat, pvalue, df) instead of a results instance.

@coveralls

Coverage increased (+0.06%) when pulling 07433ab on josef-pkt:glm_score into 7df5291 on statsmodels:master.

@josef-pkt referenced this pull request in MAINT: GLM #1734 (closed) on Jun 25, 2014

@josef-pkt added the PR label on Jul 9, 2014
@josef-pkt
Member

Rebased and force-pushed.

@coveralls

Coverage increased (+0.06%) when pulling 4662fef on josef-pkt:glm_score into 9217656 on statsmodels:master.

@josef-pkt
Member

Merging this.

TODO (follow-up): use analytic instead of numerical derivatives.

@josef-pkt merged commit ab12fc2 into statsmodels:master on Jul 9, 2014

2 checks passed

continuous-integration/appveyor AppVeyor build succeeded
continuous-integration/travis-ci The Travis CI build passed
@josef-pkt deleted the josef-pkt:glm_score branch on Jul 9, 2014