Add NominalGEE and OrdinalGEE to api #2053

Merged
merged 5 commits into from Dec 2, 2014

4 participants

@kshedden
Contributor

This PR adds NominalGEE and OrdinalGEE to the API. Some related enhancements are bundled in too, specifically:

  • Privatize the Multinomial and MultinomialLogit classes, make _Multinomial the default family for NominalGEE. Automate the setting of the number of levels.
  • Make binomial the default family for OrdinalGEE.
  • Docstring changes related to above.
@josef-pkt josef-pkt commented on an outdated diff Oct 20, 2014
statsmodels/genmod/generalized_estimating_equations.py
@@ -258,6 +254,22 @@ def unpack_cov(self, bcov):
%(example)s
"""
+_gee_family_doc = """\
+ The default is Gaussian. To specify the binomial
+ distribution use `family=sm.family.Binomial()`. Each family
+ can take a link instance as an argument. See
+ statsmodels.family.family for more information."""
+
+_gee_ordinal_family_doc = """\
+ The default is `Binomial`. Any family intended for use with
+ binary responses may be specified. The only other current
+ option is `Probit`."""
@josef-pkt
josef-pkt Oct 20, 2014 Member

Is probit a family or a link?
for GLM probit is a link
Binomial is the only binary family
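
To illustrate the distinction raised here: a family fixes the variance function, while a link maps the linear predictor to the mean. The following is a minimal stdlib-only sketch, not statsmodels code; all function names are hypothetical.

```python
import math
from statistics import NormalDist


def logit_inverse(eta):
    """Inverse logit link: linear predictor -> probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def probit_inverse(eta):
    """Inverse probit link: standard normal CDF of the linear predictor."""
    return NormalDist().cdf(eta)


def binomial_variance(mu):
    """Binomial (Bernoulli) variance function; identical under either link."""
    return mu * (1.0 - mu)


# The link changes how eta maps to the mean; the family's variance
# function is applied to that mean in the same way for both links.
for eta in (-1.0, 0.0, 1.0):
    for name, inverse_link in (("logit", logit_inverse), ("probit", probit_inverse)):
        mu = inverse_link(eta)
        print(f"{name:6s} eta={eta:+.1f} mean={mu:.4f} var={binomial_variance(mu):.4f}")
```

In this reading, "Probit" would be a link passed to the Binomial family rather than a second family option.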

@josef-pkt josef-pkt commented on an outdated diff Oct 20, 2014
statsmodels/genmod/generalized_estimating_equations.py
- >>> mod = GEE.from_formula("y ~ x1 + x2", groups, data,
- cov_struct=gor, family=family)
- >>> rslt = mod.fit()
- >>> print rslt.summary()
+ >>> model = GEE.from_formula("y ~ x1 + x2", groups, data,
@josef-pkt
josef-pkt Oct 20, 2014 Member

we also have lower case gee as alias to GEE.from_formula in statsmodels.formula.api
To follow the pattern of other models, we should also document

import statsmodels.formula.api as smf
smf.gee(...)

ordinal_gee and nominal_gee (with an underscore or not?) should also be added to formula.api

@josef-pkt josef-pkt commented on the diff Oct 20, 2014
statsmodels/genmod/generalized_estimating_equations.py
'example': _gee_ordinal_example})
def __init__(self, endog, exog, groups, time=None, family=None,
cov_struct=None, missing='none', offset=None,
dep_data=None, constraint=None):
+ if family is None:
+ family = families.Binomial()
@josef-pkt
josef-pkt Oct 20, 2014 Member

I just checked: using None is consistent with GLM.
In other cases (maybe only RLM) we put the class directly in the signature.

@jseabold
jseabold Dec 2, 2014 Member

If the class instance is mutable, this should be fixed in RLM.
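
The concern about a mutable instance in the signature can be shown with a small sketch. `ToyFamily`, `fit_shared`, and `fit_fresh` are hypothetical names used only for illustration; they are not the GLM, RLM, or GEE code.

```python
class ToyFamily:
    """Hypothetical stand-in for a family object that carries mutable state."""

    def __init__(self):
        self.calls = 0


def fit_shared(family=ToyFamily()):
    # The default instance is created once, when the function is defined,
    # so every call that omits `family` mutates the same object.
    family.calls += 1
    return family


def fit_fresh(family=None):
    # A new instance is created on each call that omits `family`.
    if family is None:
        family = ToyFamily()
    family.calls += 1
    return family


shared_a, shared_b = fit_shared(), fit_shared()
fresh_a, fresh_b = fit_fresh(), fit_fresh()
print(shared_a is shared_b, shared_b.calls)  # True 2
print(fresh_a is fresh_b, fresh_b.calls)     # False 1
```

The `family=None` pattern in the diff above corresponds to `fit_fresh`: state cannot leak between models that rely on the default.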

@josef-pkt
Member

looks good to me, except for missing formula.api addition

aside (maybe new issue for extension if it can be made to work):
I was checking how ncut is determined.
It looks like the endog has to be categorical and not a 2d dummy variable.
discrete.MNLogit can take both, either a categorical 1d endog, or a 2d set of dummy variables.
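
A minimal sketch of the two endog layouts mentioned here, with hypothetical helper names and no statsmodels dependency: a 1d categorical endog and the equivalent 2d set of dummy variables.

```python
def to_dummies(labels):
    """Convert 1-d category labels to (sorted levels, rows of 0/1 indicators)."""
    levels = sorted(set(labels))
    index = {level: j for j, level in enumerate(levels)}
    rows = [[1 if index[y] == j else 0 for j in range(len(levels))] for y in labels]
    return levels, rows


def from_dummies(levels, rows):
    """Recover 1-d labels; each row must contain exactly one 1 and zeros elsewhere."""
    labels = []
    for row in rows:
        if sorted(row) != [0] * (len(levels) - 1) + [1]:
            raise ValueError("each row must be a corner of the simplex")
        labels.append(levels[row.index(1)])
    return labels


endog = ["low", "high", "mid", "low"]  # made-up data
levels, dummies = to_dummies(endog)
print(levels)                         # ['high', 'low', 'mid']
print(dummies)                        # [[0, 1, 0], [1, 0, 0], [0, 0, 1], [0, 1, 0]]
print(from_dummies(levels, dummies))  # ['low', 'high', 'mid', 'low']
```

The number of levels can be read off either layout: the count of distinct labels in the 1d form, or the number of columns in the 2d form.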

(mis)usecase:
similar to using Binomial/Bernoulli for fractional data (in the closed interval [0, 1]), we can use MNLogit or Multinomial for multivariate fractions, i.e. shares that add up to one (called compositional data in an R package), where endog lies in the simplex instead of at the corners of the simplex with exactly one 1.
I haven't read anything yet about how it works, but I think that's the basic idea.
GEE would already have the robust covariances, discrete MNLogit doesn't.

@josef-pkt josef-pkt added this to the 0.6 milestone Oct 20, 2014
@kshedden
Contributor

I just pushed the formula api changes.

Regarding compositional data, yes I see what you mean. The usual
multinomial probabilities exp(b_j x), j=1, ... #levels, normalized to sum
to 1, would become the means of the components, and the variances would be
mean * (1 - mean). I'm not sure whether the global odds ratio dependence
structure would make sense here. Eventually I will get back to the
categorical GEE; there are lots of developments beyond what we have now.
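
The mean and variance structure described above can be sketched as follows. This is an illustration under stated assumptions, not the GEE implementation: the function names and the coefficient values are made up, and the maximum score is subtracted before exponentiating purely for numerical stability.

```python
import math


def component_means(coefs, x):
    """Means exp(b_j . x), j = 1..#levels, normalized to sum to 1."""
    scores = [sum(b * xi for b, xi in zip(b_j, x)) for b_j in coefs]
    shift = max(scores)  # does not change the normalized result
    weights = [math.exp(s - shift) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def component_variances(means):
    """Per-component working variance mean * (1 - mean)."""
    return [m * (1.0 - m) for m in means]


coefs = [[0.0, 0.0], [0.5, -1.0], [1.0, 0.25]]  # made-up b_j, one row per level
x = [1.0, 2.0]                                   # made-up covariate vector
means = component_means(coefs, x)
variances = component_variances(means)
print([round(m, 4) for m in means])
print([round(v, 4) for v in variances])
```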

@coveralls

Coverage Status

Coverage decreased (-0.0%) when pulling 66c88e7 on kshedden:categorical-gee-api into 7473965 on statsmodels:master.

@josef-pkt

import as smf
we want to establish the convention that users keep the name spaces separate
sm is general
smf is specifically for formulas
(this convention has stuck so far to a large extent, from what I have seen browsing GitHub)

the alternative when we only need one or a few models is the explicit import
from statsmodels.formula.api import gee

@josef-pkt

this is in the sm namespace (correct)

@josef-pkt

this should use the smf namespace
(one extra line of imports; we need both sm and smf in this case.)

@josef-pkt

same as above, need only smf in this case

@josef-pkt
Member

I made comments about distinguishing sm and smf directly on the commit, not the PR

@coveralls

Coverage Status

Coverage decreased (-0.0%) when pulling d912e79 on kshedden:categorical-gee-api into 7473965 on statsmodels:master.

@jseabold
Member

Can you also add to formula/api.py so we can avoid encouraging use of from_formula in examples? I had to change a bunch of these for the release already.

@kshedden
Contributor

I pushed the updated api files. If the docstrings need more work let me know.

@jseabold jseabold commented on an outdated diff Nov 21, 2014
statsmodels/genmod/generalized_estimating_equations.py
- >>> gor = GlobalOddsRatio("ordinal")
- >>> mod = OrdinalGEE(endog, exog, groups, None, family, gor)
- >>> rslt = mod.fit()
- >>> print rslt.summary()
-
- Use formulas:
-
- >>> mod = GEE.from_formula("y ~ x1 + x2", groups, data,
- cov_struct=gor, family=family)
- >>> rslt = mod.fit()
- >>> print rslt.summary()
+ Fit an ordinal regression model using GEE, with "global
+ odds ratio" dependence:
+
+ >>> import statsmodels.api as sm
+ >>> gor = sm.families.GlobalOddsRatio("ordinal")
@jseabold
jseabold Nov 21, 2014 Member

GlobalOddsRatio is in the sm.cov_struct namespace.

@jseabold jseabold commented on an outdated diff Nov 21, 2014
statsmodels/genmod/generalized_estimating_equations.py
- cov_struct=gor, family=family)
- >>> rslt = mod.fit()
- >>> print rslt.summary()
+ Fit an ordinal regression model using GEE, with "global
+ odds ratio" dependence:
+
+ >>> import statsmodels.api as sm
+ >>> gor = sm.families.GlobalOddsRatio("ordinal")
+ >>> model = sm.OrdinalGEE(endog, exog, groups, cov_struct=gor)
+ >>> result = model.fit()
+ >>> print result.summary()
+
+ Using formulas:
+
+ >>> import statsmodels.api as sm
+ >>> model = sm.OrdinalGEE.from_formula("y ~ x1 + x2", groups,
@jseabold
jseabold Nov 21, 2014 Member

Maybe use the statsmodels.formula.api namespace for these?

import statsmodels.formula.api as smf
model = smf.ordinal_gee(...)
@jseabold jseabold modified the milestone: 0.6.1, 0.6 Nov 21, 2014
@coveralls

Coverage Status

Coverage increased (+0.0%) when pulling 484edaa on kshedden:categorical-gee-api into 7473965 on statsmodels:master.

@jseabold jseabold merged commit 035d9fb into statsmodels:master Dec 2, 2014

1 check passed

continuous-integration/travis-ci The Travis CI build passed
@jseabold jseabold added a commit that referenced this pull request Dec 2, 2014
@jseabold jseabold Backport PR #2053: Add NominalGEE and OrdinalGEE to api
This PR adds NominalGEE and OrdinalGEE to the API.  Some related enhancements are bundled in too, specifically:

* Privatize the Multinomial and MultinomialLogit classes, make _Multinomial the default family for NominalGEE.  Automate the setting of the number of levels.

* Make binomial the default family for OrdinalGEE.

* Docstring changes related to above.
579d548
@kshedden kshedden deleted the kshedden:categorical-gee-api branch Feb 5, 2015