ENH add GEE to genmod #1255

Merged
merged 57 commits into from Dec 19, 2013

Projects

None yet

4 participants

@josef-pkt
Member

this is a rebased version of PR #928 by @kshedden
see #928 for discussion
additional cleanups from PR #1215 by @vincentarelbundock

I will do another rebase and force push just before merging.

The PR doesn't have any example scripts and examples need to be extracted from unittests or simulations.
I haven't tried building the docs yet. (master has currently already many doc build warnings.)
No systematic testing across python and dependency versions yet.

This is a very large enhancement, and followup review and code adjustments are to be expected.

@josef-pkt
Member

@kshedden @vincentarelbundock
Do you have any code relying on this branch?
I'm planning to do another rebase, which will make any branches based on this branch difficult to merge.

I'm planning to merge this within a few hours, after another brief look and running some examples.

@vincentarelbundock
Member

I don't. Go ahead.

@coveralls

Coverage Status

Coverage remained the same when pulling 6affdc6 on gee_rebased into 01fd4a2 on master.

@josef-pkt josef-pkt commented on an outdated diff Dec 19, 2013
statsmodels/genmod/generalized_estimating_equations.py
+ Returns
+ -------
+ smry : Summary instance
+ this holds the summary tables and text, which can be
+ printed or converted to various output formats.
+
+ See Also
+ --------
+ statsmodels.iolib.summary.Summary : class to hold summary
+ results
+
+ """
+
+ top_left = [('Dep. Variable:', None),
+ ('Model:', None),
+ ('Method:', ['Generalized Estimating Equations']),
@josef-pkt
josef-pkt Dec 19, 2013 Member

name is too long, makes top table too wide
shorten or break into two lines

@josef-pkt josef-pkt commented on the diff Dec 19, 2013
statsmodels/genmod/dependence_structures/covstruct.py
+ raise ValueError(
+ "Autoregressive: unable to find right bracket")
+
+ from scipy.optimize import brent
+ self.dep_params = brent(fitfunc, brack=[b_lft, b_ctr, b_rgt])
+
+
+ def covariance_matrix(self, endog_expval, index):
+ ngrp = len(endog_expval)
+ if self.dep_params == 0:
+ return np.eye(ngrp, dtype=np.float64), True
+ idx = np.arange(ngrp)
+ return self.dep_params**np.abs(idx[:, None] - idx[None, :]), True
+
+
+ def summary(self):
@josef-pkt
josef-pkt Dec 19, 2013 Member

some short summary have print statement instead of returning text
code style and py3 syntax error

@josef-pkt josef-pkt commented on the diff Dec 19, 2013
statsmodels/genmod/generalized_estimating_equations.py
+ """
+ self._reset() # always reset the cache when this is called
+ #TODO: if at is not all or overall, we can also put atexog values
+ # in summary table head
+ method = method.lower()
+ at = at.lower()
+ _check_margeff_args(at, method)
+ self.margeff_options = dict(method=method, at=at)
+ results = self.results
+ model = results.model
+ params = results.params
+ exog = model.exog.copy() # copy because values are changed
+ effects_idx, const_idx = _get_const_index(exog)
+
+ if dummy:
+ _check_discrete_args(at, method)
@josef-pkt
josef-pkt Dec 19, 2013 Member

pylint not defined, same for _get_dummy_index and _get_count_index
missing imports?
missing unit tests

@josef-pkt josef-pkt commented on the diff Dec 19, 2013
statsmodels/genmod/generalized_estimating_equations.py
+ Endogeneous (response) data for the unmodified data.
+
+ n_exog : integer
+ The number of exogeneous (predictor) variables
+ """
+
+ endog_values = list(set(endog))
+ endog_values.sort()
+ ncuts = len(endog_values) - 1
+
+ return np.zeros(n_exog * ncuts, dtype=np.float64)
+
+
+import statsmodels.genmod.families.varfuncs as varfuncs
+from statsmodels.genmod.families.links import Link
+from statsmodels.genmod.families import Family
@josef-pkt
josef-pkt Dec 19, 2013 Member

import should go on top

@josef-pkt josef-pkt commented on an outdated diff Dec 19, 2013
statsmodels/genmod/generalized_estimating_equations.py
+ ----------
+ nlevels : integer
+ The number of distinct categories for the multinomial
+ distribution.
+ """
+
+ self.ncut = nlevels - 1
+ self.link = MultinomialLogit(self.ncut)
+
+
+
+
+from statsmodels.discrete.discrete_margins import \
+ _get_margeff_exog, _get_const_index, _check_margeff_args, \
+ _effects_at, margeff_cov_with_se, _check_at_is_all, \
+ _transform_names
@josef-pkt
josef-pkt Dec 19, 2013 Member

imports should be on top, even if they belong just to this part

@josef-pkt josef-pkt commented on the diff Dec 19, 2013
statsmodels/genmod/generalized_estimating_equations.py
+ if dummy_idx is not None:
+ from statsmodels.discrete.discrete_margins import (
+ _get_dummy_effects)
+ margeff = _get_dummy_effects(margeff, exog, dummy_idx, transform,
+ self, params)
+ return margeff
+
+
+class GEEResults(base.LikelihoodModelResults):
+
+ def __init__(self, model, params, cov_params, scale):
+
+ super(GEEResults, self).__init__(model, params,
+ normalized_cov_params=cov_params, scale=scale)
+
+ def standard_errors(self, covariance_type="robust"):
@josef-pkt
josef-pkt Dec 19, 2013 Member

follow-up to PR: consistent naming for covariance options across models #1158

@josef-pkt josef-pkt commented on the diff Dec 19, 2013
statsmodels/genmod/generalized_estimating_equations.py
+ def __init__(self, model, params, cov_params, scale):
+
+ super(GEEResults, self).__init__(model, params,
+ normalized_cov_params=cov_params, scale=scale)
+
+ def standard_errors(self, covariance_type="robust"):
+ """
+ This is a convenience function that returns the standard
+ errors for any covariance type. The value of `bse` is the
+ standard errors for whichever covariance type is specified as
+ an argument to `fit` (defaults to "robust").
+
+ Arguments:
+ ----------
+ covariance_type : string
+ One of "robust", "naive", or "bias reduced". Determines
@josef-pkt
josef-pkt Dec 19, 2013 Member

"naive" versus "nonrobust" another naming convention that we need

@josef-pkt josef-pkt commented on the diff Dec 19, 2013
statsmodels/genmod/generalized_estimating_equations.py
+ return None
+
+ if self.constraint is not None:
+ beta, bcov = self._handle_constraint(beta, bcov)
+ if beta is None:
+ warnings.warn("Unable to estimate constrained GEE "
+ "parameters.", ConvergenceWarning)
+ return None
+
+ scale = self.estimate_scale()
+
+ results = GEEResults(self, beta, bcov / scale, scale)
+
+ results.fit_history = self.fit_history
+ results.naive_covariance = ncov
+ results.robust_covariance_bc = bc_cov
@josef-pkt
josef-pkt Dec 19, 2013 Member

do we need to calculate always all three cov, bncov and bc_cov

@josef-pkt josef-pkt commented on the diff Dec 19, 2013
statsmodels/genmod/generalized_estimating_equations.py
+ """
+
+ # Check covariance_type
+ covariance_type = covariance_type.lower()
+ allowed_covariances = ["robust", "naive", "bias reduced"]
+ if covariance_type not in allowed_covariances:
+ msg = "GEE: `covariance_type` must be one of " +\
+ ", ".join(allowed_covariances)
+ raise ValueError(msg)
+
+ if covariance_type == "robust":
+ return np.sqrt(np.diag(self.cov_params()))
+ elif covariance_type == "naive":
+ return np.sqrt(np.diag(self.naive_covariance))
+ elif covariance_type == "bias_reduced":
+ return np.sqrt(np.diag(self.robust_covariance_bc))
@josef-pkt
josef-pkt Dec 19, 2013 Member

my guess is that this doesn't get the wald_test to use the desired covariance.
cov_params should use the desired covariance_type
needs review and checking.

@josef-pkt
josef-pkt Dec 19, 2013 Member

Note: my changes to using default covariances as merged into master is more recent than this code, and we still need to review consistent handling across models.

@josef-pkt josef-pkt commented on the diff Dec 19, 2013
...dels/genmod/tests/gee_categorical_simulation_check.py
+
+ ns.endog_ex, ns.exog_ex, ns.exog_ne, ns.nlevel = \
+ gee_setup_nominal(data, 0, [3,])
+
+ ns.group_ex = ns.exog_ne[:,0]
+
+ va = GlobalOddsRatio(3, "nominal")
+
+ lhs = np.array([[0., 1., 1, 0],])
+ rhs = np.r_[0.,]
+
+ return ns, va, Multinomial(3), (lhs, rhs)
+
+
+if __name__ == '__main__':
+
@josef-pkt
josef-pkt Dec 19, 2013 Member

Is this part used or just for development?
I would like to empty all if __name__ == '__main__' code in the main modules. Usually I put them into an ex_modulname.py script in the examples folder inside the source tree.

@josef-pkt
josef-pkt Dec 19, 2013 Member

my mistake, this is in a simulation file, the main module doesn't have a main

@josef-pkt
Member

gee_gaussian_simulation_check.py is outdated dparams rename and uses standard_errors as attribute not callable.

@coveralls

Coverage Status

Coverage remained the same when pulling bcacbdd on gee_rebased into 01fd4a2 on master.

@josef-pkt
Member

predict requires offset

@coveralls

Coverage Status

Coverage remained the same when pulling e2f5191 on gee_rebased into 01fd4a2 on master.

kshedden and others added some commits Jun 30, 2013
@kshedden @josef-pkt kshedden Initial GEE commit d0b56ed
@kshedden @josef-pkt kshedden Add inverse_deriv e59bfbb
@kshedden @josef-pkt kshedden renamings; GEE is subclass of model a51fc8f
@kshedden @josef-pkt kshedden minor cleanup d46ff4b
@kshedden @josef-pkt kshedden removed some local debugging code c1e2321
@kshedden @josef-pkt kshedden Renaming of exog, exog_li, etc. 8259914
@kshedden @josef-pkt kshedden Comment out debugging code 99a0274
@kshedden @josef-pkt kshedden Modified summary 6941ae7
@kshedden @josef-pkt kshedden Added tests 8a88a7a
@kshedden @josef-pkt kshedden attempt to fix csv import 2fd2cbb
@kshedden @josef-pkt kshedden removed data files from genmod/tests/data, now in genmod/tests/results 8575972
@kshedden @josef-pkt kshedden Comment out debugging code 6e6673b
@kshedden @josef-pkt kshedden Added score test and constrained estimation to GEE 3d8a762
@kshedden @josef-pkt kshedden Fixed signature of estimate_scale a37a39d
@kshedden @josef-pkt kshedden Fixed error in summary method 9714839
@kshedden @josef-pkt kshedden Forgot to push some updated files 157e8a5
@kshedden @josef-pkt kshedden Standardized pandas/numpy conversions 4995f1a
@kshedden @josef-pkt kshedden Minor formatting and comment changes 7b8f795
@kshedden @josef-pkt kshedden Pulled ordinal-specific code out of GEE class into helper functions 10e04a9
@kshedden @josef-pkt kshedden Many lint fixes eb868dd
@kshedden @josef-pkt kshedden Many pylint fixes to varstrut.py 4431fe8
@kshedden @josef-pkt kshedden Switched scipy optimize functions 1e0fb18
@kshedden @josef-pkt kshedden Added simulation-based tests of nested and AR covariance structures 4188d60
@kshedden @josef-pkt kshedden added multinomial logit; eliminated cycles; other refactoring and cle…
…anup
ee43ad6
@kshedden @josef-pkt kshedden remove renamed file 85f0450
@kshedden @josef-pkt kshedden Refactored simulation check scripts 43f4211
@kshedden @josef-pkt kshedden Added Mancel DeRouen robust sandwich estimate of standard errors b5abff4
@kshedden @josef-pkt kshedden Added marginal effect estimates to GEE; seems to work but minimal tes…
…t coverage
abf1695
@kshedden @josef-pkt kshedden Set simulation tests to run under main 53a0819
@kshedden @josef-pkt kshedden Forgot to push this file e981c51
@kshedden @josef-pkt kshedden Improved reporting of convergence troubles 6c8f61f
@kshedden @josef-pkt kshedden Refactor ordinal/nominal setup code for ease of use f362b29
@kshedden @josef-pkt kshedden Refactor ordinal/nominal setup code for ease of use ca189c6
@kshedden @josef-pkt kshedden Add missing test dataset 546b1fa
@kshedden @josef-pkt kshedden Fixed error in naive covariance calculation. fa5bbba
@kshedden @josef-pkt kshedden reuse cholesky factors in fit cd59257
@kshedden @josef-pkt kshedden Use smaller test data sets 9a71987
@kshedden @josef-pkt kshedden Use smaller test data sets 2d4ba57
@kshedden @josef-pkt kshedden Add covariance type to summary 7b7ec14
@kshedden @josef-pkt kshedden Rename varstruct to covstruct, fix all errors in test_gee.py 7dcb3f8
@kshedden @josef-pkt kshedden Supplied block_diag function for sparse matrices to overcome scipy ve…
…rsion limitation.
0470264
@kshedden @josef-pkt kshedden Added covariance_type option to fit 941dedd
@kshedden @josef-pkt kshedden Attempt to fix python 3 failure in test suite dc98636
@kshedden @josef-pkt kshedden Delete varstruct.py
Renamed varstruct as covstruct.
35050f0
@kshedden @josef-pkt kshedden Improve tests 4f8a3bb
@kshedden @josef-pkt kshedden Improve tests bec0d3e
@kshedden @josef-pkt kshedden Renamed dparams to dep_params for clarity 78c1f4d
@vincentarelbundock @josef-pkt vincentarelbundock minor fix: josef comments on PR#928 af3af81
@vincentarelbundock @josef-pkt vincentarelbundock pep8 4023e33
@vincentarelbundock @josef-pkt vincentarelbundock TST: redundant imports and formula api d3a492b
@josef-pkt josef-pkt BUG/REF: missing import; refactoring victims in gee_gaussian_simulati…
…on_check
db350b7
@josef-pkt josef-pkt REF starting_params -> start_params (refactoring rename pydev) 7c43408
@josef-pkt josef-pkt REF: predict don't require offset, covstruct: don't print db63ac2
@josef-pkt josef-pkt DOC: add my try_gee script d78db20
@josef-pkt
Member

rebased and force pushed

@coveralls

Coverage Status

Coverage remained the same when pulling d78db20 on gee_rebased into 0b5ed74 on master.

@josef-pkt
Member

TravisCI came back green, merging

@kshedden Thanks a lot, that's the biggest new model in some time.
(Still some work left to do.)

@josef-pkt josef-pkt merged commit 5a731a3 into master Dec 19, 2013

1 check passed

default The Travis CI build passed
Details
@josef-pkt josef-pkt deleted the gee_rebased branch Dec 30, 2013
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment