Doc work GEE, GMM, sphinx warnings #1264

Merged
merged 4 commits into from Dec 24, 2013

3 participants

@josef-pkt
Member

add GEE to docs
improve GMM docs, add new classes

fix some sphinx warnings
var_plots truncated year string
seealso VAR documentation gave invalid option block warning

I upgraded sphinx to version 1.2. I'm still not building the notebooks and didn't check those.

GEE still needs proofreading and an example in gee.rst
GMM and sandwich covariances still need improvements

@josef-pkt
Member

I would like to merge this soon because I want to go back to the sandwich covariances, docs and wrapper function, as far as I find time.

@coveralls

Coverage Status

Changes Unknown when pulling 011a445 on josef-pkt:doc_work into * on statsmodels:master*.

@vincentarelbundock vincentarelbundock commented on an outdated diff Dec 22, 2013
statsmodels/genmod/dependence_structures/covstruct.py
@@ -3,11 +3,11 @@
class CovStruct(object):
"""
- A base class for correlation and covariance structures of repeated
- measures data. Each implementation of this class takes the
- residuals from a regression model that has been fit to clustered
- data, and uses them to estimate the within-cluster variance and
- dependence structure of the model errors.
+ The base class for correlation and covariance structures of cluster data.
+
+ Each implementation of this class takes the residuals from a regression
+ model that has been fit to clustered data, and uses them to estimate the
@vincentarelbundock vincentarelbundock commented on the diff Dec 22, 2013
statsmodels/genmod/generalized_estimating_equations.py
@@ -909,6 +915,60 @@ def _derivative_exog(self, params, exog=None, transform='dydx',
class GEEResults(base.LikelihoodModelResults):
+ '''
+ Class to contain GEE results.
+
+ GEEResults inherits from statsmodels.LikelihoodModelResults
+
+ Parameters
+ ----------
+ See statsmodels.LikelihoodModelReesults
+
+ Returns
+ -------
+ **Attributes**
+
+ naive_covariance : ndarray
@vincentarelbundock
vincentarelbundock Dec 22, 2013 Member

Not sure if we have a standard for this, but it would make sense to me to reverse these: "covariance_naive", "covariance_robust". That way, they would all show up together when using tab completion (which usually prints in alphabetical order).

Also, can "robust_covariance_bc" just be called "robust_covariance"?

Edit: OK, I see that there's a choice below between "robust" and "robust bias reduced"
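A small sketch of the tab-completion argument, using plain Python lists rather than a real results instance. The attribute lists are illustrative assumptions (only `naive_covariance` and `robust_covariance_bc` appear in the diff above), and `completions` is a hypothetical helper that mimics prefix completion:

```python
# Hypothetical attribute names: the current style from the docstring and the
# proposed renamed variants (illustration only, not the actual GEEResults API).
current = ["naive_covariance", "robust_covariance", "robust_covariance_bc",
           "params", "scale", "converged"]
proposed = ["covariance_naive", "covariance_robust", "covariance_robust_bc",
            "params", "scale", "converged"]


def completions(names, prefix):
    """Return the names a tab completion on `prefix` would offer, sorted."""
    return sorted(n for n in names if n.startswith(prefix))


# With the current names nothing matches "cov", because they start with
# "naive_" or "robust_".
print(completions(current, "cov"))
# With the proposed names all covariance attributes are listed together.
print(completions(proposed, "cov"))
```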

@josef-pkt
josef-pkt Dec 23, 2013 Member

I'll add this to the GEE-followup issue.
I also prefer post-fix qualifiers: covariance_xxx, resid_xxx.
(How the robust covariances are attached still needs to change to make t_test and wald_test work correctly.)

@vincentarelbundock vincentarelbundock commented on an outdated diff Dec 22, 2013
statsmodels/genmod/generalized_estimating_equations.py
+ robust_covariance_bc : ndarray
+ covariance of the parameter estimates that is robust and bias reduced
+ converged : bool
+ indicator for convergence of the optimization.
+ True if the norm of the score is smaller than a threshold
+ covariance_type : string
+ string indicating whether a "robust", "naive" or "robust bias reduced"
+ covariance is used as default
+ fit_history : dict
+ Contains information about the iterations. Its keys are `iterations`,
+ `deviance` and `params`.
+ fittedvalues : array
+ Linear predicted values for the fitted model.
+ dot(exog, params)
+ model : class instance
+ Pointer to GLM model instance that called fit.
@vincentarelbundock vincentarelbundock and 1 other commented on an outdated diff Dec 22, 2013
statsmodels/genmod/generalized_estimating_equations.py
+ fittedvalues : array
+ Linear predicted values for the fitted model.
+ dot(exog, params)
+ model : class instance
+ Pointer to GLM model instance that called fit.
+ nobs : float
+ The number of observations n.
+ normalized_cov_params : array
+ See GEE docstring
+ params : array
+ The coefficients of the fitted model. Note that interpretation
+ of the coefficients often depends on the distribution family and the
+ data.
+ scale : float
+ The estimate of the scale / dispersion for the model fit.
+ See GLM.fit and GLM.estimate_scale for more information.
@josef-pkt
josef-pkt Dec 23, 2013 Member

estimate_scale in GEE has a different signature and pattern than in GLM, and there is no "more information" in its docstring.

@vincentarelbundock vincentarelbundock and 1 other commented on an outdated diff Dec 22, 2013
statsmodels/genmod/generalized_estimating_equations.py
+ Pointer to GLM model instance that called fit.
+ nobs : float
+ The number of observations n.
+ normalized_cov_params : array
+ See GEE docstring
+ params : array
+ The coefficients of the fitted model. Note that interpretation
+ of the coefficients often depends on the distribution family and the
+ data.
+ scale : float
+ The estimate of the scale / dispersion for the model fit.
+ See GLM.fit and GLM.estimate_scale for more information.
+ score_norm : float
+ norm of the score at the end of the iterative estimation.
+ stand_errors : array
+ The standard errors of the fitted GLM. #TODO still named bse
@josef-pkt
josef-pkt Dec 23, 2013 Member

including stand_error here might be wrong, needs checking

@josef-pkt
josef-pkt Dec 23, 2013 Member

change to bse which is still the inherited attribute

@vincentarelbundock vincentarelbundock commented on the diff Dec 22, 2013
statsmodels/sandbox/regression/gmm.py
- For estimation with more options use fititer method.
+ TODO: weight and covariance arguments still need to be made consistent
@josef-pkt
josef-pkt Dec 23, 2013 Member

It's a warning that this will change, hopefully soon, based on changes to robust cov in RegressionResults.

@vincentarelbundock vincentarelbundock commented on the diff Dec 22, 2013
statsmodels/sandbox/regression/gmm.py
- def fitgmm(self, start, weights=None, optim_method='bfgs', **kwds):
+ E( z * (y - x beta)) = 0
+
+ Where `y` is the dependent endogenous variable, `x` are the explanatory
+ variables and `z` are the instruments. Variables in `x` that are exogenous
+ need also be included in `z`.
+
+ Notation Warning: our name `exog` stands for the explanatory variables,
@josef-pkt
Member

Thanks Vincent, I will make the changes soon.

That exog does not mean exogenous is really a bit awkward. Right now it's the most convenient interface (for writing the code).
I think we should be able to use formulas.
And for now I didn't like the distinction between dependent endogenous, explanatory endogenous, included exogenous and excluded exogenous, similar to ivregress, ivreg2, since the code so far doesn't need it, and GMM might never need it.

Is the current docstring roughly understandable? It still needs improvements in many places, but I'd like to work on doc examples next to get a better view of what's awkward to use.
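To make the notation concrete, here is a minimal numpy sketch of the linear moment condition E( z * (y - x beta)) = 0 from the docstring diff above. It does not use the gmm.py classes; the simulated data, the variable names and the choice of weights inv(z'z/n) (which gives two-stage least squares) are all assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Simulated data (all names and numbers are illustrative assumptions).
z_excl = rng.normal(size=(n, 2))   # excluded instruments
u = rng.normal(size=n)             # structural error
# Explanatory variable that is correlated with the error, i.e. endogenous.
x_endog = z_excl @ np.array([1.0, -0.5]) + 0.8 * u + rng.normal(size=n)
const = np.ones(n)

# `x` plays the role of `exog`: all explanatory variables, including the
# endogenous one.
x = np.column_stack([const, x_endog])
# Instruments: the exogenous column of x (the constant) is also included in z.
z = np.column_stack([const, z_excl])

beta_true = np.array([1.0, 2.0])
y = x @ beta_true + u


def moments(beta):
    """Sample analogue of E(z * (y - x beta)): one mean per instrument."""
    return z.T @ (y - x @ beta) / n


# One-step GMM with weights inv(z'z/n), which is two-stage least squares.
w = np.linalg.inv(z.T @ z / n)
zx = z.T @ x / n
zy = z.T @ y / n
beta_gmm = np.linalg.solve(zx.T @ w @ zx, zx.T @ w @ zy)

# OLS ignores the endogeneity, so its slope estimate is biased here.
beta_ols = np.linalg.lstsq(x, y, rcond=None)[0]

print("GMM/2SLS:", beta_gmm)
print("OLS     :", beta_ols)
print("moments at GMM estimate:", moments(beta_gmm))
```

With three instruments and two parameters the model is overidentified, so the sample moments are small but not exactly zero at the estimate; only the weighted combination `zx.T @ w @ moments(beta_gmm)` is zero.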

@vincentarelbundock
Member

I think it sounds very good overall. I usually need to actually use something before I can tell for sure, but we can always improve them later on.

@josef-pkt
Member

I have two gists for GMM that I need to update,
https://gist.github.com/josef-pkt/6895915
https://gist.github.com/josef-pkt/6890383

I used them as initial examples for the rewrite and unit tests.

I already started to try out examples that compare models, for example:
GEE gaussian independence cluster is the same as OLS with cluster robust standard errors (up to small sample scale factors)
the same should work for Poisson and the other models
GEE gaussian exchangeable should be the same as SUR in a balanced panel (not tried yet)

I couldn't compare GEE with GMM because I don't have cluster robust standard errors yet in GMM.
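The first comparison can be sketched in plain numpy, without calling GEE or OLS from statsmodels. With an identity link and an independence working correlation the estimating equations reduce to x'(y - x b) = 0, so the point estimate is OLS and the robust covariance is the cluster sandwich. The simulated data and the particular small-sample factor below are assumptions for illustration, not necessarily what either implementation uses:

```python
import numpy as np

rng = np.random.default_rng(12345)
n_groups, n_per = 200, 5
n = n_groups * n_per
groups = np.repeat(np.arange(n_groups), n_per)

# Simulated clustered data with a shared within-cluster error component.
x = np.column_stack([np.ones(n), rng.normal(size=n)])
cluster_effect = rng.normal(size=n_groups)[groups]
y = x @ np.array([1.0, 0.5]) + cluster_effect + rng.normal(size=n)

# Point estimate: solves x'(y - x b) = 0, i.e. OLS.
xtx_inv = np.linalg.inv(x.T @ x)
params = xtx_inv @ x.T @ y
resid = y - x @ params

# Cluster sandwich: the "meat" sums (x_g' r_g)(x_g' r_g)' over clusters.
scores = np.zeros((n_groups, x.shape[1]))
np.add.at(scores, groups, x * resid[:, None])
meat = scores.T @ scores
cov_plain = xtx_inv @ meat @ xtx_inv   # no small-sample correction

# A common small-sample factor for cluster robust OLS (assumed here).
k = x.shape[1]
factor = n_groups / (n_groups - 1) * (n - 1) / (n - k)
cov_corrected = factor * cov_plain

bse_plain = np.sqrt(np.diag(cov_plain))
bse_corrected = np.sqrt(np.diag(cov_corrected))
print("params        :", params)
print("bse plain     :", bse_plain)
print("bse corrected :", bse_corrected)
print("ratio         :", bse_corrected / bse_plain)
```

The two sets of standard errors differ only by the constant `sqrt(factor)`, which is what "up to small sample scale factors" refers to.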

@vincentarelbundock
Member

The notebooks look nice. I could try to clean those up for you if you'd like.

Do you think we could use data from Rdatasets instead so we don't have to package more datasets with SM?

Tip: You can put equations inside dollar signs to have them render as latex in the notebooks. No need to indent or pretend they are code.
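For example, a markdown cell could contain the GMM moment condition from the docstring like this. This is a sketch that assumes the notebook renders math in markdown cells; the double dollar signs for a displayed equation are a common convention and are an assumption here, not something from this PR:

```latex
The moment condition is $E[z \, (y - x \beta)] = 0$.

$$
E[z \, (y - x \beta)] = 0
$$
```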

@josef-pkt
Member

I'd like to update the code in the notebooks first, so I can see what changes I made compared to the original version.

Datasets are still an open question. There are some in the unit tests of GEE and GMM, but we should also get some official datasets into statsmodels that we can use for the panel/cluster models, Stata's x... functions.

For other documentation we can use Rdatasets, and I also started to look at some textbook datasets like those of the two Wooldridge books. UCLA stats and Boston have worked out examples for those and I started some preliminary work to "replicate".

I definitely need lots of tips for notebooks since I still don't use them very often.

@josef-pkt josef-pkt referenced this pull request Dec 23, 2013
Open

SUMM: GEE followup #1257

@josef-pkt josef-pkt commented on an outdated diff Dec 23, 2013
statsmodels/genmod/generalized_estimating_equations.py
+ covariance of the parameter estimates that is robust and bias reduced
+ converged : bool
+ indicator for convergence of the optimization.
+ True if the norm of the score is smaller than a threshold
+ covariance_type : string
+ string indicating whether a "robust", "naive" or "robust bias reduced"
+ covariance is used as default
+ fit_history : dict
+ Contains information about the iterations. Its keys are `iterations`,
+ `deviance` and `params`.
+ fittedvalues : array
+ Linear predicted values for the fitted model.
+ dot(exog, params)
+ model : class instance
+ Pointer to GLM model instance that called fit.
+ nobs : float
@josef-pkt
josef-pkt Dec 23, 2013 Member

nobs is not available as attribute of results

@josef-pkt
Member

made changes from Vincent's review, except naming convention (needs follow-up PR) and left TODO in gmm (needs followup code changes)

@josef-pkt josef-pkt merged commit 8372093 into statsmodels:master Dec 24, 2013
@josef-pkt josef-pkt deleted the josef-pkt:doc_work branch Jul 10, 2014