robust covariance, cov_type in fit #1870

Merged
merged 13 commits into from Aug 21, 2014

2 participants
@josef-pkt
Member

commented Aug 4, 2014

integrating robust covariances into the model fit method, see issue #1418
summary issue for robust covariances is #1158

continuing after PR #1867
now starting with discrete models to build the generic version

two possible problems when using a standalone function, e.g.
get_robustcov_results(res_olsg_._results, cov_type='HC1', use_self=True)

  • the cache already has content that doesn't get overwritten
  • we cannot call the standalone function with the wrapper. We need to access and change the underlying Results instance.

use_self=True is only intended for use in a results.__init__, where we can make sure we haven't used the cache yet and where internally we only have the results instance and not the wrapper.
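
The cache problem can be illustrated with a toy results class. This is a minimal sketch, not statsmodels code: `ToyResults`, `attach_robust_cov` and the scalar covariance are hypothetical stand-ins for a results instance with cached attributes and for a `use_self=True` style in-place update.

```python
import math


class ToyResults:
    """Hypothetical stand-in for a results instance with cached attributes."""

    def __init__(self, cov):
        self.cov = cov        # scalar "covariance" to keep the example small
        self._cache = {}

    @property
    def bse(self):
        # Computed once, then always served from the cache.
        if "bse" not in self._cache:
            self._cache["bse"] = math.sqrt(self.cov)
        return self._cache["bse"]


def attach_robust_cov(res, robust_cov):
    """Hypothetical in-place update, in the spirit of use_self=True."""
    res.cov = robust_cov
    return res


# Case 1: the cache was filled before the covariance was replaced.
used = ToyResults(cov=4.0)
stale_before = used.bse                  # fills the cache with sqrt(4.0)
attach_robust_cov(used, robust_cov=9.0)
stale_after = used.bse                   # still the cached nonrobust value

# Case 2: the covariance is replaced before anything was cached.
fresh = attach_robust_cov(ToyResults(cov=4.0), robust_cov=9.0)
fresh_bse = fresh.bse                    # computed from the robust covariance
```

In the first case the standard error stays at the nonrobust value even though the covariance was replaced, which is why an in-place update is only safe before any cached attribute has been accessed.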

@coveralls


commented Aug 4, 2014

Coverage Status

Coverage decreased (-0.05%) when pulling e1a1917 on josef-pkt:ENH_covtype_discrete into 8709f00 on statsmodels:master.

@josef-pkt josef-pkt added this to the 0.6 milestone Aug 4, 2014

@coveralls


commented Aug 5, 2014

Coverage Status

Coverage decreased (-0.04%) when pulling e2659fa on josef-pkt:ENH_covtype_discrete into 8709f00 on statsmodels:master.

@coveralls


commented Aug 5, 2014

Coverage Status

Coverage decreased (-0.02%) when pulling 4741cac on josef-pkt:ENH_covtype_discrete into 8709f00 on statsmodels:master.

@coveralls


commented Aug 7, 2014

Coverage Status

Coverage increased (+0.0%) when pulling 3e042b4 on josef-pkt:ENH_covtype_discrete into 8709f00 on statsmodels:master.

@josef-pkt

Member Author

commented Aug 7, 2014

some problems in the design:
commit: REF/ENH: add get_robustcov_results generically to LikelihoodModel/Results 3e042b4

Since we are changing some attributes in the results instance, we need to do it at the right time.

  • transform parameters in fit: Negative Binomial (and some others) change the transform_params during fit. That means that the super class fit only sees the transformed parameter version (of loglike, score/score_obs and hessian). Calculating robust covariance matrices based on transformed parameters is incorrect for the final estimate of untransformed parameters.

Example: NegativeBinomial nb uses log(alpha) internally but reports alpha. (This is different from models where the original parameterization is in transformed params, e.g. the link functions.)
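
To see why the parameterization matters, here is a minimal scalar sketch. The numbers are made up and `sandwich_scalar` is a hypothetical helper, not a statsmodels function. It assumes the per-observation scores sum to zero at the optimum, so that the chain rule gives score_log = alpha * score_alpha and hessian_log = alpha**2 * hessian_alpha.

```python
import math


def sandwich_scalar(score_obs, hessian):
    """Scalar sandwich: H^-1 * sum(s_i^2) * H^-1 (hypothetical helper)."""
    meat = sum(s * s for s in score_obs)
    return meat / hessian ** 2


alpha = 2.0
# Made-up per-observation scores with respect to alpha; they sum to zero,
# as they would at the optimum.
score_obs_alpha = [0.5, -0.25, -0.25, 0.75, -0.75]
hessian_alpha = -5.0

# Same quantities with respect to log(alpha), by the chain rule.
score_obs_log = [alpha * s for s in score_obs_alpha]
hessian_log = alpha ** 2 * hessian_alpha + alpha * sum(score_obs_alpha)

cov_alpha = sandwich_scalar(score_obs_alpha, hessian_alpha)
cov_log = sandwich_scalar(score_obs_log, hessian_log)

# The delta method maps the log(alpha) result back to the alpha scale.
cov_alpha_from_log = alpha ** 2 * cov_log
```

The covariance computed in log(alpha) differs from the one for alpha by the factor alpha**2, so a robust covariance computed inside the superclass fit on transformed parameters cannot be reported unchanged for the untransformed estimate.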

  • the current implementation in the last commit allows for different ways and different locations for adding the robust covariances.
    In LikelihoodModel and Results it is hidden behind kwargs, so that subclasses can intercept cov_type before calling the super methods.
  • use_t: there were some changes in where use_t is specified and added to the model. #1830 added it to the LikelihoodModel.
    TODO: currently use_t is handled in different places and is duplicated or repeated several times.
  • new generic method of LikelihoodModel results: _get_robustcov_results. This should allow more generic code that can be overridden in subclasses if there are specific cases of robust covariances, for example overdispersion in Poisson.
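
The "hidden behind kwargs" design can be sketched as follows. The classes and the dictionary they return are hypothetical and only show the control flow. They do not reproduce the actual LikelihoodModel code.

```python
class ToyLikelihoodModel:
    """Hypothetical base class: cov_type is hidden behind **kwargs."""

    def fit(self, **kwargs):
        cov_type = kwargs.pop("cov_type", "nonrobust")
        # Whatever is left over would be passed on to the optimizer.
        return {"cov_type": cov_type, "cov_from": "base",
                "optim_kwargs": kwargs}


class ToyTransformedModel(ToyLikelihoodModel):
    """Hypothetical subclass that fits in transformed parameters."""

    def fit(self, **kwargs):
        # Intercept cov_type so the base class never sees it and does not
        # compute a robust covariance on the transformed parameters.
        cov_type = kwargs.pop("cov_type", "nonrobust")
        res = super().fit(**kwargs)
        # ... untransform the parameters here, then compute the covariance ...
        res["cov_type"] = cov_type
        res["cov_from"] = "subclass"
        return res


base_res = ToyLikelihoodModel().fit(cov_type="HC1", maxiter=50)
sub_res = ToyTransformedModel().fit(cov_type="HC1", maxiter=50)
```

In both cases the optimizer options arrive untouched. Only the place where the robust covariance is attached differs.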
@josef-pkt

Member Author

commented Aug 8, 2014

possible bug in code so far: how do we handle scale if it is separately estimated?

In a new set of tests I'm comparing more GLM models to equivalent other models.
For GLM Gaussian compared to OLS, the bse have the wrong scale in the current default.
I get correct answers if I set scale=1 in GLM.score_obs.

Current test cases for MLE robust covariances have scale=1, Poisson, NegativeBinomial, and GLM also agrees with Logit.

This is the difference between estimating equations, where a multiplicative term has been canceled, and score/score_obs, which should be the derivative of the full loglikelihood function.

TODO: add score_obs and Hessian (correctly scaled) to RegressionModels, OLS,...
get NormalMLE model back - where did I park my initial version?

@coveralls


commented Aug 8, 2014

Coverage Status

Coverage decreased (-0.0%) when pulling a939030 on josef-pkt:ENH_covtype_discrete into 8709f00 on statsmodels:master.

@coveralls


commented Aug 9, 2014

Coverage Status

Coverage increased (+0.0%) when pulling 389ee4d on josef-pkt:ENH_covtype_discrete into 8709f00 on statsmodels:master.

@josef-pkt josef-pkt force-pushed the josef-pkt:ENH_covtype_discrete branch from 389ee4d to 0bcc690 Aug 21, 2014

@josef-pkt

Member Author

commented Aug 21, 2014

rebased and force pushed.

comment to earlier comment:

possible bug in code so far: how do we handle scale if it is separately estimated?

scale wasn't a problem in the code when comparing GLM to OLS,
there was a problem somewhere else that got me confused (accessed the wrong array).

Non-native scale will or might still be a problem in overdispersed Poisson, or when scale is fixed, but not here.

Aside: score and Hessian of OLS have a division by the scale/sigma**2.
GLM uses the division by scale. For OLS, scale drops out of the sandwich calculation and is not included.
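
That the scale drops out of the OLS sandwich can be checked with a single regressor and no intercept. This is a minimal sketch with made-up data, and `sandwich_ols_scalar` is a hypothetical helper. It uses score_obs_i = x_i * e_i / scale and hessian = -sum(x_i**2) / scale.

```python
import math


def sandwich_ols_scalar(x, resid, scale):
    """Scalar sandwich H^-1 * sum(s_i^2) * H^-1 for one regressor."""
    score_obs = [xi * ei / scale for xi, ei in zip(x, resid)]
    hessian = -sum(xi * xi for xi in x) / scale
    meat = sum(s * s for s in score_obs)
    return meat / hessian ** 2


x = [1.0, 2.0, 3.0, 4.0]
resid = [0.5, -1.0, 0.25, 0.1]     # made-up residuals

cov_scale_1 = sandwich_ols_scalar(x, resid, scale=1.0)
cov_scale_other = sandwich_ols_scalar(x, resid, scale=7.5)

# HC0-style formula written without any scale.
hc0 = (sum((xi * ei) ** 2 for xi, ei in zip(x, resid))
       / sum(xi * xi for xi in x) ** 2)
```

The meat carries a factor 1 / scale**2 and the two inverse Hessians carry scale**2, so the result is the same for any positive scale.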

@coveralls


commented Aug 21, 2014

Coverage Status

Coverage increased (+0.01%) when pulling 0bcc690 on josef-pkt:ENH_covtype_discrete into ab1a57b on statsmodels:master.

@josef-pkt

Member Author

commented Aug 21, 2014

TravisCI is green, I will merge this tomorrow.

(work on documentation needs to wait.)



def get_robustcov_results(self, cov_type='HC1', use_t=None, **kwds):
    """create new results instance with robust covariance as default

@josef-pkt

josef-pkt Aug 21, 2014

Author Member

add use_self explicitly as keyword
docstring doesn't mention the option to attach to the existing instance

@josef-pkt

josef-pkt Aug 21, 2014

Author Member

need to make docstring into template that can be copied to model methods

the type of robust sandwich estimator to use. see Notes below
use_t : bool
If true, then the t distribution is used for inference.
If false, then the normal distribution is used.

@josef-pkt

josef-pkt Aug 21, 2014

Author Member

use_t is now separate keyword in fit.
remove it from here ?

Returns
-------
results : results instance
This method creates a new results instance with the requested

@josef-pkt

josef-pkt Aug 21, 2014

Author Member

outdated, preferred is use_self

needs to be in [False, 'hac', 'cluster']
TODO: Currently there is no check for extra or misspelled keywords,
except in the case of cov_type `HCx`

@josef-pkt

josef-pkt Aug 21, 2014

Author Member

needs to be fixed
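
One possible shape for such a check, as a minimal sketch: the table `ALLOWED_COV_KWDS` and the function `check_cov_kwds` are hypothetical, and the keyword sets are assumptions for illustration, not the list that statsmodels accepts.

```python
# Hypothetical table of accepted keywords per cov_type (illustration only).
ALLOWED_COV_KWDS = {
    "HC0": set(),
    "HC1": set(),
    "HC2": set(),
    "HC3": set(),
    "HAC": {"maxlags", "use_correction"},
}


def check_cov_kwds(cov_type, kwds):
    """Raise ValueError on an unknown cov_type or unexpected keywords."""
    if cov_type not in ALLOWED_COV_KWDS:
        raise ValueError("unknown cov_type %r" % (cov_type,))
    extra = set(kwds) - ALLOWED_COV_KWDS[cov_type]
    if extra:
        raise ValueError("unexpected keywords for cov_type %r: %s"
                         % (cov_type, sorted(extra)))
    return dict(kwds)


ok_kwds = check_cov_kwds("HAC", {"maxlags": 2})

try:
    check_cov_kwds("HAC", {"maxlag": 2})    # misspelled keyword
    misspelled_caught = False
except ValueError:
    misspelled_caught = True
```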

    res.cov_params_default = sw.cov_white_simple(self,
                                                 use_correction=False)
elif cov_type == 'HAC':
    maxlags = kwds['maxlags']   # required?, default in cov_hac_simple

@josef-pkt

josef-pkt Aug 21, 2014

Author Member

do we use default for maxlags in HAC?
can be changed in a backwards compatible way.
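
A backwards compatible default could look like the sketch below. `resolve_maxlags` is a hypothetical helper, and the fallback floor(4 * (nobs / 100)**(2/9)) is a common Newey-West style rule of thumb, used here as an assumption rather than as the default this PR settles on.

```python
import math


def resolve_maxlags(kwds, nobs):
    """Return the user supplied maxlags, or a rule-of-thumb default."""
    maxlags = kwds.get("maxlags")
    if maxlags is None:
        # Assumed rule of thumb; only used when the caller gives no maxlags.
        maxlags = int(math.floor(4 * (nobs / 100.0) ** (2.0 / 9.0)))
    return maxlags


explicit = resolve_maxlags({"maxlags": 2}, nobs=100)
zero_lags = resolve_maxlags({"maxlags": 0}, nobs=100)   # 0 is kept, not replaced
defaulted = resolve_maxlags({}, nobs=100)
```

Callers that already pass maxlags get the same value as before, which is what makes adding a default later backwards compatible.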

kwds = {}
if 'use_t' in kwargs:
    kwds['use_t'] = kwargs['use_t']
#prints for debugging

@josef-pkt

josef-pkt Aug 21, 2014

Author Member

these kwargs are additional to any optimizer kwargs.
Not enough unit tests to see whether this works for all combinations.
refactor optimization options to optim_kwargs (later) see GMM

correction so that the robust covariance matrices match those of Stata in
some models like GLM and discrete Models.
The following covariance types and required or optional arguments are

@josef-pkt

josef-pkt Aug 21, 2014

Author Member

TODO: add nonrobust handling in here

self.cov_type = 'nonrobust'
self.cov_kwds = {'description' : 'Standard Errors assume that the ' +
                 'covariance matrix of the errors is correctly ' +
                 'specified.'}

@josef-pkt

josef-pkt Aug 21, 2014

Author Member

nonrobust doesn't define cov_params_default
move to get_robustcov_results or not.
AFAIR, in OLS the nonrobust case doesn't change the current calls and code path, except for adding this information.
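
How a missing cov_params_default can coexist with the existing code path is sketched below. `ToyResults` and its scalar attributes are hypothetical. The only point is that the nonrobust case falls through to the usual computation while a robust cov_type overrides it.

```python
class ToyResults:
    """Hypothetical results class with an optional cov_params_default."""

    def __init__(self, normalized_cov, scale):
        self.normalized_cov = normalized_cov    # scalar for simplicity
        self.scale = scale
        self.cov_type = "nonrobust"             # no cov_params_default is set

    def cov_params(self):
        default = getattr(self, "cov_params_default", None)
        if default is not None:
            return default                      # robust covariance attached
        return self.normalized_cov * self.scale  # unchanged nonrobust path


nonrobust = ToyResults(normalized_cov=0.5, scale=4.0)
nonrobust_cov = nonrobust.cov_params()

robust = ToyResults(normalized_cov=0.5, scale=4.0)
robust.cov_type = "HC1"
robust.cov_params_default = 3.0                 # made-up robust value
robust_cov = robust.cov_params()
```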

@josef-pkt

Member Author

commented Aug 21, 2014

merging, see followup in #1922

josef-pkt added a commit that referenced this pull request Aug 21, 2014

Merge pull request #1870 from josef-pkt/ENH_covtype_discrete
ENH: robust covariance, cov_type in fit, in base, discrete and GLM

@josef-pkt josef-pkt merged commit e373b72 into statsmodels:master Aug 21, 2014

2 checks passed

continuous-integration/appveyor AppVeyor build succeeded
Details
continuous-integration/travis-ci The Travis CI build passed
Details

@josef-pkt josef-pkt deleted the josef-pkt:ENH_covtype_discrete branch Aug 21, 2014

PierreBdR pushed a commit to PierreBdR/statsmodels that referenced this pull request Sep 2, 2014

Merge pull request statsmodels#1870 from josef-pkt/ENH_covtype_discrete
ENH: robust covariance, cov_type in fit, in base, discrete and GLM