Changing covariance_type on results #1760

Closed
kshedden opened this Issue Jun 15, 2014 · 8 comments


@kshedden
Contributor

The summary method of the GEE results object has a keyword argument called 'covariance_type'. The intention was to allow summary tables using different covariance types to be obtained without refitting the model.

However, this turns out not to work. Whichever covariance_type is specified in the first call to summary determines the covariance type for all subsequent calls, even if a later call passes a different covariance_type.

The reason seems to be that the summary method automatically calls bse when building the table, which caches the result from the first call.
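A minimal sketch of the caching behaviour described above, in plain Python. `ToyResults` and its attributes are hypothetical stand-ins, and `functools.cached_property` is used here only to imitate a cached attribute; it is not the decorator or the code used by the GEE results class.

```python
from functools import cached_property


class ToyResults:
    """Hypothetical stand-in for a results object (not the GEE results class)."""

    def __init__(self, naive_se, robust_se):
        self._se = {"naive": naive_se, "robust": robust_se}
        self.covariance_type = "robust"

    @cached_property
    def bse(self):
        # Evaluated on first access only; the value is then stored on the instance.
        return self._se[self.covariance_type]

    def summary(self, covariance_type="robust"):
        # The argument is honoured here, but bse may already be cached.
        self.covariance_type = covariance_type
        return covariance_type, self.bse


res = ToyResults(naive_se=1.0, robust_se=2.0)
first = res.summary(covariance_type="naive")
second = res.summary(covariance_type="robust")
print(first)   # requested naive, got the naive standard error
print(second)  # requested robust, still reports the cached naive value
```

The second call changes the requested type, but the standard error it reports is the one computed during the first call.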

Any ideas for a work-around?

@josef-pkt
Member

The backdoor is to reset the cache, but that only creates problems: it's difficult to keep track of what to reset, and we would need a reset method.
Any method that needs some of the cached attributes, or the user accessing results.bse, will trigger the caching.
(I wanted to add e.g. wald_test or t_test to that list, but those don't actually access bse or similar; they calculate everything from cov_params. My guess is that we don't want summary to calculate everything from scratch.)

The "cleanest" solution I came up with, if we want to change the cov_type without re-estimating the model, is to create a new results instance, as get_robustcov_results does:
http://statsmodels.sourceforge.net/devel/generated/statsmodels.regression.linear_model.OLSResults.get_robustcov_results.html

@josef-pkt
Member

A partial substitute for summary: calling t_test(np.eye(len(params)), cov_p=my_robust_cov) also produces a summary table, using the given cov_p.

@josef-pkt
Member

Follow-up: we could calculate and swap the params_table in summary() for a given covariance matrix cov_p.
However, in other models with the generic robust covariances, cov_type wouldn't be a good argument for summary(): the robust covariances are computed on demand (lazily) rather than pre-computed (*), and they require a list of options or additional arguments (small sample correction, type of correlation, group indices, time indices, ...).

(*) However, they are currently precomputed in some models in a similar way to GEE, IIRC.

@kshedden
Contributor

I asked because I am working on this for Cox models with dependent events. There, I wait to compute the robust covariance until it is needed. Then, I just want to flip a switch that controls what gets printed in the summary table. But the table is hard-wired to call bse, which caches the result. If the user asked for the naive covariance first, I can't get the summary table to switch to the robust version.

I think it is summary2.summary_params that builds the table, but I haven't looked into that code yet to see how it works.


@josef-pkt
Member

The pattern that we agreed upon (kind of) is to put the cov_type into the model.fit method, and have one cov_params_default for each results instance.
In that case summary doesn't need to access the bse for a "foreign" covariance.

Did you try whether you can add a method like the RegressionResults get_robustcov_results that creates a new results instance?

If the different covariances are not precomputed, then there is not much to gain computationally by avoiding a new results instance, and creating one makes the code structure simpler and easier to keep track of.

@josef-pkt
Member

There are still some loose ends in the design and integration of robust covariances. I got stuck in my PR when I couldn't make up my mind how to handle small sample corrections for inference (t_test, ...).
My unit tests started to fail because I didn't _cache something before changing the definition of an attribute. (In my case df_resid has two meanings/usages: for the tests we need the small sample correction, i.e. the degrees of freedom for the t and F distributions; for the variance calculation we use the standard nobs - k_params.)

@kshedden
Contributor

I will do it this way for now, should work.


@josef-pkt josef-pkt added this to the 0.6 milestone Aug 23, 2014
@josef-pkt
Member

I didn't remember this; it should have gotten prio-high.

That's exactly the issue that got me started in #1906.
The fix/refactoring was merged in #1924.

What we don't have yet is a simple way to recreate a new results instance without calling fit again.
(Hint: t_test creates the params_table and takes a user given covariance matrix.)

@josef-pkt josef-pkt closed this Aug 23, 2014