I don't know how sandwich covariance matrices that require arguments should be attached; this is relevant for cluster/groups and panel/groups.
It would be good to save the covariance for further use in the bse calculation, summary(), and tests.
Users might call cov_cluster with different group definitions.
1) disallow redefining groups: groups is an attribute of the data, for example res.set_groups(groups); then it's just a cached property
2) save last call only: store the last result; if the user calls cov_cluster again with different groups, then we reset the cached version
3) memoize: cache with the argument (groups) as a key.
3) looks too messy, because we wouldn't know what to use for other calculations - out
1) has the problem that the user can reset the groups. If we have a setter method, then we can force recalculation when the groups are reset (empty the cache). This would have the same behavior as 2) but without an argument to the method (it would be just a cached attribute)
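A minimal sketch of the setter-plus-cache idea, to make the behavior concrete. The class name `CachedClusterCov` and the `cov_func` callable are hypothetical and only stand in for whatever actually computes the covariance; this is not existing library code.

```python
class CachedClusterCov:
    """Hypothetical holder: one cached covariance, invalidated by set_groups."""

    def __init__(self, cov_func):
        # cov_func(groups) -> covariance; assumed to be the expensive part.
        self._cov_func = cov_func
        self._groups = None
        self._cache = None
        self._has_cache = False

    def set_groups(self, groups):
        # Resetting the groups empties the cache, which forces recalculation
        # on the next attribute access.
        self._groups = tuple(groups)
        self._cache = None
        self._has_cache = False

    @property
    def cov_cluster(self):
        # Cached attribute: no argument, so there is never any doubt about
        # which groups the stored covariance belongs to.
        if self._groups is None:
            raise ValueError("groups not set; call set_groups(groups) first")
        if not self._has_cache:
            self._cache = self._cov_func(self._groups)
            self._has_cache = True
        return self._cache
```

Repeated access to `cov_cluster` reuses the stored value; only a call to `set_groups` triggers a new computation, so other calculations can always rely on the single cached covariance.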
-> I'm currently leaning towards this with set_options(use_cov=xxx) from below
related question: arguments everywhere or letting user set a default?
Then, in the calls to t_test, f_test and summary, we can use cov_hac instead of the standard OLS cov_params.
One possible problem would be when we return only numbers, because then the user has to remember which cov is used. That's not a problem for t_test, f_test and summary, because we can add the type ('hac', 'HC0', ...) to the return. But tvalues, pvalues, ... would depend on the use_cov setting without reminding the user.
alternative: user has to specify which cov should be used as argument in t_test, f_test and summary
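To illustrate the trade-off, here is a rough sketch of a default set once versus an explicit argument. `ResultsSketch`, its constructor and the dict returned by `t_test` are assumptions made up for this example; only the names `set_options`, `use_cov`, `t_test` and `tvalues` come from the discussion above.

```python
import numpy as np


class ResultsSketch:
    """Hypothetical results object with a switchable default covariance."""

    def __init__(self, params, cov_by_type):
        self.params = np.asarray(params, dtype=float)
        self._cov_by_type = cov_by_type  # e.g. {"nonrobust": ..., "HC0": ...}
        self.use_cov = "nonrobust"

    def set_options(self, use_cov):
        if use_cov not in self._cov_by_type:
            raise KeyError(f"unknown covariance type: {use_cov!r}")
        self.use_cov = use_cov

    def _bse(self, cov_type):
        return np.sqrt(np.diag(self._cov_by_type[cov_type]))

    def t_test(self, use_cov=None):
        # An explicit argument overrides the default; the type is attached
        # to the return so the user can see which covariance was used.
        cov_type = self.use_cov if use_cov is None else use_cov
        return {"cov_type": cov_type,
                "tvalues": self.params / self._bse(cov_type)}

    @property
    def tvalues(self):
        # Bare numbers: these change with use_cov without any reminder.
        return self.params / self._bse(self.use_cov)


res = ResultsSketch([2.0, 3.0],
                    {"nonrobust": np.eye(2), "HC0": np.diag([4.0, 9.0])})
before = res.tvalues
res.set_options(use_cov="HC0")
after = res.tvalues  # differs from `before`, with no label attached
```

The structured return of `t_test` carries the covariance type along, while the plain `tvalues` attribute silently follows whatever default was set last.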
standard usage in most cases will have fixed group, so resetting groups will not happen very often, I expect.
I can't think of a case where I'd want to change the groups. Can you give an example? I was playing with gretl today and I noticed they have a global option for setting HAC covariance everywhere, but they also let you set them when you do the estimation.
The main case I can think of is nested clusters:
firm level time series, do we cluster by firm, industry or geography?
school level time series, do we cluster by school or type of school or district?
cross country time series, do we cluster by country or by development level?
For short panels, time should also be treated as a group (unweighted aggregate instead of a Bartlett kernel): in this case, time and cross-section would form two groups. It's not clear whether a user wants to use both or either one.
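The cases above can be sketched with one function called under several group definitions. This is a plain cluster-robust sandwich for OLS without small-sample correction, written only for illustration; the data are simulated, and the variable names `firm`, `industry` and `time` are assumptions, not part of any existing API.

```python
import numpy as np


def cov_cluster(exog, resid, groups):
    """Cluster-robust sandwich covariance for OLS (no small-sample correction)."""
    exog = np.asarray(exog, dtype=float)
    resid = np.asarray(resid, dtype=float)
    # Map arbitrary group labels to integer codes 0..G-1.
    _, codes = np.unique(groups, return_inverse=True)
    codes = codes.ravel()
    scores = exog * resid[:, None]
    # Sum the score contributions within each group.
    group_scores = np.zeros((codes.max() + 1, exog.shape[1]))
    np.add.at(group_scores, codes, scores)
    bread = np.linalg.inv(exog.T @ exog)
    return bread @ (group_scores.T @ group_scores) @ bread


# Simulated short panel: 6 firms in 3 industries, observed for 4 periods.
rng = np.random.default_rng(0)
n_firms, n_periods = 6, 4
firm = np.repeat(np.arange(n_firms), n_periods)
industry = firm // 2
time = np.tile(np.arange(n_periods), n_firms)

exog = np.column_stack([np.ones(firm.size), rng.normal(size=firm.size)])
endog = exog @ np.array([1.0, 0.5]) + rng.normal(size=firm.size)
params = np.linalg.lstsq(exog, endog, rcond=None)[0]
resid = endog - exog @ params

# Same model and residuals, three candidate group definitions.
cov_by_grouping = {
    name: cov_cluster(exog, resid, g)
    for name, g in [("firm", firm), ("industry", industry), ("time", time)]
}
```

Each grouping yields its own covariance matrix from the same fit, which is why a single cached result needs to know which groups it was computed for.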
Stata requires vcv to be specified with regress, and uses globals (last estimate) and memoizes for repeated calls to regress with same model. AFAIU, reading slowly through the User's Guide.
(I don't know how much Stata recalculates if you call regress repeatedly with different vcv options)
Being able to set the cov options only once (for a result instance) would be convenient; for example, the Stock/Watson textbook uses HC by default, unless otherwise specified.
One argument in favor of setting a result option in this case is that most users who use specific robust standard errors will be aware of it and used to it, so I guess they won't get confused if it changes many of their later results.
I'm closing this. I have settled on creating new instances in the "official" usage.
cov_type is chosen in the fit method, or can be chosen by get_robustcov_results from an already existing instance.
The latter is not yet implemented for models other than OLS, because it requires automatic results creation (cloning with adjustments), which is not possible for most models.