ENH: accept non-int definitions of cluster groups #1437
I need to look at this more carefully. The change looks innocent enough, but I would like to keep the group conversion in the wrapper code.
For the conversion to int, using
I think the np.unique in here is supposed to be optimized away if we use the group utilities in the wrapper class. (It is a duplicate and, IIRC, only needed for the small-sample correction.)
I didn't know about the optional arguments to np.unique; this is fixed in the last commit.
... but I'm sorry, I wasn't able to follow the rest of your comment. If you expect more changes from me in this branch, please explain what you mean by "wrapper code".
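For reference, a minimal sketch of the kind of conversion discussed above, using the `return_inverse` option of `np.unique` to map arbitrary group labels to integer codes. The helper name `groups_to_int` is made up for illustration and is not taken from the PR or from statsmodels.

```python
import numpy as np


def groups_to_int(groups):
    """Map arbitrary sortable group labels to integer codes 0..n_groups-1.

    Hypothetical helper for illustration only, not the code in this PR.
    """
    groups = np.asarray(groups)
    # labels: sorted unique values; codes: index of each observation's label
    labels, codes = np.unique(groups, return_inverse=True)
    # ravel() keeps the codes 1-D regardless of NumPy version
    return codes.ravel(), labels


codes, labels = groups_to_int(["b", "a", "b", "c", "a"])
```

The integer codes can then be passed to code that assumes integer group indices, while `labels` keeps the mapping back to the original definitions.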
My first version to include it in a model is in the linear regression Results
From a quick check, it seems that it might also assume integers in the cluster option.
Similar code will be needed in several different model classes, but I haven't yet gotten around to seeing how this can be rewritten so that it also works for other models, like the MLE models in discrete.
I'm still only partway to getting the pieces to fit together: https://github.com/josef-pkt/statsmodels/compare/REF_covtype_fit
What I wanted to say about this PR is: it looks fine, but eventually all of this should go into the more general parts, so that the pure sandwich functions don't have to worry about any of it and we don't need to duplicate this code.
I'm being a bit unclear because it's not yet clear to me either.
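To make the "wrapper code" idea a bit more concrete, here is a rough sketch of the separation, under the assumption that the wrapper converts the labels once and the inner function only ever sees integer codes. Both function names are hypothetical and this is not the statsmodels implementation.

```python
import numpy as np


def _sum_by_group(x, codes, n_groups):
    """Inner function: assumes integer codes 0..n_groups-1 (hypothetical)."""
    out = np.zeros((n_groups, x.shape[1]))
    # unbuffered accumulation of the rows of x into their group rows
    np.add.at(out, codes, x)
    return out


def group_sums(x, groups):
    """Wrapper: handles arbitrary group labels, then delegates (hypothetical)."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    # label-to-integer conversion happens once, here in the wrapper
    labels, codes = np.unique(np.asarray(groups), return_inverse=True)
    return _sum_by_group(x, codes.ravel(), len(labels)), labels


sums, labels = group_sums([[1, 2], [3, 4], [5, 6]], ["b", "a", "b"])
```

With this split, the number of groups is also available in the wrapper (as `len(labels)`), so the inner function would not need its own `np.unique` call.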