error with GEE on missing data. #1877
Comments
I think this is mainly a disagreement about who is doing the missing value handling. However, AFAICS, in this example we need to raise an exception that
Note: using pandas dropna to drop the missing observations from the entire dataframe, including from df['grps'], works (but then breaks in fit with numpy.linalg.linalg.LinAlgError: Singular matrix).
|
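For reference, a minimal sketch of the dropna workaround mentioned above. The column names (status, fake, grps) follow the thread, but the values are made up for illustration and are not the data from the original report. It only shows the pandas step, not the GEE call.

```python
import numpy as np
import pandas as pd

# Hypothetical data: column names follow the thread, values are invented.
df = pd.DataFrame({
    "status": [0, 1, 0, 1, 1, 0],
    "fake":   [1.0, 1.0, np.nan, np.nan, 1.0, 1.0],
    "grps":   ["a", "a", "b", "b", "c", "c"],
})

# Drop incomplete rows from the whole frame, so the group labels stay
# aligned with the observations that remain.
clean = df.dropna(subset=["status", "fake", "grps"])

# Every row of group "b" had a missing value, so the group disappears.
groups_left = sorted(clean["grps"].unique())

# In this made-up data the regressor has a single value after dropping,
# which is the kind of situation that produces a singular matrix in fit.
fake_is_constant = clean["fake"].nunique() == 1
```

Passing clean and clean["grps"] to the model keeps the data and the group labels the same length. Whether the fit then succeeds still depends on the data that is left.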
I see. It does seem like something could be done when from_formula is used.
model = sm.GEE.from_formula("status ~ fake", df, groups=df['grps'], cov_struct=Independence(), family=sm.families.Binomial())
gives the same error. |
I will look into this soon. If I recall I previously checked for empty groups but took that out because I thought it was superfluous. |
@kshedden I don't think you have to fix empty groups unless we find an example for it. (I guess it is not possible to have them.) The "bug" is that this example doesn't raise an exception with incorrect
The extension that we don't have yet in any model is to either disable patsy's missing value handling or cooperate with it; see issue #805. |
I still haven't had a chance to look into this, but from_formula should work with missing data. I will figure out what is going on there... |
I think I have fixed this now. I was using groups instead of self.groups after the missing data was already processed upstream. This is fixed in my gee_sensitivity3 branch. I also added a test for this. All tests run clean locally, waiting for travis to return. However the example from @brentp will still fail with a linear algebra error, because after dropping the cases with missing values, the independent variable becomes constant. |
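The kind of mismatch described above (using group labels from before the missing rows were dropped) can be sketched with plain NumPy. This is not the statsmodels code. The helper group_row_indices and the data are hypothetical and only illustrate why stale labels index the wrong rows.

```python
import numpy as np

def group_row_indices(labels):
    """Map each group label to the row positions it occupies (hypothetical helper)."""
    labels = np.asarray(labels)
    return {lab: np.flatnonzero(labels == lab) for lab in np.unique(labels)}

# Made-up data: every observation in group "b" is missing.
groups = np.array(["a", "a", "b", "b", "c", "c"])
x = np.array([1.0, 2.0, np.nan, np.nan, 3.0, 4.0])

# Missing-data handling drops rows from the data...
keep = ~np.isnan(x)
x_clean = x[keep]
# ...so the labels must be filtered with the same mask.
groups_clean = groups[keep]

stale = group_row_indices(groups)        # built from the unfiltered labels
fresh = group_row_indices(groups_clean)  # built from the filtered labels

# The stale positions for group "c" point past the end of the cleaned data.
stale_out_of_range = stale["c"].max() >= len(x_clean)
```

With the filtered labels, group "b" is simply absent and the positions for "c" fall inside the cleaned array.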
Travis returns green. |
My mistake. This is not going to work out for the time being. I was misled because it worked with an old version of Patsy, but that was a fluke. GEE called from the API should handle missing data properly, but GEE using formulas will not work with missing data until this issue is addressed upstream. |
reopen, this has to be fixed in base |
this should still get a closer look before a 0.6 release |
Closed by #2034. |
Self-contained script below. The error occurs because dropping the missing data removes entire groups, after which the code attempts to index an array with an empty array in cluster_list.
Here is the output + traceback: