BUG/ENH: Saving a model with remove_data=True breaks .summary() #9147
Comments
See for example #6887: you need to call … This behavior is intentional.
Closing as duplicate. There should be unit tests for this for all core models, but likely not for more recently added models.
Thanks @josef-pkt for pointing that out. Unfortunately, that doesn't seem to work for the example above:
Same for:
Sounds like it needs another look.
OK, that's a bug. It fails in computing pseudo_rsquared, which cannot be cached because it's a method with arguments, but it only needs cached attributes. The problem is the cache after remove_data.
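The failure mode described in the previous comment can be reproduced with a toy class. This is a minimal sketch, assuming a results object that memoizes argument-free statistics in a `_cache` dict and a `remove_data(clear_cache=...)` switch; `ToyResults` and its methods are hypothetical and do not mirror the statsmodels source. Because `pseudo_rsquared` takes an argument, it is recomputed on every call from the cached `llf` and `llnull`, so it breaks as soon as `remove_data` also empties that cache.

```python
import math


class ToyResults:
    """Hypothetical stand-in for a fitted results object (not statsmodels code)."""

    def __init__(self, endog, llf_value, llnull_value):
        self.endog = list(endog)       # nobs-length data array
        self.nobs = len(self.endog)    # scalar, survives remove_data
        # Pretend these came from evaluating the likelihood on endog.
        self._llf_value = llf_value
        self._llnull_value = llnull_value
        self._cache = {}               # memoized, argument-free statistics

    def _cached(self, name, compute):
        # Compute once (which needs the data), then serve from the cache.
        if name not in self._cache:
            if self.endog is None:
                raise ValueError(f"{name} needs the data, which was removed")
            self._cache[name] = compute()
        return self._cache[name]

    @property
    def llf(self):
        return self._cached("llf", lambda: self._llf_value)

    @property
    def llnull(self):
        return self._cached("llnull", lambda: self._llnull_value)

    def pseudo_rsquared(self, kind="mcf"):
        # Takes an argument, so it is not memoized; it only reads llf/llnull.
        if kind == "mcf":  # McFadden
            return 1 - self.llf / self.llnull
        if kind == "cs":  # Cox-Snell
            return 1 - math.exp((self.llnull - self.llf) * 2 / self.nobs)
        raise ValueError(f"unknown kind: {kind!r}")

    def remove_data(self, clear_cache):
        self.endog = None
        if clear_cache:
            self._cache.clear()  # also drops the cached scalars llf/llnull


kept = ToyResults([0, 1, 1, 0], llf_value=-2.0, llnull_value=-2.5)
kept.pseudo_rsquared()               # fills the cache with llf and llnull
kept.remove_data(clear_cache=False)
print(kept.pseudo_rsquared())        # still works from the cached scalars

wiped = ToyResults([0, 1, 1, 0], llf_value=-2.0, llnull_value=-2.5)
wiped.pseudo_rsquared()
wiped.remove_data(clear_cache=True)
try:
    wiped.pseudo_rsquared()
except ValueError as err:
    print(err)                       # llf needs the data, which was removed
```

The only difference between the two objects is whether the cached scalars are cleared together with the data arrays.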
The bug was introduced in #4421: llf, llnull are … #4421 moved from a list of things to remove to removing things by default. Maybe revert most of #4421?
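The change attributed to #4421 can be contrasted schematically using plain dicts. This is a hypothetical sketch: the function names and cache keys are made up for illustration and are not the statsmodels implementation. With an explicit removal list, scalars such as llf and llnull survive by construction; with removal by default, they are dropped unless they are explicitly exempted, which fits the symptom discussed in this thread.

```python
import copy

# Hypothetical cache contents of a fitted results object after summary() ran.
CACHE = {
    "llf": -2.0,                                # scalar statistic
    "llnull": -2.5,                             # scalar statistic
    "fittedvalues": [0.4, 0.6, 0.6, 0.4],       # nobs-length array
    "resid_response": [-0.4, 0.4, 0.4, -0.4],   # nobs-length array
}


def remove_listed(cache, to_remove):
    """Policy 1: drop only the entries named in an explicit list."""
    for name in to_remove:
        cache.pop(name, None)
    return cache


def remove_all_except(cache, to_keep=()):
    """Policy 2: drop every entry unless it is explicitly kept."""
    for name in list(cache):
        if name not in to_keep:
            del cache[name]
    return cache


listed = remove_listed(copy.deepcopy(CACHE), ["fittedvalues", "resid_response"])
by_default = remove_all_except(copy.deepcopy(CACHE))
print(sorted(listed))      # ['llf', 'llnull']
print(sorted(by_default))  # []
```

Under the second policy, a fix amounts to maintaining the `to_keep` list, which is the mirror image of maintaining the removal list under the first.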
OLS summary after remove_data also fails, but it fails in the diagnostics. Unit tests: … The original unit tests only checked that …
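As far as I can tell, the OLS summary computes its diagnostics (Durbin-Watson, Jarque-Bera, Omnibus) from the residual array when summary() is called, rather than reading cached scalars, so those diagnostics need the nobs-length residuals. Below is a minimal pure-Python sketch of one such statistic, the Durbin-Watson ratio sum((e[t] - e[t-1])**2) / sum(e[t]**2). The `durbin_watson` here is a hypothetical re-implementation, not the statsmodels function of the same name, and the residuals are made up.

```python
def durbin_watson(resid):
    """Sum of squared successive differences divided by the sum of squares."""
    if len(resid) < 2:
        raise ValueError("need at least two residuals")
    num = sum((curr - prev) ** 2 for prev, curr in zip(resid, resid[1:]))
    den = sum(r * r for r in resid)
    return num / den


resid = [1.0, -1.0, 1.0, -1.0]   # hypothetical residuals (nobs-length array)
summary_stats = {"durbin_watson": durbin_watson(resid)}  # scalar kept for the summary
resid = None                     # analogous to remove_data dropping the array

print(summary_stats["durbin_watson"])  # 3.0
```

Once the array is gone, only a scalar computed beforehand can still be reported.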
Correction: I'm still not sure how this currently works. In OLS, no cached results statistics are set to None. AFAICS, #4421 made the behavior of remove_data depend on an isinstance check for a specific cache decorator. But I still don't see why OLS is not affected; it uses the same decorator. Update: …
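An isinstance-based removal like the one described in the previous comment can be sketched with descriptors. This is a hypothetical illustration, assuming two memoizing decorators of which only one marks data-sized values; `cached_stat`, `cached_array`, `remove_data` and `Results` are made-up names, not the statsmodels decorators. Whether a cached value is wiped is decided purely by which decorator wraps it.

```python
class cached_stat:
    """Hypothetical memoizing descriptor, standing in for a cache decorator."""

    def __init__(self, func):
        self.func = func
        self.name = func.__name__

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self  # accessed on the class: return the descriptor itself
        cache = obj.__dict__.setdefault("_cache", {})
        if self.name not in cache:
            cache[self.name] = self.func(obj)
        return cache[self.name]


class cached_array(cached_stat):
    """Hypothetical subclass marking cached values that are nobs-length arrays."""


def remove_data(obj):
    # Which cached entries are dropped depends only on the decorator type.
    cache = obj.__dict__.get("_cache", {})
    for klass in type(obj).__mro__:
        for name, attr in vars(klass).items():
            if isinstance(attr, cached_array):
                cache.pop(name, None)


class Results:
    def __init__(self, endog):
        self.endog = endog

    @cached_array
    def fittedvalues(self):
        mean = sum(self.endog) / len(self.endog)
        return [mean] * len(self.endog)

    @cached_stat
    def llf(self):
        return -float(len(self.endog))  # placeholder value for the sketch


res = Results([1.0, 2.0, 3.0])
res.fittedvalues, res.llf   # populate the cache
remove_data(res)
print(sorted(res._cache))   # ['llf']
```

Because `cached_array` subclasses `cached_stat`, an isinstance check against the base class would match both, so the class named in the check decides whether scalars like llf survive.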
@RoelVerbelen What is your use case for summary from a pickled model/results? Related issue for summary_col: #7368. I think we can add unit tests to verify that summary() works after remove_data if summary has been called before. Related: given that it came up several times, …
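The proposed regression test can be sketched in pytest style. This is a minimal sketch, assuming a toy results class; `ToyResults` is hypothetical. Against statsmodels, the same pattern would call summary(), then remove_data(), then summary() again on each fitted model, ideally parametrized over model classes. The second test documents the intentional limitation that statistics never computed before remove_data cannot be recovered afterwards.

```python
class ToyResults:
    """Hypothetical results object: caches a scalar and can drop its data."""

    def __init__(self, endog):
        self.endog = list(endog)
        self.nobs = len(self.endog)
        self._cache = {}

    @property
    def mean(self):
        if "mean" not in self._cache:
            if self.endog is None:
                raise ValueError("mean needs the data, which was removed")
            self._cache["mean"] = sum(self.endog) / self.nobs
        return self._cache["mean"]

    def summary(self):
        return f"nobs={self.nobs} mean={self.mean:.3f}"

    def remove_data(self):
        self.endog = None  # drop nobs-length arrays, keep cached scalars


def test_summary_after_remove_data():
    res = ToyResults([1.0, 2.0, 3.0])
    before = res.summary()  # populates the cache first
    res.remove_data()
    assert res.summary() == before


def test_summary_without_prior_call_fails():
    res = ToyResults([1.0, 2.0, 3.0])
    res.remove_data()
    try:
        res.summary()
    except ValueError:
        return
    raise AssertionError("expected summary() to fail without cached stats")
```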
Two points here. … I will open a separate issue for the ENH, so this issue is just the bug fix.
Hi @josef-pkt, my motivation for using … Within the model evaluation Quarto reports, I'd like to print the model summary, as it gives a useful short overview of the model object and key statistics. The data sets I'm working with are very large, so I'm looking for ways to reduce the size of the saved statsmodels objects. Currently they contain a lot of duplication, as the data needs to be saved alongside them for the formula API to work. That's not ideal. FYI, in case you're interested, I have the same modelling pipeline set up in R, and there I've managed to strongly reduce the size of GLMs/GAMs (mgcv) by removing all "nobs arrays" and keeping only the first row of the model frame, without affecting functionality.
It would be great if a model saved with the data removed could still produce the summary statistics. Perhaps it's worthwhile keeping the minimum amount of data needed to reproduce these stats, especially when using the formula API, where the original data is still saved (see here and here).
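Keeping only the minimum needed to reproduce summary statistics can be illustrated with pseudo R-squared, which depends only on scalars. This is a minimal sketch, assuming the standard McFadden and Cox-Snell definitions; the function name and all numbers are hypothetical, and a real summary needs more scalars (parameters, standard errors, and so on) than shown here. The pickle-size comparison illustrates why dropping nobs-length arrays matters for large data.

```python
import math
import pickle


def pseudo_rsquared(llf, llnull, nobs, kind="mcf"):
    """Pseudo R-squared from scalars only (no nobs-length arrays needed)."""
    if kind == "mcf":  # McFadden: 1 - llf / llnull
        return 1 - llf / llnull
    if kind == "cs":   # Cox-Snell: 1 - exp(2 * (llnull - llf) / nobs)
        return 1 - math.exp(2 * (llnull - llf) / nobs)
    raise ValueError(f"unknown kind: {kind!r}")


nobs = 100_000
endog = [i % 2 for i in range(nobs)]  # hypothetical nobs-length data
llf, llnull = -60_000.0, -69_314.7    # hypothetical fitted log-likelihoods

slim = {"llf": llf, "llnull": llnull, "nobs": nobs}  # enough for these stats
full = dict(slim, endog=endog)                        # with the data attached

print(len(pickle.dumps(slim)), len(pickle.dumps(full)))
print(pseudo_rsquared(**slim), pseudo_rsquared(**slim, kind="cs"))
```

The slim dict pickles to a few dozen bytes regardless of nobs, while the full one grows linearly with the data.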