FAQ: get_prediction fails with regression models after calling remove_data #6887

Open
rrymer opened this issue Jul 17, 2020 · 3 comments · Fixed by #6888

Comments

@rrymer

rrymer commented Jul 17, 2020

get_prediction computes inferential statistics, so it only works after remove_data if the inferential attributes it needs (specifically scale in this case) were cached before the data was removed. Accessing cached attributes, e.g. by calling summary(), stores them in the cache, where they remain available after remove_data.
see comment #6887 (comment) below.
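
The caching behaviour described above can be illustrated with a small toy class. This is only a sketch of the general pattern (a lazily computed value that needs the data on first access); `ToyResults` and its attributes are hypothetical names and are not statsmodels code.

```python
class ToyResults:
    """Hypothetical stand-in for a results object; not statsmodels code."""

    def __init__(self, resid):
        self.resid = list(resid)  # nobs-length data that remove_data drops
        self._cache = {}

    @property
    def scale(self):
        # Computed lazily on first access, then served from the cache.
        if "scale" not in self._cache:
            n = len(self.resid)  # fails if the data is already gone
            self._cache["scale"] = sum(r * r for r in self.resid) / n
        return self._cache["scale"]

    def remove_data(self):
        # Drop the large array but keep small cached values.
        self.resid = None


# Case 1: scale is accessed before the data is removed, so it stays available.
cached = ToyResults([1.0, -1.0, 2.0, -2.0])
cached.scale
cached.remove_data()
scale_after = cached.scale

# Case 2: scale is first requested after removal, so it cannot be computed.
uncached = ToyResults([1.0, -1.0, 2.0, -2.0])
uncached.remove_data()
try:
    uncached.scale
    failed = False
except TypeError:
    failed = True
```

In the toy, the second case fails for the same kind of reason as in the report: the value was never computed while the data was still there.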


Describe the bug

The conditional expression at L160 in v0.11 regression._prediction.py raises an AttributeError with a simple model (exog dim=1) if remove_data has been called on the underlying model.

Code Sample, a copy-pastable example if possible

import numpy as np
import statsmodels.api as sm

# toy data
endog = [i + np.random.normal(scale=0.1) for i in range(100)]
exog = [i for i in range(100)]

# fit 
model = sm.OLS(endog, exog, weights=[1 for _ in range(100)]).fit()
model.summary() # R^2 ~= 1
# works fine
model.get_prediction(1).predicted_mean
model.get_prediction([1]).predicted_mean
model.get_prediction([[1]]).predicted_mean

model.remove_data()
# works fine
model.get_prediction(1).predicted_mean
model.get_prediction([[1]]).predicted_mean

# AttributeError
model.get_prediction([1]).predicted_mean


Expected Output

AttributeError: 'NoneType' object has no attribute 'ndim'

Output of import statsmodels.api as sm; sm.show_versions()

INSTALLED VERSIONS

Python: 3.6.10.final.0
OS: Linux 3.10.0-862.14.4.el7.YAHOO.20180927.19.x86_64 #1 SMP Thu Sep 27 18:19:45 UTC 2018 x86_64
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8

statsmodels

Installed: 0.11.0 (/grid/6/tmp/yarn-local/usercache/rrymer/appcache/application_1594696455454_937110/container_e24_1594696455454_937110_01_000002/python36-dependencies.zip/statsmodels)

Required Dependencies

cython: Not installed
numpy: 1.16.4 (/grid/6/tmp/yarn-local/usercache/rrymer/appcache/application_1594696455454_937110/container_e24_1594696455454_937110_01_000002/python36-dependencies.zip/numpy)
scipy: 1.5.0 (/opt/python/lib/python3.6/site-packages/scipy)
pandas: 1.0.0 (/grid/6/tmp/yarn-local/usercache/rrymer/appcache/application_1594696455454_937110/container_e24_1594696455454_937110_01_000002/python36-dependencies.zip/pandas)
dateutil: 2.8.1 (/grid/6/tmp/yarn-local/usercache/rrymer/appcache/application_1594696455454_937110/container_e24_1594696455454_937110_01_000002/jup3.zip/dateutil)
patsy: 0.5.1 (/grid/6/tmp/yarn-local/usercache/rrymer/appcache/application_1594696455454_937110/container_e24_1594696455454_937110_01_000002/python36-dependencies.zip/patsy)

Optional Dependencies

matplotlib: 3.2.2 (/opt/python/lib/python3.6/site-packages/matplotlib)
backend: module://ipykernel.pylab.backend_inline
cvxopt: Not installed
joblib: 0.15.1 (/opt/python/lib/python3.6/site-packages/joblib)

Developer Tools

IPython: 6.5.0 (/grid/6/tmp/yarn-local/usercache/rrymer/appcache/application_1594696455454_937110/container_e24_1594696455454_937110_01_000002/jup3.zip/IPython)

bashtage added a commit to bashtage/statsmodels that referenced this issue Jul 17, 2020
Correct exog dimension when data has been removed

closes statsmodels#6887
@bashtage
Member

bashtage commented Jul 17, 2020

Not sure what the right behavior here is. It happens to work after you remove data because you call get_prediction once before removing (or summary).

However, if you remove the data without first calling get_prediction or summary, prediction fails:

import numpy as np
import statsmodels.api as sm

# toy data
endog = [i + np.random.normal(scale=0.1) for i in range(100)]
exog = [i for i in range(100)]

# fit 
model = sm.OLS(endog, exog, weights=[1 for _ in range(100)]).fit()
model.remove_data()
# Broken now
model.get_prediction(1).predicted_mean
model.get_prediction([[1]]).predicted_mean
TypeError: unsupported operand type(s) for *: 'NoneType' and 'NoneType'

bashtage added a commit that referenced this issue Jul 17, 2020
BUG: Correct dimension when data removed
bashtage added this to the 0.12 milestone Jul 27, 2020
@josef-pkt
Member

josef-pkt commented Jul 20, 2021

This is not supposed to work. get_prediction adds inferential statistics.
Those are only available if summary or a similar method is called before remove_data.

predicted_mean itself does not use cov_params; it is just a call to results.predict, which the user can call directly instead.

AFAICS, the failing attribute is a missing scale in the results class.

The fix for this was to keep wresid, wexog and wendog just to compute scale, which defeats the purpose of remove_data.
#7494 (comment)

Update: if model.scale is accessed before remove_data, then wendog and wexog are not needed for get_prediction([1]).predicted_mean after remove_data.

@josef-pkt
Member

reopen as FAQ

josef-pkt reopened this Aug 25, 2021
josef-pkt changed the title from "get_prediction fails with regression models after calling remove_data and ndim=1 input" to "FAQ: get_prediction fails with regression models after calling remove_data" Aug 25, 2021