
BUG: Correct dimension when data removed #6888

Merged · merged 2 commits into statsmodels:master from bashtage:gh-6887 on Jul 17, 2020

Conversation

bashtage
Member

Correct exog dimension when data has been removed

closes #6887

Notes:

  • It is essential that you add a test when making code changes. Tests are not
    needed for doc changes.
  • When adding a new function, test values should usually be verified in another package (e.g., R/SAS/Stata).
  • When fixing a bug, you must add a test that would produce the bug in master and
    then show that it is fixed with the new code.
  • New code additions must be well formatted. Changes should pass flake8. On Linux or macOS, you can
    verify your changes are well formatted by running
    git diff upstream/master -u -- "*.py" | flake8 --diff --isolated
    
    assuming flake8 is installed. This command is also available on Windows
    via the Windows Subsystem for Linux once flake8 is installed in the
    local Linux environment. While passing this test is not required, it is good practice and helps
    improve code quality in statsmodels.
  • Docstring additions must render correctly, including escapes and LaTeX.

Correct exog dimension when data has been removed

closes statsmodels#6887
@coveralls

coveralls commented Jul 17, 2020

Coverage Status

Coverage increased (+0.02%) to 88.043% when pulling 0c2887a on bashtage:gh-6887 into 3614385 on statsmodels:master.

@@ -157,7 +157,8 @@ def get_prediction(self, exog=None, transform=True, weights=None,
         row_labels = None
 
         exog = np.asarray(exog)
-        if exog.ndim == 1 and (self.model.exog.ndim == 1 or
+        if exog.ndim == 1 and (self.model.exog is None or
Member
This doesn't work if exog is None; we need to check whether exog had only one column.

Member Author
The entire remove-data-and-still-predict path is fundamentally broken, since this information isn't available once the data has been removed. I think this probably must raise in this case.

Member
We don't need model.exog for predict; remove_data was designed to allow predict without carrying around all the original data.

Member Author

Except that it is needed to handle edge cases like this. Once it has been removed, the correct decision cannot be inferred here.

Member

As I mentioned on the mailing list, all we need to know is whether there is only one exog variable or more than one.
It would be df_model, except that doesn't count the const. In models without extra params, len(self.params) is sufficient.
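The suggestion above can be sketched in isolation (hypothetical standalone names; the real check lives inside get_prediction):

```python
import numpy as np

# When model.exog has been removed, the number of coefficients in
# res.params still tells us how a 1-d exog should be interpreted.
params = np.array([1.5])             # single-regressor model, no constant
exog = np.asarray([0.0, 1.0, 2.0])   # 1-d exog passed to predict

if exog.ndim == 1 and params.shape[0] == 1:
    # exactly one regressor: treat the 1-d array as n observations
    exog = exog[:, None]

fitted = exog @ params               # shapes (3, 1) @ (1,) -> (3,)
print(fitted)
```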

Member Author

Now using params, since the exog needs to go through np.dot.

Fix prediction by retaining required attributes
Make private attribute private
@bashtage bashtage merged commit 70058d8 into statsmodels:master Jul 17, 2020
@bashtage bashtage deleted the gh-6887 branch July 17, 2020 15:49
@bashtage bashtage added this to the 0.12 milestone Jul 27, 2020
# GH6887
endog = [i + np.random.normal(scale=0.1) for i in range(100)]
exog = [i for i in range(100)]
model = OLS(endog, exog, weights=[1 for _ in range(100)]).fit()
Member

OLS shouldn't have weights
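As the comment notes, a weights argument belongs to WLS, not OLS. A hedged sketch of the distinction (with unit weights the two estimators coincide; values here are illustrative):

```python
import numpy as np
from statsmodels.regression.linear_model import OLS, WLS

rng = np.random.default_rng(0)
exog = np.arange(100.0)
endog = exog + rng.normal(scale=0.1, size=100)

# WLS is the model that takes weights; with all-ones weights it
# reduces to plain OLS, so the two fits agree.
res_wls = WLS(endog, exog, weights=np.ones(100)).fit()
res_ols = OLS(endog, exog).fit()
print(np.allclose(res_wls.params, res_ols.params))
```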

Successfully merging this pull request may close these issues.

FAQ: get_prediction fails with regression models after calling remove_data