
[MRG+1] MAINT Replace manual checks with check_is_fitted #13013

Merged
merged 36 commits into from May 7, 2019

Conversation

@agamemnonc
Contributor

@agamemnonc agamemnonc commented Jan 18, 2019

Reference Issues/PRs

Fixes #12991.

What does this implement/fix? Explain your changes.

Replaces manual checks with check_is_fitted utility function in various places.

Any other comments?

All modified files have been checked with flake8 and autopep8 and any formatting issues have been addressed.
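To illustrate the pattern this PR applies, here is a minimal self-contained sketch. The names mirror sklearn.utils.validation.check_is_fitted and sklearn.exceptions.NotFittedError, but this is a stand-alone illustration, not scikit-learn's actual code; ToyEstimator is hypothetical.

```python
class NotFittedError(ValueError, AttributeError):
    """Raised when an estimator is used before fit (mirrors sklearn.exceptions)."""

def check_is_fitted(estimator, attributes):
    """Raise NotFittedError unless all fitted attributes are present."""
    if not all(hasattr(estimator, attr) for attr in attributes):
        raise NotFittedError(
            "This %s instance is not fitted yet. Call 'fit' with "
            "appropriate arguments before using this method."
            % type(estimator).__name__)

class ToyEstimator:
    def fit(self, X):
        # Trailing underscore marks an attribute set during fit.
        self.coef_ = [0.0] * len(X[0])
        return self

    def predict(self, X):
        # Before this PR: ad hoc checks such as
        #   if not hasattr(self, 'coef_'): raise ValueError("not fitted")
        # After: one utility call with a consistent error message.
        check_is_fitted(self, ['coef_'])
        return [0.0 for _ in X]

try:
    ToyEstimator().predict([[1, 2]])
except NotFittedError as exc:
    print(exc)  # This ToyEstimator instance is not fitted yet. ...
```

The manual checks removed by this PR varied in both exception type and wording; routing them all through one helper is what makes a common test for the message possible.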

Review comment on sklearn/decomposition/online_lda.py (outdated; resolved)
@adrinjalali
Member

@adrinjalali adrinjalali commented Jan 18, 2019

Although quite a few of the changes are PEP8-related rather than directly related to this PR, LGTM if tests pass.

@agamemnonc
Contributor Author

@agamemnonc agamemnonc commented Jan 21, 2019

Although quite a few of the changes are PEP8-related rather than directly related to this PR, LGTM if tests pass.

Thanks. Indeed, as per the description above, I addressed all formatting issues in the modified files so that autopep8/flake8 checks passed before submitting the PR.

Member

@jnothman jnothman left a comment

The core parts of this PR seem okay, when I can find them


Review comment on sklearn/ensemble/forest.py (outdated; resolved)
Review comment on sklearn/ensemble/forest.py (outdated; resolved)
Review comment on sklearn/exceptions.py (resolved)
@agamemnonc
Contributor Author

@agamemnonc agamemnonc commented Jan 21, 2019

The core parts of this PR seem okay, when I can find them

Thanks for reviewing,

OK, I could revert the formatting changes if that would be preferred (your first two comments above)?
Regarding the modification in exceptions.py please see my response above.

@glemaitre
Contributor

@glemaitre glemaitre commented Jan 28, 2019

@agamemnonc Could you revert the style changes? I can then review.

@agamemnonc
Contributor Author

@agamemnonc agamemnonc commented Jan 29, 2019

@agamemnonc Could you revert the style changes? I can then review.

OK, done; @glemaitre please review.

Of course, there are now some flake8 warnings on the modified files (mostly due to long lines).

@glemaitre
Contributor

@glemaitre glemaitre commented Jan 29, 2019

I think that we have some missing occurrences:
EDIT: sorry, I did not check out the right PR

I also think that we should change the common test:

def check_estimators_unfitted(name, estimator_orig):
    """Check that predict raises an exception in an unfitted estimator.

    Unfitted estimators should raise either AttributeError or ValueError.
    The specific exception type NotFittedError inherits from both and can
    therefore be adequately raised for that purpose.
    """
    # Common test for Regressors, Classifiers and Outlier detection estimators
    X, y = _boston_subset()
    estimator = clone(estimator_orig)
    msg = "fit"
    if hasattr(estimator, 'predict'):
        assert_raise_message((AttributeError, ValueError), msg,
                             estimator.predict, X)
    if hasattr(estimator, 'decision_function'):
        assert_raise_message((AttributeError, ValueError), msg,
                             estimator.decision_function, X)
    if hasattr(estimator, 'predict_proba'):
        assert_raise_message((AttributeError, ValueError), msg,
                             estimator.predict_proba, X)
    if hasattr(estimator, 'predict_log_proba'):
        assert_raise_message((AttributeError, ValueError), msg,
                             estimator.predict_log_proba, X)

with something like:

@ignore_warnings
def check_estimators_unfitted(name, estimator_orig):
    """Check that predict raises an exception in an unfitted estimator.

    Unfitted estimators should raise a NotFittedError.
    """

    # Common test for Regressors, Classifiers and Outlier detection estimators
    X, y = _boston_subset()

    estimator = clone(estimator_orig)

    msg = ("{} instance is not fitted yet. Call 'fit' with appropriate "
           "arguments".format(estimator.__class__.__name__))
    for method in ('decision_function', 'predict', 'predict_proba',
                   'predict_log_proba'):
        if getattr(estimator, method, None) is not None:
            assert_raises_regex(NotFittedError, msg,
                                getattr(estimator, method), X)

@glemaitre
Contributor

@glemaitre glemaitre commented Jan 29, 2019

@jnothman Do you know why we were checking AttributeError and ValueError instead of directly NotFittedError?
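For context, the answer hinges on NotFittedError's multiple inheritance. A local stand-in class (defined here only for illustration; it mirrors how sklearn.exceptions declares NotFittedError as subclassing both ValueError and AttributeError) shows why catching either base also catches the specific error:

```python
# Stand-in mirroring sklearn.exceptions.NotFittedError's class declaration.
class NotFittedError(ValueError, AttributeError):
    pass

assert issubclass(NotFittedError, ValueError)
assert issubclass(NotFittedError, AttributeError)

# A test asserting (AttributeError, ValueError) therefore already passes
# when the estimator raises NotFittedError; asserting NotFittedError
# directly is simply the stricter check.
try:
    raise NotFittedError("not fitted")
except (AttributeError, ValueError) as exc:
    caught = type(exc).__name__
print(caught)  # NotFittedError
```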

Contributor

@glemaitre glemaitre left a comment

I will check if we don't have redundant tests but you can already address those comments.


Review comment on sklearn/cluster/birch.py (outdated; resolved)
Review comment on sklearn/decomposition/online_lda.py (outdated; resolved)
Review comment on sklearn/decomposition/tests/test_online_lda.py (outdated; resolved)
Review comment on sklearn/decomposition/tests/test_online_lda.py (outdated; resolved)
Review comment on sklearn/exceptions.py (outdated; resolved)
Review comment on sklearn/exceptions.py (outdated; resolved)
Review comment on sklearn/linear_model/logistic.py (outdated; resolved)
@glemaitre glemaitre self-requested a review Jan 29, 2019
@glemaitre
Contributor

@glemaitre glemaitre commented Jan 29, 2019

Regarding the tests, I would propose to change the following:

def check_unfitted_feature_importances(name):
    assert_raises(ValueError, getattr,
                  FOREST_ESTIMATORS[name](random_state=0),
                  "feature_importances_")

@pytest.mark.parametrize('name', FOREST_ESTIMATORS)
def test_unfitted_feature_importances(name):
    check_unfitted_feature_importances(name)

with something like:

@pytest.mark.parametrize('name', FOREST_ESTIMATORS)
def test_unfitted_feature_importances(name):
    err_msg = ("This {} instance is not fitted yet. Call 'fit' with "
               "appropriate arguments before using this method.".format(name))
    with pytest.raises(NotFittedError, match=err_msg):
        getattr(FOREST_ESTIMATORS[name](), 'feature_importances_')

@glemaitre
Contributor

@glemaitre glemaitre commented Jan 29, 2019

Even if we don't touch a public API, I would add an entry in the what's new since we modified the tests and error message.

@agamemnonc
Contributor Author

@agamemnonc agamemnonc commented Jan 30, 2019

I think that we have some missing occurrences:
EDIT: sorry, I did not check out the right PR

I also think that we should change the common test:

scikit-learn/sklearn/utils/estimator_checks.py, lines 1586 to 1616 at fdf2f38:

def check_estimators_unfitted(name, estimator_orig):
    """Check that predict raises an exception in an unfitted estimator.

    Unfitted estimators should raise either AttributeError or ValueError.
    The specific exception type NotFittedError inherits from both and can
    therefore be adequately raised for that purpose.
    """
    # Common test for Regressors, Classifiers and Outlier detection estimators
    X, y = _boston_subset()

    estimator = clone(estimator_orig)

    msg = "fit"

    if hasattr(estimator, 'predict'):
        assert_raise_message((AttributeError, ValueError), msg,
                             estimator.predict, X)

    if hasattr(estimator, 'decision_function'):
        assert_raise_message((AttributeError, ValueError), msg,
                             estimator.decision_function, X)

    if hasattr(estimator, 'predict_proba'):
        assert_raise_message((AttributeError, ValueError), msg,
                             estimator.predict_proba, X)

    if hasattr(estimator, 'predict_log_proba'):
        assert_raise_message((AttributeError, ValueError), msg,
                             estimator.predict_log_proba, X)

with something like:

@ignore_warnings
def check_estimators_unfitted(name, estimator_orig):
    """Check that predict raises an exception in an unfitted estimator.

    Unfitted estimators should raise a NotFittedError.
    """

    # Common test for Regressors, Classifiers and Outlier detection estimators
    X, y = _boston_subset()

    estimator = clone(estimator_orig)

    msg = ("{} instance is not fitted yet. Call 'fit' with appropriate "
           "arguments".format(estimator.__class__.__name__))
    for method in ('decision_function', 'predict', 'predict_proba',
                   'predict_log_proba'):
        if getattr(estimator, method, None) is not None:
            assert_raises_regex(NotFittedError, msg,
                                getattr(estimator, method), X)

This is now fixed, thanks!

@glemaitre
Contributor

@glemaitre glemaitre commented Feb 7, 2019

You can use getattr(estimator, method, None) is not None to get a default of None.
It avoids the try ... except.
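The idiom being suggested, in a tiny stand-alone example (ClassifierLike is hypothetical, defined here only to have something to probe):

```python
# getattr with a default avoids wrapping attribute access in
# try ... except AttributeError.

class ClassifierLike:
    def predict(self, X):
        return [0 for _ in X]
    # intentionally no decision_function or predict_proba

est = ClassifierLike()

present = [m for m in ('decision_function', 'predict', 'predict_proba')
           if getattr(est, m, None) is not None]
print(present)  # ['predict']
```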

@agamemnonc
Contributor Author

@agamemnonc agamemnonc commented Feb 8, 2019

Yes, that's what the code currently looks like:

msg = ("{} instance is not fitted yet. Call 'fit' with appropriate "
       "arguments".format(estimator.__class__.__name__))

for method in ('decision_function', 'predict', 'predict_proba',
               'predict_log_proba'):
    if getattr(estimator, method, None) is not None:
        assert_raises_regex(NotFittedError, msg,
                            getattr(estimator, method), X)

The problem is I am not too sure how to include the deprecation here, i.e. cover the case where an Attribute or ValueError is raised instead with the previous error message ("fit"), in order to allow that and issue a DeprecationWarning.

@glemaitre
Contributor

@glemaitre glemaitre commented Feb 8, 2019

Oh, I see why you wanted a try/except then. Basically use pytest.raises and check https://docs.pytest.org/en/latest/assert.html

I think something around:

with pytest.raises(NotFittedError) as excinfo:
    getattr(estimator, method)(X)

if msg not in str(excinfo.value) and 'fit' in str(excinfo.value):
    # old-style message: issue a deprecation warning
    ...
else:
    assert 'fit' in str(excinfo.value)

I wrote this pretty quickly, so it might still be buggy.

@jnothman
Member

@jnothman jnothman commented Mar 12, 2019

Please resolve conflicts with master.

Member

@jnothman jnothman left a comment

I appreciate the consistent use of check_is_fitted within the library, but I'm not entirely sure we should be forcing all library developers to use the exact same error message. The default message does not, for instance, mention partial_fit.


Review comment on sklearn/utils/estimator_checks.py (outdated; resolved)
Review comment on doc/whats_new/v0.21.rst (outdated; resolved)
Review comment on doc/whats_new/v0.21.rst (outdated; resolved)
Review comment on doc/whats_new/v0.21.rst (outdated; resolved)
@jnothman
Member

@jnothman jnothman commented May 1, 2019

When you get around to resolving my comments, please also move your change log entry to v0.22.rst as version 0.21 has been released.

@glemaitre glemaitre self-assigned this May 2, 2019
@glemaitre
Contributor

@glemaitre glemaitre commented May 2, 2019

@jnothman GaussianProcessRegressor can work without calling fit. I added a tag requires_fit but I am not sure if it would be something that we want.

WDYT?

@jnothman
Member

@jnothman jnothman commented May 2, 2019


Review comment on sklearn/utils/estimator_checks.py (outdated; resolved)
@glemaitre
Contributor

@glemaitre glemaitre commented May 3, 2019

I think we have explicitly tried to support GPs without fit. Otherwise you upset all the Bayesians.

And do you think that having the tag requires_fit (default to True) is a good idea?
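A minimal sketch of how such a tag could work, assuming scikit-learn's convention of tag defaults plus per-estimator overrides via a _more_tags method (class names here are illustrative, not the PR's actual code):

```python
# Sketch of an estimator tag mechanism: a base class supplies defaults,
# subclasses override via _more_tags(). Not sklearn's implementation.

_DEFAULT_TAGS = {'requires_fit': True}

class BaseEstimator:
    def _more_tags(self):
        return {}

    def _get_tags(self):
        tags = dict(_DEFAULT_TAGS)
        tags.update(self._more_tags())
        return tags

class GaussianProcessLike(BaseEstimator):
    # GPs can produce prior predictions without fit, so opt out of the check.
    def _more_tags(self):
        return {'requires_fit': False}

print(BaseEstimator()._get_tags()['requires_fit'])        # True
print(GaussianProcessLike()._get_tags()['requires_fit'])  # False
```

A common test such as check_estimators_unfitted could then skip any estimator whose tags report requires_fit as False.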

@glemaitre
Contributor

@glemaitre glemaitre commented May 3, 2019

@jnothman could you have another look? Apart from the tag, I think the PR is good to be merged.

Member

@NicolasHug NicolasHug left a comment

A few nitpicks.

I think the introduction of the estimator tag is appropriate here.

I'm not pressing "Approve" because TBH I'm not entirely sure what should be done regarding the deprecation, but I'm tending towards LGTM.


Review comment on sklearn/utils/estimator_checks.py (outdated; resolved)
Review comment on doc/whats_new/v0.22.rst (outdated; resolved)
Review comment on doc/developers/contributing.rst (outdated; resolved)
Review comment on doc/developers/contributing.rst (outdated; resolved)
@agamemnonc
Contributor Author

@agamemnonc agamemnonc commented May 6, 2019

Thanks @NicolasHug for reviewing.

I think I have now addressed the issues you raised, otherwise please let me know.

Member

@jnothman jnothman left a comment

Thanks @agamemnonc!

@jnothman jnothman merged commit 19192c0 into scikit-learn:master May 7, 2019
16 of 17 checks passed
@agamemnonc agamemnonc deleted the check_is_fitted_replacements branch May 7, 2019
@glemaitre
Contributor

@glemaitre glemaitre commented May 7, 2019

Thanks @agamemnonc

@agamemnonc
Contributor Author

@agamemnonc agamemnonc commented May 7, 2019

Thanks @agamemnonc

Thank you @glemaitre for your input and everyone else for reviewing.
