[MRG+1] MAINT Replace manual checks with `check_is_fitted` #13013

Merged
merged 36 commits into scikit-learn:master from agamemnonc:check_is_fitted_replacements on May 7, 2019

Conversation

@agamemnonc
Contributor

commented Jan 18, 2019

Reference Issues/PRs

Fixes #12991.

What does this implement/fix? Explain your changes.

Replaces manual checks with check_is_fitted utility function in various places.
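
To illustrate the kind of replacement described above, here is a minimal sketch. It does not import scikit-learn: `NotFittedError` and `check_is_fitted` below are simplified stand-ins written for this example, and `MeanRegressor` with its `predict_manual` method is a hypothetical toy estimator, not code from this PR.

```python
class NotFittedError(ValueError, AttributeError):
    """Stand-in for sklearn.exceptions.NotFittedError (assumption)."""


def check_is_fitted(estimator, attributes):
    """Simplified stand-in: raise NotFittedError if an attribute is missing."""
    if isinstance(attributes, str):
        attributes = [attributes]
    if not all(hasattr(estimator, attr) for attr in attributes):
        raise NotFittedError(
            "This {} instance is not fitted yet. Call 'fit' with appropriate "
            "arguments before using this method.".format(
                type(estimator).__name__))


class MeanRegressor:
    """Hypothetical toy estimator that predicts the training mean."""

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict_manual(self, X):
        # Before: an ad hoc check with its own exception type and message.
        if not hasattr(self, "mean_"):
            raise ValueError("Estimator not fitted, call `fit` first.")
        return [self.mean_ for _ in X]

    def predict(self, X):
        # After: one shared helper, one exception type, one message.
        check_is_fitted(self, "mean_")
        return [self.mean_ for _ in X]
```

The point of the shared helper is that every estimator fails in the same way when used before fitting, so a single common test can cover them all.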

Any other comments?

All modified files have been checked with flake8 and autopep8 and any formatting issues have been addressed.

@adrinjalali

Member

commented Jan 18, 2019

Although there are quite a few changes that are PEP8-related and not directly related to this PR, LGTM if tests pass.

agamemnonc added 2 commits Jan 18, 2019
@agamemnonc

Contributor Author

commented Jan 21, 2019

Although there are quite a few changes that are PEP8-related and not directly related to this PR, LGTM if tests pass.

Thanks. Indeed, as per the description above, I addressed all formatting issues in the modified files so that autopep8/flake8 checks passed before submitting the PR.

@jnothman
Member

left a comment

The core parts of this PR seem okay, when I can find them

sklearn/ensemble/forest.py
sklearn/ensemble/forest.py
sklearn/exceptions.py
@agamemnonc

Contributor Author

commented Jan 21, 2019

The core parts of this PR seem okay, when I can find them

Thanks for reviewing,

OK, I could revert the formatting changes if that would be preferred (your first two comments above)?
Regarding the modification in exceptions.py please see my response above.

@glemaitre

Contributor

commented Jan 28, 2019

@agamemnonc Could you revert the style changes? I can review the PR then.

agamemnonc added 2 commits Jan 29, 2019
@agamemnonc

Contributor Author

commented Jan 29, 2019

@agamemnonc Could you revert the style changes? I can review the PR then.

OK, done; @glemaitre please review.

Of course, there are now some flake8 warnings on the modified files (mostly due to long lines).

@glemaitre

Contributor

commented Jan 29, 2019

I think that we have some missing occurrences:
EDIT: sorry, I had not checked out the right PR

I also think that we should change the common test:

def check_estimators_unfitted(name, estimator_orig):
    """Check that predict raises an exception in an unfitted estimator.

    Unfitted estimators should raise either AttributeError or ValueError.
    The specific exception type NotFittedError inherits from both and can
    therefore be adequately raised for that purpose.
    """

    # Common test for Regressors, Classifiers and Outlier detection estimators
    X, y = _boston_subset()

    estimator = clone(estimator_orig)

    msg = "fit"

    if hasattr(estimator, 'predict'):
        assert_raise_message((AttributeError, ValueError), msg,
                             estimator.predict, X)

    if hasattr(estimator, 'decision_function'):
        assert_raise_message((AttributeError, ValueError), msg,
                             estimator.decision_function, X)

    if hasattr(estimator, 'predict_proba'):
        assert_raise_message((AttributeError, ValueError), msg,
                             estimator.predict_proba, X)

    if hasattr(estimator, 'predict_log_proba'):
        assert_raise_message((AttributeError, ValueError), msg,
                             estimator.predict_log_proba, X)

with something like:

@ignore_warnings
def check_estimators_unfitted(name, estimator_orig):
    """Check that predict raises an exception in an unfitted estimator.

    Unfitted estimators should raise a NotFittedError.
    """

    # Common test for Regressors, Classifiers and Outlier detection estimators
    X, y = _boston_subset()

    estimator = clone(estimator_orig)

    msg = ("{} instance is not fitted yet. Call 'fit' with appropriate "
           "arguments".format(estimator.__class__.__name__))
    for method in ('decision_function', 'predict', 'predict_proba',
                   'predict_log_proba'):
        if getattr(estimator, method, None) is not None:
            assert_raises_regex(NotFittedError, msg,
                                getattr(estimator, method), X)
@glemaitre

Contributor

commented Jan 29, 2019

@jnothman Do you know why we were checking AttributeError and ValueError instead of directly NotFittedError?
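
For context on the question above: because `NotFittedError` inherits from both `ValueError` and `AttributeError` (as the old docstring says), the looser check accepts it, while also accepting plain `ValueError` or `AttributeError` from estimators that never adopted the dedicated exception. The sketch below demonstrates only the inheritance behaviour, using a stand-in class rather than the real one; `call_unfitted` is a hypothetical helper.

```python
class NotFittedError(ValueError, AttributeError):
    """Stand-in mirroring the inheritance described in the docstring."""


def call_unfitted():
    # Hypothetical helper that fails the way an unfitted estimator would.
    raise NotFittedError("This estimator is not fitted yet. Call 'fit' first.")


# The same exception is caught by handlers for either parent class.
caught_by = []
for exc_type in (ValueError, AttributeError, NotFittedError):
    try:
        call_unfitted()
    except exc_type:
        caught_by.append(exc_type.__name__)

# The reverse does not hold: a plain ValueError is not a NotFittedError,
# so a test that requires NotFittedError is strictly tighter.
plain_is_not_fitted_error = isinstance(ValueError("fit"), NotFittedError)
```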

@glemaitre
Contributor

left a comment

I will check if we don't have redundant tests but you can already address those comments.

sklearn/cluster/birch.py
sklearn/decomposition/online_lda.py
sklearn/decomposition/tests/test_online_lda.py
sklearn/decomposition/tests/test_online_lda.py
sklearn/exceptions.py
sklearn/exceptions.py
sklearn/linear_model/logistic.py

@glemaitre glemaitre self-requested a review Jan 29, 2019

@glemaitre

Contributor

commented Jan 29, 2019

Regarding the tests, I would propose to change the following:

def check_unfitted_feature_importances(name):
    assert_raises(ValueError, getattr, FOREST_ESTIMATORS[name](random_state=0),
                  "feature_importances_")


@pytest.mark.parametrize('name', FOREST_ESTIMATORS)
def test_unfitted_feature_importances(name):
    check_unfitted_feature_importances(name)

to:

@pytest.mark.parametrize('name', FOREST_ESTIMATORS)
def test_unfitted_feature_importances(name):
    # `match` is applied as a regular expression to the error message
    err_msg = ("This {} instance is not fitted yet. Call 'fit' with "
               "appropriate arguments before using this method.".format(name))
    with pytest.raises(NotFittedError, match=err_msg):
        getattr(FOREST_ESTIMATORS[name](), 'feature_importances_')
@glemaitre

Contributor

commented Jan 29, 2019

Even if we don't touch a public API, I would add an entry in the what's new since we modified the tests and error message.

@agamemnonc

Contributor Author

commented Jan 30, 2019

I think that we have some missing occurrences:
EDIT: sorry, I had not checked out the right PR

I also think that we should change the common test:

scikit-learn/sklearn/utils/estimator_checks.py

Lines 1586 to 1616 in fdf2f38

def check_estimators_unfitted(name, estimator_orig):
    """Check that predict raises an exception in an unfitted estimator.

    Unfitted estimators should raise either AttributeError or ValueError.
    The specific exception type NotFittedError inherits from both and can
    therefore be adequately raised for that purpose.
    """

    # Common test for Regressors, Classifiers and Outlier detection estimators
    X, y = _boston_subset()

    estimator = clone(estimator_orig)

    msg = "fit"

    if hasattr(estimator, 'predict'):
        assert_raise_message((AttributeError, ValueError), msg,
                             estimator.predict, X)

    if hasattr(estimator, 'decision_function'):
        assert_raise_message((AttributeError, ValueError), msg,
                             estimator.decision_function, X)

    if hasattr(estimator, 'predict_proba'):
        assert_raise_message((AttributeError, ValueError), msg,
                             estimator.predict_proba, X)

    if hasattr(estimator, 'predict_log_proba'):
        assert_raise_message((AttributeError, ValueError), msg,
                             estimator.predict_log_proba, X)

with something like:

@ignore_warnings
def check_estimators_unfitted(name, estimator_orig):
    """Check that predict raises an exception in an unfitted estimator.

    Unfitted estimators should raise a NotFittedError.
    """

    # Common test for Regressors, Classifiers and Outlier detection estimators
    X, y = _boston_subset()

    estimator = clone(estimator_orig)

    msg = ("{} instance is not fitted yet. Call 'fit' with appropriate "
           "arguments".format(estimator.__class__.__name__))
    for method in ('decision_function', 'predict', 'predict_proba',
                   'predict_log_proba'):
        if getattr(estimator, method, None) is not None:
            assert_raises_regex(NotFittedError, msg,
                                getattr(estimator, method), X)

This is now fixed, thanks!

@agamemnonc

Contributor Author

commented Feb 8, 2019

Yes, that's what the code currently looks like:

msg = ("{} instance is not fitted yet. Call 'fit' with appropriate "
       "arguments".format(estimator.__class__.__name__))

for method in ('decision_function', 'predict', 'predict_proba',
               'predict_log_proba'):
    if getattr(estimator, method, None) is not None:
        assert_raises_regex(NotFittedError, msg,
                            getattr(estimator, method), X)

The problem is that I am not sure how to include the deprecation here, i.e. how to cover the case where an AttributeError or ValueError is raised with the previous error message ("fit"), so that this case is still allowed but issues a DeprecationWarning.

@glemaitre

Contributor

commented Feb 8, 2019

Oh I see why you wanted a try except then. Basically use pytest.raises and check https://docs.pytest.org/en/latest/assert.html

I think something around:

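with pytest.raises(NotFittedError) as excinfo:
    getattr(estimator, method)(X)

if msg not in str(excinfo.value) and 'fit' in str(excinfo.value):
    # non-standard message: raise a deprecation warning (placeholder text)
    warnings.warn("non-standard not-fitted message", DeprecationWarning)
else:
    assert 'fit' in str(excinfo.value)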

I wrote this pretty quickly. That might be buggy
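
The idea discussed in this exchange (accept the old-style errors for now, but warn about them) could be sketched without pytest roughly as follows. This is only an illustration under stated assumptions: `NotFittedError` is a stand-in class, `check_unfitted_call`, `Modern` and `Legacy` are hypothetical names, and the warning text is invented for the example. It is not the code that was merged.

```python
import warnings


class NotFittedError(ValueError, AttributeError):
    """Stand-in for sklearn.exceptions.NotFittedError (assumption)."""


def check_unfitted_call(estimator, method, X):
    """Hypothetical helper: accept legacy errors but warn about them."""
    msg = ("{} instance is not fitted yet. Call 'fit' with appropriate "
           "arguments".format(type(estimator).__name__))
    try:
        getattr(estimator, method)(X)
    except (AttributeError, ValueError) as exc:
        text = str(exc)
        if isinstance(exc, NotFittedError) and msg in text:
            return  # new-style error with the standard message
        # Legacy path: any AttributeError/ValueError that mentions "fit".
        assert "fit" in text, "error message does not mention 'fit'"
        warnings.warn(
            "{}.{} should raise NotFittedError with the standard message"
            .format(type(estimator).__name__, method),
            DeprecationWarning,
        )
        return
    raise AssertionError("{} did not raise on an unfitted estimator"
                         .format(method))


class Modern:
    """Toy estimator raising the new-style error."""

    def predict(self, X):
        raise NotFittedError(
            "This Modern instance is not fitted yet. Call 'fit' with "
            "appropriate arguments before using this method.")


class Legacy:
    """Toy estimator raising an old-style ValueError."""

    def predict(self, X):
        raise ValueError("Please call fit before predict.")
```

Catching the two parent classes rather than `NotFittedError` alone is what lets the legacy case reach the warning branch; `pytest.raises(NotFittedError)` by itself would fail on a plain `ValueError`.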

@jnothman

Member

commented Mar 12, 2019

Please resolve conflicts with master.

@jnothman
Member

left a comment

I appreciate the consistent use of check_is_fitted within the library, but I'm not entirely sure we should be forcing all library developers to use the exact same error message. The default message does not, for instance, mention partial_fit.
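
As an illustration of the concern about `partial_fit`, the sketch below shows an estimator passing its own message to a fitted check. `check_is_fitted` here is a simplified stand-in that takes a `msg` template with a `%(name)s` placeholder; `OnlineCounter` and its message text are hypothetical and exist only for this example.

```python
class NotFittedError(ValueError, AttributeError):
    """Stand-in for sklearn.exceptions.NotFittedError (assumption)."""


def check_is_fitted(estimator, attributes, msg=None):
    """Simplified stand-in accepting an optional custom message template."""
    if msg is None:
        msg = ("This %(name)s instance is not fitted yet. Call 'fit' with "
               "appropriate arguments before using this method.")
    if isinstance(attributes, str):
        attributes = [attributes]
    if not all(hasattr(estimator, attr) for attr in attributes):
        raise NotFittedError(msg % {"name": type(estimator).__name__})


class OnlineCounter:
    """Hypothetical incremental estimator that only offers partial_fit."""

    def partial_fit(self, X):
        self.n_seen_ = getattr(self, "n_seen_", 0) + len(X)
        return self

    def transform(self, X):
        # A custom message can point users at partial_fit as well.
        check_is_fitted(
            self, "n_seen_",
            msg=("This %(name)s instance is not fitted yet. Call 'fit' or "
                 "'partial_fit' before using this method."))
        return [self.n_seen_ for _ in X]
```

A common test that matches the full default message would reject this estimator, whereas one that only requires `NotFittedError` and the word "fit" would accept it.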

sklearn/utils/estimator_checks.py
doc/whats_new/v0.21.rst
doc/whats_new/v0.21.rst
doc/whats_new/v0.21.rst
@jnothman

Member

commented May 1, 2019

When you get around to resolving my comments, please also move your change log entry to v0.22.rst as version 0.21 has been released.

@glemaitre glemaitre self-assigned this May 2, 2019

@glemaitre

Contributor

commented May 2, 2019

@jnothman GaussianProcessRegressor can work without calling fit. I added a tag requires_fit but I am not sure if it would be something that we want.

WDYT?

@jnothman

Member

commented May 2, 2019

@glemaitre

Contributor

commented May 3, 2019

I think we have explicitly tried to support GPs without fit. Otherwise you upset all the bayesians.

And do you think that having the tag requires_fit (default to True) is a good idea?
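
To make the `requires_fit` proposal concrete, here is a rough sketch of how a common check might consult such a tag and skip estimators that can legitimately predict without fitting. All names are illustrative assumptions: `get_tags`, `check_estimator_unfitted`, `NeedsFit` and `PriorOnly` are hypothetical, and the `_more_tags` hook is only mimicked here, not imported from scikit-learn.

```python
class NotFittedError(ValueError, AttributeError):
    """Stand-in for sklearn.exceptions.NotFittedError (assumption)."""


DEFAULT_TAGS = {"requires_fit": True}


def get_tags(estimator):
    """Hypothetical helper: merge default tags with estimator overrides."""
    tags = dict(DEFAULT_TAGS)
    more_tags = getattr(estimator, "_more_tags", None)
    if more_tags is not None:
        tags.update(more_tags())
    return tags


class NeedsFit:
    """Toy estimator that must be fitted before predicting."""

    def fit(self, X):
        self.fitted_ = True
        return self

    def predict(self, X):
        if not hasattr(self, "fitted_"):
            raise NotFittedError(
                "This NeedsFit instance is not fitted yet. Call 'fit' with "
                "appropriate arguments before using this method.")
        return [0 for _ in X]


class PriorOnly:
    """Toy estimator that can predict from a prior without fitting."""

    def _more_tags(self):
        return {"requires_fit": False}

    def predict(self, X):
        return [0.0 for _ in X]


def check_estimator_unfitted(estimator, X):
    """Return 'skipped' or 'passed'; raise AssertionError on failure."""
    if not get_tags(estimator)["requires_fit"]:
        return "skipped"
    try:
        estimator.predict(X)
    except NotFittedError:
        return "passed"
    raise AssertionError("predict did not raise NotFittedError")
```

Defaulting the tag to True keeps the strict check for every estimator that does not opt out explicitly.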

@glemaitre

Contributor

commented May 3, 2019

@jnothman could you have another look? Apart from the tag, I think the PR is ready to be merged.

@NicolasHug
Contributor

left a comment

A few nitpicks.

I think the introduction of the estimator tag is appropriate here.

I'm not pressing "Approve" because TBH I'm not entirely sure what should be done regarding the deprecation, but I'm tending towards LGTM.

sklearn/utils/estimator_checks.py
doc/whats_new/v0.22.rst
doc/developers/contributing.rst
doc/developers/contributing.rst
@agamemnonc

Contributor Author

commented May 6, 2019

Thanks @NicolasHug for reviewing.

I think I have now addressed the issues you raised, otherwise please let me know.

@jnothman
Member

left a comment

Thanks @agamemnonc!

@jnothman jnothman merged commit 19192c0 into scikit-learn:master May 7, 2019

16 of 17 checks passed

scikit-learn.scikit-learn (macOS pylatest_conda)
LGTM analysis: C/C++ No code changes detected
LGTM analysis: JavaScript No code changes detected
LGTM analysis: Python No new or fixed alerts
ci/circleci: deploy Your tests passed on CircleCI!
ci/circleci: doc Your tests passed on CircleCI!
ci/circleci: doc-min-dependencies Your tests passed on CircleCI!
ci/circleci: lint Your tests passed on CircleCI!
codecov/patch 100% of diff hit (target 95.52%)
codecov/project 96.07% (+0.54%) compared to 28728f5
scikit-learn.scikit-learn Build #20190506.15 succeeded
scikit-learn.scikit-learn (Linux py35_conda_openblas) Linux py35_conda_openblas succeeded
scikit-learn.scikit-learn (Linux py35_np_atlas) Linux py35_np_atlas succeeded
scikit-learn.scikit-learn (Linux pylatest_conda) Linux pylatest_conda succeeded
scikit-learn.scikit-learn (Windows py35_32) Windows py35_32 succeeded
scikit-learn.scikit-learn (Windows py37_64) Windows py37_64 succeeded
scikit-learn.scikit-learn (macOS pylatest_conda) macOS pylatest_conda succeeded

@agamemnonc agamemnonc deleted the agamemnonc:check_is_fitted_replacements branch May 7, 2019

@glemaitre

Contributor

commented May 7, 2019

Thanks @agamemnonc

@agamemnonc

Contributor Author

commented May 7, 2019

Thanks @agamemnonc

Thank you @glemaitre for your input and everyone else for reviewing.
