BUG Adds attributes back to check_is_fitted #15947

Merged
merged 14 commits into from Dec 27, 2019

@thomasjpfan
Member

thomasjpfan commented Dec 21, 2019

Reference Issues/PRs

Fixes #15845
Alternative to #15885

What does this implement/fix? Explain your changes.

Implements @jnothman's suggestion: #15845 (comment)

Any other comments?

This PR adds attributes back in as an optional keyword to check_is_fitted.

thomasjpfan added 3 commits Dec 21, 2019
Member

jnothman left a comment

Please note in the docs that this utility is for use within the estimator, not within meta-estimators. Please also confirm that this is in accordance with our current usage. We can make it stricter in a later version. Thanks

@ogrisel

Member

ogrisel commented Dec 23, 2019

Can you please leverage this change to make TfidfVectorizer work as expected (w.r.t. #15845 (comment))?

ogrisel added 2 commits Dec 24, 2019
… introspection
@glemaitre

Contributor

glemaitre commented Dec 24, 2019

It could be good to add a what's new entry to inform that we reverted the behavior?

ogrisel added 2 commits Dec 24, 2019
@ogrisel

Member

ogrisel commented Dec 24, 2019

@glemaitre @jnothman I addressed your comments about doc changes (whatsnew and scope of function) in 129d261.

ogrisel added 2 commits Dec 24, 2019
Contributor

glemaitre left a comment

LGTM

@ogrisel ogrisel requested a review from jnothman Dec 24, 2019
    attrs = all([hasattr(estimator, attr) for attr in attributes])
else:
    attrs = [v for v in vars(estimator)
             if v.endswith("_") and not v.startswith("__")]
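
The two branches in this excerpt can be read as follows. This is a minimal, simplified sketch and not the scikit-learn implementation: the helper `looks_fitted` and the class `Toy` are hypothetical names, and the sketch returns a boolean where the real utility raises `NotFittedError`.

```python
def looks_fitted(estimator, attributes=None):
    """Simplified stand-in for the check discussed here (hypothetical helper)."""
    if attributes is not None:
        # Explicit markers: hasattr also sees properties and class attributes.
        if isinstance(attributes, str):
            attributes = [attributes]
        return all(hasattr(estimator, attr) for attr in attributes)
    # Fallback: scan the instance __dict__ for names such as "coef_".
    fitted = [v for v in vars(estimator)
              if v.endswith("_") and not v.startswith("__")]
    return len(fitted) > 0


class Toy:
    """Hypothetical estimator used only for this illustration."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self):
        self.coef_ = [self.alpha]
        return self


print(looks_fitted(Toy()))                      # False
print(looks_fitted(Toy().fit()))                # True
print(looks_fitted(Toy().fit(), ["coef_"]))     # True
print(looks_fitted(Toy().fit(), ["missing_"]))  # False
```

With explicit names the caller decides what counts as fitted; without them, any instance attribute with a trailing underscore is taken as a marker.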

@qinhanmin2014

qinhanmin2014 Dec 24, 2019

Member

why do we keep not v.startswith("__")?

@ogrisel

ogrisel Dec 24, 2019

Member

Because all dunder attributes end with _ and are not fit parameters.

@ogrisel

ogrisel Dec 24, 2019

Member

And more generally, anything that starts with __ is weird / reserved in Python so scikit-learn attributes should never start with __.

@qinhanmin2014

qinhanmin2014 Dec 24, 2019

Member

> And more generally, anything that starts with __ is weird / reserved in Python so scikit-learn attributes should never start with __.

You're right, but perhaps it's better to remove it, because we do not forbid users from creating attributes that start with __.

@ogrisel

ogrisel Dec 24, 2019

Member

If we remove it we will have false positives.

@qinhanmin2014

qinhanmin2014 Dec 27, 2019

Member

perhaps a clearer solution is if v.endswith("_") and not v.endswith("__")? @jnothman

@rth

rth Dec 27, 2019

Member

+1 to keep excluding based on v.startswith("__"). I have seen users use private variables in __init__ in their estimators, it's not good, but we shouldn't count those as fit attributes either.

@qinhanmin2014

qinhanmin2014 Dec 27, 2019

Member

> I have seen users use private variables in __init__ in their estimators, it's not good, but we shouldn't count those as fit attributes either.

But @rth, we're talking about double underscores, not single underscores (private variables).

@qinhanmin2014

qinhanmin2014 Dec 27, 2019

Member

Anyway I guess this is not so important.

@rth

rth Dec 27, 2019

Member

> but @rth we're talking about double underscore, not single underscore (private variables).

I mean technically private variables as in self.__a, not very frequent but I have seen it. Anyway, yes, it's not too critical either way. I just wanted to merge this quickly to fix CI on scikit-learn-extra. We can always come back to these details at a later time if needed.
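
The distinction debated in this thread comes down to Python name mangling. The following is a small sketch in plain Python; the class `Demo` and the helper `fitted_names` are hypothetical, and reading the `__cache__` case as the kind of false positive mentioned above is an assumption, not something the thread states.

```python
class Demo:
    """Hypothetical class mixing the attribute styles discussed above."""

    def __init__(self):
        self.alpha = 1.0      # ordinary constructor parameter
        self.__hidden = 2     # name-mangled: stored as "_Demo__hidden"
        self.__cache__ = {}   # ends with "__", so it is stored unmangled

    def fit(self):
        self.coef_ = [0.5]    # fitted attribute by convention
        return self


def fitted_names(obj):
    """Names in vars(obj) that end with "_" and do not start with "__"."""
    return [v for v in vars(obj)
            if v.endswith("_") and not v.startswith("__")]


demo = Demo()
print(sorted(vars(demo)))  # ['_Demo__hidden', '__cache__', 'alpha']

# Without the startswith("__") filter, "__cache__" would look like a fitted attribute.
print([v for v in vars(demo) if v.endswith("_")])  # ['__cache__']

print(fitted_names(demo))        # []
print(fitted_names(demo.fit()))  # ['coef_']
```

Note that `self.__hidden` never appears under a name starting with `__` in the instance dictionary, so the filter matters mainly for names that both start and end with double underscores.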

@qinhanmin2014

Member

qinhanmin2014 commented Dec 24, 2019

And why do we remove all_or_any?

@ogrisel

Member

ogrisel commented Dec 24, 2019

Good point.
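
For context, `all_or_any` lets the caller choose whether every listed attribute must be present or whether one is enough. A minimal sketch of that idea, assuming a hypothetical helper `has_attributes` and a toy class `HalfFitted` (neither is scikit-learn code):

```python
def has_attributes(obj, attributes, all_or_any=all):
    """Combine per-attribute hasattr results with all() or any()."""
    return all_or_any([hasattr(obj, attr) for attr in attributes])


class HalfFitted:
    """Hypothetical object that sets only one of two possible markers."""

    def __init__(self):
        self.coef_ = [0.0]


obj = HalfFitted()
print(has_attributes(obj, ["coef_", "intercept_"]))                  # False
print(has_attributes(obj, ["coef_", "intercept_"], all_or_any=any))  # True
```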

ogrisel added 2 commits Dec 24, 2019
Member

qinhanmin2014 left a comment

perhaps this is a good short-term solution
I still think that we should remove v.startswith("__"). (where is the false positive?)

@@ -60,6 +60,13 @@ Changelog
- |Fix| :func:`utils.check_array` now correctly converts pandas DataFrame with
boolean columns to floats. :pr:`15797` by `Thomas Fan`_.

- |Fix| :func:`utils.check_is_fitted` accepts back an explicit ``attributes``
argument to check for specific attributes as explicit markers of a fitted

@qinhanmin2014

qinhanmin2014 Dec 26, 2019

Member

attributes and all_or_any

@rth

rth Dec 27, 2019

Member

The attributes argument to check for specific attributes.

I think that's correct syntax highlighting, there should be no need to write each occurrence of attributes as attribute.

@rth

rth Dec 27, 2019

Member

OK, I see your point, added a mention about all_or_any to what's new.

@jnothman jnothman added this to the 0.22.1 milestone Dec 27, 2019
Member

jnothman left a comment

I think this patch is what we've decided upon.

I must admit that this is not very comfortable. The requirement that a fitted attribute ending _ must be stored in __dict__, and not provided through a property, for instance, is weird.
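
The constraint can be illustrated with plain Python, independent of scikit-learn. In this sketch (the class `LazyModel` and its attribute names are hypothetical), a fitted value exposed through a property is visible to `hasattr` but not to a scan of `vars(...)`:

```python
class LazyModel:
    """Hypothetical estimator exposing coef_ through a property."""

    def fit(self):
        self._raw = [1.0, 2.0]  # stored under a name without a trailing underscore
        return self

    @property
    def coef_(self):
        return self._raw


model = LazyModel().fit()
scanned = [v for v in vars(model)
           if v.endswith("_") and not v.startswith("__")]

print(hasattr(model, "coef_"))  # True
print(scanned)                  # []
```

Under these assumptions, naming the attribute explicitly is one way around the limitation, since a `hasattr`-based check does see the property.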

"""Perform is_fitted validation for estimator.
Checks if the estimator is fitted by verifying the presence of
fitted attributes (ending with a trailing underscore) and otherwise
raises a NotFittedError with the given message.
This utility is meant to be used internally by estimators them-selves,

@jnothman

jnothman Dec 27, 2019

Member
Suggested change
This utility is meant to be used internally by estimators them-selves,
This utility is meant to be used internally by estimators themselves,
rth added 3 commits Dec 27, 2019
@rth
rth approved these changes Dec 27, 2019
Member

rth left a comment

I pushed a few commits to address last comments about documentation/what's new. LGTM now.

It would be good to merge this soon because, among other things, mistakenly removing the attributes parameter from check_is_fitted altogether on master breaks CI in scikit-learn-extra (and I imagine any other contrib project that tests nightly wheels).

@rth

Member

rth commented Dec 27, 2019

OK, I think all comments are addressed. Merging, thanks!

@rth rth changed the title [MRG] BUG Adds attributes back to check_is_fitted BUG Adds attributes back to check_is_fitted Dec 27, 2019
@rth rth merged commit 9accce5 into scikit-learn:master Dec 27, 2019
21 checks passed
@alegonz


alegonz commented Dec 27, 2019

Thank you everyone!

- |Fix| :func:`utils.check_is_fitted` accepts back an explicit ``attributes``
argument to check for specific attributes as explicit markers of a fitted
estimator. When no explicit ``attributes`` are provided, only the attributes
ending with a single "_" are used as "fitted" markers. The ``all_or_any``

@qinhanmin2014

qinhanmin2014 Dec 27, 2019

Member

This should say that the attribute ends with an underscore and does not start with a double underscore.
I'll push directly.

ogrisel added a commit to ogrisel/scikit-learn that referenced this pull request Dec 31, 2019
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
ogrisel added a commit to ogrisel/scikit-learn that referenced this pull request Jan 2, 2020
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
ogrisel added a commit that referenced this pull request Jan 2, 2020
* DOC fixed default values in dbscan (#15753)

* DOC fix incorrect branch reference in contributing doc (#15779)

* DOC relabel Feature -> Efficiency in change log (#15770)

* DOC fixed Birch default value (#15780)

* STY Minior change on code padding in website theme (#15768)

* DOC Fix yticklabels order in permutation importances example (#15799)

* Fix yticklabels order in permutation importances example

* STY Update wrapper width (#15793)

* DOC Long sentence was hard to parse and ambiguous in _classification.py (#15769)

* DOC Removed duplicate 'classes_' attribute in Naive Bayes classifiers (#15811)

* BUG Fixes pandas dataframe bug with boolean dtypes (#15797)

* BUG Returns only public estimators in all_estimators (#15380)

* DOC improve doc for multiclass and types_of_target (#15333)

* TST Increases tol for check_pca_float_dtype_preservation assertion (#15775)

* update _alpha_grid class in _coordinate_descent.py (#15835)

* FIX Explicit conversion of ndarray to object dtype. (#15832)

* BLD Parallelize sphinx builds on circle ci (#15745)

* DOC correct url for preprocessing (#15853)

* MNT avoid generating too many cross links in examples (#15844)

* DOC Correct wrong doc in precision_recall_fscore_support (#15833)

* DOC add comment in check_pca_float_dtype_preservation (#15819)

Documenting the changes in #15775

* DOC correct indents in docstring _split.py (#15843)

* DOC fix docstring of KMeans based on sklearn guideline (#15754)

* DOC fix docstring of AgglomerativeClustering based on sklearn guideline (#15764)

* DOC fix docstring of AffinityPropagation based on sklearn guideline (#15777)

* DOC fixed SpectralCoclustering and SpectralBiclustering docstrings following sklearn guideline (#15778)

* DOC fix FeatureAgglomeration and MiniBatchKMeans docstring following sklearn guideline (#15809)

* TST Specify random_state in test_cv_iterable_wrapper (#15829)

* DOC Include LinearSV{C, R} in models that support sample_weights (#15871)

* DOC correct some indents (#15875)

* DOC Fix documentation of default values in tree classes (#15870)

* DOC fix typo in docstring (#15887)

* DOC FIX default value for xticks_rotation in plot_confusion_matrix (#15890)

* Fix imports in pip3 ubuntu by suffixing affected files (#15891)

* MNT Raise erorr when normalize is invalid in confusion_matrix (#15888)

* [MRG] DOC Increases search results for API object results (#15574)

* MNT Ignores warning in pyamg for deprecated scipy.random (#15914)

* DOC Instructions to troubleshoot Windows path length limit (#15916)

* DOC add versionadded directive to some estimators (#15849)

* DOC clarify doc-string of roc_auc_score and add references (#15293)

* MNT Adds skip lint to azure pipeline CI (#15904)

* BLD Fixes bug when building with NO_MATHJAX=1 (#15892)

* [MRG] BUG Checks to number of axes in passed in ax more generically (#15760)

* EXA Minor fixes in plot_sparse_logistic_regression_20newsgroups.py (#15925)

* BUG Do not shadow public functions with deprecated modules (#15846)

* Import sklearn._distributor_init first (#15929)

* DOC Fix typos, via a Levenshtein-style corrector (#15923)

* DOC in canned comment, mention that PR title becomes commit me… (#15935)

* DOC/EXA Correct spelling of "Classification" (#15938)

* BUG fix pip3 ubuntu update by suffixing file (#15928)

* [MRG] Ways to compute center_shift_total were different in "full" and "elkan" algorithms. (#15930)

* TST Fixes integer test for train and test indices (#15941)

* BUG ensure that parallel/sequential give the same permutation importances (#15933)

* Formatting fixes in changelog (#15944)

* MRG FIX: order of values of self.quantiles_ in QuantileTransformer (#15751)

* [MRG] BUG Fixes constrast in plot_confusion_matrix (#15936)

* BUG use zero_division argument in classification_report (#15879)

* DOC change logreg solver in plot_logistic_path (#15927)

* DOC fix whats new ordering (#15961)

* COSMIT use np.iinfo to define the max int32 (#15960)

* DOC Apply numpydoc validation to VotingRegressor methods (#15969)

Co-authored-by: Tiffany R. Williams <Tiffany8@users.noreply.github.com>

* DOC improve naive_bayes.py documentation (#15943)

Co-authored-by: Jigna Panchal <40188288+jigna-panchal@users.noreply.github.com>

* DOC Fix default values in Perceptron documentation (#15965)

* DOC Improve default values in logistic documentation (#15966)

Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>

* DOC Improve documentation of default values for imputers (#15964)

* EXA/MAINT Simplify code in manifold learning example (#15949)

* DOC Improve default values in SGD documentation (#15967)

* DOC Improve defaults in neural network documentation (#15968)

* FIX use safe_sparse_dot for callable kernel in LabelSpreading (#15868)

* BUG Adds attributes back to check_is_fitted (#15947)

Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>

* DOC update check_is_fitted what's new

* DOC change python-devel to python3-devel for yum. (#15986)

* DOC Correct the default value of values_format in plot_confusion_matrix (#15981)

* [MRG] MNT Updates pypy to use 7.2.0 (#15954)

* FIX Add missing 'values_format' param to disp.plot() in plot_confusion_matrix (#15937)

* FIX support scalar values in fit_params in SearchCV (#15863)

* support a scalar fit param

* pep8

* TST add test for desired behavior

* FIX introduce _check_fit_params to validate parameters

* DOC update whats new

* TST tests both grid-search and randomize-search

* PEP8

* DOC revert unecessary change

* TST add test for _check_fit_params

* olivier comments

* TST fixes

* DOC whats new

* DOC whats new

* TST revert type of error

* add olivier suggestions

* address olivier comments

* address thomas comments

* PEP8

* comments olivier

* TST fix test by passing X

* avoid to call twice tocsr

* add case column/row sparse in check_fit_param

* provide optional indices

* TST check content when indexing params

* PEP8

* TST update tests to check identity

* stupid fix

* use a distribution in RandomizedSearchCV

* MNT add lightgbm to one of the CI build

* move to another build

* do not install dependencies lightgbm

* MNT comments on the CI setup

* address some comments

* Test fit_params compat without dependency on lightgbm

Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>

* Remove abstractmethod that silently brake downstream packages (#15996)

* FIX restore BaseNB._check_X without abstractmethod decoration (#15997)

* Update v0.22 changelog for 0.22.1 (#16002)

- set the date
- move entry for quantile transformer to the 0.22.1 section
- fix alphabetical ordering of modules

* STY Removes hidden scroll bar (#15999)

* Flake8 fixes

* Fix: remove left-over lines that should have been deleted during conflict resolution when rebasing

* Fix missing imports

* Update version

* Fix test_check_is_fitted

* Make test_sag_regressor_computed_correctly deterministic (#16003)

Fix #15818.

Co-authored-by: cgsavard <claire.savard@colorado.edu>
Co-authored-by: Joel Nothman <joel.nothman@gmail.com>
Co-authored-by: Thomas J Fan <thomasjpfan@gmail.com>
Co-authored-by: Matt Hall <matt@agilegeoscience.com>
Co-authored-by: Kathryn Poole <kathryn.poole2@gmail.com>
Co-authored-by: lucyleeow <jliu176@gmail.com>
Co-authored-by: JJmistry <jayminm22@gmail.com>
Co-authored-by: Juan Carlos Alfaro Jiménez <JuanCarlos.Alfaro@uclm.es>
Co-authored-by: SylvainLan <sylvain.s.lannuzel@gmail.com>
Co-authored-by: Nicolas Hug <contact@nicolas-hug.com>
Co-authored-by: Hanmin Qin <qinhanmin2005@sina.com>
Co-authored-by: Adrin Jalali <adrin.jalali@gmail.com>
Co-authored-by: Vachan D A <vachanda@users.noreply.github.com>
Co-authored-by: Sambhav Kothari <sambhavs.email@gmail.com>
Co-authored-by: wenliwyan <12013376+wenliwyan@users.noreply.github.com>
Co-authored-by: shivamgargsya <shivam.gargshya@gmail.com>
Co-authored-by: Reshama Shaikh <rs2715@stern.nyu.edu>
Co-authored-by: Oliver Urs Lenz <oulenz@users.noreply.github.com>
Co-authored-by: Loïc Estève <loic.esteve@ymail.com>
Co-authored-by: Brian Wignall <BrianWignall@gmail.com>
Co-authored-by: Ritchie Ng <ritchieng@u.nus.edu>
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-authored-by: inderjeet <43402782+inder128@users.noreply.github.com>
Co-authored-by: scibol <scibol@users.noreply.github.com>
Co-authored-by: Tirth Patel <tirthasheshpatel@gmail.com>
Co-authored-by: Bibhash Chandra Mitra <bibhashm220896@gmail.com>
Co-authored-by: Alexandre Gramfort <alexandre.gramfort@m4x.org>
Co-authored-by: Tiffany R. Williams <Tiffany8@users.noreply.github.com>
Co-authored-by: Jigna Panchal <40188288+jigna-panchal@users.noreply.github.com>
Co-authored-by: @nkish <19225359+ankishb@users.noreply.github.com>
Co-authored-by: Pulkit Mehta <pulkit_mehta_work@yahoo.com>
Co-authored-by: David Breuer <DavidBreuer@users.noreply.github.com>
Co-authored-by: Niklas <niklas.sm+github@gmail.com>
Co-authored-by: Windber <guolipengyeah@126.com>
Co-authored-by: Stephen Blystone <29995339+blynotes@users.noreply.github.com>
Co-authored-by: Brigitta Sipőcz <b.sipocz@gmail.com>
leouieda added a commit to fatiando/verde that referenced this pull request Jan 16, 2020
Rollback the changes in #217 because scikit-learn reverted the deprecation of
attributes in check_is_fitted. See scikit-learn/scikit-learn#15947
7 participants