Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TST check error consistency when calling get_feature_names_out on unfitted estimator #25223

Conversation

jpangas
Copy link
Contributor

@jpangas jpangas commented Dec 22, 2022

Reference Issues/PRs

In issue #24916 , we want to make error messages uniform when calling get_feature_names_out before fit. To adhere to the uniformity, it was agreed that all estimators should raise a NotFittedError if they are unfitted.

What does this implement/fix? Explain your changes.

To solve the issue, we first needed to identify the estimators that don't raise a NotFittedError. Therefore, this PR proposes tests that check if a NotFittedError is raised in estimators with get_feature_names_out.

Any other comments?

For a particular estimator, the test will pass if a NotFittedError is raised by get_feature_names_out and will fail if any other type of error/exception is raised.
In case the test fails, it will show the estimator, the error and which parts of the code led to the error being raised.
The command below can be used to run the tests that check errors generated when get_feature_names_out is called before fit in all estimators:
pytest -vsl sklearn/tests/test_common.py -k estimators_get_feature_names_out_error

@glemaitre glemaitre self-requested a review December 23, 2022 15:51
@glemaitre
Copy link
Member

glemaitre commented Dec 23, 2022

I think this is a good start. I would now create a whitelist such that we can correct one by one (or more than one if it comes with inheritance) the failing estimators. The patch to do so would be:

diff --git a/sklearn/tests/test_common.py b/sklearn/tests/test_common.py
index b61e2e2a93..d9352f97fe 100644
--- a/sklearn/tests/test_common.py
+++ b/sklearn/tests/test_common.py
@@ -461,16 +461,60 @@ def test_transformers_get_feature_names_out(transformer):
 ESTIMATORS_WITH_GET_FEATURE_NAMES_OUT = [
     est for est in _tested_estimators() if hasattr(est, "get_feature_names_out")
 ]
+WHITELISTED_FAILING_ESTIMATORS = [
+    "AdditiveChi2Sampler",
+    "Binarizer",
+    "DictVectorizer",
+    "GaussianRandomProjection",
+    "GenericUnivariateSelect",
+    "IterativeImputer",
+    "IsotonicRegression",
+    "KBinsDiscretizer",
+    "KNNImputer",
+    "MaxAbsScaler",
+    "MinMaxScaler",
+    "MissingIndicator",
+    "Normalizer",
+    "OrdinalEncoder",
+    "PowerTransformer",
+    "QuantileTransformer",
+    "RFE",
+    "RFECV",
+    "RobustScaler",
+    "SelectFdr",
+    "SelectFpr",
+    "SelectFromModel",
+    "SelectFwe",
+    "SelectKBest",
+    "SelectPercentile",
+    "SequentialFeatureSelector",
+    "SimpleImputer",
+    "SparseRandomProjection",
+    "SplineTransformer",
+    "StackingClassifier",
+    "StackingRegressor",
+    "StandardScaler",
+    "TfidfTransformer",
+    "VarianceThreshold",
+    "VotingClassifier",
+    "VotingRegressor",
+]
 
 
 @pytest.mark.parametrize(
     "estimator", ESTIMATORS_WITH_GET_FEATURE_NAMES_OUT, ids=_get_check_estimator_ids
 )
 def test_estimators_get_feature_names_out_error(estimator):
+    estimator_name = estimator.__class__.__name__
+    if estimator_name in WHITELISTED_FAILING_ESTIMATORS:
+        return pytest.xfail(
+            reason=f"{estimator_name} is not failing with a consistent NotFittedError"
+        )
+
     _set_checking_parameters(estimator)
 
     with ignore_warnings(category=(FutureWarning)):
-        check_get_feature_names_out_error(estimator.__class__.__name__, estimator)
+        check_get_feature_names_out_error(estimator_name, estimator)
 
 
 @pytest.mark.parametrize(

@glemaitre glemaitre changed the title ENH Add tests to check error in get_feature_names_out when called before fit TST check error consistency when calling get_feature_names_out on unfitted estimator Dec 23, 2022
@glemaitre glemaitre removed their request for review December 23, 2022 16:07
@jpangas
Copy link
Contributor Author

jpangas commented Dec 24, 2022

@glemaitre , Just included the patch in the most recent commit.

Copy link
Member

@glemaitre glemaitre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. LGTM on my side.

@jpangas
Copy link
Contributor Author

jpangas commented Dec 28, 2022

Thanks @glemaitre , should we wait for the PR to be merged or others can begin working on the failing estimators?

@glemaitre
Copy link
Member

You can always start to work by branching from this branch. You will need to merge main afterward when this PR will be merged. I will try to find a reviewer.

Copy link
Member

@jeremiedbb jeremiedbb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks @jpangas

@jeremiedbb jeremiedbb merged commit 1280e09 into scikit-learn:main Dec 28, 2022
@glemaitre
Copy link
Member

Thanks @jeremiedbb to be so quick ;)
@jpangas people can therefore start branching from main.

@jpangas
Copy link
Contributor Author

jpangas commented Dec 28, 2022

Thanks @glemaitre and @jeremiedbb for approving this quickly. I will now mention the estimators that need to be worked on in issue #24916 so that other contributors can join in.

@jpangas jpangas deleted the test_estimators_get_feature_names_out_error branch December 28, 2022 20:46
jjerphan pushed a commit to jjerphan/scikit-learn that referenced this pull request Jan 3, 2023
jjerphan pushed a commit to jjerphan/scikit-learn that referenced this pull request Jan 20, 2023
jjerphan pushed a commit to jjerphan/scikit-learn that referenced this pull request Jan 20, 2023
jjerphan pushed a commit to jjerphan/scikit-learn that referenced this pull request Jan 23, 2023
adrinjalali pushed a commit that referenced this pull request Jan 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants