BUG Wrong error raised for OneVsRestClassifier with a sub-estimator that doesn't allow partial_fit #28108
Comments
Interesting, I never realized this issue existed. When returning
This is already something that we do in scikit-learn/sklearn/pipeline.py Lines 46 to 56 in 30fcb45
With this pattern, we get the right error with the following example:
from sklearn.datasets import load_iris
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier
X, y = load_iris(return_X_y=True)
clf = make_pipeline(DecisionTreeClassifier()).fit(X, y)
clf.decision_function(X)
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Cell In[21], line 6
4 X, y = load_iris(return_X_y=True)
5 clf = make_pipeline(DecisionTreeClassifier()).fit(X, y)
----> 6 clf.decision_function(X)
File ~/Documents/packages/scikit-learn/sklearn/utils/_available_if.py:31, in _AvailableIfDescriptor.__get__(self, obj, owner)
25 attr_err = AttributeError(
26 f"This {repr(owner.__name__)} has no attribute {repr(self.attribute_name)}"
27 )
28 if obj is not None:
29 # delegate only on instances, not the classes.
30 # this is to allow access to the docstrings.
---> 31 if not self.check(obj):
32 raise attr_err
33 out = MethodType(self.fn, obj)
File ~/Documents/packages/scikit-learn/sklearn/pipeline.py:53, in _final_estimator_has.<locals>.check(self)
51 def check(self):
52 # raise original `AttributeError` if `attr` does not exist
---> 53 getattr(self._final_estimator, attr)
54 return True
AttributeError: 'DecisionTreeClassifier' object has no attribute 'decision_function'
So it means that we should replace the following pattern: scikit-learn/sklearn/multiclass.py Lines 180 to 189 in 30fcb45
with something like:
def _estimators_has(attr):
    """Check if self.estimator or self.estimators_[0] has attr.

    If `self.estimators_[0]` has the attr, then it's safe to assume that the
    other estimators have it too. This function is used together with
    `available_if`.
    """

    def check(self):
        if hasattr(self, "estimators_"):
            # raise the original `AttributeError` if `attr` does not exist
            getattr(self.estimators_[0], attr)
            return True
        else:
            getattr(self.estimator, attr)
            return True

    return check
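To see why the proposed `check` calls `getattr` instead of returning the result of `hasattr`, here is a minimal sketch that does not depend on scikit-learn. All names in it (`available_if_sketch`, `has_attr_boolean`, `has_attr_raising`, `Inner`, `Outer`) are hypothetical, and the descriptor is only a simplified stand-in modelled on the `_AvailableIfDescriptor.__get__` frame in the traceback above, not the library's actual code.

```python
from types import MethodType


class _AvailableIfSketch:
    """Simplified stand-in for an `available_if`-style descriptor (assumption)."""

    def __init__(self, fn, check):
        self.fn = fn
        self.check = check

    def __get__(self, obj, owner=None):
        if obj is None:
            # Accessed on the class: hand back the plain function.
            return self.fn
        # If `check` raises, that error propagates unchanged.
        if not self.check(obj):
            raise AttributeError(
                f"This {owner.__name__!r} has no attribute {self.fn.__name__!r}"
            )
        return MethodType(self.fn, obj)


def available_if_sketch(check):
    return lambda fn: _AvailableIfSketch(fn, check)


def has_attr_boolean(attr):
    # Returns False when the attribute is missing, so the descriptor raises
    # its own generic error that names the outer class.
    return lambda self: hasattr(self.estimator, attr)


def has_attr_raising(attr):
    def check(self):
        # Let the sub-estimator's own AttributeError propagate.
        getattr(self.estimator, attr)
        return True

    return check


class Inner:
    """Hypothetical sub-estimator without `partial_fit`."""


class Outer:
    def __init__(self, estimator):
        self.estimator = estimator

    @available_if_sketch(has_attr_boolean("partial_fit"))
    def partial_fit_boolean(self):
        return "fitted"

    @available_if_sketch(has_attr_raising("partial_fit"))
    def partial_fit_raising(self):
        return "fitted"


def message_of(obj, name):
    """Return the AttributeError message raised when looking up `name`."""
    try:
        getattr(obj, name)
    except AttributeError as exc:
        return str(exc)
    return None


outer = Outer(Inner())
boolean_msg = message_of(outer, "partial_fit_boolean")
raising_msg = message_of(outer, "partial_fit_raising")
print(boolean_msg)  # names the outer class
print(raising_msg)  # names the sub-estimator
```

With the boolean check the message blames `Outer`; with the raising check it points at `Inner`, which is the behaviour the proposal is after.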
I think that we should search for the pattern
@StefanieSenger, do you want to solve this issue? |
Additionally, we should check if we can remove the code:
if not hasattr(self.estimator, "partial_fit"):
    raise ValueError(
        ("Base estimator {0}, doesn't have partial_fit method").format(
            self.estimator
        )
    )
since it should be handled by |
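Why the `ValueError` guard quoted above is never reached can be illustrated without scikit-learn. In this sketch, `guarded_by`, `SubEstimator` and `Wrapper` are hypothetical names, and the descriptor is a simplified imitation of an `available_if`-style decorator rather than the real one. The attribute lookup fails before the method body, and therefore the guard, can run.

```python
from types import MethodType


class _Guarded:
    """Hypothetical descriptor: refuses attribute access when `check` fails."""

    def __init__(self, fn, check):
        self.fn = fn
        self.check = check

    def __get__(self, obj, owner=None):
        if obj is None:
            return self.fn
        if not self.check(obj):
            raise AttributeError(
                f"This {owner.__name__!r} has no attribute {self.fn.__name__!r}"
            )
        return MethodType(self.fn, obj)


def guarded_by(check):
    return lambda fn: _Guarded(fn, check)


class SubEstimator:
    """Hypothetical sub-estimator that lacks `partial_fit`."""


class Wrapper:
    def __init__(self, estimator):
        self.estimator = estimator

    @guarded_by(lambda self: hasattr(self.estimator, "partial_fit"))
    def partial_fit(self):
        # Mirrors the guard from the thread. It cannot run here, because the
        # lookup of `partial_fit` on the instance has already failed.
        if not hasattr(self.estimator, "partial_fit"):
            raise ValueError(
                f"Base estimator {self.estimator}, doesn't have partial_fit method"
            )
        return self


try:
    Wrapper(SubEstimator()).partial_fit()
    raised = None
except (AttributeError, ValueError) as exc:
    raised = type(exc).__name__

print(raised)
```

Under these assumptions the `ValueError` branch is dead code whenever the same condition is already enforced at attribute-access time.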
@glemaitre Yes, sure I want to solve this issue. Thank you for the directions. |
Your suggestion with the wrapper function seems to work out over several layers, @glemaitre:
import numpy as np
from sklearn.datasets import load_iris
from sklearn.multiclass import OneVsRestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.cluster import KMeans
iris = load_iris()
sample_weight = np.ones_like(iris.target, dtype=np.float64)
# 1st layer
pipe = Pipeline([("logreg", LogisticRegression())])
clf = OneVsRestClassifier(estimator=pipe).fit(iris.data, iris.target)
clf.transform(iris.data)  # AttributeError: 'OneVsRestClassifier' object has no attribute 'transform'
# 2nd layer
pipe = Pipeline([("logreg", LogisticRegression())])
clf = OneVsRestClassifier(estimator=pipe).fit(iris.data, iris.target)
clf.partial_fit(iris.data)  # AttributeError: 'Pipeline' object has no attribute 'partial_fit'
# 3rd layer
pipe = Pipeline([("kmeans", KMeans())])
clf = OneVsRestClassifier(estimator=pipe).fit(iris.data, iris.target)
clf.predict_proba(iris.data)  # AttributeError: 'KMeans' object has no attribute 'predict_proba'
So, I would now look for more cases (searching for
I'd also really like to know @thomasjpfan's opinion on that, because I have the impression that the original intention of
However, I still find the default error message from |
The design of
Although
As an overall solution, I would go with #28108 (comment), but also improve
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris
X, y = load_iris(return_X_y=True)
clf = make_pipeline(DecisionTreeClassifier()).fit(X, y)
clf.decision_function(X)
raises the following:
Exception
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
File ~/Repos/scikit-learn-1/sklearn/utils/_available_if.py:29, in _AvailableIfDescriptor._check(self, obj, owner)
28 try:
---> 29 check_result = self.check(obj)
30 except Exception as e:
File ~/Repos/scikit-learn-1/sklearn/pipeline.py:53, in _final_estimator_has.<locals>.check(self)
51 def check(self):
52 # raise original `AttributeError` if `attr` does not exist
---> 53 getattr(self._final_estimator, attr)
54 return True
AttributeError: 'DecisionTreeClassifier' object has no attribute 'decision_function'
The above exception was the direct cause of the following exception:
AttributeError Traceback (most recent call last)
Cell In[2], line 7
5 X, y = load_iris(return_X_y=True)
6 clf = make_pipeline(DecisionTreeClassifier()).fit(X, y)
----> 7 clf.decision_function(X)
File ~/Repos/scikit-learn-1/sklearn/utils/_available_if.py:40, in _AvailableIfDescriptor.__get__(self, obj, owner)
36 def __get__(self, obj, owner=None):
37 if obj is not None:
38 # delegate only on instances, not the classes.
39 # this is to allow access to the docstrings.
---> 40 self._check(obj, owner=owner)
41 out = MethodType(self.fn, obj)
43 else:
44 # This makes it possible to use the decorated method as an unbound method,
45 # for instance when monkeypatching.
File ~/Repos/scikit-learn-1/sklearn/utils/_available_if.py:31, in _AvailableIfDescriptor._check(self, obj, owner)
29 check_result = self.check(obj)
30 except Exception as e:
---> 31 raise AttributeError(attr_err_msg) from e
33 if not check_result:
34 raise AttributeError(attr_err_msg)
AttributeError: This 'Pipeline' has no attribute 'decision_function'
I opened #28198 with this proposal. |
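The chained traceback above comes down to `raise ... from e`. The following is a minimal sketch of that idea, independent of scikit-learn: `chained_available_if`, `final_step_has`, `Tree` and `Pipe` are hypothetical names, and the descriptor only imitates the `_check` frames shown in the traceback, so treat it as an illustration rather than the code of #28198.

```python
from types import MethodType


class _ChainedDescriptor:
    """Hypothetical descriptor that chains the inner error to a generic one."""

    def __init__(self, fn, check):
        self.fn = fn
        self.check = check

    def _check(self, obj, owner):
        msg = f"This {owner.__name__!r} has no attribute {self.fn.__name__!r}"
        try:
            check_result = self.check(obj)
        except Exception as e:
            # Keep the inner error as the explicit cause of the outer one.
            raise AttributeError(msg) from e
        if not check_result:
            raise AttributeError(msg)

    def __get__(self, obj, owner=None):
        if obj is None:
            return self.fn
        self._check(obj, owner=owner)
        return MethodType(self.fn, obj)


def chained_available_if(check):
    return lambda fn: _ChainedDescriptor(fn, check)


def final_step_has(attr):
    def check(self):
        # Raises the inner AttributeError if `attr` does not exist.
        getattr(self.final_step, attr)
        return True

    return check


class Tree:
    """Hypothetical final step without `decision_function`."""


class Pipe:
    def __init__(self, final_step):
        self.final_step = final_step

    @chained_available_if(final_step_has("decision_function"))
    def decision_function(self, X):
        return self.final_step.decision_function(X)


try:
    Pipe(Tree()).decision_function
except AttributeError as exc:
    outer_msg = str(exc)
    inner_msg = str(exc.__cause__)

print(outer_msg)  # generic message naming the wrapper
print(inner_msg)  # original message naming the final step
```

The outer message stays uniform across wrappers, while `__cause__` preserves the specific reason, which is what produces the "direct cause" section in the traceback.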
@thomasjpfan Nice, I like it a lot! Do I understand correctly that working on this PR still makes sense? For now, there is no inner error raised for the five estimators we talk about here and there should be, right? |
You are right, @StefanieSenger. We still need this PR so that the check raises an |
I merged #28198 so we can leverage it. When I asked for tests, I think the pattern used by @thomasjpfan could be used here as well: |
Closing this issue because the problem is fixed. |
Describe the bug
When using the OneVsRestClassifier with a sub-estimator that doesn't have partial_fit implemented, a misleading error message is shown: AttributeError: This 'OneVsRestClassifier' has no attribute 'partial_fit'.
However, OneVsRestClassifier does implement partial_fit; it is the underlying estimator that doesn't.
There is appropriate error raising already implemented in scikit-learn/sklearn/multiclass.py Line 437 in f1e8936
But it's not run, because the AttributeError from the @available_if decorator/descriptor pops up earlier.
I'd like to learn about that, repair it, and add a test to make sure this doesn't happen again by accident.
I will also check whether other methods of the multiclass classifiers are affected.
Is it alright if I go ahead with this?
Steps/Code to Reproduce
Expected Results
ValueError: LogisticRegression doesn't have partial_fit method
Actual Results
AttributeError: This 'OneVsRestClassifier' has no attribute 'partial_fit'.
Versions