OneVsOneClassifier does not accept custom input types #23779
Comments
I have found an example of the same behaviour using just scikit-learn:

```python
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.pipeline import Pipeline
from sklearn.multiclass import OneVsOneClassifier

categories = ['alt.atheism', 'soc.religion.christian', 'comp.graphics', 'sci.med']
X, y = fetch_20newsgroups(subset='train', categories=categories, shuffle=True, random_state=42, return_X_y=True)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

pipeline = Pipeline([
    ("preprocessing", CountVectorizer()),
    ("classifier", SVC()),
])

multiclass = OneVsOneClassifier(pipeline)
multiclass.fit(X_train, y_train)
multiclass.score(X_test, y_test)
```
We should probably be more lenient in the validation and not force a 2D matrix. It could boil down to the following changes:

```diff
diff --git a/sklearn/multiclass.py b/sklearn/multiclass.py
index b46b4bfb8b..5f10d4e6bb 100644
--- a/sklearn/multiclass.py
+++ b/sklearn/multiclass.py
@@ -659,7 +659,12 @@ class OneVsOneClassifier(MetaEstimatorMixin, ClassifierMixin, BaseEstimator):
         """
         # We need to validate the data because we do a safe_indexing later.
         X, y = self._validate_data(
-            X, y, accept_sparse=["csr", "csc"], force_all_finite=False
+            X,
+            y,
+            accept_sparse=["csr", "csc"],
+            force_all_finite=False,
+            ensure_2d=False,
+            dtype=None,
         )
         check_classification_targets(y)
@@ -738,6 +743,8 @@ class OneVsOneClassifier(MetaEstimatorMixin, ClassifierMixin, BaseEstimator):
             accept_sparse=["csr", "csc"],
             force_all_finite=False,
             reset=first_call,
+            ensure_2d=False,
+            dtype=None,
         )
         check_classification_targets(y)
         combinations = itertools.combinations(range(self.n_classes_), 2)
@@ -806,6 +813,8 @@ class OneVsOneClassifier(MetaEstimatorMixin, ClassifierMixin, BaseEstimator):
             accept_sparse=True,
             force_all_finite=False,
             reset=False,
+            ensure_2d=False,
+            dtype=None,
         )
         indices = self.pairwise_indices_
```
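To illustrate the effect of the two proposed flags, here is a small sketch using the public `check_array` helper, which the `_validate_data` method wraps. This shows the behavior of `check_array` itself, not of the patched `OneVsOneClassifier`; variable names are for illustration only.

```python
import numpy as np
from sklearn.utils import check_array

docs = ["first document", "second document", "third document"]

# Default validation insists on a 2D numeric array, so 1D text data fails.
try:
    check_array(docs)
except ValueError as exc:
    print("default validation rejects text:", type(exc).__name__)

# With ensure_2d=False and dtype=None, the same 1D text data passes.
validated = check_array(docs, ensure_2d=False, dtype=None)
print(type(validated).__name__, validated.dtype.kind)
```

This is why relaxing the flags is enough to let the 20newsgroups example above reach the `CountVectorizer` step.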
It would lead to the following results:

```python
In [1]: from sklearn.datasets import fetch_20newsgroups
   ...: from sklearn.feature_extraction.text import CountVectorizer
   ...: from sklearn.model_selection import train_test_split
   ...: from sklearn.svm import SVC
   ...: from sklearn.pipeline import Pipeline
   ...: from sklearn.multiclass import OneVsOneClassifier
   ...:
   ...: categories = ['alt.atheism', 'soc.religion.christian', 'comp.graphics', 'sci.med']
   ...: X, y = fetch_20newsgroups(subset='train', categories=categories, shuffle=True, random_state=42, return_X_y=True)
   ...:
   ...: X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
   ...:
   ...: pipeline = Pipeline([
   ...:     ("preprocessing", CountVectorizer()),
   ...:     ("classifier", SVC()),
   ...: ])
   ...:
   ...: multiclass = OneVsOneClassifier(pipeline)
   ...:
   ...: multiclass.fit(X_train, y_train)
   ...: multiclass.score(X_test, y_test)
Out[1]: 0.7805309734513274
```
@glemaitre Your solution works for the text example, but it still transforms the input object to a NumPy array. That causes an error in the first example, as the first transformer expects the custom object.
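The point above can be demonstrated with a minimal sketch (the `CustomData` class here is hypothetical, not part of scikit-learn or scikit-fda): even with `ensure_2d=False` and `dtype=None`, `check_array` converts the container to a NumPy array, so the custom type does not survive validation.

```python
import numpy as np
from sklearn.utils import check_array

class CustomData:
    """Minimal array-like container (illustrative only)."""
    def __init__(self, items):
        self.items = items

    def __len__(self):
        return len(self.items)

    def __getitem__(self, key):
        return self.items[key]

data = CustomData([1, 2, 3, 4])

# The lenient flags accept the data, but the return value is an ndarray,
# not a CustomData instance: the original type is lost.
validated = check_array(data, ensure_2d=False, dtype=None)
print(type(validated).__name__)
```

Any downstream transformer that expects the custom container would therefore still break.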
We could bypass the validation there. @jeremiedbb, do you remember anything about the need of validating at this point?
We can call `_safe_indexing` directly on an object that implements `__getitem__`:

```python
from sklearn.utils import _safe_indexing

class CustomData:
    def __init__(self):
        self.items = [1, 2, 3, 4, 5, 6]

    def __getitem__(self, k):
        return self.items[k]

custom_data = CustomData()
_safe_indexing(custom_data, [3, 4])
# [4, 5]
```

I suspect the desired behavior is for …
If the custom object has a `__getitem__`, `_safe_indexing` can index it as shown above. Objects without one could not be supported this way.
Describe the bug
Due to the additional validation in #6626, OneVsOneClassifier cannot be used with custom types that work like arrays but cannot be converted to arrays. I am the maintainer of scikit-fda, a functional data library that attempts to be compatible with scikit-learn. The meta-estimators OneVsRestClassifier and OneVsOneClassifier should be trivially compatible with our data. Currently OneVsRestClassifier works fine, while OneVsOneClassifier doesn't.

I think that scikit-learn sometimes has very aggressive validation that makes it difficult to extend for custom objects, as in this case.
Steps/Code to Reproduce
I have no example using only scikit-learn, as you need custom types and classifiers/transformers to trigger it.
The following code is adapted from https://fda.readthedocs.io/en/latest/auto_tutorial/plot_skfda_sklearn.html#multiclass-and-multioutput-classification-utilities
Note that the code works if you replace OneVsOneClassifier with OneVsRestClassifier, as the latter does not have that aggressive validation.
Expected Results
The result of the classification.
Actual Results
Versions