Closed
As reported on Stack Overflow, it would be nice if `OneHotEncoder` could handle unknown values for categorical features at transform time. Currently, the following throws an exception:
```pycon
>>> from sklearn.preprocessing import OneHotEncoder
>>> oh = OneHotEncoder().fit([[0]])
>>> oh.transform([[1]])
Traceback (most recent call last):
  File "<ipython-input-17-54f21ed7c610>", line 1, in <module>
    oh.transform([[1]])
  File "/home/lars/src/scikit-learn/sklearn/preprocessing.py", line 878, in transform
    self.categorical_features, copy=True)
  File "/home/lars/src/scikit-learn/sklearn/preprocessing.py", line 662, in _transform_selected
    return transform(X)
  File "/home/lars/src/scikit-learn/sklearn/preprocessing.py", line 851, in _transform
    raise ValueError("Feature out of bounds. Try setting n_values.")
ValueError: Feature out of bounds. Try setting n_values.
```
I personally find this a bit strict. I would expect at most a warning, with all of that feature's output columns left at zero when it takes a value that was not seen during `fit`. This would be consistent with `DictVectorizer` and `CountVectorizer`, which ignore whatever features were not in their training set.
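The proposed behavior (an unknown value yields all-zero columns for that feature, plus at most a warning) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not scikit-learn's implementation: the class `LenientOneHotEncoder` and its `on_unknown` parameter are hypothetical names, and the input is assumed to be a list of rows whose per-feature values are hashable and mutually sortable.

```python
import warnings


class LenientOneHotEncoder:
    """Hypothetical one-hot encoder: values unseen during fit() become
    all-zero columns for that feature instead of raising."""

    def __init__(self, on_unknown="warn"):
        # "warn": zero columns plus a warning; "error": raise ValueError.
        if on_unknown not in ("warn", "error"):
            raise ValueError("on_unknown must be 'warn' or 'error'")
        self.on_unknown = on_unknown

    def fit(self, X):
        n_features = len(X[0])
        # Sorted categories seen for each feature (column) at fit time.
        self.categories_ = [sorted({row[j] for row in X}) for j in range(n_features)]
        # Map each known category to its output column index.
        self._index = []
        offset = 0
        for cats in self.categories_:
            self._index.append({c: offset + i for i, c in enumerate(cats)})
            offset += len(cats)
        self.n_output_ = offset
        return self

    def transform(self, X):
        out = []
        for row in X:
            encoded = [0] * self.n_output_
            for j, value in enumerate(row):
                col = self._index[j].get(value)
                if col is None:
                    msg = f"unknown value {value!r} in feature {j}"
                    if self.on_unknown == "error":
                        raise ValueError(msg)
                    # Leave this feature's columns all zero, but tell the user.
                    warnings.warn(msg)
                else:
                    encoded[col] = 1
            out.append(encoded)
        return out
```

With the example from this issue, `LenientOneHotEncoder().fit([[0]]).transform([[1]])` returns `[[0]]` and emits one warning instead of raising. Passing `on_unknown="error"` keeps the current strict behavior, so callers who rely on the exception could still opt into it.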