OneHotEncoder should handle unknown values more gracefully #2169

As reported [over at SO](http://stackoverflow.com/q/17715723/166749), it would be nice if `OneHotEncoder` could handle unknown values for categorical features at `transform` time. Currently, the following throws an exception:

```
>>> from sklearn.preprocessing import OneHotEncoder
>>> oh = OneHotEncoder().fit([[0]])
>>> oh.transform([[1]])
Traceback (most recent call last):
  File "<ipython-input-17-54f21ed7c610>", line 1, in <module>
    oh.transform([[1]])
  File "/home/lars/src/scikit-learn/sklearn/preprocessing.py", line 878, in transform
    self.categorical_features, copy=True)
  File "/home/lars/src/scikit-learn/sklearn/preprocessing.py", line 662, in _transform_selected
    return transform(X)
  File "/home/lars/src/scikit-learn/sklearn/preprocessing.py", line 851, in _transform
    raise ValueError("Feature out of bounds. Try setting n_values.")
ValueError: Feature out of bounds. Try setting n_values.
```

I personally find this a bit strict. I would expect at most a warning, with the output columns for that feature left as all zeros when the value is unknown.

This would be consistent with `DictVectorizer` and `CountVectorizer`, which ignore whatever features were not in their training set.
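
To make the request concrete, here is a rough pure-Python sketch of the lenient behaviour I have in mind. It is not scikit-learn code: `fit_categories` and `transform_lenient` are hypothetical helper names invented for this illustration, and the warning text is made up. An unknown value triggers a warning and leaves that feature's block of columns at zero.

```python
import warnings


def fit_categories(rows):
    """Collect the sorted set of values seen in each column (hypothetical helper)."""
    n_features = len(rows[0])
    return [sorted({row[j] for row in rows}) for j in range(n_features)]


def transform_lenient(rows, categories):
    """One-hot encode rows; an unknown value warns and yields an all-zero block."""
    encoded = []
    for row in rows:
        out = []
        for value, known in zip(row, categories):
            block = [0] * len(known)  # one output column per value seen at fit time
            if value in known:
                block[known.index(value)] = 1
            else:
                # Assumed behaviour: warn instead of raising ValueError.
                warnings.warn(f"unknown categorical value {value!r}; encoding as zeros")
            out.extend(block)
        encoded.append(out)
    return encoded


# Feature 0 was seen with values {0, 1}, feature 1 with {2}.
categories = fit_categories([[0, 2], [1, 2]])

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # The value 3 in the second row was never seen for feature 0.
    result = transform_lenient([[1, 2], [3, 2]], categories)
```

The output width stays fixed at whatever was learned during fitting, so downstream estimators keep seeing the same number of columns.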

