[MRG+2] FIX DummyEstimator and a single output 2d input list/array #13545

Merged
merged 15 commits into from Apr 5, 2019
4 changes: 4 additions & 0 deletions doc/whats_new/v0.21.rst
@@ -146,6 +146,10 @@ Support for Python 3.4 and below has been officially dropped.
   float64 for the ``stratified`` strategy. :issue:`13266` by
   :user:`Christos Aridas<chkoar>`.
 
+- |Fix| Fixed a bug in :class:`dummy.DummyClassifier` where 1d dimensional y
Member

"1d dimensional y with ndim=2" makes no sense to me. Do you mean "column vector"?

Member Author

Yep, it was a copy paste from the old issue, sorry. Fixed.

+  with ndim=2 was throwing a dimension mismatch error in prediction time.
+  :issue:`10786` by :user:`Nick Sorros <nsorros>` and `Adrin Jalali`_.
+
 :mod:`sklearn.ensemble`
 .......................

6 changes: 4 additions & 2 deletions sklearn/dummy.py
@@ -120,7 +120,8 @@ def fit(self, X, y, sample_weight=None):
         if not self.sparse_output_:
             y = np.atleast_1d(y)
 
-        self.output_2d_ = y.ndim == 2
+        self.output_2d_ = y.ndim == 2 and y.shape[1] > 1
+
         if y.ndim == 1:
             y = np.reshape(y, (-1, 1))

@@ -425,7 +426,8 @@ def fit(self, X, y, sample_weight=None):
         if len(y) == 0:
             raise ValueError("y must not be empty.")
 
-        self.output_2d_ = y.ndim == 2
+        self.output_2d_ = y.ndim == 2 and y.shape[1] > 1
+
         if y.ndim == 1:
             y = np.reshape(y, (-1, 1))
         self.n_outputs_ = y.shape[1]
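
For illustration only, the effect of the changed condition can be sketched outside scikit-learn. The helper names `old_output_2d` and `new_output_2d` below are hypothetical and not part of `sklearn/dummy.py`; they only reproduce the two expressions from the hunks above on plain NumPy arrays, assuming `y` has already passed through `np.atleast_1d`.

```python
import numpy as np


def old_output_2d(y):
    # Expression before this PR: any 2d y counts as 2d output.
    y = np.atleast_1d(y)
    return y.ndim == 2


def new_output_2d(y):
    # Expression after this PR: a single-column 2d y is treated
    # like a 1d y, so only true multi-output y counts as 2d output.
    y = np.atleast_1d(y)
    return y.ndim == 2 and y.shape[1] > 1


y_1d = [1, 2, 1, 1]                            # shape (4,)
y_column = [[1], [2], [1], [1]]                # shape (4, 1), column vector
y_multi = [[1, 0], [2, 0], [1, 0], [1, 3]]     # shape (4, 2), two outputs

for name, y in [("1d", y_1d), ("column", y_column), ("multi", y_multi)]:
    print(name, old_output_2d(y), new_output_2d(y))
```

Only the column-vector case changes between the two expressions: the old check flags it as 2d output, while the new check groups it with the 1d case, which is what the non-regression test in this PR compares.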
16 changes: 16 additions & 0 deletions sklearn/tests/test_dummy.py
@@ -102,6 +102,22 @@ def test_most_frequent_and_prior_strategy():
clf.class_prior_.reshape((1, -1)) > 0.5)


+def test_most_frequent_and_prior_strategy_with_2d_column_y():
+    # non-regression test added in
+    # https://github.com/scikit-learn/scikit-learn/pull/13545
+    X = [[0], [0], [0], [0]]
+    y_1d = [1, 2, 1, 1]
+    y_2d = [[1], [2], [1], [1]]
+
+    for strategy in ("most_frequent", "prior"):
+        clf_1d = DummyClassifier(strategy=strategy, random_state=0)
+        clf_2d = DummyClassifier(strategy=strategy, random_state=0)
+
+        clf_1d.fit(X, y_1d)
+        clf_2d.fit(X, y_2d)
+        assert_array_equal(clf_1d.predict(X), clf_2d.predict(X))
+
+
 def test_most_frequent_and_prior_strategy_multioutput():
     X = [[0], [0], [0], [0]]  # ignored
     y = np.array([[1, 0],