[MRG] FIX keep at least one feature when max_features is small fraction #12388
Conversation
Otherwise LGTM
sklearn/ensemble/bagging.py (outdated diff)

-        else:  # float
-            max_features = int(self.max_features * self.n_features_)
+        elif isinstance(self.max_features, (numbers.Real, np.float)):
+            if not self.max_features > 0.0:
Can't you use <= instead of not?
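That is, a one-line sketch of the suggestion (not the merged code):

if self.max_features <= 0.0:
    raise ValueError("max_features must be in (0, n_features]")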
+                raise ValueError("max_features must be in (0, n_features]")
+            max_features = max(1, int(self.max_features * self.n_features_))
+        else:
+            raise ValueError("max_features must be int or float")
if not (0 < max_features <= self.n_features_):
Put this before the above test and you can simplify the code. Don't worry about validating that it is numeric. Comparing to a number is good enough for that unexpected case.
Thanks for the review @jnothman! I'm struggling to find a simpler implementation that handles unexpected cases. Could you expand on your comment above?
Comparing self.max_features to a number without first casting to an integer causes an unexpected TypeError if it's a string: TypeError: unorderable types: int() < str().
However, casting max_features to an integer means that 0.1 would be rounded down to 0, and hence a ValueError is raised (which is the behaviour the PR is trying to avoid).
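To make the failure mode concrete, a minimal sketch (the quoted message is from Python 3.5; newer versions phrase it differently):

max_features = "foobar"  # an invalid, non-numeric value

try:
    0 < max_features  # comparison without casting first
except TypeError as e:
    # Python 3.5: "unorderable types: int() < str()"
    # Python 3.6+: "'<' not supported between instances of 'int' and 'str'"
    print(e)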
Some relevant existing unit tests:
scikit-learn/sklearn/ensemble/tests/test_bagging.py
Lines 424 to 434 in fa4de83
# Test max_features
assert_raises(ValueError,
              BaggingClassifier(base, max_features=-1).fit, X, y)
assert_raises(ValueError,
              BaggingClassifier(base, max_features=0.0).fit, X, y)
assert_raises(ValueError,
              BaggingClassifier(base, max_features=2.0).fit, X, y)
assert_raises(ValueError,
              BaggingClassifier(base, max_features=5).fit, X, y)
assert_raises(ValueError,
              BaggingClassifier(base, max_features="foobar").fit, X, y)
Well... TypeError might really be the more appropriate error anyway, but let's not quibble with the tests. Why not:
if not numeric:
    raise ValueError
if real:
    max_features = max_features * features
if not 0 < max_features <= n_features:
    raise ValueError
max_features = int(max_features)
but perhaps that logic is no less complicated than the present?
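For reference, a direct Python rendering of that pseudocode might look as follows. This is a sketch only, reusing the self.max_features and self.n_features_ attributes of BaseBagging; note that the final int() cast would still truncate a small fraction to zero, which the reply below adjusts for.

import numbers

if not isinstance(self.max_features, numbers.Number):
    raise ValueError("max_features must be int or float")

max_features = self.max_features
if not isinstance(max_features, numbers.Integral):  # a real (float) value
    max_features = max_features * self.n_features_  # fraction of the features

if not 0 < max_features <= self.n_features_:
    raise ValueError("max_features must be in (0, n_features]")

max_features = int(max_features)  # still truncates e.g. 0.5 to 0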
Good point: I could just modify that test to expect a TypeError if a string is input rather than a ValueError.
Here's a suggested modification of the logic that ensures 0.1 is not rounded down to zero:
if isinstance(self.max_features, (numbers.Integral, np.integer)):
    max_features = self.max_features
else:  # float
    max_features = self.max_features * self.n_features_

if not (0 < max_features <= self.n_features_):
    raise ValueError
max_features = max(1, int(max_features))
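As an illustration of how this behaves at the edges, here is the same logic wrapped in a standalone function; _check_max_features is a hypothetical helper name used purely for demonstration:

import numbers

import numpy as np


def _check_max_features(max_features, n_features):
    # Hypothetical standalone wrapper around the logic suggested above.
    if isinstance(max_features, (numbers.Integral, np.integer)):
        result = max_features
    else:  # float
        result = max_features * n_features
    if not (0 < result <= n_features):
        raise ValueError("max_features must be in (0, n_features]")
    return max(1, int(result))


print(_check_max_features(0.1, 5))  # 1: a small fraction still keeps one feature
print(_check_max_features(5, 5))    # 5
# _check_max_features(0.0, 5) and _check_max_features(2.0, 5) raise ValueError
# _check_max_features("foobar", 5) raises TypeError, as discussed above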
Ah, it looks like …
LGTM, thanks. Please add a what's new entry to doc/whats_new/v0.20.rst under the 0.20.1 section, mentioning all estimators that are affected by this fix (excluding the BaseBagging class itself).
Thanks for the review @rth. I've added a comment to the doc as requested.
Thanks! (Fixed the formatting in what's new a bit).
Reference Issues/PRs
Fixes #12386.
What does this implement/fix? Explain your changes.
The max_features parameter of a Bagging estimator is often set as a float to represent the fraction of features to use. To convert it to an integer, the following expression is currently used:
max_features = int(self.max_features * self.n_features_)
However, this leads to a ValueError if the result is rounded down to zero. This may occur when the number of features is not known in advance (for example, due to hyperparameter tuning in an earlier stage).
This PR ensures a minimum of one feature is kept in this situation:
max_features = max(1, int(self.max_features * self.n_features_))
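To illustrate with assumed values (n_features_ = 5, max_features = 0.1):

n_features_ = 5
max_features = 0.1

old = int(max_features * n_features_)          # int(0.5) == 0 -> ValueError downstream
new = max(1, int(max_features * n_features_))  # max(1, 0) == 1 -> one feature is kept

print(old, new)  # 0 1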
Any other comments?
I'd be grateful if someone could check that the unit test is implemented in the right place and in an appropriate manner. I've tried to be consistent with other tests.
I've tried to find the cleanest implementation that still raises a ValueError if max_features is negative, zero, too large, or neither an int nor a float.