
[MRG] EHN: Change default n_estimators to 100 for random forest #11542

Merged
14 commits merged into scikit-learn:master from annaayzenshtat:fix/n_estimators_100 on Jul 17, 2018

Conversation

@annaayzenshtat (Contributor) commented Jul 15, 2018

Reference Issues/PRs

Fixes #11128.

What does this implement/fix? Explain your changes.

Issues a deprecation warning message for the default n_estimators parameter of the forest estimators. A test is added for the warning message when the default parameter is used.

Any other comments?

@amueller (Member) commented Jul 15, 2018

Is this based on #11172? The contributor there seems to have addressed the comments yesterday...

@amueller (Member) commented Jul 15, 2018

though it looks like #11172 is still not right...

@@ -758,6 +763,10 @@ class RandomForestClassifier(ForestClassifier):
    n_estimators : integer, optional (default=10)
        The number of trees in the forest.

        .. deprecated:: 0.20

@amueller (Member) commented Jul 15, 2018

should be "versionchanged" not "deprecated"

@annaayzenshtat (Author, Contributor) commented Jul 15, 2018

versionchanged as one long word, no spaces?

@amueller (Member) commented Jul 15, 2018

Yes. Try git grep versionchanged.

@annaayzenshtat (Author, Contributor) commented Jul 15, 2018

Working on that.
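For reference, this is roughly the numpydoc directive being suggested; the wording of the note here is illustrative, not necessarily the exact text that was merged:

n_estimators : integer, optional (default=10)
    The number of trees in the forest.

    .. versionchanged:: 0.20
       The default value of ``n_estimators`` will change from 10 to 100
       in version 0.22.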

@@ -1228,3 +1229,23 @@ def test_min_impurity_decrease():
    # Simply check if the parameter is passed on correctly. Tree tests
    # will suffice for the actual working of this param
    assert_equal(tree.min_impurity_decrease, 0.1)


def test_nestimators_future_warning():

@amueller (Member) commented Jul 15, 2018

It might be better to use pytest.mark.parametrize as above instead of the loop, which will run each estimator as a separate test.

@glemaitre (Contributor) commented Jul 15, 2018

FYI:

@pytest.mark.parametrize('forest', [RandomForestClassifier(), RandomForestRegressor(),
                                    ExtraTreesClassifier(), ExtraTreesRegressor(),
                                    RandomTreesEmbedding()])
def test_n_estimators_future_warning(forest):
    ....
    forest.fit(X, y)
    ....

@rth (Member) commented Jul 15, 2018

Might be better to parametrize with classes,

@pytest.mark.parametrize('forest', [RandomForestClassifier, RandomForestRegressor,
                         [...]

then create the corresponding instances inside the test; this works better for getting a human-readable test name with pytest.

@glemaitre (Contributor) commented Jul 15, 2018

Fair enough
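Putting the two suggestions together, a minimal sketch of the class-parametrized test; the toy X and y, and the assumption that the deprecation is raised as a FutureWarning, are illustrative:

import pytest

from sklearn.ensemble import (RandomForestClassifier, RandomForestRegressor,
                              ExtraTreesClassifier, ExtraTreesRegressor,
                              RandomTreesEmbedding)

# Toy data, for illustration only.
X = [[0, 1], [1, 0], [0, 0], [1, 1]]
y = [0, 1, 0, 1]


@pytest.mark.parametrize('forest',
                         [RandomForestClassifier, RandomForestRegressor,
                          ExtraTreesClassifier, ExtraTreesRegressor,
                          RandomTreesEmbedding])
def test_n_estimators_future_warning(forest):
    # Instantiating inside the test keeps the generated test ids readable.
    estimator = forest()
    with pytest.warns(FutureWarning):
        estimator.fit(X, y)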

@amueller (Member) commented Jul 15, 2018

> though it looks like #11172 is still not right...

This looks pretty good. Ideally you'd also catch the deprecation warnings if they are raised in the tests now.

@glemaitre (Contributor) left a comment

You will also need to add an entry in the what's new file for v0.20 describing the future change of behavior.

@@ -242,6 +242,12 @@ def fit(self, X, y, sample_weight=None):
        -------
        self : object
        """

        if self.n_estimators == 'warn':

@glemaitre (Contributor) commented Jul 15, 2018

The check and validation should be done in fit instead of __init__

@annaayzenshtat (Author, Contributor) commented Jul 15, 2018

So should I change back to n_estimators=10 instead of n_estimators='warn', and then change my if conditional check in the fit() method?

@amueller (Member) commented Jul 15, 2018

No, the 'warn' default is good; just the check should be in the other place.

@glemaitre (Contributor) commented Jul 16, 2018

You can refer to: https://github.com/scikit-learn/scikit-learn/pull/11469/files#diff-e6faf37b13574bc591afbf0536128735R864

This is still not merged, but we follow this convention: __init__ just assigns the parameters to the class attributes, and we do checking and validation in the fit method.

@annaayzenshtat (Author, Contributor) commented Jul 16, 2018

Aren't lines 245 and 246 above inside the fit() method?

@glemaitre (Contributor) commented Jul 16, 2018

Oops, sorry, it is good there. I got confused with another PR :)
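To summarize the convention under discussion, a hedged sketch of the 'warn' sentinel pattern; the class name and warning text are illustrative, not the exact merged code:

import warnings


class ForestSketch:
    """Illustrative only; not the actual scikit-learn class."""

    def __init__(self, n_estimators='warn'):
        # __init__ only stores parameters; no validation or warnings here.
        self.n_estimators = n_estimators

    def fit(self, X, y):
        # Checking, validation, and deprecation warnings belong in fit.
        if self.n_estimators == 'warn':
            warnings.warn("The default value of n_estimators will change "
                          "from 10 to 100 in version 0.22.", FutureWarning)
            self.n_estimators = 10
        # ... grow self.n_estimators trees ...
        return self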



def test_nestimators_future_warning():
    # Test that n_estimators future warning is raised. Will be removed in 0.22

@glemaitre (Contributor) commented Jul 15, 2018

You can use FIXME: to be removed in 0.22.
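That is, something like this (wording illustrative):

# FIXME: to be removed in 0.22
def test_nestimators_future_warning():
    ...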

@glemaitre changed the title from "Fix to Issue #11128: Create deprecation warning for default n_estimators in RandomForest" to "EHN: Change default n_estimators to 100 for random forest" on Jul 16, 2018
@glemaitre changed the title from "EHN: Change default n_estimators to 100 for random forest" to "[MRG] EHN: Change default n_estimators to 100 for random forest" on Jul 16, 2018
@glemaitre (Contributor) commented Jul 16, 2018

FYI: I updated the title of this PR.

@massich (Contributor) commented Jul 16, 2018

@annaayzenshtat this is a blocker for 0.20 (which we are actively working on right now). If you don't have time to address the comments at this moment, that's completely fine. Ping me and I'll take over the PR.

@annaayzenshtat (Author, Contributor) commented Jul 16, 2018

I'm still working on this issue.

@annaayzenshtat (Author, Contributor) commented Jul 17, 2018

I committed the requested changes; please take a look.

@glemaitre (Contributor) commented Jul 17, 2018

Actually, you need to flag the tests with pytest.mark.filterwarnings to avoid raising the future warning in the tests (typically the ones that do not set n_estimators).
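For example, a decorator along these lines; the message prefix in the filter is an assumption and has to match the actual warning text:

import pytest


# Silence the n_estimators FutureWarning in tests that rely on the default.
@pytest.mark.filterwarnings('ignore:The default value of n_estimators')
def test_something_with_default_forest():
    ...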

@annaayzenshtat (Author, Contributor) commented Jul 17, 2018

Ok, I'll change it.

@glemaitre (Contributor) left a comment

You can check this PR as an example of how to use pytest:

https://github.com/scikit-learn/scikit-learn/pull/11574/files

@annaayzenshtat (Author, Contributor) commented Jul 17, 2018

I flagged the test with pytest.mark.filterwarnings.

@glemaitre (Contributor) commented Jul 17, 2018

@annaayzenshtat I am helping a bit with the failures that you got, and I am filtering the warning because it seems to be raised in a lot of places.

glemaitre added 2 commits Jul 17, 2018
@amueller (Member) left a comment

LGTM if tests pass.

@annaayzenshtat (Author, Contributor) commented Jul 17, 2018

Ok, thank you!

@amueller (Member) commented Jul 17, 2018

Python 2.7 test failure :-/

@amueller (Member) commented Jul 17, 2018

In SAG?!

@annaayzenshtat (Author, Contributor) commented Jul 17, 2018

Is there something I'm supposed to do to fix the Python 2.7 failure?

@glemaitre (Contributor) commented Jul 17, 2018

Nope, this is a side effect already shown and solved in #11574.

@annaayzenshtat (Author, Contributor) commented Jul 17, 2018

Ok.

@amueller merged commit 2242c59 into scikit-learn:master on Jul 17, 2018

0 of 4 checks passed:
- continuous-integration/travis-ci/pr (The Travis CI build failed)
- ci/circleci: python2 (Your tests are queued behind your running builds)
- ci/circleci: python3 (Your tests are queued behind your running builds)
- continuous-integration/appveyor/pr (Waiting for AppVeyor build to complete)
@glemaitre (Contributor) commented Jul 17, 2018

@annaayzenshtat Thanks a lot for the contribution.
Feel free to take any other issue ;)

@annaayzenshtat deleted the annaayzenshtat:fix/n_estimators_100 branch on Jul 17, 2018
@annaayzenshtat (Author, Contributor) commented Jul 17, 2018

Thank you!
