
[MRG+2] ENH/FIX Introduce min_impurity_decrease param for early stopping based on impurity; Deprecate min_impurity_split #8449

Merged

merged 18 commits into scikit-learn:master from raghavrv:min_impurity_decrease Apr 3, 2017

7 participants
@raghavrv
Member

raghavrv commented Feb 24, 2017

Fixes #8400

Also ref Gilles' comment

This PR stops splitting a node if the weighted impurity decrease from a potential split is not above a user-given threshold...

@amueller Can you try this on your use cases and see if it gives better control than min_impurity_split?

@jnothman @glouppe @nelson-liu @glemaitre @jmschrei
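A quick sketch of the new parameter in action (assumes a scikit-learn version with this PR merged, i.e. 0.19 or later; the dataset and threshold are illustrative):

```python
# Illustrative use of min_impurity_decrease: a node is split only if the split
# decreases the weighted impurity by at least the given threshold, so larger
# values yield smaller trees.
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, random_state=0)

unconstrained = DecisionTreeClassifier(random_state=0).fit(X, y)
pruned = DecisionTreeClassifier(min_impurity_decrease=0.01,
                                random_state=0).fit(X, y)

# Requiring a minimum weighted impurity decrease prunes low-gain splits.
print(unconstrained.tree_.node_count, pruned.tree_.node_count)
```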

raghavrv added some commits Feb 24, 2017

@@ -272,10 +275,23 @@ def fit(self, X, y, sample_weight=None, check_input=True,
             min_weight_leaf = (self.min_weight_fraction_leaf *
                                np.sum(sample_weight))
-        if self.min_impurity_split < 0.:
+        if self.min_impurity_split is not None:
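The `is not None` check above is what enables the deprecation: `None` now means "not set by the user". A minimal sketch of the pattern (hypothetical helper name, not the actual scikit-learn code):

```python
import warnings

def resolve_min_impurity_split(min_impurity_split):
    # Hypothetical sketch of deprecating a constructor parameter: an explicitly
    # set value triggers a DeprecationWarning but keeps working during the
    # deprecation period; None falls back to the old default.
    if min_impurity_split is not None:
        warnings.warn("min_impurity_split has been deprecated in favor of "
                      "min_impurity_decrease.", DeprecationWarning)
        if min_impurity_split < 0.:
            raise ValueError("min_impurity_split must be greater than "
                             "or equal to 0")
    else:
        min_impurity_split = 1e-7  # old default, honored for backward compat
    return min_impurity_split
```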

@jmschrei
jmschrei Feb 25, 2017
Member

Is there a deprecation decorator which can be used? I know there is one for deprecated functions, but I'm not sure about parameters.

@raghavrv
raghavrv Feb 27, 2017
Member

I think we typically use our deprecated decorator for attributes not parameters... But I'm unsure... @amueller thoughts?

@jmschrei
Member

jmschrei commented Feb 25, 2017

In general this looks good. I didn't check your test though to make sure it was correct.

@raghavrv
Member

raghavrv commented Feb 27, 2017

Thanks a lot @jmschrei for the review!

@raghavrv
Member

raghavrv commented Feb 27, 2017

Others @glouppe @amueller Reviews please :)

@nelson-liu
Contributor

nelson-liu commented Feb 27, 2017

Functionality wise this looks good to me, pending that comment about the deprecation decorator. Good work @raghavrv

@raghavrv
Member

raghavrv commented Mar 7, 2017

Thanks @nelson-liu and @jmschrei. Andy or Gilles??

@raghavrv
Member

raghavrv commented Mar 14, 2017

Or maybe @glemaitre / @ogrisel have some time for reviews?

@glemaitre
Contributor

glemaitre commented Mar 14, 2017

Should you mention in the docstring that min_impurity_split will be deprecated?

@@ -1406,7 +1417,8 @@ class GradientBoostingClassifier(BaseGradientBoosting, ClassifierMixin):
     def __init__(self, loss='deviance', learning_rate=0.1, n_estimators=100,
                  subsample=1.0, criterion='friedman_mse', min_samples_split=2,
                  min_samples_leaf=1, min_weight_fraction_leaf=0.,
-                 max_depth=3, min_impurity_split=1e-7, init=None,
+                 max_depth=3, min_impurity_decrease=0.,

@glemaitre
glemaitre Mar 14, 2017
Contributor

min_impurity_decrease is defined as 1e-7 in the above docstring.

@raghavrv
raghavrv Mar 24, 2017
Member

Thanks for the catch. I changed the doc to 0... I'm using 0 because of the EPSILON added as described here...

@@ -1790,7 +1811,8 @@ class GradientBoostingRegressor(BaseGradientBoosting, RegressorMixin):
     def __init__(self, loss='ls', learning_rate=0.1, n_estimators=100,
                  subsample=1.0, criterion='friedman_mse', min_samples_split=2,
                  min_samples_leaf=1, min_weight_fraction_leaf=0.,
-                 max_depth=3, min_impurity_split=1e-7, init=None, random_state=None,
+                 max_depth=3, min_impurity_decrease=0.,

@glemaitre
glemaitre Mar 14, 2017
Contributor

check the default value

@raghavrv
raghavrv Mar 24, 2017
Member

(Same as above)


@raghavrv raghavrv removed the Needs Review label Mar 24, 2017

@raghavrv
Member

raghavrv commented Mar 24, 2017

> Should you mention in the docstring that min_impurity_split will be deprecated?

Generally we don't mention that in the docstring. We deprecate it and remove the doc for that param...

Thanks for the review. Have addressed it :) Another round?

@jnothman Could you take a look at this too?

@MechCoder

Some minor comments, looks fine otherwise.

def test_min_impurity_decrease():
    # Test that min_impurity_decrease ensures a split is made only if
    # the impurity decrease is at least that value
    X, y = datasets.make_classification(n_samples=10000, random_state=42)
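A runnable sketch in the same spirit (simplified to a single classifier rather than the ALL_TREES loop; the decrease formula follows the parameter's documented definition):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, random_state=42)
threshold = 0.01
est = DecisionTreeClassifier(min_impurity_decrease=threshold,
                             random_state=0).fit(X, y)

# Verify every actual split achieved the required weighted impurity decrease:
#   N_t / N * (impurity - N_t_L / N_t * left_impurity
#                       - N_t_R / N_t * right_impurity)
t = est.tree_
N = t.weighted_n_node_samples[0]
for node in range(t.node_count):
    left, right = t.children_left[node], t.children_right[node]
    if left == -1:  # leaf node, no split to check
        continue
    n_t = t.weighted_n_node_samples[node]
    decrease = (n_t / N) * (
        t.impurity[node]
        - t.weighted_n_node_samples[left] / n_t * t.impurity[left]
        - t.weighted_n_node_samples[right] / n_t * t.impurity[right])
    assert decrease >= threshold - 1e-7
print("all %d internal nodes pass" % int(np.sum(t.children_left != -1)))
```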

@MechCoder
MechCoder Mar 31, 2017
Member

You should test regressors also, no?

@raghavrv
raghavrv Mar 31, 2017
Member

Yes! The ALL_TREES[...] contains regressors too... Just that I use the same classification data to test the regressors too...

@MechCoder
Member

MechCoder commented Mar 31, 2017

I agree that the behaviour of min_impurity_decrease is much more intuitive than min_impurity_split.

raghavrv added some commits Mar 31, 2017

@MechCoder
Member

MechCoder commented Mar 31, 2017

It's the same expression, both your one with the "fractional_weight" and the one documented in the criterion file. It's just that I find the latter easier to read, but it's fine. (I meant that having the extra term is right and it wasn't reflected in the documentation.)

@MechCoder
Member

MechCoder commented Mar 31, 2017

LGTM!

@MechCoder MechCoder changed the title from [MRG] ENH/FIX Introduce min_impurity_decrease param for early stopping based on impurity; Deprecate min_impurity_split to [MRG+1] ENH/FIX Introduce min_impurity_decrease param for early stopping based on impurity; Deprecate min_impurity_split Mar 31, 2017

@jmschrei

LGTM. If you can address the one typesetting comment I'll go ahead and merge it.

@raghavrv
Member

raghavrv commented Apr 3, 2017

@jmschrei @glemaitre Thanks for pointing that out! It was not displaying correctly before, but after the latest commit it should look like this:

[image]

@jmschrei jmschrei changed the title from [MRG+1] ENH/FIX Introduce min_impurity_decrease param for early stopping based on impurity; Deprecate min_impurity_split to [MRG+2] ENH/FIX Introduce min_impurity_decrease param for early stopping based on impurity; Deprecate min_impurity_split Apr 3, 2017

@jmschrei jmschrei merged commit fc2f249 into scikit-learn:master Apr 3, 2017

5 checks passed

ci/circleci: Your tests passed on CircleCI!
codecov/patch: 100% of diff hit (target 95.49%)
codecov/project: 95.5% (+0.01%) compared to 38adb27
continuous-integration/appveyor/pr: AppVeyor build succeeded
continuous-integration/travis-ci/pr: The Travis CI build passed
@raghavrv
Member

raghavrv commented Apr 3, 2017

Yohoo!! Thanks for the reviews and merge @jmschrei @MechCoder and @glemaitre :)

@raghavrv raghavrv deleted the raghavrv:min_impurity_decrease branch Apr 3, 2017

@glouppe
Member

glouppe commented Apr 4, 2017

Nice :)

@amueller
Member

amueller commented Apr 5, 2017

Sweet, thanks!
Can I haz example?

massich added a commit to massich/scikit-learn that referenced this pull request Apr 26, 2017

[MRG+1] ENH/FIX Introduce min_impurity_decrease param for early stopping based on impurity; Deprecate min_impurity_split (#8449)

@MechCoder MechCoder referenced this pull request May 1, 2017

Merged

Remove min_impurity_split #10

Sundrique added a commit to Sundrique/scikit-learn that referenced this pull request Jun 14, 2017

[MRG+1] ENH/FIX Introduce min_impurity_decrease param for early stopping based on impurity; Deprecate min_impurity_split (#8449)

NelleV added a commit to NelleV/scikit-learn that referenced this pull request Aug 11, 2017

[MRG+1] ENH/FIX Introduce min_impurity_decrease param for early stopping based on impurity; Deprecate min_impurity_split (#8449)

paulha added a commit to paulha/scikit-learn that referenced this pull request Aug 19, 2017

[MRG+1] ENH/FIX Introduce min_impurity_decrease param for early stopping based on impurity; Deprecate min_impurity_split (#8449)

sebp added a commit to sebp/scikit-survival that referenced this pull request Oct 16, 2017

sebp added a commit to sebp/scikit-survival that referenced this pull request Oct 16, 2017

sebp added a commit to sebp/scikit-survival that referenced this pull request Oct 30, 2017

maskani-moh added a commit to maskani-moh/scikit-learn that referenced this pull request Nov 15, 2017

[MRG+1] ENH/FIX Introduce min_impurity_decrease param for early stopping based on impurity; Deprecate min_impurity_split (#8449)

sebp added a commit to sebp/scikit-survival that referenced this pull request Nov 18, 2017

jwjohnson314 pushed a commit to jwjohnson314/scikit-learn that referenced this pull request Dec 18, 2017

[MRG+1] ENH/FIX Introduce min_impurity_decrease param for early stopping based on impurity; Deprecate min_impurity_split (#8449)