
[MRG] Deprecate min_samples_leaf #11280

Closed
lasagnaman wants to merge 6 commits

Conversation

@lasagnaman commented Jun 15, 2018

Reference Issues/PRs

Fixes #10773, see also #8399

What does this implement/fix? Explain your changes.

Deprecates min_samples_leaf in sklearn.ensemble.forest, sklearn.ensemble.gradient_boosting, and sklearn.tree.tree.

Any other comments?

The parameter is slated for removal in 0.22.
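
For context, the usual scikit-learn approach to this kind of deprecation is to keep the parameter but warn at fit time when a non-default value is passed. A minimal sketch of that pattern (illustration only, not the exact diff in this PR; the class name is hypothetical):

import warnings


class TreeEstimatorSketch:
    """Hypothetical stand-in for the affected estimators in sklearn.tree and
    sklearn.ensemble; illustration only."""

    def __init__(self, min_samples_leaf=1):
        self.min_samples_leaf = min_samples_leaf

    def fit(self, X, y):
        if self.min_samples_leaf != 1:
            warnings.warn(
                "The min_samples_leaf parameter is deprecated in version 0.20 "
                "and will be removed in 0.22.", DeprecationWarning)
        # ... keep honouring the value while the deprecation period lasts ...
        return self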

@lasagnaman (Author) commented Jun 15, 2018

todo

  • fix tests

@@ -1367,6 +1382,9 @@ class ExtraTreeRegressor(DecisionTreeRegressor):

.. versionchanged:: 0.18
Added float values for fractions.
.. deprecated:: 0.20
The parameter `min_samples_leaf` is deprecated in version 0.20 and

Member:

single backticks don't do anything and should probably be double backticks.

@lasagnaman changed the title from [WIP] Deprecate min_samples_leaf to [MRG] Deprecate min_samples_leaf on Jun 15, 2018
@amueller (Member)

pep8 is still failing. Otherwise looks good. Did you check that the output of the tests doesn't have the deprecation warning any more? (I figure you did, just confirming.)
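
(For reference, a self-contained sketch of how such a warning can be asserted in a test so it does not leak into the test output; fit_with_deprecated_param below is a hypothetical stand-in, not scikit-learn code.)

import warnings

import pytest


def fit_with_deprecated_param():
    # Stand-in for fitting an estimator that was given the deprecated
    # min_samples_leaf parameter.
    warnings.warn("min_samples_leaf is deprecated", DeprecationWarning)


def test_deprecation_warning_is_raised():
    # pytest.warns both asserts the warning and keeps it out of the
    # captured test output.
    with pytest.warns(DeprecationWarning, match="min_samples_leaf"):
        fit_with_deprecated_param()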

@lasagnaman (Author)

yep I did!

@lasagnaman force-pushed the 10773 branch 2 times, most recently from 3e1525e to 049f7a9, on June 17, 2018 at 06:11
@lasagnaman (Author)

I'm also happy to submit 049f7a9 as a separate PR if that's more appropriate? I just got excited and decided to fix a bunch of things...

@lasagnaman (Author) commented Jun 17, 2018

I got this assertion error
E assert 0.72186695411543378 == 0.72186695411543389
in sklearn/ensemble/tests/test_gradient_boosting_loss_functions.py:188. Does this seem spurious? It passes locally....

@lasagnaman (Author)

Is there a way to retrigger the travis build if I suspect the error is spurious?

@amueller (Member)

We could retrigger, but it's odd since the tests are deterministic. You changed a lot of pep8 stuff, which makes it harder to see what your actual changes are. I'll restart the test but expect it will fail again. But you didn't actually change anything, right?

@lasagnaman (Author) commented Jun 20, 2018

Sorry, I can remove that last commit and submit it separately if that makes it easier. Alternatively, you can view the first 3 commits here, which just contain the parameter deprecation. Then you can look at/verify that the last commit only makes pep8 fixes. But again, happy to reorg the PR, whatever is easiest for you.

But, as you predicted, it failed again.... let me dig a little bit further.

@lasagnaman (Author) commented Jun 20, 2018

Confirmed that the tests pass locally for me, and that there's no rebase weirdness going on (commit hashes are identical between the local and remote branch). In TravisCI the failing test is test_sample_weight_deviance. Do you have further suggestions on how I might investigate?

(sklearn) lasagnaman@lasagna3 ~/git/scikit-learn $ pytest sklearn/ensemble/tests/test_gradient_boosting_loss_functions.py 
=============================== test session starts ===============================
platform linux -- Python 3.6.5, pytest-3.6.1, py-1.5.3, pluggy-0.6.0
rootdir: /home/lasagnaman/git/scikit-learn, inifile: setup.cfg
plugins: cov-2.5.1
collected 9 items                                                                 

sklearn/ensemble/tests/test_gradient_boosting_loss_functions.py .........   [100%]

============================ 9 passed in 0.28 seconds =============================

(sklearn) lasagnaman@lasagna3 ~/git/scikit-learn $ git logg
* 049f7a932 (HEAD -> 10773, origin/10773) fix many flake8 issues
* e800dcdfb fix flake8
* 4d38180d1 catch deprecation warnings for min_samples_leaf
* b11a56ed9 Deprecate min_samples_leaf

@jnothman (Member) commented Jun 20, 2018 via email

@@ -155,7 +158,6 @@ def test_quantile_loss_function():
def test_sample_weight_deviance():
# Test if deviance supports sample weights.
rng = check_random_state(13)
X = rng.rand(100, 2)

@lasagnaman (Author):

Ah, I guess it's this line..... I guess this call, though unused, affects rng's seed?
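
(A small self-contained repro of the effect, assuming the same seed-13 RandomState as the test: the unused call still consumes numbers from the stream, so every draw after it changes once the line is removed.)

import numpy as np
from sklearn.utils import check_random_state

rng = check_random_state(13)
_ = rng.rand(100, 2)          # the "unused" call still consumes 200 numbers
with_call = rng.rand(3)

rng = check_random_state(13)  # same seed, but without the extra call
without_call = rng.rand(3)

print(np.allclose(with_call, without_call))  # False: every later draw shifts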

@lasagnaman (Author)

The python 3.6.2 test fails --- looking at the travis log, the job ends after

sklearn/neighbors/tests/test_nearest_centroid.py .........               [ 37%]
sklearn/neighbors/tests/test_neighbors.py .............................. [ 38%]
..........

and.... that's the end of the log. Am I missing something? Did some test fail and kill the job or something?

The original test (which I now realize failed in 2.7, but succeeded in python 3+) is no longer failing.

@jnothman (Member)

I've restarted that test, but that Travis was failing on master yesterday.

@lasagnaman (Author)

Thanks @jnothman. Tests look good now and I think this PR is ready for review.

@lasagnaman (Author)

@amueller should I update this to use 'deprecated' as the sentinel value (as per #11283 (comment))?
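
(For reference, the sentinel pattern in question looks roughly like the sketch below; the class is hypothetical and this is not the wording eventually settled on in #11283. Its advantage over comparing against the old default is that it also warns when a user explicitly passes the old default value.)

import warnings


class ForestSketch:
    """Hypothetical stand-in; the real change would touch the forest,
    gradient boosting and tree estimators."""

    def __init__(self, min_samples_leaf='deprecated'):
        self.min_samples_leaf = min_samples_leaf

    def fit(self, X, y):
        if self.min_samples_leaf != 'deprecated':
            warnings.warn(
                "min_samples_leaf is deprecated in version 0.20 and will be "
                "removed in 0.22.", DeprecationWarning)
            min_samples_leaf = self.min_samples_leaf
        else:
            min_samples_leaf = 1  # behave like the old default
        # ... pass min_samples_leaf to the tree builder as before ...
        return self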

@jnothman (Member) commented Jun 27, 2018 via email

@lasagnaman (Author)

ready for review/merge

@jnothman (Member) left a comment

The key thing that is missing is telling the user why. Perhaps just say "It was not effective for regularisation" (?) in the docstrings under deprecation

@jnothman (Member)

Did you check for usage in doc/ and examples/?

@jnothman (Member)

Please add an entry to the change log at doc/whats_new/v0.20.rst. Like the other entries there, please reference this pull request with :issue: and credit yourself (and other contributors if applicable) with :user:

@lasagnaman (Author) left a comment

There are 2 sections in the doc where I had to change some of the substance --- please check if my rewriting is acceptable.

a minimum number of samples in a leaf, while ``min_samples_split`` can
create arbitrary small leaves, though ``min_samples_split`` is more common
in the literature.
* Use ``min_samples_split`` to control the number of samples at a leaf node.

@lasagnaman (Author):

Please check whether my rephrasing of this paragraph is acceptable.
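
(To make the distinction concrete, a small runnable illustration on synthetic data: min_samples_split only gates which nodes may be split, so individual leaves can still end up far smaller than that threshold.)

import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.RandomState(0)
X = rng.rand(200, 2)
y = rng.randint(0, 2, size=200)   # noisy labels force deep splits

tree = DecisionTreeClassifier(min_samples_split=10, random_state=0).fit(X, y)
t = tree.tree_
is_leaf = t.children_left == -1
# Typically prints a value much smaller than 10 (often 1): min_samples_split
# constrains which nodes may be split, not how small the resulting leaves are.
print("smallest leaf size:", t.n_node_samples[is_leaf].min())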

@@ -347,7 +344,7 @@ Tips on practical use
class to the same value. Also note that weight-based pre-pruning criteria,
such as ``min_weight_fraction_leaf``, will then be less biased toward
dominant classes than criteria that are not aware of the sample weights,
like ``min_samples_leaf``.
like ``min_samples_split``.

@lasagnaman (Author):

My understanding leads me to believe this change is grammatical, but please check?
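
(A short sketch of the point being made, on synthetic data: a weight-based criterion such as min_weight_fraction_leaf takes sample_weight into account, whereas a count-based criterion like min_samples_split does not.)

import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.RandomState(0)
X = rng.rand(300, 2)
y = (X[:, 0] > 0.7).astype(int)               # class 1 is the minority
sample_weight = np.where(y == 1, 10.0, 1.0)   # e.g. weights from class balancing

# min_samples_split counts samples only: a small leaf of heavily weighted
# minority samples is constrained exactly like one of majority samples.
count_pruned = DecisionTreeClassifier(min_samples_split=20, random_state=0)
count_pruned.fit(X, y, sample_weight=sample_weight)

# min_weight_fraction_leaf looks at the summed sample_weight instead, so a
# small but heavily weighted minority leaf can survive pre-pruning.
weight_pruned = DecisionTreeClassifier(min_weight_fraction_leaf=0.05,
                                        random_state=0)
weight_pruned.fit(X, y, sample_weight=sample_weight)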

@jnothman (Member) commented Jul 1, 2018 via email

@lasagnaman (Author)

@jnothman @amueller thoughts on this? I see there was a lot of discussion on #11283 which was ultimately merged, so let me know if I should update this MR to any new standards/conventions that were decided, or if I can simply just rebase and fix conflicts. Happy to do a bit more work to get this over the line.

@jnothman added this to the 0.20 milestone on Jul 30, 2018
@jnothman (Member)

I will admit I forgot about this one. I think it's a bit weird to deprecate this and not the weight fraction if they both have the same problem.

@amueller (Member)

Argh, is this a blocker? Maybe for the release, maybe not for the RC?

@amueller (Member)

I think min_weight_fraction_leaf should also be deprecated.

@jnothman (Member) commented Aug 18, 2018 via email

@amueller (Member)

Fair, and we can fix the docs after the RC.

@rth (Member) commented Aug 23, 2018

Continued and fixed in #11870

@rth closed this on Aug 23, 2018

Successfully merging this pull request may close these issues.

docs state min_samples_leaf reduces size of the tree
4 participants