
[MRG] Deprecate min_samples_leaf #11280

Closed
lasagnaman wants to merge 6 commits

Conversation

@lasagnaman commented Jun 15, 2018

Reference Issues/PRs

Fixes #10773, see also #8399

What does this implement/fix? Explain your changes.

Deprecates min_samples_leaf in sklearn.ensemble.forest, sklearn.ensemble.gradient_boosting, and sklearn.tree.tree.

Any other comments?

The parameter is slated for removal in 0.22.
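
For context, the usual scikit-learn approach to this kind of deprecation is to keep the parameter but warn at fit time when a non-default value is passed. A minimal sketch of that pattern (illustration only, not the exact diff in this PR; the class name is hypothetical):

import warnings


class TreeEstimatorSketch:
    """Hypothetical stand-in for the affected estimators in sklearn.tree and
    sklearn.ensemble; illustration only."""

    def __init__(self, min_samples_leaf=1):
        self.min_samples_leaf = min_samples_leaf

    def fit(self, X, y):
        if self.min_samples_leaf != 1:
            warnings.warn(
                "The min_samples_leaf parameter is deprecated in version 0.20 "
                "and will be removed in 0.22.", DeprecationWarning)
        # ... keep honouring the value while the deprecation period lasts ...
        return self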

@lasagnaman (Author) commented Jun 15, 2018

todo

  • fix tests

@@ -1367,6 +1382,9 @@ class ExtraTreeRegressor(DecisionTreeRegressor):

.. versionchanged:: 0.18
Added float values for fractions.
.. deprecated:: 0.20
The parameter `min_samples_leaf` is deprecated in version 0.20 and

Member:

single backticks don't do anything and should probably be double backticks.

@lasagnaman changed the title from [WIP] Deprecate min_samples_leaf to [MRG] Deprecate min_samples_leaf on Jun 15, 2018
@amueller (Member)

pep8 is still failing. Otherwise looks good. Did you check that the output of the tests doesn't have the deprecation warning any more? (I figure you did, just confirming.)
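
(For reference, a self-contained sketch of how such a warning can be asserted in a test so it does not leak into the test output; fit_with_deprecated_param below is a hypothetical stand-in, not scikit-learn code.)

import warnings

import pytest


def fit_with_deprecated_param():
    # Stand-in for fitting an estimator that was given the deprecated
    # min_samples_leaf parameter.
    warnings.warn("min_samples_leaf is deprecated", DeprecationWarning)


def test_deprecation_warning_is_raised():
    # pytest.warns both asserts the warning and keeps it out of the
    # captured test output.
    with pytest.warns(DeprecationWarning, match="min_samples_leaf"):
        fit_with_deprecated_param()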

@lasagnaman (Author)

yep I did!

@lasagnaman force-pushed the 10773 branch 2 times, most recently from 3e1525e to 049f7a9, on June 17, 2018 at 06:11
@lasagnaman (Author)

I'm also happy to submit 049f7a9 as a separate PR if that's more appropriate? I just got excited and decided to fix a bunch of things...

@lasagnaman (Author) commented Jun 17, 2018

I got this assertion error
E assert 0.72186695411543378 == 0.72186695411543389
in sklearn/ensemble/tests/test_gradient_boosting_loss_functions.py:188. Does this seem spurious? It passes locally....

@lasagnaman (Author)

Is there a way to retrigger the travis build if I suspect the error is spurious?

@amueller (Member)

We could retrigger, but it's odd since the tests are deterministic. You changed a lot of pep8 stuff, which makes it harder to see what your actual changes are. I'll restart the test but expect it will fail again. But you didn't actually change anything, right?

@lasagnaman (Author) commented Jun 20, 2018

Sorry, I can remove that last commit and submit it separately if that makes it easier. Alternatively, you can view the first 3 commits here, which just contain the parameter deprecation. Then you can look at/verify that the last commit only makes pep8 fixes. But again, happy to reorg the PR, whatever is easiest for you.

But, as you predicted, it failed again.... let me dig a little bit further.

@lasagnaman (Author) commented Jun 20, 2018

Confirmed that the tests pass locally for me, and that there's no rebase weirdness going on (commit hashes are identical between the local and remote branch). In TravisCI the failing test is test_sample_weight_deviance. Do you have further suggestions on how I might investigate?

(sklearn) lasagnaman@lasagna3 ~/git/scikit-learn $ pytest sklearn/ensemble/tests/test_gradient_boosting_loss_functions.py 
=============================== test session starts ===============================
platform linux -- Python 3.6.5, pytest-3.6.1, py-1.5.3, pluggy-0.6.0
rootdir: /home/lasagnaman/git/scikit-learn, inifile: setup.cfg
plugins: cov-2.5.1
collected 9 items                                                                 

sklearn/ensemble/tests/test_gradient_boosting_loss_functions.py .........   [100%]

============================ 9 passed in 0.28 seconds =============================

(sklearn) lasagnaman@lasagna3 ~/git/scikit-learn $ git logg
* 049f7a932 (HEAD -> 10773, origin/10773) fix many flake8 issues
* e800dcdfb fix flake8
* 4d38180d1 catch deprecation warnings for min_samples_leaf
* b11a56ed9 Deprecate min_samples_leaf

@jnothman (Member) commented Jun 20, 2018 via email

@@ -155,7 +158,6 @@ def test_quantile_loss_function():
def test_sample_weight_deviance():
# Test if deviance supports sample weights.
rng = check_random_state(13)
X = rng.rand(100, 2)

@lasagnaman (Author):

Ah, I guess it's this line..... I guess this call, though unused, affects rng's seed?
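
(A small self-contained repro of the effect, assuming the same seed-13 RandomState as the test: the unused call still consumes numbers from the stream, so every draw after it changes once the line is removed.)

import numpy as np
from sklearn.utils import check_random_state

rng = check_random_state(13)
_ = rng.rand(100, 2)          # the "unused" call still consumes 200 numbers
with_call = rng.rand(3)

rng = check_random_state(13)  # same seed, but without the extra call
without_call = rng.rand(3)

print(np.allclose(with_call, without_call))  # False: every later draw shifts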

@lasagnaman (Author)

The python 3.6.2 test fails --- looking at the travis log, the job ends after

sklearn/neighbors/tests/test_nearest_centroid.py .........               [ 37%]
sklearn/neighbors/tests/test_neighbors.py .............................. [ 38%]
..........

and.... that's the end of the log. Am I missing something? Did some test fail and kill the job or something?

The original test (which I now realize failed in 2.7, but succeeded in python 3+) is no longer failing.

@jnothman (Member)

I've restarted that test, but that Travis was failing on master yesterday.

@lasagnaman (Author)

Thanks @jnothman. Tests look good now and I think this PR is ready for review.

@lasagnaman (Author)

@amueller should I update this to use 'deprecated' as the sentinel value (as per #11283 (comment))?
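
(For reference, the sentinel pattern in question looks roughly like the sketch below; the class is hypothetical and this is not the wording eventually settled on in #11283. Its advantage over comparing against the old default is that it also warns when a user explicitly passes the old default value.)

import warnings


class ForestSketch:
    """Hypothetical stand-in; the real change would touch the forest,
    gradient boosting and tree estimators."""

    def __init__(self, min_samples_leaf='deprecated'):
        self.min_samples_leaf = min_samples_leaf

    def fit(self, X, y):
        if self.min_samples_leaf != 'deprecated':
            warnings.warn(
                "min_samples_leaf is deprecated in version 0.20 and will be "
                "removed in 0.22.", DeprecationWarning)
            min_samples_leaf = self.min_samples_leaf
        else:
            min_samples_leaf = 1  # behave like the old default
        # ... pass min_samples_leaf to the tree builder as before ...
        return self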

@jnothman (Member) commented Jun 27, 2018 via email

@lasagnaman (Author)

ready for review/merge

@jnothman (Member) left a comment

The key thing that is missing is telling the user why. Perhaps just say "It was not effective for regularisation" (?) in the docstrings under deprecation

@jnothman (Member)

Did you check for usage in doc/ and examples/?

@jnothman (Member)

Please add an entry to the change log at doc/whats_new/v0.20.rst. Like the other entries there, please reference this pull request with :issue: and credit yourself (and other contributors if applicable) with :user:

@lasagnaman (Author) left a comment

There are 2 sections in the doc where I had to change some of the substance --- please check if my rewriting is acceptable.

a minimum number of samples in a leaf, while ``min_samples_split`` can
create arbitrary small leaves, though ``min_samples_split`` is more common
in the literature.
* Use ``min_samples_split`` to control the number of samples at a leaf node.

@lasagnaman (Author):

Please check whether my rephrasing of this paragraph is acceptable.
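
(To make the distinction concrete, a small runnable illustration on synthetic data: min_samples_split only gates which nodes may be split, so individual leaves can still end up far smaller than that threshold.)

import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.RandomState(0)
X = rng.rand(200, 2)
y = rng.randint(0, 2, size=200)   # noisy labels force deep splits

tree = DecisionTreeClassifier(min_samples_split=10, random_state=0).fit(X, y)
t = tree.tree_
is_leaf = t.children_left == -1
# Typically prints a value much smaller than 10 (often 1): min_samples_split
# constrains which nodes may be split, not how small the resulting leaves are.
print("smallest leaf size:", t.n_node_samples[is_leaf].min())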

@@ -347,7 +344,7 @@ Tips on practical use
class to the same value. Also note that weight-based pre-pruning criteria,
such as ``min_weight_fraction_leaf``, will then be less biased toward
dominant classes than criteria that are not aware of the sample weights,
like ``min_samples_leaf``.
like ``min_samples_split``.

@lasagnaman (Author):

My understanding leads me to believe this change is grammatical, but please check?
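
(A short sketch of the point being made, on synthetic data: a weight-based criterion such as min_weight_fraction_leaf takes sample_weight into account, whereas a count-based criterion like min_samples_split does not.)

import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.RandomState(0)
X = rng.rand(300, 2)
y = (X[:, 0] > 0.7).astype(int)               # class 1 is the minority
sample_weight = np.where(y == 1, 10.0, 1.0)   # e.g. weights from class balancing

# min_samples_split counts samples only: a small leaf of heavily weighted
# minority samples is constrained exactly like one of majority samples.
count_pruned = DecisionTreeClassifier(min_samples_split=20, random_state=0)
count_pruned.fit(X, y, sample_weight=sample_weight)

# min_weight_fraction_leaf looks at the summed sample_weight instead, so a
# small but heavily weighted minority leaf can survive pre-pruning.
weight_pruned = DecisionTreeClassifier(min_weight_fraction_leaf=0.05,
                                        random_state=0)
weight_pruned.fit(X, y, sample_weight=sample_weight)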

@jnothman (Member) commented Jul 1, 2018 via email

@lasagnaman (Author)

@jnothman @amueller thoughts on this? I see there was a lot of discussion on #11283 which was ultimately merged, so let me know if I should update this MR to any new standards/conventions that were decided, or if I can simply just rebase and fix conflicts. Happy to do a bit more work to get this over the line.

@jnothman added this to the 0.20 milestone on Jul 30, 2018
@jnothman (Member)

I will admit I forgot about this one. I think it's a bit weird to deprecate this and not the weight fraction if they both have the same problem.

@amueller (Member)

Argh, is this a blocker? Maybe for the release, maybe not for the RC?

@amueller (Member)

I think min_weight_fraction_leaf should also be deprecated.

@jnothman (Member) commented Aug 18, 2018 via email

@amueller (Member)

Fair, and we can fix the docs after the RC.

@rth (Member) commented Aug 23, 2018

Continued and fixed in #11870

@rth closed this on Aug 23, 2018

Successfully merging this pull request may close these issues.

docs state min_samples_leaf reduces size of the tree
4 participants