[MRG] Deprecate min_samples_leaf #11280
Conversation
sklearn/tree/tree.py
Outdated
@@ -1367,6 +1382,9 @@ class ExtraTreeRegressor(DecisionTreeRegressor):

.. versionchanged:: 0.18
    Added float values for fractions.
.. deprecated:: 0.20
    The parameter `min_samples_leaf` is deprecated in version 0.20 and
single backticks don't do anything and should probably be double backticks.
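As an illustration only (a hypothetical docstring, not this PR's actual diff), double backticks produce inline code literals in numpydoc/reST, while single backticks are an interpreted-text role and do not render as code by default:

```python
def fit(X, y, min_samples_leaf=1):
    """Hypothetical estimator method showing backtick usage.

    Parameters
    ----------
    min_samples_leaf : int, default=1
        Use ``min_samples_leaf`` (double backticks) for an inline code
        literal; `min_samples_leaf` (single backticks) is an
        interpreted-text role and usually renders as italics, not code.

        .. deprecated:: 0.20
           ``min_samples_leaf`` is deprecated in 0.20 and will be
           removed in 0.22.
    """
```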
pep8 still failing. Otherwise looks good. Did you check that the output of the tests doesn't contain the deprecation warning any more? (I figure you did, just confirming)

yep I did!
Fix (single vs double) backtick issues (3e1525e to 049f7a9)
I'm also happy to submit 049f7a9 as a separate PR if that's more appropriate? I just got excited and decided to fix a bunch of things...

I got this assertion error

Is there a way to retrigger the travis build if I suspect the error is spurious?

We could retrigger, but it's odd since the tests are deterministic. You changed a lot of pep8 stuff which makes it harder to see what the actual changes are that you did. I'll restart the test but expect it will fail again. But you didn't actually change anything, right?

Sorry, I can remove that last commit and submit it separately if that makes it easier. Alternatively, you can view the first 3 commits here, which just contain the parameter deprecation. Then you can look at/verify that the last commit only makes pep8 fixes. But again, happy to reorg the PR, whatever is easiest for you. But, as you predicted, it failed again... let me dig a little bit further.
Confirmed that the tests pass locally for me, and that there's no rebase weirdness going on (commit hashes are identical between the local and remote branches). In TravisCI the failing test is test_sample_weight_deviance. Do you have further suggestions on how I might investigate?

(sklearn) lasagnaman@lasagna3 ~/git/scikit-learn $ pytest sklearn/ensemble/tests/test_gradient_boosting_loss_functions.py
=============================== test session starts ===============================
platform linux -- Python 3.6.5, pytest-3.6.1, py-1.5.3, pluggy-0.6.0
rootdir: /home/lasagnaman/git/scikit-learn, inifile: setup.cfg
plugins: cov-2.5.1
collected 9 items
sklearn/ensemble/tests/test_gradient_boosting_loss_functions.py ......... [100%]
============================ 9 passed in 0.28 seconds =============================
(sklearn) lasagnaman@lasagna3 ~/git/scikit-learn $ git logg
* 049f7a932 (HEAD -> 10773, origin/10773) fix many flake8 issues
* e800dcdfb fix flake8
* 4d38180d1 catch deprecation warnings for min_samples_leaf
* b11a56ed9 Deprecate min_samples_leaf
The failing Travis run operates on minimal (legacy) dependencies, so it may not be surprising that it runs fine for you. Please look at the Travis log. And yes, all the unrelated changes make review hard.
@@ -155,7 +158,6 @@ def test_quantile_loss_function():

def test_sample_weight_deviance():
    # Test if deviance supports sample weights.
    rng = check_random_state(13)
    X = rng.rand(100, 2)
Ah, I guess it's this line... I guess this call, though unused, affects rng's seed?
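A minimal sketch (not the scikit-learn test itself) of why an apparently unused draw still matters: it advances the shared RandomState, so deleting it changes every subsequent value drawn from the same generator.

```python
import numpy as np

rng = np.random.RandomState(13)
X = rng.rand(100, 2)          # "unused", but still consumes 200 draws
after_draw = rng.rand(3)      # values depend on the draws made above

rng2 = np.random.RandomState(13)
without_draw = rng2.rand(3)   # same seed, no intermediate draw

# The two sequences differ, so removing the "unused" line changes
# everything downstream that depends on rng.
print(np.allclose(after_draw, without_draw))  # False
```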
The python 3.6.2 test fails. Looking at the travis log, the job ends after:

sklearn/neighbors/tests/test_nearest_centroid.py ......... [ 37%]
sklearn/neighbors/tests/test_neighbors.py .............................. [ 38%]
..........

and... that's the end of the log. Am I missing something? Did some test fail and kill the job or something? The original test (which I now realize failed in 2.7, but succeeded in python 3+) is no longer failing.
I've restarted that test, but that Travis was failing on master yesterday.

thanks @jnothman . Tests look good now and I think this PR is ready for review.
@amueller should I update this to use …
yes
ready for review/merge
The key thing that is missing is telling the user why. Perhaps just say "It was not effective for regularisation" (?) in the docstrings under deprecation
Did you check for usage in doc/ and examples/?
Please add an entry to the change log at |
There are 2 sections in the doc where I had to change some of the substance. Please check if my re-writing is acceptable.
a minimum number of samples in a leaf, while ``min_samples_split`` can
create arbitrarily small leaves, though ``min_samples_split`` is more common
in the literature.
* Use ``min_samples_split`` to control the number of samples at a leaf node.
Please check whether my rephrasing of this paragraph is acceptable.
@@ -347,7 +344,7 @@ Tips on practical use

class to the same value. Also note that weight-based pre-pruning criteria,
such as ``min_weight_fraction_leaf``, will then be less biased toward
dominant classes than criteria that are not aware of the sample weights,
like ``min_samples_leaf``.
like ``min_samples_split``.
My understanding leads me to believe this change is grammatical, but please check?
hmm.. those changes look good, but you have also reminded me that we may also want to deprecate min_weight_fraction_leaf! @amueller?
@jnothman @amueller thoughts on this? I see there was a lot of discussion on #11283 which was ultimately merged, so let me know if I should update this PR to any new standards/conventions that were decided, or if I can simply rebase and fix conflicts. Happy to do a bit more work to get this over the line.
I will admit I forgot about this one. I think it's a bit weird to deprecate this and not weight fraction if they both have the same problem.

Argh, is this a blocker? Maybe for the release, maybe not for the RC?

I think min_weight_fraction_leaf should also be deprecated.
I think at a minimum we should document that this doesn't work well as regularisation... is it worth blocking for the deprecation? I think it's acceptable to do for next release.
Fair, and we can fix the docs after the RC.

Continued and fixed in #11870
Reference Issues/PRs
Fixes #10773, see also #8399
What does this implement/fix? Explain your changes.
Deprecates min_samples_leaf in sklearn.ensemble.forest, sklearn.ensemble.gradient_boosting, and sklearn.tree.tree.
Any other comments?
The parameter is slated for removal in 0.22.
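As a hedged sketch (the actual diff lives in the PR; the `make_tree` name and the warning text here are illustrative, not scikit-learn's code), a typical parameter deprecation of this kind defaults the old parameter to a sentinel and warns only when the user actually sets it:

```python
import warnings

def make_tree(min_samples_split=2, min_samples_leaf=None):
    """Hypothetical constructor illustrating the deprecation pattern."""
    if min_samples_leaf is not None:
        # Warn only when the deprecated parameter is explicitly passed.
        warnings.warn("min_samples_leaf is deprecated in 0.20 and will "
                      "be removed in 0.22.", DeprecationWarning)
    return {"min_samples_split": min_samples_split}

# Capture the warning the way a test might, so normal callers who omit
# the parameter see nothing.
with warnings.catch_warnings(record=True) as w:
    warnings.simplefilter("always")
    make_tree(min_samples_leaf=1)
print(len(w))  # 1 warning captured
```

Tests for the touched estimators then need to stop passing the parameter (or filter the warning), which matches the "catch deprecation warnings for min_samples_leaf" commit above.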