New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[MRG +2 ] min_samples_split and min_samples_leaf now accept a percentage #5531
Conversation
Should be good to go. Reviewer are welcome. |
min_samples_split = max(self.min_samples_split, | ||
2 * self.min_samples_leaf) | ||
|
||
min_samples_split = max(min_samples_split, 2 * self.min_samples_leaf) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldnt you compare with min_samples_leaf
(and not self.min_samples_leaf
), in case the latter was float?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes indeed.
Some small things, but otherwise +1 once they are fixed. Please squashed the commits as well. |
ee559bd
to
da8f8dd
Compare
I squashed everything |
" or in (0, 1], got %s" % min_samples_split) | ||
if not (0. < self.min_samples_leaf <= 0.5 or | ||
1 <= self.min_samples_leaf): | ||
raise ValueError("min_samples_leaf must be at least than 1 " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So min_samples_leaf
should be upper bounded by int(0.5 * n_samples) rather than 1 when min_samples_leaf
is an integer since you cannot have at least greater than 0.5 * n_samples in each leaf?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have added the upperbound self.min_samples_leaf < 0.5 * n_samples
as this currently works without any problems. You just have a pre-pruning constraint that won't let you grow the tree.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, so the second line so should it not be
if not (.... or (1 <= self.min_samples_leaf < int(n_samples / 2))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wouldn't do that as it is perfectly valid to set min_samples_leaf greater than n_samples / 2. Furthermore, this is the current behavior and could be painful when performing cross-validation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, thanks !
LGTM as well |
da8f8dd
to
1514eec
Compare
…er as percentage.
57529a7
to
e10bc92
Compare
rebase to fix conflict on the doc/whats new. |
e10bc92
to
b78e6d5
Compare
b78e6d5
to
a20e37a
Compare
Merging as the error is due to GP in the doc. |
[MRG +2 ] min_samples_split and min_samples_leaf now accept a percentage
superseed #3359