[MRG +2 ] min_samples_split and min_samples_leaf now accept a percentage #5531

Merged
arjoly merged 2 commits into scikit-learn:master from float-min_samples on Oct 26, 2015

arjoly
Member

@arjoly arjoly commented Oct 22, 2015

Supersedes #3359.

@arjoly arjoly changed the title [WIP] min_samples_split and min_samples_leaf now accept a percentage [MRG] min_samples_split and min_samples_leaf now accept a percentage Oct 22, 2015
@arjoly
Member Author

arjoly commented Oct 22, 2015

Should be good to go. Reviewers are welcome.

min_samples_split = max(self.min_samples_split,
2 * self.min_samples_leaf)

min_samples_split = max(min_samples_split, 2 * self.min_samples_leaf)
Contributor

Shouldn't you compare with min_samples_leaf (and not self.min_samples_leaf), in case the latter is a float?

Member Author

Yes indeed.
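
To illustrate the point raised in this thread, here is a minimal sketch of why the comparison should use the resolved leaf count rather than the raw parameter. The helper names (`resolve_min_samples`, `effective_split_threshold`, `unresolved_split_threshold`) are hypothetical and not taken from the PR, and rounding fractions up with `ceil` is an assumption.

```python
import math
from numbers import Integral


def resolve_min_samples(value, n_samples):
    """Turn an integer count or a float fraction into an absolute count."""
    if isinstance(value, Integral):
        return int(value)
    # Assumption: a fraction is rounded up to a whole number of samples.
    return int(math.ceil(value * n_samples))


def effective_split_threshold(min_samples_split, min_samples_leaf, n_samples):
    """Compare against the resolved leaf count, as suggested in the review."""
    leaf = resolve_min_samples(min_samples_leaf, n_samples)
    split = resolve_min_samples(min_samples_split, n_samples)
    return max(split, 2 * leaf)


def unresolved_split_threshold(min_samples_split, min_samples_leaf, n_samples):
    """Compare against the raw parameter, which may still be a fraction."""
    split = resolve_min_samples(min_samples_split, n_samples)
    return max(split, 2 * min_samples_leaf)


# With 200 samples and min_samples_leaf=0.25, each leaf needs 50 samples,
# so a node needs at least 100 samples before it can be split.
print(effective_split_threshold(2, 0.25, 200))   # 100
print(unresolved_split_threshold(2, 0.25, 200))  # 2
```

In the second call, the fraction 0.25 is doubled to 0.5 and loses against the integer 2, so the leaf constraint is silently ignored.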

@glouppe
Contributor

glouppe commented Oct 23, 2015

Some small things, but otherwise +1 once they are fixed.

Please squash the commits as well.

@arjoly arjoly force-pushed the float-min_samples branch 3 times, most recently from ee559bd to da8f8dd Compare October 23, 2015 10:17
@arjoly
Member Author

arjoly commented Oct 23, 2015

I squashed everything

                     " or in (0, 1], got %s" % min_samples_split)
if not (0. < self.min_samples_leaf <= 0.5 or
        1 <= self.min_samples_leaf):
    raise ValueError("min_samples_leaf must be at least 1 "
Member

So when min_samples_leaf is an integer, should it also be upper-bounded by int(0.5 * n_samples), rather than only lower-bounded by 1, since you cannot have more than 0.5 * n_samples samples in each leaf?

Member Author

I have added the upper bound self.min_samples_leaf < 0.5 * n_samples as this currently works without any problems. You just have a pre-pruning constraint that won't let you grow the tree.

Member

Indeed, so should the second line not be

if not (... or (1 <= self.min_samples_leaf < int(n_samples / 2)))

Member Author

I wouldn't do that, as it is perfectly valid to set min_samples_leaf greater than n_samples / 2. Furthermore, this is the current behavior, and changing it could be painful when performing cross-validation.

Member

ok, thanks !
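
For readers following this thread, the outcome can be sketched roughly as follows. The condition mirrors the snippet under review. The function names, the full error message wording, and the `can_split` rule are illustrative assumptions, not code from the PR.

```python
def check_min_samples_leaf(min_samples_leaf):
    """Accept a fraction in (0, 0.5] or a count >= 1, with no upper bound."""
    if not (0. < min_samples_leaf <= 0.5 or 1 <= min_samples_leaf):
        # Assumption: the message wording is illustrative only.
        raise ValueError(
            "min_samples_leaf must be at least 1 or in (0, 0.5], got %s"
            % min_samples_leaf)


def can_split(n_node_samples, min_samples_leaf_count):
    """A split is only possible if both children can hold enough samples."""
    return n_node_samples >= 2 * min_samples_leaf_count


# A large integer is accepted by the check; it simply acts as a
# pre-pruning constraint that keeps the tree from growing.
check_min_samples_leaf(80)
print(can_split(100, 80))  # False: the root stays a single leaf
print(can_split(100, 50))  # True: two children of 50 samples each
```

Leaving the integer case unbounded means the same setting stays valid when n_samples changes, for example across cross-validation folds of different sizes.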

@MechCoder
Member

LGTM as well

@arjoly arjoly changed the title [MRG] min_samples_split and min_samples_leaf now accept a percentage [MRG +2 ] min_samples_split and min_samples_leaf now accept a percentage Oct 23, 2015
@arjoly
Member Author

arjoly commented Oct 24, 2015

Rebased to fix a conflict in doc/whats_new.

@arjoly
Member Author

arjoly commented Oct 25, 2015

I can't figure out what is wrong with CircleCI. The build is so verbose, with many warnings... :-/ Any ideas, @ogrisel, @amueller?

@arjoly
Member Author

arjoly commented Oct 26, 2015

Merging as the error is due to GP in the doc.

arjoly added a commit that referenced this pull request Oct 26, 2015
[MRG +2 ]  min_samples_split and min_samples_leaf now accept a percentage
@arjoly arjoly merged commit d9f3277 into scikit-learn:master Oct 26, 2015
@arjoly arjoly deleted the float-min_samples branch October 26, 2015 13:51