[MRG] MNT remove duplicated call to children_impurity() #18203
Changes from all commits
7ca3c47
f5a7581
cf33df7
2a3bbc0
78917a8
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -171,7 +171,9 @@ cdef class Criterion: | |
return (- self.weighted_n_right * impurity_right | ||
- self.weighted_n_left * impurity_left) | ||
|
||
cdef double impurity_improvement(self, double impurity) nogil: | ||
cdef double impurity_improvement(self, double impurity_parent, | ||
double impurity_left, | ||
double impurity_right) nogil: | ||
"""Compute the improvement in impurity | ||
|
||
This method computes the improvement in impurity when a split occurs. | ||
|
@@ -186,24 +188,25 @@ cdef class Criterion: | |
|
||
Parameters | ||
---------- | ||
impurity : double | ||
The initial impurity of the node before the split | ||
impurity_parent : double | ||
The initial impurity of the parent node before the split | ||
|
||
impurity_left : double | ||
The impurity of the left child | ||
|
||
impurity_right : double | ||
The impurity of the right child | ||
|
||
Return | ||
------ | ||
double : improvement in impurity after the split occurs | ||
""" | ||
|
||
cdef double impurity_left | ||
cdef double impurity_right | ||
|
||
self.children_impurity(&impurity_left, &impurity_right) | ||
|
||
return ((self.weighted_n_node_samples / self.weighted_n_samples) * | ||
(impurity - (self.weighted_n_right / | ||
self.weighted_n_node_samples * impurity_right) | ||
- (self.weighted_n_left / | ||
self.weighted_n_node_samples * impurity_left))) | ||
(impurity_parent - (self.weighted_n_right / | ||
self.weighted_n_node_samples * impurity_right) | ||
- (self.weighted_n_left / | ||
self.weighted_n_node_samples * impurity_left))) | ||
|
||
|
||
cdef class ClassificationCriterion(Criterion): | ||
|
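The return expression in the hunk above can be read as a weighted impurity decrease. Below is a minimal plain-Python sketch of that expression, not the Cython implementation itself: the attributes of `self` become ordinary function arguments, and the numbers in the usage lines are hypothetical values chosen only for illustration.

```python
def impurity_improvement(impurity_parent, impurity_left, impurity_right,
                         weighted_n_samples, weighted_n_node_samples,
                         weighted_n_left, weighted_n_right):
    """Weighted impurity decrease of a split.

    Mirrors the return expression shown in the diff; the weights are passed
    in explicitly here instead of being read from a Criterion object.
    """
    return (weighted_n_node_samples / weighted_n_samples) * (
        impurity_parent
        - (weighted_n_right / weighted_n_node_samples) * impurity_right
        - (weighted_n_left / weighted_n_node_samples) * impurity_left
    )


# Hypothetical example: 100 weighted samples overall, 40 reach this node,
# and the candidate split sends 10 to the left child and 30 to the right.
improvement = impurity_improvement(
    impurity_parent=0.5, impurity_left=0.0, impurity_right=0.4,
    weighted_n_samples=100.0, weighted_n_node_samples=40.0,
    weighted_n_left=10.0, weighted_n_right=30.0,
)
print(improvement)
```

With the new signature the child impurities are inputs, so the function no longer needs to call `children_impurity()` itself.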
@@ -1306,7 +1309,9 @@ cdef class FriedmanMSE(MSE): | |
|
||
return diff * diff / (self.weighted_n_left * self.weighted_n_right) | ||
|
||
cdef double impurity_improvement(self, double impurity) nogil: | ||
What is the reason for this change here?
Uhm, it seems that with this change we will use the MSE impurity improvement.
Good catch, thanks. I didn't realize Friedman MSE used a non-conventional improvement. That sends me down a rabbit hole, wondering whether this makes sense at all, but that's irrelevant for this PR. I put it back.
Went down the same rabbit hole. :)
How did you get out? I'm still super confused about so many things. Like, does friedman_mse make sense outside of GBDTs, and do we really want to allow an MAE splitting criterion when we already have the LAD loss... so many questions lol
I am not planning to get out anytime soon. There aren't many references to this criterion outside of https://statweb.stanford.edu/~jhf/ftp/trebst.pdf
Yeah... Here are my notes so far. I'll open an issue when I have a better idea of this all, but I'm happy to sync with you prior!
Does it really make sense to allow a criterion to be passed to GBDT? All trees WTF is friedman_mse?
🥕 ?
||
cdef double impurity_improvement(self, double impurity_parent, double | ||
impurity_left, double impurity_right) nogil: | ||
# Note: none of the arguments are used here | ||
cdef double* sum_left = self.sum_left | ||
cdef double* sum_right = self.sum_right | ||
|
||
|
Original file line number | Diff line number | Diff line change | ||||||||
---|---|---|---|---|---|---|---|---|---|---|
|
@@ -434,9 +434,10 @@ cdef class BestSplitter(BaseDenseSplitter): | |||||||||
|
||||||||||
self.criterion.reset() | ||||||||||
self.criterion.update(best.pos) | ||||||||||
best.improvement = self.criterion.impurity_improvement(impurity) | ||||||||||
self.criterion.children_impurity(&best.impurity_left, | ||||||||||
&best.impurity_right) | ||||||||||
best.improvement = self.criterion.impurity_improvement( | ||||||||||
impurity, best.impurity_left, best.impurity_right) | ||||||||||
Comment on lines 437 to +440
IIUC, moving the call to scikit-learn/sklearn/tree/_criterion.pyx Lines 197 to 200 in 395d6c1
Am I right?
Yes.
LGTM so!
||||||||||
|
||||||||||
# Respect invariant for constant features: the original order of | ||||||||||
# element in features[:n_known_constants] must be preserved for sibling | ||||||||||
|
@@ -745,9 +746,10 @@ cdef class RandomSplitter(BaseDenseSplitter): | |||||||||
|
||||||||||
self.criterion.reset() | ||||||||||
self.criterion.update(best.pos) | ||||||||||
best.improvement = self.criterion.impurity_improvement(impurity) | ||||||||||
self.criterion.children_impurity(&best.impurity_left, | ||||||||||
&best.impurity_right) | ||||||||||
best.improvement = self.criterion.impurity_improvement( | ||||||||||
impurity, best.impurity_left, best.impurity_right) | ||||||||||
|
||||||||||
# Respect invariant for constant features: the original order of | ||||||||||
# element in features[:n_known_constants] must be preserved for sibling | ||||||||||
|
@@ -1293,9 +1295,10 @@ cdef class BestSparseSplitter(BaseSparseSplitter): | |||||||||
|
||||||||||
self.criterion.reset() | ||||||||||
self.criterion.update(best.pos) | ||||||||||
best.improvement = self.criterion.impurity_improvement(impurity) | ||||||||||
self.criterion.children_impurity(&best.impurity_left, | ||||||||||
&best.impurity_right) | ||||||||||
best.improvement = self.criterion.impurity_improvement( | ||||||||||
impurity, best.impurity_left, best.impurity_right) | ||||||||||
|
||||||||||
# Respect invariant for constant features: the original order of | ||||||||||
# element in features[:n_known_constants] must be preserved for sibling | ||||||||||
|
@@ -1504,10 +1507,10 @@ cdef class RandomSparseSplitter(BaseSparseSplitter): | |||||||||
|
||||||||||
if current_proxy_improvement > best_proxy_improvement: | ||||||||||
best_proxy_improvement = current_proxy_improvement | ||||||||||
current.improvement = self.criterion.impurity_improvement(impurity) | ||||||||||
|
||||||||||
self.criterion.children_impurity(&current.impurity_left, | ||||||||||
&current.impurity_right) | ||||||||||
current.improvement = self.criterion.impurity_improvement( | ||||||||||
impurity, current.impurity_left, current.impurity_right) | ||||||||||
best = current | ||||||||||
|
||||||||||
# Reorganize into samples[start:best.pos] + samples[best.pos:end] | ||||||||||
|
@@ -1521,9 +1524,10 @@ cdef class RandomSparseSplitter(BaseSparseSplitter): | |||||||||
|
||||||||||
self.criterion.reset() | ||||||||||
self.criterion.update(best.pos) | ||||||||||
best.improvement = self.criterion.impurity_improvement(impurity) | ||||||||||
self.criterion.children_impurity(&best.impurity_left, | ||||||||||
&best.impurity_right) | ||||||||||
best.improvement = self.criterion.impurity_improvement( | ||||||||||
impurity, best.impurity_left, best.impurity_right) | ||||||||||
|
||||||||||
# Respect invariant for constant features: the original order of | ||||||||||
# element in features[:n_known_constants] must be preserved for sibling | ||||||||||
|
This is the call that was removed.
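
The call pattern the splitter hunks converge on can be sketched in plain Python as follows. This is an illustration under stated assumptions, not scikit-learn code: `CountingCriterion` and `finalize_split` are hypothetical names, and the counter exists only to show that the child impurities are computed once and then reused both for the stored split record and for the improvement.

```python
class CountingCriterion:
    """Hypothetical stand-in for a Criterion that counts children_impurity calls."""

    def __init__(self, impurity_left, impurity_right,
                 weighted_n_left, weighted_n_right, weighted_n_samples):
        self._impurity_left = impurity_left
        self._impurity_right = impurity_right
        self.weighted_n_left = weighted_n_left
        self.weighted_n_right = weighted_n_right
        self.weighted_n_node_samples = weighted_n_left + weighted_n_right
        self.weighted_n_samples = weighted_n_samples
        self.calls = 0  # number of children_impurity() invocations

    def children_impurity(self):
        self.calls += 1
        return self._impurity_left, self._impurity_right

    def impurity_improvement(self, impurity_parent, impurity_left, impurity_right):
        # Same expression as in the diff, using the impurities passed in
        # by the caller instead of recomputing them.
        n_node = self.weighted_n_node_samples
        return (n_node / self.weighted_n_samples) * (
            impurity_parent
            - (self.weighted_n_right / n_node) * impurity_right
            - (self.weighted_n_left / n_node) * impurity_left
        )


def finalize_split(criterion, impurity_parent):
    """Compute the child impurities once, then reuse them for the improvement."""
    impurity_left, impurity_right = criterion.children_impurity()
    improvement = criterion.impurity_improvement(
        impurity_parent, impurity_left, impurity_right)
    return {"impurity_left": impurity_left,
            "impurity_right": impurity_right,
            "improvement": improvement}


# Hypothetical values, chosen only for illustration.
criterion = CountingCriterion(0.0, 0.4, 10.0, 30.0, 100.0)
best = finalize_split(criterion, impurity_parent=0.5)
print(criterion.calls, best["improvement"])
```

A criterion that ignores the passed-in impurities, as the Friedman MSE override in this PR does, still fits this calling convention because the arguments are simply unused.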