[MRG] MNT remove duplicated call to children_impurity() #18203

NicolasHug · 2020-08-19T15:45:17Z

In the tree Splitter we call impurity_improvement() and then children_impurity().

But impurity_improvement() itself will call children_impurity() internally so this PR removes the duplicated work.

NicolasHug · 2020-08-19T15:45:36Z

sklearn/tree/_criterion.pyx

-        cdef double impurity_left
-        cdef double impurity_right
-
-        self.children_impurity(&impurity_left, &impurity_right)


This is the call that was removed.

alfaro96

Thank you @NicolasHug!

I think that with this PR, we can avoid duplicate calculations.

sklearn/tree/_criterion.pyx

alfaro96 · 2020-08-19T19:02:28Z

sklearn/tree/_splitter.pyx

            self.criterion.children_impurity(&best.impurity_left,
                                             &best.impurity_right)
+            best.improvement = self.criterion.impurity_improvement(
+                impurity, best.impurity_left, best.impurity_right)


IIUC, moving the call to children_impurity before impurity_improvement allows us to pass the children's impurities in as arguments. Therefore, we avoid these duplicate calculations:

scikit-learn/sklearn/tree/_criterion.pyx

Lines 197 to 200 in 395d6c1

cdef double impurity_left
cdef double impurity_right

self.children_impurity(&impurity_left, &impurity_right)

Am I right?
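For readers who want to see the effect of the reordering, here is a minimal pure-Python sketch. It is not scikit-learn's Cython code: `ToyMSECriterion`, `split_old`, `split_new` and the call counter are hypothetical names made up for illustration, and the toy uses unit sample weights and a single output.

```python
class ToyMSECriterion:
    """Toy stand-in for a tree criterion (hypothetical, not sklearn's class)."""

    def __init__(self, y_left, y_right):
        self.y_left = list(y_left)
        self.y_right = list(y_right)
        self.children_impurity_calls = 0  # counts how often children are evaluated

    @staticmethod
    def _variance(values):
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values) / len(values)

    def node_impurity(self):
        return self._variance(self.y_left + self.y_right)

    def children_impurity(self):
        self.children_impurity_calls += 1
        return self._variance(self.y_left), self._variance(self.y_right)

    def impurity_improvement_old(self, impurity):
        # Old pattern: the children impurities are recomputed internally.
        impurity_left, impurity_right = self.children_impurity()
        return self.impurity_improvement(impurity, impurity_left, impurity_right)

    def impurity_improvement(self, impurity, impurity_left, impurity_right):
        # New pattern: the caller passes the children impurities in.
        n_left, n_right = len(self.y_left), len(self.y_right)
        n = n_left + n_right
        return impurity - (n_left / n) * impurity_left - (n_right / n) * impurity_right


def split_old(criterion):
    """Improvement first, then children impurity: two children evaluations."""
    impurity = criterion.node_impurity()
    improvement = criterion.impurity_improvement_old(impurity)
    impurity_left, impurity_right = criterion.children_impurity()
    return improvement, impurity_left, impurity_right


def split_new(criterion):
    """Children impurity first, then reuse it: one children evaluation."""
    impurity = criterion.node_impurity()
    impurity_left, impurity_right = criterion.children_impurity()
    improvement = criterion.impurity_improvement(impurity, impurity_left, impurity_right)
    return improvement, impurity_left, impurity_right
```

Both orderings return the same numbers in this toy; only the number of `children_impurity` evaluations differs.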

Co-authored-by: Juan Carlos Alfaro Jiménez <JuanCarlos.Alfaro@uclm.es>

glemaitre · 2020-08-20T11:55:22Z

sklearn/tree/_criterion.pyx

@@ -1305,23 +1308,3 @@ cdef class FriedmanMSE(MSE):
                self.weighted_n_left * total_sum_right)

        return diff * diff / (self.weighted_n_left * self.weighted_n_right)
-
-    cdef double impurity_improvement(self, double impurity) nogil:


What is the reason for this change here?
Are we calling only the proxy_impurity_improvement?

uhm it seems that with this change we will use the MSE impurity improvement

Good catch, thanks. I didn't realize Friedman MSE used a non-conventional improvement. That sends me down a rabbit hole, wondering whether this makes sense at all, but that's irrelevant for this PR. I put it back.
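To illustrate why dropping the override would change behaviour, here is a small pure-Python sketch comparing the two improvement measures. It is an assumption-laden toy, not the Cython code under review: unit sample weights, a single output, and the Friedman score written in the form usually attributed to his gradient boosting paper, `w_l * w_r / (w_l + w_r) * (mean_l - mean_r) ** 2`. The function names are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)


def variance(values):
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)


def mse_improvement(y_left, y_right):
    """Variance reduction of the split (unit weights, node-level only)."""
    y_left, y_right = list(y_left), list(y_right)
    n_left, n_right = len(y_left), len(y_right)
    n = n_left + n_right
    return (variance(y_left + y_right)
            - (n_left / n) * variance(y_left)
            - (n_right / n) * variance(y_right))


def friedman_improvement(y_left, y_right):
    """Friedman-style score (assumed form): depends only on the child means."""
    y_left, y_right = list(y_left), list(y_right)
    n_left, n_right = len(y_left), len(y_right)
    diff = mean(y_left) - mean(y_right)
    return n_left * n_right / (n_left + n_right) * diff ** 2


print(mse_improvement([1.0, 2.0], [5.0, 7.0]))       # 5.0625
print(friedman_improvement([1.0, 2.0], [5.0, 7.0]))  # 20.25
```

The two functions return different values for the same split, so silently falling back to the parent class's method would change the improvement recorded for FriedmanMSE trees.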

Went down the same rabbit hole. :)

How did you get out? I'm still super confused about so many things. Like does friedman_mse make sense outside of GBDTs, and do we really want to allow an MAE splitting criterion when we already have the LAD loss... so many questions lol

I am not planning to get out anytime soon. There aren't many references to this criterion outside of https://statweb.stanford.edu/~jhf/ftp/trebst.pdf

Yeah... Here are my notes so far. I'll open an issue when I have a better idea of this all, but I'm happy to sync with you prior!

Does it really make sense to allow a criterion to be passed to GBDT? All trees
should be built via LS anyway. Friedman does mention e.g. for LAD that trees
could be built with LAD criterion but LS is just much faster.

WTF is friedman_mse?

It was introduced here: [MRG] Gradient Boosting enhancements #2570

It used to be the hardcoded and non-overridable default for all GBDT.

Then [MRG+3] Add mean absolute error splitting criterion to DecisionTreeRegressor #6667 introduced the LAD
criterion for all trees, and started exposing a criterion param to all
models.

Does it really make sense to allow it for all tree models??

It was introduced in the context of multiclass GB.

Also WTF are the weights? Our implementation completely differs from what the
paper defines. The weights in the paper aren't sample weights, they're the hessians.

🥕 ?
@NicolasHug I was thinking/doubting about friedman_mse myself and would appreciate an issue if you come up with one.

thomasjpfan

LGTM

thomasjpfan · 2020-08-20T16:29:03Z

Went down the same rabbit hole. :)

lorentzenchr

LGTM. @NicolasHug Nice catch!

lorentzenchr · 2020-08-20T21:23:50Z

@NicolasHug Would you like a what's new entry? It's supposedly a slight performance improvement.

glemaitre · 2020-08-21T08:31:57Z

> @NicolasHug Would you like a what's new entry? It's supposedly a slight performance improvement.

I think we are fine merging as it is.

@NicolasHug Feel free to open an issue on the Friedman MSE. There are some relics in the code that only some old wizard knows about :) (even git blame does not help there)

remove duplicated call

7ca3c47

NicolasHug commented Aug 19, 2020

github-actions bot added the module:tree label Aug 19, 2020

alfaro96 reviewed Aug 19, 2020

Apply suggestions from code review

f5a7581

Co-authored-by: Juan Carlos Alfaro Jiménez <JuanCarlos.Alfaro@uclm.es>

alfaro96 approved these changes Aug 19, 2020

glemaitre reviewed Aug 20, 2020

NicolasHug added 3 commits August 20, 2020 09:30

putback friedman mse overrided method

cf33df7

Merge branch 'master' of github.com:scikit-learn/scikit-learn into id…

2a3bbc0

…k_what_im_doing

Merge branch 'idk_what_im_doing' of github.com:NicolasHug/scikit-lear…

78917a8

…n into idk_what_im_doing

thomasjpfan approved these changes Aug 20, 2020

lorentzenchr approved these changes Aug 20, 2020

glemaitre merged commit 22f232e into scikit-learn:master Aug 21, 2020

jayzed82 pushed a commit to jayzed82/scikit-learn that referenced this pull request Oct 22, 2020

MNT remove duplicated call to children_impurity() in tree code (sciki…

5997cdd

…t-learn#18203) Co-authored-by: Juan Carlos Alfaro Jiménez <JuanCarlos.Alfaro@uclm.es>
