[MRG] Gradient Boosting enhancements #2570
Conversation
- Moved verbose output code to VerboseReporter class
- Cosmit: logical structured code blocks
- Refactored: moved init state to method
- Added partial_fit to GradientBoosting
- Fix: is_classification not available; n_features not n_features_
- Tests
- Add ``partial_fit`` to docs
- …rion.children_impurity. Pass partition impurity to stack to avoid re-computation (saves some runtime).
- New tree parameter: complete; specifies whether complete binary trees are grown or a greedy branch of max_depth is grown with at most max_depth + 1 leaves.
- max_leaf_nodes instead of complete parameter
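As a side note, the best-first growth behind max_leaf_nodes (summarized in the PR description at the bottom of this page) can be sketched with a binary heap. This is a toy Python sketch only; improvement and split are hypothetical stand-ins for the Cython internals:

```python
import heapq

def grow_best_first(root, max_leaf_nodes, improvement, split):
    # Max-heap keyed by impurity improvement (heapq is a min-heap,
    # so improvements are negated); the counter breaks ties.
    heap = [(-improvement(root), 0, root)]
    n_leaves, counter = 1, 0
    while heap and n_leaves < max_leaf_nodes:
        _, _, node = heapq.heappop(heap)
        left, right = split(node)   # splitting turns one leaf into two
        n_leaves += 1
        for child in (left, right):
            counter += 1
            heapq.heappush(heap, (-improvement(child), counter, child))
```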
Some timings against master to show that there is no performance regression for Forests.
Remarks: training timings are similar (PR slightly lower, but IMHO not relevant).
I looked into the difference between the results on solar: parameters (incl. random states) are the same for master and PR. Of all 20 trees, there is only one tree that differs between the two versions. It seems there are minor differences in the impurity scores (because we now don't compute them when processing a node but pass them down from the parent split node). If impurity scores are below a threshold (1e-7), we do not try to split the node since it is almost pure. I tracked the number of times this is the case and it differs quite a bit between master and this PR. If this is indeed the case, it should be affected by the setting of
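For illustration, here is a toy version of the near-pure check described above. The 1e-7 threshold is taken from the comment; the node names and impurity values are made up:

```python
MIN_IMPURITY_SPLIT = 1e-7

# Each entry carries the impurity already computed at the parent split,
# so the node itself never re-computes it.
stack = [("root", 0.45), ("left", 3e-8), ("right", 0.12)]
while stack:
    node, impurity = stack.pop()
    if impurity <= MIN_IMPURITY_SPLIT:
        print(node, "-> leaf (almost pure, no split attempted)")
    else:
        print(node, "-> try to split")
```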
Here is a quick benchmark of sklearn vs gbm on covertype: I used the same parameters for both (min_samples_leaf=5, n_estimators=100, num_leafs=4, learning_rate=0.1).
f and c stand for fortran- or c-style memory layout of X.
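For reference, the sklearn side of that comparison would look roughly like the sketch below. This assumes the parameters map as described (max_leaf_nodes being this PR's counterpart of num_leafs) and uses stand-in data rather than covertype:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Stand-in data; the actual benchmark ran on covertype.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

est = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1,
                                 min_samples_leaf=5, max_leaf_nodes=4)
est.fit(X, y)
```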
Looks really good. Impressive!
@GaelVaroquaux @glouppe I've updated the GBRT benchmark. I modified our tree code to support both fortran- and c-style inputs (previously only c-style was supported); will update the PR in a minute. As expected, fortran layout is significantly better -- but we assume the advantage diminishes when you use random feature sampling. I'll keep you posted.
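From the user's point of view the change amounts to the following sketch (assuming numpy; the speed claim comes from the benchmark above, not from this snippet):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

# Both layouts are accepted; fortran order (column-major) was the
# faster one in the benchmarks above.
X_f = np.asfortranarray(X)
X_c = np.ascontiguousarray(X)
GradientBoostingClassifier(n_estimators=10).fit(X_f, y)
GradientBoostingClassifier(n_estimators=10).fit(X_c, y)
```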
Performance after the above changes:
@glouppe this might interest you
@pprett That's great! Handling both c-contiguous and fortran-style without making a new copy is really great! Thanks for working on this :) Regarding the new
@@ -1459,9 +1682,9 @@ cdef class Tree:
return sizet_ptr_to_ndarray(self.n_node_samples, self.node_count)
Why do you need this? For easier inspection?
What are you referring to? The return value of pop?
I was referring to the max_depth attribute. But it seems to have disappeared?
Ok, maybe you meant why I set max_depth at the end of Tree.build?
Yep, self.max_depth = max_depth_seen. Dunno why GitHub messed with my comments :s
Btw, I am not against this addition, I was just surprised to see it. You could rename it to max_depth_ to enforce that it is learned.
It was just a convenience to study the nature of the learned trees -- do you think we should keep it (renamed) or throw it out?
+1 for recording max_depth_, it's very useful IMHO.
@glouppe just to make sure I understand correctly: suppose you create a new split node (nodeid
assert_almost_equal(0.84652100667116, score)
# est.fit(X_train, y_train)
# score = est.score(X_test, y_test)
# assert_almost_equal(score, 0.86908506408880637)
Why is this commented out?
Well.. the honest answer is: because when you uncomment it, it fails -- but I guess you know that...
The truth is that trees give slightly different results on 32-bit vs 64-bit architectures. During the sprint in Paris I made a change so they would be equal, at the expense of 64-bit precision. I'd rather see the test removed.
+1 for removing the test. It is vain to try to have the exact same results on different architectures, always, in all cases.
Does it pass with assert_almost_equal(score, 0.869, 2)?
@mblondel if you have a pipeline with a CV object and want to do grid searches, you will always do a nested cross-validation, i.e. GridSearchCV will do a train/test split and the CV object will do another split. But you actually just want to do one cross-validation, where GridSearchCV can take advantage of some path algorithm (as used by the CV objects). See #1626.
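A sketch of the double split being described, using present-day import paths (the pipeline and the grid are illustrative):

```python
from sklearn.datasets import make_regression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LassoCV
from sklearn.model_selection import GridSearchCV

X, y = make_regression(n_samples=200, n_features=20, random_state=0)

# Inner split: LassoCV picks its alpha on its own 3-fold split.
pipe = Pipeline([("scale", StandardScaler()),
                 ("lasso", LassoCV(cv=3))])

# Outer split: GridSearchCV adds another 5-fold split on top, so every
# candidate is fit with two nested levels of cross-validation.
search = GridSearchCV(pipe, {"scale__with_mean": [True, False]}, cv=5)
search.fit(X, y)
```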
remove troublesome 32bit test
@ogrisel I've addressed the two issues with the uninformative exceptions. I also removed the 32-bit test; the difference is actually quite significant (64-bit: 0.84758663315474814, 32-bit: 0.82891483480560135). If you're fine with that, I'd merge.
"%r" % max_leaf_nodes) | ||
if -1 < max_leaf_nodes < 2: | ||
raise ValueError(("max_leaf_nodes %r must be either smaller than 0 or " | ||
"larger than 1").format(max_leaf_nodes)) |
Apparently this is not tested, as it would raise a formatting exception: it should be:
raise ValueError("max_leaf_nodes {0} must be either smaller than 0 or "
                 "larger than 1".format(max_leaf_nodes))
Note that besides the %r / {0} placeholder fix, you also should not need the additional pair of parens. But better add a test first to check that this is actually a fix.
Actually coveralls reports this line as being tested, but the test does not cover the result of the formatting:
>>> "a{0}a".format('a')
'aaa'
>>> "a%ra".format('a')
'a%ra'
You can use:
from sklearn.utils.testing import assert_raise_message
assert_raise_message(ValueError, expected_msg, my_callable, args)
to check the content of a message.
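For instance, a hypothetical test for the max_leaf_nodes check discussed above could look like the sketch below; the expected message assumes the fixed {0} formatting, and assert_raise_message only checks that the message occurs in the raised error:

```python
from sklearn.utils.testing import assert_raise_message
from sklearn.tree import DecisionTreeRegressor

# max_leaf_nodes=1 falls in the invalid (-1, 2) range checked above.
est = DecisionTreeRegressor(max_leaf_nodes=1)
assert_raise_message(ValueError,
                     "max_leaf_nodes 1 must be either smaller than 0 or "
                     "larger than 1",
                     est.fit, [[0], [1]], [0, 1])
```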
Apparently it doesn't raise a formatting exception, because the line is covered. Thanks for the additional pair of eyeballs -- I completely missed that, and I'd rather not alter the whole test suite to check the message of each exception...
I think it's ok not to rewrite the whole test suite to use assert_raise_message, but it's better to use assert_raise_message instead of assert_raises from now on whenever the message is stable, to make the test suite more explicit (easier to read by making the motivation explicit).
This PR looks good to me. +1 for merging on my side.
Phew... finally! Great work Peter :) Next time, let's make our changes smaller ;)
And thanks all for the reviews.
It only took two months. Many PRs are open for longer.
Yeah -- thanks for the thorough reviews! Will do a smaller one next time... I added stuff successively... bad style.
Well done @pprett !!! 🍻
@pprett really happy to see these changes I was hoping for last year; all really nice improvements! Nice work
This PR contains a number of enhancements for GBRT (a usage sketch follows the list):
- A warm_start argument now allows adding additional trees to an already trained model.
- monitor allows injecting a callback into the learning procedure that can implement early stopping, evaluate training progress on a held-out set, create snapshots, ...
- max_leaf_nodes as a stopping criterion. Setting max_leaf_nodes > 0 will grow the tree in a best-first fashion: nodes are pushed on a PriorityQueue (implemented as a binary heap) and the node with the highest impurity improvement is expanded next. If max_leaf_nodes < 0, trees are grown in depth-first fashion using a stack instead of the PriorityQueue. Trees grown with max_leaf_nodes > 0 have at most depth max_leaf_nodes - 1 and thus can model interactions of (at most) max_leaf_nodes - 1 features. Individual trees might, however, be shallower than max_leaf_nodes - 1.
- ZeroEstimator, if you want to start your GBRT model from scratch.
- Support for fortran-style X: tree ensemble estimators don't enforce a specific layout of X. Benchmarks indicate that fortran layout is indeed faster -- even for ExtraTrees.
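A hedged usage sketch of the listed features; the dataset, numbers, and callback body are illustrative, and the monitor signature (i, est, local_vars) with a True return stopping training matches how GBRT invokes the callback:

```python
from sklearn.datasets import make_hastie_10_2
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_hastie_10_2(n_samples=1000, random_state=0)

# max_leaf_nodes: grow each tree best-first with at most 4 leaves.
est = GradientBoostingClassifier(n_estimators=100, max_leaf_nodes=4)
est.fit(X, y)

# warm_start: keep the 100 fitted trees and add 50 more on refit.
est.set_params(n_estimators=150, warm_start=True)
est.fit(X, y)

# monitor: called after each stage; returning True stops training early.
def monitor(i, est, local_vars):
    return i >= 10

GradientBoostingClassifier(n_estimators=100).fit(X, y, monitor=monitor)
```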