DecisionTreeClassifier has random behaviour with splitter="best"? #2386
Comments
During the search for the best split, features are shuffled at each node. If I set the
Yeah, I get deterministic behaviour too (and the same output as you) when setting the random state. I am a bit surprised to see any random behaviour at all in the default DecisionTreeClassifier. It could be that there are tied variables. Good point. Maybe the variables should not be shuffled? Any thoughts on that?
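To make the point about the random state concrete, here is a minimal sketch. The toy data and the helper name `fit_split_features` are assumptions for illustration and are not taken from the original report. Column 1 duplicates column 0, so both offer identical candidate splits, and fitting twice with the same `random_state` should select the same features at every node.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier


def fit_split_features(X, y, seed):
    """Fit a tree and return the feature index used at each node (leaves are marked -2)."""
    clf = DecisionTreeClassifier(splitter="best", random_state=seed)
    clf.fit(X, y)
    return clf.tree_.feature.copy()


# Hypothetical toy data: column 1 is an exact copy of column 0 (a guaranteed tie),
# column 2 is unrelated noise.
rng = np.random.RandomState(0)
base = rng.rand(200, 1)
X = np.hstack([base, base, rng.rand(200, 1)])
y = (base[:, 0] > 0.5).astype(int)

first = fit_split_features(X, y, seed=42)
second = fit_split_features(X, y, seed=42)

# With a fixed seed the chosen split features are reproducible.
same_seed_identical = np.array_equal(first, second)
print(same_seed_identical)
print(first)
```

Leaving `random_state` unset may give a root split on either column 0 or column 1 from one run to the next, which is the behaviour discussed in this thread.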
Hmm... or maybe it is a nice feature to shuffle the variables at each node and I am just used to a more deterministic implementation elsewhere... If this really is where the random behaviour is manifesting itself, then I think I'm fine with it. I suppose shuffling the features reduces the greediness of the algorithm somewhat.
I think Arnaud is right - the order in which variables are searched is
2013/8/23 Noel Dawe notifications@github.com
Peter Prettenhofer
Yes, Arnaud is right: this comes from ties on the features to split on (this happens more often than one may think).
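As a rough illustration of how such ties arise, the sketch below computes the best Gini impurity decrease for each feature of a tiny hand-made dataset. This is a plain exhaustive search written for clarity, not scikit-learn's splitter, and the data and function names are assumptions. Features 0 and 1 are not copies of each other, yet they separate the classes equally well, so their scores tie exactly and the visiting order decides which one is used.

```python
import numpy as np


def gini(labels):
    """Gini impurity of a label array."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)


def best_gini_gain(column, labels):
    """Largest impurity decrease over all midpoint thresholds of one feature."""
    parent = gini(labels)
    n = len(labels)
    best = 0.0
    values = np.unique(column)
    for lo, hi in zip(values[:-1], values[1:]):
        threshold = (lo + hi) / 2.0
        left = labels[column <= threshold]
        right = labels[column > threshold]
        children = (len(left) * gini(left) + len(right) * gini(right)) / n
        best = max(best, parent - children)
    return best


# Hypothetical data: feature 1 mirrors feature 0, feature 2 is weakly informative.
X = np.array([[0, 1, 5],
              [0, 1, 3],
              [1, 0, 4],
              [1, 0, 6]], dtype=float)
y = np.array([0, 0, 1, 1])

gains = [best_gini_gain(X[:, j], y) for j in range(X.shape[1])]
print(gains)
```

Because the first two gains are identical, a search that keeps the first best candidate it encounters will return whichever of the two features it happens to visit first.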
OK, thanks for the confirmation! I'm happy with the feature shuffling. Closing this issue.
See below:
output:
ping @glouppe