
DecisionTreeClassifier max_leaf_nodes: tree.tree_.value is set to 0 for non-leaf nodes. Normal behaviour? #4644

Closed
robdempsey opened this issue Apr 29, 2015 · 8 comments


@robdempsey

Hi

>>> sklearn.__version__
'0.15.2'

If I leave out max_leaf_nodes (or set it to None), then when I access tree.tree_.value all non-leaf node values are set to zero and only the leaf values are correct. If I set max_leaf_nodes to something like 10, the values are all present. Is this normal behaviour?

dtree = tree.DecisionTreeClassifier(criterion="entropy", min_samples_leaf=1000,
                                    max_depth=4, max_leaf_nodes=None)
dtree.fit(X, y)  # X, y: my training data (not shown here)

values = dtree.tree_.value

Iterating through a branch:

[[ 0.  0.]]
[[ 0.  0.]]
[[ 0.  0.]]
[[ 1767.  6014.]]
18

dtree = tree.DecisionTreeClassifier(criterion="entropy", min_samples_leaf=1000,
                                    max_depth=4, max_leaf_nodes=10)
dtree.fit(X, y)

values = dtree.tree_.value

Iterating through a branch:

[[  45455.  166666.]]
[[  41958.  142857.]]
[[  8392.  28572.]]
18
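
For reference, a self-contained script that shows the same behaviour (the data below is made up, since my actual dataset isn't included):

import numpy as np
from sklearn import tree

# Made-up two-class data, only so the script runs end to end;
# noisy labels so the tree actually makes several splits.
rng = np.random.RandomState(0)
X = rng.rand(20000, 5)
y = (X[:, 0] + 0.3 * rng.randn(20000) > 0.5).astype(int)

for mln in (None, 10):
    dtree = tree.DecisionTreeClassifier(criterion="entropy", min_samples_leaf=1000,
                                        max_depth=4, max_leaf_nodes=mln)
    dtree.fit(X, y)
    print("max_leaf_nodes =", mln)
    print(dtree.tree_.value[:4])  # first few nodes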

Rob

@amueller
Member

I think that is because different splitting strategies are used depending on whether or not you set max_leaf_nodes. Some of them don't store the internal counts, as they are not needed for prediction.
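
To illustrate why prediction doesn't need them: predict only consults the value array of the leaf a sample lands in. A rough Python sketch of the traversal over the public tree_ arrays (the real implementation is in Cython; this is just the idea):

import numpy as np

def predict_one(tree_, x):
    # Walk from the root (node 0) down to a leaf;
    # children_left == -1 marks a leaf.
    node = 0
    while tree_.children_left[node] != -1:
        if x[tree_.feature[node]] <= tree_.threshold[node]:
            node = tree_.children_left[node]
        else:
            node = tree_.children_right[node]
    # Only the leaf's class counts are read, so zeroed
    # internal-node values never affect the result.
    return np.argmax(tree_.value[node, 0])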

@glouppe
Contributor

glouppe commented Apr 30, 2015

The reason was only to speed things up.

@arjoly I wonder, however, whether this was needed at all. It seems that these counts should be there already (since we need them to compute the impurity decrease), hence storing them shouldn't hurt much. What do you think?

@arjoly
Member

arjoly commented Apr 30, 2015

If benchmarks show that the tree-growing procedure is not significantly slower, I am +1.
Note that there are more memory-efficient alternatives for storing the node values of multi-output decision trees.

> since we need them to compute the impurity decrease

At the moment, we don't need those. Could you clarify?

@robdempsey
Author

They are a useful output if you want to present the results to a non-technical audience and don’t want to explain entropy :-)

@amueller
Member

Well, you could reconstruct them from the output you have; there is a 1:1 mapping between leaves and paths ;)
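
For instance, something along these lines should recover the internal counts from the leaves (a sketch against the public tree_ arrays, assuming the leaf values are populated, as they are here):

def reconstruct_values(tree_):
    # Fill internal nodes with the sum of their children's class
    # counts, recursing down from the root (node 0).
    value = tree_.value.copy()
    left, right = tree_.children_left, tree_.children_right

    def fill(node):
        if left[node] == -1:  # leaf: counts are already stored
            return value[node]
        value[node] = fill(left[node]) + fill(right[node])
        return value[node]

    fill(0)
    return value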

@trevorstephens
Contributor

#3735 ... just sayin... ;-)

@andosa

andosa commented Apr 30, 2015

PR created #4655

@glouppe
Contributor

glouppe commented May 4, 2015

This is now fixed :)

@glouppe glouppe closed this as completed May 4, 2015