tree.export_graphviz() returns wrong count of examples in leaf nodes when using example weighting #3794

@dragoljub

When using the sample_weight parameter of DecisionTreeClassifier.fit() to weight examples by class label, such as 2:1 for class A versus class B, the value entries in the leaves of the tree produced by tree.export_graphviz() show a doubled count of examples for the weighted class.

This is misleading because the per-class entries in value should sum to the number of examples (samples) in that node. It seems there is a bug where the sample weighting factor is not removed when exporting the decision tree.

Here is the classifier with weighting:

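from sklearn import tree

# X, y and df are the reporter's own data; class A gets weight 2, class B weight 1.
clf = tree.DecisionTreeClassifier(
    criterion='gini', splitter='best', max_leaf_nodes=10,
    min_samples_split=10, min_samples_leaf=2, max_features=None,
    random_state=None).fit(X, y, sample_weight=df.LABEL.map({'A': 2, 'B': 1}))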

Here is the produced tree.dot file. In each leaf, the first entry of value is the doubled count:

digraph Tree {
0 [label="X[106] <= 0.0203\ngini = 0.0731494237327\nsamples = 119377", shape="box"] ;
1 [label="X[34] <= 1266.6650\ngini = 0.0661567536396\nsamples = 118307", shape="box"] ;
0 -> 1 ;
3 [label="gini = 0.0578\nsamples = 111941\nvalue = [ 210926. 6478.]", shape="box"] ;
1 -> 3 ;
4 [label="gini = 0.2103\nsamples = 6366\nvalue = [ 10016. 1358.]", shape="box"] ;
1 -> 4 ;
2 [label="X[83] <= 6728.4551\ngini = 0.386307949063\nsamples = 1070", shape="box"] ;
0 -> 2 ;
5 [label="gini = 0.4994\nsamples = 364\nvalue = [ 254. 237.]", shape="box"] ;
2 -> 5 ;
6 [label="gini = 0.1669\nsamples = 706\nvalue = [ 68. 672.]", shape="box"] ;
2 -> 6 ;
}
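The mismatch can be checked by hand from the leaf lines above. The following is a minimal sketch, not part of the original report: it assumes the first entry of each value array is class A (weight 2) and the second is class B (weight 1), and the helper name unweighted_counts is hypothetical. Dividing each entry by its assumed class weight should recover counts that sum to the node's samples.

```python
# Leaf data copied from the tree.dot output above:
# node id -> (samples, [value for first class, value for second class]).
leaves = {
    3: (111941, [210926.0, 6478.0]),
    4: (6366, [10016.0, 1358.0]),
    5: (364, [254.0, 237.0]),
    6: (706, [68.0, 672.0]),
}

# Assumption: value[0] is class A (weight 2) and value[1] is class B
# (weight 1), matching the sample_weight mapping {'A': 2, 'B': 1}.
CLASS_WEIGHTS = [2.0, 1.0]


def unweighted_counts(value, weights=CLASS_WEIGHTS):
    """Divide each weighted class total by its per-class weight."""
    return [v / w for v, w in zip(value, weights)]


for node_id, (samples, value) in leaves.items():
    counts = unweighted_counts(value)
    # Report whether the de-weighted counts add up to the node's samples.
    print(node_id, counts, sum(counts) == samples)
```

If the assumption about class order holds, every leaf passes this check, which is consistent with value holding weighted totals while samples holds the unweighted count.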

Here is what the rendered tree looks like, with doubled leaf example counts for the weighted class:

img
