Description
When using the DecisionTreeClassifier() sample_weight parameter and weighting examples by class label, such as 2:1 for class A versus class B, the value counts in the leaves of the tree produced by tree.export_graphviz() show a doubled count for the number of examples in the weighted class.
This is misleading because the sum of the per-class values should equal the total number of examples in the parent node. It seems there is a bug where the sample weighting factor is not removed when exporting the decision tree.
Here is the classifier with weighting:
from sklearn import tree

clf = tree.DecisionTreeClassifier(
    criterion='gini', splitter='best', max_leaf_nodes=10,
    min_samples_split=10, min_samples_leaf=2, max_features=None,
    random_state=None).fit(X, y, sample_weight=df.LABEL.map({'A': 2, 'B': 1}))
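For context, here is a minimal self-contained sketch that reproduces the behavior; the synthetic X, y, and sample_weight below are stand-ins for the original data, which is not included in this report:

import numpy as np
from sklearn import tree

# Synthetic two-class stand-in for the original data (an assumption, not the
# reporter's dataset).
rng = np.random.RandomState(0)
X = rng.rand(1000, 5)
y = np.where(X[:, 0] > 0.5, 'A', 'B')
sample_weight = np.where(y == 'A', 2.0, 1.0)  # weight class A 2:1 over class B

clf = tree.DecisionTreeClassifier(max_leaf_nodes=4, random_state=0)
clf.fit(X, y, sample_weight=sample_weight)

# In the scikit-learn version this report was written against, the value
# entries for class A in the exported .dot text come out as roughly twice
# the raw sample counts; newer releases may display these values differently.
print(tree.export_graphviz(clf, out_file=None))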
Here is the produced tree.dot file; the doubled counts for the weighted class appear in the value arrays of the leaf nodes:
digraph Tree {
0 [label="X[106] <= 0.0203\ngini = 0.0731494237327\nsamples = 119377", shape="box"] ;
1 [label="X[34] <= 1266.6650\ngini = 0.0661567536396\nsamples = 118307", shape="box"] ;
0 -> 1 ;
3 [label="gini = 0.0578\nsamples = 111941\nvalue = [ 210926. 6478.]", shape="box"] ;
1 -> 3 ;
4 [label="gini = 0.2103\nsamples = 6366\nvalue = [ 10016. 1358.]", shape="box"] ;
1 -> 4 ;
2 [label="X[83] <= 6728.4551\ngini = 0.386307949063\nsamples = 1070", shape="box"] ;
0 -> 2 ;
5 [label="gini = 0.4994\nsamples = 364\nvalue = [ 254. 237.]", shape="box"] ;
2 -> 5 ;
6 [label="gini = 0.1669\nsamples = 706\nvalue = [ 68. 672.]", shape="box"] ;
2 -> 6 ;
}
Here is what the rendered tree looks like, with the doubled leaf example counts for the weighted class:
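As a cross-check, one way to see the true (unweighted) per-leaf class counts, independent of what export_graphviz() prints, is to map each training sample to its leaf; a sketch, reusing the clf, X, and y names from above:

import numpy as np

# For every training sample, find the leaf it ends up in, then count the
# raw (unweighted) examples of each class per leaf.
leaf_ids = clf.apply(X)
for leaf in np.unique(leaf_ids):
    mask = leaf_ids == leaf
    classes, counts = np.unique(np.asarray(y)[mask], return_counts=True)
    print(leaf, dict(zip(classes, counts)))

These per-class counts sum to the samples figure shown for each leaf, whereas the exported value array reflects the weighted totals.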