Commit

updated doc
manishamde committed Jun 4, 2014
1 parent adc7315 commit 8e44ab8
Showing 1 changed file with 4 additions and 2 deletions.
6 changes: 4 additions & 2 deletions docs/mllib-decision-tree.md
@@ -77,15 +77,17 @@ bins if the condition is not satisfied.

**Categorical features**

-For `$M$` categorical features, one could come up with `$2^{M-1}-1$` split candidates. For
+For `$M$` categorical feature values, one could come up with `$2^{M-1}-1$` split candidates. For
binary classification, we can reduce the number of split candidates to `$M-1$` by ordering the
categorical feature values by the proportion of labels falling in one of the two classes (see
Section 9.2.4 in
[The Elements of Statistical Learning](http://statweb.stanford.edu/~tibs/ElemStatLearn/) for
details). For example, for a binary classification problem with one categorical feature with three
categories A, B and C with corresponding proportion of label 1 as 0.2, 0.6 and 0.4, the categorical
features are ordered as A followed by C followed by B, that is, A, C, B. The two split candidates are A \| C, B
-and A, C \| B where \| denotes the split.
+and A, C \| B where \| denotes the split. A similar ordering using impurity is performed
+for categorical feature values in multiclass classification when `$2^{M-1}-1$` is
+greater than the number of bins.
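
The ordering trick for binary classification can be illustrated with a minimal Python sketch. It is not MLlib code; the function names `ordered_split_candidates` and `count_unordered_splits` are hypothetical and exist only for this illustration. Sorting the categories by their proportion of label 1 (0.2 for A, 0.4 for C, 0.6 for B) gives the order A, C, B, and each of the `$M-1$` cut points in that order yields one split candidate.

```python
def count_unordered_splits(num_categories):
    """Number of split candidates when every two-way partition is considered."""
    return 2 ** (num_categories - 1) - 1


def ordered_split_candidates(positive_fraction):
    """Return the M-1 split candidates obtained by ordering the categories.

    positive_fraction maps each category to the proportion of its examples
    that carry label 1 (binary classification only).
    """
    # Order categories by increasing proportion of label 1.
    ordered = sorted(positive_fraction, key=positive_fraction.get)
    # Every cut point in the ordered list defines one (left, right) split.
    return [(ordered[:i], ordered[i:]) for i in range(1, len(ordered))]


# Proportions of label 1 taken from the example in the text.
fractions = {"A": 0.2, "B": 0.6, "C": 0.4}
candidates = ordered_split_candidates(fractions)

for left, right in candidates:
    print(", ".join(left), "|", ", ".join(right))
# A | C, B
# A, C | B
```

With three categories there are `count_unordered_splits(3) == 3` unordered partitions, while the ordered approach evaluates only two.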

### Stopping rule
