# Decision Tree

$$
G(x) = \sum_{t=1}^T q_t(x) \cdot g_t(x)
$$

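- base hypothesis $ g_t(x) $: the leaf at the end of path t, a constant (some algorithms use a linear regression model as the leaf instead)
- condition $ q_t(x) $: is x on path t? (1 if yes, 0 otherwise)
- at the branching points, simple internal nodes are typically used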
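A minimal Python sketch of the path view. It assumes a hypothetical representation where each path is stored as the list of stump conditions leading to its leaf, together with the leaf constant $ g_t $; the names `q`, `G`, `paths` and the thresholds are made up for illustration. Because every x follows exactly one path, exactly one $ q_t(x) $ equals 1, so the sum simply picks out that leaf's constant.

```python
from typing import List, Tuple

# Hypothetical representation: a condition (i, theta, go_left) holds when
# (x[i] <= theta) == go_left; a path is its list of conditions plus the leaf constant g_t.
Condition = Tuple[int, float, bool]
Path = Tuple[List[Condition], float]

def q(conditions: List[Condition], x: List[float]) -> int:
    """q_t(x): 1 if x satisfies every branch on path t, else 0."""
    return int(all((x[i] <= theta) == go_left for i, theta, go_left in conditions))

def G(paths: List[Path], x: List[float]) -> float:
    """Path view: G(x) = sum_t q_t(x) * g_t(x)."""
    return sum(q(conds, x) * g_t for conds, g_t in paths)

# A depth-2 example tree with 3 leaves (made-up thresholds):
#   if x[0] <= 0.5: -1.0
#   elif x[1] <= 2.0: 1.0
#   else: 3.0
paths: List[Path] = [
    ([(0, 0.5, True)], -1.0),
    ([(0, 0.5, False), (1, 2.0, True)], 1.0),
    ([(0, 0.5, False), (1, 2.0, False)], 3.0),
]

print(G(paths, [0.2, 9.0]))  # -1.0
print(G(paths, [0.9, 5.0]))  # 3.0
```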

### Recursive View

$$
G(x) = \sum_{c=1}^C \big[ b(x) = c \big]_{boolean} \cdot G_c(x)
$$

- G(x): full-tree hypothesis
- b(x): branching criteria
- $ G_c(x) $: sub-tree hypothesis at the c<sup>th</sup> branch.

This is just like a recursive function: each sub-tree $ G_c $ is itself a decision tree.
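The recursive view maps directly onto a recursive Python function. This is a sketch under assumed conventions: the `Tree` class is hypothetical, `branch` plays the role of $ b(x) $ and returns a value in {1, ..., C}, and `children[c - 1]` stores $ G_c $. Since exactly one indicator $ [b(x) = c] $ is 1, evaluating the sum reduces to descending into a single sub-tree.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Sequence

@dataclass
class Tree:
    """Hypothetical node: a leaf stores a constant; an internal node stores b(x) and C sub-trees."""
    leaf_value: Optional[float] = None
    branch: Optional[Callable[[Sequence[float]], int]] = None  # b(x), returns c in {1, ..., C}
    children: List["Tree"] = field(default_factory=list)       # children[c - 1] is G_c

def predict(tree: Tree, x: Sequence[float]) -> Optional[float]:
    """G(x) = sum_c [b(x) = c] * G_c(x), evaluated recursively."""
    if tree.branch is None:                   # base case: a leaf returns g_t(x)
        return tree.leaf_value
    c = tree.branch(x)                        # only the term with [b(x) = c] = 1 survives,
    return predict(tree.children[c - 1], x)   # so recurse into that one sub-tree

# Made-up tree: split on x[0], then split the right branch on x[1].
tree = Tree(
    branch=lambda x: 1 if x[0] <= 0.5 else 2,
    children=[
        Tree(leaf_value=-1.0),
        Tree(branch=lambda x: 1 if x[1] <= 2.0 else 2,
             children=[Tree(leaf_value=1.0), Tree(leaf_value=3.0)]),
    ],
)
print(predict(tree, [0.9, 1.0]))  # 1.0
```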

### Basic Decision Tree Algorithm

function DecisionTree$ \Big( \mathcal{D} = \big\{ (x_n, y_n) \big\}_{n=1}^N \Big) $

if termination criteria met:  
   - return base hypothesis $ g_t(x) $  

else:  
   1. learn branching criteria $ b(x) $
   2. split $ \mathcal{D} $ into C parts $ \mathcal{D}_c = \big\{ (x_n, y_n) : b(x_n) = c \big\} $
   3. build sub-tree $ G_c \leftarrow $ DecisionTree$ (\mathcal{D}_c) $
   4. return $ G(x) = \sum_{c=1}^C \big[ b(x) = c \big]_{boolean} \cdot G_c(x) $

### Classification and Regression Tree (C&RT)

##### 2 simple choices:

- C = 2 (binary tree)
- $ g_t(x) $ = $ E_{in} $-optimal constant (majority of $ \{ y_n \} $ for classification, average of $ \{ y_n \} $ for regression)
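Which constant is $ E_{in} $-optimal depends on the error measure. A small sketch, assuming squared error for regression and 0/1 error for classification; `optimal_constant` and its `task` argument are hypothetical names:

```python
from collections import Counter
from statistics import mean

def optimal_constant(ys, task):
    """E_in-optimal constant leaf g_t(x) (sketch)."""
    if task == "regression":
        return mean(ys)  # minimizes sum_n (y_n - c)^2
    if task == "classification":
        return Counter(ys).most_common(1)[0][0]  # majority label minimizes 0/1 error
    raise ValueError(f"unknown task: {task!r}")

print(optimal_constant([1.0, 2.0, 6.0], "regression"))     # 3.0
print(optimal_constant(["a", "b", "a"], "classification"))  # a
```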

##### Branching in C&RT: Purifying

- simple internal node for C = 2: a decision stump with output in {1, 2}

$$
b(x) = \operatorname*{argmin}_{\text{decision stumps } h(x)} \sum_{c=1}^{2} \big| \mathcal{D}_c \text{ with } h \big| \cdot \text{impurity}(\mathcal{D}_c \text{ with } h)
$$

Try every decision stump and keep the one whose split has the smallest size-weighted impurity (the purest split); that stump becomes $ b(x) $.
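The exhaustive stump search can be sketched in Python as follows. Assumptions: numerical features, stumps of the form $ b(x) = 1 $ if $ x_i \le \theta $ else 2, candidate thresholds at midpoints between consecutive distinct values, and the impurity passed in as a function (here the regression impurity); all names are illustrative, not from a library.

```python
from typing import Callable, List, Optional, Sequence, Tuple

def variance(ys: Sequence[float]) -> float:
    """Regression impurity: mean squared deviation from the average."""
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys) / len(ys)

def best_stump(
    X: List[List[float]],
    y: List[float],
    impurity: Callable[[Sequence[float]], float],
) -> Optional[Tuple[int, float, float]]:
    """Search stumps b(x) = 1 if x[i] <= theta else 2 and return
    (i, theta, score) minimizing sum_c |D_c| * impurity(D_c)."""
    best: Optional[Tuple[int, float, float]] = None
    for i in range(len(X[0])):
        values = sorted({row[i] for row in X})
        # Midpoints between consecutive distinct values keep both parts non-empty.
        for lo, hi in zip(values, values[1:]):
            theta = (lo + hi) / 2
            d1 = [yn for xn, yn in zip(X, y) if xn[i] <= theta]
            d2 = [yn for xn, yn in zip(X, y) if xn[i] > theta]
            score = len(d1) * impurity(d1) + len(d2) * impurity(d2)
            if best is None or score < best[2]:
                best = (i, theta, score)
    return best  # None if all x_n are identical: no stump can split D

X = [[1.0, 5.0], [2.0, 3.0], [3.0, 4.0], [4.0, 1.0]]  # made-up data
y = [0.0, 0.0, 10.0, 10.0]
print(best_stump(X, y, variance))  # (0, 2.5, 0.0)
```

Splitting on feature 0 at 2.5 separates the two label values perfectly, so both parts have zero impurity.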

### Impurity Functions

##### Regression Error:

impurity(D) = $ \frac{1}{N} \sum_{n=1}^N (y_n - \overline{y})^2 $, where $ \overline{y} $ is the average of $ \{ y_n \} $
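A quick numerical check, with made-up labels and hypothetical helper names: the regression impurity equals the squared-error $ E_{in} $ of predicting the constant $ \overline{y} $ for every example, and any other constant does worse.

```python
from statistics import mean
from typing import Sequence

def regression_impurity(ys: Sequence[float]) -> float:
    """impurity(D) = (1/N) * sum_n (y_n - y_bar)^2, with y_bar the average of y_n."""
    y_bar = mean(ys)
    return sum((y - y_bar) ** 2 for y in ys) / len(ys)

def squared_error(ys: Sequence[float], c: float) -> float:
    """E_in of predicting the constant c for every example in D."""
    return sum((y - c) ** 2 for y in ys) / len(ys)

ys = [1.0, 2.0, 6.0]                # y_bar = 3.0
print(regression_impurity(ys))      # 14/3, about 4.667
print(squared_error(ys, 2.0))       # 17/3, about 5.667 (a worse constant)
```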

##### Classification Error:

impurity(D) = $ \frac{1}{N} \sum_{n=1}^N \big[ y_n \ne y^{*} \big]_{boolean} $, where $ y^{*} $ is the majority of $ \{ y_n \} $

Another popular impurity for classification with K classes is the Gini index:

$$
1 - \sum_{k=1}^K \Big( \frac{\sum_{n=1}^N \big[ y_n = k \big]_{boolean}}{N} \Big)^2
$$

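Unlike classification error, which only counts the majority class $ y^{*} $, the Gini index takes every class $ k $ into account in multi-class problems: each class contributes its squared proportion, so classes that appear more often carry more weight.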
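To see this difference, the sketch below (hypothetical function names, made-up labels) computes both impurities on two label sets with the same majority fraction. Classification error cannot tell them apart, while the Gini index can, because it also looks at how the remaining examples are spread over the other classes.

```python
from collections import Counter
from typing import Hashable, Sequence

def classification_error(ys: Sequence[Hashable]) -> float:
    """(1/N) * sum_n [y_n != y*], where y* is the majority label."""
    majority_count = Counter(ys).most_common(1)[0][1]
    return 1 - majority_count / len(ys)

def gini(ys: Sequence[Hashable]) -> float:
    """1 - sum_k (fraction of y_n equal to k)^2, over every class k present."""
    n = len(ys)
    return 1 - sum((count / n) ** 2 for count in Counter(ys).values())

labels_a = [1, 1, 1, 1, 2, 2, 3, 3]  # minority split over classes 2 and 3
labels_b = [1, 1, 1, 1, 2, 2, 2, 2]  # minority all in class 2
print(classification_error(labels_a), classification_error(labels_b))  # 0.5 0.5
print(gini(labels_a), gini(labels_b))                                  # 0.625 0.5
```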

#### Termination in C&RT

- all $ y_n $ are the same: impurity = 0, so return $ g_t(x) = y_n $
- all $ x_n $ are the same: no decision stump can split the data

Recursing until one of these criteria holds yields a fully-grown tree, which has $ E_{in} = 0 $ when all $ x_n $ are distinct.
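Putting the pieces together, here is a compact sketch of C&RT for regression that recurses until one of the two termination criteria holds, so it returns a fully-grown tree. It assumes numerical features, squared-error impurity, midpoint thresholds, and a hypothetical tree encoding (a leaf is a float, an internal node is a tuple `(i, theta, left, right)`); it is an illustration, not a reference implementation.

```python
from statistics import mean
from typing import List, Sequence

def sq_impurity(ys: Sequence[float]) -> float:
    """Regression impurity: (1/N) * sum_n (y_n - y_bar)^2."""
    y_bar = mean(ys)
    return sum((y - y_bar) ** 2 for y in ys) / len(ys)

def fit(X: List[List[float]], y: List[float]):
    """C&RT for regression (sketch): binary stumps, mean leaves, fully grown.
    Returns a float for a leaf or a tuple (i, theta, left_subtree, right_subtree)."""
    # Termination: all y_n equal (impurity 0) or all x_n equal (no stump can split).
    if len(set(y)) == 1 or all(row == X[0] for row in X):
        return mean(y)  # E_in-optimal constant under squared error
    best = None
    for i in range(len(X[0])):
        values = sorted({row[i] for row in X})
        for lo, hi in zip(values, values[1:]):
            theta = (lo + hi) / 2
            left = [n for n in range(len(X)) if X[n][i] <= theta]
            right = [n for n in range(len(X)) if X[n][i] > theta]
            score = sum(len(p) * sq_impurity([y[n] for n in p]) for p in (left, right))
            if best is None or score < best[0]:
                best = (score, i, theta, left, right)
    _, i, theta, left, right = best
    return (i, theta,
            fit([X[n] for n in left], [y[n] for n in left]),
            fit([X[n] for n in right], [y[n] for n in right]))

def predict(tree, x: Sequence[float]) -> float:
    """Follow b(x) down to a leaf and return its constant."""
    while isinstance(tree, tuple):
        i, theta, left, right = tree
        tree = left if x[i] <= theta else right
    return tree

X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]  # XOR-like toy data
y = [0.0, 1.0, 1.0, 0.0]
tree = fit(X, y)
print(tree)  # (0, 0.5, (1, 0.5, 0.0, 1.0), (1, 0.5, 1.0, 0.0))
```

On this toy data every training example ends up in its own leaf, so $ E_{in} = 0 $. If some $ x_n $ are duplicated with different $ y_n $, the second criterion stops the recursion and that leaf predicts their average instead.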

A fully-grown tree is prone to overfitting, so its complexity should be controlled with regularization:

$$
\operatorname*{argmin}_{\text{all possible } G} \; E_{in}(G) + \lambda \Omega(G)
$$

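where $ \Omega(G) $ is the number of leaves in G. Minimizing this objective removes some leaves and gives a pruned decision tree.  
Enumerating every possible G is infeasible, however, so instead:  
start from $ G^{(0)} $ = fully-grown tree,  
let $ G^{(i)} $ be the tree with the lowest $ E_{in} $ among those obtained by removing one leaf from $ G^{(i-1)} $,  
and choose among the resulting $ G^{(i)} $ with the regularized objective above (λ chosen by validation).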
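A sketch of this pruning procedure, assuming squared-error leaves (each leaf predicts the mean of its labels). To stay short, each hypothetical node only records the training labels reaching it, which is all $ E_{in} $ needs; split thresholds are omitted. Removing one leaf is implemented as merging a pair of sibling leaves into their parent, which reduces $ \Omega(G) $ by one.

```python
from typing import Iterator, List

# Hypothetical node: {"ys": labels reaching the node, "children": None (leaf) or (left, right)}
Node = dict

def leaf(*ys: float) -> Node:
    return {"ys": list(ys), "children": None}

def node(left: Node, right: Node) -> Node:
    return {"ys": left["ys"] + right["ys"], "children": (left, right)}

def num_leaves(t: Node) -> int:
    """Omega(G): the number of leaves."""
    if t["children"] is None:
        return 1
    return sum(num_leaves(c) for c in t["children"])

def sse(t: Node) -> float:
    """N * E_in(G) under squared error, each leaf predicting the mean of its ys."""
    if t["children"] is None:
        m = sum(t["ys"]) / len(t["ys"])
        return sum((y - m) ** 2 for y in t["ys"])
    return sum(sse(c) for c in t["children"])

def one_leaf_removed(t: Node) -> Iterator[Node]:
    """Yield every tree obtained by merging one pair of sibling leaves into their parent."""
    if t["children"] is None:
        return
    left, right = t["children"]
    if left["children"] is None and right["children"] is None:
        yield {"ys": t["ys"], "children": None}
    for k, child in enumerate((left, right)):
        for sub in one_leaf_removed(child):
            kids = [left, right]
            kids[k] = sub
            yield {"ys": t["ys"], "children": tuple(kids)}

def pruning_sequence(full: Node) -> List[Node]:
    """G^(0) = fully-grown tree; G^(i) = lowest-E_in one-leaf-removed version of G^(i-1)."""
    seq = [full]
    while seq[-1]["children"] is not None:
        seq.append(min(one_leaf_removed(seq[-1]), key=sse))
    return seq

def select(seq: List[Node], lam: float) -> Node:
    """Pick the G^(i) minimizing E_in(G) + lambda * Omega(G)."""
    n = len(seq[0]["ys"])
    return min(seq, key=lambda g: sse(g) / n + lam * num_leaves(g))

full = node(node(leaf(1.0), leaf(1.2)), node(leaf(5.0), leaf(9.0)))  # made-up tree
seq = pruning_sequence(full)
print([num_leaves(g) for g in seq])      # [4, 3, 2, 1]
print(num_leaves(select(seq, lam=0.1)))  # 3
```

With λ = 0.1 the selected tree is $ G^{(1)} $: merging the leaves 1.0 and 1.2 barely raises $ E_{in} $ but saves one leaf.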

#### Categorical Features

If a feature is categorical (one of several unordered categories) rather than numerical,

replace the decision stump with a decision subset as the branching function $ b(x) $:

$$
b(x) = \big[ x_i \in S \big]_{boolean} + 1, \quad S \subset \{ 1, 2, \ldots, K \}
$$
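A sketch of the decision-subset search for one categorical feature, assuming the Gini index as impurity and hypothetical helper names. It tries every proper, non-empty subset S of the observed categories, so it needs on the order of $ 2^K $ impurity evaluations for K categories (each split is also visited twice, once as S and once as its complement).

```python
from collections import Counter
from itertools import combinations
from typing import Collection, Hashable, Optional, Sequence, Set, Tuple

def gini(ys: Sequence[Hashable]) -> float:
    """Gini index: 1 - sum_k (fraction of class k)^2."""
    n = len(ys)
    return 1 - sum((count / n) ** 2 for count in Counter(ys).values())

def b(x_i: str, S: Collection[str]) -> int:
    """Decision subset branch: [x_i in S] + 1, i.e. 2 if x_i is in S, else 1."""
    return int(x_i in S) + 1

def best_subset(xs: Sequence[str], ys: Sequence[Hashable]) -> Tuple[Optional[Set[str]], float]:
    """Try every proper, non-empty subset S of the observed categories and return
    the one minimizing |D_1| * gini(D_1) + |D_2| * gini(D_2)."""
    categories = sorted(set(xs))
    best_S: Optional[Set[str]] = None
    best_score = float("inf")
    for r in range(1, len(categories)):
        for S in combinations(categories, r):
            parts: dict = {1: [], 2: []}
            for x, y in zip(xs, ys):
                parts[b(x, S)].append(y)
            score = sum(len(p) * gini(p) for p in parts.values())
            if score < best_score:
                best_S, best_score = set(S), score
    return best_S, best_score

xs = ["red", "green", "blue", "red", "blue", "green"]  # made-up categorical feature
ys = [1, 0, 1, 1, 1, 0]
print(best_subset(xs, ys))  # ({'green'}, 0.0)
```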

> another popular decision tree algorithm: C4.5