# Julia Machine Learning: Decision Trees with DecisionTree

This example requires the DecisionTree and ScikitLearn packages; install them before running the examples below.

```
] add DecisionTree
] add ScikitLearn
```

In [1]:
using Pkg
Pkg.add("DecisionTree")
Pkg.add("ScikitLearn")
using DecisionTree
using ScikitLearn.CrossValidation: cross_val_score

[32m[1m   Updating[22m[39m registry at `C:\Users\Andy Chen\.julia\registries\General`

[?25l


[32m[1m   Updating[22m[39m git-repo `https://github.com/JuliaRegistries/General.git`




[32m[1m  Resolving[22m[39m package versions...
[32m[1m  Installed[22m[39m DecisionTree ─ v0.10.1
[32m[1m   Updating[22m[39m `C:\Users\Andy Chen\.julia\environments\v1.4\Project.toml`
 [90m [7806a523][39m[92m + DecisionTree v0.10.1[39m
[32m[1m   Updating[22m[39m `C:\Users\Andy Chen\.julia\environments\v1.4\Manifest.toml`
 [90m [7806a523][39m[92m + DecisionTree v0.10.1[39m
[32m[1m  Resolving[22m[39m package versions...
[32m[1m  Installed[22m[39m PDMats ─────────────────────── v0.9.12
[32m[1m  Installed[22m[39m Rmath_jll ──────────────────── v0.2.2+0
[32m[1m  Installed[22m[39m IterTools ──────────────────── v1.3.0
[32m[1m  Installed[22m[39m CommonSubexpressions ───────── v0.2.0
[32m[1m  Installed[22m[39m Zygote ─────────────────────── v0.4.20
[32m[1m  Installed[22m[39m StaticArrays ───────────────── v0.12.3
[32m[1m  Installed[22m[39m Documenter ─────────────────── v0.24.11
[32m[1m  Installed[22m[39m Distances ────────────────────

## Loading the Data

In [2]:
features, labels = DecisionTree.load_data("iris");

## Casting

In [3]:
features = float.(features)
labels = string.(labels)

150-element Array{String,1}:
 "Iris-setosa"
 "Iris-setosa"
 "Iris-setosa"
 "Iris-setosa"
 "Iris-setosa"
 "Iris-setosa"
 "Iris-setosa"
 "Iris-setosa"
 "Iris-setosa"
 "Iris-setosa"
 "Iris-setosa"
 "Iris-setosa"
 "Iris-setosa"
 ⋮
 "Iris-virginica"
 "Iris-virginica"
 "Iris-virginica"
 "Iris-virginica"
 "Iris-virginica"
 "Iris-virginica"
 "Iris-virginica"
 "Iris-virginica"
 "Iris-virginica"
 "Iris-virginica"
 "Iris-virginica"
 "Iris-virginica"

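The cast is presumably needed because `load_data` returns arrays with element type `Any`, as the DecisionTree.jl README describes, and concrete element types are generally better for performance. The sketch below shows the effect of the two broadcasts on small hand-made arrays; the names `raw_features` and `raw_labels` and their values are illustrative assumptions, not the actual dataset.

```julia
# Illustrative stand-ins for the untyped arrays (assumed values, not the real dataset)
raw_features = Any[5.1 3.5 1.4 0.2; 7.0 3.2 4.7 1.4]
raw_labels = Any["Iris-setosa", "Iris-versicolor"]

# Broadcasting float and string over the arrays yields concretely typed arrays
typed_features = float.(raw_features)
typed_labels = string.(raw_labels)

println(eltype(raw_features), " -> ", eltype(typed_features))
println(eltype(raw_labels), " -> ", eltype(typed_labels))
```
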
## Decision Tree Model

In [4]:
model = DecisionTree.DecisionTreeClassifier(max_depth=2)

DecisionTreeClassifier
max_depth:                2
min_samples_leaf:         1
min_samples_split:        2
min_purity_increase:      0.0
pruning_purity_threshold: 1.0
n_subfeatures:            0
classes:                  nothing
root:                     nothing

Available models:

* `DecisionTreeClassifier`
* `DecisionTreeRegressor`
* `RandomForestClassifier`
* `RandomForestRegressor`
* `AdaBoostStumpClassifier`

## Training

In [5]:
DecisionTree.fit!(model, features, labels)

DecisionTreeClassifier
max_depth:                2
min_samples_leaf:         1
min_samples_split:        2
min_purity_increase:      0.0
pruning_purity_threshold: 1.0
n_subfeatures:            0
classes:                  ["Iris-setosa", "Iris-versicolor", "Iris-virginica"]
root:                     Decision Tree
Leaves: 3
Depth:  2

## Printing the Decision Tree

In [6]:
DecisionTree.print_tree(model, 5)

Feature 4, Threshold 0.8
L-> Iris-setosa : 50/50
R-> Feature 4, Threshold 1.75
    L-> Iris-versicolor : 49/54
    R-> Iris-virginica : 45/46
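
Each leaf in the printed tree appears to report the number of training samples that match the leaf's label over the number of samples that reached the leaf. Under that reading, the training accuracy of this depth-2 tree can be worked out from the counts above. The variable names below are illustrative.

```julia
# (matching, total) counts copied from the three leaves printed above
leaf_counts = [(50, 50), (49, 54), (45, 46)]

n_correct = sum(first, leaf_counts)   # samples whose label matches their leaf
n_total = sum(last, leaf_counts)      # all samples that reached a leaf
training_accuracy = n_correct / n_total

println("training accuracy = ", training_accuracy)
```

The totals add up to the 150 samples in the dataset, which supports reading the counts this way.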


## Prediction

In [7]:
new_iris = [5.9, 3.0, 5.1, 1.9]
DecisionTree.predict(model, new_iris)

"Iris-virginica"

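The prediction can be checked by following the printed tree by hand. The sketch below hard-codes the two thresholds on feature 4 from the printed output; it assumes a sample goes to the left branch when its feature value is below the threshold, and `classify_by_hand` is a hypothetical helper, not part of DecisionTree.

```julia
# Hand-written version of the printed tree; feature 4 is the fourth element.
# Assumption: a sample goes left when its feature value is below the threshold.
function classify_by_hand(sample)
    if sample[4] < 0.8
        return "Iris-setosa"
    elseif sample[4] < 1.75
        return "Iris-versicolor"
    else
        return "Iris-virginica"
    end
end

new_iris = [5.9, 3.0, 5.1, 1.9]
println(classify_by_hand(new_iris))
```
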
In [8]:
DecisionTree.predict_proba(model, new_iris)

3-element Array{Float64,1}:
 0.0
 0.021739130434782608
 0.9782608695652174
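
These values look like the class fractions in the leaf that the sample reaches. That leaf holds 46 training samples, 45 of them Iris-virginica. Assuming the remaining sample is an Iris-versicolor, which is consistent with the output above, the same numbers can be recomputed as follows.

```julia
# Assumed class counts in the reached leaf, ordered setosa, versicolor, virginica
leaf_class_counts = [0, 1, 45]

# Normalize the counts so they sum to one
leaf_probabilities = leaf_class_counts ./ sum(leaf_class_counts)
println(leaf_probabilities)
```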

## Classes Corresponding to `predict_proba`

In [9]:
DecisionTree.get_classes(model)

3-element Array{String,1}:
 "Iris-setosa"
 "Iris-versicolor"
 "Iris-virginica"

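Since the probabilities are returned in the same order as the classes, the two vectors can be paired to label each probability and to pick the most likely class. The sketch below copies both vectors from the outputs above instead of calling the model, so it runs on its own.

```julia
# Values copied from the get_classes and predict_proba outputs above
classes = ["Iris-setosa", "Iris-versicolor", "Iris-virginica"]
probabilities = [0.0, 0.021739130434782608, 0.9782608695652174]

# Print each class next to its probability
for (label, p) in zip(classes, probabilities)
    println(rpad(label, 16), " ", round(p, digits=3))
end

# The index of the largest probability selects the predicted class
most_likely = classes[argmax(probabilities)]
println("most likely: ", most_likely)
```
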
## Random Forest Model

In [10]:
model = DecisionTree.RandomForestClassifier(n_trees=50, max_depth=2)

RandomForestClassifier
n_trees:             50
n_subfeatures:       -1
partial_sampling:    0.7
max_depth:           2
min_samples_leaf:    1
min_samples_split:   2
min_purity_increase: 0.0
classes:             nothing
ensemble:            nothing

## Training

In [11]:
DecisionTree.fit!(model, features, labels)

RandomForestClassifier
n_trees:             50
n_subfeatures:       -1
partial_sampling:    0.7
max_depth:           2
min_samples_leaf:    1
min_samples_split:   2
min_purity_increase: 0.0
classes:             ["Iris-setosa", "Iris-versicolor", "Iris-virginica"]
ensemble:            Ensemble of Decision Trees
Trees:      50
Avg Leaves: 3.22
Avg Depth:  2.0

## Prediction

In [12]:
new_iris = [5.9, 3.0, 5.1, 1.9]
DecisionTree.predict(model, new_iris)

"Iris-virginica"

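A random forest classifier typically combines the predictions of its individual trees by majority vote. The sketch below illustrates that idea only: `simple_majority_vote` is a hypothetical helper, and the vote counts are made up for illustration, not taken from the fitted model.

```julia
# Return the label that occurs most often in `votes`.
# Ties are resolved by dictionary iteration order, which is arbitrary here.
function simple_majority_vote(votes)
    counts = Dict{String,Int}()
    for v in votes
        counts[v] = get(counts, v, 0) + 1
    end
    best_label, best_count = "", 0
    for (label, count) in counts
        if count > best_count
            best_label, best_count = label, count
        end
    end
    return best_label
end

# Hypothetical votes from 50 trees (assumed numbers, for illustration only)
hypothetical_votes = vcat(fill("Iris-virginica", 47), fill("Iris-versicolor", 3))
println(simple_majority_vote(hypothetical_votes))
```
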
## Cross-Validation

In [13]:
accuracy = cross_val_score(model, features, labels, cv=5)

5-element Array{Float64,1}:
 0.9333333333333333
 0.9666666666666667
 0.9
 0.9
 1.0
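
The five fold scores are usually summarized by their mean and spread. The sketch below copies the scores from the output above so that it runs without the model. Because a random forest is randomized, a fresh run will likely give slightly different numbers.

```julia
using Statistics

# Fold accuracies copied from the cross_val_score output above
accuracy = [0.9333333333333333, 0.9666666666666667, 0.9, 0.9, 1.0]

mean_accuracy = mean(accuracy)   # average accuracy over the 5 folds
std_accuracy = std(accuracy)     # sample standard deviation across folds

println("mean = ", round(mean_accuracy, digits=3),
        ", std = ", round(std_accuracy, digits=3))
```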