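## [Assignment Focus]
By now you should have a clear picture of how the data in a dataset is structured, including its features and labels. Whatever the project, remember that the data must be cleaned into this same format before it can be fed to a model for training.
Today's assignment introduces the decision tree, a very important model. Make sure you understand what each hyperparameter means, then try adjusting them and observe how they affect the final predictions.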
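
As a starting point for exploring the hyperparameters, here is a minimal sketch of the ones most often tuned on `DecisionTreeClassifier`. The specific values are illustrative assumptions, not recommendations from this assignment.

```python
from sklearn.tree import DecisionTreeClassifier

# Hyperparameters are fixed before training; they control how the tree grows.
# The values below are illustrative assumptions only.
clf = DecisionTreeClassifier(
    criterion="gini",       # split quality measure, e.g. "gini" or "entropy"
    max_depth=3,            # maximum depth of the tree (None means unlimited)
    min_samples_split=2,    # minimum samples needed to split an internal node
    min_samples_leaf=1,     # minimum samples required in each leaf
    random_state=0,         # makes repeated runs give the same tree
)

# get_params() returns every hyperparameter, including the defaults.
params = clf.get_params()
for name in ("criterion", "max_depth", "min_samples_split", "min_samples_leaf"):
    print(name, "=", params[name])
```

Larger `min_samples_leaf` and smaller `max_depth` both restrict the tree, which tends to reduce overfitting at the cost of a coarser fit.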

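## Assignment

1. Try adjusting the parameters of DecisionTreeClassifier(...) and observe whether the results change.
2. Switch to other datasets (boston, wine) and compare the results with those of a regression model.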
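
The cells in this notebook cover iris and boston but not wine, so here is a rough sketch of what the wine half of task 2 might look like. Because wine is a classification dataset, the "regression model" is assumed here to mean logistic regression; the scaling pipeline, `max_iter=1000`, and `random_state=0` are also assumptions made for this illustration.

```python
from sklearn import datasets, metrics
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Load the wine dataset and split it the same way as the iris example.
wine = datasets.load_wine()
x_train, x_test, y_train, y_test = train_test_split(
    wine.data, wine.target, test_size=0.25, random_state=4
)

# Decision tree (random_state is an assumption, added for repeatability).
tree = DecisionTreeClassifier(random_state=0).fit(x_train, y_train)
tree_acc = metrics.accuracy_score(y_test, tree.predict(x_test))

# Logistic regression baseline; features are standardized first so the
# solver converges reliably.
logreg = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
logreg.fit(x_train, y_train)
logreg_acc = metrics.accuracy_score(y_test, logreg.predict(x_test))

print("Decision tree accuracy:      ", tree_acc)
print("Logistic regression accuracy:", logreg_acc)
```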

In [2]:
from sklearn import datasets, metrics
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor
from sklearn.model_selection import train_test_split

In [3]:
# Load the iris dataset
iris = datasets.load_iris()

# Split into training and test sets
x_train, x_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.25, random_state=4)

# Build the model
clf = DecisionTreeClassifier()

# Train the model
clf.fit(x_train, y_train)

# Predict on the test set
y_pred = clf.predict(x_test)

In [6]:
acc = metrics.accuracy_score(y_test, y_pred)
print("Accuracy: ", acc)

Accuracy:  0.9736842105263158


In [7]:
print(iris.feature_names)

['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']


In [8]:
print("Feature importance: ", clf.feature_importances_)

Feature importance:  [0.         0.01796599 0.52229134 0.45974266]
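
The importance array is easier to read when each value is paired with its feature name. The sketch below does that and sorts by importance. It refits the tree with `random_state=0` (an assumption added for repeatability), so the exact numbers may differ from the output above.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

iris = datasets.load_iris()
x_train, x_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.25, random_state=4
)

# random_state=0 is an assumption; the original cell leaves it unset.
clf = DecisionTreeClassifier(random_state=0).fit(x_train, y_train)

# Pair each feature name with its importance and sort from high to low.
ranked = sorted(
    zip(iris.feature_names, clf.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, importance in ranked:
    print(f"{name:<20} {importance:.4f}")
```

The importances are normalized, so they sum to 1; a value of 0 means the feature was never used in a split.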


In [25]:
# 1. Try adjusting the parameters of DecisionTreeClassifier(...) and observe whether the results change
x_train, x_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.25, random_state=4)
for i in range(1, 15):
    clf = DecisionTreeClassifier(criterion='entropy', min_samples_leaf=2, max_depth=i)
    clf.fit(x_train, y_train)
    y_pred = clf.predict(x_test)
    acc = metrics.accuracy_score(y_test, y_pred)
    print("max_depth={}".format(i), "Accuracy: ", acc)

max_depth=1 Accuracy:  0.6842105263157895
max_depth=2 Accuracy:  0.9736842105263158
max_depth=3 Accuracy:  0.9736842105263158
max_depth=4 Accuracy:  0.9736842105263158
max_depth=5 Accuracy:  1.0
max_depth=6 Accuracy:  1.0
max_depth=7 Accuracy:  1.0
max_depth=8 Accuracy:  0.9736842105263158
max_depth=9 Accuracy:  0.9736842105263158
max_depth=10 Accuracy:  0.9736842105263158
max_depth=11 Accuracy:  0.9736842105263158
max_depth=12 Accuracy:  0.9736842105263158
max_depth=13 Accuracy:  0.9736842105263158
max_depth=14 Accuracy:  1.0
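
The test set here holds only 38 samples, so the gap between 0.9737 and 1.0 is a single prediction. The classifier is also built without `random_state`, so the back-and-forth at larger depths probably reflects run-to-run randomness rather than a real effect of `max_depth`. One way to get a steadier comparison is cross-validation, sketched below; `cv=5`, `random_state=0`, and the shorter depth range are assumptions made for this example.

```python
from sklearn import datasets
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

iris = datasets.load_iris()

mean_scores = {}
for depth in range(1, 8):
    clf = DecisionTreeClassifier(
        criterion="entropy",
        min_samples_leaf=2,
        max_depth=depth,
        random_state=0,  # assumption: fixed so reruns give the same trees
    )
    # 5-fold cross-validation: every sample is used for testing exactly once.
    scores = cross_val_score(clf, iris.data, iris.target, cv=5)
    mean_scores[depth] = scores.mean()
    print(f"max_depth={depth} mean CV accuracy: {mean_scores[depth]:.4f}")
```

A depth of 1 can separate at most two of the three iris classes, which is why its score stays far below the others.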


In [26]:
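# 2. Switch to other datasets (boston, wine) and compare the results with those of a regression model
# Load the boston dataset
# Note: load_boston was removed in scikit-learn 1.2, so this cell needs an older release
boston = datasets.load_boston()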

In [35]:
import pandas as pd
df = pd.DataFrame(data=boston.data, columns=boston.feature_names)
df.head()

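|   | CRIM | ZN | INDUS | CHAS | NOX | RM | AGE | DIS | RAD | TAX | PTRATIO | B | LSTAT |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 0.00632 | 18.0 | 2.31 | 0.0 | 0.538 | 6.575 | 65.2 | 4.09 | 1.0 | 296.0 | 15.3 | 396.9 | 4.98 |
| 1 | 0.02731 | 0.0 | 7.07 | 0.0 | 0.469 | 6.421 | 78.9 | 4.9671 | 2.0 | 242.0 | 17.8 | 396.9 | 9.14 |
| 2 | 0.02729 | 0.0 | 7.07 | 0.0 | 0.469 | 7.185 | 61.1 | 4.9671 | 2.0 | 242.0 | 17.8 | 392.83 | 4.03 |
| 3 | 0.03237 | 0.0 | 2.18 | 0.0 | 0.458 | 6.998 | 45.8 | 6.0622 | 3.0 | 222.0 | 18.7 | 394.63 | 2.94 |
| 4 | 0.06905 | 0.0 | 2.18 | 0.0 | 0.458 | 7.147 | 54.2 | 6.0622 | 3.0 | 222.0 | 18.7 | 396.9 | 5.33 |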


In [36]:
# Split into training and test sets
x_train, x_test, y_train, y_test = train_test_split(boston.data, boston.target, test_size=0.25, random_state=4)

# Build the model
clf = DecisionTreeRegressor()

# Train the model
clf.fit(x_train, y_train)

# Predict on the test set
y_pred = clf.predict(x_test)

In [39]:
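mae = metrics.mean_absolute_error(y_test, y_pred)  # evaluate with MAE
mse = metrics.mean_squared_error(y_test, y_pred)  # evaluate with MSE
# Caution: r2_score is documented as r2_score(y_true, y_pred) and is not symmetric.
# The arguments are swapped here, so the value printed below is not the conventional R-square.
r2 = metrics.r2_score(y_pred, y_test)  # evaluate with R-square
print("MAE: ", mae)
print("MSE: ", mse)
print("R-square: ", r2)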

MAE:  3.3842519685039374
MSE:  26.576692913385823
R-square:  0.7329519654976232
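
The cell above passes the predictions as the first argument to every metric. That makes no difference for MAE and MSE, but scikit-learn documents `r2_score(y_true, y_pred)`, and R-square depends on which argument is treated as the truth. The sketch below uses hand-written versions of the standard formulas (not scikit-learn's implementation) on made-up numbers to show the asymmetry.

```python
def mae(y_true, y_pred):
    """Mean absolute error: symmetric in its two arguments."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """R-square = 1 - SS_res / SS_tot, where SS_tot uses the mean of y_true."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Made-up values, for illustration only.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 5.0]

mae_forward, mae_swapped = mae(y_true, y_pred), mae(y_pred, y_true)
r2_forward, r2_swapped = r2(y_true, y_pred), r2(y_pred, y_true)

print("MAE:", mae_forward, "swapped:", mae_swapped)
print("R-square:", round(r2_forward, 4), "swapped:", round(r2_swapped, 4))
```

Only the denominator changes when the arguments are swapped, because it measures the spread of whichever list is passed first.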


In [43]:
for i in range(1, 15):
    clf = DecisionTreeRegressor(min_samples_leaf=i)
    clf.fit(x_train, y_train)
    y_pred = clf.predict(x_test)
    mae = metrics.mean_absolute_error(y_test, y_pred)  # evaluate with MAE
    mse = metrics.mean_squared_error(y_test, y_pred)  # evaluate with MSE
    # Caution: r2_score expects (y_true, y_pred); the arguments are swapped here,
    # so the printed R-square is not the conventional value.
    r2 = metrics.r2_score(y_pred, y_test)  # evaluate with R-square
    print("MAE: ", mae, "MSE: ", mse, "R-square: ", r2)

MAE:  3.3307086614173227 MSE:  26.583622047244095 R-square:  0.736809279507619
MAE:  3.2477690288713905 MSE:  27.73778652668416 R-square:  0.7164097951357775
MAE:  3.056876640419947 MSE:  22.078573665791765 R-square:  0.7826877707653889
MAE:  3.0568916385451823 MSE:  22.522241238595175 R-square:  0.7655667887288224
MAE:  2.9676859142607173 MSE:  21.774534834633773 R-square:  0.7720622488922928
MAE:  3.153219171467202 MSE:  24.507173079282644 R-square:  0.7457354022425329
MAE:  3.0389410720513084 MSE:  23.376328685648122 R-square:  0.7346678783102245
MAE:  2.9246786393833633 MSE:  24.101316457064595 R-square:  0.7227273794268401
MAE:  2.9908794380957415 MSE:  24.11555415801353 R-square:  0.7151759152076728
MAE:  3.1911855207868234 MSE:  24.5516829824093 R-square:  0.6946532189268637
MAE:  3.124865850219328 MSE:  24.316626839434054 R-square:  0.692655900977418
MAE:  3.1379354616871997 MSE:  24.413029779257506 R-square:  0.6876845540355423
MAE:  3.1609796989460026 MSE:  24.721236012197203
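
The sweep above prints the metrics without the `min_samples_leaf` value that produced them, and the regressor has no `random_state`, so the numbers change between runs. Below is a sketch of a labeled, repeatable version. Since `load_boston` is not available in scikit-learn 1.2 and later, it uses `load_diabetes` as a stand-in regression dataset; that substitution and `random_state=0` are assumptions, and the results are not comparable to the boston figures above.

```python
from sklearn import datasets, metrics
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Stand-in dataset (assumption): diabetes ships with scikit-learn.
data = datasets.load_diabetes()
x_train, x_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.25, random_state=4
)

results = []
for leaf in range(1, 15):
    reg = DecisionTreeRegressor(min_samples_leaf=leaf, random_state=0)
    reg.fit(x_train, y_train)
    y_pred = reg.predict(x_test)
    row = {
        "min_samples_leaf": leaf,
        "mae": metrics.mean_absolute_error(y_test, y_pred),
        "mse": metrics.mean_squared_error(y_test, y_pred),
        "r2": metrics.r2_score(y_test, y_pred),  # true values first
    }
    results.append(row)
    print(
        f"min_samples_leaf={leaf:2d} "
        f"MAE: {row['mae']:.3f} MSE: {row['mse']:.3f} R-square: {row['r2']:.3f}"
    )

# Pick the setting with the lowest MAE on this particular split.
best = min(results, key=lambda r: r["mae"])
print("Lowest MAE at min_samples_leaf =", best["min_samples_leaf"])
```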