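## [Assignment focus]
By now you should have a clear picture of what the data in a dataset looks like, including the features and the labels. Keep in mind that in any future project the data must be cleaned into this same format before it can be fed to a model for training.
Today's assignment introduces the decision tree, a very important model. Make sure you understand what each of its hyperparameters means, and try adjusting them to see how they affect the final predictions.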

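## Assignment

1. Try adjusting the parameters of DecisionTreeClassifier(...) and observe whether the results change.
2. Switch to another dataset (boston, wine) and compare the results with those of a regression model.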

In [1]:
from sklearn import datasets, metrics

# Use DecisionTreeClassifier for classification problems and DecisionTreeRegressor for regression problems
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor
from sklearn.model_selection import train_test_split

In [2]:
# Load the iris dataset
iris = datasets.load_iris()

# Split into training and test sets
x_train, x_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.25, random_state=4)

# Build the model
clf = DecisionTreeClassifier(criterion="entropy")

# Train the model
clf.fit(x_train, y_train)

# Predict on the test set
y_pred = clf.predict(x_test)

In [3]:
acc = metrics.accuracy_score(y_test, y_pred)
print("Accuracy: ", acc)
print("[Feature importance]")
for name, importance in zip(iris.feature_names, clf.feature_importances_):
    print(name, importance)

Accuracy:  0.9736842105263158
[Feature importance]
sepal length (cm) 0.0
sepal width (cm) 0.01560620187870998
petal length (cm) 0.07501716294579418
petal width (cm) 0.9093766351754958
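
For the first task, the following is a minimal sketch of one way to vary the hyperparameters and watch the test accuracy change. It reuses the same iris split as above; the particular grid of `criterion` and `max_depth` values, the `results` dictionary, and `random_state=0` on the classifier are illustrative assumptions, not part of the original notebook.

```python
from sklearn import datasets, metrics
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Same dataset and split as the cell above
iris = datasets.load_iris()
x_train, x_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.25, random_state=4
)

# Illustrative grid (an assumption): two split criteria, several depth limits
results = {}
for criterion in ("gini", "entropy"):
    for max_depth in (1, 2, 3, None):  # None lets the tree grow until leaves are pure
        clf = DecisionTreeClassifier(
            criterion=criterion,
            max_depth=max_depth,
            random_state=0,  # fixed only so repeated runs give the same tree
        )
        clf.fit(x_train, y_train)
        acc = metrics.accuracy_score(y_test, clf.predict(x_test))
        results[(criterion, max_depth)] = acc
        print(f"criterion={criterion:8s} max_depth={str(max_depth):5s} accuracy={acc:.3f}")
```

A very shallow tree (for example `max_depth=1`) can separate at most two classes, so it is a useful baseline for seeing how much each extra level of depth contributes.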


In [4]:
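# boston: regression
# Note: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2,
# so this cell needs an older scikit-learn version to run as written.
boston = datasets.load_boston()
x_train, x_test, y_train, y_test = train_test_split(boston.data, boston.target, test_size=0.1, random_state=4)

# Build the model
clf = DecisionTreeRegressor()

# Train the model
clf.fit(x_train, y_train)

# Predict on the test set
y_pred = clf.predict(x_test)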

In [5]:
print("r2 score: %.2f"
      % metrics.r2_score(y_test, y_pred))
print("[Feature importance]")
for name, importance in zip(boston.feature_names, clf.feature_importances_):
    print(name, importance)

r2 score: 0.72
[Feature importance]
CRIM 0.0686047029519018
ZN 0.0014623228890225579
INDUS 0.010029153476023215
CHAS 0.0009508267012141993
NOX 0.015994616806144387
RM 0.5621869736687402
AGE 0.008544216749902557
DIS 0.07745691854249057
RAD 0.0016625674996536264
TAX 0.009959704834785662
PTRATIO 0.024906329304118124
B 0.008025546269865298
LSTAT 0.210216120306138
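
If `load_boston` is not available in your scikit-learn version, the same tree-versus-linear comparison can be sketched on another bundled regression dataset. The example below swaps in `load_diabetes` purely as a stand-in (an assumption, not the dataset the assignment names), so its scores are not comparable to the boston numbers shown in this notebook. The variable names `tree`, `linear`, `tree_r2` and `linear_r2` are likewise illustrative.

```python
from sklearn import datasets, linear_model, metrics
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Stand-in regression dataset (assumption): diabetes instead of boston
diabetes = datasets.load_diabetes()
x_train, x_test, y_train, y_test = train_test_split(
    diabetes.data, diabetes.target, test_size=0.1, random_state=4
)

# Decision tree regressor; random_state fixed only for repeatable results
tree = DecisionTreeRegressor(random_state=0)
tree.fit(x_train, y_train)
tree_r2 = metrics.r2_score(y_test, tree.predict(x_test))

# Linear regression on the same split for comparison
linear = linear_model.LinearRegression()
linear.fit(x_train, y_train)
linear_r2 = metrics.r2_score(y_test, linear.predict(x_test))

print("tree r2 score: %.2f" % tree_r2)
print("linear r2 score: %.2f" % linear_r2)
print("[Feature importance]")
for name, importance in zip(diabetes.feature_names, tree.feature_importances_):
    print(name, importance)
```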


In [8]:
from sklearn import datasets, linear_model

# Fit a linear regression on the same boston train/test split for comparison
regr = linear_model.LinearRegression()
regr.fit(x_train, y_train)
y_pred = regr.predict(x_test)

print("r2 score: %.2f"
      % metrics.r2_score(y_test, y_pred))

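# Coefficients depend on each feature's scale, so they are not
# directly comparable to the tree's feature importances
print("[Coefficients]")
for name, coef in zip(boston.feature_names, regr.coef_):
    print(name, coef)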

r2 score: 0.79
[Coefficients]
CRIM -0.12585665878406954
ZN 0.0484257396100201
INDUS 0.01840852809252633
CHAS 3.085095691516899
NOX -17.327701820564606
RM 3.6167471330861467
AGE 0.0021918185271774765
DIS -1.4936113225001264
RAD 0.3199792000272681
TAX -0.01272946486141267
PTRATIO -0.927469085924641
B 0.009509124683760478
LSTAT -0.5335924706228666
