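## [Assignment focus]
By now you should have a clear picture of how the data in a dataset is structured, including its features and labels. In any future project, remember that the data must be cleaned into this same format before it can be fed into a model for training.
Today's assignment introduces the decision tree, a very important model. Make sure you understand what each of its hyperparameters means, and try adjusting them to see how they affect the final predictions.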

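## Assignment

1. Try adjusting the parameters of DecisionTreeClassifier(...) and observe whether the results change.
2. Switch to other datasets (boston, wine) and compare the results with those of a regression model.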
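One way to approach the first task is sketched below. It reuses the wine split from later in this notebook and loops over a small, hypothetical set of hyperparameter combinations; the specific values and the fixed `random_state=0` are illustrative assumptions, not recommended settings.

```python
from sklearn.datasets import load_wine
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

wine = load_wine()
x_train, x_test, y_train, y_test = train_test_split(
    wine.data, wine.target, test_size=0.1, random_state=4
)

# Hypothetical combinations, chosen only for illustration.
param_grid = [
    {"max_depth": None, "min_samples_leaf": 1, "criterion": "gini"},  # library defaults
    {"max_depth": 2, "min_samples_leaf": 1, "criterion": "gini"},  # very shallow tree
    {"max_depth": 4, "min_samples_leaf": 5, "criterion": "entropy"},  # larger leaves
]

results = []
for params in param_grid:
    # random_state is fixed so that repeated runs give the same tree.
    clf = DecisionTreeClassifier(random_state=0, **params)
    clf.fit(x_train, y_train)
    acc = accuracy_score(y_test, clf.predict(x_test))
    results.append((params, clf.get_depth(), acc))
    print(params, "-> depth:", clf.get_depth(), "accuracy:", round(acc, 3))
```

Comparing the fitted depth with the accuracy for each row gives a rough sense of how much each constraint limits the tree.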

# Decision tree / regression

In [1]:
from sklearn import datasets, metrics, linear_model

# Use DecisionTreeClassifier for classification problems and DecisionTreeRegressor for regression problems
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score, accuracy_score

In [2]:
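# Load the Boston house prices dataset (a regression problem)
# Note: load_boston was removed in scikit-learn 1.2, so this cell needs an older release
boston = datasets.load_boston()

# Split into training and test sets
x_train, x_test, y_train, y_test = train_test_split(boston.data, boston.target, test_size=0.25, random_state=4)

# Build the model
regr = DecisionTreeRegressor()  # decision tree

# Train the model
regr.fit(x_train, y_train)

# Predict on the test set
y_pred = regr.predict(x_test)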


In [3]:
# Display the predicted values for the test set
regr.predict(x_test)

array([14.4, 22. , 20.9, 22.5, 50. , 22.9, 37.3, 22.5, 17.2, 15.4, 23.9,
       16.5, 21.9, 23.3, 23.2, 13.8, 17.2, 12.8, 10.4, 14.8, 10.4, 15.4,
       20.5, 20.1, 19. , 21.4, 13.4, 14.5, 23.1, 27.1,  9.5, 22.6, 36.5,
       29.6, 13.8,  9.5, 33.2, 46. , 24.8, 22.9, 46. , 24.8, 12.7, 30.1,
       25.1, 20.9, 50. , 19.4, 22.7, 22.2, 29.6, 23.8, 11.3, 27.1, 19.1,
       19.3, 22. , 33.1, 16.6, 33.1, 16.2, 21.4, 37. , 19.6, 43.1, 30.1,
       21. ,  8.3, 22.5, 23.1, 22. , 17.2, 22. , 30.1, 27.1, 33.4, 15.2,
       21. , 17.7, 22.2, 22.5, 15. , 26.6, 20.6, 25. , 20.6, 32.2, 24.5,
       22.5, 50. , 29.1, 50. , 19.4, 48.3, 24. , 19.4, 20. , 27.5, 15.6,
       19. ,  8.4, 19.4, 34.9, 14.5, 23.3, 19.9, 37.3, 30.3, 50. , 21.6,
       22.2, 19.6, 13.1, 37. , 36.2, 21.4, 50. , 15.4,  7. , 19.4, 21.2,
       12.5, 23.1, 21.4, 18.5, 24.4, 50. ])

In [4]:
print("Mean squared error: %.2f"
      % mean_squared_error(y_test, y_pred))

Mean squared error: 28.98


In [5]:
print(boston.feature_names)

['CRIM' 'ZN' 'INDUS' 'CHAS' 'NOX' 'RM' 'AGE' 'DIS' 'RAD' 'TAX' 'PTRATIO'
 'B' 'LSTAT']


In [6]:
print("Feature importance: ", regr.feature_importances_)

Feature importance:  [0.06637804 0.00135721 0.00658648 0.00088578 0.02963429 0.54097103
 0.01997392 0.06499983 0.00097147 0.00980093 0.02511083 0.00921711
 0.22411307]
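The importance array is easier to read when each value is paired with its feature name. The sketch below copies the names and values from the two outputs above into plain lists; in the notebook itself you would presumably pass `boston.feature_names` and `regr.feature_importances_` directly.

```python
# Values copied from the two cell outputs above.
feature_names = ["CRIM", "ZN", "INDUS", "CHAS", "NOX", "RM", "AGE",
                 "DIS", "RAD", "TAX", "PTRATIO", "B", "LSTAT"]
importances = [0.06637804, 0.00135721, 0.00658648, 0.00088578, 0.02963429,
               0.54097103, 0.01997392, 0.06499983, 0.00097147, 0.00980093,
               0.02511083, 0.00921711, 0.22411307]

# Sort (name, importance) pairs from most to least important.
ranked = sorted(zip(feature_names, importances), key=lambda pair: pair[1], reverse=True)
for name, score in ranked:
    print(f"{name:8s} {score:.4f}")
```

In this run, RM and LSTAT together account for roughly 77% of the total importance, and the values sum to 1.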


# Linear regression model

In [7]:
# Load the Boston house prices dataset (a regression problem)
boston = datasets.load_boston()

# Split into training and test sets
x_train, x_test, y_train, y_test = train_test_split(boston.data, boston.target, test_size=0.25, random_state=4)

# Build the model
regr = linear_model.LinearRegression()  # linear regression

# Train the model
regr.fit(x_train, y_train)

# Predict on the test set
y_pred = regr.predict(x_test)

In [8]:
# Measure the gap between predicted and actual values using MSE
print("Mean squared error: %.2f"
      % mean_squared_error(y_test, y_pred))

Mean squared error: 26.95
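The two MSE values above (28.98 for the tree, 26.95 for linear regression) come from separate cells. A side-by-side comparison on one split can be sketched as follows. Two assumptions are worth flagging: `load_diabetes` stands in for the Boston data, since `load_boston` is not available in recent scikit-learn releases, and `random_state=0` is set on the tree because an unseeded tree may give slightly different results between runs.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Assumption: load_diabetes is a stand-in for the Boston dataset used above.
data = load_diabetes()
x_train, x_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.25, random_state=4
)

models = {
    "decision tree": DecisionTreeRegressor(random_state=0),
    "linear regression": LinearRegression(),
}

scores = {}
for name, model in models.items():
    # Fit each model on the same training data and score it on the same test data.
    model.fit(x_train, y_train)
    y_pred = model.predict(x_test)
    scores[name] = {
        "mse": mean_squared_error(y_test, y_pred),
        "r2": r2_score(y_test, y_pred),
    }
    print(f"{name}: MSE={scores[name]['mse']:.2f}, R2={scores[name]['r2']:.3f}")
```

Reporting `r2_score` next to MSE gives a scale-free number, which makes results easier to compare across datasets whose targets have different units.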


# Decision tree / classification

In [9]:
# Load the wine dataset
wine = datasets.load_wine()

# Split into training and test sets
x_train, x_test, y_train, y_test = train_test_split(wine.data, wine.target, test_size=0.1, random_state=4)

# Build a decision tree classifier
clf = DecisionTreeClassifier()

# Fit the model on the training data
clf.fit(x_train, y_train)

# Run the test data through the model to get predictions
y_pred = clf.predict(x_test)

In [10]:
acc = accuracy_score(y_test, y_pred)
print("Accuracy: ", acc)

Accuracy:  0.9444444444444444
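For the second task, a minimal sketch of comparing the tree with a regression-style classifier on the same wine split might look like this. Using logistic regression as the comparison model, scaling the features first, and setting `max_iter=1000` and `random_state=0` are all assumptions made for illustration.

```python
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

wine = load_wine()
x_train, x_test, y_train, y_test = train_test_split(
    wine.data, wine.target, test_size=0.1, random_state=4
)

models = {
    "decision tree": DecisionTreeClassifier(random_state=0),
    # Scaling is an added assumption; it helps logistic regression converge.
    "logistic regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
}

accuracies = {}
for name, model in models.items():
    model.fit(x_train, y_train)
    accuracies[name] = accuracy_score(y_test, model.predict(x_test))
    print(f"{name}: accuracy={accuracies[name]:.3f} on {len(y_test)} test samples")
```

With `test_size=0.1` the test set holds only 18 samples, so each misclassification moves accuracy by about 0.056; the 0.944 reported above corresponds to 17 of 18 correct.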
