## [作業重點]
目前你應該已經要很清楚資料集中，資料的型態是什麼樣子囉！包含特徵 (features) 與標籤 (labels)。因此要記得未來不管什麼專案，必須要把資料清理成相同的格式，才能送進模型訓練。
今天的作業開始踏入決策樹這個非常重要的模型，請務必確保你理解模型中每個超參數的意思，並試著調整看看，對最終預測結果的影響為何

## 作業

1. 試著調整 DecisionTreeClassifier(...) 中的參數，並觀察是否會改變結果？
2. 改用其他資料集 (boston, wine)，並與回歸模型的結果進行比較

In [1]:
from sklearn import datasets, metrics
import numpy as np

# 如果是分類問題，請使用 DecisionTreeClassifier，若為回歸問題，請使用 DecisionTreeRegressor
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor
from sklearn.model_selection import train_test_split

# Iris, change test size

In [2]:
iris = datasets.load_iris()

sizes = np.linspace(0.1, 0.9, 9)
print(f'random_state: 4')

for size in sizes:

    x_train, x_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size = size, random_state = 4)

    clf = DecisionTreeClassifier()

    clf.fit(x_train, y_train)

    y_pred = clf.predict(x_test)

    print('**************************************')
    print(f'test_size: {size}')
    
    acc = metrics.accuracy_score(y_test, y_pred)
    print("Acuuracy: ", acc)

    print(iris.feature_names)
    print("Feature importance: ", clf.feature_importances_)

**************************************
test_size: 0.1
Acuuracy:  0.9333333333333333
['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']
Feature importance:  [0.01482213 0.         0.54500659 0.44017128]
**************************************
test_size: 0.2
Acuuracy:  0.9666666666666667
['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']
Feature importance:  [0.01677501 0.         0.05652535 0.92669965]
**************************************
test_size: 0.30000000000000004
Acuuracy:  0.9782608695652174
['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']
Feature importance:  [0.         0.01943199 0.0643173  0.91625071]
**************************************
test_size: 0.4
Acuuracy:  0.9666666666666667
['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']
Feature importance:  [0.03916449 0.         0.06959508 0.89124043]
**************************************
test_size

# Iris, change random_state

In [5]:
print(f'test_size: 0.2')

for i in range(4, 44, 2):
    x_train, x_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size = 0.2, random_state = i)

    clf = DecisionTreeClassifier()

    clf.fit(x_train, y_train)

    y_pred = clf.predict(x_test)
    
    print('**************************************')
    print(f'random_state: {i}')
    
    acc = metrics.accuracy_score(y_test, y_pred)
    print("Acuuracy: ", acc)

    print(iris.feature_names)
    print("Feature importance: ", clf.feature_importances_)

**************************************
random_state: 4
Acuuracy:  0.9666666666666667
['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']
Feature importance:  [0.         0.01677501 0.51670178 0.46652322]
**************************************
random_state: 6
Acuuracy:  0.9333333333333333
['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']
Feature importance:  [0.         0.03334028 0.55501128 0.41164844]
**************************************
random_state: 8
Acuuracy:  0.9
['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']
Feature importance:  [0.01667014 0.         0.03101421 0.95231565]
**************************************
random_state: 10
Acuuracy:  0.9333333333333333
['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']
Feature importance:  [0.         0.02922146 0.40709787 0.56368067]
**************************************
random_state: 12
Acuuracy:  0.93333

# Boston

In [17]:
boston = datasets.load_boston()

reg = DecisionTreeRegressor()
    
print('random_state: 4')
    
for size in sizes:

    x_train, x_test, y_train, y_test = train_test_split(boston.data, boston.target, test_size = size, random_state = 4)

    reg.fit(x_train, y_train)

    y_pred = reg.predict(x_test)

    print('**************************************')
    print(f'test_size: {size}')
    
    acc = reg.score(x_test, y_test)
    print("Acuuracy: ", acc)

    print(boston.feature_names)
    print("Feature importance: ", reg.feature_importances_)

**************************************
test_size: 0.1
Acuuracy:  0.7264781973645271
['CRIM' 'ZN' 'INDUS' 'CHAS' 'NOX' 'RM' 'AGE' 'DIS' 'RAD' 'TAX' 'PTRATIO'
 'B' 'LSTAT']
Feature importance:  [0.04559684 0.00126866 0.01020991 0.00144846 0.04475773 0.56189001
 0.00729484 0.07724839 0.00198908 0.00994163 0.01004194 0.00761176
 0.22070075]
**************************************
test_size: 0.2
Acuuracy:  0.7243638577694661
['CRIM' 'ZN' 'INDUS' 'CHAS' 'NOX' 'RM' 'AGE' 'DIS' 'RAD' 'TAX' 'PTRATIO'
 'B' 'LSTAT']
Feature importance:  [3.48986698e-02 5.99430975e-04 4.13067329e-03 6.69054158e-06
 3.28375989e-02 5.98655714e-01 8.71088692e-03 6.30037067e-02
 8.33083457e-04 1.74966757e-02 2.12743340e-02 5.88057120e-03
 2.11671965e-01]
**************************************
test_size: 0.30000000000000004
Acuuracy:  0.6689995385947569
['CRIM' 'ZN' 'INDUS' 'CHAS' 'NOX' 'RM' 'AGE' 'DIS' 'RAD' 'TAX' 'PTRATIO'
 'B' 'LSTAT']
Feature importance:  [0.05075497 0.00250909 0.00660836 0.00060146 0.01641713 0.548

In [18]:
print('test size: 0.2')

for i in range(4, 44, 2):

    x_train, x_test, y_train, y_test = train_test_split(boston.data, boston.target, test_size = 0.2, random_state = is)

    reg.fit(x_train, y_train)

    y_pred = reg.predict(x_test)

    print('**************************************')
    print(f'random_state: {i}')
    
    acc = reg.score(x_test, y_test)
    print("Acuuracy: ", acc)

    print(boston.feature_names)
    print("Feature importance: ", reg.feature_importances_)

test size: 0.2
**************************************
random_state: 42
Acuuracy:  0.7369053718206147
['CRIM' 'ZN' 'INDUS' 'CHAS' 'NOX' 'RM' 'AGE' 'DIS' 'RAD' 'TAX' 'PTRATIO'
 'B' 'LSTAT']
Feature importance:  [0.03591615 0.00151215 0.01045515 0.00095117 0.02890575 0.56162066
 0.00845424 0.0858597  0.00184778 0.02449725 0.01019445 0.00809212
 0.22169343]
**************************************
random_state: 42
Acuuracy:  0.7356708110263794
['CRIM' 'ZN' 'INDUS' 'CHAS' 'NOX' 'RM' 'AGE' 'DIS' 'RAD' 'TAX' 'PTRATIO'
 'B' 'LSTAT']
Feature importance:  [3.58884485e-02 5.69967440e-04 1.00719891e-02 5.58081716e-04
 5.04312670e-02 5.80903206e-01 8.29447472e-03 6.12914476e-02
 6.28108906e-04 1.72338454e-02 1.66090538e-02 7.21389589e-03
 2.10306214e-01]
**************************************
random_state: 42
Acuuracy:  0.6816187346901614
['CRIM' 'ZN' 'INDUS' 'CHAS' 'NOX' 'RM' 'AGE' 'DIS' 'RAD' 'TAX' 'PTRATIO'
 'B' 'LSTAT']
Feature importance:  [5.29860758e-02 1.50398367e-03 1.54301347e-02 6.23769343

0.7143521569249532