## [作業重點]
目前你應該已經要很清楚資料集中，資料的型態是什麼樣子囉！包含特徵 (features) 與標籤 (labels)。因此要記得未來不管什麼專案，必須要把資料清理成相同的格式，才能送進模型訓練。
今天的作業開始踏入決策樹這個非常重要的模型，請務必確保你理解模型中每個超參數的意思，並試著調整看看，對最終預測結果的影響為何

## 作業

1. 試著調整 DecisionTreeClassifier(...) 中的參數，並觀察是否會改變結果？
2. 改用其他資料集 (boston, wine)，並與回歸模型的結果進行比較

# Dataset: iris

In [1]:
from sklearn import datasets
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor
from sklearn.linear_model import LassoCV, RidgeCV, LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import GridSearchCV
import pandas as pd

In [2]:
iris = datasets.load_iris()
x_train, x_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=9487)

In [3]:
params = {
    'criterion': ['gini', 'entropy'],
    'max_depth': [5, 10, None],
    'min_samples_split':[2, 3, 4],
    'min_samples_leaf': [1, 2, 3]
}

clf = DecisionTreeClassifier()
grid = GridSearchCV(
    estimator=clf,
    param_grid=params,
    n_jobs=-1,
    iid=True,
    cv=5
)
grid.fit(x_train, y_train)
cv_results = pd.DataFrame(grid.cv_results_)
cv_results.sort_values("rank_test_score", ascending=True, inplace=True)

best_params_info = cv_results.iloc[0]
best_clf = DecisionTreeClassifier(**best_params_info['params']).fit(x_train, y_train)
worst_params_info = cv_results.iloc[-1]
worst_clf = DecisionTreeClassifier(**worst_params_info['params']).fit(x_train, y_train)

print(f"Best Params: {best_params_info['params']},\n"
      f"CV-Score:{best_params_info['mean_test_score']:.3f} +/- {best_params_info['std_test_score']:.4f}\n"
      f"Test-Score:{best_clf.score(x_test, y_test)}"
      f"\n==\n"
      f"Worst Params:{worst_params_info['params']},\n"
      f"Score:{worst_params_info['mean_test_score']:.3f} +/- {worst_params_info['std_test_score']:.4f}\n"
      f"Test-Score:{worst_clf.score(x_test, y_test)}"
     )


Best Params: {'criterion': 'gini', 'max_depth': None, 'min_samples_leaf': 3, 'min_samples_split': 4},
CV-Score:0.942 +/- 0.0206
Test-Score:1.0
==
Worst Params:{'criterion': 'entropy', 'max_depth': 5, 'min_samples_leaf': 1, 'min_samples_split': 2},
Score:0.925 +/- 0.0159
Test-Score:0.9666666666666667


In [4]:
pd.set_option('max_colwidth', 100)
cv_results[['params', 'mean_test_score', 'std_test_score', 'rank_test_score']]

Unnamed: 0,params,mean_test_score,std_test_score,rank_test_score
26,"{'criterion': 'gini', 'max_depth': None, 'min_samples_leaf': 3, 'min_samples_split': 4}",0.941667,0.020615,1
45,"{'criterion': 'entropy', 'max_depth': None, 'min_samples_leaf': 1, 'min_samples_split': 2}",0.941667,0.01966,1
25,"{'criterion': 'gini', 'max_depth': None, 'min_samples_leaf': 3, 'min_samples_split': 3}",0.941667,0.020615,1
24,"{'criterion': 'gini', 'max_depth': None, 'min_samples_leaf': 3, 'min_samples_split': 2}",0.941667,0.020615,1
23,"{'criterion': 'gini', 'max_depth': None, 'min_samples_leaf': 2, 'min_samples_split': 4}",0.941667,0.020615,1
22,"{'criterion': 'gini', 'max_depth': None, 'min_samples_leaf': 2, 'min_samples_split': 3}",0.941667,0.020615,1
21,"{'criterion': 'gini', 'max_depth': None, 'min_samples_leaf': 2, 'min_samples_split': 2}",0.941667,0.020615,1
20,"{'criterion': 'gini', 'max_depth': None, 'min_samples_leaf': 1, 'min_samples_split': 4}",0.941667,0.020615,1
38,"{'criterion': 'entropy', 'max_depth': 10, 'min_samples_leaf': 1, 'min_samples_split': 4}",0.941667,0.020615,1
16,"{'criterion': 'gini', 'max_depth': 10, 'min_samples_leaf': 3, 'min_samples_split': 3}",0.941667,0.020615,1


# Dataset: boston

In [5]:
boston = datasets.load_boston()
x_train, x_test, y_train, y_test = train_test_split(
    boston.data, boston.target, test_size=0.2, random_state=9487)

### LinearRegression

In [6]:
regr = LinearRegression().fit(x_train, y_train)
pred = regr.predict(x_test)
print(f"R2_Score={r2_score(pred, y_test)}, MSE={mean_squared_error(pred, y_test):.3f}")

R2_Score=0.5662542434990603, MSE=23.629


### Lasso

In [7]:
lasso = LassoCV(cv=5, n_alphas=1000).fit(x_train, y_train)
print(f"alpha-upper={max(lasso.alphas_)}\nalpha-lower={min(lasso.alphas_)}\nalpha-best={lasso.alpha_}")
pred = lasso.predict(x_test)
print(f"R2_Score={r2_score(pred, y_test)}, MSE={mean_squared_error(pred, y_test):.3f}")

alpha-upper=718.7410682776195
alpha-lower=0.7187410682776196
alpha-best=0.7187410682776196
R2_Score=0.3205278158385301, MSE=29.037


### Ridge

In [8]:
ridge = RidgeCV(cv=5,).fit(x_train, y_train)
print(f"alpha-upper={max(ridge.alphas)}\nalpha-lower={min(ridge.alphas)}\nalpha-best={ridge.alpha_}")
pred = ridge.predict(x_test)
print(f"R2_Score={r2_score(pred, y_test)}, MSE={mean_squared_error(pred, y_test):.3f}")

alpha-upper=10.0
alpha-lower=0.1
alpha-best=0.1
R2_Score=0.5643260261603007, MSE=23.633


### DecisionTreeRegressor

In [9]:
tree = DecisionTreeRegressor().fit(x_train, y_train)
pred = tree.predict(x_test)
print(f"R2_Score={r2_score(pred, y_test)}, MSE={mean_squared_error(pred, y_test):.3f}")

R2_Score=0.8400039554290575, MSE=13.337


### For boston-dataset

DecisionTreeRegressor > LinearRegression ~= Ridge > Lasso