## [Assignment Focus]
Learn how to use the hyper-parameter search tools in scikit-learn to find the best hyperparameters.

### Assignment
Using a different dataset, apply hyper-parameter search and see whether you can find the best combination of hyperparameters.

In [25]:
# Import the required packages
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.model_selection import train_test_split, KFold, GridSearchCV
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn import metrics

In [6]:
# Load the dataset

dataset = datasets.load_iris()
X = dataset.data
y = dataset.target

In [8]:
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

In [9]:
# Create the models
tree = DecisionTreeClassifier()
clf = GradientBoostingClassifier()
svm = SVC()
knn = KNeighborsClassifier()
lgr = LogisticRegression()

In [10]:
tree.get_params

<bound method BaseEstimator.get_params of DecisionTreeClassifier(ccp_alpha=0.0, class_weight=None, criterion='gini',
                       max_depth=None, max_features=None, max_leaf_nodes=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=2,
                       min_weight_fraction_leaf=0.0, presort='deprecated',
                       random_state=None, splitter='best')>
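
The cell above references `tree.get_params` without calling it, so the notebook shows the repr of the bound method rather than a dictionary. A minimal sketch of calling the method instead, which returns the hyperparameters as a plain `dict` (the exact keys depend on the installed scikit-learn version):

```python
from sklearn.tree import DecisionTreeClassifier

tree = DecisionTreeClassifier()

# Calling get_params() returns a dict mapping hyperparameter names to values.
params = tree.get_params()
for name in sorted(params):
    print(f"{name}: {params[name]}")
```

The keys of this dictionary are the names that can be used in a `GridSearchCV` parameter grid.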

In [11]:
# Build the parameter grids for the exhaustive (grid) search
# DecisionTreeClassifier_params
tree_params = {'criterion':['gini', 'entropy'],
               'max_depth':[None, 1, 3, 5, 7],
               'min_samples_leaf':[5, 10, 15, 20, 25, 30]}

# GradientBoostingClassifier_params
clf_params = {'n_estimators':[100, 200, 300],
              'max_depth':[None, 1, 3, 5, 7]}

# SVC_params
#svc_params = {}

# LogisticRegression_params
# Note: the default lbfgs solver does not support 'l1', so those candidates fail to fit
lr_params = {'C':[1e-1, 1e0, 1e1, 1e2], 'penalty':['l1', 'l2']}

# knn
KNN_params = {'n_neighbors':[1, 3, 5, 7]}
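
Grid search fits every combination in a grid, so the cost grows multiplicatively with each added list. As a rough sketch, `ParameterGrid` can enumerate the combinations before running the search; the fold count `n_folds = 5` below is an assumption (it is the `GridSearchCV` default in recent scikit-learn releases, but older versions used 3):

```python
from sklearn.model_selection import ParameterGrid

tree_params = {'criterion': ['gini', 'entropy'],
               'max_depth': [None, 1, 3, 5, 7],
               'min_samples_leaf': [5, 10, 15, 20, 25, 30]}

grid = ParameterGrid(tree_params)
n_candidates = len(grid)  # 2 criteria * 5 depths * 6 leaf sizes

n_folds = 5  # assumed number of cross-validation folds
print('Candidates:', n_candidates)
print('Model fits during the search:', n_candidates * n_folds)

# Each element of the grid is one dict of keyword arguments for the estimator.
print('First candidate:', list(grid)[0])
```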

In [26]:
# Decision tree

grid_saerch_tree = GridSearchCV(tree, tree_params)
grid_saerch_tree.fit(X_train, y_train)
print('Best performing score:', grid_saerch_tree.best_score_)
print('Best performing params:', grid_saerch_tree.best_params_)

Best performing score: 0.9553359683794467
Best performing params: {'criterion': 'gini', 'max_depth': 3, 'min_samples_leaf': 25}
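
`best_score_` and `best_params_` only report the winner. To see how close the other candidates were, the full results are available in `cv_results_`. The sketch below is self-contained and uses a smaller grid than the notebook, with `random_state=0` and `cv=5` as illustrative assumptions:

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Smaller illustrative grid: 2 criteria * 5 depths = 10 candidates.
search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    {'criterion': ['gini', 'entropy'], 'max_depth': [None, 1, 3, 5, 7]},
    cv=5,
)
search.fit(X_train, y_train)

# cv_results_ is a dict of arrays; a DataFrame makes it easy to sort and read.
results = pd.DataFrame(search.cv_results_)
cols = ['params', 'mean_test_score', 'std_test_score', 'rank_test_score']
print(results[cols].sort_values('rank_test_score').head())
```

Comparing `mean_test_score` against `std_test_score` helps judge whether the top candidates are meaningfully different on a dataset this small.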


In [27]:
# GradientBoostingClassifier
grid_saerch_GB = GridSearchCV(clf, clf_params)
grid_saerch_GB.fit(X_train, y_train)
print('Best performing score:', grid_saerch_GB.best_score_)
print('Best performing params:', grid_saerch_GB.best_params_)

Best performing score: 0.9727272727272727
Best performing params: {'max_depth': 5, 'n_estimators': 100}


In [28]:
# LogisticRegression
grid_saerch_lr = GridSearchCV(lgr, lr_params)
grid_saerch_lr.fit(X_train, y_train)
print('Best performing score:', grid_saerch_lr.best_score_)
print('Best performing params:', grid_saerch_lr.best_params_)

ValueError: Solver lbfgs supports only 'l2' or 'none' penalties, got l1 penalty.

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression

Best performing score: 0.9727272727272727
Best performing params: {'C': 10.0, 'penalty': 'l2'}
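
The messages above have two causes: the default `lbfgs` solver rejects the `'l1'` penalty, and the unscaled features keep the solver from converging within the default iteration limit. One possible way to address both, sketched under assumptions that are not part of the original notebook (the `saga` solver, `max_iter=5000`, a `StandardScaler` step, and `random_state=0`):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Scaling inside the pipeline is refit on each training fold, which avoids
# leaking validation-fold statistics into the scaler.
pipe = Pipeline([
    ('scale', StandardScaler()),
    # saga supports both 'l1' and 'l2'; a larger max_iter gives it room to converge.
    ('clf', LogisticRegression(solver='saga', max_iter=5000, random_state=0)),
])

# Parameters of a pipeline step are addressed as <step name>__<parameter>.
param_grid = {'clf__C': [1e-1, 1e0, 1e1, 1e2],
              'clf__penalty': ['l1', 'l2']}

search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X_train, y_train)
print('Best performing score:', search.best_score_)
print('Best performing params:', search.best_params_)
```

With this setup every candidate in the grid can be fitted, so the `'l1'` rows are no longer silently scored as failures.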


In [29]:
# KNN
grid_saerch_knn = GridSearchCV(knn, KNN_params)
grid_saerch_knn.fit(X_train, y_train)
print('Best performing score:', grid_saerch_knn.best_score_)
print('Best performing params:', grid_saerch_knn.best_params_)


Best performing score: 0.9640316205533598
Best performing params: {'n_neighbors': 7}


### Retrain with the best hyperparameters

In [31]:
# Rebuild the model with the best parameters found by the search
clf_bestparam = GradientBoostingClassifier(max_depth=grid_saerch_GB.best_params_['max_depth'],
                                           n_estimators=grid_saerch_GB.best_params_['n_estimators'])

# Train the model
clf_bestparam.fit(X_train, y_train)

# Predict on the test set
y_pred = clf_bestparam.predict(X_test)
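
Rebuilding the model by hand works, but with the default `refit=True`, `GridSearchCV` already refits the best candidate on the whole training set and exposes it as `best_estimator_`. A minimal sketch of that shortcut, using a deliberately small grid and `random_state=0` as assumptions to keep the run short:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

search = GridSearchCV(
    GradientBoostingClassifier(random_state=0),
    {'n_estimators': [50, 100], 'max_depth': [1, 3]},  # illustrative grid
    cv=5,
)
search.fit(X_train, y_train)

# best_estimator_ is already trained on all of X_train with the best parameters.
best_model = search.best_estimator_
y_pred = best_model.predict(X_test)

# Calling predict on the search object delegates to best_estimator_.
same = (search.predict(X_test) == y_pred).all()
print('Predictions identical:', same)
```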

In [32]:
print(metrics.classification_report(y_test, y_pred))

              precision    recall  f1-score   support

           0       1.00      1.00      1.00        13
           1       1.00      0.94      0.97        16
           2       0.90      1.00      0.95         9

    accuracy                           0.97        38
   macro avg       0.97      0.98      0.97        38
weighted avg       0.98      0.97      0.97        38



In [37]:
print('Overall Accuracy:%0.3f'% metrics.accuracy_score(y_test, y_pred))

Overall Accuracy:0.974
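
The assignment asks for a different dataset, while the cells above use iris. As a rough sketch of how the same workflow might carry over, the block below runs a KNN grid search on `load_wine`; the choice of dataset, the scaling step, and `cv=5` are assumptions for illustration, and no results are claimed here:

```python
from sklearn.datasets import load_wine
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The wine features have very different ranges, so distances are computed
# on standardized values.
pipe = Pipeline([('scale', StandardScaler()),
                 ('knn', KNeighborsClassifier())])

search = GridSearchCV(pipe, {'knn__n_neighbors': [1, 3, 5, 7]}, cv=5)
search.fit(X_train, y_train)

print('Best performing score:', search.best_score_)
print('Best performing params:', search.best_params_)

# Evaluate the refitted best model once on the held-out test set.
test_accuracy = accuracy_score(y_test, search.predict(X_test))
print('Overall Accuracy:%0.3f' % test_accuracy)
```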
