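## [Assignment Focus]
Learn how to use the hyper-parameter search tools in Sklearn to find the best hyperparameters.

### Assignment
Pick a different dataset and apply hyper-parameter search to see whether you can find the best combination of hyperparameters.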

In [1]:
import warnings
warnings.filterwarnings('ignore')

In [78]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split, cross_val_score, GridSearchCV, RandomizedSearchCV
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.datasets import load_wine, load_breast_cancer
from scipy.stats import randint

In [81]:
breast_cancer = load_breast_cancer()

In [82]:
X = breast_cancer.data
y = breast_cancer.target

In [83]:
X.shape

(569, 30)

In [84]:
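# Hold out 30% of the samples for testing. No random_state is set, so the split (and every score below) changes between runs.
xtrain, xtest, ytrain, ytest = train_test_split(X, y, test_size=0.3)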

In [85]:
xtrain.shape

(398, 30)

# GBDT

In [86]:
GBDT = GradientBoostingClassifier(random_state=1).fit(xtrain, ytrain)
GBDT.score(xtest, ytest)

0.9649122807017544

# GridSearch

In [87]:
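# Hyperparameter names and the candidate values to try for each one
a = ['learning_rate', 'max_depth', 'n_estimators']
b = [np.linspace(0.05,0.3,10), list(range(2,7)), [50,100,200]]
# Pair each name with its candidates: 10 * 5 * 3 = 150 combinations in total
param = dict(zip(a,b))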

In [88]:
GBDT = GradientBoostingClassifier(random_state=1)

In [89]:
grid_search = GridSearchCV(GBDT, param).fit(xtrain, ytrain)

In [90]:
print(grid_search.best_score_)
print(grid_search.best_params_)

0.9623115577889447
{'learning_rate': 0.10555555555555556, 'max_depth': 3, 'n_estimators': 200}
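
To get a feel for how much work the grid search above does, the candidates can be enumerated without fitting anything. The sketch below rebuilds the same search space and counts its combinations with `ParameterGrid`; the variable names `grid`, `n_candidates`, and `first_candidate` are only illustrative.

```python
import numpy as np
from sklearn.model_selection import ParameterGrid

# Same search space as in the grid-search cells above
param = {
    'learning_rate': np.linspace(0.05, 0.3, 10),
    'max_depth': list(range(2, 7)),
    'n_estimators': [50, 100, 200],
}

# ParameterGrid enumerates every combination without training any model
grid = ParameterGrid(param)
n_candidates = len(grid)      # 10 * 5 * 3 combinations
first_candidate = grid[0]     # one candidate, as a dict of hyperparameters

print(n_candidates)           # prints 150
print(first_candidate)
```

Each candidate is trained once per cross-validation fold, so the total number of fits is this count multiplied by the number of folds.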


In [96]:
GBDT = GradientBoostingClassifier(random_state=1, 
                                  learning_rate=0.10555555555555556, 
                                  max_depth=3, 
                                  n_estimators=200).fit(xtrain, ytrain)
GBDT.score(xtest, ytest)

0.9707602339181286
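
Retyping the best parameters by hand works, but with the default `refit=True` the search object already holds a model refitted on the whole training set. The following is a minimal sketch of that shortcut, not the notebook's original workflow: the reduced grid `small_param`, `cv=3`, and `random_state=0` for the split are assumptions chosen only to keep the example fast and repeatable.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = load_breast_cancer(return_X_y=True)
# random_state=0 is an assumption made here so the split is repeatable
xtrain, xtest, ytrain, ytest = train_test_split(X, y, test_size=0.3, random_state=0)

# A deliberately small grid (assumption) so the sketch runs quickly
small_param = {'max_depth': [2, 3], 'n_estimators': [20, 40]}
grid_search = GridSearchCV(GradientBoostingClassifier(random_state=1), small_param, cv=3)
grid_search.fit(xtrain, ytrain)

# With refit=True (the default), best_estimator_ is already trained on all of xtrain
best_model = grid_search.best_estimator_
score_via_model = best_model.score(xtest, ytest)
# Calling score on the search object delegates to the refitted best estimator
score_via_search = grid_search.score(xtest, ytest)

print(grid_search.best_params_)
print(score_via_model, score_via_search)
```

Using `best_estimator_` avoids copy-and-paste mistakes when transferring long floating-point values such as the learning rate.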

# RandomizedSearch

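    a. For a hyperparameter whose search range is a distribution, sample randomly from that distribution.
    b. For a hyperparameter whose search range is a list, sample uniformly from that list.
    c. Evaluate each of the n_iter sampled combinations obtained from steps a and b.
    (Note) If every search range is a list, the n_iter combinations are sampled without replacement.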
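
The sampling rules above can be observed directly with `ParameterSampler`, the helper in `sklearn.model_selection` that draws candidates from a search space. This is a small sketch with made-up search spaces (`mixed_space` and `list_space` are illustrative, not the ones used in this notebook).

```python
from scipy.stats import randint
from sklearn.model_selection import ParameterSampler

# One list and one distribution: the list is sampled uniformly,
# the distribution through its own rvs() method
mixed_space = {'learning_rate': [0.05, 0.1, 0.2], 'max_depth': randint(2, 7)}
mixed_samples = list(ParameterSampler(mixed_space, n_iter=5, random_state=0))

# Only lists: the 3 * 2 = 6 combinations are drawn without replacement,
# so asking for 6 samples returns every combination exactly once
list_space = {'learning_rate': [0.05, 0.1, 0.2], 'max_depth': [2, 3]}
list_samples = list(ParameterSampler(list_space, n_iter=6, random_state=0))
unique_list_samples = {tuple(sorted(s.items())) for s in list_samples}

for sample in mixed_samples:
    print(sample)
print(len(list_samples), len(unique_list_samples))
```

When at least one range is a distribution, candidates are drawn with replacement, so the same combination may appear more than once.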

In [110]:
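# learning_rate is sampled from a list; max_depth and n_estimators from integer distributions
a = ['learning_rate', 'max_depth', 'n_estimators']
# scipy's randint excludes the upper bound: max_depth is in 2..6, n_estimators in 50..199
b = [np.linspace(0.05,0.3,10), randint(2,7), randint(50,200)]
param = dict(zip(a,b))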

In [111]:
GBDT = GradientBoostingClassifier(random_state=1)
random_search = RandomizedSearchCV(GBDT, param).fit(xtrain, ytrain)

In [112]:
print(random_search.best_score_)
print(random_search.best_params_)

0.9547738693467337
{'learning_rate': 0.18888888888888888, 'max_depth': 3, 'n_estimators': 73}


In [114]:
GBDT = GradientBoostingClassifier(random_state=1, 
                                  learning_rate=0.18888888888888888, 
                                  max_depth=3, 
                                  n_estimators=73).fit(xtrain, ytrain)
GBDT.score(xtest, ytest)

0.9707602339181286
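
Because the candidates are drawn at random, a randomized search returns different results on each run unless it is seeded. The sketch below is one possible way to make it repeatable, under these assumptions: a reduced search space `space`, `n_iter=4`, `cv=3`, and a fixed `random_state` for both the split and the search. The helper `run_search` is hypothetical and exists only for this example.

```python
from scipy.stats import randint
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import RandomizedSearchCV, train_test_split

X, y = load_breast_cancer(return_X_y=True)
# Fixed seed (assumption) so that both searches see the same split
xtrain, xtest, ytrain, ytest = train_test_split(X, y, test_size=0.3, random_state=0)

# Small illustrative search space; randint excludes the upper bound
space = {'max_depth': randint(2, 5), 'n_estimators': randint(20, 60)}

def run_search(seed):
    """Fit a randomized search whose sampled candidates are fixed by `seed`."""
    search = RandomizedSearchCV(
        GradientBoostingClassifier(random_state=1),
        space,
        n_iter=4,            # number of sampled candidates
        cv=3,
        random_state=seed,   # seeds the candidate sampling
    )
    return search.fit(xtrain, ytrain)

first = run_search(0)
second = run_search(0)

# The same seed yields the same sampled candidates and the same winner
print(first.best_params_)
print(first.best_params_ == second.best_params_)
```

Raising `n_iter` explores more of the space at a proportional cost in training time, which is the main knob for trading accuracy of the search against runtime.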