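## [Assignment Focus]
Learn how to use hyper-parameter search in scikit-learn to find the best hyper-parameters.

### Assignment
Use a different dataset and apply hyper-parameter search to see whether the best combination of hyper-parameters can be found.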

In [12]:
from sklearn import datasets, metrics
from sklearn.ensemble import GradientBoostingClassifier, GradientBoostingRegressor
from sklearn.model_selection import train_test_split, GridSearchCV

wine = datasets.load_wine()
#boston = datasets.load_boston()  # note: load_boston was removed in scikit-learn 1.2
#breast_cancer = datasets.load_breast_cancer()
#digits = datasets.load_digits()
x_train, x_test, y_train, y_test = train_test_split(
    wine.data, wine.target, test_size=0.1, random_state=42)

In [13]:
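# Baseline: MSE with the default hyper-parameters
# Note: wine.target holds class labels (0, 1, 2); the regressor treats them as numeric values
gbr = GradientBoostingRegressor(random_state=42)
gbr.fit(x_train, y_train)
y_pred_default = gbr.predict(x_test)
print('Default MSE : ', metrics.mean_squared_error(y_test, y_pred_default))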

Default MSE :  0.2605859119476395


In [14]:
# First, search over a single parameter: n_estimators
param_test1 = {'n_estimators': list(range(20, 81, 10))}
grid_search1 = GridSearchCV(gbr, param_test1, scoring="neg_mean_squared_error", n_jobs=-1, verbose=1)
grid_result1 = grid_search1.fit(x_train, y_train)

Fitting 3 folds for each of 7 candidates, totalling 21 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done   6 out of  21 | elapsed:    0.0s remaining:    0.0s
[Parallel(n_jobs=-1)]: Done  21 out of  21 | elapsed:    0.0s finished


In [15]:
# best_score_ is the negated MSE (closer to 0 is better), not an accuracy
print('Best score (neg. MSE) : %f using %s' % (grid_result1.best_score_, grid_result1.best_params_))

Best score (neg. MSE) : -0.108397 using {'n_estimators': 20}
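The score printed above is negative because `scoring="neg_mean_squared_error"` reports the MSE with its sign flipped, so that "larger is better" holds for every scorer. The following is a minimal plain-Python sketch of that convention; the helper functions and the toy predictions are illustrative assumptions, not scikit-learn's implementation.

```python
def mean_squared_error(y_true, y_pred):
    """Plain-Python MSE: the average of the squared differences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def neg_mean_squared_error(y_true, y_pred):
    """Negated MSE, so that a higher score means a better fit."""
    return -mean_squared_error(y_true, y_pred)


# Toy labels and two hypothetical sets of predictions
y_true = [0, 1, 2, 1]
good_pred = [0, 1, 2, 2]  # one error of size 1
bad_pred = [2, 1, 0, 1]   # two errors of size 2

scores = {
    "good": neg_mean_squared_error(y_true, good_pred),
    "bad": neg_mean_squared_error(y_true, bad_pred),
}

# Picking the maximum negated MSE is the same as picking the minimum MSE
best = max(scores, key=scores.get)
print(scores, "->", best)

# Converting the search result reported in this notebook back to an MSE
best_score = -0.108397
cv_mse = -best_score
print("Cross-validated MSE:", cv_mse)
```

Negating `best_score_` gives a cross-validated MSE, which is measured on held-out folds of the training data and so is not directly comparable to the test-set MSE printed earlier.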


In [16]:
# Search over two parameters: max_depth and min_samples_split
param_test2 = {'max_depth': list(range(3, 14, 2)), 'min_samples_split': list(range(20, 301, 50))}
grid_search2 = GridSearchCV(gbr, param_test2, scoring="neg_mean_squared_error", n_jobs=-1, verbose=1)
grid_result2 = grid_search2.fit(x_train, y_train)

[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.


Fitting 3 folds for each of 36 candidates, totalling 108 fits


[Parallel(n_jobs=-1)]: Done 108 out of 108 | elapsed:    0.3s finished


In [17]:
# best_score_ is the negated MSE (closer to 0 is better), not an accuracy
print('Best score (neg. MSE) : %f using %s' % (grid_result2.best_score_, grid_result2.best_params_))

Best score (neg. MSE) : -0.060270 using {'max_depth': 9, 'min_samples_split': 70}


In [18]:
# Search over all three parameters jointly and compare the result
param_test3 = {'max_depth': list(range(3, 14, 2)),
               'min_samples_split': list(range(20, 301, 50)),
               'n_estimators': list(range(20, 81, 10))}
grid_search3 = GridSearchCV(gbr, param_test3, scoring="neg_mean_squared_error", n_jobs=-1, verbose=1)
grid_result3 = grid_search3.fit(x_train, y_train)
# best_score_ is the negated MSE (closer to 0 is better), not an accuracy
print('Best score (neg. MSE) : %f using %s' % (grid_result3.best_score_, grid_result3.best_params_))

[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.


Fitting 3 folds for each of 252 candidates, totalling 756 fits
Best score (neg. MSE) : -0.059712 using {'max_depth': 9, 'min_samples_split': 70, 'n_estimators': 80}


[Parallel(n_jobs=-1)]: Done 756 out of 756 | elapsed:    1.5s finished
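The "candidates" and "fits" counts in the logs follow directly from the grid sizes: a grid search tries every combination of the listed values, and each combination is fitted once per cross-validation fold. The sketch below reproduces those counts with the standard library only; `count_fits` is a hypothetical helper, and the 3 folds are taken from the log output of this notebook.

```python
from itertools import product

# The three grids used in this notebook
grids = {
    "param_test1": {"n_estimators": list(range(20, 81, 10))},
    "param_test2": {
        "max_depth": list(range(3, 14, 2)),
        "min_samples_split": list(range(20, 301, 50)),
    },
    "param_test3": {
        "max_depth": list(range(3, 14, 2)),
        "min_samples_split": list(range(20, 301, 50)),
        "n_estimators": list(range(20, 81, 10)),
    },
}


def count_fits(grid, n_folds=3):
    """Return (number of candidates, number of cross-validation fits)."""
    # Every combination of parameter values is one candidate
    n_candidates = len(list(product(*grid.values())))
    return n_candidates, n_candidates * n_folds


for name, grid in grids.items():
    n_candidates, n_fits = count_fits(grid)
    print(f"{name}: {n_candidates} candidates, {n_fits} fits")
```

The cost grows multiplicatively with each added parameter (7, then 36, then 252 candidates here), which is why a joint search over many parameters becomes expensive quickly. With the default `refit=True`, the search also performs one extra fit on the full training set that the log total does not include.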


In [21]:
# The results differ: the search space is multi-dimensional, so the parameters have to be
# searched jointly; tuning each one separately leads to a different optimum
# Refit with the best parameters from param_test3 and check whether the test error improves
gbr_bestparam = GradientBoostingRegressor(**grid_result3.best_params_)

gbr_bestparam.fit(x_train, y_train)
y_pred = gbr_bestparam.predict(x_test)
print('Modified MSE : ',metrics.mean_squared_error(y_test, y_pred))

Modified MSE :  0.06318188802973769
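As an alternative to rebuilding the model by hand, a fitted `GridSearchCV` with the default `refit=True` exposes `best_estimator_`, a clone of the estimator passed in (so it keeps `random_state=42`) that has been refitted on the whole training set with the best parameters. The sketch below assumes the same wine split as above; `small_grid` is a hypothetical reduced grid chosen only to keep the example fast, and `cv=3` is set explicitly to match the fold count shown in the logs.

```python
from sklearn import datasets, metrics
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV, train_test_split

wine = datasets.load_wine()
x_train, x_test, y_train, y_test = train_test_split(
    wine.data, wine.target, test_size=0.1, random_state=42)

# Hypothetical reduced grid (assumption: chosen for speed, not the grid from the notebook)
small_grid = {"max_depth": [3, 9], "n_estimators": [20, 80]}

search = GridSearchCV(
    GradientBoostingRegressor(random_state=42),
    small_grid,
    scoring="neg_mean_squared_error",
    cv=3,
)
search.fit(x_train, y_train)

# best_estimator_ is already refitted on all of x_train with the best parameters
best_model = search.best_estimator_
test_mse = metrics.mean_squared_error(y_test, best_model.predict(x_test))

print("Best params :", search.best_params_)
print("CV MSE      :", -search.best_score_)
print("Test MSE    :", test_mse)
```

Calling `search.predict(x_test)` is equivalent to `search.best_estimator_.predict(x_test)`, so the extra variable is only there for readability.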


In [None]:
# The test MSE drops substantially (from about 0.26 with the defaults to about 0.06)