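## [Assignment Focus]
Learn how to use the hyper-parameter search tools in scikit-learn to find the best hyper-parameters.

### Assignment
Pick a different dataset and use hyper-parameter search to see whether a best combination of hyper-parameters can be found.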

In [55]:
from sklearn import datasets, metrics
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.ensemble import GradientBoostingRegressor

In [56]:
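# Load the wine dataset; its target holds class labels (0, 1, 2),
# which are treated here as a numeric regression target
wine = datasets.load_wine()
x_train, x_test, y_train, y_test = train_test_split(wine.data, wine.target, test_size=0.25, random_state=1)
# Baseline model with default hyper-parameters
gbr = GradientBoostingRegressor(random_state=7)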

In [57]:
# Fit the baseline and report its MSE on the test set (lower is better)
gbr.fit(x_train, y_train)
y_pred = gbr.predict(x_test)
print(metrics.mean_squared_error(y_test, y_pred))

0.027281841263585432
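The baseline MSE above comes from a single held-out split of 45 samples. A cross-validated baseline on the training set gives a number that can be compared directly with the grid search's `best_score_`. The following is a minimal sketch; `cv=3` is an assumption chosen to match the 3 folds reported in the search log, and `cv_mse` is an illustrative name.

```python
from sklearn import datasets
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score, train_test_split

wine = datasets.load_wine()
x_train, x_test, y_train, y_test = train_test_split(
    wine.data, wine.target, test_size=0.25, random_state=1
)

baseline = GradientBoostingRegressor(random_state=7)

# cross_val_score returns the negated MSE for this scorer, so flip the sign
neg_mse = cross_val_score(
    baseline, x_train, y_train, cv=3, scoring="neg_mean_squared_error"
)
cv_mse = -neg_mse

print("per-fold MSE:", cv_mse)
print("mean CV MSE: %f" % cv_mse.mean())
```

If the mean CV MSE of the default model is already close to the best score found by the search, the grid is unlikely to bring a large gain on the test set.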


In [58]:
# Define the hyper-parameter grid
n_estimators = [100, 150, 200, 300]
max_depth = [1, 2, 3, 5]
param_grid = dict(n_estimators=n_estimators, max_depth=max_depth)

# Build the search object from the estimator and the parameter dict
grid_search = GridSearchCV(gbr, param_grid, scoring="neg_mean_squared_error", n_jobs=-1, verbose=1)

# Run the search
grid_result = grid_search.fit(x_train, y_train)

# 3-fold cross-validation over 4 x 4 = 16 candidates: 48 fits in total

[Parallel(n_jobs=-1)]: Using backend LokyBackend with 12 concurrent workers.


Fitting 3 folds for each of 16 candidates, totalling 48 fits


[Parallel(n_jobs=-1)]: Done  48 out of  48 | elapsed:    0.2s finished
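To make the fit count in the log concrete, the search can be approximated by hand: enumerate every parameter combination, score each one with cross-validation, and keep the best. This is only a rough sketch of the idea, not how `GridSearchCV` is implemented internally. It uses a smaller grid than the cell above to keep it quick, and `cv=3` is an assumption that mirrors the log.

```python
from sklearn import datasets
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import ParameterGrid, cross_val_score, train_test_split

wine = datasets.load_wine()
x_train, x_test, y_train, y_test = train_test_split(
    wine.data, wine.target, test_size=0.25, random_state=1
)

# Smaller illustrative grid: 2 x 2 = 4 candidates
param_grid = {"n_estimators": [100, 150], "max_depth": [1, 2]}
n_folds = 3

best_score, best_params = float("-inf"), None
for params in ParameterGrid(param_grid):
    model = GradientBoostingRegressor(random_state=7, **params)
    scores = cross_val_score(
        model, x_train, y_train, cv=n_folds, scoring="neg_mean_squared_error"
    )
    # Higher is better because the scorer is the negated MSE
    if scores.mean() > best_score:
        best_score, best_params = scores.mean(), params

n_fits = len(ParameterGrid(param_grid)) * n_folds
print("candidates x folds = %d fits" % n_fits)
print("best score %f using %s" % (best_score, best_params))
```

The same arithmetic applied to the 4 x 4 grid above gives 16 candidates and 48 fits.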


In [59]:
# Print the best score and parameters
# (with this scorer, best_score_ is the negated mean CV MSE, not an accuracy)
print("best score %f using %s" % (grid_result.best_score_, grid_result.best_params_))

best score -0.077762 using {'max_depth': 2, 'n_estimators': 150}
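The best score is negative because `scoring="neg_mean_squared_error"` negates the MSE so that higher is always better. The fitted search object also keeps the score of every candidate in `cv_results_`, which shows how close the runners-up are. A minimal sketch, assuming a reduced grid and an explicit `cv=3` for speed:

```python
import numpy as np
from sklearn import datasets
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV, train_test_split

wine = datasets.load_wine()
x_train, x_test, y_train, y_test = train_test_split(
    wine.data, wine.target, test_size=0.25, random_state=1
)

# Reduced illustrative grid: 2 x 2 = 4 candidates
param_grid = {"n_estimators": [100, 150], "max_depth": [1, 2]}
grid_search = GridSearchCV(
    GradientBoostingRegressor(random_state=7),
    param_grid,
    scoring="neg_mean_squared_error",
    cv=3,
)
grid_search.fit(x_train, y_train)

results = grid_search.cv_results_
# Sort candidates from best (rank 1) to worst
order = np.argsort(results["rank_test_score"], kind="stable")
for i in order:
    print("rank %d  MSE %.4f (std %.4f)  %s" % (
        results["rank_test_score"][i],
        -results["mean_test_score"][i],  # flip the sign back to a plain MSE
        results["std_test_score"][i],
        results["params"][i],
    ))
```

When the gap between the top candidates is smaller than the standard deviation across folds, the ranking is probably not meaningful.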


In [60]:
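# Rebuild the model with the best parameters
# (random_state is not set here, unlike the baseline, so results can vary between runs)
gbr_best = GradientBoostingRegressor(max_depth=grid_result.best_params_["max_depth"],
                                     n_estimators=grid_result.best_params_["n_estimators"])
# Train the model
gbr_best.fit(x_train, y_train)

# Predict
y_pred_new = gbr_best.predict(x_test)

# Compare test MSE before and after tuning (lower is better)
print("before using : %f " % metrics.mean_squared_error(y_test, y_pred))
print("after using : %f" % metrics.mean_squared_error(y_test, y_pred_new))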

before using : 0.027282 
after using : 0.039147
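Here the tuned model scores slightly worse on the test set than the default one. That can happen: the search optimizes the cross-validated score on the training data, the test split has only 45 samples, and the rebuilt model does not fix `random_state` the way the baseline does. One way to make the comparison more even is to reuse `best_estimator_`, which `GridSearchCV` refits on the full training set when `refit=True` (the default) and which keeps the seed of the estimator passed in. A minimal sketch, assuming a reduced grid and `cv=3`:

```python
from sklearn import datasets
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import GridSearchCV, train_test_split

wine = datasets.load_wine()
x_train, x_test, y_train, y_test = train_test_split(
    wine.data, wine.target, test_size=0.25, random_state=1
)

# Reduced illustrative grid: 2 x 2 = 4 candidates
param_grid = {"n_estimators": [100, 150], "max_depth": [2, 3]}
grid_search = GridSearchCV(
    GradientBoostingRegressor(random_state=7),
    param_grid,
    scoring="neg_mean_squared_error",
    cv=3,
)
grid_search.fit(x_train, y_train)

# Already refit on all of x_train, with the same random_state as the baseline
tuned = grid_search.best_estimator_
baseline = GradientBoostingRegressor(random_state=7).fit(x_train, y_train)

mse_baseline = mean_squared_error(y_test, baseline.predict(x_test))
mse_tuned = mean_squared_error(y_test, tuned.predict(x_test))
print("baseline test MSE: %f" % mse_baseline)
print("tuned test MSE   : %f" % mse_tuned)
```

Even with matching seeds, a small difference in either direction on 45 samples is weak evidence, so the cross-validated scores are usually the more reliable basis for choosing between the two.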
