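## [Assignment Focus]
Learn how to use hyper-parameter search in scikit-learn to find the best hyperparameters.

### Assignment
Use a different dataset and apply hyper-parameter search to see whether you can find the best combination of hyperparameters.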

In [1]:
from sklearn import datasets, metrics
from sklearn.model_selection import train_test_split, KFold, GridSearchCV
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor

In [2]:
diabetes = datasets.load_diabetes()

## RandomForestRegressor

In [3]:
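rfr = RandomForestRegressor()
# Candidate values for each hyperparameter; the grid is their Cartesian product (3 x 3 = 9 candidates)
n_estimators = [10, 20, 40]
max_depth = [5, 10, 15]
param_grid = dict(n_estimators=n_estimators, max_depth=max_depth)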

In [4]:
x_train, x_test, y_train, y_test = train_test_split(diabetes.data, diabetes.target, test_size=0.2, random_state=4)

grid_search = GridSearchCV(rfr, param_grid, scoring="neg_mean_squared_error", n_jobs=-1, verbose=1)

grid_result = grid_search.fit(x_train, y_train)

[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.


Fitting 3 folds for each of 9 candidates, totalling 27 fits


[Parallel(n_jobs=-1)]: Done  27 out of  27 | elapsed:    2.9s finished
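
The log above reports 3 folds, which was the default in older scikit-learn releases; since version 0.22 the default `cv` is 5. A minimal sketch, assuming you want the fold count pinned regardless of the installed version, passes the already imported `KFold` explicitly. The fold count, the seeds, and the reduced grid below are illustrative choices, not settings taken from this notebook.

```python
from sklearn import datasets
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, KFold

diabetes = datasets.load_diabetes()

# Assumption: 3 shuffled folds with a fixed seed, so the splits are reproducible.
cv = KFold(n_splits=3, shuffle=True, random_state=0)

grid_search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid={"n_estimators": [10, 20], "max_depth": [5, 10]},  # reduced grid for speed
    scoring="neg_mean_squared_error",
    cv=cv,
)
grid_search.fit(diabetes.data, diabetes.target)

print(grid_search.n_splits_)  # number of folds actually used
```

Fixing `cv` this way keeps the "Fitting N folds" line, and therefore the scores, comparable across machines with different scikit-learn versions.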


In [5]:
grid_result.best_params_

{'max_depth': 10, 'n_estimators': 40}
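
`best_params_` only shows the winner. To see how close the other candidates were, the fitted search object also exposes `cv_results_`. The sketch below is one possible way to rank the candidates; the reduced grid, `cv=3`, and the seed are assumptions made to keep it small and self-contained.

```python
from sklearn import datasets
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

diabetes = datasets.load_diabetes()

grid_search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid={"n_estimators": [10, 20], "max_depth": [5, 10]},  # reduced grid for speed
    scoring="neg_mean_squared_error",
    cv=3,
)
grid_search.fit(diabetes.data, diabetes.target)

results = grid_search.cv_results_
# Pair each candidate with its mean CV score and sort best-first (higher is better).
ranked = sorted(
    zip(results["mean_test_score"], results["params"]),
    key=lambda pair: pair[0],
    reverse=True,
)
for score, params in ranked:
    print(f"{score:.2f}  {params}")
```

If the top few scores differ by less than the fold-to-fold spread (`std_test_score` in the same dictionary), the "best" combination is probably not meaningfully better than its neighbours.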

In [6]:
grid_result.best_score_

-3318.039477462274
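
The score is negative because scikit-learn scorers follow a "higher is better" convention, so `neg_mean_squared_error` reports the MSE with its sign flipped. A small sketch of converting the value printed above back into an MSE and an RMSE (the variable names are illustrative):

```python
import math

best_score = -3318.039477462274  # value of grid_result.best_score_ shown above

mse = -best_score          # undo the sign flip
rmse = math.sqrt(mse)      # same units as the target variable

print(f"MSE: {mse:.2f}, RMSE: {rmse:.2f}")
# → MSE: 3318.04, RMSE: 57.60
```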

In [7]:
# Rebuild the model with the best hyperparameters found by the grid search
best_rfr = RandomForestRegressor(**grid_result.best_params_)

best_rfr.fit(x_train, y_train)

y_pred = best_rfr.predict(x_test)

In [8]:
MSE = metrics.mean_squared_error(y_test, y_pred)  # signature is (y_true, y_pred)
print(MSE)

3363.7195760376694
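
Rebuilding and refitting the model by hand is not strictly necessary: with the default `refit=True`, `GridSearchCV` retrains the best candidate on the whole training set and exposes it as `best_estimator_`. A minimal sketch under the same train/test split as above; the reduced grid, `cv=3`, and the seed on the estimator are assumptions for illustration.

```python
from sklearn import datasets, metrics
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, train_test_split

diabetes = datasets.load_diabetes()
x_train, x_test, y_train, y_test = train_test_split(
    diabetes.data, diabetes.target, test_size=0.2, random_state=4
)

grid_search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid={"n_estimators": [10, 20], "max_depth": [5, 10]},  # reduced grid for speed
    scoring="neg_mean_squared_error",
    cv=3,
)  # refit=True is the default
grid_search.fit(x_train, y_train)

# Already fitted on all of x_train with the best hyperparameters.
best_model = grid_search.best_estimator_
y_pred = best_model.predict(x_test)
print(metrics.mean_squared_error(y_test, y_pred))
```

Calling `grid_search.predict(x_test)` directly is equivalent, since it delegates to `best_estimator_`.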


## GradientBoostingRegressor

In [9]:
gbr = GradientBoostingRegressor()
n_estimators = [25, 50, 100, 200]
max_depth = [2, 3, 4]
param_grid = dict(n_estimators=n_estimators, max_depth=max_depth)
param_grid

{'n_estimators': [25, 50, 100, 200], 'max_depth': [2, 3, 4]}

In [10]:
x_train, x_test, y_train, y_test = train_test_split(diabetes.data, diabetes.target, test_size=0.2, random_state=4)

grid_search = GridSearchCV(gbr, param_grid=param_grid, scoring='neg_mean_squared_error', n_jobs=-1, verbose=1)

grid_result = grid_search.fit(x_train, y_train)

Fitting 3 folds for each of 12 candidates, totalling 36 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=-1)]: Done  36 out of  36 | elapsed:    0.9s finished


In [11]:
print(f'Best Score: {grid_result.best_score_:.2f} by Best Param: {grid_result.best_params_}')

Best Score: -3189.48 by Best Param: {'max_depth': 2, 'n_estimators': 50}


In [12]:
# Rebuild the model with the best hyperparameters found by the grid search
best_gbr = GradientBoostingRegressor(**grid_result.best_params_)

best_gbr.fit(x_train, y_train)

y_pred = best_gbr.predict(x_test)

In [13]:
MSE = metrics.mean_squared_error(y_test, y_pred)
print(MSE)

3100.5424062029356
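
Grid search is not the only hyper-parameter search in scikit-learn. When the grid grows large, `RandomizedSearchCV` samples a fixed number of candidates instead of trying every combination. The sketch below is one possible variant, not part of the original assignment solution: the sampling ranges mirror the grid above, while `n_iter=6`, `cv=3`, and the seeds are assumptions.

```python
from scipy.stats import randint
from sklearn import datasets
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import RandomizedSearchCV, train_test_split

diabetes = datasets.load_diabetes()
x_train, x_test, y_train, y_test = train_test_split(
    diabetes.data, diabetes.target, test_size=0.2, random_state=4
)

random_search = RandomizedSearchCV(
    GradientBoostingRegressor(random_state=0),
    param_distributions={
        "n_estimators": randint(25, 201),  # integers 25..200 (upper bound is exclusive)
        "max_depth": randint(2, 5),        # integers 2..4
    },
    n_iter=6,  # number of sampled candidates (assumed budget)
    scoring="neg_mean_squared_error",
    cv=3,
    random_state=0,
)
random_search.fit(x_train, y_train)

print(f"Best Score: {random_search.best_score_:.2f} by Best Param: {random_search.best_params_}")
```

The trade-off is coverage versus cost: six sampled candidates are cheaper than the twelve grid candidates, but the search may miss the best grid point.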
