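## [Assignment Focus]
Learn how to use hyper-parameter search in Sklearn to find the best hyperparameters.

### Assignment
Use a different dataset and apply hyper-parameter search to see whether you can find the best combination of hyperparameters.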

<h1>diabetes</h1>

In [3]:
from sklearn.ensemble import GradientBoostingRegressor
from sklearn import datasets, metrics
from sklearn.model_selection import train_test_split, GridSearchCV
import pandas as pd

# Load the diabetes regression dataset (442 samples, 10 numeric features)
diabetes = datasets.load_diabetes()
df = pd.DataFrame(diabetes.data, columns=diabetes.feature_names)
# Preview the first five feature columns
first_5_col = df.columns[0:5]
print(df[first_5_col].head())

        age       sex       bmi        bp        s1
0  0.038076  0.050680  0.061696  0.021872 -0.044223
1 -0.001882 -0.044642 -0.051474 -0.026328 -0.008449
2  0.085299  0.050680  0.044451 -0.005671 -0.045599
3 -0.089063 -0.044642 -0.011595 -0.036656  0.012191
4  0.005383 -0.044642 -0.036385  0.021872  0.003935


In [5]:
# Hold out 25% of the samples as a test set
x_train, x_test, y_train, y_test = train_test_split(diabetes.data, diabetes.target, test_size=0.25, random_state=42)

# Baseline: gradient boosting with default hyperparameters
clf = GradientBoostingRegressor(random_state=7)
clf.fit(x_train, y_train)
y_pred = clf.predict(x_test)

# Mean squared error on the training and test sets
print(f'train error: {metrics.mean_squared_error(y_train, clf.predict(x_train))}')
print(f'test error: {metrics.mean_squared_error(y_test, y_pred)}')

train error: 905.4545327948351
test error: 3194.3823958820526
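The test error above comes from a single train/test split, whereas `GridSearchCV` ranks candidates by their mean 5-fold cross-validation score on the training data. A minimal sketch, assuming the same split and `random_state` as this notebook, of cross-validating the default model the same way so it can be compared with the grid search's `best_score_` on equal footing (`baseline` and `baseline_cv_mse` are hypothetical names). Since the defaults (`learning_rate=0.1`, `n_estimators=100`, `max_depth=3`) are themselves part of the grid searched below, this number should match the corresponding row of the search's `cv_results_`.

```python
from sklearn import datasets
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score, train_test_split

# Same data split as the notebook
diabetes = datasets.load_diabetes()
x_train, x_test, y_train, y_test = train_test_split(
    diabetes.data, diabetes.target, test_size=0.25, random_state=42)

# Default-hyperparameter baseline with the same random_state as above
baseline = GradientBoostingRegressor(random_state=7)

# 5-fold CV on the training split, scored like the grid search
# (neg_mean_squared_error, so each score is a negated MSE)
scores = cross_val_score(baseline, x_train, y_train,
                         scoring="neg_mean_squared_error", cv=5)
baseline_cv_mse = -scores.mean()
print(f"baseline 5-fold CV MSE: {baseline_cv_mse:.1f}")
```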


In [6]:
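# Candidate values for each hyperparameter: 5 x 5 x 5 = 125 combinations
learning_rate = [0.0001, 0.001, 0.01, 0.1, 0.2]
n_estimators = [50, 100, 200, 300, 400]
max_depth = [1, 2, 3, 4, 5]
param_grid = dict(n_estimators=n_estimators, max_depth=max_depth, learning_rate=learning_rate)

# 5-fold CV (the default) on the training set; the score is the negated MSE, so higher is better
grid_search = GridSearchCV(clf, param_grid, scoring="neg_mean_squared_error", n_jobs=-1, verbose=1)
grid_result = grid_search.fit(x_train, y_train)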

Fitting 5 folds for each of 125 candidates, totalling 625 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done  18 tasks      | elapsed:    1.0s
[Parallel(n_jobs=-1)]: Done 240 tasks      | elapsed:    2.8s
[Parallel(n_jobs=-1)]: Done 594 out of 625 | elapsed:    5.7s remaining:    0.2s
[Parallel(n_jobs=-1)]: Done 625 out of 625 | elapsed:    6.1s finished
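The log's "125 candidates, totalling 625 fits" is the size of the full grid (5 × 5 × 5) times the 5 CV folds, and this cost grows multiplicatively with every value added. A sketch of checking that count with `ParameterGrid`, followed by `RandomizedSearchCV`, which evaluates a fixed number of sampled combinations instead. The sampling distributions, `n_iter=20` and `random_state=0` are illustrative assumptions, not values from the notebook; `n_jobs` is left at its default here but can be set to `-1` as above.

```python
from scipy.stats import loguniform, randint
from sklearn import datasets
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import (ParameterGrid, RandomizedSearchCV,
                                     train_test_split)

# The notebook's grid: 5 x 5 x 5 = 125 combinations, each fitted on 5 folds
param_grid = dict(n_estimators=[50, 100, 200, 300, 400],
                  max_depth=[1, 2, 3, 4, 5],
                  learning_rate=[0.0001, 0.001, 0.01, 0.1, 0.2])
n_candidates = len(ParameterGrid(param_grid))
print(n_candidates, n_candidates * 5)  # prints: 125 625

diabetes = datasets.load_diabetes()
x_train, x_test, y_train, y_test = train_test_split(
    diabetes.data, diabetes.target, test_size=0.25, random_state=42)

# Assumed search ranges covering the same span as the grid
param_distributions = {
    "learning_rate": loguniform(1e-4, 0.5),  # log-uniform over [1e-4, 0.5]
    "n_estimators": randint(50, 401),        # integers 50..400
    "max_depth": randint(1, 6),              # integers 1..5
}

random_search = RandomizedSearchCV(
    GradientBoostingRegressor(random_state=7),
    param_distributions,
    n_iter=20,                         # 20 sampled candidates x 5 folds = 100 fits
    scoring="neg_mean_squared_error",
    cv=5,
    random_state=0,                    # makes the sampled candidates reproducible
)
random_search.fit(x_train, y_train)
print(random_search.best_params_, -random_search.best_score_)
```

Sampling `learning_rate` on a log scale spreads the budget evenly across orders of magnitude, which is what the hand-written grid `[0.0001, 0.001, 0.01, 0.1, 0.2]` approximates.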


In [7]:
print("Best CV score (negative MSE): %f using %s" % (grid_result.best_score_, grid_result.best_params_))

Best CV score (negative MSE): -3240.757496 using {'learning_rate': 0.2, 'max_depth': 1, 'n_estimators': 50}
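Because `scoring="neg_mean_squared_error"` negates the MSE, the printed score of -3240.757496 corresponds to a mean 5-fold CV MSE of about 3240.76; higher (closer to zero) is better. The winning combination also sits on the boundary of the grid for every parameter: the largest `learning_rate` (0.2), the smallest `max_depth` (1) and the smallest `n_estimators` (50), which hints that the optimum may lie outside the searched range. A sketch of refining the grid around that corner and ranking candidates via `cv_results_`; the grid values are illustrative assumptions, and since `max_depth` cannot go below 1, only `learning_rate` and `n_estimators` are extended.

```python
import pandas as pd
from sklearn import datasets
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV, train_test_split

diabetes = datasets.load_diabetes()
x_train, x_test, y_train, y_test = train_test_split(
    diabetes.data, diabetes.target, test_size=0.25, random_state=42)

# Assumed refined grid: push past the edges where the first grid's best sat
param_grid = {
    "learning_rate": [0.1, 0.2, 0.3, 0.5],
    "max_depth": [1, 2],
    "n_estimators": [10, 25, 50, 100],
}
search = GridSearchCV(GradientBoostingRegressor(random_state=7), param_grid,
                      scoring="neg_mean_squared_error", cv=5)
search.fit(x_train, y_train)

# cv_results_ holds one entry per candidate; tabulate and rank them
results = pd.DataFrame(search.cv_results_)
results["mean_cv_mse"] = -results["mean_test_score"]
top = results.sort_values("rank_test_score")[
    ["param_learning_rate", "param_max_depth", "param_n_estimators", "mean_cv_mse"]]
print(top.head())
```

If the new best again lands on an edge, the same step can be repeated; if it lands inside the grid, that is weak evidence the search has bracketed a local optimum.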


In [8]:
grid_result.best_params_

{'learning_rate': 0.2, 'max_depth': 1, 'n_estimators': 50}

In [10]:
# Rebuild the model with the best hyperparameters found by the grid search.
# Note: random_state is not set here, unlike the estimator that was searched.
clf_bestparam = GradientBoostingRegressor(**grid_result.best_params_)
clf_bestparam.fit(x_train,y_train)
y_pred=clf_bestparam.predict(x_test)

print(f'train error: {metrics.mean_squared_error(y_train, clf_bestparam.predict(x_train))}')
print(f'test error: {metrics.mean_squared_error(y_test, y_pred)}')

train error: 2481.4767934561482
test error: 2815.7248636806307
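On this split, tuning raised the training MSE (905.5 to 2481.5) while lowering the test MSE (3194.4 to 2815.7): the default model fits the training data much more closely but generalizes worse than the small, shallow ensemble chosen by the search. Re-building the model by hand is also optional. With `refit=True` (the default), `GridSearchCV` already refits the best combination on the whole training set and exposes it as `best_estimator_`; being a clone of the searched estimator, it keeps `random_state=7`, which the manual re-construction above omits. A sketch, using a reduced grid (an assumption, chosen only to keep it quick):

```python
from sklearn import datasets, metrics
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV, train_test_split

diabetes = datasets.load_diabetes()
x_train, x_test, y_train, y_test = train_test_split(
    diabetes.data, diabetes.target, test_size=0.25, random_state=42)

# Reduced grid for brevity (illustrative subset of the notebook's grid)
param_grid = dict(n_estimators=[50, 100],
                  max_depth=[1, 3],
                  learning_rate=[0.1, 0.2])
grid_search = GridSearchCV(GradientBoostingRegressor(random_state=7), param_grid,
                           scoring="neg_mean_squared_error", cv=5)  # refit=True by default
grid_search.fit(x_train, y_train)

# Already refitted on all of x_train with the best parameters
best_model = grid_search.best_estimator_
test_mse = metrics.mean_squared_error(y_test, best_model.predict(x_test))
print(grid_search.best_params_, test_mse)
```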
