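## [Assignment focus]
Learn how to use scikit-learn's hyper-parameter search to find the best hyper-parameters.

### Assignment
Apply hyper-parameter search to a different dataset and see whether it finds the best combination of hyper-parameters.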

In [9]:
from sklearn import datasets, metrics
from sklearn.model_selection import train_test_split, KFold, GridSearchCV
from sklearn.ensemble import GradientBoostingClassifier, GradientBoostingRegressor

## Boston

In [10]:
# load_boston was removed in scikit-learn 1.2; this cell needs an older release.
boston = datasets.load_boston()

In [11]:
dir(boston)

['DESCR', 'data', 'feature_names', 'filename', 'target']

In [12]:
print(f'DESCR: {boston.DESCR}')

DESCR: .. _boston_dataset:

Boston house prices dataset
---------------------------

**Data Set Characteristics:**  

    :Number of Instances: 506 

    :Number of Attributes: 13 numeric/categorical predictive. Median Value (attribute 14) is usually the target.

    :Attribute Information (in order):
        - CRIM     per capita crime rate by town
        - ZN       proportion of residential land zoned for lots over 25,000 sq.ft.
        - INDUS    proportion of non-retail business acres per town
        - CHAS     Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
        - NOX      nitric oxides concentration (parts per 10 million)
        - RM       average number of rooms per dwelling
        - AGE      proportion of owner-occupied units built prior to 1940
        - DIS      weighted distances to five Boston employment centres
        - RAD      index of accessibility to radial highways
        - TAX      full-value property-tax rate per $10,000
        - PTRA

In [18]:
# Hold out 25% of the data for the final evaluation.
x_train, x_test, y_train, y_test = train_test_split(boston.data, boston.target, test_size=0.25, random_state=42)

# Baseline: gradient boosting with default hyper-parameters.
gbr = GradientBoostingRegressor(random_state=7)

gbr.fit(x_train, y_train)

y_pred = gbr.predict(x_test)

print(f'Mean Squared Error: {metrics.mean_squared_error(y_test, y_pred)}')

Mean Squared Error: 8.913775994322064


In [19]:
# Candidate values: 3 x 3 = 9 combinations.
n_estimators = [100, 200, 300]
max_depth = [1, 3, 5]
param_grid = dict(n_estimators=n_estimators, max_depth=max_depth)
# cv=3 matches the 3-fold log below; scikit-learn 0.22+ defaults to 5 folds.
grid_search = GridSearchCV(gbr, param_grid, scoring="neg_mean_squared_error", cv=3, n_jobs=-1, verbose=1)

In [20]:
grid_result = grid_search.fit(x_train, y_train)

Fitting 3 folds for each of 9 candidates, totalling 27 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=-1)]: Done  27 out of  27 | elapsed:    2.0s finished
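
The count in the log above follows from the grid size: 3 values of `n_estimators` times 3 values of `max_depth` gives 9 candidates, and each candidate is fitted once per fold. A minimal sketch of that arithmetic using `ParameterGrid`, assuming the 3-fold setting reported in the log (the names `candidates`, `n_folds`, and `total_fits` are illustrative):

```python
from sklearn.model_selection import ParameterGrid

param_grid = {"n_estimators": [100, 200, 300], "max_depth": [1, 3, 5]}

# ParameterGrid enumerates every combination that a grid search would try.
candidates = list(ParameterGrid(param_grid))
n_folds = 3  # assumption: taken from the "Fitting 3 folds" line in the log

total_fits = len(candidates) * n_folds
print(len(candidates), total_fits)  # prints: 9 27
```

Adding a third hyper-parameter with three values would triple the number of fits, which is why grids are usually kept coarse at first.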


In [21]:
print(f"Best score (negative MSE): {grid_result.best_score_} using: {grid_result.best_params_}")

Best score (negative MSE): -13.118704337935425 using: {'max_depth': 3, 'n_estimators': 200}
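
The score is negative because `neg_mean_squared_error` follows scikit-learn's higher-is-better convention for scorers, so the value above corresponds to a cross-validated MSE of about 13.12. The sketch below shows one way to flip the sign back and to rank all candidates through `cv_results_`. It assumes the built-in diabetes dataset as a stand-in and a reduced grid to keep it quick; neither comes from this notebook.

```python
from sklearn.datasets import load_diabetes
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

# Stand-in dataset (assumption), not the one used in the cells above.
X, y = load_diabetes(return_X_y=True)

search = GridSearchCV(
    GradientBoostingRegressor(random_state=7),
    {"n_estimators": [50, 100], "max_depth": [1, 3]},
    scoring="neg_mean_squared_error",
    cv=3,
)
search.fit(X, y)

# best_score_ is the negated MSE; flip the sign to read it as an error.
best_cv_mse = -search.best_score_
print(f"Best CV MSE: {best_cv_mse:.2f} using {search.best_params_}")

# cv_results_ holds one entry per candidate; rank 1 is the best.
results = search.cv_results_
rows = zip(results["rank_test_score"], results["params"], results["mean_test_score"])
for rank, params, score in sorted(rows, key=lambda row: row[0]):
    print(rank, params, f"MSE={-score:.2f}")
```

Looking at the full ranking, rather than only the winner, helps show whether the best combination is clearly ahead or roughly tied with its neighbours.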


In [22]:
grid_result.best_params_

{'max_depth': 3, 'n_estimators': 200}

In [23]:
# Rebuild the model with the best parameters found by the search.
# Unlike the baseline, random_state is not set here, so the result can vary slightly between runs.
gbr_bestparam = GradientBoostingRegressor(**grid_result.best_params_)

gbr_bestparam.fit(x_train, y_train)

y_pred = gbr_bestparam.predict(x_test)

print(f"Mean Squared Error: {metrics.mean_squared_error(y_test, y_pred)}")

Mean Squared Error: 8.59943474787538
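
Since the assignment asks for other datasets, the same workflow carries over to classification. What follows is a sketch, not a reference solution: the built-in wine dataset, `GradientBoostingClassifier`, accuracy as the metric, and the deliberately small grid are all illustrative choices that do not come from this notebook.

```python
from sklearn import datasets, metrics
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

wine = datasets.load_wine()
x_train, x_test, y_train, y_test = train_test_split(
    wine.data, wine.target, test_size=0.25, random_state=42, stratify=wine.target
)

# Small illustrative grid: 2 x 2 = 4 candidates.
param_grid = {"n_estimators": [50, 100], "max_depth": [1, 3]}
grid_search = GridSearchCV(
    GradientBoostingClassifier(random_state=7),
    param_grid,
    scoring="accuracy",
    cv=3,
)
grid_search.fit(x_train, y_train)

# With the default refit=True, best_estimator_ has already been refitted
# on the whole training split using the best parameters.
y_pred = grid_search.best_estimator_.predict(x_test)
test_accuracy = metrics.accuracy_score(y_test, y_pred)
print(f"Best params: {grid_search.best_params_}")
print(f"Test accuracy: {test_accuracy:.3f}")
```

Reusing `best_estimator_` avoids a manual refit and keeps the `random_state` of the searched model, which makes the comparison with a baseline easier to reproduce.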
