# Perform grid search on model hyperparameters

All Rights Reserved © <a href="http://www.louisdorard.com" style="color: #6D00FF;">Louis Dorard</a>

## Load data

In [None]:
import pandas as pd
data = pd.read_csv("/data/boston-housing.csv")
target_column = 'medv'
features = data.drop(target_column, axis=1)
outputs = data[target_column]
X = features.values.astype(float)
y = outputs.values

## Grid search from scratch

Let's implement a procedure to tune a single hyperparameter: here, `max_features` in Random Forest. When given as a float, `max_features` is the fraction of features considered at each split.

In [None]:
FOLDS = 10

from numpy import arange
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

# Candidate values 0.1, 0.2, ..., 0.9 (arange excludes the stop value)
params = arange(0.1, 1.0, 0.1)
means = []
stdevs = []
for value in params:
    # No random_state is set on the forest, so scores vary slightly between runs
    scores = cross_val_score(RandomForestRegressor(n_estimators=10, max_features=value), X, y, scoring="r2", cv=FOLDS, verbose=0)
    mean = scores.mean()
    stdev = scores.std()
    means.append(mean)
    stdevs.append(stdev)
    print(f"Param {value:.1f}: {mean:.3f} +/- {stdev:.3f}")
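The loop above reports a score for each candidate but leaves choosing the winner to the reader. A minimal sketch of that last step follows, assuming synthetic data from `make_regression` in place of the housing CSV so that it runs on its own; the helper name `manual_grid_search`, the candidate list, and the fixed `random_state` are illustrative choices, not part of the original notebook.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score


def manual_grid_search(X, y, values, folds=5):
    """Return (best_value, means, stdevs) for candidate max_features values."""
    means, stdevs = [], []
    for value in values:
        # random_state is fixed here (an assumption) so repeated runs agree
        forest = RandomForestRegressor(n_estimators=10, max_features=value, random_state=0)
        scores = cross_val_score(forest, X, y, scoring="r2", cv=folds)
        means.append(scores.mean())
        stdevs.append(scores.std())
    best_index = int(np.argmax(means))  # a higher mean R² is better
    return values[best_index], means, stdevs


# Synthetic stand-in for the housing data (assumption, for a self-contained run)
X_demo, y_demo = make_regression(n_samples=200, n_features=8, noise=10.0, random_state=0)
candidates = [0.25, 0.5, 0.75, 1.0]
best_value, demo_means, demo_stdevs = manual_grid_search(X_demo, y_demo, candidates)
print("Best max_features:", best_value)
```

When the mean scores of two candidates differ by less than their standard deviations, the ranking between them is not very reliable, which is why the loop records both quantities.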

## Grid search with scikit's `GridSearchCV`

### Example with 1 hyperparameter

Define a grid search task:

In [None]:
from sklearn.model_selection import GridSearchCV
grid_search = GridSearchCV(RandomForestRegressor(n_estimators=10),
                           {"max_features": params},
                           scoring="r2",
                           # cv is the number of folds: fewer folds run faster; 5 to 10 is the recommended range
                           cv=FOLDS,
                           n_jobs=-1,
                           verbose=1)

Run the search:

In [None]:
grid_search.fit(X, y)
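Once fitted, the search object exposes its findings through `best_params_`, `best_score_` and `cv_results_`. The sketch below shows one way to inspect them; it assumes synthetic data from `make_regression` and a shorter candidate list so that it runs without the housing CSV, and the variable names `search` and `results` are illustrative.

```python
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the housing data (assumption)
X_demo, y_demo = make_regression(n_samples=200, n_features=8, noise=10.0, random_state=0)

search = GridSearchCV(RandomForestRegressor(n_estimators=10, random_state=0),
                      {"max_features": [0.25, 0.5, 0.75, 1.0]},
                      scoring="r2",
                      cv=5)
search.fit(X_demo, y_demo)

print("Best parameters:", search.best_params_)
print("Best mean R²:", search.best_score_)

# cv_results_ is a dict of arrays; a DataFrame makes it easier to read
columns = ["param_max_features", "mean_test_score", "std_test_score", "rank_test_score"]
results = pd.DataFrame(search.cv_results_)[columns]
print(results.sort_values("rank_test_score"))
```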

Plot grid search results:

In [None]:
%matplotlib inline
from matplotlib import pyplot
pyplot.errorbar(params, grid_search.cv_results_['mean_test_score'], yerr=grid_search.cv_results_['std_test_score'])
pyplot.title("Influence of max_features on cross-validated R²")
pyplot.xlabel("max_features")
pyplot.ylabel("Mean R² (error bars: std across folds)")
pyplot.show()

### Example with 2 hyperparameters

In [None]:
grid = {"max_depth": [3, 9, None],
        "max_features": [0.5, 0.75]}
grid_search = GridSearchCV(RandomForestRegressor(n_estimators=10),
                           grid,
                           scoring="r2",
                           cv=FOLDS,
                           n_jobs=-1)
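With several hyperparameters, the search evaluates every combination of values, so its cost grows multiplicatively. As a rough illustration, `ParameterGrid` from `sklearn.model_selection` can enumerate the combinations of the grid above before any model is trained; the variable names `combinations` and `n_fits` are illustrative.

```python
from sklearn.model_selection import ParameterGrid

grid = {"max_depth": [3, 9, None],
        "max_features": [0.5, 0.75]}
folds = 10

# Every combination of values that the search will evaluate
combinations = list(ParameterGrid(grid))
for combination in combinations:
    print(combination)

# One fit per combination and per fold (the final refit on all data is extra)
n_fits = len(combinations) * folds
print(len(combinations), "combinations,", n_fits, "cross-validation fits")
```

This grid has 3 × 2 = 6 combinations, so 10-fold cross-validation trains 60 models.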

Run the search to find the best combination of hyperparameters:

In [None]:
grid_search.fit(X, y)

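Create a model from `X` and `y` using the best hyperparameters found during this search. `best_params_` only contains the hyperparameters that were searched, so `n_estimators=10` is passed again to match the setting used during the search:

In [None]:
model = RandomForestRegressor(n_estimators=10, **grid_search.best_params_)
model.fit(X, y)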
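An alternative to building the model by hand: with the default `refit=True`, `GridSearchCV` retrains the estimator with the best parameters on the full data and exposes it as `best_estimator_`. The sketch below assumes synthetic data from `make_regression` so that it runs on its own; `search` and `best_model` are illustrative names.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the housing data (assumption)
X_demo, y_demo = make_regression(n_samples=200, n_features=8, noise=10.0, random_state=0)

search = GridSearchCV(RandomForestRegressor(n_estimators=10, random_state=0),
                      {"max_depth": [3, 9, None], "max_features": [0.5, 0.75]},
                      scoring="r2",
                      cv=5)
search.fit(X_demo, y_demo)

# Already fitted on all of X_demo, y_demo with the best parameters
best_model = search.best_estimator_
print("n_estimators kept from the base estimator:", best_model.get_params()["n_estimators"])

predictions = best_model.predict(X_demo[:3])
print(predictions)
```

Because `best_estimator_` is a clone of the estimator passed to the search, it keeps settings that were not part of the grid, such as `n_estimators`.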

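### Example with `KFold`

Passing an integer as `cv` for a regressor uses consecutive, unshuffled folds. A `KFold` object gives control over shuffling, and a fixed `random_state` makes the fold assignment reproducible: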

In [None]:
from sklearn.model_selection import KFold
SEED = 8
kfold = KFold(n_splits=FOLDS, shuffle=True, random_state=SEED)
grid_search = GridSearchCV(RandomForestRegressor(n_estimators=10),
                           grid,
                           scoring="r2",
                           cv=kfold,
                           n_jobs=-1)