Hyperparameter tuning, referred to here simply as parameter tuning, is an important part of a machine learning project, especially in deep learning. This course focuses on traditional machine learning, where most models have few parameters to adjust and the parameter space is limited. However, some models, such as tree-based models, still have many parameters.

In practice, people often tune parameters by hand. They try a series of values for one parameter, evaluate the resulting models, and choose the best value. If there is more than one parameter, they repeat the process until all parameters are chosen. The whole process is time-consuming and tedious, but it can be completely automated.
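
The manual procedure described above can be sketched as a loop that tunes one parameter at a time. This is only an illustration: the dataset size, the candidate values, the fixed `n_estimators=20`, and the 3-fold setting are all assumptions chosen to keep the run short, not values taken from this lesson.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

# Small synthetic dataset so the sketch runs quickly (sizes are illustrative).
X, y = make_classification(n_samples=200, n_features=10, n_informative=6,
                           random_state=0)

# Hypothetical candidate values, tuned one parameter at a time.
candidates = {
    "max_depth": [1, 2, 3],
    "learning_rate": [0.05, 0.1, 0.2],
}

chosen = {"n_estimators": 20}  # held fixed to keep the run short
for name, values in candidates.items():
    scores = {}
    for value in values:
        model = GradientBoostingClassifier(random_state=0, **chosen,
                                           **{name: value})
        # Mean F1 score over 3 cross-validation folds.
        scores[value] = cross_val_score(model, X, y, cv=3,
                                        scoring="f1").mean()
    # Keep the best value, then move on to the next parameter.
    chosen[name] = max(scores, key=scores.get)

print(chosen)
```

Note that this greedy loop only evaluates combinations along its own path, so it can miss a combination that an exhaustive search over the same values would find.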

`sklearn` provides some useful functions to help us. In this lesson, we focus only on `GridSearchCV`; the other search utilities work on a similar principle.

## Grid Search

The grid search provided by `GridSearchCV` exhaustively generates candidates from a grid of parameter values specified with the `param_grid` parameter. In this example, we work with a tree-based model, which has many parameters.
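
To make "exhaustively generates candidates" concrete, here is a standard-library sketch of the idea using a small hypothetical grid. It illustrates the concept (a Cartesian product of the value lists) and is not scikit-learn's internal implementation; the name `small_grid` is made up for this example.

```python
from itertools import product

# Hypothetical two-parameter grid, kept tiny for readability.
small_grid = {"max_depth": [1, 2], "learning_rate": [0.1, 0.2, 0.4]}

names = sorted(small_grid)
# Cartesian product of the value lists: one dict per candidate.
candidates = [dict(zip(names, combo))
              for combo in product(*(small_grid[n] for n in names))]

for candidate in candidates:
    print(candidate)
print(len(candidates))  # 2 * 3 = 6 candidates
```

scikit-learn exposes the same idea directly as `sklearn.model_selection.ParameterGrid`, which can be iterated to obtain one dictionary per candidate.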

First, let's create the dataset we need.

In [8]:
import sklearn.datasets as datasets

X, y = datasets.make_classification(800, n_features=10, n_informative=6)
print("The shape of the X is {}".format(X.shape))
print("The shape of the y is {}".format(y.shape))
print("The first five samples of X {}".format(X[:5]))

The shape of the X is (800, 10)
The shape of the y is (800,)
The first five samples of X [[-1.35341271 -1.9653489   0.99531833  3.71227406 -0.62111538 -0.38267528
  -1.34580511 -1.86971732 -0.33564883  1.19036159]
 [ 1.54690714  0.0530981   0.03621268  0.26972515 -0.64728135 -0.1123988
  -1.07393959  1.86067013 -0.01875296 -0.54086998]
 [-0.34711118  2.69078252 -0.37211752 -4.60466638  1.65670524  1.13690944
   2.97455169 -0.96120026  1.14379302 -2.54672847]
 [-0.91558688 -0.93280044  0.1197404   3.70690686 -1.40209552 -1.56093025
  -1.57097579 -0.17687429  0.11681649 -1.76994032]
 [ 0.24771566 -1.97875531 -0.18720424  4.07258022  1.31323846 -2.37548826
  -0.28857329 -0.9234144   0.92026837 -0.29837037]]


Split the data into training and test sets.

In [9]:
from sklearn.model_selection import train_test_split

train_x, test_x, train_y, test_y = train_test_split(X,
                                                    y,
                                                    test_size=0.2,
                                                    random_state=42)

Create the estimator we want to tune. In this example, we use a gradient boosting decision tree (**GBDT**) classifier.

In [10]:
from sklearn.ensemble import GradientBoostingClassifier

gb = GradientBoostingClassifier(random_state=10)

Define `param_grid`, which maps each parameter name to the list of values to try.

In [13]:
param_grid = [{"n_estimators": [1, 2, 4, 16, 32, 64], "learning_rate": [0.05, 0.1, 0.2, 0.4],
               "min_samples_leaf": [1, 2, 4, 8, 16], "max_depth": [1, 2, 3]}]
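
The grid above is wrapped in a list. `param_grid` accepts either a single dictionary or a list of dictionaries, and with a list each dictionary is expanded as an independent sub-grid. The sketch below shows this with `ParameterGrid`; the two sub-grids are hypothetical and chosen only to make the count easy to follow.

```python
from sklearn.model_selection import ParameterGrid

# Hypothetical sub-grids: shallow trees with several learning rates,
# and deeper trees with a single, smaller learning rate.
grids = [
    {"max_depth": [1, 2], "learning_rate": [0.1, 0.2, 0.4]},
    {"max_depth": [3], "learning_rate": [0.05]},
]

candidates = list(ParameterGrid(grids))
for candidate in candidates:
    print(candidate)
print(len(candidates))  # 2 * 3 + 1 * 1 = 7
```

Splitting a grid this way can avoid combinations that are known in advance to be uninteresting, which keeps the number of candidates down.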

Create a **GridSearchCV** object and **fit** it. This may take around 30 seconds; to speed up training, we set **n_jobs=4**, which runs the fits in parallel on four workers. From the param_grid above, the number of combinations is 6 * 4 * 5 * 3 = 360, which means **GridSearchCV** trains and evaluates 360 candidate models, each one once per cross-validation fold. Because the number of candidates is the product of the number of values for every parameter, exhaustive grid search becomes expensive quickly. Still, when the dataset is small, it saves a lot of repetitive manual work and time.
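
The candidate count can be checked with a few lines of standard-library Python. The sketch below also estimates the total number of model fits, assuming 5-fold cross-validation (the default in recent scikit-learn versions when `cv` is not set) and the default final refit of the best candidate. Both are assumptions about the installed version rather than something this lesson states.

```python
from math import prod

param_grid = {
    "n_estimators": [1, 2, 4, 16, 32, 64],
    "learning_rate": [0.05, 0.1, 0.2, 0.4],
    "min_samples_leaf": [1, 2, 4, 8, 16],
    "max_depth": [1, 2, 3],
}

# One candidate per combination of values: 6 * 4 * 5 * 3.
n_candidates = prod(len(values) for values in param_grid.values())

n_folds = 5  # assumption: default number of folds when cv is not set
# Each candidate is fitted once per fold, plus one final refit
# of the best candidate on the whole training set.
n_fits = n_candidates * n_folds + 1

print(n_candidates)  # 360
print(n_fits)        # 1801
```

Adding one more parameter with only three values would triple both numbers, which is why the grid is usually kept coarse.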

In [14]:
from sklearn.model_selection import GridSearchCV

cv = GridSearchCV(gb, param_grid=param_grid, scoring="f1", n_jobs=4)
cv.fit(train_x, train_y)

GridSearchCV(cv=None, error_score=nan,
             estimator=GradientBoostingClassifier(ccp_alpha=0.0,
                                                  criterion='friedman_mse',
                                                  init=None, learning_rate=0.1,
                                                  loss='deviance', max_depth=3,
                                                  max_features=None,
                                                  max_leaf_nodes=None,
                                                  min_impurity_decrease=0.0,
                                                  min_impurity_split=None,
                                                  min_samples_leaf=1,
                                                  min_samples_split=2,
                                                  min_weight_fraction_leaf=0.0,
                                                  n_estimators=100,
                                                  n_iter_n...
                 

Let's look at **best_params_** and the corresponding metric (F1 score) for each parameter combination.

In [15]:
print("Best parameters set found on development set:")
print(cv.best_params_)
print("Grid scores on development set:")
means = cv.cv_results_['mean_test_score']
stds = cv.cv_results_['std_test_score']
for mean, std, params in zip(means, stds, cv.cv_results_['params']):
    print("%0.3f (+/-%0.03f) for %r" % (mean, std * 2, params))

Best parameters set found on development set:
{'learning_rate': 0.4, 'max_depth': 3, 'min_samples_leaf': 8, 'n_estimators': 64}
Grid scores on development set:
0.716 (+/-0.053) for {'learning_rate': 0.05, 'max_depth': 1, 'min_samples_leaf': 1, 'n_estimators': 1}
0.732 (+/-0.051) for {'learning_rate': 0.05, 'max_depth': 1, 'min_samples_leaf': 1, 'n_estimators': 2}
0.732 (+/-0.051) for {'learning_rate': 0.05, 'max_depth': 1, 'min_samples_leaf': 1, 'n_estimators': 4}
0.734 (+/-0.068) for {'learning_rate': 0.05, 'max_depth': 1, 'min_samples_leaf': 1, 'n_estimators': 16}
0.759 (+/-0.103) for {'learning_rate': 0.05, 'max_depth': 1, 'min_samples_leaf': 1, 'n_estimators': 32}
0.788 (+/-0.087) for {'learning_rate': 0.05, 'max_depth': 1, 'min_samples_leaf': 1, 'n_estimators': 64}
0.716 (+/-0.053) for {'learning_rate': 0.05, 'max_depth': 1, 'min_samples_leaf': 2, 'n_estimators': 1}
0.732 (+/-0.051) for {'learning_rate': 0.05, 'max_depth': 1, 'min_samples_leaf': 2, 'n_estimators': 2}
0.732 (+/-0.0