## Using Randomized Search for Hyperparameter Tuning
The goal of this exercise is to perform hyperparameter tuning using randomized search and cross-validation.

In [1]:
# import libraries
import pandas as pd

In [2]:
# create headers for data
_headers = ['buying', 'maint', 'doors', 'persons', 'lug_boot', 'safety', 'car']

In [3]:
# read in cars dataset
df = pd.read_csv('https://raw.githubusercontent.com/'
                 'PacktWorkshops/The-Data-Science-Workshop/'
                 'master/Chapter07/Dataset/car.data',
                 names=_headers, index_col=None)
print(df.shape)
df.head()

(1728, 7)


,buying,maint,doors,persons,lug_boot,safety,car
0,vhigh,vhigh,2,2,small,low,unacc
1,vhigh,vhigh,2,2,small,med,unacc
2,vhigh,vhigh,2,2,small,high,unacc
3,vhigh,vhigh,2,2,med,low,unacc
4,vhigh,vhigh,2,2,med,med,unacc


In [4]:
# encode categorical variables
_df = pd.get_dummies(df, columns=['buying', 'maint', 'doors', 'persons', 'lug_boot', 'safety'])
_df.head()

,car,buying_high,buying_low,buying_med,buying_vhigh,maint_high,maint_low,maint_med,maint_vhigh,doors_2,...,doors_5more,persons_2,persons_4,persons_more,lug_boot_big,lug_boot_med,lug_boot_small,safety_high,safety_low,safety_med
0,unacc,0,0,0,1,0,0,0,1,1,...,0,1,0,0,0,0,1,0,1,0
1,unacc,0,0,0,1,0,0,0,1,1,...,0,1,0,0,0,0,1,0,0,1
2,unacc,0,0,0,1,0,0,0,1,1,...,0,1,0,0,0,0,1,1,0,0
3,unacc,0,0,0,1,0,0,0,1,1,...,0,1,0,0,0,1,0,0,1,0
4,unacc,0,0,0,1,0,0,0,1,1,...,0,1,0,0,0,1,0,0,0,1
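
The encoding step can be illustrated on a much smaller frame. The sketch below uses a hypothetical three-row DataFrame (`toy`, not part of the cars dataset) to show how pd.get_dummies replaces one categorical column with one indicator column per category. Recent pandas versions return True/False indicators by default, so `dtype=int` is passed here on the assumption that you want the 0/1 output shown above.

```python
import pandas as pd

# Hypothetical toy frame that mimics two columns of the cars data
toy = pd.DataFrame({'safety': ['low', 'med', 'high'],
                    'car': ['unacc', 'unacc', 'acc']})

# One indicator column per category of 'safety'; 'car' is left untouched.
# dtype=int forces 0/1 values instead of booleans.
toy_encoded = pd.get_dummies(toy, columns=['safety'], dtype=int)
print(toy_encoded.columns.tolist())
print(toy_encoded)
```

The untouched `car` column comes first, followed by the new indicator columns named `<column>_<category>`.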


In [5]:
# split the data into features and labels
features = _df.drop(['car'], axis=1).values
labels = _df[['car']].values

In [6]:
# import libraries
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

In this step, you import numpy for numerical computations, RandomForestClassifier to create an ensemble of estimators, and RandomizedSearchCV to perform a randomized search with cross-validation.

In [7]:
# instantiate RandomForestClassifier
clf = RandomForestClassifier()

In this step, you instantiate RandomForestClassifier. A random forest is an ensemble of decision trees, each trained on a different bootstrap sample of the data. The trees' predictions are combined by a voting mechanism: in scikit-learn, the forest averages the class probabilities predicted by the individual trees and outputs the class with the highest average.
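
To see the voting mechanism at work, the following minimal sketch fits a small forest on synthetic data and compares the forest's output with the average of its trees. The dataset from make_classification and the names `forest`, `tree_probs`, and `mean_probs` are assumptions made for illustration only; they are not part of the exercise.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic two-class data, used only for this illustration
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

forest = RandomForestClassifier(n_estimators=25, random_state=0).fit(X, y)

# The fitted trees are stored in forest.estimators_
tree_probs = np.stack([tree.predict_proba(X) for tree in forest.estimators_])

# Average the per-tree class probabilities across the 25 trees
mean_probs = tree_probs.mean(axis=0)

# The forest's own probabilities should match the per-tree average
print(np.allclose(mean_probs, forest.predict_proba(X)))
```

Taking the class with the highest averaged probability is a soft form of majority voting, which is why the forest is usually more stable than any single tree.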

In [8]:
# Specify parameters
params = {'n_estimators':[500, 1000, 2000], 'max_depth': np.arange(1, 8)}

RandomForestClassifier accepts many parameters, but you specify only two here: n_estimators, the number of trees in the forest, and max_depth, the maximum depth each tree is allowed to grow to.
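
It can help to know how large this search space is before searching it. The sketch below counts the combinations with ParameterGrid and draws a random subset with ParameterSampler, both from sklearn.model_selection. The choices `n_iter=10` and `random_state=0` are assumptions made for illustration.

```python
import numpy as np
from sklearn.model_selection import ParameterGrid, ParameterSampler

params = {'n_estimators': [500, 1000, 2000], 'max_depth': np.arange(1, 8)}

# Every possible combination: 3 values of n_estimators x 7 values of max_depth
n_total = len(ParameterGrid(params))

# A random draw of 10 combinations from that space
sampled = list(ParameterSampler(params, n_iter=10, random_state=0))

print(n_total, len(sampled))
for combo in sampled[:3]:
    print(combo)
```

A grid search would fit all 21 combinations, whereas a randomized search only fits the sampled ones, which is what keeps it cheaper on larger search spaces.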

In [9]:
# Instantiate a randomized search
clf_cv = RandomizedSearchCV(clf, param_distributions=params, cv=5)

In this step, you specify three parameters when you instantiate RandomizedSearchCV: the estimator (the model to tune, here the random forest classifier clf), param_distributions (the hyperparameter search space), and cv (the number of cross-validation folds).
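
RandomizedSearchCV also accepts n_iter, the number of combinations to sample, and random_state, which makes the sampling repeatable. The sketch below spells both out. The value 42 is an arbitrary assumption, and the name `search` is used only to avoid clashing with clf_cv from the exercise.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

params = {'n_estimators': [500, 1000, 2000], 'max_depth': np.arange(1, 8)}

search = RandomizedSearchCV(
    RandomForestClassifier(random_state=42),  # fixed seed for the forest itself
    param_distributions=params,
    n_iter=10,        # number of combinations to sample (10 is the default)
    cv=5,             # number of cross-validation folds
    random_state=42,  # fixed seed for the sampling of combinations
)
print(search.n_iter, search.cv)
```

Without a fixed random_state, rerunning the notebook can sample different combinations and report different best parameters.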

In [10]:
# perform search
clf_cv.fit(features, labels.ravel())

RandomizedSearchCV(cv=5, estimator=RandomForestClassifier(),
                   param_distributions={'max_depth': array([1, 2, 3, 4, 5, 6, 7]),
                                        'n_estimators': [500, 1000, 2000]})

In this step, you perform the search by calling fit(). Because n_iter is left at its default of 10, the search samples 10 of the 21 possible hyperparameter combinations, scores each one with 5-fold cross-validation, and then refits the best combination on the full dataset.
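
After fitting, the per-combination scores are available in the cv_results_ attribute. The following sketch runs a deliberately small search so that it finishes quickly. The synthetic data, the reduced search space `small_params`, and `n_iter=5` are assumptions made for illustration and differ from the exercise.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# Synthetic data and a small search space, chosen only to keep the run fast
X, y = make_classification(n_samples=150, n_features=6, random_state=0)
small_params = {'n_estimators': [10, 20, 30], 'max_depth': np.arange(1, 8)}

search = RandomizedSearchCV(RandomForestClassifier(random_state=0),
                            param_distributions=small_params,
                            n_iter=5, cv=5, random_state=0)
search.fit(X, y)

# One row per sampled combination, with per-fold and mean scores
results = pd.DataFrame(search.cv_results_)
print(results[['param_n_estimators', 'param_max_depth',
               'mean_test_score', 'rank_test_score']])
```

The row with rank_test_score equal to 1 corresponds to the combination reported by best_params_.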

In [11]:
# print best parameter combination
print("Tuned Random Forest Parameters: {}".format(clf_cv.best_params_))

Tuned Random Forest Parameters: {'n_estimators': 2000, 'max_depth': 5}


In [12]:
# print best score
print('score is {}'.format(clf_cv.best_score_))

score is 0.7593113847700427


In [13]:
# inspect best model
model = clf_cv.best_estimator_
model

RandomForestClassifier(max_depth=5, n_estimators=2000)
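
The best estimator is already refitted on the data passed to fit(), so it can be used directly for prediction. As a minimal sketch, assuming you want an accuracy estimate on data the search never saw, you could hold out a test split first. The synthetic data, the 25% split, and the small search space below are assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV, train_test_split

# Synthetic data stands in for the encoded cars features and labels
X, y = make_classification(n_samples=200, n_features=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions={'n_estimators': [10, 20, 30],
                         'max_depth': np.arange(1, 8)},
    n_iter=5, cv=5, random_state=0)
search.fit(X_train, y_train)  # the search only sees the training split

best_model = search.best_estimator_
test_accuracy = best_model.score(X_test, y_test)  # accuracy on held-out rows
print(search.best_params_, test_accuracy)
```

The held-out accuracy is usually a more honest estimate than best_score_, because best_score_ was itself used to pick the winning combination.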

In this exercise, you learned to use randomized search with cross-validation to find the best model among a sample of hyperparameter combinations. This process is called hyperparameter tuning: you search for the combination of hyperparameters that gives the best cross-validated score, and then use it to train the model that you will put into production.