# Hyperparameter Tuning


## Grid Search for Multiple Hyperparameters

This section shows how to choose the best combination of values when several hyperparameters are tuned together.

In [15]:
import numpy as np, pandas as pd, matplotlib.pyplot as plt
from sklearn.tree import DecisionTreeRegressor
from sklearn.pipeline import Pipeline
from sklearn.model_selection import train_test_split, GridSearchCV, RandomizedSearchCV, cross_val_score, cross_validate
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.linear_model import LinearRegression


In [2]:
data = pd.read_csv('/Users/sylvia/Desktop/datasets/insurance.csv')
data.head()


Unnamed: 0,age,sex,bmi,children,smoker,region,expenses
0,19,female,27.9,0,yes,southwest,16884.92
1,18,male,33.8,1,no,southeast,1725.55
2,28,male,33.0,3,no,southeast,4449.46
3,33,male,22.7,0,no,northwest,21984.47
4,32,male,28.9,0,no,northwest,3866.86


In [3]:
# Binary Variables - sex and smoker
data['sex'] = data['sex'].replace({'female':1, 'male':0})
data['smoker'] = data['smoker'].replace({'yes':1, 'no':0})
data.head()

Unnamed: 0,age,sex,bmi,children,smoker,region,expenses
0,19,1,27.9,0,1,southwest,16884.92
1,18,0,33.8,1,0,southeast,1725.55
2,28,0,33.0,3,0,southeast,4449.46
3,33,0,22.7,0,0,northwest,21984.47
4,32,0,28.9,0,0,northwest,3866.86


In [4]:
# Multiclass variables - region
data_ohe = pd.get_dummies(data)
data_ohe = data_ohe.reindex(columns = [col for col in data_ohe.columns if col != 'expenses'] + ['expenses'])
data_ohe.head()

Unnamed: 0,age,sex,bmi,children,smoker,region_northeast,region_northwest,region_southeast,region_southwest,expenses
0,19,1,27.9,0,1,0,0,0,1,16884.92
1,18,0,33.8,1,0,0,0,1,0,1725.55
2,28,0,33.0,3,0,0,0,1,0,4449.46
3,33,0,22.7,0,0,0,1,0,0,21984.47
4,32,0,28.9,0,0,0,1,0,0,3866.86


**Split into Features and Target**

In [5]:
X = data_ohe.drop('expenses', axis = 1)
y = data_ohe['expenses']

In [6]:
xtrain, xtest, ytrain, ytest = train_test_split(X, y, test_size = .2, random_state=3)
xtrain.shape, ytrain.shape, xtest.shape, ytest.shape

((1070, 9), (1070,), (268, 9), (268,))

**Modelling**

In [7]:
model = DecisionTreeRegressor(max_depth=7).fit(xtrain, ytrain) 


In [8]:
ypred_test = model.predict(xtest)

print('RMSE: %.2f' % np.sqrt(mean_squared_error(ytest, ypred_test)))
print('R2_score: %.2f' % (r2_score(ytest, ypred_test)))

RMSE: 5045.75
R2_score: 0.83


**HYPERPARAMETER TUNING**

**Tuning a single hyperparameter**


In [9]:
# Specify the range of max_depth values; one tree is fit for each value

# The initial range is arbitrary; refine it iteratively based on the search results

params = {'max_depth':list(range(2,20))} 
params


{'max_depth': [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]}

In [10]:
dtr = DecisionTreeRegressor()  # base model with default hyperparameters

gs_dtr = GridSearchCV(estimator=dtr,
                      param_grid=params, # hyperparameter values to search over
                      scoring='r2',      # select the candidate with the highest mean R2
                      cv=5).fit(xtrain, ytrain)


In [11]:
gs_dtr.best_params_, gs_dtr.best_score_


({'max_depth': 4}, 0.8449212957688363)
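
One way to act on that iterative approach is to look at the score of every candidate depth rather than only the winner. The sketch below is a minimal illustration, assuming synthetic data from `make_regression` as a stand-in for `xtrain` and `ytrain`; the names `X_demo`, `y_demo`, `search` and `results` are hypothetical. It reads the per-candidate scores from `cv_results_`.

```python
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (assumption); in the notebook, use xtrain and ytrain.
X_demo, y_demo = make_regression(n_samples=300, n_features=9, noise=10.0, random_state=0)

search = GridSearchCV(
    estimator=DecisionTreeRegressor(random_state=0),
    param_grid={'max_depth': list(range(2, 20))},
    scoring='r2',
    cv=5,
).fit(X_demo, y_demo)

# One row per candidate depth: mean and spread of the 5 fold scores, best first.
results = pd.DataFrame(search.cv_results_)[
    ['param_max_depth', 'mean_test_score', 'std_test_score', 'rank_test_score']
].sort_values('rank_test_score')
print(results.head())
```

If the top-ranked depths sit at the edge of the range, that suggests widening it; if they cluster in the middle, the range can probably be narrowed.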

**Tuning multiple hyperparameters**


In [12]:
# Every tunable hyperparameter can be passed in the params dictionary:
# the hyperparameter name is the key and the list of candidate values is the value.

params = {'max_depth':list(range(2,7)), 'min_samples_split':[5,10,15]}

# Loop variables are named depth/split so they do not overwrite the target y
for depth in params['max_depth']:
    for split in params['min_samples_split']:
        print(depth, split)
    print('-'*20)
    

2 5
2 10
2 15
--------------------
3 5
3 10
3 15
--------------------
4 5
4 10
4 15
--------------------
5 5
5 10
5 15
--------------------
6 5
6 10
6 15
--------------------


The output above lists every combination in the grid: 5 values of `max_depth` × 3 values of `min_samples_split` = 15 candidates from just 2 hyperparameters. GridSearchCV evaluates all 15 and selects the one with the best cross-validated score.
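
The same count can be obtained without printing every pair. As a small sketch, scikit-learn's `ParameterGrid` expands a `params` dictionary into the list of candidates; the variable names `grid`, `n_candidates`, `n_folds` and `n_fits` are illustrative.

```python
from sklearn.model_selection import ParameterGrid

params = {'max_depth': list(range(2, 7)), 'min_samples_split': [5, 10, 15]}

grid = ParameterGrid(params)
n_candidates = len(grid)          # 5 depths x 3 split sizes
n_folds = 5                       # matches cv=5 used in this notebook
n_fits = n_candidates * n_folds   # excludes the final refit on the full training set

print(n_candidates, n_fits)       # prints: 15 75
```

Adding a third hyperparameter with 4 candidate values would multiply both numbers by 4.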

In [13]:
dtr = DecisionTreeRegressor()

dtr_gs = GridSearchCV(estimator=dtr,
                      param_grid=params,
                      scoring='r2', 
                      cv=5).fit(xtrain, ytrain)

In [14]:
dtr_gs.best_params_, dtr_gs.best_score_

({'max_depth': 4, 'min_samples_split': 5}, 0.8449212957688363)
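
The score above is a cross-validated score on the training data, so it is not directly comparable to the test-set RMSE and R2 reported for the untuned tree. A possible next step, sketched here on synthetic stand-in data (an assumption; `X_tr`, `X_te`, `y_tr`, `y_te` and `search` are hypothetical names), is to predict on the held-out split with the fitted search object.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (assumption); the notebook's own split is xtrain/xtest.
X_demo, y_demo = make_regression(n_samples=400, n_features=9, noise=10.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, random_state=3)

search = GridSearchCV(
    estimator=DecisionTreeRegressor(random_state=0),
    param_grid={'max_depth': list(range(2, 7)), 'min_samples_split': [5, 10, 15]},
    scoring='r2',
    cv=5,
).fit(X_tr, y_tr)

# With the default refit=True, predict() delegates to best_estimator_,
# which was refit on all of X_tr using the winning parameters.
y_pred = search.predict(X_te)
rmse = np.sqrt(mean_squared_error(y_te, y_pred))
print('RMSE: %.2f' % rmse)
print('R2_score: %.2f' % r2_score(y_te, y_pred))
```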

**Note**

This result covers only the values listed in `params`. If the true optimum lies outside those ranges, the search cannot find it.

* The number of combinations GridSearchCV must evaluate is the product of the number of values per hyperparameter, so it grows quickly as hyperparameters are added.
* Every combination is fit once per cross-validation fold, so the whole search becomes slow on large datasets.
* RandomizedSearchCV is a common way to reduce this cost.

**RandomizedSearchCV**: Instead of evaluating every combination of hyperparameter values, it samples a fixed number of combinations (set by `n_iter`) and tunes over only those.
* Fewer combinations are evaluated, so it usually reaches a good result faster, although it is not guaranteed to find the best combination in the search space.
* Randomized search also shows roughly which region of values gives the best results, which can guide a narrower follow-up search.
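
`RandomizedSearchCV` is imported at the top of this notebook but not run. The following is a minimal sketch of how it could be used, not a tuned setup: it assumes synthetic data from `make_regression` in place of `xtrain` and `ytrain`, and the sampling ranges, `n_iter=10` and the name `dtr_rs` are illustrative choices.

```python
from scipy.stats import randint
from sklearn.datasets import make_regression
from sklearn.model_selection import RandomizedSearchCV
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (assumption); in the notebook, use xtrain and ytrain.
X_demo, y_demo = make_regression(n_samples=300, n_features=9, noise=10.0, random_state=0)

# Distributions to sample from (illustrative ranges, not tuned values).
param_distributions = {
    'max_depth': randint(2, 20),          # integers 2..19 (upper bound excluded)
    'min_samples_split': randint(2, 21),  # integers 2..20
}

dtr_rs = RandomizedSearchCV(
    estimator=DecisionTreeRegressor(random_state=0),
    param_distributions=param_distributions,
    n_iter=10,        # only 10 sampled combinations are evaluated
    scoring='r2',
    cv=5,
    random_state=3,   # makes the sampled combinations reproducible
).fit(X_demo, y_demo)

print(dtr_rs.best_params_, dtr_rs.best_score_)
```

A full grid over the same ranges would contain 18 × 19 = 342 combinations; here only 10 are tried, which is the trade-off between cost and coverage described above.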
