# Hyperparameter Optimization
<hr style="border:2px solid black">

## 1. Introduction

<img src="hp_tuning.png" width="400"/>

**Hyperparameters are:**

1. Variables that control some aspect of the training process of an ML algorithm
2. The value of a hyperparameter is **NOT** learned by the ML algorithm during training
3. The value of a hyperparameter is set by the ML practitioner
4. Most ML algorithms have hyperparameters, some more than others, and library implementations (e.g., scikit-learn) provide default values for them
5. The best set of values for an ML algorithm is **problem/data-dependent** and is found using a hyperparameter tuning method
6. Tuning hyperparameters usually yields a better model, one that overfits less and generalizes better to unseen data
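The sketch below makes the distinction concrete with scikit-learn's `Lasso`. It uses synthetic data from `make_regression`, which is an illustrative assumption. `alpha` is a hyperparameter that we set before training. `coef_` and `intercept_` are parameters that `fit` learns from the data.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# synthetic regression data, used only for illustration
X, y = make_regression(n_samples=100, n_features=3, noise=10.0, random_state=0)

# hyperparameters are set by the practitioner before training...
model = Lasso(alpha=0.5, max_iter=10_000)
print(model.get_params()["alpha"])    # → 0.5

# ...and library defaults are used for anything we do not set
print(Lasso().get_params()["alpha"])  # → 1.0

# parameters are learned from the data during fit
model.fit(X, y)
print(model.coef_, model.intercept_)
```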


**How do I choose the right hyperparameters?**

>- Manually
>- Grid Search
>- Random Search
>- Bonus: Bayesian Search
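Grid search and random search differ mainly in which candidate combinations they evaluate. This minimal sketch uses scikit-learn's `ParameterGrid` and `ParameterSampler`, and the search space is an illustrative assumption. Grid search enumerates every combination. Random search draws a fixed number of combinations and can sample from continuous distributions, such as a log-uniform distribution for `alpha`.

```python
from scipy.stats import loguniform
from sklearn.model_selection import ParameterGrid, ParameterSampler

# grid search: every combination of the listed values is a candidate
grid = {"alpha": [0.01, 0.1, 1.0, 10.0], "max_iter": [5_000, 10_000]}
grid_candidates = list(ParameterGrid(grid))
print(len(grid_candidates))  # 4 * 2 = 8 candidates

# random search: n_iter candidates are sampled from lists or distributions
distributions = {"alpha": loguniform(1e-2, 1e1), "max_iter": [5_000, 10_000]}
random_candidates = list(ParameterSampler(distributions, n_iter=5, random_state=0))
for params in random_candidates:
    print(params)
```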

**Exercise**

Name three hyperparameters for each of three ML models you have learned.

- Linear regression & variants:
    + `alpha`: regularization (penalty) strength
    + `max_iter`: maximum number of iterations
    + `l1_ratio`: elastic net mixing parameter (`ElasticNet`)

- Random Forest:
    + `n_estimators`: number of trees
    + `max_depth`: maximum depth of each tree
    + `bootstrap`: whether bootstrap samples are used when building trees

- Logistic regression:
    + `solver`: optimization algorithm
    + `penalty`: norm of the penalty term
    + `tol`: tolerance for the stopping criterion
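In scikit-learn, `get_params()` lists an estimator's hyperparameters together with their default values. The short sketch below applies it to the models above. It uses `ElasticNet` to show `l1_ratio`, which `Lasso` does not accept.

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import ElasticNet, Lasso, LogisticRegression

# each estimator exposes its hyperparameters (with defaults) via get_params()
models = {
    "Lasso": Lasso(),
    "ElasticNet": ElasticNet(),
    "RandomForestRegressor": RandomForestRegressor(),
    "LogisticRegression": LogisticRegression(),
}

for name, model in models.items():
    print(name, sorted(model.get_params()))

# two of the hyperparameters named in the exercise, with their defaults
print(ElasticNet().get_params()["l1_ratio"])                  # → 0.5
print(RandomForestRegressor().get_params()["n_estimators"])   # → 100
```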

<hr style="border:2px solid black">

## 2. Hyperparameter Tuning: Penguin Dataset

**Load packages**

```python
# data analysis stack
import pandas as pd
import numpy as np

# data visualization stack
from matplotlib import pyplot as plt
# Jupyter magic (notebook only)
%matplotlib inline
import seaborn as sns
sns.set() # set seaborn as default style

# data pre-processing stack
from sklearn.preprocessing import (
    StandardScaler,
    OneHotEncoder,
    PolynomialFeatures
)
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline

# machine learning stack
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Lasso
from sklearn.ensemble import RandomForestRegressor
from sklearn.neighbors import KNeighborsRegressor

# miscellaneous
import time
import warnings
warnings.filterwarnings("ignore")
```

**Load data**

```python
df = sns.load_dataset("penguins")
df.dropna(inplace=True)
df.head()
```

|   | species | island | bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g | sex |
|---|---|---|---|---|---|---|---|
| 0 | Adelie | Torgersen | 39.1 | 18.7 | 181.0 | 3750.0 | Male |
| 1 | Adelie | Torgersen | 39.5 | 17.4 | 186.0 | 3800.0 | Female |
| 2 | Adelie | Torgersen | 40.3 | 18.0 | 195.0 | 3250.0 | Female |
| 4 | Adelie | Torgersen | 36.7 | 19.3 | 193.0 | 3450.0 | Female |
| 5 | Adelie | Torgersen | 39.3 | 20.6 | 190.0 | 3650.0 | Male |


**Features and target variable**

```python
numerical_features = ['bill_length_mm',
                      'bill_depth_mm',
                      'flipper_length_mm'
                     ]

categorical_features = ['species',
                        'island',
                        'sex'
                       ]

features = numerical_features + categorical_features

target_variable = 'body_mass_g'
```

```python
# feature and target columns
X, y = df[features], df[target_variable]
```

**Train-test split**

```python
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
X_train.shape, X_test.shape
```

```
((266, 6), (67, 6))
```

### 2.1 Lasso Estimator

**Preprocessing**

```python
# scaling and polynomial features
numerical_transformer = Pipeline(
    steps=[
        ('scaler', StandardScaler()),
        ('polynomial', PolynomialFeatures())
    ]
)
```
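Later, the grid tunes two `PolynomialFeatures` hyperparameters, `degree` and `interaction_only`. These change how many columns the three numerical features expand into. The small sketch below uses a 5×3 random array as a stand-in for the three scaled numerical features, which is an assumption. The counts include the constant bias column.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# 5 rows x 3 columns, standing in for the three numerical features
X_num = np.random.default_rng(0).normal(size=(5, 3))

n_columns = {}
for degree in (2, 3):
    for interaction_only in (False, True):
        poly = PolynomialFeatures(degree=degree, interaction_only=interaction_only)
        n_columns[(degree, interaction_only)] = poly.fit_transform(X_num).shape[1]

print(n_columns)
# → {(2, False): 10, (2, True): 7, (3, False): 20, (3, True): 8}
```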

```python
# one-hot encoding
categorical_transformer = Pipeline(
    steps=[
        ('ohe', OneHotEncoder(drop='first'))
    ]
)
```

```python
preprocessor = ColumnTransformer(
    transformers=[
        ("num", numerical_transformer, numerical_features),
        ("cat", categorical_transformer, categorical_features)
    ]
)
```

**Grid estimator**

```python
estimator = Pipeline(
    steps=[
        ('preprocessor', preprocessor),   # preprocessing step
        ('lasso', Lasso()) # lasso regression
    ]
)
```

**Parameter grid**

```python
param_grid = {
    'preprocessor__num__polynomial__degree': [2, 3],
    'preprocessor__num__polynomial__interaction_only': [False, True],
    'lasso__alpha': [100., 10., 1., 0.1, 0.01],
    'lasso__max_iter': [5_000, 10_000, 20_000]
}
```
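Each key in the grid follows scikit-learn's `<step>__<parameter>` naming convention, nested once per level: the pipeline step `preprocessor`, then the transformer `num`, then the step `polynomial`. The sketch below assumes a simplified pipeline with only the numerical branch. It shows how to list the valid names with `get_params()` instead of guessing them, and how to set them with `set_params()`.

```python
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import Lasso
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# simplified version of the pipeline above (numerical branch only)
numerical_transformer = Pipeline(
    steps=[("scaler", StandardScaler()), ("polynomial", PolynomialFeatures())]
)
preprocessor = ColumnTransformer(
    transformers=[
        ("num", numerical_transformer,
         ["bill_length_mm", "bill_depth_mm", "flipper_length_mm"]),
    ]
)
estimator = Pipeline(steps=[("preprocessor", preprocessor), ("lasso", Lasso())])

# every tunable name is a key of get_params(); keep the ones of interest
suffixes = ("degree", "interaction_only", "alpha", "max_iter")
tunable = [name for name in estimator.get_params() if name.endswith(suffixes)]
print(tunable)

# set_params accepts the same names
estimator.set_params(preprocessor__num__polynomial__degree=3, lasso__alpha=10.0)
```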

**Instantiate GridSearchCV**

```python
from sklearn.model_selection import GridSearchCV
```

```python
gscv = GridSearchCV(
    estimator=estimator,
    param_grid=param_grid,
    scoring='r2',
    cv=5,
    n_jobs=-1,
    verbose=1
)
```

**Grid-search cross-validation**

```python
# initial time
ti = time.time()

# grid-search cross-validation
gscv.fit(X_train, y_train)

# final time
tf = time.time()

# time taken
print(f"time taken: {round(tf-ti,2)} sec")
```

```
Fitting 5 folds for each of 60 candidates, totalling 300 fits
time taken: 1.95 sec
```
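At its core, grid-search cross-validation is a loop. For each candidate in the grid it computes the mean cross-validated score, and it keeps the best one. The minimal sketch below implements that loop on synthetic data, which is an assumption, and checks the result against `GridSearchCV`. It leaves out the refit step and parallelism.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV, ParameterGrid, cross_val_score

# synthetic data standing in for the penguin training set
X, y = make_regression(n_samples=200, n_features=5, noise=20.0, random_state=0)
param_grid = {"alpha": [0.01, 1.0, 100.0]}

best_score, best_params = -np.inf, None
for params in ParameterGrid(param_grid):
    # mean R^2 over the same 5 folds that GridSearchCV uses
    score = cross_val_score(Lasso(**params), X, y, scoring="r2", cv=5).mean()
    if score > best_score:
        best_score, best_params = score, params

# GridSearchCV performs the same search (and then refits the best candidate)
gscv = GridSearchCV(Lasso(), param_grid, scoring="r2", cv=5).fit(X, y)
print(best_params, round(best_score, 6))
print(gscv.best_params_, round(gscv.best_score_, 6))
```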


**Cross-validation results**

```python
# all results
gscv.cv_results_
```

```
{'mean_fit_time': array([0.00740223, 0.00904474, 0.01226602, 0.01009078, 0.00656629,
        0.00752034, 0.00631313, 0.00521693, 0.00927548, 0.00594816,
        0.0083159 , 0.00616064, 0.0060946 , 0.00634766, 0.00885749,
        0.01158266, 0.01053305, 0.01315823, 0.01363902, 0.00713372,
        0.01226697, 0.00695581, 0.00964265, 0.01721425, 0.01515336,
        0.01046371, 0.01137862, 0.01000056, 0.01982732, 0.01163692,
        0.02007475, 0.01281414, 0.0121264 , 0.00669117, 0.01348286,
        0.00930023, 0.00576687, 0.00916543, 0.00821376, 0.0050107 ,
        0.00893326, 0.00492802, 0.00821962, 0.00645447, 0.00511889,
        0.0051384 , 0.0092164 , 0.00546184, 0.00569129, 0.0053081 ,
        0.02924924, 0.01114254, 0.01777968, 0.0237781 , 0.01695037,
        0.01008701, 0.01213346, 0.01825461, 0.02012768, 0.01092443]),
 'std_fit_time': array([2.16870738e-03, 4.14498815e-03, 7.42611679e-03, 6.59986729e-03,
        8.79865566e-04, 3.13880102e-03, 1.20573854e-03, 2.67232774e-04,
        ...
```

```python
# list of columns to show
column_list = ['param_preprocessor__num__polynomial__degree',
               'param_preprocessor__num__polynomial__interaction_only',
               'param_lasso__alpha',
               'param_lasso__max_iter',
               'mean_test_score',
               'std_test_score',
               'rank_test_score'
              ]
# create result dataframe
result_df = pd.DataFrame(gscv.cv_results_)[column_list]

# rename columns
result_df.rename(
    columns=lambda name: name.split('__')[-1], inplace=True
)

# order by rank
result_df.sort_values(
    by='rank_test_score', ascending=True, inplace=True, ignore_index=True
)

result_df
```

|   | degree | interaction_only | alpha | max_iter | mean_test_score | std_test_score | rank_test_score |
|---|---|---|---|---|---|---|---|
| 0 | 3 | False | 10.0 | 20000 | 0.859041 | 0.019154 | 1 |
| 1 | 3 | False | 10.0 | 10000 | 0.859041 | 0.019154 | 1 |
| 2 | 3 | False | 10.0 | 5000 | 0.859041 | 0.019154 | 1 |
| 3 | 3 | True | 1.0 | 10000 | 0.857309 | 0.021919 | 4 |
| 4 | 3 | True | 1.0 | 20000 | 0.857309 | 0.021919 | 4 |
| 5 | 3 | True | 1.0 | 5000 | 0.857309 | 0.021919 | 4 |
| 6 | 3 | False | 1.0 | 20000 | 0.857294 | 0.015192 | 7 |
| 7 | 3 | False | 1.0 | 10000 | 0.857294 | 0.015192 | 7 |
| 8 | 3 | False | 1.0 | 5000 | 0.857294 | 0.015192 | 7 |
| 9 | 2 | True | 1.0 | 10000 | 0.856452 | 0.023333 | 10 |


**Best hyperparameters**

```python
gscv.best_params_
```

```
{'lasso__alpha': 10.0,
 'lasso__max_iter': 5000,
 'preprocessor__num__polynomial__degree': 3,
 'preprocessor__num__polynomial__interaction_only': False}
```

**Best score**

```python
round(gscv.best_score_, 6)
```

```
0.859041
```

**Model selection**

```python
best_model = gscv.best_estimator_
best_model
```

**Build model**

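```python
# optional: with refit=True (the default), GridSearchCV has already refit
# best_estimator_ on the full training set
best_model.fit(X_train, y_train);
```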
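With the default `refit=True`, `GridSearchCV` refits `best_estimator_` on the whole training set once the search finishes. For a deterministic estimator such as Lasso, an extra `fit` therefore reproduces the same model. For a randomized estimator without a fixed `random_state`, such as the random forest below, refitting gives a slightly different model. The sketch below checks this on synthetic data, which is an assumption. It also checks that `best_score_` is the mean cross-validated score of the winning candidate.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV

X, y = make_regression(n_samples=200, n_features=5, noise=20.0, random_state=0)
gscv = GridSearchCV(Lasso(), {"alpha": [0.1, 1.0, 10.0]}, scoring="r2", cv=5).fit(X, y)

# best_score_ is the mean test score of the best candidate in cv_results_
print(gscv.best_score_, gscv.cv_results_["mean_test_score"][gscv.best_index_])

# best_estimator_ is already fitted on all of X, y; refitting a copy changes nothing
refitted = clone(gscv.best_estimator_).fit(X, y)
print(np.allclose(gscv.best_estimator_.coef_, refitted.coef_))  # → True
```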

**Model performance**

```python
# training score
training_score = best_model.score(X_train, y_train)

# test score
test_score = best_model.score(X_test, y_test)

print(f'Train score: {round(training_score,6)}')
print(f'Test score : {round(test_score,6)}')
```

```
Train score: 0.870775
Test score : 0.878991
```


### 2.2 Random Forest Estimator

**Preprocessing**

```python
# scaling only (no polynomial features for the tree-based model)
numerical_transformer = Pipeline(
    steps=[
        ('scaler', StandardScaler())
    ]
)
```

```python
# one-hot encoding
categorical_transformer = Pipeline(
    steps=[
        ('ohe', OneHotEncoder(drop='first'))
    ]
)
```

```python
preprocessor = ColumnTransformer(
    transformers=[
        ("num", numerical_transformer, numerical_features),
        ("cat", categorical_transformer, categorical_features)
    ]
)
```

**Grid estimator**

```python
estimator = Pipeline(
    steps=[
        ('preprocessor', preprocessor), # preprocessing step
        ('rf', RandomForestRegressor()) # random forest regression
    ]
)
```

**Parameter grid**

```python
param_grid = {
    'rf__n_estimators': [50, 100, 200, 300, 500],
    'rf__max_depth': [5, 10, 20, 50, 100, None],
    'rf__min_samples_split': [2, 5, 10]
}
```

**Grid-search cross-validation**

```python
gscv = GridSearchCV(
    estimator=estimator,
    param_grid=param_grid,
    scoring='r2',
    cv=5,
    n_jobs=-1,
    verbose=1
)
```

```python
# initial time
ti = time.time()

# grid-search cross-validation
gscv.fit(X_train, y_train)

# final time
tf = time.time()

# time taken
print(f"time taken: {round(tf-ti,2)} sec")
```

```
Fitting 5 folds for each of 90 candidates, totalling 450 fits
time taken: 21.79 sec
```
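The random forest grid has 90 candidates, which means 450 fits. Random search, one of the alternatives listed in the introduction, instead evaluates a fixed budget of sampled candidates. This minimal sketch uses scikit-learn's `RandomizedSearchCV` on synthetic data, which is an assumption. The ranges and the budget of `n_iter=10` are illustrative.

```python
from scipy.stats import randint
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV

X, y = make_regression(n_samples=200, n_features=5, noise=20.0, random_state=0)

# distributions (or lists) to sample from, instead of an exhaustive grid
param_distributions = {
    "n_estimators": randint(50, 151),      # integers 50..150
    "max_depth": [5, 10, 20, None],
    "min_samples_split": randint(2, 11),   # integers 2..10
}

rscv = RandomizedSearchCV(
    RandomForestRegressor(random_state=42),
    param_distributions=param_distributions,
    n_iter=10,        # 10 sampled candidates -> 10 * 5 = 50 fits
    scoring="r2",
    cv=5,
    random_state=42,
)
rscv.fit(X, y)
print(rscv.best_params_, round(rscv.best_score_, 6))
```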


**Best hyperparameters**

```python
gscv.best_params_
```

```
{'rf__max_depth': 5, 'rf__min_samples_split': 2, 'rf__n_estimators': 50}
```

**Best score**

```python
round(gscv.best_score_, 6)
```

```
0.854888
```

**Model selection**

```python
best_model = gscv.best_estimator_
best_model
```

**Build model**

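```python
# optional: best_estimator_ is already refit on the training set; refitting a
# random forest without a fixed random_state yields a slightly different forest
best_model.fit(X_train, y_train);
```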

**Model performance**

```python
# training score
training_score = best_model.score(X_train, y_train)

# test score
test_score = best_model.score(X_test, y_test)

print(f'Train score: {round(training_score,6)}')
print(f'Test score : {round(test_score,6)}')
```

```
Train score: 0.92742
Test score : 0.88707
```


### 2.3 KNN Estimator

**Preprocessing**
- Same as in Section 2.2 (Random Forest estimator); the `preprocessor` defined there is reused

**Grid estimator**

```python
estimator = Pipeline(
    steps=[
        ('preprocessor', preprocessor), # preprocessing step
        ('knn', KNeighborsRegressor())  # knn regression
    ]
)
```

**Parameter grid**

```python
param_grid = {
    'knn__n_neighbors': list(range(1, 31)),
    # note: 'minkowski' with the default p=2 is the same distance as 'euclidean'
    'knn__metric': ['euclidean', 'manhattan', 'minkowski'],
    'knn__weights': ['uniform', 'distance']
}
```
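In `KNeighborsRegressor`, `'minkowski'` with the default `p=2` is the Euclidean distance. Two of the three `metric` values therefore describe the same model, so a third of the grid's fits repeat others. The quick check below uses random data, which is an assumption.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = rng.normal(size=100)
X_new = rng.normal(size=(10, 4))

# same neighbors, same predictions: minkowski with p=2 is euclidean
knn_euclidean = KNeighborsRegressor(n_neighbors=5, metric="euclidean").fit(X, y)
knn_minkowski = KNeighborsRegressor(n_neighbors=5, metric="minkowski", p=2).fit(X, y)
pred_euclidean = knn_euclidean.predict(X_new)
pred_minkowski = knn_minkowski.predict(X_new)
print(np.allclose(pred_euclidean, pred_minkowski))  # → True
```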

**Grid-search cross-validation**

```python
gscv = GridSearchCV(
    estimator=estimator,
    param_grid=param_grid,
    scoring='r2',
    cv=5,
    n_jobs=-1,
    verbose=1
)
```

```python
# initial time
ti = time.time()

# grid-search cross-validation
gscv.fit(X_train, y_train)

# final time
tf = time.time()

# time taken
print(f"time taken: {round(tf-ti,2)} sec")
```

```
Fitting 5 folds for each of 180 candidates, totalling 900 fits
time taken: 2.13 sec
```


**Best hyperparameters**

```python
gscv.best_params_
```

```
{'knn__metric': 'manhattan',
 'knn__n_neighbors': 23,
 'knn__weights': 'distance'}
```

**Best score**

```python
round(gscv.best_score_, 6)
```

```
0.856476
```

**Model selection**

```python
best_model = gscv.best_estimator_
best_model
```

**Build model**

```python
# optional: best_estimator_ is already refit on the training set (refit=True)
best_model.fit(X_train, y_train);
```

**Model performance**

```python
# training score
training_score = best_model.score(X_train, y_train)

# test score
test_score = best_model.score(X_test, y_test)

print(f'Train score: {round(training_score,6)}')
print(f'Test score : {round(test_score,6)}')
```

```
Train score: 1.0
Test score : 0.887093
```
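A perfect training score is expected with `weights='distance'`. When the training set is scored, each point is its own nearest neighbor at distance 0, and scikit-learn gives that neighbor all of the weight, so the prediction equals the training target. The test score is the meaningful one here. The sketch below uses random data, which is an assumption, with the tuned `metric` and `n_neighbors`.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 3))
y_train = X_train.sum(axis=1) + rng.normal(scale=0.5, size=200)

knn_distance = KNeighborsRegressor(
    n_neighbors=23, metric="manhattan", weights="distance"
).fit(X_train, y_train)
knn_uniform = KNeighborsRegressor(
    n_neighbors=23, metric="manhattan", weights="uniform"
).fit(X_train, y_train)

# distance weighting reproduces the training targets exactly
print(knn_distance.score(X_train, y_train))  # → 1.0
# uniform weighting averages 23 neighbors and does not
print(knn_uniform.score(X_train, y_train))
```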


<hr style="border:2px solid black">

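**Conclusion:** the model to select is the Lasso pipeline with its tuned hyperparameters. It has the highest mean cross-validation R² of the three (0.859041, vs. 0.856476 for KNN and 0.854888 for Random Forest). Its training and test scores are also close. By contrast, Random Forest (train 0.92742) and KNN (train 1.0) score noticeably higher on the training set than on the test set.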
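The selection rule can be written down explicitly: compare the best cross-validation scores, and use the train/test gap only as a sanity check for overfitting. The minimal sketch below types in by hand the scores reported in this notebook.

```python
# (best CV R^2, train R^2, test R^2) as reported above for each tuned pipeline
scores = {
    "Lasso": (0.859041, 0.870775, 0.878991),
    "Random Forest": (0.854888, 0.92742, 0.88707),
    "KNN": (0.856476, 1.0, 0.887093),
}

# select on cross-validation score, not on test score
best_name = max(scores, key=lambda name: scores[name][0])

for name, (cv, train, test) in scores.items():
    print(f"{name:<14} CV={cv:.4f}  train-test gap={train - test:+.4f}")
print("selected:", best_name)  # → selected: Lasso
```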
