<font color="blue">To use this notebook on Google Colaboratory, you will need to make a copy of it. Go to **File** > **Save a Copy in Drive**. You can then use the new copy that will appear in the new tab.</font>

# AfterWork Data Science: Hyperparameter Tuning with Python

### Pre-requisites

In [1]:
# We will start by running this cell which will import the necessary libraries
# ---
# 
import pandas as pd                # Pandas for data manipulation
import numpy as np                 # Numpy for scientific computation

## 1. Manual Search

### Example 

In [None]:
# Example 
# ---
# Question: Will John, who is 40 years old and earns a salary of 2500, buy a car?
# ---
# Dataset url = http://bit.ly/SocialNetworkAdsDataset
# ---
#

In [2]:
# Step 1
# ---
# Loading our dataset 
social_df = pd.read_csv('http://bit.ly/SocialNetworkAdsDataset') 

# Data preparation: Encoding Gender as 1 for Male and 0 for Female
social_df["Gender"] = np.where(social_df["Gender"] == "Male", 1, 0) 

# Defining our predictor and label variable
X = social_df.iloc[:, [1, 2, 3]].values  # Independent/predictor variables
y = social_df.iloc[:, 4].values          # Dependent/label variable

# Splitting our dataset
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 42)


# Performing scaling
from sklearn.preprocessing import StandardScaler
sc_X = StandardScaler() 
X_train = sc_X.fit_transform(X_train)
X_test = sc_X.transform(X_test)

In [3]:
# Step 2
# ---
# Defining our classifier
from sklearn.tree import DecisionTreeClassifier  

# The output below shows the hyperparameters that we set explicitly; all others keep their default values.
# The decision tree has quite a number of hyperparameters that require fine-tuning in order 
# to get the best possible model, i.e. one that reduces the generalization error. 
# To explore other decision tree hyperparameters, we can refer to the scikit-learn documentation 
# by following this link: https://bit.ly/3eu3XIh
# ---
# We will focus on two specific hyperparameters:
# 1. Max depth (max_depth): This is the maximum depth of the tree, i.e. the largest number 
# of splits allowed on any path from the root node to a leaf. 
# For example, if this is set to 3, then a branch stops growing once it is 
# three splits deep, even if further splits were possible. 
# 2. Min samples leaf (min_samples_leaf): This is the minimum number of samples, or data points, 
# that are required to be present in a leaf node.
# ---
#
decision_classifier = DecisionTreeClassifier(random_state=42)

# Fitting our data
decision_classifier.fit(X_train, y_train)

DecisionTreeClassifier(random_state=42)
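To make the two hyperparameters concrete, here is a minimal sketch that compares an unconstrained tree with a constrained one. It uses a synthetic dataset from `make_classification` as a stand-in (an assumption, not the Social Network Ads data), and the names `X_demo`, `y_demo`, `full_tree` and `small_tree` are illustrative only.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption: not the notebook's dataset)
X_demo, y_demo = make_classification(n_samples=300, n_features=3, n_informative=3,
                                     n_redundant=0, random_state=42)

# An unconstrained tree keeps splitting until its leaves are pure
full_tree = DecisionTreeClassifier(random_state=42).fit(X_demo, y_demo)

# A constrained tree: at most 2 levels of splits and at least 50 samples per leaf
small_tree = DecisionTreeClassifier(max_depth=2, min_samples_leaf=50,
                                    random_state=42).fit(X_demo, y_demo)

print("Unconstrained: depth", full_tree.get_depth(), "leaves", full_tree.get_n_leaves())
print("Constrained:   depth", small_tree.get_depth(), "leaves", small_tree.get_n_leaves())

# Count how many training samples fall into each leaf of the constrained tree
leaf_ids = small_tree.apply(X_demo)
smallest_leaf = min((leaf_ids == leaf).sum() for leaf in set(leaf_ids))
print("Smallest leaf size:", smallest_leaf)
```

The constrained tree can never be deeper than `max_depth`, and none of its leaves can hold fewer than `min_samples_leaf` training samples.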

In [4]:
# Step 3
# ---
# Making our predictions
decision_y_prediction = decision_classifier.predict(X_test) 

# Calculating our metrics
# ---
#
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score 
# scikit-learn metrics expect the true labels first, followed by the predictions
print(accuracy_score(y_test, decision_y_prediction))
print(confusion_matrix(y_test, decision_y_prediction))
print(classification_report(y_test, decision_y_prediction))

0.84
[[56  7]
 [ 9 28]]
              precision    recall  f1-score   support

           0       0.86      0.89      0.88        63
           1       0.80      0.76      0.78        37

    accuracy                           0.84       100
   macro avg       0.83      0.82      0.83       100
weighted avg       0.84      0.84      0.84       100
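The example question asks about one specific person, so a fitted model still has to be applied to John's own features. The sketch below shows the pattern: scale the new row with the same fitted scaler, then call `predict`. The data here is randomly generated with a made-up purchase rule (an assumption for illustration only), and John is assumed to be encoded as Gender = 1.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Made-up stand-in data with the same three predictors: Gender, Age, EstimatedSalary
rng = np.random.default_rng(42)
n = 200
gender = rng.integers(0, 2, size=n)
age = rng.integers(18, 61, size=n)
salary = rng.integers(15000, 150001, size=n)
X_demo = np.column_stack([gender, age, salary])
y_demo = ((age > 42) | (salary > 90000)).astype(int)  # hypothetical labelling rule

# Fit the scaler and the classifier on the stand-in training data
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X_demo)
clf = DecisionTreeClassifier(max_depth=2, random_state=42).fit(X_scaled, y_demo)

# John: male (1), 40 years old, salary of 2500
john = np.array([[1, 40, 2500]])
john_scaled = scaler.transform(john)   # reuse the fitted scaler, do not refit it
print("Predicted class for John:", clf.predict(john_scaled)[0])
```

In the notebook itself, the same two calls would be `sc_X.transform(...)` followed by `decision_classifier.predict(...)`.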



In [5]:
# Repeating Step 2
# ---
# Let's now perform hyperparameter tuning by setting 
# the hyperparameters max_depth = 2 and min_samples_leaf = 50
# and see how our output changes.
# ---
# 
decision_classifier = DecisionTreeClassifier(max_depth = 2, min_samples_leaf = 50, random_state=42)

# Fitting our data
decision_classifier.fit(X_train, y_train)

DecisionTreeClassifier(max_depth=2, min_samples_leaf=50, random_state=42)

In [6]:
# Repeating Step 3
# ---
# Making our predictions
decision_y_prediction = decision_classifier.predict(X_test) 
 
# Calculating our metrics (true labels first, followed by the predictions)
print(accuracy_score(y_test, decision_y_prediction))
print(confusion_matrix(y_test, decision_y_prediction))
print(classification_report(y_test, decision_y_prediction))

0.9
[[54  9]
 [ 1 36]]
              precision    recall  f1-score   support

           0       0.98      0.86      0.92        63
           1       0.80      0.97      0.88        37

    accuracy                           0.90       100
   macro avg       0.89      0.92      0.90       100
weighted avg       0.91      0.90      0.90       100



Can you get a better accuracy by tuning the same hyperparameters or other hyperparameters?

To read more about hyperparameter tuning for decision trees, you can refer to this reading: [Link](https://towardsdatascience.com/how-to-tune-a-decision-tree-f03721801680)
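Manual search amounts to repeating Steps 2 and 3 for each setting we want to try. A minimal sketch of that loop is shown here on synthetic stand-in data (an assumption); the candidate settings are hand-picked for illustration, and the comparison uses a separate validation split so that the test set is not used for choosing hyperparameters.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption: not the notebook's dataset)
X_demo, y_demo = make_classification(n_samples=400, n_features=3, n_informative=3,
                                     n_redundant=0, random_state=42)
X_tr, X_val, y_tr, y_val = train_test_split(X_demo, y_demo, test_size=0.25, random_state=42)

# Hand-picked settings to try, one after the other
candidates = [
    {"max_depth": None, "min_samples_leaf": 1},
    {"max_depth": 2, "min_samples_leaf": 50},
    {"max_depth": 3, "min_samples_leaf": 10},
]

results = []
for params in candidates:
    model = DecisionTreeClassifier(random_state=42, **params).fit(X_tr, y_tr)
    results.append((model.score(X_val, y_val), params))  # validation accuracy

for score, params in results:
    print(f"{score:.3f}  {params}")

# Keep the setting with the highest validation accuracy
best_score, best_params = max(results, key=lambda item: item[0])
print("Best setting:", best_params)
```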

### <font color="green">Challenge</font>

In [None]:
# Challenge 1
# ---
# Using the given dataset above, create a logistic regression classifier 
# then tune its hyperparameters to get the best possible accuracy.
# Compare your results with those of the other fellows in your breakout room.
# Hint: Use the following documentation to tune the hyperparameters.
# Scikit-learn documentation: https://bit.ly/2YZR4iP
# ---
# Dataset url = http://bit.ly/SocialNetworkAdsDataset
# 

## 2. Grid Search

### Example

In [7]:
# Example 
# ---
# Question: Will John, who is 40 years old and earns a salary of 2500, buy a car?
# ---
# Dataset url = http://bit.ly/SocialNetworkAdsDataset
# ---
#

In [8]:
# Defining our classifier 
# ---
# We again start from a Decision Tree classifier with its default hyperparameters.
# The decision tree has quite a number of hyperparameters that require fine-tuning in order 
# to get the best possible model, i.e. one that reduces the generalization error. 
# To explore other decision tree hyperparameters, we can refer to the scikit-learn documentation 
# by following this link: https://bit.ly/3eu3XIh
# ---
# Again we will focus on the same two specific hyperparameters:
# 1. Max depth (max_depth): This is the maximum depth of the tree, i.e. the largest number 
# of splits allowed on any path from the root node to a leaf. 
# For example, if this is set to 3, then a branch stops growing once it is 
# three splits deep, even if further splits were possible. 
# 2. Min samples leaf (min_samples_leaf): This is the minimum number of samples, or data points, 
# that are required to be present in a leaf node.
# ---
# 
decision_classifier = DecisionTreeClassifier(random_state=42)

In [9]:
# Step 1: Hyperparameters: Getting Started with Grid Search
# ---
# Continuing from the previous example, we create a dictionary of all the 
# hyperparameters and the corresponding set of values that we want to test for best performance. 
# Each key of the dictionary is a hyperparameter name 
# and each value is the list of candidate values for that hyperparameter.
# The grid_param dictionary below has two hyperparameters: max_depth and min_samples_leaf.
# The Grid Search algorithm checks every possible combination 
# of these values and returns the combination with the best mean cross-validated accuracy. 
# In this case, the Grid Search algorithm 
# will check all 5 x 5 = 25 combinations.
# ---
# 
grid_param = {
    'max_depth': [2, 3, 4, 10, 15],
    'min_samples_leaf': [10, 20, 30, 40, 50]
}
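The number of combinations can be checked before running the search. The sketch below uses scikit-learn's `ParameterGrid` to enumerate the same grid; `n_folds` and `n_fits` are illustrative names, and the fold count of 5 is taken from the `cv` value used later in this example.

```python
from sklearn.model_selection import ParameterGrid

grid_param = {
    'max_depth': [2, 3, 4, 10, 15],
    'min_samples_leaf': [10, 20, 30, 40, 50]
}

# Every combination of the listed values, as a list of dictionaries
combinations = list(ParameterGrid(grid_param))
print("Number of combinations:", len(combinations))
print("First combination:", combinations[0])

# With 5-fold cross-validation, each combination is trained 5 times
n_folds = 5
n_fits = len(combinations) * n_folds
print("Total model fits during the search:", n_fits)
```

This count grows multiplicatively with every hyperparameter added to the grid, which is why grid search becomes slow for large search spaces.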

In [10]:
# Step 2: Instantiating GridSearchCV object
# ---
# We then create an instance of the GridSearchCV class. 
# The estimator parameter takes the model that we want to tune, 
# the param_grid parameter takes the grid dictionary that we created, 
# the scoring parameter takes the performance metric used to compare combinations, 
# the cv parameter corresponds to the number of cross-validation folds, which we set to 5 in our case, 
# and finally the n_jobs parameter refers to the number of CPU cores that we want to use for execution. 
# Setting n_jobs = -1 allows us to use all available cores.
# You can refer to the GridSearchCV documentation to find out more: https://bit.ly/2Yr0qVC
# ---
# 
from sklearn.model_selection import GridSearchCV
gd_sr_cl = GridSearchCV(estimator = decision_classifier,
                     param_grid = grid_param,
                     scoring = 'accuracy',
                     cv = 5,
                     n_jobs =-1)

In [11]:
# Step 3: Calling the fit method
# ---
# We now call the fit method of the GridSearchCV object 
# and pass it the training set only; the test set is kept aside for the final evaluation.
# If we had lots of other parameters, this would take quite some time to execute. 
# This is because GridSearchCV trains and cross-validates a model for every combination of hyperparameters. 
# ---
# 
gd_sr_cl.fit(X_train, y_train)

GridSearchCV(cv=5, estimator=DecisionTreeClassifier(random_state=42), n_jobs=-1,
             param_grid={'max_depth': [2, 3, 4, 10, 15],
                         'min_samples_leaf': [10, 20, 30, 40, 50]},
             scoring='accuracy')
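After fitting, the search object keeps the score of every combination it tried, not only the best one. A minimal sketch of inspecting these through the `cv_results_` attribute follows; it runs on synthetic stand-in data with a smaller grid (both assumptions made to keep the example quick).

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption: not the notebook's dataset)
X_demo, y_demo = make_classification(n_samples=300, n_features=3, n_informative=3,
                                     n_redundant=0, random_state=42)

search = GridSearchCV(estimator=DecisionTreeClassifier(random_state=42),
                      param_grid={"max_depth": [2, 3, 4],
                                  "min_samples_leaf": [10, 30, 50]},
                      scoring="accuracy",
                      cv=5)
search.fit(X_demo, y_demo)

# One row per combination, with the mean and spread of the fold scores
results = pd.DataFrame(search.cv_results_)
columns = ["param_max_depth", "param_min_samples_leaf",
           "mean_test_score", "std_test_score", "rank_test_score"]
print(results[columns].sort_values("rank_test_score").head())
```

Looking at the whole table shows whether the best combination is clearly ahead or whether several settings perform about equally well.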

In [12]:
# Step 4
# --- 
# We use the best_params_ attribute of the GridSearchCV object
# to check the combination of hyperparameters with the highest mean cross-validated accuracy
# ---
# 
best_parameters = gd_sr_cl.best_params_
print(best_parameters)
 
# We shouldn't stop here, however. Instead, we should add  
# other hyperparameters to the grid and see if the accuracy increases  

{'max_depth': 3, 'min_samples_leaf': 10}


In [13]:
# Step 5: Finding the obtained accuracy
# ---
# The best_score_ attribute holds the mean cross-validated accuracy 
# of the best combination of hyperparameters
# ---
# 
best_result = gd_sr_cl.best_score_
print(best_result)

0.9066666666666666
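The score above is a cross-validation average computed on the training data, so it is worth confirming it on the held-out test set as well. A minimal sketch follows, on synthetic stand-in data (an assumption); by default `GridSearchCV` refits the best combination on the whole training set and exposes it as `best_estimator_`.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption: not the notebook's dataset)
X_demo, y_demo = make_classification(n_samples=400, n_features=3, n_informative=3,
                                     n_redundant=0, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.25, random_state=42)

search = GridSearchCV(estimator=DecisionTreeClassifier(random_state=42),
                      param_grid={"max_depth": [2, 3, 4],
                                  "min_samples_leaf": [10, 30, 50]},
                      scoring="accuracy",
                      cv=5)
search.fit(X_tr, y_tr)   # only the training set is used during the search

# The best combination, refitted on the full training set
best_model = search.best_estimator_
test_accuracy = accuracy_score(y_te, best_model.predict(X_te))

print("Mean cross-validated accuracy:", search.best_score_)
print("Accuracy on the test set:     ", test_accuracy)
```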


Can you get a better accuracy? Refer to the decision tree documentation, choose additional appropriate hyperparameters, and add their candidate values to the grid search space in an effort to get a better accuracy.

### <font color="green">Challenge</font>

In [None]:
# Challenge
# ---
# In this challenge, we will still be required to use grid search, this time with 
# the logistic regression classifier we created earlier, to get the best possible accuracy. 
# Hint: Use the following documentation to tune the hyperparameters.
# Scikit-learn documentation: https://bit.ly/2YZR4iP
# ---
# Dataset url = http://bit.ly/SocialNetworkAdsDataset
# 

## 3. Random Search

### Example

In [14]:
# Example 
# ---
# Question: Will John, who is 40 years old and earns a salary of 2500, buy a car?
# ---
# Dataset url = http://bit.ly/SocialNetworkAdsDataset
# ---
#

In [15]:
# Defining our classifier 
# ---
# We again start from a Decision Tree classifier with its default hyperparameters.
# The decision tree has quite a number of hyperparameters that require fine-tuning in order 
# to get the best possible model, i.e. one that reduces the generalization error. 
# To explore other decision tree hyperparameters, we can refer to the scikit-learn documentation 
# by following this link: https://bit.ly/3eu3XIh
# ---
# Again, we will focus on the same two specific hyperparameters:
# 1. Max depth (max_depth): This is the maximum depth of the tree, i.e. the largest number 
# of splits allowed on any path from the root node to a leaf. 
# For example, if this is set to 3, then a branch stops growing once it is 
# three splits deep, even if further splits were possible. 
# 2. Min samples leaf (min_samples_leaf): This is the minimum number of samples, or data points, 
# that are required to be present in a leaf node.
# ---
# 
decision_classifier = DecisionTreeClassifier(random_state=42)

In [16]:
# Step 1: Hyperparameters: Getting Started with Random Search
# ---
# While performing random search, we would need to provide a statistical distribution 
# for each hyperparameter from which values may be randomly sampled.
# We'll define a sampling distribution for each hyperparameter.
# ---
# 

# Let's define our parameters and the respective distributions to sample from
# ---
#
from scipy.stats import randint as sp_randint
param_dist = {"max_depth": [3, None], 
              "min_samples_leaf": sp_randint(1, 50)}

In [17]:
# Step 2
# ---
# We then instantiate our RandomizedSearchCV object 
# ---
# We can read more in the RandomizedSearchCV documentation
# by following this link: https://bit.ly/2V9Xhri
# ---
#
from sklearn.model_selection import RandomizedSearchCV 
random_sr = RandomizedSearchCV(decision_classifier, param_dist, cv = 5) 
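Unlike grid search, random search does not try every value: it draws a fixed number of candidate settings from `param_dist`. The sketch below uses scikit-learn's `ParameterSampler` to show what such draws look like; the `n_iter=10` and `random_state=42` values are assumptions chosen for illustration.

```python
from scipy.stats import randint as sp_randint
from sklearn.model_selection import ParameterSampler

param_dist = {"max_depth": [3, None],
              "min_samples_leaf": sp_randint(1, 50)}

# Draw 10 candidate settings; a list is sampled uniformly,
# a scipy distribution is sampled through its rvs method
samples = list(ParameterSampler(param_dist, n_iter=10, random_state=42))
for sample in samples:
    print(sample)

# sp_randint(1, 50) excludes its upper bound, so values lie between 1 and 49
leaf_values = [sample["min_samples_leaf"] for sample in samples]
print("Smallest and largest draw:", min(leaf_values), max(leaf_values))
```

Passing a `random_state` makes the draws repeatable, which helps when comparing runs.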

In [18]:
# Step 3: Then fitting our data
# ---
#
random_sr.fit(X_train, y_train)

RandomizedSearchCV(cv=5, estimator=DecisionTreeClassifier(random_state=42),
                   param_distributions={'max_depth': [3, None],
                                        'min_samples_leaf': <scipy.stats._distn_infrastructure.rv_frozen object at 0x7fdd402ac990>})

In [19]:
# Step 4: Checking for the best parameters
# ---
#
best_parameters = random_sr.best_params_
print(best_parameters)

{'max_depth': None, 'min_samples_leaf': 24}


In [20]:
# And lastly obtaining our mean cross-validated accuracy
# ---
# 
best_result = random_sr.best_score_
print(best_result)

0.9066666666666666
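Because the candidates are drawn at random, two runs of the search can report different best parameters. A minimal sketch of controlling this follows, on synthetic stand-in data (an assumption); `n_iter` sets how many candidates are drawn and `random_state` fixes the draws, and the values used here are illustrative.

```python
from scipy.stats import randint as sp_randint
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption: not the notebook's dataset)
X_demo, y_demo = make_classification(n_samples=300, n_features=3, n_informative=3,
                                     n_redundant=0, random_state=42)

param_dist = {"max_depth": [3, None],
              "min_samples_leaf": sp_randint(1, 50)}

def run_search(seed):
    """Run a random search with a fixed seed and return the fitted search object."""
    search = RandomizedSearchCV(DecisionTreeClassifier(random_state=42),
                                param_dist,
                                n_iter=20,          # number of sampled candidates
                                cv=5,
                                scoring="accuracy",
                                random_state=seed)  # fixes which candidates are drawn
    return search.fit(X_demo, y_demo)

search_a = run_search(42)
search_b = run_search(42)

print("First run: ", search_a.best_params_, search_a.best_score_)
print("Second run:", search_b.best_params_, search_b.best_score_)
```

With the same seed, both runs evaluate the same 20 candidates and report the same result.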


Can you get a better accuracy? Refer to the decision tree documentation, choose additional appropriate hyperparameters, and add their distributions to the random search space in an effort to get a better accuracy.

### <font color="green">Challenge</font>

In [21]:
# Challenge
# ---
# Again, we will be required to use random search, this time with 
# the logistic regression classifier we created earlier, to get the best possible accuracy. 
# Hint: Use the following documentation to tune the hyperparameters.
# Scikit-learn documentation: https://bit.ly/2YZR4iP
# ---
# Dataset url = http://bit.ly/SocialNetworkAdsDataset
# 

## 4. Bayesian Optimisation 

### Example

In [22]:
# Example 
# ---
# Question: Will John, who is 40 years old and earns a salary of 2500, buy a car?
# ---
# Dataset url = http://bit.ly/SocialNetworkAdsDataset
# ---
#

In [23]:
# Defining our classifier 
# ---
# The decision tree has quite a number of hyperparameters that require fine-tuning in order 
# to get the best possible model, i.e. one that reduces the generalization error. 
# To explore other decision tree hyperparameters, we can refer to the scikit-learn documentation 
# by following this link: https://bit.ly/3eu3XIh
# ---
# Again, we will focus on the same two specific hyperparameters:
# 1. Max depth (max_depth): This is the maximum depth of the tree, i.e. the largest number 
# of splits allowed on any path from the root node to a leaf. 
# For example, if this is set to 3, then a branch stops growing once it is 
# three splits deep, even if further splits were possible. 
# 2. Min samples leaf (min_samples_leaf): This is the minimum number of samples, or data points, 
# that are required to be present in a leaf node.
# ---
# This time the classifier is created inside the objective function in the next cell, 
# once for every combination of hyperparameters that the optimizer tries.
# 

In [24]:
# Step 1: Hyperparameters: Getting Started with Bayesian Optimisation
# ---
# While performing bayesian optimisation, we perform the following steps, 
# 1. Set up a space dictionary.
# - In this space, we create a probability distribution for each of the used hyperparameters.
# 2. Set up the objective function using the respective classifier/regressor.
# 3. Run our Bayesian Optimizer.
# ---
# 

# Let's set up our space dictionary 
# ---
#

# We will import the hyperopt library, which will help us perform bayesian optimisation
# ---
# Hyperopt library documentation: https://bit.ly/2Dyynf4
# ---
#
import numpy as np
from hyperopt import hp, fmin, tpe, STATUS_OK
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# 1. Setting up our space dictionary
# ---
#
space = {'max_depth': hp.quniform('max_depth', 10, 1200, 10), 
        'min_samples_leaf': hp.uniform('min_samples_leaf', 0, 0.5)}

# 2. Setting up our objective function
# ----
#
def objective(space): 
    # hp.quniform returns floats, so we cast max_depth to an integer for scikit-learn
    classifier = DecisionTreeClassifier(max_depth = int(space['max_depth']),
                                 min_samples_leaf = space['min_samples_leaf'])
    
    accuracy = cross_val_score(classifier, X_train, y_train, cv = 4).mean() 

    # fmin minimizes the loss, so to maximize accuracy we return its negative value
    return {'loss': -accuracy, 'status': STATUS_OK }

# 3. Running our bayesian optimizer
# ---
#
# Note: newer hyperopt releases may expect rstate = np.random.default_rng(42) instead of a RandomState
best = fmin(fn = objective,                       # the objective (loss) function to minimize
            space = space,                        # the range of input values to test during optimisation
            algo = tpe.suggest,                   # the search algorithm to use
            max_evals = 80,                       # the number of iterations to perform
            rstate = np.random.RandomState(42))   # the random state for reproducibility / to get the same result when we run the code

# printing out our outcome
best

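Before handing an objective function to the optimizer, it can help to call it once by hand on a single point of the search space. The sketch below does this without hyperopt, on synthetic stand-in data (an assumption); the point `{"max_depth": 10.0, "min_samples_leaf": 0.1}` is made up for illustration, and the string `"ok"` is assumed to be the value held by hyperopt's `STATUS_OK` constant.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption: not the notebook's dataset)
X_demo, y_demo = make_classification(n_samples=300, n_features=3, n_informative=3,
                                     n_redundant=0, random_state=42)

def objective(params):
    """Return the negative mean cross-validated accuracy for one setting."""
    classifier = DecisionTreeClassifier(
        max_depth=int(params["max_depth"]),           # cast the float to an integer
        min_samples_leaf=params["min_samples_leaf"],  # fraction of the training samples
        random_state=42)
    accuracy = cross_val_score(classifier, X_demo, y_demo, cv=4).mean()
    # A minimizer looks for the smallest loss, so higher accuracy must mean lower loss
    return {"loss": -accuracy, "status": "ok"}

# Evaluate one hand-picked point, shaped like a sample from the search space
result = objective({"max_depth": 10.0, "min_samples_leaf": 0.1})
print("Loss:", result["loss"])
print("Accuracy:", -result["loss"])
```

If this call fails or returns a loss outside the range from -1 to 0, the problem lies in the objective function rather than in the optimizer.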

In [25]:
# We can access the values of the above parameters as follows
# ---
#
print("Max Depth:", best['max_depth'])
print("Min Samples Leaf:", best['min_samples_leaf'])

Max Depth: 630.0
Min Samples Leaf: 0.08629075344668666
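The optimizer reports both values as floats, and they mean different things to scikit-learn: `max_depth` has to be a whole number, while a fractional `min_samples_leaf` is read as a share of the training samples. A minimal sketch of the conversion follows, using the values printed above and synthetic stand-in data (an assumption); `tuned_params` and `min_leaf_count` are illustrative names.

```python
import math

from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Values copied from the output above
best = {"max_depth": 630.0, "min_samples_leaf": 0.08629075344668666}

# max_depth becomes an integer; min_samples_leaf stays a fraction
tuned_params = {"max_depth": int(best["max_depth"]),
                "min_samples_leaf": float(best["min_samples_leaf"])}

# Synthetic stand-in data (assumption: not the notebook's dataset)
X_demo, y_demo = make_classification(n_samples=300, n_features=3, n_informative=3,
                                     n_redundant=0, random_state=42)
model = DecisionTreeClassifier(random_state=42, **tuned_params).fit(X_demo, y_demo)

# A fractional min_samples_leaf means ceil(fraction * n_samples) samples per leaf
min_leaf_count = math.ceil(tuned_params["min_samples_leaf"] * len(X_demo))
print("Minimum samples per leaf on this data:", min_leaf_count)
print("Depth actually reached by the tree:", model.get_depth())
```

The depth the tree actually reaches is typically far below a limit such as 630, because the leaf-size requirement stops the splitting much earlier.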


In [26]:
# Let's now perform our classification with our optimal hyperparameters  
# ---
# hp.quniform returns max_depth as a float, so we cast it to an integer
decision_classifier = DecisionTreeClassifier(max_depth = int(best['max_depth']), 
                                             min_samples_leaf = best['min_samples_leaf'], random_state=42)

# Fitting our data
decision_classifier.fit(X_train, y_train)

DecisionTreeClassifier(max_depth=630, min_samples_leaf=0.08629075344668666,
                       random_state=42)

In [27]:
# Making our predictions
# ---
#
decision_y_prediction = decision_classifier.predict(X_test) 

# Calculating our metrics (true labels first, followed by the predictions)
print(accuracy_score(y_test, decision_y_prediction))
print(confusion_matrix(y_test, decision_y_prediction))
print(classification_report(y_test, decision_y_prediction))

0.92
[[57  6]
 [ 2 35]]
              precision    recall  f1-score   support

           0       0.97      0.90      0.93        63
           1       0.85      0.95      0.90        37

    accuracy                           0.92       100
   macro avg       0.91      0.93      0.92       100
weighted avg       0.92      0.92      0.92       100



Can you get a better accuracy? Refer to the decision tree documentation, choose additional appropriate hyperparameters, and add their distributions to the search space in an effort to get a better accuracy.

### <font color="green">Challenge</font>

In [28]:
# Challenge
# ---
# Again, we will be required to use bayesian optimisation, this time with 
# the logistic regression classifier we created earlier, to get the best possible accuracy. 
# Hint: Use the following documentation to tune the hyperparameters.
# Scikit-learn documentation: https://bit.ly/2YZR4iP
# ---
# Dataset url = http://bit.ly/SocialNetworkAdsDataset
# 