# Automated Hyperparameter Tuning

Automated hyperparameter tuning can be done with techniques such as:

1. Bayesian Optimization
2. Gradient Descent
3. Evolutionary Algorithms

## Bayesian Optimisation

- Grid search and random search are uninformed methods: they do not use past results when picking the next candidate. Bayesian methods differ in that they use past evaluation results to choose the next values to evaluate. The idea is to limit expensive evaluations of the objective function by choosing the next input values based on those that have done well in the past.

- Bayesian optimization builds a probabilistic model of the objective function and uses it to search for the minimum. The aim is to find the input value that gives the lowest possible output value. It often performs better than random, grid, and manual search, giving better performance in the testing phase in less optimization time.


## **What is HYPEROPT** <a class="anchor" id="4.1"></a>


- **HYPEROPT** is a Python library that searches a hyperparameter space for the values that minimize a loss function.

- Hyperopt tunes model hyperparameters with Bayesian optimization. The algorithm used here is the Tree-structured Parzen Estimator (`tpe.suggest`).

- More information on Hyperopt can be found at the following link:

https://hyperopt.github.io/hyperopt/?source=post_page

### Four Parts of Bayesian Optimization
Bayesian hyperparameter optimization requires the same four parts as we implemented in grid and random search:

1. **Objective Function**: takes in an input (hyperparameters) and returns a score to minimize or maximize (the cross-validation score)

2. **Domain space**: the range of input values (hyperparameters) to evaluate

3. **Optimization Algorithm**: the method used to construct the surrogate function and choose the next values to evaluate

4. **Results**: (score, value) pairs that the algorithm uses to build the surrogate function

The only differences are that now our objective function will return a score to minimize (this is just convention in the field of optimization), our domain space will be probability distributions rather than a hyperparameter grid, and the optimization algorithm will be an informed method that uses past results to choose the next hyperparameter values to evaluate.
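To make the four parts concrete, here is a minimal standard-library sketch. It is not Hyperopt: the toy quadratic objective, the `suggest` helper, and the sampler-based `space` are illustrative assumptions, and the "algorithm" is plain random sampling. An informed method would replace `suggest` with one that reads `results` before proposing the next candidate.

```python
import random

# 1. Objective function: maps hyperparameters to a loss to minimize.
#    A toy quadratic stands in for a cross-validation loss (assumption).
def objective(params):
    return (params["x"] - 3.0) ** 2

# 2. Domain space: one sampler per hyperparameter (a distribution, not a grid).
space = {"x": lambda rng: rng.uniform(-10.0, 10.0)}

# 3. Optimization algorithm: uninformed random sampling in this sketch.
#    `results` is accepted but ignored; an informed method would use it.
def suggest(space, results, rng):
    return {name: sample(rng) for name, sample in space.items()}

# 4. Results: (loss, params) pairs collected across evaluations.
rng = random.Random(0)
results = []
for _ in range(200):
    params = suggest(space, results, rng)
    results.append((objective(params), params))

best_loss, best_params = min(results, key=lambda pair: pair[0])
print(best_loss, best_params)
```

The loop structure stays the same when the sampler is swapped for an informed one; only step 3 changes.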


Some of the search-space expressions available in Hyperopt are:

- **hp.choice(label, options)**: Returns one of the options, which should be a list or tuple.

- **hp.randint(label, upper)**: Returns a random integer in the range [0, upper).

- **hp.uniform(label, low, high)**: Returns a value uniformly between low and high.

- **hp.quniform(label, low, high, q)**: Returns a value like round(uniform(low, high) / q) * q, i.e. a float restricted to multiples of q. Cast it to int if the model needs an integer.

- **hp.normal(label, mu, sigma)**: Returns a real value that is normally distributed with mean mu and standard deviation sigma.
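The effect of `q` in `hp.quniform` is easy to misjudge, so the following sketch mimics the formula quoted above using only the standard library. `quniform_sketch` is a hypothetical helper written for illustration, not a Hyperopt function.

```python
import random

def quniform_sketch(rng, low, high, q):
    """Mimic the formula round(uniform(low, high) / q) * q and return a float."""
    return float(round(rng.uniform(low, high) / q) * q)

rng = random.Random(0)

# With q=10 on [10, 12], uniform/q lies in [1.0, 1.2], which always rounds to 1,
# so every draw collapses to 10.0.
coarse = {quniform_sketch(rng, 10, 12, 10) for _ in range(1000)}

# With q=1 the draws land on the whole numbers inside the range.
fine = {quniform_sketch(rng, 10, 12, 1) for _ in range(1000)}

print(coarse)
print(sorted(fine))
```

A step size larger than the width of the range therefore leaves only one reachable value, which is worth checking before running a search.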


- **fmin** is the optimization function that minimizes the loss. It takes four main inputs (fn, space, algo and max_evals) and, optionally, a trials object.

- **best_hyperparams**, the value returned by fmin (named `best` in the code further down), holds the hyperparameter values that gave the lowest loss found.

- **trials** is an object that stores the relevant information for every evaluation, such as the hyperparameters tried and the resulting loss.

- The algorithm used is **tpe.suggest**.

# 1. Importing necessary libraries and data, Preprocessing and Defining the Model

In [2]:
# Loading Libraries
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, classification_report, accuracy_score
import warnings
warnings.filterwarnings('ignore')

# Loading Data
df = pd.read_csv('C:/Users/Shubham/Documents/Data Science/Notebooks/00. Data_Store/preprocessed_diabetes.csv')

# Splitting into Features and Target
x = df.drop(["Outcome"], axis=1)
y = df["Outcome"]

# Splitting into Train Test
x_train, x_test, y_train, y_test = train_test_split(x,y, test_size=0.2, random_state=33)

# Defining the RandomForest Classifier
rf = RandomForestClassifier()

# 2. Using HyperOpt

In [8]:
from hyperopt import hp, fmin, tpe, STATUS_OK, Trials
from sklearn.model_selection import cross_val_score

# 2.1 Domain Space
Creating the dictionary of hyperparameters

The following will be used in this post:
1. hp.choice(label, options) where options should be a python list or tuple.

2. hp.quniform(label, low, high, q) where low and high are the lower and upper bounds on the range and q is the step size.

3. hp.uniform(label, low, high) where low and high are the lower and upper bounds on the range.

4. Others are available, such as hp.normal, hp.lognormal, hp.randint, but we will not use them here.

In [9]:
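space = {'criterion': hp.choice('criterion', ['entropy', 'gini']),
        # NOTE: with q=10 on [10, 12], quniform always yields 10.0; use q=1 to search 10, 11 and 12
        'max_depth': hp.quniform('max_depth', 10, 12, 10),
        # NOTE: 'auto' is only accepted by older scikit-learn releases (same as 'sqrt' for classifiers)
        'max_features': hp.choice('max_features', ['auto', 'sqrt', 'log2', None]),
        'min_samples_leaf': hp.uniform('min_samples_leaf', 0, 0.5),
        'min_samples_split': hp.uniform('min_samples_split', 0, 1),
        'n_estimators': hp.choice('n_estimators', [10, 50])
        }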
space

{'criterion': <hyperopt.pyll.base.Apply at 0x19e07384250>,
 'max_depth': <hyperopt.pyll.base.Apply at 0x19e073843a0>,
 'max_features': <hyperopt.pyll.base.Apply at 0x19e07384070>,
 'min_samples_leaf': <hyperopt.pyll.base.Apply at 0x19e0732dca0>,
 'min_samples_split': <hyperopt.pyll.base.Apply at 0x19e0732da00>,
 'n_estimators': <hyperopt.pyll.base.Apply at 0x19e0732d8e0>}

# 2.2 Objective Function

In [10]:
def objective(space):
    # Build a model from one sampled set of hyperparameters
    model = RandomForestClassifier(criterion = space["criterion"],
                                   max_depth = int(space["max_depth"]),  # quniform yields a float
                                   max_features = space["max_features"],
                                   min_samples_leaf = space["min_samples_leaf"],
                                   min_samples_split = space["min_samples_split"],
                                   n_estimators = space["n_estimators"]
                                  )
    # Mean accuracy over 2-fold cross-validation on the training set
    accuracy = cross_val_score(model, x_train, y_train, cv=2).mean()

    # fmin minimizes the loss, so return the negative accuracy to maximize accuracy
    return {'loss': -accuracy, 'status': STATUS_OK}

# 2.3 Optimization Algorithm

In [18]:
trials = Trials()
best = fmin(fn = objective,
           space = space,
           algo = tpe.suggest,
           max_evals = 20,
           trials = trials)

100%|████████████████████████████████████████████████| 20/20 [00:01<00:00, 12.37trial/s, best loss: -0.752442996742671]


# 2.4 Results

In [17]:
best

{'criterion': 1,
 'max_depth': 10.0,
 'max_features': 0,
 'min_samples_leaf': 0.023777739592153457,
 'min_samples_split': 0.46628285738423525,
 'n_estimators': 0}

In [14]:
# hp.choice entries in `best` are indices into the option lists, so map them back
crit = {0: 'entropy', 1: 'gini'}
feat = {0: 'auto', 1: 'sqrt', 2: 'log2', 3: None}
est = {0: 10, 1: 50}  # must match the options given to hp.choice('n_estimators', ...)

rf = RandomForestClassifier(criterion = crit[best["criterion"]],
                           # float from quniform; wrap in int(...) if scikit-learn rejects a float
                           max_depth = best["max_depth"],
                           max_features = feat[best["max_features"]],
                           min_samples_leaf = best["min_samples_leaf"],
                           min_samples_split = best["min_samples_split"],
                           n_estimators = est[best["n_estimators"]]
                           )

In [15]:
rf.fit(x_train, y_train)

RandomForestClassifier(max_depth=10.0, min_samples_leaf=0.023777739592153457,
                       min_samples_split=0.46628285738423525, n_estimators=10)

In [16]:
rf.score(x_test, y_test)

0.7142857142857143