# Super-charge hyperparameter search with Optuna
> This post introduces Optuna, an open-source hyperparameter optimization framework, and shows how to use it with PyTorch and PyTorch Lightning.
- toc: true 
- badges: true
- comments: true
- author: Anthony Faust
- show_tags: true
- categories: [Machine learning, Deep learning]
- image: images/post/search.jpg


## Introduction
Training machine learning models usually involves a number of hyperparameter settings, and searching over them is an integral part of building a model. The search consists of trying different sets of hyperparameters to find the settings that give the best model performance. Deep neural networks in particular can have many hyperparameters, and finding the best combination in such a high-dimensional space can be a challenging task. Fortunately, different strategies and tools can simplify the process. This post will guide you through using Optuna for hyperparameter search with the [PyTorch](https://pytorch.org/) and [PyTorch Lightning](https://github.com/PyTorchLightning/pytorch-lightning) frameworks.

To install these packages, run:

```bash
pip install -U optuna
pip install -U torch torchvision
pip install -U pytorch-lightning
```

### Optuna
[Optuna](https://optuna.org/) is an open-source hyperparameter optimization framework. It lets you define the search space with ordinary Python conditionals, loops, and syntax, and automates the search for optimal hyperparameters. The library searches large spaces efficiently and prunes unpromising trials for faster results. It can also parallelize hyperparameter searches over multiple threads or processes without modifying code.

An Optuna optimization problem consists of three main building blocks: the **objective function**, the **trial**, and the **study**. Let us consider a simple optimization problem: *Suppose a rectangular garden is to be constructed using a rock wall as one side of the garden and wire fencing for the other three sides, as shown in the figure below. Given 500 m of wire fencing, determine the dimensions that would create a garden of maximum area. What is the maximum area?*

Let $x$ denote the length of the side of the garden perpendicular to the rock wall and $y$ the length of the side parallel to the rock wall. The area of the garden is then $A = x \cdot y$. We want to find the maximum possible area subject to the constraint that the total fencing is 500 m. The total amount of fencing used is $2x + y$, so the constraint equation and the resulting area function are
\begin{align}
500 & = 2x +y \\
y  & = 500-2x\\
A(x) &= x \cdot (500-2x) =  500x - 2x^2
\end{align}

From the equations above, $A(x) = 500x - 2x^2$ is the **objective function**, the function to be optimized. To maximize it, we need to determine the optimization constraints. To construct a rectangular garden, both side lengths must be positive: $x > 0$ and $y > 0$. Since $500 = 2x + y$ and $y > 0$, it follows that $x < 250$. Therefore, we will try to determine the maximum value of $A(x)$ for $x$ over the open interval $(0, 250)$.
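This particular problem can also be solved in closed form, which gives a reference value to compare the search results against. A short derivation using the first-derivative test:

\begin{align}
A'(x) &= 500 - 4x = 0 \quad \Rightarrow \quad x = 125 \\
y &= 500 - 2(125) = 250 \\
A(125) &= 500(125) - 2(125)^2 = 31250
\end{align}

Since $A''(x) = -4 < 0$, this critical point is a maximum, so the largest possible area is 31250 m².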

An Optuna [**trial**](https://optuna.readthedocs.io/en/stable/reference/trial.html) corresponds to a single execution of the **objective function** and is internally instantiated upon each invocation of the function. To obtain the parameters for each trial within the provided *constraints*, the trial's [**suggest**](https://optuna.readthedocs.io/en/stable/reference/trial.html) methods are used.

```python
trial.suggest_uniform('x', 0, 250)
```

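We can now code the objective function to be optimized for our problem.

```python
def gardent_area(trial):
    # Sample the side length x from the feasible range [0, 250]
    x = trial.suggest_uniform('x', 0, 250)
    # Area of the garden, A(x) = 500x - 2x^2
    return 500*x - 2*x**2
```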

Once the objective function has been defined, the **study object** is used to start the optimization. An Optuna **trial** is thus a single call of the objective function, whereas a **study** is an optimization session, that is, a set of trials. We can now create a study and start the optimization process.

```python
import optuna

study = optuna.create_study(study_name="garden", direction="maximize")
study.optimize(gardent_area, n_trials=10)
```

```
[I 2020-06-20 21:26:28,248] Finished trial#0 with value: 28205.741307428172 with parameters: {'x': 85.98552388810133}. Best is trial#0 with value: 28205.741307428172.
[I 2020-06-20 21:26:28,383] Finished trial#1 with value: 15560.237260738759 with parameters: {'x': 36.42866507932366}. Best is trial#0 with value: 28205.741307428172.
[I 2020-06-20 21:26:28,504] Finished trial#2 with value: 27381.28139315071 with parameters: {'x': 168.98135176895596}. Best is trial#0 with value: 28205.741307428172.
[I 2020-06-20 21:26:28,601] Finished trial#3 with value: 25826.98243229805 with parameters: {'x': 72.92785020905154}. Best is trial#0 with value: 28205.741307428172.
[I 2020-06-20 21:26:28,708] Finished trial#4 with value: 22181.440803890582 with parameters: {'x': 57.66294038157957}. Best is trial#0 with value: 28205.741307428172.
[I 2020-06-20 21:26:28,824] Finished trial#5 with value: 26068.17672389483 with parameters: 
```

Once the study is completed, you can get the best parameters as follows:

```python
study.best_params
```

```
{'x': 141.8606644342003}
```

Therefore, the best dimensions found by the study are approximately $x = 141.86$ m and $y = 500 - 2x = 216.28$ m. The best area is the best objective value found:

```python
study.best_value
```

```
30681.435989674595
```

## Deep neural net

Suppose we want to build an MLP classifier to recognize handwritten digits using the MNIST dataset. We will first build a PyTorch Lightning model as follows:

```python
import torch
import torch.nn as nn
from torch.nn import functional as F
import pytorch_lightning as pl
# Note: newer Lightning releases moved these metrics to the separate torchmetrics package
import pytorch_lightning.metrics.functional as metrics
```

```python
class MLP(nn.Module):

    def __init__(self, hparams):
        super().__init__()
        # Two hidden layers with ReLU activations, followed by the output layer
        self.model = nn.Sequential(
            nn.Linear(hparams['in_size'], hparams['hidden_size']),
            nn.ReLU(),
            nn.Linear(hparams['hidden_size'], hparams['hidden_size']),
            nn.ReLU(),
            nn.Linear(hparams['hidden_size'], hparams['out_size'])
        )

    def forward(self, x):
        return self.model(x)
```

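```python
class MLPIL(pl.LightningModule):

    def __init__(self, hparams):
        super().__init__()
        # save_hyperparameters is the supported way to populate self.hparams
        self.save_hyperparameters(hparams)
        self.model = MLP(hparams)

    # Define the forward pass
    def forward(self, x):
        # Flatten each image to a vector of in_size features
        return self.model(x.view(x.size(0), -1))

    def training_step(self, batch, batch_idx):
        x, y = batch
        y_hat = self(x)
        loss = F.cross_entropy(y_hat, y)
        # Lightning needs the loss returned to run backpropagation
        return {'loss': loss}

    # Define the validation step (left as a placeholder)
    def validation_step(self, batch, batch_idx):
        pass
```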

```python
hparams = {"in_size": 28*28, "hidden_size": 128, "out_size": 10}
model = MLPIL(hparams)
model
```

```
MLPIL(
  (model): MLP(
    (model): Sequential(
      (0): Linear(in_features=784, out_features=128, bias=True)
      (1): ReLU()
      (2): Linear(in_features=128, out_features=128, bias=True)
      (3): ReLU()
      (4): Linear(in_features=128, out_features=10, bias=True)
    )
  )
)
```
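Before wiring the model into a trainer, it can help to push a dummy batch through the network and check the shapes. The sketch below rebuilds the same layer stack with plain PyTorch so that it runs on its own. The random batch stands in for real MNIST images, and the names `mlp`, `images`, and `n_params` are illustrative.

```python
import torch
import torch.nn as nn

# Stand-in for the MLP above, rebuilt here so the snippet is self-contained
hparams = {"in_size": 28 * 28, "hidden_size": 128, "out_size": 10}
mlp = nn.Sequential(
    nn.Linear(hparams["in_size"], hparams["hidden_size"]),
    nn.ReLU(),
    nn.Linear(hparams["hidden_size"], hparams["hidden_size"]),
    nn.ReLU(),
    nn.Linear(hparams["hidden_size"], hparams["out_size"]),
)

# Fake batch of 4 MNIST-sized grayscale images: (batch, channel, height, width)
images = torch.randn(4, 1, 28, 28)

# nn.Linear expects (batch, features), so flatten each image to 784 values
logits = mlp(images.view(images.size(0), -1))
print(logits.shape)  # torch.Size([4, 10])

# Weights plus biases over the three linear layers
n_params = sum(p.numel() for p in mlp.parameters())
print(n_params)  # 100480 + 16512 + 1290 = 118282
```

One row of 10 logits per image is what `F.cross_entropy` expects, and the parameter count shows how `hidden_size` drives the size of the model.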