# Super-charge hyperparameter search with Optuna
> Learn how to perform hyperparameter search using Optuna
- toc: true 
- badges: true
- comments: true
- author: Anthony Faustine
- show_tags: true
- categories: [Machine learning, Deep learning]
- image: images/post/search.jpg


## Introduction
Training a machine learning model usually involves choosing values for several hyperparameters. Hyperparameter search is therefore an integral part of building machine learning models: it consists of trying different sets of hyperparameter values to find the settings that give the best model performance. Deep neural networks in particular can have many hyperparameters, and finding the best combination in such a high-dimensional space can be a challenging task. Fortunately, different strategies and tools can simplify the process. This post shows how to use Optuna for hyperparameter search with [PyTorch](https://pytorch.org/) and the [PyTorch Lightning](https://github.com/PyTorchLightning/pytorch-lightning) framework.

To install these packages, run:

```bash
pip install -U optuna
pip install -U torch torchvision
pip install -U pytorch-lightning
```

### Optuna
[Optuna](https://optuna.org/) is an open-source hyperparameter optimization framework. The search space is written with ordinary Python conditionals, loops, and syntax, and Optuna automates the search for optimal hyperparameters within it. The library searches large spaces efficiently and prunes unpromising trials for faster results. It can also parallelize hyperparameter searches over multiple threads or processes without modifying code.
An Optuna optimization problem consists of three main building blocks: the **objective function**, the **trial**, and the **study**. Consider a simple optimization problem: *Suppose a rectangular garden is to be constructed using a rock wall as one side of the garden and wire fencing for the other three sides, as shown in the figure below. Given 500 m of wire fencing, determine the dimensions that would create a garden of maximum area. What is the maximum area?*

Let $x$ denote the length of the side of the garden perpendicular to the rock wall and $y$ the length of the side parallel to the rock wall. The area of the garden is then $A = x \cdot y$. We want to find the maximum possible area subject to the constraint that the total fencing is 500 m. The total amount of fencing used is $2x+y$, so the constraint equation and the resulting area are
\begin{align}
500 & = 2x +y \\
y  & = 500-2x\\
A(x) &= x \cdot (500-2x) =  500x - 2x^2
\end{align}

From the equation above, $A(x) = 500x - 2x^2$ is the **objective function**, the function to be optimized. To maximize it, we need to determine the optimization constraints. To construct a rectangular garden, both side lengths must be positive: $x>0$ and $y>0$. Since $500 = 2x + y$ and $y>0$, it follows that $x<250$. Therefore, we look for the maximum value of $A(x)$ for $x$ over the open interval $(0, 250)$.

An Optuna [**trial**](https://optuna.readthedocs.io/en/stable/reference/trial.html) corresponds to a single execution of the **objective function** and is internally instantiated upon each invocation of the function. To obtain the parameter values for each trial within the provided *constraints*, the trial's [**suggest**](https://optuna.readthedocs.io/en/stable/reference/trial.html) methods are used.

```python
trial.suggest_uniform('x', 0, 250)
```

We can now code the objective function to be optimized for our problem.

In [1]:
def garden_area(trial):
    # Sample the side perpendicular to the rock wall from [0, 250].
    x = trial.suggest_uniform('x', 0, 250)
    # Area A(x) = 500x - 2x^2 for the sampled x.
    return 500*x - 2*x**2

Once the objective function has been defined, the **study object** is used to start the optimization. An Optuna **trial** is thus a single call of the objective function, whereas a **study** is an optimization session made up of a set of trials. We can now create a study and start the optimization process.

In [6]:
import optuna
study = optuna.create_study(study_name="garden", direction="maximize")
study.optimize(garden_area, n_trials=20)

[I 2020-06-21 09:21:59,460] Finished trial#0 with value: 30192.883232182987 with parameters: {'x': 102.00960235427611}. Best is trial#0 with value: 30192.883232182987.
[I 2020-06-21 09:21:59,525] Finished trial#1 with value: 11665.34358605057 with parameters: {'x': 223.956193373506}. Best is trial#0 with value: 30192.883232182987.
[I 2020-06-21 09:21:59,587] Finished trial#2 with value: 29967.46342643536 with parameters: {'x': 150.32327559345987}. Best is trial#0 with value: 30192.883232182987.
[I 2020-06-21 09:21:59,670] Finished trial#3 with value: 30132.313125645393 with parameters: {'x': 148.63986965229094}. Best is trial#0 with value: 30192.883232182987.
[I 2020-06-21 09:21:59,731] Finished trial#4 with value: 26891.79238720805 with parameters: {'x': 171.68087195410956}. Best is trial#0 with value: 30192.883232182987.
[I 2020-06-21 09:21:59,791] Finished trial#5 with value: 25917.313521440778 with parameters

Once the study is completed, you can get the best parameters and the best value as follows:

In [7]:
study.best_params

{'x': 125.85085124003193}

In [8]:
study.best_value

31248.552104334674
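
As a sanity check, the garden problem can also be solved by hand. Setting the derivative of the objective function to zero gives the stationary point, and the negative second derivative confirms that it is a maximum:
\begin{align}
A'(x) &= 500 - 4x = 0 \quad\Rightarrow\quad x = 125 \\
A''(x) &= -4 < 0 \\
y &= 500 - 2 \cdot 125 = 250 \\
A(125) &= 500 \cdot 125 - 2 \cdot 125^2 = 62500 - 31250 = 31250
\end{align}

The best trial above, $x \approx 125.85$ with an area of about $31248.55$, is therefore within 2 square metres of the exact maximum of $31250$ after only 20 trials.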

## Hyperparameter search for a deep neural network


### MLP model

Suppose we want to build an MLP classifier to recognize handwritten digits using the MNIST dataset. We will first build a PyTorch MLP model with the following default parameters:
```python
hparams = {"in_size": 28*28, "hidden_size":128, "out_size":10, "layer_size":5, "dropout":0.2}
```

In [12]:
import torch
import torch.nn as nn
from torch.nn import functional as F

In [13]:
class MLP(nn.Module):
    def __init__(self, hparams):
        super().__init__()
        layers = [hparams['in_size'], hparams['hidden_size']]+[hparams['hidden_size']*2**i  for i in range(hparams['layer_size']-1)] 
        self.layers = []
        
        for i in range(len(layers)-1):
            layer = nn.Linear(layers[i], layers[i+1])
            self.layers.append(layer) 
            self.layers.append(nn.ReLU())
            if i!=0:
                self.layers.append(nn.Dropout(hparams['dropout']))
                
        out_layer = nn.Linear(layers[-1], hparams['out_size'])
        self.layers.append(out_layer)
        
        # Initialize the weights of every linear layer
        for layer in self.layers:
            if isinstance(layer, nn.Linear):
                nn.init.xavier_uniform_(layer.weight)
        self.mlp =  nn.Sequential(*self.layers)
        
        
    def forward(self, x):
        return self.mlp(x)

In [14]:
hparams = {"in_size": 28*28, "hidden_size":128, "out_size":10, "layer_size":5, "dropout":0.2}
model = MLP(hparams)
model

MLP(
  (mlp): Sequential(
    (0): Linear(in_features=784, out_features=128, bias=True)
    (1): ReLU()
    (2): Linear(in_features=128, out_features=128, bias=True)
    (3): ReLU()
    (4): Dropout(p=0.2, inplace=False)
    (5): Linear(in_features=128, out_features=256, bias=True)
    (6): ReLU()
    (7): Dropout(p=0.2, inplace=False)
    (8): Linear(in_features=256, out_features=512, bias=True)
    (9): ReLU()
    (10): Dropout(p=0.2, inplace=False)
    (11): Linear(in_features=512, out_features=1024, bias=True)
    (12): ReLU()
    (13): Dropout(p=0.2, inplace=False)
    (14): Linear(in_features=1024, out_features=10, bias=True)
  )
)
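
The layer widths in the printout above follow directly from the list comprehension in `MLP.__init__`: the first hidden layer has `hidden_size` units, the second repeats that width, and each later layer doubles it. The sketch below reproduces that arithmetic in plain Python and counts the weights and biases of the resulting linear layers. The helper names `layer_widths` and `count_linear_params` are made up for this illustration.

```python
def layer_widths(hparams):
    # Same expression as in MLP.__init__: input size, then hidden widths.
    hidden = hparams["hidden_size"]
    return [hparams["in_size"], hidden] + [
        hidden * 2 ** i for i in range(hparams["layer_size"] - 1)
    ]


def count_linear_params(hparams):
    widths = layer_widths(hparams) + [hparams["out_size"]]
    # Each linear layer holds fan_in * fan_out weights plus fan_out biases.
    return sum(
        fan_in * fan_out + fan_out
        for fan_in, fan_out in zip(widths[:-1], widths[1:])
    )


hparams = {"in_size": 28 * 28, "hidden_size": 128, "out_size": 10,
           "layer_size": 5, "dropout": 0.2}
print(layer_widths(hparams))         # [784, 128, 128, 256, 512, 1024]
print(count_linear_params(hparams))  # 817162
```

With `hidden_size` set to 256 and the other values unchanged, the same count comes to 3,043,338, so the choice of `hidden_size` and `layer_size` has a large effect on model size.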

### PyTorch Lightning model

The key question is how to pick these parameters. We will use Optuna to search for the parameters that give good performance. First, we create a PyTorch Lightning model, which provides a structure for organizing the fundamental components of any machine learning project. These components include the architecture or model, the data (train, validation, and test sets), model building (training and validation steps), and evaluation (test step). Since we have already defined our MLP, we go ahead and define the data that we will use for this problem.

In [45]:
import os
import pytorch_lightning as pl
import pytorch_lightning.metrics.functional as metrics
from torch.utils.data import DataLoader
from torchvision import transforms
from torchvision.datasets import MNIST

class MLPIL(pl.LightningModule):
    
    def __init__(self, hparams):
        super().__init__()
        self.hparams = hparams
        self.model = MLP(hparams)
    
    
    def forward(self, x):
        return self.model(x)
    
    def training_step(self, batch, batch_idx):
        x, y = batch
        logits = self(x.flatten(1, 3))
        loss = F.cross_entropy(logits, y)
        acc  = metrics.accuracy(torch.max(logits, 1)[1], y)
        
        logs = {'loss': loss, "tra_acc":acc}
        return {'loss': loss, 'log': logs}
    
    def validation_step(self, batch, batch_idx):
        x, y = batch
        logits = self(x.flatten(1, 3))
        loss = F.cross_entropy(logits, y)
        acc  = metrics.accuracy(torch.max(logits, 1)[1], y)
        
        logs = {'val_loss': loss, "val_acc":acc}
        return logs

    def validation_epoch_end(self, outputs):
        # OPTIONAL
        avg_loss = torch.stack([x['val_loss'] for x in outputs]).mean()
        avg_acc = torch.stack([x['val_acc'] for x in outputs]).mean()
        logs = {'avg_val_loss': avg_loss, "avg_val_acc":avg_acc}
        return {'val_loss': avg_loss, 'val_acc':avg_acc, 'log': logs}

    
    def train_dataloader(self):
        # REQUIRED
        return DataLoader(MNIST(os.getcwd(), train=True, download=True, 
                                transform=transforms.ToTensor()), 
                                batch_size=self.hparams["batch_size"])

    def val_dataloader(self):
        # OPTIONAL: validate on the held-out MNIST split rather than the training data
        return DataLoader(MNIST(os.getcwd(), train=False, download=True,
                                transform=transforms.ToTensor()),
                                batch_size=self.hparams["batch_size"])

    
    def configure_optimizers(self):
        
        optimizer =  torch.optim.SGD(self.model.parameters(), 
                                         lr=self.hparams['learning_rate'], 
                                         momentum=self.hparams['momentum'], 
                                         nesterov=self.hparams['nesterov'],
                                         weight_decay=self.hparams['weight_decay'])    
        
        return optimizer
         

### Define objective function

In [46]:
class DictLogger(pl.loggers.TensorBoardLogger):
    """PyTorch Lightning `dict` logger."""
    # see https://github.com/PyTorchLightning/pytorch-lightning/blob/50881c0b31/pytorch_lightning/logging/base.py

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.metrics = [] 

    def log_metrics(self, metrics, step=None):
        super().log_metrics(metrics, step=step)
        self.metrics.append(metrics)


In [47]:
## Prepare directory
from pathlib import Path
from optuna.integration import PyTorchLightningPruningCallback
DIR = Path(os.getcwd())
MODEL_DIR = DIR/ "MLP"
MODEL_DIR.mkdir(parents=True, exist_ok=True)
print(f"now run `tensorboard --logdir {MODEL_DIR}")

default_params = {"in_size": 28*28, "hidden_size":128, "out_size":10, 
           "layer_size":5, "dropout":0.2, "batch_size":32,
          'learning_rate':1e-3, 'momentum':0.9, 'nesterov': True,
          'weight_decay':1e-5,
          'epochs':50}

def objective(trial=None):
    
    if trial is not None:
        # Sample one value per hyperparameter and overwrite the defaults.
        default_params.update({
            'learning_rate': trial.suggest_loguniform("learning_rate", 1e-5, 1e-2),
            'weight_decay': trial.suggest_loguniform('weight_decay', 1e-5, 1e-1),
            'hidden_size': trial.suggest_categorical("hidden_size", [8*2**i for i in range(6)]),
            'batch_size': trial.suggest_categorical("batch_size", [16, 32, 64, 128]),
            'layer_size': trial.suggest_int("layer_size", 2, 5),
            'dropout': trial.suggest_float('dropout', 0.0, 1.0),
            'momentum': trial.suggest_float('momentum', 0.0, 1.0),
            'nesterov': trial.suggest_categorical("nesterov", [False, True]),
        })
        # Let Optuna stop unpromising trials based on the validation accuracy.
        early_stopping = PyTorchLightningPruningCallback(trial, monitor='val_acc')
        checkpoint_callback = pl.callbacks.ModelCheckpoint(
            os.path.join(MODEL_DIR, "trial_{}".format(trial.number)), monitor="val_acc"
        )
        logger = DictLogger(MODEL_DIR, version=trial.number)
    else:
        # Without a trial, train once with the default parameters.
        early_stopping = pl.callbacks.EarlyStopping(monitor='val_acc', min_delta=1e-4, patience=20, mode="max")
        checkpoint_callback = True  # fall back to Lightning's default checkpointing
        logger = DictLogger(MODEL_DIR)
    trainer = pl.Trainer(
                    logger = logger,
                    checkpoint_callback=checkpoint_callback,
                    max_epochs=default_params['epochs'],
                    gpus=[0] if torch.cuda.is_available() else None,
                    early_stop_callback=early_stopping,
                    accumulate_grad_batches=1,
                     benchmark=True)
    
    model = MLPIL(default_params)
    trainer.fit(model)
    if trial is not None:
        return logger.metrics[-1]['val_acc']

    

now run `tensorboard --logdir /Users/sambaiga/Documents/sambaiga/_notebooks/MLP
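
The search space in `objective` mixes several kinds of ranges. The hidden-size choices `[8*2**i for i in range(6)]` expand to 8, 16, 32, 64, 128, and 256, and the learning rate and weight decay are sampled log-uniformly so that each order of magnitude is equally likely. The sketch below illustrates log-uniform sampling with the standard library only. It is not Optuna's implementation, and the helper name `sample_loguniform` is made up for this example.

```python
import math
import random


def sample_loguniform(low, high, rng):
    # Draw uniformly between log(low) and log(high), then map back.
    return math.exp(rng.uniform(math.log(low), math.log(high)))


rng = random.Random(0)
hidden_sizes = [8 * 2 ** i for i in range(6)]
learning_rates = [sample_loguniform(1e-5, 1e-2, rng) for _ in range(10000)]

# The range 1e-5 to 1e-2 spans three decades, so roughly a third of the
# draws should fall below 1e-4.
below_1e4 = sum(lr < 1e-4 for lr in learning_rates) / len(learning_rates)
print(hidden_sizes)
print(round(below_1e4, 2))
```

A plain uniform draw over the same range would put about 99% of the samples above `1e-4`, which is why a log scale is the usual choice for learning rates and weight decay.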


In [48]:
import optuna
def run_study(num_trials=2):
    study = optuna.create_study(direction='maximize')
    study.optimize(objective, n_trials=num_trials)
    return study

In [49]:
study = run_study(num_trials=2)

GPU available: False, used: False
TPU available: False, using: 0 TPU cores

  | Name  | Type | Params
-------------------------------
0 | model | MLP  | 3 M   


[W 2020-06-21 11:52:45,679] Setting status of trial#0 as TrialState.FAIL because of the following error: KeyError('val_acc')
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/optuna/study.py", line 732, in _run_trial
    result = func(trial)
  File "<ipython-input-47-7c6150fb2720>", line 66, in objective
    return logger.metrics[-1]['val_acc']
KeyError: 'val_acc'





KeyError: 'val_acc'
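
The `KeyError` above is most likely caused by what ends up in `logger.metrics`. `validation_epoch_end` logs its averages under the keys `avg_val_loss` and `avg_val_acc`, and `training_step` logs `loss` and `tra_acc`, so no logged dictionary has a `val_acc` key, and the last entry may come from a training step anyway. A minimal sketch of a more defensive lookup follows. The helper name `last_logged` and the sample values are hypothetical, and it assumes each logged entry is a plain `dict`, as collected by `DictLogger`.

```python
def last_logged(metrics, key):
    """Return the most recent value logged under `key`."""
    for entry in reversed(metrics):
        if key in entry:
            return entry[key]
    available = sorted({k for m in metrics for k in m})
    raise KeyError(f"{key!r} was never logged; available keys: {available}")


# Made-up log history in the shape DictLogger collects.
history = [
    {"loss": 2.31, "tra_acc": 0.10},
    {"avg_val_loss": 2.20, "avg_val_acc": 0.15},
    {"loss": 1.95, "tra_acc": 0.32},
]
print(last_logged(history, "avg_val_acc"))  # 0.15
```

In `objective`, `return last_logged(logger.metrics, 'avg_val_acc')` could then replace the failing line. Alternatively, the key logged in `validation_epoch_end` could be renamed to `val_acc`.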

In [36]:
model = MLPIL(default_params)

In [39]:
x, y = next(iter(model.train_dataloader()))
x.flatten(1, 3).shape

torch.Size([32, 784])