# Super-charge Deep learning hyper-parameter search with Optuna
>  Learn how to perform deep learning hyper-parameter search using PyTorch Lightning and Optuna
- toc: False
- badges: true
- comments: true
- categories: [Deep learning, Machine learning]
- image:  images/post/search.jpg
- author: Anthony Faustine


## Introduction
Training machine learning models usually involves choosing various hyperparameter settings. Performing a hyperparameter search is therefore an integral part of building machine learning models. It consists of trying different sets of hyperparameters to find the settings that give the best model performance. Deep neural networks in particular can involve many hyperparameters, and finding the best set in such a high-dimensional space can be a challenging task. Fortunately, different strategies and tools can be used to simplify the process. This post will guide you through using Optuna for hyper-parameter search with [PyTorch](https://pytorch.org/) and the [PyTorch Lightning](https://github.com/PyTorchLightning/pytorch-lightning) framework.
The notebook with all the code for this post can be found on this [colab link](https://colab.research.google.com/drive/1QVST56bq3zNyIYx9595HVcq5fwFNH44x?usp=sharing).

### Optuna
[Optuna](https://optuna.org/) is an open-source hyperparameter optimization framework. It automates the search for optimal hyperparameters, with search spaces defined using ordinary Python conditionals, loops, and syntax. The library offers efficient hyper-parameter search in large spaces while pruning unpromising trials for faster results. It is also possible to run a hyperparameter search over multiple processes without modifying code.
For a brief introduction to Optuna, you can watch this video.

> youtube: https://youtu.be/J_aymk4YXhg

An Optuna optimization problem consists of three main building blocks: **objective function**, **trial**, and **study**. Let us consider a simple optimization problem: *Suppose a rectangular garden is to be constructed using a rock wall as one side of the garden and wire fencing for the other three sides, as shown in the figure below (taken from this [link](https://math.libretexts.org/Bookshelves/Calculus/Map%3A_Calculus_-_Early_Transcendentals_(Stewart)/04%3A_Applications_of_Differentiation/4.07%3A_Optimization_Problems)). Given 500m of wire fencing, determine the dimensions that would create a garden of maximum area. What is the maximum area?*
![](my_icons/optuna_one.png)

Let $x$ denote the length of the garden's side perpendicular to the rock wall, and $y$ the length of the side parallel to the rock wall. Then the area of the garden is $A = x \cdot y$. We want to find the maximum possible area subject to the constraint that the total fencing is 500m. The total amount of fencing used is $2x+y$. Therefore, the constraint equation is
$$
\begin{aligned}
500 & = 2x + y \\
y & = 500 - 2x \\
A(x) &= x \cdot (500-2x) = 500x - 2x^2
\end{aligned}
$$

From the equations above, $A(x) = 500x - 2x^2$ is the **objective function**, the function to be optimized. To maximize it, we also need the constraints on $x$. Both sides of a rectangular garden must have positive length, so $x>0$ and $y>0$. Since $500 = 2x + y$ and $y>0$, it follows that $x<250$. Therefore, we look for the maximum value of $A(x)$ for $x$ in the open interval $(0, 250)$.
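
As a sanity check, this particular problem can also be solved by hand. A short derivation, using only the objective $A(x)$ defined above, gives the exact optimum that a numerical search should approach:

$$
\begin{aligned}
A'(x) &= 500 - 4x = 0 \implies x = 125 \\
A''(x) &= -4 < 0 \quad \text{(so this is a maximum)} \\
y &= 500 - 2 \cdot 125 = 250 \\
A(125) &= 500 \cdot 125 - 2 \cdot 125^2 = 31250
\end{aligned}
$$

Any search method applied to this objective should therefore report values close to $x = 125$ and an area close to $31250\ \text{m}^2$.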

An Optuna [**trial**](https://optuna.readthedocs.io/en/stable/reference/trial.html) corresponds to a single execution of the **objective function** and is internally instantiated upon each invocation of the function.
To obtain the parameter values for each trial within the provided *constraints*, the [**suggest**](https://optuna.readthedocs.io/en/stable/reference/trial.html) methods are used.
```python
trial.suggest_uniform('x', 0, 250)
```

We can now code the objective function to be optimized for our problem.
```python
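def garden_area(trial):
    # Sample the side perpendicular to the wall from the feasible range (0, 250)
    x = trial.suggest_uniform('x', 0, 250)
    # Area A(x) = x * (500 - 2x)
    return 500*x - 2*x**2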
```

Once the objective function has been defined, the **study** object is used to start the optimization. A study is an optimization session, that is, a set of trials. Optuna provides different [sampler strategies](https://optuna.readthedocs.io/en/latest/reference/samplers.html) such as the random sampler and the [Tree-structured Parzen Estimator (TPE)](https://papers.nips.cc/paper/4443-algorithms-for-hyper-parameter-optimization.pdf) sampler. A sampler is responsible for determining the parameter values to be evaluated in each trial. By default, Optuna uses the [TPE](https://papers.nips.cc/paper/4443-algorithms-for-hyper-parameter-optimization.pdf) sampler, which is a form of Bayesian optimization. TPE typically provides a more efficient search than random sampling by favoring points close to past good results. It is also possible to add a custom sampler, as described in this [link](https://optuna.readthedocs.io/en/latest/tutorial/sampler.html#overview-of-sampler).
Let us create a study and start the optimization process.

In [1]:
##collapse
import optuna
def garden_area(trial):
    x = trial.suggest_uniform('x', 0, 250)
    return 500*x - 2*x**2

study = optuna.create_study(study_name="garden", direction="maximize")
study.optimize(garden_area, n_trials=20)
print(study.best_params)
print(study.best_value)

[I 2020-06-22 22:52:09,728] Finished trial#0 with value: 10151.49608988398 with parameters: {'x': 22.290448569483033}. Best is trial#0 with value: 10151.49608988398.
[I 2020-06-22 22:52:09,773] Finished trial#1 with value: 17318.662620758565 with parameters: {'x': 208.46058165158396}. Best is trial#1 with value: 17318.662620758565.
[I 2020-06-22 22:52:09,823] Finished trial#2 with value: 27521.441691493033 with parameters: {'x': 168.17729906158428}. Best is trial#2 with value: 27521.441691493033.
[I 2020-06-22 22:52:09,895] Finished trial#3 with value: 13298.982948674558 with parameters: {'x': 219.7391604652623}. Best is trial#2 with value: 27521.441691493033.
[I 2020-06-22 22:52:09,963] Finished trial#4 with value: 28781.742638896496 with parameters: {'x': 89.86983232958102}. Best is trial#4 with value: 28781.742638896496.
[I 2020-06-22 22:52:10,029] Finished trial#5 with value: 24178.31616974594 with parameters

{'x': 126.57020554529808}
31245.068909091035


Once the study is completed, ```study.best_params``` returns the best parameters and ```study.best_value``` returns the best objective value.
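
To build intuition for what a study does, the loop below sketches the same garden problem with plain random search in pure Python. It is only an illustration under simplifying assumptions: `area` and `random_search` are hypothetical names, not part of Optuna, and Optuna's default TPE sampler picks candidate points more cleverly than this.

```python
import random


def area(x):
    """Objective from the garden problem: A(x) = 500x - 2x^2."""
    return 500 * x - 2 * x**2


def random_search(objective, low, high, n_trials, seed=0):
    """Hypothetical stand-in for a study: sample, evaluate, keep the best."""
    rng = random.Random(seed)
    best_x, best_value = None, float("-inf")
    for _ in range(n_trials):
        x = rng.uniform(low, high)   # one "trial": draw a candidate value
        value = objective(x)         # evaluate the objective for this trial
        if value > best_value:       # corresponds to direction="maximize"
            best_x, best_value = x, value
    return best_x, best_value


best_x, best_value = random_search(area, 0, 250, n_trials=200)
print(best_x, best_value)
```

Each loop iteration plays the role of a trial, and the pair of best values kept across iterations corresponds to what `study.best_params` and `study.best_value` report.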

## Hyper-parameter search for a deep neural network

Suppose we want to build an MLP classifier to recognize handwritten digits using the MNIST dataset. We will first build a PyTorch MLP model with the following default parameters:
```python
hparams = {"in_size": 28*28, "hidden_size":128, "out_size":10, "layer_size":5, "dropout":0.2}
```

In [2]:
#collapse
import torch
import torch.nn as nn
from torch.nn import functional as F

class MLP(nn.Module):
    def __init__(self, hparams):
        super().__init__()
        layers = [hparams['in_size'], hparams['hidden_size']]+[hparams['hidden_size']*2**i  for i in range(hparams['layer_size']-1)] 
        self.layers = []
        
        for i in range(len(layers)-1):
            layer = nn.Linear(layers[i], layers[i+1])
            self.layers.append(layer) 
            self.layers.append(nn.ReLU())
            if i!=0:
                self.layers.append(nn.Dropout(hparams['dropout']))
                
        out_layer = nn.Linear(layers[-1], hparams['out_size'])
        self.layers.append(out_layer)
        
        # Initialize the weights of every linear layer with Xavier uniform initialization
        for layer in self.layers:
            if isinstance(layer, nn.Linear):
                nn.init.xavier_uniform_(layer.weight)
        self.mlp =  nn.Sequential(*self.layers)
        
        
    def forward(self, x):
        return self.mlp(x)

In [3]:
#collapse
hparams = {"in_size": 28*28, "hidden_size":128, "out_size":10, "layer_size":5, "dropout":0.2}
model = MLP(hparams)
model

MLP(
  (mlp): Sequential(
    (0): Linear(in_features=784, out_features=128, bias=True)
    (1): ReLU()
    (2): Linear(in_features=128, out_features=128, bias=True)
    (3): ReLU()
    (4): Dropout(p=0.2, inplace=False)
    (5): Linear(in_features=128, out_features=256, bias=True)
    (6): ReLU()
    (7): Dropout(p=0.2, inplace=False)
    (8): Linear(in_features=256, out_features=512, bias=True)
    (9): ReLU()
    (10): Dropout(p=0.2, inplace=False)
    (11): Linear(in_features=512, out_features=1024, bias=True)
    (12): ReLU()
    (13): Dropout(p=0.2, inplace=False)
    (14): Linear(in_features=1024, out_features=10, bias=True)
  )
)

For the above MLP model, we need to specify the following parameters: *hidden size, dropout, and number of linear layers*. The critical question is how to pick them. We will use Optuna to search for parameters that give good performance. First, we will create a PyTorch Lightning module, which provides the structure for organizing the fundamental components of any machine learning project. These components include the data, the architecture or model, the optimizer, the loss function, and the training and evaluation steps. Since we have already defined our MLP, we can go ahead and create the PyTorch Lightning module.
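
The way these hyperparameters shape the network can be checked without building any PyTorch modules. The helper below is a small sketch (the name `layer_widths` is hypothetical and not part of the post's code) that reproduces the width computation used inside `MLP.__init__`:

```python
def layer_widths(hparams):
    """Return the sequence of feature sizes feeding the linear layers.

    Mirrors the list built in MLP.__init__: the input size, then hidden_size,
    then hidden_size doubled at each further layer.
    """
    hidden = hparams["hidden_size"]
    widths = [hparams["in_size"], hidden]
    widths += [hidden * 2**i for i in range(hparams["layer_size"] - 1)]
    return widths


hparams = {"in_size": 28 * 28, "hidden_size": 128, "out_size": 10,
           "layer_size": 5, "dropout": 0.2}
print(layer_widths(hparams))  # [784, 128, 128, 256, 512, 1024]
```

With `layer_size=5` the widest layer has 1024 units, which is why the number of parameters grows quickly with both `hidden_size` and `layer_size`.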

In [4]:
#collapse
from torch.utils.data import DataLoader, random_split
from torchvision.datasets import MNIST
import os
from torchvision import datasets, transforms
import pytorch_lightning as pl
import  pytorch_lightning.metrics.functional as metrics

class MLPIL(pl.LightningModule):
    
    def __init__(self, hparams):
        super().__init__()
        self.hparams = hparams
        self.model = MLP(hparams)
    
    
    def forward(self, x):
        return self.model(x)
    
    def training_step(self, batch, batch_idx):
        x, y = batch
        logits = self(x.reshape(x.size(0), -1))
        loss = F.cross_entropy(logits, y)
        acc  = metrics.accuracy(torch.max(logits, 1)[1], y)
        
        logs = {'loss': loss, "tra_acc":acc}
        return {'loss': loss, 'log': logs}
    
    def validation_step(self, batch, batch_idx):
        x, y = batch
        logits = self(x.reshape(x.size(0), -1))
        loss = F.cross_entropy(logits, y)
        acc  = metrics.accuracy(torch.max(logits, 1)[1], y)
        
        logs = {'val_loss': loss, "val_acc":acc}
        return logs

    def validation_epoch_end(self, outputs):
        # OPTIONAL
        avg_loss = torch.stack([x['val_loss'] for x in outputs]).mean()
        avg_acc = torch.stack([x['val_acc'] for x in outputs]).mean()
        logs = {'val_loss': avg_loss, "val_acc":avg_acc}
        return {'log':logs}

    
    def train_dataloader(self):
        # REQUIRED
        return DataLoader(MNIST(os.getcwd(), train=True, download=True, 
                                transform=transforms.ToTensor()), 
                                batch_size=self.hparams["batch_size"], num_workers=4)

    def val_dataloader(self):
        # OPTIONAL
        return DataLoader(MNIST(os.getcwd(), train=False, download=True, 
                                transform=transforms.ToTensor()),
                                batch_size=self.hparams["batch_size"], num_workers=4)

    
    def configure_optimizers(self):
        
        optimizer =  torch.optim.SGD(self.model.parameters(), 
                                         lr=self.hparams['learning_rate'], 
                                         momentum=self.hparams['momentum'], 
                                         nesterov=self.hparams['nesterov'],
                                         weight_decay=self.hparams['weight_decay'])    
        
        return optimizer
         


To learn the parameters of the MLP, we will use the Stochastic Gradient Descent (SGD) optimizer. SGD has several hyper-parameters of its own, such as the learning rate, which we can also optimize.
```python
optimizer = torch.optim.SGD(self.model.parameters(),
                            lr=self.hparams['learning_rate'],
                            momentum=self.hparams['momentum'],
                            nesterov=self.hparams['nesterov'],
                            weight_decay=self.hparams['weight_decay'])
```
Thus the SGD optimizer adds four additional hyper-parameters. We can also treat the batch size as a hyper-parameter to optimize. This gives the following set of parameters to optimize.

```python
default_params = {"in_size": 28*28, "hidden_size":128, "out_size":10, 
 "layer_size":5, "dropout":0.2, "batch_size":32,
 'learning_rate':1e-3, 'momentum':0.9, 'nesterov': True,
 'weight_decay':1e-5,
 'epochs':2}
```
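
To make the role of the four optimizer hyper-parameters concrete, here is a minimal pure-Python sketch of a single SGD update on one scalar parameter. It follows the update rule documented for `torch.optim.SGD` with dampening left at zero; the function name `sgd_step` is hypothetical, and the real optimizer operates on tensors rather than floats.

```python
def sgd_step(param, grad, buf, learning_rate, momentum, nesterov, weight_decay):
    """One SGD update for a scalar parameter; returns (new_param, new_buf)."""
    # Weight decay adds an L2 penalty gradient proportional to the parameter.
    d_p = grad + weight_decay * param
    # The momentum buffer keeps a decaying accumulation of past gradients.
    buf = momentum * buf + d_p
    # Nesterov momentum adds a look-ahead term along the buffer direction.
    update = d_p + momentum * buf if nesterov else buf
    return param - learning_rate * update, buf


# One step from param=1.0 with gradient 0.5 and an empty momentum buffer.
p_plain, _ = sgd_step(1.0, 0.5, 0.0, learning_rate=0.1, momentum=0.9,
                      nesterov=False, weight_decay=0.0)
p_nest, _ = sgd_step(1.0, 0.5, 0.0, learning_rate=0.1, momentum=0.9,
                     nesterov=True, weight_decay=0.0)
print(p_plain, p_nest)
```

On this first step the Nesterov variant moves further than plain momentum because it adds `momentum * buf` on top of the current gradient.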

### Defining the hyperparameters and objective function to be optimized
Now that we know all the parameters we want to optimize, we will use Optuna's **suggest** methods to define a search space for each hyperparameter to tune. Optuna supports a variety of suggest methods, which can be used to optimize floats, integers, or discrete categorical values. Numerical values such as the learning rate can be sampled on a logarithmic scale.


In [20]:
def get_search_space(trial, default_params):
    # Work on a copy so the defaults are not modified between trials
    params = dict(default_params)
    # Learning rate and weight decay are sampled on a logarithmic scale
    params['learning_rate'] = trial.suggest_loguniform("learning_rate", 1e-4, 1e-3)
    params['weight_decay'] = trial.suggest_loguniform('weight_decay', 1e-5, 1e-2)
    # Width of the first hidden layer: 8, 16, ..., 256
    params['hidden_size'] = trial.suggest_categorical("hidden_size", [8*2**i for i in range(6)])
    # Batch size used by the train and validation dataloaders
    params['batch_size'] = trial.suggest_categorical("batch_size", [16, 32, 64, 128])
    # Number of hidden layers and dropout probability
    params['layer_size'] = trial.suggest_int("layer_size", 2, 5)
    params['dropout'] = trial.suggest_float('dropout', 0.1, .5)
    # SGD momentum and whether to use Nesterov momentum
    params['momentum'] = trial.suggest_float('momentum', 0.8, 1.0)
    params['nesterov'] = trial.suggest_categorical("nesterov", [False, True])
    return params
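
Sampling on a logarithmic scale, as done for the learning rate and weight decay above, means drawing the exponent uniformly rather than the value itself, so each order of magnitude is equally likely. The pure-Python sketch below illustrates the idea; `sample_loguniform` is a hypothetical helper written for illustration, not Optuna's implementation.

```python
import math
import random


def sample_loguniform(low, high, rng):
    """Draw a value whose logarithm is uniform on [log(low), log(high)]."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


rng = random.Random(0)
samples = [sample_loguniform(1e-5, 1e-2, rng) for _ in range(10_000)]

# About one third of the draws should land in each decade of [1e-5, 1e-2].
in_lowest_decade = sum(s < 1e-4 for s in samples) / len(samples)
print(round(in_lowest_decade, 2))
```

A plain uniform draw over the same range would place less than 1% of the samples below `1e-4`, which is why small learning rates and weight decays are usually searched on a log scale.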

To create the objective function, we use the PyTorch Lightning `Trainer` together with a logger based on the TensorBoard logger that also keeps the logged metrics in memory. After training, the objective returns the validation accuracy. Optuna uses this score to evaluate the hyperparameters and decide where to sample in upcoming trials.

In [21]:
#collapse
from pathlib import Path
from optuna.integration import PyTorchLightningPruningCallback
DIR = Path(os.getcwd())
MODEL_DIR = DIR/ "MLP"
MODEL_DIR.mkdir(parents=True, exist_ok=True)
print(f"now run `tensorboard --logdir {MODEL_DIR}")

class DictLogger(pl.loggers.TensorBoardLogger):
    """PyTorch Lightning `dict` logger."""
    # see https://github.com/PyTorchLightning/pytorch-lightning/blob/50881c0b31/pytorch_lightning/logging/base.py

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.metrics = [] 

    def log_metrics(self, metrics, step=None):
        super().log_metrics(metrics, step=step)
        self.metrics.append(metrics)

now run `tensorboard --logdir /Users/sambaiga/Documents/sambaiga/_notebooks/MLP


In addition to sampling strategies, Optuna provides a mechanism to automatically stop unpromising trials at early stages of training. This allows computing time to be spent on trials that show more potential. This feature is called [**pruning**](https://optuna.readthedocs.io/en/stable/tutorial/pruning.html), and it is a form of automated early stopping. The [PyTorchLightningPruningCallback](https://optuna.readthedocs.io/en/stable/reference/integration.html) integrates Optuna's pruning with PyTorch Lightning. Here we use a pruner based on the median stopping rule.

In [22]:
#collapse
default_params = {"in_size": 28*28, "hidden_size":128, "out_size":10, 
           "layer_size":5, "dropout":0.2, "batch_size":32,
          'learning_rate':1e-3, 'momentum':0.9, 'nesterov': True,
          'weight_decay':1e-5,
          'epochs':50}

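def objective(trial=None):

    if trial is not None:
        # Sample hyperparameters for this trial and let Optuna prune it if it underperforms
        hparams = get_search_space(trial, default_params)
        early_stopping = PyTorchLightningPruningCallback(trial, monitor='val_acc')
        checkpoint_callback = pl.callbacks.ModelCheckpoint(
            os.path.join(MODEL_DIR, "trial_{}".format(trial.number)), monitor="val_acc"
        )
        logger = DictLogger(MODEL_DIR, version=trial.number)
    else:
        # No trial given: train once with the default hyperparameters and standard early stopping
        hparams = default_params
        early_stopping = pl.callbacks.EarlyStopping(monitor='val_acc', min_delta=1e-4, patience=20, mode="max")
        # "default" is an arbitrary checkpoint folder name chosen for this branch
        checkpoint_callback = pl.callbacks.ModelCheckpoint(
            os.path.join(MODEL_DIR, "default"), monitor="val_acc"
        )
        logger = DictLogger(MODEL_DIR)
    trainer = pl.Trainer(
        logger=logger,
        checkpoint_callback=checkpoint_callback,
        max_epochs=10,
        gpus=1 if torch.cuda.is_available() else None,
        early_stop_callback=early_stopping,
    )

    model = MLPIL(hparams)
    trainer.fit(model)
    # Use the most recently logged validation accuracy as the objective value
    if trial is not None:
        return logger.metrics[-1]['val_acc']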


To start the optimization, we create a study object and pass the objective function to its `optimize()` method as follows.

In [25]:
##collapse
import optuna
def run_study(num_trials=10):
    # Activate the pruning feature: `MedianPruner` stops unpromising trials early,
    # using the median stopping rule as the pruning condition
    pruner = optuna.pruners.MedianPruner()
    study = optuna.create_study(pruner=pruner, direction='maximize')
    study.optimize(objective, n_trials=num_trials)
    print("Number of finished trials: {}".format(len(study.trials)))
    print("Best trial:")
    trial = study.best_trial
    print("  Value: {}".format(trial.value))
    print("  Params: ")
    for key, value in trial.params.items():
        print("    {}: {}".format(key, value))
    return study

To turn on the pruning feature, we have to set up a pruning condition, which periodically monitors the intermediate objective values. Several [pruning conditions](https://optuna.readthedocs.io/en/latest/reference/pruners.html), such as [Hyperband](http://www.jmlr.org/papers/volume18/16-558/16-558.pdf) and [Successive Halving](https://arxiv.org/abs/1502.07943), exist, as described in the Optuna documentation. For this example we use ```MedianPruner()```, which prunes a trial if its best intermediate result is worse than the median of the intermediate results of previous trials at the same step.
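
The median stopping rule can be sketched in a few lines of pure Python. This is a simplified illustration for a maximized metric, not Optuna's actual `MedianPruner` (which also offers warm-up options); the function `should_prune` and the sample numbers are hypothetical.

```python
from statistics import median


def should_prune(current_history, previous_histories, step):
    """Median stopping rule for a metric that is being maximized.

    current_history: intermediate values of the running trial, one per step.
    previous_histories: intermediate values of earlier trials.
    step: index of the step being checked.
    """
    # Only compare against earlier trials that reached this step.
    peers = [h[step] for h in previous_histories if len(h) > step]
    if not peers:
        return False  # nothing to compare with yet
    best_so_far = max(current_history[: step + 1])
    return best_so_far < median(peers)


# Validation accuracy per epoch for three earlier trials (made-up numbers).
previous = [[0.60, 0.75, 0.85], [0.55, 0.70, 0.80], [0.65, 0.80, 0.90]]
print(should_prune([0.40, 0.50], previous, step=1))  # True: 0.50 < median 0.75
print(should_prune([0.70, 0.78], previous, step=1))  # False: 0.78 >= 0.75
```

A trial that is pruned stops training early, so its remaining epochs are never paid for.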

In [26]:
#collapse
study = run_study(num_trials=10)

GPU available: False, used: False
TPU available: False, using: 0 TPU cores

  | Name  | Type | Params
-------------------------------
0 | model | MLP  | 817 K 


[I 2020-06-22 23:36:32,489] Finished trial#0 with value: 0.763079047203064 with parameters: {'learning_rate': 0.0002781761413312715, 'weight_decay': 0.00026786709389757784, 'hidden_size': 128, 'batch_size': 16, 'layer_size': 5, 'dropout': 0.4475426537245917, 'momentum': 0.8343662356211846, 'nesterov': True}. Best is trial#0 with value: 0.763079047203064.
GPU available: False, used: False
TPU available: False, using: 0 TPU cores

  | Name  | Type | Params
-------------------------------
0 | model | MLP  | 286 K 


[I 2020-06-22 23:38:10,138] Finished trial#1 with value: 0.8695088028907776 with parameters: {'learning_rate': 0.00019459451111337528, 'weight_decay': 0.0008316437322277936, 'hidden_size': 128, 'batch_size': 16, 'layer_size': 4, 'dropout': 0.30811559721491616, 'momentum': 0.8792675991877269, 'nesterov': True}. Best is trial#1 with value: 0.8695088028907776.
GPU available: False, used: False
TPU available: False, using: 0 TPU cores

  | Name  | Type | Params
-------------------------------
0 | model | MLP  | 6 K   





[I 2020-06-22 23:39:32,721] Finished trial#2 with value: 0.5426318049430847 with parameters: {'learning_rate': 0.00031379768467785276, 'weight_decay': 0.00024206467053167096, 'hidden_size': 8, 'batch_size': 128, 'layer_size': 3, 'dropout': 0.4051284289161241, 'momentum': 0.8700648073603755, 'nesterov': True}. Best is trial#1 with value: 0.8695088028907776.
GPU available: False, used: False
TPU available: False, using: 0 TPU cores

  | Name  | Type | Params
-------------------------------
0 | model | MLP  | 16 K  


[I 2020-06-22 23:41:03,403] Finished trial#3 with value: 0.7521964907646179 with parameters: {'learning_rate': 0.0009969827342691411, 'weight_decay': 0.00191834042067662, 'hidden_size': 16, 'batch_size': 64, 'layer_size': 4, 'dropout': 0.4199071669462878, 'momentum': 0.8583730620569984, 'nesterov': False}. Best is trial#1 with value: 0.8695088028907776.
GPU available: False, used: False
TPU available: False, using: 0 TPU cores

  | Name  | Type | Params
-------------------------------
0 | model | MLP  | 16 K  


[I 2020-06-22 23:42:18,938] Finished trial#4 with value: 0.9208266735076904 with parameters: {'learning_rate': 0.0003222124173376937, 'weight_decay': 0.0002461730007892341, 'hidden_size': 16, 'batch_size': 128, 'layer_size': 4, 'dropout': 0.27574042289278133, 'momentum': 0.9910653091313444, 'nesterov': True}. Best is trial#4 with value: 0.9208266735076904.
GPU available: False, used: False
TPU available: False, using: 0 TPU cores






  | Name  | Type | Params
-------------------------------
0 | model | MLP  | 3 M   


[I 2020-06-22 23:47:26,157] Finished trial#5 with value: 0.9458865523338318 with parameters: {'learning_rate': 0.0009011149211580522, 'weight_decay': 0.008235695017360493, 'hidden_size': 256, 'batch_size': 16, 'layer_size': 5, 'dropout': 0.13696254402480412, 'momentum': 0.9454761908726577, 'nesterov': True}. Best is trial#5 with value: 0.9458865523338318.





GPU available: False, used: False
TPU available: False, using: 0 TPU cores

  | Name  | Type | Params
-------------------------------
0 | model | MLP  | 817 K 


[I 2020-06-22 23:49:29,151] Finished trial#6 with value: 0.7892372012138367 with parameters: {'learning_rate': 0.00011431229408656762, 'weight_decay': 9.658630631397371e-05, 'hidden_size': 128, 'batch_size': 16, 'layer_size': 5, 'dropout': 0.4476176051738697, 'momentum': 0.9355278762575644, 'nesterov': True}. Best is trial#5 with value: 0.9458865523338318.
GPU available: False, used: False
TPU available: False, using: 0 TPU cores

  | Name  | Type | Params
-------------------------------
0 | model | MLP  | 118 K 


[I 2020-06-22 23:50:47,697] Finished trial#7 with value: 0.8660143613815308 with parameters: {'learning_rate': 0.00011985939874632534, 'weight_decay': 0.00013574996304822177, 'hidden_size': 128, 'batch_size': 128, 'layer_size': 2, 'dropout': 0.43912042347804847, 'momentum': 0.8847775912911807, 'nesterov': False}. Best is trial#5 with value: 0.9458865523338318.
GPU available: False, used: False
TPU available: False, using: 0 TPU cores

  | Name  | Type | Params
-------------------------------
0 | model | MLP  | 817 K 





[I 2020-06-22 23:52:38,352] Finished trial#8 with value: 0.7717651724815369 with parameters: {'learning_rate': 0.00021577654778061675, 'weight_decay': 0.00010839860995588048, 'hidden_size': 128, 'batch_size': 64, 'layer_size': 5, 'dropout': 0.32940522064176114, 'momentum': 0.8292747404960318, 'nesterov': False}. Best is trial#5 with value: 0.9458865523338318.
GPU available: False, used: False
TPU available: False, using: 0 TPU cores

  | Name  | Type | Params
-------------------------------
0 | model | MLP  | 6 K   


[I 2020-06-22 23:53:47,373] Finished trial#9 with value: 0.5784744620323181 with parameters: {'learning_rate': 0.00034234441557670246, 'weight_decay': 7.769001540971277e-05, 'hidden_size': 8, 'batch_size': 64, 'layer_size': 3, 'dropout': 0.3206051108604173, 'momentum': 0.8964296642692322, 'nesterov': False}. Best is trial#5 with value: 0.9458865523338318.



Number of finished trials: 10
Best trial:
  Value: 0.9458865523338318
  Params: 
    learning_rate: 0.0009011149211580522
    weight_decay: 0.008235695017360493
    hidden_size: 256
    batch_size: 16
    layer_size: 5
    dropout: 0.13696254402480412
    momentum: 0.9454761908726577
    nesterov: True


After the study is completed, we can export the trials as a pandas DataFrame. This makes it easy to analyze the study, for example to draw a histogram of objective values or to export the trials as a CSV file.
```python
df = study.trials_dataframe()
```

## Visualize study

In [27]:
import pandas as pd
df = study.trials_dataframe()

In [28]:
df

| | number | value | datetime_start | datetime_complete | duration | params_batch_size | params_dropout | params_hidden_size | params_layer_size | params_learning_rate | params_momentum | params_nesterov | params_weight_decay | state |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0.763079 | 2020-06-22 23:33:43.417440 | 2020-06-22 23:36:32.488858 | 00:02:49.071418 | 16 | 0.447543 | 128 | 5 | 0.000278 | 0.834366 | True | 0.000268 | COMPLETE |
| 1 | 1 | 0.869509 | 2020-06-22 23:36:32.492923 | 2020-06-22 23:38:10.137740 | 00:01:37.644817 | 16 | 0.308116 | 128 | 4 | 0.000195 | 0.879268 | True | 0.000832 | COMPLETE |
| 2 | 2 | 0.542632 | 2020-06-22 23:38:10.139134 | 2020-06-22 23:39:32.720646 | 00:01:22.581512 | 128 | 0.405128 | 8 | 3 | 0.000314 | 0.870065 | True | 0.000242 | COMPLETE |
| 3 | 3 | 0.752196 | 2020-06-22 23:39:32.722697 | 2020-06-22 23:41:03.401500 | 00:01:30.678803 | 64 | 0.419907 | 16 | 4 | 0.000997 | 0.858373 | False | 0.001918 | COMPLETE |
| 4 | 4 | 0.920827 | 2020-06-22 23:41:03.405764 | 2020-06-22 23:42:18.937616 | 00:01:15.531852 | 128 | 0.27574 | 16 | 4 | 0.000322 | 0.991065 | True | 0.000246 | COMPLETE |
| 5 | 5 | 0.945887 | 2020-06-22 23:42:18.939922 | 2020-06-22 23:47:26.156283 | 00:05:07.216361 | 16 | 0.136963 | 256 | 5 | 0.000901 | 0.945476 | True | 0.008236 | COMPLETE |
| 6 | 6 | 0.789237 | 2020-06-22 23:47:26.158874 | 2020-06-22 23:49:29.150356 | 00:02:02.991482 | 16 | 0.447618 | 128 | 5 | 0.000114 | 0.935528 | True | 9.7e-05 | COMPLETE |
| 7 | 7 | 0.866014 | 2020-06-22 23:49:29.152156 | 2020-06-22 23:50:47.697022 | 00:01:18.544866 | 128 | 0.43912 | 128 | 2 | 0.00012 | 0.884778 | False | 0.000136 | COMPLETE |
| 8 | 8 | 0.771765 | 2020-06-22 23:50:47.699384 | 2020-06-22 23:52:38.352413 | 00:01:50.653029 | 64 | 0.329405 | 128 | 5 | 0.000216 | 0.829275 | False | 0.000108 | COMPLETE |
| 9 | 9 | 0.578474 | 2020-06-22 23:52:38.354018 | 2020-06-22 23:53:47.372384 | 00:01:09.018366 | 64 | 0.320605 | 8 | 3 | 0.000342 | 0.89643 | False | 7.8e-05 | COMPLETE |


In [29]:
study.best_params

{'learning_rate': 0.0009011149211580522,
 'weight_decay': 0.008235695017360493,
 'hidden_size': 256,
 'batch_size': 16,
 'layer_size': 5,
 'dropout': 0.13696254402480412,
 'momentum': 0.9454761908726577,
 'nesterov': True}

In [30]:
optuna.visualization.plot_contour(study)

In [31]:
optuna.visualization.plot_optimization_history(study)

In [32]:
optuna.visualization.plot_param_importances(study)


plot_param_importances is experimental (supported from v1.5.0). The interface can change in the future.


get_param_importances is experimental (supported from v1.3.0). The interface can change in the future.


MeanDecreaseImpurityImportanceEvaluator is experimental (supported from v1.5.0). The interface can change in the future.



In [33]:
import shutil
shutil.rmtree(MODEL_DIR)