<a href="https://colab.research.google.com/github/rahiakela/deep-learning-research-and-practice/blob/main/inside-deep-learing/05-modern-training-techniques/04_hyperparameter_optimization_with_optuna.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

##Hyperparameter optimization with Optuna

The improvements so far have focused on optimizing when we have gradients.
But hyperparameters are things we would like to optimize for which we do not have
any gradients, such as the initial learning rate $\eta$ to use and the value of the momentum
term $\mu$. 

We would also like to optimize the architecture of our networks: should we use two layers or three? How about the number of neurons in each hidden layer?

The first hyperparameter tuning method most people learn in machine learning
is called grid search. 

While valuable, grid search works well only for optimizing one or two variables at a time due to its exponential cost as more variables are added.
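
To make the exponential cost concrete, here is a small sketch that counts grid-search configurations. The candidate values are made up for illustration and are not taken from this notebook.

```python
from itertools import product

# Hypothetical candidate values, 4 per hyperparameter (illustrative only)
grid = {
    "hidden_layers": [1, 2, 4, 6],
    "neurons_per_layer": [16, 64, 128, 256],
    "learning_rate": [1e-5, 1e-4, 1e-3, 1e-2],
}

# Grid search trains one model per combination of values
configs = list(product(*grid.values()))
print(len(configs))  # 4 * 4 * 4 = 64 models to train

# Each additional hyperparameter with 4 candidates multiplies the cost by 4
for k in range(1, 7):
    print(k, "hyperparameters ->", 4 ** k, "configurations")
```

With only three hyperparameters and four candidates each, the grid already requires 64 full training runs.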

When training a neural network, we usually have at least three hyperparameters we want to optimize (number of layers, number of neurons in each layer, and learning rate $\eta$). 

Instead, we use a newer tool, **Optuna**, to tune hyperparameters; it copes much better with several variables at once.

Optuna does a better job of hyperparameter optimization by
using a Bayesian technique to model the hyperparameter problem as its own machine learning task.

For Optuna, we define a function that we want to minimize (or
maximize), which takes as input a trial object. The trial object is used to get guesses
for each parameter we want to tune, and the function returns a score at the end.

<img src='https://github.com/rahiakela/deep-learning-research-and-practice/blob/main/inside-deep-learing/05-modern-training-techniques/images/6.png?raw=1' width='600'/>

Let’s look at a toy function that we want to minimize:

$$ 
f(x, y) = \left| (x - 3)(y + 2) \right|
$$

The smallest possible value is 0, and it is reached whenever $x = 3$ or $y = -2$; the point $x = 3$, $y = -2$ is one such minimum.
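
As a quick sanity check on where this function bottoms out, the sketch below evaluates it at a few points. It assumes the absolute value is taken over the whole product, which is how `toy_func` later in this notebook computes it.

```python
def f(x, y):
    # Assumed form: absolute value of the whole product, as in toy_func
    return abs((x - 3) * (y + 2))

print(f(3, -2))    # zero: both factors vanish
print(f(3, 7.5))   # zero: x = 3 alone is enough
print(f(-4, -2))   # zero: y = -2 alone is enough
print(f(0, 0))     # positive: neither factor is zero
```

Any point with $x = 3$ or $y = -2$ scores zero, so an optimizer can legitimately report a `y` far from $-2$.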

So now, let's try to find it with Optuna.

##Setup

In [None]:
!pip install optuna

In [None]:
from tqdm.autonotebook import tqdm

import numpy as np
import seaborn as sns
import matplotlib
import matplotlib.pyplot as plt
from matplotlib.pyplot import imshow

import pandas as pd

from sklearn.metrics import accuracy_score

from sklearn.datasets import make_moons

import time

In [None]:
!wget https://github.com/EdwardRaff/Inside-Deep-Learning/raw/main/idlmam.py

In [4]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision 
from torchvision import transforms
from torch.utils.data import Dataset, DataLoader

import optuna

from idlmam import train_simple_network, Flatten, weight_reset, set_seed, run_epoch

In [5]:
%matplotlib inline
from IPython.display import set_matplotlib_formats
set_matplotlib_formats('png', 'pdf')
from IPython.display import display_pdf
from IPython.display import Latex

torch.backends.cudnn.deterministic=True
set_seed(45)

In [6]:
device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")

##Dataset

We use the
Fashion-MNIST dataset because it is slightly more challenging than the original MNIST
corpus while retaining the same size and shape, which lets us run some tests
in a reasonable time.

In [None]:
epochs = 50
B = 256

train_data = torchvision.datasets.FashionMNIST("./data", train=True, transform=transforms.ToTensor(), download=True)
test_data = torchvision.datasets.FashionMNIST("./data", train=False, transform=transforms.ToTensor(), download=True)

training_loader = DataLoader(train_data, batch_size=B, shuffle=True)
testing_loader = DataLoader(test_data, batch_size=B)

##Training algorithm

Some problems take just a few epochs and others take hundreds to thousands, and
the number needed also changes with how much data you have. 

For these reasons, I don’t like requiring a learning rate schedule to always be used. I like to work without one first and then add one based on the problem at hand. 

However, we must always use some kind of optimizer. So if none is given, we use a good default.

In [8]:
def train_network(model, loss_func, train_loader, val_loader=None, test_loader=None, 
                  score_funcs=None, epochs=50, device="cpu", checkpoint_file=None, 
                  lr_schedule=None, optimizer=None, disable_tqdm=False):
  """
  Train simple neural networks
    
  Keyword arguments:
  model -- the PyTorch model / "Module" to train
  loss_func -- the loss function that takes two arguments, the model outputs and the labels, and returns a score
  train_loader -- PyTorch DataLoader object that returns tuples of (input, label) pairs. 
  val_loader -- Optional PyTorch DataLoader to evaluate on after every epoch
  test_loader -- Optional PyTorch DataLoader to evaluate on after every epoch
  score_funcs -- A dictionary of scoring functions to use to evaluate the performance of the model
  epochs -- the number of training epochs to perform
  device -- the compute location to perform training
  checkpoint_file -- Optional path; if given, a checkpoint is saved there after every epoch
  lr_schedule -- the learning rate schedule used to alter eta as the model trains. If this is not None then the user must also provide the optimizer to use. 
  optimizer -- the method used to update the model parameters from the gradients
  disable_tqdm -- if True, hide the epoch progress bar
  """
  if score_funcs is None:
    score_funcs = {}

  tracking = ["epoch", "total time", "train loss"]

  if val_loader  is not None:
    tracking.append("val loss")

  if test_loader is not None:
    tracking.append("test loss")
  
  for eval_score in score_funcs:
    tracking.append("train " + eval_score)
    if val_loader is not None:
      tracking.append("val " + eval_score)
    if test_loader is not None:
      tracking.append("test " + eval_score)

  # How long have we spent in the training loop?
  total_train_time = 0
  results = {}
  # Initialize every item with an empty list
  for item in tracking:
    results[item] = []

  if optimizer is None:
    # AdamW optimizer is a good default optimizer
    optimizer = torch.optim.AdamW(model.parameters())

  # Place the model on the correct compute resource (CPU or GPU)
  model.to(device)

  # iterating through all the data (batches) multiple times (epochs)
  for epoch in tqdm(range(epochs), desc="Epoch", disable=disable_tqdm):
    # Put the model in training mode
    model = model.train()

    # train the model
    total_train_time += run_epoch(model, optimizer, train_loader, loss_func, device, results, score_funcs, prefix="train", desc="Training")
    
    results["total time"].append(total_train_time)
    results["epoch"].append(epoch)

    if val_loader is not None:
      #  Put the model to "evaluation" mode, b/c we don't want to make any updates!
      model = model.eval()
      with torch.no_grad():
        run_epoch(model, optimizer, val_loader, loss_func, device, results, score_funcs, prefix="val", desc="Validating")

    # In PyTorch, the convention is to update the learning rate after every epoch
    if lr_schedule is not None:
      if isinstance(lr_schedule, torch.optim.lr_scheduler.ReduceLROnPlateau):
        lr_schedule.step(results["val loss"][-1])
      else:
        lr_schedule.step()

    if test_loader is not None:
      model = model.eval()
      with torch.no_grad():
        run_epoch(model, optimizer, test_loader, loss_func, device, results, score_funcs, prefix="test", desc="Testing")

    # lets us save the model, the optimizer state, and other information, all in one object
    if checkpoint_file is not None:
      torch.save({
          "epoch": epoch,
          "model_state_dict": model.state_dict(),
          "optimizer_state_dict": optimizer.state_dict(),
          "results": results
      }, checkpoint_file)

  # Finally, convert the results into a pandas DataFrame
  return pd.DataFrame.from_dict(results)

##Optuna trial object

Optuna discovers which hyperparameters exist from the calls we make on the trial
object: each call requests a guess for one parameter. Here we use the `suggest_uniform`
function, which requires us to provide a range of possible values.

In [9]:
def toy_func(trial):
  # The two calls below ask Optuna for two parameters and define the minimum and maximum value for each one
  x = trial.suggest_uniform("x", -10.0, 10.0)
  y = trial.suggest_uniform("y", -10.0, 10.0)
  # Now we can compute and return the result. Optuna will try to minimize this value
  return abs((x - 3) * (y + 2))

Now we can use the `create_study` function to build the task
and call optimize with the number of trials we want to let Optuna have to minimize
the function.

In [None]:
# If you said direction='maximize' Optuna would try and maximize the value returned by toy_func
study = optuna.create_study(direction="minimize")
# We tell Optuna which function to minimize, and that it gets 100 attempts to do so
study.optimize(toy_func, n_trials=100)

We can access the best result found using `study.best_params`, which contains a dict
object mapping the hyperparameters to the values that, in combination, gave the best result.

In [11]:
print(study.best_params)

{'x': 3.0021287951785474, 'y': -5.426301445265385}
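
The reported `y` is nowhere near $-2$, yet the result is still close to optimal: `x` is almost exactly 3, and either factor being near zero drives the product toward zero. Plugging the printed values back into the objective confirms this.

```python
# Values copied from the study.best_params output above
best = {"x": 3.0021287951785474, "y": -5.426301445265385}

value = abs((best["x"] - 3) * (best["y"] + 2))
print(value)  # a small positive number, roughly 0.007
```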


A contour plot shows how the objective value varies across the sampled values of `x` and `y`.

In [12]:
optuna.visualization.plot_contour(study)

##Optuna with PyTorch

We do not want to go crazy, as optimizing without any
gradients is still very difficult and Optuna is not a magic bullet. But we can use Optuna
to help us make some decisions. 

For example, how many neurons should we have in
each layer, and how many layers?

1. Create train/validation splits
2. Ask Optuna to give us three critical hyperparameters
3. Define our model using the parameters
4. Compute and return the result from the validation split

In [17]:
def objective(trial):
  train_subset = int(len(train_data) * 0.8)
  test_subset = len(train_data) - train_subset 

  split = torch.utils.data.random_split(train_data, [train_subset, test_subset])

  train_loader = DataLoader(split[0], batch_size=B, shuffle=True)
  val_loader = DataLoader(split[1], batch_size=B, shuffle=False)

  # search hidden layer size
  n = trial.suggest_int("neurons_per_layer", 16, 256)
  layers = trial.suggest_int("hidden_layers", 1, 6)

  #How many values are in the input?
  D = 28*28 #28 * 28 images
  # How many channels are in the input?
  C = 1
  # How many classes are there?
  classes = 10

  # At least one hidden layer, that take in D inputs
  sequential_layers = [
    nn.Flatten(),
    nn.Linear(D, n),
    nn.Tanh()
  ]

  # Now let's add a variable number of hidden layers, depending on what Optuna gave us for the "layers" parameter
  for _ in range(layers - 1):
    sequential_layers.append(nn.Linear(n, n))
    sequential_layers.append(nn.Tanh())

  # Output layer
  sequential_layers.append(nn.Linear(n, classes))

  # Now turn the list of layers into a PyTorch Sequential Module 
  fc_model = nn.Sequential(*sequential_layers)

  # What should our global learning rate be? Notice that we can ask for new hyper-parameters from optuna whenever we want
  eta_global = trial.suggest_loguniform("learning_rate", 1e-5, 1e-2)

  optimizer = torch.optim.AdamW(fc_model.parameters(), lr=eta_global)
  scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, epochs // 3)
  loss_func = nn.CrossEntropyLoss()

  results = train_network(fc_model, 
                          loss_func, 
                          train_loader,
                          epochs=10, 
                          test_loader=val_loader,
                          optimizer=optimizer,
                          lr_schedule=scheduler,
                          score_funcs={"Accuracy": accuracy_score},
                          device=device,
                          disable_tqdm=True)
  # The objective value for this trial: accuracy on the validation split after the last epoch
  return results["test Accuracy"].iloc[-1]

You must remember that this is a validation split and that we have not used
the test set. We should only use the test set after the hyperparameters have been found, to determine the overall accuracy.
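
The learning rate in `objective` is requested on a log scale because plausible values span several orders of magnitude. The sketch below uses only the standard library and is not Optuna's implementation; it shows what log-uniform sampling over $[10^{-5}, 10^{-2}]$ amounts to: draw the exponent uniformly, then exponentiate. In Optuna 3.0 and later the same request is spelled `trial.suggest_float("learning_rate", 1e-5, 1e-2, log=True)`.

```python
import math
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable
low, high = 1e-5, 1e-2

def sample_log_uniform(rng, low, high):
    # Uniform in log10-space, then mapped back to the original scale
    exponent = rng.uniform(math.log10(low), math.log10(high))
    return 10 ** exponent

samples = [sample_log_uniform(rng, low, high) for _ in range(10_000)]

# Each decade should receive roughly one third of the samples
for lo_edge in (1e-5, 1e-4, 1e-3):
    share = sum(lo_edge <= s < lo_edge * 10 for s in samples) / len(samples)
    print(f"[{lo_edge:.0e}, {lo_edge * 10:.0e}): {share:.2f}")
```

A plain uniform draw over the same range would place about 90% of its samples above $10^{-3}$ and almost never explore the small learning rates.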

Let's search for the hyperparameters
for this problem.

In [18]:
study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=10)

[32m[I 2022-08-10 04:21:10,931][0m A new study created in memory with name: no-name-4c2cbfe3-17f2-46a7-82e7-1a27f87bc069[0m


Training:   0%|          | 0/188 [00:00<?, ?it/s]

Testing:   0%|          | 0/47 [00:00<?, ?it/s]

Training:   0%|          | 0/188 [00:00<?, ?it/s]

Testing:   0%|          | 0/47 [00:00<?, ?it/s]

Training:   0%|          | 0/188 [00:00<?, ?it/s]

Testing:   0%|          | 0/47 [00:00<?, ?it/s]

Training:   0%|          | 0/188 [00:00<?, ?it/s]

Testing:   0%|          | 0/47 [00:00<?, ?it/s]

Training:   0%|          | 0/188 [00:00<?, ?it/s]

Testing:   0%|          | 0/47 [00:00<?, ?it/s]

Training:   0%|          | 0/188 [00:00<?, ?it/s]

Testing:   0%|          | 0/47 [00:00<?, ?it/s]

Training:   0%|          | 0/188 [00:00<?, ?it/s]

Testing:   0%|          | 0/47 [00:00<?, ?it/s]

Training:   0%|          | 0/188 [00:00<?, ?it/s]

Testing:   0%|          | 0/47 [00:00<?, ?it/s]

Training:   0%|          | 0/188 [00:00<?, ?it/s]

Testing:   0%|          | 0/47 [00:00<?, ?it/s]

Training:   0%|          | 0/188 [00:00<?, ?it/s]

Testing:   0%|          | 0/47 [00:00<?, ?it/s]

[32m[I 2022-08-10 04:22:17,208][0m Trial 0 finished with value: 0.8425 and parameters: {'neurons_per_layer': 243, 'hidden_layers': 1, 'learning_rate': 8.455916725638289e-05}. Best is trial 0 with value: 0.8425.[0m


Training:   0%|          | 0/188 [00:00<?, ?it/s]

Testing:   0%|          | 0/47 [00:00<?, ?it/s]

Training:   0%|          | 0/188 [00:00<?, ?it/s]

Testing:   0%|          | 0/47 [00:00<?, ?it/s]

Training:   0%|          | 0/188 [00:00<?, ?it/s]

Testing:   0%|          | 0/47 [00:00<?, ?it/s]

Training:   0%|          | 0/188 [00:00<?, ?it/s]

Testing:   0%|          | 0/47 [00:00<?, ?it/s]

Training:   0%|          | 0/188 [00:00<?, ?it/s]

Testing:   0%|          | 0/47 [00:00<?, ?it/s]

Training:   0%|          | 0/188 [00:00<?, ?it/s]

Testing:   0%|          | 0/47 [00:00<?, ?it/s]

Training:   0%|          | 0/188 [00:00<?, ?it/s]

Testing:   0%|          | 0/47 [00:00<?, ?it/s]

Training:   0%|          | 0/188 [00:00<?, ?it/s]

Testing:   0%|          | 0/47 [00:00<?, ?it/s]

Training:   0%|          | 0/188 [00:00<?, ?it/s]

Testing:   0%|          | 0/47 [00:00<?, ?it/s]

Training:   0%|          | 0/188 [00:00<?, ?it/s]

Testing:   0%|          | 0/47 [00:00<?, ?it/s]

[32m[I 2022-08-10 04:23:27,877][0m Trial 1 finished with value: 0.8760833333333333 and parameters: {'neurons_per_layer': 27, 'hidden_layers': 1, 'learning_rate': 0.007075925000731979}. Best is trial 1 with value: 0.8760833333333333.[0m


Training:   0%|          | 0/188 [00:00<?, ?it/s]

Testing:   0%|          | 0/47 [00:00<?, ?it/s]

Training:   0%|          | 0/188 [00:00<?, ?it/s]

Testing:   0%|          | 0/47 [00:00<?, ?it/s]

Training:   0%|          | 0/188 [00:00<?, ?it/s]

Testing:   0%|          | 0/47 [00:00<?, ?it/s]

Training:   0%|          | 0/188 [00:00<?, ?it/s]

Testing:   0%|          | 0/47 [00:00<?, ?it/s]

Training:   0%|          | 0/188 [00:00<?, ?it/s]

Testing:   0%|          | 0/47 [00:00<?, ?it/s]

Training:   0%|          | 0/188 [00:00<?, ?it/s]

Testing:   0%|          | 0/47 [00:00<?, ?it/s]

Training:   0%|          | 0/188 [00:00<?, ?it/s]

Testing:   0%|          | 0/47 [00:00<?, ?it/s]

Training:   0%|          | 0/188 [00:00<?, ?it/s]

Testing:   0%|          | 0/47 [00:00<?, ?it/s]

Training:   0%|          | 0/188 [00:00<?, ?it/s]

Testing:   0%|          | 0/47 [00:00<?, ?it/s]

Training:   0%|          | 0/188 [00:00<?, ?it/s]

Testing:   0%|          | 0/47 [00:00<?, ?it/s]

[32m[I 2022-08-10 04:24:30,367][0m Trial 2 finished with value: 0.8178333333333333 and parameters: {'neurons_per_layer': 24, 'hidden_layers': 2, 'learning_rate': 0.00016419895638856356}. Best is trial 1 with value: 0.8760833333333333.[0m


Training:   0%|          | 0/188 [00:00<?, ?it/s]

Testing:   0%|          | 0/47 [00:00<?, ?it/s]

Training:   0%|          | 0/188 [00:00<?, ?it/s]

Testing:   0%|          | 0/47 [00:00<?, ?it/s]

Training:   0%|          | 0/188 [00:00<?, ?it/s]

Testing:   0%|          | 0/47 [00:00<?, ?it/s]

Training:   0%|          | 0/188 [00:00<?, ?it/s]

Testing:   0%|          | 0/47 [00:00<?, ?it/s]

Training:   0%|          | 0/188 [00:00<?, ?it/s]

Testing:   0%|          | 0/47 [00:00<?, ?it/s]

Training:   0%|          | 0/188 [00:00<?, ?it/s]

Testing:   0%|          | 0/47 [00:00<?, ?it/s]

Training:   0%|          | 0/188 [00:00<?, ?it/s]

Testing:   0%|          | 0/47 [00:00<?, ?it/s]

Training:   0%|          | 0/188 [00:00<?, ?it/s]

Testing:   0%|          | 0/47 [00:00<?, ?it/s]

Training:   0%|          | 0/188 [00:00<?, ?it/s]

Testing:   0%|          | 0/47 [00:00<?, ?it/s]

Training:   0%|          | 0/188 [00:00<?, ?it/s]

Testing:   0%|          | 0/47 [00:00<?, ?it/s]

[32m[I 2022-08-10 04:25:36,498][0m Trial 3 finished with value: 0.6339166666666667 and parameters: {'neurons_per_layer': 77, 'hidden_layers': 5, 'learning_rate': 2.00236678844851e-05}. Best is trial 1 with value: 0.8760833333333333.[0m


Training:   0%|          | 0/188 [00:00<?, ?it/s]

Testing:   0%|          | 0/47 [00:00<?, ?it/s]

Training:   0%|          | 0/188 [00:00<?, ?it/s]

Testing:   0%|          | 0/47 [00:00<?, ?it/s]

Training:   0%|          | 0/188 [00:00<?, ?it/s]

Testing:   0%|          | 0/47 [00:00<?, ?it/s]

Training:   0%|          | 0/188 [00:00<?, ?it/s]

Testing:   0%|          | 0/47 [00:00<?, ?it/s]

Training:   0%|          | 0/188 [00:00<?, ?it/s]

Testing:   0%|          | 0/47 [00:00<?, ?it/s]

Training:   0%|          | 0/188 [00:00<?, ?it/s]

Testing:   0%|          | 0/47 [00:00<?, ?it/s]

Training:   0%|          | 0/188 [00:00<?, ?it/s]

Testing:   0%|          | 0/47 [00:00<?, ?it/s]

Training:   0%|          | 0/188 [00:00<?, ?it/s]

Testing:   0%|          | 0/47 [00:00<?, ?it/s]

Training:   0%|          | 0/188 [00:00<?, ?it/s]

Testing:   0%|          | 0/47 [00:00<?, ?it/s]

Training:   0%|          | 0/188 [00:00<?, ?it/s]

Testing:   0%|          | 0/47 [00:00<?, ?it/s]

[32m[I 2022-08-10 04:26:40,526][0m Trial 4 finished with value: 0.8831666666666667 and parameters: {'neurons_per_layer': 249, 'hidden_layers': 5, 'learning_rate': 0.0008535814178607436}. Best is trial 4 with value: 0.8831666666666667.[0m


Training:   0%|          | 0/188 [00:00<?, ?it/s]

Testing:   0%|          | 0/47 [00:00<?, ?it/s]

Training:   0%|          | 0/188 [00:00<?, ?it/s]

Testing:   0%|          | 0/47 [00:00<?, ?it/s]

Training:   0%|          | 0/188 [00:00<?, ?it/s]

Testing:   0%|          | 0/47 [00:00<?, ?it/s]

Training:   0%|          | 0/188 [00:00<?, ?it/s]

Testing:   0%|          | 0/47 [00:00<?, ?it/s]

Training:   0%|          | 0/188 [00:00<?, ?it/s]

Testing:   0%|          | 0/47 [00:00<?, ?it/s]

Training:   0%|          | 0/188 [00:00<?, ?it/s]

Testing:   0%|          | 0/47 [00:00<?, ?it/s]

Training:   0%|          | 0/188 [00:00<?, ?it/s]

Testing:   0%|          | 0/47 [00:00<?, ?it/s]

Training:   0%|          | 0/188 [00:00<?, ?it/s]

Testing:   0%|          | 0/47 [00:00<?, ?it/s]

Training:   0%|          | 0/188 [00:00<?, ?it/s]

Testing:   0%|          | 0/47 [00:00<?, ?it/s]

Training:   0%|          | 0/188 [00:00<?, ?it/s]

Testing:   0%|          | 0/47 [00:00<?, ?it/s]

[32m[I 2022-08-10 04:27:46,410][0m Trial 5 finished with value: 0.8869166666666667 and parameters: {'neurons_per_layer': 214, 'hidden_layers': 4, 'learning_rate': 0.0019639627227200397}. Best is trial 5 with value: 0.8869166666666667.[0m


Training:   0%|          | 0/188 [00:00<?, ?it/s]

Testing:   0%|          | 0/47 [00:00<?, ?it/s]

Training:   0%|          | 0/188 [00:00<?, ?it/s]

Testing:   0%|          | 0/47 [00:00<?, ?it/s]

Training:   0%|          | 0/188 [00:00<?, ?it/s]

Testing:   0%|          | 0/47 [00:00<?, ?it/s]

Training:   0%|          | 0/188 [00:00<?, ?it/s]

Testing:   0%|          | 0/47 [00:00<?, ?it/s]

Training:   0%|          | 0/188 [00:00<?, ?it/s]

Testing:   0%|          | 0/47 [00:00<?, ?it/s]

Training:   0%|          | 0/188 [00:00<?, ?it/s]

Testing:   0%|          | 0/47 [00:00<?, ?it/s]

Training:   0%|          | 0/188 [00:00<?, ?it/s]

Testing:   0%|          | 0/47 [00:00<?, ?it/s]

Training:   0%|          | 0/188 [00:00<?, ?it/s]

Testing:   0%|          | 0/47 [00:00<?, ?it/s]

Training:   0%|          | 0/188 [00:00<?, ?it/s]

Testing:   0%|          | 0/47 [00:00<?, ?it/s]

Training:   0%|          | 0/188 [00:00<?, ?it/s]

Testing:   0%|          | 0/47 [00:00<?, ?it/s]

[32m[I 2022-08-10 04:28:52,108][0m Trial 6 finished with value: 0.8868333333333334 and parameters: {'neurons_per_layer': 204, 'hidden_layers': 4, 'learning_rate': 0.0025987894108167276}. Best is trial 5 with value: 0.8869166666666667.[0m


Training:   0%|          | 0/188 [00:00<?, ?it/s]

Testing:   0%|          | 0/47 [00:00<?, ?it/s]

Training:   0%|          | 0/188 [00:00<?, ?it/s]

Testing:   0%|          | 0/47 [00:00<?, ?it/s]

Training:   0%|          | 0/188 [00:00<?, ?it/s]

Testing:   0%|          | 0/47 [00:00<?, ?it/s]

Training:   0%|          | 0/188 [00:00<?, ?it/s]

Testing:   0%|          | 0/47 [00:00<?, ?it/s]

Training:   0%|          | 0/188 [00:00<?, ?it/s]

Testing:   0%|          | 0/47 [00:00<?, ?it/s]

Training:   0%|          | 0/188 [00:00<?, ?it/s]

Testing:   0%|          | 0/47 [00:00<?, ?it/s]

Training:   0%|          | 0/188 [00:00<?, ?it/s]

Testing:   0%|          | 0/47 [00:00<?, ?it/s]

Training:   0%|          | 0/188 [00:00<?, ?it/s]

Testing:   0%|          | 0/47 [00:00<?, ?it/s]

Training:   0%|          | 0/188 [00:00<?, ?it/s]

Testing:   0%|          | 0/47 [00:00<?, ?it/s]

Training:   0%|          | 0/188 [00:00<?, ?it/s]

Testing:   0%|          | 0/47 [00:00<?, ?it/s]

[32m[I 2022-08-10 04:30:00,430][0m Trial 7 finished with value: 0.87375 and parameters: {'neurons_per_layer': 168, 'hidden_layers': 6, 'learning_rate': 0.0004089247621863562}. Best is trial 5 with value: 0.8869166666666667.[0m


Training:   0%|          | 0/188 [00:00<?, ?it/s]

Testing:   0%|          | 0/47 [00:00<?, ?it/s]

Training:   0%|          | 0/188 [00:00<?, ?it/s]

Testing:   0%|          | 0/47 [00:00<?, ?it/s]

Training:   0%|          | 0/188 [00:00<?, ?it/s]

Testing:   0%|          | 0/47 [00:00<?, ?it/s]

Training:   0%|          | 0/188 [00:00<?, ?it/s]

Testing:   0%|          | 0/47 [00:00<?, ?it/s]

Training:   0%|          | 0/188 [00:00<?, ?it/s]

Testing:   0%|          | 0/47 [00:00<?, ?it/s]

Training:   0%|          | 0/188 [00:00<?, ?it/s]

Testing:   0%|          | 0/47 [00:00<?, ?it/s]

Training:   0%|          | 0/188 [00:00<?, ?it/s]

Testing:   0%|          | 0/47 [00:00<?, ?it/s]

Training:   0%|          | 0/188 [00:00<?, ?it/s]

Testing:   0%|          | 0/47 [00:00<?, ?it/s]

Training:   0%|          | 0/188 [00:00<?, ?it/s]

Testing:   0%|          | 0/47 [00:00<?, ?it/s]

Training:   0%|          | 0/188 [00:00<?, ?it/s]

Testing:   0%|          | 0/47 [00:00<?, ?it/s]

[32m[I 2022-08-10 04:31:11,235][0m Trial 8 finished with value: 0.8620833333333333 and parameters: {'neurons_per_layer': 93, 'hidden_layers': 5, 'learning_rate': 0.0003426830186103634}. Best is trial 5 with value: 0.8869166666666667.[0m


Training:   0%|          | 0/188 [00:00<?, ?it/s]

Testing:   0%|          | 0/47 [00:00<?, ?it/s]

Training:   0%|          | 0/188 [00:00<?, ?it/s]

Testing:   0%|          | 0/47 [00:00<?, ?it/s]

Training:   0%|          | 0/188 [00:00<?, ?it/s]

Testing:   0%|          | 0/47 [00:00<?, ?it/s]

Training:   0%|          | 0/188 [00:00<?, ?it/s]

Testing:   0%|          | 0/47 [00:00<?, ?it/s]

Training:   0%|          | 0/188 [00:00<?, ?it/s]

Testing:   0%|          | 0/47 [00:00<?, ?it/s]

Training:   0%|          | 0/188 [00:00<?, ?it/s]

Testing:   0%|          | 0/47 [00:00<?, ?it/s]

Training:   0%|          | 0/188 [00:00<?, ?it/s]

Testing:   0%|          | 0/47 [00:00<?, ?it/s]

Training:   0%|          | 0/188 [00:00<?, ?it/s]

Testing:   0%|          | 0/47 [00:00<?, ?it/s]

Training:   0%|          | 0/188 [00:00<?, ?it/s]

Testing:   0%|          | 0/47 [00:00<?, ?it/s]

Training:   0%|          | 0/188 [00:00<?, ?it/s]

Testing:   0%|          | 0/47 [00:00<?, ?it/s]

[32m[I 2022-08-10 04:32:19,103][0m Trial 9 finished with value: 0.8775 and parameters: {'neurons_per_layer': 212, 'hidden_layers': 6, 'learning_rate': 0.0016746563111952157}. Best is trial 5 with value: 0.8869166666666667.[0m


In [19]:
print(study.best_params)

{'neurons_per_layer': 214, 'hidden_layers': 4, 'learning_rate': 0.0019639627227200397}


Let's look at the progress Optuna made
over time and other views of the optimization process. 

Doing so can help us build some
intuition about the range of “good” parameters.

In [20]:
fig = optuna.visualization.plot_optimization_history(study)
fig.show()
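
The "best value" line in the history plot is a running maximum over the trial scores. As a plain-Python sketch of the same idea, the list below is copied from the trial values logged above; with a live study the equivalent numbers would come from `[t.value for t in study.trials]`.

```python
from itertools import accumulate

# Validation accuracy of trials 0-9, copied from the Optuna log above
values = [0.8425, 0.8760833333333333, 0.8178333333333333, 0.6339166666666667,
          0.8831666666666667, 0.8869166666666667, 0.8868333333333334,
          0.87375, 0.8620833333333333, 0.8775]

# Best score seen so far after each trial (direction="maximize")
running_best = list(accumulate(values, max))

for i, (v, b) in enumerate(zip(values, running_best)):
    improved = i == 0 or b > running_best[i - 1]
    marker = " <- new best" if improved else ""
    print(f"trial {i}: value={v:.4f} best_so_far={b:.4f}{marker}")
```

In this run the best score improves at trials 0, 1, 4, and 5, and stays flat afterwards.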

We might also want to get an idea of how each hyperparameter performs with respect
to the objective (accuracy). That can be done with a slice plot.

In [21]:
fig = optuna.visualization.plot_slice(study)
fig.show()

Optuna can also help you understand the interactions between hyperparameters. One
option is the `plot_contour()` function, which creates a grid showing how every combination
of two different hyperparameters impacts the results.

In [22]:
fig = optuna.visualization.plot_contour(study, params=["neurons_per_layer", "hidden_layers", "learning_rate"])
fig.show()

The other option is the
`plot_parallel_coordinate()` function, which shows all the results of every trial in one graph.

In [23]:
fig = optuna.visualization.plot_parallel_coordinate(study, params=["neurons_per_layer", "hidden_layers", "learning_rate"])
fig.show()

Now that we have found a good set of hyperparameters, we need to train a new model with them to determine the final
accuracy on the held-out test set.
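
As a sketch of that final step, the hypothetical helper `build_mlp` below rebuilds the architecture from the best hyperparameters printed earlier. It mirrors the layer construction in `objective` but is not part of the original notebook, and the actual retraining and test evaluation are only indicated in comments.

```python
import torch
import torch.nn as nn

# Best hyperparameters copied from the study.best_params output above
best_params = {"neurons_per_layer": 214, "hidden_layers": 4,
               "learning_rate": 0.0019639627227200397}

def build_mlp(n, layers, D=28 * 28, classes=10):
    """Fully connected Tanh network with `layers` hidden layers of width n."""
    modules = [nn.Flatten(), nn.Linear(D, n), nn.Tanh()]
    for _ in range(layers - 1):
        modules += [nn.Linear(n, n), nn.Tanh()]
    modules.append(nn.Linear(n, classes))
    return nn.Sequential(*modules)

final_model = build_mlp(best_params["neurons_per_layer"], best_params["hidden_layers"])
optimizer = torch.optim.AdamW(final_model.parameters(), lr=best_params["learning_rate"])

# Sanity check: a batch of two 1x28x28 images yields two rows of 10 logits
logits = final_model(torch.zeros(2, 1, 28, 28))
print(logits.shape)

# Next (not run here): train on the full training set and score on the test set, e.g.
# train_network(final_model, nn.CrossEntropyLoss(), training_loader,
#               test_loader=testing_loader, optimizer=optimizer, epochs=10,
#               score_funcs={"Accuracy": accuracy_score}, device=device)
```

Training on the full 60,000-image training set, rather than the 80% split used during the search, is a common choice once no further tuning decisions depend on the validation data.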

##Pruning trials with Optuna