Usually, Optuna is used to optimize hyper-parameters, but as an example, let us directly optimize a quadratic function in an IPython shell.

In [1]:
import optuna

The objective function is what will be optimized.

In [177]:
def a(x):
    return x
def b(x):
    return x
def objective(trial):
    x = trial.suggest_float('x',0,1)
    y = trial.suggest_float('y',0,1)
    c = trial.suggest_categorical('z',[optim.RMSprop,optim.LBFGS])
    return (x-2 + y) ** 2

This function returns the value of (x−2)2. Our goal is to find the value of x that minimizes the output of the objective function. This is the “optimization.” During the optimization, Optuna repeatedly calls and evaluates the objective function with different values of x.

A Trial object corresponds to a single execution of the objective function and is internally instantiated upon each invocation of the function.

The suggest APIs (for example, suggest_float()) are called inside the objective function to obtain parameters for a trial. suggest_float() selects parameters uniformly within the range provided. In our example, from −10 to 10.

To start the optimization, we create a study object and pass the objective function to method optimize() as follows.

In [178]:
study = optuna.create_study()
study.optimize(objective, n_trials=4)

[32m[I 2020-11-04 01:50:42,159][0m A new study created in memory with name: no-name-90a54c37-cf22-4e86-9c0f-7541fda27a44[0m
[33m[W 2020-11-04 01:50:42,162][0m Trial 0 failed because of the following error: TypeError('optimizer can only optimize Tensors, but one of the params is int')
Traceback (most recent call last):
  File "C:\Users\Asus\AppData\Roaming\Python\Python37\site-packages\optuna\study.py", line 799, in _run_trial
    result = func(trial)
  File "<ipython-input-177-16a6526e7022>", line 9, in objective
    cc = c(params=[1,3])
  File "C:\Users\Asus\Miniconda3\lib\site-packages\torch\optim\lbfgs.py", line 233, in __init__
    super(LBFGS, self).__init__(params, defaults)
  File "C:\Users\Asus\Miniconda3\lib\site-packages\torch\optim\optimizer.py", line 51, in __init__
    self.add_param_group(param_group)
  File "C:\Users\Asus\Miniconda3\lib\site-packages\torch\optim\optimizer.py", line 200, in add_param_group
    "but one of the params is " + torch.typename(param))
Type

TypeError: optimizer can only optimize Tensors, but one of the params is int

In [161]:
print(study.best_params)

{'x': 0.8167804319008877, 'y': 0.8045517494297751, 'z': <function a at 0x0000029647E5D318>}


In [119]:
study = optuna.create_study(direction='maximize')
study.optimize(objective, n_trials=4)

[32m[I 2020-11-03 23:33:34,556][0m A new study created in memory with name: no-name-05e4a3b0-4c31-4377-b560-7dc05294ca29[0m
[32m[I 2020-11-03 23:33:34,559][0m Trial 0 finished with value: 0.4224620415636302 and parameters: {'x': 0.8281157896717252, 'y': 0.5219134097813346}. Best is trial 0 with value: 0.4224620415636302.[0m
[32m[I 2020-11-03 23:33:34,562][0m Trial 1 finished with value: 0.8536605997062827 and parameters: {'x': 0.8143293286981494, 'y': 0.26173311945850075}. Best is trial 1 with value: 0.8536605997062827.[0m
[32m[I 2020-11-03 23:33:34,563][0m Trial 2 finished with value: 2.3746851389598636 and parameters: {'x': 0.164025351071456, 'y': 0.2949733059830345}. Best is trial 2 with value: 2.3746851389598636.[0m
[32m[I 2020-11-03 23:33:34,565][0m Trial 3 finished with value: 2.5604380700680793 and parameters: {'x': 0.041232570923020795, 'y': 0.358630538036691}. Best is trial 3 with value: 2.5604380700680793.[0m


In [120]:
print(study.best_params)

{'x': 0.041232570923020795, 'y': 0.358630538036691}


When used to search for hyper-parameters in machine learning, usually the objective function would return the loss or accuracy of the model.

Let us clarify the terminology in Optuna as follows:

Trial: A single call of the objective function

Study: An optimization session, which is a set of trials

Parameter: A variable whose value is to be optimized, such as x in the above example

In Optuna, we use the study object to manage optimization. Method create_study() returns a study object. A study object has useful properties for analyzing the optimization outcome.

To get the best parameter:

In [28]:
#To get the best value
study.best_value

0.470793735864454

In [29]:
#Best params
study.best_params

{'x': 0.6112550829175928, 'y': 0.7026008098684388}

In [30]:
#Best trial
study.best_trial

FrozenTrial(number=1, value=0.470793735864454, datetime_start=datetime.datetime(2020, 11, 3, 21, 18, 25, 744039), datetime_complete=datetime.datetime(2020, 11, 3, 21, 18, 25, 745036), params={'x': 0.6112550829175928, 'y': 0.7026008098684388}, distributions={'x': UniformDistribution(high=1, low=0), 'y': UniformDistribution(high=1, low=0)}, user_attrs={}, system_attrs={}, intermediate_values={}, trial_id=1, state=TrialState.COMPLETE)

In [31]:
#all trials 
study.trials

[FrozenTrial(number=0, value=1.5086828166275867, datetime_start=datetime.datetime(2020, 11, 3, 21, 18, 25, 743040), datetime_complete=datetime.datetime(2020, 11, 3, 21, 18, 25, 743040), params={'x': 0.08653221865993654, 'y': 0.6851832798390528}, distributions={'x': UniformDistribution(high=1, low=0), 'y': UniformDistribution(high=1, low=0)}, user_attrs={}, system_attrs={}, intermediate_values={}, trial_id=0, state=TrialState.COMPLETE),
 FrozenTrial(number=1, value=0.470793735864454, datetime_start=datetime.datetime(2020, 11, 3, 21, 18, 25, 744039), datetime_complete=datetime.datetime(2020, 11, 3, 21, 18, 25, 745036), params={'x': 0.6112550829175928, 'y': 0.7026008098684388}, distributions={'x': UniformDistribution(high=1, low=0), 'y': UniformDistribution(high=1, low=0)}, user_attrs={}, system_attrs={}, intermediate_values={}, trial_id=1, state=TrialState.COMPLETE),
 FrozenTrial(number=2, value=1.6956906104507283, datetime_start=datetime.datetime(2020, 11, 3, 21, 18, 25, 747031), dateti

By executing optimize() again, we can continue the optimization.

In [32]:
study.optimize(objective,n_trials=10)

[32m[I 2020-11-03 21:22:17,820][0m Trial 4 finished with value: 1.4578516550605316 and parameters: {'x': 0.16282839179581543, 'y': 0.6297563295777577}. Best is trial 1 with value: 0.470793735864454.[0m
[32m[I 2020-11-03 21:22:17,822][0m Trial 5 finished with value: 1.1627288295720606 and parameters: {'x': 0.918673522844393, 'y': 0.003027432561733434}. Best is trial 1 with value: 0.470793735864454.[0m
[32m[I 2020-11-03 21:22:17,824][0m Trial 6 finished with value: 0.7954999795411859 and parameters: {'x': 0.7695088836321555, 'y': 0.33858306082557843}. Best is trial 1 with value: 0.470793735864454.[0m
[32m[I 2020-11-03 21:22:17,825][0m Trial 7 finished with value: 0.2784199543587225 and parameters: {'x': 0.6730877999066328, 'y': 0.7992570529008828}. Best is trial 7 with value: 0.2784199543587225.[0m
[32m[I 2020-11-03 21:22:17,827][0m Trial 8 finished with value: 2.0740206536428203 and parameters: {'x': 0.03138077167792663, 'y': 0.5284731754361793}. Best is trial 7 with value

In [33]:
study.best_value

0.0005850131815088767

## Try Optuna with deep learning

In [56]:
from __future__ import print_function
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms


def get_mnist_loaders(train_batch_size,test_batch_size):
    train_loader = torch.utils.data.DataLoader(datasets.MNIST('/files/',
                                              train=True,download=True,
                                              transform=transforms.Compose([
                                                  transforms.ToTensor(),
                                                  transforms.Normalize((0.1307,), (0.3081,))
                                              ])),
                        batch_size= train_batch_size, shuffle=True)
    test_loader = torch.utils.data.DataLoader(datasets.MNIST('/files/', train=False, transform=transforms.Compose([
                           transforms.ToTensor(),
                           transforms.Normalize((0.1307,), (0.3081,))
                       ])),
        batch_size=test_batch_size, shuffle=True)
    return train_loader, test_loader
    

In [86]:
train_loader,test_loader = get_mnist_loaders(20,20)

In [68]:
tr = data, target 
data.shape

torch.Size([20, 1, 28, 28])

In [125]:
#Define NN class
class Net(nn.Module):
    def __init__(self,activation):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 32, 3, 1)
        self.conv2 = nn.Conv2d(32, 64, 3, 1)
        self.dropout1 = nn.Dropout2d(0.25)
        self.dropout2 = nn.Dropout2d(0.5)
        self.fc1 = nn.Linear(9216, 128)
        self.fc2 = nn.Linear(128, 10)
        self.activation = activation 

    def forward(self, x):
        x = self.activation(self.conv1(x))
        x = self.conv2(x)
        x = F.max_pool2d(x, 2)
        x = self.dropout1(x)
        x = torch.flatten(x, 1)
        x = self.activation(self.fc1(x))
        x = self.dropout2(x)
        x = self.fc2(x)
        output = F.log_softmax(x, dim=1)
        return output

In [100]:
device = torch.device("cuda" if torch.cuda.is_available else "cpu")
model = Net(F.relu).to(device)
print(model)

Net(
  (conv1): Conv2d(1, 32, kernel_size=(3, 3), stride=(1, 1))
  (conv2): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1))
  (dropout1): Dropout2d(p=0.25, inplace=False)
  (dropout2): Dropout2d(p=0.5, inplace=False)
  (fc1): Linear(in_features=9216, out_features=128, bias=True)
  (fc2): Linear(in_features=128, out_features=10, bias=True)
)


In [113]:
def train(log_interval, model, train_loader, optimizer, epoch):
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        optimizer.zero_grad()
        output = model(data.to(device))
        loss = F.nll_loss(output, target.to(device))
        loss.backward()
        optimizer.step()
        if batch_idx % log_interval == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, batch_idx * len(data), len(train_loader.dataset),
                100. * batch_idx / len(train_loader), loss.item()))           

def test(model, test_loader):
    model.eval()
    test_loss = 0
    correct = 0
    with torch.no_grad():
        for data, target in test_loader:
            output = model(data.to(device))
            test_loss += F.nll_loss(output, target.to(device), reduction='sum').item() 
            pred = output.argmax(dim=1, keepdim=True)  # get the index of the max log-probability
            correct += pred.eq(target.to(device).view_as(pred)).sum().item()

    test_loss /= len(test_loader.dataset)
    test_accuracy = 100. * correct / len(test_loader.dataset)
    
    print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
        test_loss, correct, len(test_loader.dataset),
        100. * correct / len(test_loader.dataset)))
    
    return test_accuracy

In [182]:
from PIL import ImageFile
import numpy as np
ImageFile.LOAD_TRUNCATED_IMAGES = True


def train_mnist(trial):
    cfg = { 'device' : "cuda" if torch.cuda.is_available() else "cpu",
          'train_batch_size' : 64,
          'test_batch_size' : 1000,
          'n_epochs' : 1,
          'seed' : 0,
          'log_interval' : 100,
          'save_model' : False,
          'lr' : 0.001,
          'momentum': 0.5,
          'optimizer': optim.SGD,
          'activation': F.relu}
    optimizer_value = trial.suggest_categorical('optimizer',[optim.SGD, optim.RMSprop])
    torch.manual_seed(cfg['seed'])
    train_loader, test_loader = get_mnist_loaders(cfg['train_batch_size'], cfg['test_batch_size'])
    model = Net(cfg['activation'],trial).to(device)
    optimizer = optimizer_value(model.parameters(), lr=cfg['lr'])
    for epoch in range(1, cfg['n_epochs'] + 1):
        train(cfg['log_interval'], model, train_loader, optimizer, epoch)
        test_accuracy = test(model, test_loader)

    if cfg['save_model']:
        torch.save(model.state_dict(), "mnist_cnn.pt")
      
    return test_accuracy

In [141]:
#train_mnist()

# Enhancing the classifier with Optuna

In [183]:
#Define NN class with optimzer

class Net(nn.Module):
    def __init__(self,activation,trial):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 32, 3, 1)
        self.conv2 = nn.Conv2d(32, 64, 3, 1)
        #Try different values
        dropout_rate_1 = trial.suggest_float("dropout_rate_1", 0, 1)
        self.dropout1 = nn.Dropout2d(dropout_rate_1)
        #Try different values
        dropout_rate_2 = trial.suggest_float("dropout_rate_2", 0, 1)
        self.dropout2 = nn.Dropout2d(dropout_rate_2)
        #Try different values
        fc2_input_dim = trial.suggest_int("fc2_input_dim", 40, 80)
        self.fc1 = nn.Linear(9216, fc2_input_dim)
        self.fc2 = nn.Linear(fc2_input_dim, 10)
        self.activation = activation 

    def forward(self, x):
        x = self.activation(self.conv1(x))
        x = self.conv2(x)
        x = F.max_pool2d(x, 2)
        x = self.dropout1(x)
        x = torch.flatten(x, 1)
        x = self.activation(self.fc1(x))
        x = self.dropout2(x)
        x = self.fc2(x)
        output = F.log_softmax(x, dim=1)
        return output

In [184]:
def objective(trial):
    # Create a convolutional neural network.
    return train_mnist(trial)

In [185]:
# 3. Create a study object and optimize the objective function.
study = optuna.create_study(direction='maximize')
study.optimize(objective, n_trials=3)
print("Number of finished trials: {}".format(len(study.trials)))
print("Best trial:")
trial = study.best_trial
print("  Value: {}".format(trial.value))
print("  Params: ")
for key, value in trial.params.items():
    print("    {}: {}".format(key, value))

[32m[I 2020-11-04 01:53:40,279][0m A new study created in memory with name: no-name-1288d177-5007-4148-a7f7-5a114a72ad2c[0m




[32m[I 2020-11-04 01:54:00,487][0m Trial 0 finished with value: 86.02 and parameters: {'optimizer': <class 'torch.optim.sgd.SGD'>, 'dropout_rate_1': 0.5575685784855745, 'dropout_rate_2': 0.32591264617428684, 'fc2_input_dim': 46}. Best is trial 0 with value: 86.02.[0m



Test set: Average loss: 0.6287, Accuracy: 8602/10000 (86%)



[32m[I 2020-11-04 01:54:16,335][0m Trial 1 finished with value: 97.91 and parameters: {'optimizer': <class 'torch.optim.rmsprop.RMSprop'>, 'dropout_rate_1': 0.5806917792270129, 'dropout_rate_2': 0.20284483158536126, 'fc2_input_dim': 67}. Best is trial 1 with value: 97.91.[0m



Test set: Average loss: 0.0665, Accuracy: 9791/10000 (98%)



[32m[I 2020-11-04 01:54:30,402][0m Trial 2 finished with value: 86.32 and parameters: {'optimizer': <class 'torch.optim.sgd.SGD'>, 'dropout_rate_1': 0.23426539706779692, 'dropout_rate_2': 0.2564793260883751, 'fc2_input_dim': 66}. Best is trial 1 with value: 97.91.[0m



Test set: Average loss: 0.5741, Accuracy: 8632/10000 (86%)

Number of finished trials: 3
Best trial:
  Value: 97.91
  Params: 
    optimizer: <class 'torch.optim.rmsprop.RMSprop'>
    dropout_rate_1: 0.5806917792270129
    dropout_rate_2: 0.20284483158536126
    fc2_input_dim: 67


In [138]:
study.best_params

{'dropout_rate_1': 0.6462543659366137,
 'dropout_rate_2': 0.1099762086400825,
 'fc2_input_dim': 42}