# Hyperparameter Optimization in PyTorch Lightning with Optuna

In notebook 4 of this exercise, you learned how to develop and train models with PyTorch Lightning.

In this optional notebook, we'll show you how to perform hyperparameter optimization in PyTorch Lightning using the *Optuna* framework.



In [20]:
import shutil
#!pip install pytorch-lightning==0.7.6 > /dev/null
import pytorch_lightning as pl
import torch
from exercise_code.MyPytorchModel import MyPytorchModel
from exercise_code.Util import save_model, load_model, test_and_save

%load_ext autoreload
%autoreload 2

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


<img src=https://raw.githubusercontent.com/optuna/optuna/master/docs/image/optuna-logo.png></img>

Optuna is an automatic hyperparameter optimization framework, which works with several deep learning libraries, including PyTorch Lightning. Have a look at https://github.com/optuna/optuna!

Two important concepts of Optuna are the terms **Study** and **Trial**:
* **Study**: optimization based on an objective function (e.g.: maximize validation accuracy)
* **Trial**: a single execution of the objective function (i.e., train a model for a specific hyperparameter configuration)

The goal of a study is to find the optimal set of hyperparameter values through multiple trials (e.g., `n_trials=100`). Optuna is designed to automate and accelerate such optimization studies.

On a high level, hyperparameter tuning with Optuna works very similarly to the search algorithms we implemented in exercise 6. However, Optuna has a number of advantages:

* **Parallelization** of hyperparameter searches over multiple threads or processes without modifying code
* more **efficient search algorithms** for large search spaces and **pruning of unpromising trials** for faster results
* many additional features for automated search of optimal hyperparameters

### Install & Import Optuna

If you haven't done so yet, install Optuna via pip by running the line `!pip install optuna` in the cell below.

If you prefer to install Optuna with conda, or you have problems with the installation via pip, you can also use `$ conda install -c conda-forge optuna`.

In [21]:
!pip install optuna
import optuna
from optuna.integration import PyTorchLightningPruningCallback

Collecting optuna
  Using cached optuna-1.5.0.tar.gz (200 kB)
Collecting alembic
  Using cached alembic-1.4.2.tar.gz (1.1 MB)
  Installing build dependencies ... error
  ERROR: Command errored out with exit status 1:
   command: /home/kamranisg/Desktop/i2dl_exercises/.venv/bin/python3.7 /home/kamranisg/Desktop/i2dl_exercises/.venv/lib/python3.7/site-packages/pip install --ignore-installed --no-user --prefix /tmp/pip-build-env-b348k5fj/overlay --no-warn-script-location --no-binary :none: --only-binary :none: -i https://pypi.org/simple -- 'setuptools>=40.8.0' wheel
       cwd: None
  Complete output (14 lines):
  Traceback (most recent call last):
    File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
      "__main__", mod_spec)
    File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
      exec(code, run_globals)
    File "/home/kamranisg/Desktop/i2dl_exercises/.venv/lib/python3.7/site-packages/pip/__main__.py", line 23, in <module>
      from pip._inte

ModuleNotFoundError: No module named 'optuna'

### Logger

The default logger in PyTorch Lightning automatically writes to event files to be consumed by TensorBoard. 

Here we just set up a simple callback that keeps the metrics from each validation run in memory, which we can then use to find the best model.

In [6]:
from pytorch_lightning import Callback

class MetricsCallback(Callback):
    """PyTorch Lightning metric callback."""
    def __init__(self):
        super().__init__()
        self.metrics = []

    def on_validation_end(self, trainer, pl_module):
        self.metrics.append(trainer.callback_metrics)

### Objective

As mentioned above, the goal of our hyperparameter tuning is to optimize an objective function by finding the best set of hyperparameters.

Optuna is a black-box optimizer, so we need to provide this objective function ourselves as `objective(trial)`. It receives a `trial` object and returns a numerical value that measures the performance of the current hyperparameters. Optuna uses this value to decide where to sample in the upcoming trials.

In our case, the metric we want to optimize is the validation accuracy of our models. To get the validation accuracy for a given hyperparameter configuration, we do the same thing as in the previous notebook:

* define a PL-trainer
* define hyperparameters by sampling them from the provided hyperparameter ranges, similar to our old random search implementation
* initialize a model with these hyperparameters
* train that model, and report its validation accuracy as our objective function value

As you can see in the code cell below, the implementation is straightforward.

PS: Notice the different sampling modes, e.g. `trial.suggest_int` and `trial.suggest_loguniform`, which are also similar to our previous implementation! Check out the documentation at https://optuna.readthedocs.io/en/latest/reference/trial.html to find out about further sampling modes, and **feel free to add additional hyperparameters!**
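As a side note on log-uniform sampling: the idea is to draw the logarithm of the value uniformly, so that every order of magnitude is equally likely. The plain-Python sketch below illustrates this idea for the learning rate range used in this notebook; the helper `sample_loguniform` is hypothetical and is not Optuna's implementation. Newer Optuna releases, as far as we know, expose the same behavior as `trial.suggest_float(..., log=True)`.

```python
import math
import random


def sample_loguniform(low, high, rng):
    """Draw a value whose logarithm is uniform between log(low) and log(high)."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


rng = random.Random(0)
samples = [sample_loguniform(1e-6, 1e-1, rng) for _ in range(10_000)]

# The range spans 5 decades, and 3 of them lie below 1e-3,
# so roughly 60% of the samples should be smaller than 1e-3.
fraction_below = sum(s < 1e-3 for s in samples) / len(samples)
print("Fraction of samples below 1e-3:", fraction_below)
```

With plain uniform sampling on the same range, almost all samples would land above 1e-3, which is why a log scale is the usual choice for learning rates.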

In [18]:
def objective(trial):
    # as explained above, we'll use this callback to collect the validation accuracies
    metrics_callback = MetricsCallback()  
    
    # create a trainer
    trainer = pl.Trainer(
        #train_percent_check=1.0,
        #val_percent_check=1.0,
        logger=False,                                                                  # deactivate PL logging
        max_epochs=1,                                                                  # epochs
        gpus=1 if torch.cuda.is_available() else None,                                 # one GPU if available, else CPU
        callbacks=[metrics_callback],                                                  # save latest accuracy
        early_stop_callback=PyTorchLightningPruningCallback(trial, monitor="val_acc"), # early stopping
    )

    # here we sample the hyper params, similar as in our old random search
    trial_hparams = {"n_hidden": trial.suggest_int("n_hidden", 100, 200), 
                     "batch_size": 64, 
                     "learning_rate": trial.suggest_loguniform("learning_rate", 1e-6, 1e-1)}
    
    # create model from these hyper params and train it
    model = MyPytorchModel(trial_hparams)
    model.prepare_data()
    trainer.fit(model)

    # save model
    save_model(model, '{}.p'.format(trial.number), "checkpoints")

    # return the validation accuracy of the latest model, as that's what we want to maximize with our hyperparameter search
    return metrics_callback.metrics[-1]["val_acc"]

An important difference to our previous search implementation is that Optuna does not sample purely at random by default!

It uses a **Tree-structured Parzen Estimator (TPE)** sampler, which is a kind of Bayesian optimization. It is usually more efficient than a pure random search, because it takes the results of all previous trials into account to make an informed guess about where the best hyperparameters can be found.

### Run the Search

After we've defined our objective function, we can start the search.

For that, we can also provide a **pruner**, which is a very useful concept supported by Optuna: similar to *early stopping*, a pruner automatically stops unpromising trials in the early stages of training and can therefore significantly speed up the search.
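To give an intuition for pruning, here is a simplified plain-Python sketch of a median-based rule: a running trial is stopped if its intermediate validation accuracy is below the median of what earlier trials reached at the same step. The function `should_prune`, its arguments, and the accuracy curves are hypothetical; this is not Optuna's implementation.

```python
from statistics import median


def should_prune(history, current_value, step, n_startup_trials=2):
    """Decide whether to stop a running trial at the given step.

    history: validation accuracy curves (one list per finished trial).
    current_value: validation accuracy of the running trial at `step`.
    """
    values_at_step = [curve[step] for curve in history if len(curve) > step]
    if len(values_at_step) < n_startup_trials:
        return False  # not enough evidence yet, let the trial continue
    return current_value < median(values_at_step)


# Made-up accuracy curves of three finished trials.
history = [
    [0.30, 0.45, 0.55],
    [0.25, 0.40, 0.50],
    [0.35, 0.50, 0.60],
]

print(should_prune(history, 0.20, step=1))  # True: 0.20 is below the median 0.45
print(should_prune(history, 0.48, step=1))  # False: 0.48 is above the median 0.45
```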

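If you want to specify a pruner, have a look at the available options at https://optuna.readthedocs.io/en/latest/reference/pruners.html. Note that the `NopPruner` used in the cell below never prunes a trial, so replace it if you want pruning to take effect.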

Then, we create our `study`, define the direction that we want to optimize (i.e., *maximize* the validation accuracy) and start the optimization.

In [19]:
pruner = optuna.pruners.NopPruner()
study = optuna.create_study(direction="maximize", pruner=pruner)
study.optimize(objective, n_trials=2, timeout=600)

NameError: name 'optuna' is not defined

Notice that we save every model in a directory called `./checkpoints`.

### Check the Results

After we've finished the search, we can access the best trial via `study.best_trial`:

In [None]:
print("Number of finished trials: {}".format(len(study.trials)))

print("Best trial:")
best_trial = study.best_trial

print("  Value: {}".format(best_trial.value))

print("  Params: ")
for key, value in best_trial.params.items():
    print("    {}: {}".format(key, value))

### Saving your best model


When you're done with your hyperparameter search and have achieved at least 50% validation accuracy, you can save your best model to the `./models`-directory in order to submit the model.

Before that, we will check again whether the number of parameters is below 5 million and the file size is below 20 MB.
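For simple architectures, both limits can be estimated by hand. The sketch below assumes a hypothetical fully connected network with 3072 inputs, one hidden layer of 200 units, and 10 outputs; these sizes are illustrative and not taken from `MyPytorchModel`. It also assumes 4 bytes per parameter (float32) and ignores any overhead of the file format.

```python
def count_mlp_parameters(layer_sizes):
    """Count weights plus biases of a fully connected network."""
    return sum(
        n_in * n_out + n_out
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    )


layer_sizes = [3072, 200, 10]  # hypothetical: inputs, n_hidden, classes
n_params = count_mlp_parameters(layer_sizes)
size_mb = n_params * 4 / 1e6  # float32 parameters, no file overhead

print("Parameters:", n_params)           # 616610
print("Approx. size in MB:", size_mb)
print("Within limits:", n_params < 5_000_000 and size_mb < 20)
```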

Once your final model is saved, we'll report its test accuracy.

In [None]:
best_model = load_model('checkpoints/{}.p'.format(best_trial.number))
best_model.prepare_data()
test_and_save(best_model)

### Remove checkpoints directory
Lastly, let's remove the checkpoints directory again to free up space.

In [None]:
shutil.rmtree("checkpoints")

### References

Here we provided just a quick overview of hyperparameter optimization using Optuna. Optuna has a lot more to offer, so make sure to check out the following resources:

* Source Code: https://github.com/optuna/optuna
* Docs: https://optuna.readthedocs.io/en/stable/
* Website: https://optuna.org
* Paper: https://arxiv.org/pdf/1907.10902.pdf