
# Hyperparameter Optimization with Optuna

Tutorial

### Jefferson Fialho Coelho


In [1]:
import optuna
from optuna.visualization import plot_contour
from optuna.visualization import plot_edf
from optuna.visualization import plot_intermediate_values
from optuna.visualization import plot_optimization_history
from optuna.visualization import plot_parallel_coordinate
from optuna.visualization import plot_param_importances
from optuna.visualization import plot_slice
from optuna.trial import TrialState
from optuna.importance import MeanDecreaseImpurityImportanceEvaluator

### Definitions:
* **Parameter**:
> A model parameter is a configuration variable that is **internal to the model** and whose value can be **estimated from data.**
  
  * E.g.:
    * The coefficients (or weights) of linear and logistic regression models.
    * Weights and biases of a neural network
    * The cluster centroids in clustering

### Definitions:
  
* **Hyperparameter**:
> A model hyperparameter is a configuration that is **external to the model** and whose value **cannot be estimated from data.**
   
   * E.g.:
     * Learning rate
     * Dropout rate
     * Early-stopping patience
     * Number of neurons per layer
     * Number of layers
     * alpha, beta, ...
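
To make the distinction concrete, here is a minimal sketch in plain Python. The toy data, the function name `fit_slope`, and the chosen learning rates are all assumptions made for illustration. The slope `w` is a parameter (it is estimated from the data), while `learning_rate` and `n_steps` are hyperparameters (we pick them before training).

```python
# Fit y = w * x by gradient descent on the mean squared error.
# `w` is a parameter; `learning_rate` and `n_steps` are hyperparameters.

def fit_slope(xs, ys, learning_rate, n_steps):
    w = 0.0  # parameter: starts at an arbitrary value, then learned from data
    n = len(xs)
    for _ in range(n_steps):
        # Gradient of mean((w * x - y) ** 2) with respect to w
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad
    return w

# Toy data generated with a true slope of 2 (hypothetical example data)
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

w_good = fit_slope(xs, ys, learning_rate=0.05, n_steps=200)
w_slow = fit_slope(xs, ys, learning_rate=0.0001, n_steps=200)
print(f"learning_rate=0.05   -> w = {w_good:.4f}")
print(f"learning_rate=0.0001 -> w = {w_slow:.4f}")
```

Same data and same model, but the badly chosen learning rate leaves `w` far from the true slope after the same number of steps. That gap is what hyperparameter tuning tries to close.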

# Hyperparameter tuning problem

## When do we need to tune?

<div>
    <img src="img/just_runs.jpg" style="width:600px; margin:0 auto;"/>
</div>

# Hyperparameter tuning Complexity

<div>
    <img src="img/headache.jpg" style="width:375px; margin:0 auto;"/>
</div>

# Hyperparameter tuning Complexity

* An **NP-hard** problem
* A **high-dimensional** problem
  * X hyperparameters create an X-dimensional space over which the objective function is defined and the best combination must be found
  


# Tuning techniques

<center><h1>Hyperparameter search space</h1></center>
<br>
A surface to be searched where:

<li><b>each dimension represents a hyperparameter</b> and </li>
<li><b>each point represents one model configuration</b> (hyperparameter set) </li>
<br><br>

<div>
    <img src="img/blinds.png" style="width:500px; margin:0 auto;"/>
</div>

<center><h1>Hand tuning by "trial and error"</h1></center>

<br>

<div>
    <img src="img/blinds720.gif" style="margin:0 auto;"/>
</div>

## **Hand tuning** by "trial and error"

* for hyperparameter_1:
   * for hyperparameter_2:
     * for hyperparameter_3:
       * for hyperparameter_4:
          * ...
          
* Computationally **expensive**
* **Curse of dimensionality**

<div>
    <img src="img/run.png" style="width:300px; margin:0 auto; padding-top: 50px;"/>
</div>


## Grid search

* Easy to implement (many APIs, e.g. scikit-learn)
* Works fine with few hyperparameters (low dimensionality)
* The number of experiments grows exponentially with the number of hyperparameters
<br><br><br>
<div>
    <img src="img/grid.png" style="width:400px; margin:0 auto;"/>
</div>
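
A minimal sketch of grid search using only the standard library. The search space below (names and candidate values) is hypothetical and only meant to show how fast the grid grows.

```python
from itertools import product

# Hypothetical search space: three candidate values per hyperparameter
search_space = {
    "learning_rate": [1e-3, 1e-2, 1e-1],
    "dropout_rate": [0.1, 0.3, 0.5],
    "n_layers": [1, 2, 3],
    "n_units": [32, 64, 128],
}

def grid(space):
    """Yield every combination in the grid as a dict."""
    names = list(space)
    for values in product(*(space[name] for name in names)):
        yield dict(zip(names, values))

configs = list(grid(search_space))
print(len(configs), "configurations in the full grid")

# Each extra hyperparameter multiplies the number of experiments
items = list(search_space.items())
for k in range(1, len(items) + 1):
    n_experiments = sum(1 for _ in grid(dict(items[:k])))
    print(k, "hyperparameter(s) ->", n_experiments, "experiments")
```

With three values per hyperparameter, four hyperparameters already require 3 ** 4 = 81 full training runs.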

## Random search

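* Easy to implement (many APIs, e.g. scikit-learn)
* Often finds good configurations with fewer experiments than grid search when only a few hyperparameters really matter
* The number of experiments is a budget fixed in advance, so it does not grow with the number of hyperparameters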
<br><br><br>
<div>
    <img src="img/random.png" style="width:400px; margin:0 auto;"/>
</div>
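
A minimal sketch of random search in plain Python. The sampling ranges and the `toy_score` function are assumptions standing in for a real training run; only the loop structure is the point.

```python
import math
import random

def sample_config(rng):
    """Draw one configuration at random (hypothetical ranges)."""
    return {
        "learning_rate": 10 ** rng.uniform(-5, -1),  # log-uniform
        "dropout_rate": rng.uniform(0.0, 0.5),
        "n_layers": rng.randint(1, 3),
    }

def toy_score(config):
    """Stand-in for model training; peaks at lr=1e-3 and dropout=0.2."""
    lr_term = (math.log10(config["learning_rate"]) + 3.0) ** 2
    dropout_term = (config["dropout_rate"] - 0.2) ** 2
    return -(lr_term + dropout_term)

def random_search(n_trials, seed=0):
    rng = random.Random(seed)
    best_config, best_score = None, float("-inf")
    for _ in range(n_trials):  # the budget is fixed by n_trials
        config = sample_config(rng)
        score = toy_score(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score

best_config, best_score = random_search(n_trials=50)
print(best_config, best_score)
```

The cost is exactly `n_trials` evaluations, whatever the number of hyperparameters in `sample_config`.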

## Grid search x random search

<br>

<div>
    <img src="img/gridxrand.png" style="margin:0 auto;"/>
</div>

# Other optimization techniques

* **Bayesian Optimization**
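  * Fits a probabilistic surrogate model to the black-box objective function and uses it to choose which configuration to try next
* Successive Halving
  * The initial set of hyperparameter configurations is defined randomly
  * Discard the worst-performing trials and keep running only the best-performing ones, until a single hyperparameter configuration remains
* Hyperband
  * Runs Successive Halving several times with different trade-offs between the number of configurations and the budget per configuration (in effect, a grid search over the allocation strategy)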
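
Successive Halving is simple enough to sketch in a few lines of plain Python. This is an illustrative version, not the exact variant used by any library; `toy_evaluate`, `eta` and the learning-rate range are assumptions.

```python
import math
import random

def successive_halving(configs, evaluate, min_budget=1, eta=2):
    """Each round keeps the best 1/eta of the configurations and
    multiplies the budget given to each survivor by eta."""
    survivors = list(configs)
    budget = min_budget
    while len(survivors) > 1:
        ranked = sorted(survivors, key=lambda c: evaluate(c, budget), reverse=True)
        survivors = ranked[: max(1, len(ranked) // eta)]
        budget *= eta
    return survivors[0]

def toy_evaluate(learning_rate, budget):
    """Hypothetical stand-in for 'train for `budget` epochs and score'.
    Best at learning_rate=1e-2; a larger budget gives a higher score."""
    return -abs(math.log10(learning_rate) + 2.0) - 1.0 / budget

# Random initial configurations (here a configuration is just a learning rate)
rng = random.Random(0)
configs = [10 ** rng.uniform(-5, 0) for _ in range(16)]

best = successive_halving(configs, toy_evaluate)
print(f"surviving learning rate: {best:.5f}")
```

Starting from 16 configurations, the rounds keep 8, 4, 2 and finally 1, so most of the budget is spent on the configurations that looked promising early.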

# Optimization Tutorial

## Optuna: Optimization Framework
    
### Paper:
[**Optuna: A Next-generation Hyperparameter Optimization Framework. In KDD.**](https://arxiv.org/abs/1907.10902)

Takuya Akiba, Shotaro Sano, Toshihiko Yanase, Takeru Ohta, and Masanori Koyama. 2019.

KDD '19: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining

### Overview of Optuna’s system design
<img src="img/optuna_flow.png"/>

## Optuna Key features

* Eager (define-by-run) search spaces, Bayesian sampling methods and a pruning system
* Efficient search over large spaces (unpromising trials are pruned for faster results)
  * Asynchronous Successive Halving (ASHA)
    * combines random search with principled early stopping in an asynchronous way
* Easy to parallelize hyperparameter searches over multiple threads/processes
  * We just need to run the same Python script in each process, passing the same database path

# How to install?

```
!pip install optuna
```

# Default pipeline

* Wrap the algorithm to be optimized in a function that receives an Optuna object called `trial`

```
def objective(trial):
    ...
```

* Inside your `objective` function, use `optuna` methods to define the range of each hyperparameter:
   * `optuna.trial.Trial.suggest_categorical()` for categorical parameters
   * `optuna.trial.Trial.suggest_int()` for integer parameters
   * `optuna.trial.Trial.suggest_float()` for floating point parameters

E.g.:
```
# Categorical parameter
optimizer = trial.suggest_categorical("optimizer", ["MomentumSGD", "Adam"])

# Integer parameter
num_layers = trial.suggest_int("num_layers", 1, 3)

# Integer parameter (log)
num_channels = trial.suggest_int("num_channels", 32, 512, log=True)

# Integer parameter (discretized)
num_units = trial.suggest_int("num_units", 10, 100, step=5)

# Floating point parameter
dropout_rate = trial.suggest_float("dropout_rate", 0.0, 1.0)
```

* The `objective` function needs to return an evaluation metric to be `maximized` or `minimized`
   * E.g.:
      * `maximization` of accuracy
      * `minimization` of EMD

* For pruning to work, we need to report intermediate values of the evaluation metric to the `trial` object

```
...
trial.report(accuracy, epoch)

if trial.should_prune():
    raise optuna.exceptions.TrialPruned()
...
```

* Create a `study` object to call the `objective` function:
   * set the study name (the identifier under which the study is stored in the database)
   * set a database to store the experiment data
   * set `load_if_exists=True` if you want to resume a study that already exists in the database
   * set the optimization direction (`maximize` or `minimize`)

In [2]:
study = optuna.create_study(
    study_name='opt_tutorial2',
    storage='sqlite:///tutorial.db',
    load_if_exists=True,
    direction="maximize")

[I 2022-04-22 10:49:43,518] Using an existing study with name 'opt_tutorial2' instead of creating a new one.


* Run the optimization loop:

```
study.optimize(objective, n_trials=50, timeout=None)
```

* To check the best values after (or during) the optimization loop:

In [3]:
trial = study.best_trial

print("Best trial:")
print("    Value: ", trial.value)

print("Params: ")
for key, value in trial.params.items():
    print("    {}: {}".format(key, value))

Best trial:
    Value:  0.8296875
Params: 
    dropout_l0: 0.2083603371473296
    dropout_l1: 0.29475991600696116
    lr: 0.004143220502481869
    n_layers: 2
    n_units_l0: 119
    n_units_l1: 89
    optimizer: Adam


## ok, but...
<br>
<div>
    <img src="img/talk-is-cheap-show-me-the-code.jpg" style="width:500px; margin:0 auto;"/>
</div>

This tutorial will use the following notebooks to explain the implementation of the optimization study:

* `opt_tutorial_1.ipynb` (running model without optuna)
* `opt_tutorial_2.ipynb` (running the optimization study with optuna)
* `opt_tutorial_3.ipynb` (Data visualization)

<center><h1>After this tutorial...</h1></center>

<br>

<div>
    <img src="img/peter.gif" style="width:900px; margin:0 auto;"/>
</div>

# Thanks!

**e-mail:** 
* jefferson.jesus@itau-unibanco.com.br
* jefferson@fialhocoelho.com.br


**tutorial repo:** [github.com/fialhocoelho/optune-tutorial](https://github.com/fialhocoelho/optune-tutorial)

## Useful links:

* [Why Is Random Search Better Than Grid Search For Machine Learning](https://analyticsindiamag.com/why-is-random-search-better-than-grid-search-for-machine-learning)
* [Hyper Parameter Tuning — A Tutorial](https://towardsdatascience.com/hyper-parameter-tuning-a-tutorial-70dc6c552c54)
* [Hyperparameter Optimization With Random Search and Grid Search](https://machinelearningmastery.com/hyperparameter-optimization-with-random-search-and-grid-search/)