# **Hyperparameter Tuning**

When using a machine learning model, there is often a set of hyperparameters that must be chosen carefully. In this tutorial, we will show how to do hyperparameter tuning with [Orion](https://orion.readthedocs.io/en/stable/).

## *What is a hyperparameter?*
**Hyperparameters** are variables that need to be set before training a machine learning model. These parameters are responsible for governing the architecture of the model (architecture hyperparameters) or how the model is trained (optimization hyperparameters). For example, in a simple MLP, hyperparameters such as the learning rate, batch size, number of epochs, and number of neurons need to be set.

However, the gradient of the loss with respect to these variables generally **cannot be computed**, so they cannot be learned together with the model weights and have to be set by the user. Setting hyperparameters can be a challenging and time-consuming task, which may also require expertise in the specific problem at hand. If the task is computationally intensive and resources are limited, manual hyperparameter tuning is often necessary. This practice is commonly known as "*Graduate student descent*" because graduate students often manually adjust hyperparameters to optimize the model's performance. While this approach can be tedious, it is often effective in finding good hyperparameter values.

If there are sufficient computational resources available, however, a more "scientific" approach can be taken by utilizing hyperparameter optimization techniques. There are many such techniques available in the literature, as this is an active area of research.

## Grid Search
The simplest one is called **grid search**. Grid search creates a "grid" of all combinations of a chosen set of candidate values for each hyperparameter. For example, suppose we have a neural network model and we want to optimize the number of hidden layers and the learning rate. We might define a grid of possible values for these hyperparameters as follows:

- Number of hidden layers: [1, 2, 3, 4]
- Learning rate: [0.001, 0.01, 0.1, 1.0]

This results in a grid of 16 possible combinations of hyperparameters (4 values for the number of hidden layers multiplied by 4 values for the learning rate). The grid search algorithm then trains the model using each of these hyperparameter combinations and selects the combination that gives the best performance on a validation set. Grid search is simple, but it is computationally feasible only for models with a small number of hyperparameters, because the number of combinations grows multiplicatively with every hyperparameter added.
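
As a minimal sketch of the enumeration step, the grid above can be built with `itertools.product`. The function `toy_validation_error` is a hypothetical stand-in for "train the model and measure the validation error"; it is not part of this tutorial's training script and is only there to make the example runnable.

```python
import itertools

# Candidate values taken from the example above.
hidden_layers = [1, 2, 3, 4]
learning_rates = [0.001, 0.01, 0.1, 1.0]


def toy_validation_error(n_layers, lr):
    """Hypothetical stand-in for 'train the model, return the validation error'."""
    return (n_layers - 2) ** 2 + (lr - 0.01) ** 2


# Every combination of the two hyperparameters: 4 x 4 = 16 configurations.
grid = list(itertools.product(hidden_layers, learning_rates))

# Evaluate each configuration and keep the one with the lowest error.
best_config = min(grid, key=lambda cfg: toy_validation_error(*cfg))

print(len(grid))
print(best_config)
```

Adding a third hyperparameter with 4 candidate values would already raise the number of training runs from 16 to 64, which is why grid search scales poorly.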

If we have more hyperparameters, other optimization techniques such as random search or Bayesian optimization may be more effective.

## Random Search
**Random search** involves randomly sampling hyperparameters from a predefined range of possible values and selecting the best-performing combination of hyperparameters on the validation set. This process is repeated a fixed number of times, or until a certain performance threshold is met. This solution can be more computationally efficient than grid search when the number of hyperparameters or the range of possible values is large.

However, it may still require a large number of trials to find good hyperparameters, and more advanced optimization techniques such as Bayesian optimization may be more effective in some cases.
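
The sampling loop can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: `toy_validation_error` is a hypothetical objective standing in for real training, and the ranges for `lr` and `neurons` are chosen for the example.

```python
import math
import random


def toy_validation_error(lr, neurons):
    """Hypothetical stand-in for training a model and scoring it on validation data."""
    return (math.log10(lr) + 2) ** 2 + ((neurons - 60) / 50) ** 2


def random_search(n_trials, seed=0):
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    best_params, best_error = None, float("inf")
    for _ in range(n_trials):
        # Sample each hyperparameter independently from its range.
        params = {
            "lr": 10 ** rng.uniform(-4, -1),  # log-uniform in [1e-4, 1e-1]
            "neurons": rng.randint(10, 100),  # integer in [10, 100]
        }
        error = toy_validation_error(**params)
        # Keep track of the best configuration seen so far.
        if error < best_error:
            best_params, best_error = params, error
    return best_params, best_error


best_params, best_error = random_search(n_trials=50)
print(best_params, best_error)
```

Unlike grid search, the number of trials is chosen directly and does not depend on how many hyperparameters are sampled.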


## Bayesian Optimization
**Bayesian optimization** works by first defining a prior distribution over the hyperparameters, which captures our prior beliefs about them. The algorithm then uses the performance of the model on a validation set to update the prior distribution and construct a posterior distribution that reflects our updated beliefs about the hyperparameters.

The algorithm then uses the posterior distribution to choose the next set of hyperparameters to evaluate, based on a trade-off between exploitation (choosing hyperparameters that are expected to perform well based on the current model) and exploration (choosing hyperparameters that may be less well-known but have a high potential for good performance).

In this tutorial, we will use the Tree-structured Parzen Estimator (TPE). TPE is a variant of Bayesian optimization that aims to improve the efficiency of the optimization process by focusing on promising regions of the hyperparameter space.

There are several toolkits for hyperparameter tuning available. In this tutorial, we will use [Orion](https://orion.readthedocs.io/en/stable/).

## **Dataset**

In this tutorial, we will do hyperparameter tuning on digit classification using the MNIST dataset.


Let's first download the corpus:

In [1]:
!wget -O mnist_train.npz "https://www.dropbox.com/scl/fi/001mmmnzmjrpckhie5obi/mnist_train.npz?rlkey=ylas84sw57rrh8u5s5qgozdhd&dl=0"
!wget -O mnist_test.npz  "https://www.dropbox.com/scl/fi/u8nbdfx9v1w02j5k01f3f/mnist_test.npz?rlkey=5mjubm9xumzpisd0s4lrsfyqz&dl=0"

--2024-03-15 16:27:23--  https://www.dropbox.com/scl/fi/001mmmnzmjrpckhie5obi/mnist_train.npz?rlkey=ylas84sw57rrh8u5s5qgozdhd&dl=0
Resolving www.dropbox.com (www.dropbox.com)... 162.125.80.18, 2620:100:6021:18::a27d:4112
Connecting to www.dropbox.com (www.dropbox.com)|162.125.80.18|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://uc2aba123a6255e0829de46a24d6.dl.dropboxusercontent.com/cd/0/inline/CPIFgB3IyPLboT6ssfitMbnO_Ev6qGnKFArOEO6nf40DFmDolPKOj6Np2VjhOHfs42bHZBFcOcZpVsqFDtPma6izaLsiyoZ9tA68b-nM8CoA2N6SO1TMD_sGGFp86naPKNHr_vRxjtjKIa4gNi3wAn1U/file# [following]
--2024-03-15 16:27:24--  https://uc2aba123a6255e0829de46a24d6.dl.dropboxusercontent.com/cd/0/inline/CPIFgB3IyPLboT6ssfitMbnO_Ev6qGnKFArOEO6nf40DFmDolPKOj6Np2VjhOHfs42bHZBFcOcZpVsqFDtPma6izaLsiyoZ9tA68b-nM8CoA2N6SO1TMD_sGGFp86naPKNHr_vRxjtjKIa4gNi3wAn1U/file
Resolving uc2aba123a6255e0829de46a24d6.dl.dropboxusercontent.com (uc2aba123a6255e0829de46a24d6.dl.dropboxusercontent.com)... 162.125.7

To make the hyperparameter optimization computationally feasible, we will train on only a small fraction of the MNIST dataset. Specifically, the training script below uses:
- the first 1000 samples of the training file for training
- the remaining samples of the training file for validation

We can now install Orion:

In [2]:
%%capture
!pip install git+https://github.com/epistimio/orion.git@develop
!pip install orion[profet]

## **Model**

In this tutorial, we consider a simple MLP. We want to find good values for the learning rate, batch size, number of epochs, and number of neurons.
For the sake of compactness, we will implement the model with scikit-learn. Keep in mind, however, that Orion is a **black-box optimizer**: it only needs the hyperparameters as input and the resulting objective value as output, without knowing how the model works internally. It can therefore be used with any toolkit, including *PyTorch*, *TensorFlow*, *Keras*, etc.

Let's write a simple script for training the MLP with the MNIST data:


In [None]:
%%file train.py

import argparse
import numpy as np
import sklearn
import sklearn.preprocessing
import sklearn.neural_network
from orion.client import report_objective # Orion

def train():
    parser = argparse.ArgumentParser(description='scikit-learn MNIST MLP example')
    parser.add_argument('--batchsize', type=int, default=64,
                        help='input batch size for training (default: 64)')
    parser.add_argument('--epochs', type=int, default=10,
                        help='number of epochs to train (default: 10)')
    parser.add_argument('--lr', type=float, default=0.01,
                        help='learning rate (default: 0.01)')
    parser.add_argument('--neurons', type=int, default=100,
                        help='number of neurons (default: 100)')
    # type=bool would treat any non-empty string (even 'False') as True,
    # so the string is parsed explicitly.
    parser.add_argument('--eval', type=lambda s: s.lower() in ('true', '1', 'yes'),
                        default=False,
                        help='If True it prints the test error (default: False)')
    parser.add_argument("-f", required=False)
    args = parser.parse_args()

    with np.load("mnist_train.npz") as data:
        X_trn = data['X']
        y_trn = data['y']

    with np.load("mnist_test.npz") as data:
        X_tst = data['X']
        y_tst = data['y']

    # Select data: the first 1000 samples are used for training, the rest for validation
    X_valid = X_trn[1000:]
    y_valid = y_trn[1000:]

    X_trn = X_trn[0:1000]
    y_trn = y_trn[0:1000]

    # Data normalization
    scaler = sklearn.preprocessing.StandardScaler()
    scaler.fit(X_trn)
    X_trn = scaler.transform(X_trn)
    X_valid = scaler.transform(X_valid)


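    # MLP classifier with a single hidden layer
    mlp = sklearn.neural_network.MLPClassifier(hidden_layer_sizes=(args.neurons,),
                                              solver='sgd', batch_size=args.batchsize, max_iter=args.epochs,
                                              learning_rate_init=args.lr, momentum=0.9,
                                              verbose=False, random_state=0)
    # Training
    mlp.fit(X_trn, y_trn)

    # Validation error in percent (the backslash is escaped so that it prints literally)
    valid_error = 100*(1 - mlp.score(X_valid, y_valid))
    print("Valid Error (\\%): " + str(valid_error))

    # Report the objective to Orion (the quantity to minimize)
    report_objective(valid_error)

    if args.eval:
        # Note: X_tst is not passed through the scaler, so the test error
        # is measured on unnormalized inputs.
        test_error = 100*(1 - mlp.score(X_tst, y_tst))
        print("Test Error (\\%): " + str(test_error))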


if __name__ == '__main__':
    train()

Writing train.py


Let's see if the training script runs with a given set of hyperparameters:

In [None]:
!python train.py --lr=0.1 --epochs=25 --batchsize=128 --neurons=20 --eval='True'

Valid Error (\%): 15.303389830508474
[{'name': 'objective', 'type': 'objective', 'value': 15.303389830508474}]
Test Error (\%): 36.33


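The training script reports the validation and test errors. In this case, they are both pretty high because we are only using a small fraction of the MNIST corpus for training. The test error is also much higher than the validation error. Note that the script normalizes the training and validation inputs but not the test inputs, which likely contributes to this gap.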


## **Random Search**
Running a random search with Orion is very simple. You just need to specify the prior distribution of the targeted hyperparameters. See the [Orion documentation](https://orion.readthedocs.io/en/stable/user/searchspace.html#) for more information about the search space. As a first example, we are only optimizing the learning rate, keeping the other hyperparameters fixed:


In [None]:
!orion hunt -n orion-tutorial1 --exp-max-trials=50 python train.py --lr~'loguniform(1e-4, 0.1)' --epochs=25 --batchsize=128 --neurons=20

Valid Error (\%): 77.72203389830509
Valid Error (\%): 15.210169491525427
Valid Error (\%): 15.46101694915254
Valid Error (\%): 17.532203389830503
Valid Error (\%): 18.569491525423732
Valid Error (\%): 19.401694915254243
Valid Error (\%): 15.188135593220341
Valid Error (\%): 27.908474576271193
Valid Error (\%): 15.301694915254238
Valid Error (\%): 40.74067796610169
Valid Error (\%): 17.91016949152542
Valid Error (\%): 15.8728813559322
Valid Error (\%): 16.003389830508475
Valid Error (\%): 16.103389830508476
Valid Error (\%): 23.2271186440678
Valid Error (\%): 27.50847457627119
Valid Error (\%): 15.471186440677965
Valid Error (\%): 22.391525423728808
Valid Error (\%): 28.423728813559322
Valid Error (\%): 73.47457627118644
Valid Error (\%): 16.11694915254237
Valid Error (\%): 76.35084745762713
Valid Error (\%): 17.874576271186438
Valid Error (\%): 16.300000000000004
Valid Error (\%): 41.8864406779661
Valid Error (\%): 41.21694915254237
Valid Error (\%): 17.708474576271183
Valid Error (\%)
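
To give some intuition for the `loguniform(1e-4, 0.1)` prior used above, the following sketch compares it with a plain uniform prior over the same interval. This is not Orion's implementation; it only assumes the usual definition of a log-uniform distribution (uniform in log space) and uses the standard library so that it runs anywhere.

```python
import math
import random

rng = random.Random(0)
low, high = 1e-4, 0.1
n = 10_000

# Log-uniform: draw uniformly in log space, then exponentiate.
log_uniform = [math.exp(rng.uniform(math.log(low), math.log(high))) for _ in range(n)]
# Plain uniform over the same interval, for comparison.
uniform = [rng.uniform(low, high) for _ in range(n)]


def fraction_below(samples, threshold):
    """Share of samples that fall below the given threshold."""
    return sum(s < threshold for s in samples) / len(samples)


# Share of samples below 1e-3, i.e. in the lowest of the three decades.
print(fraction_below(log_uniform, 1e-3))
print(fraction_below(uniform, 1e-3))
```

The log-uniform prior spreads its samples roughly evenly across the three decades between 1e-4 and 0.1, whereas a uniform prior would place almost all samples in the top decade. This is why log-uniform priors are a common choice for learning rates.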

We can run `orion info` to check the best set of hyperparameters discovered:

In [None]:
!orion info --name orion-tutorial1 --version 1


Identification
name: orion-tutorial1
version: 1
user: root


Commandline
python train.py --lr~loguniform(1e-4, 0.1) --epochs=25 --batchsize=128 --neurons=20


Config
max trials: 50
max broken: 3
working dir: 


Algorithm
random:
    seed: None


Space
=====
/lr: loguniform(0.0001, 0.1)


Meta-data
user: root
datetime: 2024-03-10 18:17:16.603332
orion version: 0.2.6.post343+g1b20511c
VCS:



Parent experiment
root:
parent:
adapter:


Stats
=====
completed: True
trials completed: 50
best trial:
  id: 4d03b49ede70e47e7a3d8398d93ab045
  evaluation: 15.188135593220341
  params:
    /lr: 0.08052
start time: 2024-03-10 18:17:16.603332
finish time: 2024-03-10 18:21:16.151727
elapsed_time: 0:03:58.976015




Now, we can train the model with the best hyperparameters and check the performance on the test set.

In [None]:
!python train.py --lr=0.08052 --epochs=25 --batchsize=128 --neurons=20 --eval='True'

Valid Error (\%): 15.188135593220341
[{'name': 'objective', 'type': 'objective', 'value': 15.188135593220341}]
Test Error (\%): 35.260000000000005


Normally, we run hyperparameter tuning over more than one hyperparameter. Beyond the learning rate, we now consider the batch size, the number of epochs, and the number of neurons:

In [None]:
!orion hunt -n orion-tutorial2 --exp-max-trials=50 python train.py  --lr~'loguniform(1e-4, 0.1)' --neurons~'uniform(10, 100, discrete=True)' --batchsize~'choices([32,64,128])' --epochs~'uniform(15, 50, discrete=True)'

Valid Error (\%): 38.657627118644065
Valid Error (\%): 14.537288135593219
Valid Error (\%): 28.89322033898305
Valid Error (\%): 14.079661016949153
Valid Error (\%): 40.7728813559322
Valid Error (\%): 14.613559322033897
Valid Error (\%): 13.559322033898303
Valid Error (\%): 14.406779661016945
Valid Error (\%): 13.450847457627123
Valid Error (\%): 13.681355932203388
Valid Error (\%): 14.71694915254237
Valid Error (\%): 13.610169491525426
Valid Error (\%): 16.603389830508476
Valid Error (\%): 13.869491525423728
Valid Error (\%): 13.494915254237283
Valid Error (\%): 13.84406779661017
Valid Error (\%): 14.881355932203387
Valid Error (\%): 12.952542372881359
Valid Error (\%): 14.910169491525426
Valid Error (\%): 42.2677966101695
Valid Error (\%): 14.493220338983049
Valid Error (\%): 25.281355932203386
Valid Error (\%): 14.328813559322029
Valid Error (\%): 50.3457627118644
Valid Error (\%): 13.962711864406785
Valid Error (\%): 61.80847457627119
Valid Error (\%): 28.5728813559322
Valid Error (

In [None]:
!orion info --name orion-tutorial2 --version 1


Identification
name: orion-tutorial2
version: 1
user: root


Commandline
python train.py --lr~loguniform(1e-4, 0.1) --neurons~uniform(10, 100, discrete=True) --batchsize~choices([32,64,128]) --epochs~uniform(15, 50, discrete=True)


Config
max trials: 50
max broken: 3
working dir: 


Algorithm
random:
    seed: None


Space
=====
/batchsize: choices([32, 64, 128])
/epochs: uniform(15, 50, discrete=True)
/lr: loguniform(0.0001, 0.1)
/neurons: uniform(10, 100, discrete=True)


Meta-data
user: root
datetime: 2024-03-10 18:23:02.227110
orion version: 0.2.6.post343+g1b20511c
VCS:



Parent experiment
root:
parent:
adapter:


Stats
=====
completed: True
trials completed: 50
best trial:
  id: e4752341fe119ceeec0e968e017d6f72
  evaluation: 12.952542372881359
  params:
    /batchsize: 64
    /epochs: 41
    /lr: 0.03272
    /neurons: 100
start time: 2024-03-10 18:23:02.227110
finish time: 2024-03-10 18:28:01.379445
elapsed_time: 0:04:58.367799




In [None]:
!python train.py --lr=0.03272 --epochs=41 --batchsize=64 --neurons=100 --eval='True'

Valid Error (\%): 12.952542372881359
[{'name': 'objective', 'type': 'objective', 'value': 12.952542372881359}]
Test Error (\%): 35.129999999999995


Often, you can find a good set of hyperparameters with a random search. However, as the number of hyperparameters increases, the number of trials needed to cover the search space increases as well. This makes random search hard to use when many hyperparameters have to be set.

## **Tree-Structured Parzen Estimator (TPE)**

An alternative to random search is the TPE algorithm. If you want to use it, you need to create a config file with the hyperparameters that govern how TPE works.

In [None]:
%%file tpe_config.cfg
experiment:
    algorithms:
        tpe:
            seed: null
            n_initial_points: 10
            n_ei_candidates: 10
            gamma: 0.15

Writing tpe_config.cfg


`n_ei_candidates` and `gamma` manage the trade-off between exploration and exploitation.

You can find more info on the role of each hyperparameter to set [here](https://orion.readthedocs.io/en/stable/user/algorithms.html#tpe).
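
To make the role of these settings more concrete, here is a heavily simplified, one-dimensional sketch of the TPE idea. It is not Orion's implementation: the fixed-bandwidth Gaussian kernel density estimate, the clipping to `[0, 1]`, and the `toy_objective` function are all assumptions made for illustration. Only the parameter names (`n_initial_points`, `n_ei_candidates`, `gamma`) mirror the config file above.

```python
import math
import random


def toy_objective(x):
    """Hypothetical validation error as a function of one hyperparameter in [0, 1]."""
    return (x - 0.3) ** 2


def kde(points, bandwidth=0.1):
    """Return a simple Gaussian kernel density estimate built from `points`."""
    def density(x):
        total = sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points)
        return total / (len(points) * bandwidth * math.sqrt(2 * math.pi))
    return density


def tpe_minimize(n_trials=50, n_initial_points=10, n_ei_candidates=10,
                 gamma=0.15, seed=0):
    rng = random.Random(seed)
    history = []  # (x, error) pairs observed so far
    for trial in range(n_trials):
        if trial < max(2, n_initial_points):
            # Warm-up phase: plain random search.
            x = rng.uniform(0.0, 1.0)
        else:
            # Split past trials into a "good" group (best gamma fraction) and the rest.
            ranked = sorted(history, key=lambda pair: pair[1])
            n_good = max(1, math.ceil(gamma * len(ranked)))
            good = [p for p, _ in ranked[:n_good]]
            bad = [p for p, _ in ranked[n_good:]]
            l, g = kde(good), kde(bad)
            # Draw candidates from the density of good points, clipped to the range.
            candidates = [min(1.0, max(0.0, rng.gauss(rng.choice(good), 0.1)))
                          for _ in range(n_ei_candidates)]
            # Keep the candidate that is most likely "good" relative to "bad".
            x = max(candidates, key=lambda c: l(c) / (g(c) + 1e-12))
        history.append((x, toy_objective(x)))
    return min(history, key=lambda pair: pair[1])


best_x, best_error = tpe_minimize()
print(best_x, best_error)
```

In this sketch, a larger `gamma` counts more past trials as "good" and therefore widens the region that candidates are drawn from, while a larger `n_ei_candidates` makes each step greedier because the best of more candidates is selected.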

Let's run the hyperparameter search with TPE:

In [None]:
!orion hunt --config tpe_config.cfg -n orion-tutorial3 --exp-max-trials=50 python train.py --lr~'loguniform(1e-4, 0.1)' --neurons~'uniform(10, 100, discrete=True)' --batchsize~'choices([32,64,128])' --epochs~'uniform(15, 50, discrete=True)'

Valid Error (\%): 15.164406779661022
Valid Error (\%): 23.296610169491526
Valid Error (\%): 25.76440677966102
Valid Error (\%): 18.369491525423733
Valid Error (\%): 15.774576271186437
Valid Error (\%): 23.37627118644068
Valid Error (\%): 14.2406779661017
Valid Error (\%): 14.457627118644067
Valid Error (\%): 19.88305084745763
Valid Error (\%): 13.372881355932197
Valid Error (\%): 13.959322033898303
Valid Error (\%): 14.822033898305087
Valid Error (\%): 13.335593220338982
Valid Error (\%): 13.711864406779661
Valid Error (\%): 15.883050847457625
Valid Error (\%): 15.064406779661022
Valid Error (\%): 13.637288135593217
Valid Error (\%): 18.13728813559322
Valid Error (\%): 13.85254237288136
Valid Error (\%): 14.720338983050851
Valid Error (\%): 13.710169491525425
Valid Error (\%): 13.550847457627114
Valid Error (\%): 13.435593220338982
Valid Error (\%): 13.903389830508473
Valid Error (\%): 13.95254237288136
Valid Error (\%): 13.459322033898303
Valid Error (\%): 29.67457627118644
Valid Erro

In [None]:
!orion info --name orion-tutorial3 --version 1


Identification
name: orion-tutorial3
version: 1
user: root


Commandline
python train.py --lr~loguniform(1e-4, 0.1) --neurons~uniform(10, 100, discrete=True) --batchsize~choices([32,64,128]) --epochs~uniform(15, 50, discrete=True)


Config
max trials: 50
max broken: 3
working dir: 


Algorithm
tpe:
    equal_weight: False
    full_weight_num: 25
    gamma: 0.15
    max_retry: 100
    n_ei_candidates: 10
    n_initial_points: 10
    parallel_strategy:
        of_type: StatusBasedParallelStrategy
        strategy_configs:
            broken:
                of_type: MaxParallelStrategy
    prior_weight: 1.0
    seed: None


Space
=====
/batchsize: choices([32, 64, 128])
/epochs: uniform(15, 50, discrete=True)
/lr: loguniform(0.0001, 0.1)
/neurons: uniform(10, 100, discrete=True)


Meta-data
user: root
datetime: 2024-03-10 18:29:10.234556
orion version: 0.2.6.post343+g1b20511c
VCS:



Parent experiment
root:
parent:
adapter:


Stats
=====
completed: True
trials completed: 50
best trial:
 

In [None]:
!python train.py --lr=0.03268 --epochs=20 --batchsize=64 --neurons=73 --eval='True'

Valid Error (\%): 12.89661016949153
[{'name': 'objective', 'type': 'objective', 'value': 12.89661016949153}]
Test Error (\%): 29.330000000000002


Note that TPE does not always produce better results than random search. In this specific task, because so little data is used for training, there is a lot of "noise" in the model training and evaluation. In this scenario, TPE can perform worse or better than random search depending on the initialization seed.