In [None]:
! nvidia-smi

In [None]:
! pip install -q pytorch-lightning==1.4.9
! pip install -q "lightning-flash[text]==0.5.2"
! pip install -qU pandas

# How do you pick the right set of Hyperparameters for a Machine Learning project?

## An introduction to Hyperparameter Search using Lightning-Flash with Optuna

Training a machine learning model is controlled by many settings. The model's parameters (such as its weights) are learned from the data. The settings that control the learning process itself are chosen before training, and these are known as **Hyperparameters**.

To obtain a well-performing model, we need to select the right set of hyperparameters. This is not an easy task, and it gets harder as the model and the modelling pipeline grow more complex.

The problem of finding the right set of hyperparameters is addressed by **Hyperparameter search**, also called **Hyperparameter optimization**.
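To make the distinction concrete, here is a minimal pure-Python sketch on a hypothetical toy problem (not part of the tutorial's pipeline). The weight `w` is a *parameter* learned by gradient descent. The learning rate is a *hyperparameter*, chosen by trying a few candidates and keeping the one with the lowest final loss. This is the simplest form of hyperparameter search (a grid search), and it is the loop that tools like Optuna automate and improve on.

```python
# Toy data: y = 3 * x. The model y_hat = w * x has one learnable parameter, w.
xs = [0.5, 1.0, 1.5, 2.0]
ys = [3.0 * x for x in xs]


def toy_train(learning_rate, n_steps=20):
    """Fit w by gradient descent on mean squared error and return the final loss."""
    w = 0.0  # parameter: learned from the data
    for _ in range(n_steps):
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= learning_rate * grad  # the hyperparameter scales every update
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hyperparameter search: evaluate each candidate learning rate, keep the best.
candidates = [0.001, 0.01, 0.1, 0.5, 1.0]
results = {lr: toy_train(lr) for lr in candidates}
best_lr = min(results, key=results.get)
print(best_lr, results[best_lr])
```

With these toy numbers, 0.1 wins. The smaller rates move `w` too slowly in 20 steps, 0.5 oscillates around the optimum and converges more slowly, and 1.0 overshoots and diverges.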

In this post, we will learn how to perform hyperparameter search using **Optuna**, a widely used automatic hyperparameter optimization framework for machine learning, together with **Lightning Flash**, your go-to PyTorch AI factory for any machine learning task.

In [None]:
import gc
import os

import pandas as pd
import optuna
import torch
from flash import Trainer
from flash.text import QuestionAnsweringData, QuestionAnsweringTask
from optuna.integration.pytorch_lightning import PyTorchLightningPruningCallback

In [None]:
## Loading the Data and generating splits (Will be used later in the tutorial)
DATASET_PATH = os.environ.get("PATH_DATASETS", "_datasets")
# ../input/chaii-hindi-and-tamil-question-answering/train.csv
CHAII_DATASET_PATH = "../input/chaii-hindi-and-tamil-question-answering/"
INPUT_DATA_PATH = os.path.join(CHAII_DATASET_PATH, "train.csv")
TRAIN_DATA_PATH = os.path.join("./", "_train.csv")
VAL_DATA_PATH = os.path.join("./", "_val.csv")
PREDICT_DATA_PATH = os.path.join(CHAII_DATASET_PATH, "test.csv")

In [None]:
df = pd.read_csv(INPUT_DATA_PATH)
fraction = 0.9

tamil_examples = df[df["language"] == "tamil"]
train_split_tamil = tamil_examples.sample(frac=fraction, random_state=200)
val_split_tamil = tamil_examples.drop(train_split_tamil.index)

hindi_examples = df[df["language"] == "hindi"]
train_split_hindi = hindi_examples.sample(frac=fraction, random_state=200)
val_split_hindi = hindi_examples.drop(train_split_hindi.index)

train_split = pd.concat([train_split_tamil, train_split_hindi]).reset_index(drop=True)
val_split = pd.concat([val_split_tamil, val_split_hindi]).reset_index(drop=True)

train_split.to_csv(TRAIN_DATA_PATH, index=False)
val_split.to_csv(VAL_DATA_PATH, index=False)

## Introduction to Optuna

_| NOTE: This introductory example is inspired by the `Optuna` documentation._

To understand how to use Optuna, let us start with a basic example: finding the value of `x` that minimizes a quadratic function.

Let us treat the function `f(x) = x²-4x+4` as a stand-in for our complex ML model. Since `f(x) = (x-2)²`, its minimum (the optimal parameter) is at `x = 2`, which is also its only root. Let us see whether Optuna can find this value on its own.
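The optimum can also be checked analytically. This is a short worked derivation, independent of Optuna:

$$f(x) = x^2 - 4x + 4 = (x - 2)^2 \ge 0, \qquad f'(x) = 2x - 4 = 0 \;\Rightarrow\; x^* = 2, \qquad f(x^*) = 0.$$

Since $f$ is never negative and equals $0$ only at $x = 2$, that point is the global minimum that the search should approach.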

To use Optuna for this, we first phrase it as an optimization problem: `minimize f(x)` over a range of real values. In practice the search range has to be bounded, so we will search over `[-10, 10]`.

With the problem defined, the next step is to set up an Optuna Study. A Study is a single optimization session over one objective function. It can be created with the following line of code:

`>>> study = optuna.create_study()`.

A Study runs many trials in search of the optimal parameters among the provided options, which can be a set of discrete values or a continuous range. Each Trial is a single execution of the objective function with one particular set of sampled hyperparameter values. The trials are launched with:

`>>> study.optimize(objective, n_trials=100)`

The `optimize` method takes the objective function as its first argument. We have already defined our objective mathematically, and in code it looks like this:

In [None]:
def objective(trial: optuna.Trial):
    x = trial.suggest_float("x", -10, 10)
    return x ** 2 - 4 * x + 4

The complete code will look like this:

In [None]:
def objective(trial: optuna.Trial):
    x = trial.suggest_float("x", -10, 10)
    return x ** 2 - 4 * x + 4

study = optuna.create_study()
study.optimize(objective, n_trials=20)

Once the study has finished, the best parameters are available through `study.best_params`. This returns a Python dictionary that maps each hyperparameter name to the best value found during the search.

In [None]:
study.best_params

## Optuna with Lightning Flash

Flash is a high-level deep learning framework for fast prototyping, baselining, finetuning and solving deep learning problems. It features a set of tasks for you to use for inference and finetuning out of the box, and an easy to implement API to customize every step of the process for full flexibility.

Flash is built for beginners with a simple API that requires very little deep learning background, and for data scientists, Kagglers, applied ML practitioners and deep learning researchers that want a quick way to get a deep learning baseline with the advanced features that [PyTorch Lightning](https://github.com/PyTorchLightning/pytorch-lightning) offers.

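Optuna is an open-source framework for automatic hyperparameter optimization. Its define-by-run API builds the search space inside the objective function itself. It also provides efficient sampling algorithms and pruning of unpromising trials, and it integrates with PyTorch Lightning, the library that Flash's `Trainer` is built on.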

_| NOTE: The following sections of the notebook build on the [previous blog post of the series](https://devblog.pytorchlightning.ai/question-answering-for-dravidian-languages-hindi-and-tamil-f8ffb9fc0c1)._

### Listing out the Hyperparameters for our task

As mentioned above, each machine learning project requires a careful selection of hyperparameters. First, we need to identify and define the set of hyperparameters required for the project.

The most common and essential one is the **_Learning Rate_**. It scales each update to the model's parameters, and therefore controls how fast or slow the model learns from the provided data.
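As a brief illustration (plain gradient descent, not specific to Flash), let $\theta$ be the model parameters, $L$ the loss and $\eta$ the learning rate. Each update step is then:

$$\theta_{t+1} = \theta_t - \eta \, \nabla_\theta L(\theta_t)$$

A larger $\eta$ takes bigger steps, which can speed up training but may overshoot the minimum. A smaller $\eta$ is more stable but slower. Optimizers such as Adam adapt the direction and size of each step, but the step is still scaled by $\eta$.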

Another important hyperparameter is the **_Batch Size_**. It defines how many samples from the dataset the model processes in each training step, i.e. for each parameter update.
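A minimal sketch of what the batch size controls, using a hypothetical list of 10 samples in place of the real dataset. The data is split into consecutive mini-batches, and the last one may be smaller (as with PyTorch's `DataLoader` default of `drop_last=False`). Since there is one parameter update per batch, the batch size determines how many updates happen per epoch.

```python
import math


def iter_batches(samples, batch_size):
    """Yield consecutive mini-batches; the last one may be smaller."""
    for start in range(0, len(samples), batch_size):
        yield samples[start:start + batch_size]


samples = list(range(10))  # stand-in for 10 training examples
for batch_size in (2, 4, 8):
    batches = list(iter_batches(samples, batch_size))
    # One parameter update per batch, so len(batches) = updates per epoch.
    assert len(batches) == math.ceil(len(samples) / batch_size)
    print(batch_size, len(batches), [len(b) for b in batches])
```

For batch sizes 2, 4 and 8 this gives 5, 3 and 2 updates per epoch. Smaller batches mean more, noisier updates. Larger batches mean fewer, smoother updates and more memory per step.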

The choice of hyperparameters varies from project to project, but these two are common to almost every deep learning project.

For our current setup, we will search for an optimal _Learning Rate_, _Batch Size_, _Transformer Backbone_ and _Optimizer_.

### Defining our search space

Once we have listed the hyperparameters for our project, we have to define the possible values for each of them.

Searching an unbounded space is impossible, and even a large bounded one takes a long time to explore. So each hyperparameter is restricted to a set of choices or a bounded range.

Optuna lets us specify these as categorical choices, integer ranges, or float ranges (optionally sampled on a log scale).

Defining our search space beforehand:
- Learning Rate: [1e-5, 5e-4]
- Batch Size: 2 or 4
- Transformer Backbone: "xlm-roberta-base" or "bert-base-multilingual-uncased"
- Optimizer: "adam" or "adamw"
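One design choice to consider for the learning-rate range is the sampling scale. The objective function below samples it uniformly. A common alternative for ranges spanning more than an order of magnitude is log-uniform sampling (`trial.suggest_float(..., log=True)`). The arithmetic below shows how differently the two allocate trials. The threshold of `1e-4` is an arbitrary example value.

```python
import math

low, high = 1e-5, 5e-4   # learning-rate range from the search space above
threshold = 1e-4         # count rates below this as "small"

# Uniform sampling: probability is proportional to interval length.
p_uniform = (threshold - low) / (high - low)
# Log-uniform sampling: probability is proportional to length in log space.
p_log = (math.log(threshold) - math.log(low)) / (math.log(high) - math.log(low))

print(f"uniform: {p_uniform:.3f}, log-uniform: {p_log:.3f}")
```

This prints `uniform: 0.184, log-uniform: 0.589`. Uniform sampling spends most trials on the upper end of the range. Neither choice is wrong, but it affects how well the low learning rates typical for fine-tuning transformers get explored.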

### Creating an objective function

Given that we have decided on the hyperparameters and the search space for the project, we have to formulate the optimization objective function. We are in luck here because every machine learning problem is an optimization problem in disguise where we want to minimize / maximize something within some well defined constraints.

Here we would like to minimize the validation loss of the Question Answering model.

Let us first understand the process of Hyperparameter Search using Optuna.

Optuna provides the `Study` class, which conducts the hyperparameter search. It evaluates the model's performance on the hyperparameter combinations that it samples from the search space.

Each such combination is represented by a `Trial` object. A new trial is created for every evaluation while `Study.optimize()` runs.

The `objective` function that we pass to `Study.optimize()` receives the current `Trial` object as its argument, and we use the trial's `suggest_*` methods to define our search space.

In [None]:
# The objective function would look something like this
def objective(trial: optuna.Trial):

    # A new combination of hyperparameters is sampled for this trial.
    # suggest_float replaces the deprecated suggest_uniform (same uniform sampling).
    learning_rate = trial.suggest_float("learning_rate", 1e-5, 5e-4)
    batch_size = trial.suggest_categorical("batch_size", [2, 4])
    backbone = trial.suggest_categorical(
        "backbone",
        ["xlm-roberta-base", "bert-base-multilingual-uncased"],
    )
    optimizer = trial.suggest_categorical("optimizer", ["adam", "adamw"])

    # Setup the machine learning pipeline with the new set
    # of hyperparameters.
    datamodule = QuestionAnsweringData.from_csv(
        train_file=TRAIN_DATA_PATH,
        val_file=VAL_DATA_PATH,
        backbone=backbone,
        batch_size=batch_size,
    )

    model = QuestionAnsweringTask(
        backbone=backbone,
        learning_rate=learning_rate,
        optimizer=optimizer,
    )

    trainer = Trainer(
        max_epochs=2,
        limit_train_batches=10,  # For the tutorial's sake, to provide
        limit_val_batches=10,    # a quicker demonstration.
        gpus=torch.cuda.device_count(),
    )

    # Train the model for Optuna to understand the current
    # Hyperparameter combination's behaviour.
    trainer.fit(model, datamodule=datamodule)

    # Tell Optuna which value to optimize: the validation loss,
    # which the study will minimize.
    value = trainer.callback_metrics["val_loss"].item()

    # Free (GPU) memory before the next trial builds a new model.
    del datamodule, model, trainer
    gc.collect()
    torch.cuda.empty_cache()

    return value

With the objective function defined, running the search is as simple as before. `optuna.create_study()` minimizes the objective by default, which is what we want for the validation loss.

In [None]:
study = optuna.create_study()

In [None]:
study.optimize(objective, n_trials=5)

And so is obtaining the best combination of hyperparameters:

In [None]:
study.best_params

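Beyond `Study` and `Trial`, Optuna ships other built-in tools worth knowing. *Samplers* decide how each trial's hyperparameters are chosen (the default for single-objective studies is `TPESampler`, and `RandomSampler` and others are also available). *Pruners*, such as `MedianPruner`, stop unpromising trials early based on intermediate values reported during training. The `PyTorchLightningPruningCallback` imported at the top of this notebook connects a pruner to a Lightning (and hence Flash) `Trainer`.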