# Advanced Certification Program in Computational Data Science
## A program by IISc and TalentSprint
### Additional Notebook (ungraded): Tuning hyperparameters using Optuna

# Introduction

Optuna is an automatic hyperparameter optimization software framework, particularly designed for machine learning. It features an imperative, define-by-run style user API.

Optuna offers the following features:

- Lightweight, versatile, and platform agnostic architecture
  - Handle a wide variety of tasks with a simple installation that has few requirements.
- Pythonic search spaces
  - Define search spaces using familiar Python syntax including conditionals and loops.
- Efficient optimization algorithms
  - Adopt state-of-the-art algorithms for sampling hyperparameters and efficiently pruning unpromising trials.
- Easy parallelization
  - Scale studies to tens or hundreds of workers with little or no change to the code.
- Quick visualization
  - Inspect optimization histories with a variety of plotting functions.

# Tuning Hyperparameters using Optuna

- Install Optuna
- Write a training algorithm that involves hyperparameters
  - Load the training/validation data
  - Define and train the model
  - Evaluate the model
- Use Optuna to tune the hyperparameters (hyperparameter optimization)
- Visualize the hyperparameter optimization

## Install `optuna`

Optuna can be installed via `pip` (`pip install optuna`) or `conda` (`conda install -c conda-forge optuna`).

```
# Jupyter shell command; in a terminal, run the same command without the leading "!".
!pip install --quiet optuna
```

```python
import optuna

# Check which Optuna version is installed.
print(optuna.__version__)
```

## Optimize Hyperparameters

### Define a simple scikit-learn model

We start with a simple random forest model to classify flowers in the Iris dataset.
- Define a function called `objective` that encapsulates the whole training process and returns the model's mean accuracy over 3-fold cross-validation.

```python
import sklearn.datasets
import sklearn.ensemble
import sklearn.model_selection

def objective():
    iris = sklearn.datasets.load_iris()  # Prepare the data.

    clf = sklearn.ensemble.RandomForestClassifier(
        n_estimators=5, max_depth=3)  # Define the model.

    return sklearn.model_selection.cross_val_score(
        clf, iris.data, iris.target, n_jobs=-1, cv=3).mean()  # Train and evaluate the model.

print('Accuracy: {}'.format(objective()))
```

### Optimize hyperparameters of the model

The hyperparameters of the model above are `n_estimators` (the number of trees) and `max_depth` (the maximum depth of each tree). We can try different values for them to see whether the model's accuracy improves. To do so, the `objective` function is modified to accept a `trial` object, whose `suggest_*` methods (such as `suggest_int` and `suggest_float`) sample a value for each hyperparameter in every trial. We then create a study with `direction='maximize'`, since higher accuracy is better, run 100 trials, and read the best hyperparameters found from the study's best trial.

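```python
import optuna
import sklearn.datasets
import sklearn.ensemble
import sklearn.model_selection

def objective(trial):
    iris = sklearn.datasets.load_iris()

    # Sample this trial's hyperparameters.
    n_estimators = trial.suggest_int('n_estimators', 2, 20)
    # Integer sampled on a log scale, so small depths are explored more densely;
    # suggest_int also makes trial.params record the depth actually used.
    max_depth = trial.suggest_int('max_depth', 1, 32, log=True)

    clf = sklearn.ensemble.RandomForestClassifier(
        n_estimators=n_estimators, max_depth=max_depth)

    # Mean 3-fold cross-validated accuracy is the value to maximize.
    return sklearn.model_selection.cross_val_score(
        clf, iris.data, iris.target, n_jobs=-1, cv=3).mean()

study = optuna.create_study(direction='maximize')  # higher accuracy is better
study.optimize(objective, n_trials=100)

trial = study.best_trial

print('Accuracy: {}'.format(trial.value))
print("Best hyperparameters: {}".format(trial.params))
```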

### Plotting the study

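Plot the optimization history of the study: the accuracy of every trial, in the order the trials ran, together with the best accuracy found so far.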

```python
optuna.visualization.plot_optimization_history(study)
```

Plot a slice plot: for each hyperparameter, the accuracy of every trial against the value sampled for that hyperparameter.

```python
optuna.visualization.plot_slice(study)
```

Plot the accuracy surface over the two random forest hyperparameters, `n_estimators` and `max_depth`, as a contour plot.

```python
optuna.visualization.plot_contour(study, params=['n_estimators', 'max_depth'])
```

# Reference Reading

[Optuna - hyperparameter optimization framework](https://cdn.iisc.talentsprint.com/CDS/Assignments/Module2/Addl_NB_Tuning_hyperparameters_using_Optuna_Reference%20Reading%20Optuna%20a%20flexible%20efficient.pdf)

[Optuna examples](https://github.com/optuna/optuna-examples/)

[Optuna vs Hyperopt](https://neptune.ai/blog/optuna-vs-hyperopt)


