# Advanced Tutorial 12: Hyperparameter Search

## Overview
In this tutorial, we will discuss the following topics:
* [FastEstimator Search API](#ta12searchapi)
    * [Getting the search results](#ta12searchresults)
    * [Saving and loading search results](#ta12saveload)
    * [Interruption-resilient search](#ta12interruption)
* [Example 1: Hyperparameter Tuning by Grid Search](#ta12example1)
    * [Search Visualization](#ta12searchvisualization)
* [Example 2: RUA Augmentation via Golden-Section Search](#ta12example2)

<a id='ta12searchapi'></a>

## Search API

There are many things in life that require searching for an optimal solution in a given space, whether or not deep learning is involved. For example:
* what is the `x` that leads to the minimal value of `(x-3)**2`?
* what is the best `learning rate` and `batch size` combo that can produce the lowest evaluation loss after 2 epochs of training?
* what is the best augmentation magnitude that can lead to the highest evaluation accuracy?

The `fe.search` API is designed to make such searches easier. It can be used on its own for any search problem, since it only requires the following two components:
1. objective function to measure the score of a solution.
2. whether a maximum or minimum score is desired.
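
To make these two components concrete, here is a minimal plain-Python sketch of an exhaustive search over a fixed list of candidates. It does not use FastEstimator, and the helper name `exhaustive_search` is hypothetical; it only illustrates how an objective function and a min/max mode together define a search problem.

```python
# Plain-Python sketch of an exhaustive search; not the FastEstimator implementation.
def objective(x):
    """Score of a candidate solution (lower is better in this example)."""
    return (x - 3) ** 2


def exhaustive_search(score_fn, candidates, best_mode):
    """Hypothetical helper: return the (candidate, score) pair with the best score."""
    if best_mode not in ("min", "max"):
        raise ValueError("best_mode must be 'min' or 'max'")
    # Component 1: the objective function scores every candidate.
    scored = [(c, score_fn(c)) for c in candidates]
    # Component 2: the mode decides whether the lowest or highest score wins.
    pick = min if best_mode == "min" else max
    return pick(scored, key=lambda pair: pair[1])


best_x, best_score = exhaustive_search(objective, [0.5, 1.5, 2.9, 4, 5.3], best_mode="min")
print(best_x, best_score)
```

Switching `best_mode` to `"max"` on the same candidates would instead select the point farthest from 3.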

We will start with a simple example using `GridSearch`. Say we want to find the `x` that produces the minimal value of `(x-3)**2`, where `x` is chosen from the list `[0.5, 1.5, 2.9, 4, 5.3]`.

In [None]:
from fastestimator.search import GridSearch

def objective_fn(search_idx, x):
    return {"objective": (x-3)**2}

grid_search = GridSearch(eval_fn=objective_fn, params={"x": [0.5, 1.5, 2.9, 4, 5.3]})

Note that one of the arguments of the objective function must be `search_idx`. This helps the user differentiate between multiple search runs. To run the search, simply call:

In [None]:
grid_search.fit()

<a id='ta12searchresults'></a>

### Getting the search results
After the search is done, you can call `get_best_results` or `get_search_summary` on the search instance to see the best result and the overall search history:

In [None]:
print("best search result:")
print(grid_search.get_best_results(best_mode="min", optimize_field="objective"))

In [None]:
print("search history:")
print(grid_search.get_search_summary())

<a id='ta12saveload'></a>

### Saving and loading search results

Once the search is done, you can save the search results to disk and load them back later using the `save` and `load` methods:

In [None]:
import tempfile
save_dir = tempfile.mkdtemp()

# save the state to save_dir
grid_search.save(save_dir) 

# instantiate a new object
grid_search2 = GridSearch(eval_fn=objective_fn, params={"x": [0.5, 1.5, 2.9, 4, 5.3]}) 

# load the previously saved state
grid_search2.load(save_dir)

# display the best result of the loaded instance
print(grid_search2.get_best_results(best_mode="min", optimize_field="objective")) 

# display the search summary of the loaded instance
print(grid_search2.get_search_summary())
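
As a rough mental model of what persisting a search involves, the sketch below writes a small search history to a JSON file and reads it back. The record layout and the file name `search_state.json` are hypothetical and chosen only for illustration; they are not FastEstimator's on-disk format.

```python
import json
import os
import tempfile

# Hypothetical search history: one record per evaluation.
history = [
    {"param": {"x": 0.5, "search_idx": 1}, "result": {"objective": 6.25}},
    {"param": {"x": 4, "search_idx": 2}, "result": {"objective": 1.0}},
]

state_dir = tempfile.mkdtemp()
state_path = os.path.join(state_dir, "search_state.json")  # hypothetical file name

# Save the history to disk.
with open(state_path, "w") as f:
    json.dump(history, f, indent=2)

# Load it back, as a fresh process would after a restart.
with open(state_path) as f:
    restored = json.load(f)

# The best result can be recomputed from the restored history alone.
best = min(restored, key=lambda rec: rec["result"]["objective"])
print(best)
```

Because the best result is derived from the stored history, nothing has to be re-evaluated after loading.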

<a id='ta12interruption'></a>

### Interruption-resilient search
When you run a search on hardware that can be interrupted (like an AWS spot instance), you can provide a `save_dir` argument when calling `fit`. The search will then automatically back up its results after each evaluation. Furthermore, when `fit` is called a second time with the same `save_dir`, it will first load the saved results and then pick up where it left off.
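
The backup-and-resume behavior described above can be pictured with a small plain-Python sketch. The function `resumable_search`, its `stop_after` argument (used here to simulate an interruption), and the JSON state file are all hypothetical; this is an illustration of the idea, not how FastEstimator implements it.

```python
import json
import os
import tempfile


def resumable_search(score_fn, candidates, state_path, stop_after=None):
    """Evaluate candidates in order, backing up the results after each evaluation.

    `stop_after` simulates an interruption after that many new evaluations.
    """
    results = []
    if os.path.exists(state_path):
        # Resume: load whatever a previous run already finished.
        with open(state_path) as f:
            results = json.load(f)
    new_evals = 0
    for x in candidates[len(results):]:  # skip candidates that are already done
        if stop_after is not None and new_evals >= stop_after:
            break
        results.append({"search_idx": len(results) + 1, "x": x, "objective": score_fn(x)})
        with open(state_path, "w") as f:  # back up after every evaluation
            json.dump(results, f)
        new_evals += 1
    return results


calls = []  # records every actual evaluation of the objective


def objective(x):
    calls.append(x)
    return (x - 3) ** 2


candidates = [0.5, 1.5, 2.9, 4, 5.3]
path = os.path.join(tempfile.mkdtemp(), "state.json")

first = resumable_search(objective, candidates, path, stop_after=2)  # "interrupted" run
second = resumable_search(objective, candidates, path)  # resumed run
print(len(first), len(second), calls)
```

The `calls` list shows that each candidate is evaluated only once across both runs, which is the point of saving after every evaluation.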

To demonstrate this, we will use golden-section search on the same optimization problem. To simulate an interruption, we will first run 10 iterations, then create a new instance and run another 10 iterations.

In [None]:
from fastestimator.search import GoldenSection
save_dir2 = tempfile.mkdtemp()

gs_search =  GoldenSection(eval_fn=objective_fn, 
                           x_min=0, 
                           x_max=6, 
                           max_iter=10, 
                           integer=False, 
                           optimize_field="objective", 
                           best_mode="min")

gs_search.fit(save_dir=save_dir2)

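After the interruption, we can create a new instance and call `fit` with the same directory. Note that `max_iter` is set to 20 here, so that 10 more iterations remain once the first 10 have been restored: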

In [None]:
gs_search2 =  GoldenSection(eval_fn=objective_fn, 
                           x_min=0, 
                           x_max=6, 
                           max_iter=20, 
                           integer=False, 
                           optimize_field="objective", 
                           best_mode="min")

gs_search2.fit(save_dir=save_dir2)

As we can see, the search started from search index 13 and proceeded for another 10 iterations.
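
For readers unfamiliar with the method, golden-section search keeps a bracket `[a, b]` around the minimum of a unimodal function and shrinks it by a factor of about 0.618 per iteration, reusing one of the two interior evaluations each time. The sketch below is a generic textbook version written in plain Python; the function name `golden_section_min` is hypothetical and this is not FastEstimator's implementation.

```python
import math


def golden_section_min(score_fn, x_min, x_max, max_iter):
    """Generic golden-section search for the minimum of a unimodal function."""
    invphi = (math.sqrt(5) - 1) / 2  # 1 / golden ratio, about 0.618
    a, b = x_min, x_max
    # Two interior points, which cost two initial evaluations.
    c = b - invphi * (b - a)
    d = a + invphi * (b - a)
    yc, yd = score_fn(c), score_fn(d)
    for _ in range(max_iter):
        if yc < yd:
            # The minimum lies in [a, d]; the old c becomes the new d.
            b, d, yd = d, c, yc
            c = b - invphi * (b - a)
            yc = score_fn(c)
        else:
            # The minimum lies in [c, b]; the old d becomes the new c.
            a, c, yc = c, d, yd
            d = a + invphi * (b - a)
            yd = score_fn(d)
    return (c, yc) if yc < yd else (d, yd)


evaluations = []  # records every call to the objective


def objective(x):
    evaluations.append(x)
    return (x - 3) ** 2


best_x, best_score = golden_section_min(objective, x_min=0, x_max=6, max_iter=10)
print(len(evaluations), best_x, best_score)
```

In this sketch, 10 iterations cost 12 evaluations (two initial ones plus one per iteration). If FastEstimator's `GoldenSection` counts evaluations the same way, which is an assumption rather than something this tutorial confirms, that would be consistent with the resumed search starting at index 13.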

<a id='ta12example1'></a>

## Example 1: Hyperparameter Tuning by Grid Search

In this example, we will use `GridSearch` on a real deep learning task to illustrate its usage. The grid can cover one or more hyperparameters, so we will search over one, two, and then three hyperparameters in turn.
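
When a grid covers several hyperparameters, the number of evaluations grows multiplicatively. The sketch below assumes the grid is expanded as a Cartesian product, which is the usual meaning of grid search, and that search indices start at 1; it uses only the standard library and is not FastEstimator code.

```python
from itertools import product

# Same grid shape as the three-hyperparameter search in this example.
params = {"batch_size": [32, 64], "lr": [1e-2, 1e-3], "choices": ["Rotate", "Brightness"]}

names = list(params)
# One dictionary per point in the grid: 2 * 2 * 2 = 8 combinations.
combinations = [dict(zip(names, values)) for values in product(*params.values())]

for search_idx, combo in enumerate(combinations, start=1):
    print(search_idx, combo)
```

Each added hyperparameter multiplies the number of training runs, so it is worth keeping each individual grid small.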

In [None]:
import tensorflow as tf
import fastestimator as fe
from fastestimator.architecture.tensorflow import LeNet
from fastestimator.dataset.data import mnist
from fastestimator.op.numpyop.univariate import ExpandDims, Minmax, RUA
from fastestimator.op.tensorop.loss import CrossEntropy
from fastestimator.op.tensorop.model import ModelOp, UpdateOp


def get_hypara_tuning_estimator(batch_size, lr, choice):

    pipeline_ops = []

    if choice and isinstance(choice, str):
        pipeline_ops = [RUA(inputs="x", outputs="x", mode="train", choices=[choice])]

    pipeline_ops = pipeline_ops + [ExpandDims(inputs="x", outputs="x"), Minmax(inputs="x", outputs="x")]
    train_data, test_data = mnist.load_data()
    pipeline = fe.Pipeline(train_data=train_data,
                           test_data=test_data,
                           batch_size=batch_size,
                           ops=pipeline_ops,
                           num_process=0)
    model = fe.build(model_fn=LeNet, optimizer_fn=lambda: tf.optimizers.Adam(lr))
    network = fe.Network(ops=[
        ModelOp(model=model, inputs="x", outputs="y_pred"),
        CrossEntropy(inputs=("y_pred", "y"), outputs="ce"),
        UpdateOp(model=model, loss_name="ce")
    ])
    estimator = fe.Estimator(pipeline=pipeline, network=network, epochs=1, train_steps_per_epoch=500)
    return estimator

Given a batch size grid `[32, 64]`, we are interested in the optimal value that leads to the lowest test loss after 500 steps of training on the MNIST dataset.

In [None]:
def eval_fn_v1(search_idx, batch_size):
    est = get_hypara_tuning_estimator(batch_size, lr=1e-3, choice=None)
    est.fit(warmup=False)
    hist = est.test(summary="myexp")
    loss = float(hist.history["test"]["ce"][500])
    return {"test_loss": loss}


mnist_grid_search_single = GridSearch(eval_fn=eval_fn_v1, params={"batch_size": [32, 64]})

mnist_grid_search_single.fit()

mnist_grid_search_single.get_best_results(best_mode="min", optimize_field="test_loss")

Given a batch size grid `[32, 64]` and a learning rate grid `[1e-2, 1e-3]`, we are interested in the optimal combination that leads to the lowest test loss after 500 steps of training on the MNIST dataset.

In [None]:
def eval_fn_v2(search_idx, batch_size, lr):
    est = get_hypara_tuning_estimator(batch_size, lr=lr, choice=None)
    est.fit(warmup=False)
    hist = est.test(summary="myexp")
    loss = float(hist.history["test"]["ce"][500])
    return {"test_loss": loss}

mnist_grid_search_double = GridSearch(eval_fn=eval_fn_v2, params={"batch_size": [32, 64], "lr": [1e-2, 1e-3]})

mnist_grid_search_double.fit()

mnist_grid_search_double.get_best_results(best_mode="min", optimize_field="test_loss")

Given a batch size grid `[32, 64]`, a learning rate grid `[1e-2, 1e-3]`, and the built-in augmentation choices `["Rotate", "Brightness"]`, we are interested in the optimal combination that leads to the lowest test loss after 500 steps of training on the MNIST dataset.

In [None]:
def eval_fn_v3(search_idx, batch_size, lr, choices):
    est = get_hypara_tuning_estimator(batch_size, lr=lr, choice=choices)
    est.fit(warmup=False)
    hist = est.test(summary="myexp")
    loss = float(hist.history["test"]["ce"][500])
    return {"test_loss": loss}

mnist_grid_search_multi = GridSearch(
    eval_fn=eval_fn_v3, params={
        "batch_size": [32, 64], "lr": [1e-2, 1e-3], "choices": ["Rotate", "Brightness"]
    })

mnist_grid_search_multi.fit()

mnist_grid_search_multi.get_best_results(best_mode="min", optimize_field="test_loss")

<a id='ta12searchvisualization'></a>

### Search Visualization

Visualization of grid search with a single hyperparameter:

In [None]:
from fastestimator.search.visualize import visualize_search

visualize_search(search=mnist_grid_search_single)

Visualization of grid search with two hyperparameters:

In [None]:
from fastestimator.search.visualize import visualize_search

visualize_search(search=mnist_grid_search_double)

Visualization of grid search with more than 2 hyperparameters:

In [None]:
from fastestimator.search.visualize import visualize_search

visualize_search(search=mnist_grid_search_multi)

<a id='ta12example2'></a>

## Example 2: RUA Augmentation via Golden-Section Search

In this example, we will use RUA, a built-in augmentation NumpyOp, and find the optimal level between 0 and 30 using `GoldenSection` search. The test result will be evaluated on the ciFAIR10 dataset after 500 steps of training.

In [None]:
import tensorflow as tf
import fastestimator as fe
from fastestimator.architecture.tensorflow import LeNet
from fastestimator.dataset.data import cifair10
from fastestimator.op.numpyop.univariate import ExpandDims, Minmax, RUA
from fastestimator.op.tensorop.loss import CrossEntropy
from fastestimator.op.tensorop.model import ModelOp, UpdateOp

def get_estimator(level):
    train_data, test_data = cifair10.load_data()
    pipeline = fe.Pipeline(train_data=train_data,
                           test_data=test_data,
                           batch_size=64,
                           ops=[RUA(level=level, inputs="x", outputs="x", mode="train"), 
                                Minmax(inputs="x", outputs="x")],
                           num_process=0)
    model = fe.build(model_fn=lambda: LeNet(input_shape=(32, 32, 3)), optimizer_fn="adam")
    network = fe.Network(ops=[
        ModelOp(model=model, inputs="x", outputs="y_pred"),
        CrossEntropy(inputs=("y_pred", "y"), outputs="ce"),
        UpdateOp(model=model, loss_name="ce")
    ])
    estimator = fe.Estimator(pipeline=pipeline,
                             network=network,
                             epochs=1,
                             train_steps_per_epoch=500)
    return estimator

def eval_fn(search_idx, level):
    est = get_estimator(level)
    est.fit(warmup=False)
    hist = est.test(summary="myexp")
    loss = float(hist.history["test"]["ce"][500])
    return {"test_loss": loss}

cifair10_gs_search = GoldenSection(eval_fn=eval_fn, x_min=0, x_max=30, max_iter=5, best_mode="min", optimize_field="test_loss")

In [None]:
cifair10_gs_search.fit()

In this example, the optimal level we found is 4. We can then train the model again using `level=4` to get the final model. In a real use case, you will want to perform the parameter search on a held-out evaluation set and then test the best parameters on the test set.