# Tutorial: Accelerated Hyperparameter Tuning For PyTorch

### In this tutorial, we'll show you how to leverage advanced hyperparameter tuning techniques with Tune.

<img src="tune-arch-simple.png" alt="Tune Logo" width="600"/>

Specifically, we'll leverage ASHA and Bayesian Optimization (via HyperOpt) without modifying your underlying code.

Tune is a scalable framework for model training and hyperparameter search with a focus on deep learning and deep reinforcement learning.

* **Code**: https://github.com/ray-project/ray/tree/master/python/ray/tune 
* **Examples**: https://github.com/ray-project/ray/tree/master/python/ray/tune/examples
* **Documentation**: http://ray.readthedocs.io/en/latest/tune.html
* **Mailing List**: https://groups.google.com/forum/#!forum/ray-dev

In [None]:
## If you are running on Google Colab, uncomment below to install the necessary dependencies 
## before beginning the exercise.

# !pip uninstall -y pyarrow
# !pip install https://s3-us-west-2.amazonaws.com/ray-wheels/latest/ray-0.8.0.dev5-cp37-cp37m-manylinux1_x86_64.whl
# !pip install ray[debug]

# # A hack to force the runtime to restart, needed to include the above dependencies.
# import os
# os._exit(0)

### Exercise 1: PyTorch Boilerplate Code

In [None]:
import numpy as np
import torch
import torch.optim as optim
from torchvision import datasets
from ray.tune.examples.mnist_pytorch import train, test, ConvNet, get_data_loaders

from ray import tune
from ray.tune import track
from ray.tune.schedulers import AsyncHyperBandScheduler

%matplotlib inline
import matplotlib.style as style
style.use("ggplot")

datasets.MNIST("~/data", train=True, download=True)

Below is some boilerplate code for a PyTorch training function, imported from `ray.tune.examples.mnist_pytorch`. For example, `train` is simply a for loop over the data loader.

```python
    def train(model, optimizer, train_loader):
        model.train()
        for batch_idx, (data, target) in enumerate(train_loader):
            if batch_idx * len(data) > EPOCH_SIZE:
                return
            optimizer.zero_grad()
            output = model(data)
            loss = F.nll_loss(output, target)
            loss.backward()
            optimizer.step()
```

In order to make decisions in the middle of training, we need to let the training function notify Tune. The ``tune.track`` API allows Tune to keep track of current results.
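To see why per-epoch reporting matters, here is a minimal sketch that uses no Tune code at all. `fake_training` and `run_with_early_stop` are hypothetical stand-ins, and the accuracy curve is made up: the training loop hands back one metric per epoch, and the caller (playing the role Tune plays) can stop as soon as a threshold is met.

```python
def fake_training(num_epochs):
    """Hypothetical stand-in for a training loop: yields one accuracy per epoch."""
    for epoch in range(num_epochs):
        # Made-up accuracy curve that rises toward 1.0; this is not real training.
        yield 1.0 - 0.5 ** (epoch + 1)


def run_with_early_stop(num_epochs, target):
    """Consume the reported metrics and stop once `target` is reached."""
    history = []
    for acc in fake_training(num_epochs):
        history.append(acc)  # the "report" step
        if acc >= target:
            break  # a decision made in the middle of training
    return history


history = run_with_early_stop(num_epochs=20, target=0.98)
print(len(history), history[-1])
```

Without the per-epoch report, the caller would only learn the accuracy after all 20 epochs had run.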

**TODO**: Add `tune.track.log(mean_accuracy=acc)` within the training loop. 

In [None]:
def train_mnist(config):
    model = ConvNet()
    train_loader, test_loader = get_data_loaders()

    optimizer = optim.SGD(
        model.parameters(), lr=config["lr"], momentum=config["momentum"])

    for i in range(20):
        train(model, optimizer, train_loader)  # Train for 1 epoch
        acc = test(model, test_loader)  # Obtain validation accuracy.
        # TODO: Add tune.track.log(mean_accuracy=acc) here
        if i % 5 == 0:
            torch.save(model, "./model.pth") # This saves the model to the trial directory

### Example Trial Run

Let's run 1 trial, sampling the learning rate log-uniformly (between $10^{-10}$ and $1$) and the momentum uniformly (between 0.1 and 0.9).

A "trial" is the execution of training using a set of hyperparameters. An **experiment** is a set of trials (i.e., a hyperparameter search).
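The `lr` entry in the cell below draws an exponent uniformly and raises 10 to it, which spreads samples evenly across orders of magnitude. The following sketch reproduces that idea with only the standard library; `sample_log_uniform` is a hypothetical helper, not part of Tune.

```python
import random


def sample_log_uniform(low_exp, high_exp, rng):
    """Draw 10**e with e uniform between low_exp and high_exp."""
    return 10 ** rng.uniform(low_exp, high_exp)


rng = random.Random(0)  # seeded only to make the sketch repeatable
samples = [sample_log_uniform(-10, 0, rng) for _ in range(1000)]

# Roughly half of the samples should fall below 1e-5,
# the midpoint of the range on a log scale.
below_mid = sum(s < 1e-5 for s in samples)
print(min(samples), max(samples), below_mid)
```

A plain uniform draw over the same range would almost never produce a learning rate below 0.001, which is why the exponent is sampled instead.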

Run the cell below to start Tune.

#### This is one random sample and should perform poorly.

In [None]:
search_space = {
    "lr": tune.sample_from(lambda spec: 10**(-10 * np.random.rand())),
    "momentum": tune.uniform(0.1, 0.9)
}

analysis = tune.run(
    train_mnist, 
    config=search_space, 
    verbose=1,
    name="train_mnist",  # This is used to specify the logging directory.
    stop={"mean_accuracy": 0.98}  # Stop the trial once mean_accuracy reaches 0.98
)

#### Plot the performance of this trial.

In [None]:
dfs = analysis.fetch_trial_dataframes()
[d.mean_accuracy.plot() for d in dfs.values()]

### Exercise 2: Efficient Grid Search with Early Stopping


Tune provides a `tune.grid_search` primitive to pass into `tune.run` as follows:
```python
tune.run(config={"variable": tune.grid_search([1, 2, 3])})
```

From this, Tune will run 3 trials, evaluating each value in the grid search. To specify a multi-dimensional grid search, you can use `tune.grid_search` on multiple variables:


```python
tune.run(config={
    "variable1": tune.grid_search([1, 2, 3]),
    "variable2": tune.grid_search([1, 2, 3]),
    "variable3": tune.grid_search([1, 2, 3]),
    "variable4": tune.grid_search([1, 2, 3]),
})
```

This will generate a total of $3 \times 3 \times 3 \times 3 = 81$ trials, one for each combination of values.
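The trial count is simply the size of the Cartesian product of the value lists. The sketch below enumerates that product with `itertools`. It illustrates the counting only and is not how Tune generates trials internally.

```python
import itertools

grid = {
    "variable1": [1, 2, 3],
    "variable2": [1, 2, 3],
    "variable3": [1, 2, 3],
    "variable4": [1, 2, 3],
}

# One dict per combination of values, in the order itertools.product yields them.
names = list(grid)
combos = [dict(zip(names, values)) for values in itertools.product(*grid.values())]

print(len(combos))  # 81
print(combos[0])
```

Adding a fifth variable with three values would triple the count again, which is why grid search becomes expensive quickly.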

**TODO**: Specify a multi-dimensional grid search, gridding over `lr` and `momentum`. Choose 5 values between 0.001 and 0.9 for each.
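Any five values in the range satisfy the exercise. One option, which is our assumption rather than a requirement, is to space them evenly on a log scale, since learning rates are usually compared across orders of magnitude. `log_spaced` is a hypothetical helper:

```python
def log_spaced(low, high, num):
    """Return `num` values from `low` to `high`, evenly spaced on a log scale."""
    ratio = high / low
    return [low * ratio ** (i / (num - 1)) for i in range(num)]


candidates = log_spaced(0.001, 0.9, 5)
print(candidates)
```

The resulting list can be passed to `tune.grid_search` like any other list of values.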

In [None]:
# TODO: Specify a multi-dimensional grid search, gridding over lr and momentum.
# Choose 5 values between 0.001 and 0.9 for each.
hyperparameter_space = {
    "lr": "TODO",
    "momentum": "TODO",
}

assert "grid_search" in hyperparameter_space.get("lr") 
assert "grid_search" in hyperparameter_space.get("momentum")

#### Using an early-stopping algorithm

An efficient hyperparameter optimization avoids spending time on low-performing trials. A plain grid search does not: it trains every configuration to completion, which is one of its main inefficiencies.

In Tune, we can avoid this by using state-of-the-art early-stopping algorithms such as ASHA. ASHA is a scalable algorithm for principled early stopping. How does it work? At a high level, it terminates less promising trials and allocates more time and resources to more promising ones.

    The successive halving algorithm begins with all candidate configurations in the base rung and proceeds as follows:

        1. Uniformly allocate a budget to a set of candidate hyperparameter configurations in a given rung.
        2. Evaluate the performance of all candidate configurations.
        3. Promote the top half of candidate configurations to the next rung.
        4. Double the budget per configuration for the next rung and repeat until one configuration remains.
        
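A textual representation (this example uses a reduction factor of 3 rather than 2, so each rung keeps the top third of configurations and triples the budget):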
    
           | Configurations | Epochs per 
           | Remaining      | Configuration
    ---------------------------------------
    Rung 1 | 27             | 1
    Rung 2 | 9              | 3
    Rung 3 | 3              | 9
    Rung 4 | 1              | 27

(from https://blog.ml.cmu.edu/2018/12/12/massively-parallel-hyperparameter-optimization/)
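The rung schedule in the table can be reproduced with a short simulation. This is a simplified sketch under stated assumptions, not Tune's `ASHAScheduler`: it is synchronous (ASHA promotes asynchronously), each configuration has a fixed random score instead of a score that depends on training budget, and `successive_halving` is a hypothetical name. The reduction factor is set to 3 to match the table.

```python
import random


def successive_halving(scores, reduction_factor=3, min_budget=1):
    """Return the (configs_remaining, budget_per_config) rungs and the survivors.

    `scores` maps a configuration id to a fixed quality score (higher is better).
    """
    survivors = list(scores)
    budget = min_budget
    rungs = []
    while True:
        rungs.append((len(survivors), budget))
        if len(survivors) <= 1:
            break
        # Keep the top 1/reduction_factor of configurations.
        survivors.sort(key=scores.get, reverse=True)
        keep = max(1, len(survivors) // reduction_factor)
        survivors = survivors[:keep]
        # Give each survivor a larger budget in the next rung.
        budget *= reduction_factor
    return rungs, survivors


rng = random.Random(0)  # seeded only to make the sketch repeatable
scores = {i: rng.random() for i in range(27)}
rungs, winner = successive_halving(scores)
print(rungs)  # [(27, 1), (9, 3), (3, 9), (1, 27)]
```

The total budget spent is 27·1 + 9·3 + 3·9 + 1·27 = 108 epochs, compared with 27·27 = 729 epochs to train every configuration for the full 27 epochs.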

**TODO**: Set up ASHA.

1) Create an ASHA scheduler (`ASHAScheduler`) and assign it to `custom_scheduler` in the cell below. A scheduler decides which trials to run, stop, or pause.
```python
from ray.tune.schedulers import ASHAScheduler

custom_scheduler = ASHAScheduler(
    metric='mean_accuracy',
    mode="max",
    grace_period=1,
)
```

*Note: Read the documentation on this step at https://ray.readthedocs.io/en/latest/tune-schedulers.html#asynchronous-hyperband or call ``help(tune.schedulers.ASHAScheduler)`` to learn more about the ASHA Scheduler*



#### How do I debug things in Tune?

When a trial fails, an `error file` column appears in Tune's status output. Print the file at that path, for example with `cat` as shown below, to diagnose the issue.

```
! cat /home/ubuntu/tune_iris/tune_iris_c66e1100_2019-10-09_17-13-24x_swb9xs/error_2019-10-09_17-13-29.txt
```
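If the path has scrolled out of view, the error files can also be located from Python. The sketch below assumes that results live under `~/ray_results` and that error files are named `error*.txt`, as in the example path above; `find_error_files` is a hypothetical helper, not a Tune API.

```python
import glob
import os


def find_error_files(results_dir):
    """Return sorted paths of files named error*.txt anywhere under results_dir."""
    pattern = os.path.join(os.path.expanduser(results_dir), "**", "error*.txt")
    return sorted(glob.glob(pattern, recursive=True))


# Assumed default location; adjust if your results are stored elsewhere.
for path in find_error_files("~/ray_results"):
    print(path)
```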

In [None]:
from ray.tune.schedulers import ASHAScheduler

custom_scheduler = None  # TODO: Replace None with an ASHAScheduler

analysis = tune.run(
    train_mnist, 
    scheduler=custom_scheduler, 
    config=hyperparameter_space, 
    verbose=1,
    name="train_mnist"  # This is used to specify the logging directory.
)

#### Let's plot our results by wall-clock time and epoch. 

In [None]:
# Plot by wall-clock time

dfs = analysis.fetch_trial_dataframes()
# This plots everything on the same plot
ax = None
for d in dfs.values():
    ax = d.plot("timestamp", "mean_accuracy", ax=ax, legend=False)

In [None]:
# Plot by epoch
ax = None
for d in dfs.values():
    ax = d.mean_accuracy.plot(ax=ax, legend=False)

### Exercise 3: Search Algorithms in Tune

Tune enables you to scale existing hyperparameter search libraries such as HyperOpt (https://github.com/hyperopt/hyperopt). In this setting, use the external library's hyperparameter space specification instead of Tune's configuration.

Search algorithms can limit the number of hyperparameter configurations being evaluated concurrently. This is necessary because the external library is sometimes more effective when evaluations are sequential, since each new suggestion can then draw on more completed results.
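A toy calculation makes the trade-off concrete. Assuming every trial takes the same amount of time (a simplification; `results_seen_at_suggestion` is a hypothetical helper), trials run in lock-step batches, and we can count how many finished results exist when each new configuration is suggested:

```python
def results_seen_at_suggestion(num_trials, max_concurrent):
    """For each trial, count the trials already finished when it is suggested.

    Assumes equal trial durations, so trials run in batches of `max_concurrent`.
    """
    return [(i // max_concurrent) * max_concurrent for i in range(num_trials)]


print(results_seen_at_suggestion(10, 2))   # [0, 0, 2, 2, 4, 4, 6, 6, 8, 8]
print(results_seen_at_suggestion(10, 10))  # [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
```

With all 10 trials running at once, every suggestion is made with no feedback, so a model-based searcher behaves like random search. Lower concurrency gives the searcher more information per suggestion at the cost of wall-clock time.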

**TODO:** Create a HyperOptSearch object by passing in a HyperOpt specific search space. Also enforce that only 2 trials can run concurrently:

```python
    hyperopt_search = HyperOptSearch(space, max_concurrent=2, metric="mean_accuracy", mode="max")
```

Then pass `hyperopt_search` to `tune.run` through the `search_alg` argument.

In [None]:
from hyperopt import hp
from ray.tune.suggest.hyperopt import HyperOptSearch

# This is a HyperOpt specific hyperparameter space configuration.
space = {
    "lr": hp.loguniform("lr", -10, -1),
    "momentum": hp.uniform("momentum", 0.1, 0.9),
}

# TODO: Create a HyperOptSearch object by passing in a HyperOpt specific search space.
# Also enforce that only 2 trials can run concurrently:
hyperopt_search = "TODO" # TODO: Change this


! rm -rf ~/ray_results/search_algorithm
analysis = tune.run(
    train_mnist, 
    num_samples=10,  
    search_alg="TODO",  #  TODO: Change this
    verbose=1,
    name="search_algorithm"  # This is used to specify the logging directory.
)

## Extra: Use TensorBoard for results

You can use TensorBoard to view trial performances. If the graphs do not load, click `Toggle All Runs`.

In [None]:
%load_ext tensorboard

In [None]:
%tensorboard --logdir ~/ray_results/search_algorithm

# Please: fill out this form to provide feedback on this tutorial!

https://goo.gl/forms/NVTFjUKFz4TH8kgK2

# Extra: Using GPUs.

If your machine has a GPU, you can use the `resources_per_trial` argument to specify that the trial should use a GPU. This allows Tune to automatically set the `CUDA_VISIBLE_DEVICES` for the trial and enforce resource isolation (i.e., 1 trial per GPU at a time).
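To check what a trial was given, you can inspect `CUDA_VISIBLE_DEVICES` from inside the training function. The helper below is a hypothetical sketch, not part of Tune; it only parses the environment variable into a list of device ids.

```python
import os


def visible_gpu_ids(environ=None):
    """Parse CUDA_VISIBLE_DEVICES into a list of device id strings.

    Returns an empty list when the variable is unset or empty.
    """
    environ = os.environ if environ is None else environ
    raw = environ.get("CUDA_VISIBLE_DEVICES", "")
    return [part.strip() for part in raw.split(",") if part.strip()]


print(visible_gpu_ids({"CUDA_VISIBLE_DEVICES": "0"}))  # ['0']
print(visible_gpu_ids({}))                             # []
```

Calling `visible_gpu_ids()` with no argument at the top of `train_mnist` reads the real environment of the trial process.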

In [None]:
analysis = tune.run(
    train_mnist, 
    num_samples=10,  
    resources_per_trial={"gpu": 1},  # Resource keys are lowercase ("cpu", "gpu")
    search_alg="TODO",  #  TODO: Change this
    verbose=1,
    name="search_algorithm"  # This is used to specify the logging directory.
)