# Tutorial: Accelerated Hyperparameter Tuning For PyTorch

### In this tutorial, we'll show you how to use state-of-the-art hyperparameter tuning with Tune and PyTorch.

<img src="tune-arch-simple.png" alt="Tune Logo" width="600"/>

Specifically, we'll leverage ASHA and Bayesian Optimization (via HyperOpt) without modifying your underlying code.

Tune is a scalable framework for model training and hyperparameter search with a focus on deep learning and deep reinforcement learning.

* **Code**: https://github.com/ray-project/ray/tree/master/python/ray/tune 
* **Examples**: https://github.com/ray-project/ray/tree/master/python/ray/tune/examples
* **Documentation**: http://ray.readthedocs.io/en/latest/tune.html
* **Mailing List**: https://groups.google.com/forum/#!forum/ray-dev

### Exercise 1: PyTorch Boilerplate Code

Run the cells below to see how you would use Tune without any additional optimization techniques. You'll see that integrating Tune with PyTorch **requires 1 line of code**!

In [None]:
# These are some basic imports.
# Original Code here:
# https://github.com/pytorch/examples/blob/master/mnist/main.py
import numpy as np
import torch
import torch.optim as optim
from torchvision import datasets
from helper import train, test, ConvNet, get_data_loaders

from ray import tune
from ray.tune import track
from ray.tune.schedulers import AsyncHyperBandScheduler

%matplotlib inline
import matplotlib.style as style
style.use("ggplot")

datasets.MNIST("~/data", train=True, download=True)

Below, we have some boilerplate code for a PyTorch training function. You can take a look at these functions in `helper.py`; there's no black magic happening. For example, `train` is simply a for loop over the data loader.

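```python
import torch.nn.functional as F

# EPOCH_SIZE is a module-level constant expected to be defined in helper.py.
def train(model, optimizer, train_loader):
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        # Stop once roughly EPOCH_SIZE samples have been processed.
        if batch_idx * len(data) > EPOCH_SIZE:
            return
        optimizer.zero_grad()
        output = model(data)
        loss = F.nll_loss(output, target)
        loss.backward()
        optimizer.step()
```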

**TODO**: Add `track.log(mean_accuracy=acc)` within the training loop. `tune.track` allows Tune to keep track of current results.

In [None]:
def train_mnist(config):
    model = ConvNet(config)
    train_loader, test_loader = get_data_loaders()

    optimizer = optim.SGD(
        model.parameters(), lr=config["lr"], momentum=config["momentum"])

    for i in range(20):
        train(model, optimizer, train_loader)
        acc = test(model, test_loader)
        # TODO: Add track.log(mean_accuracy=acc) here
        if i % 5 == 0:
            torch.save(model, "./model.pth") # This saves the model to the trial directory

#### Let's run 1 trial, sampling the learning rate log-uniformly and the momentum uniformly.
Run the cell below to run Tune.

In [None]:
experiment_config = dict(
    name="train_mnist",
    stop={"mean_accuracy": 0.98},
    return_trials=False
)

search_space = {
    "lr": tune.sample_from(lambda spec: 10**(-10 * np.random.rand())),
    "momentum": tune.uniform(0.1, 0.9)
}

# Note: use `ray.init(redis_address=...)` to enable distributed execution
analysis = tune.run(train_mnist, config=search_space, **experiment_config)
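
The `lr` entry in `search_space` samples the exponent uniformly, so learning rates are spread evenly across orders of magnitude between `1e-10` and `1` rather than evenly across values. Here is a minimal standard-library sketch of the same idea; the helper name `sample_lr` is hypothetical, and `random.Random` stands in for `np.random.rand()`:

```python
import math
import random


def sample_lr(rng):
    """Draw 10 ** (-10 * u) with u uniform in [0, 1)."""
    return 10 ** (-10 * rng.random())


rng = random.Random(0)
samples = [sample_lr(rng) for _ in range(10_000)]

# Count how many samples fall into each decade: [1e-1, 1], [1e-2, 1e-1), ...
decade_counts = [0] * 10
for s in samples:
    decade = min(9, int(-math.log10(s)))
    decade_counts[decade] += 1

# Each decade receives roughly one tenth of the samples.
print(decade_counts)
```

A plain `tune.uniform(1e-10, 1)` would instead place about 90% of the samples between 0.1 and 1, which is rarely what you want for a learning rate.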

#### Plot the performance of this trial.

In [None]:
dfs = analysis.get_all_trial_dataframes()
[d.mean_accuracy.plot() for d in dfs.values()]

### Exercise 2: Early Stopping with ASHA

ASHA is a scalable algorithm for principled early stopping. How does it work? At a high level, it terminates less promising trials and allocates more time and resources to more promising ones.

    The successive halving algorithm begins with all candidate configurations in the base rung and proceeds as follows:

        1. Uniformly allocate a budget to a set of candidate hyperparameter configurations in a given rung.
        2. Evaluate the performance of all candidate configurations.
        3. Promote the top half of candidate configurations to the next rung.
        4. Double the budget per configuration for the next rung and repeat until one configuration remains.
        
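A textual representation, here with a reduction factor of 3 (each rung keeps the top third of the configurations and triples the epochs per configuration):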
    
           | Configurations | Epochs per 
           | Remaining      | Configuration
    ---------------------------------------
    Rung 1 | 27             | 1
    Rung 2 | 9              | 3
    Rung 3 | 3              | 9
    Rung 4 | 1              | 27

(from https://blog.ml.cmu.edu/2018/12/12/massively-parallel-hyperparameter-optimization/)
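
The rung structure in the table can be reproduced with a short simulation. This is a toy, synchronous sketch and not Tune's implementation: the function `successive_halving`, the `reduction_factor` and `min_budget` parameter names, and the scoring function `toy_score` are all assumptions made for illustration.

```python
import math
import random


def successive_halving(configs, evaluate, reduction_factor=3, min_budget=1):
    """Keep the top 1/reduction_factor of the configs at each rung and
    multiply the per-config budget by reduction_factor.

    Returns the winning config and a list of (num_configs, budget) per rung.
    """
    rungs = []
    budget = min_budget
    survivors = list(configs)
    while True:
        rungs.append((len(survivors), budget))
        if len(survivors) == 1:
            break
        # Score every surviving config at the current budget (higher is better).
        ranked = sorted(survivors, key=lambda c: evaluate(c, budget), reverse=True)
        keep = max(1, len(ranked) // reduction_factor)
        survivors = ranked[:keep]
        budget *= reduction_factor
    return survivors[0], rungs


def toy_score(config, budget):
    # Hypothetical objective: improves with budget, best near lr = 1e-2.
    return (1 - 0.5 ** budget) - 0.1 * abs(math.log10(config["lr"]) + 2)


rng = random.Random(0)
configs = [{"lr": 10 ** rng.uniform(-5, 0)} for _ in range(27)]

best, rungs = successive_halving(configs, toy_score)
print(rungs)  # [(27, 1), (9, 3), (3, 9), (1, 27)]
```

Calling it with `reduction_factor=2` gives the "promote the top half, double the budget" variant described in the numbered steps.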

Now, let's integrate this with a PyTorch codebase.

**TODO**: Set up ASHA.

1) Create an Asynchronous HyperBand Scheduler (ASHA).
```python
from ray.tune.schedulers import ASHAScheduler

custom_scheduler = ASHAScheduler(
    reward_attr='mean_accuracy',
    grace_period=1,
)
```

*Note: Read the documentation on this step at https://ray.readthedocs.io/en/latest/tune-schedulers.html#asynchronous-hyperband or call ``help(tune.schedulers.AsyncHyperBandScheduler)`` to learn more about the Asynchronous HyperBand Scheduler.*

2) With early stopping in place, we can afford to **increase the number of sampled configurations by 5x**. To do this, set the parameter `num_samples`. For example,

```python
tune.run(... num_samples=30)
```
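
To build some intuition for what an asynchronous scheduler does with `grace_period`, here is a simplified, stopping-based sketch. When a trial reaches a rung milestone, it continues only if its metric is within the top `1 / reduction_factor` of the results recorded at that rung so far, so decisions are made without waiting for other trials. The class `ToyAsha`, the parameters `reduction_factor` and `max_t`, and the exact cutoff rule are assumptions for illustration and do not reproduce Tune's internals.

```python
class ToyAsha:
    def __init__(self, grace_period=1, reduction_factor=3, max_t=27):
        self.reduction_factor = reduction_factor
        # Milestones: grace_period, grace_period * rf, grace_period * rf**2, ...
        self.milestones = []
        t = grace_period
        while t < max_t:
            self.milestones.append(t)
            t *= reduction_factor
        # Metrics reported so far at each milestone.
        self.recorded = {m: [] for m in self.milestones}

    def should_continue(self, iteration, metric):
        """Return False if the trial should be stopped at this iteration."""
        if iteration not in self.recorded:
            return True  # Not a milestone: keep training.
        scores = self.recorded[iteration]
        scores.append(metric)
        # Only the top 1/rf of the results seen so far (at least one) may continue.
        keep = max(1, len(scores) // self.reduction_factor)
        cutoff = sorted(scores, reverse=True)[keep - 1]
        return metric >= cutoff


asha = ToyAsha(grace_period=1, reduction_factor=3, max_t=27)
decisions = [
    asha.should_continue(1, 0.50),  # first result at this rung, nothing to compare against
    asha.should_continue(1, 0.40),  # below the best result so far
    asha.should_continue(1, 0.60),  # new best
    asha.should_continue(2, 0.65),  # iteration 2 is not a milestone
    asha.should_continue(1, 0.55),  # not in the top third of 4 results
]
print(decisions)  # [True, False, True, True, False]
```

Note that the first trial continues even though later trials beat it. This is the price of asynchrony: no worker ever sits idle waiting for a rung to fill up.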

In [None]:
from ray.tune.schedulers import ASHAScheduler

custom_scheduler = "FIX ME"

analysis = tune.run(
    train_mnist, 
    num_samples="FIX ME", 
    scheduler=custom_scheduler, 
    config=search_space, 
    **experiment_config)

In [None]:
# Plot by wall-clock time

dfs = analysis.get_all_trial_dataframes()
# This plots everything on the same plot
ax = None
for d in dfs.values():
    ax = d.plot("timestamp", "mean_accuracy", ax=ax, legend=False)

In [None]:
# Plot by epoch
ax = None
for d in dfs.values():
    ax = d.mean_accuracy.plot(ax=ax, legend=False)

### Exercise 3: Search Algorithms in Tune

With Tune, you can combine powerful hyperparameter search libraries such as HyperOpt (https://github.com/hyperopt/hyperopt) with state-of-the-art algorithms such as HyperBand without modifying any model training code. Tune allows you to use different search algorithms in combination with different trial schedulers.

The documentation for doing this is here: https://ray.readthedocs.io/en/latest/tune-searchalg.html#hyperopt-search-tree-structured-parzen-estimators

Currently, Tune offers the following search algorithms (and library integrations):

* Grid Search and Random Search
* BayesOpt
* HyperOpt
* SigOpt
* Nevergrad
* Scikit-Optimize
* Ax

Check out more at https://ray.readthedocs.io/en/latest/tune-searchalg.html

**TODO:** Plug `HyperOptSearch` into `tune.run` and enforce that only 2 trials can run concurrently, like this:

```python
hyperopt_search = HyperOptSearch(space, max_concurrent=2, reward_attr="mean_accuracy")
```

In [None]:
from hyperopt import hp
from ray.tune.suggest.hyperopt import HyperOptSearch

space = {
    # hp.loguniform takes its bounds in log space: it returns exp(uniform(low, high)).
    "lr": hp.loguniform("lr", np.log(1e-10), np.log(0.1)),
    "momentum": hp.uniform("momentum", 0.1, 0.9),
}

hyperopt_search = "FIX ME"  # TODO: Change this

analysis = tune.run(
    train_mnist, 
    num_samples=10,  
    search_alg="FIX ME",  #  TODO: Change this
    **experiment_config)
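
One detail worth checking when writing a HyperOpt space: `hp.loguniform(label, low, high)` is documented as returning `exp(uniform(low, high))`, so `low` and `high` bound the logarithm of the value, not the value itself. The standard-library sketch below mimics that definition to show the difference. The helper `loguniform_sample` is hypothetical and does not call HyperOpt.

```python
import math
import random


def loguniform_sample(rng, low, high):
    """Mimic the documented definition: exp(uniform(low, high))."""
    return math.exp(rng.uniform(low, high))


rng = random.Random(0)

# Bounds passed as raw values: samples land between exp(1e-10) and exp(0.1),
# i.e. roughly between 1.0 and 1.105.
raw = [loguniform_sample(rng, 1e-10, 0.1) for _ in range(1000)]

# Bounds passed as logarithms: samples land between 1e-10 and 0.1.
logged = [loguniform_sample(rng, math.log(1e-10), math.log(0.1)) for _ in range(1000)]

print(min(raw), max(raw))
print(min(logged), max(logged))
```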

# Please fill out this form to provide feedback on this tutorial!

https://goo.gl/forms/NVTFjUKFz4TH8kgK2