<center><font size="3.5"><b>Hyperparameter optimization using Ray Tune (RISELab)</b></font></center>

In [1]:
# Install requirements (uncomment to run inside the notebook)
#!pip install "ray[tune]" torch torchvision filelock

# Imports

import torch.optim as optim
from ray import tune
from ray.tune.examples.mnist_pytorch import get_data_loaders, ConvNet, train, test

### Grid search over learning rates for a CNN using PyTorch and Tune

In [2]:
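def train_mnist(config):
    # Each trial builds its own data loaders, model, and optimizer.
    train_loader, test_loader = get_data_loaders()
    model = ConvNet()
    # The learning rate comes from the per-trial config supplied by Tune.
    optimizer = optim.SGD(model.parameters(), lr=config["lr"])
    for i in range(10):
        train(model, optimizer, train_loader)
        acc = test(model, test_loader)
        # Report the metric once per iteration. tune.track.log is the API of
        # the Ray release used for this run; later releases replaced it.
        tune.track.log(mean_accuracy=acc)


# Run one trial per learning rate in the grid.
analysis = tune.run(
    train_mnist, config={"lr": tune.grid_search([0.001, 0.01, 0.1])})

print("Best config: ", analysis.get_best_config(metric="mean_accuracy"))

# Get a dataframe for analyzing trial results.
df = analysis.dataframe()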

2019-11-25 10:42:56,851	INFO resource_spec.py:205 -- Starting Ray with 11.62 GiB memory available for workers and up to 5.82 GiB for objects. You can adjust these settings with ray.init(memory=<bytes>, object_store_memory=<bytes>).
2019-11-25 10:42:59,033	INFO function_runner.py:254 -- tune.track signature detected.


== Status ==
Using FIFO scheduling algorithm.
Resources requested: 0/8 CPUs, 0/0 GPUs, 0.0/11.62 GiB heap, 0.0/4.0 GiB objects
Memory usage on this node: 12.6/31.6 GiB

[2m[33m(pid=raylet)[0m E1125 10:42:57.841189900    2145 socket_utils_common_posix.cc:202] check for SO_REUSEPORT: {"created":"@1574707377.841170700","description":"Protocol not available","errno":92,"file":"external/com_github_grpc_grpc/src/core/lib/iomgr/socket_utils_common_posix.cc","file_line":179,"os_error":"Protocol not available","syscall":"getsockopt(SO_REUSEPORT)"}
[2m[33m(pid=raylet)[0m E1125 10:42:57.841419100    2145 socket_utils_common_posix.cc:303] setsockopt(TCP_USER_TIMEOUT) Protocol not available
[2m[33m(pid=raylet)[0m E1125 10:42:57.911507000    2145 socket_utils_common_posix.cc:303] setsockopt(TCP_USER_TIMEOUT) Protocol not available


2019-11-25 10:43:20,503	ERROR log_sync.py:28 -- Log sync requires rsync to be installed.
2019-11-25 10:43:20,549	ERROR log_sync.py:28 -- Log sync requires rsync to be installed.
2019-11-25 10:43:20,588	ERROR log_sync.py:28 -- Log sync requires rsync to be installed.


== Status ==
Using FIFO scheduling algorithm.
Resources requested: 1/8 CPUs, 0/0 GPUs, 0.0/11.62 GiB heap, 0.0/4.0 GiB objects
Memory usage on this node: 13.2/31.6 GiB
Result logdir: /home/berkeleylab/ray_results/train_mnist
Number of trials: 3 ({'RUNNING': 1, 'PENDING': 2})
PENDING trials:
 - train_mnist_1_lr=0.01:	PENDING
 - train_mnist_2_lr=0.1:	PENDING
RUNNING trials:
 - train_mnist_0_lr=0.001:	RUNNING

Result for train_mnist_2_lr=0.1:
  date: 2019-11-25_10-43-22
  done: false
  experiment_id: 6374e3ea813b454e99af8e7ef6061734
  hostname: KiranCHHATRE
  iterations_since_restore: 1
  mean_accuracy: 0.5375
  node_ip: 10.0.0.194
  pid: 2160
  time_since_restore: 1.2691292762756348
  time_this_iter_s: 1.2691292762756348
  time_total_s: 1.2691292762756348
  timestamp: 1574707402
  timesteps_since_restore: 0
  training_iteration: 0
  trial_id: 67a9baa8
  
Result for train_mnist_0_lr=0.001:
  date: 2019-11-25_10-43-22
  done: false
  experiment_id: f530b5ec3a6f44888eb22c79bcd4ae76
  hostna

2019-11-25 10:43:25,542	INFO tune.py:276 -- Returning an analysis object by default. You can call `analysis.trials` to retrieve a list of trials. This message will be removed in future versions of Tune.


== Status ==
Using FIFO scheduling algorithm.
Resources requested: 0/8 CPUs, 0/0 GPUs, 0.0/11.62 GiB heap, 0.0/4.0 GiB objects
Memory usage on this node: 13.3/31.6 GiB
Result logdir: /home/berkeleylab/ray_results/train_mnist
Number of trials: 3 ({'TERMINATED': 3})
TERMINATED trials:
 - train_mnist_0_lr=0.001:	TERMINATED, [1 CPUs, 0 GPUs], [pid=2161], 4 s, 9 iter, 0.275 acc
 - train_mnist_1_lr=0.01:	TERMINATED, [1 CPUs, 0 GPUs], [pid=2164], 4 s, 9 iter, 0.722 acc
 - train_mnist_2_lr=0.1:	TERMINATED, [1 CPUs, 0 GPUs], [pid=2160], 4 s, 9 iter, 0.881 acc

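
What the grid search above does can be sketched without Ray: expand the search space into one config per combination, then keep the config with the best final metric. The helpers `expand_grid` and `best_config` below are hypothetical names used only for illustration and are not part of Tune; the accuracies are the final values from the status table above.

```python
from itertools import product


def expand_grid(search_space):
    """Return one config dict per combination of the listed values."""
    names = list(search_space)
    value_lists = [search_space[name] for name in names]
    return [dict(zip(names, values)) for values in product(*value_lists)]


def best_config(results, metric="mean_accuracy"):
    """Return the config whose reported metric is highest."""
    return max(results, key=lambda result: result[metric])["config"]


# Same grid as in the notebook cell: three learning rates, so three trials.
configs = expand_grid({"lr": [0.001, 0.01, 0.1]})

# Final accuracies copied from the TERMINATED trials above, in grid order.
final_accuracies = [0.275, 0.722, 0.881]
results = [
    {"config": config, "mean_accuracy": acc}
    for config, acc in zip(configs, final_accuracies)
]

print(best_config(results))  # {'lr': 0.1}
```

With more than one hyperparameter the grid grows multiplicatively, which is why the early-stopping schedulers and search libraries listed further down become relevant.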

In [4]:
%load_ext tensorboard
%tensorboard --logdir ~/ray_results


The tensorboard extension is already loaded. To reload it, use:
  %reload_ext tensorboard



Feature | Supported options
--- | ---
**Compatible Frameworks** | PyTorch, Keras
**Visualization** | TensorBoard
**SOTA Algorithms** | Population Based Training (DeepMind), Vizier's Median Stopping Rule (Google), HyperBand/ASHA (CMU)
**Tuning Libraries** | Ax (Facebook), HyperOpt, BayesOpt, Grid/random search, SigOpt, Nevergrad, Scikit-Optimize, BOHB
**Contribution/Extension** | Yes
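
To give a feel for what an early-stopping scheduler from the table does, here is a simplified sketch of a median stopping check. It is an illustration under stated assumptions, not Tune's implementation: a higher metric is assumed to be better, `should_stop`, `running_average`, and `grace_period` are hypothetical names, and the trial histories are made-up numbers.

```python
from statistics import median


def running_average(history, step):
    """Mean of the metric values reported up to and including `step` (0-based)."""
    window = history[: step + 1]
    return sum(window) / len(window)


def should_stop(trial_history, other_histories, step, grace_period=1):
    """Simplified median stopping check for a metric where higher is better.

    The trial is stopped at `step` if its best value so far is below the
    median of the other trials' running averages at the same step.
    """
    if step < grace_period:
        return False  # give every trial a few iterations before judging it
    peers = [running_average(h, step) for h in other_histories if len(h) > step]
    if not peers:
        return False  # nothing to compare against yet
    best_so_far = max(trial_history[: step + 1])
    return best_so_far < median(peers)


# Made-up accuracy curves, for illustration only.
peers = [[0.50, 0.60, 0.70], [0.40, 0.55, 0.65], [0.30, 0.45, 0.60]]
slow_trial = [0.10, 0.15, 0.20]
good_trial = [0.50, 0.70, 0.80]

print(should_stop(slow_trial, peers, step=2))  # True
print(should_stop(good_trial, peers, step=2))  # False
```

The grace period keeps a trial from being cut after a single noisy report; real schedulers expose similar knobs, but their exact names and defaults should be taken from the Tune documentation.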


> #### Contributing new algorithms (including model-based suggestion algorithms): API setup documentation is available


### Distributed experimentation on a local or cloud cluster

- Independent of other responses
- Controlled for variance and uncontrollable error
- Reproducible between model candidates
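
The properties listed above can be illustrated with a small stand-in that does not use Ray. In this sketch each trial owns a private random generator seeded from its config, so trials do not influence each other and a rerun gives the same results. The names `run_trial` and `run_experiment`, the `seed` key, and the toy scoring function are all assumptions made for illustration; a thread pool stands in for a cluster.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def run_trial(config):
    """Toy trial: the score depends only on the config and its own seed."""
    rng = random.Random(config["seed"])  # private RNG, no shared global state
    noise = rng.uniform(-0.01, 0.01)     # bounded stand-in for training noise
    score = 1.0 - abs(config["lr"] - 0.1) + noise
    return {"config": config, "score": score}


def run_experiment(configs, max_workers=3):
    """Run all trials concurrently; results come back in config order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_trial, configs))


# Every candidate gets the same seed so they are compared under equal noise.
configs = [{"lr": lr, "seed": 0} for lr in (0.001, 0.01, 0.1)]

first = run_experiment(configs)
second = run_experiment(configs)

print(first == second)  # True
best = max(first, key=lambda result: result["score"])
print(best["config"]["lr"])  # 0.1
```

Averaging each candidate over several seeds, rather than a single one, is the usual next step for controlling variance.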