<a href="https://colab.research.google.com/github/kiritowu/Great-Lunar-Lander/blob/main/Setting_Up_RL_Hyperparameter_Sweeps.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Hyperparameter Tuning of RL Models

This notebook sets up hyperparameter tuning with Weights & Biases (W&B). A W&B sweep lets us coordinate a single hyperparameter search across several machines, which shortens the time needed to find good settings.


## Setup


In [None]:
%%capture
!pip install wandb --upgrade # Needed on Google Colab; elsewhere, skip this cell if W&B is already installed.

In [None]:
import wandb

wandb.login()


True

## Define Sweep


### Selecting a Search Method

W&B sweeps support three search methods, selected with the `method` key of the sweep configuration:

- Grid Search
- Random Search
- Bayesian Search
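In the sweep configuration these correspond to the `method` values `"grid"`, `"random"`, and `"bayes"`. Grid search tries every combination of the listed values, random search draws each hyperparameter independently from its distribution, and Bayesian search uses the results of earlier runs to choose the next candidate. The sketch below illustrates only the grid case, to show how quickly the number of runs grows; `grid_candidates` and the small `space` dictionary are hypothetical names for illustration, not part of W&B.

```python
from itertools import product


def grid_candidates(space):
    """Return every combination of the listed values, one dict per candidate."""
    names = list(space)
    value_lists = [space[name] for name in names]
    return [dict(zip(names, combo)) for combo in product(*value_lists)]


# Hypothetical discrete search space, for illustration only.
space = {
    "update_target_net_interval": [1, 10, 100],
    "gamma": [0.95, 0.99],
}

candidates = grid_candidates(space)
print(len(candidates))  # 3 values * 2 values = 6 combinations
print(candidates[0])
```

Because the count is the product of the list lengths, grid search only suits small discrete spaces; the continuous ranges used later in this notebook are a better fit for random or Bayesian search.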


In [None]:
sweep_config = {
    "method": "random",
    "metric" : {
        "name" : "Avg-Reward-100e",
        "goal" : "maximize",
        "target" : 300
        }
    }


### Selecting Hyperparameters


In [None]:
parameters_ddqn = {
    "lr" : {
        "min" : 0.0001,
        "max" : 0.01,
        "distribution" : "uniform"
    },
    "gamma" : {
        "value" : 0.99
    },
    "epsilon" : {
        "value" : 1.0
    },
    "epsilon_decay" : {
        "min" : 0.95,
        "max" : 0.995,
        "distribution" : "uniform"
    },
    "update_target_net_interval" : {
        "values" : [1, 5, 10, 20, 30, 50, 100]
    },
    "episodes" : {
        "value" : 500
    }
}

parameters_sarsa = {
    "lr" : {
        "min" : 0.0001,
        "max" : 0.01,
        "distribution" : "uniform"
    },
    "gamma" : {
        "value" : 0.99
    },
    "epsilon" : {
        "value" : 1.0
    },
    "epsilon_decay" : {
        "min" : 0.95,
        "max" : 0.995,
        "distribution" : "uniform"
    },
    "episodes" : {
        "value" : 500
    }
}
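Each entry above takes one of three shapes: a fixed `value`, a list of `values` to choose from, or a `min`/`max` range with a `distribution`. As a rough sketch of what a single random-search draw from such a specification looks like, the hypothetical helper `sample_config` below mimics the idea in plain Python. It is not W&B's implementation and handles only the `uniform` distribution used in this notebook.

```python
import random


def sample_config(parameters, rng):
    """Draw one configuration from a sweep-style parameter specification."""
    config = {}
    for name, spec in parameters.items():
        if "value" in spec:
            # Fixed hyperparameter: every run gets the same value.
            config[name] = spec["value"]
        elif "values" in spec:
            # Discrete choice: pick one of the listed options.
            config[name] = rng.choice(spec["values"])
        else:
            # Continuous range: assumed uniform between min and max.
            config[name] = rng.uniform(spec["min"], spec["max"])
    return config


# A reduced copy of the DDQN specification above, for illustration.
parameters = {
    "lr": {"min": 0.0001, "max": 0.01, "distribution": "uniform"},
    "gamma": {"value": 0.99},
    "update_target_net_interval": {"values": [1, 5, 10, 20, 30, 50, 100]},
}

rng = random.Random(0)  # seeded so the draw is repeatable
sample = sample_config(parameters, rng)
print(sample)
```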


In [None]:
sweep_config["parameters"] = parameters_sarsa

## Initialize Sweep


In [None]:
project_name = "SARSA-Tuning"

In [None]:
sweep_id = wandb.sweep(sweep_config, project=project_name, entity="onsen")


Create sweep with ID: f1zo1g3l
Sweep URL: https://wandb.ai/onsen/SARSA-Tuning/sweeps/f1zo1g3l


## Next Steps

Now that the sweep has been initialized, we need to do the following:

1. Define a training function for the model that does the following:

- Accepts a single argument, `config`
- Initializes a new run with `wandb.init(config=config)`
- Builds the model with the selected hyperparameters
- Trains the model and logs its performance

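An example for a supervised PyTorch model is shown below. `build_dataset`, `build_network`, `build_optimizer`, and `train_epoch` are placeholders for your own code, and the `config` fields must match the hyperparameters defined in your sweep: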

```python
import torch
import torch.optim as optim
import torch.nn.functional as F
import torch.nn as nn
from torchvision import datasets, transforms
import wandb

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")


def train(config=None):
    # Initialize a new wandb run
    with wandb.init(config=config):
        # If called by wandb.agent, as below,
        # this config will be set by Sweep Controller
        config = wandb.config

        loader = build_dataset(config.batch_size)
        network = build_network(config.fc_layer_size, config.dropout)
        optimizer = build_optimizer(network, config.optimizer, config.learning_rate)

        for epoch in range(config.epochs):
            avg_loss = train_epoch(network, loader, optimizer)
            wandb.log({"loss": avg_loss, "epoch": epoch})
```
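The example above is an epoch-based supervised loop. For the sweep defined in this notebook, the loop would instead iterate over episodes and log the metric named in `sweep_config`. The sketch below shows one way that could look, under several assumptions: `Avg-Reward-100e` is taken to mean the average reward over the last 100 episodes, epsilon is assumed to decay multiplicatively once per episode, and `run_training`, `run_episode`, and `log` are hypothetical names. In a real run, `config` would presumably be `wandb.config` and `log` would be `wandb.log`; stand-ins are used here so the loop can be tried without a W&B account.

```python
from collections import deque


def run_training(config, run_episode, log, window=100):
    """Run config["episodes"] episodes and log a moving-average reward."""
    recent = deque(maxlen=window)  # keeps only the last `window` rewards
    epsilon = config["epsilon"]
    avg_reward = 0.0
    for episode in range(config["episodes"]):
        reward = run_episode(epsilon)  # caller-supplied: plays one episode
        recent.append(reward)
        avg_reward = sum(recent) / len(recent)
        # The key must match the metric name in the sweep configuration.
        log({"Avg-Reward-100e": avg_reward, "episode": episode})
        epsilon *= config["epsilon_decay"]  # assumed per-episode decay
    return avg_reward


# Stand-ins for illustration: a fake episode whose reward depends on
# epsilon, and a list that collects what would be sent to W&B.
logged = []
config = {"epsilon": 1.0, "epsilon_decay": 0.5, "episodes": 3}
final_avg = run_training(
    config,
    run_episode=lambda eps: 10.0 * eps,
    log=logged.append,
)
print(final_avg)
```

Passing the episode runner and logger in as arguments keeps the loop independent of the agent type, so the same skeleton could serve both the DDQN and SARSA sweeps.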

2. On each machine you want to train on, run the following:

```python
wandb.agent(sweep_id, function=train, count=num_runs)
```

`count` is the maximum number of runs this agent will execute. If it is omitted, the agent keeps requesting new runs until the sweep is stopped; a grid search ends on its own once every combination has been tried.
