Hyperparameter Sweeps with Weights & Biases and PyTorch

Introduction to Hyperparameter Sweeps with Weights & Biases
Searching high-dimensional hyperparameter spaces for the best-performing model gets unwieldy fast. Hyperparameter sweeps provide an organized, efficient way to run a battle royale of models and pick the winner according to a chosen metric. They do this by automatically searching through combinations of hyperparameter values (e.g. learning rate, batch size, number of hidden layers, optimizer type) to find the best-performing ones.


How to run a hyperparameter sweep with Weights & Biases:
1. Define the sweep
2. Initialize the sweep
3. Run the sweep agent

Setup:
1. Install the libraries
2. Import the libraries
3. Log in to Weights & Biases so that metrics can be logged

In [1]:
%%capture
!pip install wandb --upgrade

# workaround to fetch MNIST data
!wget www.di.ens.fr/~lelarge/MNIST.tar.gz
!tar -zxvf MNIST.tar.gz 

Import the library and log in to your W&B account with your API key.

In [2]:
import wandb

wandb.login()

wandb: You can find your API key in your browser here: https://wandb.ai/authorize
wandb: Paste an API key from your profile and hit enter, or press ctrl+c to quit: ··········
wandb: Appending key for api.wandb.ai to your netrc file: /root/.netrc

True

Step1: Define the Sweep.     
Fundamentally, a Sweep combines a strategy for trying out a bunch of hyperparameter values with the code that evaluates them. Whether that strategy is as simple as trying every option or as complex as BOHB, Weights & Biases Sweeps have you covered. You just need to define your strategy in the form of a configuration.

When you're setting up a Sweep in a notebook like this, that config object is a nested dictionary. When you run a Sweep via the command line, the config object is a YAML file.

Let's walk through the definition of a Sweep config together. We'll do it slowly, so we get a chance to explain each component. In a typical Sweep pipeline, this step would be done in a single assignment.    


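W&B Sweeps support three search methods:
1. grid search: try every combination of the listed values
2. random search: sample each hyperparameter at random
3. Bayesian search: use the results of earlier runs to choose promising values

This notebook uses random search.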
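
To make the difference between the first two methods concrete, here is a minimal pure-Python sketch. It is not how W&B implements its search; the helper names `grid_configs` and `random_configs` and the small search space are assumptions made for illustration only.

```python
import itertools
import random

# Illustrative search space (an assumption for this sketch).
space = {
    "optimizer": ["adam", "sgd"],
    "fc_layer_size": [128, 256, 512],
    "dropout": [0.3, 0.4, 0.5],
}


def grid_configs(space):
    """Yield every combination of the listed values (grid search)."""
    names = list(space)
    for combo in itertools.product(*(space[name] for name in names)):
        yield dict(zip(names, combo))


def random_configs(space, count, seed=0):
    """Yield `count` configs, drawing each value independently (random search)."""
    rng = random.Random(seed)
    for _ in range(count):
        yield {name: rng.choice(values) for name, values in space.items()}


grid = list(grid_configs(space))
sampled = list(random_configs(space, count=5))

print(len(grid))     # 2 * 3 * 3 combinations
print(sampled[0])    # one randomly drawn config
```

Grid search grows multiplicatively with every hyperparameter added, while random search runs exactly as many configs as you ask for, which is why it scales better to large or continuous spaces.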

In [3]:
sweep_config = {
    'method': 'random'
    }

Set the metric to optimize. Here the goal is to minimize the loss.

In [4]:
metric = {
    'name': 'loss',
    'goal': 'minimize'   
    }

sweep_config['metric'] = metric

Name the hyperparameters.

1. Whatever the method, a sweep tries out new hyperparameter values, so we must define which hyperparameters to search over.
2. Give each hyperparameter a name and specify its legal values.
3. For a categorical hyperparameter such as the optimizer, list a finite set of options; here, adam and sgd.
4. The other hyperparameters listed this way are fc_layer_size and dropout.

In [5]:
parameters_dict = {
    'optimizer': {
        'values': ['adam', 'sgd']
        },
    'fc_layer_size': {
        'values': [128, 256, 512]
        },
    'dropout': {
          'values': [0.3, 0.4, 0.5]
        },
    }

sweep_config['parameters'] = parameters_dict

A hyperparameter that should not vary can be fixed with a single value. Here we set epochs to 1.

In [6]:
parameters_dict.update({
    'epochs': {
        'value': 1}
    })

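A hyperparameter can also be sampled from a named distribution instead of a list of values. Here we define distributions for the learning rate and the batch size.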

In [7]:
import math

parameters_dict.update({
    'learning_rate': {
        # a flat distribution between 0 and 0.1
        'distribution': 'uniform',
        'min': 0,
        'max': 0.1
      },
    'batch_size': {
        # integers between 32 and 256
        # with evenly-distributed logarithms 
        'distribution': 'q_log_uniform',
        'q': 1,
        'min': math.log(32),
        'max': math.log(256),
      }
    })
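
The batch size bounds are given as logarithms because `q_log_uniform` draws a value whose logarithm is uniform between `min` and `max`, then rounds it to a multiple of `q`. The following is a rough sketch of that sampling rule, based on the W&B documentation's description of the distribution; `sample_q_log_uniform` is a hypothetical helper, not the library's own code.

```python
import math
import random


def sample_q_log_uniform(log_min, log_max, q=1, rng=random):
    """Draw X with log(X) uniform on [log_min, log_max], rounded to a multiple of q."""
    x = math.exp(rng.uniform(log_min, log_max))
    return round(x / q) * q


rng = random.Random(0)  # fixed seed so the sketch is repeatable
batch_sizes = [
    sample_q_log_uniform(math.log(32), math.log(256), q=1, rng=rng)
    for _ in range(1000)
]

# All samples are integers in [32, 256]; about half fall below the
# geometric mean sqrt(32 * 256), roughly 90.5, rather than the midpoint 144.
below = sum(b < math.sqrt(32 * 256) for b in batch_sizes) / len(batch_sizes)
print(min(batch_sizes), max(batch_sizes), below)
```

Sampling on a log scale gives small and large batch sizes equal weight per doubling, which is usually what you want for scale-like hyperparameters.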

In [8]:
import pprint

pprint.pprint(sweep_config)

{'method': 'random',
 'metric': {'goal': 'minimize', 'name': 'loss'},
 'parameters': {'batch_size': {'distribution': 'q_log_uniform',
                               'max': 5.545177444479562,
                               'min': 3.4657359027997265,
                               'q': 1},
                'dropout': {'values': [0.3, 0.4, 0.5]},
                'epochs': {'value': 1},
                'fc_layer_size': {'values': [128, 256, 512]},
                'learning_rate': {'distribution': 'uniform',
                                  'max': 0.1,
                                  'min': 0},
                'optimizer': {'values': ['adam', 'sgd']}}}
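
As noted earlier, the whole config would normally be written in a single assignment. A sketch of that form, intended to be equivalent to the dictionary printed above (only the quoting style and key order differ):

```python
import math

# Same sweep configuration as built step by step above, in one assignment.
sweep_config = {
    "method": "random",
    "metric": {"name": "loss", "goal": "minimize"},
    "parameters": {
        "optimizer": {"values": ["adam", "sgd"]},
        "fc_layer_size": {"values": [128, 256, 512]},
        "dropout": {"values": [0.3, 0.4, 0.5]},
        "epochs": {"value": 1},
        "learning_rate": {"distribution": "uniform", "min": 0, "max": 0.1},
        "batch_size": {
            "distribution": "q_log_uniform",
            "q": 1,
            "min": math.log(32),
            "max": math.log(256),
        },
    },
}
```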


Step2: Initialize the sweep.   
The clockwork taskmaster in charge of our Sweep is known as the Sweep Controller. As each run completes, it issues a new set of instructions describing the next run to execute. These instructions are picked up by agents, which actually perform the runs.

In a typical Sweep, the Controller lives on the W&B servers, while the agents that complete the runs live on your machine(s). This division of labor makes it easy to scale up Sweeps by adding more machines to run agents.
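
The division of labor can be mimicked in a few lines of plain Python. This is only a conceptual toy, not W&B's implementation: `ToyController`, `toy_agent`, and `fake_train` are hypothetical names, and the "controller" here simply hands out random configs.

```python
import random


class ToyController:
    """Hands out one config per request (a stand-in for the Sweep Controller)."""

    def __init__(self, space, seed=0):
        self.space = space
        self.rng = random.Random(seed)

    def next_config(self):
        return {name: self.rng.choice(values) for name, values in self.space.items()}


def toy_agent(controller, function, count):
    """Ask the controller for `count` configs and run `function` on each one."""
    results = []
    for _ in range(count):
        config = controller.next_config()
        results.append((config, function(config)))
    return results


def fake_train(config):
    # Placeholder objective; a real agent would train a model here.
    return config["dropout"] * 2


controller = ToyController({"dropout": [0.3, 0.4, 0.5], "optimizer": ["adam", "sgd"]})
results = toy_agent(controller, fake_train, count=3)
for config, value in results:
    print(config, value)
```

Because the controller only hands out configs, several agents could draw from the same controller at once, which is the property that makes scaling out straightforward.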

In [9]:
sweep_id = wandb.sweep(sweep_config, project="pytorch-sweeps-demo")

Create sweep with ID: mqp52ele
Sweep URL: https://wandb.ai/harikanalam/pytorch-sweeps-demo/sweeps/mqp52ele


Step3: Run the sweep agent.   
In the functions below, we define a simple fully-connected neural network in PyTorch and use the following wandb tools to log model metrics, visualize performance and output, and track our experiments:

1. wandb.init() – initializes a new W&B Run. Each Run is a single execution of the training function.
2. wandb.config – holds all your hyperparameters in a configuration object so they can be logged.
3. wandb.log() – logs model behavior to W&B. Here, we just log the loss; wandb.log can also log richer media.

In [10]:
import torch
import torch.optim as optim
import torch.nn.functional as F
import torch.nn as nn
from torchvision import datasets, transforms

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

def train(config=None):
    # Initialize a new wandb run
    with wandb.init(config=config):
        # If called by wandb.agent, as below,
        # this config will be set by Sweep Controller
        config = wandb.config

        loader = build_dataset(config.batch_size)
        network = build_network(config.fc_layer_size, config.dropout)
        optimizer = build_optimizer(network, config.optimizer, config.learning_rate)

        for epoch in range(config.epochs):
            avg_loss = train_epoch(network, loader, optimizer)
            wandb.log({"loss": avg_loss, "epoch": epoch})     

The training procedure has four pieces:
1. build_dataset
2. build_network
3. build_optimizer
4. train_epoch

All of these are part of a basic PyTorch pipeline. Aside from the single wandb.log call in train_epoch, their implementation is unaffected by the use of W&B.

In [11]:
def build_dataset(batch_size):
   
    transform = transforms.Compose(
        [transforms.ToTensor(),
         transforms.Normalize((0.1307,), (0.3081,))])
    # download MNIST training dataset
    dataset = datasets.MNIST(".", train=True, download=True,
                             transform=transform)
    sub_dataset = torch.utils.data.Subset(
        dataset, indices=range(0, len(dataset), 5))
    loader = torch.utils.data.DataLoader(sub_dataset, batch_size=batch_size)

    return loader


def build_network(fc_layer_size, dropout):
    network = nn.Sequential(  # fully-connected, single hidden layer
        nn.Flatten(),
        nn.Linear(784, fc_layer_size), nn.ReLU(),
        nn.Dropout(dropout),
        nn.Linear(fc_layer_size, 10),
        nn.LogSoftmax(dim=1))

    return network.to(device)
        

def build_optimizer(network, optimizer, learning_rate):
    if optimizer == "sgd":
        optimizer = optim.SGD(network.parameters(),
                              lr=learning_rate, momentum=0.9)
    elif optimizer == "adam":
        optimizer = optim.Adam(network.parameters(),
                               lr=learning_rate)
    else:
        # fail loudly instead of returning the name string unchanged
        raise ValueError(f"unknown optimizer: {optimizer!r}")
    return optimizer


def train_epoch(network, loader, optimizer):
    cumu_loss = 0
    for data, target in loader:
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()

        # ➡ Forward pass
        loss = F.nll_loss(network(data), target)
        cumu_loss += loss.item()

        # ⬅ Backward pass + weight update
        loss.backward()
        optimizer.step()

        wandb.log({"batch loss": loss.item()})

    return cumu_loss / len(loader)
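
Since `build_dataset` keeps every fifth MNIST training image and `train_epoch` logs once per batch, the number of "batch loss" points per run depends on the sampled batch size. A small worked calculation, assuming the standard 60,000-image MNIST training split and the DataLoader default of keeping the last partial batch; `batches_per_epoch` is a hypothetical helper.

```python
import math

MNIST_TRAIN_SIZE = 60_000  # standard size of the MNIST training split

# build_dataset keeps indices 0, 5, 10, ... so one image in five survives.
subset_size = len(range(0, MNIST_TRAIN_SIZE, 5))


def batches_per_epoch(n_samples, batch_size):
    # The last, smaller batch is kept (DataLoader default drop_last=False).
    return math.ceil(n_samples / batch_size)


print(subset_size)
for batch_size in (32, 72, 256):
    print(batch_size, batches_per_epoch(subset_size, batch_size))
```

This count is also the divisor `len(loader)` used by `train_epoch` when it averages the loss over the epoch.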

Sweep Controllers, like the one we made by running wandb.sweep, sit waiting for someone to ask them for a config to try out.    

That someone is an agent, created with wandb.agent. To get going, the agent just needs to know:

1. which Sweep it's a part of (sweep_id)
2. which function it's supposed to run (here, train)
3. how many configs to ask the Controller for (count) (optional)


In [12]:
wandb.agent(sweep_id, train, count=5)

[34m[1mwandb[0m: Agent Starting Run: epoem1um with config:
[34m[1mwandb[0m: 	batch_size: 72
[34m[1mwandb[0m: 	dropout: 0.3
[34m[1mwandb[0m: 	epochs: 1
[34m[1mwandb[0m: 	fc_layer_size: 256
[34m[1mwandb[0m: 	learning_rate: 0.06763092711816189
[34m[1mwandb[0m: 	optimizer: sgd
[34m[1mwandb[0m: Currently logged in as: [33mharikanalam[0m (use `wandb login --relogin` to force relogin)





VBox(children=(Label(value='0.001 MB of 0.001 MB uploaded (0.000 MB deduped)\r'), FloatProgress(value=1.0, max…

0,1
batch loss,█▅▃▃▃▂▂▂▁▂▂▃▂▁▁▂▃▂▂▂▃▄▂▂▂▁▂▂▁▂▁▃▁▁▁▁▁▁▂▁
epoch,▁
loss,▁

0,1
batch loss,0.3766
epoch,0.0
loss,0.55065


[34m[1mwandb[0m: Agent Starting Run: bz52x565 with config:
[34m[1mwandb[0m: 	batch_size: 46
[34m[1mwandb[0m: 	dropout: 0.5
[34m[1mwandb[0m: 	epochs: 1
[34m[1mwandb[0m: 	fc_layer_size: 256
[34m[1mwandb[0m: 	learning_rate: 0.05032450702412986
[34m[1mwandb[0m: 	optimizer: sgd





VBox(children=(Label(value='0.001 MB of 0.001 MB uploaded (0.000 MB deduped)\r'), FloatProgress(value=1.0, max…

0,1
batch loss,█▅▄▄▄▅▁▂▄▄▃▃▂▅▃▂▅▁▂▃▃█▂▄▄▃▃▂▃▄▃▂▁▃▁▃▃▁▂▁
epoch,▁
loss,▁

0,1
batch loss,1.36097
epoch,0.0
loss,0.80923


[34m[1mwandb[0m: Agent Starting Run: 5nampqi8 with config:
[34m[1mwandb[0m: 	batch_size: 97
[34m[1mwandb[0m: 	dropout: 0.3
[34m[1mwandb[0m: 	epochs: 1
[34m[1mwandb[0m: 	fc_layer_size: 512
[34m[1mwandb[0m: 	learning_rate: 0.019399021655713455
[34m[1mwandb[0m: 	optimizer: adam





VBox(children=(Label(value='0.001 MB of 0.001 MB uploaded (0.000 MB deduped)\r'), FloatProgress(value=1.0, max…

0,1
batch loss,▂█▄▃▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
epoch,▁
loss,▁

0,1
batch loss,0.43632
epoch,0.0
loss,1.17396


[34m[1mwandb[0m: Agent Starting Run: d6wbkiey with config:
[34m[1mwandb[0m: 	batch_size: 78
[34m[1mwandb[0m: 	dropout: 0.5
[34m[1mwandb[0m: 	epochs: 1
[34m[1mwandb[0m: 	fc_layer_size: 512
[34m[1mwandb[0m: 	learning_rate: 0.02377128697854677
[34m[1mwandb[0m: 	optimizer: adam





VBox(children=(Label(value='0.001 MB of 0.001 MB uploaded (0.000 MB deduped)\r'), FloatProgress(value=1.0, max…

0,1
batch loss,▅█▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
epoch,▁
loss,▁

0,1
batch loss,1.14029
epoch,0.0
loss,1.73011


[34m[1mwandb[0m: Agent Starting Run: lxa5sx2g with config:
[34m[1mwandb[0m: 	batch_size: 241
[34m[1mwandb[0m: 	dropout: 0.3
[34m[1mwandb[0m: 	epochs: 1
[34m[1mwandb[0m: 	fc_layer_size: 256
[34m[1mwandb[0m: 	learning_rate: 0.03134856500908412
[34m[1mwandb[0m: 	optimizer: sgd





VBox(children=(Label(value='0.001 MB of 0.001 MB uploaded (0.000 MB deduped)\r'), FloatProgress(value=1.0, max…

0,1
batch loss,███▇▆▅▅▄▃▃▃▃▂▂▂▂▂▂▂▁▂▃▁▂▁▂▂▂▁▁▁▁▁▂▁▁▁▁▁▁
epoch,▁
loss,▁

0,1
batch loss,0.31536
epoch,0.0
loss,0.74084
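
With the goal set to minimize the loss, comparing the five runs comes down to finding the smallest final loss. A minimal sketch, with the run IDs, optimizers, and losses copied by hand from the output above; the list-of-dicts layout is just an assumption for illustration.

```python
# Final epoch losses of the five runs, copied from the run summaries above.
runs = [
    {"id": "epoem1um", "optimizer": "sgd", "loss": 0.55065},
    {"id": "bz52x565", "optimizer": "sgd", "loss": 0.80923},
    {"id": "5nampqi8", "optimizer": "adam", "loss": 1.17396},
    {"id": "d6wbkiey", "optimizer": "adam", "loss": 1.73011},
    {"id": "lxa5sx2g", "optimizer": "sgd", "loss": 0.74084},
]

# The sweep metric is {'name': 'loss', 'goal': 'minimize'}, so lower is better.
best = min(runs, key=lambda run: run["loss"])
ranked = sorted(runs, key=lambda run: run["loss"])

print(best["id"], best["loss"])
print([run["id"] for run in ranked])
```

In this small sample the three SGD runs finished with a lower loss than the two Adam runs, though five one-epoch runs are too few to draw a firm conclusion.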
