<a href="https://colab.research.google.com/github/wandb/examples/blob/master/colabs/pytorch/Organizing_Hyperparameter_Sweeps_in_PyTorch_with_W&B.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>
<!--- @wandbcode{sweeps-video} -->

In [1]:
import wandb
wandb.login()

Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.
wandb: Currently logged in as: linfeng-wang. Use `wandb login --relogin` to force relogin


True

For Bayesian (`bayes`) Sweeps,
you also need to tell us a bit about your `metric`.
We need to know its `name`, so we can find it in the model outputs
and we need to know whether your `goal` is to `minimize` it
(e.g. if it's the squared error)
or to `maximize` it
(e.g. if it's the accuracy).
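
To make the role of `name` and `goal` concrete, here is a minimal sketch of how a metric definition could be used to pick the best of several finished runs. The `pick_best_run` helper and the run summaries are hypothetical, invented for illustration; this is not how W&B selects runs internally.

```python
def pick_best_run(run_summaries, metric):
    """Return the run whose logged metric is best under the stated goal."""
    name, goal = metric["name"], metric["goal"]
    if goal not in ("minimize", "maximize"):
        raise ValueError(f"unknown goal: {goal!r}")
    # Only runs that actually logged the metric can be compared.
    candidates = [run for run in run_summaries if name in run]
    choose = min if goal == "minimize" else max
    return choose(candidates, key=lambda run: run[name])


# Hypothetical run summaries, invented for illustration only.
runs = [
    {"id": "run-a", "loss": 0.42, "accuracy": 0.81},
    {"id": "run-b", "loss": 0.35, "accuracy": 0.79},
    {"id": "run-c", "accuracy": 0.90},  # never logged a loss
]

best_by_loss = pick_best_run(runs, {"name": "loss", "goal": "minimize"})
best_by_accuracy = pick_best_run(runs, {"name": "accuracy", "goal": "maximize"})
```

The `name` has to match a key that the training code actually logs, which is why `run-c` cannot compete on `loss` in this sketch.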

In [11]:
sweep_config = {
    'method': 'random'
    }

metric = {
    'name': 'loss',
    'goal': 'minimize'
    }

sweep_config['metric'] = metric

parameters_dict = {
    'optimizer': {
        'values': ['adam', 'sgd']
        },
    'fc_layer_size': {
        'values': [128, 256, 512]
        },
    'dropout': {
        'values': [0.3, 0.4, 0.5]
        },
    }

sweep_config['parameters'] = parameters_dict

parameters_dict.update({
    'epochs': {
        'value': 1}
    })

parameters_dict.update({
    'learning_rate': {
        # a flat distribution between 0 and 0.1
        'distribution': 'uniform',
        'min': 0,
        'max': 0.1
      },
    'batch_size': {
        # integers between 32 and 256
        # with evenly-distributed logarithms
        'distribution': 'q_log_uniform_values',
        'q': 8,
        'min': 32,
        'max': 256,
      }
    })

In [4]:
import pprint

pprint.pprint(sweep_config)

{'method': 'random',
 'metric': {'goal': 'minimize', 'name': 'loss'},
 'parameters': {'batch_size': {'distribution': 'q_log_uniform_values',
                               'max': 256,
                               'min': 32,
                               'q': 8},
                'dropout': {'values': [0.3, 0.4, 0.5]},
                'epochs': {'value': 1},
                'fc_layer_size': {'values': [128, 256, 512]},
                'learning_rate': {'distribution': 'uniform',
                                  'max': 0.1,
                                  'min': 0},
                'optimizer': {'values': ['adam', 'sgd']}}}
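
To see what the `parameters` entries mean, here is a rough sketch of how one configuration could be drawn from them under the `random` method. It is an illustration under assumptions, not W&B's own sampler: `sample_parameter` and `sample_config` are hypothetical helpers, and the `q_log_uniform_values` branch assumes the behavior described in the comments above (a value whose logarithm is uniform between `min` and `max`, rounded to a multiple of `q`).

```python
import math
import random


def sample_parameter(spec, rng):
    """Draw one value for a single parameter specification."""
    if "value" in spec:  # constant, never varied
        return spec["value"]
    if "values" in spec:  # categorical: pick one option
        return rng.choice(spec["values"])
    distribution = spec["distribution"]
    if distribution == "uniform":
        return rng.uniform(spec["min"], spec["max"])
    if distribution == "q_log_uniform_values":
        # Uniform in log space, then quantized to a multiple of q.
        x = math.exp(rng.uniform(math.log(spec["min"]), math.log(spec["max"])))
        return round(x / spec["q"]) * spec["q"]
    raise ValueError(f"unsupported distribution: {distribution!r}")


def sample_config(parameters, rng):
    """Draw one full configuration, one value per parameter."""
    return {name: sample_parameter(spec, rng) for name, spec in parameters.items()}


# Same search space as the sweep_config printed above.
parameters = {
    "optimizer": {"values": ["adam", "sgd"]},
    "fc_layer_size": {"values": [128, 256, 512]},
    "dropout": {"values": [0.3, 0.4, 0.5]},
    "epochs": {"value": 1},
    "learning_rate": {"distribution": "uniform", "min": 0, "max": 0.1},
    "batch_size": {"distribution": "q_log_uniform_values", "q": 8, "min": 32, "max": 256},
}

rng = random.Random(0)
example_config = sample_config(parameters, rng)
```

Because `batch_size` is sampled in log space, small batch sizes come up about as often as large ones, rather than most draws landing near the top of the range.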


But that's not all of the configuration options!

For example, we also offer the option to `early_terminate` your runs with the [HyperBand](https://arxiv.org/pdf/1603.06560.pdf) scheduling algorithm. See more [here](https://docs.wandb.com/sweeps/configuration#stopping-criteria).

You can find a list of all configuration options [here](https://docs.wandb.com/library/sweeps/configuration)
and a big collection of examples in YAML format [here](https://github.com/wandb/examples/tree/master/examples/keras/keras-cnn-fashion).
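
As an example of the `early_terminate` option mentioned above, the sketch below adds a Hyperband stopping rule to a sweep configuration. The `type` and `min_iter` keys come from the W&B sweep configuration docs; the specific numbers, the trimmed-down parameter list, and the name `sweep_config_with_stopping` are illustrative assumptions, not settings used in this notebook.

```python
# A sweep configuration with Hyperband early termination (illustrative values).
sweep_config_with_stopping = {
    "method": "random",
    "metric": {"name": "loss", "goal": "minimize"},
    "parameters": {
        "optimizer": {"values": ["adam", "sgd"]},
        "dropout": {"values": [0.3, 0.4, 0.5]},
        # Early termination needs several logged steps per run to act on,
        # so this sketch assumes more than one epoch.
        "epochs": {"value": 10},
    },
    "early_terminate": {
        "type": "hyperband",
        "min_iter": 3,  # first iteration at which a run may be stopped
    },
}
```

With `epochs` fixed at 1, as in the configuration above, each run logs only a single epoch-level `loss`, so a stopping rule like this would have little to work with.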



# Step 2️⃣. Initialize the Sweep

Once you've defined the search strategy, it's time to set up something to implement it.

The clockwork taskmaster in charge of our Sweep is known as the _Sweep Controller_.
As each run completes, it will issue a new set of instructions
describing a new run to execute.
These instructions are picked up by _agents_
who actually perform the runs.

In a typical Sweep, the Controller lives on _our_ machine,
while the agents who complete runs live on _your_ machine(s),
like in the diagram below.
This division of labor makes it super easy to scale up Sweeps
by just adding more machines to run agents!

<img src="https://i.imgur.com/zlbw3vQ.png" alt="sweeps-diagram" width="500">

We can wind up a Sweep Controller by calling `wandb.sweep` with the appropriate `sweep_config` and `project` name.

This function returns a `sweep_id` that we will later use to assign agents to this Controller.

> _Side Note_: on the command line, this function is replaced with
```
wandb sweep config.yaml
```
[Learn more about using Sweeps in the command line ➡](https://docs.wandb.com/sweeps/quickstart)

In [5]:
sweep_id = wandb.sweep(sweep_config, project="torch_rif-sweeps")

Create sweep with ID: ovg4v13n
Sweep URL: https://wandb.ai/linfeng-wang/torch_rif-sweeps/sweeps/ovg4v13n


# Step 3️⃣. Run the Sweep agent

### 💻 Define Your Training Procedure

Before we can actually execute the sweep,
we need to define the training procedure that uses those values.

In the functions below, we define a simple fully-connected neural network in PyTorch, and add the following `wandb` tools to log model metrics, visualize performance and output and track our experiments:
* [**`wandb.init()`**](https://docs.wandb.com/library/init) – Initialize a new W&B Run. Each Run is a single execution of the training function.
* [**`wandb.config`**](https://docs.wandb.com/library/config) – Save all your hyperparameters in a configuration object so they can be logged. Read more about how to use `wandb.config` [here](https://colab.research.google.com/github/wandb/examples/blob/master/colabs/wandb-config/Configs_in_W%26B.ipynb).
* [**`wandb.log()`**](https://docs.wandb.com/library/log) – Log model behavior to W&B. Here, we just log the performance; see [this Colab](https://colab.research.google.com/github/wandb/examples/blob/master/colabs/wandb-log/Log_(Almost)_Anything_with_W%26B_Media.ipynb) for all the other rich media that can be logged with `wandb.log`.

For more details on instrumenting W&B with PyTorch, see [this Colab](https://colab.research.google.com/github/wandb/examples/blob/master/colabs/pytorch/Simple_PyTorch_Integration.ipynb).

In [8]:
#%%
# Standard library
import argparse
import os
import pathlib
import pickle
import statistics
from importlib import reload
from itertools import chain

# Third-party
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from icecream import ic
from PIL import Image
from sklearn import metrics as met
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import OneHotEncoder
from torch.utils.data import Dataset, TensorDataset, DataLoader
from torch.utils.data.dataset import random_split
from torchvision import datasets, transforms
from torchviz import make_dot
from tqdm import tqdm
# import util
# import model_torch_simple
# from torchmetrics import Accuracy

device = 'cuda' if torch.cuda.is_available() else 'cpu'
torch.manual_seed(42)

<torch._C.Generator at 0x7f1e94041030>

In [6]:
import torch
import torch.optim as optim
import torch.nn.functional as F
import torch.nn as nn
from torchvision import datasets, transforms

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

def train(config=None):
    # Initialize a new wandb run
    with wandb.init(config=config):
        # If called by wandb.agent, as below,
        # this config will be set by Sweep Controller
        config = wandb.config

        loader = build_dataset(config.batch_size)
        network = build_network(config.fc_layer_size, config.dropout)
        optimizer = build_optimizer(network, config.optimizer, config.learning_rate)

        for epoch in range(config.epochs):
            avg_loss = train_epoch(network, loader, optimizer)
            wandb.log({"loss": avg_loss, "epoch": epoch})

The cells below define the four pieces of our training procedure:
`build_dataset`, `build_network`, `build_optimizer`, and `train_epoch`.

All of these are a standard part of a basic PyTorch pipeline,
and their implementation is unaffected by the use of W&B,
so we won't comment on them.

In [9]:
original_data = pd.read_csv('data_aa/aa_rpoB.csv', header=None)
original_target = pd.read_csv('data_aa/RIF_MIC.csv', header=None)
data = original_data

target = original_target

train_data_index = np.random.choice(data.shape[0], size=int(data.shape[0]*0.8), replace=False)
all_indices = np.arange(data.shape[0])
test_data_index = np.setdiff1d(all_indices, train_data_index)

train_data = data.iloc[train_data_index,:]
train_target = target.iloc[train_data_index,:]
train_data = train_data.reset_index(drop=True)
train_target = train_target.reset_index(drop=True)
#don't touch test data, split out validation data from training data during training
test_data = data.iloc[test_data_index,:]
test_target = target.iloc[test_data_index,:]
test_data = test_data.reset_index(drop=True)
test_target = test_target.reset_index(drop=True)

class Dataset(torch.utils.data.Dataset): #? what's the difference between using inheritance and not?
    def __init__(
        self,
        train_df,
        mic_df,
        transform = None,
    ):
        self.transform = transform
        self.train_df = train_df
        self.mic_df = mic_df
        if not self.train_df.index.equals(self.mic_df.index):
            raise ValueError(
                "Indices of training data and resistance data don't match up"
            )

    def __getitem__(self, index):
        """
        numerical index --> get `index`-th sample
        string index --> get sample with name `index`
        """
        if isinstance(index, int):
            # Positional lookup on both frames keeps features and target aligned
            train = self.train_df.iloc[index]
            mic = self.mic_df.iloc[index]
        elif isinstance(index, str):
            train = self.train_df.loc[index]
            mic = self.mic_df.loc[index]
        else:
            raise ValueError(
                "Index needs to be an integer or a sample name present in the dataset"
            )

        if self.transform:
            # Standardize the MIC values (z-score)
            self.mic_mean = self.mic_df.mean()
            self.mic_std = self.mic_df.std()
            mic = (mic - self.mic_mean) / self.mic_std

        # float32 matches the dtype of the model parameters (assumes numeric columns)
        return (
            torch.tensor(train.to_numpy(), dtype=torch.float32),
            torch.tensor(mic.to_numpy(), dtype=torch.float32),
        )
    def __len__(self):
        return self.mic_df.shape[0]
    
training_dataset = Dataset(train_data, train_target, transform=False)
train_dataset, val_dataset = random_split(training_dataset, [int(len(training_dataset)*0.8), len(training_dataset)-int(len(training_dataset)*0.8)])

In [None]:
class Model(nn.Module):
    def __init__(self, in_channel = 869, first_h_layer = 469, fc_layer_size = 100, out_channel=1, batch_size=1, dropout_rate=0.2, num_dense_layers=3, filter_scaling_factor=1.5):
        super(Model, self).__init__()
        self.batch_size = batch_size
        self.in_channel = in_channel
        self.first_h_layer = first_h_layer
        self.out_channel = out_channel
        self.dense_dropout_rate = dropout_rate
        self.num_dense_layers = num_dense_layers
        self.filter_scaling_factor=filter_scaling_factor
        self.fc_layer_size = fc_layer_size
        
        # Use fc_layer_size for the hidden width so the swept value takes effect
        # (the default of 100 reproduces the previously hard-coded width)
        self.dense_layers = nn.ModuleList()
        for i in range(self.num_dense_layers):
            layer = self._dense_layer(self.fc_layer_size, self.fc_layer_size)
            self.dense_layers.append(layer)
            # current_num_filters = int(current_num_filters * filter_scaling_factor)

        # self.feature_extraction = nn.Conv1d(in_channels, hidden, kernel_size=kernel_size),]
        self.starting_layers = nn.Sequential(
            nn.Linear(self.in_channel, self.first_h_layer),
            nn.BatchNorm1d(self.first_h_layer),
            nn.ReLU(),
            nn.Dropout(self.dense_dropout_rate),  # Dropout layer after the first ReLU
            nn.Linear(self.first_h_layer, self.fc_layer_size),
            nn.BatchNorm1d(self.fc_layer_size),
            nn.ReLU(),
            nn.Dropout(self.dense_dropout_rate))  # Dropout layer after the second ReLU

        self.out_layer = nn.Linear(self.fc_layer_size, self.out_channel)

    def _dense_layer(self, n_in, n_out):
        return nn.Sequential(
            nn.Linear(n_in, n_out),
            nn.BatchNorm1d(n_out),
            nn.ReLU(),
            nn.Dropout(p=self.dense_dropout_rate)
        )

        
    def init_weights(self, m):
        if isinstance(m, nn.Linear):
            nn.init.kaiming_normal_(m.weight, mode='fan_in', nonlinearity='relu')
            if m.bias is not None:
                nn.init.constant_(m.bias, 0)
                
    def forward(self, x):
        x = self.starting_layers(x)
        for layer in self.dense_layers:
            x = layer(x)
        out = self.out_layer(x)
        return out

torch.cuda.empty_cache()

epoch = 50
batch_size = 32
lr = 0.001

model = Model(in_channel = 869, 
              first_h_layer = 469, 
              out_channel=1, 
              num_dense_layers=4,
              batch_size=batch_size)

model = model.float()
model = model.to(device)

train_loader = DataLoader(dataset=train_dataset, batch_size=batch_size, shuffle=True, drop_last=True)
test_loader = DataLoader(dataset=val_dataset, batch_size=batch_size, drop_last=True)
# criterion = nn.MSELoss()
criterion = masked_MSE
optimizer = torch.optim.Adam(model.parameters(), lr=lr)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, 'min', factor=0.5, patience=2, verbose=True)

In [12]:
def get_masked_loss(loss_fn):
    """
    Returns a loss function that ignores NaN values
    """

    def masked_loss(y_true, y_pred):
        y_pred = y_pred.view(-1, 1)  # Ensure y_pred has the same shape as y_true and non_nan_mask
        # ic(y_true)
        non_nan_mask = ~y_true.isnan()
        # ic(non_nan_mask)
        y_true_non_nan = y_true[non_nan_mask]
        y_pred_non_nan = y_pred[non_nan_mask]

        return loss_fn(y_pred_non_nan, y_true_non_nan)

    return masked_loss

masked_MSE = get_masked_loss(torch.nn.MSELoss())

def build_dataset(batch_size):
    # Train on the training split; val_dataset stays held out for validation
    loader = DataLoader(dataset=train_dataset, batch_size=batch_size, shuffle=True, drop_last=True)
    return loader


def build_network(fc_layer_size, dropout):    
    network = Model(in_channel = 869, 
                first_h_layer = 600,
                fc_layer_size=fc_layer_size,
                out_channel=1, 
                num_dense_layers=4,
                batch_size=batch_size,
                dropout_rate=dropout)

    return network.to(device)


def build_optimizer(network, optimizer, learning_rate):
    if optimizer == "sgd":
        optimizer = optim.SGD(network.parameters(),
                              lr=learning_rate, momentum=0.9)
    elif optimizer == "adam":
        optimizer = optim.Adam(network.parameters(),
                               lr=learning_rate)
    return optimizer


def train_epoch(network, loader, optimizer):
    cumu_loss = 0
    for data, target in loader:
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()

        # ➡ Forward pass
        # masked_MSE expects (y_true, y_pred): the NaN mask is built from the targets
        loss = masked_MSE(target, network(data))
        cumu_loss += loss.item()

        # ⬅ Backward pass + weight update
        loss.backward()
        optimizer.step()

        wandb.log({"batch loss": loss.item()})

    return cumu_loss / len(loader)

Now, we're ready to start sweeping! 🧹🧹🧹

Sweep Controllers, like the one we made by running `wandb.sweep`,
sit waiting for someone to ask them for a `config` to try out.

That someone is an `agent`, and they are created with `wandb.agent`.
To get going, the agent just needs to know
1. which Sweep it's a part of (`sweep_id`)
2. which function it's supposed to run (here, `train`)
3. (optionally) how many configs to ask the Controller for (`count`)

FYI, you can start multiple `agent`s with the same `sweep_id`
on different compute resources,
and the Controller will ensure that they work together
according to the strategy laid out in the `sweep_config`.
This makes it trivially easy to scale your Sweeps across as many nodes as you can get ahold of!
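
The division of labor described above can be mimicked in a few lines of plain Python. This is a toy sketch only: `ToyController`, `toy_agent`, and `fake_train` are hypothetical names, the "loss" is a made-up formula, and nothing here reflects how W&B implements its Controller or agents.

```python
import random


class ToyController:
    """Hands out configs on request; stands in for the Sweep Controller."""

    def __init__(self, parameters, seed=0):
        self.parameters = parameters
        self.rng = random.Random(seed)
        self.issued = []  # every config handed out, across all agents

    def next_config(self):
        config = {
            name: self.rng.choice(spec["values"])
            for name, spec in self.parameters.items()
        }
        self.issued.append(config)
        return config


def toy_agent(controller, function, count):
    """Ask the controller for `count` configs and run `function` on each."""
    return [function(controller.next_config()) for _ in range(count)]


def fake_train(config):
    # Placeholder for a real training function; the "loss" is invented.
    return {"config": config, "loss": config["dropout"] / config["fc_layer_size"]}


controller = ToyController({
    "fc_layer_size": {"values": [128, 256, 512]},
    "dropout": {"values": [0.3, 0.4, 0.5]},
})

# Two agents share one controller, so all runs are tracked in one place.
results_a = toy_agent(controller, fake_train, count=3)
results_b = toy_agent(controller, fake_train, count=2)
```

The three arguments of `toy_agent` mirror the three things listed above: which sweep to join, which function to run, and how many configs to request.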

> _Side Note:_ on the command line, this function is replaced with
```
wandb agent sweep_id
```
[Learn more about using Sweeps in the command line ➡](https://docs.wandb.com/sweeps/quickstart)

The cell below will launch an `agent` that runs `train` 10 times,
using the randomly-generated hyperparameter values returned by the Sweep Controller. Execution takes under 5 minutes.

In [13]:
wandb.agent(sweep_id, train, count=10)

wandb: Agent Starting Run: 4lzib78t with config:
wandb: 	batch_size: 56
wandb: 	dropout: 0.3
wandb: 	epochs: 1
wandb: 	fc_layer_size: 256
wandb: 	learning_rate: 0.08597444812291988
wandb: 	optimizer: sgd
Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.


Traceback (most recent call last):
  File "/tmp/ipykernel_97903/693470131.py", line 17, in train
    network = build_network(config.fc_layer_size, config.dropout)
  File "/tmp/ipykernel_97903/3754763044.py", line 26, in build_network
    network = Model(in_channel = 869,
NameError: name 'Model' is not defined


wandb: ERROR Run 4lzib78t errored: NameError("name 'Model' is not defined")
wandb: Agent Starting Run: nuafb2j3 with config:
wandb: 	batch_size: 240
wandb: 	dropout: 0.5
wandb: 	epochs: 1
wandb: 	fc_layer_size: 128
wandb: 	learning_rate: 0.0681133495749379
wandb: 	optimizer: sgd
Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.


Traceback (most recent call last):
  File "/tmp/ipykernel_97903/693470131.py", line 17, in train
    network = build_network(config.fc_layer_size, config.dropout)
  File "/tmp/ipykernel_97903/3754763044.py", line 26, in build_network
    network = Model(in_channel = 869,
NameError: name 'Model' is not defined


wandb: ERROR Run nuafb2j3 errored: NameError("name 'Model' is not defined")
wandb: Agent Starting Run: unqpvb2a with config:
wandb: 	batch_size: 96
wandb: 	dropout: 0.3
wandb: 	epochs: 1
wandb: 	fc_layer_size: 512
wandb: 	learning_rate: 0.03469990162973129
wandb: 	optimizer: sgd
Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.


Traceback (most recent call last):
  File "/tmp/ipykernel_97903/693470131.py", line 17, in train
    network = build_network(config.fc_layer_size, config.dropout)
  File "/tmp/ipykernel_97903/3754763044.py", line 26, in build_network
    network = Model(in_channel = 869,
NameError: name 'Model' is not defined


wandb: ERROR Run unqpvb2a errored: NameError("name 'Model' is not defined")
wandb: ERROR Detected 3 failed runs in the first 60 seconds, killing sweep.
wandb: To disable this check set WANDB_AGENT_DISABLE_FLAPPING=true


# 👀 Visualize Sweep Results



## 🔀 Parallel Coordinates Plot
This plot maps hyperparameter values to model metrics. It’s useful for homing in on combinations of hyperparameters that led to the best model performance.

![](https://assets.website-files.com/5ac6b7f2924c652fd013a891/5e190366778ad831455f9af2_s_194708415DEC35F74A7691FF6810D3B14703D1EFE1672ED29000BA98171242A5_1578695138341_image.png)


## 📊 Hyperparameter Importance Plot
The hyperparameter importance plot surfaces which hyperparameters were the best predictors of your metrics.
We report feature importance (from a random forest model) and correlation (implicitly a linear model).

![](https://assets.website-files.com/5ac6b7f2924c652fd013a891/5e190367778ad820b35f9af5_s_194708415DEC35F74A7691FF6810D3B14703D1EFE1672ED29000BA98171242A5_1578695757573_image.png)
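
The correlation half of this plot can be approximated by hand. The sketch below computes the Pearson correlation between each hyperparameter and the metric across a handful of runs. The run records are invented for illustration and `pearson` is a hypothetical helper; the random-forest importance is not reproduced here.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Invented run records, for illustration only.
runs = [
    {"learning_rate": 0.01, "dropout": 0.3, "loss": 0.30},
    {"learning_rate": 0.03, "dropout": 0.5, "loss": 0.45},
    {"learning_rate": 0.05, "dropout": 0.4, "loss": 0.62},
    {"learning_rate": 0.08, "dropout": 0.3, "loss": 0.90},
]

losses = [run["loss"] for run in runs]
correlations = {
    name: pearson([run[name] for run in runs], losses)
    for name in ("learning_rate", "dropout")
}
```

A correlation near +1 or -1 suggests a strong linear relationship with the metric, while a value near 0 only rules out a linear one, which is one reason a non-linear importance measure is reported alongside it.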

These visualizations can help you save both time and resources when running expensive hyperparameter optimizations by homing in on the parameters (and value ranges) that matter most and are therefore worth exploring further.


# 🧤 Get your hands dirty with sweeps

We created a simple training script and [a few flavors of sweep configs](https://github.com/wandb/examples/tree/master/examples/keras/keras-cnn-fashion) for you to play with. We highly encourage you to give these a try.

That repo also has examples to help you try more advanced sweep features like [Bayesian Hyperband](https://app.wandb.ai/wandb/examples-keras-cnn-fashion/sweeps/us0ifmrf?workspace=user-lavanyashukla), and [Hyperopt](https://app.wandb.ai/wandb/examples-keras-cnn-fashion/sweeps/xbs2wm5e?workspace=user-lavanyashukla).