# Weights & Biases Integration

Following the official `wandb` tutorial, this notebook integrates `wandb` with a multilayer perceptron (MLP) trained on the Fashion-MNIST dataset.

In [12]:
import os
import random

import numpy as np
import matplotlib.pyplot as plt

import torch
from torch import nn
from torch.utils.data import DataLoader, Subset
from torchvision import datasets
from torchvision.transforms import ToTensor, Lambda, Compose

# from tqdm.notebook import tqdm # progress bar


# Ensure deterministic behavior.
# hash() of a str is salted per process unless PYTHONHASHSEED is set, so
# string-derived seeds change between runs; fixed integer seeds are used instead.
SEED = 42
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False
random.seed(SEED)
np.random.seed(SEED)
torch.manual_seed(SEED)
torch.cuda.manual_seed_all(SEED)

# Device Configuration
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using {device} device")

Using cuda device
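Deriving seeds from `hash()` of a string, e.g. `hash("setting random seeds") % 2**32 - 1`, is fragile in two ways. First, CPython salts `str` hashes per process unless the `PYTHONHASHSEED` environment variable is fixed, so the same string yields a different seed on every run. Second, `h % 2**32 - 1` parses as `(h % 2**32) - 1`, which is `-1` when the remainder is 0, and NumPy rejects negative seeds. The sketch below demonstrates the first point with two fresh interpreters and shows one stable alternative; `hash_in_fresh_process` and `stable_seed` are hypothetical helpers, not part of any library. A plain fixed integer such as `42` works just as well.

```python
import hashlib
import os
import subprocess
import sys


def hash_in_fresh_process(text: str, hash_seed: str) -> int:
    """Return hash(text) computed in a new interpreter with the given PYTHONHASHSEED."""
    env = dict(os.environ, PYTHONHASHSEED=hash_seed)
    out = subprocess.run(
        [sys.executable, "-c", f"print(hash({text!r}))"],
        env=env, capture_output=True, text=True, check=True,
    )
    return int(out.stdout)


def stable_seed(text: str) -> int:
    """Hypothetical helper: derive a seed in [0, 2**32) that is identical across runs."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


phrase = "setting random seeds"
# Different hash salts (i.e. different runs) give different hash() values ...
print(hash_in_fresh_process(phrase, "1"), hash_in_fresh_process(phrase, "2"))
# ... whereas a cryptographic digest is stable across processes.
print(stable_seed(phrase))
# Operator precedence: % binds tighter than -, so a zero remainder yields -1.
print(0 % 2**32 - 1)
```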


## Preliminaries
We must log in to a W&B account in order to record our training runs on the platform's dashboard. Make sure the `wandb` library is installed in the conda virtual environment first.

In [13]:
import wandb
wandb.login()

True

## Define Data Loading and Model

In [14]:
# Get a subsampled training or test split of Fashion-MNIST (every `slice`-th example)
def get_data(slice=5, train=True):
    
    full_dataset = datasets.FashionMNIST(
        root="data",
        train=train,
        download=True,
        transform=ToTensor(),
    )
    # equivalent to slicing with [::slice]
    sub_dataset = Subset(full_dataset, indices=range(0, len(full_dataset), slice))
    
    return sub_dataset

# Make the dataloader with config (dataset, batch_size)
def make_loader(dataset, batch_size):
    
    loader = DataLoader(dataset=dataset, batch_size=batch_size)
    return loader
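With the default `slice=5`, `Subset(..., indices=range(0, len(full_dataset), slice))` keeps every fifth example. Fashion-MNIST has 60,000 training and 10,000 test images, so the subsets hold 12,000 and 2,000 examples, and with `batch_size=64` (and `DataLoader`'s default `drop_last=False`) each loader yields `ceil(n / 64)` batches, the last one partial. The arithmetic can be checked in plain Python without downloading anything; `subset_stats` is a hypothetical helper:

```python
import math


def subset_stats(full_size: int, slice_: int, batch_size: int) -> tuple:
    """Mimic the sizes produced by Subset(range(0, full_size, slice_)) + DataLoader.

    Returns (subset size, number of batches, size of the last batch).
    """
    indices = range(0, full_size, slice_)  # same indices get_data passes to Subset
    n = len(indices)
    n_batches = math.ceil(n / batch_size)  # drop_last=False keeps the partial batch
    last = n - (n_batches - 1) * batch_size
    return n, n_batches, last


print(subset_stats(60_000, 5, 64))  # training split → (12000, 188, 32)
print(subset_stats(10_000, 5, 64))  # test split → (2000, 32, 16)
```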

In [15]:
# Define model: a simple multilayer perceptron
class MLP(nn.Module):
    def __init__(self, classes=10):
        super(MLP, self).__init__()
        
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, classes)
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits
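Each `nn.Linear(in_features, out_features)` layer (with its default `bias=True`) holds `in_features * out_features` weights plus `out_features` biases, while `Flatten` and `ReLU` have no parameters, so the MLP's size can be worked out by hand. A small sketch; `linear_params` is a hypothetical helper, and in PyTorch the same total should come from `sum(p.numel() for p in model.parameters())`:

```python
def linear_params(in_features: int, out_features: int, bias: bool = True) -> int:
    """Number of trainable parameters in a fully connected layer."""
    return in_features * out_features + (out_features if bias else 0)


# Layer shapes of the MLP above (Flatten and ReLU contribute no parameters).
layers = [(28 * 28, 512), (512, 512), (512, 10)]
counts = [linear_params(i, o) for i, o in layers]
print(counts)       # → [401920, 262656, 5130]
print(sum(counts))  # → 669706
```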

## Define Training Logic

`wandb.watch` logs histograms of the model's gradients and parameters (with `log="all"`) every `log_freq` training batches.

In [16]:
def train(model, loader, criterion, optimizer, config):
    
    # Tell wandb to watch what the model gets up to: gradients, weights, and more!
    wandb.watch(model, criterion, log="all", log_freq=10)
    
    # Run training and track with wandb
    example_ct = 0  # number of examples seen
    batch_ct = 0    # number of batches seen
    for epoch in range(config.epochs):
        for images, labels in loader:

            loss = train_batch(images, labels, model, optimizer, criterion)
            example_ct += len(images)
            batch_ct += 1

            # Report metrics every 25th batch
            if ((batch_ct + 1) % 25) == 0:
                train_log(loss.item(), example_ct, epoch)

def train_batch(images, labels, model, optimizer, criterion):
    images, labels = images.to(device), labels.to(device)
    
    # Forward pass 
    outputs = model(images)
    loss = criterion(outputs, labels)
    
    # Backward pass 
    optimizer.zero_grad()
    loss.backward()

    # Step with optimizer
    optimizer.step()

    return loss
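The logging condition `(batch_ct + 1) % 25 == 0` fires after batches 24, 49, 74, and so on. Replaying the loop's counters in plain Python, assuming the 12,000-example training subset and `batch_size=64` (187 full batches plus a final batch of 32 per epoch), reproduces the `example_ct` values printed in the training output further down, including the jump to 12704 once the first epoch's short batch has been counted. `log_points` is a hypothetical helper:

```python
def log_points(n_examples=12_000, batch_size=64, epochs=5, every=25):
    """Replay train()'s counters and return example_ct at each train_log call."""
    full, rest = divmod(n_examples, batch_size)
    batch_sizes = [batch_size] * full + ([rest] if rest else [])  # one epoch
    example_ct = batch_ct = 0
    points = []
    for _ in range(epochs):
        for size in batch_sizes:
            example_ct += size
            batch_ct += 1
            if (batch_ct + 1) % every == 0:  # same condition as in train()
                points.append(example_ct)
    return points


points = log_points()
print(points[:4])   # → [1536, 3136, 4736, 6336]
print(points[7])    # → 12704
print(len(points))  # → 37
```

With 940 batches in total, the loop logs 37 points, which matches the length of the loss sparkline in the run history.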

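`wandb.log` sends the metrics in the given dictionary to the W&B server. Passing `step=example_ct` indexes each logged point by the number of training examples seen so far, so the loss curve is plotted against examples rather than batches.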

In [17]:
def train_log(loss, example_ct, epoch):
    # Where the magic happens
    wandb.log({"epoch": epoch, "loss": loss}, step=example_ct)
    print(f"Loss after {example_ct:05d} examples: {loss:.3f}")

## Define Testing Logic

Once the model is done training, we want to test it: run it on the held-out test split, which it never saw during training.

We can save the model's architecture and final parameters to disk by exporting it with `torch.onnx.export` in the
[Open Neural Network eXchange (ONNX) format](https://onnx.ai/).

Passing that filename to `wandb.save` uploads the file to W&B's servers alongside the run: no more losing track of which `.h5` or `.pb` file corresponds to which training run!

In [18]:
def test(model, test_loader):
    
    model.eval()

    # Run the model on some test examples
    with torch.no_grad():
        correct, total = 0, 0
        for images, labels in test_loader:
            images, labels = images.to(device), labels.to(device)
            outputs = model(images)
            _, predicted = torch.max(outputs.data, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()

        print(f"Accuracy of the model on the {total} " +
              f"test images: {100 * correct / total}%")
        
        wandb.log({"test_accuracy": correct / total})

    # Save the model in the exchangeable ONNX format
    torch.onnx.export(model, images, "model.onnx")
    wandb.save("model.onnx")
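The accuracy computation in `test` takes, for each image, the index of the largest logit (`torch.max(outputs, 1)` returns the maximum values and their indices along dimension 1) and counts how many of those indices match the labels. The same logic on plain Python lists, as a framework-free sketch (`accuracy` is a hypothetical helper):

```python
def accuracy(logits, labels):
    """Fraction of rows whose arg-max logit equals the label."""
    predicted = [max(range(len(row)), key=row.__getitem__) for row in logits]
    correct = sum(int(p == y) for p, y in zip(predicted, labels))
    return correct / len(labels)


logits = [
    [2.0, 0.1, -1.0],  # predicts class 0
    [0.3, 0.2, 0.9],   # predicts class 2
    [0.5, 1.5, 0.4],   # predicts class 1
    [1.0, 0.0, 3.0],   # predicts class 2
]
labels = [0, 2, 0, 1]
print(accuracy(logits, labels))  # → 0.5
```

For the run at the end of this notebook, a `test_accuracy` of 0.481 on the 2,000-image test subset corresponds to 962 correct predictions.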

## Define the experiment and pipeline

### Config

Hyperparameters and metadata for our model are stored in a dictionary, `config`.

In [19]:
config = dict(
    epochs=5,
    classes=10,
    batch_size=64,
    learning_rate=0.001,
    dataset="Fashion-MNIST",
    architecture="MLP"
)
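Inside `model_pipeline`, these values are read back through `wandb.config` with attribute access (`config.epochs`, `config.batch_size`). To exercise helpers such as `make` without starting a W&B run, one option (an assumption for local testing, not part of the W&B workflow) is to wrap the dictionary in `types.SimpleNamespace`, which offers the same attribute-style access:

```python
from types import SimpleNamespace

config = dict(
    epochs=5,
    classes=10,
    batch_size=64,
    learning_rate=0.001,
    dataset="Fashion-MNIST",
    architecture="MLP",
)

# Stand-in for wandb.config: attribute access over the same keys.
offline_config = SimpleNamespace(**config)
print(offline_config.batch_size, offline_config.learning_rate)  # → 64 0.001
```

`make(offline_config)` would then build the loaders, model, loss, and optimizer. `train` and `test` still call `wandb.watch` and `wandb.log`, so they expect an initialized run; `wandb.init(mode="disabled")` is W&B's own way to turn those calls into no-ops for a dry run.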

### Make

To ensure that the values we chose and logged are always the ones used to build
the model, `make` reads them from the `wandb.config` copy of our dictionary rather than from `config` directly.

In [20]:
def make(config):
    # Make the data
    train_set, test_set = get_data(train=True), get_data(train=False)
    train_loader = make_loader(train_set, batch_size=config.batch_size)
    test_loader = make_loader(test_set, batch_size=config.batch_size)

    # Make the model
    model = MLP(config.classes).to(device)

    # Make the loss and optimizer
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(
        model.parameters(), lr=config.learning_rate)
    
    return model, train_loader, test_loader, criterion, optimizer

### Pipeline

The overall pipeline is structured as follows:
1. `make` a model, plus the associated data loaders, loss function, and optimizer;
2. `train` the model; and finally
3. `test` it to see how training went.

In [21]:
def model_pipeline(hyperparameters):

    # tell wandb to get started
    with wandb.init(project="pytorch-demo", config=hyperparameters):
      
        # access all HPs through wandb.config, so logging matches execution!
        config = wandb.config

        # make the model, data, and optimization problem
        model, train_loader, test_loader, criterion, optimizer = make(config)
        print(model)

        # and use them to train the model
        train(model, train_loader, criterion, optimizer, config)

        # and test its final performance
        test(model, test_loader)

    return model

In [22]:
# Build, train and analyze the model with the pipeline
model = model_pipeline(config)

MLP(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
  )
)
Loss after 01536 examples: 2.301
Loss after 03136 examples: 2.308
Loss after 04736 examples: 2.292
Loss after 06336 examples: 2.293
Loss after 07936 examples: 2.278
Loss after 09536 examples: 2.271
Loss after 11136 examples: 2.268
Loss after 12704 examples: 2.267
Loss after 14304 examples: 2.264
Loss after 15904 examples: 2.259
Loss after 17504 examples: 2.267
Loss after 19104 examples: 2.258
Loss after 20704 examples: 2.259
Loss after 22304 examples: 2.241
Loss after 23904 examples: 2.246
Loss after 25472 examples: 2.243
Loss after 27072 examples: 2.237
Loss after 28672 examples: 2.228
Loss after 30272 examples: 2.232
Loss after 31872 examples: 2.222
Loss after 33472 examples: 2.21


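Run history:

| metric | trend |
|---|---|
| epoch | ▁▁▁▁▁▁▁▃▃▃▃▃▃▃▃▅▅▅▅▅▅▅▆▆▆▆▆▆▆▆███████ |
| loss | ██▇▇▇▆▆▆▆▆▆▆▆▅▅▅▅▅▅▄▄▄▄▃▄▃▃▄▄▃▂▂▂▃▁▂▂ |
| test_accuracy | ▁ |

Run summary:

| metric | value |
|---|---|
| epoch | 4.0 |
| loss | 2.15772 |
| test_accuracy | 0.481 |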
