# Exploring Tooling with Weights and Biases
Similar to TensorBoard, Weights & Biases (wandb) is an application that tracks your training metrics and visualizes them for you. It lets you cleanly sort, organize, and compare your experiments. In this notebook, we walk through an example of using wandb.ai and then have you practice.

1. Make an account at https://wandb.ai/site

2. Install the client with `pip install wandb`

3. Run `wandb login` in a terminal

4. When prompted after step 3, paste your wandb API key


In [None]:
!git clone https://github.com/hingma/cs182fa25_public.git

In [3]:
%cd cs182fa25_public/hw06/code/

/Users/mox/Study/UCB/CS/CS182/cs182fa25_public/hw06/code


In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
import wandb
from architectures import BasicConvNet, ResNet18, MLP

## Organizing wandb Projects

Each run should have a set of parameters associated with it. For example, we want to log the hyperparameters used for a run, so let's list them clearly below.

In [None]:
project = 'CS182 WANDB.AI Practice Notebook'
learning_rate = 0.01
epochs = 2
architecture = 'CNN'
dataset = 'CIFAR-10'
batch_size = 64
momentum = 0.9
log_freq = 20
print_freq = 200
cuda = torch.cuda.is_available()
device = torch.device("cuda" if cuda else "cpu")
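
One way to keep these hyperparameters together is to group them in a dataclass and convert it to a plain dictionary when a run is initialized. The sketch below is illustrative only: `RunConfig` is a hypothetical name that does not appear in the course code, and its defaults simply mirror the values listed above.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class RunConfig:
    """Hypothetical container for the hyperparameters of a single run."""
    learning_rate: float = 0.01
    epochs: int = 2
    architecture: str = "CNN"
    dataset: str = "CIFAR-10"
    batch_size: int = 64
    momentum: float = 0.9


# asdict() produces a plain dict, the form that wandb.init(config=...) accepts
config_dict = asdict(RunConfig())
print(config_dict)

# Overriding a single field gives the configuration for a second experiment
smaller_lr = asdict(RunConfig(learning_rate=0.001))
print(smaller_lr["learning_rate"])
```

Keeping the values in one object makes it harder for the logged configuration and the values actually used in training to drift apart.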

### Initializing the Run

In [None]:
wandb.init(
    # set the wandb project where this run will be logged
    project=project,
    
    # track hyperparameters and run metadata
    config={
    "learning_rate": learning_rate,
    "architecture": architecture,
    "dataset": dataset,
    "epochs": epochs,
    "batch_size": batch_size,
    "momentum": momentum
    }
)

From here on, we have some standard CIFAR training definitions.

In [None]:
transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=batch_size,
                                          shuffle=True, num_workers=2)

testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=batch_size,
                                         shuffle=False, num_workers=2)

In [None]:
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)
        self.relu = nn.ReLU()

    def forward(self, x):
        x = self.pool(self.relu(self.conv1(x)))
        x = self.pool(self.relu(self.conv2(x)))
        x = torch.flatten(x, 1) # flatten all dimensions except batch
        x = self.relu(self.fc1(x))
        x = self.relu(self.fc2(x))
        x = self.fc3(x)
        return x

In [None]:
net = Net()

In [None]:
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=learning_rate, momentum=momentum)

### Training with wandb

Similar to TensorBoard, we log the accuracy and loss during training. The loop below calls `wandb.log` once every `log_freq` gradient steps.

In [None]:
for epoch in range(epochs):  # loop over the dataset multiple times
    running_loss = 0.0
    running_acc = 0.0
    for i, data in enumerate(trainloader, 0):
        # get the inputs; data is a list of [inputs, labels]
        inputs, labels = data

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        accuracy = torch.mean((torch.argmax(outputs, dim=1) == labels).float()).item() * 100

        # print statistics
        running_acc += accuracy
        running_loss += loss.item()
        if i % log_freq == log_freq - 1:
            wandb.log({'accuracy': accuracy, 'loss': loss.item()})
            
        if i % print_freq == print_freq - 1:    # print every print_freq mini-batches
            print(f'[{epoch + 1}, {i + 1:5d}] loss: {running_loss / print_freq:.5f} accuracy: {running_acc/print_freq:.5f}')
            running_loss = 0.0
            running_acc = 0.0
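
The modulo checks in the loop decide how often metrics reach wandb and the console. As a rough illustration, the sketch below counts how many times each condition fires in one epoch, assuming the 50,000-image CIFAR-10 training split and the `batch_size`, `log_freq`, and `print_freq` values set earlier. `count_triggers` is a hypothetical helper written for this example.

```python
import math


def count_triggers(num_batches: int, freq: int) -> int:
    """Count the batch indices i in [0, num_batches) with i % freq == freq - 1."""
    return sum(1 for i in range(num_batches) if i % freq == freq - 1)


num_train_images = 50_000   # size of the CIFAR-10 training split
batch_size = 64
log_freq = 20
print_freq = 200

# The DataLoader keeps the last partial batch by default, hence the ceiling
num_batches = math.ceil(num_train_images / batch_size)

print(num_batches)                              # 782
print(count_triggers(num_batches, log_freq))    # 39 wandb.log calls per epoch
print(count_triggers(num_batches, print_freq))  # 3 console prints per epoch
```

Note that `wandb.log` receives the accuracy and loss of the current mini-batch only, so the wandb curves are noisier than the running averages printed to the console.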

After we are done with this run, we call `wandb.finish()` to close it.

In [None]:
wandb.finish()
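
If training raises an exception before `wandb.finish()` is reached, the run can be left open. One possible alternative is to use the run object returned by `wandb.init` as a context manager, which closes the run when the block exits. This is a minimal sketch, assuming an installed wandb version that supports `with wandb.init(...) as run:` and the `mode="offline"` setting. The function name `log_toy_run`, the project name, and the metric key are placeholders, not part of the course code.

```python
def log_toy_run(losses, project="wandb-context-demo", mode="offline"):
    """Log a sequence of loss values inside a `with` block so the run is always closed."""
    import wandb  # imported lazily so the function can be defined without wandb installed

    config = {"num_steps": len(losses)}
    with wandb.init(project=project, mode=mode, config=config) as run:
        for iteration, loss in enumerate(losses):
            # One log call per loss value, analogous to wandb.log in the loop above
            run.log({"loss": loss, "iteration": iteration})
    # Leaving the `with` block finishes the run, even if an exception was raised inside it
    return len(losses)
```

Calling `log_toy_run([0.9, 0.7, 0.5])` would record three points in an offline run stored locally, without an explicit `wandb.finish()`.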

## Your Task
We will once again build classifiers for CIFAR-10. Various architectures are set up for you in the architectures.py file. Using wandb, search through 10 different hyperparameter configurations; example choices include learning rate, batch size, architecture, and optimization algorithm. Submit the hyperparameters that result in the highest accuracy for this classification task. Then explore wandb for any visualizations you may need. Feel free to run as many epochs as you like.
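
Configurations can be written out by hand, but a small grid is also easy to generate. The sketch below builds every combination of a few candidate values with `itertools.product`. The candidate values are illustrative choices rather than recommendations, and the dictionary keys are assumed to match the ones read by the `run(params)` function defined in the next cell.

```python
from itertools import product

# Candidate values per hyperparameter (illustrative choices, not recommendations)
search_space = {
    "architecture": ["BasicConvNet", "ResNet18"],
    "learning_rate": [0.01, 0.001],
    "optimizer": ["sgd", "adam"],
}

# product() yields one tuple per combination; zip pairs each value with its key
keys = list(search_space)
configs = [dict(zip(keys, values)) for values in product(*search_space.values())]

print(len(configs))  # 2 * 2 * 2 = 8 combinations
for cfg in configs[:3]:
    print(cfg)
```

A full grid grows multiplicatively with each added hyperparameter, so for a budget of 10 runs it may be more practical to pick a subset of the grid or to write the configurations out manually.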

In [None]:
def run(params):
    # Extract parameters
    learning_rate = params.get('learning_rate', 0.01)
    architecture = params.get('architecture', 'BasicConvNet')
    batch_size = params.get('batch_size', 64)
    epochs = params.get('epochs', 5)
    momentum = params.get('momentum', 0.9)
    optimizer_name = params.get('optimizer', 'sgd')
    weight_decay = params.get('weight_decay', 0)
    log_freq = 20
    print_freq = 200
    resize_for_resnet = architecture == 'ResNet18'  # ResNet18 requires 224x224 images
    
    # Set device
    cuda = torch.cuda.is_available()
    device = torch.device("cuda" if cuda else "cpu")
    
    # Initialize wandb (the return value is not stored, so the name run() is not shadowed)
    wandb.init(
        project='CS182 CIFAR-10 Hyperparameter Search',
        config=params,
        reinit=True  # Allow multiple runs in the same process
    )
    
    # Create transformations for data
    transform_list = [transforms.ToTensor()]
    if resize_for_resnet:
        transform_list.insert(0, transforms.Resize((224, 224)))
    transform_list.append(transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)))
    transform = transforms.Compose(transform_list)
    
    # Load datasets
    trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                            download=True, transform=transform)
    trainloader = torch.utils.data.DataLoader(trainset, batch_size=batch_size,
                                              shuffle=True, num_workers=2)

    testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                           download=True, transform=transform)
    testloader = torch.utils.data.DataLoader(testset, batch_size=batch_size,
                                             shuffle=False, num_workers=2)
    
    # Create the model based on architecture parameter
    if architecture == 'BasicConvNet':
        net = BasicConvNet()
    elif architecture == 'ResNet18':
        net = ResNet18()
    elif architecture == 'MLP':
        hidden_layers = params.get('hidden_layers', 7)
        hidden_size = params.get('hidden_size', 2048)
        net = MLP(num_layers=hidden_layers, size=hidden_size)
    else:
        raise ValueError(f"Unknown architecture: {architecture}")
    
    # Move model to device
    net = net.to(device)
    
    # Define loss function
    criterion = nn.CrossEntropyLoss()
    
    # Choose optimizer based on parameter
    if optimizer_name.lower() == 'sgd':
        optimizer = optim.SGD(net.parameters(), lr=learning_rate, 
                             momentum=momentum, weight_decay=weight_decay)
    elif optimizer_name.lower() == 'adam':
        optimizer = optim.Adam(net.parameters(), lr=learning_rate, 
                             weight_decay=weight_decay)
    elif optimizer_name.lower() == 'rmsprop':
        optimizer = optim.RMSprop(net.parameters(), lr=learning_rate, 
                                weight_decay=weight_decay)
    else:
        raise ValueError(f"Unknown optimizer: {optimizer_name}")
    
    # Optional: learning rate scheduler
    scheduler = optim.lr_scheduler.ReduceLROnPlateau(
        optimizer, 'min', patience=1, factor=0.5) if params.get('use_scheduler', False) else None
    
    # Training loop
    best_accuracy = 0.0
    for epoch in range(epochs):
        # Training phase
        net.train()
        running_loss = 0.0
        running_acc = 0.0
        for i, data in enumerate(trainloader):
            # Get the inputs; data is a list of [inputs, labels]
            inputs, labels = data
            inputs, labels = inputs.to(device), labels.to(device)
            
            # For MLP, we need to flatten the inputs
            if architecture == 'MLP':
                inputs = inputs.view(inputs.size(0), -1)
            
            # Zero the parameter gradients
            optimizer.zero_grad()

            # Forward + backward + optimize
            outputs = net(inputs)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()
            accuracy = torch.mean((torch.argmax(outputs, dim=1) == labels).float()).item() * 100

            # Log statistics
            running_acc += accuracy
            running_loss += loss.item()
            if i % log_freq == log_freq - 1:
                wandb.log({'train_accuracy': accuracy, 'train_loss': loss.item()})
                
            if i % print_freq == print_freq - 1:
                print(f'[{epoch + 1}, {i + 1:5d}] loss: {running_loss / print_freq:.5f} accuracy: {running_acc/print_freq:.5f}')
                running_loss = 0.0
                running_acc = 0.0
        
        # Evaluation phase
        net.eval()
        test_loss = 0.0
        correct = 0
        total = 0
        with torch.no_grad():
            for data in testloader:
                images, labels = data
                images, labels = images.to(device), labels.to(device)
                
                # For MLP, we need to flatten the inputs
                if architecture == 'MLP':
                    images = images.view(images.size(0), -1)
                
                outputs = net(images)
                loss = criterion(outputs, labels)
                test_loss += loss.item()
                
                predicted = torch.argmax(outputs, dim=1)
                total += labels.size(0)
                correct += (predicted == labels).sum().item()
        
        epoch_test_accuracy = 100 * correct / total
        epoch_test_loss = test_loss / len(testloader)
        
        print(f'Epoch {epoch+1} Test Accuracy: {epoch_test_accuracy:.2f}%')
        
        wandb.log({
            'epoch': epoch + 1,
            'test_accuracy': epoch_test_accuracy,
            'test_loss': epoch_test_loss
        })
        
        # Update best accuracy
        if epoch_test_accuracy > best_accuracy:
            best_accuracy = epoch_test_accuracy
        
        # Update learning rate if using scheduler
        if scheduler is not None:
            scheduler.step(epoch_test_loss)
    
    # Log best accuracy as summary metric
    wandb.run.summary['best_accuracy'] = best_accuracy
    
    # Finish the wandb run
    wandb.finish()
    
    return best_accuracy
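
The optional scheduler in `run` halves the learning rate when the test loss stops improving. To build intuition for `patience=1` and `factor=0.5`, here is a simplified pure-Python model of that rule. It is only a sketch: `simulate_plateau_schedule` is a hypothetical helper, it ignores the improvement threshold, cooldown, and minimum learning rate that PyTorch's `ReduceLROnPlateau` also supports, and the loss values are made up.

```python
def simulate_plateau_schedule(losses, lr=0.01, patience=1, factor=0.5):
    """Return the learning rate in effect after each epoch's loss is observed."""
    best = float("inf")
    bad_epochs = 0
    history = []
    for loss in losses:
        if loss < best:
            best = loss
            bad_epochs = 0
        else:
            bad_epochs += 1
        # Reduce only once the loss has failed to improve for more than `patience` epochs
        if bad_epochs > patience:
            lr *= factor
            bad_epochs = 0
        history.append(lr)
    return history


# Made-up loss values, chosen only to trigger a single reduction
print(simulate_plateau_schedule([1.0, 0.9, 0.95, 0.96, 0.97, 0.8]))
```

With `patience=1`, a single bad epoch is tolerated and the reduction happens on the second consecutive epoch without improvement.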

In [None]:
# Define hyperparameter configurations to try
hyperparameter_configs = [
    {'architecture': 'BasicConvNet', 'learning_rate': 0.01, 'batch_size': 64, 'epochs': 10, 'optimizer': 'sgd', 'momentum': 0.9},
    {'architecture': 'BasicConvNet', 'learning_rate': 0.001, 'batch_size': 128, 'epochs': 10, 'optimizer': 'adam'},
    {'architecture': 'ResNet18', 'learning_rate': 0.01, 'batch_size': 64, 'epochs': 10, 'optimizer': 'sgd', 'momentum': 0.9},
    {'architecture': 'ResNet18', 'learning_rate': 0.001, 'batch_size': 64, 'epochs': 10, 'optimizer': 'adam'},
    {'architecture': 'MLP', 'learning_rate': 0.01, 'batch_size': 64, 'epochs': 10, 'optimizer': 'sgd', 'momentum': 0.9, 'hidden_layers': 5, 'hidden_size': 1024},
    {'architecture': 'MLP', 'learning_rate': 0.001, 'batch_size': 128, 'epochs': 10, 'optimizer': 'adam', 'hidden_layers': 3, 'hidden_size': 2048},
    {'architecture': 'BasicConvNet', 'learning_rate': 0.005, 'batch_size': 32, 'epochs': 10, 'optimizer': 'rmsprop'},
    {'architecture': 'ResNet18', 'learning_rate': 0.0005, 'batch_size': 32, 'epochs': 10, 'optimizer': 'adam', 'weight_decay': 1e-4},
    {'architecture': 'BasicConvNet', 'learning_rate': 0.01, 'batch_size': 64, 'epochs': 10, 'optimizer': 'sgd', 'momentum': 0.9, 'weight_decay': 1e-4, 'use_scheduler': True},
    {'architecture': 'ResNet18', 'learning_rate': 0.01, 'batch_size': 128, 'epochs': 10, 'optimizer': 'sgd', 'momentum': 0.95, 'weight_decay': 5e-4, 'use_scheduler': True}
]
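
Since each configuration can take a long time to train, it may be worth checking the dictionaries for typos before launching anything. The sketch below is one possible pre-flight check. `validate_config` is a hypothetical helper, and the allowed names are taken from the branches of the `run` function above.

```python
KNOWN_ARCHITECTURES = {"BasicConvNet", "ResNet18", "MLP"}
KNOWN_OPTIMIZERS = {"sgd", "adam", "rmsprop"}


def validate_config(config: dict) -> list:
    """Return a list of problems found; an empty list means the config looks usable."""
    problems = []
    architecture = config.get("architecture", "BasicConvNet")
    optimizer = str(config.get("optimizer", "sgd")).lower()
    if architecture not in KNOWN_ARCHITECTURES:
        problems.append(f"unknown architecture: {architecture!r}")
    if optimizer not in KNOWN_OPTIMIZERS:
        problems.append(f"unknown optimizer: {optimizer!r}")
    if config.get("learning_rate", 0.01) <= 0:
        problems.append("learning_rate must be positive")
    if config.get("batch_size", 64) < 1:
        problems.append("batch_size must be at least 1")
    return problems


good = {"architecture": "ResNet18", "learning_rate": 0.001, "optimizer": "adam"}
typo = {"architecture": "Resnet18", "learning_rate": 0.001, "optimizer": "adamw"}

print(validate_config(good))
print(validate_config(typo))
```

Running such a check over `hyperparameter_configs` before the training loop surfaces mistakes immediately instead of partway through a multi-hour sweep.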


In [None]:
# Run each configuration and track results
results = []
for i, config in enumerate(hyperparameter_configs):
    print(f"Running configuration {i+1}/{len(hyperparameter_configs)}: {config}")
    accuracy = run(config)
    results.append((config, accuracy))
    print(f"Configuration {i+1} finished with best accuracy: {accuracy:.2f}%")
    print("-" * 50)

# Sort and display results
results.sort(key=lambda x: x[1], reverse=True)
print("\nBest hyperparameter configurations:")
for i, (config, accuracy) in enumerate(results[:3]):
    print(f"{i+1}. Accuracy: {accuracy:.2f}%, Config: {config}")


This tutorial is based on the PyTorch tutorials, an open-source project available at https://github.com/pytorch/tutorials/

The PyTorch tutorials are distributed under the BSD 3-Clause License: https://github.com/pytorch/tutorials/blob/main/LICENSE