<a href="https://colab.research.google.com/github/kevnantony/miscellaneous/blob/main/Debug_this_Colab.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Debug this Colab!

## Section 1

This colab represents a simple ML pipeline: loading data, defining a model, and fitting the model to the data. It has also been instrumented with Weights and Biases logging tools.

At Weights and Biases, we often help our users debug their pipelines -- both the ML code and the logging code from `wandb` integrated into it.

Your task is to debug this simple pipeline such that the model is able to learn and <u>perform reasonably well</u> on the given task, without changing the general structure of the model. As you do so, use comments and markdown cells to explain a bit about your process.


In [None]:
!pip install wandb



In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F

from torch.utils.data import DataLoader

import torchvision
from torchvision import transforms

import wandb

# Data Preprocessing

In [None]:
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))  # CIFAR10 has 3 channels
])

batch_size = 32

# Use the defined transform instead of just ToTensor()
cifar10 = torchvision.datasets.CIFAR10(root='./data', download=True, transform=transform)
pivot = 40000
# No need to sort by label - it disrupts the data distribution
train_set = torch.utils.data.Subset(cifar10, range(pivot))
val_set = torch.utils.data.Subset(cifar10, range(pivot, len(cifar10)))
train_loader = DataLoader(train_set, batch_size=batch_size, shuffle=True)
val_loader = DataLoader(val_set, batch_size=batch_size, shuffle=False)  # No need to shuffle validation data

Files already downloaded and verified
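To make the effect of the `Normalize` step concrete, here is a minimal sketch of the per-channel arithmetic in plain Python. `ToTensor()` scales pixel values to the range [0, 1], and normalizing with mean 0.5 and standard deviation 0.5 then maps that range to [-1, 1]. The `normalize` helper is a hypothetical name used only for illustration; it is not part of torchvision.

```python
def normalize(value: float, mean: float = 0.5, std: float = 0.5) -> float:
    """Per-channel arithmetic of transforms.Normalize: (value - mean) / std."""
    return (value - mean) / std

# Darkest, mid-grey and brightest pixel values after ToTensor()
for pixel in (0.0, 0.5, 1.0):
    print(pixel, "->", normalize(pixel))
```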


In [None]:
class Network(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)  # CIFAR10 has 3 input channels
        self.pool = nn.MaxPool2d(2, 2)  # Fixed: MaxPool2d, not MaxPooling2D
        self.conv2 = nn.Conv2d(6, 16, 5)
        # Fix the calculation for the linear layer input size
        # After two conv+pool operations on 32x32 image:
        # 32x32 -> conv1 -> 28x28 -> pool -> 14x14 -> conv2 -> 10x10 -> pool -> 5x5
        # So with 16 filters, the flattened size is 16*5*5 = 400
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)  # Add an intermediate layer
        self.fc3 = nn.Linear(84, 10)  # CIFAR10 has 10 classes

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = torch.flatten(x, 1)  # Fixed: torch.flatten instead of Flatten
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

model = Network()
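The size arithmetic in the comments above can be checked with a small helper. This is a sketch in plain Python, assuming square inputs, no padding, and no dilation; `conv_output_size` is a hypothetical helper name, not a PyTorch function.

```python
def conv_output_size(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Spatial size after a convolution or pooling layer (square input, no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1

size = 32                                           # CIFAR10 images are 32x32
size = conv_output_size(size, kernel=5)             # conv1 (5x5 kernel): 32 -> 28
size = conv_output_size(size, kernel=2, stride=2)   # pool (2x2, stride 2): 28 -> 14
size = conv_output_size(size, kernel=5)             # conv2 (5x5 kernel): 14 -> 10
size = conv_output_size(size, kernel=2, stride=2)   # pool (2x2, stride 2): 10 -> 5

flattened_features = 16 * size * size               # conv2 produces 16 channels
print(flattened_features)  # 400
```

The result matches the `16 * 5 * 5` input size used for `fc1`.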

In [None]:
# Set reasonable optimization parameters
criterion = nn.CrossEntropyLoss()
# Lower the learning rate significantly
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

# Training and Validation

In this part, you will also need to calculate training and validation accuracy and log them to Weights and Biases.

In [None]:
with wandb.init(project='Tier-1-Test', save_code=True) as run:
    # Log hyperparameters
    wandb.config.update({
        "learning_rate": 0.01,
        "epochs": 5,
        "batch_size": batch_size
    })

    for epoch in range(5):
        # Training phase
        model.train()
        train_loss = 0
        correct_train = 0
        total_train = 0

        for i, data in enumerate(train_loader):
            images, labels = data

            # Zero the parameter gradients (missing in original)
            optimizer.zero_grad()

            outputs = model(images)
            loss = criterion(outputs, labels)

            loss.backward()
            optimizer.step()

            train_loss += loss.item()

            # Calculate accuracy
            _, predicted = torch.max(outputs.data, 1)
            total_train += labels.size(0)
            correct_train += (predicted == labels).sum().item()

        # Log training metrics
        train_accuracy = 100 * correct_train / total_train
        run.log({
            'epoch': epoch,
            'train_loss': train_loss / len(train_loader),
            'train_accuracy': train_accuracy
        })

        # Validation phase
        model.eval()
        val_loss = 0
        correct_val = 0
        total_val = 0

        with torch.no_grad():  # No gradient calculation during validation
            for i, data in enumerate(val_loader):
                images, labels = data
                outputs = model(images)

                loss = criterion(outputs, labels)
                val_loss += loss.item()

                # Calculate accuracy
                _, predicted = torch.max(outputs.data, 1)
                total_val += labels.size(0)
                correct_val += (predicted == labels).sum().item()

        # Log validation metrics
        val_accuracy = 100 * correct_val / total_val
        run.log({
            'val_loss': val_loss / len(val_loader),
            'val_accuracy': val_accuracy
        })

        print(f'Epoch {epoch+1}: '
              f'Train Loss: {train_loss/len(train_loader):.3f}, '
              f'Train Acc: {train_accuracy:.1f}%, '
              f'Val Loss: {val_loss/len(val_loader):.3f}, '
              f'Val Acc: {val_accuracy:.1f}%')

wandb: Using wandb-core as the SDK backend.  Please refer to https://wandb.me/wandb-core for more information.
wandb: Logging into wandb.ai. (Learn how to deploy a W&B server locally: https://wandb.me/wandb-server)
wandb: You can find your API key in your browser here: https://wandb.ai/authorize
wandb: Paste an API key from your profile and hit enter:
wandb: No netrc file found, creating one.
wandb: Appending key for api.wandb.ai to your netrc file: /root/.netrc
wandb: Currently logged in as: kevinantony (kevinantony-nit-jalandhar) to https://api.wandb.ai. Use `wandb login --relogin` to force relogin


Epoch 1: Train Loss: 1.784, Train Acc: 33.7%, Val Loss: 1.487, Val Acc: 46.0%
Epoch 2: Train Loss: 1.394, Train Acc: 49.9%, Val Loss: 1.329, Val Acc: 52.7%
Epoch 3: Train Loss: 1.244, Train Acc: 55.9%, Val Loss: 1.266, Val Acc: 54.9%
Epoch 4: Train Loss: 1.142, Train Acc: 59.6%, Val Loss: 1.216, Val Acc: 57.3%
Epoch 5: Train Loss: 1.069, Train Acc: 62.4%, Val Loss: 1.184, Val Acc: 58.6%


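Run history:

| Metric | History |
|---|---|
| epoch | ▁▃▅▆█ |
| train_accuracy | ▁▅▆▇█ |
| train_loss | █▄▃▂▁ |
| val_accuracy | ▁▅▆▇█ |
| val_loss | █▄▃▂▁ |

Run summary:

| Metric | Value |
|---|---|
| epoch | 4.0 |
| train_accuracy | 62.355 |
| train_loss | 1.06907 |
| val_accuracy | 58.65 |
| val_loss | 1.18392 |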


Looking at the code, I identified several issues that needed to be fixed:

1.   The data preprocessing used a different transform for the CIFAR10 dataset than the one that was defined.
2.   The `Network` class had errors in its architecture.
3.   The optimization parameters needed adjustment.
4.   The training loop was missing some critical steps.
5.   Accuracy calculation and logging were missing.






**My Approach to Debugging the ML Pipeline:**

I identified and fixed several critical issues in the ML pipeline:

1.  Fixed the transform to properly normalize CIFAR10 (which has 3 channels, not 1)
2.  Corrected the model architecture, particularly fixing `MaxPool2d`, properly calculating the flattened dimension size, and using `torch.flatten`
3.  Reduced the learning rate from 1e3 (1000), which was causing instability, to 0.01
4.  Added `optimizer.zero_grad()` in the training loop to prevent accumulating gradients
5.  Implemented proper accuracy calculation and logging for both training and validation sets
6.  Added proper logging of hyperparameters and metrics to Weights & Biases
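The instability caused by an oversized learning rate (item 3) can be illustrated on a toy problem. The sketch below is not the CIFAR10 model: it assumes plain gradient descent on f(w) = w², and `gradient_descent` is a hypothetical helper written only for this illustration.

```python
def gradient_descent(learning_rate: float, steps: int = 5, start: float = 1.0) -> float:
    """Minimise f(w) = w^2 (gradient 2w) with plain gradient descent."""
    w = start
    for _ in range(steps):
        w -= learning_rate * 2 * w  # w <- w * (1 - 2 * learning_rate)
    return w

stable = gradient_descent(learning_rate=0.01)   # each step multiplies w by 0.98
unstable = gradient_descent(learning_rate=1e3)  # each step multiplies w by -1999

print(f"lr=0.01 -> w = {stable:.4f}")
print(f"lr=1e3  -> w = {unstable:.3e}")
```

With the small learning rate the parameter moves steadily toward the minimum at 0. With the large one it overshoots further on every step, which is the same kind of divergence that shows up as an exploding or NaN loss in a real training run.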



## Section 2

For this section, your task is to write code which can solve the following problem. We have provided a few unit tests to aid you with your task, but these are not comprehensive and you *should* write a few tests of your own to make sure your task runs perfectly.


```
Write a program to calculate the nth root of a given number x. You will be given the variables x and n. Your root should be close to the true value to at least 2 decimal points of precision.

Constraints:
- You are NOT allowed to use exp(), pow(), the exponentiation operator (**) or any pre-built exponentiation methods.
- x is guaranteed to be a positive real number.
- n is guaranteed to be a positive integer.
```

In [None]:
def root(x: float, n: int) -> float:
    """
    Calculate the nth root of x using Newton's method.

    Args:
        x: positive real number
        n: positive integer

    Returns:
        The nth root of x with at least 2 decimal places of precision
    """
    # Handle edge cases
    if x == 0:
        return 0.0
    if x == 1 or n == 1:
        return float(x)

    # Initial guess: max(x, 1) is never below the true root, so the iterates
    # decrease toward it and guess^(n-1) cannot underflow to zero
    # (a guess of x / n can, e.g. for x = 2, n = 1000, causing a division by zero)
    guess = max(float(x), 1.0)

    # Relative precision threshold (an absolute one may never be met for large roots)
    epsilon = 1e-10

    # Newton's method, with an iteration cap as a safety net against endless loops
    for _step in range(100_000):
        # Update rule: g_{k+1} = ((n-1) * g_k + x / (g_k)^(n-1)) / n
        # Calculate g_k^(n-1) by repeated multiplication, avoiding **
        power = 1.0
        for _ in range(n - 1):
            power *= guess

        # Apply Newton's method formula
        next_guess = ((n - 1) * guess + x / power) / n

        # Check for convergence
        if abs(next_guess - guess) <= epsilon * next_guess:
            return next_guess

        guess = next_guess

    # Cap reached without meeting the threshold: return the latest estimate
    return guess

**My Approach to the Nth Root Problem**

To calculate the nth root without using exponentiation operators:

1.  I implemented Newton's method, which is an efficient iterative approach for finding roots
2.  The key formula I used is: `g_{k+1} = ((n-1) * g_k + x / (g_k)^(n-1)) / n`, where `g_k` is the k-th guess
3.  To calculate powers without using `**`, I implemented a simple loop that multiplies the base by itself n-1 times
4.  I added handling for edge cases like x=0, x=1, and n=1
5.  I implemented additional tests beyond the provided ones to verify correctness across a wider range of inputs

**Note:**
While the guess is still far above the root, each Newton step shrinks it by a factor of roughly (n-1)/n, so for large x the number of iterations grows roughly like n·log(x); once the guess is close to the root, convergence is quadratic. Each iteration takes O(n) multiplications to calculate the power.
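The update formula in step 2 can be derived by applying Newton's method to $f(g) = g^n - x$, whose positive zero is the nth root of $x$. Writing $g_k$ for the k-th guess (notation chosen here for illustration):

$$
g_{k+1} = g_k - \frac{f(g_k)}{f'(g_k)} = g_k - \frac{g_k^{\,n} - x}{n\,g_k^{\,n-1}} = \frac{(n-1)\,g_k + x / g_k^{\,n-1}}{n}
$$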


### Unit Tests

(You are allowed to use pre-built exponentiation methods to test against)

In [None]:
def run_tests():
    THRESHOLD = 1e-2

    def is_approximately_equal(a, b):
        return abs(a - b) <= THRESHOLD

    # Original tests
    assert is_approximately_equal(root(100, 2), 100 ** 0.5)
    assert is_approximately_equal(root(50, 2), 50 ** 0.5)
    assert is_approximately_equal(root(30, 5), 30 ** 0.2)
    assert is_approximately_equal(root(1, 2), 1)

    # Additional tests
    assert is_approximately_equal(root(8, 3), 2.0)  # Cube root of 8 is 2
    assert is_approximately_equal(root(16, 4), 2.0)  # 4th root of 16 is 2
    assert is_approximately_equal(root(1000000, 6), 10.0)  # 6th root of 1,000,000 is 10
    assert is_approximately_equal(root(2, 1), 2.0)  # 1st root is the number itself
    assert is_approximately_equal(root(0.25, 2), 0.5)  # Square root of 0.25 is 0.5
    assert is_approximately_equal(root(27, 3), 3.0)  # Cube root of 27 is 3

    print("All tests passed!")

# Run the tests
run_tests()


All tests passed!
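The tests above compare against `**`. As an additional cross-check that stays within the problem's constraints, a second, independent method could be used as a reference. The sketch below uses bisection rather than Newton's method; `power_by_multiplication` and `root_by_bisection` are hypothetical helper names, and the fixed count of 100 halvings is an assumption that is ample for moderately sized inputs.

```python
def power_by_multiplication(base: float, exponent: int) -> float:
    """Compute base^exponent by repeated multiplication (no ** or pow)."""
    result = 1.0
    for _ in range(exponent):
        result *= base
    return result

def root_by_bisection(x: float, n: int, iterations: int = 100) -> float:
    """Approximate the nth root of x > 0 by bisecting the interval [0, max(x, 1)]."""
    low, high = 0.0, max(x, 1.0)  # the root always lies in this interval
    for _ in range(iterations):
        mid = (low + high) / 2
        if power_by_multiplication(mid, n) < x:
            low = mid   # mid is too small: the root is in the upper half
        else:
            high = mid  # mid is large enough: the root is in the lower half
    return (low + high) / 2

print(root_by_bisection(27, 3))
print(root_by_bisection(0.25, 2))
```

A test could then assert that `root(x, n)` and `root_by_bisection(x, n)` agree to within `THRESHOLD` for a range of inputs.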


## **Additional Info**

### **Section 1:**

1.  **Data Augmentation:** Consider adding data augmentation to improve model performance.

In [None]:
transform_train = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])

2.  **Learning Rate Scheduler:** Add a learning rate scheduler to improve convergence:




In [None]:
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=2, gamma=0.5)
# Add scheduler.step() at the end of each epoch
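To see what this schedule does over the 5 training epochs, the decay rule can be written out in plain Python. This is a sketch of the step decay arithmetic, assuming `scheduler.step()` is called once at the end of every epoch; `step_lr` is a hypothetical helper, not the PyTorch class.

```python
def step_lr(initial_lr: float, epoch: int, step_size: int = 2, gamma: float = 0.5) -> float:
    """Learning rate in effect during `epoch` (0-indexed) under step decay."""
    return initial_lr * gamma ** (epoch // step_size)

# The learning rate is halved after every 2 epochs
for epoch in range(5):
    print(f"epoch {epoch}: lr = {step_lr(0.01, epoch)}")
```

With `step_size=2` and `gamma=0.5`, epochs 0 and 1 train at the initial rate, epochs 2 and 3 at half of it, and epoch 4 at a quarter.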

3.  **Batch Normalization:** Adding batch normalization to the model could improve training stability and speed up convergence:


In [None]:
self.conv1 = nn.Conv2d(3, 6, 5)
self.bn1 = nn.BatchNorm2d(6)
# Then in forward: x = self.pool(F.relu(self.bn1(self.conv1(x))))

4.  **Model Saving:** Consider adding code to save the best model based on validation accuracy:


In [None]:
best_val_acc = 0  # Initialize once, before the epoch loop
# After computing val_accuracy at the end of each epoch:
if val_accuracy > best_val_acc:
    best_val_acc = val_accuracy
    # Save model
    torch.save(model.state_dict(), 'best_model.pth')
    wandb.save('best_model.pth')

### **Section 2:**

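1.  **Efficiency:** Each Newton iteration recomputes `guess^(n-1)` with n-1 multiplications. Exponentiation by squaring would need only O(log n) multiplications per iteration:

In [None]:
def int_power(base: float, exponent: int) -> float:
    """Compute base^exponent with O(log exponent) multiplications, without **."""
    result = 1.0
    while exponent > 0:
        if exponent % 2 == 1:  # Lowest bit set: fold the current square into the result
            result *= base
        base *= base           # Square the base for the next bit
        exponent //= 2
    return result

# Inside root(), the inner loop could then become:
# power = int_power(guess, n - 1)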

2.  **More Edge Cases**: Could consider handling very large values of x or n:

In [None]:
# For very large values of x, adjust initial guess
if x > 1e10:
    guess = x / (2 * n)  # Might converge better for large values
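Another option for large inputs is to pick the starting point by repeated doubling, so that it is always within a factor of 2 of the true root. The sketch below is one possible approach, not a tested replacement for the guess above; `power_by_multiplication` and `initial_guess` are hypothetical helper names.

```python
def power_by_multiplication(base: float, exponent: int) -> float:
    """Compute base^exponent by repeated multiplication (no ** or pow)."""
    result = 1.0
    for _ in range(exponent):
        result *= base
    return result

def initial_guess(x: float, n: int) -> float:
    """Smallest power of two g >= 1 such that g^n >= x.

    For x >= 1 this gives root <= g < 2 * root; for x < 1 it returns 1.0.
    """
    guess = 1.0
    while power_by_multiplication(guess, n) < x:
        guess *= 2.0
    return guess

print(initial_guess(1e12, 2))
print(initial_guess(8, 3))
```

Starting at or above the root keeps the Newton iterates decreasing, and the factor-of-2 bound means only a few steps are needed before quadratic convergence takes over.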