<a href="https://colab.research.google.com/github/nil3sh99/ML/blob/master/Debug_this_Colab.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Debug this Colab!

This colab represents a simple ML pipline, loading data, defining a model and fitting the model to the data. It has also been instrumented with Weights and Biases logging tools.

At Weights and Biases, we often help our users debug their pipelines -- both the ML code and the logging code from `wandb` integrated into it.

Your task is to debug this simple pipeline such that the model is able to learn and <u>perform reasonably well</u> (hint: Sweeps) on the given task, without changing the general structure of the model. As you do so, use comments and markdown cells to explain a bit about your process.

In [None]:
!pip install wandb

In [45]:
import torch
import torch.nn as nn
import torch.nn.functional as F

from torch.utils.data import DataLoader

import torchvision
from torchvision import transforms

import wandb


# Data Preprocessing

In [46]:
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,))
])

batch_size = 32

cifar10 = torchvision.datasets.CIFAR10(root='./data', download=True, transform=torchvision.transforms.ToTensor())
pivot = 40000
cifar10 = sorted(cifar10, key=lambda x: x[1])
train_set = torch.utils.data.Subset(cifar10, range(pivot))
val_set = torch.utils.data.Subset(cifar10, range(pivot, len(cifar10)))
train_loader = DataLoader(train_set, batch_size=batch_size, shuffle=True)
val_loader = DataLoader(val_set, batch_size=batch_size, shuffle=True)

Files already downloaded and verified


In [58]:
class Network(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 6, 5) 
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        # print(self.conv2)
        self.fc1 = nn.Linear(16*5*5, 120)
        # print(self.fc1)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)
    
    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = torch.flatten(x, 1)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

model = Network()

In [48]:
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e3, momentum=0.9)

# Training and Validation

In this part, you will also need to additionally calculate training and validation accuracy and log it to Weights and Biases.

In [52]:
def wandbSweep(config = None): 
  with wandb.init(project = 'Tier-1-Test', save_code=True) as run:
      for epoch in range(5):
          current_loss = 0

          model.train()

          for i, data in enumerate(train_loader):
              images, labels = data

              outputs = model(images)
              loss = criterion(outputs, labels)

              loss.backward()
              optimizer.step()

              current_loss += loss
              
          run.log({
                  'train_loss': current_loss / (i + 1)
              }) 
          
          model.eval()

          current_loss = 0

          for i, data in enumerate(val_loader):
              images, labels = data
              outputs = model(images)

              loss = criterion(outputs, labels)

              current_loss += loss

          run.log({
                  'val_loss': current_loss / (i + 1)
              })

# Sweep configuration for minimizing the validation_loss   
sweep_configuration = {
    'method': 'random',
    'name': 'sweep',
    'metric': {
        'goal': 'minimize', 
        'name': 'val_loss'
		},
    'parameters': {
        'batch_size': {'values': [16, 32, 64]},
        'epochs': {'values': [5, 10, 15]},
        'lr': {'max': 0.1, 'min': 0.0001}
     }
}

sweep_id = wandb.sweep(sweep=sweep_configuration, project="Tier-1-Test")

wandb.agent(sweep_id, wandbSweep, count=10)

Create sweep with ID: g9a21xg7
Sweep URL: https://wandb.ai/nil3sh99/Tier-1-Test/sweeps/g9a21xg7


[34m[1mwandb[0m: Agent Starting Run: hn5a16q3 with config:
[34m[1mwandb[0m: 	batch_size: 16
[34m[1mwandb[0m: 	epochs: 5
[34m[1mwandb[0m: 	lr: 0.0779112948832967
Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.


VBox(children=(Label(value='Waiting for wandb.init()...\r'), FloatProgress(value=0.01667099575000369, max=1.0)…

VBox(children=(Label(value='0.070 MB of 0.070 MB uploaded (0.000 MB deduped)\r'), FloatProgress(value=1.0, max…

0,1
train_loss,▁▇▂▃█
val_loss,▁▇▁▇█

0,1
train_loss,2378980608.0
val_loss,2701510656.0


[34m[1mwandb[0m: Agent Starting Run: uovg93ku with config:
[34m[1mwandb[0m: 	batch_size: 32
[34m[1mwandb[0m: 	epochs: 5
[34m[1mwandb[0m: 	lr: 0.060674546789598494
Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.


VBox(children=(Label(value='0.079 MB of 0.079 MB uploaded (0.000 MB deduped)\r'), FloatProgress(value=1.0, max…

0,1
train_loss,▅▇▁▁█
val_loss,▆▅▄█▁

0,1
train_loss,2762804736.0
val_loss,1869490816.0


[34m[1mwandb[0m: Agent Starting Run: totnkw5c with config:
[34m[1mwandb[0m: 	batch_size: 16
[34m[1mwandb[0m: 	epochs: 15
[34m[1mwandb[0m: 	lr: 0.041060286917498266
Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.


VBox(children=(Label(value='0.089 MB of 0.089 MB uploaded (0.000 MB deduped)\r'), FloatProgress(value=1.0, max…

0,1
train_loss,▁▄▆██
val_loss,▄▅██▁

0,1
train_loss,3146843904.0
val_loss,2108091008.0


[34m[1mwandb[0m: Agent Starting Run: tff6wf8t with config:
[34m[1mwandb[0m: 	batch_size: 16
[34m[1mwandb[0m: 	epochs: 10
[34m[1mwandb[0m: 	lr: 0.08905916872429007
Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.


VBox(children=(Label(value='0.099 MB of 0.099 MB uploaded (0.000 MB deduped)\r'), FloatProgress(value=1.0, max…

0,1
train_loss,▁▇█▇▇
val_loss,▁█▄█▆

0,1
train_loss,3203400192.0
val_loss,3337771776.0


[34m[1mwandb[0m: Agent Starting Run: bpadcnx8 with config:
[34m[1mwandb[0m: 	batch_size: 16
[34m[1mwandb[0m: 	epochs: 15
[34m[1mwandb[0m: 	lr: 0.05517258873729125
Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.


VBox(children=(Label(value='0.109 MB of 0.109 MB uploaded (0.000 MB deduped)\r'), FloatProgress(value=1.0, max…

0,1
train_loss,▃▄▆█▁
val_loss,▃▇█▇▁

0,1
train_loss,2467547136.0
val_loss,2630499584.0


[34m[1mwandb[0m: Agent Starting Run: 3ocx449c with config:
[34m[1mwandb[0m: 	batch_size: 32
[34m[1mwandb[0m: 	epochs: 5
[34m[1mwandb[0m: 	lr: 0.06771344818523717
Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.


VBox(children=(Label(value='0.118 MB of 0.118 MB uploaded (0.000 MB deduped)\r'), FloatProgress(value=1.0, max…

0,1
train_loss,▁▅▁█▅
val_loss,▇▁▁▆█

0,1
train_loss,4131813888.0
val_loss,4335378944.0


[34m[1mwandb[0m: Agent Starting Run: dzbl20ky with config:
[34m[1mwandb[0m: 	batch_size: 16
[34m[1mwandb[0m: 	epochs: 15
[34m[1mwandb[0m: 	lr: 0.0918474140650209
Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.


0,1
train_loss,▂▁▇█▆
val_loss,▁▅█▇▇

0,1
train_loss,4317155328.0
val_loss,4649993216.0


[34m[1mwandb[0m: Agent Starting Run: 0o8w65zd with config:
[34m[1mwandb[0m: 	batch_size: 32
[34m[1mwandb[0m: 	epochs: 15
[34m[1mwandb[0m: 	lr: 0.08535177555822147
Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.


VBox(children=(Label(value='0.131 MB of 0.131 MB uploaded (0.000 MB deduped)\r'), FloatProgress(value=1.0, max…

0,1
train_loss,▃▁▃██
val_loss,▅▁▇█▇

0,1
train_loss,5053338624.0
val_loss,5194276864.0


[34m[1mwandb[0m: Agent Starting Run: gtkiynih with config:
[34m[1mwandb[0m: 	batch_size: 32
[34m[1mwandb[0m: 	epochs: 15
[34m[1mwandb[0m: 	lr: 0.043441640598588176
Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.


VBox(children=(Label(value='0.141 MB of 0.141 MB uploaded (0.000 MB deduped)\r'), FloatProgress(value=1.0, max…

0,1
train_loss,▅▁▆█▇
val_loss,▁▁▄█▆

0,1
train_loss,4877503488.0
val_loss,4888024064.0


[34m[1mwandb[0m: Agent Starting Run: 41ns4pvc with config:
[34m[1mwandb[0m: 	batch_size: 64
[34m[1mwandb[0m: 	epochs: 5
[34m[1mwandb[0m: 	lr: 0.02664033915777308
Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.


VBox(children=(Label(value='0.150 MB of 0.150 MB uploaded (0.000 MB deduped)\r'), FloatProgress(value=1.0, max…

0,1
train_loss,▅█▄▁▂
val_loss,██▃▁▆

0,1
train_loss,3931218432.0
val_loss,5344293376.0


Now that you have completed the task, please write 3-5 lines sharing your approach to the problem and how you went about solving this task.

# Approach

I ran the code block by block and resolved the conflicting errors presented in the output frame by referring them with the official documentation of PyTorch and wandb, making sure that the syntax is correct and parameters are valid.
For sweeping add-on, I read code examples on GitHub and took some motivation from the official wandb repository.

<br>

# For Reference
# Debugging

## Incorrect function names

1.   MaxPooling2D(2,2)  --> MaxPool2d(2, 2)
2.   torch.Flatten(x,1) --> torch.flatten(x, 1)

## Invalid format for wandb.log()

While returning inside the wandb.log(), the body of the function should be a  dictionary

```
run.log({
          'train_loss': current_loss / (i + 1)
        }) 
```
```
run.log({
          'val_loss': current_loss / (i + 1)
        })
```

## mat1 and mat2 shapes cannot be multiplied

#### Original Code
```
self.conv2 = nn.Conv2d(6, 16, 5) <br>
self.fc1 = nn.Linear(600, 120) <br>
self.fc2 = nn.Linear(120, 2) <br>
self.fc3 = nn.Linear(2, 10) <br>
```

#### Modified Code
The output channel value from conv2d should be ```NumberOfChannels * Kernel_Size```, and thus the value will be 400, rather than 600. <br>

```
self.conv2 = nn.Conv2d(6, 16, 5) <br>
self.fc1 = nn.Linear(16 x 5 x 5, 120) <br>
self.fc2 = nn.Linear(120, 84) <br>
self.fc3 = nn.Linear(84, 10) <br>

```






<!-- Debugging -->
<!-- 1.  -->