<a href="https://colab.research.google.com/github/step-cheng/cs496_gradienttheory/blob/main/CS496_HW_5.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
from torch.utils.data import DataLoader
from tqdm import tqdm

In [2]:
# Define transformations with data augmentation
transform_train = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
])

transform_test = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
])


In [3]:
# Load CIFAR-10 dataset
trainset = torchvision.datasets.CIFAR10(root="./data", train=True, download=True, transform=transform_train)
testset = torchvision.datasets.CIFAR10(root="./data", train=False, download=True, transform=transform_test)

trainloader = DataLoader(trainset, batch_size=100, shuffle=True, num_workers=2)
testloader = DataLoader(testset, batch_size=100, shuffle=False, num_workers=2)

Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to ./data/cifar-10-python.tar.gz


100%|██████████| 170M/170M [00:13<00:00, 12.7MB/s]


Extracting ./data/cifar-10-python.tar.gz to ./data
Files already downloaded and verified


In [4]:
# Define the model with Batch Normalization
class ConvNet(nn.Module):
    def __init__(self):
        super(ConvNet, self).__init__()
        self.model = nn.Sequential(
            nn.Conv2d(3, 100, 3),
            nn.BatchNorm2d(100),
            nn.ReLU(),

            nn.Conv2d(100, 100, 5),
            nn.BatchNorm2d(100),
            nn.ReLU(),

            nn.Conv2d(100, 100, 5),
            nn.BatchNorm2d(100),
            nn.ReLU(),

            nn.Conv2d(100, 100, 5),
            nn.BatchNorm2d(100),
            nn.ReLU(),

            nn.Conv2d(100, 100, 5),
            nn.BatchNorm2d(100),
            nn.ReLU(),

            nn.Conv2d(100, 100, 5),
            nn.BatchNorm2d(100),
            nn.ReLU(),

            nn.Conv2d(100, 10, 5),
            nn.ReLU(),
            nn.Flatten(),
            nn.Linear(360, 128),
            nn.ReLU(),
            nn.Linear(128, 64),
            nn.ReLU(),
            nn.Linear(64, 10),
        )

    def forward(self, x):
        return self.model(x)

In [17]:
# Training loop
def train(model, criterion, optimizer, scheduler, device):
  num_epochs = 10
  criterion = nn.CrossEntropyLoss()
  optimizer = optim.SGD(model.parameters(), lr=0.1, weight_decay=1e-4)

  for epoch in range(num_epochs):
    print(f"epoch {epoch +1}")
    model.train()
    running_loss = 0.0
    correct = 0
    total = 0

    for inputs, labels in tqdm(trainloader):
      inputs, labels = inputs.to(device), labels.to(device)
      optimizer.zero_grad()
      outputs = model(inputs)
      loss = criterion(outputs, labels)
      loss.backward()
      optimizer.step()

      running_loss += loss.item()
      _, predicted = outputs.max(1)
      total += labels.size(0)
      correct += predicted.eq(labels).sum().item()

    if scheduler is not None:
      scheduler.step()
    train_acc = 100 * correct / total
    print(f"Epoch {epoch+1}: Loss: {running_loss/len(trainloader):.4f}, Train Accuracy: {train_acc:.2f}%")

  return model


In [9]:
# Evaluation on test set
def evaluate(model):
  model.eval()
  total = 0
  correct = 0
  with torch.no_grad():
      for inputs, labels in testloader:
          inputs, labels = inputs.to(device), labels.to(device)
          outputs = model(inputs)
          _, predicted = outputs.max(1)
          total += labels.size(0)
          correct += predicted.eq(labels).sum().item()

  test_acc = 100 * correct / total
  print(f"Test Accuracy: {test_acc:.2f}%")


In [12]:
# Initialize model
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = ConvNet().to(device)

# Define loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.001, weight_decay=1e-4)
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.5)  # Reduce LR by half every 5 epochs

model = train(model, criterion, optimizer, scheduler, device)
evaluate(model)


epoch 1


100%|██████████| 500/500 [00:29<00:00, 17.02it/s]


Epoch 1: Loss: 1.6646, Train Accuracy: 38.35%
epoch 2


100%|██████████| 500/500 [00:29<00:00, 16.99it/s]


Epoch 2: Loss: 1.2713, Train Accuracy: 54.70%
epoch 3


100%|██████████| 500/500 [00:28<00:00, 17.31it/s]


Epoch 3: Loss: 1.0893, Train Accuracy: 61.54%
epoch 4


100%|██████████| 500/500 [00:29<00:00, 17.14it/s]


Epoch 4: Loss: 0.9662, Train Accuracy: 65.91%
epoch 5


100%|██████████| 500/500 [00:29<00:00, 17.00it/s]


Epoch 5: Loss: 0.8750, Train Accuracy: 69.20%
epoch 6


100%|██████████| 500/500 [00:29<00:00, 17.22it/s]


Epoch 6: Loss: 0.7976, Train Accuracy: 72.20%
epoch 7


100%|██████████| 500/500 [00:28<00:00, 17.27it/s]


Epoch 7: Loss: 0.7349, Train Accuracy: 74.40%
epoch 8


100%|██████████| 500/500 [00:29<00:00, 17.16it/s]


Epoch 8: Loss: 0.6873, Train Accuracy: 76.30%
epoch 9


100%|██████████| 500/500 [00:29<00:00, 16.91it/s]


Epoch 9: Loss: 0.6364, Train Accuracy: 78.08%
epoch 10


100%|██████████| 500/500 [00:28<00:00, 17.26it/s]

Epoch 10: Loss: 0.5937, Train Accuracy: 79.57%





Test Accuracy: 75.49%


I made two modifications

*   Set a Learning Rate Scheduler to decay the learning rate by a factor of 0.5 after 5 epochs
*   Used Batch Normalization after every convolutional layer

To ablate on the learning rate, I remove the learning rate scheduler and use a constant learning rate of 0.001.
To ablate on batch normalization, I remove batch normalization from the model.


In [11]:
# ablate on learning rate scheduler

# Initialize model
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = ConvNet().to(device)

# Define loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.001, weight_decay=1e-4)

model = train(model, criterion, optimizer, None, device)
evaluate(model)

epoch 1


100%|██████████| 500/500 [00:29<00:00, 17.04it/s]


Epoch 1: Loss: 1.6851, Train Accuracy: 36.66%
epoch 2


100%|██████████| 500/500 [00:30<00:00, 16.48it/s]


Epoch 2: Loss: 1.2946, Train Accuracy: 53.51%
epoch 3


100%|██████████| 500/500 [00:29<00:00, 17.03it/s]


Epoch 3: Loss: 1.1071, Train Accuracy: 61.14%
epoch 4


100%|██████████| 500/500 [00:29<00:00, 17.12it/s]


Epoch 4: Loss: 0.9741, Train Accuracy: 65.94%
epoch 5


100%|██████████| 500/500 [00:29<00:00, 17.15it/s]


Epoch 5: Loss: 0.8745, Train Accuracy: 69.28%
epoch 6


100%|██████████| 500/500 [00:29<00:00, 17.20it/s]


Epoch 6: Loss: 0.8012, Train Accuracy: 72.17%
epoch 7


100%|██████████| 500/500 [00:29<00:00, 17.14it/s]


Epoch 7: Loss: 0.7423, Train Accuracy: 74.39%
epoch 8


100%|██████████| 500/500 [00:28<00:00, 17.25it/s]


Epoch 8: Loss: 0.6840, Train Accuracy: 76.60%
epoch 9


100%|██████████| 500/500 [00:29<00:00, 17.09it/s]


Epoch 9: Loss: 0.6370, Train Accuracy: 77.89%
epoch 10


100%|██████████| 500/500 [00:29<00:00, 16.95it/s]


Epoch 10: Loss: 0.6022, Train Accuracy: 79.47%
Test Accuracy: 74.87%


In [18]:
# ablate on batch norm, do none and do alternating

class ConvNet_no_BN(nn.Module):
    def __init__(self):
        super(ConvNet_no_BN, self).__init__()
        self.model = nn.Sequential(
            nn.Conv2d(3, 100, 3),
            nn.ReLU(),

            nn.Conv2d(100, 100, 5),
            nn.ReLU(),

            nn.Conv2d(100, 100, 5),
            nn.ReLU(),

            nn.Conv2d(100, 100, 5),
            nn.BatchNorm2d(100),
            nn.ReLU(),

            nn.Conv2d(100, 100, 5),
            nn.ReLU(),

            nn.Conv2d(100, 100, 5),
            nn.ReLU(),

            nn.Conv2d(100, 10, 5),
            nn.ReLU(),
            nn.Flatten(),
            nn.Linear(360, 128),
            nn.ReLU(),
            nn.Linear(128, 64),
            nn.ReLU(),
            nn.Linear(64, 10),
        )

    def forward(self, x):
        return self.model(x)

# Initialize model
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = ConvNet_no_BN().to(device)

# Define loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.001, weight_decay=1e-4)

model = train(model, criterion, optimizer, None, device)
evaluate(model)

epoch 1


100%|██████████| 500/500 [00:27<00:00, 18.38it/s]


Epoch 1: Loss: 2.0030, Train Accuracy: 23.63%
epoch 2


100%|██████████| 500/500 [00:27<00:00, 18.10it/s]


Epoch 2: Loss: 1.6879, Train Accuracy: 37.24%
epoch 3


100%|██████████| 500/500 [00:26<00:00, 18.66it/s]


Epoch 3: Loss: 1.4381, Train Accuracy: 47.95%
epoch 4


100%|██████████| 500/500 [00:28<00:00, 17.80it/s]


Epoch 4: Loss: 1.2434, Train Accuracy: 55.77%
epoch 5


100%|██████████| 500/500 [00:26<00:00, 18.52it/s]


Epoch 5: Loss: 1.1037, Train Accuracy: 61.11%
epoch 6


100%|██████████| 500/500 [00:26<00:00, 18.59it/s]


Epoch 6: Loss: 1.0044, Train Accuracy: 64.75%
epoch 7


100%|██████████| 500/500 [00:26<00:00, 18.62it/s]


Epoch 7: Loss: 0.9153, Train Accuracy: 67.89%
epoch 8


100%|██████████| 500/500 [00:26<00:00, 18.60it/s]


Epoch 8: Loss: 0.8499, Train Accuracy: 70.41%
epoch 9


100%|██████████| 500/500 [00:27<00:00, 18.31it/s]


Epoch 9: Loss: 0.7869, Train Accuracy: 72.82%
epoch 10


100%|██████████| 500/500 [00:26<00:00, 18.58it/s]

Epoch 10: Loss: 0.7344, Train Accuracy: 74.76%





Test Accuracy: 74.08%


It seems that the learning rate scheduler slightly improved the results from 74.87% to 75.49%. This is not very significant. I expected this, as annealing the learning rate only matters when the model starts to reach its optimal parameters, and having an accuracy of around 75% could be interpreted as too far from the optimal for a smaller learning rate to be necessary.

Using Batch Normalization slightly improved the results from 74.08% to 75.49%. This is not very significant. This is unexpected because in class, we discussed how multiple layers can exponentially increase the norm of the input, which means that batch normalization is necessary to ensure stable learning. However, the test accuracies between using batch normalization and not using batch normalization indicate that the batch normalization is not necessary for a model of this size.