<a href="https://colab.research.google.com/github/rubymanderna/ML_ECGR5105/blob/main/Assignment_7/Homework7.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Problem 1 (50pts):
a. Build a Convolutional Neural Network, like what we built in lectures to classify the images across all 10 classes in CIFAR 10. You need to adjust the fully connected layer at the end properly concerning the number of output classes. Train your network for 300 epochs. Report your training time, training loss, and evaluation accuracy after 300 epochs. Analyze your results in your report and compare them against a fully connected network (homework 2) on training time, achieved accuracy, and model size.
Make sure to submit your code by providing the GitHub URL of your course repository for this course.




In [2]:
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
from torch.utils.data import DataLoader

# Define the CNN architecture
class SimpleCNN(nn.Module):
    def __init__(self, num_classes=10):
        super(SimpleCNN, self).__init__()
        self.conv1 = nn.Conv2d(3, 32, kernel_size=3, padding=1)
        self.relu = nn.ReLU()
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, padding=1)
        self.fc1 = nn.Linear(64 * 8 * 8, 128)
        self.fc2 = nn.Linear(128, num_classes)

    def forward(self, x):
        x = self.pool(self.relu(self.conv1(x)))
        x = self.pool(self.relu(self.conv2(x)))
        x = x.view(-1, 64 * 8 * 8)
        x = self.relu(self.fc1(x))
        x = self.fc2(x)
        return x

# Set the device (GPU if available, otherwise CPU)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load CIFAR-10 dataset
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
])

train_dataset = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
test_dataset = torchvision.datasets.CIFAR10(root='./data', train=False, download=True, transform=transform)

train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True, num_workers=4)
test_loader = DataLoader(test_dataset, batch_size=64, shuffle=False, num_workers=4)

# Initialize the model, loss function, and optimizer
model = SimpleCNN(num_classes=10).to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Training loop with early stopping on validation loss (the test set doubles as the validation set)
num_epochs = 300
early_stopping_threshold = 10
best_validation_loss = float('inf')
early_stopping_counter = 0

for epoch in range(num_epochs):
    model.train()
    total_loss = 0.0
    for inputs, labels in train_loader:
        inputs, labels = inputs.to(device), labels.to(device)
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        total_loss += loss.item()

    # Validation
    model.eval()
    validation_loss = 0.0
    with torch.no_grad():
        for inputs, labels in test_loader:
            inputs, labels = inputs.to(device), labels.to(device)
            outputs = model(inputs)
            validation_loss += criterion(outputs, labels).item()

    # Print and check for early stopping
    average_loss = total_loss / len(train_loader)
    average_validation_loss = validation_loss / len(test_loader)
    print(f"Epoch {epoch + 1}/{num_epochs}, Loss: {average_loss}, Validation Loss: {average_validation_loss}")

    if average_validation_loss < best_validation_loss:
        best_validation_loss = average_validation_loss
        early_stopping_counter = 0
    else:
        early_stopping_counter += 1

    if early_stopping_counter >= early_stopping_threshold:
        print(f"Early stopping after {early_stopping_threshold} epochs without improvement.")
        break

# Evaluation
model.eval()
correct = 0
total = 0

with torch.no_grad():
    for inputs, labels in test_loader:
        inputs, labels = inputs.to(device), labels.to(device)
        outputs = model(inputs)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

accuracy = correct / total
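# Note: this prints the configured epoch budget (num_epochs), not wall-clock time or the epoch at which training stopped
print(f"Training Time: {num_epochs} epochs, Accuracy: {accuracy}")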


Files already downloaded and verified
Files already downloaded and verified
Epoch 1/300, Loss: 1.361691037940857, Validation Loss: 1.0899869195974556
Epoch 2/300, Loss: 0.9794067739297057, Validation Loss: 0.9373889838813976
Epoch 3/300, Loss: 0.8099364701591795, Validation Loss: 0.9106965148524874
Epoch 4/300, Loss: 0.699265246462944, Validation Loss: 0.8195717889032547
Epoch 5/300, Loss: 0.5885692591709859, Validation Loss: 0.8635666307750022
Epoch 6/300, Loss: 0.49118287442132946, Validation Loss: 0.8391760082761194
Epoch 7/300, Loss: 0.4010898284137706, Validation Loss: 0.8895851544513824
Epoch 8/300, Loss: 0.3176781895387051, Validation Loss: 1.0276734631532316
Epoch 9/300, Loss: 0.24845535959810247, Validation Loss: 1.079126067601951
Epoch 10/300, Loss: 0.19249336764006816, Validation Loss: 1.192319303561168
Epoch 11/300, Loss: 0.14785454512271276, Validation Loss: 1.3344713544390003
Epoch 12/300, Loss: 0.11291855411685031, Validation Loss: 1.5035629325611577
Epoch 13/300, Loss: 

Homework 2 results for the simple fully connected neural network
(one hidden layer):
Training Time: 447.68 seconds
Training Loss: 1.4940
Validation Accuracy: 0.4517
Test Accuracy: 0.4517

For comparison, the final lines of the simple CNN output:

Epoch 14/300, Loss: 0.08390522266135496, Validation Loss: 1.7134962480538969
Early stopping after 10 epochs without improvement.
Training Time: 300 epochs, Accuracy: 0.7029

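Comparing these results, the simple CNN reached 70.29% accuracy on the test set, against 45.17% for the fully connected network from Homework 2. Its final training loss (0.0839) is also far lower than the fully connected network's (1.4940). The run did not complete 300 epochs: validation loss reached its minimum of 0.8196 at epoch 4, and early stopping ended training at epoch 14 after 10 consecutive epochs without improvement. Over those 10 epochs the training loss kept falling while the validation loss rose to 1.7135, which is a clear sign of over-fitting. Wall-clock training time was not measured for the CNN; the "Training Time: 300 epochs" line prints the configured epoch budget, so it cannot be compared directly with the 447.68 seconds recorded for Homework 2.

In summary, the simple CNN outperforms the fully connected network on both accuracy and training loss.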
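
The task also asks for a model-size comparison. As a rough sketch, the parameter count of SimpleCNN can be worked out directly from the layer shapes in the code above. The helper names below (conv2d_params, linear_params) are hypothetical and exist only for this illustration. The Homework 2 network is not counted because its layer sizes are not given here.

```python
def conv2d_params(in_ch, out_ch, kernel, bias=True):
    """Weights (in_ch * out_ch * kernel^2) plus one bias per output channel."""
    return in_ch * out_ch * kernel * kernel + (out_ch if bias else 0)


def linear_params(in_features, out_features, bias=True):
    """Weight matrix plus one bias per output feature."""
    return in_features * out_features + (out_features if bias else 0)


# Layer shapes copied from the SimpleCNN definition above.
simple_cnn_layers = {
    "conv1": conv2d_params(3, 32, 3),
    "conv2": conv2d_params(32, 64, 3),
    "fc1": linear_params(64 * 8 * 8, 128),
    "fc2": linear_params(128, 10),
}
simple_cnn_total = sum(simple_cnn_layers.values())

print(simple_cnn_layers)
print(simple_cnn_total)  # 545098
```

Almost all of the parameters sit in fc1. On a live model the same total should be obtainable with sum(p.numel() for p in model.parameters()).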





b. Extend your CNN by adding one more additional convolution layer followed by an activation function and pooling function. You also need to adjust your fully connected layer properly with respect to intermediate feature dimensions. Train your network for 300 epochs. Report your training time, loss, and evaluation accuracy after 300 epochs.
Analyze your results in your report and compare your model size and accuracy over the baseline implementation in Problem 1.a. Do you see any over-fitting? Make sure to submit your code by providing the GitHub URL of your course repository for this course.

In [3]:

import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
from torch.utils.data import DataLoader

# Define the extended CNN architecture
class ExtendedCNN(nn.Module):
    def __init__(self, num_classes=10):
        super(ExtendedCNN, self).__init__()
        self.conv1 = nn.Conv2d(3, 32, kernel_size=3, padding=1)
        self.relu = nn.ReLU()
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, padding=1)
        self.conv3 = nn.Conv2d(64, 128, kernel_size=3, padding=1)
        self.fc1 = nn.Linear(128 * 4 * 4, 256)  # Adjusted fully connected layer
        self.fc2 = nn.Linear(256, num_classes)

    def forward(self, x):
        x = self.pool(self.relu(self.conv1(x)))
        x = self.pool(self.relu(self.conv2(x)))
        x = self.pool(self.relu(self.conv3(x)))
        x = x.view(-1, 128 * 4 * 4)  # Adjusted view based on the dimensions
        x = self.relu(self.fc1(x))
        x = self.fc2(x)
        return x

# Set the device (GPU if available, otherwise CPU)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load CIFAR-10 dataset
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
])

train_dataset = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
test_dataset = torchvision.datasets.CIFAR10(root='./data', train=False, download=True, transform=transform)

train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True, num_workers=4)
test_loader = DataLoader(test_dataset, batch_size=64, shuffle=False, num_workers=4)

# Initialize the model, loss function, and optimizer
model = ExtendedCNN(num_classes=10).to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Training loop with modifications for early stopping
num_epochs = 300
early_stopping_threshold = 10
best_validation_loss = float('inf')
early_stopping_counter = 0

for epoch in range(num_epochs):
    model.train()
    total_loss = 0.0
    for inputs, labels in train_loader:
        inputs, labels = inputs.to(device), labels.to(device)
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        total_loss += loss.item()

    # Validation
    model.eval()
    validation_loss = 0.0
    with torch.no_grad():
        for inputs, labels in test_loader:
            inputs, labels = inputs.to(device), labels.to(device)
            outputs = model(inputs)
            validation_loss += criterion(outputs, labels).item()

    # Print and check for early stopping
    average_loss = total_loss / len(train_loader)
    average_validation_loss = validation_loss / len(test_loader)
    print(f"Epoch {epoch + 1}/{num_epochs}, Loss: {average_loss}, Validation Loss: {average_validation_loss}")

    if average_validation_loss < best_validation_loss:
        best_validation_loss = average_validation_loss
        early_stopping_counter = 0
    else:
        early_stopping_counter += 1

    if early_stopping_counter >= early_stopping_threshold:
        print(f"Early stopping after {early_stopping_threshold} epochs without improvement.")
        break


# Evaluation
model.eval()
correct = 0
total = 0

with torch.no_grad():
    for inputs, labels in test_loader:
        inputs, labels = inputs.to(device), labels.to(device)
        outputs = model(inputs)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

accuracy = correct / total
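# Note: this prints the configured epoch budget (num_epochs), not wall-clock time or the epoch at which training stopped
print(f"Training Time: {num_epochs} epochs, Accuracy: {accuracy}")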


Files already downloaded and verified
Files already downloaded and verified
Epoch 1/300, Loss: 1.359804761150609, Validation Loss: 1.0365461164219365
Epoch 2/300, Loss: 0.923181984849903, Validation Loss: 0.8744377083839125
Epoch 3/300, Loss: 0.7380762781633441, Validation Loss: 0.7795194197612204
Epoch 4/300, Loss: 0.6055254779965676, Validation Loss: 0.7402167462619247
Epoch 5/300, Loss: 0.5048171597559129, Validation Loss: 0.782935691866905
Epoch 6/300, Loss: 0.4096251341330883, Validation Loss: 0.7575188702458788
Epoch 7/300, Loss: 0.32877071233242366, Validation Loss: 0.8034545285686566
Epoch 8/300, Loss: 0.2531358630341642, Validation Loss: 0.8860977834956661
Epoch 9/300, Loss: 0.1947809270752208, Validation Loss: 0.9268578143825956
Epoch 10/300, Loss: 0.15679256678761347, Validation Loss: 1.0980416898894463
Epoch 11/300, Loss: 0.12297829178631153, Validation Loss: 1.2072431223028024
Epoch 12/300, Loss: 0.11625367859044991, Validation Loss: 1.2491631274390373
Epoch 13/300, Loss: 

Problem 1.b

Modified CNN: one additional convolution layer followed by an activation function and a pooling function.

Epoch 14/300, Loss: 0.08758060977189228, Validation Loss: 1.3732742405241463
Early stopping after 10 epochs without improvement.
Training Time: 300 epochs, Accuracy: 0.761

Earlier simple CNN output

Epoch 14/300, Loss: 0.08390522266135496, Validation Loss: 1.7134962480538969
Early stopping after 10 epochs without improvement.
Training Time: 300 epochs, Accuracy: 0.7029

Comparing these results:

Accuracy:

Modified CNN: Accuracy of 76.1%
Simple CNN: Accuracy of 70.29%
The modified CNN achieved a higher accuracy than the simple CNN.

Training Loss:

Modified CNN: Loss of 0.0876
Simple CNN: Loss of 0.0839
The training loss of the modified CNN is slightly higher than that of the simple CNN.

Validation Loss:

Modified CNN: Validation loss of 1.3733
Simple CNN: Validation loss of 1.7135
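The modified CNN has a lower final validation loss than the simple CNN, which suggests somewhat better generalization. Both models still over-fit: each reached its best validation loss at epoch 4 (0.7402 for the modified CNN, 0.8196 for the simple CNN), after which validation loss rose while training loss kept falling.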

Training Time:

Both models were configured for 300 epochs, but early stopping ended each run at epoch 14, after 10 consecutive epochs without improvement in validation loss. Wall-clock time was not measured.

In summary, the modified CNN, with its additional convolution layer, activation function, and pooling function, achieved higher accuracy and a lower validation loss than the simple CNN.
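
The adjusted fully connected input size (128 * 4 * 4) and the model-size comparison can both be checked with a little arithmetic. The sketch below assumes what the code above uses: 32x32 inputs, 3x3 convolutions with padding 1, and 2x2 max pooling with stride 2 after every convolution. The function names are hypothetical helpers for this illustration only.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


def flattened_features(input_size, conv_channels):
    """Features entering the first linear layer after conv + pool stages."""
    size = input_size
    for _ in conv_channels:
        size = conv_out(size, kernel=3, stride=1, padding=1)  # 3x3 conv keeps the size
        size = conv_out(size, kernel=2, stride=2)             # 2x2 max pool halves it
    return conv_channels[-1] * size * size


def count_params(input_size, conv_channels, hidden, num_classes=10, in_ch=3):
    """Total weights and biases for conv stages plus two linear layers."""
    total = 0
    for out_ch in conv_channels:
        total += in_ch * out_ch * 3 * 3 + out_ch
        in_ch = out_ch
    flat = flattened_features(input_size, conv_channels)
    total += flat * hidden + hidden
    total += hidden * num_classes + num_classes
    return total


print(flattened_features(32, [32, 64]))       # 4096 = 64 * 8 * 8
print(flattened_features(32, [32, 64, 128]))  # 2048 = 128 * 4 * 4
print(count_params(32, [32, 64], hidden=128))       # 545098 (simple CNN)
print(count_params(32, [32, 64, 128], hidden=256))  # 620362 (modified CNN)
```

Under these assumptions the modified CNN has about 14% more parameters than the simple CNN. The extra pooling stage halves the flattened feature count, which offsets most of the growth from the third convolution and the wider hidden layer.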


Problem 2 (50pts)
a. Build a ResNet-based Convolutional Neural Network, like what we built in lectures (with skip connections), to classify the images across all 10 classes in CIFAR 10. For this problem, let's use 10 blocks for ResNet and call it ResNet-10. Use similar dimensions and channels as we need in lectures.
Train your network for 300 epochs. Report your training time, training loss, and evaluation accuracy after 300 epochs. Analyze your results in your report and compare them against problem 1.b on training time, achieved accuracy, and model size.
Make sure to submit your code by providing the GitHub URL of your course repository for this course.


In [7]:
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
from torch.utils.data import DataLoader

# Define the Residual Block
class ResidualBlock(nn.Module):
    def __init__(self, in_channels, out_channels, stride=1):
        super(ResidualBlock, self).__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3, stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)
        self.downsample = nn.Sequential()
        if stride != 1 or in_channels != out_channels:
            self.downsample = nn.Sequential(
                nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(out_channels)
            )

    def forward(self, x):
        identity = x
        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)
        out = self.conv2(out)
        out = self.bn2(out)
        out += self.downsample(identity)
        out = self.relu(out)
        return out

# Define the ResNet-10 architecture (as written: 3 stages x 2 residual blocks = 6 blocks in total)
class ResNet10(nn.Module):
    def __init__(self, num_classes=10):
        super(ResNet10, self).__init__()
        self.conv1 = nn.Conv2d(3, 16, kernel_size=3, stride=1, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(16)
        self.relu = nn.ReLU(inplace=True)
        self.layer1 = self.make_layer(16, 16, num_blocks=2, stride=1)
        self.layer2 = self.make_layer(16, 32, num_blocks=2, stride=2)
        self.layer3 = self.make_layer(32, 64, num_blocks=2, stride=2)
        self.avg_pool = nn.AdaptiveAvgPool2d((1, 1))
        self.fc = nn.Linear(64, num_classes)

    def make_layer(self, in_channels, out_channels, num_blocks, stride):
        layers = []
        layers.append(ResidualBlock(in_channels, out_channels, stride))
        for _ in range(1, num_blocks):
            layers.append(ResidualBlock(out_channels, out_channels, stride=1))
        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.avg_pool(x)
        x = x.view(x.size(0), -1)
        x = self.fc(x)
        return x

# Set the device (GPU if available, otherwise CPU)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load CIFAR-10 dataset
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
])

train_dataset = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
test_dataset = torchvision.datasets.CIFAR10(root='./data', train=False, download=True, transform=transform)

train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True, num_workers=4)
test_loader = DataLoader(test_dataset, batch_size=64, shuffle=False, num_workers=4)

# Initialize the ResNet-10 model, loss function, and optimizer
model = ResNet10(num_classes=10).to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Training loop with modifications for early stopping
num_epochs = 300
early_stopping_threshold = 10
best_validation_loss = float('inf')
early_stopping_counter = 0

for epoch in range(num_epochs):
    model.train()
    total_loss = 0.0
    for inputs, labels in train_loader:
        inputs, labels = inputs.to(device), labels.to(device)
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        total_loss += loss.item()

    # Validation
    model.eval()
    validation_loss = 0.0
    with torch.no_grad():
        for inputs, labels in test_loader:
            inputs, labels = inputs.to(device), labels.to(device)
            outputs = model(inputs)
            validation_loss += criterion(outputs, labels).item()

    # Print and check for early stopping
    average_loss = total_loss / len(train_loader)
    average_validation_loss = validation_loss / len(test_loader)
    print(f"Epoch {epoch + 1}/{num_epochs}, Loss: {average_loss}, Validation Loss: {average_validation_loss}")

    if average_validation_loss < best_validation_loss:
        best_validation_loss = average_validation_loss
        early_stopping_counter = 0
    else:
        early_stopping_counter += 1

    if early_stopping_counter >= early_stopping_threshold:
        print(f"Early stopping after {early_stopping_threshold} epochs without improvement.")
        break

# Evaluation
model.eval()
correct = 0
total = 0

with torch.no_grad():
    for inputs, labels in test_loader:
        inputs, labels = inputs.to(device), labels.to(device)
        outputs = model(inputs)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

accuracy = correct / total
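# Note: this prints the configured epoch budget (num_epochs), not wall-clock time or the epoch at which training stopped
print(f"Training Time: {num_epochs} epochs, Accuracy: {accuracy}")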


Files already downloaded and verified
Files already downloaded and verified
Epoch 1/300, Loss: 1.3188096072210376, Validation Loss: 1.1471581941197633
Epoch 2/300, Loss: 0.9111275082963812, Validation Loss: 0.8510392513244774
Epoch 3/300, Loss: 0.7477234565007412, Validation Loss: 0.7468047850071244
Epoch 4/300, Loss: 0.6438134951359781, Validation Loss: 0.6981770506330357
Epoch 5/300, Loss: 0.5668380812305929, Validation Loss: 0.7218952287154593
Epoch 6/300, Loss: 0.5071670342131954, Validation Loss: 0.6313433089074055
Epoch 7/300, Loss: 0.4576592723960462, Validation Loss: 0.5929360507400172
Epoch 8/300, Loss: 0.41199194761874425, Validation Loss: 0.5576162442659877
Epoch 9/300, Loss: 0.37443249370626475, Validation Loss: 0.638221031920925
Epoch 10/300, Loss: 0.3364449573390167, Validation Loss: 0.6587496407472404
Epoch 11/300, Loss: 0.30171978665168026, Validation Loss: 0.6108303006477417
Epoch 12/300, Loss: 0.26873460459663434, Validation Loss: 0.687041980161029
Epoch 13/300, Loss:

Problem 2.a answer
Modified CNN output (from Problem 1.b):

Epoch 14/300, Loss: 0.08758060977189228, Validation Loss: 1.3732742405241463
Early stopping after 10 epochs without improvement.
Training Time: 300 epochs, Accuracy: 0.761

ResNet-10 output:

Epoch 18/300, Loss: 0.14074797088475635, Validation Loss: 0.7536989632685474
Early stopping after 10 epochs without improvement.
Training Time: 300 epochs, Accuracy: 0.8054

Comparing the modified CNN and ResNet outputs:

Accuracy:

Modified CNN: Accuracy of 76.1%
ResNet: Accuracy of 80.54%
ResNet achieved a higher accuracy compared to the modified CNN.

Training Loss:

Modified CNN: Loss of 0.0876
ResNet: Loss of 0.1407
The modified CNN has a lower training loss compared to ResNet.

Validation Loss:

Modified CNN: Validation loss of 1.3733
ResNet: Validation loss of 0.7537
ResNet has a much lower final validation loss (0.7537 against 1.3733), indicating better generalization. Its best validation loss in the log above was 0.5576 at epoch 8.

Training Time:

Both models were configured for 300 epochs but stopped early, each after 10 consecutive epochs without improvement in validation loss: the modified CNN at epoch 14 and ResNet at epoch 18. Wall-clock time was not measured.
In summary, while the modified CNN performed well, ResNet achieved higher accuracy and lower validation loss, suggesting that ResNet has better overall performance in this scenario.
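
The patience rule used in the training loops above can be sketched in isolation as follows. The function name early_stop_epoch and the loss values are hypothetical. The real logs are not reused because some of their lines are truncated.

```python
def early_stop_epoch(val_losses, patience=10):
    """Return the 1-based epoch at which training would stop, or None.

    Mirrors the loops above: the counter resets whenever validation loss
    strictly improves on the best value so far, and training stops once the
    counter reaches `patience`.
    """
    best = float("inf")
    stale = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, stale = loss, 0
        else:
            stale += 1
        if stale >= patience:
            return epoch
    return None


# Illustrative values only: best loss at epoch 3, then ten epochs without improvement.
losses = [1.0, 0.9, 0.8] + [0.85] * 10
print(early_stop_epoch(losses, patience=10))  # 13
```

With this rule the stop epoch is always the best epoch plus the patience. That matches the ResNet run, whose best validation loss was at epoch 8 and which stopped at epoch 18. The loops above do not restore the best weights, so the reported accuracy belongs to the last epoch's model.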


b. Perform three additional training and evaluations for your ResNet-10 to assess the impacts of regularization on your ResNet-10.
* Weight Decay with lambda of 0.001
* Dropout with p=0.3
* Batch Normalization
Report and compare your training time, training loss, and evaluation accuracy after 300 epochs across these three different pieces of training.
Analyze your results in your report and compare them against Problem 1 on training time and achieved accuracy.
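
As a rough illustration of what dropout with p=0.3 does to a layer's activations, here is a minimal pure-Python sketch of inverted dropout. This is the scheme nn.Dropout follows: surviving activations are scaled by 1 / (1 - p) during training, and the layer is an identity in evaluation mode. The function name and the list-based interface are assumptions made for this example, not part of the notebook.

```python
import random


def inverted_dropout(activations, p=0.3, training=True, rng=None):
    """Zero each activation with probability p; scale survivors by 1 / (1 - p)."""
    if not training or p == 0.0:
        return list(activations)  # evaluation mode: pass values through unchanged
    if rng is None:
        rng = random.Random()
    keep = 1.0 - p
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


rng = random.Random(0)  # seeded only to make the example repeatable
train_out = inverted_dropout([1.0] * 10, p=0.3, rng=rng)
eval_out = inverted_dropout([1.0] * 10, p=0.3, training=False)
print(train_out)  # a mix of 0.0 and 1 / 0.7 values
print(eval_out)   # unchanged inputs
```

The scaling keeps the expected activation the same in training and evaluation, which is why no rescaling is needed once model.eval() is called.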

In [2]:
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
from torch.utils.data import DataLoader

# Define the Residual Block
class ResidualBlock(nn.Module):
    def __init__(self, in_channels, out_channels, stride=1):
        super(ResidualBlock, self).__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3, stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)
        self.downsample = nn.Sequential()
        if stride != 1 or in_channels != out_channels:
            self.downsample = nn.Sequential(
                nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(out_channels)
            )

    def forward(self, x):
        identity = x
        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)
        out = self.conv2(out)
        out = self.bn2(out)
        out += self.downsample(identity)
        out = self.relu(out)
        return out

# Define the ResNet-10 architecture
class ResNet10(nn.Module):
    def __init__(self, num_classes=10):
        super(ResNet10, self).__init__()
        self.conv1 = nn.Conv2d(3, 16, kernel_size=3, stride=1, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(16)
        self.relu = nn.ReLU(inplace=True)
        self.layer1 = self.make_layer(16, 16, num_blocks=2, stride=1)
        self.layer2 = self.make_layer(16, 32, num_blocks=2, stride=2)
        self.layer3 = self.make_layer(32, 64, num_blocks=2, stride=2)
        self.avg_pool = nn.AdaptiveAvgPool2d((1, 1))
        self.fc = nn.Linear(64, num_classes)

    def make_layer(self, in_channels, out_channels, num_blocks, stride):
        layers = []
        layers.append(ResidualBlock(in_channels, out_channels, stride))
        for _ in range(1, num_blocks):
            layers.append(ResidualBlock(out_channels, out_channels, stride=1))
        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.avg_pool(x)
        x = x.view(x.size(0), -1)
        x = self.fc(x)
        return x

# Set the device (GPU if available, otherwise CPU)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load CIFAR-10 dataset
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
])

train_dataset = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
test_dataset = torchvision.datasets.CIFAR10(root='./data', train=False, download=True, transform=transform)

train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True, num_workers=4)
test_loader = DataLoader(test_dataset, batch_size=64, shuffle=False, num_workers=4)

# Function to train and evaluate the model with early stopping
def train_and_evaluate_with_early_stopping(model, criterion, optimizer, num_epochs=20, regularization_name="None", early_stopping_threshold=10):
    print(f"Training ResNet-10 with {regularization_name} regularization and early stopping:")

    best_validation_loss = float('inf')
    early_stopping_counter = 0

    for epoch in range(num_epochs):
        model.train()
        total_loss = 0.0
        for inputs, labels in train_loader:
            inputs, labels = inputs.to(device), labels.to(device)
            optimizer.zero_grad()
            outputs = model(inputs)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()
            total_loss += loss.item()

        # Validation
        model.eval()
        validation_loss = 0.0
        with torch.no_grad():
            for inputs, labels in test_loader:
                inputs, labels = inputs.to(device), labels.to(device)
                outputs = model(inputs)
                validation_loss += criterion(outputs, labels).item()

        # Print and check for early stopping
        average_loss = total_loss / len(train_loader)
        average_validation_loss = validation_loss / len(test_loader)
        print(f"Epoch {epoch + 1}/{num_epochs}, Loss: {average_loss}, Validation Loss: {average_validation_loss}")

        if average_validation_loss < best_validation_loss:
            best_validation_loss = average_validation_loss
            early_stopping_counter = 0
        else:
            early_stopping_counter += 1

        if early_stopping_counter >= early_stopping_threshold:
            print(f"Early stopping after {early_stopping_threshold} epochs without improvement.")
            break

    model.eval()
    correct = 0
    total = 0

    with torch.no_grad():
        for inputs, labels in test_loader:
            inputs, labels = inputs.to(device), labels.to(device)
            outputs = model(inputs)
            _, predicted = torch.max(outputs.data, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()

    accuracy = correct / total
    print(f"Training Time: {epoch + 1} epochs, Accuracy: {accuracy}\n")

# Initialize the ResNet-10 model, loss function, and optimizer for each scenario
model_no_regularization = ResNet10(num_classes=10).to(device)
model_weight_decay = ResNet10(num_classes=10).to(device)
model_dropout = ResNet10(num_classes=10).to(device)
model_batch_norm = ResNet10(num_classes=10).to(device)

criterion = nn.CrossEntropyLoss()

# # Training loop for ResNet-10 without regularization with early stopping
# optimizer_no_regularization = optim.Adam(model_no_regularization.parameters(), lr=0.001, weight_decay=0)
# train_and_evaluate_with_early_stopping(model_no_regularization, criterion, optimizer_no_regularization, num_epochs=300, regularization_name="No Regularization")

# # Training loop for ResNet-10 with Weight Decay (L2 regularization) with early stopping
# optimizer_weight_decay = optim.Adam(model_weight_decay.parameters(), lr=0.001, weight_decay=0.001)
# train_and_evaluate_with_early_stopping(model_weight_decay, criterion, optimizer_weight_decay, num_epochs=300, regularization_name="Weight Decay")

# Training loop for ResNet-10 with Dropout with early stopping
model_dropout.fc = nn.Sequential(nn.Dropout(0.3), nn.Linear(64, 10)).to(device)  # Add dropout to the fully connected layer
optimizer_dropout = optim.Adam(model_dropout.parameters(), lr=0.01)
train_and_evaluate_with_early_stopping(model_dropout, criterion, optimizer_dropout, num_epochs=20, regularization_name="Dropout")

# Training loop for ResNet-10 with Batch Normalization with early stopping
# Note: ResNet10 already applies BatchNorm2d after every convolution, so this
# model has the same architecture as model_no_regularization.
optimizer_batch_norm = optim.Adam(model_batch_norm.parameters(), lr=0.01)
train_and_evaluate_with_early_stopping(model_batch_norm, criterion, optimizer_batch_norm, num_epochs=20, regularization_name="Batch Normalization")


Files already downloaded and verified
Files already downloaded and verified
Training ResNet-10 with Dropout regularization and early stopping:
Epoch 1/20, Loss: 1.5658981261198477, Validation Loss: 1.2364503786822034
Epoch 2/20, Loss: 1.1134927833781523, Validation Loss: 0.9994412189835955
Epoch 3/20, Loss: 0.911093364705515, Validation Loss: 0.7939224448173668
Epoch 4/20, Loss: 0.7742542710984149, Validation Loss: 0.7534522323092078
Epoch 5/20, Loss: 0.6779960084068196, Validation Loss: 0.8197496548579757
Epoch 6/20, Loss: 0.6086168395512549, Validation Loss: 0.710800096107896
Epoch 7/20, Loss: 0.5524985785679439, Validation Loss: 0.6717706421378312
Epoch 8/20, Loss: 0.5010968905390071, Validation Loss: 0.5725557973999886
Epoch 9/20, Loss: 0.4592542575905695, Validation Loss: 0.6131337188231717
Epoch 10/20, Loss: 0.4274180505967811, Validation Loss: 0.5628867433139473
Epoch 11/20, Loss: 0.3857717586066717, Validation Loss: 0.6865508981571076
Epoch 12/20, Loss: 0.35878281939365064, Val

Training ResNet-10 with No Regularization regularization:

Epoch 11/300, Loss: 0.3103035500516062
Validation Loss: 0.6360743996823669
Early stopping at epoch 11 as validation loss did not improve for 5 epochs.
Training Time: 300 epochs, Accuracy: 0.7985

Training loop for ResNet-10 with Weight Decay (L2 regularization) with early stopping

Epoch 20/20, Loss: 0.20441421747798352, Validation Loss: 0.6777726767738913
Early stopping after 10 epochs without improvement.
Training Time: 20 epochs, Accuracy: 0.8161


Training ResNet-10 with Batch Normalization regularization and early stopping:

Epoch 15/15, Loss: 0.16131393375384914, Validation Loss: 0.7098411087207733
Training Time: 15 epochs, Accuracy: 0.8099
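
The "Training Time" lines in this notebook report epoch counts, not wall-clock time. A minimal sketch of how elapsed seconds could be measured is shown below. The helper name timed is hypothetical, and the commented usage line assumes the train_and_evaluate_with_early_stopping function defined above.

```python
import time


def timed(fn, *args, **kwargs):
    """Call fn(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


# Stand-in workload so the example runs without a model.
result, seconds = timed(sum, range(1_000_000))
print(f"result={result}, elapsed={seconds:.4f} s")

# Hypothetical use with the training function above:
# _, seconds = timed(train_and_evaluate_with_early_stopping,
#                    model_dropout, criterion, optimizer_dropout,
#                    num_epochs=20, regularization_name="Dropout")
```

On a GPU, kernels run asynchronously, so calling torch.cuda.synchronize() before reading the clock would likely give a more accurate figure.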