**Model 1:**

For our first model, we use stochastic gradient descent (SGD) as the optimizer, with the following learning rate, momentum, and weight decay (L2 regularization) values:

Learning rate (lr): 0.1, Momentum: 0.9, Regularization weight decay: 0.0001

A learning rate scheduler also adjusts the learning rate during training: at each of the milestone epochs 30, 60, and 90, the learning rate is multiplied by a factor of 0.1 (gamma).
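
To make the schedule concrete, here is a minimal sketch that computes the learning rate in effect at a given epoch. It assumes 0-indexed epochs and a scheduler step at the end of every epoch, as in the training loop of this notebook; `lr_at_epoch` is a hypothetical helper name, not part of PyTorch.

```python
from bisect import bisect_right

def lr_at_epoch(epoch, base_lr=0.1, milestones=(30, 60, 90), gamma=0.1):
    """Learning rate used during the given 0-indexed epoch.

    The rate is multiplied by gamma once for every milestone
    that is less than or equal to the epoch index.
    """
    return base_lr * gamma ** bisect_right(milestones, epoch)

# Print the rate just before and just after each milestone.
for epoch in (0, 29, 30, 59, 60, 89, 90, 99):
    print(f"epoch {epoch:3d}: lr = {lr_at_epoch(epoch):.6f}")
```

With these settings the rate stays at 0.1 for the first 30 epochs and then drops by a factor of ten at each milestone.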


Installing required packages:

In [None]:
pip install torch torchvision




Importing necessary modules:

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
from torch.utils.data import DataLoader


Defining the BasicBlock and ModifiedResNet classes:

In [None]:
import torch
import torch.nn as nn

class BasicBlock(nn.Module):
    def __init__(self, in_channels, out_channels, stride):
        super(BasicBlock, self).__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3, stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)
        self.downsample = None
        if stride != 1 or in_channels != out_channels:
            self.downsample = nn.Sequential(
                nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(out_channels)
            )
            
    def forward(self, x):
        identity = x
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        if self.downsample is not None:
            identity = self.downsample(x)
        out += identity
        out = self.relu(out)
        return out

class ModifiedResNet(nn.Module):
    def __init__(self, num_classes=10):
        super(ModifiedResNet, self).__init__()
        self.conv1 = nn.Conv2d(3, 32, kernel_size=3, stride=1, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(32)
        self.relu = nn.ReLU(inplace=True)
        self.layer1 = self._make_layer(32, 32, 3, stride=1)
        self.layer2 = self._make_layer(32, 64, 3, stride=2)
        self.layer3 = self._make_layer(64, 128, 3, stride=2)
        self.layer4 = self._make_layer(128, 256, 3, stride=2)
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.fc = nn.Linear(256, num_classes)


    def _make_layer(self, in_channels, out_channels, num_blocks, stride):
        layers = []
        layers.append(BasicBlock(in_channels, out_channels, stride))
        for _ in range(1, num_blocks):
            layers.append(BasicBlock(out_channels, out_channels, stride=1))
        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.relu(self.bn1(self.conv1(x)))
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)
        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        x = self.fc(x)
        return x

model = ModifiedResNet()
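
As a rough check on the architecture above, the following sketch traces the spatial size of a 32x32 CIFAR-10 image through the network using the standard convolution output-size formula. `conv_out_size` is a hypothetical helper; the strides are the ones passed to `_make_layer`, and only the first convolution of each stage is strided.

```python
def conv_out_size(size, kernel, stride, padding):
    """Output height/width of a convolution: floor((n + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel) // stride + 1

# Stem convolution: 3x3 kernel, stride 1, padding 1.
sizes = [conv_out_size(32, kernel=3, stride=1, padding=1)]

# layer1..layer4: the first 3x3 convolution of each stage uses the stage stride;
# every other convolution in the stage has stride 1 and keeps the size unchanged.
for stride in (1, 2, 2, 2):
    sizes.append(conv_out_size(sizes[-1], kernel=3, stride=stride, padding=1))

print(sizes)  # spatial size after the stem and after each of the four stages
```

Under these assumptions the feature map entering `avgpool` is 4x4 with 256 channels, which the adaptive pooling reduces to the 256-dimensional vector expected by `fc`.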


Setting up the device (use the GPU if available):

In [None]:
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")


Defining data augmentation and normalization:

In [None]:
transform_train = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.RandomAffine(degrees=10, translate=(0.1, 0.1), scale=(0.8, 1.2)),
    transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2),
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
])

transform_test = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
])
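
The `Normalize` step above subtracts a per-channel mean and divides by a per-channel standard deviation. A minimal pure-Python sketch of that arithmetic for a single RGB pixel follows; `normalize_pixel` and `denormalize_pixel` are hypothetical helpers for illustration, and the input is assumed to be already scaled to [0, 1] as `ToTensor` does.

```python
# Per-channel statistics taken from the transforms above.
MEAN = (0.4914, 0.4822, 0.4465)
STD = (0.2023, 0.1994, 0.2010)

def normalize_pixel(rgb):
    """Apply (x - mean) / std to each channel of one RGB pixel in [0, 1]."""
    return tuple((x - m) / s for x, m, s in zip(rgb, MEAN, STD))

def denormalize_pixel(rgb):
    """Invert the normalization, e.g. before displaying an image."""
    return tuple(x * s + m for x, m, s in zip(rgb, MEAN, STD))

pixel = (0.5, 0.5, 0.5)           # a mid-gray pixel
normalized = normalize_pixel(pixel)
restored = denormalize_pixel(normalized)
print(normalized)
print(restored)
```

A pixel equal to the channel means maps to zero in every channel, so the network sees inputs roughly centered around zero.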


Loading the CIFAR-10 dataset:

In [None]:
trainset = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=transform_train)
testset = torchvision.datasets.CIFAR10(root='./data', train=False, download=True, transform=transform_test)

trainloader = DataLoader(trainset, batch_size=128, shuffle=True, num_workers=2)
testloader = DataLoader(testset, batch_size=100, shuffle=False, num_workers=2)


Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to ./data/cifar-10-python.tar.gz


100%|██████████| 170498071/170498071 [00:13<00:00, 13098455.13it/s]


Extracting ./data/cifar-10-python.tar.gz to ./data
Files already downloaded and verified


Instantiating the model, loss function, and optimizer:

In [None]:
model = ModifiedResNet().to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-4)
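
To illustrate what these optimizer settings do, here is a minimal sketch of one SGD update with momentum and weight decay for a single scalar weight. It assumes the default options of `optim.SGD` (no dampening, no Nesterov momentum); `sgd_step` is a hypothetical helper and the gradient value is made up.

```python
def sgd_step(w, grad, velocity, lr=0.1, momentum=0.9, weight_decay=1e-4):
    """One SGD update for a scalar weight (dampening 0, no Nesterov)."""
    g = grad + weight_decay * w          # weight decay adds an L2 term to the gradient
    velocity = momentum * velocity + g   # momentum buffer accumulates past gradients
    w = w - lr * velocity                # parameter update
    return w, velocity

w, v = 1.0, 0.0                          # initial weight and empty momentum buffer
w, v = sgd_step(w, grad=0.5, velocity=v)
print(w, v)
```

Calling `sgd_step` repeatedly with the same gradient shows the effect of momentum: the velocity grows from step to step, so the weight moves further each time.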


Defining the learning rate scheduler:

In [None]:
scheduler = optim.lr_scheduler.MultiStepLR(optimizer, milestones=[30, 60, 90], gamma=0.1)


Adding variables to track the best validation loss and to count epochs with no improvement (for early stopping):

In [1]:
best_val_loss = float('inf')
counter = 0
patience = 10
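
The training loop in this notebook does not reference these variables. The following is a minimal sketch of how they might drive early stopping, run on made-up validation losses and with a shortened patience so that the stop is visible; `early_stopping_epoch` is a hypothetical helper.

```python
def early_stopping_epoch(val_losses, patience):
    """Return (stop_epoch, best_val_loss); stop_epoch is None if never triggered."""
    best_val_loss = float('inf')
    counter = 0
    for epoch, val_loss in enumerate(val_losses, start=1):
        if val_loss < best_val_loss:
            best_val_loss = val_loss   # improvement: remember it and reset the counter
            counter = 0
        else:
            counter += 1               # no improvement this epoch
            if counter >= patience:
                return epoch, best_val_loss
    return None, best_val_loss

# Made-up validation losses, for illustration only.
toy_losses = [0.9, 0.7, 0.72, 0.71, 0.75, 0.6]
stop_epoch, best = early_stopping_epoch(toy_losses, patience=3)
print(stop_epoch, best)
```

In a real run, the model weights would typically be saved whenever the best loss improves, so that the best checkpoint can be restored after stopping.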


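Splitting the training data into a training set (90%) and a validation set (10%):

In [None]:
val_size = 0.1
num_val = int(val_size * len(trainset))
num_train = len(trainset) - num_val  # remainder, so the two lengths always sum to len(trainset)
# Note: both subsets wrap the same dataset object, so valset also applies transform_train.
trainset, valset = torch.utils.data.random_split(trainset, [num_train, num_val])

trainloader = DataLoader(trainset, batch_size=128, shuffle=True, num_workers=2)
valloader = DataLoader(valset, batch_size=100, shuffle=False, num_workers=2)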


We use the torchsummary.summary function after defining the model and before the training loop to check the number of parameters in the model.

In [None]:
pip install torchsummary




Then, importing the torchsummary module:

In [None]:
from torchsummary import summary


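Checking the number of parameters using torchsummary.summary, and defining the loss function, optimizer, and scheduler:

In [None]:
model = ModifiedResNet().to(device)
summary(model, input_size=(3, 32, 32))
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-4)
# Re-create the scheduler so it is bound to the new optimizer; a scheduler
# built earlier would still reference the previous optimizer instance.
scheduler = optim.lr_scheduler.MultiStepLR(optimizer, milestones=[30, 60, 90], gamma=0.1)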


----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Conv2d-1           [-1, 32, 32, 32]             864
       BatchNorm2d-2           [-1, 32, 32, 32]              64
              ReLU-3           [-1, 32, 32, 32]               0
            Conv2d-4           [-1, 32, 32, 32]           9,216
       BatchNorm2d-5           [-1, 32, 32, 32]              64
              ReLU-6           [-1, 32, 32, 32]               0
            Conv2d-7           [-1, 32, 32, 32]           9,216
       BatchNorm2d-8           [-1, 32, 32, 32]              64
              ReLU-9           [-1, 32, 32, 32]               0
       BasicBlock-10           [-1, 32, 32, 32]               0
           Conv2d-11           [-1, 32, 32, 32]           9,216
      BatchNorm2d-12           [-1, 32, 32, 32]              64
             ReLU-13           [-1, 32, 32, 32]               0
           Conv2d-14           [-1, 32,
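
The per-layer parameter counts in the summary can be checked by hand. The sketch below recomputes them from the layer shapes defined in `ModifiedResNet`, assuming bias-free convolutions, two trainable parameters per BatchNorm channel, and a biased final linear layer; the helper names are hypothetical.

```python
def conv_params(c_in, c_out, k):
    """Weights of a k x k convolution without bias."""
    return c_in * c_out * k * k

def bn_params(channels):
    """BatchNorm2d has one scale and one shift parameter per channel."""
    return 2 * channels

def block_params(c_in, c_out, stride):
    """Parameters of one BasicBlock, including the 1x1 shortcut when it is needed."""
    n = conv_params(c_in, c_out, 3) + bn_params(c_out)
    n += conv_params(c_out, c_out, 3) + bn_params(c_out)
    if stride != 1 or c_in != c_out:
        n += conv_params(c_in, c_out, 1) + bn_params(c_out)
    return n

def layer_params(c_in, c_out, num_blocks, stride):
    """First block changes channels/stride; the remaining blocks keep them."""
    return block_params(c_in, c_out, stride) + (num_blocks - 1) * block_params(c_out, c_out, 1)

total = conv_params(3, 32, 3) + bn_params(32)            # stem
for c_in, c_out, stride in [(32, 32, 1), (32, 64, 2), (64, 128, 2), (128, 256, 2)]:
    total += layer_params(c_in, c_out, 3, stride)
total += 256 * 10 + 10                                    # fc weights and bias

print(conv_params(3, 32, 3), conv_params(32, 32, 3), bn_params(32))
print(total)
```

The first line of output should match the `Conv2d-1`, `Conv2d-4`, and `BatchNorm2d-2` rows of the summary table.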

Training the model:

In [None]:
num_epochs = 100

for epoch in range(num_epochs):
    model.train()
    running_loss = 0.0
    correct = 0
    total = 0

    for i, (inputs, labels) in enumerate(trainloader):
        inputs, labels = inputs.to(device), labels.to(device)

        optimizer.zero_grad()

        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        running_loss += loss.item()
        _, predicted = outputs.max(1)
        total += labels.size(0)
        correct += predicted.eq(labels).sum().item()

    print(f"Epoch: {epoch+1}, Loss: {running_loss/(i+1)}, Train accuracy: {100*correct/total}")

    scheduler.step()


Epoch: 1, Loss: 2.085784676738761, Train accuracy: 23.7
Epoch: 2, Loss: 1.6759634840894828, Train accuracy: 37.45777777777778
Epoch: 3, Loss: 1.4603323939849029, Train accuracy: 46.522222222222226
Epoch: 4, Loss: 1.2505592922256752, Train accuracy: 54.84444444444444
Epoch: 5, Loss: 1.10368228669871, Train accuracy: 60.611111111111114
Epoch: 6, Loss: 1.003720301118764, Train accuracy: 64.41777777777777
Epoch: 7, Loss: 0.9114242409440604, Train accuracy: 67.75111111111111
Epoch: 8, Loss: 0.8278325725008141, Train accuracy: 70.73555555555555
Epoch: 9, Loss: 0.7670673512938347, Train accuracy: 72.91333333333333
Epoch: 10, Loss: 0.7081148137592457, Train accuracy: 75.42
Epoch: 11, Loss: 0.6673356741666794, Train accuracy: 76.67333333333333
Epoch: 12, Loss: 0.6374069205061956, Train accuracy: 77.80888888888889
Epoch: 13, Loss: 0.6088407468897375, Train accuracy: 78.82888888888888
Epoch: 14, Loss: 0.5844868937168609, Train accuracy: 79.72666666666667
Epoch: 15, Loss: 0.5577307285910303, Train accuracy: 80.58444444444444
Epoch: 16, Loss: 0.54070641
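
The loop above reports training loss and accuracy only. A minimal sketch of a helper that could compute the same metrics on `valloader` after each epoch follows; `evaluate` is a hypothetical function, and the toy model and random data exist only to make the example self-contained.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def evaluate(model, loader, criterion, device):
    """Return (mean loss per batch, accuracy in percent) over a data loader."""
    model.eval()
    total_loss, correct, total = 0.0, 0, 0
    with torch.no_grad():
        for inputs, labels in loader:
            inputs, labels = inputs.to(device), labels.to(device)
            outputs = model(inputs)
            total_loss += criterion(outputs, labels).item()
            correct += outputs.argmax(dim=1).eq(labels).sum().item()
            total += labels.size(0)
    return total_loss / len(loader), 100.0 * correct / total

# Toy stand-ins (assumptions): 20 random samples, 4 features, 3 classes.
torch.manual_seed(0)
toy_data = TensorDataset(torch.randn(20, 4), torch.randint(0, 3, (20,)))
toy_loader = DataLoader(toy_data, batch_size=5)
toy_model = nn.Linear(4, 3)

val_loss, val_acc = evaluate(toy_model, toy_loader, nn.CrossEntropyLoss(), torch.device("cpu"))
print(f"Validation loss: {val_loss:.4f}, Validation accuracy: {val_acc:.1f}")
```

In the notebook, a call such as `evaluate(model, valloader, criterion, device)` at the end of each epoch would supply the validation loss needed for early stopping; remember to call `model.train()` again before the next training epoch.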

Testing the model:

In [None]:
model.eval()
correct = 0
total = 0

with torch.no_grad():
    for inputs, labels in testloader:
        inputs, labels = inputs.to(device), labels.to(device)
        outputs = model(inputs)
        _, predicted = outputs.max(1)
        total += labels.size(0)
        correct += predicted.eq(labels).sum().item()

print(f"Test accuracy: {100*correct/total}")

Test accuracy: 90.6


**Result:**

The model reached a test accuracy of 90.6% on the CIFAR-10 test set, using the ModifiedResNet architecture, the data augmentation, and the SGD settings described above.