# ResNeXt [7 points]
Based on your ResNet implementation in Part I, extend it to ResNeXt. Your accuracy is expected to be higher than with ResNet. Compare the results with your VGG and ResNet implementations.

## Step 1: Implement the ResNeXt architecture
Pay close attention to the grouped convolutions and the cardinality parameter. Using a built-in ResNeXt model will not be considered for evaluation.

In [None]:
import torch
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

Using device: cuda


In [None]:
import torch
import torch.nn as nn

class ResNeXtBlock(nn.Module):
    def __init__(self, input_pa, output_pa, car=32, stride=1):
        super(ResNeXtBlock, self).__init__()

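        # Bottleneck width: half the block's output channels. It must be divisible by `car`.
        width = output_pa // 2
        self.conv1 = nn.Conv2d(input_pa, width, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(width)

        # Grouped 3x3 convolution: `car` (cardinality) parallel paths, each over width // car channels
        self.conv2 = nn.Conv2d(width, width, kernel_size=3, stride=stride, padding=1, groups=car, bias=False)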
        self.bn2 = nn.BatchNorm2d(width)

        self.conv3 = nn.Conv2d(width, output_pa, kernel_size=1, bias=False)
        self.bn3 = nn.BatchNorm2d(output_pa)

        self.relu = nn.ReLU(inplace=True)

        self.shortcut = nn.Sequential()
        if stride != 1 or input_pa != output_pa:
            self.shortcut = nn.Sequential(
                nn.Conv2d(input_pa, output_pa, kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(output_pa)
            )

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.relu(self.bn2(self.conv2(out)))
        out = self.bn3(self.conv3(out))
        out += self.shortcut(x)
        return self.relu(out)

class ResNeXt_model(nn.Module):
    def __init__(self, num_classes=3, car=32):
        super(ResNeXt_model, self).__init__()

        self.conv1 = nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(64)
        self.relu = nn.ReLU(inplace=True)

        self.layer1 = self._make_layer(64, 128, car)
        self.layer2 = self._make_layer(128, 256, car)
        self.layer3 = self._make_layer(256, 512, car)

        self.global_pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(512, num_classes)

    def _make_layer(self, input_pa, output_pa, car):
        return nn.Sequential(ResNeXtBlock(input_pa, output_pa, car))

    def forward(self, x):
        x = self.relu(self.bn1(self.conv1(x)))
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.global_pool(x)
        x = torch.flatten(x, 1)
        return self.fc(x)


## Step 2: Train and evaluate your ResNeXt model
Train and evaluate your ResNeXt model on the same dataset used in Part I.

In [None]:
import os
import torch
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

CNNFolder = "/kaggle/input/cnn-dataset"

!ls -R "$CNNFolder"

initialTransforms = transforms.Compose([
    transforms.Resize((64, 64)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])

dataset = datasets.ImageFolder(root=CNNFolder, transform=initialTransforms)

batch_size = 32
cnn_dataloader = DataLoader(dataset, batch_size=batch_size, shuffle=True, num_workers=2)

print(f"\nLoaded dataset with {len(dataset)} images.")
print(f"Classes found: {dataset.classes}")


/kaggle/input/cnn-dataset:
dogs  food  vehicles

/kaggle/input/cnn-dataset/dogs:
10000.jpg  2125.jpg  3250.jpg  4376.jpg  5500.jpg  6626.jpg  7751.jpg  8877.jpg
1000.jpg   2126.jpg  3251.jpg  4377.jpg  5501.jpg  6627.jpg  7752.jpg  8878.jpg
1001.jpg   2127.jpg  3252.jpg  4378.jpg  5502.jpg  6628.jpg  7753.jpg  8879.jpg
1002.jpg   2128.jpg  3253.jpg  4379.jpg  5503.jpg  6629.jpg  7754.jpg  887.jpg
1003.jpg   2129.jpg  3254.jpg  437.jpg	 5504.jpg  662.jpg   7755.jpg  8880.jpg
1004.jpg   212.jpg   3255.jpg  4380.jpg  5505.jpg  6630.jpg  7756.jpg  8881.jpg
1005.jpg   2130.jpg  3256.jpg  4381.jpg  5506.jpg  6631.jpg  7757.jpg  8882.jpg
1006.jpg   2131.jpg  3257.jpg  4382.jpg  5507.jpg  6632.jpg  7758.jpg  8883.jpg
1007.jpg   2132.jpg  3258.jpg  4383.jpg  5508.jpg  6633.jpg  7759.jpg  8884.jpg
1008.jpg   2133.jpg  3259.jpg  4384.jpg  5509.jpg  6634.jpg  775.jpg   8885.jpg
1009.jpg   2134.jpg  325.jpg   4385.jpg  550.jpg   6635.jpg  7760.jpg  8886.jpg
100.jpg    2135.jpg  3260.jpg  4386.jpg  

In [None]:
import torch
import numpy as np
from torchvision import datasets, transforms
from torch.utils.data import DataLoader, WeightedRandomSampler

transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])
])

dataset = datasets.ImageFolder(root=CNNFolder, transform=transform)

# Read the stored labels instead of iterating the dataset, which would decode every image
targets = np.array(dataset.targets)
count = np.bincount(targets)
weights = 1.0 / count          # inverse class frequency
sample = weights[targets]      # one weight per image

sampler = WeightedRandomSampler(weights=sample, num_samples=len(sample), replacement=True)
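
A sampler only affects training once it is handed to a `DataLoader` through the `sampler=` argument. The following is a minimal sketch of that wiring on a synthetic, deliberately imbalanced dataset; the names `toy_dataset`, `sample_weights`, and `drawn_counts`, as well as the class sizes, are made up for illustration and are not taken from the assignment data.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

# Synthetic imbalanced labels (hypothetical): 900 / 90 / 10 samples per class
labels = torch.cat([
    torch.zeros(900, dtype=torch.long),
    torch.ones(90, dtype=torch.long),
    torch.full((10,), 2, dtype=torch.long),
])
features = torch.randn(len(labels), 4)
toy_dataset = TensorDataset(features, labels)

# Inverse class frequency, looked up once per sample
class_counts = torch.bincount(labels)
sample_weights = (1.0 / class_counts.float())[labels]

sampler = WeightedRandomSampler(
    weights=sample_weights,
    num_samples=len(labels),
    replacement=True,
    generator=torch.Generator().manual_seed(0),
)

# `sampler` and `shuffle=True` are mutually exclusive, so shuffle is left at its default
loader = DataLoader(toy_dataset, batch_size=50, sampler=sampler)

# Count how often each class is drawn over one pass
drawn = torch.cat([y for _, y in loader])
drawn_counts = torch.bincount(drawn, minlength=3)
print("drawn per class:", drawn_counts.tolist())
```

With inverse-frequency weights, each class should be drawn roughly equally often even though the underlying label counts differ by a factor of 90.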

In [None]:
import os
import torch
import numpy as np
from torch.utils.data import DataLoader, random_split
from torchvision import datasets, transforms

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("Using device:", device)

trainTransforms = transforms.Compose([
    transforms.RandomResizedCrop(64),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5, 0.5, 0.5],
                         std=[0.5, 0.5, 0.5])
])

valTestTransforms = transforms.Compose([
    transforms.Resize((64, 64)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5, 0.5, 0.5],
                         std=[0.5, 0.5, 0.5])
])

complete = datasets.ImageFolder(root=CNNFolder, transform=None)
tr_ratio = 0.70
val_ratio   = 0.15
test_ratio  = 0.15

dataset_size = len(complete)
train_size   = int(tr_ratio * dataset_size)
val_size     = int(val_ratio * dataset_size)
test_size    = dataset_size - train_size - val_size

train_subset, val_subset, test_subset = random_split(
    complete,
    [train_size, val_size, test_size],
    generator=torch.Generator().manual_seed(42)
)


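# All three subsets wrap the same `complete` ImageFolder, so assigning
# `subset.dataset.transform` three times would leave only the last transform
# (valTestTransforms) active for every split. Wrap each subset so that it
# applies its own transform instead.
class TransformedSubset(torch.utils.data.Dataset):
    def __init__(self, subset, transform):
        self.subset = subset
        self.transform = transform

    def __len__(self):
        return len(self.subset)

    def __getitem__(self, index):
        image, label = self.subset[index]
        return self.transform(image), label

train_subset = TransformedSubset(train_subset, trainTransforms)
val_subset   = TransformedSubset(val_subset,   valTestTransforms)
test_subset  = TransformedSubset(test_subset,  valTestTransforms)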

train_loader = DataLoader(train_subset, batch_size=32, shuffle=True,  num_workers=2)
val_loader   = DataLoader(val_subset,   batch_size=32, shuffle=False, num_workers=2)
test_loader  = DataLoader(test_subset,  batch_size=32, shuffle=False, num_workers=2)

print(f"Training set: {len(train_subset)} samples")
print(f"Validation set: {len(val_subset)} samples")
print(f"Test set: {len(test_subset)} samples")

print("Classes:", complete.classes)

Using device: cuda
Training set: 21000 samples
Validation set: 4500 samples
Test set: 4500 samples
Classes: ['dogs', 'food', 'vehicles']


In [None]:
import torch.optim as optim

def train_model(model, train_loader, val_loader, num_epochs=10, learning_rate=0.0005):
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model.to(device)

    criterion = nn.CrossEntropyLoss()
    optimizer = optim.AdamW(model.parameters(), lr=learning_rate, weight_decay=1e-4)

    for epoch in range(num_epochs):
        model.train()
        total_loss, correct, total = 0, 0, 0

        for images, labels in train_loader:
            images, labels = images.to(device), labels.to(device)

            optimizer.zero_grad()
            outputs = model(images)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()

            total_loss += loss.item()
            correct += (outputs.argmax(1) == labels).sum().item()
            total += labels.size(0)

        train_acc = 100. * correct / total
        print(f"Epoch [{epoch+1}/{num_epochs}]| Train: Loss= {total_loss/len(train_loader):.4f} |  Accuracy= {train_acc:.2f}%")

    print("training complete.")

num_classes = 3
model = ResNeXt_model(num_classes=num_classes)

train_model(model, train_loader, val_loader, num_epochs=10, learning_rate=0.0005)



Epoch [1/10]| Train: Loss= 0.4157 |  Accuracy= 83.82%
Epoch [2/10]| Train: Loss= 0.2996 |  Accuracy= 88.80%
Epoch [3/10]| Train: Loss= 0.2610 |  Accuracy= 90.37%
Epoch [4/10]| Train: Loss= 0.2362 |  Accuracy= 91.43%
Epoch [5/10]| Train: Loss= 0.2186 |  Accuracy= 92.11%
Epoch [6/10]| Train: Loss= 0.2029 |  Accuracy= 92.44%
Epoch [7/10]| Train: Loss= 0.1922 |  Accuracy= 93.10%
Epoch [8/10]| Train: Loss= 0.1822 |  Accuracy= 93.13%
Epoch [9/10]| Train: Loss= 0.1704 |  Accuracy= 93.86%
Epoch [10/10]| Train: Loss= 0.1600 |  Accuracy= 94.07%
training complete.


In [None]:
def evaluate_model(model, data_loader, device):
    model.eval()
    total_loss, correct, total = 0.0, 0, 0
    criterion = nn.CrossEntropyLoss()

    with torch.no_grad():
        for images, labels in data_loader:
            images, labels = images.to(device), labels.to(device)
            outputs = model(images)
            loss = criterion(outputs, labels)

            total_loss += loss.item()
            correct += (outputs.argmax(dim=1) == labels).sum().item()
            total += labels.size(0)

    avg_loss = total_loss / len(data_loader)
    accuracy = 100.0 * correct / total
    return avg_loss, accuracy

# Reuse the `device` defined earlier instead of rebuilding it for every call
val_loss, val_acc = evaluate_model(model, val_loader, device)
print(f"Validation: Loss= {val_loss:.4f}, Accuracy= {val_acc:.2f}%")

test_loss, test_acc = evaluate_model(model, test_loader, device)
print(f"Test: Loss= {test_loss:.4f}, Accuracy= {test_acc:.2f}%")


Validation: Loss= 0.1785, Accuracy= 93.09%
Test: Loss= 0.1801, Accuracy= 93.24%
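
Overall accuracy does not show which of the three classes is confused with which. A per-class breakdown could be sketched as follows; the `confusion_matrix` helper is hypothetical, and the labels and predictions are made-up values, not outputs of the trained model.

```python
import torch

def confusion_matrix(preds, labels, num_classes):
    """Rows are true classes, columns are predicted classes."""
    flat = labels * num_classes + preds  # unique index per (true, predicted) pair
    counts = torch.bincount(flat, minlength=num_classes * num_classes)
    return counts.reshape(num_classes, num_classes)

# Made-up example: 0 = dogs, 1 = food, 2 = vehicles
labels = torch.tensor([0, 0, 1, 1, 2, 2])
preds = torch.tensor([0, 1, 1, 1, 2, 0])

cm = confusion_matrix(preds, labels, num_classes=3)
per_class_acc = cm.diag().float() / cm.sum(dim=1).float()

print(cm.tolist())             # [[1, 1, 0], [0, 2, 0], [1, 0, 1]]
print(per_class_acc.tolist())  # [0.5, 1.0, 0.5]
```

In the notebook, `preds` and `labels` would be collected inside the evaluation loop by concatenating `outputs.argmax(dim=1)` and the batch labels on the CPU.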


## Step 3: Compare the performance of your ResNeXt model
Compare the performance of your ResNeXt model against your previous ResNet and VGG models. Provide a detailed analysis of the results.

### a. Performance metrics (accuracy, loss and epochs) for all three models.

| Model   | Train Acc | Val Acc | Test Acc | Train Loss | Val Loss | Test Loss | Epochs |
|---------|-----------|---------|----------|------------|----------|-----------|--------|
| VGG     | 63.10%    | 90.60%  | 91.53%   | 0.5012     | 0.4102   | 0.3951    | 10     |
| ResNet  | 66.91%    | 93.00%  | 92.67%   | 0.5447     | 0.4328   | 0.2811    | 20     |
| ResNeXt | 94.07%    | 93.09%  | 93.24%   | 0.1600     | 0.1785   | 0.1801    | 10     |


### b. Discussion of the observed differences in performance.
Explain why ResNeXt might be outperforming ResNet and VGG. Consider factors like cardinality, grouped convolutions, and the overall architecture.

**Reason why ResNeXt does better than ResNet and VGG**


**ResNeXt Achieves Higher Accuracy**

1. 93.24% on the test set, above VGG's 91.53% and ResNet's 92.67%.
2. Converges to a low training loss (0.1600) within 10 epochs, compared to ResNet's 20.

**ResNeXt Uses Grouped Convolutions (Cardinality)**

1. Splits the channels into several groups (the cardinality) and applies a separate transformation to each group in parallel.
2. This allows more refined feature extraction than the single dense 3×3 convolution in a ResNet block, or the plain layer-by-layer stacking of convolutions in VGG.
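
The split, transform, and merge view of a grouped convolution can be checked numerically. The sketch below uses small illustrative sizes (8 channels, 4 groups) rather than the notebook's 32 groups, and rebuilds the output of `nn.Conv2d(..., groups=groups)` by convolving each channel slice with its own slice of the weights and concatenating the results.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

channels, groups = 8, 4          # illustrative sizes, not the notebook's
per_group = channels // groups   # channels handled by each parallel path

grouped = nn.Conv2d(channels, channels, kernel_size=3, padding=1,
                    groups=groups, bias=False)

x = torch.randn(2, channels, 5, 5)
y_grouped = grouped(x)

# Rebuild the same result by hand: split, transform each slice, concatenate
outputs = []
for g in range(groups):
    start, end = g * per_group, (g + 1) * per_group
    x_slice = x[:, start:end]              # input channels of group g
    w_slice = grouped.weight[start:end]    # filters that belong to group g
    outputs.append(F.conv2d(x_slice, w_slice, padding=1))
y_manual = torch.cat(outputs, dim=1)

max_diff = (y_grouped - y_manual).abs().max().item()
print(f"max abs difference: {max_diff:.2e}")
```

The two results agree up to floating-point error, which shows that each group only ever sees its own slice of the input channels.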

**Efficient Parameter Utilization**

1. ResNeXt increases cardinality (e.g., 32 groups) for more expressive power without much increase in the number of parameters.
2. In comparison, VGG simply stacks more layers, which makes optimization harder and exposes it to the vanishing gradient problem.
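
The parameter saving can be made concrete with a little arithmetic. A bias-free convolution has `out_channels * (in_channels / groups) * k * k` weights; the sketch below applies that formula to the 3×3 convolution of the block that outputs 256 channels (bottleneck width 128). The helper `conv_params` is a hypothetical name introduced only for this example.

```python
def conv_params(in_channels, out_channels, kernel_size, groups=1):
    """Number of weights in a bias-free 2D convolution."""
    if in_channels % groups or out_channels % groups:
        raise ValueError("channel counts must be divisible by groups")
    return out_channels * (in_channels // groups) * kernel_size * kernel_size

width = 128  # bottleneck width of the block with 256 output channels

dense = conv_params(width, width, 3, groups=1)
grouped = conv_params(width, width, 3, groups=32)

print(dense, grouped, dense // grouped)  # 147456 4608 32
```

At the same width, the grouped 3×3 convolution needs 32 times fewer weights than the dense one, which leaves room to widen the block without exceeding the dense parameter budget.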


**Skip Connections vs. Plain Stacking**

1. ResNet and ResNeXt both use skip connections to facilitate training.
2. VGG does not have skip connections, which makes its deeper layers difficult to train.
3. Grouped convolutions in ResNeXt build on ResNet's residual design, yielding a stronger architecture.
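
Why a skip connection helps gradients can be seen in an extreme toy case. In the sketch below, a convolution's weights are set to zero so that the layer itself passes no gradient at all; the plain path then delivers a zero gradient to its input, while the residual path `x + conv(x)` still delivers the identity gradient. This is an illustration under that artificial assumption, not a measurement on the trained models.

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(4, 4, kernel_size=3, padding=1, bias=False)
nn.init.zeros_(conv.weight)  # artificial worst case: the layer blocks all signal

# Plain path: y = conv(x)
x_plain = torch.ones(1, 4, 6, 6, requires_grad=True)
conv(x_plain).sum().backward()

# Residual path: y = x + conv(x)
x_res = torch.ones(1, 4, 6, 6, requires_grad=True)
(x_res + conv(x_res)).sum().backward()

plain_grad = x_plain.grad.abs().sum().item()
res_grad = x_res.grad.abs().sum().item()

print(plain_grad)  # 0.0
print(res_grad)    # 144.0, one unit of gradient per input element (1 * 4 * 6 * 6)
```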



### c. Analysis of any challenges encountered during the implementation or training process.

**Challenges faced**

**Overfitting**
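1. VGG showed a large gap between train accuracy (63.10%) and validation accuracy (90.60%). Because validation accuracy is the higher of the two, this is not classic overfitting; it more likely reflects regularization or augmentation that is active only during training.
2. The ResNeXt setup guards against overfitting with weight decay (AdamW, `weight_decay=1e-4`) and the training-time augmentations defined for the training split (random resized crop, horizontal flip).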

**Keeping LR and Regularization in Balance**

1. A learning rate that is too low can stall training, while one that is too high can make it unstable.
2. A scheduler such as StepLR or ReduceLROnPlateau can help keep training stable.
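
As a rough sketch of how such a scheduler plugs into a training loop, the example below attaches `StepLR` to an `AdamW` optimizer with the same learning rate and weight decay as the notebook. The tiny `nn.Linear` model, the random batch, and the `step_size=2`, `gamma=0.5` values are assumptions chosen only to make the schedule visible in a few epochs.

```python
import torch
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(4, 3)  # tiny stand-in for the real network
optimizer = optim.AdamW(model.parameters(), lr=0.0005, weight_decay=1e-4)

# Halve the learning rate every 2 epochs (illustrative values)
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=2, gamma=0.5)

lr_history = []
for epoch in range(4):
    lr_history.append(optimizer.param_groups[0]["lr"])

    # One dummy optimization step standing in for a full epoch
    optimizer.zero_grad()
    loss = model(torch.randn(8, 4)).pow(2).mean()
    loss.backward()
    optimizer.step()

    scheduler.step()  # call once per epoch, after the optimizer steps

print(lr_history)
```

`ReduceLROnPlateau` is wired in the same way, except that its `step` call takes the monitored metric, for example `scheduler.step(val_loss)`.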

**Vanishing Gradients in VGG**

1. Deeper models without skip connections are vulnerable to the vanishing gradient problem.

### d. Provide detailed analysis of the results.

**Detailed analysis of results**

**VGG**

1. Finished training at a final train accuracy of 63.10%.
2. Reached a test accuracy of 91.53% within 10 epochs, below both ResNet and ResNeXt.

**ResNet**
1. Improved over VGG with a test accuracy of 92.67%.
2. Required 20 epochs to reach its final performance.
3. Although skip connections ease training, the standard ResNet architecture does not leverage grouped convolutions.

**ResNeXt**

1. The best-performing model: 94.07% train accuracy and 93.24% test accuracy after only 10 epochs.
2. Uses cardinality (grouped convolutions) to increase representational power without increasing depth.

**Suggested Models**

1. ResNeXt is the go-to model when high accuracy together with fast convergence is the goal.
2. ResNet is still a good choice, but ResNeXt outperforms it here.
3. VGG is easy to implement but less efficient and slower to train.

### 4. References

1. https://medium.com/@atakanerdogan305/resnext-a-new-paradigm-in-image-processing-ee40425aea1f
2. https://medium.com/dataseries/enhancing-resnet-to-resnext-for-image-classification-3449f62a774c
3. https://www.researchgate.net/figure/Comparison-between-ResNet-and-ResNeXt-backbone-building-blocks-Figure-modified-from-Xie_fig2_348671702