<a href="https://colab.research.google.com/github/gnoejh/ict1022/blob/main/Architectures/alexnet.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# AlexNet: A Deep Convolutional Neural Network

## Introduction

AlexNet is a convolutional neural network architecture developed by Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton. It was presented in their 2012 paper "ImageNet Classification with Deep Convolutional Neural Networks" and won the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) in 2012 with a top-5 error rate of 15.3%, compared to 26.2% achieved by the second-best entry.

This network marked a breakthrough in the field of computer vision and deep learning, showing that deep learning models could significantly outperform traditional computer vision techniques on large-scale image classification tasks.

## Historical Importance

AlexNet's victory in the 2012 ILSVRC is often cited as the moment when deep learning "took off". Its success demonstrated:

1. The power of deep convolutional neural networks for image recognition
2. The importance of GPUs for training large neural networks
3. The effectiveness of techniques like ReLU activations and dropout for deep network training

Following AlexNet, CNNs became the dominant approach to many computer vision tasks, leading to the development of other influential architectures like VGGNet, GoogLeNet, and ResNet.

## Architecture Overview

AlexNet consists of 8 learned (weight-bearing) layers:
- 5 convolutional layers
- 3 fully-connected layers

The network also includes max-pooling layers, ReLU (Rectified Linear Unit) activations, and dropout regularization.

### Key architectural features:

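1. **Input**: 227×227×3 images (the paper states 224×224, but 227×227 is the size that produces the 55×55 output of the first layer when no padding is used)
2. **First Convolutional Layer**: 96 kernels of size 11×11×3 with a stride of 4, followed by max pooling
3. **Second Convolutional Layer**: 256 kernels of size 5×5×48 (each kernel sees only half of the 96 input channels because the original network was split across two GPUs), followed by max pooling
4. **Third, Fourth, and Fifth Convolutional Layers**: 384, 384, and 256 kernels of size 3×3, with max pooling after the fifth layer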
5. **Fully-Connected Layers**: Two layers of 4096 neurons each
6. **Output Layer**: 1000-way softmax (for ImageNet's 1000 classes)
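
The spatial sizes implied by the list above can be checked with the standard output-size formula, `floor((W - K + 2P) / S) + 1`. The sketch below is only an illustration: the helper names `conv_out` and `trace_sizes` are made up for this example, and the padding values are assumptions (no padding in the first convolution, "same" padding in the later ones), since the list above does not specify them.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output width of a convolution or pooling layer (floor division)."""
    return (size + 2 * padding - kernel) // stride + 1

# (name, kernel, stride, padding); padding values are assumptions for illustration
LAYERS = [
    ("conv1", 11, 4, 0),
    ("pool1", 3, 2, 0),
    ("conv2", 5, 1, 2),
    ("pool2", 3, 2, 0),
    ("conv3", 3, 1, 1),
    ("conv4", 3, 1, 1),
    ("conv5", 3, 1, 1),
    ("pool5", 3, 2, 0),
]

def trace_sizes(input_size=227):
    """Return (layer name, spatial size) pairs from the input to the last pooling layer."""
    sizes = [("input", input_size)]
    size = input_size
    for name, kernel, stride, padding in LAYERS:
        size = conv_out(size, kernel, stride, padding)
        sizes.append((name, size))
    return sizes

for name, size in trace_sizes(227):
    print(f"{name:>6}: {size}x{size}")
```

Under these assumptions the final feature maps are 6×6, which is where the `256 * 6 * 6` input size of the first fully-connected layer comes from.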

### Notable techniques used in AlexNet:

- **ReLU Activation**: Used instead of tanh or sigmoid, which helped reduce training time
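- **Local Response Normalization (LRN)**: Applied after the first and second convolutional layers to aid generalization
- **Overlapping Pooling**: Pooling windows (3×3) larger than the stride (2), which slightly improved accuracy and made the network harder to overfit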
- **Dropout**: Applied to fully connected layers to prevent overfitting
- **Data Augmentation**: Random crops and horizontal flips to artificially expand the training set

## Implementation with PyTorch

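Let's implement the AlexNet architecture using PyTorch. The version below is a single-GPU variant: it does not split the convolutions across two GPUs, it omits Local Response Normalization, and it uses `padding=2` in the first convolution so that 224×224 inputs also produce the 6×6 final feature maps expected by the classifier: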

In [None]:
# Install required packages if needed
!pip install torch torchvision matplotlib numpy

In [None]:
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as transforms
import matplotlib.pyplot as plt
import numpy as np

# Check if CUDA is available
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

In [None]:
# Define the AlexNet model
class AlexNet(nn.Module):
    def __init__(self, num_classes=1000):
        super().__init__()
        
        # Feature extraction layers
        self.features = nn.Sequential(
            # Layer 1
            nn.Conv2d(3, 96, kernel_size=11, stride=4, padding=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            
            # Layer 2
            nn.Conv2d(96, 256, kernel_size=5, padding=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            
            # Layer 3
            nn.Conv2d(256, 384, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            
            # Layer 4
            nn.Conv2d(384, 384, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            
            # Layer 5
            nn.Conv2d(384, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
        )
        
        # Classification layers
        self.classifier = nn.Sequential(
            nn.Dropout(p=0.5),
            nn.Linear(256 * 6 * 6, 4096),
            nn.ReLU(inplace=True),
            
            nn.Dropout(p=0.5),
            nn.Linear(4096, 4096),
            nn.ReLU(inplace=True),
            
            nn.Linear(4096, num_classes),
        )
        
    def forward(self, x):
        x = self.features(x)
        x = torch.flatten(x, 1)
        x = self.classifier(x)
        return x

# Create an instance of AlexNet
model = AlexNet(num_classes=1000).to(device)
print(model)
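
A quick way to sanity-check the `256 * 6 * 6` input size of the classifier is to push a dummy image through the feature extractor. The sketch below rebuilds the same convolutional stack as a stand-alone `nn.Sequential` so that it runs on its own; the helper name `feature_shape` is made up for this example.

```python
import torch
import torch.nn as nn

# Stand-alone copy of the feature extractor defined above (same layer arguments)
features = nn.Sequential(
    nn.Conv2d(3, 96, kernel_size=11, stride=4, padding=2), nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Conv2d(96, 256, kernel_size=5, padding=2), nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Conv2d(256, 384, kernel_size=3, padding=1), nn.ReLU(inplace=True),
    nn.Conv2d(384, 384, kernel_size=3, padding=1), nn.ReLU(inplace=True),
    nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=3, stride=2),
)

def feature_shape(side):
    """Return the feature-map shape for a random side x side RGB image."""
    with torch.no_grad():
        return tuple(features(torch.randn(1, 3, side, side)).shape)

for side in (224, 227):
    print(side, feature_shape(side))
```

With `padding=2` in the first convolution, both 224×224 and 227×227 inputs end up as 256 feature maps of size 6×6, so either input size is compatible with the classifier.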

## Using Pre-Trained AlexNet from torchvision

In [None]:
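# Load pre-trained AlexNet model from torchvision.
# The `weights` argument replaces the deprecated `pretrained=True` (torchvision >= 0.13).
pretrained_model = torchvision.models.alexnet(
    weights=torchvision.models.AlexNet_Weights.IMAGENET1K_V1
).to(device)  # Keep the model on the same device as the input batches
pretrained_model.eval()  # Set to evaluation mode (disables dropout)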

# Load ImageNet class labels
import urllib.request

# Download the ImageNet class labels (one class name per line)
try:
    url = "https://raw.githubusercontent.com/pytorch/examples/master/imagenet/imagenet_classes.txt"
    with urllib.request.urlopen(url) as response:
        classes = [line.decode('utf-8').strip() for line in response.readlines()]
except Exception:
    # Fall back to placeholder names for all 1000 classes if the download fails
    classes = [f"Class_{i}" for i in range(1000)]

## Image Classification with Pre-trained AlexNet

In [None]:
from PIL import Image
from torchvision import transforms

# Define image preprocessing
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

# Function to make predictions on an image
def predict_image(image_path):
    # Load and preprocess the image
    img = Image.open(image_path).convert('RGB')  # Ensure 3 channels (handles grayscale/RGBA files)
    img_t = preprocess(img)
    batch_t = torch.unsqueeze(img_t, 0).to(device)
    
    # Make a prediction
    with torch.no_grad():
        output = pretrained_model(batch_t)
    
    # Get the top 5 predictions
    _, indices = torch.sort(output, descending=True)
    percentages = torch.nn.functional.softmax(output, dim=1)[0] * 100
    results = [(classes[idx], percentages[idx].item()) for idx in indices[0][:5]]
    
    # Display the image
    plt.figure(figsize=(8, 6))
    plt.imshow(img)
    plt.axis('off')
    plt.title("Top predictions:")
    
    # Display the top 5 predictions
    for i, (cls, prob) in enumerate(results):
        plt.text(5, 30 + i*20, f"{cls}: {prob:.2f}%", fontsize=12, 
                 bbox=dict(facecolor='white', alpha=0.8))
    
    plt.tight_layout()
    plt.show()

# To use this function:
# predict_image('path/to/your/image.jpg')

## Visualizing AlexNet Filters

In [None]:
def visualize_filters(layer_index=0):
    """
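    Visualize filters from a specific convolutional layer
    layer_index: Position of the layer inside pretrained_model.features
                 (0 for the first convolutional layer). The RGB rendering below
                 assumes 3 input channels, so it only works for the first layer.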
    """
    # Get the filter weights
    filters = pretrained_model.features[layer_index].weight.data.cpu().numpy()
    
    # Number of filters
    num_filters = filters.shape[0]
    n_cols = 8  # Number of columns in the grid
    n_rows = num_filters // n_cols + (1 if num_filters % n_cols != 0 else 0)
    
    # Create figure for all filters
    fig, axes = plt.subplots(n_rows, n_cols, figsize=(15, n_rows * 2))
    
    for i in range(n_rows * n_cols):
        row, col = i // n_cols, i % n_cols
        if i < num_filters:
            # Normalize the filter for better visualization
            filt = filters[i].transpose(1, 2, 0)
            filt = (filt - filt.min()) / (filt.max() - filt.min() + 1e-5)
            
            if n_rows > 1:
                axes[row, col].imshow(filt)
                axes[row, col].set_title(f'Filter {i}')
                axes[row, col].axis('off')
            else:
                axes[col].imshow(filt)
                axes[col].set_title(f'Filter {i}')
                axes[col].axis('off')
        else:
            if n_rows > 1:
                axes[row, col].axis('off')
            else:
                axes[col].axis('off')
                
    plt.tight_layout()
    plt.suptitle(f"Layer {layer_index} Filters", fontsize=16)
    plt.subplots_adjust(top=0.92)
    plt.show()

# Visualize the first convolutional layer filters (64 filters in torchvision's AlexNet)
visualize_filters(0)

## Feature Map Visualization

In [None]:
def visualize_feature_maps(image_path):
    """
    Visualize feature maps produced by the first convolutional layer
    """
    # Load and preprocess image
    img = Image.open(image_path).convert('RGB')  # Ensure 3 channels (handles grayscale/RGBA files)
    img_t = preprocess(img)
    batch_t = torch.unsqueeze(img_t, 0).to(device)
    
    # Create a hook to capture feature maps
    feature_maps = []
    
    def hook_fn(module, input, output):
        feature_maps.append(output.detach().cpu())
    
    # Register the hook on the first convolutional layer
    hook = pretrained_model.features[0].register_forward_hook(hook_fn)
    
    # Forward pass
    with torch.no_grad():
        pretrained_model(batch_t)
    
    # Remove the hook
    hook.remove()
    
    # Get feature maps
    feature_map = feature_maps[0][0]
    
    # Plot the original image
    plt.figure(figsize=(10, 5))
    plt.subplot(1, 2, 1)
    plt.imshow(img)
    plt.title('Original Image')
    plt.axis('off')
    
    # Plot feature maps
    n = min(16, feature_map.size(0))  # Display up to 16 feature maps
    fig = plt.figure(figsize=(15, 15))
    
    for i in range(n):
        a = fig.add_subplot(4, 4, i+1)
        img_map = feature_map[i].numpy()
        img_map = (img_map - img_map.min()) / (img_map.max() - img_map.min() + 1e-5)
        plt.imshow(img_map, cmap='viridis')
        plt.axis('off')
        a.set_title(f'Feature Map {i}')
        
    plt.tight_layout()
    plt.suptitle('First Layer Feature Maps', fontsize=20)
    plt.subplots_adjust(top=0.93)
    plt.show()

# To use this function:
# visualize_feature_maps('path/to/your/image.jpg')

## AlexNet Performance and Historical Context

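### Performance on ImageNet

The figures below are approximate ImageNet validation accuracies; exact values vary with the implementation and evaluation protocol.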

| Model | Top-1 Accuracy | Top-5 Accuracy |
|-------|---------------|---------------|
| AlexNet (2012) | 57.1% | 80.2% |
| VGG-16 (2014) | 71.3% | 90.1% |
| ResNet-50 (2015) | 76.0% | 92.9% |
| EfficientNet-B7 (2019) | 84.3% | 97.0% |
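
Top-1 accuracy counts a prediction as correct only if the highest-scoring class is the true label, while top-5 accuracy counts it as correct if the true label is among the five highest-scoring classes. A minimal sketch of the computation follows; the function name `topk_accuracy` and the scores are made up for illustration, and `k=2` is used instead of 5 because the toy example has only 6 classes.

```python
import torch

def topk_accuracy(logits, targets, k=5):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    topk = logits.topk(k, dim=1).indices              # shape: (batch, k)
    hits = (topk == targets.unsqueeze(1)).any(dim=1)  # shape: (batch,)
    return hits.float().mean().item()

# Hypothetical scores for 3 samples over 6 classes (made-up numbers)
logits = torch.tensor([
    [0.1, 2.0, 0.3, 0.0, 1.5, 0.2],  # true class 1 has the highest score
    [1.0, 0.2, 0.9, 0.1, 0.0, 0.3],  # true class 2 has the second-highest score
    [0.5, 0.4, 0.3, 0.2, 0.1, 3.0],  # true class 4 has the lowest score
])
targets = torch.tensor([1, 2, 4])

print(topk_accuracy(logits, targets, k=1))
print(topk_accuracy(logits, targets, k=2))
```

The corresponding error rate, such as the top-5 error quoted in the introduction, is simply one minus this accuracy.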

### Impact and Legacy

AlexNet's impact on deep learning and computer vision has been profound:

1. **Paradigm Shift**: Demonstrated that deep learning can outperform traditional computer vision methods
2. **GPU Adoption**: Popularized the use of GPUs for deep learning
3. **Architectural Innovations**: ReLU activations, dropout regularization, and data augmentation became standard practices
4. **Foundation for Future Networks**: Inspired subsequent CNN architectures like VGGNet, GoogLeNet, and ResNet

### Limitations

While groundbreaking in 2012, AlexNet has several limitations:

- High computational requirements relative to its accuracy
- Large number of parameters (about 60 million), most of them in the fully-connected layers, which increases the risk of overfitting
- Limited depth compared to modern architectures
- Uses Local Response Normalization, which has been largely replaced by Batch Normalization in modern networks
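
The parameter count can be reproduced with simple arithmetic. The sketch below counts weights and biases for the single-GPU layer sizes used in this notebook's `AlexNet` class; the helper names are made up for this example. Because the original two-GPU network splits some convolutions, its count is somewhat lower than this one, consistent with the roughly 60 million quoted above.

```python
def conv_params(c_in, c_out, kernel):
    """Weights plus biases of a square convolution without channel grouping."""
    return c_in * c_out * kernel * kernel + c_out

def fc_params(n_in, n_out):
    """Weights plus biases of a fully-connected layer."""
    return n_in * n_out + n_out

conv_total = sum([
    conv_params(3, 96, 11),
    conv_params(96, 256, 5),
    conv_params(256, 384, 3),
    conv_params(384, 384, 3),
    conv_params(384, 256, 3),
])
fc_total = sum([
    fc_params(256 * 6 * 6, 4096),
    fc_params(4096, 4096),
    fc_params(4096, 1000),
])
total = conv_total + fc_total

print(f"convolutional: {conv_total:,}")
print(f"fully-connected: {fc_total:,}")
print(f"total: {total:,}")
print(f"share in fully-connected layers: {fc_total / total:.1%}")
```

The fully-connected layers account for well over 90% of the parameters, which is why dropout is applied there.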

## Conclusion

AlexNet represents a pivotal moment in the history of artificial intelligence and computer vision. Its success in the 2012 ImageNet competition demonstrated the potential of deep learning and sparked the deep learning revolution that continues to this day. While newer architectures have surpassed its performance, AlexNet's influence on neural network design principles secures its lasting importance.