# AlexNet Project

**AlexNet** is a groundbreaking convolutional neural network (CNN) architecture that has had a profound impact on deep learning and computer vision. This section gives a short introduction.

### Background
- **Year of Introduction**: AlexNet was introduced in 2012 by Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton.
- **Significance**: It won the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) in 2012 with a top-5 error of 15.3%, well ahead of the 26.2% of the second-best entry. This result was a turning point in computer vision: it showed how effective deep convolutional networks are for image classification.

### Key Innovations
- **Deep Architecture**: AlexNet was one of the first deep CNNs trained at scale, with five convolutional layers followed by three fully connected layers. This depth allowed it to learn more complex and abstract features from the input images.
- **ReLU Activation Function**: The use of ReLU activation functions instead of traditional sigmoid or tanh functions was a significant innovation. ReLU is computationally efficient and helps to alleviate the vanishing gradient problem, allowing the network to train deeper architectures more effectively.
- **Dropout Regularization**: Dropout was used to prevent overfitting. During training, randomly selected neurons are "dropped out" (i.e., their outputs are set to zero) with a certain probability. This forces the network to learn more robust features and prevents it from relying too heavily on any single neuron.
- **Data Augmentation**: AlexNet also used data augmentation to increase the size and diversity of the training set. This included random cropping, horizontal flipping, and perturbing the RGB intensities of the input images (a form of color augmentation), which helped the network generalize to unseen images.
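
To make the dropout bullet concrete, here is a minimal sketch in plain Python. It assumes the "inverted" variant that `torch.nn.Dropout` implements, where surviving activations are scaled by 1/(1-p) during training so that nothing needs rescaling at evaluation time (the original paper instead rescaled outputs at test time). The function name `dropout` and its parameters are illustrative and not part of any library.

```python
import random

def dropout(activations, p=0.5, training=True, rng=None):
    """Inverted dropout on a list of floats (illustrative helper)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training or p == 0.0:
        return list(activations)          # evaluation mode: identity
    if rng is None:
        rng = random.Random()
    keep_scale = 1.0 / (1.0 - p)          # keeps the expected value unchanged
    return [a * keep_scale if rng.random() >= p else 0.0 for a in activations]

rng = random.Random(0)
acts = [1.0] * 10_000
dropped = dropout(acts, p=0.5, training=True, rng=rng)
mean_train = sum(dropped) / len(dropped)
eval_out = dropout(acts, p=0.5, training=False)
print(f"mean activation with dropout: {mean_train:.3f}")
```

Each training-mode output is either zero or twice the input, so the mean stays close to the original value, while evaluation mode returns the activations untouched.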

In summary, AlexNet is a landmark model in the history of deep learning and computer vision. Its innovative architecture and techniques have had a lasting impact on the field and continue to influence the development of new models and applications.

### Architecture of AlexNet

```markdown
| Layer Type       | Parameters                                  | Output Size (C×H×W) |
|------------------|---------------------------------------------|---------------------|
| Input Image      | -                                           | 3×224×224           |
| Conv1            | kernels:96, size:11×11, stride:4, padding:2 | 96×55×55            |
| MaxPool1         | size:3×3, stride:2                          | 96×27×27            |
| Conv2            | kernels:256, size:5×5, padding:2            | 256×27×27           |
| MaxPool2         | size:3×3, stride:2                          | 256×13×13           |
| Conv3            | kernels:384, size:3×3, padding:1            | 384×13×13           |
| Conv4            | kernels:384, size:3×3, padding:1            | 384×13×13           |
| Conv5            | kernels:256, size:3×3, padding:1            | 256×13×13           |
| MaxPool3         | size:3×3, stride:2                          | 256×6×6             |
| Flatten          | -                                           | 9216 (256×6×6)      |
| FC1              | 4096 neurons                                | 4096                |
| Dropout          | p=0.5                                       | 4096                |
| FC2              | 4096 neurons                                | 4096                |
| Dropout          | p=0.5                                       | 4096                |
| FC3 (Output)     | num_classes neurons                         | num_classes         |
```
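
The output sizes in the table follow from the usual rule `out = floor((in + 2*padding - kernel) / stride) + 1`. The sketch below checks them in plain Python. It assumes Conv1 uses a padding of 2, as the `nn.Conv2d` call in section 3 does; without padding, a 224×224 input would give 54×54, which is why some descriptions of AlexNet quote a 227×227 input instead. The helper name `out_size` is illustrative.

```python
def out_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer (floor rule)."""
    return (size + 2 * padding - kernel) // stride + 1

# (name, kernel, stride, padding) for each layer that changes the spatial size
layers = [
    ("Conv1", 11, 4, 2),
    ("MaxPool1", 3, 2, 0),
    ("Conv2", 5, 1, 2),
    ("MaxPool2", 3, 2, 0),
    ("Conv3", 3, 1, 1),
    ("Conv4", 3, 1, 1),
    ("Conv5", 3, 1, 1),
    ("MaxPool3", 3, 2, 0),
]

size = 224
sizes = {}
for name, kernel, stride, padding in layers:
    size = out_size(size, kernel, stride, padding)
    sizes[name] = size
    print(f"{name}: {size}x{size}")

# 256 channels remain after Conv5, so the flattened vector has this many entries
flattened = 256 * size * size
print("Flatten:", flattened)
```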

The rest of this notebook classifies the Fashion-MNIST dataset with an AlexNet-style network, in five steps:

**Step 1: Packages**

**Step 2: Load and preprocess the Fashion-MNIST dataset**

**Step 3: Define the AlexNet model**

**Step 4: Define training and evaluation functions**

**Step 5: Train and evaluate the model**



**Fashion-MNIST** is a clothing image dataset designed as a drop-in alternative to MNIST. It contains 28x28 grayscale images in 10 classes, split into 60,000 training and 10,000 test images.

**AlexNet** expects 224x224 inputs with 3 channels. **Each Fashion-MNIST image therefore has to be resized from 28x28 to 224x224, and its single gray channel replicated to three channels.**
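
The following NumPy sketch illustrates what the resizing and channel expansion do to the array shape. It is a simplification under stated assumptions: the random array stands in for a real image, and nearest-neighbour upscaling is used for clarity, whereas `transforms.Resize` interpolates bilinearly by default, so actual pixel values will differ.

```python
import numpy as np

# Stand-in for one Fashion-MNIST image: 28x28, single channel, values in [0, 1).
rng = np.random.default_rng(0)
image = rng.random((28, 28), dtype=np.float32)

# Nearest-neighbour upscaling by a factor of 8 in each direction: 28 * 8 = 224.
scale = 224 // 28
upscaled = np.repeat(np.repeat(image, scale, axis=0), scale, axis=1)

# Replicate the gray channel three times to get a C x H x W array.
rgb_like = np.stack([upscaled] * 3, axis=0)

print(image.shape, "->", rgb_like.shape)
```

All three channels hold identical values, so no new information is added; the replication only makes the input match the shape the first convolution expects.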

# 1-Packages

In [2]:
## pip install torch torchvision numpy matplotlib scikit-learn seaborn   (install these yourself if needed)
import torch
import torchvision
import torchvision.transforms as transforms
from torch.utils.data import DataLoader
import torch.nn as nn
import torch.nn.functional as F
from sklearn.metrics import classification_report, confusion_matrix
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np

# 2-Load and preprocess the Fashion-MNIST dataset

In the context of training AlexNet, data augmentation techniques are typically applied during the data loading and preprocessing stage. These techniques help to artificially expand the training dataset by creating modified versions of the images, which can **improve the model's ability to generalize and prevent overfitting.**

Common augmentation and preprocessing steps for AlexNet-style training are:

**Random Cropping**: Taking a random crop (e.g., 224x224) from a larger image (e.g., 256x256), so the model sees different parts of each image. The original AlexNet paper used this technique.

**Horizontal Flipping**: Randomly mirroring images left to right, a simple and effective way to increase the diversity of the training data. The original paper used this as well.

**Color Jittering**: Randomly changing the brightness, contrast, saturation, and hue of the images, which makes the model more robust to variations in color and lighting. The original paper used a related technique, a PCA-based perturbation of RGB intensities.

**Normalization**: Shifting and scaling pixel values, for example to zero mean and unit variance, which stabilizes and speeds up training. The pipeline in this notebook normalizes with a mean of 0.5 and a standard deviation of 0.5, which maps pixel values from [0, 1] to [-1, 1].

Note that the pipeline in this notebook applies only resizing, channel replication, and normalization; it performs no random augmentation.

In [4]:
###1.define the preprocessing pipeline (if you are not familiar with 'transform' operations, I highly recommend learning them from the PyTorch website)
###2.download the data from the website (dataset)
###3.batch processing, shuffling (dataloader)

In [3]:
###YOUR CODE BEGINS HERE

# define the preprocessing pipeline first: the datasets below need `transform`
transform = transforms.Compose([
    transforms.Resize(224),                       # resize the image to 224x224
    transforms.Grayscale(num_output_channels=3),  # replicate the gray channel to 3 channels
    transforms.ToTensor(),                        # PIL image -> float tensor in [0, 1]
    transforms.Normalize((0.5,), (0.5,))          # map [0, 1] to [-1, 1]; broadcasts over channels
])

# download the data from the website (dataset)
train_dataset = torchvision.datasets.FashionMNIST(root='./data', train=True, download=True, transform=transform)
test_dataset = torchvision.datasets.FashionMNIST(root='./data', train=False, download=True, transform=transform)

# batch processing, shuffling (dataloader)
train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=64, shuffle=False)

###YOUR CODE ENDS HERE###



# 3-Define the AlexNet Model

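The architecture of AlexNet was laid out in the table above, so you can now build the model from it. Note that the implementation below uses the narrower channel widths of the torchvision variant (64, 192, 384, 256, 256) rather than the original widths listed in the table (96, 256, 384, 384, 256). The spatial sizes, and the 256×6×6 input to the classifier, are the same.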

In [6]:
###YOUR CODE BEGINS HERE###


class AlexNet(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        # Feature extractor: five conv layers with three max-pooling stages
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=2),  # 3x224x224 -> 64x55x55
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),                  # -> 64x27x27
            nn.Conv2d(64, 192, kernel_size=5, padding=2),           # -> 192x27x27
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),                  # -> 192x13x13
            nn.Conv2d(192, 384, kernel_size=3, padding=1),          # -> 384x13x13
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 256, kernel_size=3, padding=1),          # -> 256x13x13
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=3, padding=1),          # -> 256x13x13
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),                  # -> 256x6x6
        )

        # Classifier: two hidden fully connected layers with dropout, then the output layer
        self.classifier = nn.Sequential(
            nn.Dropout(),                      # p=0.5 by default
            nn.Linear(256 * 6 * 6, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(),
            nn.Linear(4096, 4096),
            nn.ReLU(inplace=True),
            nn.Linear(4096, num_classes),      # raw class scores (logits)
        )

    def forward(self, x):
        x = self.features(x)
        x = torch.flatten(x, 1) 
        x = self.classifier(x)
        return x


###YOUR CODE ENDS HERE###
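
A quick way to confirm that the layer stack produces the 256×6×6 tensor the classifier expects is to push a dummy input through it and record the shape after each layer. The sketch below rebuilds the same convolutional stack so that it runs on its own; the variable names are illustrative.

```python
import torch
import torch.nn as nn

# Same layer stack as `AlexNet.features` above, rebuilt here so the
# snippet is self-contained.
features = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=2), nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Conv2d(64, 192, kernel_size=5, padding=2), nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Conv2d(192, 384, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(256, 256, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2),
)

x = torch.zeros(1, 3, 224, 224)  # one dummy image, batch size 1
shapes = []
with torch.no_grad():
    for layer in features:
        x = layer(x)
        if not isinstance(layer, nn.ReLU):  # ReLU does not change the shape
            shapes.append((type(layer).__name__, tuple(x.shape)))

for name, shape in shapes:
    print(f"{name:10s} {shape}")

# Number of inputs the first Linear layer must accept
flat_features = x.flatten(1).shape[1]
print("flattened features:", flat_features)
```

If the input resolution or any kernel, stride, or padding value changes, the flattened size changes too, and the first `nn.Linear` layer has to be adjusted to match.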

# 4-Define training and evaluation functions

Next, define the functions that train the model on the training data and evaluate it on the test data.

Define the training process below.

In [8]:
###1.set model to train mode
###2.iterate over batches(here train_loader yields batches of(data,labels))
###3.move data to device(GPU/CPU)
###4.reset gradients
###5.forward pass
###6.compute loss
###7.backward pass(gradients)
###8.update model parameters
###9.log progress



###YOUR CODE BEGINS HERE###
def train(model, device, train_loader, optimizer, criterion, epoch):
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        output = model(data)
        loss = criterion(output, target)
        loss.backward()
        optimizer.step()

        if batch_idx % 100 == 0:
            print(f"Train Epoch: {epoch} [{batch_idx*len(data)}/{len(train_loader.dataset)}]  Loss: {loss.item():.6f}")

            
###YOUR CODE ENDS HERE###
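
Before starting a long run, the same sequence of steps can be smoke-tested on synthetic data that needs no download. The sketch below is only an illustration under stated assumptions: the random tensors, the small `nn.Linear` stand-in model, the SGD optimizer, and the helper `mean_loss` are hypothetical and are not part of this notebook's pipeline.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Synthetic stand-in data: 32 samples, 20 features, 10 classes.
inputs = torch.randn(32, 20)
labels = torch.randint(0, 10, (32,))
loader = DataLoader(TensorDataset(inputs, labels), batch_size=8, shuffle=True)

device = torch.device("cpu")
model = nn.Linear(20, 10).to(device)  # tiny stand-in for AlexNet
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.CrossEntropyLoss()

def mean_loss():
    """Loss over the whole synthetic set, computed without gradients."""
    model.eval()
    with torch.no_grad():
        return criterion(model(inputs), labels).item()

loss_before = mean_loss()
model.train()
for epoch in range(5):
    for data, target in loader:
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()                   # reset gradients
        loss = criterion(model(data), target)   # forward pass and loss
        loss.backward()                         # backward pass
        optimizer.step()                        # parameter update
loss_after = mean_loss()
print(f"loss before: {loss_before:.4f}, after: {loss_after:.4f}")
```

A falling loss on such a toy problem does not say anything about final accuracy, but it does catch wiring mistakes such as a missing `optimizer.step()` or mismatched tensor shapes.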



Below you may define the testing process.

In [10]:
###1.set model to evaluation mode
###2.initialize correct prediction counter
###3.disable gradient calculation
###4.iterate over test batches
###5.move data to device(GPU/CPU)
###6.forward pass
###7.calculate accuracy

###YOUR CODE BEGINS HERE

def test(model, device, test_loader):
    model.eval()
    correct = 0
    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)
            output = model(data)
            pred = output.argmax(dim=1)
            correct += pred.eq(target).sum().item()
    acc = 100. * correct / len(test_loader.dataset)
    print(f"Test Accuracy: {acc:.2f}%")
    return acc

###YOUR CODE ENDS HERE
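
Overall accuracy hides which classes get confused with each other. The `classification_report` and `confusion_matrix` functions imported in section 1 can show this. The sketch below uses small hand-written label lists as a stand-in; in practice they could be collected inside the test loop, for example by extending two lists with `target.cpu().tolist()` and `pred.cpu().tolist()`, which is an assumed extension of `test` rather than something the function above does.

```python
from sklearn.metrics import classification_report, confusion_matrix

# Toy labels standing in for the true and predicted classes of a test run.
y_true = [0, 0, 1, 1, 2, 2, 2]
y_pred = [0, 1, 1, 1, 2, 2, 0]

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_pred)
print(cm)

# Per-class precision, recall, and F1 score.
print(classification_report(y_true, y_pred))

# Accuracy is the share of samples on the diagonal.
accuracy = cm.trace() / cm.sum()
print(f"accuracy: {accuracy:.3f}")
```

The resulting matrix can be passed to `sns.heatmap(cm, annot=True)` for a visual version, which is presumably why `seaborn` is imported at the top of the notebook.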

# 5-Train and evaluate the model

In [None]:
###1.device setup
###2.model initialization
###3.define optimizer and loss function
###4.run training and testing with the functions defined above


###YOUR CODE BEGINS HERE


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = AlexNet(num_classes=10).to(device)

optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
criterion = nn.CrossEntropyLoss()

for epoch in range(1, 100):  # runs 99 epochs; lower the upper bound for a shorter run
    train(model, device, train_loader, optimizer, criterion, epoch)
    test(model, device, test_loader)

###YOUR CODE ENDS HERE
