# ImageNet and AlexNet

A turning point in the history of deep learning

*ImageNet* is one of the largest image datasets and a standard benchmark for evaluating state-of-the-art NN architectures.  
The full dataset contains about 14 million images, and the classification challenge built on it uses 1000 categories. Results are often reported as top-5 accuracy.
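
Top-5 accuracy counts a prediction as correct when the true label is among the five highest-scoring classes. A minimal sketch of that computation, assuming the model returns one row of scores per image; the helper name `top_k_accuracy` and the toy tensors are illustrative and not part of the notebook:

```python
import torch

def top_k_accuracy(logits, targets, k=5):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    # Indices of the k largest scores in each row, shape (batch, k)
    top_k = logits.topk(k, dim=1).indices
    # A sample is a hit if its target appears anywhere in its top-k row
    hits = (top_k == targets.unsqueeze(1)).any(dim=1)
    return hits.float().mean().item()

# Toy data: 2 samples, 10 classes, scores increasing with the class index,
# so classes 5 to 9 form the top 5 for both samples
logits = torch.arange(10.0).repeat(2, 1)
targets = torch.tensor([9, 0])  # first label is in the top 5, second is not
print(top_k_accuracy(logits, targets, k=5))  # prints 0.5
```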

In [1]:
# Don't execute unless you have good bandwidth and enough hard drive space (150GB)
# Update: ImageNet can no longer be downloaded through torchvision. The archives must be downloaded by hand and placed in `root`
'''
import torchvision

torchvision.datasets.ImageNet(root='./imageNet', split='train',
                              transform=torchvision.transforms.Compose([
                                  torchvision.transforms.ToTensor(),
                                  torchvision.transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                                                   std=[0.229, 0.224, 0.225])]))
'''

In 2012, Alex Krizhevsky, Ilya Sutskever and Geoffrey Hinton proposed a CNN architecture, now known as AlexNet, and implemented it on CUDA GPUs to compete in the *ImageNet* competition.

<center>
    <img src='images/Comparison_image_neural_networks.png' width=55% style="margin-left:auto; margin-right:auto"/>
    <p style="font-size:14px;">Source: <a href='https://en.wikipedia.org/wiki/AlexNet'>Wikipedia</a></p>
</center>

TODO: Based on the right end of the diagram, implement the AlexNet architecture

In [8]:
import torch
import torch.nn as nn
import torch.nn.functional as F

class AlexNet(nn.Module):
    
    def __init__(self):
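        super().__init__()
        # Feature extractor: five convolutions, with max pooling after conv1, conv2 and conv5
        self.conv1 = nn.Conv2d(3, 96, 11, stride=4)
        # MaxPool2d has no parameters, so a single module is reused for every pooling step
        self.maxPool = nn.MaxPool2d(3, stride=2)
        self.conv2 = nn.Conv2d(96, 256, 5, padding=2)
        self.conv3 = nn.Conv2d(256, 384, 3, padding=1)
        self.conv4 = nn.Conv2d(384, 384, 3, padding=1)
        self.conv5 = nn.Conv2d(384, 256, 3, padding=1)
        # Classifier: for a 224x224 input, the last pooling step yields a 5x5 map with 256 channels
        self.fc1 = nn.Linear(5 * 5 * 256, 4096)
        self.fc2 = nn.Linear(4096, 4096)
        self.fc3 = nn.Linear(4096, 1000)
        self.relu = nn.ReLU()
        self.dropout = nn.Dropout(p=0.5)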
    
    def forward(self, x):
        x = self.maxPool(self.relu(self.conv1(x)))
        x = self.maxPool(self.relu(self.conv2(x)))
        x = self.relu(self.conv3(x))
        x = self.relu(self.conv4(x))
        x = self.relu(self.conv5(x))
        x = self.maxPool(x)
        x = x.reshape(x.size(0), -1)
        x = self.dropout(self.relu(self.fc1(x)))
        x = self.dropout(self.relu(self.fc2(x)))
        return self.fc3(x)

In [9]:
net = AlexNet()
net(torch.randn(2, 3, 224, 224))

torch.Size([2, 96, 26, 26])


tensor([[-0.0229, -0.0044,  0.0065,  ...,  0.0175,  0.0012,  0.0135],
        [-0.0085,  0.0136,  0.0106,  ...,  0.0064,  0.0017,  0.0237]],
       grad_fn=<AddmmBackward0>)
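
The `5 * 5 * 256` input size of `fc1` follows from the spatial size of the feature map after the last pooling step. A small sketch that traces this size through the layers of the class above, using the standard output-size formula floor((n + 2p - k) / s) + 1 for convolution and pooling layers. The helper names and the `pool1`/`pool2`/`pool3` labels are illustrative only (the class reuses a single `maxPool` module):

```python
def out_size(n, kernel, stride=1, padding=0):
    """Output side length of a conv or pooling layer: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * padding - kernel) // stride + 1

# (label, kernel, stride, padding), in the order used by AlexNet.forward
LAYERS = [
    ("conv1", 11, 4, 0),
    ("pool1", 3, 2, 0),
    ("conv2", 5, 1, 2),
    ("pool2", 3, 2, 0),
    ("conv3", 3, 1, 1),
    ("conv4", 3, 1, 1),
    ("conv5", 3, 1, 1),
    ("pool3", 3, 2, 0),
]

def trace(n, layers=LAYERS):
    """Return the spatial side length after each layer for an n x n input."""
    sizes = []
    for name, kernel, stride, padding in layers:
        n = out_size(n, kernel, stride, padding)
        sizes.append((name, n))
    return sizes

for name, n in trace(224):
    print(f"{name}: {n} x {n}")
```

For a 224 x 224 input the trace ends at 5 x 5, which matches `nn.Linear(5 * 5 * 256, 4096)`. With a 227 x 227 input the same layers end at 6 x 6, so `fc1` would need `6 * 6 * 256` inputs instead.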