### AlexNet

> https://github.com/pytorch/hub/blob/master/pytorch_vision_alexnet.md

In [None]:
import torch

# pretrained 
model = torch.hub.load('pytorch/vision:v0.10.0', 'alexnet', pretrained=True)

# eval mode
model.eval()

In [None]:
# AlexNet(
#   (features): Sequential(
#     # the first convolutional layer
#     (0): Conv2d(3, 64, kernel_size=(11, 11), stride=(4, 4), padding=(2, 2))

#     # activation, pooling layers
#     (1): ReLU(inplace=True)
#     (2): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)

#     # the second convolutional layer
#     (3): Conv2d(64, 192, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))

#     # activation, pooling layers
#     (4): ReLU(inplace=True)
#     (5): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)

#     # the third convolutional layer
#     (6): Conv2d(192, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))

#     # activation layer
#     (7): ReLU(inplace=True)

#     # the fourth convolutional layer
#     (8): Conv2d(384, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))

#     # activation layer
#     (9): ReLU(inplace=True)

#     # the fifth convolutional layer
#     (10): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))

#     # activation, pooling layers
#     (11): ReLU(inplace=True)
#     (12): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
#   )

#   (avgpool): AdaptiveAvgPool2d(output_size=(6, 6))

#   (classifier): Sequential(
      
#     # regularization method
#     (0): Dropout(p=0.5, inplace=False)

#     # fully_connected layer
#     (1): Linear(in_features=9216, out_features=4096, bias=True)

#     # activation layer
#     (2): ReLU(inplace=True)

#     # regularization method
#     (3): Dropout(p=0.5, inplace=False)

#     # fully_connected layer
#     (4): Linear(in_features=4096, out_features=4096, bias=True)

#     # activation layer
#     (5): ReLU(inplace=True)

#     # fully_connected layer
#     (6): Linear(in_features=4096, out_features=1000, bias=True)
#   )
# )

In [20]:
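from torchsummary import summary
from torchvision import models

# torchsummary creates its dummy input on `device`, so the model must be on the same device
device = "cuda" if torch.cuda.is_available() else "cpu"
alexnet = models.alexnet().to(device)  # randomly initialized weights; enough for a shape summary

# check summary; batch_size=-1 is a placeholder meaning any batch size
summary(alexnet, input_size=(3, 224, 224), batch_size=-1, device=device)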

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Conv2d-1           [-1, 64, 55, 55]          23,296
              ReLU-2           [-1, 64, 55, 55]               0
         MaxPool2d-3           [-1, 64, 27, 27]               0
            Conv2d-4          [-1, 192, 27, 27]         307,392
              ReLU-5          [-1, 192, 27, 27]               0
         MaxPool2d-6          [-1, 192, 13, 13]               0
            Conv2d-7          [-1, 384, 13, 13]         663,936
              ReLU-8          [-1, 384, 13, 13]               0
            Conv2d-9          [-1, 256, 13, 13]         884,992
             ReLU-10          [-1, 256, 13, 13]               0
           Conv2d-11          [-1, 256, 13, 13]         590,080
             ReLU-12          [-1, 256, 13, 13]               0
        MaxPool2d-13            [-1, 256, 6, 6]               0
AdaptiveAvgPool2d-14            [-1, 25
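
The spatial sizes in the summary (55, 27, 13, 6) can be checked by hand. The sketch below assumes the usual floor-mode formula `(size + 2 * padding - kernel) // stride + 1` with dilation 1, which matches the `Conv2d` and `MaxPool2d` settings printed earlier; the helper `out_size` and the layer names are illustrative and not part of torchvision.

```python
def out_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a conv or max-pool layer (floor mode, dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1

# (name, kernel, stride, padding) for the spatial layers of `features`,
# copied from the printed architecture; the names are illustrative labels.
layers = [
    ("conv1", 11, 4, 2),
    ("pool1", 3, 2, 0),
    ("conv2", 5, 1, 2),
    ("pool2", 3, 2, 0),
    ("conv3", 3, 1, 1),
    ("conv4", 3, 1, 1),
    ("conv5", 3, 1, 1),
    ("pool3", 3, 2, 0),
]

size = 224  # input height and width used in the summary call
sizes = {}
for name, kernel, stride, padding in layers:
    size = out_size(size, kernel, stride, padding)
    sizes[name] = size
    print(f"{name}: {size} x {size}")
```

The final 6 x 6 map with 256 channels gives 256 * 6 * 6 = 9216 values, which is the `in_features` of the first `Linear` layer.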

In [15]:
from torch.nn.utils import parameters_to_vector as p2v
p2v(alexnet.parameters()).numel()

61100840
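
The total of 61,100,840 can also be reproduced from the layer shapes alone. This is a small sketch, assuming every `Conv2d` and `Linear` layer has a bias term (as the printed architecture shows) and that ReLU, pooling, and dropout layers hold no parameters.

```python
# (in_channels, out_channels, kernel_size) for the five Conv2d layers
convs = [(3, 64, 11), (64, 192, 5), (192, 384, 3), (384, 256, 3), (256, 256, 3)]
# (in_features, out_features) for the three Linear layers
linears = [(9216, 4096), (4096, 4096), (4096, 1000)]

# weights + biases per layer
conv_params = [c_in * c_out * k * k + c_out for c_in, c_out, k in convs]
linear_params = [f_in * f_out + f_out for f_in, f_out in linears]

total = sum(conv_params) + sum(linear_params)
print(conv_params)    # per-layer counts, comparable to the Param # column
print(linear_params)
print(total)
```

Most of the parameters sit in the classifier: the three `Linear` layers account for 58,631,144 of the total, while the five convolutions account for 2,469,696.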

In [21]:
# download an example image from the pytorch website
import urllib.request
url, filename = ("https://github.com/pytorch/hub/raw/master/images/dog.jpg", "dog.jpg")
urllib.request.urlretrieve(url, filename)  # Python 3 API; saves the image as dog.jpg

All pre-trained models expect input images normalized in the same way, i.e. mini-batches of 3-channel RGB images of shape (3 x `H` x `W`), where `H` and `W` are expected to be at least 224. The images have to be loaded into a range of [0, 1] and then normalized per channel using `mean` = [0.485, 0.456, 0.406] and `std` = [0.229, 0.224, 0.225].
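
To make the two steps concrete, here is a minimal sketch on a single RGB pixel, assuming 8-bit input values: first scale to [0, 1] by dividing by 255, then subtract the channel mean and divide by the channel standard deviation. The function `normalize_pixel` and the sample pixel are hypothetical and only illustrate the arithmetic.

```python
MEAN = [0.485, 0.456, 0.406]
STD = [0.229, 0.224, 0.225]

def normalize_pixel(rgb_uint8):
    """Scale an 8-bit RGB pixel to [0, 1], then standardize each channel."""
    scaled = [value / 255.0 for value in rgb_uint8]
    return [(value - mean) / std for value, mean, std in zip(scaled, MEAN, STD)]

pixel = (255, 0, 128)  # hypothetical example pixel
normalized = normalize_pixel(pixel)
print(normalized)
```

After this step the values are no longer confined to [0, 1]: a channel at its maximum maps to a positive number and a channel at zero maps to a negative one.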

In [25]:
from PIL import Image
from torchvision import transforms
input_image = Image.open(filename)

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])

input_tensor = preprocess(input_image)

input_batch = input_tensor.unsqueeze(0)

# move the input and model to GPU for speed if available
if torch.cuda.is_available():
    input_batch = input_batch.to('cuda')
    model.to('cuda')

with torch.inference_mode():
    output = model(input_batch)

# check shape of output
print(f"Shape of output: {output.shape}") # shape [1, 1000]: one row of raw, unnormalized scores over ImageNet's 1000 classes

# run a softmax to turn the scores into probabilities
probabilities = torch.nn.functional.softmax(output[0], dim=0)
print(probabilities)

Shape of output: torch.Size([1, 1000])
tensor([6.8070e-09, 4.5677e-10, 5.7809e-09, 5.2474e-10, 1.4567e-09, 5.0166e-08,
        1.0571e-07, 1.3456e-05, 1.1150e-04, 1.7735e-08, 1.3930e-08, 1.9339e-08,
        2.8034e-08, 4.8585e-09, 7.7248e-09, 1.3523e-09, 2.0293e-08, 1.0262e-07,
        4.3273e-08, 3.1826e-10, 1.2125e-09, 2.6504e-06, 1.1823e-08, 3.5808e-06,
        3.5143e-08, 1.6876e-10, 3.1054e-10, 1.1854e-09, 5.7156e-10, 4.7373e-08,
        1.3152e-09, 4.3327e-11, 3.1284e-10, 5.4636e-10, 4.0191e-09, 1.8491e-09,
        7.4754e-07, 9.7579e-10, 5.9732e-11, 4.2378e-10, 1.2135e-09, 2.1615e-10,
        2.6510e-10, 1.4177e-10, 8.5561e-10, 6.3991e-10, 4.6318e-08, 4.0571e-10,
        1.2357e-10, 1.4798e-10, 2.4508e-09, 1.4211e-09, 6.8810e-09, 1.8810e-10,
        2.2684e-09, 2.5487e-09, 5.6846e-09, 3.4717e-09, 1.7955e-10, 8.1710e-10,
        1.6106e-09, 5.5779e-10, 1.8480e-10, 3.3829e-10, 9.8067e-10, 6.3747e-10,
        6.4716e-10, 3.7525e-10, 1.4146e-09, 1.4597e-11, 8.2853e-09, 7.6292e-10,
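
For reference, the softmax step can be written out in plain Python. This is a sketch of the standard definition, exp(x_i) / sum_j exp(x_j); subtracting the maximum first is a common numerical-stability trick and is an assumption here, not a claim about how PyTorch implements it internally. The input scores are hypothetical.

```python
import math

def softmax(logits):
    """Map raw scores to probabilities that are positive and sum to 1."""
    largest = max(logits)
    # Shifting by the maximum leaves the result unchanged but avoids overflow.
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]  # hypothetical raw scores for three classes
probs = softmax(logits)
print(probs, sum(probs))
```

The ordering of the scores is preserved, so the class with the highest raw score also has the highest probability.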
  

In [None]:
import wget

# download ImageNet labels
# wget.download("https://raw.githubusercontent.com/pytorch/hub/master/imagenet_classes.txt")

In [50]:
# read the categories
with open("imagenet_classes.txt", "r") as f:
    categories = [s.strip() for s in f.readlines()]

# show top categories per image
top5_prob, top5_catid = torch.topk(probabilities, 5) # top `k` largest elements with their indexes
for i in range(top5_prob.size(0)): # 5
    print(categories[top5_catid[i]], top5_prob[i].item())

Samoyed 0.7248545289039612
wallaby 0.13807834684848785
Pomeranian 0.05914163589477539
Angora 0.023098941892385483
Arctic fox 0.01257919892668724


A `40% top-1 error` is equivalent to `60% top-1 accuracy`, since accuracy = 100% - error.
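
The relationship between top-k error and accuracy can be sketched on toy data. The function `topk_error` and the scores and labels below are hypothetical; with only three classes the example uses k = 2, but the same logic applies to top-5 on ImageNet.

```python
def topk_error(scores, labels, k):
    """Fraction of samples whose true label is not among the k highest scores."""
    misses = 0
    for row, label in zip(scores, labels):
        ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        if label not in ranked[:k]:
            misses += 1
    return misses / len(labels)

# hypothetical toy data: 4 samples, 3 classes
scores = [
    [0.7, 0.2, 0.1],  # true class 0: ranked first
    [0.1, 0.6, 0.3],  # true class 2: ranked second
    [0.5, 0.4, 0.1],  # true class 2: ranked last
    [0.2, 0.3, 0.5],  # true class 2: ranked first
]
labels = [0, 2, 2, 2]

top1 = topk_error(scores, labels, 1)
top2 = topk_error(scores, labels, 2)
print(f"top-1 error {top1:.2f}, top-1 accuracy {1 - top1:.2f}")
print(f"top-2 error {top2:.2f}, top-2 accuracy {1 - top2:.2f}")
```

Top-k error can never exceed top-1 error, because a correct top-1 prediction is also inside the top k.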

### Model Description

AlexNet competed in the ImageNet Large Scale Visual Recognition Challenge on September 30, 2012. The network achieved a top-5 error of 15.3%, more than 10.8 percentage points lower than that of the runner-up. The original paper's primary result was that the depth of the model was essential for its high performance. That depth was computationally expensive, but training on graphics processing units (GPUs) made it feasible.

The 1-crop error rates on the ImageNet dataset with the pretrained model are listed below.

| Model structure | Top-1 error (%) | Top-5 error (%) |
| --------------- | --------------- | --------------- |
| alexnet         | 43.45           | 20.91           |

### References

1. [One weird trick for parallelizing convolutional neural networks](https://arxiv.org/abs/1404.5997).
2. [ImageNet: what is top-1 and top-5 error rate?](https://stats.stackexchange.com/questions/156471/imagenet-what-is-top-1-and-top-5-error-rate)