# AlexNet

The 2012 paper by Alex Krizhevsky et al., [ImageNet Classification with Deep Convolutional Neural Networks](https://papers.nips.cc/paper/2012/file/c399862d3b9d6b76c8436e924a68c45b-Paper.pdf), is widely credited with sparking renewed interest in neural networks and beginning the dominance of deep learning in many computer vision applications. The paper describes a model, later referred to as **AlexNet**, designed for the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), which involves classifying photographs of objects into one of 1,000 different categories. The paper reports results on the ILSVRC-2010 data, and a variant of the model won the ILSVRC-2012 competition.

The ILSVRC was a competition designed to spur innovation in the field of computer vision. Before the development of AlexNet, the task was thought to be very difficult and far beyond the capability of the computer vision methods of the time.

> AlexNet successfully demonstrated the capability of the convolutional neural network model in the domain and kindled a fire that resulted in many more improvements and innovations, many demonstrated on the same ILSVRC task in subsequent years.

More broadly, **the paper showed that it is possible to develop deep and effective end-to-end models** for a challenging problem without using unsupervised pre-training techniques popular at the time.

Central to the design of AlexNet was a suite of methods that were either new or already proven but not yet widely adopted. They have since become standard practice when using CNNs for image classification.

> AlexNet used the rectified linear activation function, or ReLU, as the nonlinearity after each convolutional layer, instead of S-shaped functions such as the logistic (sigmoid) or tanh that were common up until that point. A softmax activation function was used in the output layer, now a staple for multiclass classification with neural networks.
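
To make the difference between these activations concrete, here is a minimal pure-Python sketch. The function names (`relu`, `sigmoid`, `softmax`, and the two gradient helpers) are illustrative and not taken from the paper or from Keras. It shows that the gradient of the sigmoid shrinks toward zero for large inputs, while the ReLU gradient stays at 1 for any positive input, and that softmax turns raw scores into probabilities.

```python
import math

def relu(x):
    """Rectified linear unit: passes positive values, zeroes out the rest."""
    return max(0.0, x)

def relu_grad(x):
    """Gradient of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if x > 0 else 0.0

def sigmoid(x):
    """Logistic (S-shaped) activation."""
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    """Gradient of the sigmoid; it vanishes as |x| grows (saturation)."""
    s = sigmoid(x)
    return s * (1.0 - s)

def softmax(scores):
    """Turn a list of raw scores into probabilities that sum to 1."""
    shift = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

for x in (-5.0, 0.5, 5.0):
    print(f"x={x:+.1f}  relu'={relu_grad(x):.3f}  sigmoid'={sigmoid_grad(x):.3f}")

probs = softmax([2.0, 1.0, 0.1])
print("softmax:", [round(p, 3) for p in probs])
```

The non-saturating gradient is one common explanation for why ReLU networks train faster than their sigmoid or tanh counterparts.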

The average pooling used in LeNet-5 was replaced with max pooling. In this case, overlapping pooling (a 3x3 window with a stride of 2) was found to outperform the non-overlapping pooling that is commonly used today, in which the stride equals the size of the pooling window (e.g., 2x2 pixels with a stride of 2). The newly proposed dropout method was used in the fully connected layers of the classifier part of the model to reduce overfitting and improve generalization.

The architecture of AlexNet is deep and extends some of the patterns established with LeNet-5. The image below, taken from the paper, summarizes the model architecture, in this case split into two pipelines so that it could be trained on two GPUs of the time.

<img width="800" src="https://drive.google.com/uc?export=view&id=111DrLxQn-ejJ2-7zNA4M8t78wmqcMQNe"/>


**It is similar to LeNet-5, only much larger and deeper**, and it was the first to stack convolutional layers directly on top of one another, instead of stacking a pooling layer on top of each convolutional layer. The table below presents this architecture.

<center><img width="600" src="https://drive.google.com/uc?export=view&id=193aOD83q_m_apxqjv1kFSjGRYFV7HfL2"></center><center>AlexNet Architecture.</center>


The model has **five convolutional layers** in the **feature extraction part** of the model and **three fully connected layers** in the **classifier part** of the model. Input images were fixed to the size **227x227 (there is a typo in the original paper using 224 x 224) with three color channels**. In terms of the number of **filters** used in each convolutional layer, the pattern of increasing the number of filters with depth seen in LeNet was mainly adhered to; in this case, the numbers of filters are **96, 256, 384, 384, and 256**. Similarly, the **pattern of decreasing the size of the filter** (kernel) with depth was used, starting from the larger size of 11x11 and decreasing to 5x5, and then to 3x3 in the deeper layers. **The use of small filters such as 5x5 and 3x3 is now the norm**.
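
The effect of these kernel sizes on the feature map dimensions can be traced with the usual output-size formula, `floor((n + 2p - k) / s) + 1`. The sketch below is an illustration under stated assumptions: the layer names are made up for readability, the pooling layers use a 3x3 window with a stride of 2, and the padding values of 2 and 1 stand in for "same" padding on the 5x5 and 3x3 convolutions. It also suggests why 227 is the consistent input size: the first 11x11, stride-4 convolution tiles a 227-pixel input exactly, but not a 224-pixel one.

```python
def output_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1

# (name, kernel, stride, padding) for the feature extraction part.
# Padding of 2 and 1 mimics "same" padding for the 5x5 and 3x3 kernels.
LAYERS = [
    ("conv1", 11, 4, 0),
    ("pool1", 3, 2, 0),
    ("conv2", 5, 1, 2),
    ("pool2", 3, 2, 0),
    ("conv3", 3, 1, 1),
    ("conv4", 3, 1, 1),
    ("conv5", 3, 1, 1),
    ("pool3", 3, 2, 0),
]

def trace_sizes(input_size):
    """Return (layer name, output width) for every layer in LAYERS."""
    sizes = []
    size = input_size
    for name, kernel, stride, padding in LAYERS:
        size = output_size(size, kernel, stride, padding)
        sizes.append((name, size))
    return sizes

for name, size in trace_sizes(227):
    print(f"{name}: {size}x{size}")

# Remainder of (input - kernel) / stride for the first convolution:
# zero means the 11x11 kernel with stride 4 fits the input exactly.
print("227:", (227 - 11) % 4, " 224:", (224 - 11) % 4)
```

With 256 filters in the last convolutional layer, the final 6x6 feature maps flatten to 6 * 6 * 256 = 9216 values feeding the first fully connected layer.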

The pattern of a convolutional layer followed by a pooling layer was used at the start and end of the feature detection part of the model. Interestingly, **a pattern of a convolutional layer followed immediately by a second convolutional layer** was used. **This pattern too has become a modern standard**.

To reduce overfitting, the authors used **two regularization techniques**. First, they applied dropout with a 50% dropout rate during training to the outputs of layers F9 and F10. Second, they performed **data augmentation** by randomly shifting the training images by various offsets, flipping them horizontally, and changing the lighting conditions.
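
As a rough illustration of the first technique, the sketch below implements dropout on a plain list of activations. It is an assumption-laden example, not the authors' code: it uses the "inverted" formulation found in modern frameworks, which rescales the surviving activations during training, whereas the original paper instead multiplied the outputs by 0.5 at test time. The function name `dropout` and its parameters are hypothetical.

```python
import random

def dropout(activations, rate=0.5, training=True, rng=None):
    """Inverted dropout: zero each activation with probability `rate`
    and scale the survivors by 1 / (1 - rate) so the expected value
    is unchanged. At inference time the input is returned as-is."""
    if not training or rate == 0.0:
        return list(activations)
    rng = rng or random.Random()
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

rng = random.Random(0)          # fixed seed so the run is repeatable
acts = [1.0] * 10000            # dummy activations
dropped = dropout(acts, rate=0.5, rng=rng)

zeroed = sum(1 for a in dropped if a == 0.0)
print("fraction zeroed:", zeroed / len(dropped))
print("mean after dropout:", sum(dropped) / len(dropped))
```

Roughly half of the activations are zeroed on each call, so the network cannot rely on any single neuron being present.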

**Data augmentation** artificially increases the size of the training set by generating many realistic variants of each training instance. **This reduces overfitting**, making this a regularization technique. The generated instances should be as realistic as possible: ideally, given an image from the augmented training set, a human should not be able to tell whether it was augmented or not. Simply adding white noise will not help; the modifications should be learnable (white noise is not).
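
The three kinds of modification mentioned above (shifting, horizontal flipping, and lighting changes) can be sketched with NumPy as follows. This is a simplified illustration, not the paper's procedure: the paper extracted random crops from larger images and altered RGB intensities with a PCA-based scheme, while this sketch uses edge padding plus a random crop for the shift and a single brightness factor for the lighting. The function name `augment` and its parameters are hypothetical.

```python
import numpy as np

def augment(image, rng, max_shift=4, max_brightness=0.2):
    """Return a randomly shifted, flipped, and re-lit copy of `image`.

    `image` is a float array of shape (height, width, channels) in [0, 1].
    """
    h, w, _ = image.shape

    # Random shift: pad the borders, then crop a window of the original size.
    pad = ((max_shift, max_shift), (max_shift, max_shift), (0, 0))
    padded = np.pad(image, pad, mode="edge")
    top = rng.integers(0, 2 * max_shift + 1)
    left = rng.integers(0, 2 * max_shift + 1)
    out = padded[top:top + h, left:left + w]

    # Random horizontal flip with probability 0.5.
    if rng.random() < 0.5:
        out = out[:, ::-1]

    # Random lighting change: scale all pixels by a factor near 1.
    factor = 1.0 + rng.uniform(-max_brightness, max_brightness)
    return np.clip(out * factor, 0.0, 1.0)

rng = np.random.default_rng(42)
image = rng.random((32, 32, 3))   # stand-in for a real training image
variants = [augment(image, rng) for _ in range(4)]
print([v.shape for v in variants])
```

Each call produces a different variant of the same shape, so the augmentation can be applied on the fly during training without storing extra images.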

AlexNet also uses a competitive normalization step immediately after the ReLU step of layers C1 and C3, called **Local Response Normalization (LRN)**: the most strongly activated neurons inhibit other neurons located at the same position in neighboring feature maps (such competitive activation has been observed in biological neurons). This encourages different feature maps to specialize, pushing them apart and forcing them to explore a wider range of features, ultimately improving generalization.
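
A minimal sketch of this normalization, applied to the activations of all feature maps at a single spatial position, is shown below. The default hyperparameters (`n=5`, `k=2`, `alpha=1e-4`, `beta=0.75`) are the values reported in the paper; the function name and the example activations are made up for illustration. Each activation is divided by `(k + alpha * sum of squares over the n neighboring maps) ** beta`.

```python
def local_response_norm(channels, n=5, k=2.0, alpha=1e-4, beta=0.75):
    """Normalize each activation by the squared activity of its
    n neighboring feature maps at the same spatial position."""
    num_maps = len(channels)
    half = n // 2
    normalized = []
    for i, a in enumerate(channels):
        lo = max(0, i - half)                 # clip the window at the edges
        hi = min(num_maps - 1, i + half)
        sum_sq = sum(channels[j] ** 2 for j in range(lo, hi + 1))
        normalized.append(a / (k + alpha * sum_sq) ** beta)
    return normalized

# One strongly activated map (index 2) among weak neighbors.
acts = [1.0, 2.0, 200.0, 2.0, 1.0, 1.0, 1.0, 1.0]
normed = local_response_norm(acts)

for i, (a, b) in enumerate(zip(acts, normed)):
    print(f"map {i}: {a:7.1f} -> {b:8.4f}  (scale {b / a:.4f})")
```

Maps whose window includes the strong activation are scaled down more than maps far away from it, which is the inhibition effect described above.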



We can summarize the key aspects of the architecture relevant in modern models as follows:

- Use of the ReLU activation function after convolutional layers and softmax for the output layer.
- Use of Max Pooling instead of Average Pooling.
- Use of Dropout regularization between the fully connected layers.
- The pattern of a convolutional layer fed directly to another convolutional layer.
- Use of Data Augmentation.


> A variant of AlexNet called [ZF Net](https://arxiv.org/abs/1311.2901) was developed by Matthew Zeiler and Rob Fergus and won the 2013 ILSVRC challenge. It is essentially AlexNet with a few tweaked hyperparameters (number of feature maps, kernel size, stride, etc.).

## Step 01: Setup

Start out by installing the experiment tracking library and setting up your free W&B account:


*   **pip install wandb** – Install the W&B library
*   **import wandb** – Import the wandb library
*   **wandb login** – Login to your W&B account so you can log all your metrics in one place

In [None]:
!pip install wandb -qU

In [None]:
# a Python package for tracking the carbon emissions produced by various
# kinds of computer programs, from straightforward algorithms to deep neural networks.
!pip install codecarbon

### Import Packages

In [None]:
# import the necessary packages
import logging
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import BatchNormalization
from tensorflow.keras.layers import Conv2D
from tensorflow.keras.layers import MaxPooling2D
from tensorflow.keras.layers import Activation
from tensorflow.keras.layers import Flatten
from tensorflow.keras.layers import Dropout
from tensorflow.keras.layers import Dense
from tensorflow.keras.regularizers import l2
from tensorflow.keras import backend as K
import matplotlib.pyplot as plt
import numpy as np
from codecarbon import EmissionsTracker
from tensorflow.keras.callbacks import Callback
from wandb.keras import WandbCallback
import wandb

In [None]:
wandb.login()

In [None]:
# configure logging
# reference for a logging obj
logger = logging.getLogger()

# set level of logging
logger.setLevel(logging.INFO)

# create handlers
c_handler = logging.StreamHandler()
c_format = logging.Formatter(fmt="%(asctime)s %(message)s",datefmt='%d-%m-%Y %H:%M:%S')
c_handler.setFormatter(c_format)

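# replace any existing handlers with ours; this avoids duplicate log lines
# and an IndexError when the root logger has no handler yet
logger.handlers.clear()
logger.addHandler(c_handler)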

## Implementing the AlexNet model

<center><img width="400" src="https://drive.google.com/uc?export=view&id=193aOD83q_m_apxqjv1kFSjGRYFV7HfL2"></center><center>AlexNet Architecture.</center>


In [None]:
class AlexNet:
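  '''
  AlexNet-style model.

  The network is composed of 9 layers:
       - 2 blocks CONV => RELU => POOL
       - 3 blocks CONV => RELU (the last one followed by POOL)
       - 1 flatten layer
       - 2 fully connected layers
       - 1 output layer with `classes` outputs (1000 for ImageNet)
       - input shape = (227,227,3)

  Unlike the 2012 paper, this version adds batch normalization after
  each CONV and FC layer and dropout after each pooling layer.
  '''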
  @staticmethod
  def build(width, height, depth, classes):
    # initialize the model
    model = Sequential()
    inputShape = (height, width, depth)
    
    # if we are using "channels first", update the input shape
    if K.image_data_format() == "channels_first":
      inputShape = (depth, height, width)
   
    # Block #1: first CONV => RELU => POOL layer set
    model.add(Conv2D(96, (11, 11), strides=(4, 4),
                    input_shape=inputShape, padding="valid",
                    kernel_regularizer=l2(0.0002),activation='relu'))

    # Batch Normalization did not exist in 2012; it is a modification of the original proposal
    model.add(BatchNormalization())
    model.add(MaxPooling2D(pool_size=(3, 3), strides=(2, 2)))
    # In the original paper dropout was used only in FC layers
    model.add(Dropout(0.25))

    # Block #2: second CONV => RELU => POOL layer set
    model.add(Conv2D(256, (5, 5), padding="same",
                    kernel_regularizer=l2(0.0002),activation='relu'))
    model.add(BatchNormalization())
    model.add(MaxPooling2D(pool_size=(3, 3), strides=(2, 2)))
    model.add(Dropout(0.25))

    # Block #3: CONV => RELU => CONV => RELU => CONV => RELU
    model.add(Conv2D(384, (3, 3), padding="same",
                    kernel_regularizer=l2(0.0002),activation='relu'))
    model.add(BatchNormalization())
    model.add(Conv2D(384, (3, 3), padding="same",
                    kernel_regularizer=l2(0.0002),activation='relu'))
    model.add(BatchNormalization())
    model.add(Conv2D(256, (3, 3), padding="same",
                    kernel_regularizer=l2(0.0002),activation='relu'))
    model.add(BatchNormalization())
    model.add(MaxPooling2D(pool_size=(3, 3), strides=(2, 2)))
    model.add(Dropout(0.25))

    # Block #4: first set of FC => RELU layers
    model.add(Flatten())
    model.add(Dense(4096, kernel_regularizer=l2(0.0002),activation='relu'))
    model.add(BatchNormalization())
    model.add(Dropout(0.5))

    # Block #5: second set of FC => RELU layers
    model.add(Dense(4096, kernel_regularizer=l2(0.0002),activation='relu'))
    model.add(BatchNormalization())
    model.add(Dropout(0.5))

    # softmax classifier
    model.add(Dense(classes, kernel_regularizer=l2(0.0002)))
    model.add(Activation("softmax"))
        
    # return the constructed network architecture
    return model

In [None]:
# create a model object
model = AlexNet.build(width=227, height=227, depth=3, classes=1000)

# summarize layers
model.summary()
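
As a cross-check on the summary, the parameter counts can be worked out by hand. The sketch below is a back-of-the-envelope calculation that assumes the Keras layer definitions above (ungrouped convolutions, a bias term in every layer, and four parameters per channel in each batch normalization layer: two trainable and two moving statistics). The helper names are illustrative.

```python
def conv_params(kernel, in_channels, out_channels):
    """Weights plus biases of a square 2D convolution."""
    return kernel * kernel * in_channels * out_channels + out_channels

def dense_params(in_units, out_units):
    """Weights plus biases of a fully connected layer."""
    return in_units * out_units + out_units

def batchnorm_params(channels):
    """gamma and beta (trainable) plus moving mean and variance."""
    return 4 * channels

counts = {
    "conv1": conv_params(11, 3, 96),
    "conv2": conv_params(5, 96, 256),
    "conv3": conv_params(3, 256, 384),
    "conv4": conv_params(3, 384, 384),
    "conv5": conv_params(3, 384, 256),
    "fc1": dense_params(6 * 6 * 256, 4096),   # flattened 6x6x256 feature maps
    "fc2": dense_params(4096, 4096),
    "output": dense_params(4096, 1000),
}

# One batch normalization layer follows each CONV and FC layer (not the output).
bn_channels = [96, 256, 384, 384, 256, 4096, 4096]
counts["batchnorm (all)"] = sum(batchnorm_params(c) for c in bn_channels)

total = sum(counts.values())
for name, n in counts.items():
    print(f"{name:16s} {n:>12,d}")
print(f"{'total':16s} {total:>12,d}")
```

The first fully connected layer dominates the total, which helps explain why dropout was applied to the fully connected part of the network.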