![MLA Logo](https://drive.corp.amazon.com/view/mrruckma@/MLA_headerv2.png?download=true)

#### In this example, we will do an advanced task: adding new layers to a pre-trained network. We will add another fully connected layer (along with its dropout) and the final softmax layer to a pre-trained AlexNet.

#### In this section, we will first create "our own" AlexNet. Then, we will copy the weights from a pre-trained AlexNet by matching the names of the layers. Be careful with the layer names in the pre-trained network and in ours: we match layers by name, and their shapes must match as well.
#### AlexNet has the following architecture. Let's start building it below.
![alexnet](https://drive.corp.amazon.com/view/cesazara@/cv-notebook-images/alexnet.png?download=true)

#### We will implement a custom AlexNet in which we add another fully connected layer with 4096 neurons. Our custom network will look like this:
![custom alexnet](https://drive.corp.amazon.com/view/cesazara@/cv-notebook-images/custom_alexnet.png?download=true)

In [1]:
import mxnet as mx
from mxnet import gluon
from mxnet.gluon import nn
from mxnet.gluon.model_zoo import vision
import mxnet.ndarray as nd

# Set this to GPU or CPU
# ctx = mx.gpu()
ctx = mx.cpu()

# Define the custom model. Unlike the original AlexNet, it will later get an extra fully connected layer with dropout.
alex_net = gluon.nn.Sequential()
alex_net.add(nn.Conv2D(64, kernel_size=11, strides=4, padding=2, activation='relu'))
alex_net.add(nn.MaxPool2D(pool_size=3, strides=2))
alex_net.add(nn.Conv2D(192, kernel_size=5, padding=2, activation='relu'))
alex_net.add(nn.MaxPool2D(pool_size=3, strides=2))
alex_net.add(nn.Conv2D(384, kernel_size=3, padding=1, activation='relu'))
alex_net.add(nn.Conv2D(256, kernel_size=3, padding=1, activation='relu'))
alex_net.add(nn.Conv2D(256, kernel_size=3, padding=1, activation='relu'))
alex_net.add(nn.MaxPool2D(pool_size=3, strides=2))
alex_net.add(nn.Flatten())
alex_net.add(nn.Dense(4096, activation='relu'))
alex_net.add(nn.Dropout(0.5))
alex_net.add(nn.Dense(4096, activation='relu'))
alex_net.add(nn.Dropout(0.5))

# Parameters must be initialized before their values can be changed.
# Gluon initializes lazily, so pass an example batch through the network to allocate them.
alex_net.initialize(ctx=ctx)
alex_net(nd.random.uniform(shape=(10, 3, 224, 224)).as_in_context(ctx))

# load pretrained model
pretrained_alex_net = vision.alexnet(pretrained=True, ctx=ctx)

# create parameter dictionaries
model_params = {name: param for name, param in alex_net.collect_params().items()}
pretrained_model_params = {name: param for name, param in pretrained_alex_net.collect_params().items()}

### Let's print our custom model

In [2]:
alex_net

Sequential(
  (0): Conv2D(3 -> 64, kernel_size=(11, 11), stride=(4, 4), padding=(2, 2), Activation(relu))
  (1): MaxPool2D(size=(3, 3), stride=(2, 2), padding=(0, 0), ceil_mode=False, global_pool=False, pool_type=max, layout=NCHW)
  (2): Conv2D(64 -> 192, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), Activation(relu))
  (3): MaxPool2D(size=(3, 3), stride=(2, 2), padding=(0, 0), ceil_mode=False, global_pool=False, pool_type=max, layout=NCHW)
  (4): Conv2D(192 -> 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), Activation(relu))
  (5): Conv2D(384 -> 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), Activation(relu))
  (6): Conv2D(256 -> 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), Activation(relu))
  (7): MaxPool2D(size=(3, 3), stride=(2, 2), padding=(0, 0), ceil_mode=False, global_pool=False, pool_type=max, layout=NCHW)
  (8): Flatten
  (9): Dense(9216 -> 4096, Activation(relu))
  (10): Dropout(p = 0.5, axes=())
  (11): Dense(4096 -> 4096, Activation(relu))
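
The first dense layer reports `9216` input features. That number follows from the input size and the convolution and pooling settings. The sketch below reproduces the arithmetic in plain Python, assuming a 224x224 input (as in the example batch) and the standard floor-mode output-size formula; the helper name `out_size` is made up for illustration.

```python
def out_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer (floor mode)."""
    return (size + 2 * padding - kernel) // stride + 1

# (kernel, stride, padding) of each spatial layer, in the order printed above.
layers = [
    (11, 4, 2),  # Conv2D 3 -> 64
    (3, 2, 0),   # MaxPool2D
    (5, 1, 2),   # Conv2D 64 -> 192
    (3, 2, 0),   # MaxPool2D
    (3, 1, 1),   # Conv2D 192 -> 384
    (3, 1, 1),   # Conv2D 384 -> 256
    (3, 1, 1),   # Conv2D 256 -> 256
    (3, 2, 0),   # MaxPool2D
]

size = 224  # input height and width used in the example batch
for kernel, stride, padding in layers:
    size = out_size(size, kernel, stride, padding)

channels = 256  # output channels of the last convolution
flattened = channels * size * size
print(size, flattened)  # 6 9216
```

A different input resolution would change the flattened size, and the pre-trained `dense0` weights would then no longer match in shape.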


### Let's print the pretrained model

In [3]:
pretrained_alex_net

AlexNet(
  (features): HybridSequential(
    (0): Conv2D(3 -> 64, kernel_size=(11, 11), stride=(4, 4), padding=(2, 2), Activation(relu))
    (1): MaxPool2D(size=(3, 3), stride=(2, 2), padding=(0, 0), ceil_mode=False, global_pool=False, pool_type=max, layout=NCHW)
    (2): Conv2D(64 -> 192, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), Activation(relu))
    (3): MaxPool2D(size=(3, 3), stride=(2, 2), padding=(0, 0), ceil_mode=False, global_pool=False, pool_type=max, layout=NCHW)
    (4): Conv2D(192 -> 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), Activation(relu))
    (5): Conv2D(384 -> 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), Activation(relu))
    (6): Conv2D(256 -> 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), Activation(relu))
    (7): MaxPool2D(size=(3, 3), stride=(2, 2), padding=(0, 0), ceil_mode=False, global_pool=False, pool_type=max, layout=NCHW)
    (8): Flatten
    (9): Dense(9216 -> 4096, Activation(relu))
    (10): Dropout(p = 0.5, axes

#### We will match the layers by their names. Let's print them below.
#### Pay attention to the naming difference between the pre-trained and custom models: parameter names in the pre-trained network carry an extra "alexnet0_" prefix.

In [7]:
print("Pre-trained Alexnet Parameters")
print(pretrained_model_params)
print("Custom Alexnet Parameters")
print(model_params)

Pre-trained Alexnet Parameters
{'alexnet0_conv0_weight': Parameter alexnet0_conv0_weight (shape=(64, 3, 11, 11), dtype=<class 'numpy.float32'>), 'alexnet0_conv0_bias': Parameter alexnet0_conv0_bias (shape=(64,), dtype=<class 'numpy.float32'>), 'alexnet0_conv1_weight': Parameter alexnet0_conv1_weight (shape=(192, 64, 5, 5), dtype=<class 'numpy.float32'>), 'alexnet0_conv1_bias': Parameter alexnet0_conv1_bias (shape=(192,), dtype=<class 'numpy.float32'>), 'alexnet0_conv2_weight': Parameter alexnet0_conv2_weight (shape=(384, 192, 3, 3), dtype=<class 'numpy.float32'>), 'alexnet0_conv2_bias': Parameter alexnet0_conv2_bias (shape=(384,), dtype=<class 'numpy.float32'>), 'alexnet0_conv3_weight': Parameter alexnet0_conv3_weight (shape=(256, 384, 3, 3), dtype=<class 'numpy.float32'>), 'alexnet0_conv3_bias': Parameter alexnet0_conv3_bias (shape=(256,), dtype=<class 'numpy.float32'>), 'alexnet0_conv4_weight': Parameter alexnet0_conv4_weight (shape=(256, 256, 3, 3), dtype=<class 'numpy.float32'>), '

#### Let's transfer the weights by matching the names of the parameters below. Any parameter without a same-named, same-shaped counterpart in the pre-trained network (such as a newly added layer) is reported as a mismatch.

In [4]:
for name, param in model_params.items():
    lookup_name = 'alexnet0_' + name
    if lookup_name in pretrained_model_params:
        lookup_param = pretrained_model_params[lookup_name]
        if lookup_param.shape == param.shape:
            param.set_data(lookup_param.data())
            print("Successful match for {}.".format(name))
        else:
            print("Error: Shape mismatch for {}. {}!={}".format(name, lookup_param.shape, param.shape))
    else:
        print("Error: Couldn't find match for {}.".format(name))

Successful match for conv0_weight.
Successful match for conv0_bias.
Successful match for conv1_weight.
Successful match for conv1_bias.
Successful match for conv2_weight.
Successful match for conv2_bias.
Successful match for conv3_weight.
Successful match for conv3_bias.
Successful match for conv4_weight.
Successful match for conv4_bias.
Successful match for dense0_weight.
Successful match for dense0_bias.
Successful match for dense1_weight.
Successful match for dense1_bias.
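
To make the three possible outcomes of the matching loop concrete, here is a minimal sketch that uses plain dictionaries of name to shape in place of the real parameter dictionaries. The function `match_by_name` and the sample entries (including the deliberately wrong `conv0_bias` shape and the `dense5_weight` name) are hypothetical and only illustrate the logic.

```python
def match_by_name(custom_shapes, pretrained_shapes, prefix):
    """Classify each custom parameter as matched, shape-mismatched, or missing."""
    matched, mismatched, missing = [], [], []
    for name, shape in custom_shapes.items():
        lookup_name = prefix + name
        if lookup_name not in pretrained_shapes:
            missing.append(name)
        elif pretrained_shapes[lookup_name] != shape:
            mismatched.append(name)
        else:
            matched.append(name)
    return matched, mismatched, missing

# Hypothetical stand-ins: name -> shape (the real dictionaries hold Parameter objects).
pretrained_shapes = {
    'alexnet0_conv0_weight': (64, 3, 11, 11),
    'alexnet0_conv0_bias': (64,),
    'alexnet0_dense0_weight': (4096, 9216),
}
custom_shapes = {
    'conv0_weight': (64, 3, 11, 11),  # same name after the prefix, same shape
    'conv0_bias': (32,),              # same name after the prefix, different shape
    'dense5_weight': (4096, 4096),    # no counterpart in the pre-trained model
}

matched, mismatched, missing = match_by_name(custom_shapes, pretrained_shapes, 'alexnet0_')
print(matched)     # ['conv0_weight']
print(mismatched)  # ['conv0_bias']
print(missing)     # ['dense5_weight']
```

One caveat, stated as an assumption about MXNet 1.x naming: layer names such as `conv0` and `dense0` appear to come from counters that keep increasing within a session, so re-running the model definition cell without restarting the kernel would likely produce names like `conv5_weight`, and the lookup would then fail for every parameter.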


#### In this part, we will add the remainder of the network

In [5]:
# The first two fully connected layers (with dropout) are already in the network,
# so add only the extra fully connected layer, its dropout, and the 5-class output layer.
alex_net.add(nn.Dense(4096, activation='relu'))
alex_net.add(nn.Dropout(0.5))
alex_net.add(nn.Dense(5))
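
The final `nn.Dense(5)` layer has no activation, so it outputs one raw score (logit) per class for 5 classes. The softmax step turns these scores into probabilities. In Gluon it is commonly applied inside the loss, since `gluon.loss.SoftmaxCrossEntropyLoss` applies softmax to its input by default. The plain-Python sketch below shows what softmax computes; the logit values are made up for illustration.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1, -1.0, 0.5]  # hypothetical outputs of the 5-unit layer
probs = softmax(logits)
predicted_class = probs.index(max(probs))
print(predicted_class)  # 0
```

Because softmax preserves the ordering of the scores, the predicted class can also be read directly from the largest logit.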

In [6]:
alex_net

Sequential(
  (0): Conv2D(3 -> 64, kernel_size=(11, 11), stride=(4, 4), padding=(2, 2), Activation(relu))
  (1): MaxPool2D(size=(3, 3), stride=(2, 2), padding=(0, 0), ceil_mode=False, global_pool=False, pool_type=max, layout=NCHW)
  (2): Conv2D(64 -> 192, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), Activation(relu))
  (3): MaxPool2D(size=(3, 3), stride=(2, 2), padding=(0, 0), ceil_mode=False, global_pool=False, pool_type=max, layout=NCHW)
  (4): Conv2D(192 -> 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), Activation(relu))
  (5): Conv2D(384 -> 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), Activation(relu))
  (6): Conv2D(256 -> 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), Activation(relu))
  (7): MaxPool2D(size=(3, 3), stride=(2, 2), padding=(0, 0), ceil_mode=False, global_pool=False, pool_type=max, layout=NCHW)
  (8): Flatten
  (9): Dense(9216 -> 4096, Activation(relu))
  (10): Dropout(p = 0.5, axes=())
  (11): Dense(4096 -> 4096, Activation(relu))
