# Selection question: AlexNet

## Implementation and optimization of a convolutional neural network

Implement a classic convolutional neural network: AlexNet. **Improve its efficiency as much as possible.** Your work will be evaluated on four aspects: *performance, algorithm design, software architecture, and readability.*

- The neural network should be implemented by hand; open-source frameworks (Caffe, TensorFlow, Theano, Torch) should not be used directly.
- Both the inference and the training algorithms should be implemented, and performance should be carefully considered for both.
- Project deadline: 2016-12-25

## Getting started

To get off to an easy start, I copied some useful code from https://www.zybuluo.com/hanbingtao/note/476663 and https://www.zybuluo.com/hanbingtao/note/485480, including:
- class Layer, class Node, class ConstNode, class Connection in fc_layer_hbt.py
- class ConvLayer in conv_layer.py

With the code above, we can now:
- Construct a fully connected layer instance with class Layer (these classes do not include a forward calculation function)
- Initialize a convolutional layer with class ConvLayer, which includes both the forward computation method and the backpropagation algorithm

To reach the goal stated at the beginning, several tasks remain:
- Design a network class that combines convolutional layers and fully connected layers. Both the train and predict functions should be implemented.
- Optimize the training and prediction procedures to improve the performance of the network.

## Software architecture

### Inversion of Control

Before designing the network class, it is worth taking a look at **alexnet.py** in TensorFlow:

In [None]:
net = slim.conv2d(inputs, 64, [11, 11], 4, padding='VALID', scope='conv1')
net = slim.max_pool2d(net, [3, 3], 2, scope='pool1')
net = slim.conv2d(net, 192, [5, 5], scope='conv2')
net = slim.max_pool2d(net, [3, 3], 2, scope='pool2')
net = slim.conv2d(net, 384, [3, 3], scope='conv3')
net = slim.conv2d(net, 384, [3, 3], scope='conv4')
net = slim.conv2d(net, 256, [3, 3], scope='conv5')
net = slim.max_pool2d(net, [3, 3], 2, scope='pool5')

To make network construction easier to read, this code applies the Inversion of Control (IoC) principle: the instance net is passed to the constructor function of each layer. My network class borrows this idea to make the code more readable:

In [2]:
class Network(object):
    def __init__(self):
        self.layers = []

    def append_layer(self, layer):
        self.layers.append(layer)

The member variable **layers** holds the layers of the network. Each layer's constructor invokes the method *append_layer* to append the layer itself to the network.
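
As a minimal sketch of this self-registration pattern, the snippet below repeats the `Network` class and adds a `DummyLayer` whose constructor appends itself to the network. `DummyLayer` and its `name` argument are hypothetical and exist only for this illustration; they are not part of the project code.

```python
class Network(object):
    def __init__(self):
        self.layers = []

    def append_layer(self, layer):
        self.layers.append(layer)


class DummyLayer(object):
    """Hypothetical placeholder layer, used only to illustrate registration."""

    def __init__(self, network, name):
        self.name = name
        # The layer registers itself; the caller never touches network.layers.
        network.append_layer(self)


net = Network()
DummyLayer(net, 'conv1')
DummyLayer(net, 'pool1')
DummyLayer(net, 'fc1')

# Layers end up in the network in construction order.
layer_names = [layer.name for layer in net.layers]
print(layer_names)
```

With this pattern the code that builds the network reads top to bottom like the slim example, one line per layer.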

### Training methods

The training task of a neural network can be divided into three subtasks:
- Forward calculation, or model prediction: given an input sample, calculate the output vector of the NN model
- Backpropagation-1: calculate $\delta$ for each layer. Here $\delta_{l}$ is the partial derivative of the loss function $E_{d}$ with respect to the weighted input vector $net_{l}$: $\delta_{l}=\frac{\partial E_{d}}{\partial net_{l}}$
- Backpropagation-2: calculate the gradient matrix $\nabla$ for each layer, then update the weight matrix $W$.

In [3]:
    def train_one_sample(self, label, sample, rate):
        """
        Train the network with one sample: forward pass,
        delta calculation, then weight update.
        """
        self.predict(sample)
        self.calc_delta(label)
        self.update_weight(rate)

    def calc_delta(self, label):
        """
        Calculate the delta of each layer, from the output layer backwards.
        """
        output_layer = self.layers[-1]
        output_layer.calc_output_layer_delta(label)
        downstream_layer = output_layer
        # Walk the remaining layers in reverse order
        for layer in self.layers[-2::-1]:
            layer.calc_layer_delta(downstream_layer)
            downstream_layer = layer

    def update_weight(self, rate):
        """
        Update the weights of each connection or filter.
        """
        for layer in self.layers:
            layer.update_weight(rate)

    def predict(self, sample):
        """
        Predict the output for the given input.
        """
        self.layers[0].forward(sample)
        # Feed each layer with the output of the previous one
        for i in range(1, len(self.layers)):
            self.layers[i].forward(self.layers[i-1].get_output())
        return self.layers[-1].get_output()

Each layer class must therefore implement a common set of interface methods: *forward*, *get_output*, *calc_layer_delta* and *update_weight*, plus *calc_output_layer_delta* for the output layer.
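
One way to make this contract explicit is a small base class that every layer inherits from. The sketch below is an assumption, not project code: `BaseLayer` and `IdentityLayer` are hypothetical names, and the method list is taken from the calls made in this notebook (including `get_transformed_delta`, which the fully connected layer calls on its downstream layer).

```python
class BaseLayer(object):
    """Hypothetical base class listing the methods the network relies on."""

    def forward(self, input_array):
        raise NotImplementedError

    def get_output(self):
        raise NotImplementedError

    def calc_output_layer_delta(self, label):
        raise NotImplementedError

    def calc_layer_delta(self, downstream_layer):
        raise NotImplementedError

    def get_transformed_delta(self):
        raise NotImplementedError

    def update_weight(self, rate):
        raise NotImplementedError


class IdentityLayer(BaseLayer):
    """Hypothetical layer with no weights; it passes its input through unchanged."""

    def forward(self, input_array):
        self.output_array = input_array

    def get_output(self):
        return self.output_array

    def update_weight(self, rate):
        pass  # nothing to update


layer = IdentityLayer()
layer.forward([1, 2, 3])
print(layer.get_output())
```

A layer that forgets to implement one of the methods then fails with a clear `NotImplementedError` instead of an `AttributeError` deep inside the training loop.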

### Redesign of the fully connected layer

I need to redesign the fully connected layer class, for these reasons:
- Fully connected layers built from Node and Connection objects are difficult to connect to the convolutional layer copied from hanbingtao.
- Layers with a large number of nodes need a lot of memory to store their Node and Connection objects. In addition, the backpropagation algorithm becomes extremely slow on them.

To take advantage of numpy's matrix operations, I will implement the methods of the fc layer with matrix operations.
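
To illustrate the difference, the sketch below computes the weighted input of a small layer twice: once connection by connection, as an object-based design has to, and once with a single `np.dot` call. The sizes and random values are made up for the example.

```python
import numpy as np

rng = np.random.RandomState(0)
weights = rng.uniform(-0.1, 0.1, size=(3, 4))  # 3 nodes, 4 inputs
bias = rng.uniform(-0.1, 0.1, size=3)
inputs = rng.uniform(-1.0, 1.0, size=4)

# Node by node: one Python-level loop iteration per connection
looped = np.zeros(3)
for j in range(3):
    total = bias[j]
    for i in range(4):
        total += weights[j, i] * inputs[i]
    looped[j] = total

# Matrix form: a single call into numpy
vectorized = np.dot(weights, inputs) + bias

# Both approaches give the same result up to floating-point rounding
print(np.allclose(looped, vectorized))
```

The matrix form stores only one weight array per layer and moves the inner loops out of the Python interpreter.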

In [1]:
import numpy as np


class FcLayer(object):
    def __init__(self, network, node_count, activator):
        self.output_array = np.zeros([node_count])
        self.bias_array = np.zeros([node_count])
        # Shape of the previous layer's output
        self.input_shape = network.layers[-1].get_output().shape
        # Number of inputs once the previous output is flattened to 1-D
        self.input_1dim = int(np.prod(self.input_shape))
        # Weight matrix: one row per node, one column per flattened input
        self.trans_matrix = np.zeros([node_count, self.input_1dim])
        self.activator = activator
        # Register this layer with the network
        network.append_layer(self)

In [None]:
    def forward(self, input_array):
        # Flatten the input, compute W . x + b, then apply the activator
        self.input_array = input_array.reshape(self.input_1dim)
        self.output_array = np.dot(self.trans_matrix, self.input_array) + self.bias_array
        self.output_array = np.array([self.activator.forward(value) for value in self.output_array])
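
The list comprehension in `forward` calls the activator once per element. If the activator is written with numpy operations, it can be applied to the whole array in one call. The sketch below assumes a hypothetical `SigmoidActivator` whose `backward` takes the layer output, as the calls in this notebook suggest; the activator actually used in the project may differ.

```python
import numpy as np


class SigmoidActivator(object):
    """Hypothetical activator; works on scalars and numpy arrays alike."""

    def forward(self, weighted_input):
        return 1.0 / (1.0 + np.exp(-weighted_input))

    def backward(self, output):
        # Derivative of the sigmoid, expressed through its output
        return output * (1.0 - output)


activator = SigmoidActivator()
weighted_input = np.array([-2.0, 0.0, 2.0])

# Element by element, as in the list comprehension
per_element = np.array([activator.forward(v) for v in weighted_input])

# Whole array at once
whole_array = activator.forward(weighted_input)

print(np.allclose(per_element, whole_array))
```

With such an activator, the last line of `forward` could become a single call like `self.activator.forward(self.output_array)`.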

In [None]:
    def calc_output_layer_delta(self, label):
        # Output layer: activator derivative times the prediction error
        derivative = np.array([self.activator.backward(out) for out in self.output_array])
        self.delta_array = derivative * (label - self.output_array)

    def calc_layer_delta(self, downstream_layer):
        # Hidden layer: activator derivative times the delta propagated back from downstream
        derivative = np.array([self.activator.backward(out) for out in self.output_array])
        self.delta_array = derivative * downstream_layer.get_transformed_delta()
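
As a sanity check of the output-layer formula, the sketch below compares it with a finite-difference estimate of the gradient. It assumes a sigmoid activator and the squared-error loss $E_{d}=\frac{1}{2}\sum(label - output)^2$; both are assumptions, since the notebook does not state them here. Under these assumptions the delta computed with `(label - output)` equals the negative of $\frac{\partial E_{d}}{\partial net}$, which matches an update step that adds `rate * delta * input` to the weights.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def loss(net, label):
    # Assumed loss: E = 1/2 * sum((label - output)^2)
    return 0.5 * np.sum((label - sigmoid(net)) ** 2)


net = np.array([0.3, -1.2, 0.8])   # weighted inputs of the output layer
label = np.array([1.0, 0.0, 0.0])
output = sigmoid(net)

# Formula used in calc_output_layer_delta, with the sigmoid derivative
delta = output * (1.0 - output) * (label - output)

# Central finite differences for dE/dnet
eps = 1e-6
numeric_grad = np.zeros_like(net)
for i in range(len(net)):
    step = np.zeros_like(net)
    step[i] = eps
    numeric_grad[i] = (loss(net + step, label) - loss(net - step, label)) / (2 * eps)

# delta matches the negative gradient
print(np.allclose(delta, -numeric_grad, atol=1e-6))
```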

In [None]:
    def update_weight(self, rate):
        # The outer product of delta and input has the same shape as trans_matrix
        self.trans_matrix += rate * np.outer(self.delta_array, self.input_array)
        self.bias_array += rate * self.delta_array
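
The cells above call `get_transformed_delta` on the downstream layer but do not show it. A minimal sketch of the computation it could perform for a fully connected layer is given below, written as a standalone function so that it runs on its own. The function name `transformed_delta` and its arguments are hypothetical: it multiplies the transposed weight matrix by the layer's delta and reshapes the result to the shape of the layer's input.

```python
import numpy as np


def transformed_delta(trans_matrix, delta_array, input_shape):
    """Propagate delta back through the weights: W^T . delta, in the input's shape."""
    return np.dot(trans_matrix.T, delta_array).reshape(input_shape)


# 2 nodes, 4 flattened inputs that came from a 2x2 upstream output
trans_matrix = np.array([[1.0, 2.0, 3.0, 4.0],
                         [0.5, 0.5, 0.5, 0.5]])
delta_array = np.array([1.0, 2.0])

upstream = transformed_delta(trans_matrix, delta_array, (2, 2))
print(upstream.shape)
```

Inside `FcLayer`, the same expression would use `self.trans_matrix`, `self.delta_array` and `self.input_shape`, and `get_output` would simply return `self.output_array`.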

## Performance
