# Using TensorFlow to implement AlexNet

[reference](https://github.com/linlinyaoyao/TensorFlowPro/blob/master/%E5%9F%BA%E7%A1%80%E6%A1%88%E4%BE%8B%E6%95%99%E7%A8%8B/2.%E5%9F%BA%E4%BA%8EVisual%20Studio%20Tools%20for%20AI%E7%9A%84TensorFlow%E7%BC%96%E7%A8%8B%E5%AE%9E%E7%8E%B0CNN%E5%8D%B7%E7%A7%AF%E7%A5%9E%E7%BB%8F%E7%BD%91%E7%BB%9C.md)
    

## Step 1: Import libs and define tool functions

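* Import the `datetime` package to timestamp the benchmark output.
* batch size = 32
* batch count = 100
* Dropout randomly sets some activations to 0 and scales the remaining ones by 1/keep_prob, so the expected sum of the activations is unchanged. This reduces overfitting. Note that the `drop_out` argument of `full_connect` is passed on as `keep_prob`, so it is the probability of keeping a unit, not of dropping it.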
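
The scaling rule in the last bullet can be illustrated without TensorFlow. The following is a minimal sketch of inverted dropout in plain Python, not the implementation behind `tf.nn.dropout`; the helper name `inverted_dropout`, the seed, and the all-ones input are assumptions made for illustration.

```python
import random

def inverted_dropout(activations, keep_prob, rng):
    """Zero each activation with probability 1 - keep_prob and
    scale the survivors by 1 / keep_prob (hypothetical helper)."""
    if not 0.0 < keep_prob <= 1.0:
        raise ValueError("keep_prob must be in (0, 1]")
    return [a / keep_prob if rng.random() < keep_prob else 0.0
            for a in activations]

rng = random.Random(0)        # fixed seed, only to make the sketch repeatable
activations = [1.0] * 10000   # made-up activations, all equal to 1
dropped = inverted_dropout(activations, keep_prob=0.5, rng=rng)

kept = sum(1 for d in dropped if d != 0.0)
mean_after = sum(dropped) / len(dropped)
print(kept, mean_after)       # about half are kept; the mean stays close to 1
```

Because the kept values are scaled up, the layer's expected output is the same with and without dropout, so nothing has to be rescaled at inference time.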


In [9]:
from datetime import datetime
import math
import time
import tensorflow as tf

batch_size = 32
num_batches = 100

# print layer name and layer shape
def print_activation(t):
    print(t.op.name, ' ', t.get_shape().as_list())
    
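# return a fully connected layer (matmul + bias + ReLU + dropout)
# params:
#    input    : data input, shape [batch, num_in]
#    num_in   : input dimension
#    num_out  : output dimension
#    drop_out : keep probability passed to tf.nn.dropout (1.0 disables dropout)
def full_connect(input, num_in, num_out, drop_out=1.0):
    w = tf.Variable(tf.truncated_normal([num_in, num_out]))
    b = tf.Variable(tf.truncated_normal([num_out]))
    # keep_prob is deprecated in later TF 1.x releases (see the warning in the output); the replacement is rate = 1 - keep_prob
    return tf.nn.dropout(tf.nn.relu(tf.nn.bias_add(tf.matmul(input, w), b)), keep_prob = drop_out)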


## Step 2: Construct AlexNet

* Notes
    * [How to intuitively understand convolution(Zh)?](https://www.zhihu.com/question/22298352)
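        * Mathematically, convolving a kernel with an image flips the kernel along both height and width, multiplies it element-wise with each same-sized image patch, and sums the products. Note that `tf.nn.conv2d` skips the flip (strictly speaking it computes cross-correlation); since the kernels are learned, this makes no practical difference. If the patch matches the kernel's pattern, the output signal is strong; otherwise it is weak.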
        
    * Size calculation
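        * Without padding, consider the 1-D case: kernel size = a, image size = b, step = s. The output size is floor((b - a) / s) + 1, which reduces to b - a + 1 when s = 1. For example, a = 3, b = 56, s = 2 gives 27. This generalizes to the 2-D case by applying the rule to each dimension.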
        
    * What is Local Response Normalization (LRN) and why do we need it? [Ref](https://prateekvjoshi.com/2016/04/05/what-is-local-response-normalization-in-convolutional-neural-networks/)
        * LRN implements lateral inhibition. It is useful with ReLU neurons: their activations are unbounded, and LRN normalizes them. It also dampens responses that are uniformly large within a local neighborhood.
        * TensorFlow op: `tf.nn.lrn` (see [Krizhevsky et al., ImageNet classification with deep convolutional neural networks (NIPS 2012)](http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf))
        * LRN does not change the dimensions: the output has the same shape as the input.
    * Pooling ([ref](https://machinelearningmastery.com/pooling-layers-for-convolutional-neural-networks/))
        * max pooling
        * average pooling
        * pooling downsamples the feature map, which makes the network less sensitive to the exact location of a feature
        * In image tasks, max pooling usually performs better than average pooling
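
The size rule from the notes can be checked against the layer shapes printed in the benchmark output. The sketch below is a plain-Python helper, with the hypothetical name `conv_output_size`, that follows the 'VALID' and 'SAME' rules as I understand them from the TensorFlow documentation; it only computes sizes and performs no convolution.

```python
import math

def conv_output_size(in_size, kernel_size, stride, padding):
    """Output length along one spatial dimension (hypothetical helper)."""
    if padding == 'VALID':
        # No padding: count the positions where the kernel fits entirely inside the input.
        return (in_size - kernel_size) // stride + 1
    if padding == 'SAME':
        # Zero padding is added so that the output size depends only on the stride.
        return math.ceil(in_size / stride)
    raise ValueError("padding must be 'VALID' or 'SAME'")

# Walk the spatial size through the layers used in this notebook.
conv1 = conv_output_size(224, 11, 4, 'SAME')     # ceil(224 / 4)
pool1 = conv_output_size(conv1, 3, 2, 'VALID')   # (56 - 3) // 2 + 1
conv2 = conv_output_size(pool1, 5, 1, 'SAME')
pool2 = conv_output_size(conv2, 3, 2, 'VALID')
conv5 = conv_output_size(pool2, 3, 1, 'SAME')    # conv3 to conv5 keep the size
pool5 = conv_output_size(conv5, 3, 2, 'VALID')
print(conv1, pool1, conv2, pool2, conv5, pool5)  # 56 27 27 13 13 6
```

The final value, 6, is where the `6*6*256` input dimension of the first fully connected layer comes from.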
   


In [10]:
def inference(images):
    parameters = []
    
    # Conv layer 1
    with tf.name_scope('conv1') as scope:
        # Kernel dims:
        # Dim 1,2: kernel size = 11 x 11
        # Dim 3:   input channel count = 3
        # Dim 4:   kernel count (output channels) = 64
        # The kernel starts from a truncated normal distribution; in a trained network these values are learned so that the kernels respond to meaningful patterns
        kernel = tf.Variable(tf.truncated_normal([11, 11, 3, 64], stddev=1e-1, dtype=tf.float32), name='weights')
        
        # Stride dims (default NHWC layout):
        # Dim 1 : step length over samples, i.e. take one sample every N samples
        # Dim 2 : step length along image height
        # Dim 3 : step length along image width
        # Dim 4 : step length over channels
        # Dims 1 and 4 are usually set to 1 so that every sample and channel is traversed
        # padding = 'SAME': the input is zero-padded at the borders so that the output size is ceil(input size / stride); it equals the input size only when the stride is 1
        conv = tf.nn.conv2d(images, kernel, strides=[1, 4, 4, 1], padding='SAME')
        biases = tf.Variable(tf.constant(0.0, shape=[64], dtype=tf.float32), trainable=True, name='biases')
        bias = tf.nn.bias_add(conv, biases)
        conv1 = tf.nn.relu(bias, name=scope)
        print_activation(conv1)
        parameters += [kernel, biases]
    
    lrn1 = tf.nn.lrn(conv1, 4, bias=1.0, alpha=0.001/9, beta=0.75, name='lrn1')
    pool1 = tf.nn.max_pool(lrn1, ksize=[1, 3, 3, 1], strides=[1, 2, 2, 1], padding='VALID', name='pool1')
    print_activation(pool1)
    
    # Conv Layer 2
    with tf.name_scope('conv2') as scope:
        kernel = tf.Variable(tf.truncated_normal([5, 5, 64, 192], stddev=1e-1, dtype=tf.float32), name='weights')
        conv = tf.nn.conv2d(pool1, kernel, strides=[1, 1, 1, 1], padding='SAME')
        biases = tf.Variable(tf.constant(0.0, shape=[192], dtype=tf.float32), trainable=True, name='biases')
        bias = tf.nn.bias_add(conv, biases)
        conv2 = tf.nn.relu(bias, name=scope)
        parameters += [kernel, biases]
        
    print_activation(conv2)
    lrn2 = tf.nn.lrn(conv2, 4, bias=1.0, alpha=0.001/9, beta=0.75, name='lrn2')
    pool2 = tf.nn.max_pool(lrn2, ksize=[1, 3, 3, 1], strides=[1, 2, 2, 1], padding='VALID', name='pool2')
    print_activation(pool2)
    
    # Conv Layer 3
    with tf.name_scope('conv3') as scope:
        kernel = tf.Variable(tf.truncated_normal([3, 3, 192, 384], stddev=1e-1, dtype=tf.float32), name='weights')
        conv = tf.nn.conv2d(pool2, kernel, strides=[1, 1, 1, 1], padding='SAME')
        biases = tf.Variable(tf.constant(0.0, shape=[384], dtype=tf.float32), trainable=True, name='biases')
        bias = tf.nn.bias_add(conv, biases)
        conv3 = tf.nn.relu(bias, name=scope)
        parameters += [kernel, biases]
        
    print_activation(conv3)
    
    # Conv Layer 4
    with tf.name_scope('conv4') as scope:
        kernel = tf.Variable(tf.truncated_normal([3, 3, 384, 256], stddev=1e-1, dtype=tf.float32), name='weights')
        conv = tf.nn.conv2d(conv3, kernel, strides=[1, 1, 1, 1], padding='SAME')
        biases = tf.Variable(tf.constant(0.0, shape=[256], dtype=tf.float32), trainable=True, name='biases')
        bias = tf.nn.bias_add(conv, biases)
        conv4 = tf.nn.relu(bias, name=scope)
        parameters += [kernel, biases]
        
    print_activation(conv4)
    
    # Conv Layer 5
    with tf.name_scope('conv5') as scope:
        kernel = tf.Variable(tf.truncated_normal([3, 3, 256, 256], stddev=1e-1, dtype=tf.float32), name='weights')
        conv = tf.nn.conv2d(conv4, kernel, strides=[1, 1, 1, 1], padding='SAME')
        biases = tf.Variable(tf.constant(0.0, shape=[256], dtype=tf.float32), trainable=True, name='biases')
        bias = tf.nn.bias_add(conv, biases)
        conv5 = tf.nn.relu(bias, name=scope)
        parameters += [kernel, biases]
        
    print_activation(conv5)
    pool5 = tf.nn.max_pool(conv5, ksize=[1, 3, 3, 1], strides=[1, 2, 2, 1], padding='VALID', name='pool5')
    print_activation(pool5)
    
    flatten = tf.reshape(pool5, [-1, 6*6*256])
    fc_1 = full_connect(flatten, 6*6*256, 4096, 0.5)
    fc_2 = full_connect(fc_1, 4096, 4096, 0.5)
    fc_3 = full_connect(fc_2, 4096, 1000)
    
    return fc_3, parameters
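
To make the `tf.nn.lrn` calls above more concrete, here is a minimal sketch of the normalization for the channel vector of a single pixel. It follows the formula given in the TensorFlow documentation as I read it (sum of squares over a window of `depth_radius` channels on each side, then division by `(bias + alpha * sqr_sum) ** beta`); the helper name `lrn_channels` and the sample activations are assumptions for illustration.

```python
def lrn_channels(values, depth_radius, bias, alpha, beta):
    """Local response normalization across the channels of one pixel
    (hypothetical helper, not the TensorFlow kernel)."""
    out = []
    for d, v in enumerate(values):
        # Window of neighbouring channels, clipped at both ends of the vector.
        lo = max(0, d - depth_radius)
        hi = min(len(values), d + depth_radius + 1)
        sqr_sum = sum(x * x for x in values[lo:hi])
        out.append(v / (bias + alpha * sqr_sum) ** beta)
    return out

pixel = [0.0, 1.0, 10.0, 1.0, 0.0, 0.0, 0.0, 0.0]   # made-up ReLU activations
normalized = lrn_channels(pixel, depth_radius=4, bias=1.0,
                          alpha=0.001 / 9, beta=0.75)
print(normalized)
```

The output has as many entries as the input, which matches the note that LRN leaves the tensor shape unchanged, and with `bias=1.0` no activation grows in magnitude.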

## Step 3: Evaluate training time


In [11]:
def time_tensorflow_run(session, target, info_string):
    # The first num_steps_burn_in runs are warm-up and are excluded from the statistics
    num_steps_burn_in = 10
    total_duration = 0.0
    total_duration_squared = 0.0
    
    for i in range(num_steps_burn_in + num_batches):
        start_time = time.time()
        _ = session.run(target)
        duration = time.time() - start_time
        if i >= num_steps_burn_in:
            if not i % 10:
                print(r'%s: step:%d. duration = %.3f' % (datetime.now(), i - num_steps_burn_in, duration))
            total_duration += duration
            total_duration_squared += duration * duration
            
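    mn = total_duration / num_batches
    # Clamp at 0: floating-point round-off can make the variance slightly negative when all durations are nearly equal
    vr = max(total_duration_squared / num_batches - mn * mn, 0.0)
    sd = math.sqrt(vr)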
    print(r'%s: %s across %d steps, %.3f +/- %.3f sec / batch' % (datetime.now(), info_string, num_batches, mn, sd))
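
The mean and standard deviation in `time_tensorflow_run` come from two running sums, using the identity Var(x) = E[x²] − E[x]². The sketch below isolates that calculation and compares it with Python's `statistics` module. The helper name `mean_and_std` and the timings are made up for illustration, and the clamp at zero is a precaution against round-off rather than part of the identity.

```python
import math
import statistics

def mean_and_std(durations):
    """Population mean and standard deviation from a sum and a sum of squares
    (hypothetical helper)."""
    n = len(durations)
    total = sum(durations)
    total_squared = sum(d * d for d in durations)
    mean = total / n
    # Round-off can push the variance slightly below zero for near-constant data.
    variance = max(total_squared / n - mean * mean, 0.0)
    return mean, math.sqrt(variance)

durations = [0.030, 0.031, 0.030, 0.029, 0.030]   # made-up timings in seconds
mean, std = mean_and_std(durations)
print(mean, std)
print(statistics.fmean(durations), statistics.pstdev(durations))
```

This is the population standard deviation (division by n, not n - 1), which is why it is compared with `statistics.pstdev` rather than `statistics.stdev`.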


In [12]:
def run_benchmark():
    with tf.Graph().as_default():
        image_size = 224
        images = tf.Variable(tf.random_normal([batch_size, image_size, image_size, 3], dtype=tf.float32, stddev=1e-1))
        
        fc_3, parameters = inference(images)
        
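        init = tf.global_variables_initializer()
        # Use the session as a context manager so it is closed after the benchmark
        with tf.Session() as sess:
            sess.run(init)

            # Time the forward pass only
            time_tensorflow_run(sess, fc_3, "forward")
            # Time forward + backward: gradients of a dummy L2 loss w.r.t. the conv parameters
            objective = tf.nn.l2_loss(fc_3)
            grad = tf.gradients(objective, parameters)
            time_tensorflow_run(sess, grad, 'forward-backward')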
        
        
run_benchmark()

W0921 18:06:46.098731  3632 deprecation.py:506] From <ipython-input-9-28928a9921f8>:21: calling dropout (from tensorflow.python.ops.nn_ops) with keep_prob is deprecated and will be removed in a future version.
Instructions for updating:
Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`.


conv1   [32, 56, 56, 64]
pool1   [32, 27, 27, 64]
conv2   [32, 27, 27, 192]
pool2   [32, 13, 13, 192]
conv3   [32, 13, 13, 384]
conv4   [32, 13, 13, 256]
conv5   [32, 13, 13, 256]
pool5   [32, 6, 6, 256]
2019-09-21 18:06:53.690444: step:0. duration = 0.030
2019-09-21 18:06:53.991448: step:10. duration = 0.030
2019-09-21 18:06:54.291445: step:20. duration = 0.030
2019-09-21 18:06:54.593444: step:30. duration = 0.030
2019-09-21 18:06:54.894480: step:40. duration = 0.030
2019-09-21 18:06:55.196444: step:50. duration = 0.030
2019-09-21 18:06:55.498444: step:60. duration = 0.031
2019-09-21 18:06:55.800445: step:70. duration = 0.030
2019-09-21 18:06:56.103457: step:80. duration = 0.030
2019-09-21 18:06:56.406446: step:90. duration = 0.030
2019-09-21 18:06:56.677482: forward across 100 steps, 0.030 +/- 0.000 sec / batch
2019-09-21 18:06:57.924149: step:0. duration = 0.087
2019-09-21 18:06:58.793153: step:10. duration = 0.087
2019-09-21 18:06:59.661746: step:20. duration = 0.086
2019-09-21 18: