<H1 style="text-align: center;"> TensorFlow Tutorial</H1>
<H2 style="text-align: center;"> Convolutional Neural Networks</H2>
<footer style="text-align: center;"> Kent Yu<br><br>10/14/2016</footer>

## Overview
* **Problem** 
    * Classify RGB 32x32 pixel images across 10 categories
    
* **Solution**
    * Build a relatively small convolutional neural network (CNN) for recognizing images
    
* **Implication**
    * Provides a template for constructing larger and more sophisticated models

## Graph
![tensorboard graph](./cifar_graph.png "")

## Conv1 Layer
* **Input: 128 X 24 X 24 X 3** (a batch of 128 images, each cropped from 32 X 32 to 24 X 24 by the input pipeline)
* **Kernel size: 5 (Height) X 5 (Width) X 3 (Channels) X 64 (# of Kernels)**
* **Activation: ReLU**
* **Output: 128 X 24 X 24 X 64**

![tensorboard graph](./Conv1.jpg "")

```python
  # conv1
  with tf.variable_scope('conv1') as scope:
    kernel = _variable_with_weight_decay('weights',
                                         shape=[5, 5, 3, 64],
                                         stddev=5e-2,
                                         wd=0.0)
    conv = tf.nn.conv2d(images, kernel, [1, 1, 1, 1], padding='SAME')
    biases = _variable_on_cpu('biases', [64], tf.constant_initializer(0.0))
    bias = tf.nn.bias_add(conv, biases)
    conv1 = tf.nn.relu(bias, name=scope.name)
    _activation_summary(conv1)

```
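The shapes listed for Conv1 follow from `'SAME'` padding with stride 1, which keeps height and width unchanged. The sketch below is a minimal illustration in plain Python, not part of the tutorial code; `conv_output_size` and `conv_param_count` are hypothetical helper names introduced here.

```python
import math


def conv_output_size(in_size, stride):
    """Spatial output size of a 'SAME'-padded convolution or pooling op."""
    return int(math.ceil(in_size / float(stride)))


def conv_param_count(kernel_h, kernel_w, in_channels, out_channels):
    """Kernel weights plus one bias per output channel."""
    return kernel_h * kernel_w * in_channels * out_channels + out_channels


# Conv1: 24 X 24 input, stride 1, 5 X 5 kernel, 3 -> 64 channels.
conv1_out = (128, conv_output_size(24, 1), conv_output_size(24, 1), 64)
conv1_params = conv_param_count(5, 5, 3, 64)
print(conv1_out)     # (128, 24, 24, 64)
print(conv1_params)  # 4864
```

The same `conv_output_size` rule with stride 2 explains why each pooling layer halves the height and width.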

## Convolutional Network Illustration
<img src="./convolutional-network-demo.gif" width=700/>

## Pool 1 Layer
* **Input: 128 X 24 X 24 X 64**

* **KSize: 1 (Batch) X 3 (Height) X 3 (Width) X 1 (Channel)**
* **Stride: 1 (Batch) X 2 (Height) X 2 (Width) X 1 (Channel)**
* **Type: Max**
* **Padding: Same**

* **Output: 128 X 12 X 12 X 64**
![tensorboard graph](./Pool1.jpg "")

```Python
  # pool1
  pool1 = tf.nn.max_pool(conv1, ksize=[1, 3, 3, 1], strides=[1, 2, 2, 1],
                         padding='SAME', name='pool1')
```                         
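To make the 3 X 3 window, stride 2, `'SAME'` padding combination concrete, here is a small plain-Python sketch of max pooling on a single 2-D channel. It is an illustration only: `max_pool_same` is a hypothetical helper, and it assumes the usual TensorFlow convention that any odd padding goes on the bottom and right edges, with padded cells ignored by the max.

```python
def max_pool_same(image, ksize=3, stride=2):
    """Max-pool a 2-D list of numbers with TensorFlow-style 'SAME' padding.

    Padded positions are skipped, so they can never win the max.
    """
    height, width = len(image), len(image[0])
    out_h = -(-height // stride)  # ceiling division
    out_w = -(-width // stride)
    pad_h = max((out_h - 1) * stride + ksize - height, 0)
    pad_w = max((out_w - 1) * stride + ksize - width, 0)
    top, left = pad_h // 2, pad_w // 2  # the extra padding goes bottom/right

    pooled = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            r0, c0 = i * stride - top, j * stride - left
            window = [image[r][c]
                      for r in range(max(r0, 0), min(r0 + ksize, height))
                      for c in range(max(c0, 0), min(c0 + ksize, width))]
            row.append(max(window))
        pooled.append(row)
    return pooled


image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]
pooled = max_pool_same(image)
print(pooled)  # [[11, 12], [15, 16]]
```

Applied to a 24 X 24 channel, the same function returns a 12 X 12 result, which matches the output shape listed for this layer.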



## Norm1 Layer

* `tf.nn.lrn(pool1, 4, bias=1.0, alpha=0.001 / 9.0, beta=0.75, name='norm1')`
* **According to CS231n, local response normalization (LRN) is rarely used in recent architectures:**
> Many types of normalization layers have been proposed for use in ConvNet architectures,
> ...
> ...
> However, these layers have since fallen out of favor because in practice their contribution has been shown to be minimal, if any. 
* What it does
![tensorboard graph](./LRN.jpg "")

```Python
  # norm1
  norm1 = tf.nn.lrn(pool1, 4, bias=1.0, alpha=0.001 / 9.0, beta=0.75,
                    name='norm1')
```
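As a rough sketch of what `tf.nn.lrn` computes, each activation is divided by a term built from the squared activations of its neighbouring channels at the same pixel: `output = input / (bias + alpha * sqr_sum) ** beta`, where `sqr_sum` covers channels `d - depth_radius` through `d + depth_radius`. The plain-Python function below applies that formula to one pixel's channel vector; `lrn_channels` and the sample activations are made up for illustration.

```python
def lrn_channels(values, depth_radius=4, bias=1.0, alpha=0.001 / 9.0, beta=0.75):
    """Apply local response normalization across one pixel's channel vector."""
    normalized = []
    for d, value in enumerate(values):
        # Neighbouring channels, clipped at the ends of the vector.
        lo = max(0, d - depth_radius)
        hi = min(len(values), d + depth_radius + 1)
        sqr_sum = sum(v * v for v in values[lo:hi])
        normalized.append(value / (bias + alpha * sqr_sum) ** beta)
    return normalized


# Hypothetical activations for 7 channels at a single pixel.
activations = [0.0, 1.0, 2.0, 30.0, 2.0, 1.0, 0.0]
print(lrn_channels(activations))
```

With the small `alpha` used here the denominator stays close to 1 unless a neighbouring channel is very large, so the layer only damps activations that sit next to strong responses.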

## Conv2 Layer
* **Input: 128 X 12 X 12 X 64**
* **Kernel size: 5 (Height) X 5 (Width) X 64 (Channels) X 64 (# of Kernels)**
* **Activation: ReLU**
* **Output: 128 X 12 X 12 X 64**
![tensorboard graph](./Conv2.jpg "")

```Python
  # conv2
  with tf.variable_scope('conv2') as scope:
    kernel = _variable_with_weight_decay('weights',
                                         shape=[5, 5, 64, 64],
                                         stddev=5e-2,
                                         wd=0.0)
    conv = tf.nn.conv2d(norm1, kernel, [1, 1, 1, 1], padding='SAME')
    biases = _variable_on_cpu('biases', [64], tf.constant_initializer(0.1))
    bias = tf.nn.bias_add(conv, biases)
    conv2 = tf.nn.relu(bias, name=scope.name)
    _activation_summary(conv2)
```    

## Norm2 Layer

* `tf.nn.lrn(conv2, 4, bias=1.0, alpha=0.001 / 9.0, beta=0.75, name='norm2')`

```Python
  # norm2
  norm2 = tf.nn.lrn(conv2, 4, bias=1.0, alpha=0.001 / 9.0, beta=0.75,
                    name='norm2')
```               

## Pool 2 Layer
* **Input: 128 X 12 X 12 X 64**

* **KSize: 1 (Batch) X 3 (Height) X 3 (Width) X 1 (Channel)**
* **Stride: 1 (Batch) X 2 (Height) X 2 (Width) X 1 (Channel)**
* **Type: Max**
* **Padding: Same**

* **Output: 128 X 6 X 6 X 64**
![tensorboard graph](./Pool2.jpg "")

```Python
  # pool2
  pool2 = tf.nn.max_pool(norm2, ksize=[1, 3, 3, 1],
                         strides=[1, 2, 2, 1], padding='SAME', name='pool2')
```                    




## Local 3 Layer

* **Input: 128 X 6 X 6 X 64** (flattened to 128 X 2304 before the matrix multiply)
* **Number of Neurons: 384**
* **Output: 128 X 384**
![tensorboard graph](./Local3.jpg "")

```Python
  # local3
  with tf.variable_scope('local3') as scope:
    # Move everything into depth so we can perform a single matrix multiply.
    reshape = tf.reshape(pool2, [FLAGS.batch_size, -1])
    dim = reshape.get_shape()[1].value
    weights = _variable_with_weight_decay('weights', shape=[dim, 384],
                                          stddev=0.04, wd=0.004)
    biases = _variable_on_cpu('biases', [384], tf.constant_initializer(0.1))
    local3 = tf.nn.relu(tf.matmul(reshape, weights) + biases, name=scope.name)
    _activation_summary(local3)
```    


## Local 4 Layer

* **Input: 128 X 384**
* **Number of Neurons: 192**
* **Output: 128 X 192**
![tensorboard graph](./Local4.jpg "")
```Python
  # local4
  with tf.variable_scope('local4') as scope:
    weights = _variable_with_weight_decay('weights', shape=[384, 192],
                                          stddev=0.04, wd=0.004)
    biases = _variable_on_cpu('biases', [192], tf.constant_initializer(0.1))
    local4 = tf.nn.relu(tf.matmul(local3, weights) + biases, name=scope.name)
    _activation_summary(local4)
```    
    

## Softmax
* **Input: 128 X 192**
* **Number of classes: 10**
* **Output: 128 X 10** (unnormalized logits; the softmax itself is applied inside the loss function)
![tensorboard graph](./Softmax.jpg "")
```Python
  # Linear layer (WX + b); the softmax is applied later, inside the loss.
  with tf.variable_scope('softmax_linear') as scope:
    weights = _variable_with_weight_decay('weights', [192, NUM_CLASSES],
                                          stddev=1/192.0, wd=0.0)
    biases = _variable_on_cpu('biases', [NUM_CLASSES],
                              tf.constant_initializer(0.0))
    softmax_linear = tf.add(tf.matmul(local4, weights), biases, name=scope.name)
    _activation_summary(softmax_linear)
```    
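Putting the layer sections together, the following plain-Python sketch traces the output shape of each layer and counts its trainable parameters. It assumes the default batch size of 128 and the kernel and layer sizes listed above; `same_size` and `trace` are names introduced only for this illustration.

```python
def same_size(size, stride):
    """Output size of a 'SAME'-padded op: ceil(size / stride)."""
    return -(-size // stride)


batch, side = 128, 24
trace = []  # (layer name, output shape, trainable parameter count)

# conv1: 5x5 kernel, 3 -> 64 channels, stride 1.
trace.append(('conv1', (batch, side, side, 64), 5 * 5 * 3 * 64 + 64))
# pool1: stride 2 halves height and width; norm1 keeps the shape.
side = same_size(side, 2)
trace.append(('pool1/norm1', (batch, side, side, 64), 0))
# conv2: 5x5 kernel, 64 -> 64 channels, stride 1; norm2 keeps the shape.
trace.append(('conv2/norm2', (batch, side, side, 64), 5 * 5 * 64 * 64 + 64))
# pool2: stride 2 again.
side = same_size(side, 2)
trace.append(('pool2', (batch, side, side, 64), 0))
# local3: flatten each example, then fully connect to 384 units.
flat = side * side * 64
trace.append(('local3', (batch, 384), flat * 384 + 384))
# local4: 384 -> 192 units.
trace.append(('local4', (batch, 192), 384 * 192 + 192))
# softmax_linear: 192 -> 10 class logits.
trace.append(('softmax_linear', (batch, 10), 192 * 10 + 10))

for name, shape, params in trace:
    print('%-15s %-20s %8d' % (name, shape, params))
total_params = sum(params for _, _, params in trace)
print('total parameters: %d' % total_params)  # total parameters: 1068298
```

Most of the parameters sit in `local3`, because the flattened 6 X 6 X 64 feature map feeds 384 fully connected units.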


## Loss Function

```Python
def loss(logits, labels):
  """Add L2Loss to all the trainable variables.

  Add summary for "Loss" and "Loss/avg".
  Args:
    logits: Logits from inference().
    labels: Labels from distorted_inputs or inputs(). 1-D tensor
            of shape [batch_size]

  Returns:
    Loss tensor of type float.
  """
  # Calculate the average cross entropy loss across the batch.
  labels = tf.cast(labels, tf.int64)
  cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(
      logits, labels, name='cross_entropy_per_example')
  cross_entropy_mean = tf.reduce_mean(cross_entropy, name='cross_entropy')
  tf.add_to_collection('losses', cross_entropy_mean)

  # The total loss is defined as the cross entropy loss plus all of the weight
  # decay terms (L2 loss).
  return tf.add_n(tf.get_collection('losses'), name='total_loss')
```  
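The `'losses'` collection summed by `tf.add_n` holds the cross-entropy mean plus one weight-decay term per regularized variable. The sketch below assumes that `_variable_with_weight_decay` (whose body is not shown in this document) adds `wd * tf.nn.l2_loss(var)` to that collection, and uses the `tf.nn.l2_loss` definition `sum(t ** 2) / 2`. The toy weights and the cross-entropy value are hypothetical.

```python
def l2_loss(values):
    """Half the sum of squares, matching the definition of tf.nn.l2_loss."""
    return sum(v * v for v in values) / 2.0


def total_loss(cross_entropy_mean, weighted_variables):
    """Cross-entropy mean plus wd * l2_loss(var) for each (var, wd) pair."""
    losses = [cross_entropy_mean]
    for values, wd in weighted_variables:
        losses.append(wd * l2_loss(values))
    return sum(losses)


# Hypothetical toy weights; wd=0.004 is the value used for local3 and local4.
toy_local3 = [0.5, -1.0, 2.0]
toy_local4 = [1.0, 1.0]
result = total_loss(0.8, [(toy_local3, 0.004), (toy_local4, 0.004)])
print(result)  # approximately 0.8145
```

Layers created with `wd=0.0` (conv1, conv2, and softmax_linear above) contribute nothing to the penalty, so only the two fully connected layers are regularized.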



### Reduce Mean

`tf.reduce_mean(input_tensor, reduction_indices=None, keep_dims=False, name=None)`

* `input_tensor`: The tensor to reduce. Should have numeric type.
* `reduction_indices`: The dimensions to reduce. If `None` (the default), reduces all dimensions.

```Python
# 'x' is [[1., 1.],
#         [2., 2.]]
tf.reduce_mean(x)     # ==> 1.5
tf.reduce_mean(x, 0)  # ==> [1.5, 1.5]
tf.reduce_mean(x, 1)  # ==> [1., 2.]
```

```Python
import tensorflow as tf
import numpy as np

sess = tf.Session()

# Assume y_hat holds the logits computed by the graph.
y_hat = tf.convert_to_tensor(np.array([[0.5, 1.5, 0.1], [2.2, 1.3, 1.7]]))

# Softmax: for each example `i` and class `j` we have
#      softmax[i, j] = exp(logits[i, j]) / sum_j(exp(logits[i, j]))
y_hat_softmax = tf.nn.softmax(y_hat)
print("y_hat_softmax=", sess.run(y_hat_softmax))

# One-hot ground-truth labels.
y_true = tf.convert_to_tensor(np.array([[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]))

# Cross entropy computed by hand from the softmax output.
loss_per_instance_1 = -tf.reduce_sum(y_true * tf.log(y_hat_softmax), reduction_indices=[1])
print("loss_per_instance_1=", sess.run(loss_per_instance_1))

total_loss_1 = tf.reduce_mean(-tf.reduce_sum(y_true * tf.log(y_hat_softmax), reduction_indices=[1]))
print("total_loss_1=", sess.run(total_loss_1))

# The same values from the built-in op, which takes the raw logits.
loss_per_instance_2 = tf.nn.softmax_cross_entropy_with_logits(y_hat, y_true)
print("loss_per_instance_2=", sess.run(loss_per_instance_2))
# array([ 0.4790107 ,  1.19967598])

total_loss_2 = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(y_hat, y_true))
print("loss_total_2=", sess.run(total_loss_2))
# 0.83934333897877922

# The sparse variant expects integer class ids rather than one-hot rows,
# so passing y_true here raises a ValueError.
total_loss_3 = tf.nn.sparse_softmax_cross_entropy_with_logits(y_hat, y_true)
print("loss_total_3=", sess.run(total_loss_3))
```


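Output:

```
y_hat_softmax= [[ 0.227863    0.61939586  0.15274114]
 [ 0.49674623  0.20196195  0.30129182]]
loss_per_instance_1= [ 0.4790107   1.19967598]
total_loss_1= 0.839343338979
loss_per_instance_2= [ 0.4790107   1.19967598]
loss_total_2= 0.839343338979

ValueError: ('Rank mismatch: Labels rank (received %s) should equal logits rank (received %s) - 1.', 2, 2)
```

The final call fails because `tf.nn.sparse_softmax_cross_entropy_with_logits` expects integer class indices of shape `[batch_size]`, not one-hot rows, so its labels must have rank one less than the logits.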
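For comparison, here is a plain-Python sketch of what the sparse variant computes once it is given integer class ids instead of one-hot rows. It reuses the logits and labels from the notebook cell; `sparse_softmax_cross_entropy` is a hypothetical stand-in written for this illustration, not the TensorFlow op itself.

```python
import math


def sparse_softmax_cross_entropy(logits, labels):
    """Per-example cross entropy from raw logits and integer class labels."""
    losses = []
    for row, label in zip(logits, labels):
        shift = max(row)  # subtract the max for numerical stability
        log_sum_exp = shift + math.log(sum(math.exp(v - shift) for v in row))
        # -log(softmax[label]) == log_sum_exp - logit[label]
        losses.append(log_sum_exp - row[label])
    return losses


y_hat = [[0.5, 1.5, 0.1], [2.2, 1.3, 1.7]]
y_true_one_hot = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
# Convert each one-hot row to the index of its 1.0 entry: [1, 2].
class_ids = [row.index(1.0) for row in y_true_one_hot]

per_example = sparse_softmax_cross_entropy(y_hat, class_ids)
mean_loss = sum(per_example) / len(per_example)
print(per_example)  # approximately [0.4790107, 1.19967598]
print(mean_loss)    # approximately 0.839343338979
```

The values agree with `loss_per_instance_2` and `loss_total_2` above. In TensorFlow itself, passing something like `tf.argmax(y_true, 1)` as the labels should presumably achieve the same conversion.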

## Training Operation

```Python
def train(total_loss, global_step):
  """Train CIFAR-10 model.

  Create an optimizer and apply to all trainable variables. Add moving
  average for all trainable variables.

  Args:
    total_loss: Total loss from loss().
    global_step: Integer Variable counting the number of training steps
      processed.
  Returns:
    train_op: op for training.
  """
  # Variables that affect learning rate.
  num_batches_per_epoch = NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN / FLAGS.batch_size
  decay_steps = int(num_batches_per_epoch * NUM_EPOCHS_PER_DECAY)

  # Decay the learning rate exponentially based on the number of steps.
  lr = tf.train.exponential_decay(INITIAL_LEARNING_RATE,
                                  global_step,
                                  decay_steps,
                                  LEARNING_RATE_DECAY_FACTOR,
                                  staircase=True)
  tf.scalar_summary('learning_rate', lr)

  # Generate moving averages of all losses and associated summaries.
  loss_averages_op = _add_loss_summaries(total_loss)

  # Compute gradients.
  with tf.control_dependencies([loss_averages_op]):
    opt = tf.train.GradientDescentOptimizer(lr)
    grads = opt.compute_gradients(total_loss)

  # Apply gradients.
  apply_gradient_op = opt.apply_gradients(grads, global_step=global_step)

  # Add histograms for trainable variables.
  for var in tf.trainable_variables():
    tf.histogram_summary(var.op.name, var)

  # Add histograms for gradients.
  for grad, var in grads:
    if grad is not None:
      tf.histogram_summary(var.op.name + '/gradients', grad)

  # Track the moving averages of all trainable variables.
  variable_averages = tf.train.ExponentialMovingAverage(
      MOVING_AVERAGE_DECAY, global_step)
  variables_averages_op = variable_averages.apply(tf.trainable_variables())

  with tf.control_dependencies([apply_gradient_op, variables_averages_op]):
    train_op = tf.no_op(name='train')

  return train_op
```  
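Two pieces of this training op are easy to check by hand: the staircase learning-rate schedule and the moving-average update. The sketch below reproduces both in plain Python. The constant values are assumptions chosen for illustration (this document does not define them), and `staircase_decay` and `ema_update` are hypothetical helper names.

```python
def staircase_decay(initial_lr, global_step, decay_steps, decay_factor):
    """initial_lr * decay_factor ** floor(global_step / decay_steps)."""
    return initial_lr * decay_factor ** (global_step // decay_steps)


def ema_update(shadow, value, decay, num_updates):
    """One moving-average step; num_updates lowers the decay early in training."""
    decay = min(decay, (1.0 + num_updates) / (10.0 + num_updates))
    return decay * shadow + (1.0 - decay) * value


# Assumed constants for illustration only; the real values are defined elsewhere.
NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN = 50000
BATCH_SIZE = 128
NUM_EPOCHS_PER_DECAY = 350.0
INITIAL_LEARNING_RATE = 0.1
LEARNING_RATE_DECAY_FACTOR = 0.1

num_batches_per_epoch = NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN / float(BATCH_SIZE)
decay_steps = int(num_batches_per_epoch * NUM_EPOCHS_PER_DECAY)
print(decay_steps)  # 136718

# The rate stays flat within a stair and drops by the decay factor at each boundary.
for step in (0, decay_steps - 1, decay_steps, 2 * decay_steps):
    print(step, staircase_decay(INITIAL_LEARNING_RATE, step, decay_steps,
                                LEARNING_RATE_DECAY_FACTOR))

# At step 0 the effective decay is min(0.9999, 1/10), so the shadow value
# moves most of the way toward the variable.
print(ema_update(shadow=0.0, value=1.0, decay=0.9999, num_updates=0))
```

Passing `global_step` to `ExponentialMovingAverage`, as the code above does, is what enables the smaller effective decay during the first steps, so the averages are not dominated by the initial values.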

## Initialization and Training Step

```Python
def train():
  """Train CIFAR-10 for a number of steps."""
  with tf.Graph().as_default():
    global_step = tf.Variable(0, trainable=False)

    # Get images and labels for CIFAR-10.
    images, labels = cifar10.distorted_inputs()

    # Build a Graph that computes the logits predictions from the
    # inference model.
    logits = cifar10.inference(images)

    # Calculate loss.
    loss = cifar10.loss(logits, labels)

    # Build a Graph that trains the model with one batch of examples and
    # updates the model parameters.
    train_op = cifar10.train(loss, global_step)

    # Create a saver.
    saver = tf.train.Saver(tf.all_variables())

    # Build the summary operation based on the TF collection of Summaries.
    summary_op = tf.merge_all_summaries()

    # Build an initialization operation to run below.
    init = tf.initialize_all_variables()

    # Start running operations on the Graph.
    sess = tf.Session(config=tf.ConfigProto(
        log_device_placement=FLAGS.log_device_placement))
    sess.run(init)

    # Start the queue runners.
    tf.train.start_queue_runners(sess=sess)

    summary_writer = tf.train.SummaryWriter(FLAGS.train_dir, sess.graph)

    for step in xrange(FLAGS.max_steps):
      _, loss_value = sess.run([train_op, loss])  # Training
      # ...
      # ...
```