# TensorFlow Assignment: Multilayer Perceptron (MLP) and Convolutional Neural Network (CNN)

**[Duke Community Standard](http://integrity.duke.edu/standard.html): By typing your name below, you are certifying that you have adhered to the Duke Community Standard in completing this assignment.**

Name: [Zakharov Nikolai]

Now that you've run through a simple logistic regression model on MNIST, let's see if we can do better (Hint: we can). For this assignment, you'll build a multilayer perceptron (MLP) and a convolutional neural network (CNN), two popular types of neural networks, and compare their performance. Some potentially useful code:

In [3]:
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

# Import data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz


In [5]:
# Helper functions for creating weight variables
def weight_variable(shape):
    """weight_variable generates a weight variable of a given shape."""
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)

def bias_variable(shape):
    """bias_variable generates a bias variable of a given shape."""
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)

# TensorFlow functions that might also be of interest:
# tf.nn.sigmoid()
# tf.nn.relu()

### Multilayer Perceptron

Build a multilayer perceptron for MNIST digit classification. Feel free to play around with the model architecture and see how the training time/performance changes, but to begin, try the following:

Image -> fully connected (500 hidden units) -> nonlinearity (Sigmoid/ReLU) -> fully connected (10 hidden units) -> softmax
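
To get a feel for the size of this starting architecture, the parameter count can be worked out by hand. The sketch below is only an illustration: `mlp_param_count` is a hypothetical helper, not part of the assignment skeleton, and it assumes every layer is a plain dense layer with one bias per output unit.

```python
# Hypothetical helper (not part of the assignment skeleton): counts the
# trainable parameters of a fully connected network from its layer widths.
def mlp_param_count(layer_sizes):
    """Return total weights + biases for consecutive dense layers."""
    total = 0
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += n_in * n_out  # weight matrix of shape [n_in, n_out]
        total += n_out         # one bias per output unit
    return total

# 784 input pixels -> 500 hidden units -> 10 class scores
print(mlp_param_count([784, 500, 10]))  # 397510
```

Almost all of these parameters sit in the first weight matrix (784 x 500), so changing the hidden width is the main lever on model size here.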

Skeleton framework for you to fill in (Code you need to provide is marked by `###`):

In [20]:
tf.reset_default_graph()
# Model inputs: define placeholders
im_size = 784
x = tf.placeholder(shape=[None, im_size], dtype=tf.float32)
y_ = tf.placeholder(shape=[None, 10], dtype=tf.int32)

# Define the graph
# One hidden layer: h1 = sigmoid (or relu) with shape [bs, 500]; y_mlp = logits with shape [bs, 10]
hid1_size = 500
with tf.variable_scope('vars'):
    w1 = tf.get_variable('w1', shape=[im_size, hid1_size], 
                         initializer=tf.truncated_normal_initializer(stddev=0.1))
    b1 = tf.get_variable('b1', shape=[hid1_size],
                         initializer=tf.constant_initializer(0.1))
    w_out = tf.get_variable('w_out', shape=[hid1_size, 10],
                            initializer=tf.truncated_normal_initializer(stddev=0.1))
    b_out = tf.get_variable('b_out', shape=[10],
                         initializer=tf.constant_initializer(0.1))
    
# h1 = tf.nn.relu(tf.matmul(x, w1) + b1)
h1 = tf.nn.sigmoid(tf.matmul(x, w1) + b1)
y_mlp = tf.matmul(h1, w_out) + b_out
### Create your MLP here##
### Make sure to name your MLP output as y_mlp ###
# Loss 
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y_mlp))

# Optimizer
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)

# Evaluation
correct_prediction = tf.equal(tf.argmax(y_mlp, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

with tf.Session() as sess:
    # Initialize all variables
    sess.run(tf.global_variables_initializer())
    
    # Training regimen
    for i in range(4000):
        # Validate every 250th batch
        if i % 250 == 0:
            validation_accuracy = 0
            for v in range(10):
                batch = mnist.validation.next_batch(100)
                validation_accuracy += (1/10) * accuracy.eval(feed_dict={x: batch[0], y_: batch[1]})
            print('step %d, validation accuracy %g' % (i, validation_accuracy))
        
        # Train    
        batch = mnist.train.next_batch(50)
        train_step.run(feed_dict={x: batch[0], y_: batch[1]})

    print('test accuracy %g' % accuracy.eval(feed_dict={x: mnist.test.images, y_: mnist.test.labels}))

step 0, validation accuracy 0.101
step 250, validation accuracy 0.907
step 500, validation accuracy 0.911
step 750, validation accuracy 0.921
step 1000, validation accuracy 0.926
step 1250, validation accuracy 0.936
step 1500, validation accuracy 0.945
step 1750, validation accuracy 0.948
step 2000, validation accuracy 0.949
step 2250, validation accuracy 0.96
step 2500, validation accuracy 0.956
step 2750, validation accuracy 0.958
step 3000, validation accuracy 0.967
step 3250, validation accuracy 0.964
step 3500, validation accuracy 0.968
step 3750, validation accuracy 0.961
test accuracy 0.9633


#### Comparison

How do the sigmoid and rectified linear unit (ReLU) compare?

***

In this example, the ReLU network reached a test accuracy of 0.9771, compared with 0.9633 for the sigmoid network.
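
One commonly cited factor behind this kind of gap is gradient saturation. This is offered as a possible explanation only; the notebook itself does not investigate the cause. The sketch below uses hypothetical helper names and plain Python to compare the derivative of each activation at a few pre-activation values.

```python
import math

def sigmoid(z):
    """Logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_grad(z):
    """Derivative of the sigmoid: s(z) * (1 - s(z)), at most 0.25."""
    s = sigmoid(z)
    return s * (1.0 - s)

def relu_grad(z):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if z > 0 else 0.0

# The sigmoid derivative shrinks quickly as |z| grows; the ReLU derivative
# stays at 1 for any positive input.
for z in (0.0, 2.0, 5.0, 10.0):
    print("z=%5.1f  sigmoid'=%.5f  relu'=%.1f" % (z, sigmoid_grad(z), relu_grad(z)))
```

With a single hidden layer the effect is modest, which is consistent with both variants ending up within about 1.4 percentage points of each other.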


***

### Convolutional Neural Network

Build a simple 2-layer CNN for MNIST digit classification. Feel free to play around with the model architecture and see how the training time/performance changes, but to begin, try the following:

Image -> CNN (32 5x5 filters) -> nonlinearity (ReLU) ->  (2x2 max pool) -> CNN (64 5x5 filters) -> nonlinearity (ReLU) -> (2x2 max pool) -> fully connected (1024 hidden units) -> nonlinearity (ReLU) -> fully connected (10 hidden units) -> softmax
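
It can help to trace the tensor shapes through this pipeline before writing any TensorFlow code. The sketch below is a plain-Python illustration with hypothetical helper names. It assumes stride-1 convolutions and stride-2 pooling, both with 'SAME' padding, for which the spatial output size is `ceil(size / stride)`.

```python
import math

def same_out(size, stride):
    """Spatial output size under 'SAME' padding: ceil(size / stride)."""
    return int(math.ceil(size / float(stride)))

def trace_cnn_shapes(height=28, width=28):
    """Return (layer name, shape) pairs for the suggested architecture."""
    shapes = [("input", (height, width, 1))]
    channels = 1
    for idx, filters in enumerate((32, 64), start=1):
        # 5x5 convolution, stride 1, SAME padding: spatial size unchanged
        height, width = same_out(height, 1), same_out(width, 1)
        channels = filters
        shapes.append(("conv%d" % idx, (height, width, channels)))
        # 2x2 max pool, stride 2: spatial size halves
        height, width = same_out(height, 2), same_out(width, 2)
        shapes.append(("pool%d" % idx, (height, width, channels)))
    # Flatten before the fully connected layers
    shapes.append(("flatten", (height * width * channels,)))
    shapes.append(("fc1", (1024,)))
    shapes.append(("logits", (10,)))
    return shapes

for name, shape in trace_cnn_shapes():
    print(name, shape)
```

The flattened size of 7 * 7 * 64 = 3136 is the number the first fully connected layer has to accept.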

Some additional functions that you might find helpful:

In [None]:
# Convolutional neural network functions
def conv2d(x, W):
    """conv2d returns a 2d convolution layer with full stride."""
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

def max_pool_2x2(x):
    """max_pool_2x2 downsamples a feature map by 2X."""
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')

# Tensorflow Function that might also be of interest:
# tf.reshape()

Skeleton framework for you to fill in (Code you need to provide is marked by `###`):

*Hint: Convolutional Neural Networks are spatial models. You'll want to transform the flattened MNIST vectors into images for the CNN. Similarly, you might want to flatten it again sometime before you do a softmax. You also might want to look into the  [conv2d() documentation](https://www.tensorflow.org/api_docs/python/tf/nn/conv2d) to see what shape kernel/filter it's expecting.*
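
The reshaping mentioned in the hint can be illustrated without TensorFlow. The sketch below uses hypothetical helpers and assumes row-major ordering, which is how `tf.reshape` lays out elements. It splits a flat 784-element vector into 28 rows of 28 pixels and flattens it back.

```python
def to_image(flat, height=28, width=28):
    """Split a flat row-major pixel list into `height` rows of `width`."""
    assert len(flat) == height * width
    return [flat[r * width:(r + 1) * width] for r in range(height)]

def to_flat(image):
    """Concatenate the rows of an image back into one flat list."""
    return [pixel for row in image for pixel in row]

flat = list(range(784))            # stand-in for one flattened MNIST digit
image = to_image(flat)
print(len(image), len(image[0]))   # 28 28
print(image[1][0])                 # 28: first pixel of the second row
assert to_flat(image) == flat      # the round trip loses nothing
```

In the graph itself the equivalent step is `tf.reshape(x, [-1, 28, 28, 1])`, where `-1` keeps the batch dimension flexible and the trailing `1` is the single grayscale channel. The `conv2d` filter is expected in the order `[filter_height, filter_width, in_channels, out_channels]`.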

In [14]:
from tensorflow import layers
# Model Inputs
im_size = 784
x = tf.placeholder(shape=[None, im_size], dtype=tf.float32)
y_ = tf.placeholder(shape=[None, 10], dtype=tf.int32)

# Define the graph
x_img = tf.reshape(x, [-1, 28, 28, 1])
conv1 = tf.layers.conv2d(
    inputs=x_img,
    filters=32,
    kernel_size=[5, 5],
    padding="same",
    activation=tf.nn.relu)

pool1 = tf.layers.max_pooling2d(inputs=conv1, pool_size=[2, 2], strides=2)

conv2 = tf.layers.conv2d(
  inputs=pool1,
  filters=64,
  kernel_size=[5, 5],
  padding="same",
  activation=tf.nn.relu)

pool2 = tf.layers.max_pooling2d(inputs=conv2, pool_size=[2, 2], strides=2)

# Dense Layer
pool2_flat = tf.reshape(pool2, [-1, 7 * 7 * 64])
dense = tf.layers.dense(inputs=pool2_flat, units=1024, activation=tf.nn.relu)

# Logits Layer
y_conv = tf.layers.dense(inputs=dense, units=10)

### Create your CNN here##
### Make sure to name your CNN output as y_conv ###

# Loss 
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y_conv))

# Optimizer
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)

# Evaluation
correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

with tf.Session() as sess:
    # Initialize all variables
    sess.run(tf.global_variables_initializer())
    
    # Training regimen
    for i in range(3000):
        # Validate every 250th batch
        if i % 250 == 0:
            validation_accuracy = 0
            for v in range(10):
                batch = mnist.validation.next_batch(50)
                validation_accuracy += (1/10) * accuracy.eval(feed_dict={x: batch[0], y_: batch[1]})
            print('step %d, validation accuracy %g' % (i, validation_accuracy))
        
        # Train    
        batch = mnist.train.next_batch(50)
        train_step.run(feed_dict={x: batch[0], y_: batch[1]})

    print('test accuracy %g' % accuracy.eval(feed_dict={x: mnist.test.images, y_: mnist.test.labels}))

step 0, validation accuracy 0.138
step 250, validation accuracy 0.926
step 500, validation accuracy 0.954
step 750, validation accuracy 0.964
step 1000, validation accuracy 0.986
step 1250, validation accuracy 0.978
step 1500, validation accuracy 0.976
step 1750, validation accuracy 0.976
step 2000, validation accuracy 0.99
step 2250, validation accuracy 0.976
step 2500, validation accuracy 0.982
step 2750, validation accuracy 0.988
test accuracy 0.9872


In [26]:
import numpy as np
# Using low-level TensorFlow ops (tf.nn) rather than the layers API
im_size = 784
x = tf.placeholder(shape=[None, im_size], dtype=tf.float32)
y_ = tf.placeholder(shape=[None, 10], dtype=tf.int32)

def weight_variable(shape):
  initial = tf.truncated_normal(shape,
                                stddev=0.1/np.sqrt(shape[0])) 
  return tf.Variable(initial)

def bias_variable(shape):
  initial = tf.constant(0.1, shape=shape)
  return tf.Variable(initial)

def conv2d(x, W, stride):
  return tf.nn.conv2d(x, W, strides=[1, stride, stride, 1], padding='SAME')

def maxpool(x, kernel, stride):
  return tf.nn.max_pool(x, ksize=[1, kernel, kernel, 1],
                        strides=[1, stride, stride, 1], padding='SAME')

def convLayer(x, W, b, stride):  # x-input, W-weight, b-bias,stride-convolution stride
  return tf.nn.relu(conv2d(x, W, stride) + b)


# Parameters of the first layer
W_conv1 = weight_variable([5, 5, 1, 32])
b_conv1 = bias_variable([32])
# Input, reshaped into an image
x_image = tf.reshape(x, [-1, 28, 28, 1])
# First layer: CONV + RELU + POOL 2x2
h_conv1 = convLayer(x_image, W_conv1, b_conv1, 1)
h_pool1 = maxpool(h_conv1, 2, 2)
# Parameters of the second layer
W_conv2 = weight_variable([5, 5, 32, 64])
b_conv2 = bias_variable([64])
# Second layer: CONV + RELU + POOL
h_conv2 = convLayer(h_pool1, W_conv2, b_conv2, 1)
h_pool2 = maxpool(h_conv2, 2, 2)

W_fc1 = weight_variable([7 * 7 * 64, 1024])
b_fc1 = bias_variable([1024])

h_pool2_flat = tf.reshape(h_pool2, [-1, 7 * 7 * 64])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)

keep_prob = tf.placeholder(tf.float32)
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)

W_fc2 = weight_variable([1024, 10])
b_fc2 = bias_variable([10])

# CNN output
y_conv = tf.matmul(h_fc1_drop, W_fc2) + b_fc2

# Loss 
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y_conv))

# Optimizer
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)

# Evaluation
correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

with tf.Session() as sess:
    # Initialize all variables
    sess.run(tf.global_variables_initializer())
    # Training regimen
    for i in range(3000):
        # Validate every 250th batch
        if i % 250 == 0:
            validation_accuracy = 0
            for v in range(10):
                batch = mnist.validation.next_batch(50)
                validation_accuracy += (1/10) * accuracy.eval(feed_dict={x: batch[0], y_: batch[1], keep_prob: 1.0})
            print('step %d, validation accuracy %g' % (i, validation_accuracy))
        
        # Train    
        batch = mnist.train.next_batch(50)
        train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})

    print('test accuracy %g' % accuracy.eval(feed_dict={x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0}))

step 0, validation accuracy 0.096
step 250, validation accuracy 0.856
step 500, validation accuracy 0.916
step 750, validation accuracy 0.936
step 1000, validation accuracy 0.962
step 1250, validation accuracy 0.976
step 1500, validation accuracy 0.962
step 1750, validation accuracy 0.956
step 2000, validation accuracy 0.972
step 2250, validation accuracy 0.98
step 2500, validation accuracy 0.976
step 2750, validation accuracy 0.972
test accuracy 0.9819


Sorry, I didn't train for longer; I didn't want to connect to the university server right now.

Some differences from the logistic regression model to note:

- The CNN model might take a while to train. Depending on your machine, you might expect this to take up to half an hour. If you see your validation performance start to plateau, you can kill the training.

- The logistic regression model we used previously was pretty basic, and as such, we were able to get away with using the GradientDescentOptimizer, which implements the gradient descent algorithm. For more difficult optimization spaces (such as the ones deep networks pose), we might want to use more sophisticated algorithms. Prof. David Carlson has a lecture on this later.
    
- Because of the larger size of our network, notice that our minibatch size has shrunk.
    
- We've added a validation step every 250 minibatches. This lets us see how our model is doing during the training process, rather than sit around twiddling our thumbs and hoping for the best when training finishes. This becomes especially significant as training regimens start approaching days and weeks in length. Normally, we validate on the entire validation set, but for the sake of time we'll just stick to 10 validation minibatches (500 images) for this homework assignment.
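
Validating on only 500 images makes the reported accuracy fairly noisy, which may explain why it bounces by around a percentage point between checkpoints. The sketch below is a rough estimate with a hypothetical helper. It assumes each image is classified correctly independently with the same probability (a binomial model), and it takes 5000 as the assumed size of the full validation split.

```python
import math

def accuracy_standard_error(accuracy, n_images):
    """Binomial standard error of an accuracy estimated from n_images."""
    return math.sqrt(accuracy * (1.0 - accuracy) / n_images)

# Compare the 500-image estimate with an assumed 5000-image validation set.
for n in (500, 5000):
    print("n=%d: +/- %.4f" % (n, accuracy_standard_error(0.98, n)))
```

At an accuracy near 0.98, the standard error on 500 images is roughly 0.006, so differences of about 0.01 between consecutive checkpoints are within what sampling noise alone could produce.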

#### Comparison

How do the MLP and CNN compare in accuracy? Training time? Why would you use one vs the other? Is there a problem you see with MLPs when applied to other image datasets?

***

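Both the CNN and the MLP extract features from images, and in the runs above the CNN reached a higher test accuracy (0.9872) than the MLP (0.9771 with ReLU). The biggest advantage of a CNN for images is that it reuses the same weights at every spatial position, which lets it learn spatial invariance.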
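
The effect of sharing weights across positions can be seen in a toy one-dimensional case. The sketch below is a simplified illustration, not the assignment's model: `correlate1d` is a hypothetical helper that slides one small kernel over a signal with no padding. Shifting the input shifts the response by the same amount, and a global max over the response is unchanged.

```python
def correlate1d(signal, kernel):
    """Slide `kernel` over `signal` (no padding), reusing the same weights."""
    n = len(signal) - len(kernel) + 1
    return [sum(k * signal[i + j] for j, k in enumerate(kernel))
            for i in range(n)]

kernel = [1, 0, -1]                  # one shared edge-like filter
signal = [0, 0, 1, 2, 1, 0, 0, 0]    # a small bump
shifted = [0, 0] + signal[:-2]       # the same bump, two steps to the right

out = correlate1d(signal, kernel)
out_shifted = correlate1d(shifted, kernel)

print(out)          # [-1, -2, 0, 2, 1, 0]
print(out_shifted)  # [0, 0, -1, -2, 0, 2]
print(max(out) == max(out_shifted))  # True: the pooled response is unchanged
```

A fully connected layer has a separate weight for every input position, so it would have to learn the same pattern again at each location.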

***