# Lab2: Logistic Regression (Softmax) & MLP in TensorFlow

**Reference Materials:**
(Reading through these pages is strongly recommended.)
* https://www.tensorflow.org/versions/r0.10/tutorials/mnist/beginners/index.html
* http://www.wildml.com/2015/11/understanding-convolutional-neural-networks-for-nlp/
* http://colah.github.io/posts/2014-07-Understanding-Convolutions/

## Logistic Regression
Softmax regression (or multinomial logistic regression) is a generalization of logistic regression to the case where we want to handle multiple classes.

Suppose we have a data set $\{(x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), ..., (x^{(n)}, y^{(n)})\}$, where $x^{(i)}$ is the feature vector and $y^{(i)}$ is the corresponding class label.

In the binary logistic regression setting ($y^{(i)} \in \{0, 1\}$):

 **Probability output:** $h_{\theta}(x) = p(y = 1|x, \theta) = \frac{1}{1+\exp(-\theta^Tx)}$
 
 **Cost function:** $J(\theta) = -\sum_i [y^{(i)}\log h_{\theta}(x^{(i)}) + (1 - y^{(i)})\log (1- h_{\theta}(x^{(i)}))]$
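
As a minimal NumPy sketch of the two formulas above (the helper names `sigmoid` and `binary_cross_entropy` and the toy data are illustrative assumptions, not part of the lab code), the probability output and the cost can be evaluated directly:

```python
import numpy as np

def sigmoid(z):
    # Logistic function 1 / (1 + exp(-z))
    return 1.0 / (1.0 + np.exp(-z))

def binary_cross_entropy(theta, X, y):
    # h[i] = p(y = 1 | x_i, theta) for every row x_i of X
    h = sigmoid(X.dot(theta))
    # Summed (not averaged) negative log-likelihood, matching J(theta)
    return -np.sum(y * np.log(h) + (1 - y) * np.log(1 - h))

# Hypothetical toy data: 3 examples, 2 features (first column acts as a bias)
X = np.array([[1.0, 2.0], [1.0, -1.0], [1.0, 0.5]])
y = np.array([1.0, 0.0, 1.0])
theta = np.zeros(2)

cost_at_zero = binary_cross_entropy(theta, X, y)
```

With $\theta = 0$ every example gets probability 0.5, so the cost equals $n \log 2$ for $n$ examples, which is a handy sanity check before training.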
 
In the Softmax setting with $K$ classes ($y^{(i)} \in \{1, ..., K\}$), we would like to estimate the probability $P(y^{(i)} = k|x^{(i)})$ for each class $k$:

 **Probability output:** $p(y = k|x, \theta) = \frac{\exp(\theta^{(k)T}x)}{\sum_j \exp(\theta^{(j)T}x)}$
 
 **Cost function:** $J(\theta) = -\sum_i \sum_k 1\{y^{(i)} = k\}\log p(k|x^{(i)}, \theta)$
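
The Softmax formulas can be sketched in NumPy as follows. This is an illustration under stated assumptions: the names `softmax` and `softmax_cost` are hypothetical, class indices are zero-based in the code, and the maximum logit is subtracted before exponentiating, a common trick that leaves the probabilities unchanged but avoids overflow.

```python
import numpy as np

def softmax(logits):
    # Subtracting the row-wise max does not change the result
    # but keeps exp() from overflowing.
    shifted = logits - np.max(logits, axis=1, keepdims=True)
    e = np.exp(shifted)
    return e / np.sum(e, axis=1, keepdims=True)

def softmax_cost(Theta, X, labels):
    # Theta: (D, K) parameters, X: (n, D) features,
    # labels: integer class indices in 0..K-1
    probs = softmax(X.dot(Theta))
    n = X.shape[0]
    # The indicator 1{y_i = k} simply picks the probability of the true class
    return -np.sum(np.log(probs[np.arange(n), labels]))

# Hypothetical toy data: 2 examples, 2 features, 3 classes
X = np.array([[1.0, 2.0], [0.5, -1.0]])
labels = np.array([0, 2])
Theta = np.zeros((2, 3))

probs = softmax(X.dot(Theta))
cost_uniform = softmax_cost(Theta, X, labels)
```

With all parameters at zero the model predicts a uniform distribution over the classes, so the cost is $n \log K$.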

In [None]:
import tensorflow as tf
import numpy as np

In [None]:
# Import MNIST data
# The MNIST database of handwritten digits
import input_data
mnist_path = "/tmp/data/"
mnist = input_data.read_data_sets(mnist_path, one_hot=True)

In [None]:
import matplotlib.pyplot as plt
%matplotlib inline
sample_img = mnist.train.images[np.random.randint(len(mnist.train.images))]
plt.imshow(sample_img.reshape([28, 28]), cmap='Greys')

In [None]:
# Parameters
learning_rate = 0.01
training_epochs = 25
batch_size = 100
display_step = 1

# tf Graph Input
x = tf.placeholder("float", [None, 784]) # mnist data image of shape 28*28=784
y = tf.placeholder("float", [None, 10]) # 0-9 digits recognition => 10 classes

In [None]:
# Create model

# Set model weights
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))

In [None]:
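# Construct model
# pred holds the logits; the softmax is applied inside the loss function
pred = tf.matmul(x, W) + b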

In [None]:
# Minimize error using cross entropy
# Cross entropy
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(pred, y))
# Gradient Descent
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost) 

In [None]:
# Initializing the variables
init = tf.initialize_all_variables()

In [None]:
# Launch the graph
with tf.Session() as sess:
    sess.run(init)

    # Training cycle
    for epoch in range(training_epochs):
        avg_cost = 0.
        total_batch = int(mnist.train.num_examples/batch_size)
        # Loop over all batches
        for i in range(total_batch):
            batch_xs, batch_ys = mnist.train.next_batch(batch_size)
            # Fit training using batch data
            sess.run(optimizer, feed_dict={x: batch_xs, y: batch_ys})
            # Compute average loss
            avg_cost += sess.run(cost, feed_dict={x: batch_xs, y: batch_ys})
        # Display logs per epoch step
        if epoch % display_step == 0:
            print "Epoch:", '%04d' % (epoch+1), "cost=", "{:.9f}".format(avg_cost/total_batch)

    print "Optimization Finished!"

    # Test model
    correct_prediction = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
    # Calculate accuracy
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
    print "Accuracy:", accuracy.eval({x: mnist.test.images, y: mnist.test.labels})

## Multilayer Perceptron (MLP)
An MLP with a single hidden layer can be represented graphically as follows:
![](http://deeplearning.net/tutorial/_images/mlp.png)

Formally, a one-hidden-layer MLP is a function $f: R^D \rightarrow R^L$, where $D$ is the dimension of the input vector $x$ and $L$ is the dimension of the output vector $f(x)$.

The computation can be written as follows:

$f(x) = W^{(2)}(s(W^{(1)}x + b^{(1)})) + b^{(2)}$, where $b^{(1)}, b^{(2)}$ are bias vectors and $W^{(1)}, W^{(2)}$ are weight matrices.

$s$ is an activation function. Typical choices include $\tanh$, sigmoid, and ReLU.
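
To make the formula concrete, here is a small NumPy sketch of the forward pass. The function name `mlp_forward`, the layer sizes, and the random initialization are assumptions chosen only for illustration; $\tanh$ is used as the default activation $s$.

```python
import numpy as np

def mlp_forward(x, W1, b1, W2, b2, s=np.tanh):
    # Hidden layer: s(W1 x + b1)
    hidden = s(W1.dot(x) + b1)
    # Linear output layer: W2 hidden + b2
    return W2.dot(hidden) + b2

rng = np.random.RandomState(0)
D, H, L = 4, 3, 2            # input, hidden, and output dimensions
W1 = rng.randn(H, D)
b1 = np.zeros(H)
W2 = rng.randn(L, H)
b2 = np.zeros(L)

x = rng.randn(D)
out = mlp_forward(x, W1, b1, W2, b2)   # vector of length L
```

Note that this sketch follows the column-vector convention of the formula ($W x$), whereas code that processes a batch of row vectors usually writes the same product as `x W`.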

We can combine an MLP with a Softmax output layer to do classification. The code below uses two hidden layers:

In [None]:
learning_rate = 0.001
training_epochs = 15
batch_size = 100
display_step = 1

# Network Parameters
n_hidden_1 = 256 # 1st layer number of features
n_hidden_2 = 256 # 2nd layer number of features
n_input = 784 # MNIST data input (img shape: 28*28)
n_classes = 10 # MNIST total classes (0-9 digits)

# tf Graph input
x = tf.placeholder("float", [None, n_input])
y = tf.placeholder("float", [None, n_classes])


weights = {
    'h1': tf.Variable(tf.random_normal([n_input, n_hidden_1])),
    'h2': tf.Variable(tf.random_normal([n_hidden_1, n_hidden_2])),
    'out': tf.Variable(tf.random_normal([n_hidden_2, n_classes]))
}
biases = {
    'b1': tf.Variable(tf.random_normal([n_hidden_1])),
    'b2': tf.Variable(tf.random_normal([n_hidden_2])),
    'out': tf.Variable(tf.random_normal([n_classes]))
}

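# Construct model
# Hidden layer with RELU activation
layer_1 = tf.add(tf.matmul(x, weights['h1']), biases['b1'])
layer_1 = tf.nn.relu(layer_1)
# Hidden layer with RELU activation
layer_2 = tf.add(tf.matmul(layer_1, weights['h2']), biases['b2'])
layer_2 = tf.nn.relu(layer_2)
# Output layer with linear activation
pred = tf.matmul(layer_2, weights['out']) + biases['out']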

# Define loss and optimizer
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(pred, y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)

## Convolution
### What is Convolution?
The convolution of $f$ and $g$ is defined as:

$(f * g)(t) = \int_{-\infty}^{\infty}f(t-\tau)g(\tau) d\tau$

![](https://upload.wikimedia.org/wikipedia/commons/6/6a/Convolution_of_box_signal_with_itself2.gif)

It can be viewed as sliding the (flipped) filter $g$ across the function $f$ and measuring how much the two overlap at each offset $t$.

Discrete version:

$(f * g)[n] = \sum_{m=-M}^{M}f[n-m]g[m]$
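
The discrete sum can be sketched directly in NumPy. As an assumption for this illustration, both signals are finite arrays indexed from 0 (rather than from $-M$ to $M$) and are treated as zero outside their range; the helper name `conv1d_full` is hypothetical. The result is compared against `np.convolve`, which computes the same full convolution.

```python
import numpy as np

def conv1d_full(f, g):
    # Output has len(f) + len(g) - 1 entries ("full" convolution)
    out = np.zeros(len(f) + len(g) - 1)
    for n in range(len(out)):
        for m in range(len(g)):
            # f is treated as zero outside its valid index range
            if 0 <= n - m < len(f):
                out[n] += f[n - m] * g[m]
    return out

f = np.array([1.0, 2.0, 3.0])
g = np.array([0.0, 1.0, 0.5])

result = conv1d_full(f, g)
reference = np.convolve(f, g)
```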

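Convolution extends naturally to 2D. On an image, each output value is the sum of the element-wise products between the filter and the image patch it currently covers, computed as the filter slides across the image: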

![](http://deeplearning.stanford.edu/wiki/images/6/6c/Convolution_schematic.gif)
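
The sliding-window computation in the animation can be sketched as below. This is a simplified illustration, not an efficient implementation: it assumes a single-channel image, stride 1, and no padding, and it does not flip the filter (strictly a cross-correlation, which is also what deep learning libraries commonly compute under the name convolution). The name `conv2d_valid` is hypothetical.

```python
import numpy as np

def conv2d_valid(image, kernel):
    kh, kw = kernel.shape
    # Without padding the output shrinks by (kernel size - 1) per dimension
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = image[i:i + kh, j:j + kw]
            # Element-wise product of patch and filter, then sum
            out[i, j] = np.sum(patch * kernel)
    return out

image = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.array([[1.0, 0.0],
                   [0.0, -1.0]])

feature_map = conv2d_valid(image, kernel)   # shape (3, 3)
```

Here each output is `image[i, j] - image[i + 1, j + 1]`, so on this linearly increasing image every entry of the feature map has the same value.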

### What are convolutional neural networks (CNN)?

CNNs are basically just several layers of convolutions with nonlinear activation functions like ReLU or tanh applied to the results. 

During the training phase, a **CNN automatically learns the values of its filters** based on the task you want to perform. 

![](http://www.wildml.com/wp-content/uploads/2015/11/Screen-Shot-2015-11-07-at-7.26.20-AM.png)
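
The network in the following cell uses `'SAME'` padding, for which the spatial output size is the input size divided by the stride, rounded up. A short sketch (the helper name `same_padding_output_size` is an assumption for illustration) shows why the fully connected layer there expects `7*7*64` inputs: each stride-1 convolution keeps the size, and each 2x2 max-pool with stride 2 halves it.

```python
import math

def same_padding_output_size(size, stride):
    # With 'SAME' padding: output = ceil(input / stride)
    return int(math.ceil(size / float(stride)))

size = 28  # MNIST images are 28x28
for _ in range(2):
    size = same_padding_output_size(size, 1)  # convolution, stride 1
    size = same_padding_output_size(size, 2)  # max-pooling, k=2, stride 2

# The second convolution produces 64 feature maps
flat_features = size * size * 64
```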

In [None]:
# Parameters
learning_rate = 0.001
training_iters = 200000
batch_size = 128
display_step = 10

# Network Parameters
n_input = 784 # MNIST data input (img shape: 28*28)
n_classes = 10 # MNIST total classes (0-9 digits)
dropout = 0.75 # Dropout, probability to keep units

# tf Graph input
x = tf.placeholder(tf.float32, [None, n_input])
y = tf.placeholder(tf.float32, [None, n_classes])
keep_prob = tf.placeholder(tf.float32) #dropout (keep probability)

# Create some wrappers for simplicity
def conv2d(x, W, b, strides=1):
    # Conv2D wrapper, with bias and relu activation
    x = tf.nn.conv2d(x, W, strides=[1, strides, strides, 1], padding='SAME')
    x = tf.nn.bias_add(x, b)
    return tf.nn.relu(x)


def maxpool2d(x, k=2):
    # MaxPool2D wrapper
    return tf.nn.max_pool(x, ksize=[1, k, k, 1], strides=[1, k, k, 1],
                          padding='SAME')


# Create model
def conv_net(x, weights, biases, dropout):
    # Reshape input picture
    x = tf.reshape(x, shape=[-1, 28, 28, 1])

    # Convolution Layer
    conv1 = conv2d(x, weights['wc1'], biases['bc1'])
    # Max Pooling (down-sampling)
    conv1 = maxpool2d(conv1, k=2)

    # Convolution Layer
    conv2 = conv2d(conv1, weights['wc2'], biases['bc2'])
    # Max Pooling (down-sampling)
    conv2 = maxpool2d(conv2, k=2)

    # Fully connected layer
    # Reshape conv2 output to fit fully connected layer input
    fc1 = tf.reshape(conv2, [-1, weights['wd1'].get_shape().as_list()[0]])
    fc1 = tf.add(tf.matmul(fc1, weights['wd1']), biases['bd1'])
    fc1 = tf.nn.relu(fc1)
    # Apply Dropout
    fc1 = tf.nn.dropout(fc1, dropout)

    # Output, class prediction
    out = tf.add(tf.matmul(fc1, weights['out']), biases['out'])
    return out

# Store layers weight & bias
weights = {
    # 5x5 conv, 1 input, 32 outputs
    'wc1': tf.Variable(tf.random_normal([5, 5, 1, 32])),
    # 5x5 conv, 32 inputs, 64 outputs
    'wc2': tf.Variable(tf.random_normal([5, 5, 32, 64])),
    # fully connected, 7*7*64 inputs, 1024 outputs
    'wd1': tf.Variable(tf.random_normal([7*7*64, 1024])),
    # 1024 inputs, 10 outputs (class prediction)
    'out': tf.Variable(tf.random_normal([1024, n_classes]))
}

biases = {
    'bc1': tf.Variable(tf.random_normal([32])),
    'bc2': tf.Variable(tf.random_normal([64])),
    'bd1': tf.Variable(tf.random_normal([1024])),
    'out': tf.Variable(tf.random_normal([n_classes]))
}

# Construct model
pred = conv_net(x, weights, biases, keep_prob)

# Define loss and optimizer
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(pred, y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)

# Evaluate model
correct_pred = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))

# Initializing the variables
init = tf.initialize_all_variables()

# Launch the graph
with tf.Session() as sess:
    sess.run(init)
    step = 1
    # Keep training until the maximum number of iterations is reached
    while step * batch_size < training_iters:
        batch_x, batch_y = mnist.train.next_batch(batch_size)
        # Run optimization op (backprop)
        sess.run(optimizer, feed_dict={x: batch_x, y: batch_y,
                                       keep_prob: dropout})
        if step % display_step == 0:
            # Calculate batch loss and accuracy
            loss, acc = sess.run([cost, accuracy], feed_dict={x: batch_x,
                                                              y: batch_y,
                                                              keep_prob: 1.})
            print "Iter " + str(step*batch_size) + ", Minibatch Loss= " + \
                  "{:.6f}".format(loss) + ", Training Accuracy= " + \
                  "{:.5f}".format(acc)
        step += 1
    print "Optimization Finished!"

    # Calculate accuracy on the first 256 MNIST test images
    print "Testing Accuracy:", \
        sess.run(accuracy, feed_dict={x: mnist.test.images[:256],
                                      y: mnist.test.labels[:256],
                                      keep_prob: 1.})