# MNIST handwritten digits classification with MLPs

In this notebook, we'll train a multi-layer perceptron (MLP) model to classify MNIST digits using TensorFlow. This notebook is based on [mnist_2.0_five_layers_sigmoid.py](https://github.com/martin-gorner/tensorflow-mnist-tutorial/blob/master/mnist_2.0_five_layers_sigmoid.py) by Martin Gorner.

In [None]:
%matplotlib inline

import tensorflow as tf
# tf.set_random_seed(0)  # uncomment for repeatable weight initialization

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

device_name = tf.test.gpu_device_name()
if device_name == '':
    device_name = "None"
print('Using TensorFlow version:', tf.__version__, ', GPU:', device_name)

First, we'll download and extract the MNIST dataset.

In [None]:
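# Note: this module is only available in TensorFlow 1.x.
from tensorflow.examples.tutorials.mnist import input_data as mnist_data
# one_hot=True:      labels as 10-element one-hot vectors
# reshape=False:     images kept as 28x28x1 arrays instead of flattened to 784
# validation_size=0: all 60,000 training images go to mnist.train
mnist = mnist_data.read_data_sets("data",
                                  one_hot=True,
                                  reshape=False,
                                  validation_size=0)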
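With `one_hot=True`, each label arrives as a length-10 vector holding a single 1 at the index of the digit. A minimal NumPy sketch of that encoding, using a hypothetical helper `one_hot` (not part of the TensorFlow API), and of how the digit is recovered with an argmax:

```python
import numpy as np

def one_hot(labels, num_classes=10):
    """Encode integer class labels as one-hot rows (hypothetical helper)."""
    labels = np.asarray(labels)
    encoded = np.zeros((labels.shape[0], num_classes), dtype=np.float32)
    # Put a single 1 in each row, in the column given by that row's label.
    encoded[np.arange(labels.shape[0]), labels] = 1.0
    return encoded

digits = np.array([3, 0, 7])
encoded = one_hot(digits)
print(encoded)
# Recover the integer labels, as tf.argmax(Y_, 1) does in the graph.
print(encoded.argmax(axis=1))  # [3 0 7]
```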

## Initialization

We'll start by defining the MLP network as a TensorFlow computation graph.

In [None]:
# Placeholders for input images and correct labels:
X = tf.placeholder(tf.float32, [None, 28, 28, 1])
Y_ = tf.placeholder(tf.float32, [None, 10])

# Weights and biases of the two hidden layers (200 and 100 units)
# and of the 10-unit output layer:
L1, L2 = 200, 100
W1 = tf.Variable(tf.truncated_normal([784, L1], stddev=0.1))  # 784 = 28 * 28
B1 = tf.Variable(tf.zeros([L1]))
W2 = tf.Variable(tf.truncated_normal([L1, L2], stddev=0.1))
B2 = tf.Variable(tf.zeros([L2]))
W3 = tf.Variable(tf.truncated_normal([L2, 10], stddev=0.1))
B3 = tf.Variable(tf.zeros([10]))

# The MLP model:
XX = tf.reshape(X, [-1, 784])
Y1 = tf.nn.sigmoid(tf.matmul(XX, W1) + B1)
Y2 = tf.nn.sigmoid(tf.matmul(Y1, W2) + B2)
Ylogits = tf.matmul(Y2, W3) + B3
Y = tf.nn.softmax(Ylogits)

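# Cross-entropy loss, computed from the logits. The mean over the minibatch is
# multiplied by 100, so it equals the total loss over a batch of 100 images:
cross_entropy = tf.nn.softmax_cross_entropy_with_logits_v2(logits=Ylogits,
                                                           labels=Y_)
cross_entropy = tf.reduce_mean(cross_entropy) * 100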

# Prediction accuracy:
correct_prediction = tf.equal(tf.argmax(Y, 1), tf.argmax(Y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

# Adam optimizer with a learning rate of 0.003, minimizing the loss:
learning_rate = 0.003
train_step = tf.train.AdamOptimizer(learning_rate).minimize(cross_entropy)
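To make the graph concrete, here is a minimal NumPy sketch of the same computation for one minibatch: flatten, two sigmoid layers, softmax output, the mean cross-entropy scaled by 100, and the accuracy. It is an illustration under stated assumptions, not the TensorFlow implementation: the helper names (`sigmoid`, `log_softmax`, `numpy_mlp`) are hypothetical, the weights are plain (not truncated) normal draws, and the pixels and labels are random stand-ins rather than MNIST. The loss is computed from the logits via a max-shifted log-softmax, which avoids overflow in `exp`; TensorFlow's op may be implemented differently internally.

```python
import numpy as np

def sigmoid(z):
    """Logistic function, as tf.nn.sigmoid."""
    return 1.0 / (1.0 + np.exp(-z))

def log_softmax(logits):
    """Row-wise log-softmax; subtracting the row max avoids overflow in exp."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def numpy_mlp(images, onehot_labels, params):
    """Forward pass, scaled cross-entropy and accuracy, mirroring the graph."""
    W1, B1, W2, B2, W3, B3 = params
    XX = images.reshape(-1, 784)              # like tf.reshape(X, [-1, 784])
    Y1 = sigmoid(XX @ W1 + B1)                # first hidden layer
    Y2 = sigmoid(Y1 @ W2 + B2)                # second hidden layer
    Ylogits = Y2 @ W3 + B3                    # output logits
    log_Y = log_softmax(Ylogits)
    Y = np.exp(log_Y)                         # softmax probabilities
    # Mean per-example cross-entropy -sum(Y_ * log(Y)), scaled by 100:
    cross_entropy = -(onehot_labels * log_Y).sum(axis=1).mean() * 100
    correct = Y.argmax(axis=1) == onehot_labels.argmax(axis=1)
    return Y, cross_entropy, correct.mean()

# Hypothetical stand-in data: random weights, random pixels, random labels,
# only to exercise the shapes of the 784 -> 200 -> 100 -> 10 network.
rng = np.random.default_rng(0)
params = []
for n_in, n_out in [(784, 200), (200, 100), (100, 10)]:
    params += [rng.normal(0.0, 0.1, (n_in, n_out)), np.zeros(n_out)]
images = rng.random((100, 28, 28, 1))
onehot_labels = np.eye(10)[rng.integers(0, 10, size=100)]

probs, np_ce, np_acc = numpy_mlp(images, onehot_labels, params)
print(probs.shape, np_ce, np_acc)
```

The arrays use new names or live inside `numpy_mlp`, so running this cell does not overwrite the TensorFlow tensors `X`, `Y_`, `W1`, and so on. As a sanity check, all-zero weights give a uniform softmax and a scaled loss of exactly 100 · ln 10 ≈ 230; a network initialized with small random weights should start close to that value.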

The computation graph is launched in a TensorFlow session (`tf.Session()`), and the variables are initialized: the weights with random values from a truncated normal distribution, and the biases with zeros.

In [None]:
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)
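`tf.truncated_normal(shape, stddev=0.1)` samples from a normal distribution but drops and re-picks any value more than two standard deviations from the mean, so no initial weight here exceeds 0.2 in magnitude. A minimal NumPy sketch of that resampling idea; `truncated_normal_np` is a hypothetical helper, not TensorFlow's implementation:

```python
import numpy as np

def truncated_normal_np(shape, stddev=0.1, mean=0.0, seed=None):
    """Normal samples, re-drawing any value more than 2 stddevs from the mean."""
    rng = np.random.default_rng(seed)
    samples = rng.normal(mean, stddev, size=shape)
    outside = np.abs(samples - mean) > 2 * stddev
    # Re-draw only the out-of-range entries until none remain.
    while outside.any():
        samples[outside] = rng.normal(mean, stddev, size=outside.sum())
        outside = np.abs(samples - mean) > 2 * stddev
    return samples

W1_init = truncated_normal_np((784, 200), stddev=0.1, seed=0)
B1_init = np.zeros(200)  # biases start at zero, as with tf.zeros([L1])
print(W1_init.shape, np.abs(W1_init).max() <= 0.2)  # (784, 200) True
```

Because the tails are cut off, the standard deviation of the resulting weights is somewhat smaller than the nominal `stddev`.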

## Learning

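We train the network by repeatedly running `train_step` on minibatches of 100 training images, recording the minibatch accuracy and cross-entropy at each iteration. With 1000 iterations, the network sees 100,000 images, about 1.7 passes over the 60,000 training images.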

In [None]:
%%time

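iterations = 1000

acc_v, ce_v = [], []

for i in range(iterations):
    # Fetch the next minibatch of 100 training images and labels:
    batch_X, batch_Y = mnist.train.next_batch(100)
    # Run one optimization step and fetch the minibatch accuracy and loss:
    acc, ce, _ = sess.run([accuracy, cross_entropy, train_step],
                          {X: batch_X, Y_: batch_Y})
    acc_v.append(acc)
    ce_v.append(ce)
    if i % 100 == 0:
        print(i, "accuracy:", acc, "cross-entropy:", ce)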
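Each `sess.run` call in the loop executes `train_step`, i.e., one Adam update of all weights and biases from the gradients of the minibatch loss. A minimal NumPy sketch of a single Adam update in its standard textbook form, using the learning rate 0.003 from the graph and the `tf.train.AdamOptimizer` defaults `beta1=0.9`, `beta2=0.999`, `epsilon=1e-08`. TensorFlow's internal formulation may differ slightly in how epsilon enters. `adam_step` is a hypothetical helper, and the demo minimizes a toy quadratic rather than the MNIST loss:

```python
import numpy as np

def adam_step(param, grad, m, v, t, lr=0.003, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update at step t (t starts at 1); returns param, m, v."""
    m = beta1 * m + (1 - beta1) * grad        # moving average of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2   # moving average of squared gradients
    m_hat = m / (1 - beta1 ** t)              # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)              # bias-corrected second moment
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v

# Toy problem (not MNIST): minimize sum((w - target)**2), gradient 2*(w - target).
target = np.array([3.0, -1.0, 0.5])
w = np.zeros(3)
m = np.zeros_like(w)
v = np.zeros_like(w)
for t in range(1, 5001):
    grad = 2 * (w - target)
    w, m, v = adam_step(w, grad, m, v, t)
print(np.round(w, 3))
```

Because the update divides by the root of the squared-gradient average, the very first step moves each parameter by about `lr` regardless of the gradient's magnitude, which is why a fixed learning rate like 0.003 works across layers with very different gradient scales.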

In [None]:
plt.figure(figsize=(5, 3))
plt.plot(range(iterations), ce_v)
plt.title('cross-entropy loss (training minibatches)')
plt.xlabel('iteration')

plt.figure(figsize=(5, 3))
plt.plot(range(iterations), acc_v)
plt.title('accuracy (training minibatches)')
plt.xlabel('iteration');