# LeNet

![LeNet](images/lenet.png)

Paper: [LeCun, Yann, et al. "Gradient-based learning applied to document recognition." Proceedings of the IEEE 86.11 (1998): 2278-2324.](http://yann.lecun.com/exdb/publis/psgz/lecun-98.ps.gz)

Webpage: [LeNet-5, convolutional neural networks](http://yann.lecun.com/exdb/lenet/)

TensorFlow MNIST tutorials: [beginner](https://www.tensorflow.org/get_started/mnist/beginners) and [advanced](https://www.tensorflow.org/get_started/mnist/pros)

In [None]:
# __future__ imports must come before any other statement in the cell
from __future__ import print_function

import datetime

import numpy as np
import tensorflow as tf

## Load the dataset

TensorFlow has helpers to load the MNIST dataset:

In [None]:
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("data/MNIST", one_hot=True)

## Check the dataset

We can check the size of the images:

In [None]:
image_size = mnist.train.images.shape[1]
image_height = int(np.sqrt(image_size))
image_width = int(np.sqrt(image_size))
image_channels = 1  # grayscale images (a single channel)
print("{:d} = {:d} x {:d} x {:d}".format(image_size, image_height, image_width, image_channels))

and how many different labels we have (should be one per digit):

In [None]:
labels_size = mnist.train.labels.shape[1]
print(labels_size)
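
Because the dataset was loaded with `one_hot=True`, each label is a vector with a single 1 at the index of its digit. As a minimal sketch of that encoding, the following converts digit indices to one-hot rows and back. The `digits` array is made-up illustration data, not read from MNIST:

```python
import numpy as np

# Hypothetical digit labels, for illustration only (not read from MNIST).
digits = np.array([3, 0, 9])

# Build one-hot rows: row i has a 1.0 at column digits[i].
one_hot = np.eye(10, dtype=np.float32)[digits]

# Recover the digit indices with argmax along the label axis.
recovered = np.argmax(one_hot, axis=1)
print(one_hot.shape, recovered)
```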

## Build the network

First we define some helper functions, then we call them with different parameters for each layer to build the network. We use variable scopes, variable names and operation names to make debugging easier.

According to the paper, the weights are initialized with random values drawn from a uniform distribution between $-2.4 / F_i$ and $2.4 / F_i$ where $F_i$ is the number of input dimensions (fan-in). In TensorFlow, we approximate this with `tf.contrib.layers.variance_scaling_initializer`, which also draws from a uniform distribution scaled by the fan-in, although its bounds shrink with $1 / \sqrt{F_i}$ rather than $1 / F_i$:

In [None]:
def weight_variable(name, shape):
    return tf.get_variable(name, shape, dtype=np.float32,
        initializer=tf.contrib.layers.variance_scaling_initializer(factor=2.0, mode="FAN_IN", uniform=True))
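
To see how close the two initializations are, we can compare their bounds for each layer. This is only a sketch: it assumes that the uniform variant of the variance scaling initializer samples from $[-\sqrt{3 \cdot factor / F_i}, \sqrt{3 \cdot factor / F_i}]$, and the helper names `paper_limit` and `variance_scaling_limit` are ours, not part of TensorFlow:

```python
import math

def paper_limit(fan_in):
    # Bound quoted from the paper: weights uniform in [-2.4 / F_i, 2.4 / F_i].
    return 2.4 / fan_in

def variance_scaling_limit(fan_in, factor=2.0):
    # Assumed bound of a uniform variance scaling initializer in FAN_IN mode:
    # sqrt(3 * factor / fan_in).
    return math.sqrt(3.0 * factor / fan_in)

# Fan-in of each layer in this notebook: patch * patch * input_channels for
# convolutions, input_size for fully connected layers.
fan_ins = [("conv1", 5 * 5 * 1), ("conv2", 5 * 5 * 6),
           ("fc1", 400), ("fc2", 120), ("fc3", 84)]

for name, fan_in in fan_ins:
    print("{:5s} fan-in {:4d}  paper {:.4f}  variance scaling {:.4f}".format(
        name, fan_in, paper_limit(fan_in), variance_scaling_limit(fan_in)))
```

Under that assumption the variance scaling bounds are wider than the paper's for every layer here, and the gap grows with the fan-in.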

We initialize the bias vectors just with zeros:

In [None]:
def bias_variable(name, shape):
    return tf.get_variable(name, shape, dtype=np.float32, initializer=tf.zeros_initializer())

2D convolution layers with $\tanh$ activation, as in the paper:

In [None]:
def conv2d(name, t_input, patch, input_channels, output_channels):
    with tf.variable_scope(name):
        t_weight = weight_variable("weight", [patch, patch, input_channels, output_channels])
        t_conv = tf.nn.conv2d(t_input, t_weight, strides=[1, 1, 1, 1], padding="VALID", name="conv")
        t_bias = bias_variable("bias", [output_channels])
        t_linear = tf.add(t_conv, t_bias, name="linear")
        t_activation = tf.nn.tanh(t_linear, name="activation")
    return t_activation
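
With `padding="VALID"` and a stride of 1, the patch must fit entirely inside the input, so each spatial dimension shrinks by `patch - 1`. The following single-channel NumPy sketch illustrates this; `conv2d_valid` is a hypothetical helper written for this example and the all-ones image and kernel are made-up data:

```python
import numpy as np

def conv2d_valid(image, kernel):
    # Slide the kernel over every position where it fits entirely inside the
    # image (no padding), so the output shrinks by (patch - 1) per dimension.
    patch = kernel.shape[0]
    out_h = image.shape[0] - patch + 1
    out_w = image.shape[1] - patch + 1
    out = np.zeros((out_h, out_w), dtype=np.float32)
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + patch, j:j + patch] * kernel)
    return out

image = np.ones((32, 32), dtype=np.float32)  # illustrative 32x32 input
kernel = np.ones((5, 5), dtype=np.float32)   # illustrative 5x5 kernel
bias = 0.0

# Linear response plus bias, followed by the tanh activation.
activation = np.tanh(conv2d_valid(image, kernel) + bias)
print(activation.shape)
```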

Subsampling layers over 2x2 patches. We use max pooling here, whereas the subsampling layers in the paper average each patch:

In [None]:
def max_pool_2x2(name, x):
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding="SAME", name=name)
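
As an illustration of what 2x2 max pooling with stride 2 computes, here is a small NumPy sketch on a single-channel 4x4 input. It assumes even height and width, in which case `"SAME"` padding adds nothing; `max_pool_2x2_np` is a hypothetical helper, not a TensorFlow function:

```python
import numpy as np

def max_pool_2x2_np(x):
    # x has shape [height, width] with even height and width; group pixels
    # into non-overlapping 2x2 blocks and keep the maximum of each block.
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

x = np.arange(16, dtype=np.float32).reshape(4, 4)
pooled = max_pool_2x2_np(x)
print(pooled)
```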

Fully connected layers with optional $\tanh$ activation (the last layer returns raw logits and softmax is applied outside):

In [None]:
def fc(name, t_input, input_size, hidden_size, activation=True):
    with tf.variable_scope(name):
        t_weight = weight_variable("weight", [input_size, hidden_size])
        t_matmul = tf.matmul(t_input, t_weight, name="matmul")
        t_bias = bias_variable("bias", [hidden_size])
        t_linear = tf.add(t_matmul, t_bias, name="linear")
        if activation:
            return tf.nn.tanh(t_linear, name="activation")
        else:
            return t_linear
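
The same computation can be sketched in NumPy to make the shapes explicit. The random inputs and weights below are made-up values for illustration and do not follow the initialization described earlier:

```python
import numpy as np

rng = np.random.RandomState(0)
batch, input_size, hidden_size = 4, 400, 120

# Made-up inputs and weights, only to illustrate the shapes involved.
x = rng.uniform(-1.0, 1.0, size=(batch, input_size)).astype(np.float32)
weight = rng.uniform(-0.1, 0.1, size=(input_size, hidden_size)).astype(np.float32)
bias = np.zeros(hidden_size, dtype=np.float32)

linear = x.dot(weight) + bias  # shape [batch, hidden_size]
hidden = np.tanh(linear)       # every value lies in [-1, 1]
print(hidden.shape)
```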

Run this before building the network again; otherwise `tf.get_variable` raises an error because the variables already exist in the default graph:

In [None]:
tf.reset_default_graph()

Instantiate the layers according to the diagram in the paper:

In [None]:
conv1_channels = 6
conv1_patch = 5
conv2_channels = 16
conv2_patch = 5
# The feature maps after the second pooling layer are 5x5, which happens to equal conv2_patch
fc_input = conv2_channels * conv2_patch * conv2_patch
fc1_hidden = 120
fc2_hidden = 84

t_flat_images = tf.placeholder(tf.float32, shape=[None, image_size], name="flat_images")
t_labels = tf.placeholder(tf.float32, shape=[None, labels_size], name="labels")

t_square_images = tf.reshape(t_flat_images, [-1, image_height, image_width, image_channels], name="square_images")
t_padded_images = tf.pad(t_square_images, [[0, 0], [2, 2], [2, 2], [0, 0]], name="padded_images")

t_conv1 = conv2d("conv1", t_padded_images, conv1_patch, image_channels, conv1_channels)

t_sampling1 = max_pool_2x2("sampling1", t_conv1)

t_conv2 = conv2d("conv2", t_sampling1, conv2_patch, conv1_channels, conv2_channels)

t_sampling2 = max_pool_2x2("sampling2", t_conv2)

t_fc_input = tf.reshape(t_sampling2, [-1, fc_input], name="fc_input")

t_fc1 = fc("fc1", t_fc_input, fc_input, fc1_hidden)

t_fc2 = fc("fc2", t_fc1, fc1_hidden, fc2_hidden)

t_logits = fc("fc3", t_fc2, fc2_hidden, labels_size, activation=False)
t_predictions = tf.nn.softmax(t_logits, name="predictions")
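
To check that the layer sizes fit together, we can trace the spatial size of the feature maps through the network by hand. This is a plain Python sketch; `conv_valid` and `pool_same` are hypothetical helpers that only reproduce the output-size arithmetic of `"VALID"` convolution and `"SAME"` pooling:

```python
def conv_valid(size, patch):
    # "VALID" convolution with stride 1: the patch must fit entirely inside.
    return size - patch + 1

def pool_same(size, stride=2):
    # "SAME" pooling with stride 2: ceil(size / stride).
    return (size + stride - 1) // stride

size = 28                   # MNIST image height and width
size += 2 + 2               # tf.pad adds 2 pixels on each side -> 32
size = conv_valid(size, 5)  # conv1 -> 28
size = pool_same(size)      # sampling1 -> 14
size = conv_valid(size, 5)  # conv2 -> 10
size = pool_same(size)      # sampling2 -> 5

flat = 16 * size * size     # 16 channels of 5x5 feature maps -> 400
print(size, flat)
```

The result matches `fc_input`, the number of inputs to the first fully connected layer.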

## Train the network

We compute the loss as the cross entropy between the predicted class probabilities and the one-hot labels, and we minimize it with gradient descent. We also count the correct predictions, those where the digit with the highest predicted probability matches the label, to compute the accuracy later:

In [None]:
with tf.variable_scope("loss"):
    t_loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=t_logits, labels=t_labels))

learning_rate = 0.01

with tf.variable_scope("optimizer"):
    t_optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate).minimize(t_loss)

with tf.variable_scope("correct"):
    t_correct = tf.reduce_sum(
        tf.cast(
            tf.equal(tf.argmax(t_predictions, 1), tf.argmax(t_labels, 1)),
            tf.float32
        )
    )
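
The loss and the count of correct predictions can be reproduced in NumPy on a tiny batch. This is a sketch with made-up logits and labels for two examples and three classes, not output from the network:

```python
import numpy as np

def softmax(logits):
    # Subtract the row-wise maximum before exponentiating for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

# Made-up logits and one-hot labels for two examples and three classes.
logits = np.array([[2.0, 1.0, 0.1],
                   [0.5, 2.5, 0.3]])
labels = np.array([[1.0, 0.0, 0.0],
                   [0.0, 0.0, 1.0]])

probabilities = softmax(logits)

# Cross entropy per example: -sum(labels * log(probabilities)), then the batch mean.
loss = np.mean(-np.sum(labels * np.log(probabilities), axis=1))

# Number of examples whose most probable class matches the label.
correct = np.sum(np.argmax(probabilities, axis=1) == np.argmax(labels, axis=1))
print(loss, correct)
```

Only the first example is classified correctly here, so `correct` is 1 and the accuracy for this batch would be 0.5.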

We update the weights batch by batch with the gradient descent optimizer defined above. In test mode, we only collect the loss and accuracy metrics:

In [None]:
batch_size = 128
log_every_batches = 100

def run_batches(session, batches, train=True):
    epoch_loss = 0.0
    epoch_correct = 0.0
    
    iterations = batches.num_examples // batch_size

    for iteration in range(iterations):
        batch = batches.next_batch(batch_size)

        feed_dict = {t_flat_images: batch[0], t_labels: batch[1]}

        if train:
            session.run(t_optimizer, feed_dict=feed_dict)

        # Metrics are evaluated in a separate pass, after the weight update when training
        loss, correct = session.run([t_loss, t_correct], feed_dict=feed_dict)
        
        epoch_loss += loss
        epoch_correct += correct
        
        if train and iteration % log_every_batches == log_every_batches - 1:
            print("Train Batch: {:5d} Loss: {:.4f}".format(iteration + 1, loss))

    return epoch_loss / iterations, epoch_correct / (iterations * batch_size)
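
Because `iterations` uses integer division, each call to `run_batches` processes slightly fewer examples than the split contains, which is why the accuracy is divided by `iterations * batch_size` rather than by `num_examples`. A small sketch of that arithmetic, assuming split sizes of 55,000 training and 10,000 test examples (`covered` is a hypothetical helper):

```python
batch_size = 128

def covered(num_examples, batch_size):
    # Number of full batches per call and the number of examples they cover.
    iterations = num_examples // batch_size
    return iterations, iterations * batch_size

# Split sizes assumed here: 55,000 training and 10,000 test examples.
for name, num_examples in [("train", 55000), ("test", 10000)]:
    iterations, seen = covered(num_examples, batch_size)
    print("{:5s} {:4d} batches cover {:5d} of {:5d} examples".format(
        name, iterations, seen, num_examples))
```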

Finally, we run the training session:

In [None]:
epochs = 10

with tf.Session() as session:
    session.run(tf.global_variables_initializer())
    
    for epoch_id in range(epochs):
        train_loss, train_accuracy = run_batches(session, mnist.train, train=True)
        test_loss, test_accuracy = run_batches(session, mnist.test, train=False)

        print("Epoch: {:5d} Train Loss: {:.4f} Test Loss: {:.4f} Train Accuracy: {:.4f} Test Accuracy: {:.4f}".format(
            epoch_id + 1, train_loss, test_loss, train_accuracy, test_accuracy))