# DLW Practical 2
# Regularisation with MNIST

**Introduction**

In this practical, we will add regularisation to our non-linear model from the previous practical. This is an important concept to understand, as many machine learning techniques suffer from over-fitting. A popular regularisation technique is to place an L2 prior over the model parameters, which amounts to adding a penalty on the squared magnitude of the parameters to our loss function. This encourages the model parameters to take on smaller values. A more recent technique is dropout, which randomly drops nodes in certain layers during training and so prevents them from co-adapting. Dropout can also be viewed as an approximate form of model averaging.
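To make the penalty term concrete, here is a minimal NumPy sketch rather than the practical's TensorFlow code. The helper names `l2_penalty` and `regularised_loss`, the toy weights and the data-loss value are all assumptions made for illustration. The penalty follows the same convention as tf.nn.l2_loss(), namely half the sum of the squared entries.

```python
import numpy as np

def l2_penalty(weights):
    """Half the sum of squared entries, the convention used by tf.nn.l2_loss()."""
    return 0.5 * sum(float(np.sum(w ** 2)) for w in weights)

def regularised_loss(data_loss, weights, coeff):
    """Data loss plus the scaled L2 penalty over all weight arrays."""
    return data_loss + coeff * l2_penalty(weights)

# Toy weights (hypothetical values, chosen so the arithmetic is easy to follow).
W1 = np.array([[1.0, -2.0], [0.0, 3.0]])   # squared entries sum to 14
W2 = np.array([2.0, 2.0])                  # squared entries sum to 8

penalty = l2_penalty([W1, W2])             # 0.5 * (14 + 8)
total = regularised_loss(data_loss=0.5, weights=[W1, W2], coeff=0.01)
print(penalty, total)

# The gradient of the scaled penalty with respect to a weight w is coeff * w.
# On top of the data-loss update, each gradient step therefore shrinks every
# weight by a factor (1 - learning_rate * coeff).
```

A larger coefficient pulls the weights towards zero more strongly, trading training fit for a simpler model.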

**Learning objectives:**

* To get familiar with L2 regularisation by adding it to the non-linear classifier code.
* To understand dropout regularisation by adding it to the non-linear classifier code.
* To understand why regularisation is important by observing the difference between the training set loss and the validation set loss.

**What is expected of you:**

* Implement L2 regularisation: apply tf.nn.l2_loss() to each of the model's weights, scale the result by a regularisation coefficient, and add it to the loss function. Read the TensorFlow documentation if you get stuck.
* Remove the L2 regularisation and then implement dropout regularisation using tf.nn.dropout() with a keep_prob coefficient. Be careful what you do at test time! Read the TensorFlow documentation if you get stuck.
* What do you observe about the training set loss and the validation set loss for each of these methods?
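The warning about test time can be illustrated with a small NumPy sketch of inverted dropout. This is a stand-in written for illustration, not the TensorFlow implementation: the function `dropout`, the array `h` and the seed are assumptions. Like tf.nn.dropout() with a keep_prob argument, it scales the surviving units by 1 / keep_prob so that the expected activation stays the same.

```python
import numpy as np

def dropout(h, keep_prob, rng):
    """Inverted dropout: zero each unit with probability 1 - keep_prob and
    scale the survivors by 1 / keep_prob so the expected activation is unchanged."""
    if keep_prob >= 1.0:
        return h  # nothing is dropped; this is the setting to use at test time
    mask = rng.random(h.shape) < keep_prob
    return h * mask / keep_prob

rng = np.random.default_rng(0)
h = np.ones((1000, 100))                       # stand-in hidden activations

h_train = dropout(h, keep_prob=0.5, rng=rng)   # training: roughly half the units are zeroed
h_test = dropout(h, keep_prob=1.0, rng=rng)    # evaluation: activations pass through unchanged

print(h_train.mean())          # close to 1.0, because survivors are scaled up
print((h_train == 0).mean())   # close to 0.5, the fraction of dropped units
```

In the TensorFlow graph, one common way to get the same behaviour is to make keep_prob a placeholder, feeding a value below 1 during training and 1.0 when evaluating on the validation and test sets.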

In [3]:
%matplotlib inline

import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
from tensorflow.examples.tutorials.mnist import input_data

In [4]:
def display_mnist_images(gens, num_images):
    plt.rcParams['image.interpolation'] = 'nearest'
    plt.rcParams['image.cmap'] = 'gray'
    fig, axs = plt.subplots(1, num_images, figsize=(25, 3))
    for i in range(num_images):
        reshaped_img = (gens[i].reshape(28, 28) * 255).astype(np.uint8)
        axs.flat[i].imshow(reshaped_img)
    plt.show()

In [None]:
# download MNIST dataset #
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

# sample 10 random MNIST images for visualisation #
batch_xs, batch_ys = mnist.train.next_batch(10)
list_of_images = np.split(batch_xs, 10)
# uncomment the next line to display the sampled images #
# display_mnist_images(list_of_images, 10)

x_dim = mnist.train.images.shape[1]
train_examples = mnist.train.num_examples
n_classes = mnist.train.labels.shape[1]

######################################
# define the model (build the graph) #
######################################

n_hidden = 100
# inputs: flattened 28x28 images #
x = tf.placeholder(tf.float32, [None, x_dim])
# hidden layer: affine transform followed by a ReLU #
Wx = tf.Variable(tf.random_normal([x_dim, n_hidden]))
bx = tf.Variable(tf.ones([n_hidden]))
h = tf.nn.relu(tf.add(tf.matmul(x, Wx), bx))
# output layer: y_ holds the logits, prob the class probabilities #
Wh = tf.Variable(tf.random_normal([n_hidden, n_classes]))
bh = tf.Variable(tf.ones([n_classes]))
y_ = tf.add(tf.matmul(h, Wh), bh)
prob = tf.nn.softmax(y_)
# targets: one-hot encoded labels #
y = tf.placeholder(tf.float32, [None, n_classes])

########################
# define loss function #
########################

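regularisation_coeff = 0.01

# data term: mean softmax cross-entropy over the batch #
data_loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(logits=y_, labels=y))
# L2 penalty; the biases are penalised here too, although penalising only Wx and Wh is more common #
l2_penalty = tf.nn.l2_loss(Wx) + tf.nn.l2_loss(bx) + tf.nn.l2_loss(Wh) + tf.nn.l2_loss(bh)
# despite its name, cross_entropy_loss is the total loss, including the penalty #
cross_entropy_loss = data_loss + regularisation_coeff * l2_penalty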

learning_rate = 0.01

train_step = tf.train.GradientDescentOptimizer(learning_rate).minimize(cross_entropy_loss)

###########################
# define model evaluation #
###########################

actual_class, predicted_class = tf.argmax(y, 1), tf.argmax(prob, 1)
correct_prediction = tf.cast(tf.equal(predicted_class, actual_class), tf.float32)
classification_accuracy = tf.reduce_mean(correct_prediction)

#########################
# define training cycle #
#########################

num_epochs = 100
batch_size = 20

# initializing the variables before starting the session #
init = tf.global_variables_initializer()

# launch the graph in a session (use the session as a context manager) #
with tf.Session() as sess:
    # run session #
    sess.run(init)
    # start main training cycle #
    for epoch in range(num_epochs):
        avg_cost = 0.
        total_batch = int(mnist.train.num_examples / batch_size)
        # loop over all batches #
        for i in range(total_batch):
            batch_x, batch_y = mnist.train.next_batch(batch_size)
            # run optimization op (backprop) and fetch the batch loss #
            _, c = sess.run([train_step, cross_entropy_loss], feed_dict={x: batch_x, y: batch_y})
            # accumulate the average training loss over the epoch #
            avg_cost += c / total_batch
        # display logs per epoch step #
        if epoch % 10 == 0:
            cost_eval = cross_entropy_loss.eval(feed_dict={x: mnist.validation.images, y: mnist.validation.labels})
            print("Epoch {}:\ntraining-cross-entropy-loss = {:.4f}\nvalidation-cross-entropy-loss = {:.4f}\n".format(epoch + 1, avg_cost, cost_eval))
    print("Optimization Finished!")
    # calculate test set accuracy #
    test_accuracy = classification_accuracy.eval({x: mnist.test.images, y: mnist.test.labels})
    print("Accuracy on test set = {:.3f}%".format(test_accuracy * 100))

Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
Epoch 1:
training-cross-entropy-loss = 311.9863
validation-cross-entropy-loss = 230.6362
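
The losses logged above are far larger than a typical cross-entropy value, and a back-of-the-envelope check suggests why. The sketch below estimates the size of the L2 penalty alone, using the shapes and hyperparameters from the code above. It relies on two assumptions: the training split holds 55000 examples (the loader's default), and the data-term gradient is ignored, so that each step only shrinks the parameters.

```python
# Shapes taken from the model above: 784 inputs (28 x 28 pixels), 100 hidden units, 10 classes.
x_dim, n_hidden, n_classes = 784, 100, 10
regularisation_coeff, learning_rate = 0.01, 0.01
batch_size, train_examples = 20, 55000   # assumption: default training split of the loader

# tf.nn.l2_loss(t) is sum(t ** 2) / 2. A unit-normal weight has an expected square of 1,
# and each bias starts at exactly 1.
n_weights = x_dim * n_hidden + n_hidden * n_classes
n_biases = n_hidden + n_classes
initial_penalty = regularisation_coeff * (n_weights + n_biases) / 2

# Assumption: ignoring the data-term gradient, each SGD step multiplies every
# parameter by (1 - learning_rate * regularisation_coeff).
steps_per_epoch = train_examples // batch_size
shrink = (1 - learning_rate * regularisation_coeff) ** steps_per_epoch
penalty_after_one_epoch = initial_penalty * shrink ** 2

print(round(initial_penalty, 2), round(penalty_after_one_epoch, 2))
```

Under these assumptions the penalty starts near 398 and falls to about 229 after one epoch. That is in line with the logged values: 311.99 averaged over the first epoch and 230.64 on the validation set at its end. This suggests that the logged losses are dominated by the L2 penalty rather than by the classification error, so it may help to report the unregularised cross-entropy separately when comparing the training and validation sets.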

