# ID2223 – Lab 2:  Deep Learning with TensorFlow

For this lab we will use a dataset called Fashion-MNIST. The original MNIST dataset (Mixed National Institute of Standards and Technology database) is a database of handwritten digits that is commonly used for evaluating image classification algorithms. You can read more about the dataset in Yann LeCun’s MNIST page or Chris Olah’s visualizations of MNIST. MNIST is considered the HelloWorld dataset for Deep Learning, however, for this lab, MNIST is too easy to get very high accuracy on.

As such, we will use Fashion-MNIST - a drop-in replacement for the original MNIST, released by Zalando. It contains images of various articles of clothing and accessories: shirts, bags, shoes, and other fashion items. The Fashion MNIST training set contains 55,000 examples, and the test set contains 10,000 examples. Each example is a 28x28 grayscale image (just like the images in the original MNIST), associated with a label from 10 classes (t-shirts, trousers, pullovers, dresses, coats, sandals, shirts, sneakers, bags, and ankle boots).

This lab has the following goals:

- Learn how to setup and run a computational graph in Tensorflow
- Implement a single-layer as well as a multi-layer Neural Network in Tensorflow
- Combine different activation functions to increase the accuracy
- Tackle overfitting using regularization
- Further improve the performance by using Convolutional Layers
- Use hyperparameter optimization to improve prediction accuracy

In [1]:
# create data/fashion folder, if it doesn't exist
import os
if not os.path.exists("data/fashion"): os.makedirs("data/fashion")

In [2]:
# all tensorflow api is accessible through this
import tensorflow as tf        
# to visualize the resutls
import matplotlib.pyplot as plt 
# 70k mnist dataset that comes with the tensorflow container
from tensorflow.examples.tutorials.mnist import input_data
# numpy library
import numpy as np
# random library
from random import randint
# datetime library
from datetime import datetime
%matplotlib inline
tf.set_random_seed(0)

# Improving Predictions with Hyperparameter Optimization

There are now many different hyperparameters in your deep neural network, from:
- the number of layers 
- the learning rate, 
- choice of optimizer, 
- values for Dropout. 

In this task, you will train your network with many different combinations of hyper-parameters, with the goal of improving predictions on the test dataset.

If you are running Docker/Python, you are free to use whatever methods you prefer.
You could, for example, implement your own random search or gridsearch method or just pick a number of experiments, and use a bash script to launch different experiments with
different combinations of hyperparameters. One framework you could use is hyperopt.

In [3]:
def mnist_fashion_model(training_iter, ad_learning_rate, dropout, decay_rate, decay_steps):

    # load data
    mnist = input_data.read_data_sets('data/fashion', one_hot=True)

    # 1. Define Variables and Placeholders
    num_pixels = 28
    num_inputs = num_pixels*num_pixels
    num_outputs = 10
    epoch_size = 100

    num_hidd_1 = 200
    num_hidd_2 = 100
    num_hidd_3 = 60
    num_hidd_4 = 30

    X = tf.placeholder(tf.float32, [None, num_pixels, num_pixels, 1], name="X")
    Y_ = tf.placeholder(tf.float32,[None, num_outputs], name="Y_") # correct answers(labels)
    XX = tf.reshape(X, [-1, num_inputs]) # flatten the images into a single line of pixels
    keep_prob = tf.placeholder(tf.float32) #prob.of keeping a node during dropout: 1.0 at testing (no dropout) and 0.75 at training
    global_step = tf.Variable(0, trainable=False) # decayed_learning_rate = learning_rate * decay_rate ^ (global_step / decay_steps)

    # Weights & bias initialization
    # One should generally initialize weights with a small amount of noise for symmetry breaking, and to prevent 0 gradients.
    # Since we're using ReLU neurons, it is also good practice to initialize them with a slightly positive initial bias 
    # to avoid "dead neurons"
    def weight_variable(shape):
      initial = tf.truncated_normal(shape, stddev=0.1)
      return tf.Variable(initial)

    def bias_variable(shape):
      initial = tf.constant(0.1, shape=shape)
      return tf.Variable(initial)

    # Convolution
    def conv2d(X, W, strides):
      return tf.nn.conv2d(X, W, strides=strides, padding='SAME')

    # 2. Define the model:

    # To apply the layer, we first reshape x to a 4d tensor, with the second and third dimensions corresponding to 
    # image width and height, and the final dimension corresponding to the number of color channels.
    x_image = tf.reshape(XX, [-1, num_pixels, num_pixels, 1])

    # Conv.Layer 1: patch of 5x5, 1 input channel, 4 output channels, stride of [1,1,1,1]
    W_conv1 = weight_variable([5, 5, 1, 4])
    B_conv1 = bias_variable([4])
    H_conv1 = tf.nn.relu(conv2d(x_image, W_conv1, [1, 1, 1, 1]) + B_conv1)

    # Conv.Layer 2: patch of 5x5, 4 input channel, 8 output channels, stride of [1,2,2,1]
    W_conv2 = weight_variable([5, 5, 4, 8])
    B_conv2 = weight_variable([8])
    H_conv2 = tf.nn.relu(conv2d(H_conv1, W_conv2, [1, 2, 2, 1]) + B_conv2)
    # NOTE:the image size has been reduced to 14x14 (because of the used stride)

    # Conv.Layer 3: patch of 4x4, 8 input channel, 12 output channels, stride of [1,2,2,1]
    W_conv3 = weight_variable([4, 4, 8, 12])
    B_conv3 = weight_variable([12])
    H_conv3 = tf.nn.relu(conv2d(H_conv2, W_conv3, [1, 2, 2, 1]) + B_conv3)
    # NOTE:the image size has been reduced to 7x7 (because of the used stride)

    # Fully connected layer (ReLU): input = vector of 12*7*7 elements, output = vector of 200 elements
    # NOTE: You will need to reshape the tensor from having height X width X depth in matrix form 
    # to having a vector of height * width * depth elements.
    W_fc1 = weight_variable([7 * 7 * 12, 200])
    B_fc1 = weight_variable([200])
    H_fc1 = tf.nn.relu(tf.matmul(tf.reshape(H_conv3,[-1, 7 * 7 * 12]),W_fc1) + B_fc1)
    D_fc1 = tf.nn.dropout(H_fc1, keep_prob)

    # Readout layer (Softmax): input = vector of 200 elements, output = vector of 10 elements
    W_fc2 = weight_variable([200, num_outputs])
    B_fc2 = bias_variable([num_outputs])

    #YLogits: values to be used as input to softmax
    YL = tf.matmul(D_fc1, W_fc2) + B_fc2

    # 3. Define the loss function
    cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=Y_, logits=YL, name="loss"))

    # 4. Define the accuracy
    correct = tf.equal(tf.argmax(YL,1), tf.argmax(Y_,1)) #tf.nn.in_top_k(Y,Y_,1)
    accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))

    # 5.2 Train with the AdamOptimizer (a slightly better optimizer) and a starter learning rate of 0.005
    learning_rate = tf.train.exponential_decay(ad_learning_rate, global_step, decay_steps, decay_rate, staircase=True)
    train_step = tf.train.AdamOptimizer(learning_rate).minimize(cross_entropy,global_step=global_step)
    
    # initialize
    init = tf.global_variables_initializer()
    sess = tf.Session()
    sess.run(init)
    
    # training step function    
    def training_step(i, update_test_data, update_train_data):
        
        # actual learning
        batch_X, batch_Y = mnist.train.next_batch(100)
        sess.run(train_step, feed_dict={XX: batch_X, Y_: batch_Y, keep_prob: dropout})

        tra = []
        trc = []
        tta = []
        ttc = []

        if update_train_data:
            a, c = sess.run([accuracy, cross_entropy], feed_dict={XX: batch_X, Y_: batch_Y, keep_prob: dropout})
            tra.append(a)
            trc.append(c)

        if update_test_data:
            a, c = sess.run([accuracy, cross_entropy], feed_dict={XX: mnist.test.images, Y_: mnist.test.labels, keep_prob: dropout})
            tta.append(a)
            ttc.append(c)

        return (tra,trc,tta,ttc)

    # 6. Train and test the model, store the accuracy and loss per iteration
    train_a = []
    train_c = []
    test_a = []
    test_c = []

    for i in range(training_iter):
        #When you compute accuracy and loss you run the whole training/testing dataset
        #through your current model and this is quite expensive. Doing so every iteration would
        #make it impractical, so you will compute it every 100 rounds, which we can call epochs.    
        test = False
        if i % epoch_size == 0: 
            test = True
        a, c, ta, tc = training_step(i, test, test)
        train_a += a
        train_c += c
        test_a += ta
        test_c += tc
        
    sess.close()
        
    return test_a

In [4]:
def results(begda, experiment, iterations, learning_r, dropout_rate, decay_rate, decay_steps, accur):

    print("#################### Experiment No."+str(experiment))
    print("Time: "+str(datetime.now()-begda))
    print("No.iterations:"+str(iterations))
    print("Learning rate:"+str(learning_r))
    print("Dropout rate:"+str(dropout_rate))
    print("Decay rate:"+str(decay_rate))
    print("Decay steps:"+str(decay_steps))
    print("Test accuracy:"+str(accur[-1]))
    print("Max.test accuracy:"+str(max(accur)))
    print('\n')

In [5]:
# Hyperparameter optimization experiment
num_iterations = [10000]
learning_rates = [0.001, 0.005, 0.01]
dropout_rates = [0.1, 0.25, 0.5]
decay_steps = [100, 1000]
decay_rates = [0.96, 0.90]
experiment = 0

for i in num_iterations:
    for lr in learning_rates:
        for kr in dropout_rates:
            for ds in decay_steps:
                for dr in decay_rates:
                    experiment +=1
                    begda = datetime.now()
                    accur = mnist_fashion_model(i, lr, kr, dr, ds)
                    results(begda, experiment, i, lr, kr, dr, ds, accur)

Extracting data/fashion\train-images-idx3-ubyte.gz
Extracting data/fashion\train-labels-idx1-ubyte.gz
Extracting data/fashion\t10k-images-idx3-ubyte.gz
Extracting data/fashion\t10k-labels-idx1-ubyte.gz
#################### Experiment No.1
Time: 0:08:27.853597
No.iterations:10000
Learning rate:0.001
Dropout rate:0.1
Decay rate:0.96
Decay steps:100
Test accuracy:0.7905
Max.test accuracy:0.7993


Extracting data/fashion\train-images-idx3-ubyte.gz
Extracting data/fashion\train-labels-idx1-ubyte.gz
Extracting data/fashion\t10k-images-idx3-ubyte.gz
Extracting data/fashion\t10k-labels-idx1-ubyte.gz
#################### Experiment No.2
Time: 0:08:31.228494
No.iterations:10000
Learning rate:0.001
Dropout rate:0.1
Decay rate:0.9
Decay steps:100
Test accuracy:0.7419
Max.test accuracy:0.7476


Extracting data/fashion\train-images-idx3-ubyte.gz
Extracting data/fashion\train-labels-idx1-ubyte.gz
Extracting data/fashion\t10k-images-idx3-ubyte.gz
Extracting data/fashion\t10k-labels-idx1-ubyte.gz
#####

Extracting data/fashion\train-labels-idx1-ubyte.gz
Extracting data/fashion\t10k-images-idx3-ubyte.gz
Extracting data/fashion\t10k-labels-idx1-ubyte.gz
#################### Experiment No.22
Time: 0:09:33.316185
No.iterations:10000
Learning rate:0.005
Dropout rate:0.5
Decay rate:0.9
Decay steps:100
Test accuracy:0.8867
Max.test accuracy:0.8905


Extracting data/fashion\train-images-idx3-ubyte.gz
Extracting data/fashion\train-labels-idx1-ubyte.gz
Extracting data/fashion\t10k-images-idx3-ubyte.gz
Extracting data/fashion\t10k-labels-idx1-ubyte.gz
#################### Experiment No.23
Time: 0:09:37.518976
No.iterations:10000
Learning rate:0.005
Dropout rate:0.5
Decay rate:0.96
Decay steps:1000
Test accuracy:0.8896
Max.test accuracy:0.8936


Extracting data/fashion\train-images-idx3-ubyte.gz
Extracting data/fashion\train-labels-idx1-ubyte.gz
Extracting data/fashion\t10k-images-idx3-ubyte.gz
Extracting data/fashion\t10k-labels-idx1-ubyte.gz
#################### Experiment No.24
Time: 0:09:32.7