# SAIG Introduction to Deep Learning Workshop - Solutions

Today, we'll be exploring deep neural networks and applying them to the MNIST dataset.

By Swetha Revanur.

## Setting up the Environment

Run any code cell below by selecting it and pressing `Shift + Enter`. Start by installing the requirements and importing the libraries below.

In [1]:
!pip3 install -r requirements.txt





In [2]:
%%bash
export PYTHONHASHSEED=0

In [3]:
from __future__ import print_function
import numpy as np
import pandas as pd
import math
import matplotlib.pyplot as plt
import random
import ssl
ssl._create_default_https_context = ssl._create_unverified_context
import os
os.environ['KERAS_BACKEND'] = 'theano'

# fix random seed for reproducibility
np.random.seed(1337)

import keras
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, BatchNormalization
from keras import regularizers
from keras import backend as K

Using Theano backend.


## Read in Data

In [4]:
# load data from MNIST (70000 28x28 images total)
(x_train, y_train), (x_test, y_test) = mnist.load_data()
num_train = x_train.shape[0]
num_test = x_test.shape[0]

print(num_train, 'records in train set')
print(num_test, 'records in test set')
print(num_train + num_test, 'records in total\n')

print(x_train.shape)
print(x_test.shape)

60000 records in train set
10000 records in test set
70000 records in total

(60000, 28, 28)
(10000, 28, 28)


In [5]:
dim = x_train.shape[1] # dimension of square images = 28

# flatten so we instead have 70000 28^2=784-dimensional vectors (one per image)
x_train = x_train.reshape((x_train.shape[0], dim*dim))
x_test = x_test.reshape((x_test.shape[0], dim*dim))
print(x_train.shape)
print(x_test.shape)

(60000, 784)
(10000, 784)
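
To make the reshape above concrete, here is a minimal sketch on a made-up toy array instead of MNIST. The name `toy` and its 2x2 "images" are assumptions for illustration only; the reshape call mirrors the one used in the cell above.

```python
import numpy as np

# Hypothetical toy data: 3 "images" of 2x2 pixels (not MNIST).
toy = np.arange(12).reshape((3, 2, 2))

# Flatten each 2x2 image into a 4-dimensional row vector, the same way
# the cell above turns 28x28 images into 784-dimensional vectors.
toy_flat = toy.reshape((toy.shape[0], 2 * 2))

print(toy_flat.shape)  # (3, 4)
print(toy_flat[1])     # [4 5 6 7]
```

Each row of `toy_flat` holds one image's pixels read row by row, so no pixel values are lost, only the 2D layout.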


In [6]:
num_classes = len(np.unique(y_train)) # 10 unique digits (0-9)
# convert class vectors to binary class matrices
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)
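
As a rough illustration of what a "binary class matrix" looks like, the sketch below one-hot encodes a few labels with plain NumPy. The helper `one_hot` is hypothetical and only approximates the idea behind `keras.utils.to_categorical`; it is not the library's implementation.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Return a (len(labels), num_classes) matrix with a single 1 per row."""
    labels = np.asarray(labels, dtype=int)
    encoded = np.zeros((labels.shape[0], num_classes), dtype=np.float32)
    # Row i gets a 1 in the column given by labels[i].
    encoded[np.arange(labels.shape[0]), labels] = 1.0
    return encoded

# Made-up labels for a 4-class problem.
example = one_hot([3, 0, 2], num_classes=4)
print(example)
```

With 10 digit classes, each MNIST label becomes a length-10 row in the same way, which is the target format that categorical cross-entropy expects.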

## Jupyter Exercise 1: Create a Baseline Model

**Your task:** At this point, we're ready to create a basic neural network classifier using Keras. Your model should have 6 layers, of sizes 10, 100, 500, 100, 50, and `num_classes`. All layers except the last should use a ReLU activation function; the last layer should use softmax. Check out this [guide](https://keras.io/getting-started/sequential-model-guide/) to get started.
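
For reference, the two activation functions named in the task can be sketched in NumPy as follows. The function names and the example `scores` array are assumptions for illustration; in the exercise itself Keras applies these for you through the `activation` argument.

```python
import numpy as np

def relu(z):
    """Element-wise max(z, 0): negative inputs become 0."""
    return np.maximum(z, 0.0)

def softmax(z):
    """Turn each row of scores into probabilities that sum to 1."""
    # Subtracting the row maximum avoids overflow in exp without
    # changing the result.
    shifted = z - np.max(z, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)

# Made-up scores for one sample and three classes.
scores = np.array([[-1.0, 0.0, 2.0]])
print(relu(scores))  # [[0. 0. 2.]]

probs = softmax(scores)
print(probs)               # largest probability goes to the largest score
print(probs.sum(axis=-1))  # each row sums to 1 (up to rounding)
```

Softmax is used on the last layer because its outputs can be read as one probability per digit class.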

In [12]:
batch_size = 256
epochs = 15
learning_rate = 5e-3

#########################################################
# Initializes baseline neural network.
# ReLU and Softmax activations. Cross-entropy loss.
#########################################################
def baseline_classifier(learning_rate):
    # create model
    model_baseline = Sequential()

    # YOUR CODE HERE:
    model_baseline.add(Dense(10, input_dim = x_train.shape[1], activation = 'relu')) 
    model_baseline.add(Dense(100, activation = 'relu')) 
    model_baseline.add(Dense(500, activation = 'relu')) 
    model_baseline.add(Dense(100, activation = 'relu')) 
    model_baseline.add(Dense(50, activation = 'relu')) 
    model_baseline.add(Dense(num_classes, activation = 'softmax'))
    # END CODE

    # compile model
    sgd = keras.optimizers.SGD(lr = learning_rate)
    model_baseline.compile(loss = keras.losses.categorical_crossentropy, optimizer = sgd, metrics=['accuracy'])
    
    return model_baseline
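
The model above is compiled with categorical cross-entropy. As a minimal sketch of what that loss measures, assuming one-hot targets and predicted probabilities, it can be written in NumPy like this. The helper and the example arrays are hypothetical, and the clipping constant `eps` is an assumption rather than the value Keras uses.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean over samples of -sum(y_true * log(y_pred))."""
    # Clip to avoid log(0); eps is an illustrative choice.
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    per_sample = -np.sum(y_true * np.log(y_pred), axis=-1)
    return float(np.mean(per_sample))

# Made-up one-hot targets for two samples and three classes.
y_true = np.array([[0.0, 0.0, 1.0],
                   [1.0, 0.0, 0.0]])

# Predictions that put most probability on the correct class...
confident = np.array([[0.05, 0.05, 0.90],
                      [0.80, 0.10, 0.10]])
# ...versus predictions that guess uniformly.
uniform = np.full((2, 3), 1.0 / 3.0)

loss_confident = categorical_crossentropy(y_true, confident)
loss_uniform = categorical_crossentropy(y_true, uniform)
print(loss_confident)
print(loss_uniform)  # a uniform guess over 3 classes gives ln(3)
```

By the same reasoning, a uniform guess over the 10 digit classes would give a loss of ln(10), roughly 2.3, which is a useful reference point when reading training logs.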

## Jupyter Exercise 2: Train and Evaluate the Baseline Model

**Your task:** Use the `eval()` function below to train and evaluate our baseline model. The return value of `eval()` is a list of two values, the loss and the accuracy. Print both of these. In `eval()`, feel free to change the value of the `verbose` parameter. When `verbose = 0`, no information is printed. When it's 1, a progress bar is shown for each epoch, and when it's 2, one summary line is printed per epoch. Your test accuracy for this basic model should be around 92%.

Note that one parameter of `model.fit()` is `validation_split`. This takes a float between 0 and 1, representing the fraction of the training set to hold out for validation. Why do we need a validation set? More than training performance, we are interested in how our model does on unseen data. We could split the data into only training and test sets, but if we then tune our model using results from the test set, the test set can no longer be considered unseen data. As a workaround, we split our dataset into train, validation, and test sets. This way, we can tune on the validation set and only touch the test set at the very end. For the purposes of this workshop, we have ignored the validation set.
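
To illustrate the idea, here is a small NumPy sketch of holding out a fraction of the training data. The helper `split_validation` and the demo arrays are hypothetical. It takes the held-out samples from the end of the arrays, which matches how the Keras documentation describes `validation_split` (the last samples, taken before any shuffling), but it is a sketch rather than the library's code.

```python
import numpy as np

def split_validation(x, y, validation_split):
    """Hold out the last `validation_split` fraction of the samples."""
    split_at = int(x.shape[0] * (1.0 - validation_split))
    train = (x[:split_at], y[:split_at])
    val = (x[split_at:], y[split_at:])
    return train, val

# Made-up data: 8 samples with 2 features each.
x_demo = np.arange(16).reshape((8, 2))
y_demo = np.arange(8)

(x_tr, y_tr), (x_val, y_val) = split_validation(x_demo, y_demo, 0.25)
print(x_tr.shape[0], 'training samples')    # 6 training samples
print(x_val.shape[0], 'validation samples') # 2 validation samples
```

Because the held-out samples come from the end, data that is sorted by class should be shuffled before such a split, otherwise the validation set may contain only a few classes.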

In [13]:
#########################################################
# Trains and evaluates given model. Returns loss and 
# accuracy.
#########################################################
def eval(model, verb = 2):
    # fit the model
    model.fit(x_train, y_train, 
              epochs = epochs, 
              batch_size = batch_size, 
              verbose = verb,
              shuffle = False)
    
    # evaluate the model
    scores = model.evaluate(x_test, y_test)
    
    return scores

model_baseline = baseline_classifier(learning_rate)

# YOUR CODE HERE:
loss, acc = eval(model_baseline)
print('\n\nTest loss:', loss)
print('Test accuracy:', acc)
# END CODE

Epoch 1/15
 - 1s - loss: 1.7107 - acc: 0.4336
Epoch 2/15
 - 1s - loss: 0.7817 - acc: 0.7421
Epoch 3/15
 - 1s - loss: 0.5179 - acc: 0.8379
Epoch 4/15
 - 1s - loss: 0.4323 - acc: 0.8650
Epoch 5/15
 - 1s - loss: 0.3814 - acc: 0.8817
Epoch 6/15
 - 1s - loss: 0.3473 - acc: 0.8928
Epoch 7/15
 - 1s - loss: 0.3202 - acc: 0.9017
Epoch 8/15
 - 1s - loss: 0.2990 - acc: 0.9084
Epoch 9/15
 - 1s - loss: 0.2828 - acc: 0.9138
Epoch 10/15
 - 1s - loss: 0.2688 - acc: 0.9185
Epoch 11/15
 - 1s - loss: 0.2567 - acc: 0.9208
Epoch 12/15
 - 1s - loss: 0.2463 - acc: 0.9245
Epoch 13/15
 - 1s - loss: 0.2379 - acc: 0.9264
Epoch 14/15
 - 1s - loss: 0.2298 - acc: 0.9289
Epoch 15/15
 - 1s - loss: 0.2229 - acc: 0.9307


Test loss: 0.2686284477956593
Test accuracy: 0.9187


## Jupyter Exercise 3: Introduce Regularization

Sometimes our training accuracy is consistently higher than our test accuracy. We might also see the training loss continue to decrease while the validation loss hits a minimum and then increases. Both of these are indicators of overfitting and poor model generalization. Regularization is one way to address this: it reduces overfitting by adding a penalty to the loss function. With this penalty, large weights become costly, so the model is discouraged from learning interdependent sets of feature weights.
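
As a minimal sketch of the penalty for the L2 case, the snippet below adds `reg_strength` times the sum of squared weights to a data loss. The helper name, the weight matrix, and the data loss value are made up for illustration; the formula follows the Keras documentation for `regularizers.l2`.

```python
import numpy as np

def l2_penalty(weights, reg_strength):
    """L2 penalty: reg_strength * sum of squared weights."""
    return reg_strength * float(np.sum(np.square(weights)))

# Hypothetical weights of one small layer and a made-up data loss.
weights = np.array([[0.5, -1.0],
                    [2.0,  0.0]])
data_loss = 0.3

penalty = l2_penalty(weights, reg_strength=0.1)
total_loss = data_loss + penalty
print(penalty)     # 0.1 * (0.25 + 1.0 + 4.0), about 0.525
print(total_loss)  # about 0.825
```

Since the penalty is part of the loss that training reports, a regularized model can show a much higher training loss than an unregularized one, especially early on, without being less accurate.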

**Your task:** Add L2 regularization to the largest hidden layer (the 500-unit layer) of our model. You may want to refer to the [Keras documentation about regularizers](https://keras.io/regularizers/).

In [18]:
reg_strength = 0.1

#########################################################
# Initializes neural network with L2 regularization.
#########################################################
def regularized_classifier(learning_rate, reg_strength):
    # create model
    model_regularized = Sequential()

    # YOUR CODE HERE:
    model_regularized.add(Dense(10, input_dim = dim*dim, activation = 'relu')) 
    model_regularized.add(Dense(100, activation = 'relu')) 
    model_regularized.add(Dense(500, activation = 'relu', kernel_regularizer = regularizers.l2(reg_strength))) 
    model_regularized.add(Dense(100, activation = 'relu'))
    model_regularized.add(Dense(50, activation = 'relu'))
    model_regularized.add(Dense(num_classes, activation = 'softmax'))
    # END CODE
    
    # compile model
    sgd = keras.optimizers.SGD(lr = learning_rate)
    model_regularized.compile(loss = keras.losses.categorical_crossentropy, 
                  optimizer = sgd, metrics=['accuracy'])
    
    return model_regularized

model_regularized = regularized_classifier(learning_rate, reg_strength)

loss, acc = eval(model_regularized)
print('\n\nTest loss:', loss)
print('Test accuracy:', acc)

Epoch 1/15
 - 1s - loss: 14.6631 - acc: 0.5538
Epoch 2/15
 - 1s - loss: 8.8709 - acc: 0.8135
Epoch 3/15
 - 1s - loss: 5.6031 - acc: 0.8693
Epoch 4/15
 - 1s - loss: 3.5970 - acc: 0.8914
Epoch 5/15
 - 1s - loss: 2.3551 - acc: 0.9007
Epoch 6/15
 - 1s - loss: 1.5806 - acc: 0.9079
Epoch 7/15
 - 1s - loss: 1.0950 - acc: 0.9132
Epoch 8/15
 - 1s - loss: 0.7905 - acc: 0.9175
Epoch 9/15
 - 1s - loss: 0.5977 - acc: 0.9197
Epoch 10/15
 - 1s - loss: 0.4755 - acc: 0.9228
Epoch 11/15
 - 1s - loss: 0.3960 - acc: 0.9253
Epoch 12/15
 - 1s - loss: 0.3446 - acc: 0.9266
Epoch 13/15
 - 1s - loss: 0.3100 - acc: 0.9290
Epoch 14/15
 - 1s - loss: 0.2867 - acc: 0.9309
Epoch 15/15
 - 1s - loss: 0.2703 - acc: 0.9321


Test loss: 0.2874129926919937
Test accuracy: 0.9252


## Jupyter Exercise 4: Introduce Dropout

Dropout is another regularization technique to prevent overfitting. Dropout randomly selects neurons to ignore during training. In other words, they are “dropped out” at random. This means that their contribution to the activation of downstream neurons is temporarily removed on the forward pass, and no weight updates are applied to them on the backward pass.
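
The forward pass of dropout can be sketched in NumPy as below. This is a minimal illustration of the common "inverted dropout" formulation, in which the surviving activations are scaled up during training so that nothing needs to change at test time. The function name and the example values are assumptions, not the Keras implementation.

```python
import numpy as np

def dropout_forward(activations, rate, rng, training=True):
    """Zero out roughly a `rate` fraction of units during training."""
    if not training or rate == 0.0:
        # At test time dropout is a no-op.
        return activations
    keep_prob = 1.0 - rate
    # Each unit is kept independently with probability keep_prob.
    mask = rng.random(activations.shape) < keep_prob
    # Scale the survivors so the expected activation stays the same.
    return activations * mask / keep_prob

rng = np.random.default_rng(0)
activations = np.ones((4, 5))  # made-up layer outputs

dropped = dropout_forward(activations, rate=0.4, rng=rng)
print(dropped)  # entries are either 0 or 1 / 0.6

unchanged = dropout_forward(activations, rate=0.4, rng=rng, training=False)
print(unchanged)  # identical to the input
```

A different random mask is drawn on every call, so each training batch effectively sees a slightly different network.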

**Your task:** Remove the L2 regularization and, this time around, add dropout to the input layer. You may want to refer to the [Keras documentation about the Dropout layer](https://keras.io/layers/core/).

In [19]:
dropout_strength = 0.1

#########################################################
# Initializes neural network with dropout.
#########################################################
def dropout_classifier(learning_rate, dropout_strength):
    # create model
    model_dropout = Sequential()

    # YOUR CODE HERE:
    model_dropout.add(Dropout(dropout_strength, input_shape = (dim*dim,)))
    model_dropout.add(Dense(10, activation = 'relu')) 
    model_dropout.add(Dense(100, activation = 'relu')) 
    model_dropout.add(Dense(500, activation = 'relu')) 
    model_dropout.add(Dense(100, activation = 'relu')) 
    model_dropout.add(Dense(50, activation = 'relu')) 
    model_dropout.add(Dense(num_classes, activation = 'softmax'))
    # END CODE
    
    # compile model
    sgd = keras.optimizers.SGD(lr = learning_rate)
    model_dropout.compile(loss = keras.losses.categorical_crossentropy, 
                  optimizer = sgd, metrics=['accuracy'])
    
    return model_dropout

model_dropout = dropout_classifier(learning_rate, dropout_strength)

loss, acc = eval(model_dropout, verb = 2)
print('\n\nTest loss:', loss)
print('Test accuracy:', acc)

Epoch 1/15
 - 1s - loss: 1.6247 - acc: 0.4556
Epoch 2/15
 - 1s - loss: 0.7926 - acc: 0.7323
Epoch 3/15
 - 1s - loss: 0.5507 - acc: 0.8206
Epoch 4/15
 - 1s - loss: 0.4452 - acc: 0.8599
Epoch 5/15
 - 2s - loss: 0.3882 - acc: 0.8792
Epoch 6/15
 - 2s - loss: 0.3554 - acc: 0.8906
Epoch 7/15
 - 2s - loss: 0.3327 - acc: 0.8966
Epoch 8/15
 - 1s - loss: 0.3162 - acc: 0.9017
Epoch 9/15
 - 2s - loss: 0.2964 - acc: 0.9079
Epoch 10/15
 - 1s - loss: 0.2819 - acc: 0.9130
Epoch 11/15
 - 2s - loss: 0.2743 - acc: 0.9145
Epoch 12/15
 - 2s - loss: 0.2606 - acc: 0.9193
Epoch 13/15
 - 2s - loss: 0.2557 - acc: 0.9210
Epoch 14/15
 - 1s - loss: 0.2449 - acc: 0.9235
Epoch 15/15
 - 1s - loss: 0.2373 - acc: 0.9265


Test loss: 0.22514843581169844
Test accuracy: 0.9318
