# SAIG Introduction to Deep Learning Workshop - Exercises

Today, we'll be exploring deep neural networks and applying them to the MNIST dataset.

By Swetha Revanur.

## Setting up the Environment

Run any code below by highlighting the box and hitting `Shift + Enter`. Import the libraries below.

In [None]:
!pip3 install -r requirements.txt

In [None]:
%%bash
export PYTHONHASHSEED=0

In [None]:
from __future__ import print_function
import numpy as np
import pandas as pd
import math
import matplotlib.pyplot as plt
import random
import ssl
ssl._create_default_https_context = ssl._create_unverified_context
import os
os.environ['KERAS_BACKEND'] = 'theano'

# fix random seed for reproducibility
np.random.seed(1337)
import tensorflow as tf
tf.random.set_random_seed(2)

import keras
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, BatchNormalization
from keras import regularizers
from keras import backend as K

## Read in Data

In [None]:
# load data from MNIST (70000 28x28 images total)
(x_train, y_train), (x_test, y_test) = mnist.load_data()
num_train = x_train.shape[0]
num_test = x_test.shape[0]

print(num_train, 'records in train set')
print(num_test, 'records in test set')
print(num_train + num_test, 'records in total\n')

print(x_train.shape)
print(x_test.shape)

In [None]:
dim = x_train.shape[1] # dimension of square images = 28

# flatten so we instead have 70000 28^2=784-dimensional vectors (one per image)
x_train = x_train.reshape((x_train.shape[0], dim*dim))
x_test = x_test.reshape((x_test.shape[0], dim*dim))
print(x_train.shape)
print(x_test.shape)

In [None]:
num_classes = len(np.unique(y_train)) # 10 unique digits (0-9)
# convert class vectors to binary class matrices
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)
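
For intuition, the conversion to binary class matrices (one-hot encoding) can be reproduced in plain NumPy. This is only a sketch, not the Keras implementation: `one_hot` is a hypothetical helper name, and the sample labels are made up for illustration.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Hypothetical helper: turn integer labels into one-hot rows."""
    labels = np.asarray(labels, dtype=int)
    encoded = np.zeros((labels.shape[0], num_classes), dtype=np.float32)
    # put a 1 in the column that matches each label
    encoded[np.arange(labels.shape[0]), labels] = 1.0
    return encoded

sample_labels = [3, 0, 9]  # made-up digits, not taken from MNIST
encoded = one_hot(sample_labels, 10)
print(encoded)
```

Each row has a single 1 in the position of its digit, which is the format that a softmax output layer with cross-entropy loss expects.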

## Jupyter Exercise 1: Create a Baseline Model

**Your task:** At this point, we're ready to create a basic neural network classifier using Keras. Your model should have 5 dense layers, of sizes 10, 100, 500, 50, and `num_classes`, stacked on top of the 784-dimensional input. All layers, save the last one, should use a ReLU activation function. The last layer should use softmax. Check out this [guide](https://keras.io/getting-started/sequential-model-guide/) to get started.
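
To see what a stack of dense layers with ReLU and softmax activations computes, here is a minimal NumPy sketch of the forward pass. It is not the Keras solution to the exercise: the layer sizes, the random weights, and the helper names `relu`, `softmax`, and `forward` are all assumptions chosen for illustration.

```python
import numpy as np

def relu(z):
    # ReLU keeps positive values and zeroes out negative ones
    return np.maximum(z, 0.0)

def softmax(z):
    # subtract the row max for numerical stability
    shifted = z - z.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def forward(x, weights, biases):
    """ReLU on every hidden layer, softmax on the last layer."""
    a = x
    for w, b in zip(weights[:-1], biases[:-1]):
        a = relu(np.dot(a, w) + b)
    return softmax(np.dot(a, weights[-1]) + biases[-1])

rng = np.random.RandomState(0)
layer_sizes = [784, 32, 10]  # illustrative sizes, not the exercise's
weights = [rng.normal(scale=0.01, size=(m, n))
           for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n) for n in layer_sizes[1:]]

toy_batch = rng.rand(4, 784)  # four made-up "images"
probs = forward(toy_batch, weights, biases)
print(probs.shape)
print(probs.sum(axis=1))  # softmax rows sum to (approximately) 1
```

In Keras, each `np.dot(a, w) + b` step followed by an activation corresponds to one `Dense` layer added to the `Sequential` model.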

In [None]:
batch_size = 256
epochs = 15
learning_rate = 5e-3

#########################################################
# Initializes baseline neural network.
# ReLU and Softmax activations. Cross-entropy loss.
#########################################################
def baseline_classifier(learning_rate):
    # create model
    model_baseline = Sequential()

    # YOUR CODE HERE:
    
    # END CODE

    # compile model
    sgd = keras.optimizers.SGD(lr = learning_rate)
    model_baseline.compile(loss = keras.losses.categorical_crossentropy, optimizer = sgd, metrics=['accuracy'])
    
    return model_baseline

## Jupyter Exercise 2: Train and Evaluate the Baseline Model

**Your task:** Use the `eval()` function below to train and evaluate our baseline model. The return value of `eval()` is a pair of test loss and test accuracy. Print both of these. In `eval()`, feel free to change the value of the `verbose` parameter. When `verbose = 0`, no information is printed. When it's 1, a progress bar is shown for each epoch, and when it's 2, one summary line is printed per epoch. Your test accuracy for this basic model should be around 37%.
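
To make the two reported numbers concrete, the sketch below computes categorical cross-entropy loss and accuracy by hand in NumPy. The labels and predicted probabilities are made up, and `cross_entropy` and `accuracy` are hypothetical helper names, not Keras functions.

```python
import numpy as np

def cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean categorical cross-entropy for one-hot labels and predicted probabilities."""
    y_prob = np.clip(y_prob, eps, 1.0)  # avoid log(0)
    return float(-np.mean(np.sum(y_true * np.log(y_prob), axis=1)))

def accuracy(y_true, y_prob):
    """Fraction of rows where the most probable class matches the label."""
    return float(np.mean(y_true.argmax(axis=1) == y_prob.argmax(axis=1)))

# made-up one-hot labels and predictions: three examples, three classes
y_true = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=float)
y_prob = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.5, 0.3, 0.2]])

toy_loss = cross_entropy(y_true, y_prob)
toy_acc = accuracy(y_true, y_prob)
print('loss:', toy_loss)
print('accuracy:', toy_acc)
```

The third example is misclassified, so accuracy is two out of three, and its low probability on the true class contributes most of the loss.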

Note that one parameter of `model.fit()` is `validation_split`. This takes a float from 0 to 1, representing the fraction of the training set to use for validation. Why do we need a validation set? More than training performance, we are interested in how our model does on unseen data. We could split the data into only training and test sets. But if we then optimize our model using results from the test set, our test set can no longer be considered unseen data. As a workaround, we split our dataset into train, validation, and test sets. This way, we can optimize on our validation set, and only touch our test set at the very end. For the purposes of this workshop, we have ignored the validation set.
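
The idea of holding out part of the training data can be sketched in NumPy as follows. This is an illustration under stated assumptions, not what `model.fit()` does internally: `train_val_split` is a hypothetical helper, the toy arrays are made up, and the shuffle before splitting is a choice of this sketch.

```python
import numpy as np

def train_val_split(x, y, val_fraction, seed=0):
    """Hypothetical helper: shuffle rows, then hold out val_fraction of them."""
    rng = np.random.RandomState(seed)
    order = rng.permutation(x.shape[0])
    n_val = int(x.shape[0] * val_fraction)
    val_idx, train_idx = order[:n_val], order[n_val:]
    return (x[train_idx], y[train_idx]), (x[val_idx], y[val_idx])

toy_x = np.arange(20).reshape(10, 2)  # ten made-up examples
toy_y = np.arange(10)
(tr_x, tr_y), (va_x, va_y) = train_val_split(toy_x, toy_y, val_fraction=0.2)
print(tr_x.shape, va_x.shape)  # (8, 2) (2, 2)
```

The model is then fit on the training portion only, tuned against the validation portion, and evaluated on the test set once at the end.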

In [None]:
#########################################################
# Trains and evaluates given model. Returns loss and 
# accuracy.
#########################################################
def eval(model, verb = 2):
    # fit the model
    model.fit(x_train, y_train, 
              epochs = epochs, 
              batch_size = batch_size, 
              verbose = verb,
              shuffle = False)
    
    # evaluate the model
    scores = model.evaluate(x_test, y_test)
    
    return scores

model_baseline = baseline_classifier(learning_rate)

# YOUR CODE HERE:
# call eval and print the test loss and accuracy
# END CODE

## Jupyter Exercise 3: Introduce Regularization

Sometimes our train accuracy is consistently higher than our test accuracy. We might also see the train loss continue to decrease while the validation loss hits a minimum and then increases. Both of these are indicators of overfitting and poor model generalization. Regularization is a way to fix this. Regularization reduces overfitting by adding a penalty to the loss function. L2 regularization, used below, penalizes the sum of the squared weights. With this penalty, the model is discouraged from learning interdependent sets of feature weights.
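
As a rough illustration of "adding a penalty to the loss function", the sketch below computes an L2 penalty by hand. The weights and the data loss value are made up, and `l2_penalty` and `regularized_loss` are hypothetical helpers. The form `reg_strength * sum(w ** 2)` is assumed here to match how an L2 regularizer is usually defined.

```python
import numpy as np

def l2_penalty(weights, reg_strength):
    """Sum of squared weights across all given layers, scaled by reg_strength."""
    return reg_strength * sum(float(np.sum(w ** 2)) for w in weights)

def regularized_loss(data_loss, weights, reg_strength):
    # total loss = loss on the data + penalty on the size of the weights
    return data_loss + l2_penalty(weights, reg_strength)

# made-up weight matrices for two small layers
toy_weights = [np.array([[1.0, -2.0], [0.5, 0.0]]),
               np.array([[3.0], [-1.0]])]

penalty = l2_penalty(toy_weights, 0.1)
total = regularized_loss(0.4, toy_weights, 0.1)
print('penalty:', penalty)
print('total loss:', total)
```

Because large weights raise the total loss, gradient descent is pushed toward smaller weights, and a larger `reg_strength` pushes harder.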

**Your task:** Add L2 regularization to the last two hidden layers of our model. You may want to refer to the [Keras documentation about regularizers](https://keras.io/regularizers/).

In [None]:
reg_strength = 0.1

#########################################################
# Initializes neural network with L2 regularization.
#########################################################
def regularized_classifier(learning_rate, reg_strength):
    # create model
    model_regularized = Sequential()

    # YOUR CODE HERE:
    
    # END CODE
    
    # compile model
    sgd = keras.optimizers.SGD(lr = learning_rate)
    model_regularized.compile(loss = keras.losses.categorical_crossentropy, 
                  optimizer = sgd, metrics=['accuracy'])
    
    return model_regularized

model_regularized = regularized_classifier(learning_rate, reg_strength)

loss, acc = eval(model_regularized)
print('\n\nTest loss:', loss)
print('Test accuracy:', acc)

## Jupyter Exercise 4: Introduce Dropout

Dropout is another regularization technique to prevent overfitting. Dropout randomly selects neurons to ignore during training. In other words, they are “dropped-out” randomly. This means that their contribution to the activation of downstream neurons is temporarily removed on the forward pass, and no weight updates are applied to those neurons on the backward pass.
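
The forward-pass behavior described above can be sketched in NumPy. This is a minimal illustration, not the Keras `Dropout` layer itself: the `dropout` function is hypothetical, the activations are made up, and rescaling the kept units by `1 / (1 - rate)` (often called inverted dropout) is an assumption of this sketch.

```python
import numpy as np

def dropout(activations, rate, rng, training=True):
    """Zero a random fraction `rate` of units and rescale the rest."""
    if not training or rate == 0.0:
        # at evaluation time every unit is kept unchanged
        return activations
    keep_prob = 1.0 - rate
    mask = rng.rand(*activations.shape) < keep_prob
    return activations * mask / keep_prob

rng = np.random.RandomState(0)
toy_activations = np.ones((2, 5))  # made-up layer outputs

train_out = dropout(toy_activations, rate=0.4, rng=rng)
test_out = dropout(toy_activations, rate=0.4, rng=rng, training=False)
print(train_out)
print(test_out)
```

A fresh mask is drawn on every call, so a different subset of units is dropped for each batch during training.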

**Your task:** Remove the regularization, and this time around, add dropout for the input layer. You may want to refer to the [Keras documentation about the Dropout layer](https://keras.io/layers/core/).

In [None]:
dropout_strength = 0.1

#########################################################
# Initializes neural network with dropout.
#########################################################
def dropout_classifier(learning_rate, dropout_strength):
    # create model
    model_dropout = Sequential()

    # YOUR CODE HERE:
    
    # END CODE
    
    # compile model
    sgd = keras.optimizers.SGD(lr = learning_rate)
    model_dropout.compile(loss = keras.losses.categorical_crossentropy, 
                  optimizer = sgd, metrics=['accuracy'])
    
    return model_dropout

model_dropout = dropout_classifier(learning_rate, dropout_strength)

loss, acc = eval(model_dropout, verb = 2)
print('\n\nTest loss:', loss)
print('Test accuracy:', acc)