# CMPT 898: Assignment 3 Solutions
## By Samuel Horovatin, sch923, 11185403

### Baseline network: *You can start with a LeNet-5 style architecture that we discussed in the lecture. As a baseline start with ReLU activations for the hidden layers, and a softmax output layer.*

In [18]:
import tensorflow as tf
import numpy as np
import os, datetime
import math

EPOCHS = 10
OPTIMIZER = 'adam'
LOSS = 'sparse_categorical_crossentropy'
METRICS = 'accuracy'

# Load the CIFAR-10 dataset, a color image database consisting of
# 10 classes: airplanes, cars, birds, cats, deer, dogs, frogs, horses, ships, and trucks
(x_train, y_train),(x_test, y_test) = tf.keras.datasets.cifar10.load_data()

# Scale pixel values from [0, 255] to [0, 1]; load_data() already returns the train/test split
x_train, x_test = x_train / 255.0, x_test / 255.0

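def train_model(model):
  """Compile and fit `model` on CIFAR-10, logging each call to its own TensorBoard run."""
  model.compile(optimizer=OPTIMIZER,
                loss=LOSS,
                metrics=[METRICS])

  # Timestamped log directory so every training run shows up separately in TensorBoard
  logdir = os.path.join("logs", datetime.datetime.now().strftime("%Y%m%d-%H%M%S"))
  tensorboard_callback = tf.keras.callbacks.TensorBoard(logdir, histogram_freq=1)

  # The test split is passed as validation data, so val_accuracy is the test accuracy.
  # Return the History object so per-epoch metrics can be inspected afterwards.
  return model.fit(x=x_train,
                   y=y_train,
                   epochs=EPOCHS,
                   validation_data=(x_test, y_test),
                   callbacks=[tensorboard_callback])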

In [19]:
def create_baseline_model():
  return tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(
        filters=6, 
        kernel_size=5, 
        activation='relu', 
        input_shape=(32, 32, 3)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(
        filters=16,
        kernel_size=5, 
        activation='relu'),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(120, activation='relu'),
    tf.keras.layers.Dense(84, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax')])

%reload_ext tensorboard
train_model(create_baseline_model())
%load_ext tensorboard
%tensorboard --logdir logs

Train on 50000 samples, validate on 10000 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
The tensorboard extension is already loaded. To reload it, use:
  %reload_ext tensorboard


Reusing TensorBoard on port 6006 (pid 3752), started 3:10:04 ago. (Use '!kill 3752' to kill it.)
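
To make the baseline's layer sizes easier to follow, the sketch below traces the spatial dimensions through the two convolution and pooling stages. It is an illustration only: the helper names are hypothetical, and it assumes the Keras defaults used in the cell above (unpadded stride-1 convolutions and 2x2 max pooling with stride 2).

```python
def conv_out(size, kernel):
    """Spatial size after an unpadded ('valid'), stride-1 convolution."""
    return size - kernel + 1


def pool_out(size, pool=2):
    """Spatial size after max pooling with a 2x2 window and stride 2."""
    return (size - pool) // pool + 1


def trace_baseline_shapes(size=32):
    """Return (layer, height/width, channels) for each spatial layer of the baseline."""
    trace = []
    size = conv_out(size, 5)
    trace.append(("conv1", size, 6))
    size = pool_out(size)
    trace.append(("pool1", size, 6))
    size = conv_out(size, 5)
    trace.append(("conv2", size, 16))
    size = pool_out(size)
    trace.append(("pool2", size, 16))
    return trace


trace = trace_baseline_shapes()
for name, size, channels in trace:
    print(f"{name}: {size}x{size}x{channels}")

# Flatten turns the last feature maps into the input vector of the first dense layer
_, last_size, last_channels = trace[-1]
flatten_units = last_size * last_size * last_channels
print(f"flatten: {flatten_units} units")
```

Calling `model.summary()` on the built Keras model should report the same output shapes.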

### Add L2 weight decay regularization: *Add an L2-norm penalty on the weights of your baseline model as regularization. Test two different regularization strengths.*
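
For reference, the objective that an L2 penalty produces can be written as below. The notation is introduced here for illustration: $\mathcal{L}_{\text{CE}}$ is the cross-entropy loss, $W_{\ell,i}$ is the $i$-th kernel weight of layer $\ell$, and $\lambda$ is the regularization strength (`reg_strength` in the code). Keras' `regularizers.l2` sums the squared weights without a factor of $1/2$, and because only `kernel_regularizer` is set, the biases are not penalized.

$$
\mathcal{L}(\theta) = \mathcal{L}_{\text{CE}}(\theta) + \lambda \sum_{\ell} \sum_{i} W_{\ell,i}^{2}
$$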

In [16]:
# A model that applies L2 regularization to the kernel weights of every layer
def create_L2_model(reg_strength):
  return tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(
        filters=6, 
        kernel_size=5, 
        activation='relu', 
        input_shape=(32, 32, 3), 
        kernel_regularizer=tf.keras.regularizers.l2(reg_strength)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(
        filters=16,
        kernel_size=5, 
        activation='relu',
        kernel_regularizer=tf.keras.regularizers.l2(reg_strength)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(
        120,
        activation='relu',
        kernel_regularizer=tf.keras.regularizers.l2(reg_strength)),
    tf.keras.layers.Dense(
        84,
        activation='relu',
        kernel_regularizer=tf.keras.regularizers.l2(reg_strength)),
    tf.keras.layers.Dense(10,
        activation='softmax',
        kernel_regularizer=tf.keras.regularizers.l2(reg_strength))])

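# Regularization strengths to compare (defined here so the cell also runs on a fresh kernel)
LAMBDA1 = 0.001
LAMBDA2 = 0.01

print(f"Training model 1 with L2 lambda of {LAMBDA1}")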
train_model(create_L2_model(LAMBDA1))
print(f"Training model 2 with L2 lambda of {LAMBDA2}")
train_model(create_L2_model(LAMBDA2))
%load_ext tensorboard
%tensorboard --logdir logs

Training model 1 with L2 lambda of 0.001
Train on 50000 samples, validate on 10000 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Training model 2 with L2 lambda of 0.01
Train on 50000 samples, validate on 10000 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10

### Add L1 weight decay regularization: *Add an L1-norm penalty on the weights of your baseline model as regularization. Test two different regularization strengths.*
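
For reference, the objective that an L1 penalty produces can be written as below. The notation is introduced here for illustration: $\mathcal{L}_{\text{CE}}$ is the cross-entropy loss, $W_{\ell,i}$ is the $i$-th kernel weight of layer $\ell$, and $\lambda$ is the regularization strength (`reg_strength` in the code). Away from zero, the penalty's gradient with respect to each weight is $\lambda\,\operatorname{sign}(W_{\ell,i})$, which has the same magnitude however small the weight is. This is the usual explanation for why L1 tends to push weights toward exactly zero.

$$
\mathcal{L}(\theta) = \mathcal{L}_{\text{CE}}(\theta) + \lambda \sum_{\ell} \sum_{i} \left| W_{\ell,i} \right|
$$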

In [20]:
# A model that applies L1 regularization to the kernel weights of every layer
def create_L1_model(reg_strength):
  return tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(
        filters=6, 
        kernel_size=5, 
        activation='relu', 
        input_shape=(32, 32, 3), 
        kernel_regularizer=tf.keras.regularizers.l1(reg_strength)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(
        filters=16,
        kernel_size=5, 
        activation='relu',
        kernel_regularizer=tf.keras.regularizers.l1(reg_strength)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(
        120,
        activation='relu',
        kernel_regularizer=tf.keras.regularizers.l1(reg_strength)),
    tf.keras.layers.Dense(
        84,
        activation='relu',
        kernel_regularizer=tf.keras.regularizers.l1(reg_strength)),
    tf.keras.layers.Dense(10,
        activation='softmax',
        kernel_regularizer=tf.keras.regularizers.l1(reg_strength))])

LAMBDA1 = 0.001
LAMBDA2 = 0.01

print(f"Training model 1 with L1 lambda of {LAMBDA1}")
train_model(create_L1_model(LAMBDA1))
print(f"Training model 2 with L1 lambda of {LAMBDA2}")
train_model(create_L1_model(LAMBDA2))
%load_ext tensorboard
%tensorboard --logdir logs

Training model 1 with L1 lambda of 0.001
Train on 50000 samples, validate on 10000 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Training model 2 with L1 lambda of 0.01
Train on 50000 samples, validate on 10000 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10

### Remove fully-connected layers: *Modify the architecture to remove the fully-connected layers at the backend of the network. For example, replace with Global Average Pooling or an alternative. Report the change in the number of parameters for this model compared to previous.*
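
The parameter comparison asked for above can be worked out by hand from the layer shapes. The sketch below is an illustration, not output from the notebook. The helper names are hypothetical, and the 400 inputs to the first dense layer assume the baseline's 5x5x16 feature maps after the second pooling layer. The last line counts a 10-filter 1x1 convolution, which is one possible class head for a network without dense layers.

```python
def conv2d_params(kernel, in_channels, filters):
    """Weights plus biases of a square-kernel Conv2D layer."""
    return kernel * kernel * in_channels * filters + filters


def dense_params(inputs, units):
    """Weights plus biases of a Dense layer."""
    return inputs * units + units


# The two convolutional layers are shared by every model in this notebook
conv_params = conv2d_params(5, 3, 6) + conv2d_params(5, 6, 16)

# The three fully-connected layers of the baseline (400 -> 120 -> 84 -> 10)
fc_params = dense_params(400, 120) + dense_params(120, 84) + dense_params(84, 10)

baseline_total = conv_params + fc_params

# Assumption: a 1x1 convolution with one filter per class as the class head
head_params = conv2d_params(1, 16, 10)

print(f"convolutional layers: {conv_params:,}")
print(f"fully-connected layers: {fc_params:,}")
print(f"baseline total: {baseline_total:,}")
print(f"1x1 conv class head: {head_params:,}")
```

By this count the fully-connected layers hold roughly 95% of the baseline's parameters, so removing them shrinks the model by more than an order of magnitude. L1 and L2 regularization change the loss, not the architecture, so the regularized models keep the baseline's count. `model.count_params()` on the built Keras models can confirm the totals.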


In [24]:
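# A model with no fully-connected layers: a 1x1 convolution and Global Average Pooling
# produce the 10 class scores instead
def create_average_pooling_model():
  return tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(
        filters=6, 
        kernel_size=5, 
        activation='relu', 
        input_shape=(32, 32, 3)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(
        filters=16,
        kernel_size=5, 
        activation='relu'),
    # 1x1 convolution: one feature map per class, so the output matches the 10 labels
    tf.keras.layers.Conv2D(filters=10, kernel_size=1),
    # Average each class map to a single score; the result is already flat: (batch, 10)
    tf.keras.layers.GlobalAveragePooling2D(),
    # Softmax turns the scores into the probabilities the loss expects
    tf.keras.layers.Softmax()])

print("Training model with Global Average Pooling")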
train_model(create_average_pooling_model())

%load_ext tensorboard
%tensorboard --logdir logs

Training model with Global Average Pooling
Train on 50000 samples, validate on 10000 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10

### Analyze the accuracy of the different models: *For all six models, train/test your model three times to get a sense of the consistency of the test error. Keep other aspects of your model the same among designs (# epochs, mini-batch size, hyperparameters). Generate a table that summarizes the training error, test error, standard deviation of test error across three runs, inference time, and \# of parameters for each model.*
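
One way to assemble the requested table is sketched below. It is a minimal sketch under stated assumptions, not the notebook's own analysis. All three helper names are hypothetical. The test accuracies are assumed to come from three separate train/evaluate runs, for example the second value returned by `model.evaluate(x_test, y_test)`. The standard deviation is the sample standard deviation (n - 1 in the denominator), which is a choice rather than a requirement. No measured results are included.

```python
import statistics
import time


def summarize_test_error(test_accuracies):
    """Return (mean, sample standard deviation) of the test error across runs."""
    errors = [1.0 - accuracy for accuracy in test_accuracies]
    return statistics.mean(errors), statistics.stdev(errors)


def time_inference(predict_fn, inputs, repeats=3):
    """Return the fastest wall-clock time in seconds of predict_fn(inputs)."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        predict_fn(inputs)
        timings.append(time.perf_counter() - start)
    return min(timings)


def format_row(name, train_error, test_error, test_std, seconds, n_params):
    """Format one row of a markdown results table."""
    return (f"| {name} | {train_error:.3f} | {test_error:.3f} | "
            f"{test_std:.3f} | {seconds:.3f} | {n_params:,} |")


TABLE_HEADER = (
    "| Model | Train error | Test error (mean) | Test error (std) | "
    "Inference time (s) | Parameters |\n"
    "|---|---|---|---|---|---|"
)
```

In the notebook, `predict_fn` could be `model.predict` with `x_test` as `inputs`, and `n_params` could come from `model.count_params()`. Taking the minimum over repeats reduces the effect of one-off warm-up costs on the timing.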

### Analyze the weights of the regularized models: *For the baseline model and the four regularized models (from parts 2 and 3: L2 and L1 regularization with two different strengths each) measure the sparsity of the weights in each FC layer and create a bar chart that compares the sparsity between the models in each layer. There are a number of metrics that measure sparsity, e.g. Hoyer's index. For different sparsity metrics, see Table I in https://arxiv.org/abs/0811.4706*
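
As a starting point for the sparsity measurement, the sketch below computes Hoyer's index for one flat collection of weights. It is an illustration, not the notebook's own analysis. The function name is hypothetical, and the input is assumed to be the flattened kernel of a single FC layer, for example `layer.get_weights()[0].ravel()` for a Keras `Dense` layer. The index is 0 when all weights have equal magnitude and 1 when exactly one weight is non-zero.

```python
import math


def hoyer_sparsity(weights):
    """Hoyer's index: (sqrt(n) - L1/L2) / (sqrt(n) - 1), in the range [0, 1]."""
    values = [float(w) for w in weights]
    n = len(values)
    if n < 2:
        raise ValueError("Hoyer's index needs at least two weights")

    l1_norm = sum(abs(v) for v in values)
    l2_norm = math.sqrt(sum(v * v for v in values))
    if l2_norm == 0.0:
        raise ValueError("Hoyer's index is undefined for an all-zero vector")

    sqrt_n = math.sqrt(n)
    return (sqrt_n - l1_norm / l2_norm) / (sqrt_n - 1.0)
```

Computing this once per FC layer and per model gives the values for the bar chart, with one group of bars per layer and one bar per model.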