
# Exercise: Regularization

### Load Fashion MNIST Dataset

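Fashion-MNIST is a dataset of Zalando's article images: 60,000 training and 10,000 test examples, each a 28x28 grayscale image labeled with one of 10 clothing classes.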

https://research.zalando.com/welcome/mission/research-projects/fashion-mnist/


In [2]:
import tensorflow as tf
import matplotlib.pyplot as plt

In [3]:
# fashion mnist is integrated into tf.keras.datasets
fashion_mnist = tf.keras.datasets.fashion_mnist
(train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()

# define class names
class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

# Put grayscale values between 0 and 1
train_images = train_images / 255.0
test_images = test_images / 255.0

# Convert Labels to One-Hot Encoding 
train_labels_one_hot = tf.one_hot(train_labels, len(class_names))
test_labels_one_hot = tf.one_hot(test_labels, len(class_names))
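
To make the one-hot step concrete, here is a minimal framework-free sketch of what the encoding produces for a single label. The helper `one_hot` and the value of `example_label` are assumptions made for illustration; the notebook itself relies on `tf.one_hot`.

```python
# Illustrative only: a plain-Python view of one-hot encoding.
class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']


def one_hot(label, num_classes):
    """Return a list with 1.0 at index `label` and 0.0 everywhere else."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


example_label = 9  # hypothetical label; index 9 corresponds to 'Ankle boot'
encoded = one_hot(example_label, len(class_names))
print(class_names[example_label], encoded)
```

Each encoded vector has one entry per class, which is the target shape a categorical crossentropy loss expects.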

## Exercise Tasks

**1. Create a sequential feed-forward TensorFlow model with L2 regularization:**
- the 28x28 input images are flattened to a vector
- the hidden layer contains 128 nodes and uses the ReLU activation function and an **L2 kernel regularizer**
- the output layer has ten output logits (one for each class in the one-hot encoded label)
- the softmax function is applied to the logits
- it uses a stochastic gradient descent optimizer with a categorical crossentropy loss function


Train the model on the training data for 10 epochs: What accuracy does it reach on the training data?

Evaluate your model: What accuracy does it reach on the test data?
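
The pieces named in task 1 (softmax, categorical crossentropy, and an L2 kernel penalty) combine into a single training loss. The sketch below shows that arithmetic in plain Python for one example. The logits, label, kernel values, and `factor` are hypothetical numbers chosen for illustration, and the helper names are not part of any library. The penalty follows the form used by the Keras L2 regularizer: the factor times the sum of squared weights.

```python
import math


def softmax(logits):
    """Turn logits into probabilities; subtracting the max keeps exp() stable."""
    largest = max(logits)
    exps = [math.exp(z - largest) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_crossentropy(one_hot_label, probs, eps=1e-7):
    """Negative log-probability of the true class for a one-hot label."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(one_hot_label, probs))


def l2_penalty(kernel, factor):
    """L2 penalty: factor times the sum of squared kernel weights."""
    return factor * sum(w * w for row in kernel for w in row)


# Hypothetical values, for illustration only
logits = [2.0, 1.0, 0.1]
label = [1.0, 0.0, 0.0]
kernel = [[0.5, -0.5], [1.0, 0.0]]
factor = 0.01

probs = softmax(logits)
data_loss = categorical_crossentropy(label, probs)
penalty = l2_penalty(kernel, factor)
total_loss = data_loss + penalty  # the quantity the optimizer minimizes
print(data_loss, penalty, total_loss)
```

Because the penalty grows with the squared weights, minimizing the total loss pushes the kernel toward smaller values, which is the regularizing effect.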

**2. Create a sequential feed-forward TensorFlow model with dropout regularization:**
- the 28x28 input images are flattened to a vector
- the hidden layer contains 128 nodes and uses the ReLU activation function
- **two dropout layers regularize the model and help prevent overfitting**
- the output layer has ten output logits (one for each class in the one-hot encoded label)
- the softmax function is applied to the logits
- it uses a stochastic gradient descent optimizer with a categorical crossentropy loss function

Train the model on the training data for 10 epochs: What accuracy does it reach on the training data?

Evaluate your model: What accuracy does it reach on the test data?
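
As a rough picture of what a dropout layer does, the sketch below applies inverted dropout to a list of activations: during training each value is zeroed with probability `rate` and the survivors are scaled by `1 / (1 - rate)`, while at inference the input passes through unchanged. The function `dropout`, the seed, and the activation values are assumptions for illustration, not the Keras implementation.

```python
import random


def dropout(activations, rate, training, rng):
    """Inverted dropout on a list of floats."""
    if not training or rate == 0.0:
        # At inference time dropout is a no-op.
        return list(activations)
    keep_scale = 1.0 / (1.0 - rate)
    # Keep each unit with probability (1 - rate) and rescale the kept ones
    # so the expected sum of the layer output stays the same.
    return [a * keep_scale if rng.random() >= rate else 0.0
            for a in activations]


rng = random.Random(0)       # fixed seed so the sketch is repeatable
activations = [1.0] * 10     # hypothetical hidden-layer outputs
train_out = dropout(activations, 0.2, True, rng)
infer_out = dropout(activations, 0.2, False, rng)
print(train_out)
print(infer_out)
```

Since a different random subset of units is silenced on every training step, the network cannot rely on any single unit, which tends to reduce overfitting.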

**3. Compare the two regularization techniques. Which one would you prefer?**

**4. (Optional) Can you improve your model(s)?**




#### Links / References:

- TensorFlow Quick Start for Beginners: https://www.tensorflow.org/tutorials/quickstart/beginner
- TensorFlow Keras Classification: https://www.tensorflow.org/tutorials/keras/classification
- TensorFlow Overfit and Underfit: https://www.tensorflow.org/tutorials/keras/overfit_and_underfit

## Feed Forward Neural Net with L2 regularization

In [None]:
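# Define model
l2Model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    # L2 penalty on the kernel weights; referencing tf.keras.regularizers
    # directly means no separate import is needed in this cell
    tf.keras.layers.Dense(128, activation='relu',
                          kernel_regularizer=tf.keras.regularizers.l2(0.0001)),
    tf.keras.layers.Dense(10),
    tf.keras.layers.Softmax()
])

# Note: this run uses Adam rather than the SGD optimizer named in the task
l2Model.compile(optimizer='adam',
                loss=tf.keras.losses.CategoricalCrossentropy(),
                metrics=['accuracy'])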
l2Model.summary()

In [22]:
# Train model (30 epochs in this run, rather than the 10 named in the task)
l2Model.fit(train_images, train_labels_one_hot, epochs=30)

Epoch 1/30
...
Epoch 30/30


<keras.callbacks.History at 0x2ddfacfa0>

In [23]:
# Evaluate on the held-out test set
test_loss, test_acc = l2Model.evaluate(test_images, test_labels_one_hot, verbose=2)

# Per-class probabilities (softmax output) for every test image
l2Model.predict(test_images)

313/313 - 0s - loss: 0.5755 - accuracy: 0.8128 - 158ms/epoch - 504us/step


array([[2.0736930e-05, 1.7025479e-06, 7.3970837e-06, ..., 3.9631811e-01,
        1.6323115e-04, 4.5134342e-01],
       [2.8443062e-03, 9.3030224e-09, 9.5105702e-01, ..., 6.2054977e-17,
        2.4820949e-05, 1.7752392e-12],
       [9.8315704e-06, 9.9998391e-01, 1.5340872e-06, ..., 3.4426714e-10,
        2.9953927e-08, 6.6042576e-13],
       ...,
       [2.6600435e-01, 7.8347193e-06, 1.0628628e-02, ..., 1.0941106e-06,
        6.0520869e-01, 5.8511637e-06],
       [5.1299485e-06, 9.9972850e-01, 2.8920388e-07, ..., 6.1263535e-07,
        1.2104543e-07, 1.4075777e-08],
       [1.2343496e-04, 3.8417561e-06, 2.7144077e-04, ..., 2.4742501e-02,
        9.1397995e-04, 6.3444447e-04]], dtype=float32)
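
Each row of the array returned by `predict` holds one probability per class, so the predicted class is the index of the largest entry. The sketch below shows that lookup in plain Python. The helper `predicted_class` and the values in `example_row` are hypothetical and not taken from the output above.

```python
class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']


def predicted_class(probabilities, names):
    """Return the class name with the highest probability and that probability."""
    best_index = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return names[best_index], probabilities[best_index]


# Hypothetical probability row, for illustration only
example_row = [0.01, 0.0, 0.02, 0.0, 0.0, 0.30, 0.0, 0.40, 0.02, 0.25]
name, confidence = predicted_class(example_row, class_names)
print(name, confidence)
```

A row whose largest probability is well below 1, as in this made-up example, indicates that the model is uncertain between several classes.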

## Feed Forward Neural Net with Dropout regularization

In [57]:
from keras import regularizers

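# Define model
# Note: this run uses 64 hidden units, one Dropout layer, and Adam;
# the task names 128 units, two dropout layers, and SGD
dropoutModel = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(64, activation='relu'),
    # Randomly zero 20% of the hidden activations during training
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10),
    tf.keras.layers.Softmax()
])

dropoutModel.compile(optimizer='adam',
                     loss=tf.keras.losses.CategoricalCrossentropy(),
                     metrics=['accuracy'])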
dropoutModel.summary()

Model: "sequential_19"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 flatten_19 (Flatten)        (None, 784)               0         
                                                                 
 dense_38 (Dense)            (None, 64)                50240     
                                                                 
 dropout_20 (Dropout)        (None, 64)                0         
                                                                 
 dense_39 (Dense)            (None, 10)                650       
                                                                 
 softmax_19 (Softmax)        (None, 10)                0         
                                                                 
Total params: 50,890
Trainable params: 50,890
Non-trainable params: 0
_________________________________________________________________


In [58]:
# Train model (100 epochs in this run, rather than the 10 named in the task)
dropoutModel.fit(train_images, train_labels_one_hot, epochs=100)

Epoch 1/100
...
Epoch 77/100
Epoch 78

<keras.callbacks.History at 0x33cccdb80>

In [56]:
# Evaluate on the held-out test set
test_loss, test_acc = dropoutModel.evaluate(test_images, test_labels_one_hot, verbose=2)

# Per-class probabilities (softmax output) for every test image
dropoutModel.predict(test_images)

313/313 - 0s - loss: 0.6707 - accuracy: 0.8861 - 124ms/epoch - 397us/step


array([[3.3070443e-33, 0.0000000e+00, 0.0000000e+00, ..., 1.0423998e-10,
        0.0000000e+00, 1.0000000e+00],
       [3.4867647e-14, 0.0000000e+00, 1.0000000e+00, ..., 0.0000000e+00,
        1.4910449e-26, 0.0000000e+00],
       [0.0000000e+00, 1.0000000e+00, 0.0000000e+00, ..., 0.0000000e+00,
        0.0000000e+00, 0.0000000e+00],
       ...,
       [6.5609187e-29, 0.0000000e+00, 6.4767921e-37, ..., 0.0000000e+00,
        1.0000000e+00, 0.0000000e+00],
       [0.0000000e+00, 1.0000000e+00, 0.0000000e+00, ..., 0.0000000e+00,
        0.0000000e+00, 0.0000000e+00],
       [4.9685348e-15, 0.0000000e+00, 1.8933569e-27, ..., 5.3921558e-06,
        3.3581203e-13, 2.8084856e-21]], dtype=float32)