# Spring 2022
# CPSC 585 Project 3
## Raymond Carpio
## Yu Pan
## Sijie Shang
## John Tu

# 1. At the end of the last project, you applied a simple Convolutional Neural Network (CNN or “Convnet”) example to the EMNIST Letters dataset. You should have found that while the EMNIST Letters are harder to learn than the MNIST digits, switching to a different network architecture led to a significant increase in model performance.
# You may have noticed, however, that the training process was slower. This means that experiments take longer, and mistakes can be costly. You saw in Project 1 that learning curves can help to understand the training process and diagnose potential problems. Unfortunately, while the Keras fit() method does return a History object that can be used to plot a curve, it does not return until the training process is complete.
# To help you avoid going down dead ends while adjusting and tuning your model, TensorFlow includes the TensorBoard tool and the TensorBoard notebook extension.
# Add the TensorBoard callback to the CNN model from the previous project, and add TensorBoard to your notebook to visualize the training process.
## Note: if you get a 403 error when trying to use TensorBoard in Google Colab, you may need to enable third-party cookies.

In [1]:
import numpy as np # Needed to do NumPy functions
from matplotlib import pyplot as plt # Needed to do matplotlib operations

In [2]:
#load EMNIST dataset
emnist_data = np.load('emnist_letters.npz')

train_img = emnist_data['train_images']
train_label = emnist_data['train_labels']

test_img = emnist_data['test_images']
test_label = emnist_data['test_labels']

validate_img = emnist_data['validate_images']
validate_label = emnist_data['validate_labels']

#prepare the data
train_img = train_img.reshape((104000, 28, 28))
train_img = train_img.astype("float32") / 255
test_img = test_img.reshape((20800, 28, 28))
test_img = test_img.astype("float32") / 255
validate_img = validate_img.reshape((20800, 28, 28))
validate_img = validate_img.astype("float32") / 255

In [3]:
%load_ext tensorboard

In [4]:
import tensorflow as tf
import datetime, os
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras import regularizers
from tensorflow.keras import initializers
from tensorflow.keras import mixed_precision

distribution_strategy = tf.distribute.MirroredStrategy()

INFO:tensorflow:Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:CPU:0',)


In [5]:
num_classes = 27
input_shape = (28, 28, 1)

train_img_convnet = np.expand_dims(train_img, -1)
test_img_convnet = np.expand_dims(test_img, -1)
validate_img_convnet = np.expand_dims(validate_img, -1)

In [6]:
model = keras.Sequential(
    [
        keras.Input(shape=input_shape),
        layers.Conv2D(32, kernel_size=(3, 3), activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Conv2D(64, kernel_size=(3, 3), activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Flatten(),
        layers.Dropout(0.5),
        layers.Dense(num_classes, activation="softmax"),
    ])

model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv2d (Conv2D)             (None, 26, 26, 32)        320       
                                                                 
 max_pooling2d (MaxPooling2D  (None, 13, 13, 32)       0         
 )                                                               
                                                                 
 conv2d_1 (Conv2D)           (None, 11, 11, 64)        18496     
                                                                 
 max_pooling2d_1 (MaxPooling  (None, 5, 5, 64)         0         
 2D)                                                             
                                                                 
 flatten (Flatten)           (None, 1600)              0         
                                                                 
 dropout (Dropout)           (None, 1600)              0

In [7]:
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])

In [8]:
logdir = os.path.join("logs", datetime.datetime.now().strftime("%Y%m%d-%H%M%S"))
tensorboard_callback = tf.keras.callbacks.TensorBoard(logdir, histogram_freq=1)
model.fit(train_img_convnet, train_label, epochs=20, batch_size=128, callbacks=[tensorboard_callback], validation_data=(validate_img_convnet, validate_label))

Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


<keras.callbacks.History at 0x195113d4be0>

In [9]:
%tensorboard --logdir logs

Reusing TensorBoard on port 6006 (pid 5912), started 19 days, 0:22:35 ago. (Use '!kill 5912' to kill it.)

In [10]:
model.evaluate(test_img_convnet, test_label, verbose=1)



[0.3500271439552307, 0.8939422965049744]

## The performance of the default example Keras CNN on the EMNIST Letters test set is around 87-88% after 20 epochs (the run shown above reached 89.4%). This will be the baseline for comparison.

# 2. Now that you have a baseline convolutional network for comparison, begin experimenting with adjusting hyperparameters and alternative architectures (e.g. adding Dense hidden layers to learn combinations of features). How much can you improve its accuracy on the validation set?
# Use the techniques you learned in Chapters 3 and 4 of the textbook to obtain the highest accuracy you can, including:

### Weight initialization
### Choice of activation function
### Choice of optimizer
### Batch normalization
### Regularization
### Dropout
### Early Stopping

## (You will notice that some of these techniques are already in use in the Simple MNIST convnet example.)
## You may find the slides for Chapter 3 helpful, particularly the presentation “Neural Network Training [Initialization, Preprocessing, Mini-Batching, Tuning, and Other Black Art].”

In [11]:
# Try to obtain the highest accuracy possible by creating a new model with the following parameters added/changed:
# weight initialization, activation function, optimizer, batch normalization, regularization, dropout, and early stopping

def create_new_model():
    model = tf.keras.models.Sequential()
    model.add(tf.keras.Input(shape=input_shape))
    model.add(tf.keras.layers.Conv2D(32, kernel_size=(3, 3), activation="relu"))
    model.add(tf.keras.layers.MaxPooling2D(pool_size=(2, 2)))
    model.add(tf.keras.layers.Conv2D(64, kernel_size=(3, 3), activation="relu"))
    model.add(tf.keras.layers.MaxPooling2D(pool_size=(2, 2)))
    model.add(tf.keras.layers.BatchNormalization())
    model.add(tf.keras.layers.Flatten())
    model.add(tf.keras.layers.Dropout(0.5))
    model.add(tf.keras.layers.Dense(num_classes, activation="softmax"))
    model.compile(loss="categorical_crossentropy", optimizer="adamax", metrics="accuracy", steps_per_execution=32)
    return model

with distribution_strategy.scope():
  new_classifier = create_new_model()
  new_classifier.summary()

Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv2d_2 (Conv2D)           (None, 26, 26, 32)        320       
                                                                 
 max_pooling2d_2 (MaxPooling  (None, 13, 13, 32)       0         
 2D)                                                             
                                                                 
 conv2d_3 (Conv2D)           (None, 11, 11, 64)        18496     
                                                                 
 max_pooling2d_3 (MaxPooling  (None, 5, 5, 64)         0         
 2D)                                                             
                                                                 
 batch_normalization (BatchN  (None, 5, 5, 64)         256       
 ormalization)                                                   
                                                      

In [12]:
logdir = os.path.join("logs", datetime.datetime.now().strftime("%Y%m%d-%H%M%S"))
tensorboard_callback = tf.keras.callbacks.TensorBoard(logdir, histogram_freq=1)
with distribution_strategy.scope():
  new_classifier.fit(train_img_convnet, train_label, epochs=20, batch_size=128, 
                   callbacks=[tensorboard_callback], validation_data=(validate_img_convnet, validate_label))

Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


## The first change made to the model was the addition of a batch normalization layer after the convolutional layers. This caused the Adam optimizer to exhibit noticeably greater variance on the validation set. Similar optimizers like RMSprop and Nadam exhibited the same behavior. The addition of a second batch normalization layer caused these optimizers to fail to converge entirely on the validation set, even as they quickly converged to nearly 100% accuracy on the training set. The batch normalization layers seem to have increased the model's overfitting behavior when used with these optimizers.
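## As a reminder of what the added layer computes, here is a minimal NumPy sketch of the training-time batch normalization forward pass. It is an illustration under stated assumptions, not the Keras implementation: it normalizes each of the 64 channels over the batch and spatial axes, uses the Keras default epsilon of 1e-3, and ignores the moving averages used at inference time. The 256 parameters reported in the summary correspond to four vectors of length 64 (scale, offset, moving mean, moving variance).

```python
import numpy as np

def batch_norm_train(x, gamma, beta, eps=1e-3):
    """Normalize a (batch, height, width, channels) array per channel."""
    # Statistics are computed over every axis except the channel axis.
    mean = x.mean(axis=(0, 1, 2), keepdims=True)
    var = x.var(axis=(0, 1, 2), keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)
    # gamma (scale) and beta (offset) are the learned parameters.
    return gamma * x_hat + beta

rng = np.random.default_rng(0)
# Synthetic stand-in for the (None, 5, 5, 64) feature maps; not real activations.
features = rng.normal(loc=3.0, scale=2.0, size=(128, 5, 5, 64))
gamma = np.ones(64)
beta = np.zeros(64)

normalized = batch_norm_train(features, gamma, beta)
per_channel_mean = normalized.mean(axis=(0, 1, 2))
per_channel_std = normalized.std(axis=(0, 1, 2))
print(per_channel_mean[:3], per_channel_std[:3])
```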

## The Adagrad and Adadelta optimizers exhibited much less variance on the validation set when used with the batch normalization layers. However, they also learned much more slowly than the original model with the Adam optimizer.

## The best performance in terms of loss and accuracy on the validation set was achieved with the Adamax optimizer. Unlike the Adam optimizer, the Adamax optimizer actually performed better on the validation set when used with a batch normalization layer, and after 20 epochs it reached 90% accuracy on both the validation and test sets, which is slightly better than the original model's performance.
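## For reference, the difference between the two update rules can be sketched in a few lines of NumPy. This is a toy illustration of the textbook Adam and Adamax updates on the one-dimensional function f(x) = x**2, not the Keras optimizers themselves, and it does not reproduce the validation behavior described above. The function name `minimize` and the hyperparameter values are assumptions chosen for the example. The only difference is the denominator: Adam divides by a running root-mean-square of the gradients, while Adamax divides by a decayed running maximum of their absolute values.

```python
import numpy as np

def minimize(rule, x0=5.0, lr=0.1, steps=500, beta1=0.9, beta2=0.999, eps=1e-7):
    """Minimize f(x) = x**2 with a textbook Adam or Adamax update."""
    x, m, s = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = 2.0 * x                       # gradient of x**2
        m = beta1 * m + (1 - beta1) * g   # first-moment estimate (both rules)
        if rule == "adam":
            s = beta2 * s + (1 - beta2) * g * g   # running mean of g**2
            m_hat = m / (1 - beta1 ** t)
            s_hat = s / (1 - beta2 ** t)
            x -= lr * m_hat / (np.sqrt(s_hat) + eps)
        elif rule == "adamax":
            s = max(beta2 * s, abs(g))            # decayed running max of |g|
            x -= (lr / (1 - beta1 ** t)) * m / (s + eps)
        else:
            raise ValueError(f"unknown rule: {rule}")
    return x

x_adam = minimize("adam")
x_adamax = minimize("adamax")
print(x_adam, x_adamax)
```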

In [13]:
#Early stopping
new_classifier_ES = create_new_model()
new_classifier_ES.summary()
logdir = os.path.join("logs", datetime.datetime.now().strftime("%Y%m%d-%H%M%S"))
callbackES = tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=10, restore_best_weights=True)
callbackTB = tf.keras.callbacks.TensorBoard(logdir, histogram_freq=1)
new_classifier_ES.fit(x=train_img_convnet, y=train_label, epochs=50, batch_size=128, callbacks=[callbackES, callbackTB], validation_data=(validate_img_convnet, validate_label))

Model: "sequential_2"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv2d_4 (Conv2D)           (None, 26, 26, 32)        320       
                                                                 
 max_pooling2d_4 (MaxPooling  (None, 13, 13, 32)       0         
 2D)                                                             
                                                                 
 conv2d_5 (Conv2D)           (None, 11, 11, 64)        18496     
                                                                 
 max_pooling2d_5 (MaxPooling  (None, 5, 5, 64)         0         
 2D)                                                             
                                                                 
 batch_normalization_1 (Batc  (None, 5, 5, 64)         256       
 hNormalization)                                                 
                                                      

Epoch 46/50
Epoch 47/50
Epoch 48/50
Epoch 49/50
Epoch 50/50


<keras.callbacks.History at 0x19512a3adf0>

## Early stopping with restore_best_weights=True was added to the previous model to address the variance in validation set loss and accuracy. This ensured that the final model used on the test set would have the weights that resulted in the best performance on the validation set.
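## The patience logic can be sketched in plain Python. This is a simplified model of the behavior described above, not the Keras callback: the helper name `simulate_early_stopping` and the loss values are hypothetical, and epochs are counted from zero.

```python
def simulate_early_stopping(val_losses, patience):
    """Return (epoch at which training stops, epoch whose weights are kept)."""
    best_loss = float("inf")
    best_epoch = -1
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            # New best validation loss: remember it and reset the counter.
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    # Patience was never exhausted, so training runs to the last epoch.
    return len(val_losses) - 1, best_epoch

# Made-up validation losses for illustration only.
history = [0.90, 0.60, 0.45, 0.47, 0.44, 0.46, 0.48, 0.50]
stopped_at, restored_from = simulate_early_stopping(history, patience=3)
print(stopped_at, restored_from)
```

## With a patience of 10, as used in the cell above, the same sequence would never trigger a stop, but the best epoch would still be the one whose weights are restored.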

In [14]:
#change weight initializers
def new_model_l2_weight():
    model = tf.keras.models.Sequential()
    model.add(tf.keras.Input(shape=input_shape))
    model.add(tf.keras.layers.Conv2D(32, kernel_size=(3, 3), activation="relu", kernel_initializer=initializers.HeUniform(),
                                     kernel_regularizer=regularizers.l2(0.001), bias_regularizer=regularizers.l2(0.001)))
    model.add(tf.keras.layers.MaxPooling2D(pool_size=(2, 2)))
    model.add(tf.keras.layers.Conv2D(64, kernel_size=(3, 3), activation="relu", kernel_initializer=initializers.HeUniform(),
                                     kernel_regularizer=regularizers.l2(0.001), bias_regularizer=regularizers.l2(0.001)))
    model.add(tf.keras.layers.MaxPooling2D(pool_size=(2, 2)))
    model.add(tf.keras.layers.BatchNormalization())
    model.add(tf.keras.layers.Flatten())
    model.add(tf.keras.layers.Dropout(0.5))
    model.add(tf.keras.layers.Dense(num_classes, activation="softmax",
                                    kernel_regularizer=regularizers.l2(0.001), bias_regularizer=regularizers.l2(0.001)))
    model.compile(loss="categorical_crossentropy", optimizer="adamax", metrics="accuracy", steps_per_execution=32)
    return model

model_IW = new_model_l2_weight()
model_IW.summary()

Model: "sequential_3"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv2d_6 (Conv2D)           (None, 26, 26, 32)        320       
                                                                 
 max_pooling2d_6 (MaxPooling  (None, 13, 13, 32)       0         
 2D)                                                             
                                                                 
 conv2d_7 (Conv2D)           (None, 11, 11, 64)        18496     
                                                                 
 max_pooling2d_7 (MaxPooling  (None, 5, 5, 64)         0         
 2D)                                                             
                                                                 
 batch_normalization_2 (Batc  (None, 5, 5, 64)         256       
 hNormalization)                                                 
                                                      

In [15]:
logdir = os.path.join("logs", datetime.datetime.now().strftime("%Y%m%d-%H%M%S"))
callbackES = tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=10, restore_best_weights=True)
callbackTB = tf.keras.callbacks.TensorBoard(logdir, histogram_freq=1)
model_IW.fit(x=train_img_convnet, y=train_label, epochs=50, batch_size=128, callbacks=[callbackES, callbackTB], validation_data=(validate_img_convnet, validate_label))

Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 21/50
Epoch 22/50
Epoch 23/50
Epoch 24/50
Epoch 25/50
Epoch 26/50
Epoch 27/50
Epoch 28/50
Epoch 29/50
Epoch 30/50
Epoch 31/50
Epoch 32/50
Epoch 33/50
Epoch 34/50
Epoch 35/50
Epoch 36/50
Epoch 37/50
Epoch 38/50
Epoch 39/50
Epoch 40/50
Epoch 41/50
Epoch 42/50
Epoch 43/50
Epoch 44/50
Epoch 45/50
Epoch 46/50
Epoch 47/50
Epoch 48/50
Epoch 49/50
Epoch 50/50


<keras.callbacks.History at 0x19512f97fa0>

## Further adjustments were made to the model by changing some layer initializations and adding L2 regularization. The convolutional layers' initialization was changed to He (Kaiming) initialization, which is recommended for ReLU activation (the uniform variant is also said to be slightly preferable to the normal variant). The default Glorot (Xavier) initialization for the Dense layer is already the recommended choice for softmax activation, so that was not changed. L2 regularization was added to all layers in an attempt to decrease the remaining variance seen in the validation set results.
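## To make these two changes concrete, here is a small NumPy sketch of what they compute. It is an illustration rather than the Keras code: He uniform draws weights from U(-limit, limit) with limit = sqrt(6 / fan_in), and an L2 regularizer with factor 0.001 adds 0.001 times the sum of squared weights to the loss. The helper names are hypothetical.

```python
import numpy as np

def he_uniform(shape, rng):
    """Sample a conv kernel of shape (kh, kw, in_channels, filters)."""
    kh, kw, in_channels, _ = shape
    fan_in = kh * kw * in_channels        # inputs feeding each output unit
    limit = np.sqrt(6.0 / fan_in)
    return rng.uniform(-limit, limit, size=shape), limit

def l2_penalty(weights, factor=0.001):
    """Extra loss term contributed by an L2 regularizer."""
    return factor * np.sum(np.square(weights))

rng = np.random.default_rng(0)
# First convolutional layer: 3x3 kernels, 1 input channel, 32 filters.
kernel, limit = he_uniform((3, 3, 1, 32), rng)
penalty = l2_penalty(kernel)
print(limit, penalty)
```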

## Neither of these adjustments appears to have improved the model's learning and prediction behavior. This is most likely due to the fact that the batch normalization and dropout layers have already corrected for overfitting to a point where other techniques that correct for overfitting will not have any additional benefits. Both this model and the previous model perform almost identically on the validation set, and any slight performance differences tended to favor the earlier model without the additional regularization techniques applied.

# 3. When finished tuning, save your model and evaluate the results on the test set.

In [16]:
new_classifier_ES.save("new_model")
%tensorboard --logdir logs

INFO:tensorflow:Assets written to: new_model\assets


Reusing TensorBoard on port 6006 (pid 5912), started 19 days, 1:58:47 ago. (Use '!kill 5912' to kill it.)

## Since both Adamax models perform almost identically on the validation set, and slight performance differences tended to favor the earlier model without the additional regularization techniques applied, the model without the additional regularization was chosen to be the saved model.

In [17]:
new_classifier_ES.evaluate(test_img_convnet, test_label, verbose=1)
model_IW.evaluate(test_img_convnet, test_label, verbose=1)



[0.4778617024421692, 0.8909134864807129]

# 4. Now build and train a new model for the Binary Alphadigits dataset. What is the best validation accuracy that you can achieve?
## Note: this dataset does not include separate validation and test sets, so you will need to use another method, such as the validation_split parameter rather than validation_data. While it is possible to improve performance using cross-validation, as described in Chapter 4, it is generally regarded as too expensive to train, especially when (as we will soon see) there are other methods.

In [18]:
binary_alphadigits=np.load('binaryalphadigs.npz') # Load the dataset with NumPy.

binary_images = binary_alphadigits['images']
binary_labels = binary_alphadigits['labels']

# Inspect the contents of Binary Alphadigits dataset.
print(binary_images.shape)
print(binary_labels.shape)

# Reshape the size for binary_images.
binary_images = binary_images.reshape(1014, 20, 16)
binary_images2 = np.expand_dims(binary_images, -1)
input_shape2 = (20, 16, 1)

(1014, 320)
(1014, 27)


In [19]:
# Build a model similar to the one from problem 2, but with different hyperparameters to try to improve accuracy.
def create_new_model_3():
    model = tf.keras.models.Sequential()
    model.add(tf.keras.Input(shape=input_shape2))
    model.add(tf.keras.layers.Conv2D(32, kernel_size=(3, 3), activation="relu", 
                                     kernel_initializer=initializers.HeUniform(), 
                                     kernel_regularizer=regularizers.l2(0.001), bias_regularizer=regularizers.l2(0.001)))
    model.add(tf.keras.layers.MaxPooling2D(pool_size=(2, 2)))
    model.add(tf.keras.layers.Conv2D(64, kernel_size=(3, 3), activation="relu",
                                     kernel_initializer=initializers.HeUniform(), 
                                     kernel_regularizer=regularizers.l2(0.001), bias_regularizer=regularizers.l2(0.001)))
    model.add(tf.keras.layers.MaxPooling2D(pool_size=(2, 2)))
    model.add(tf.keras.layers.BatchNormalization())
    model.add(tf.keras.layers.Flatten())
    model.add(tf.keras.layers.Dropout(0.5))
    model.add(tf.keras.layers.Dense(num_classes, activation="softmax",
                                    kernel_regularizer=regularizers.l2(0.001), bias_regularizer=regularizers.l2(0.001)))
    model.compile(loss="categorical_crossentropy", optimizer="adam", metrics="accuracy")
    return model

new_classifier_2 = create_new_model_3()
new_classifier_2.summary()

Model: "sequential_4"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv2d_8 (Conv2D)           (None, 18, 14, 32)        320       
                                                                 
 max_pooling2d_8 (MaxPooling  (None, 9, 7, 32)         0         
 2D)                                                             
                                                                 
 conv2d_9 (Conv2D)           (None, 7, 5, 64)          18496     
                                                                 
 max_pooling2d_9 (MaxPooling  (None, 3, 2, 64)         0         
 2D)                                                             
                                                                 
 batch_normalization_3 (Batc  (None, 3, 2, 64)         256       
 hNormalization)                                                 
                                                      

## The model that performed best on the EMNIST Letters dataset used the Adamax optimizer, no L1 or L2 regularization, and the default Keras initialization for each layer. However, Adamax learned too slowly on the Binary Alphadigits dataset compared to Adam, so the optimizer for the Binary Alphadigits model was changed back to Adam. The initialization changes and L2 regularization were reintroduced to reduce the variance in the Adam-trained model's predictions.

In [20]:
logdir = os.path.join("logs", datetime.datetime.now().strftime("%Y%m%d-%H%M%S"))
callbackES = tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=10, restore_best_weights=True)
callbackTB = tf.keras.callbacks.TensorBoard(logdir, histogram_freq=1)
new_classifier_2.fit(binary_images2, binary_labels, epochs=50, batch_size=128, callbacks=[callbackES, callbackTB], validation_split=0.2)

Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50


<keras.callbacks.History at 0x195135cbeb0>

## Some runs produced validation accuracy over 2%, but this was rare, so the 1.5% accuracy is more representative of the model's performance.

In [21]:
%tensorboard --logdir logs

Reusing TensorBoard on port 6006 (pid 5912), started 19 days, 1:59:00 ago. (Use '!kill 5912' to kill it.)

In [22]:
new_classifier_2.evaluate(binary_images2, binary_labels, verbose=1)



[3.6749494075775146, 0.09368836134672165]

## It is interesting to note that the accuracy reported by evaluate() (about 9%) is higher than the validation accuracy. Note, however, that this evaluation was run on the full Binary Alphadigits array, which includes the training samples, because the dataset has no separate test set; it is therefore not an independent test score as it was for EMNIST Letters. The validation set was also not a random sample: validation_split holds out the last 20% of the samples before shuffling, so if the file is ordered by class, the validation set would consist mostly of letters the model never saw during training, which would explain the very low validation accuracy. Training accuracy was only about 60%, so many samples were not learned well in any case.
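## The effect of holding out the tail of an ordered dataset can be sketched with NumPy. This is a hedged illustration: the labels below are a synthetic stand-in (26 classes with 39 samples each, sorted by class), which is an assumption about the file layout rather than something verified here, and the helper names are hypothetical. The first helper mimics how validation_split takes the last fraction of the arrays; the second shuffles before splitting.

```python
import numpy as np

def tail_split(x, y, validation_split):
    """Hold out the last fraction of the samples, without shuffling."""
    split_at = int(len(x) * (1.0 - validation_split))
    return (x[:split_at], y[:split_at]), (x[split_at:], y[split_at:])

def shuffled_split(x, y, validation_split, seed=0):
    """Shuffle the samples first, then hold out the last fraction."""
    order = np.random.default_rng(seed).permutation(len(x))
    return tail_split(x[order], y[order], validation_split)

# Synthetic, class-ordered stand-in for the dataset (assumption, not real data).
labels = np.repeat(np.arange(26), 39)
images = np.zeros((len(labels), 20, 16), dtype=np.float32)

(_, y_train), (_, y_val) = tail_split(images, labels, 0.2)
(_, y_train_s), (_, y_val_s) = shuffled_split(images, labels, 0.2)

print("tail split validation classes:", sorted(set(y_val.tolist())))
print("shuffled split validation classes:", len(set(y_val_s.tolist())))
```

## Under that ordering assumption, the unshuffled validation set contains only the last few classes, whereas shuffling the arrays before calling fit() spreads every class across both sets.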

# 5. From your experience in experiment (4), what can you conclude about the dataset?

## A: From what I learned in experiment 4, the dataset is much smaller than the EMNIST Letters dataset, which explains why it takes less time to fit and evaluate the model. Based on the training accuracy obtained for Binary Alphadigits, I can also conclude that underfitting occurred due to the small number of training samples.

# 6. The process of transfer learning can be used to apply an existing model to a new dataset. Use the Keras Developer Guide Transfer learning & fine-tuning to apply the model you saved in step (3) to the Binary Alphadigits dataset.
# Note that since the images are different sizes in the two datasets, you will need to use tf.image.resize_with_pad() to get them into the right format.
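## Before resizing, it helps to work out what an aspect-preserving fit of a 20x16 image into 28x28 looks like. The sketch below is plain arithmetic under stated assumptions, not the TensorFlow implementation: the helper name `fit_inside` is hypothetical, and the exact rounding and the way padding is divided between the two sides in tf.image.resize_with_pad() may differ.

```python
def fit_inside(height, width, target_height, target_width):
    """Return (new_height, new_width, pad_rows, pad_cols) for an aspect-preserving fit."""
    # Integer cross-multiplication avoids floating-point rounding surprises.
    if height * target_width >= width * target_height:
        # Height is the limiting side: scale so the height fills the target.
        new_height = target_height
        new_width = width * target_height // height
    else:
        # Width is the limiting side.
        new_width = target_width
        new_height = height * target_width // width
    pad_rows = target_height - new_height
    pad_cols = target_width - new_width
    return new_height, new_width, pad_rows, pad_cols

fitted = fit_inside(20, 16, 28, 28)
print(fitted)
```

## For these images the height fills all 28 rows, the width scales to about 22 columns, and the remaining columns are padding.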

In [23]:
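# Resize the 20x16 images to the 28x28 input expected by the saved model,
# preserving the aspect ratio and padding the remainder with zeros.
resized_image = tf.image.resize_with_pad(
    binary_images2,
    target_height=28,
    target_width=28,
    method=tf.image.ResizeMethod.BILINEAR,
    antialias=False,
)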

In [24]:
#load model
reconstructed_model = keras.models.load_model("new_model")
reconstructed_model.summary()

Model: "sequential_2"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv2d_4 (Conv2D)           (None, 26, 26, 32)        320       
                                                                 
 max_pooling2d_4 (MaxPooling  (None, 13, 13, 32)       0         
 2D)                                                             
                                                                 
 conv2d_5 (Conv2D)           (None, 11, 11, 64)        18496     
                                                                 
 max_pooling2d_5 (MaxPooling  (None, 5, 5, 64)         0         
 2D)                                                             
                                                                 
 batch_normalization_1 (Batc  (None, 5, 5, 64)         256       
 hNormalization)                                                 
                                                      

In [25]:
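# Evaluate the unmodified saved model on the Binary Alphadigits dataset.
# evaluate() does not update weights, so the loaded model can still serve as the base for the new model.
reconstructed_model1 = reconstructed_model  # alias to the same model, not a copy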
reconstructed_model1.evaluate(resized_image, binary_labels, verbose=1)



[2722.514892578125, 0.18047337234020233]

# Is the model you trained on EMNIST Letters able to recognize letters from this new dataset?

## A: Only partially. With about 18% accuracy, the model trained on EMNIST Letters recognizes letters from the new dataset better than random guessing (roughly 4% for 26 letters), but far less reliably than it does on EMNIST Letters itself.

# 7. Can you improve the performance by adding additional trainable layers and fine-tuning the network?

In [26]:
# Build the base from the first four layers (Conv2D, MaxPooling2D, Conv2D, MaxPooling2D), dropping the remaining layers
inner_model = tf.keras.Sequential()
for layer in reconstructed_model.layers[0:4]:
  inner_model.add(layer)

#freeze the layers
for layer in inner_model.layers:
  layer.trainable=False

In [27]:
# construct a model on top of base
inputs = keras.Input(shape=input_shape)
x = inner_model(inputs, training=False)
x = keras.layers.Flatten()(x)
x = keras.layers.Dropout(0.5)(x)
x = keras.layers.Dense(256, activation="relu")(x)
outputs = keras.layers.Dense(num_classes, activation="softmax")(x)


#base + newly constructed layers
transfer_model = keras.Model(inputs, outputs)
transfer_model.summary()


Model: "model"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_6 (InputLayer)        [(None, 28, 28, 1)]       0         
                                                                 
 sequential_5 (Sequential)   (None, 5, 5, 64)          18816     
                                                                 
 flatten_5 (Flatten)         (None, 1600)              0         
                                                                 
 dropout_5 (Dropout)         (None, 1600)              0         
                                                                 
 dense_5 (Dense)             (None, 256)               409856    
                                                                 
 dense_6 (Dense)             (None, 27)                6939      
                                                                 
Total params: 435,611
Trainable params: 416,795
Non-trainable
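## The parameter counts in the summary above can be checked by hand. The sketch below uses the standard formulas for Conv2D and Dense layers with biases; the helper names are hypothetical. The frozen convolutional base accounts for the non-trainable parameters, and the two new Dense layers account for the trainable ones.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Weights plus one bias per filter."""
    return (kernel_h * kernel_w * in_channels + 1) * filters

def dense_params(inputs, units):
    """Weights plus one bias per unit."""
    return (inputs + 1) * units

# Frozen base: Conv2D(32) on 1 channel, then Conv2D(64) on 32 channels.
frozen = conv2d_params(3, 3, 1, 32) + conv2d_params(3, 3, 32, 64)
# New head: Flatten gives 5 * 5 * 64 = 1600 features, then Dense(256) and Dense(27).
trainable = dense_params(5 * 5 * 64, 256) + dense_params(256, 27)

print(frozen, trainable, frozen + trainable)
```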

In [28]:
transfer_model.compile(optimizer=keras.optimizers.Adam(),
              loss=keras.losses.CategoricalCrossentropy(from_logits=False),
              metrics=[keras.metrics.CategoricalAccuracy()])

In [29]:
callback = tf.keras.callbacks.EarlyStopping(monitor='categorical_accuracy', patience=10, restore_best_weights=True)
transfer_model.fit(resized_image, binary_labels, epochs=30, batch_size=128, callbacks=[callback], validation_split=0.1)

Epoch 1/30
Epoch 2/30
Epoch 3/30
Epoch 4/30
Epoch 5/30
Epoch 6/30
Epoch 7/30
Epoch 8/30
Epoch 9/30
Epoch 10/30
Epoch 11/30
Epoch 12/30
Epoch 13/30
Epoch 14/30
Epoch 15/30
Epoch 16/30
Epoch 17/30
Epoch 18/30
Epoch 19/30
Epoch 20/30
Epoch 21/30
Epoch 22/30
Epoch 23/30
Epoch 24/30
Epoch 25/30
Epoch 26/30
Epoch 27/30
Epoch 28/30
Epoch 29/30
Epoch 30/30


<keras.callbacks.History at 0x195134c7f10>

## To our saved model from Part 2, we added a single Dense hidden layer before the output layer. This did not seem to improve performance on the Binary Alphadigits dataset relative to the unchanged Part 2 model, whose baseline accuracy was 22% (18% in the evaluation run shown above). It is unclear to us why validation loss begins to increase consistently (as opposed to merely fluctuating) even as accuracy continues to increase. Some sources attribute this to the model beginning to overfit while it is still learning: cross-entropy loss grows when a few wrong predictions become very confident, even if more samples are classified correctly overall.
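## A small NumPy example shows how loss and accuracy can rise together. The probabilities below are made up for illustration (two classes, four samples, all with true class 0) and are not taken from our training runs; the helper name is hypothetical.

```python
import numpy as np

def loss_and_accuracy(probs, labels):
    """Mean categorical cross-entropy and accuracy for integer labels."""
    picked = probs[np.arange(len(labels)), labels]   # probability of the true class
    loss = float(np.mean(-np.log(picked)))
    accuracy = float(np.mean(np.argmax(probs, axis=1) == labels))
    return loss, accuracy

labels = np.array([0, 0, 0, 0])

# Earlier epoch: two samples right, two wrong, all predictions fairly unsure.
earlier = np.array([[0.60, 0.40], [0.60, 0.40], [0.40, 0.60], [0.40, 0.60]])
# Later epoch: three samples right, but the one miss is extremely confident.
later = np.array([[0.90, 0.10], [0.90, 0.10], [0.55, 0.45], [0.01, 0.99]])

loss_earlier, acc_earlier = loss_and_accuracy(earlier, labels)
loss_later, acc_later = loss_and_accuracy(later, labels)
print(loss_earlier, acc_earlier)
print(loss_later, acc_later)
```

## Accuracy only counts whether the top prediction is right, while cross-entropy also penalizes how confident each wrong prediction is, so a single very confident mistake can outweigh several new correct answers.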

In [30]:
# Fine-tune by unfreezing inner_model; the model is re-compiled in the next cell with a low learning rate
inner_model.trainable = True
transfer_model.summary()


Model: "model"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_6 (InputLayer)        [(None, 28, 28, 1)]       0         
                                                                 
 sequential_5 (Sequential)   (None, 5, 5, 64)          18816     
                                                                 
 flatten_5 (Flatten)         (None, 1600)              0         
                                                                 
 dropout_5 (Dropout)         (None, 1600)              0         
                                                                 
 dense_5 (Dense)             (None, 256)               409856    
                                                                 
 dense_6 (Dense)             (None, 27)                6939      
                                                                 
Total params: 435,611
Trainable params: 435,611
Non-trainable

In [31]:
transfer_model.compile(optimizer=keras.optimizers.Adam(1e-5),
              loss=keras.losses.CategoricalCrossentropy(from_logits=False),
              metrics=[keras.metrics.CategoricalAccuracy()])


In [32]:
callback = tf.keras.callbacks.EarlyStopping(monitor='categorical_accuracy', patience=10)
transfer_model.fit(resized_image, binary_labels, epochs=50, batch_size=128, callbacks=[callback], validation_split=0.1)

Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 21/50
Epoch 22/50
Epoch 23/50
Epoch 24/50
Epoch 25/50
Epoch 26/50
Epoch 27/50
Epoch 28/50
Epoch 29/50
Epoch 30/50
Epoch 31/50
Epoch 32/50
Epoch 33/50
Epoch 34/50
Epoch 35/50
Epoch 36/50
Epoch 37/50
Epoch 38/50
Epoch 39/50
Epoch 40/50
Epoch 41/50
Epoch 42/50
Epoch 43/50
Epoch 44/50
Epoch 45/50


<keras.callbacks.History at 0x19514bb7b80>

In [33]:
transfer_model.save("new_model_2")

INFO:tensorflow:Assets written to: new_model_2\assets


# 8. Another training technique, described in Section 8.4.3 of the textbook, is data augmentation. See Section 8.2 of Deep Learning with Python, Second Edition for details.
# How much can you improve the accuracy of the model using this technique?

In [34]:
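# Data augmentation block, applied as the first layer after the input.
# These layers only transform images during training.
data_augmentation = keras.Sequential([
    layers.RandomFlip("horizontal"),   # mirror images left-to-right at random
    layers.RandomRotation(0.3),        # factor is a fraction of a full turn: up to +/-108 degrees
])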
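## To illustrate what the flip layer does to a batch, here is a minimal NumPy sketch of a random horizontal flip. It is an approximation for illustration only, not the Keras layer: the helper name and the 0.5 flip probability are assumptions, rotation is omitted, and the batch is random noise rather than real images.

```python
import numpy as np

def random_horizontal_flip(batch, rng, probability=0.5):
    """Flip each (height, width, channels) image left-to-right with the given probability."""
    flip_mask = rng.random(len(batch)) < probability
    flipped = batch[:, :, ::-1, :]            # reverse the width axis
    # Broadcast the per-image decision over height, width, and channels.
    return np.where(flip_mask[:, None, None, None], flipped, batch)

rng = np.random.default_rng(0)
# Synthetic stand-in for a batch of 28x28 grayscale images.
batch = rng.random((8, 28, 28, 1)).astype("float32")
augmented = random_horizontal_flip(batch, rng)
print(augmented.shape)
```

## Each output image is either the original or its mirror image, so the model sees a slightly different version of the small dataset on every epoch.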


In [35]:
# reconstruct a base model
reconstructed_model_2 = tf.keras.models.load_model("new_model_2")
reconstructed_model_2.summary()


Model: "model"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_6 (InputLayer)        [(None, 28, 28, 1)]       0         
                                                                 
 sequential_5 (Sequential)   (None, 5, 5, 64)          18816     
                                                                 
 flatten_5 (Flatten)         (None, 1600)              0         
                                                                 
 dropout_5 (Dropout)         (None, 1600)              0         
                                                                 
 dense_5 (Dense)             (None, 256)               409856    
                                                                 
 dense_6 (Dense)             (None, 27)                6939      
                                                                 
Total params: 435,611
Trainable params: 435,611
Non-trainable params: 0

In [36]:
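# Copy every layer of the reloaded model into a new Sequential wrapper
inner_model_2 = tf.keras.Sequential()
for layer in reconstructed_model_2.layers:
    inner_model_2.add(layer)


# Freeze the copied layers so that only the new head is trained at first
for layer in inner_model_2.layers:
    layer.trainable = False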


In [37]:
# create the data augmentation layer on top as the first layer
inputs = keras.Input(shape=input_shape)
x = data_augmentation(inputs)
x = inner_model_2(x, training=False)
x = keras.layers.Dense(128, activation="relu")(x)
outputs = keras.layers.Dense(num_classes, activation="softmax")(x)


transfer_model_2 = keras.Model(inputs, outputs)
transfer_model_2.summary()


Model: "model_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_7 (InputLayer)        [(None, 28, 28, 1)]       0         
                                                                 
 sequential_6 (Sequential)   (None, 28, 28, 1)         0         
                                                                 
 sequential_7 (Sequential)   (None, 27)                435611    
                                                                 
 dense_7 (Dense)             (None, 128)               3584      
                                                                 
 dense_8 (Dense)             (None, 27)                3483      
                                                                 
Total params: 442,678
Trainable params: 7,067
Non-trainable params: 435,611
_________________________________________________________________
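
The parameter counts in the summary above can be checked by hand: a `Dense` layer with a bias has `inputs * units + units` parameters. The sketch below uses a hypothetical helper, `dense_params`, with the layer sizes taken from the two summaries.

```python
def dense_params(inputs, units):
    """Parameter count of a fully connected layer with a bias term."""
    return inputs * units + units


# New trainable head: Dense(128) on the 27-way output, then Dense(27).
head = dense_params(27, 128) + dense_params(128, 27)

# Frozen base model, as reported in the reloaded model's summary.
frozen = 435_611

print("trainable head params:", head)
print("total params:", frozen + head)
```

The results match the `Trainable params: 7,067` and `Total params: 442,678` lines in the summary.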


In [38]:
transfer_model_2.compile(optimizer=keras.optimizers.Adam(),
              loss=keras.losses.CategoricalCrossentropy(from_logits=False),
              metrics=[keras.metrics.CategoricalAccuracy()])

In [39]:
# callback = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=3)
transfer_model_2.fit(resized_image, binary_labels, epochs=100, batch_size=128, validation_split=0.1)

Epoch 1/100
...
Epoch 100/100


<keras.callbacks.History at 0x1951cdeffd0>
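
The `validation_split=0.1` argument in the `fit()` call holds out part of the training arrays for validation. According to the Keras documentation, the held-out samples are taken from the end of the arrays, before any shuffling. The sketch below illustrates that index arithmetic; the function name and the sample count of 1000 are hypothetical, since the size of `resized_image` is not shown here.

```python
def validation_split_indices(num_samples, validation_split):
    """Return (train_indices, val_indices) as ranges, holding out the
    last `validation_split` fraction of the samples for validation."""
    split_at = int(num_samples * (1.0 - validation_split))
    return range(0, split_at), range(split_at, num_samples)


# Hypothetical dataset size, for illustration only.
train_idx, val_idx = validation_split_indices(1000, 0.1)
print(len(train_idx), "training samples,", len(val_idx), "validation samples")
```

Because the split is positional, a dataset sorted by label would give an unrepresentative validation set, so it may be worth shuffling the arrays before calling `fit()`.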

In [40]:
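# Unfreeze the base model for fine-tuning (the model is recompiled in the next cell so the change takes effect)
inner_model_2.trainable = True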
transfer_model_2.summary()

Model: "model_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_7 (InputLayer)        [(None, 28, 28, 1)]       0         
                                                                 
 sequential_6 (Sequential)   (None, 28, 28, 1)         0         
                                                                 
 sequential_7 (Sequential)   (None, 27)                435611    
                                                                 
 dense_7 (Dense)             (None, 128)               3584      
                                                                 
 dense_8 (Dense)             (None, 27)                3483      
                                                                 
Total params: 442,678
Trainable params: 442,678
Non-trainable params: 0
_________________________________________________________________


In [41]:
transfer_model_2.compile(optimizer=keras.optimizers.Adam(1e-5),
              loss=keras.losses.CategoricalCrossentropy(from_logits=False),
              metrics=[keras.metrics.CategoricalAccuracy()])


In [42]:
callback = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=3)
transfer_model_2.fit(resized_image, binary_labels, epochs=30, batch_size=128, callbacks=[callback], validation_split=0.1)


Epoch 1/30
Epoch 2/30
Epoch 3/30
Epoch 4/30


<keras.callbacks.History at 0x19518f670d0>

## A: Data augmentation applies random transformations (here, flips and rotations) to the training images, so the model sees a wider variety of samples from the same small dataset and is less able to simply memorize the training data. As a result, the training accuracy drops and becomes much closer to the validation accuracy, which indicates that the model is overfitting less and should generalize better. The improved generalization is reflected in the increased validation accuracy.
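
One way to quantify the comparison between training and validation accuracy is to compute the per-epoch gap from the `history` dictionary of the `History` object that `fit()` returns. The sketch below assumes the metric is logged under the default name `categorical_accuracy`. The helper `generalization_gap` is hypothetical, and the numbers are made-up placeholders, not results from this notebook.

```python
def generalization_gap(history, metric="categorical_accuracy"):
    """Per-epoch difference between the training metric and its
    validation counterpart (positive means training is ahead)."""
    train_values = history[metric]
    val_values = history["val_" + metric]
    return [t - v for t, v in zip(train_values, val_values)]


# Placeholder values standing in for History.history; not real results.
example_history = {
    "categorical_accuracy": [0.90, 0.95],
    "val_categorical_accuracy": [0.80, 0.82],
}
for epoch, gap in enumerate(generalization_gap(example_history), start=1):
    print(f"epoch {epoch}: gap = {gap:.2f}")
```

A gap that shrinks after adding augmentation, without a drop in validation accuracy, would support the reduced-overfitting reading given in the answer.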