**Cole Turner and Ethan Seal**

Fall 2019

CS343: Neural Networks

Project 4: Transfer Learning

In [None]:
import os
import random
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf

plt.style.use(['seaborn-colorblind', 'seaborn-darkgrid'])
plt.rcParams.update({'font.size': 20})

np.set_printoptions(suppress=True, precision=3)
rgen = np.random.RandomState(1)

# Automatically reload external modules
%load_ext autoreload
%autoreload 2

*Sanity check that Tensorflow is installed correctly:*

Executing the following cell should print 3.

In [None]:
tf.print(tf.reduce_sum([tf.Variable(1), tf.Variable(2)]))

## Task 1) Implement ConvNet4 in Tensorflow, train/test on STL-10

### 1a) Use the high level `Keras::Sequential` API in Tensorflow 2.0 to implement the architecture of ConvNet4 from the last project. Train and test your network on the STL-10 dataset. 

Recall the common `Keras::Sequential` workflow:

- Build structure of network with `keras::Sequential`.
- Compile network with your choice of optimizer, loss, and metrics.
- Fit the model (remembering to pass in the appropriate training and validation sets). This returns a history object that can be used to examine training/validation accuracy and loss.
- Evaluate the model on the test set. This returns test loss and accuracy.
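
The four steps above can be sketched end to end as follows. This is only a minimal illustration, not the required solution: the layer sizes are assumed, the Conv/MaxPool/Dense layout is a guess at a ConvNet4-like structure, and the tiny random arrays (`x_tr`, `y_tr`, `x_va`, `y_va`) are hypothetical stand-ins for the real STL-10 splits.

```python
import numpy as np
import tensorflow as tf

# Tiny random stand-in data (hypothetical), shaped like 32x32 RGB images with 10 classes.
rng = np.random.RandomState(0)
x_tr = rng.rand(64, 32, 32, 3).astype('float32')
y_tr = rng.randint(0, 10, size=64)
x_va = rng.rand(16, 32, 32, 3).astype('float32')
y_va = rng.randint(0, 10, size=16)

# 1) Build the structure (assumed layout and sizes, for illustration only).
net = tf.keras.Sequential([
    tf.keras.Input(shape=(32, 32, 3)),
    tf.keras.layers.Conv2D(32, kernel_size=7, padding='same', activation='relu'),
    tf.keras.layers.MaxPooling2D(pool_size=2, strides=2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(100, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax'),
])

# 2) Compile with an optimizer, a loss for integer class labels, and a metric.
net.compile(optimizer='adam',
            loss='sparse_categorical_crossentropy',
            metrics=['accuracy'])

# 3) Fit, passing the validation set; the returned object stores per-epoch values.
hist = net.fit(x_tr, y_tr, validation_data=(x_va, y_va),
               epochs=2, batch_size=16, verbose=0)

# 4) Evaluate; with one metric this returns [loss, accuracy].
#    (The validation arrays are reused here only because the sketch has no test split.)
test_loss, test_acc = net.evaluate(x_va, y_va, verbose=0)
print(sorted(hist.history.keys()))
```

The per-epoch curves live in `hist.history`, a dictionary keyed by metric name, which is what the plots in 1b need.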

**Notes**:
- You should use the usual STL-10 data acquisition and preprocessing code from your last project.
- You don't need to do a hyperparameter search. Values that worked on the CNN project should get you in the ballpark here. The goal is to show that you know how to put together a `keras::Sequential` model and have it work successfully.
- Tensorflow needs the RGB color channel AFTER the spatial dimensions. For example: (64, 64, 3), not (3, 64, 64). You may therefore need to slightly modify the preprocessing pipeline for this project.
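
If your existing pipeline produces channels-first arrays, one possible fix is a single axis transpose. The sketch below assumes a batch laid out as (N, channels, height, width); the array name `imgs_chw` is hypothetical.

```python
import numpy as np

# Hypothetical batch stored channels-first: (N, channels, height, width).
imgs_chw = np.zeros((5, 3, 64, 64))

# Move the channel axis to the end: (N, height, width, channels).
imgs_hwc = np.transpose(imgs_chw, (0, 2, 3, 1))
print(imgs_hwc.shape)  # (5, 64, 64, 3)
```

`np.moveaxis(imgs_chw, 1, -1)` would give the same layout.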

These documentation pages should be helpful:
- https://www.tensorflow.org/api_docs/python/tf/keras/Sequential
- https://www.tensorflow.org/api_docs/python/tf/keras/Model#compile
- https://www.tensorflow.org/api_docs/python/tf/keras/Model#evaluate
- https://www.tensorflow.org/api_docs/python/tf/keras/Model#fit
- https://www.tensorflow.org/api_docs/python/tf/keras/Model#summary

In [None]:
import load_stl10_dataset
from preprocess_data import preprocess_stl, create_splits

In [None]:
stl_imgs, stl_labels = load_stl10_dataset.load()

In [None]:
classes = np.loadtxt(os.path.join('data', 'stl10_binary', 'class_names.txt'), dtype=str)

In [None]:
# Preprocess image pixel values for the CNN
stl_imgs, stl_labels = preprocess_stl(stl_imgs, stl_labels)
print(f'stl_imgs dtype is {stl_imgs.dtype} and it should be float64')
print(f'stl_imgs max is {np.max(stl_imgs[:, 1:]):.3f} and it should be 0.668')
print(f'stl_imgs shape is {stl_imgs.shape} and it should be (5000, 32, 32, 3)')
print(f'stl_labels span {stl_labels.min()}->{stl_labels.max()} and it should be 0->9')

In [None]:
x_train, y_train, x_test, y_test, x_val, y_val, x_dev, y_dev = create_splits(stl_imgs, stl_labels)  
print ('Train data shape: ', x_train.shape)
print ('Train labels shape: ', y_train.shape)
print ('Test data shape: ', x_test.shape)
print ('Test labels shape: ', y_test.shape)
print ('Validation data shape: ', x_val.shape)
print ('Validation labels shape: ', y_val.shape)
print ('dev data shape: ', x_dev.shape)
print ('dev labels shape: ', y_dev.shape)

### 1b) Make 2 "high quality" plots showing the following

- Plot the training and validation accuracy (y axis) over training epochs (x axis).
- Plot the training and validation loss (y axis) over epochs (x axis).

A high quality plot consists of:
- A useful title
- X and Y axis labels
- A legend
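
A plot meeting these three criteria might be produced with a small helper like the one below. The function name `plot_curves` and its arguments are hypothetical, and the numbers in the example call are placeholders to show the call signature, not real results.

```python
import matplotlib.pyplot as plt

def plot_curves(train_vals, val_vals, metric_name):
    """Plot one metric for the training and validation sets across epochs."""
    epochs = range(1, len(train_vals) + 1)
    fig, ax = plt.subplots(figsize=(8, 5))
    ax.plot(epochs, train_vals, marker='o', label=f'Training {metric_name}')
    ax.plot(epochs, val_vals, marker='o', label=f'Validation {metric_name}')
    ax.set_title(f'Training vs. validation {metric_name}')  # useful title
    ax.set_xlabel('Epoch')                                  # x axis label
    ax.set_ylabel(metric_name.capitalize())                 # y axis label
    ax.legend()                                             # legend
    return fig, ax

# Placeholder values only, to demonstrate the call.
fig, ax = plot_curves([0.2, 0.4, 0.5], [0.2, 0.3, 0.4], 'accuracy')
```

With a Keras history object, the two lists would typically come from entries such as `history.history['accuracy']` and `history.history['val_accuracy']`.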

**Question 1:** What accuracy do you get on the STL-10 test set? Briefly summarize the hyperparameters that you used to obtain this result.

**Question 2:** How do the loss and accuracy results compare to the CNN project?

## Task 2) Transfer learning

Use Tensorflow 2.0 to download the pre-trained MobileNetV2 network (you may also use InceptionV3, but MobileNetV2 likely will run noticeably faster on your machine). We will use transfer learning to accelerate training to solve a novel problem: **the binary classification task of discriminating whether an image is of a hotdog or not.**

### Overview of the task

- Run some hotdog-or-not dataset images through the network. Does the net seem to classify things correctly?
- Remove the output layer.
- Add a new Dense output layer.
- Freeze (disable) training on all non-output layers.
- Train the last layer on a food dataset. Assess performance. Plot some example images and their classifications below.

**TODO:**
- Get the **food dataset** on filer: `Courses/CS343/Course_Materials/hot-dog-not-hot-dog`. Copy it into a `data` subfolder in your project directory.
- Run the code below to load in the hot-dog-or-not dataset. Check the shapes to ensure everything is loaded in correctly.

### 2a) Load in hotdog image dataset

In [None]:
ds_base_dir = 'data/hot-dog-not-hot-dog/numpy/'
hotdog_train_x = np.load(os.path.join(ds_base_dir, 'train_x.npy'))
hotdog_train_y = np.load(os.path.join(ds_base_dir, 'train_y.npy'))
hotdog_test_x = np.load(os.path.join(ds_base_dir, 'test_x.npy'))
hotdog_test_y = np.load(os.path.join(ds_base_dir, 'test_y.npy'))

print(f'Training hotdog split shape: {hotdog_train_x.shape}. Should be (16000, 96, 96, 3)')
print(f'Test hotdog split shape: {hotdog_test_x.shape}. Should be (4000, 96, 96, 3)')

### 2b) Preprocess hotdog dataset

In [None]:
hotdog_train_x = 2*(hotdog_train_x - 0.5)
hotdog_test_x = 2*(hotdog_test_x - 0.5)
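
To see what the rescaling in this cell does, here is a tiny check on made-up values. It assumes the loaded pixel values already lie in [0, 1], in which case `2*(x - 0.5)` maps them onto [-1, 1], the input range that the Keras MobileNetV2 preprocessing targets. The array `pixels` is hypothetical.

```python
import numpy as np

# Hypothetical pixel values, assumed to lie in [0, 1].
pixels = np.array([0.0, 0.25, 0.5, 1.0])

# Same transform as the cell above: 0 -> -1, 0.5 -> 0, 1 -> 1.
rescaled = 2 * (pixels - 0.5)
print(rescaled.min(), rescaled.max())
```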

### 2c) Create hotdog validation set

In [None]:
VAL_PROP = 0.2  # proportion of training set to reserve for validation
VAL_SZ = int(VAL_PROP*len(hotdog_train_x))
hotdog_val_x = hotdog_train_x[-VAL_SZ:]
hotdog_val_y = hotdog_train_y[-VAL_SZ:]
hotdog_train_x = hotdog_train_x[:len(hotdog_train_x)-VAL_SZ]
hotdog_train_y = hotdog_train_y[:len(hotdog_train_y)-VAL_SZ]

print(f'Training hotdog split shape: {hotdog_train_x.shape}. Should be (12800, 96, 96, 3)')
print(f'Validation hotdog split shape: {hotdog_val_x.shape}. Should be (3200, 96, 96, 3)')

### 2d) Load in pre-trained MobileNetV2 network.

**TODO:**
- Load in pre-trained MobileNetV2 network (look up constructor in `tf.keras.applications` or look at the tutorial from class) and set it to a variable called `model`. https://www.tensorflow.org/api_docs/python/tf/keras/applications
- Set the `trainable` field of the model object to be `False` (use dot notation).
- If you call the `summary()` method on the network object, you should see a table with many rows. The top and bottom rows should be:


    input_3 (InputLayer)            [(None, 96, 96, 3)]  0                                           
    __________________________________________________________________________________________________
    out_relu (ReLU)                 (None, 3, 3, 1280)   0           Conv_1_bn[0][0]  

and you should see the following at the bottom:

    Total params: 2,257,984
    Trainable params: 0
    Non-trainable params: 2,257,984

### 2e) Replace output layer

**TODO:**
- Create a Dense layer object with the correct number of units to deal with the hot-dog or not problem, which will be the new output layer. https://www.tensorflow.org/api_docs/python/tf/keras/layers/Dense
- Create a new `keras::Sequential` model object composed of a Python list of the following:
    - the original model object
    - a Flatten layer
    - the dense output layer
- Compile the augmented model with the Adam optimizer (learning rate of 0.0001), binary_crossentropy loss, and accuracy metric. https://www.tensorflow.org/api_docs/python/tf/keras/optimizers/Adam https://www.tensorflow.org/api_docs/python/tf/losses
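
Steps 2d and 2e together might look roughly like the sketch below. It is one possible arrangement, not the required solution: the function name `build_hotdog_model` is hypothetical, and the sigmoid activation on the single output unit is an assumption chosen to pair with binary cross-entropy. Calling it with the default `weights='imagenet'` downloads the pre-trained weights.

```python
import tensorflow as tf

def build_hotdog_model(weights='imagenet'):
    """Frozen MobileNetV2 base, then Flatten, then a new Dense output layer."""
    # include_top=False drops the original ImageNet classifier layers.
    base = tf.keras.applications.MobileNetV2(
        input_shape=(96, 96, 3), include_top=False, weights=weights)
    base.trainable = False  # freeze every layer of the pre-trained network

    # New output layer (sigmoid is an assumption, see the lead-in).
    head = tf.keras.layers.Dense(1, activation='sigmoid')

    model = tf.keras.Sequential([base, tf.keras.layers.Flatten(), head])
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.0001),
                  loss='binary_crossentropy',
                  metrics=['accuracy'])
    return model
```

Usage would be `model = build_hotdog_model()` followed by `model.summary()` to compare against the expected table.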

A summary on the augmented model should yield:

    Layer (type)                 Output Shape              Param #   
    =================================================================
    mobilenetv2_1.00_96 (Model)  (None, 3, 3, 1280)        2257984   
    _________________________________________________________________
    flatten_4 (Flatten)          (None, 11520)             0         
    _________________________________________________________________
    dense_4 (Dense)              (None, 1)                 11521     
    =================================================================
    Total params: 2,269,505
    Trainable params: 11,521
    Non-trainable params: 2,257,984
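
The parameter counts in this summary can be checked by hand. The short calculation below uses only numbers from the table; the variable names are for illustration.

```python
# Numbers taken from the summary table above.
flat_features = 3 * 3 * 1280          # Flatten turns (3, 3, 1280) into one vector
dense_params = flat_features * 1 + 1  # one weight per feature plus one bias, for 1 unit
base_params = 2257984                 # frozen MobileNetV2 parameters
total_params = base_params + dense_params
print(flat_features, dense_params, total_params)  # 11520 11521 2269505
```

Only the 11,521 Dense parameters are trainable because the base model is frozen.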

**Question 3:** What are the accuracy and loss for the network with the untrained output layer?

**Question 4:** Briefly defend your choice of number of units in the output layer.

### 2f) Fit the augmented model on the hotdog training data

**Notes**
- Remember to also pass in the hotdog validation data. Train for 10 epochs with a batch size of 32.
- Setting the verbose optional parameter to 2 will give you helpful printouts of performance on the validation set as it completes every epoch of training.


**NOTE:**
- If training time is taking much more than 2.5 minutes per epoch on your computer, you could try reducing the number of data samples in train and validation. For example, by default train `N = 12800`. Try `N = 6400` instead. You could do the same for the validation set.
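
One way to shrink a split while keeping images and labels aligned is to index both arrays with the same random subset. The helper `subsample` and the small demo arrays below are hypothetical; in the notebook the inputs would be the hotdog training (or validation) arrays.

```python
import numpy as np

def subsample(x, y, n, seed=0):
    """Return n randomly chosen (sample, label) pairs, keeping them aligned."""
    rng = np.random.RandomState(seed)
    idx = rng.choice(len(x), size=n, replace=False)  # n distinct indices
    return x[idx], y[idx]

# Small stand-in arrays so the sketch runs quickly.
demo_x = np.zeros((100, 96, 96, 3), dtype='float32')
demo_y = np.arange(100) % 2
small_x, small_y = subsample(demo_x, demo_y, n=50)
print(small_x.shape, small_y.shape)  # (50, 96, 96, 3) (50,)
```

Sampling at random rather than slicing the first `N` rows guards against the case where the stored data happen to be ordered by class.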

### 2g) Plot hotdog results

Produce 2 high quality plots showing the following:

- Training and validation loss over epoch.
- Training and validation accuracy over epoch.

**Question 5:** What accuracy do you achieve on the test set? Briefly summarize the hyperparameters that were used in your model.