In [None]:
%matplotlib inline

# Assignment 4

**DUE: Sunday November 21, 2021 11:59pm**

Turn in the assignment via Canvas.

To write legible answers you will need to be familiar with both [Markdown](https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet) and [LaTeX](https://www.latex-tutorial.com/tutorials/amsmath/).

Before you turn this problem in, make sure everything runs as expected. First, restart the kernel (in the menubar, select Runtime → Restart runtime) and then run all cells (in the menubar, select Runtime → Run All).

Make sure you fill in any place that says "YOUR CODE HERE" or "YOUR ANSWER HERE", as well as your name below:

In [None]:
NAME = "Suneet Bhandari"
STUDENT_ID = "1704322"

## Gesture Recognition
---
**BEFORE YOU START: Change the runtime to GPU. From the "Runtime" dropdown menu in the toolbar above, select "Change runtime type". Then change "Hardware accelerator" to GPU.**


American Sign Language (ASL) is a complete, complex language that employs signs made by moving the hands combined with facial expressions and postures of the body. It is the primary language of many North Americans who are deaf and is one of several communication options used by people who are deaf or hard-of-hearing.

The hand gestures representing the English alphabet are shown below. In this question, you will focus on classifying these hand gesture images using convolutional neural networks. Specifically, given an image of a hand showing one of the letters, we want to detect which letter is being represented.


<img src = 'https://drive.google.com/uc?id=1nRxq6yqDkmumUuePXfDx_5YgGl9vKXcj' width="300">






Run the following code cell to download the training and test data. It might take a while to download the zip file and extract it.

In [None]:
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials
import io
import zipfile
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)
file_id = '11V_w6LMLhGcdTU-dX_mw_SZxpHmHNq-x'
downloaded = drive.CreateFile({'id': file_id})
downloaded.GetContentFile('asl_alphabet_1000.zip')
!unzip -q -n asl_alphabet_1000.zip  # -n: never overwrite existing files, so re-running the cell does not prompt


Now that you have downloaded the data, you should see a directory containing 26 subdirectories that hold the hand gesture images. Notice that the subdirectories are named after the classes. Each subdirectory contains 1000 RGB images for its respective class, and each RGB image has a height and width of $200\times 200$. In Questions 1, 2, and 3, you will use TensorFlow's `image_dataset_from_directory()` to create training and validation sets. The following cells implement a small tutorial for visualizing some training examples.

Code examples adapted from: https://www.tensorflow.org/tutorials/load_data/images

### Create a dataset
Define some parameters for the loader:

In [None]:
import numpy as np
import os
import PIL
import PIL.Image
import tensorflow as tf

batch_size = 32 # The batch size
img_height = 200 # Image resize height
img_width = 200 # Image resize width
data_dir = "asl_alphabet_train_1000" # Data directory; you may need to change to location of asl_alphabet_train_1000

Use 80% of the images for training and 20% for validation.

In [None]:
!ls asl_alphabet_train_1000

In [None]:
# Create training dataset
train_ds = tf.keras.preprocessing.image_dataset_from_directory(
  data_dir,
  validation_split=0.2,
  subset="training",
  seed=123,
  image_size=(img_height, img_width),
  batch_size=batch_size)

In [None]:
# Create validation dataset
val_ds = tf.keras.preprocessing.image_dataset_from_directory(
  data_dir,
  validation_split=0.2,
  subset="validation",
  seed=123,
  image_size=(img_height, img_width),
  batch_size=batch_size)
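As a sanity check on the 80/20 split, the expected dataset sizes can be worked out by hand. The sketch below assumes that all 26 × 1000 images are found and that the final partial batch is kept; the variable names are illustrative and not part of the assignment code.

```python
import math

num_classes = 26          # one subdirectory per letter A-Z
images_per_class = 1000   # stated size of each subdirectory
batch_size = 32
val_percent = 20          # validation_split=0.2, written as an integer percentage

total = num_classes * images_per_class
num_val = total * val_percent // 100
num_train = total - num_val

# A batched dataset yields ceil(n / batch_size) batches when the last
# partial batch is kept.
train_batches = math.ceil(num_train / batch_size)
val_batches = math.ceil(num_val / batch_size)

print(num_train, num_val)          # 20800 5200
print(train_batches, val_batches)  # 650 163
```

If the assumption holds, these counts should agree with the file counts reported when the two datasets above are created.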

Print the class names stored in the `class_names` attribute of these datasets. Notice that the class names are inferred from the subdirectory names.




In [None]:
class_names = train_ds.class_names
print(class_names)

### Visualize the data


Here are the first 9 images from the training dataset.

In [None]:
import matplotlib.pyplot as plt

plt.figure(figsize=(10, 10))
for images, labels in train_ds.take(1):
  for i in range(9):
    ax = plt.subplot(3, 3, i + 1)
    plt.imshow(images[i].numpy().astype("uint8"))
    plt.title(class_names[labels[i]])
    plt.axis("off")

Here are the first 9 images from the validation dataset.

In [None]:
plt.figure(figsize=(10, 10))
for images, labels in val_ds.take(1):
  for i in range(9):
    ax = plt.subplot(3, 3, i + 1)
    plt.imshow(images[i].numpy().astype("uint8"))
    plt.title(class_names[labels[i]])
    plt.axis("off")

You can train a model using these datasets by passing them to model.fit (shown later in this tutorial). Let's retrieve one batch from the training dataset and check the outputs.

In [None]:
for features_batch, labels_batch in train_ds:
  print(features_batch.shape)
  print(labels_batch.shape)
  break
# Number of batches in the training dataset
count = len(train_ds)
print(count)



features_batch is a single batch from the training dataset. It contains $32$ images, each of size $200 \times 200 \times 3$ (height $\times$ width $\times$ channels). The last dimension has size 3 and holds the RGB values of each pixel. The labels have shape (32,), one class label per image.
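To make the batch shape concrete, here is a small sketch of the sizes involved. It assumes each pixel value is stored as a 32-bit float (4 bytes); that storage type is an assumption for the arithmetic, and the variable names are illustrative.

```python
batch_size, height, width, channels = 32, 200, 200, 3

# Number of values in one image once it is flattened into a vector
features_per_image = height * width * channels

# Number of values in one batch, and its size in memory assuming
# 4 bytes per value (float32)
values_per_batch = batch_size * features_per_image
bytes_per_batch = values_per_batch * 4

print(features_per_image)     # 120000
print(values_per_batch)       # 3840000
print(bytes_per_batch / 1e6)  # 15.36
```

The 120,000 values per image are what a fully-connected network has to take in once the image is flattened.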

##  Question 1 - Fully-Connected Neural Network
---

### Part A) Understanding and Processing the Data (10 points)

Now that you downloaded the data, you should see a folder containing the images in their respective subdirectories. Complete the following steps (you may reuse the code from the tutorial):

1) read in the training and test data.

2) make sure that all of your images are of size $200\times 200$. If not, scale them appropriately.

3) rescale the pixel values of the training and test images from [0,255] to [0,1]. <br> Hint: tf.keras.layers.experimental.preprocessing.Rescaling(1./255) is recommended (see the example at https://www.tensorflow.org/tutorials/load_data/images).

4) Ensure that your target values (classes) are stored appropriately. You must have 26 classes for 'A-Z'.

In [None]:
normalization_layer = tf.keras.layers.Rescaling(1./255)
normalized_ds = train_ds.map(lambda x, y: (normalization_layer(x), y))
image_batch, labels_batch = next(iter(normalized_ds))
normalized_vs = val_ds.map(lambda x, y: (normalization_layer(x), y))
image_batch2, labels_batch2 = next(iter(normalized_vs))
first_image = image_batch[0]
first_image2 = image_batch2[0]
# Notice the pixel values are now in `[0,1]`.
print(np.min(first_image), np.max(first_image))
print(np.min(first_image2), np.max(first_image2))
#for features_batch, labels_batch in train_ds:
  #print(features_batch.shape)
  #print(labels_batch.shape)



### Part B) Building a Fully-Connected Neural Network (10 points)
Now that the dataset is downloaded, let's see what happens when we try to allocate a fully-connected neural network with a flatten layer, two hidden layers and an output layer to take in the $200 \times 200 \times 3$ images. Recall that our batch size is still 32. Don't spend too much time on this model.

In [None]:
import tensorflow as tf
from tensorflow.keras.models import Model, Sequential
from tensorflow.keras.layers import Dense, Dropout, Flatten, Activation, BatchNormalization, Conv2D, MaxPooling2D
from tensorflow.keras.layers import Input, Dense # only use these layers
from tensorflow.keras.optimizers import * # you can use any optimizer

# Create the network

def build_model2():
  model = Sequential()
  # Declare the image shape and batch size so Flatten receives (200, 200, 3) images
  model.add(Input(shape=(200, 200, 3), batch_size=32))
  model.add(Flatten())  # 200*200*3 = 120000 features per image
  model.add(Dense(units = 32, activation='relu'))
  model.add(Dense(units = 32, activation='relu'))
  model.add(Dense(units = len(class_names), activation='softmax'))
  return model

# Build model
model = build_model2()

# Get model summary
model.summary()

Assuming the code is correct, did you get any errors upon running the cell? If so, why do you think this error occurred? Also, how many parameters does the above model have?

There were some errors initially, saying that the model had not been built and that the network inputs were wrong. I fixed these with simple syntax fixes and by specifying the input dimensions and the batch size in the model. This model has 3,841,946 parameters.
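The parameter count can be checked by hand: a dense layer has one weight per input-unit pair plus one bias per unit. The sketch below assumes the layer sizes used in the model above (a flattened $200 \times 200 \times 3$ input, two hidden layers of 32 units, and 26 outputs); `dense_params` is an illustrative helper, not a Keras function.

```python
def dense_params(n_inputs, n_units):
    """Weights plus biases of one fully-connected layer."""
    return n_inputs * n_units + n_units

# Flattened input, two hidden layers, output layer
layer_sizes = [200 * 200 * 3, 32, 32, 26]

per_layer = [dense_params(n_in, n_out)
             for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

print(per_layer)       # [3840032, 1056, 858]
print(sum(per_layer))  # 3841946
```

Almost all of the parameters sit in the first layer, because each of the 120,000 input values is connected to every hidden unit.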


##  Question 2 - Convolutional Neural Networks
---
You have seen the shortcomings of using a fully-connected neural network for image recognition tasks. You will now build a convolutional network. For the rest of this assignment, we are not going to give you any starter code. You are welcome to use any code from previous class exercises, section handouts, and lectures. You may reuse your training and validation sets that you created in Question 1. You should also write your own code.

You may use the TensorFlow documentation freely. You might also find online tutorials helpful. However, all code that you submit must be your own.

Make sure that your code is vectorized, and does not contain obvious inefficiencies (for example, unnecessary for loops). Ensure enough comments are included in the code so that your TA can understand what you are doing. It is your responsibility to show that you understand what you write.

Follow the steps below to show your work.

#### Part A) Building the Network (15 points)
Build a convolutional neural network model that takes the ($200\times 200 \times 3$ RGB) image as input, and predicts the letter.

Explain your choice of the architecture: how many layers did you choose? What types of layers did you use? Did you use dropout or normalization layers? What about other decisions like activation functions, kernel size, stride, and padding? Lastly, how many parameters does your model have?

In [None]:
model = Sequential()

model.add(Conv2D(filters = 16,      
                 kernel_size = (3, 3), 
                 padding = 'Same',
                 activation = 'relu', 
                 input_shape = (200, 200, 3)))

model.add(MaxPooling2D(pool_size = (2, 2)))
model.add(BatchNormalization())
model.add(Conv2D(filters = 32,      
                 kernel_size = (3, 3), 
                 padding = 'Same',
                 activation = 'relu'))

model.add(MaxPooling2D(pool_size = (2, 2)))

model.add(Flatten())

model.add(Dense(200, activation = 'relu'))  

model.add(Dense(26, activation = "softmax")) 

model.summary()

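I added multiple layers to the model, with an input size of $200\times 200\times 3$ because that is the size of each image. The network has two convolutional layers with ReLU activations, each followed by max pooling, with batch normalization after the first pooling layer, then a flatten layer, a 200-unit dense layer, and the output layer.
The output layer has 26 units because there are 26 classes. The kernel size is (3, 3), and 'same' padding keeps the height and width unchanged through each convolution so that pixels at the image border are not dropped. The total number of parameters is 16,010,578.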
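The reported total can be reproduced layer by layer. This is a sketch under the assumption that the architecture is exactly the one in the cell above; the helper names are illustrative, and batch normalization is counted with four values per channel (scale, offset, moving mean, moving variance).

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Weights plus biases of one convolutional layer."""
    return kernel_h * kernel_w * in_channels * filters + filters

def dense_params(n_inputs, n_units):
    """Weights plus biases of one fully-connected layer."""
    return n_inputs * n_units + n_units

conv1 = conv2d_params(3, 3, 3, 16)    # 200x200x3 -> 200x200x16 ('same' padding)
# 2x2 max pooling halves height and width: 200 -> 100
batch_norm = 4 * 16                   # four values per channel
conv2 = conv2d_params(3, 3, 16, 32)   # 100x100x16 -> 100x100x32
# second 2x2 max pooling: 100 -> 50
flattened = 50 * 50 * 32
dense1 = dense_params(flattened, 200)
dense2 = dense_params(200, 26)

total = conv1 + batch_norm + conv2 + dense1 + dense2
print(conv1, batch_norm, conv2, dense1, dense2)  # 448 64 4640 16000200 5226
print(total)                                     # 16010578
```

The first dense layer dominates the count, so adding more convolution and pooling stages before the flatten layer would likely be the most direct way to shrink the model.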

#### Part B) Training the Network (15 points)
Write code that trains your CNN given the training data (check the dataset tutorial to see how to use .fit with your custom dataset). Your training code should make it easy to tweak the usual hyperparameters, like batch size, learning rate, and the model object itself. Make sure that you are checkpointing your models from time to time (the frequency is up to you). Explain your choice of loss function and optimizer.

Plot the training curve as well.

In [None]:

batchsize = 32
# Each epoch goes through the entire training set once
epochs = 1 

model.compile(optimizer = Adagrad(learning_rate = 0.001),  # `lr` is a deprecated alias
              loss = 'sparse_categorical_crossentropy', 
              metrics = ['accuracy'])

history = model.fit(train_ds,validation_data=val_ds,epochs = epochs, batch_size = batchsize,verbose = 1)
#model.fit(train_ds,validation_data=val_ds,epochs = 50)

# Loss curves for the training and validation sets
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('Model Loss')
plt.ylabel('Loss')
plt.xlabel('Epochs')
plt.legend(['train', 'validation'])  # second curve comes from val_ds, not a test set
plt.show()

# Accuracy curves for the training and validation sets
plt.plot(history.history['accuracy'])
plt.plot(history.history['val_accuracy'])
plt.title('Model Accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epochs')
plt.legend(['train', 'validation'])
plt.show()
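The prompt for this part also asks for periodic checkpoints. One way to do this in Keras is a `ModelCheckpoint` callback passed to `fit`. The sketch below is a minimal example under assumptions: the helper name `make_checkpoint_callback`, the `checkpoints` directory, the file name pattern, and the choice to save weights only when validation accuracy improves are all illustrative and not part of the assignment.

```python
import os
import tensorflow as tf

def make_checkpoint_callback(checkpoint_dir="checkpoints"):
    """Hypothetical helper: save weights whenever val_accuracy improves."""
    os.makedirs(checkpoint_dir, exist_ok=True)
    return tf.keras.callbacks.ModelCheckpoint(
        filepath=os.path.join(checkpoint_dir, "cnn_epoch_{epoch:02d}.weights.h5"),
        save_weights_only=True,   # store weights only, not the full model
        monitor="val_accuracy",   # requires validation_data in fit()
        mode="max",
        save_best_only=True,
    )

checkpoint_cb = make_checkpoint_callback()
# Usage (assumes model, train_ds, val_ds and epochs are defined as above):
# history = model.fit(train_ds, validation_data=val_ds, epochs=epochs,
#                     callbacks=[checkpoint_cb])
```

Saving only the best weights keeps disk usage small; saving every epoch instead would only require dropping `save_best_only`.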

[YOUR ANSWER HERE]

#### Part C) Hyperparameter Search (15 points)

1. List 3 hyperparameters that you think are most worth tuning. Choose at least one hyperparameter related to the model architecture.

2. Tune the hyperparameters you listed previously, trying as many values as you need to until you feel satisfied that you are getting a good model. Plot the training curve of at least 4 different hyperparameter settings.

3. Choose the best model out of all the ones that you have trained. Justify your choice.

4. Report the test accuracy of your best model. You should only do this step once.
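Step 3 can be made more systematic by comparing runs on their best validation accuracy. The sketch below assumes each run's metrics are available as a dictionary shaped like `history.history`; the function name `best_epoch`, the run names, and all numbers are made up for illustration and are not results from this notebook.

```python
def best_epoch(history_dict, metric="val_accuracy"):
    """Return (1-based epoch, value) where the metric is highest."""
    values = history_dict[metric]
    best_index = max(range(len(values)), key=lambda i: values[i])
    return best_index + 1, values[best_index]

# Made-up numbers for illustration only; in the notebook each entry
# would be history.history from a real training run.
example_runs = {
    "run_a": {"val_accuracy": [0.41, 0.55, 0.60]},
    "run_b": {"val_accuracy": [0.72, 0.90, 0.88]},
}

summary = {name: best_epoch(h) for name, h in example_runs.items()}
best_run = max(summary, key=lambda name: summary[name][1])

print(summary)   # {'run_a': (3, 0.6), 'run_b': (2, 0.9)}
print(best_run)  # run_b
```

Selecting on validation accuracy rather than test accuracy keeps the test set untouched until the single final evaluation in step 4.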







In [None]:


batchsize = 26
# Each epoch goes through the entire training set once
epochs = 50  

model.compile(optimizer = Adam(learning_rate = 3e-4),  # `lr` is a deprecated alias
              loss = 'sparse_categorical_crossentropy', 
              metrics = ['accuracy'])

history = model.fit(train_ds,validation_data=val_ds,epochs = epochs, batch_size = batchsize,verbose = 1)
#model.fit(train_ds,validation_data=val_ds,epochs = 50)

# Loss curves for the training and validation sets
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('Model Loss')
plt.ylabel('Loss')
plt.xlabel('Epochs')
plt.legend(['train', 'validation'])  # second curve comes from val_ds, not a test set
plt.show()

# Accuracy curves for the training and validation sets
plt.plot(history.history['accuracy'])
plt.plot(history.history['val_accuracy'])
plt.title('Model Accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epochs')
plt.legend(['train', 'validation'])
plt.show()

The three hyperparameters that I think are most worth tuning are the number of hidden layers, the learning rate, and the optimizer. I changed the batch size, the number of epochs, the optimizer, and the learning rate. I chose the second model since it produces better results than the first one: the accuracy on both the validation and training data was much better. The best test accuracy I saw was 0.9696.

## Question 3 - Transfer Learning
---
For many image classification tasks, it is generally not a good idea to train a very large deep neural network model from scratch due to the enormous compute requirements and lack of sufficient amounts of training data.

One of the better options is to try using an existing model that performs a similar task to the one you need to solve. This method of utilizing a pre-trained network for other similar tasks is broadly termed Transfer Learning. In this assignment, we will use Transfer Learning to extract features from the hand gesture images. Then we will train a smaller network that takes these features as input and classifies the hand gestures.

As you have learned from the CNN lecture, convolution layers extract various features from the images which get utilized by the fully-connected layers for correct classification.


Keras even has pretrained models built in for this purpose. 

#### Keras Pretrained Models
        Xception
        VGG16
        VGG19
        ResNet, ResNetV2, ResNeXt
        InceptionV3
        InceptionResNetV2
        MobileNet
        MobileNetV2
        DenseNet
        NASNet

Usually one uses the layers of the pretrained model up to some point, and then creates some fully connected layers to learn the desired recognition task. The earlier layers are "frozen", and only the later layers need to be trained. We'll use VGG16, which was trained to recognize 1000 objects in ImageNet. What we're doing here for our classifier may be akin to killing a fly with a shotgun, but the same process can be used to recognize objects the original network couldn't (i.e., you could use this technique to train your computer to recognize family and friends).

In [None]:
# Some stuff we'll need...
from tensorflow.keras.layers import Input
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input
from tensorflow.keras.preprocessing import image
from tensorflow.keras.models import Model
from tensorflow.keras import backend as K
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import Adam, SGD, Adagrad, Adadelta, RMSprop 
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.layers import Dropout, Flatten, Activation
from tensorflow.keras.layers import Conv2D, MaxPooling2D

# Powerful deep learning module.
import tensorflow as tf

# For dealing with data.
import numpy as np  

Creating this pretrained network is a one line command. Notice we specified that the "top" should not be included. We aren't classifying 1000 different categories like ImageNet, so we don't include that layer. We'll add our own layer more suited to the task at hand.

In [None]:
# Import the VGG16 trained neural network model, minus its last (top) neuron layer.
base_model = VGG16(weights = 'imagenet', 
                   include_top = False, 
                   input_shape = (200, 200, 3), 
                   pooling = None)

Let's take a look at this pretrained model:

In [None]:
base_model.summary()

Please do realize, this may be overkill for our toy recognition task. One could use this network with some layers (as we're about to add) to recognize 100 dog breeds or to recognize all your friends. If you wanted to recognize 100 dog breeds, you would use a final 100 neuron softmax for the final layer. We'll need a final softmax layer as before. First let's freeze all these pretrained weights. They are fine as they are.

In [None]:
# This freezes the weights of our VGG16 pretrained model.
for layer in base_model.layers:  
    print(layer)
    layer.trainable = False

### Part A) Building the Classifier (10 points)
Now let's just add a flatten layer, a trainable dense layer, and a final softmax layer to the network to complete the classifier model for our gesture recognition task. Use Keras' functional approach to building a network.

In [None]:
# Now add layers to our pre-trained base model and add classification layers on top of it
#x = ### YOUR CODE HERE ### 
x = base_model.output
x = Flatten()(x)
x = Dense(units=60,activation='relu')(x)
x = Dense(units=26,activation='softmax')(x)  

# And now put this all together to create our new model.
model = Model(inputs = base_model.input, outputs = x) 
model.summary()
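To see how much of this model is actually trained, the frozen and trainable parameter counts can be estimated by hand and compared against the `model.summary()` output. The sketch assumes the standard VGG16 layout (3×3 convolutions in five blocks, each ending in 2×2 max pooling) and the 60-unit dense layer plus 26-unit softmax used above; the helper names are illustrative.

```python
def conv2d_params(in_channels, filters, kernel=3):
    """Weights plus biases of one convolutional layer."""
    return kernel * kernel * in_channels * filters + filters

def dense_params(n_inputs, n_units):
    """Weights plus biases of one fully-connected layer."""
    return n_inputs * n_units + n_units

# VGG16 convolutional blocks as (number of conv layers, filters per layer)
vgg16_blocks = [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]

frozen = 0
channels = 3      # RGB input
side = 200        # input height and width
for n_convs, filters in vgg16_blocks:
    for _ in range(n_convs):
        frozen += conv2d_params(channels, filters)
        channels = filters
    side //= 2    # each block ends with 2x2 max pooling

flattened = side * side * channels
trainable = dense_params(flattened, 60) + dense_params(60, 26)

print(side, flattened)    # 6 18432
print(frozen, trainable)  # 14714688 1107566
```

Under these assumptions, only about 7% of the parameters are updated during training, which is why the transfer learning model is comparatively cheap to fit.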


### Part B) Initializing Training Parameters (5 points)

Compile the model using an appropriate loss function and optimizer.

In [None]:
# Compile the model.
opt = Adam(learning_rate=0.001)
model.compile(optimizer=opt, loss='sparse_categorical_crossentropy', metrics = ['accuracy'])
# printing out a summary of the model
model.summary()

### Part C) Training the Model (10 points)

Train your new network, including any hyperparameter tuning. Plot the training curve of your best model only.

As described in the Keras docs:

https://keras.io/api/applications/vgg/#vgg16-function

we are required to preprocess our image data in a specific way to use this pretrained model, so let's go ahead and do that first.

In [None]:
# Preprocess your input image data
train_ds = train_ds.map(lambda x, y: (tf.keras.applications.vgg16.preprocess_input(x), y))
val_ds = val_ds.map(lambda x, y: (tf.keras.applications.vgg16.preprocess_input(x), y))
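For intuition, the VGG16 preprocessing converts images from RGB to BGR and zero-centers each channel with respect to ImageNet, without scaling. The NumPy sketch below is an illustrative re-implementation, not the library code: the function name `vgg16_style_preprocess` is hypothetical, and the per-channel mean values are the commonly quoted ImageNet means, stated here as an assumption.

```python
import numpy as np

# Per-channel ImageNet means in BGR order (assumed values for this sketch)
IMAGENET_BGR_MEAN = np.array([103.939, 116.779, 123.68], dtype=np.float32)

def vgg16_style_preprocess(rgb_batch):
    """Illustrative only: RGB -> BGR, then subtract the channel means."""
    bgr = rgb_batch[..., ::-1].astype(np.float32)
    return bgr - IMAGENET_BGR_MEAN

# A dummy batch of two all-white 200x200 RGB images with values in [0, 255]
dummy = np.full((2, 200, 200, 3), 255.0, dtype=np.float32)
out = vgg16_style_preprocess(dummy)

print(out.shape)  # (2, 200, 200, 3)
print(out[0, 0, 0])
```

Because the means are on the [0, 255] scale, this preprocessing is meant for raw pixel values, which is why it is applied to `train_ds` and `val_ds` rather than to the [0, 1]-rescaled datasets from Question 1.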



In [None]:
# Train the model

model_info = model.fit(train_ds,validation_data=val_ds,epochs = 50)



In [None]:
# Plot the training curve

# Data visualization.
import matplotlib.pyplot as plt
from matplotlib import style
import seaborn as sns
import random as rn

def plot_losses(model_info):
    plt.plot(model_info.history["loss"])
    plt.plot(model_info.history["val_loss"])
    plt.ylabel('Loss')
    plt.xlabel('Epoch')
    plt.legend(['Train', 'Val'])
    plt.show()

def plot_accuracies(model_info):
    # Use the history object passed in, not a global variable
    plt.plot(model_info.history['accuracy'])
    plt.plot(model_info.history['val_accuracy'])
    plt.title('Model Accuracy')
    plt.ylabel('Accuracy')
    plt.xlabel('Epoch')
    plt.legend(['Train', 'Val'])
    plt.show()

# Plot your losses and accuracies
plot_losses(model_info)
plot_accuracies(model_info)

### Part D) Your Best Classifier (10 points)

Add on your own last layers to the pretrained model and train it on the training data (in the previous parts you could have only one flatten layer and one dense layer to do the classification). You can increase (or decrease) the number of nodes per layer, increase (or decrease) the number of layers, and add dropout if your model is overfitting, change the hyperparameters, change your optimizer, etc. Try to get the validation accuracy higher than what the previous transfer learning model was able to obtain, and try to minimize the amount of overfitting.

Plot the classification accuracy for each epoch. Report the best test accuracy your model was able to achieve.

In [None]:
x = base_model.output
x = Flatten()(x)
x = Dense(units=200,activation='relu')(x)
x = Dropout(rate=0.75)(x)
x = Dense(units=50, activation='relu')(x)
x = Dropout(rate=0.25)(x)
output = Dense(units=26,activation='softmax')(x) 

# creating the new model
new_model = Model(inputs = base_model.input, outputs = output) 

opt = Adam()
new_model.compile(optimizer=opt, loss='sparse_categorical_crossentropy', metrics = ['accuracy'])
# printing out a summary of the model
new_model.summary()

# Train the new model (not the earlier `model`)
new_model_info = new_model.fit(train_ds,validation_data=val_ds,epochs = 50)

plot_losses(new_model_info)
plot_accuracies(new_model_info)



I flattened the data again and added 2 more hidden layers with dropout, and it increased my accuracy. My highest test accuracy is 0.9950, which is very high in this case.