## <font color='green'> Deep Learning Course Project - Gesture Recognition </font>

### Problem Statement
Imagine you are working as a data scientist at a home electronics company which manufactures state-of-the-art smart televisions. You want to develop a cool feature in the smart TV that can recognise five different gestures performed by the user, letting them control the TV without using a remote.

The gestures are continuously monitored by the webcam mounted on the TV. Each gesture corresponds to a specific command:
 
| Gesture | Corresponding Action |
| --- | --- | 
| Thumbs Up | Increase the volume. |
| Thumbs Down | Decrease the volume. |
| Left Swipe | 'Jump' backwards 10 seconds. |
| Right Swipe | 'Jump' forward 10 seconds. |
| Stop | Pause the movie. |

Each video is a sequence of 30 frames (or images).

### Objectives:
1. **Generator**:  The generator should be able to take a batch of videos as input without any error. Steps like cropping, resizing and normalization should be performed successfully.

2. **Model**: Develop a model that trains without any errors. It will be judged on the total number of parameters (fewer parameters mean a shorter inference (prediction) time) and on the accuracy achieved. As suggested by Snehansu, start training on a small amount of data and then proceed further.

3. **Write up**: This should contain the detailed procedure followed in choosing the final model. The write up should start with the reason for choosing the base model, then highlight the reasons and metrics taken into consideration to modify and experiment to arrive at the final model. 


### Developed by:
##### 1. Niranjana Mangaleswaran - Group facilitator
##### 2. Nihit Patel

In [64]:
# Importing the necessary libraries

import numpy as np
import os
from imageio import imread
from skimage.transform import resize as imresize
import datetime
import warnings
warnings.filterwarnings("ignore")

We set the random seed so that the results can be reproduced.
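A minimal sketch of what the seed buys us: re-seeding NumPy with the same value reproduces the shuffle that the generator makes with `np.random.permutation`. The folder names are hypothetical placeholders. Seeding alone may still not make GPU training bit-for-bit reproducible, because some cuDNN operations are non-deterministic.

```python
import numpy as np

folders = ['video_a', 'video_b', 'video_c', 'video_d']  # hypothetical folder names

np.random.seed(30)
first = np.random.permutation(folders)   # shuffle with seed 30

np.random.seed(30)
second = np.random.permutation(folders)  # same seed, so the same shuffle order

print(list(first) == list(second))
```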

In [65]:
np.random.seed(30)
import random as rn
rn.seed(30)
from keras import backend as K
import tensorflow as tf
tf.random.set_seed(30)

In [66]:
import cv2
import matplotlib.pyplot as plt
%matplotlib inline

In this block, you read the folder names for training and validation. You also set the `batch_size` here. Set the batch size so that the GPU is used to its full capacity: keep increasing it until the machine throws an (out-of-memory) error.

In [67]:
# importing some other libraries which will be needed for model building.
# Everything is imported from tensorflow.keras so that the layers, the callbacks and the
# tf.keras optimizer used below all come from the same Keras implementation.

from tensorflow.keras.models import Sequential, Model
from tensorflow.keras.layers import (Dense, GRU, LSTM, Flatten, TimeDistributed,
                                     BatchNormalization, Activation, Dropout,
                                     Conv3D, MaxPooling3D, Conv2D, MaxPooling2D)
from tensorflow.keras.callbacks import ModelCheckpoint, ReduceLROnPlateau, EarlyStopping
from tensorflow.keras import optimizers

In [69]:
from google.colab import drive
drive.mount('/content/gdrive')

root_path = '/content/gdrive/My Drive/Upgrad/Gesture Recognition/Project_data' 
#root_path = '..\Gesture-Recognition-Case-study-IIITB-Assignment--master\Project_data' <---- Local path

#!unzip '/content/gdrive/My Drive/Upgrad/Project_data.zip' -d '/content/gdrive/My Drive/Upgrad/Gesture Recognition'


Drive already mounted at /content/gdrive; to attempt to forcibly remount, call drive.mount("/content/gdrive", force_remount=True).


## Generator
This is one of the most important parts of the code. The overall structure of the generator has been given. In the generator, you preprocess the images, since the videos come in 2 different frame sizes, and create batches of video frames. Experiment with `img_idx`, the final image size (`image_height`, `image_width`) and the normalization to get high accuracy.
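`img_idx` picks `frames_to_sample` evenly spaced frames out of the `total_frames` of each video. A small standalone sketch of that computation (the helper name `sample_frame_indices` is ours, not part of the notebook): for 30 frames and 16 samples it keeps frames 0, 2, 4, ..., 14, 15, 17, ..., 27, 29, i.e. a spacing of 2 except for a single step of 1 where the rounding falls.

```python
import numpy as np

def sample_frame_indices(total_frames, frames_to_sample):
    # Same expression as img_idx in the generator: evenly spaced, rounded to the nearest frame
    return np.round(np.linspace(0, total_frames - 1, frames_to_sample)).astype(int)

for n in (16, 18, 20):
    print(n, sample_frame_indices(30, n).tolist())
```

Because the spacing is always greater than 1 when fewer than 30 frames are sampled, no frame is picked twice.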

In [70]:
def generator(source_path, folder_list, batch_size):
    print('Source path = ', source_path, '; batch size =', batch_size)
    # evenly spaced indices of the frames to keep from each video
    img_idx = np.round(np.linspace(0, total_frames - 1, frames_to_sample)).astype(int)

    def load_batch(batch_lines):
        # each line of train.csv / val.csv reads "<folder name>;<gesture name>;<label>"
        n = len(batch_lines)
        batch_data = np.zeros((n, len(img_idx), image_height, image_width, 3)) # (videos, frames, height, width, RGB channels)
        batch_labels = np.zeros((n, num_classes)) # batch_labels is the one hot representation of the output
        for folder, line in enumerate(batch_lines): # iterate over the videos of this batch
            fields = line.strip().split(';')
            folder_path = source_path + '/' + fields[0]
            # os.listdir() returns names in arbitrary order; sorting restores the frame order
            # (assumes zero-padded frame file names)
            imgs = sorted(os.listdir(folder_path))
            for idx, item in enumerate(img_idx): # iterate over the sampled frames of the video
                image = imread(folder_path + '/' + imgs[item]).astype(np.float32)

                # crop away any all-black border (imageio returns RGB frames)
                gray = cv2.cvtColor(image, cv2.COLOR_RGB2GRAY)
                nonzero = np.argwhere(gray > 0)
                if nonzero.size > 0:
                    x0, y0 = nonzero.min(axis=0)
                    x1, y1 = nonzero.max(axis=0) + 1
                    image = image[x0:x1, y0:y1, :]

                # the videos come in 2 different frame sizes and Conv3D throws an error if the
                # inputs in a batch have different shapes, so every frame is resized
                image_resized = imresize(image, (image_height, image_width, 3), preserve_range=True)

                batch_data[folder, idx] = image_resized / 255.0 # normalise all 3 channels to [0, 1]

            batch_labels[folder, int(fields[2])] = 1
        return batch_data, batch_labels

    while True:
        t = np.random.permutation(folder_list)
        num_batches = len(t) // batch_size # number of full batches
        for batch in range(num_batches): # we iterate over the number of batches
            yield load_batch(t[batch * batch_size:(batch + 1) * batch_size])

        # the remaining data points which are left after full batches
        if len(t) % batch_size != 0:
            yield load_batch(t[num_batches * batch_size:])
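The crop in the generator keeps the bounding box of all pixels whose grayscale value is above zero, so it only removes borders that are exactly black; frames without such a border pass through unchanged. A NumPy-only sketch on a synthetic frame with an assumed black border; it uses the channel mean as a stand-in for `cv2.cvtColor`, which gives the same mask for non-negative pixel values:

```python
import numpy as np

def crop_black_border(image):
    gray = image.mean(axis=2)               # stand-in for cv2.cvtColor(image, cv2.COLOR_RGB2GRAY)
    nonzero = np.argwhere(gray > 0)         # (row, col) of every non-black pixel
    if nonzero.size == 0:                   # an all-black frame: nothing to crop
        return image
    r0, c0 = nonzero.min(axis=0)
    r1, c1 = nonzero.max(axis=0) + 1        # +1 so the last non-black row/column is kept
    return image[r0:r1, c0:c1, :]

# Synthetic 120 x 160 RGB frame: 10 black rows top and bottom, 20 black columns left and right
frame = np.zeros((120, 160, 3), dtype=np.float32)
frame[10:110, 20:140, :] = 200.0

cropped = crop_black_border(frame)
normalised = cropped / 255.0                # same [0, 1] scaling as the generator
print(cropped.shape)                        # (100, 120, 3)
```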


## Model
Here you make the model using different functionalities that Keras provides. Remember to use `Conv3D` and `MaxPooling3D`, not `Conv2D` and `MaxPooling2D`, for a 3D convolution model. Use `TimeDistributed` when building a Conv2D + RNN model. Also remember that the last layer is the softmax. Design the network so that it gives good accuracy with the fewest possible parameters, so that it can fit in the memory of the webcam.
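One reason the 3D and 2D options differ in size: a `Conv3D` kernel also spans time, so each filter has `k_t` times more weights than the equivalent `Conv2D` kernel, whereas `TimeDistributed(Conv2D(...))` reuses one 2D kernel for every frame. A plain-Python sketch of the standard parameter formulas (weights plus one bias per filter); the function names are ours:

```python
def conv3d_params(kernel, c_in, c_out):
    kt, kh, kw = kernel
    # one weight per kernel cell, input channel and output channel, plus one bias per filter
    return kt * kh * kw * c_in * c_out + c_out

def conv2d_params(kernel, c_in, c_out):
    kh, kw = kernel
    return kh * kw * c_in * c_out + c_out

print(conv3d_params((3, 3, 3), 3, 16))   # 1312
print(conv3d_params((2, 2, 2), 16, 32))  # 4128
print(conv2d_params((3, 3), 3, 16))      # 448, shared across all frames by TimeDistributed
```

The first two values match the first two `conv3d` rows in the Sample Model summary below.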

In [71]:


train_doc = np.random.permutation(open(root_path + '/train.csv').readlines())
val_doc = np.random.permutation(open(root_path + '/val.csv').readlines())


curr_dt_time = datetime.datetime.now()
train_path = root_path + '/train'
val_path = root_path + '/val'
num_train_sequences = len(train_doc)
print('# training sequences =', num_train_sequences)
num_val_sequences = len(val_doc)
print('# validation sequences =', num_val_sequences)
num_classes = 5
total_frames = 30

# training sequences = 663
# validation sequences = 100


## Sample Model

In [72]:

image_height = 160
image_width = 160
frames_to_sample = 30
batch_size = 40
num_epochs = 1


model = Sequential()
model.add(Conv3D(16, (3, 3, 3), padding='same',
         input_shape=(frames_to_sample,image_height,image_width,3)))
model.add(Activation('relu'))
model.add(BatchNormalization())
model.add(MaxPooling3D(pool_size=(2, 2, 2)))

model.add(Conv3D(32, (2, 2, 2), padding='same'))
model.add(Activation('relu'))
model.add(BatchNormalization())
model.add(MaxPooling3D(pool_size=(2, 2, 2)))

model.add(Conv3D(64, (2, 2, 2), padding='same'))
model.add(Activation('relu'))
model.add(BatchNormalization())
model.add(MaxPooling3D(pool_size=(2, 2, 2)))

model.add(Conv3D(128, (2, 2, 2), padding='same'))
model.add(Activation('relu'))
model.add(BatchNormalization())
model.add(MaxPooling3D(pool_size=(2, 2, 2)))

model.add(Flatten())
model.add(Dense(128,activation='relu'))
model.add(BatchNormalization())
model.add(Dropout(0.5))

model.add(Dense(64,activation='relu'))
model.add(BatchNormalization())
model.add(Dropout(0.25))


model.add(Dense(num_classes,activation='softmax'))


In [73]:
optimiser = tf.keras.optimizers.Adam(learning_rate=0.0002)  # `lr` is the older, deprecated name
model.compile(optimizer=optimiser, loss='categorical_crossentropy', metrics=['categorical_accuracy'])
model.summary()  # summary() prints the table itself and returns None

Model: "sequential_8"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv3d_28 (Conv3D)           (None, 30, 160, 160, 16)  1312      
_________________________________________________________________
activation_28 (Activation)   (None, 30, 160, 160, 16)  0         
_________________________________________________________________
batch_normalization_47 (Batc (None, 30, 160, 160, 16)  64        
_________________________________________________________________
max_pooling3d_28 (MaxPooling (None, 15, 80, 80, 16)    0         
_________________________________________________________________
conv3d_29 (Conv3D)           (None, 15, 80, 80, 32)    4128      
_________________________________________________________________
activation_29 (Activation)   (None, 15, 80, 80, 32)    0         
_________________________________________________________________
batch_normalization_48 (Batc (None, 15, 80, 80, 32)   

In [74]:
model_name = root_path + '/model_init' + '_' + str(curr_dt_time).replace(' ','').replace(':','_') + '/'
    
if not os.path.exists(model_name):
    os.mkdir(model_name)
        
filepath = model_name + 'model-{epoch:05d}-{loss:.5f}-{categorical_accuracy:.5f}-{val_loss:.5f}-{val_categorical_accuracy:.5f}.h5'

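# save_freq='epoch' saves once per epoch; an integer save_freq counts batches, and the
# val_loss / val_categorical_accuracy fields used in `filepath` only exist at the end of an epoch
checkpoint = ModelCheckpoint(filepath, monitor='val_loss', verbose=1, save_best_only=False, save_weights_only=False, mode='auto', save_freq='epoch')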

LR = ReduceLROnPlateau(monitor='val_loss', factor=0.2, verbose=1, patience=4)
callbacks_list = [checkpoint, LR]
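`ReduceLROnPlateau(factor=0.2, patience=4)` multiplies the learning rate by 0.2 once `val_loss` has failed to improve for 4 consecutive epochs. The sketch below is a simplified re-implementation for illustration, not Keras's own code, and assumes the callback's defaults `min_delta=1e-4`, `cooldown=0` and `min_lr=0`. Fed the Model 3 validation losses from its training log further down, it predicts reductions after epochs 9 and 21, from 0.0002 to 4e-05 and then to 8e-06, which is what that log reports.

```python
def simulate_reduce_lr_on_plateau(val_losses, initial_lr, factor=0.2, patience=4, min_delta=1e-4):
    """Simplified ReduceLROnPlateau(mode='min', cooldown=0, min_lr=0); returns [(epoch, new_lr), ...]."""
    best = float('inf')
    wait = 0
    lr = initial_lr
    reductions = []
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:      # improvement: remember it and reset the counter
            best, wait = loss, 0
        else:
            wait += 1
            if wait >= patience:         # `patience` epochs in a row without improvement
                lr *= factor
                reductions.append((epoch, lr))
                wait = 0
    return reductions

# Model 3 val_loss values copied from its training log (epochs 1-25)
model3_val_loss = [1.5298, 1.1558, 1.1209, 1.1975, 0.9010, 1.0565, 1.0278, 1.0084, 0.9790,
                   0.9262, 0.8882, 0.9642, 0.8826, 0.8654, 0.8720, 0.8500, 0.8223, 0.8437,
                   0.8304, 0.8398, 0.8438, 0.8471, 0.8145, 0.8000, 0.7884]
print(simulate_reduce_lr_on_plateau(model3_val_loss, initial_lr=0.0002))
```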

In [75]:
if (num_train_sequences%batch_size) == 0:
    steps_per_epoch = int(num_train_sequences/batch_size)
else:
    steps_per_epoch = (num_train_sequences//batch_size) + 1

if (num_val_sequences%batch_size) == 0:
    validation_steps = int(num_val_sequences/batch_size)
else:
    validation_steps = (num_val_sequences//batch_size) + 1
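The two if/else blocks compute a ceiling division: one step per full batch plus one for the leftover partial batch. A compact equivalent (the `steps` helper is ours):

```python
import math

def steps(num_sequences, batch_size):
    # ceil(num_sequences / batch_size): full batches plus one partial batch, if any
    return math.ceil(num_sequences / batch_size)

for bs in (40, 20):
    print('batch_size =', bs, '-> steps_per_epoch =', steps(663, bs), ', validation_steps =', steps(100, bs))
```

With 663 training videos this gives 17 steps at batch size 40 and 34 at batch size 20, matching the `17/17` and `34/34` progress bars in the logs below; the last step is the partial batch of the remaining videos (23 at batch size 40, 3 at batch size 20).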

In [76]:
# train_generator = generator(train_path, train_doc, batch_size)
# val_generator = generator(val_path, val_doc, batch_size)


# model.fit(train_generator, steps_per_epoch=steps_per_epoch, epochs=num_epochs, verbose=1, 
#                     callbacks=callbacks_list, validation_data=val_generator, 
#                     validation_steps=validation_steps, class_weight=None, workers=1, initial_epoch=0)

### We hit the memory limit with an image resolution of 160x160, 30 frames and a batch_size of 40


Let us see how training time is affected by changes in image resolution, number of frames per sequence and batch size:

- After some experiments, we noted that **image resolution** and **number of frames** per sequence have a bigger impact on training time than **batch_size**
- We can use a batch size of around 20-40
- We will use at most 20 frames per sequence
- We will try resolutions of 160 x 160 and 120 x 120 depending on the model performance
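A back-of-the-envelope sketch of why resolution and frame count dominate memory: the input batch grows linearly with batch size, frames, height and width, and the output of the first Conv3D layer, with 16 feature maps per frame, is larger still. The helpers are ours; the first assumes the generator's `np.zeros` default of float64, the second assumes float32 activations. Actual GPU usage is higher, since Keras also keeps the other layers' activations for backpropagation.

```python
import numpy as np

def input_batch_bytes(batch_size, frames, height, width, channels=3):
    # np.zeros(...) in the generator defaults to float64: 8 bytes per value
    return batch_size * frames * height * width * channels * np.dtype(np.float64).itemsize

def first_conv_activation_bytes(batch_size, frames, height, width, filters=16):
    # Conv3D(filters, padding='same') keeps frames x height x width; float32 = 4 bytes per value
    return batch_size * frames * height * width * filters * np.dtype(np.float32).itemsize

for cfg in [(40, 30, 160, 160), (40, 20, 160, 160), (20, 16, 120, 120), (20, 16, 100, 100)]:
    print(cfg, 'input: %.1f MB' % (input_batch_bytes(*cfg) / 1e6),
          '| first Conv3D output: %.1f MB' % (first_conv_activation_bytes(*cfg) / 1e6))
```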


## Model 1
### Base Model - Batch Size = 40 and No. of Epochs = 15

In [77]:

image_height = 160
image_width = 160
frames_to_sample = 20
batch_size = 40
num_epochs = 15

model = Sequential()
model.add(Conv3D(16, (3,3,3), padding='same',
          input_shape=(frames_to_sample,image_height,image_width,3)))
model.add(Activation('relu'))
model.add(BatchNormalization())
model.add(MaxPooling3D(pool_size=(2, 2, 2)))

model.add(Conv3D(32, (3,3,3), padding='same'))
model.add(Activation('relu'))
model.add(BatchNormalization())
model.add(MaxPooling3D(pool_size=(2, 2, 2)))

model.add(Conv3D(64, (3,3,3), padding='same'))
model.add(Activation('relu'))
model.add(BatchNormalization())
model.add(MaxPooling3D(pool_size=(2, 2, 2)))

model.add(Conv3D(128, (3,3,3), padding='same'))
model.add(Activation('relu'))
model.add(BatchNormalization())
model.add(MaxPooling3D(pool_size=(2, 2, 2)))

model.add(Flatten())
model.add(Dense(64,activation='relu'))
model.add(BatchNormalization())
model.add(Dropout(0.25))

model.add(Dense(64,activation='relu'))
model.add(BatchNormalization())
model.add(Dropout(0.25))


model.add(Dense(num_classes,activation='softmax'))


In [78]:
optimiser = tf.keras.optimizers.Adam(learning_rate=0.0002)
model.compile(optimizer=optimiser, loss='categorical_crossentropy', metrics=['categorical_accuracy'])
model.summary()

Model: "sequential_9"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv3d_32 (Conv3D)           (None, 20, 160, 160, 16)  1312      
_________________________________________________________________
activation_32 (Activation)   (None, 20, 160, 160, 16)  0         
_________________________________________________________________
batch_normalization_53 (Batc (None, 20, 160, 160, 16)  64        
_________________________________________________________________
max_pooling3d_32 (MaxPooling (None, 10, 80, 80, 16)    0         
_________________________________________________________________
conv3d_33 (Conv3D)           (None, 10, 80, 80, 32)    13856     
_________________________________________________________________
activation_33 (Activation)   (None, 10, 80, 80, 32)    0         
_________________________________________________________________
batch_normalization_54 (Batc (None, 10, 80, 80, 32)   

In [79]:
train_generator = generator(train_path, train_doc, batch_size)
val_generator = generator(val_path, val_doc, batch_size)



print("Total Params:", model.count_params())
# model.fit(train_generator, steps_per_epoch=steps_per_epoch, epochs=num_epochs, verbose=1, 
#                     callbacks=callbacks_list, validation_data=val_generator, 
#                     validation_steps=validation_steps, class_weight=None, workers=1, initial_epoch=0)

Total Params: 1117061


### <font color='red'>Note: I have copied the training output into a text cell for convenience, so that the model does not have to be retrained every time I run the notebook. This saved a lot of time when experimenting with new things and referring back to previous model performance. </font>


```
Total Params: 1117061
Epoch 1/15
17/17 [==============================] - 168s 10s/step - loss: 1.7374 - categorical_accuracy: 0.3746 - val_loss: 6.3520 - val_categorical_accuracy: 0.3200

Epoch 2/15
17/17 [==============================] - 55s 3s/step - loss: 1.2724 - categorical_accuracy: 0.5247 - val_loss: 3.1187 - val_categorical_accuracy: 0.2900

Epoch 3/15
17/17 [==============================] - 59s 3s/step - loss: 0.9921 - categorical_accuracy: 0.6158 - val_loss: 1.2044 - val_categorical_accuracy: 0.5100

Epoch 4/15
17/17 [==============================] - 58s 3s/step - loss: 0.7925 - categorical_accuracy: 0.7376 - val_loss: 0.8441 - val_categorical_accuracy: 0.6700

Epoch 5/15
17/17 [==============================] - 58s 3s/step - loss: 0.5836 - categorical_accuracy: 0.7768 - val_loss: 1.1775 - val_categorical_accuracy: 0.6300

Epoch 6/15
17/17 [==============================] - 59s 3s/step - loss: 0.4384 - categorical_accuracy: 0.8491 - val_loss: 0.9481 - val_categorical_accuracy: 0.6800

Epoch 7/15
17/17 [==============================] - 58s 3s/step - loss: 0.3369 - categorical_accuracy: 0.8831 - val_loss: 1.0296 - val_categorical_accuracy: 0.6400

Epoch 8/15
17/17 [==============================] - 59s 3s/step - loss: 0.2387 - categorical_accuracy: 0.9258 - val_loss: 0.9406 - val_categorical_accuracy: 0.6600

Epoch 00008: ReduceLROnPlateau reducing learning rate to 0.00020000000949949026.
Epoch 9/15
17/17 [==============================] - 59s 3s/step - loss: 0.1887 - categorical_accuracy: 0.9424 - val_loss: 0.9321 - val_categorical_accuracy: 0.6900

Epoch 10/15
17/17 [==============================] - 58s 3s/step - loss: 0.1446 - categorical_accuracy: 0.9720 - val_loss: 0.8745 - val_categorical_accuracy: 0.7000

Epoch 11/15
17/17 [==============================] - 58s 3s/step - loss: 0.1322 - categorical_accuracy: 0.9720 - val_loss: 0.6898 - val_categorical_accuracy: 0.7800

Epoch 12/15
17/17 [==============================] - 59s 3s/step - loss: 0.1189 - categorical_accuracy: 0.9823 - val_loss: 0.6373 - val_categorical_accuracy: 0.8100

Epoch 13/15
17/17 [==============================] - 58s 3s/step - loss: 0.1126 - categorical_accuracy: 0.9823 - val_loss: 0.6135 - val_categorical_accuracy: 0.8000

Epoch 14/15
17/17 [==============================] - 59s 3s/step - loss: 0.0967 - categorical_accuracy: 0.9867 - val_loss: 0.5723 - val_categorical_accuracy: 0.7900

Epoch 15/15
17/17 [==============================] - 59s 3s/step - loss: 0.0870 - categorical_accuracy: 0.9926 - val_loss: 0.5564 - val_categorical_accuracy: 0.8200
```



##### The model is overfitting: by epoch 15 the training accuracy is 99.26% while the validation accuracy is only 82%.
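The overfitting is visible directly in the log. A small sketch that takes the Model 1 accuracies copied from the log above and computes the gap between training and validation accuracy per epoch: the gap peaks at about 27 points (epoch 10) and is still about 17 points at the end, even though validation accuracy keeps creeping up.

```python
# Model 1 accuracies copied from the training log above (epochs 1-15)
train_acc = [0.3746, 0.5247, 0.6158, 0.7376, 0.7768, 0.8491, 0.8831, 0.9258,
             0.9424, 0.9720, 0.9720, 0.9823, 0.9823, 0.9867, 0.9926]
val_acc = [0.3200, 0.2900, 0.5100, 0.6700, 0.6300, 0.6800, 0.6400, 0.6600,
           0.6900, 0.7000, 0.7800, 0.8100, 0.8000, 0.7900, 0.8200]

# training minus validation accuracy for every epoch
gaps = [round(t - v, 4) for t, v in zip(train_acc, val_acc)]
worst_epoch = gaps.index(max(gaps)) + 1
best_epoch = max(range(len(val_acc)), key=val_acc.__getitem__) + 1

print('largest gap:', max(gaps), 'at epoch', worst_epoch)
print('final-epoch gap:', gaps[-1])
print('best val accuracy', val_acc[best_epoch - 1], 'at epoch', best_epoch)
```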

## Model 2

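- Dropout (0.25) after each of the Dense layers, which now have 128 units
- Reduce the kernel size to (2,2,2) in all Conv3D layers except the first
- Reduce the image resolution to 100 x 100 and sample 16 frames per video
- Batch Size = 20
- No. of Epochs = 20
- Also, tried to keep the number of parameters to a minimum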
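It helps to see where the parameters go. The pure-Python sketch below (the function names are ours) applies the standard layer formulas to the Conv3D + Dense layout shared by Models 1 to 3, assuming `padding='same'` convolutions, `MaxPooling3D(pool_size=(2, 2, 2))` with its default 'valid' padding, and that `count_params()` includes the BatchNormalization moving statistics. It reproduces the totals printed for the three models in this notebook.

```python
def conv3d_params(kernel, c_in, c_out):
    kt, kh, kw = kernel
    return kt * kh * kw * c_in * c_out + c_out

def dense_params(n_in, n_out):
    return n_in * n_out + n_out

def bn_params(channels):
    return 4 * channels  # gamma, beta (trainable) + moving mean, moving variance (non-trainable)

def count_params(frames, height, width, kernels, dense_units,
                 filters=(16, 32, 64, 128), channels=3, num_classes=5):
    """Return (total params, Flatten() size) for 4 x [Conv3D -> BN -> MaxPooling3D(2, 2, 2)],
    then Dense -> BN (-> Dropout) blocks and a softmax Dense layer."""
    total, c_in = 0, channels
    t, h, w = frames, height, width
    for kernel, c_out in zip(kernels, filters):
        total += conv3d_params(kernel, c_in, c_out) + bn_params(c_out)
        t, h, w = t // 2, h // 2, w // 2  # 'same' conv keeps the size, pooling halves it (rounding down)
        c_in = c_out
    flat = t * h * w * c_in
    n_in = flat
    for units in dense_units:
        total += dense_params(n_in, units) + bn_params(units)  # Dropout adds no parameters
        n_in = units
    total += dense_params(n_in, num_classes)
    return total, flat

k3, k2 = (3, 3, 3), (2, 2, 2)
print('Model 1:', count_params(20, 160, 160, [k3, k3, k3, k3], dense_units=(64, 64)))
print('Model 2:', count_params(16, 100, 100, [k3, k2, k2, k2], dense_units=(128, 128)))
print('Model 3:', count_params(16, 120, 120, [k3, k3, k2, k2], dense_units=(64, 64)))
```

For Model 1, the first Dense layer alone holds 12800 x 64 + 64 = 819,264 of the 1,117,061 parameters, so shrinking the `Flatten()` output (lower resolution, fewer frames or more pooling) is the most direct way to cut the total.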

In [80]:

image_height = 100
image_width = 100
frames_to_sample = 16
batch_size = 20
num_epochs = 20

model = Sequential()
model.add(Conv3D(16, (3, 3, 3), padding='same',
        input_shape=(frames_to_sample,image_height,image_width,3)))
model.add(Activation('relu'))
model.add(BatchNormalization())
model.add(MaxPooling3D(pool_size=(2, 2, 2)))

model.add(Conv3D(32, (2, 2, 2), padding='same'))
model.add(Activation('relu'))
model.add(BatchNormalization())
model.add(MaxPooling3D(pool_size=(2, 2, 2)))

model.add(Conv3D(64, (2, 2, 2), padding='same'))
model.add(Activation('relu'))
model.add(BatchNormalization())
model.add(MaxPooling3D(pool_size=(2, 2, 2)))

model.add(Conv3D(128, (2, 2, 2), padding='same'))
model.add(Activation('relu'))
model.add(BatchNormalization())
model.add(MaxPooling3D(pool_size=(2, 2, 2)))

model.add(Flatten())
model.add(Dense(128,activation='relu'))
model.add(BatchNormalization())
model.add(Dropout(0.25))

model.add(Dense(128,activation='relu'))
model.add(BatchNormalization())
model.add(Dropout(0.25))

model.add(Dense(num_classes,activation='softmax'))


In [81]:
optimiser = tf.keras.optimizers.Adam(learning_rate=0.0002)
model.compile(optimizer=optimiser, loss='categorical_crossentropy', metrics=['categorical_accuracy'])
model.summary()

Model: "sequential_10"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv3d_36 (Conv3D)           (None, 16, 100, 100, 16)  1312      
_________________________________________________________________
activation_36 (Activation)   (None, 16, 100, 100, 16)  0         
_________________________________________________________________
batch_normalization_59 (Batc (None, 16, 100, 100, 16)  64        
_________________________________________________________________
max_pooling3d_36 (MaxPooling (None, 8, 50, 50, 16)     0         
_________________________________________________________________
conv3d_37 (Conv3D)           (None, 8, 50, 50, 32)     4128      
_________________________________________________________________
activation_37 (Activation)   (None, 8, 50, 50, 32)     0         
_________________________________________________________________
batch_normalization_60 (Batc (None, 8, 50, 50, 32)   

In [82]:
train_generator = generator(train_path, train_doc, batch_size)
val_generator = generator(val_path, val_doc, batch_size)

print("Total Params:", model.count_params())
# model.fit(train_generator, steps_per_epoch=steps_per_epoch, epochs=num_epochs, verbose=1, 
#                     callbacks=callbacks_list, validation_data=val_generator, 
#                     validation_steps=validation_steps, class_weight=None, workers=1, initial_epoch=0)

Total Params: 696645


```
Total Params: 696645
Epoch 1/20
34/34 [==============================] - 88s 3s/step - loss: 1.8166 - categorical_accuracy: 0.3561 - val_loss: 1.2759 - val_categorical_accuracy: 0.5000

Epoch 2/20
34/34 [==============================] - 78s 2s/step - loss: 1.2129 - categorical_accuracy: 0.5342 - val_loss: 0.8633 - val_categorical_accuracy: 0.6600

Epoch 3/20
34/34 [==============================] - 83s 2s/step - loss: 0.9788 - categorical_accuracy: 0.6152 - val_loss: 1.1551 - val_categorical_accuracy: 0.5800

Epoch 4/20
34/34 [==============================] - 83s 2s/step - loss: 0.8827 - categorical_accuracy: 0.6589 - val_loss: 0.9667 - val_categorical_accuracy: 0.6500

Epoch 5/20
34/34 [==============================] - 83s 2s/step - loss: 0.7812 - categorical_accuracy: 0.6961 - val_loss: 0.8326 - val_categorical_accuracy: 0.7300

Epoch 6/20
34/34 [==============================] - 83s 2s/step - loss: 0.7510 - categorical_accuracy: 0.7108 - val_loss: 0.9504 - val_categorical_accuracy: 0.6300

Epoch 7/20
34/34 [==============================] - 83s 2s/step - loss: 0.6567 - categorical_accuracy: 0.7546 - val_loss: 1.0235 - val_categorical_accuracy: 0.6400

Epoch 8/20
34/34 [==============================] - 82s 2s/step - loss: 0.5885 - categorical_accuracy: 0.7789 - val_loss: 1.3001 - val_categorical_accuracy: 0.5900

Epoch 9/20
34/34 [==============================] - 82s 2s/step - loss: 0.5872 - categorical_accuracy: 0.7862 - val_loss: 0.9069 - val_categorical_accuracy: 0.7100

Epoch 00009: ReduceLROnPlateau reducing learning rate to 3.9999998989515007e-05.
Epoch 10/20
34/34 [==============================] - 82s 2s/step - loss: 0.5001 - categorical_accuracy: 0.8278 - val_loss: 0.8944 - val_categorical_accuracy: 0.6900

Epoch 11/20
34/34 [==============================] - 83s 2s/step - loss: 0.4845 - categorical_accuracy: 0.8260 - val_loss: 0.8720 - val_categorical_accuracy: 0.7100

Epoch 12/20
34/34 [==============================] - 83s 2s/step - loss: 0.4514 - categorical_accuracy: 0.8484 - val_loss: 0.8168 - val_categorical_accuracy: 0.7400

Epoch 13/20
34/34 [==============================] - 82s 2s/step - loss: 0.4796 - categorical_accuracy: 0.8282 - val_loss: 0.8252 - val_categorical_accuracy: 0.7200

Epoch 14/20
34/34 [==============================] - 83s 2s/step - loss: 0.4373 - categorical_accuracy: 0.8311 - val_loss: 0.8192 - val_categorical_accuracy: 0.7100

Epoch 15/20
34/34 [==============================] - 82s 2s/step - loss: 0.4369 - categorical_accuracy: 0.8425 - val_loss: 0.8511 - val_categorical_accuracy: 0.7000

Epoch 16/20
34/34 [==============================] - 81s 2s/step - loss: 0.4420 - categorical_accuracy: 0.8480 - val_loss: 0.8697 - val_categorical_accuracy: 0.7000

Epoch 00016: ReduceLROnPlateau reducing learning rate to 7.999999797903002e-06.
Epoch 17/20
34/34 [==============================] - 82s 2s/step - loss: 0.4032 - categorical_accuracy: 0.8543 - val_loss: 0.8671 - val_categorical_accuracy: 0.6900

Epoch 18/20
34/34 [==============================] - 82s 2s/step - loss: 0.4000 - categorical_accuracy: 0.8628 - val_loss: 0.8660 - val_categorical_accuracy: 0.7000

Epoch 19/20
34/34 [==============================] - 81s 2s/step - loss: 0.3770 - categorical_accuracy: 0.8672 - val_loss: 0.8723 - val_categorical_accuracy: 0.6900

Epoch 20/20
34/34 [==============================] - 82s 2s/step - loss: 0.4180 - categorical_accuracy: 0.8455 - val_loss: 0.8609 - val_categorical_accuracy: 0.7000
```



###### For the above model, the best validation accuracy is 74% (epoch 12); training ends at 70%

## Model 3 - Reducing the number of parameters again

In [83]:
image_height = 120
image_width = 120
frames_to_sample = 16
batch_size = 20
num_epochs = 25


model = Sequential()
model.add(Conv3D(16, (3, 3, 3), padding='same',
         input_shape=(frames_to_sample,image_height,image_width,3)))
model.add(Activation('relu'))
model.add(BatchNormalization())
model.add(MaxPooling3D(pool_size=(2, 2, 2)))

model.add(Conv3D(32, (3, 3, 3), padding='same'))
model.add(Activation('relu'))
model.add(BatchNormalization())
model.add(MaxPooling3D(pool_size=(2, 2, 2)))

model.add(Conv3D(64, (2, 2, 2), padding='same'))
model.add(Activation('relu'))
model.add(BatchNormalization())
model.add(MaxPooling3D(pool_size=(2, 2, 2)))

model.add(Conv3D(128, (2, 2, 2), padding='same'))
model.add(Activation('relu'))
model.add(BatchNormalization())
model.add(MaxPooling3D(pool_size=(2, 2, 2)))

model.add(Flatten())
model.add(Dense(64,activation='relu'))
model.add(BatchNormalization())
model.add(Dropout(0.25))

model.add(Dense(64,activation='relu'))
model.add(BatchNormalization())
model.add(Dropout(0.25))

model.add(Dense(num_classes,activation='softmax'))


In [84]:
optimiser = tf.keras.optimizers.Adam(learning_rate=0.0002)
model.compile(optimizer=optimiser, loss='categorical_crossentropy', metrics=['categorical_accuracy'])
model.summary()

Model: "sequential_11"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv3d_40 (Conv3D)           (None, 16, 120, 120, 16)  1312      
_________________________________________________________________
activation_40 (Activation)   (None, 16, 120, 120, 16)  0         
_________________________________________________________________
batch_normalization_65 (Batc (None, 16, 120, 120, 16)  64        
_________________________________________________________________
max_pooling3d_40 (MaxPooling (None, 8, 60, 60, 16)     0         
_________________________________________________________________
conv3d_41 (Conv3D)           (None, 8, 60, 60, 32)     13856     
_________________________________________________________________
activation_41 (Activation)   (None, 8, 60, 60, 32)     0         
_________________________________________________________________
batch_normalization_66 (Batc (None, 8, 60, 60, 32)   

In [85]:
train_generator = generator(train_path, train_doc, batch_size)
val_generator = generator(val_path, val_doc, batch_size)


print("Total Params:", model.count_params())
# model.fit(train_generator, steps_per_epoch=steps_per_epoch, epochs=num_epochs, verbose=1, 
#                     callbacks=callbacks_list, validation_data=val_generator, 
#                     validation_steps=validation_steps, class_weight=None, workers=1, initial_epoch=0)

Total Params: 504709


```
Total Params: 504709
Epoch 1/25
34/34 [==============================] - 94s 3s/step - loss: 1.8774 - categorical_accuracy: 0.3297 - val_loss: 1.5298 - val_categorical_accuracy: 0.4900

Epoch 2/25
34/34 [==============================] - 85s 2s/step - loss: 1.3580 - categorical_accuracy: 0.4669 - val_loss: 1.1558 - val_categorical_accuracy: 0.6400

Epoch 3/25
34/34 [==============================] - 88s 3s/step - loss: 1.0833 - categorical_accuracy: 0.5721 - val_loss: 1.1209 - val_categorical_accuracy: 0.6400

Epoch 4/25
34/34 [==============================] - 88s 3s/step - loss: 0.9708 - categorical_accuracy: 0.6361 - val_loss: 1.1975 - val_categorical_accuracy: 0.5600

Epoch 5/25
34/34 [==============================] - 88s 3s/step - loss: 0.8890 - categorical_accuracy: 0.6519 - val_loss: 0.9010 - val_categorical_accuracy: 0.6700

Epoch 6/25
34/34 [==============================] - 88s 3s/step - loss: 0.8125 - categorical_accuracy: 0.7060 - val_loss: 1.0565 - val_categorical_accuracy: 0.5700

Epoch 7/25
34/34 [==============================] - 87s 3s/step - loss: 0.7671 - categorical_accuracy: 0.7016 - val_loss: 1.0278 - val_categorical_accuracy: 0.5800

Epoch 8/25
34/34 [==============================] - 87s 3s/step - loss: 0.7371 - categorical_accuracy: 0.7196 - val_loss: 1.0084 - val_categorical_accuracy: 0.6100

Epoch 9/25
34/34 [==============================] - 87s 3s/step - loss: 0.7135 - categorical_accuracy: 0.7366 - val_loss: 0.9790 - val_categorical_accuracy: 0.6800

Epoch 00009: ReduceLROnPlateau reducing learning rate to 3.9999998989515007e-05.
Epoch 10/25
34/34 [==============================] - 87s 3s/step - loss: 0.6251 - categorical_accuracy: 0.7789 - val_loss: 0.9262 - val_categorical_accuracy: 0.6700

Epoch 11/25
34/34 [==============================] - 87s 3s/step - loss: 0.6021 - categorical_accuracy: 0.7682 - val_loss: 0.8882 - val_categorical_accuracy: 0.7000

Epoch 12/25
34/34 [==============================] - 87s 3s/step - loss: 0.6000 - categorical_accuracy: 0.7951 - val_loss: 0.9642 - val_categorical_accuracy: 0.7000

Epoch 00012: val_loss did not improve from 0.88825
Epoch 13/25
34/34 [==============================] - 87s 3s/step - loss: 0.5035 - categorical_accuracy: 0.8197 - val_loss: 0.8826 - val_categorical_accuracy: 0.7200

Epoch 14/25
34/34 [==============================] - 87s 3s/step - loss: 0.5557 - categorical_accuracy: 0.8135 - val_loss: 0.8654 - val_categorical_accuracy: 0.7200

Epoch 15/25
34/34 [==============================] - 86s 3s/step - loss: 0.5353 - categorical_accuracy: 0.8050 - val_loss: 0.8720 - val_categorical_accuracy: 0.6900

Epoch 16/25
34/34 [==============================] - 87s 3s/step - loss: 0.5205 - categorical_accuracy: 0.8201 - val_loss: 0.8500 - val_categorical_accuracy: 0.7000

Epoch 17/25
34/34 [==============================] - 87s 3s/step - loss: 0.4962 - categorical_accuracy: 0.8252 - val_loss: 0.8223 - val_categorical_accuracy: 0.7800

Epoch 18/25
34/34 [==============================] - 87s 3s/step - loss: 0.4900 - categorical_accuracy: 0.8234 - val_loss: 0.8437 - val_categorical_accuracy: 0.7600

Epoch 19/25
34/34 [==============================] - 87s 3s/step - loss: 0.4763 - categorical_accuracy: 0.8385 - val_loss: 0.8304 - val_categorical_accuracy: 0.7600

Epoch 20/25
34/34 [==============================] - 87s 3s/step - loss: 0.4633 - categorical_accuracy: 0.8392 - val_loss: 0.8398 - val_categorical_accuracy: 0.7400

Epoch 21/25
34/34 [==============================] - 86s 3s/step - loss: 0.4653 - categorical_accuracy: 0.8293 - val_loss: 0.8438 - val_categorical_accuracy: 0.7300

Epoch 00021: ReduceLROnPlateau reducing learning rate to 7.999999797903002e-06.
Epoch 22/25
34/34 [==============================] - 87s 3s/step - loss: 0.4392 - categorical_accuracy: 0.8462 - val_loss: 0.8471 - val_categorical_accuracy: 0.7300

Epoch 23/25
34/34 [==============================] - 87s 3s/step - loss: 0.4412 - categorical_accuracy: 0.8539 - val_loss: 0.8145 - val_categorical_accuracy: 0.7300

Epoch 24/25
34/34 [==============================] - 88s 3s/step - loss: 0.4527 - categorical_accuracy: 0.8458 - val_loss: 0.8000 - val_categorical_accuracy: 0.7400

Epoch 25/25
34/34 [==============================] - 88s 3s/step - loss: 0.5092 - categorical_accuracy: 0.8256 - val_loss: 0.7884 - val_categorical_accuracy: 0.7300

```



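###### For the above model, the checkpoint with the lowest validation loss (epoch 25, val_loss 0.7884) has a validation accuracy of 73%; the highest validation accuracy in the log is 78% (epoch 17)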
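The Model 3 log supports two readings of "best": the epoch with the highest validation accuracy, and the epoch with the lowest validation loss (the quantity the checkpoint messages in the log are tracking). The sketch below picks out both from validation metrics transcribed by hand from epochs 10 to 25 of the log above. Earlier epochs are outside this excerpt, and `val_log` and `summarize` are hypothetical names for illustration only.

```python
# Validation metrics (val_loss, val_categorical_accuracy) transcribed from the
# Model 3 log above. Only epochs 10-25 are visible in this excerpt.
val_log = {
    10: (0.9262, 0.67), 11: (0.8882, 0.70), 12: (0.9642, 0.70), 13: (0.8826, 0.72),
    14: (0.8654, 0.72), 15: (0.8720, 0.69), 16: (0.8500, 0.70), 17: (0.8223, 0.78),
    18: (0.8437, 0.76), 19: (0.8304, 0.76), 20: (0.8398, 0.74), 21: (0.8438, 0.73),
    22: (0.8471, 0.73), 23: (0.8145, 0.73), 24: (0.8000, 0.74), 25: (0.7884, 0.73),
}


def summarize(log):
    """Return (epoch with the highest val accuracy, epoch with the lowest val loss)."""
    best_acc_epoch = max(log, key=lambda epoch: log[epoch][1])
    best_loss_epoch = min(log, key=lambda epoch: log[epoch][0])
    return best_acc_epoch, best_loss_epoch


acc_epoch, loss_epoch = summarize(val_log)
print(f"highest val accuracy: {val_log[acc_epoch][1]:.0%} at epoch {acc_epoch}")
print(f"lowest val loss: {val_log[loss_epoch][0]} at epoch {loss_epoch} "
      f"(val accuracy {val_log[loss_epoch][1]:.0%})")
```

When a checkpoint is selected on validation loss, the reported accuracy is the one at that epoch, which is why it can sit below the peak accuracy seen during training.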

## Model 4 - CNN-LSTM Model

In [86]:

image_height = 120
image_width = 120
frames_to_sample = 18
batch_size = 20
num_epochs = 20


model = Sequential()

model.add(TimeDistributed(Conv2D(16, (3, 3) , padding='same', activation='relu'),
                          input_shape=(frames_to_sample,image_height,image_width,3)))
model.add(TimeDistributed(BatchNormalization()))
model.add(TimeDistributed(MaxPooling2D((2, 2))))

model.add(TimeDistributed(Conv2D(32, (3, 3) , padding='same', activation='relu')))
model.add(TimeDistributed(BatchNormalization()))
model.add(TimeDistributed(MaxPooling2D((2, 2))))

model.add(TimeDistributed(Conv2D(64, (3, 3) , padding='same', activation='relu')))
model.add(TimeDistributed(BatchNormalization()))
model.add(TimeDistributed(MaxPooling2D((2, 2))))

model.add(TimeDistributed(Conv2D(128, (3, 3) , padding='same', activation='relu')))
model.add(TimeDistributed(BatchNormalization()))
model.add(TimeDistributed(MaxPooling2D((2, 2))))

model.add(TimeDistributed(Conv2D(256, (3, 3) , padding='same', activation='relu')))
model.add(TimeDistributed(BatchNormalization()))
model.add(TimeDistributed(MaxPooling2D((2, 2))))


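# Flatten each frame's final 3x3x256 feature map into a 2304-long vector
model.add(TimeDistributed(Flatten()))

# The LSTM reads the 18 per-frame vectors in order and returns its last hidden state
model.add(LSTM(128))
model.add(Dropout(0.25))

# Classifier head (num_classes comes from an earlier cell)
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.25))

model.add(Dense(num_classes, activation='softmax'))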
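Every convolutional layer above is wrapped in `TimeDistributed`, so the same weights are applied to each of the 18 frames independently before the LSTM sees the sequence. The NumPy-only sketch below illustrates that idea (fold the time axis into the batch axis, apply one function, unfold); it is not Keras' actual implementation, and `time_distributed` and `max_pool_2x2` are hypothetical helpers. Applying the 2x2 pooling five times also shows where the 3x3 spatial size before `Flatten` comes from (120, 60, 30, 15, 7, 3).

```python
import numpy as np


def time_distributed(layer_fn, x):
    """Apply a per-frame function to every frame of a (batch, time, ...) array.

    Illustrates the idea behind Keras' TimeDistributed wrapper: fold the time
    axis into the batch axis, run the same function on every frame, unfold.
    """
    batch, steps = x.shape[:2]
    frames = x.reshape((batch * steps,) + x.shape[2:])   # (batch*time, H, W, C)
    out = layer_fn(frames)                               # same op on every frame
    return out.reshape((batch, steps) + out.shape[1:])   # back to (batch, time, ...)


def max_pool_2x2(frames):
    """Toy 2x2 max pooling with stride 2 on (N, H, W, C); an odd last row/column is dropped."""
    n, h, w, c = frames.shape
    h2, w2 = h // 2, w // 2
    cropped = frames[:, :h2 * 2, :w2 * 2, :]
    return cropped.reshape(n, h2, 2, w2, 2, c).max(axis=(2, 4))


# Dummy batch shaped like the model input: 2 videos x 18 frames x 120 x 120 x 3.
videos = np.random.rand(2, 18, 120, 120, 3).astype(np.float32)
pooled = time_distributed(max_pool_2x2, videos)
print(pooled.shape)  # (2, 18, 60, 60, 3)

# Five pooling stages, as in the five conv blocks: 120 -> 60 -> 30 -> 15 -> 7 -> 3.
x = videos
for _ in range(5):
    x = time_distributed(max_pool_2x2, x)
print(x.shape)  # (2, 18, 3, 3, 3)
```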

In [87]:
optimiser = tf.keras.optimizers.Adam(learning_rate=0.0002)  # `lr` is a deprecated alias
model.compile(optimizer=optimiser, loss='categorical_crossentropy', metrics=['categorical_accuracy'])
model.summary()  # summary() prints the table itself and returns None

Model: "sequential_12"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
time_distributed_16 (TimeDis (None, 18, 120, 120, 16)  448       
_________________________________________________________________
time_distributed_17 (TimeDis (None, 18, 120, 120, 16)  64        
_________________________________________________________________
time_distributed_18 (TimeDis (None, 18, 60, 60, 16)    0         
_________________________________________________________________
time_distributed_19 (TimeDis (None, 18, 60, 60, 32)    4640      
_________________________________________________________________
time_distributed_20 (TimeDis (None, 18, 60, 60, 32)    128       
_________________________________________________________________
time_distributed_21 (TimeDis (None, 18, 30, 30, 32)    0         
_________________________________________________________________
time_distributed_22 (TimeDis (None, 18, 30, 30, 64)  

In [88]:
train_generator = generator(train_path, train_doc, batch_size)
val_generator = generator(val_path, val_doc, batch_size)


print("Total Params:", model.count_params())
# model.fit(train_generator, steps_per_epoch=steps_per_epoch, epochs=num_epochs, verbose=1, 
#                     callbacks=callbacks_list, validation_data=val_generator, 
#                     validation_steps=validation_steps, class_weight=None, workers=1, initial_epoch=0)

Total Params: 1657445
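The total of 1,657,445 parameters can be reproduced by hand from the layer definitions. The sketch below does that arithmetic in plain Python; the helper names are hypothetical, and the formulas are the standard ones for these layer types (BatchNormalization counts gamma, beta, moving mean and moving variance, which matches the 64 and 128 in the summary above). Because `TimeDistributed` shares weights across frames, the frame count (18) does not enter the count.

```python
def conv2d_params(k, c_in, c_out):
    # k*k*c_in weights per filter plus one bias per filter
    return k * k * c_in * c_out + c_out


def batchnorm_params(c):
    # gamma, beta, moving mean, moving variance for each channel
    return 4 * c


def lstm_params(input_dim, units):
    # 4 gates, each with input weights, recurrent weights and a bias
    return 4 * (units * (input_dim + units) + units)


def dense_params(n_in, n_out):
    return n_in * n_out + n_out


def cnn_lstm_params(size=120, channels=3, filters=(16, 32, 64, 128, 256),
                    lstm_units=128, dense_units=128, num_classes=5):
    total, c_in = 0, channels
    for f in filters:
        total += conv2d_params(3, c_in, f) + batchnorm_params(f)
        size //= 2      # 'same' conv keeps the size; 2x2 max pooling halves it (floor)
        c_in = f
    flat = size * size * c_in           # 3 * 3 * 256 = 2304 features per frame
    total += lstm_params(flat, lstm_units)
    total += dense_params(lstm_units, dense_units)
    total += dense_params(dense_units, num_classes)
    return total


print(cnn_lstm_params())        # 1657445
print(lstm_params(2304, 128))   # 1245696
```

The LSTM alone accounts for about 75% of the parameters, almost all of it in the input weights over the 2,304-wide flattened frame vector, so the spatial size reaching `Flatten` is the main driver of model size here.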


```
Total Params: 1657445
Epoch 1/20
34/34 [==============================] - 170s 5s/step - loss: 1.4207 - categorical_accuracy: 0.3863 - val_loss: 1.2887 - val_categorical_accuracy: 0.4100

Epoch 2/20
34/34 [==============================] - 91s 3s/step - loss: 1.1351 - categorical_accuracy: 0.5416 - val_loss: 1.1954 - val_categorical_accuracy: 0.5100

Epoch 3/20
34/34 [==============================] - 98s 3s/step - loss: 1.0247 - categorical_accuracy: 0.5887 - val_loss: 1.5070 - val_categorical_accuracy: 0.3900

Epoch 4/20
34/34 [==============================] - 97s 3s/step - loss: 0.9791 - categorical_accuracy: 0.6100 - val_loss: 1.0382 - val_categorical_accuracy: 0.5900

Epoch 5/20
34/34 [==============================] - 98s 3s/step - loss: 0.8307 - categorical_accuracy: 0.6784 - val_loss: 1.0063 - val_categorical_accuracy: 0.5700

Epoch 6/20
34/34 [==============================] - 99s 3s/step - loss: 0.7376 - categorical_accuracy: 0.7086 - val_loss: 1.3649 - val_categorical_accuracy: 0.5500

Epoch 7/20
34/34 [==============================] - 98s 3s/step - loss: 0.7541 - categorical_accuracy: 0.6906 - val_loss: 1.0793 - val_categorical_accuracy: 0.5800

Epoch 8/20
34/34 [==============================] - 97s 3s/step - loss: 0.7483 - categorical_accuracy: 0.7009 - val_loss: 1.6142 - val_categorical_accuracy: 0.4100

Epoch 9/20
34/34 [==============================] - 97s 3s/step - loss: 0.5998 - categorical_accuracy: 0.7656 - val_loss: 2.3569 - val_categorical_accuracy: 0.3600

Epoch 00009: ReduceLROnPlateau reducing learning rate to 0.00020000000949949026.
Epoch 10/20
34/34 [==============================] - 96s 3s/step - loss: 0.5339 - categorical_accuracy: 0.7954 - val_loss: 1.2725 - val_categorical_accuracy: 0.6000

Epoch 11/20
34/34 [==============================] - 97s 3s/step - loss: 0.4418 - categorical_accuracy: 0.8285 - val_loss: 1.0670 - val_categorical_accuracy: 0.6300

Epoch 12/20
34/34 [==============================] - 96s 3s/step - loss: 0.3509 - categorical_accuracy: 0.8620 - val_loss: 0.7843 - val_categorical_accuracy: 0.7000

Epoch 13/20
34/34 [==============================] - 97s 3s/step - loss: 0.3448 - categorical_accuracy: 0.8837 - val_loss: 0.6839 - val_categorical_accuracy: 0.7700

Epoch 14/20
34/34 [==============================] - 97s 3s/step - loss: 0.3480 - categorical_accuracy: 0.8771 - val_loss: 0.6381 - val_categorical_accuracy: 0.7800

Epoch 15/20
34/34 [==============================] - 96s 3s/step - loss: 0.3060 - categorical_accuracy: 0.8970 - val_loss: 0.6599 - val_categorical_accuracy: 0.7700

Epoch 16/20
34/34 [==============================] - 96s 3s/step - loss: 0.2478 - categorical_accuracy: 0.9110 - val_loss: 0.6088 - val_categorical_accuracy: 0.7500

Epoch 17/20
34/34 [==============================] - 96s 3s/step - loss: 0.2468 - categorical_accuracy: 0.9117 - val_loss: 0.5588 - val_categorical_accuracy: 0.7800

Epoch 18/20
34/34 [==============================] - 96s 3s/step - loss: 0.2203 - categorical_accuracy: 0.9191 - val_loss: 0.5375 - val_categorical_accuracy: 0.8100

Epoch 19/20
34/34 [==============================] - 95s 3s/step - loss: 0.2027 - categorical_accuracy: 0.9367 - val_loss: 0.5269 - val_categorical_accuracy: 0.8200

Epoch 20/20
34/34 [==============================] - 96s 3s/step - loss: 0.1924 - categorical_accuracy: 0.9367 - val_loss: 0.4570 - val_categorical_accuracy: 0.8500


```



##### The CNN-LSTM model reaches its best validation accuracy of 85% at epoch 20 (val_loss 0.4570)


## Finally, we go ahead with Model 4: CNN + LSTM

__Reason:__

__- Training accuracy: 93.7%, validation accuracy: 85% (epoch 20)__

__- Number of parameters: 1,657,445__

__- Validation loss falls steadily after ReduceLROnPlateau lowers the learning rate at epoch 9 (from 1.2725 at epoch 10 to 0.4570 at epoch 20)__
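The learning-rate drop in the Model 4 log comes from a `ReduceLROnPlateau` callback, whose configuration is not shown in this part of the notebook. The sketch below is a simplified, hypothetical `PlateauScheduler` that captures the core rule: multiply the learning rate by `factor` once the monitored loss has failed to improve for `patience` epochs. Keras' callback additionally supports `min_delta` and `cooldown`, which are omitted here. The settings used (initial rate 0.001, factor 0.2, patience 4) are assumptions chosen only because they happen to reproduce the single reduction to 0.0002 at epoch 9 in the log; they are not known to be the notebook's values.

```python
class PlateauScheduler:
    """Simplified reduce-on-plateau rule (hypothetical; not the Keras class)."""

    def __init__(self, lr, factor=0.2, patience=2, min_lr=1e-6):
        self.lr = lr
        self.factor = factor
        self.patience = patience
        self.min_lr = min_lr
        self.best = float("inf")
        self.wait = 0  # epochs since the last improvement

    def step(self, val_loss):
        """Update with one epoch's validation loss and return the learning rate to use next."""
        if val_loss < self.best:
            self.best = val_loss
            self.wait = 0
        else:
            self.wait += 1
            if self.wait >= self.patience:
                self.lr = max(self.lr * self.factor, self.min_lr)
                self.wait = 0
        return self.lr


# Validation losses for epochs 1-9 of the Model 4 log.
val_losses = [1.2887, 1.1954, 1.5070, 1.0382, 1.0063, 1.3649, 1.0793, 1.6142, 2.3569]
sched = PlateauScheduler(lr=1e-3, factor=0.2, patience=4)  # illustrative settings
lrs = []
for epoch, loss in enumerate(val_losses, start=1):
    lrs.append(sched.step(loss))
    print(f"epoch {epoch}: val_loss={loss:.4f} lr={lrs[-1]:.6f}")
```

The best loss before the drop is at epoch 5 (1.0063); epochs 6 to 9 do not improve on it, so with a patience of 4 the rate is cut after epoch 9.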

Note: The weights for this model are saved in the final_model.h5 file included in the zip archive.

# Loading the Model and Testing

In [89]:
import tensorflow as tf
model = tf.keras.models.load_model('final_model.h5')



In [90]:
val_generator = generator(val_path, val_doc, batch_size)

batch_data, batch_labels = next(val_generator)

Source path =  /content/gdrive/My Drive/Upgrad/Gesture Recognition/Project_data/val ; batch size = 20


In [91]:
batch_labels

array([[1., 0., 0., 0., 0.],
       [0., 0., 1., 0., 0.],
       [0., 1., 0., 0., 0.],
       [0., 0., 1., 0., 0.],
       [0., 1., 0., 0., 0.],
       [1., 0., 0., 0., 0.],
       [0., 0., 1., 0., 0.],
       [0., 1., 0., 0., 0.],
       [0., 0., 0., 1., 0.],
       [0., 1., 0., 0., 0.],
       [0., 0., 0., 0., 1.],
       [0., 1., 0., 0., 0.],
       [0., 0., 0., 1., 0.],
       [0., 0., 1., 0., 0.],
       [0., 0., 1., 0., 0.],
       [1., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0.],
       [0., 0., 0., 0., 1.],
       [0., 1., 0., 0., 0.],
       [1., 0., 0., 0., 0.]])

In [92]:
print(np.argmax(model.predict(batch_data), axis=1))

[0 2 1 2 1 0 2 1 3 1 2 1 3 2 2 1 1 4 1 0]


### As we can see, the model classifies 18 of the 20 videos in this validation batch correctly (90%); only the 11th and 16th samples are misclassified. So now we can try the model on the test set and check how it would perform in a real-life scenario.
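Rather than comparing the one-hot labels and predicted indices by eye, the batch accuracy can be computed directly. The sketch below rebuilds `batch_labels` from the class indices printed above with `np.eye` and hard-codes the printed predictions so it runs without the model; in the notebook itself, `pred` would be `np.argmax(model.predict(batch_data), axis=1)`. The mapping from class index to gesture name is not shown in this section, so the mismatches are reported by position only.

```python
import numpy as np

# One-hot labels for the validation batch above, rebuilt from their class indices.
true_idx = [0, 2, 1, 2, 1, 0, 2, 1, 3, 1, 4, 1, 3, 2, 2, 0, 1, 4, 1, 0]
batch_labels = np.eye(5)[true_idx]

# Predicted class indices printed by the model for the same batch.
pred = np.array([0, 2, 1, 2, 1, 0, 2, 1, 3, 1, 2, 1, 3, 2, 2, 1, 1, 4, 1, 0])

true = np.argmax(batch_labels, axis=1)      # one-hot -> class index
accuracy = np.mean(pred == true)            # fraction of matching predictions
mismatches = np.flatnonzero(pred != true)   # 0-based positions of errors

print(f"batch accuracy: {accuracy:.0%}")    # batch accuracy: 90%
for i in mismatches:
    print(f"sample {i}: true class {true[i]}, predicted {pred[i]}")
```

A single 20-video batch is a small sample, and the validation set also guided model selection, so this figure is best read as a sanity check before the test-set evaluation.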