
# Gesture Recognition Project
In this group project, we build a 3D convolutional (Conv3D) model, and a Conv2D + LSTM model for comparison, to classify 5 hand gestures from sequences of video frames.

In [None]:
import numpy as np
import os
from imageio import imread
from skimage.transform import resize
import datetime

We set the random seed so that the results don't vary drastically.

In [None]:
np.random.seed(30)
import random as rn
rn.seed(30)
from keras import backend as K
import tensorflow as tf
tf.random.set_seed(30)

In [None]:
## If you are using the data by mounting the google drive, use the following :
from google.colab import drive
drive.mount('/content/gdrive')

##Ref:https://towardsdatascience.com/downloading-datasets-into-google-drive-via-google-colab-bcb1b30b0166

Mounted at /content/gdrive


In this block, we read the folder names for training and validation. We also set the `batch_size` here. We chose the batch size so that the GPU is used at full capacity: we kept increasing it until the machine threw an error.

In [None]:
train_doc = np.random.permutation(open('/content/gdrive/MyDrive/Upgrad/Project_data_gesture/train.csv').readlines())
val_doc = np.random.permutation(open('/content/gdrive/MyDrive/Upgrad/Project_data_gesture/val.csv').readlines())
batch_size = 20
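
Each line of `train.csv` and `val.csv` describes one video. As a minimal sketch, assuming the three `;`-separated fields are folder name, gesture name and integer class label (the generator in this notebook reads fields 0 and 2), a row can be parsed and turned into a one-hot label as shown here. The row itself and the helper `parse_row` are made up for illustration and are not taken from the dataset.

```python
# Hypothetical CSV row: folder name; gesture name; class label (0-4).
# The field layout is an assumption inferred from the indices the generator uses.
sample_row = "example_video_folder;Left_Swipe;0\n"

def parse_row(row, num_classes=5):
    """Return (folder_name, one_hot_label) for one line of the CSV."""
    fields = row.strip().split(';')
    folder_name = fields[0]
    class_index = int(fields[2])
    one_hot = [0.0] * num_classes
    one_hot[class_index] = 1.0
    return folder_name, one_hot

folder_name, one_hot = parse_row(sample_row)
print(folder_name, one_hot)
```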

## Generator
This is one of the most important parts of the code. In the generator, we preprocess the images, which come in 2 different dimensions, and assemble batches of video frames. We have to experiment with `img_idx`, `y`, `z` (the final image height and width) and the normalization to get high accuracy.

In [None]:
def generator(source_path, folder_list, batch_size):
    print('Source path = ', source_path, '; batch size =', batch_size)
    # Indices of the 16 frames (out of 30) that are used from each video
    img_idx = np.round(np.linspace(0, 29, 16)).astype(int)
    while True:
        t = np.random.permutation(folder_list)
        # Step through the shuffled list one batch at a time; the last batch is
        # smaller when len(folder_list) is not a multiple of batch_size
        for start in range(0, len(t), batch_size):
            batch_rows = t[start:start + batch_size]
            # Shape: (videos, frames, height, width, RGB channels)
            batch_data = np.zeros((len(batch_rows), len(img_idx), 120, 120, 3))
            # One-hot representation of the 5 output classes
            batch_labels = np.zeros((len(batch_rows), 5))
            for folder, row in enumerate(batch_rows):
                fields = row.strip().split(';')
                folder_path = source_path + '/' + fields[0]
                # Sort the file names so that img_idx picks frames in temporal
                # order; os.listdir returns entries in arbitrary order
                imgs = sorted(os.listdir(folder_path))
                for idx, item in enumerate(img_idx):
                    image = imread(folder_path + '/' + imgs[item]).astype(np.float32)
                    # The images come in 2 different shapes, and Conv3D throws an
                    # error if the inputs in a batch have different shapes
                    image = resize(image, (120, 120))
                    # Scale the RGB values from [0, 255] to [0, 1]
                    batch_data[folder, idx] = image[:, :, :3] / 255
                batch_labels[folder, int(fields[2])] = 1
            yield batch_data, batch_labels

Note here that a video is represented above in the generator as (number of images, height, width, number of channels). Take this into consideration while creating the model architecture.
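
As a small illustration of the frame sampling and the resulting batch shape, the sketch below recomputes `img_idx` with the same expression as the generator. It assumes each video folder holds 30 frames (indices 0 to 29) and uses the batch size and image size chosen in this notebook.

```python
import numpy as np

# Pick 16 evenly spaced frame indices out of the assumed 30 frames of a video,
# using the same expression as the generator.
img_idx = np.round(np.linspace(0, 29, 16)).astype(int)

batch_size = 20            # value used in this notebook
height, width = 120, 120   # target size passed to resize()

# Shape of one batch: (videos, frames, height, width, channels)
batch_shape = (batch_size, len(img_idx), height, width, 3)

print(img_idx)
print(batch_shape)
```

Changing the third argument of `np.linspace` changes how many frames are fed to the model, and therefore the first dimension of `input_shape` in the model definition.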

In [None]:
curr_dt_time = datetime.datetime.now()
train_path = '/content/gdrive/MyDrive/Upgrad/Project_data_gesture/train'
val_path = '/content/gdrive/MyDrive/Upgrad/Project_data_gesture/val'
num_train_sequences = len(train_doc)
print('# training sequences =', num_train_sequences)
num_val_sequences = len(val_doc)
print('# validation sequences =', num_val_sequences)
num_epochs = 20 # choose the number of epochs
print ('# epochs =', num_epochs)
num_classes = 5

# training sequences = 663
# validation sequences = 100
# epochs = 20


## Model
Here we build the model using the layers that Keras provides. Remember to use `Conv3D` and `MaxPooling3D`, not `Conv2D` and `MaxPooling2D`, for a 3D convolution model. We use `TimeDistributed` when building a Conv2D + RNN model. Also note that the last layer is a softmax. We design the network to give good accuracy with as few parameters as possible so that it can fit in the memory of the webcam.

### Model 1 - CONV3D Model

Convolutional neural networks (CNNs) are a widely used model architecture for image classification tasks. CNNs apply a series of filters to the raw pixel data of an image to extract and learn higher-level features, which the model can then use for classification. A CNN contains three components:

Convolutional layers, which apply a specified number of convolution filters to the image. For each subregion, the layer performs a set of mathematical operations to produce a single value in the output feature map. Convolutional layers then typically apply a ReLU activation function to the output to introduce nonlinearities into the model.

Pooling layers, which downsample the image data extracted by the convolutional layers to reduce the dimensionality of the feature map in order to decrease processing time. A commonly used pooling algorithm is max pooling, which extracts subregions of the feature map (e.g., 2x2-pixel tiles), keeps their maximum value, and discards all other values.

Dense (fully connected) layers, which perform classification on the features extracted by the convolutional layers and downsampled by the pooling layers. In a dense layer, every node in the layer is connected to every node in the preceding layer.
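
To make the ReLU and max pooling steps concrete, here is a minimal pure-Python sketch on a made-up 4x4 feature map. The helper names `relu` and `max_pool_2x2` are illustrative only and are not part of Keras.

```python
def relu(values):
    """Replace negative entries with 0 (element-wise ReLU)."""
    return [[max(0, v) for v in row] for row in values]

def max_pool_2x2(feature_map):
    """Keep the maximum of each non-overlapping 2x2 tile."""
    pooled = []
    for r in range(0, len(feature_map) - 1, 2):
        pooled_row = []
        for c in range(0, len(feature_map[0]) - 1, 2):
            tile = [feature_map[r][c], feature_map[r][c + 1],
                    feature_map[r + 1][c], feature_map[r + 1][c + 1]]
            pooled_row.append(max(tile))
        pooled.append(pooled_row)
    return pooled

# Made-up feature map, standing in for the output of a convolution filter
feature_map = [
    [ 1, -2,  3,  0],
    [-5,  6, -7,  8],
    [ 9, -1,  2, -3],
    [ 0,  4, -6,  5],
]
activated = relu(feature_map)
pooled = max_pool_2x2(activated)
print(pooled)  # [[6, 8], [9, 5]]
```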

In [None]:
from keras.models import Sequential, Model
from keras.layers import Dense, Flatten, TimeDistributed, BatchNormalization, Activation, Dropout
from keras.layers import Conv3D, MaxPooling3D, AveragePooling3D
from keras.layers import Conv2D, MaxPooling2D, AveragePooling2D
from keras.layers import LSTM, GRU, Bidirectional, SimpleRNN, RNN
from keras.callbacks import ModelCheckpoint, ReduceLROnPlateau
from keras import optimizers
from keras.regularizers import l2

filtersize=(3,3,3)
dropout=0.5
dense_neurons=256


model = Sequential()

model.add(Conv3D(16, filtersize, padding='same',input_shape=(16,120,120,3)))
model.add(Activation('relu'))
model.add(BatchNormalization())
        
model.add(Conv3D(16, filtersize, padding='same'))
model.add(Activation('relu'))
model.add(BatchNormalization())
        
model.add(MaxPooling3D(pool_size=(2, 2, 2)))

model.add(Conv3D(32, filtersize, padding='same'))
model.add(Activation('relu'))
model.add(BatchNormalization())
        
model.add(Conv3D(32, filtersize, padding='same'))
model.add(Activation('relu'))
model.add(BatchNormalization())
        
model.add(MaxPooling3D(pool_size=(2, 2, 2)))

model.add(Conv3D(64, filtersize, padding='same'))
model.add(Activation('relu'))
model.add(BatchNormalization())
        
model.add(Conv3D(64, filtersize, padding='same'))
model.add(Activation('relu'))
model.add(BatchNormalization())
        
model.add(MaxPooling3D(pool_size=(2, 2, 2)))

model.add(Conv3D(128, filtersize, padding='same'))
model.add(Activation('relu'))
model.add(BatchNormalization())
        
model.add(Conv3D(128, filtersize, padding='same'))
model.add(Activation('relu'))
model.add(BatchNormalization())
        
model.add(MaxPooling3D(pool_size=(2, 2, 2)))
        

model.add(Flatten())
model.add(Dense(dense_neurons,activation='relu'))
model.add(BatchNormalization())
model.add(Dropout(dropout))

model.add(Dense(dense_neurons,activation='relu'))
model.add(BatchNormalization())
model.add(Dropout(dropout))

model.add(Dense(num_classes,activation='softmax'))
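
As a rough check of the architecture above, the following sketch traces how the (frames, height, width) dimensions shrink through the four `MaxPooling3D` layers. It assumes the default pooling behaviour (stride equal to the pool size, no padding), so each dimension is halved and rounded down; `pool_shape` is a helper written for this illustration.

```python
def pool_shape(shape, pool=(2, 2, 2)):
    """Output (frames, height, width) of a pooling layer with no padding."""
    return tuple(dim // p for dim, p in zip(shape, pool))

shape = (16, 120, 120)               # frames, height, width of the input
filters_per_block = [16, 32, 64, 128]

for filters in filters_per_block:
    # Conv3D with padding='same' keeps the size unchanged;
    # only MaxPooling3D shrinks it.
    shape = pool_shape(shape)
    print(filters, shape)

# Number of values that Flatten() hands to the first Dense layer
flattened_size = shape[0] * shape[1] * shape[2] * filters_per_block[-1]
print(flattened_size)
```

Under these assumptions the temporal dimension is reduced to a single step after the fourth pooling layer, so adding a fifth `(2, 2, 2)` pooling block would not be possible with 16 input frames.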

Now that we have written the model, the next step is to `compile` the model. When we print the `summary` of the model, we'll see the total number of parameters we have to train.

In [None]:
optimiser = tf.keras.optimizers.Adam(learning_rate=0.0002)
model.compile(optimizer=optimiser, loss='categorical_crossentropy', metrics=['categorical_accuracy'])
model.summary()  # summary() prints the table itself and returns None

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv3d (Conv3D)             (None, 16, 120, 120, 16)  1312      
                                                                 
 activation (Activation)     (None, 16, 120, 120, 16)  0         
                                                                 
 batch_normalization (BatchN  (None, 16, 120, 120, 16)  64       
 ormalization)                                                   
                                                                 
 conv3d_1 (Conv3D)           (None, 16, 120, 120, 16)  6928      
                                                                 
 activation_1 (Activation)   (None, 16, 120, 120, 16)  0         
                                                                 
 batch_normalization_1 (Batc  (None, 16, 120, 120, 16)  64       
 hNormalization)                                        
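
The parameter counts in the summary can be checked by hand. The sketch below, using two helper functions written for this illustration, reproduces the values shown for the first layers: a convolution has one weight per input channel, kernel position and filter, plus one bias per filter, and batch normalization keeps four values per channel.

```python
def conv_params(in_channels, filters, kernel_size):
    """Parameters of a convolution layer: weights plus one bias per filter."""
    kernel_volume = 1
    for k in kernel_size:
        kernel_volume *= k
    return in_channels * kernel_volume * filters + filters

def batchnorm_params(channels):
    """gamma, beta, moving mean and moving variance: 4 values per channel."""
    return 4 * channels

print(conv_params(3, 16, (3, 3, 3)))   # first Conv3D layer
print(conv_params(16, 16, (3, 3, 3)))  # second Conv3D layer
print(batchnorm_params(16))            # BatchNormalization after either layer
```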

Let us create the `train_generator` and the `val_generator`, which will be passed to `.fit` rather than `fit_generator`, as the latter is deprecated and will be removed in a future version.

In [None]:
train_generator = generator(train_path, train_doc, batch_size)
val_generator = generator(val_path, val_doc, batch_size)

In [None]:
model_name = 'conv3d_model' + '_' + str(curr_dt_time).replace(' ','').replace(':','_') + '/'
    
if not os.path.exists(model_name):
    os.mkdir(model_name)
        
filepath = model_name + 'model-{epoch:05d}-{loss:.5f}-{categorical_accuracy:.5f}-{val_loss:.5f}-{val_categorical_accuracy:.5f}.h5'

# save_freq='epoch' saves after every epoch (the 'period' argument is deprecated)
checkpoint = ModelCheckpoint(filepath, monitor='val_loss', verbose=1, save_best_only=False, save_weights_only=False, mode='auto', save_freq='epoch')

# Halve the learning rate when val_loss has not improved for 2 epochs
LR = ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=2, cooldown=1, verbose=1)

callbacks_list = [checkpoint, LR]
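
The placeholders in `filepath` are filled in with the epoch number and the logged metrics each time a checkpoint is written. The sketch below mimics this with plain `str.format`; the metric values are those of the first epoch of the Conv3D run in this notebook, and the folder prefix is left out.

```python
# Same template as `filepath`, without the folder prefix.
template = ('model-{epoch:05d}-{loss:.5f}-{categorical_accuracy:.5f}'
            '-{val_loss:.5f}-{val_categorical_accuracy:.5f}.h5')

# Example metric values, taken from the first epoch of the Conv3D run.
example_name = template.format(
    epoch=1,
    loss=2.17538,
    categorical_accuracy=0.32730,
    val_loss=1.85292,
    val_categorical_accuracy=0.21000,
)
print(example_name)  # model-00001-2.17538-0.32730-1.85292-0.21000.h5
```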



The `steps_per_epoch` and `validation_steps` values tell `fit` how many `next()` calls it needs to make on each generator per epoch.

In [None]:
if (num_train_sequences%batch_size) == 0:
    steps_per_epoch = int(num_train_sequences/batch_size)
else:
    steps_per_epoch = (num_train_sequences//batch_size) + 1

if (num_val_sequences%batch_size) == 0:
    validation_steps = int(num_val_sequences/batch_size)
else:
    validation_steps = (num_val_sequences//batch_size) + 1
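
The same step counts can be obtained with a ceiling division. This is only an equivalent sketch of the logic above, using the sequence counts printed earlier in the notebook; `steps_for` is a helper name chosen for illustration.

```python
import math

def steps_for(num_sequences, batch_size):
    """Batches needed to cover every sequence once, including a partial last batch."""
    return math.ceil(num_sequences / batch_size)

print(steps_for(663, 20))  # 34 steps: 33 full batches + 1 batch of 3
print(steps_for(100, 20))  # 5 steps: no partial batch
```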

Let us now fit the model. This will start training the model and with the help of the checkpoints, we'll be able to save the model at the end of each epoch.

In [None]:
model.fit(train_generator, steps_per_epoch=steps_per_epoch, epochs=num_epochs, verbose=1, 
                    callbacks=callbacks_list, validation_data=val_generator, 
                    validation_steps=validation_steps, class_weight=None, workers=1, initial_epoch=0)

Source path =  /content/gdrive/MyDrive/Upgrad/Project_data_gesture/train ; batch size = 20
Epoch 1/20

Epoch 1: saving model to conv3d_model_2022-07-0917_20_58.038923/model-00001-2.17538-0.32730-1.85292-0.21000.h5
Epoch 2/20
Epoch 2: saving model to conv3d_model_2022-07-0917_20_58.038923/model-00002-1.64774-0.41478-2.46013-0.19000.h5
Epoch 3/20
Epoch 3: saving model to conv3d_model_2022-07-0917_20_58.038923/model-00003-1.55353-0.46606-2.98240-0.25000.h5

Epoch 3: ReduceLROnPlateau reducing learning rate to 9.999999747378752e-05.
Epoch 4/20
Epoch 4: saving model to conv3d_model_2022-07-0917_20_58.038923/model-00004-1.38958-0.50226-3.74754-0.21000.h5
Epoch 5/20
Epoch 5: saving model to conv3d_model_2022-07-0917_20_58.038923/model-00005-1.26297-0.52941-4.45188-0.19000.h5

Epoch 5: ReduceLROnPlateau reducing learning rate to 4.999999873689376e-05.
Epoch 6/20
Epoch 6: saving model to conv3d_model_2022-07-0917_20_58.038923/model-00006-1.20993-0.56561-4.78869-0.23000.h5
Epoch 7/20
Epoch 7: sa

<keras.callbacks.History at 0x7f96b3f7f950>
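
The learning-rate reductions at epochs 3 and 5 in the log follow from the `ReduceLROnPlateau` settings (`factor=0.5`, `patience=2`, `cooldown=1`). The sketch below is a simplified re-implementation of the plateau logic written for illustration, not the Keras source; it assumes the default `min_delta` of `1e-4` and uses the validation losses of the first six epochs copied from the checkpoint file names.

```python
def simulate_reduce_lr(val_losses, lr, factor=0.5, patience=2, cooldown=1, min_delta=1e-4):
    """Return (epoch, new_lr) for every epoch at which the rate is reduced."""
    best = float('inf')
    wait = 0
    cooldown_counter = 0
    reductions = []
    for epoch, loss in enumerate(val_losses, start=1):
        if cooldown_counter > 0:
            # Still cooling down after a reduction: do not count this epoch
            cooldown_counter -= 1
            wait = 0
        if loss < best - min_delta:
            best = loss
            wait = 0
        elif cooldown_counter == 0:
            wait += 1
            if wait >= patience:
                lr *= factor
                reductions.append((epoch, lr))
                cooldown_counter = cooldown
                wait = 0
    return reductions

# Validation losses of epochs 1 to 6, copied from the log of the Conv3D run.
val_losses = [1.85292, 2.46013, 2.98240, 3.74754, 4.45188, 4.78869]
reductions = simulate_reduce_lr(val_losses, lr=0.0002)
print(reductions)
```

Because the validation loss never improves on its first-epoch value in this run, the rate is halved as often as the patience and cooldown settings allow.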

### Model 2 - CONV2D + LSTM Model

In [None]:
model_2 = Sequential()

model_2.add(TimeDistributed(Conv2D(16, (3, 3) , padding='same', activation='relu'),input_shape=(16, 120, 120, 3)))
model_2.add(TimeDistributed(BatchNormalization()))
model_2.add(TimeDistributed(MaxPooling2D((2, 2))))
        
model_2.add(TimeDistributed(Conv2D(32, (3, 3) , padding='same', activation='relu')))
model_2.add(TimeDistributed(BatchNormalization()))
model_2.add(TimeDistributed(MaxPooling2D((2, 2))))
        
model_2.add(TimeDistributed(Conv2D(64, (3, 3) , padding='same', activation='relu')))
model_2.add(TimeDistributed(BatchNormalization()))
model_2.add(TimeDistributed(MaxPooling2D((2, 2))))
model_2.add(TimeDistributed(Conv2D(128, (3, 3) , padding='same', activation='relu')))
model_2.add(TimeDistributed(BatchNormalization()))
model_2.add(TimeDistributed(MaxPooling2D((2, 2))))
        
model_2.add(TimeDistributed(Conv2D(256, (3, 3) , padding='same', activation='relu')))
model_2.add(TimeDistributed(BatchNormalization()))
model_2.add(TimeDistributed(MaxPooling2D((2, 2))))
model_2.add(TimeDistributed(Flatten()))
model_2.add(LSTM(256))
model_2.add(Dropout(0.25))
        
model_2.add(Dense(128,activation='relu'))
model_2.add(Dropout(0.25))
model_2.add(Dense(num_classes, activation='softmax'))
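
In this model each frame is reduced to a feature vector before the LSTM sees the sequence. The sketch below estimates the size of that vector, assuming that each `MaxPooling2D((2, 2))` halves height and width and rounds down, and that the `padding='same'` convolutions leave them unchanged.

```python
frames, height, width = 16, 120, 120
conv_filters = [16, 32, 64, 128, 256]

for _ in conv_filters:
    # One Conv2D + BatchNormalization + MaxPooling2D block per entry:
    # only the pooling changes height and width.
    height, width = height // 2, width // 2

# TimeDistributed(Flatten()) turns each frame into one feature vector
features_per_frame = height * width * conv_filters[-1]
lstm_input_shape = (frames, features_per_frame)
print(lstm_input_shape)
```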

In [None]:
optimiser = tf.keras.optimizers.Adam(learning_rate=0.0002)
model_2.compile(optimizer=optimiser, loss='categorical_crossentropy', metrics=['categorical_accuracy'])
model_2.summary()  # summary() prints the table itself and returns None

Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 time_distributed (TimeDistr  (None, 16, 120, 120, 16)  448      
 ibuted)                                                         
                                                                 
 time_distributed_1 (TimeDis  (None, 16, 120, 120, 16)  64       
 tributed)                                                       
                                                                 
 time_distributed_2 (TimeDis  (None, 16, 60, 60, 16)   0         
 tributed)                                                       
                                                                 
 time_distributed_3 (TimeDis  (None, 16, 60, 60, 32)   4640      
 tributed)                                                       
                                                                 
 time_distributed_4 (TimeDis  (None, 16, 60, 60, 32)  

Let us create the `train_generator` and the `val_generator`, which will be passed to `.fit` rather than `fit_generator`, as the latter is deprecated and will be removed in a future version.

In [None]:
train_generator = generator(train_path, train_doc, batch_size)
val_generator = generator(val_path, val_doc, batch_size)

In [None]:
model_name = 'conv2d+lstm_model' + '_' + str(curr_dt_time).replace(' ','').replace(':','_') + '/'
    
if not os.path.exists(model_name):
    os.mkdir(model_name)
        
filepath = model_name + 'model-{epoch:05d}-{loss:.5f}-{categorical_accuracy:.5f}-{val_loss:.5f}-{val_categorical_accuracy:.5f}.h5'


# save_freq='epoch' checks after every epoch (the 'period' argument is deprecated)
checkpoint = ModelCheckpoint(filepath, monitor='val_loss', verbose=1, save_best_only=True, save_weights_only=False, mode='auto', save_freq='epoch')

LR = ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=2, cooldown=1, verbose=1)
        
callbacks_list = [checkpoint, LR]



In [None]:
if (num_train_sequences%batch_size) == 0:
    steps_per_epoch = int(num_train_sequences/batch_size)
else:
    steps_per_epoch = (num_train_sequences//batch_size) + 1

if (num_val_sequences%batch_size) == 0:
    validation_steps = int(num_val_sequences/batch_size)
else:
    validation_steps = (num_val_sequences//batch_size) + 1

Let us now fit the model. This will start training the model, and because the checkpoint uses `save_best_only=True`, the model is saved only at the end of epochs in which the validation loss improves.

In [None]:
model_2.fit(train_generator, steps_per_epoch=steps_per_epoch, epochs=num_epochs, verbose=1, 
                    callbacks=callbacks_list, validation_data=val_generator, 
                    validation_steps=validation_steps, class_weight=None, workers=1, initial_epoch=0)

Source path =  /content/gdrive/MyDrive/Upgrad/Project_data_gesture/train ; batch size = 20
Epoch 1/20

Epoch 1: val_loss improved from inf to 1.70039, saving model to conv2d+lstm_model_2022-07-0917_20_58.038923/model-00001-1.44491-0.36048-1.70039-0.18000.h5
Epoch 2/20
Epoch 2: val_loss improved from 1.70039 to 1.65921, saving model to conv2d+lstm_model_2022-07-0917_20_58.038923/model-00002-0.99871-0.61086-1.65921-0.21000.h5
Epoch 3/20
Epoch 3: val_loss did not improve from 1.65921
Epoch 4/20
Epoch 4: val_loss did not improve from 1.65921

Epoch 4: ReduceLROnPlateau reducing learning rate to 9.999999747378752e-05.
Epoch 5/20
Epoch 5: val_loss did not improve from 1.65921
Epoch 6/20
Epoch 6: val_loss did not improve from 1.65921

Epoch 6: ReduceLROnPlateau reducing learning rate to 4.999999873689376e-05.
Epoch 7/20
Epoch 7: val_loss did not improve from 1.65921
Epoch 8/20
Epoch 8: val_loss did not improve from 1.65921

Epoch 8: ReduceLROnPlateau reducing learning rate to 2.49999993684468

<keras.callbacks.History at 0x7f96ab9e8550>

### Conclusion
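Based on the accuracies reported at the end of this notebook, the models classify the hand gestures with a validation categorical accuracy of about 59% to 62%. Training accuracy reaches as high as 98.34%, so the gap between training and validation accuracy points to overfitting rather than to reliable (>95%) classification of unseen gestures.

The accuracy of our models is directly influenced by choices made earlier in the notebook, such as which frames are sampled (`img_idx`), the input image size, the normalization and the batch size.

Another approach to this problem would be to use feature engineering, such as binary thresholding (to check the area of the hand), circle detection and other techniques that detect unique characteristics in the images. With our CNN approach, however, the model learns its features directly from the frames, so we don't have to design any of these by hand.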

# Accuracies Obtained

### CONV3D Model Accuracy

#### Categorical accuracy: 65.46
#### Validation categorical accuracy: 59.00

### CONV2D + LSTM Model Accuracy

#### Categorical accuracy: 98.34
#### Validation categorical accuracy: 62.00
