# Gesture Recognition
In this group project, you will build a 3D convolutional (Conv3D) model that classifies five hand gestures. Start by importing the following libraries.

In [1]:
import numpy as np
import os
from skimage.transform import resize
import imageio.v2 as imageio
import datetime

We set the random seed so that the results don't vary drastically.

In [2]:
np.random.seed(30)
import random as rn
rn.seed(30)
from keras import backend as K
import tensorflow as tf
tf.random.set_seed(30)
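
As a small illustration of why the seeds are fixed, the sketch below shuffles a list twice with the same seed and gets the same order both times. It uses only Python's standard `random` module; the helper name `shuffled_ids` is made up for this example and is not part of the project code.

```python
import random

def shuffled_ids(seed, n=10):
    """Return a shuffled list of n integer IDs using a dedicated, seeded RNG."""
    rng = random.Random(seed)
    ids = list(range(n))
    rng.shuffle(ids)
    return ids

first = shuffled_ids(30)
second = shuffled_ids(30)
print(first == second)  # the same seed reproduces the same order
```

Seeding NumPy, `random` and TensorFlow in the same way narrows run-to-run variation, although some GPU operations may still be non-deterministic.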

In this block, you read the folder names for training and validation. You also set the `batch_size` here. Note that you set the batch size in such a way that you are able to use the GPU in full capacity. You keep increasing the batch size until the machine throws an error.

In [3]:
train_doc = np.random.permutation(open('Project_data/train.csv').readlines())
val_doc = np.random.permutation(open('Project_data/val.csv').readlines())
batch_size = 32

##### Running on a local CPU, so the batch size is fixed at 32.
- Full batches per pass over the 663 training videos: 663 // 32 = 20
- Videos in the final partial batch (left over after the full batches): 663 - 640 = 23

In [4]:
# Shuffle the order of Videos (not frames)

t = np.random.permutation(train_doc)
num_batches = len(t) // batch_size
final_batch = len(t)%batch_size
print(len(t))   #  Sequence - # of videos
print(num_batches) 
print(final_batch)

663
20
23


### Input frames come in two resolutions (360x360 and 120x160) and are resized to 84x84

## Generator
This is one of the most important parts of the code. The overall structure of the generator has been given. In the generator, you preprocess the images, which come in two different dimensions, and assemble batches of video frames. Experiment with parts of the generator function to get high accuracy.
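
To make the two-step crop used by the generators easier to follow, here is a minimal sketch that computes which original pixel columns survive before the resize to 84x84. The function name `net_crop_columns` is hypothetical; it only mirrors the slicing logic (`image[0:120, 10:150]` followed by `image[:, 10:130, :]`) and assumes the two frame sizes named above.

```python
def net_crop_columns(height, width):
    """Return (start, stop) column bounds, in original pixel coordinates,
    that survive the two-step crop before the resize to 84x84."""
    start, stop = 0, width
    if height != width:
        # First crop: keep columns 10..149 (width 140 for a 160-wide frame).
        start, stop = 10, min(150, width)
    if stop - start == 140:
        # Second crop: keep columns 10..129 of the already cropped frame.
        start, stop = start + 10, start + 130
    return start, stop

print(net_crop_columns(120, 160))  # (20, 140)
print(net_crop_columns(360, 360))  # (0, 360)
```

So a 120x160 frame contributes its central 120x120 region, while a 360x360 frame is resized whole.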

### Training on smaller subset of data

In [5]:
def generator_8(source_path, folder_list, batch_size):
    print( 'Source path = ', source_path, '; batch size =', batch_size)

    # Frame indices sampled from each video; using more frames lengthens training.
    img_idx = [1, 5, 10, 15, 20, 22, 25, 29]

# generate data till the last epoch
    while True:
        t = np.random.permutation(folder_list)
        num_batches = int(len(t)/batch_size)
        for batch in range(num_batches):

            # batch_data holds one batch of videos: (batch, frames, height, width, channels)
            batch_data = np.zeros((batch_size,8,84,84,3))

            # batch_labels holds the one-hot class of each video in batch_data
            batch_labels = np.zeros((batch_size,5))
            for folder in range(batch_size):
                
                # List the image file names in the current folder; sorting keeps the
                # frames in temporal order (assumes file names sort chronologically)
                imgs = sorted(os.listdir(source_path+'/'+ t[folder + (batch*batch_size)].strip().split(';')[0]))
                for idx,item in enumerate(img_idx):
                    image = imageio.imread(source_path+'/'+ t[folder + (batch*batch_size)].strip().split(';')[0]+'/'+imgs[item]).astype(np.float32)

                    # crop: only the non-square (120x160) images are cropped, to 120x140
                    if image.shape[0] != image.shape[1]:
                        image = image[0:120, 10:150]

                    # resize: from the 120x140 images take the central 120x120 pixels,
                    # then resize every image to 84x84

                    if image.shape[1] == 140:  
                        image = resize(image[:,10:130,:],(84,84)).astype(np.float32)
                    else:
                        image = resize(image,(84,84)).astype(np.float32)

                    # Center each channel by subtracting a fixed per-channel offset
                    batch_data[folder,idx,:,:,0] = image[:,:,0] - 104
                    batch_data[folder,idx,:,:,1] = image[:,:,1] - 117
                    batch_data[folder,idx,:,:,2] = image[:,:,2] - 123

                # one-hot encode the class label, e.g. [0. 1. 0. 0. 0.]
                batch_labels[folder, int(t[folder + (batch*batch_size)].strip().split(';')[2])] = 1
            yield batch_data, batch_labels

# Final batch with left over videos after processing all the FULL batches

        if (len(t)%batch_size) != 0:
            batch_data = np.zeros((len(t)%batch_size,8,84,84,3))
            batch_labels = np.zeros((len(t)%batch_size,5))
            for folder in range(len(t)%batch_size):
                imgs = sorted(os.listdir(source_path+'/'+ t[folder + (num_batches*batch_size)].strip().split(';')[0]))
                for idx,item in enumerate(img_idx):
                    image = imageio.imread(source_path+'/'+ t[folder + (num_batches*batch_size)].strip().split(';')[0]+'/'+imgs[item]).astype(np.float32)

                    if image.shape[0] != image.shape[1]:
                        image = image[0:120, 10:150]

                    if image.shape[1] == 140:
                        image = resize(image[:,10:130,:],(84,84)).astype(np.float32)
                    else:
                        image = resize(image,(84,84)).astype(np.float32)

                    batch_data[folder,idx,:,:,0] = image[:,:,0] - 104
                    batch_data[folder,idx,:,:,1] = image[:,:,1] - 117
                    batch_data[folder,idx,:,:,2] = image[:,:,2] - 123

                batch_labels[folder, int(t[folder + (num_batches*batch_size)].strip().split(';')[2])] = 1

            yield batch_data, batch_labels
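
The label handling in the generator can be sketched on its own. The snippet below assumes, as the generator's `split(';')` calls imply, that each CSV row has the folder name in field 0 and the integer class in field 2; the row used here and the helper name `parse_row` are made up for illustration.

```python
NUM_CLASSES = 5

def parse_row(line, num_classes=NUM_CLASSES):
    """Split one 'folder;name;label' row into a folder name and a one-hot label."""
    fields = line.strip().split(';')
    folder, label = fields[0], int(fields[2])
    one_hot = [0.0] * num_classes
    one_hot[label] = 1.0
    return folder, one_hot

# Hypothetical row; real folder names come from train.csv / val.csv.
folder, one_hot = parse_row('example_video_folder;example_gesture;1\n')
print(folder)   # example_video_folder
print(one_hot)  # [0.0, 1.0, 0.0, 0.0, 0.0]
```

Stripping the line before splitting removes the trailing newline that `readlines()` leaves on every row.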

In [6]:
# Fetch date/time to create a folder to save h5 files which can later be loaded to test the model performance

curr_dt_time = datetime.datetime.now()

train_path = 'Project_data/train'
val_path = 'Project_data/val'
source_path=train_path

num_train_sequences = len(train_doc)
print('# training sequences =', num_train_sequences)
num_val_sequences = len(val_doc)
print('# validation sequences =', num_val_sequences)
num_epochs = 10
print ('# epochs =', num_epochs)

# training sequences = 663
# validation sequences = 100
# epochs = 10


## Conv3D Model
Here you build the model using the layers that Keras provides. Remember to use `Conv3D` and `MaxPooling3D`, not `Conv2D` and `MaxPooling2D`. The last layer is a softmax. Keep the network small enough that the model fits in the memory of the webcam.
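
Because memory use depends on the feature-map sizes, it can help to trace the shapes by hand before building the network. The sketch below does this for the four pooling layers in the next cell, assuming the usual pooling conventions (floor division for `'valid'`, ceiling for `'same'`) and that the `padding='same'` convolutions leave the size unchanged. `pool_output` is a hypothetical helper, not a Keras function.

```python
import math

def pool_output(shape, pool, strides, padding='valid'):
    """Output size per axis of a pooling layer, using the usual
    'valid' (floor) and 'same' (ceil) conventions."""
    if padding == 'same':
        return tuple(math.ceil(n / s) for n, s in zip(shape, strides))
    return tuple((n - p) // s + 1 for n, p, s in zip(shape, pool, strides))

shape = (8, 84, 84)  # (frames, height, width)
pools = [
    ((2, 2, 1), 'valid'),
    ((2, 2, 2), 'valid'),
    ((2, 2, 2), 'valid'),
    ((2, 2, 2), 'same'),
]
trace = []
for pool, padding in pools:
    shape = pool_output(shape, pool, pool, padding)  # strides equal the pool size
    trace.append(shape)
    print(shape)

flattened = shape[0] * shape[1] * shape[2] * 256  # 256 filters in the last Conv3D
print(flattened)
```

The first traced shape, `(4, 42, 84)`, agrees with the `max_pooling3d` row of the model summary. It also shows that `pool_size=(2,2,1)` halves the frame and height axes while leaving the width at 84.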

In [7]:
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten, BatchNormalization, Activation
from keras.layers import Conv3D, MaxPooling3D
from keras.callbacks import ModelCheckpoint, ReduceLROnPlateau
from keras import optimizers

model = Sequential()

# Hidden Layer 1
model.add(Conv3D(64, (3,3,3), strides=(1,1,1), padding='same', input_shape=(8,84,84,3)))
model.add(BatchNormalization())
model.add(Activation('elu'))
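# Note: the input is (frames, height, width, channels), so pool_size=(2,2,1) halves frames and height but not width
model.add(MaxPooling3D(pool_size=(2,2,1), strides=(2,2,1)))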

# Hidden Layer 2
model.add(Conv3D(128, (3,3,3), strides=(1,1,1), padding='same'))
model.add(BatchNormalization())
model.add(Activation('elu'))
model.add(MaxPooling3D(pool_size=(2,2,2), strides=(2,2,2)))

model.add(Dropout(0.25))

# Hidden Layer 3
model.add(Conv3D(256, (3,3,3), strides=(1,1,1), padding='same'))
model.add(BatchNormalization())
model.add(Activation('elu'))
model.add(MaxPooling3D(pool_size=(2,2,2), strides=(2,2,2)))

model.add(Dropout(0.25))

# Hidden Layer 4
model.add(Conv3D(256, (3,3,3), strides=(1,1,1), padding='same'))
model.add(BatchNormalization())
model.add(Activation('elu'))
model.add(MaxPooling3D(pool_size=(2,2,2), strides=(2,2,2), padding='same'))


model.add(Flatten())
model.add(Dropout(0.5))

model.add(Dense(512, activation='elu'))
model.add(Dropout(0.5))

# Dense to the 5 gesture classes
model.add(Dense(5, activation='softmax'))

Now that you have written the model, the next step is to `compile` the model. When you print the `summary` of the model, you'll see the total number of parameters you have to train.
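
The per-layer parameter counts in the summary can be checked by hand. The sketch below uses the standard formulas (kernel weights plus one bias per filter for a convolution, four values per channel for batch normalization); the helper names are made up for this example.

```python
def conv3d_params(kernel, in_channels, filters):
    """Weights plus one bias per filter for a Conv3D layer."""
    kd, kh, kw = kernel
    return (kd * kh * kw * in_channels + 1) * filters

def batchnorm_params(channels):
    """gamma, beta, moving mean and moving variance for each channel."""
    return 4 * channels

print(conv3d_params((3, 3, 3), 3, 64))    # 5248
print(batchnorm_params(64))               # 256
print(conv3d_params((3, 3, 3), 64, 128))  # 221312
```

These values match the `conv3d`, `batch_normalization` and `conv3d_1` rows printed by `model.summary()`.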

In [8]:
# SGD with Nesterov momentum and an initial learning rate of 0.01

sgd = optimizers.SGD(learning_rate=0.01, decay=1e-6, momentum=0.7, nesterov=True)

#Compile Model

model.compile(optimizer=sgd, loss='categorical_crossentropy', metrics=['categorical_accuracy'])

print (model.summary())

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv3d (Conv3D)             (None, 8, 84, 84, 64)     5248      
                                                                 
 batch_normalization (BatchN  (None, 8, 84, 84, 64)    256       
 ormalization)                                                   
                                                                 
 activation (Activation)     (None, 8, 84, 84, 64)     0         
                                                                 
 max_pooling3d (MaxPooling3D  (None, 4, 42, 84, 64)    0         
 )                                                               
                                                                 
 conv3d_1 (Conv3D)           (None, 4, 42, 84, 128)    221312    
                                                                 
 batch_normalization_1 (Batc  (None, 4, 42, 84, 128)   5


Let us create the `train_generator` and the `val_generator` which will be used in `.fit_generator`.

In [9]:
train_generator = generator_8(train_path, train_doc, batch_size)
val_generator = generator_8(val_path, val_doc, batch_size)

In [10]:
# Create folder to save H5 files generated for each epoch (save_best_only=False)

model_name = 'model_init' + '_' + str(curr_dt_time).replace(' ','').replace(':','_') + '/'
    
if not os.path.exists(model_name):
    os.mkdir(model_name)

# Save H5 file with loss, acc, val loss, val acc metrics

filepath = model_name + 'model-{epoch:05d}-{loss:.5f}-{categorical_accuracy:.5f}-{val_loss:.5f}-{val_categorical_accuracy:.5f}.h5'

# check point : save H5 file after each epoch

checkpoint = ModelCheckpoint(filepath, monitor='val_loss', verbose=1, save_best_only=False, save_weights_only=False, mode='auto', period=1)

# Halve the learning rate when val_loss plateaus

LR = ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=2, verbose=1, mode='min', min_delta=0.001, cooldown=0, min_lr=0.0001)
callbacks_list = [checkpoint, LR]
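
To get a feel for what the plateau callback does with `factor=0.5`, `patience=2` and `min_lr=0.0001`, here is a simplified pure-Python simulation. It is a sketch of the general idea with no cooldown, not the Keras implementation, and the validation losses fed to it are invented for illustration.

```python
def simulate_reduce_lr(val_losses, lr=0.01, factor=0.5, patience=2,
                       min_delta=0.001, min_lr=0.0001):
    """Return the learning rate in effect after each epoch."""
    best = float('inf')
    wait = 0
    history = []
    for loss in val_losses:
        if loss < best - min_delta:
            best, wait = loss, 0  # improvement: remember it and reset the counter
        else:
            wait += 1
            if wait >= patience and lr > min_lr:
                lr = max(lr * factor, min_lr)  # reduce, but never below min_lr
                wait = 0
        history.append(lr)
    return history

# Hypothetical validation losses that stall after the second epoch.
schedule = simulate_reduce_lr([1.0, 0.9, 0.95, 0.93, 0.92, 0.91, 0.95])
print(schedule)
```

In this made-up run the rate is halved after the fourth epoch and again after the sixth, because neither stretch improves on the best loss of 0.9 by at least `min_delta`.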



In [11]:
# Training steps per epoch: add one step when videos are left over after the full batches

if (num_train_sequences%batch_size) == 0:
    steps_per_epoch = int(num_train_sequences/batch_size)
else:
    steps_per_epoch = (num_train_sequences//batch_size) + 1

# Validation Epoch - # of steps
    
if (num_val_sequences%batch_size) == 0:
    validation_steps = int(num_val_sequences/batch_size)
else:
    validation_steps = (num_val_sequences//batch_size) + 1
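
The two `if`/`else` blocks above amount to a ceiling division. A minimal equivalent sketch, with the hypothetical helper name `steps_for`:

```python
import math

def steps_for(num_sequences, batch_size):
    """Number of generator batches needed to cover every sequence once."""
    return math.ceil(num_sequences / batch_size)

print(steps_for(663, 32))  # 21
print(steps_for(100, 32))  # 4
print(steps_for(640, 32))  # 20
```

With 663 training and 100 validation sequences and a batch size of 32, this gives 21 training steps and 4 validation steps per epoch.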

Let us now fit the model. This will start training the model and with the help of the checkpoints, you'll be able to save the model at the end of each epoch.

In [12]:
model.fit_generator(train_generator, steps_per_epoch=steps_per_epoch, epochs=num_epochs, verbose=1, 
                    callbacks=callbacks_list, 
                    validation_data=val_generator, 
                    validation_steps=validation_steps, class_weight=None, workers=1, initial_epoch=0
                   )


Source path =  Project_data/train ; batch size = 32
Epoch 1/10

Epoch 1: saving model to model_init_2023-04-1012_52_02.259878\model-00001-6.14393-0.31373-89.40958-0.23000.h5
Epoch 2/10
Epoch 2: saving model to model_init_2023-04-1012_52_02.259878\model-00002-1.75390-0.45249-15.91531-0.22000.h5
Epoch 3/10
Epoch 3: saving model to model_init_2023-04-1012_52_02.259878\model-00003-1.68761-0.44495-17.80415-0.21000.h5
Epoch 4/10
Epoch 4: saving model to model_init_2023-04-1012_52_02.259878\model-00004-1.34469-0.53092-5.56814-0.26000.h5
Epoch 5/10
Epoch 5: saving model to model_init_2023-04-1012_52_02.259878\model-00005-1.36958-0.51885-3.72721-0.32000.h5
Epoch 6/10
Epoch 6: saving model to model_init_2023-04-1012_52_02.259878\model-00006-1.12376-0.61991-2.34655-0.41000.h5
Epoch 7/10
Epoch 7: saving model to model_init_2023-04-1012_52_02.259878\model-00007-0.95341-0.65460-1.59084-0.49000.h5
Epoch 8/10
Epoch 8: saving model to model_init_2023-04-1012_52_02.259878\model-00008-0.99232-0.67119-1.0

<keras.callbacks.History at 0x1b3d439e640>
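
The checkpoint names in the log follow directly from the `filepath` template and the timestamped folder name. The sketch below reproduces the first epoch's file name with plain `str.format`, using the metric values and timestamp read off the log above.

```python
import datetime

template = ('model-{epoch:05d}-{loss:.5f}-{categorical_accuracy:.5f}'
            '-{val_loss:.5f}-{val_categorical_accuracy:.5f}.h5')

# Values copied from the first epoch in the log.
name = template.format(epoch=1, loss=6.14393, categorical_accuracy=0.31373,
                       val_loss=89.40958, val_categorical_accuracy=0.23)
print(name)  # model-00001-6.14393-0.31373-89.40958-0.23000.h5

# Timestamp read off the folder name in the log.
stamp = datetime.datetime(2023, 4, 10, 12, 52, 2, 259878)
folder = 'model_init' + '_' + str(stamp).replace(' ', '').replace(':', '_') + '/'
print(folder)  # model_init_2023-04-1012_52_02.259878/
```

Reading the fields in order (epoch, loss, accuracy, validation loss, validation accuracy) makes it easy to pick a checkpoint by file name alone.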

### Use more frames per video (8 to 18)

In [13]:
def generator_18(source_path, folder_list, batch_size):
    print( 'Source path = ', source_path, '; batch size =', batch_size)

    # Frame indices sampled from each video; using more frames lengthens training.
    img_idx = [0,1,2,4,6,8,10,12,14,16,18,20,22,24,26,27,28,29]

# generate data till the last epoch
    while True:
        t = np.random.permutation(folder_list)
        num_batches = int(len(t)/batch_size)
        for batch in range(num_batches):

            # batch_data holds one batch of videos: (batch, frames, height, width, channels)
            batch_data = np.zeros((batch_size,18,84,84,3))

            # batch_labels holds the one-hot class of each video in batch_data
            batch_labels = np.zeros((batch_size,5))
            for folder in range(batch_size):
                
                # List the image file names in the current folder; sorting keeps the
                # frames in temporal order (assumes file names sort chronologically)
                imgs = sorted(os.listdir(source_path+'/'+ t[folder + (batch*batch_size)].strip().split(';')[0]))
                for idx,item in enumerate(img_idx):
                    image = imageio.imread(source_path+'/'+ t[folder + (batch*batch_size)].strip().split(';')[0]+'/'+imgs[item]).astype(np.float32)

                    # crop: only the non-square (120x160) images are cropped, to 120x140
                    if image.shape[0] != image.shape[1]:
                        image = image[0:120, 10:150]

                    # resize: from the 120x140 images take the central 120x120 pixels,
                    # then resize every image to 84x84

                    if image.shape[1] == 140:  
                        image = resize(image[:,10:130,:],(84,84)).astype(np.float32)
                    else:
                        image = resize(image,(84,84)).astype(np.float32)

                    # Center each channel by subtracting a fixed per-channel offset
                    batch_data[folder,idx,:,:,0] = image[:,:,0] - 104
                    batch_data[folder,idx,:,:,1] = image[:,:,1] - 117
                    batch_data[folder,idx,:,:,2] = image[:,:,2] - 123

                # one-hot encode the class label, e.g. [0. 1. 0. 0. 0.]
                batch_labels[folder, int(t[folder + (batch*batch_size)].strip().split(';')[2])] = 1
            yield batch_data, batch_labels

# Final batch with left over videos after processing all the FULL batches

        if (len(t)%batch_size) != 0:
            batch_data = np.zeros((len(t)%batch_size,18,84,84,3))
            batch_labels = np.zeros((len(t)%batch_size,5))
            for folder in range(len(t)%batch_size):
                imgs = sorted(os.listdir(source_path+'/'+ t[folder + (num_batches*batch_size)].strip().split(';')[0]))
                for idx,item in enumerate(img_idx):
                    image = imageio.imread(source_path+'/'+ t[folder + (num_batches*batch_size)].strip().split(';')[0]+'/'+imgs[item]).astype(np.float32)

                    if image.shape[0] != image.shape[1]:
                        image = image[0:120, 10:150]

                    if image.shape[1] == 140:
                        image = resize(image[:,10:130,:],(84,84)).astype(np.float32)
                    else:
                        image = resize(image,(84,84)).astype(np.float32)

                    batch_data[folder,idx,:,:,0] = image[:,:,0] - 104
                    batch_data[folder,idx,:,:,1] = image[:,:,1] - 117
                    batch_data[folder,idx,:,:,2] = image[:,:,2] - 123

                batch_labels[folder, int(t[folder + (num_batches*batch_size)].strip().split(';')[2])] = 1

            yield batch_data, batch_labels
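
The `img_idx` lists in both generators are hand-picked. As one possible alternative, not what the notebook uses, the indices can be generated so that they are evenly spread across the video. The sketch assumes 30 frames per video (the largest index used above is 29), and `evenly_spaced_indices` is a hypothetical helper.

```python
def evenly_spaced_indices(num_frames, total_frames=30):
    """Pick num_frames indices spread across 0..total_frames-1, endpoints included."""
    if num_frames == 1:
        return [0]
    return [i * (total_frames - 1) // (num_frames - 1) for i in range(num_frames)]

print(evenly_spaced_indices(8))        # [0, 4, 8, 12, 16, 20, 24, 29]
print(len(evenly_spaced_indices(18)))  # 18
```

Generating the list this way makes the frame count a single parameter, so the generator's buffer shape and the model's `input_shape` can be derived from the same number.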

In [14]:
# Fetch date/time to create a folder to save h5 files which can later be loaded to test the model performance

curr_dt_time = datetime.datetime.now()

train_path = 'Project_data/train'
val_path = 'Project_data/val'
source_path=train_path

num_train_sequences = len(train_doc)
print('# training sequences =', num_train_sequences)
num_val_sequences = len(val_doc)
print('# validation sequences =', num_val_sequences)
num_epochs = 10
print ('# epochs =', num_epochs)

# training sequences = 663
# validation sequences = 100
# epochs = 10


In [15]:
train_generator = generator_18(train_path, train_doc, batch_size)
val_generator = generator_18(val_path, val_doc, batch_size)

In [16]:
# Create folder to save H5 files generated for each epoch (save_best_only=False)

model_name = 'model_init' + '_' + str(curr_dt_time).replace(' ','').replace(':','_') + '/'
    
if not os.path.exists(model_name):
    os.mkdir(model_name)

# Save H5 file with loss, acc, val loss, val acc metrics

filepath = model_name + 'model-{epoch:05d}-{loss:.5f}-{categorical_accuracy:.5f}-{val_loss:.5f}-{val_categorical_accuracy:.5f}.h5'

# check point : save H5 file after each epoch

checkpoint = ModelCheckpoint(filepath, monitor='val_loss', verbose=1, save_best_only=False, save_weights_only=False, mode='auto', period=1)

# Halve the learning rate when val_loss plateaus

LR = ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=2, verbose=1, mode='min', min_delta=0.001, cooldown=0, min_lr=0.0001)
callbacks_list = [checkpoint, LR]



In [17]:
model = Sequential()

# Hidden Layer 1
model.add(Conv3D(64, (3,3,3), strides=(1,1,1), padding='same', input_shape=(18,84,84,3)))
model.add(BatchNormalization())
model.add(Activation('elu'))
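# Note: the input is (frames, height, width, channels), so pool_size=(2,2,1) halves frames and height but not width
model.add(MaxPooling3D(pool_size=(2,2,1), strides=(2,2,1)))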

# Hidden Layer 2
model.add(Conv3D(128, (3,3,3), strides=(1,1,1), padding='same'))
model.add(BatchNormalization())
model.add(Activation('elu'))
model.add(MaxPooling3D(pool_size=(2,2,2), strides=(2,2,2)))

model.add(Dropout(0.25))

# Hidden Layer 3
model.add(Conv3D(256, (3,3,3), strides=(1,1,1), padding='same'))
model.add(BatchNormalization())
model.add(Activation('elu'))
model.add(MaxPooling3D(pool_size=(2,2,2), strides=(2,2,2)))

model.add(Dropout(0.25))

# Hidden Layer 4
model.add(Conv3D(256, (3,3,3), strides=(1,1,1), padding='same'))
model.add(BatchNormalization())
model.add(Activation('elu'))
model.add(MaxPooling3D(pool_size=(2,2,2), strides=(2,2,2), padding='same'))


model.add(Flatten())
model.add(Dropout(0.5))

model.add(Dense(512, activation='elu'))
model.add(Dropout(0.5))

# Dense to the 5 gesture classes
model.add(Dense(5, activation='softmax'))

In [18]:
# SGD with Nesterov momentum and an initial learning rate of 0.01

sgd = optimizers.SGD(learning_rate=0.01, decay=1e-6, momentum=0.7, nesterov=True)

#Compile Model

model.compile(optimizer=sgd, loss='categorical_crossentropy', metrics=['categorical_accuracy'])

print (model.summary())

Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv3d_4 (Conv3D)           (None, 18, 84, 84, 64)    5248      
                                                                 
 batch_normalization_4 (Batc  (None, 18, 84, 84, 64)   256       
 hNormalization)                                                 
                                                                 
 activation_4 (Activation)   (None, 18, 84, 84, 64)    0         
                                                                 
 max_pooling3d_4 (MaxPooling  (None, 9, 42, 84, 64)    0         
 3D)                                                             
                                                                 
 conv3d_5 (Conv3D)           (None, 9, 42, 84, 128)    221312    
                                                                 
 batch_normalization_5 (Batc  (None, 9, 42, 84, 128)  

In [19]:
model.fit_generator(train_generator, steps_per_epoch=steps_per_epoch, epochs=num_epochs, verbose=1, 
                    callbacks=callbacks_list, 
                    validation_data=val_generator, 
                    validation_steps=validation_steps, class_weight=None, workers=1, initial_epoch=0
                   )

Source path =  Project_data/train ; batch size = 32



Epoch 1/10

Epoch 1: saving model to model_init_2023-04-1015_39_45.782359\model-00001-10.88193-0.27149-36.66798-0.18000.h5
Epoch 2/10
Epoch 2: saving model to model_init_2023-04-1015_39_45.782359\model-00002-2.33492-0.35445-37.17839-0.18000.h5
Epoch 3/10
Epoch 3: saving model to model_init_2023-04-1015_39_45.782359\model-00003-2.24298-0.37707-18.78424-0.24000.h5
Epoch 4/10
Epoch 4: saving model to model_init_2023-04-1015_39_45.782359\model-00004-1.88303-0.44193-7.50042-0.23000.h5
Epoch 5/10
Epoch 5: saving model to model_init_2023-04-1015_39_45.782359\model-00005-1.81198-0.46757-7.70416-0.26000.h5
Epoch 6/10
Epoch 6: saving model to model_init_2023-04-1015_39_45.782359\model-00006-1.46859-0.51735-2.40618-0.43000.h5
Epoch 7/10
Epoch 7: saving model to model_init_2023-04-1015_39_45.782359\model-00007-1.57269-0.51735-1.54447-0.57000.h5
Epoch 8/10
Epoch 8: saving model to model_init_2023-04-1015_39_45.782359\model-00008-1.34206-0.58069-1.05020-0.65000.h5
Epoch 9/10
Epoch 9: saving model to

<keras.callbacks.History at 0x1b382a05d30>