# Gesture Recognition Assignment

### Objective:

Experiment with models, trainable parameters, and hyperparameters to derive a model that recognizes gestures with minimal loss and high accuracy.

---
#### Steps
---

1. Preprocess data
2. Standardize videos
3. Use generator function to generate training and validation data
4. Model selection [experiment to derive appropriate model]
   a. CNN + RNN (LSTM)
   b. 3D convolution network
5. Derive loss and accuracy of different models
6. Condense training time
7. Derive appropriate trainable parameters: input weights and biases, output weights and biases
8. Derive appropriate hyperparameters: number of epochs, batch_size, learning rate, optimizer

In [1]:
import numpy as np
import os
from skimage.transform import resize
from imageio import imread
import datetime

We set the random seed so that the results don't vary drastically.

In [2]:
np.random.seed(30)
import random as rn
rn.seed(30)
from keras import backend as K
import tensorflow as tf
tf.random.set_seed(30)

In this block, you read the folder names for training and validation and set the `batch_size`. Choose the batch size so that the GPU is used at full capacity: keep increasing it until the machine throws an out-of-memory error, then fall back to the largest value that worked.

In [3]:
train_doc = np.random.permutation(open('./datasets/Project_data/train.csv').readlines())
val_doc = np.random.permutation(open('./datasets/Project_data/val.csv').readlines())
batch_size = 5  # experiment with the batch size
y, z = (128, 128)  # target image height and width used by the generator
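
To get a feel for how the batch size drives memory use, the size of one input batch can be estimated by hand. The sketch below is only an illustration: `batch_memory_mib` is a hypothetical helper, and it counts just the input array that the generator allocates (`np.zeros` defaults to `float64`), not the model's activations or weights, which usually dominate GPU memory.

```python
import numpy as np

def batch_memory_mib(batch_size, frames, height, width, channels=3, dtype=np.float64):
    """Return the size in MiB of one (batch, frames, height, width, channels) array."""
    n_values = batch_size * frames * height * width * channels
    return n_values * np.dtype(dtype).itemsize / (1024 ** 2)

# Values used in this notebook: batch of 5 videos, 10 frames each, 128x128 RGB.
print(batch_memory_mib(5, 10, 128, 128))                    # 18.75
print(batch_memory_mib(5, 10, 128, 128, dtype=np.float32))  # 9.375
```

The input batch is small compared with the 14800 MB reported for the GPU, so the practical limit on `batch_size` comes from the intermediate tensors of the model rather than from the batch itself.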

In [4]:
# Resize images with padding so that the aspect ratio is preserved
def preprocess_image_with_padding(image, target_size=(360, 360)):
    # Calculate the scaling factor to resize image while maintaining aspect ratio
    old_size = image.shape[:2]  # Original size (height, width)
    ratio = min(target_size[0] / old_size[0], target_size[1] / old_size[1])
    new_size = (int(old_size[0] * ratio), int(old_size[1] * ratio))
    
    # Resize the image with the calculated new size
    image_resized = resize(image, new_size, anti_aliasing=True)
    
    # Create a new image array with the target size, filled with black (0) padding
    padded_image = np.zeros((target_size[0], target_size[1], 3))
    
    # Place the resized image in the center of the padded image
    pad_top = (target_size[0] - new_size[0]) // 2
    pad_left = (target_size[1] - new_size[1]) // 2
    padded_image[pad_top:pad_top+new_size[0], pad_left:pad_left+new_size[1], :] = image_resized
    
    return padded_image
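
As a worked example of the arithmetic inside `preprocess_image_with_padding`, the sketch below recomputes only the geometry (scaled size and padding offsets) without touching pixel data. The helper name `padded_geometry` and the two frame sizes are assumptions chosen for illustration.

```python
def padded_geometry(old_size, target_size):
    """Mirror the size/offset arithmetic of preprocess_image_with_padding.

    Returns (new_size, (pad_top, pad_left)) for an image of shape old_size
    (height, width) fitted into target_size (height, width).
    """
    ratio = min(target_size[0] / old_size[0], target_size[1] / old_size[1])
    new_size = (int(old_size[0] * ratio), int(old_size[1] * ratio))
    pad_top = (target_size[0] - new_size[0]) // 2
    pad_left = (target_size[1] - new_size[1]) // 2
    return new_size, (pad_top, pad_left)

# A hypothetical 120x160 (height x width) frame fitted into 128x128:
# the width is the limiting side, so the frame is scaled by 0.8 and
# padded with 16 black rows above and below.
print(padded_geometry((120, 160), (128, 128)))  # ((96, 128), (16, 0))

# A hypothetical square 256x256 frame fills the target exactly, so no padding is needed.
print(padded_geometry((256, 256), (128, 128)))  # ((128, 128), (0, 0))
```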

## Generator
This is one of the most important parts of the code. The overall structure of the generator has been given. In the generator, you preprocess the images, since the dataset contains images of two different dimensions, and assemble batches of video frames. Experiment with `img_idx`, `y`, `z`, and the normalization to get high accuracy.

In [5]:
def generator(source_path, folder_list, batch_size):
    print('Source path = ', source_path, '; batch size =', batch_size)
    
    # Create a list of specific image indices you want to use for each video.
    img_idx = [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]  # Example: Using 10 frames from each video, adjust as needed
    
    while True:
        # Shuffle the list of folders randomly to ensure that the training is not biased by the order of folders.
        t = np.random.permutation(folder_list)
        
        # Calculate the number of batches per epoch.
        num_batches = len(folder_list) // batch_size  # Ensuring that we have full batches only
        
        for batch in range(num_batches):  # We iterate over the number of batches
            # Initialize batch data array with zeros
            # batch_data has shape (batch_size, #frames, height, width, #channels)
            batch_data = np.zeros((batch_size, len(img_idx), y, z, 3))  # Replace `y` and `z` with image height and width
            batch_labels = np.zeros((batch_size, 5))  # Assuming 5 classes for one-hot encoded labels
            
            for folder in range(batch_size):  # Iterate over each item in the batch
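                # Get the list of images in the folder corresponding to the shuffled list.
                # os.listdir returns entries in arbitrary order, so sort them
                # (this assumes the file names sort in frame order).
                imgs = sorted(os.listdir(source_path + '/' + t[folder + (batch * batch_size)].strip().split(';')[0]))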
                
                for idx, item in enumerate(img_idx):  # Iterate over the selected frames for the video
                    # Read each specified image for the current folder
                    image = imread(source_path + '/' + t[folder + (batch * batch_size)].strip().split(';')[0] + '/' + imgs[item]).astype(np.float32)
                    
                    # Resize the image with padding to (y, z) so that all images have the same shape
                    image = preprocess_image_with_padding(image, (y, z))
                    # Normalize and store the image into the batch_data array
                    batch_data[folder, idx, :, :, 0] = (image[:, :, 0] / 255.0)  # Normalizing the R channel
                    batch_data[folder, idx, :, :, 1] = (image[:, :, 1] / 255.0)  # Normalizing the G channel
                    batch_data[folder, idx, :, :, 2] = (image[:, :, 2] / 255.0)  # Normalizing the B channel
                
                # One-hot encode the label for the current folder
                batch_labels[folder, int(t[folder + (batch * batch_size)].strip().split(';')[2])] = 1
            
            # Yield a batch of data and labels. Yielding allows the function to generate batches as needed during training.
            yield batch_data, batch_labels

        # Handle any remaining data that doesn't fit into a complete batch
        remaining = len(folder_list) % batch_size
        if remaining > 0:
            # Prepare batch data and labels for the remaining samples
            batch_data = np.zeros((remaining, len(img_idx), y, z, 3))
            batch_labels = np.zeros((remaining, 5))
            
            for folder in range(remaining):
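                # Sort the listing so frames stay in sequence (assumes file names sort in frame order)
                imgs = sorted(os.listdir(source_path + '/' + t[folder + (num_batches * batch_size)].strip().split(';')[0]))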
                
                for idx, item in enumerate(img_idx):
                    image = imread(source_path + '/' + t[folder + (num_batches * batch_size)].strip().split(';')[0] + '/' + imgs[item]).astype(np.float32)
                    
                    # Resize the image with padding to (y, z) so that all images have the same shape
                    image = preprocess_image_with_padding(image, (y, z))

                    # Normalize and store the image into the batch_data array
                    batch_data[folder, idx, :, :, 0] = (image[:, :, 0] / 255.0)
                    batch_data[folder, idx, :, :, 1] = (image[:, :, 1] / 255.0)
                    batch_data[folder, idx, :, :, 2] = (image[:, :, 2] / 255.0)
                
                batch_labels[folder, int(t[folder + (num_batches * batch_size)].strip().split(';')[2])] = 1
            
            yield batch_data, batch_labels
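
One way to experiment with `img_idx` is to generate evenly spaced frame indices instead of typing the list by hand. This is a minimal sketch, assuming each video folder holds a known number of frames; `evenly_spaced_frames` and the frame count of 30 are hypothetical, so substitute the real count for your data.

```python
import numpy as np

def evenly_spaced_frames(total_frames, n_samples):
    """Return n_samples frame indices spread evenly over [0, total_frames - 1]."""
    idx = np.linspace(0, total_frames - 1, n_samples)
    return np.round(idx).astype(int).tolist()

# Hypothetical: 30 frames per video, sample 10 of them.
img_idx = evenly_spaced_frames(30, 10)
print(len(img_idx), img_idx[0], img_idx[-1])  # 10 0 29
```

Compared with the hand-written list, which stops at index 18, this spreads the sampled frames across the whole clip, which may or may not help depending on where the gesture occurs.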

Note here that a video is represented above in the generator as (number of images, height, width, number of channels). Take this into consideration while creating the model architecture.
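
To check a model architecture against this shape without reading any images, a stand-in generator that yields random arrays of the same shape can be handy. The sketch below is a test aid under stated assumptions, not part of the assignment: `dummy_generator` is a hypothetical name, and the data it yields is random noise with random one-hot labels.

```python
import numpy as np

def dummy_generator(batch_size, frames, height, width, num_classes, seed=0):
    """Yield (data, labels) batches shaped like the real generator's output."""
    rng = np.random.default_rng(seed)
    while True:
        # Shape: (batch_size, frames, height, width, channels), values in [0, 1)
        data = rng.random((batch_size, frames, height, width, 3))
        # One random class per sample, one-hot encoded
        labels = np.zeros((batch_size, num_classes))
        labels[np.arange(batch_size), rng.integers(0, num_classes, batch_size)] = 1
        yield data, labels

data, labels = next(dummy_generator(5, 10, 128, 128, 5))
print(data.shape, labels.shape)  # (5, 10, 128, 128, 3) (5, 5)
```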

In [6]:
curr_dt_time = datetime.datetime.now()
train_path = './datasets/Project_data/train'
val_path = './datasets/Project_data/val'
num_train_sequences = len(train_doc)
print('# training sequences =', num_train_sequences)
num_val_sequences = len(val_doc)
print('# validation sequences =', num_val_sequences)
num_epochs = 30 # choose the number of epochs
print ('# epochs =', num_epochs)

# training sequences = 663
# validation sequences = 100
# epochs = 30


## Model
Here you build the model using the layers that Keras provides. Remember to use `Conv3D` and `MaxPooling3D`, not `Conv2D` and `MaxPooling2D`, for a 3D convolution model. Use `TimeDistributed` when building a Conv2D + RNN model. The last layer must be a softmax layer. Design the network so that it reaches good accuracy with as few parameters as possible, so that it can fit in the memory of the webcam.

In [7]:
from keras.models import Sequential, Model
from keras.layers import Dense, ConvLSTM2D, GRU, Flatten, TimeDistributed, BatchNormalization, Activation, Dropout, GlobalAveragePooling3D, GlobalAveragePooling2D
from keras.layers import Conv3D, MaxPooling3D
from keras.callbacks import ModelCheckpoint, ReduceLROnPlateau
from keras import optimizers


In [8]:
# Model parameters
input_shape = (10, 128, 128, 3)  # Example: (frames, height, width, channels)
num_classes = 5

model = Sequential()

# First ConvLSTM2D layer (convolutional LSTM over the frame sequence)
model.add(ConvLSTM2D(16, kernel_size=(3, 3), activation='relu', input_shape=input_shape, padding='same', return_sequences=True))
# model.add(MaxPooling3D(pool_size=(2, 2, 2), padding='same'))
model.add(BatchNormalization())

# Second ConvLSTM2D layer
model.add(ConvLSTM2D(32, kernel_size=(3, 3), activation='relu', padding='same', return_sequences=True))
# model.add(MaxPooling3D(pool_size=(2, 2, 2), padding='same'))
model.add(BatchNormalization())

# Third ConvLSTM2D layer; return_sequences=False keeps only the final time step
model.add(ConvLSTM2D(64, kernel_size=(3, 3), activation='relu', padding='same', return_sequences=False))
# model.add(MaxPooling3D(pool_size=(2, 2, 2), padding='same'))
model.add(BatchNormalization())

# Global Average Pooling instead of Flatten to reduce the parameter count
model.add(GlobalAveragePooling2D())

# Fully connected layer (the dropout below is currently disabled)
model.add(Dense(128, activation='relu'))
# model.add(Dropout(0.3))

# Output layer with softmax for multi-class classification
model.add(Dense(num_classes, activation='softmax'))

2024-11-04 06:14:48.455318: W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:39] Overriding allow_growth setting because the TF_FORCE_GPU_ALLOW_GROWTH environment variable is set. Original config value was 0.
2024-11-04 06:14:48.455386: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 14800 MB memory:  -> device: 0, name: Quadro RTX 5000, pci bus id: 0000:40:00.0, compute capability: 7.5
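
The comment about global average pooling in the model code can be made concrete with a little arithmetic. A `Dense` layer has `inputs * units + units` parameters, so the sketch below compares feeding the 128-unit dense layer from a flattened `(128, 128, 64)` feature map with feeding it from the 64 channel averages. `dense_params` is a hypothetical helper name.

```python
def dense_params(n_inputs, n_units):
    """Parameter count of a fully connected layer: weights plus biases."""
    return n_inputs * n_units + n_units

# Flatten: every pixel of every channel becomes an input to Dense(128).
flatten_inputs = 128 * 128 * 64
print(dense_params(flatten_inputs, 128))  # 134217856

# GlobalAveragePooling2D: one average per channel, so only 64 inputs.
print(dense_params(64, 128))              # 8320
```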


Now that you have written the model, the next step is to `compile` the model. When you print the `summary` of the model, you'll see the total number of parameters you have to train.

In [9]:
optimiser = tf.keras.optimizers.Adam()  # Adam with its default learning rate
model.compile(optimizer=optimiser, loss='categorical_crossentropy', metrics=['categorical_accuracy'])
model.summary()  # summary() prints the table itself and returns None

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv_lstm2d (ConvLSTM2D)    (None, 10, 128, 128, 16)  11008     
                                                                 
 batch_normalization (BatchN  (None, 10, 128, 128, 16)  64       
 ormalization)                                                   
                                                                 
 conv_lstm2d_1 (ConvLSTM2D)  (None, 10, 128, 128, 32)  55424     
                                                                 
 batch_normalization_1 (Batc  (None, 10, 128, 128, 32)  128      
 hNormalization)                                                 
                                                                 
 conv_lstm2d_2 (ConvLSTM2D)  (None, 128, 128, 64)      221440    
                                                                 
 batch_normalization_2 (Batc  (None, 128, 128, 64)     2
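
The `Param #` column for the `ConvLSTM2D` rows can be reproduced by hand. A ConvLSTM cell has four gates, and each gate convolves both the input channels and the hidden-state channels and adds a bias, which gives `4 * filters * (k * k * (in_channels + filters) + 1)` parameters for a `k x k` kernel. The sketch below checks this against the three values in the summary; `conv_lstm2d_params` is a hypothetical helper name.

```python
def conv_lstm2d_params(in_channels, filters, kernel=3):
    """Trainable parameters of a ConvLSTM2D layer with a square kernel and bias."""
    per_gate = kernel * kernel * (in_channels + filters) + 1
    return 4 * filters * per_gate

print(conv_lstm2d_params(3, 16))   # 11008
print(conv_lstm2d_params(16, 32))  # 55424
print(conv_lstm2d_params(32, 64))  # 221440
```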

Let us create the `train_generator` and the `val_generator`, which will be passed to `model.fit`.

In [10]:
train_generator = generator(train_path, train_doc, batch_size)
val_generator = generator(val_path, val_doc, batch_size)

In [11]:
model_name = 'model_init' + '_' + str(curr_dt_time).replace(' ','').replace(':','_') + '/'
    
if not os.path.exists(model_name):
    os.mkdir(model_name)
        
filepath = model_name + 'model-{epoch:05d}-{loss:.5f}-{categorical_accuracy:.5f}-{val_loss:.5f}-{val_categorical_accuracy:.5f}.keras'

checkpoint = ModelCheckpoint(filepath, monitor='val_loss', verbose=1, save_best_only=False, save_weights_only=False, mode='auto')

LR = tf.keras.callbacks.ReduceLROnPlateau(
    monitor="val_loss",
    verbose=1,
    mode="auto"
)  # reduce the learning rate when val_loss plateaus (Keras defaults for factor and patience)
callbacks_list = [checkpoint, LR]
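
With no extra arguments, `ReduceLROnPlateau` uses the Keras defaults (`factor=0.1`, `patience=10`), so over 30 epochs the learning rate changes rarely. To see how `patience` affects the schedule, here is a simplified pure-Python imitation of the plateau rule. It is a sketch that ignores the callback's `cooldown` option, and `simulate_plateau_schedule` is a hypothetical helper. The validation losses are the first eight values from the training log in this notebook.

```python
def simulate_plateau_schedule(val_losses, lr=1e-3, factor=0.1, patience=10,
                              min_delta=1e-4, min_lr=0.0):
    """Return the learning rate in effect after each epoch (simplified rule)."""
    best = float("inf")
    wait = 0
    history = []
    for loss in val_losses:
        if loss < best - min_delta:
            best, wait = loss, 0           # improvement: reset the counter
        else:
            wait += 1                      # no improvement this epoch
            if wait >= patience:
                lr = max(lr * factor, min_lr)
                wait = 0
        history.append(lr)
    return history

val_losses = [1.88902, 2.11902, 1.66634, 1.61049, 2.20868, 1.48951, 1.73790, 2.49404]

# Default patience of 10: no reduction within these eight epochs.
print(simulate_plateau_schedule(val_losses))
# A hypothetical patience of 2: the rate drops once epochs 7 and 8 both fail to improve.
print(simulate_plateau_schedule(val_losses, patience=2))
```

Given how noisy the validation loss is in this run, a smaller `patience` may be worth trying, though the right value is something to determine by experiment.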

The `steps_per_epoch` and `validation_steps` values are used by `fit` to decide the number of `next()` calls it needs to make on each generator.

In [12]:
if (num_train_sequences%batch_size) == 0:
    steps_per_epoch = int(num_train_sequences/batch_size)
else:
    steps_per_epoch = (num_train_sequences//batch_size) + 1

if (num_val_sequences%batch_size) == 0:
    validation_steps = int(num_val_sequences/batch_size)
else:
    validation_steps = (num_val_sequences//batch_size) + 1
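
The two `if`/`else` blocks above compute a ceiling division: the number of batches needed to cover every sequence, counting a final partial batch. A compact equivalent, shown here as a sketch with the sequence counts printed earlier in the notebook, uses `math.ceil`; `steps_for` is a hypothetical helper name.

```python
import math

def steps_for(num_sequences, batch_size):
    """Number of generator calls needed to see every sequence once."""
    return math.ceil(num_sequences / batch_size)

print(steps_for(663, 5))  # 133 (132 full batches plus one batch of 3)
print(steps_for(100, 5))  # 20
```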

Let us now fit the model. This will start training the model and with the help of the checkpoints, you'll be able to save the model at the end of each epoch.

In [13]:
model.fit(train_generator, steps_per_epoch=steps_per_epoch, epochs=num_epochs, verbose=1, 
                    callbacks=callbacks_list, validation_data=val_generator, 
                    validation_steps=validation_steps, class_weight=None, initial_epoch=0)

Source path =  ./datasets/Project_data/train ; batch size = 5
Epoch 1/30


2024-11-04 06:14:56.415269: I tensorflow/stream_executor/cuda/cuda_dnn.cc:377] Loaded cuDNN version 8302



Epoch 00001: saving model to model_init_2024-11-0406_14_47.607260/model-00001-1.38543-0.38612-1.88902-0.16000.keras
Epoch 2/30
Epoch 00002: saving model to model_init_2024-11-0406_14_47.607260/model-00002-1.30955-0.39668-2.11902-0.14000.keras
Epoch 3/30
Epoch 00003: saving model to model_init_2024-11-0406_14_47.607260/model-00003-1.18041-0.49472-1.66634-0.24000.keras
Epoch 4/30
Epoch 00004: saving model to model_init_2024-11-0406_14_47.607260/model-00004-1.12476-0.51584-1.61049-0.25000.keras
Epoch 5/30
Epoch 00005: saving model to model_init_2024-11-0406_14_47.607260/model-00005-1.10146-0.52338-2.20868-0.28000.keras
Epoch 6/30
Epoch 00006: saving model to model_init_2024-11-0406_14_47.607260/model-00006-1.06049-0.56259-1.48951-0.32000.keras
Epoch 7/30
Epoch 00007: saving model to model_init_2024-11-0406_14_47.607260/model-00007-1.00876-0.56561-1.73790-0.50000.keras
Epoch 8/30
Epoch 00008: saving model to model_init_2024-11-0406_14_47.607260/model-00008-0.91073-0.60030-2.49404-0.26000.

<keras.callbacks.History at 0x7f25108b4340>
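
Because the checkpoint file names encode the epoch and all four metrics, the best epoch can be picked out of a directory listing without reloading any model. The sketch below parses names of the form produced by `filepath`; `parse_checkpoint` is a hypothetical helper, and the three names are copied from the log above.

```python
import re

# Matches names like model-00006-1.06049-0.56259-1.48951-0.32000.keras
PATTERN = re.compile(
    r"model-(\d+)-(\d+\.\d+)-(\d+\.\d+)-(\d+\.\d+)-(\d+\.\d+)\.keras$"
)

def parse_checkpoint(name):
    """Return the metrics encoded in a checkpoint file name, or None if it does not match."""
    m = PATTERN.search(name)
    if m is None:
        return None
    epoch, loss, acc, val_loss, val_acc = m.groups()
    return {"epoch": int(epoch), "loss": float(loss), "acc": float(acc),
            "val_loss": float(val_loss), "val_acc": float(val_acc)}

names = [
    "model-00004-1.12476-0.51584-1.61049-0.25000.keras",
    "model-00006-1.06049-0.56259-1.48951-0.32000.keras",
    "model-00007-1.00876-0.56561-1.73790-0.50000.keras",
]
parsed = [parse_checkpoint(n) for n in names]
best = min(parsed, key=lambda p: p["val_loss"])
print(best["epoch"], best["val_loss"])  # 6 1.48951
```

Note that in this log the lowest validation loss (epoch 6) and the highest validation accuracy (epoch 7, 0.50) fall on different epochs, so the choice of selection key matters.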