**Project Objective: Developing Gesture Recognition for Smart TV Control**

As a data scientist at a leading home electronics company specializing in cutting-edge smart television manufacturing, the goal is to enhance the user experience by introducing a novel feature: gesture recognition. This feature aims to enable users to control their smart TVs through intuitive hand gestures, eliminating the need for a traditional remote control. The project's focus is on recognizing five distinct gestures – Thumbs Up, Thumbs Down, Left Swipe, Right Swipe, and Stop – each corresponding to specific actions such as volume adjustment, playback control, and pausing.

**Challenges and Objectives:**

**Generator Enhancement:**
The first challenge is to develop a reliable generator that can process batches of video sequences seamlessly. This involves tasks like cropping, resizing, and normalization of frames to ensure compatibility with the model's input requirements. The generator's performance is crucial for generating training data that feeds into the model effectively.

**Model Development:**
The primary objective is to design a model capable of training without errors while achieving a balance between parameter efficiency (for reduced inference time) and accuracy. To ensure gradual progress, the initial training will be conducted on a limited dataset before scaling up.

**Project Phases:**

**1. Generator Enhancement:**
The project kicks off by refining the data generator. This component should seamlessly process batches of video sequences while performing necessary pre-processing steps, such as cropping to relevant regions of interest, resizing for consistent input dimensions, and normalization for optimal convergence during training. This stage aims to provide the model with well-prepared data to learn from.

**2. Initial Model Training:**
Beginning with a solid data generator, the focus shifts to designing an initial model architecture. The choice of base model is crucial. A model with a balance between complexity and performance, such as a Convolutional Neural Network (CNN), is selected as the starting point. This decision is influenced by the CNN's established capabilities in image-related tasks and its compatibility with the sequential nature of video frames.

**3. Iterative Model Refinement:**
The model enhancement phase involves iterations and experiments to fine-tune the architecture for optimal performance. Metrics like accuracy and inference time guide these decisions. Starting with a smaller dataset allows for quicker iterations and aids in identifying the most promising directions for improvement.

**4. Choosing the Final Model:**
The iterative process aims to reach a final model that strikes the ideal balance between accuracy and parameter efficiency. The write-up will detail the reasoning behind selecting the base model and the successive modifications. Each modification will be grounded in clear reasoning, supported by relevant metrics like accuracy achieved, training convergence speed, and inference time.

**Write-up Structure:**

1. **Introduction:**
   - Briefly explain the project's goal: implementing gesture recognition for smart TV control.
   
2. **Generator Enhancement:**
   - Discuss the importance of a robust data generator for model training.
   - Detail the steps taken to preprocess videos into usable training data.

3. **Model Development:**
   - Justify the choice of a CNN as the base model due to its suitability for image-based tasks.
   
4. **Iterative Refinement:**
   - Describe the iterative process of tweaking the model to improve accuracy and efficiency.
   
5. **Final Model Selection:**
   - Explain the final model's architecture and modifications in detail.
   - Present metrics supporting the selection, including accuracy and inference time.

6. **Conclusion:**
   - Summarize the journey from the initial base model to the final optimized model.
   - Highlight the achieved balance between gesture recognition accuracy and efficient parameter usage.

By adhering to this comprehensive plan, the objective of developing a sophisticated gesture recognition system for smart TVs can be effectively achieved, enhancing the user experience and positioning the company at the forefront of innovation in the home electronics industry.


In [17]:
## Checking the GPU configuration

!nvidia-smi

Mon Aug 28 02:59:42 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.105.17   Driver Version: 525.105.17   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  NVIDIA A100-SXM...  Off  | 00000000:00:04.0 Off |                    0 |
| N/A   35C    P0    49W / 400W |  38979MiB / 40960MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

In [18]:
# Importing Required Libraries
import numpy as np
import os
from imageio import imread
from skimage.transform import resize
import datetime
import os

We set the random seed so that the results don't vary drastically.

In [19]:
# Importing Required Libraries
np.random.seed(30)
import random as rn
rn.seed(30)
from tensorflow import keras
import tensorflow as tf
tf.random.set_seed(30)
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Activation
from tensorflow.keras.layers import BatchNormalization
from tensorflow.keras.layers import Conv2D
from tensorflow.keras.layers import Conv3D
from tensorflow.keras.layers import ConvLSTM2D
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import Dropout
from tensorflow.keras.layers import Flatten
from tensorflow.keras.layers import GlobalAveragePooling2D
from tensorflow.keras.layers import GlobalAveragePooling3D
from tensorflow.keras.layers import GRU
from tensorflow.keras.layers import LSTM
from tensorflow.keras.layers import MaxPooling3D, MaxPooling2D
from tensorflow.keras.layers import TimeDistributed
from tensorflow.keras.callbacks import ModelCheckpoint
from tensorflow.keras.callbacks import ReduceLROnPlateau
from tensorflow.keras import optimizers

In this block, you read the folder names for training and validation. You also set the `batch_size` here. Note that you set the batch size in such a way that you are able to use the GPU in full capacity. You keep increasing the batch size until the machine throws an error.

In [20]:
# Importing Required Libraries
from google.colab import drive
drive.mount('/content/drive')
from google.colab import drive
drive.mount('/content/drive')
project_folder = '/content/drive/My Drive/Gesture Recognition/Project_data/Project_data'
train_doc = np.random.permutation(open('/content/drive/My Drive/Gesture Recognition/Project_data/Project_data/train.csv').readlines())
val_doc = np.random.permutation(open('/content/drive/My Drive/Gesture Recognition/Project_data/Project_data/val.csv').readlines())
batch_size = 39

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [21]:
#def image_specifications(nb_frames, x, y):
    #return [np.round(np.linspace(0, 29, nb_frames)).astype('int'), x, y]


#img_specs = image_specifications(20, 100, 100)

# Image Specifications Generator
# This code defines a function to generate specifications for images with specific parameters.


def generate_image_specs(num_frames, x_coord, y_coord):
    """
    Generate specifications for images with given parameters.

    :param num_frames: Number of frames.
    :param x_coord: X-coordinate.
    :param y_coord: Y-coordinate.
    :return: Image specifications as a list.
    """
    frame_indices = np.round(np.linspace(0, 29, num_frames)).astype('int')
    image_specs = [frame_indices, x_coord, y_coord]
    return image_specs

# Generate image specifications
img_specs = generate_image_specs(20, 100, 100)


## Data Generator
The data generator is a crucial component in the code. The provided structure outlines its overall design. Within the generator, you'll undertake image preprocessing, considering the presence of images with two different dimensions. Additionally, the generation of a video frame batch is essential. Experimentation with parameters such as `img_idx`, `y`, `z`, and normalization is required to achieve optimal accuracy.

The function get_batch_labels_and_data prepares batch data and labels by processing images, cropping, resizing, and normalizing them. The generator function uses this data generator within an infinite loop to yield batches of processed data and labels for training. The generator first shuffles the input data and then iterates over the shuffled data to provide batches of data. The remaining data points that don't form a full batch are also handled

In [22]:
def get_batch_labels_and_data(source_path, t, batch, batch_size, img_specs):
    x, y, z = len(img_specs[0]), img_specs[1], img_specs[2]
    img_idx = img_specs[0]  # create a list of image numbers you want to use for a particular video
    batch_data = np.zeros((batch_size, x, y, z, 3))  # x is the number of images you use for each video, (y,z) is the final size of the input images and 3 is the number of channels RGB
    batch_labels = np.zeros((batch_size, 5))  # batch_labels is the one hot representation of the output

    for folder in range(batch_size):  # iterate over the batch_size
        imgs = os.listdir('{0}/{1}'.format(source_path, t[folder + (batch * batch_size)].split(';')[0]))  # read all the images in the folder

        for idx, item in enumerate(img_idx):  # Iterate iver the frames/images of a folder to read them in
            image = imread('{0}/{1}/{2}'.format(source_path, t[folder + (batch * batch_size)].strip().split(';')[0], imgs[item])).astype(np.float32)

            #crop the images and resize them. Note that the images are of 2 different shape
            #and the conv3D will throw error if the inputs in a batch have different shapes

            if image.shape[0] != image.shape[1]:
                image = image[:120, 20:140]
            image = resize(image, (y, z))

            batch_data[folder, idx, :, :, 0] = image[:, :, 0] / 255.0  # normalise and feed in the image
            batch_data[folder, idx, :, :, 1] = image[:, :, 1] / 255.0  # normalise and feed in the image
            batch_data[folder, idx, :, :, 2] = image[:, :, 2] / 255.0  # normalise and feed in the image

        batch_labels[folder, int(t[folder + (batch * batch_size)].strip().split(';')[2])] = 1

    return batch_data, batch_labels


def generator(source_path, folder_list, batch_size, img_specs=img_specs):
    print(f"Source Path: {source_path}; Batch Size: {batch_size}")
    while True:
        t = np.random.permutation(folder_list)
        num_batches = int(len(folder_list) / batch_size)
        for batch in range(num_batches):  # we iterate over the number of batches
            yield get_batch_labels_and_data(source_path, t, batch, batch_size, img_specs)  # you yield the batch_data and the batch_labels, remember what does yield do

        # write the code for the remaining data points which are left after full batches
        if len(folder_list) % batch_size != 0:
            batch_size = len(folder_list) % batch_size
            yield get_batch_labels_and_data(source_path, t, batch, batch_size, img_specs)  # you yield the batch_data and the batch_labels, remember what does yield do

Please take note that in the generator, a video is represented as a tensor with dimensions (number of images, height, width, number of channels). Keep this representation in mind when designing the model architecture.

In [23]:
# initializes the current date and time, sets up paths for training and validation data, and then
# prints out the number of training and validation sequences as well as the number of epochs to be used in the training process.
curr_dt_time = datetime.datetime.now()
train_path = '/content/drive/My Drive/Gesture Recognition/Project_data/Project_data/train'

val_path = '/content/drive/My Drive/Gesture Recognition/Project_data/Project_data/val'
num_train_sequences = len(train_doc)
print(f"# training seq: {num_train_sequences}")
num_val_sequences = len(val_doc)
print(f"# validation seq: {num_val_sequences}")
num_epochs = 20
print(f"# epcohs: {num_epochs}")

# training seq: 663
# validation seq: 100
# epcohs: 20


## Model Architecture
In this section, the model is constructed using various functionalities offered by Keras. It's important to utilize `Conv3D` and `MaxPooling3D` for creating a 3D convolutional model, avoiding the use of `Conv2D` and `MaxPooling2D`. If implementing a Conv2D + RNN model, be sure to incorporate `TimeDistributed`. Additionally, keep in mind that the final layer should utilize the softmax activation function.

The network design should prioritize achieving high accuracy with minimal parameter usage to ensure compatibility with the memory constraints of the webcam.

In [24]:

# Define the input shape for the model
# The input shape represents (number of images, height, width, number of channels)
input_shape = (len(img_specs[0]), img_specs[1], img_specs[2], 3)

### Model - 1:

In [25]:


# # model

model = Sequential()
model.add(Conv3D(32, kernel_size=3, activation='relu', input_shape=input_shape))

model.add(Conv3D(63, kernel_size=3, activation='relu'))
model.add(MaxPooling3D(pool_size=2))

model.add(Flatten())
model.add(Dense(256, activation='relu'))
model.add(Dense(5, activation='softmax'))



Number of Epochs: 20 <br />
Training Accuracy: 0.20 | Validation Accuracy: 0.41 <br />
The current model demonstrates limited learning capability. To address this, additional layers will be incorporated into the model.

### Model - 2

In [26]:
# # model 2 -- increasing layers

# model = Sequential()

# model.add(Conv3D(16, (3, 3, 3), padding='same',input_shape=input_shape))
# model.add(Activation('relu'))
# model.add(BatchNormalization())

# model.add(Conv3D(16, (3, 3, 3), padding='same',input_shape=input_shape))
# model.add(Activation('relu'))
# model.add(BatchNormalization())
# model.add(MaxPooling3D(pool_size=(2, 2, 2)))

# model.add(Conv3D(32, (3, 3, 3), padding='same'))
# model.add(Activation('relu'))
# model.add(BatchNormalization())

# model.add(Conv3D(32, (3, 3, 3), padding='same'))
# model.add(Activation('relu'))
# model.add(BatchNormalization())
# model.add(MaxPooling3D(pool_size=(2, 2, 2)))

# model.add(Conv3D(64, (3, 3, 3), padding='same'))
# model.add(Activation('relu'))
# model.add(BatchNormalization())

# model.add(Conv3D(64, (3, 3, 3), padding='same'))
# model.add(Activation('relu'))
# model.add(BatchNormalization())
# model.add(MaxPooling3D(pool_size=(2, 2, 2)))

# model.add(Conv3D(128, (3, 3, 3), padding='same'))
# model.add(Activation('relu'))
# model.add(BatchNormalization())

# model.add(Conv3D(128, (3, 3, 3), padding='same'))
# model.add(Activation('relu'))
# model.add(BatchNormalization())
# model.add(MaxPooling3D(pool_size=(2, 2, 2)))

# model.add(Flatten())
# model.add(Dense(64, activation='relu'))
# model.add(BatchNormalization())
# model.add(Dropout(0.25))

# model.add(Dense(63, activation='relu'))
# model.add(BatchNormalization())
# model.add(Dropout(0.25))

# model.add(Dense(5, activation='softmax'))

Number of Epochs: 20<br />
Training Accuracy: 0.61 | Validation Accuracy: 0.75 <br />
Both the training and validation accuracies fall short of expectations. To address this, certain parameters will be reduced in an attempt to enhance model performance.

### Model - 3

In [27]:
# # model 3 -- reducing parameters

# model = Sequential()

# model.add(Conv3D(16, (3, 3, 3), padding='same',input_shape=input_shape))
# model.add(Activation('relu'))
# model.add(BatchNormalization())
# model.add(MaxPooling3D(pool_size=(2, 2, 2)))

# model.add(Conv3D(32, (2, 2, 2), padding='same'))
# model.add(Activation('relu'))
# model.add(BatchNormalization())
# model.add(MaxPooling3D(pool_size=(2, 2, 2)))

# model.add(Conv3D(64, (2, 2, 2), padding='same'))
# model.add(Activation('relu'))
# model.add(BatchNormalization())
# model.add(MaxPooling3D(pool_size=(2, 2, 2)))

# model.add(Conv3D(128, (2, 2, 2), padding='same'))
# model.add(Activation('relu'))
# model.add(BatchNormalization())
# model.add(MaxPooling3D(pool_size=(2, 2, 2)))

# model.add(Flatten())
# model.add(Dense(64,activation='relu'))
# model.add(BatchNormalization())
# model.add(Dropout(0.25))

# model.add(Dense(64, activation='relu'))
# model.add(BatchNormalization())
# model.add(Dropout(0.25))

# model.add(Dense(5, activation='softmax'))

Number of Epochs: 20 <br />
Training Accuracy: 0.97 | Validation Accuracy: 0.83 <br />
The model appears to be exhibiting signs of overfitting. To mitigate this, a further reduction in the number of parameters is warranted.

### Model - 4

In [28]:
# # model 4 -- reducing more parameters

# model = Sequential()

# model.add(Conv3D(16, (3, 3, 3), padding='same',input_shape=input_shape))
# model.add(Activation('relu'))
# model.add(BatchNormalization())
# model.add(MaxPooling3D(pool_size=(2, 2, 2)))

# model.add(Conv3D(32, (3, 3, 3), padding='same'))
# model.add(Activation('relu'))
# model.add(BatchNormalization())
# model.add(MaxPooling3D(pool_size=(2, 2, 2)))

# model.add(Conv3D(64, (2, 2, 2), padding='same'))
# model.add(Activation('relu'))
# model.add(BatchNormalization())
# model.add(MaxPooling3D(pool_size=(2, 2, 2)))

# model.add(Conv3D(128, (2, 2, 2), padding='same'))
# model.add(Activation('relu'))
# model.add(BatchNormalization())
# model.add(MaxPooling3D(pool_size=(2, 2, 2)))

# model.add(Flatten())
# model.add(Dense(64, activation='relu'))
# model.add(BatchNormalization())
# model.add(Dropout(0.25))

# model.add(Dense(64, activation='relu'))
# model.add(BatchNormalization())
# model.add(Dropout(0.25))

# model.add(Dense(5, activation='softmax'))

Number of Epochs: 20 <br />
Training Accuracy: 0.17 | Validation Accuracy: 0.5 <br />
The extensive reduction in parameters has resulted in a severely underfitting model. <br />
To address this, the number of epochs will be increased from 20 to 40. Additionally, a new model will be designed, and the Dropout layers will be removed.

### Model - 5

In [29]:
# # model - 5 # Increasing parameters

# model = Sequential()

# model.add(Conv3D(32, kernel_size=3, activation='relu', input_shape=input_shape))
# model.add(Conv3D(64, kernel_size=3, activation='relu'))
# model.add(MaxPooling3D(pool_size=(2, 2, 2)))
# model.add(BatchNormalization())

# model.add(Conv3D(128, kernel_size=3, activation='relu'))
# model.add(MaxPooling3D(pool_size=(1, 2, 2)))
# model.add(BatchNormalization())

# model.add(Conv3D(256, kernel_size=(1, 3, 3), activation='relu'))
# model.add(MaxPooling3D(pool_size=(1, 2, 2)))
# model.add(BatchNormalization())

# model.add(Conv3D(512, kernel_size=(1, 3, 3), activation='relu'))
# model.add(Conv3D(512, kernel_size=(1, 3, 3), activation='relu'))
# model.add(MaxPooling3D(pool_size=(1, 2, 2)))
# model.add(BatchNormalization())

# model.add(Flatten())
# model.add(Dense(512, activation='relu'))
# model.add(BatchNormalization())
# model.add(Dense(5, activation='softmax'))

Number of Epochs: 40<br />
Training Accuracy: 1.00 | Validation Accuracy: 0.91 <br />
The evident overfitting of the model is a concern. To address this, a reduction in the number of parameters will be implemented, along with reintroducing Dropout layers.

### Model - 6

In [30]:
# # model - 6 Reducing the parameter count and reintroducing Dropout layers.

# model = Sequential()

# model.add(Conv3D(16, (5, 5, 5), activation='relu', input_shape=input_shape))
# model.add(MaxPooling3D((2, 2, 2), padding='same'))
# model.add(BatchNormalization())

# model.add(Conv3D(32, (3, 3, 3), activation='relu'))
# model.add(MaxPooling3D(pool_size=(1, 2, 2), padding='same'))
# model.add(BatchNormalization())

# model.add(Conv3D(64, (3, 3, 3), activation='relu'))
# model.add(MaxPooling3D(pool_size=(1, 2, 2), padding='same'))
# model.add(BatchNormalization())

# model.add(Flatten())
# model.add(Dense(128, activation='relu'))
# model.add(BatchNormalization())
# model.add(Dropout(0.2))

# model.add(Dense(64, activation='relu'))
# model.add(BatchNormalization())
# model.add(Dropout(0.2))

# model.add(Dense(5, activation='softmax'))

Number of Epochs: 40 <br />
Training Accuracy: 0.99 | Validation Accuracy: 1.00 <br />
Despite a substantial rise in validation accuracy, the model continues to exhibit signs of overfitting. <br />
To address this, an approach involving the introduction of additional Dropout layers and transitioning from Flatten to GlobalAveragePooling3D will be pursued.

### Model - 7

In [31]:
# # model - 7 Elevating the count of Dropout layers and substituting Flatten with GlobalAveragePooling3D.

# model = Sequential()

# model.add(Conv3D(32, kernel_size=3, activation='relu', input_shape=input_shape))
# model.add(Conv3D(64, kernel_size=3, activation='relu'))
# model.add(MaxPooling3D(pool_size=(2, 2, 2)))
# model.add(BatchNormalization())
# model.add(Dropout(0.2))

# model.add(Conv3D(128, kernel_size=3, activation='relu'))
# model.add(MaxPooling3D(pool_size=(1, 2, 2)))
# model.add(BatchNormalization())
# model.add(Dropout(0.2))

# model.add(Conv3D(256, kernel_size=(1, 3, 3), activation='relu'))
# model.add(MaxPooling3D(pool_size=(1, 2, 2)))
# model.add(BatchNormalization())
# model.add(Dropout(0.2))

# model.add(GlobalAveragePooling3D())
# model.add(Dense(512, activation='relu'))
# model.add(BatchNormalization())
# model.add(Dense(5, activation='softmax'))

Number of Epochs: 40<br />
Training Accuracy: 0.99 | Validation Accuracy: 0.91 <br />
While both training and validation accuracy are in the 0.9 range, the training accuracy's proximity to 1 suggests potential overfitting. <br />
Considering the likelihood of overfitting, experimenting with different model architectures is advised.

### Model - 8

In [32]:
# # model - 8 Implementing a CNN with LSTM architecture, while decreasing the number of epochs to 20.

# model = Sequential()

# model.add(TimeDistributed(Conv2D(16, (3, 3) , padding='same', activation='relu'), input_shape=input_shape))
# model.add(TimeDistributed(BatchNormalization()))
# model.add(TimeDistributed(MaxPooling2D((2, 2))))

# model.add(TimeDistributed(Conv2D(32, (3, 3) , padding='same', activation='relu')))
# model.add(TimeDistributed(BatchNormalization()))
# model.add(TimeDistributed(MaxPooling2D((2, 2))))

# model.add(TimeDistributed(Conv2D(64, (3, 3) , padding='same', activation='relu')))
# model.add(TimeDistributed(BatchNormalization()))
# model.add(TimeDistributed(MaxPooling2D((2, 2))))

# model.add(TimeDistributed(Conv2D(128, (3, 3) , padding='same', activation='relu')))
# model.add(TimeDistributed(BatchNormalization()))
# model.add(TimeDistributed(MaxPooling2D((2, 2))))

# model.add(TimeDistributed(Conv2D(256, (3, 3) , padding='same', activation='relu')))
# model.add(TimeDistributed(BatchNormalization()))
# model.add(TimeDistributed(MaxPooling2D((2, 2))))

# model.add(TimeDistributed(Flatten()))
# model.add(LSTM(64))
# model.add(Dropout(0.25))

# model.add(Dense(64,activation='relu'))
# model.add(Dropout(0.25))

# model.add(Dense(5, activation='softmax'))

Number of Epochs: 20 <br />
Training Accuracy: 0.50 | Validation Accuracy: 0.66 <br />
The model's accuracy falls below expectations. Exploring an alternate architecture: Conv2D with GRU.

### Model - 9

In [33]:
# # model - 9 Implementing a TimeDistributed Conv2D combined with GRU architecture.

# model = Sequential()

# model.add(TimeDistributed(Conv2D(32, (3, 3), activation='relu'), input_shape=input_shape))
# model.add(TimeDistributed(MaxPooling2D(2, 2)))
# model.add(BatchNormalization())

# model.add(TimeDistributed(Conv2D(64, (3, 3), activation='relu')))
# model.add(TimeDistributed(MaxPooling2D(2, 2)))
# model.add(BatchNormalization())

# model.add(TimeDistributed(GlobalAveragePooling2D()))
# model.add(TimeDistributed(Dense(63, activation='relu')))
# model.add(BatchNormalization())

# model.add(GRU(128))
# model.add(BatchNormalization())
# model.add(Dense(5, activation='relu'))

Number of Epochs: 40 <br />
Training Accuracy: 0.96 | Validation Accuracy: 0.81 <br />
The notable gap between training and validation accuracy indicates potential overfitting. Incorporating Dropout layers into the model to address this.

### Model - 10

In [34]:
# model - 10 # Implementing a TimeDistributed Conv2D combined with GRU architecture.

model = Sequential()

model.add(TimeDistributed(Conv2D(32, (3, 3), activation='relu'), input_shape=input_shape))
model.add(TimeDistributed(MaxPooling2D(2, 2)))
model.add(BatchNormalization())

model.add(TimeDistributed(Conv2D(64, (3, 3), activation='relu')))
model.add(TimeDistributed(MaxPooling2D(2, 2)))
model.add(BatchNormalization())

model.add(TimeDistributed(GlobalAveragePooling2D()))
model.add(TimeDistributed(Dense(63, activation='relu')))
model.add(BatchNormalization())
model.add(Dropout(0.2))

model.add(GRU(128))
model.add(BatchNormalization())
model.add(Dense(5, activation='relu'))

Number of Epochs: 40 <br />
Training Accuracy: 0.86 | Validation Accuracy: 0.58 <br />
Shifting the model to TimeDistributed Conv2D with GlobalAveragePooling3D, along with extending the number of epochs to 50.

### Model - 11

In [35]:

# model - 11 Implementing a TimeDistributed Conv2D model with GlobalAveragePooling3D.

model = Sequential()

model.add(TimeDistributed(Conv2D(32, (3, 3), activation='relu'), input_shape=input_shape))
model.add(TimeDistributed(MaxPooling2D(2, 2)))
model.add(BatchNormalization())

model.add(TimeDistributed(Conv2D(64, (3, 3), activation='relu')))
model.add(TimeDistributed(MaxPooling2D(2, 2)))
model.add(BatchNormalization())

model.add(TimeDistributed(Conv2D(128, (3, 3), activation='relu')))
model.add(TimeDistributed(MaxPooling2D(2, 2)))
model.add(BatchNormalization())

model.add(GlobalAveragePooling3D())
model.add(Dense(256, activation='relu'))
model.add(BatchNormalization())
model.add(Dense(5, activation='softmax'))

Epochs: 50 <br />
Training Accuracy: 0.99 | Validation Accuracy: 0.91 <br />
Both training and validation accuracy are within the 0.9 range. <br />
Yet, the proximity of the training accuracy to 1 suggests potential overfitting. <br />
Incorporating Dropout layers into the model (10), while maintaining the same number of epochs.

### Model - 12

In [36]:
model = Sequential()

model.add(TimeDistributed(Conv2D(32, (3, 3), activation='relu'), input_shape=input_shape))
model.add(TimeDistributed(MaxPooling2D(2, 2)))
model.add(BatchNormalization())

model.add(TimeDistributed(Conv2D(64, (3, 3), activation='relu')))
model.add(TimeDistributed(MaxPooling2D(2, 2)))
model.add(BatchNormalization())
model.add(Dropout(0.25))

model.add(TimeDistributed(Conv2D(128, (3, 3), activation='relu')))
model.add(TimeDistributed(MaxPooling2D(2, 2)))
model.add(BatchNormalization())
model.add(Dropout(0.25))

model.add(GlobalAveragePooling3D())
model.add(Dense(256, activation='relu'))
model.add(BatchNormalization())
model.add(Dropout(0.5))
model.add(Dense(5, activation='softmax'))

Number of Epochs: 50 <br />
Training Accuracy: 0.93 | Validation Accuracy: 0.91 <br />
Both training and validation accuracy are approximately in the 0.9 range. <br />
Furthermore, the disparity between the accuracies for the training and validation sets is relatively small. <br />
Opting for **Model - 12** (TimeDistributed Conv2D with GlobalAveragePooling3D and Dropouts).

The subsequent action involves `compiling` the model. Upon printing the model's `summary`, you'll gain insight into the overall count of parameters requiring training.

In [37]:
optimiser = optimizers.Adam(lr=0.01)  # write your optimizer
model.compile(optimizer=optimiser, loss='categorical_crossentropy', metrics=['categorical_accuracy'])
print(model.summary())



Model: "sequential_6"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 time_distributed_12 (TimeDi  (None, 20, 98, 98, 32)   896       
 stributed)                                                      
                                                                 
 time_distributed_13 (TimeDi  (None, 20, 49, 49, 32)   0         
 stributed)                                                      
                                                                 
 batch_normalization_8 (Batc  (None, 20, 49, 49, 32)   128       
 hNormalization)                                                 
                                                                 
 time_distributed_14 (TimeDi  (None, 20, 47, 47, 64)   18496     
 stributed)                                                      
                                                                 
 time_distributed_15 (TimeDi  (None, 20, 23, 23, 64)  

We will now generate the `train_generator` and `val_generator`, both of which will be employed in the `.fit_generator` function.

In [38]:
# Create data generators for training and validation
train_generator = generator(train_path, train_doc, batch_size)
val_generator = generator(val_path, val_doc, batch_size)

In [39]:
# Model Checkpoints and Callbacks
model_name = 'model_init_{}/'.format(str(curr_dt_time).replace(' ', '').replace(':', '_'))

if not os.path.exists(model_name):
    os.mkdir(model_name)

filepath = model_name + 'model-{epoch:05d}-{loss:.5f}-{categorical_accuracy:.5f}-{val_loss:.5f}-{val_categorical_accuracy:.5f}.h5'

checkpoint = ModelCheckpoint(filepath,
                             monitor='val_loss',
                             verbose=1,
                             save_best_only=False,
                             save_weights_only=False,
                             mode='auto',
                             save_freq='epoch')

LR = ReduceLROnPlateau(monitor='val_loss', factor=0.2, patience=5, min_lr=0.001, verbose=1)  # write the REducelronplateau code here

callbacks_list = [checkpoint, LR]

The `steps_per_epoch` and `validation_steps` parameters are utilized by the `fit` method to determine the count of `next()` calls required.

In [40]:
# Determining Steps per Epoch and Validation Steps
if (num_train_sequences%batch_size) == 0:
    steps_per_epoch = int(num_train_sequences/batch_size)
else:
    steps_per_epoch = (num_train_sequences//batch_size) + 1

if (num_val_sequences%batch_size) == 0:
    validation_steps = int(num_val_sequences/batch_size)
else:
    validation_steps = (num_val_sequences//batch_size) + 1

We will proceed to fit the model, initiating the training process. The checkpoints will facilitate the saving of the model at the conclusion of every epoch.

In [41]:
# Number of Epochs
num_epochs =50
print(f"Epochs: {num_epochs}")
# Training the Model

Epochs: 50


In [42]:
history = model.fit(train_generator, steps_per_epoch=steps_per_epoch, epochs=num_epochs, verbose=1,
                    callbacks=callbacks_list, validation_data=val_generator,
                    validation_steps=validation_steps, class_weight=None, workers=1, initial_epoch=0)


Source Path: /content/drive/My Drive/Gesture Recognition/Project_data/Project_data/train; Batch Size: 39


  image = imread('{0}/{1}/{2}'.format(source_path, t[folder + (batch * batch_size)].strip().split(';')[0], imgs[item])).astype(np.float32)


In [None]:

import matplotlib.pyplot as plt
%matplotlib inline
# Plotting Model Loss and Accuracy
plt.figure(figsize=(20,6))
ax1 = plt.subplot(121)
ax1 = plt.plot(history.history['loss'])
ax1 = plt.plot(history.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'validation'], loc='lower left')
ax2 = plt.subplot(122)
ax2 = plt.plot(history.history['categorical_accuracy'])
ax2 = plt.plot(history.history['val_categorical_accuracy'])
plt.title('model accuracy')
plt.ylabel('categorical_accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'validation'], loc='lower left')