# Gesture Recognition
In this group project, you are going to build a 3D Conv model that will be able to predict the 5 gestures correctly. Please import the following libraries to get started.

In [1]:
import numpy as np
import os
from imageio import imread
from PIL import Image
import datetime

We set the random seed so that the results don't vary drastically.

In [2]:
np.random.seed(30)
import random as rn
rn.seed(30)
from keras import backend as K
import tensorflow as tf
tf.random.set_seed(30)
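
As a small illustration of why the seeds are fixed, the sketch below uses only Python's `random` module. The helper name `shuffled_ids` is hypothetical and not part of the project code. Creating the generator with the same seed reproduces the same shuffle, which is the effect the cell above aims for across Python, NumPy and TensorFlow.

```python
import random

def shuffled_ids(seed, n=10):
    """Return the integers 0..n-1 shuffled with a private, seeded RNG."""
    rng = random.Random(seed)   # independent generator, leaves global state alone
    ids = list(range(n))
    rng.shuffle(ids)
    return ids

first = shuffled_ids(30)
second = shuffled_ids(30)
print(first == second)          # same seed, same order
```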

In this block, you read the folder names for training and validation. You also set the `batch_size` here. Choose the batch size so that the GPU is used at full capacity: keep increasing it until the machine throws an error (typically because it runs out of memory), then step back to the last value that worked.

In [4]:
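cwd = os.getcwd()
# Read and shuffle the CSV rows; `with` closes each file once it has been read
with open(os.path.join(cwd, 'train.csv')) as f:
    train_doc = np.random.permutation(f.readlines())
with open(os.path.join(cwd, 'val.csv')) as f:
    val_doc = np.random.permutation(f.readlines())
batch_size = 32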
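
Each line of `train.csv` and `val.csv` is later split on `;` by the generator, which reads field 0 as the folder name and field 2 as the class label. The sketch below shows that parsing in isolation. The sample row is made up for illustration, the meaning of the middle field is an assumption, and `VideoRow` and `parse_row` are hypothetical names.

```python
from typing import NamedTuple

class VideoRow(NamedTuple):
    folder: str   # sub-folder holding the frames of one video
    label: int    # class index in 0..4

def parse_row(line: str) -> VideoRow:
    """Split one 'folder;name;label' line into its folder and label."""
    fields = line.strip().split(';')
    return VideoRow(folder=fields[0], label=int(fields[2]))

# Made-up row, not taken from the real CSV files
row = parse_row('video_0001;gesture_name;3\n')
print(row.folder, row.label)
```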

## Generator
This is one of the most important parts of the code. The overall structure of the generator has been given. In the generator, you preprocess the images, which come in 2 different dimensions, and assemble them into batches of video frames. You have to experiment with `img_idx`, `y`, `z` and normalization such that you get high accuracy.

In [5]:
import cv2  # OpenCV, used to resize the frames (np, os and imread are imported in the first cell)

def generator(source_path, folder_list, batch_size):
    print('Source path =', source_path, '; batch size =', batch_size)
    img_idx = list(range(0, 15))  # Indices of the frames used from each video
    x = len(img_idx)  # Frames per video
    y = 64            # Image height
    z = 64            # Image width
    num_batches = len(folder_list) // batch_size        # Number of full batches
    remaining_samples = len(folder_list) % batch_size   # Size of the last, partial batch

    def load_batch(t, start, size):
        # Build one batch of `size` videos starting at row `start` of the shuffled list
        batch_data = np.zeros((size, x, y, z, 3))  # (videos, frames, height, width, channels)
        batch_labels = np.zeros((size, 5))         # One-hot encoded labels (5 classes)

        for folder in range(size):
            fields = t[start + folder].strip().split(';')
            folder_path = os.path.join(source_path, fields[0])
            # os.listdir returns names in arbitrary order; sort so frames are read in filename order
            imgs = sorted(os.listdir(folder_path))

            for idx, item in enumerate(img_idx):
                image = imread(os.path.join(folder_path, imgs[item])).astype(np.float32)

                # Resize to a consistent shape; cv2.resize expects (width, height)
                image = cv2.resize(image, (z, y))

                # Scale the RGB channels to [0, 1]
                batch_data[folder, idx] = image[:, :, :3] / 255.0

            # Assign label (one-hot encoding)
            batch_labels[folder, int(fields[2])] = 1

        return batch_data, batch_labels

    while True:
        t = np.random.permutation(folder_list)  # Shuffle the folder list every epoch

        for batch in range(num_batches):  # Full batches
            yield load_batch(t, batch * batch_size, batch_size)

        if remaining_samples > 0:  # Remaining data points after the full batches
            yield load_batch(t, num_batches * batch_size, remaining_samples)
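
One way to experiment with `img_idx` is to sample frames evenly across the whole clip instead of taking the first 15. The sketch below assumes each video has 30 frames, which is not stated above, and `evenly_spaced_indices` is a hypothetical helper rather than part of the given generator.

```python
def evenly_spaced_indices(total_frames, num_samples):
    """Pick num_samples frame indices spread across 0..total_frames-1."""
    if num_samples < 1 or num_samples > total_frames:
        raise ValueError('num_samples must be between 1 and total_frames')
    if num_samples == 1:
        return [0]
    span = total_frames - 1
    # Integer arithmetic keeps the first index at 0 and the last at total_frames-1
    return [i * span // (num_samples - 1) for i in range(num_samples)]

# Assumed clip length of 30 frames, sampled down to 15
img_idx = evenly_spaced_indices(30, 15)
print(img_idx)
```

The resulting list can replace `list(range(0, 15))` in the generator, as long as every folder really contains at least as many frames as the largest index.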


Note here that a video is represented above in the generator as (number of images, height, width, number of channels). Take this into consideration while creating the model architecture.

In [6]:
curr_dt_time = datetime.datetime.now()
train_path = os.path.join(cwd, 'train')
val_path = os.path.join(cwd, 'val')
num_train_sequences = len(train_doc)
print('# training sequences =', num_train_sequences)
num_val_sequences = len(val_doc)
print('# validation sequences =', num_val_sequences)
num_epochs = 30
print('# epochs =', num_epochs)

# Hyperparameters
img_size = (64, 64)  # Resize images
frames = 15  # Number of frames per sequence
learning_rate = 0.001

# training sequences = 663
# validation sequences = 100
# epochs = 30


## Model
Here you make the model using different functionalities that Keras provides. Remember to use `Conv3D` and `MaxPooling3D`, not `Conv2D` and `MaxPooling2D`, for a 3D convolution model. You would want to use `TimeDistributed` while building a Conv2D + RNN model. Also remember that the last layer is a softmax. Design the network so that it gives good accuracy with the fewest parameters, so that it can fit in the memory of the webcam.

In [7]:
from tensorflow.keras.models import Sequential, Model
from tensorflow.keras.layers import (Conv3D, MaxPooling3D, Dense, GRU, Flatten,
                                     TimeDistributed, BatchNormalization,
                                     Activation, Dropout, Reshape)
from tensorflow.keras.callbacks import ModelCheckpoint, ReduceLROnPlateau
from tensorflow.keras import optimizers


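def create_model(input_shape, num_classes):
    # Shape comments assume input_shape = (15, 64, 64, 3)
    model = Sequential()

    # Two Conv3D + MaxPooling3D stages extract spatio-temporal features
    model.add(Conv3D(32, kernel_size=(3, 3, 3), activation='relu', input_shape=input_shape))  # -> (13, 62, 62, 32)
    model.add(MaxPooling3D(pool_size=(2, 2, 2)))  # -> (6, 31, 31, 32)

    model.add(Conv3D(64, kernel_size=(3, 3, 3), activation='relu'))  # -> (4, 29, 29, 64)
    model.add(MaxPooling3D(pool_size=(2, 2, 2)))  # -> (2, 14, 14, 64)

    # Regroup the features into a sequence of 64-dimensional vectors for the GRU
    model.add(Flatten())  # -> (25088,)
    model.add(Reshape((-1, 64)))  # -> (392, 64)
    model.add(GRU(128, return_sequences=False, activation='relu'))

    # Classification head; the last layer is a softmax over the gesture classes
    model.add(Dense(128, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(num_classes, activation='softmax'))

    model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

    return model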
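
To see what the GRU actually receives, the shapes can be traced by hand. The sketch below assumes the Keras defaults used in `create_model` (stride 1 and `'valid'` padding for `Conv3D`, non-overlapping windows for `MaxPooling3D`) and the `(15, 64, 64, 3)` input used in this notebook. The helper names are hypothetical.

```python
def conv_valid(size, kernel=3):
    """Output length of a stride-1 convolution with 'valid' padding."""
    return size - kernel + 1

def pool(size, window=2):
    """Output length of non-overlapping max pooling (remainder dropped)."""
    return size // window

def trace_shapes(frames, height, width):
    """Follow (frames, height, width) through two Conv3D + MaxPooling3D stages."""
    dims = (frames, height, width)
    shapes = [dims]
    for _ in range(2):
        dims = tuple(conv_valid(d) for d in dims)
        shapes.append(dims)
        dims = tuple(pool(d) for d in dims)
        shapes.append(dims)
    return shapes

shapes = trace_shapes(15, 64, 64)
for s in shapes:
    print(s)

# The last Conv3D has 64 filters, so Reshape((-1, 64)) yields this many GRU steps
t, h, w = shapes[-1]
print('GRU sequence length:', t * h * w)
```

Under these assumptions the GRU steps run over every remaining (time, height, width) position rather than over the 15 frames, which may be worth keeping in mind when tuning the architecture.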


Now that you have written the model, the next step is to `compile` it; here `create_model` already calls `compile` before returning the model. When you print the `summary` of the model, you'll see the total number of parameters you have to train.

In [30]:
model = create_model(input_shape=(frames, *img_size, 3), num_classes=5)



In [31]:
model.summary()  # Prints the table itself; wrapping it in print() only adds a trailing `None`
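
Since the summary table is not shown here, the parameter count can be estimated by hand. This is a sketch assuming the layer sizes in `create_model` and the Keras defaults (biases enabled, `GRU` with `reset_after=True`). The dictionary keys are labels chosen for illustration, not the names Keras assigns.

```python
def conv3d_params(in_channels, filters, kernel=(3, 3, 3)):
    """Kernel weights plus one bias per filter."""
    k = kernel[0] * kernel[1] * kernel[2]
    return (k * in_channels + 1) * filters

def gru_params(input_dim, units):
    """Three gates; with reset_after=True each gate has two bias vectors."""
    return 3 * (input_dim * units + units * units + 2 * units)

def dense_params(input_dim, units):
    """Weights plus one bias per unit."""
    return (input_dim + 1) * units

counts = {
    'conv3d_1': conv3d_params(3, 32),
    'conv3d_2': conv3d_params(32, 64),
    'gru': gru_params(64, 128),
    'dense_1': dense_params(128, 128),
    'dense_2': dense_params(128, 5),
}
for name, n in counts.items():
    print(f'{name:>9}: {n:,}')
print(f'    total: {sum(counts.values()):,}')
```

The pooling, flatten, reshape and dropout layers have no trainable parameters, so they are left out. The figures are meant as a cross-check against the output of `model.summary()`.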


Let us create the `train_generator` and the `val_generator`, which will be passed to `.fit` (`.fit_generator` in older Keras versions).

In [32]:
train_generator = generator(train_path, train_doc, batch_size)
val_generator = generator(val_path, val_doc, batch_size)

In [None]:
from keras.callbacks import ModelCheckpoint, ReduceLROnPlateau


model_name = 'model_init' + '_' + str(curr_dt_time).replace(' ','').replace(':','_') + '/'

if not os.path.exists(model_name):
    os.mkdir(model_name)

filepath = model_name + 'model-{epoch:05d}-{loss:.5f}-{accuracy:.5f}-{val_loss:.5f}-{val_accuracy:.5f}.keras'

checkpoint = ModelCheckpoint(filepath, monitor='val_loss', verbose=1, save_best_only=False, save_weights_only=False, mode='auto', save_freq='epoch')


LR = ReduceLROnPlateau(monitor='val_loss',  
                       factor=0.5,           
                       patience=5,           
                       verbose=1,            
                       min_lr=1e-6)          

callbacks_list = [checkpoint, LR]
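
To get a feel for what `ReduceLROnPlateau` does with `factor=0.5` and `patience=5`, the rule can be simulated in plain Python. This is a simplified re-implementation for illustration, assuming the default `min_delta=1e-4` and no cooldown. The validation losses are made up and `simulate_reduce_on_plateau` is a hypothetical name.

```python
def simulate_reduce_on_plateau(val_losses, lr=0.001, factor=0.5,
                               patience=5, min_lr=1e-6, min_delta=1e-4):
    """Return the learning rate in effect after each epoch's check."""
    best = float('inf')
    wait = 0
    history = []
    for loss in val_losses:
        if loss < best - min_delta:      # counts as an improvement
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:         # plateau: shrink the rate, restart the count
                lr = max(lr * factor, min_lr)
                wait = 0
        history.append(lr)
    return history

# Made-up losses: two improving epochs followed by a ten-epoch plateau
losses = [1.60, 1.57] + [1.57] * 10
rates = simulate_reduce_on_plateau(losses)
print(rates)
```

In this made-up run the rate is halved after the fifth epoch without improvement and again five epochs later.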


The `steps_per_epoch` and `validation_steps` are used by `fit` to decide the number of `next()` calls it needs to make on each generator per epoch.

In [44]:
if (num_train_sequences%batch_size) == 0:
    steps_per_epoch = int(num_train_sequences/batch_size)
else:
    steps_per_epoch = (num_train_sequences//batch_size) + 1

if (num_val_sequences%batch_size) == 0:
    validation_steps = int(num_val_sequences/batch_size)
else:
    validation_steps = (num_val_sequences//batch_size) + 1
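
The if/else above is a ceiling division: the generator yields every full batch and then one smaller batch for the leftover videos. The sketch below checks this with the sequence counts printed earlier (663 and 100) and `batch_size = 32`. The helper names are hypothetical.

```python
def batch_sizes(num_sequences, batch_size):
    """Sizes of the batches yielded in one pass: full batches, then any remainder."""
    full, remainder = divmod(num_sequences, batch_size)
    sizes = [batch_size] * full
    if remainder:
        sizes.append(remainder)
    return sizes

def steps_for(num_sequences, batch_size):
    """Ceiling division, equivalent to the if/else in the cell above."""
    return -(-num_sequences // batch_size)

train_sizes = batch_sizes(663, 32)
print(len(train_sizes), train_sizes[-1])         # number of batches, size of the last one
print(steps_for(663, 32), steps_for(100, 32))    # training steps, validation steps
print(steps_for(663, 20))                        # for comparison with the training log
```

The training log further down reports 34 steps per epoch, which matches a batch size of 20 rather than 32, so that run was presumably made with a different setting.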

Let us now fit the model. This will start training the model and with the help of the checkpoints, you'll be able to save the model at the end of each epoch.

In [45]:
model.fit(
    train_generator,                
    steps_per_epoch=steps_per_epoch, 
    epochs=num_epochs,              
    verbose=1,                        
    callbacks=callbacks_list,        
    validation_data=val_generator,   
    validation_steps=validation_steps                
)

Epoch 1/30
Epoch 1: saving model to model_init_2025-03-0211_52_28.766669/model-00001-1.60536-0.23680-1.59700-0.25000.keras
34/34 ━━━━━━━━━━━━━━━━━━━━ 65s 2s/step - accuracy: 0.2424 - loss: 1.6039 - val_accuracy: 0.2500 - val_loss: 1.5970 - learning_rate: 0.0010
Epoch 2/30
Epoch 2: saving model to model_init_2025-03-0211_52_28.766669/model-00002-1.59271-0.26094-1.57494-0.34000.keras
34/34 ━━━━━━━━━━━━━━━━━━━━ 56s 2s/step - accuracy: 0.2242 - loss: 1.6018 - val_accuracy: 0.3400 - val_loss: 1.5749 - learning_rate: 0.0010
Epoch 3/30
34/34 ━━━━━━━━━━━━━━━━━━━━ 0s 2s/step - accuracy: 0.2852 - loss: 1.5770
Epoch 3: saving model to model_init_2025-03-0211_52_28.766669/model-00003-1.56421-0.27149-

<keras.src.callbacks.history.History at 0x20040ecbaa0>
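
The checkpoint names in the log follow the `filepath` template, which `ModelCheckpoint` fills in with the epoch number and the logged metrics through Python string formatting. The sketch below reproduces the first file name with the metric values read off that name. Only the file-name part of the template is used.

```python
template = ('model-{epoch:05d}-{loss:.5f}-{accuracy:.5f}-'
            '{val_loss:.5f}-{val_accuracy:.5f}.keras')

# Metric values taken from the first checkpoint name in the log
logs = {'loss': 1.60536, 'accuracy': 0.2368, 'val_loss': 1.597, 'val_accuracy': 0.25}

name = template.format(epoch=1, **logs)
print(name)
```

Because the validation metrics are part of the name, sorting the saved files makes it easy to pick the epoch with the lowest `val_loss` afterwards.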

In [None]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, TimeDistributed, Flatten, GRU, Dense, Dropout
from tensorflow.keras.applications import ResNet50

def cnn_rnn_model(input_shape, num_classes):
    # input_shape is (frames, height, width, channels)
    # ResNet50 as the base model (exclude the top layers). It works on single images, so it
    # needs the 3-tuple (height, width, channels); passing the full 4-tuple raises
    # "ValueError: `input_shape` must be a tuple of three integers."
    base_model = ResNet50(weights='imagenet', include_top=False, input_shape=input_shape[1:])

    model = Sequential()
    model.add(Input(shape=input_shape))

    # TimeDistributed applies the CNN to every frame of the video
    model.add(TimeDistributed(base_model))
    # Flatten each frame's feature map so the GRU receives (frames, features)
    model.add(TimeDistributed(Flatten()))

    # RNN Layer (GRU or LSTM)
    model.add(GRU(128, return_sequences=False, activation='relu'))

    # Dense layers for classification
    model.add(Dense(128, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(num_classes, activation='softmax'))  # For multi-class classification

    model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
    return model

In [53]:
model = cnn_rnn_model(input_shape=(15, 64, 64, 3), num_classes=5)