# Deep Learning Course Project - Gesture Recognition

### Problem Statement
Imagine you are working as a data scientist at a home electronics company that manufactures state-of-the-art smart televisions. You want to develop a cool feature for the smart TV that recognises five different gestures performed by the user, so that users can control the TV without a remote.

The gestures are continuously monitored by the webcam mounted on the TV. Each gesture corresponds to a specific command:
 
| Gesture | Corresponding Action |
| --- | --- | 
| Thumbs Up | Increase the volume. |
| Thumbs Down | Decrease the volume. |
| Left Swipe | 'Jump' backwards 10 seconds. |
| Right Swipe | 'Jump' forward 10 seconds. |
| Stop | Pause the movie. |

Each video is a sequence of 30 frames (or images).

### Objectives:
1. **Generator**: The generator should be able to take a batch of videos as input without any error. Steps like cropping, resizing and normalization should be performed successfully.

2. **Model**: Develop a model that trains without errors. It will be judged on the total number of parameters (fewer parameters keep the inference (prediction) time low) and on the accuracy achieved. As suggested by Snehansu, start training on a small amount of data and then proceed further.

3. **Write up**: This should contain the detailed procedure followed in choosing the final model. The write-up should start with the reason for choosing the base model, then highlight the reasons and metrics considered while modifying and experimenting to arrive at the final model.

In [10]:
pip install keras


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m23.0[0m[39;49m -> [0m[32;49m23.1.2[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpython3.10 -m pip install --upgrade pip[0m
Note: you may need to restart the kernel to use updated packages.


In [9]:
## Imports

import numpy as np
import os
import skimage
from skimage.io import imread
from skimage.transform import resize
import datetime



import cv2
import matplotlib.pyplot as plt
%matplotlib inline

import abc
from sys import getsizeof

# importing some other libraries which will be needed for model building.

from keras.models import Sequential, Model
from keras.layers import Dense, GRU, Flatten, TimeDistributed, BatchNormalization, Activation
from keras.layers import Conv3D, MaxPooling3D, Conv2D, MaxPooling2D
from keras.layers import LSTM
from keras.callbacks import ModelCheckpoint, ReduceLROnPlateau, EarlyStopping
from keras import optimizers
from keras.layers import Dropout

import warnings
warnings.filterwarnings("ignore")

ModuleNotFoundError: No module named 'keras.layers.convolutional'

We set the random seed so that the results don't vary drastically.

In [None]:
import os
np.random.seed(30)
import random as rn
rn.seed(30)
from keras import backend as K
import tensorflow as tf
tf.random.set_seed(30)

In this block, you read the folder names for training and validation. You also set the `batch_size` here. Choose the batch size so that the GPU is used at full capacity: keep increasing it until the machine throws an error, then step back to the largest size that still runs.
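
As a rough guide when picking a batch size, the memory taken by one input batch can be estimated before training. The sketch below is only an estimate of the input array itself (it ignores activations, weights and gradients, which usually dominate on the GPU). The helper name `batch_bytes` is hypothetical, and the default dtype is `float64` because the generator in this notebook allocates batches with `np.zeros`, which defaults to that dtype.

```python
import numpy as np

def batch_bytes(batch_size, frames, image_shape, dtype=np.float64):
    """Bytes needed for one (batch, frames, height, width, channels) array."""
    n_values = batch_size * frames * int(np.prod(image_shape))
    return n_values * np.dtype(dtype).itemsize

# 30 frames of 160x160x3, as configured in this notebook
for bs in (16, 32, 64):
    mib = batch_bytes(bs, 30, (160, 160, 3)) / 2**20
    print(f"batch_size={bs}: {mib:.1f} MiB per input batch")
```

Doubling the batch size doubles this figure, which is why memory errors appear quickly once the batch grows past what the device can hold.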

In [None]:
# path with csv containing folder names 
train_csv_path = '/content/drive/My Drive/Upgrad/G_Ass/Project_data/train.csv'
val_csv_path = '/content/drive/My Drive/Upgrad/G_Ass/Project_data/val.csv'

# path of train and val folders
train_path = '/content/drive/My Drive/Upgrad/G_Ass/Project_data/train'
val_path = '/content/drive/My Drive/Upgrad/G_Ass/Project_data/val'

# image size 
image_shape = (160,160,3) 
dim_x = 160
dim_y = 160


# batch_size 
batch_size = 32

# number of epochs
num_epochs = 30 

# image augmentation
augmentation = False

# retrain cnn
retrain = True

In [None]:
# index of frames processed in each video 

def video_frames(mode='alternate',length=None) : 
    if mode == 'alternate' :
        return [0,2,4,6,8,10,12,14,16,18,20,22,24,26,28]
    elif mode == 'all' :
        return list(range(30))
    elif mode == 'middle' : 
        return list(range(5,25))
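    elif (mode == 'random') and length : 
        # draw `length` distinct frame indices from 0..29 and keep them in temporal order
        return sorted(np.random.choice(30, size=length, replace=False).tolist())
    raise ValueError('Unknown mode, or missing length for random mode: ' + str(mode))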
    
frames_to_sample = video_frames(mode='all')

In [None]:
# image augmentation 
    

# Detecting skin tones: since gestures are performed by humans, masking the background and keeping only the skin could be a great preprocessing step.
# The function below is a skin tone filter.
def skin_rules(R_Frame, G_Frame, B_Frame) : 
    BRG_Max = np.maximum.reduce([B_Frame, G_Frame, R_Frame])
    BRG_Min = np.minimum.reduce([B_Frame, G_Frame, R_Frame])
    # Under uniform daylight, skin pixels satisfy the following rule:
    Rule_1 = np.logical_and.reduce([R_Frame > 95, G_Frame > 40, B_Frame > 20 ,
                                 BRG_Max - BRG_Min > 15,abs(R_Frame - G_Frame) > 15, 
                                 R_Frame > G_Frame, R_Frame > B_Frame])
    # Under flashlight or lateral daylight illumination, skin pixels satisfy the following rule:
    Rule_2 = np.logical_and.reduce([R_Frame > 220, G_Frame > 210, B_Frame > 170,
                         abs(R_Frame - G_Frame) <= 15, R_Frame > B_Frame, G_Frame > B_Frame])
    #Rule_1 U Rule_2
    RGB_Rule = np.logical_or(Rule_1, Rule_2)
    return RGB_Rule

# The below function detects skin and removes other scene elements.
def detect_skin(img) : 

    mask = skin_rules(img[:,:,0], img[:,:,1], img[:,:,2])
    img[:,:,0] = img[:,:,0] * mask 
    img[:,:,1] = img[:,:,1] * mask 
    img[:,:,2] = img[:,:,2] * mask

    return img

def erode(img,kernel) : 
    img_erode = np.zeros_like(img)

    img_erode[:,:,0] = cv2.erode(img[:,:,0],kernel)
    img_erode[:,:,1] = cv2.erode(img[:,:,1],kernel)
    img_erode[:,:,2] = cv2.erode(img[:,:,2],kernel)

    return img_erode

def dilate(img,kernel) : 
    img_dilate = np.zeros_like(img)
    img_dilate[:,:,0] = cv2.dilate(img[:,:,0],kernel)
    img_dilate[:,:,1] = cv2.dilate(img[:,:,1],kernel)
    img_dilate[:,:,2] = cv2.dilate(img[:,:,2],kernel)

    return img_dilate


# closing = dilation followed by erosion (fills small holes)
def closing(img,kernel) : 
    return erode(dilate(img,kernel),kernel)

# opening = erosion followed by dilation (removes small specks)
def opening(img,kernel) : 
    return dilate(erode(img,kernel),kernel)


def open_close(img,kernel) : 
    return closing(opening(img,kernel),kernel)

def close_open(img,kernel) : 
    return opening(closing(img,kernel),kernel)

# opencv normalisation is used to prevent any math overflows.
def cv_normalise(img) : 
    img_new = np.zeros_like(img)
    cv2.normalize(img, img_new , 0,1, cv2.NORM_MINMAX)

    assert round(np.max(img_new),1) == 1, 'Normalisation error'+ str(np.max(img_new)) 
    assert round(np.min(img_new),1) == 0 ,'Normalisation error'+ str(np.min(img_new))

    return img_new


# opening and then closing is performed to remove noise from images and then skin is detected
def preprocess_image(img,kernel) : 

    img = open_close(img,kernel)
    img = detect_skin(img)

    return img

In [None]:
# model training history plot
def plot_model_history(history):
    fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(15,4))
    axes[0].plot(history.history['loss'])   
    axes[0].plot(history.history['val_loss'])
    axes[0].legend(['loss','val_loss'])

    axes[1].plot(history.history['categorical_accuracy'])   
    axes[1].plot(history.history['val_categorical_accuracy'])
    axes[1].legend(['categorical_accuracy','val_categorical_accuracy'])

In [None]:
# Parsing train & validation csv
train_doc = np.random.permutation(open(train_csv_path).readlines())
val_doc = np.random.permutation(open(val_csv_path).readlines())

## Generator
This is one of the most important parts of the code. The overall structure of the generator has been given. In the generator, you preprocess the images, since the dataset contains images of two different dimensions, and assemble batches of video frames. Experiment with `img_idx`, `y`, `z` and the normalization to get high accuracy.
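
To make the preprocessing concrete, here is a minimal NumPy-only sketch of the two steps applied to each frame before resizing: cropping both source sizes (120x160 and 360x360, as handled in the generator below) to 120x120, and min-max scaling a channel to [0, 1]. The helper names `crop_frame` and `min_max` are hypothetical; `min_max` is a plain-NumPy stand-in for the OpenCV normalisation used in this notebook, not the same call.

```python
import numpy as np

def crop_frame(image):
    """Crop a frame to 120x120, assuming a 120x160 or 360x360 input."""
    if image.shape[0] == 120:
        return image[:, 20:140]          # drop 20 columns on each side
    if image.shape[0] == 360:
        return image[120:240, 120:240]   # keep the central 120x120 patch
    return image                         # unknown size: leave unchanged

def min_max(channel):
    """Scale one channel to [0, 1]; a constant channel maps to all zeros."""
    lo, hi = float(channel.min()), float(channel.max())
    if hi == lo:
        return np.zeros_like(channel, dtype=np.float32)
    return ((channel - lo) / (hi - lo)).astype(np.float32)

small = np.zeros((120, 160, 3), dtype=np.float32)
large = np.zeros((360, 360, 3), dtype=np.float32)
print(crop_frame(small).shape, crop_frame(large).shape)
```

Both frame sizes come out with the same shape, so a single resize call can then bring them to the model's input size.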

In [None]:
# Generator function 

def generator(source_path, folder_list, batch_size=batch_size, augmentation=augmentation):
    print( '\nSource path = ', source_path, '; batch size =', batch_size)
    img_idx = frames_to_sample
    while True:
        t = np.random.permutation(folder_list)
        num_batches = len(folder_list)// batch_size 
        for batch in range(num_batches): 
            batch_data = np.zeros((batch_size,len(img_idx),*image_shape))
            batch_labels = np.zeros((batch_size,5)) 
            
            for folder in range(batch_size): 
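                # sorted so that the frame order is deterministic (assumes file names sort in temporal order)
                imgs = sorted(os.listdir(source_path+'/'+ t[folder + (batch*batch_size)].split(';')[0]))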
                for idx,item in enumerate(img_idx): 
                    image = imread(source_path+'/'+ t[folder + (batch*batch_size)].strip().split(';')[0]+'/'+imgs[item]).astype(np.float32)
                    
                    
                    # Although images are of two different sizes, 120x160 images do not have much information in the 0-20 and 140-160 column bands. 
                    # Hence 120x160 images can be cropped to 120x120 as follows.
                    if image.shape[0] == 120 : 
                        image = image[:,20:140]
                    # Similarly, 360x360 images could be center cropped since the gesture information is contained in the center.
                    if image.shape[0] == 360 : 
                        image = image[120:240,120:240]
                        
                    # Both image sizes are now 120x120; resize them to 160x160
                    image = resize(image, (160,160))
                        
                    # if augmentation is true, randomly mask scenes from a few images.
                    if augmentation and idx in np.random.randint(0,29,5): 
                        kernel = (1/16)*np.ones((4,4)) # kernel for morphological transformations 
                        image = preprocess_image(image,kernel)
                    
                        
                    # Normalisation 
                    batch_data[folder,idx,:,:,0] = cv_normalise(image[:,:,0])
                    batch_data[folder,idx,:,:,1] = cv_normalise(image[:,:,1])
                    batch_data[folder,idx,:,:,2] = cv_normalise(image[:,:,2])
                    
                    
                batch_labels[folder, int(t[folder + (batch*batch_size)].strip().split(';')[2])] = 1
                    
            yield batch_data, batch_labels 

        
        # Remaining folders that do not fill a whole batch
        
        remainder_size = len(folder_list) % batch_size
        if remainder_size == 0 : 
            continue  # nothing left over, start the next pass over the data
        remainder_folders = t[num_batches*batch_size : ]
        
        assert remainder_size == len(remainder_folders) , 'Take care of the remainder folders '
        
        # The remaining folders are still loaded into a tensor of batch_size; unused rows stay zero. It has been noted that this does not affect performance.
        batch_data = np.zeros((batch_size,len(img_idx),*image_shape)) 
        batch_labels = np.zeros((batch_size,5))
        
        
        for folder in range(remainder_size): 
            
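            # sorted so that the frame order is deterministic (assumes file names sort in temporal order)
            imgs = sorted(os.listdir(source_path+'/'+ remainder_folders[folder].split(';')[0]))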
            
            
            for idx,item in enumerate(img_idx): 
                image = imread(source_path+'/'+ remainder_folders[folder].strip().split(';')[0]+'/'+imgs[item]).astype(np.float32)
                
                
                if image.shape[0] == 120 : 
                    image = image[:,20:140]
                if image.shape[0] == 360 : 
                    image = image[120:240,120:240]
                
                image = resize(image, (160,160))
                
                if augmentation and idx in np.random.randint(0,29,5): 
                    kernel = (1/16)*np.ones((4,4)) # kernel for morphological transformations 
                    image = preprocess_image(image,kernel)
                    

                batch_data[folder,idx,:,:,0] = cv_normalise(image[:,:,0])
                batch_data[folder,idx,:,:,1] = cv_normalise(image[:,:,1])
                batch_data[folder,idx,:,:,2] = cv_normalise(image[:,:,2])

            
            batch_labels[folder, int(remainder_folders[folder].strip().split(';')[2])] = 1
            
           
                
                    
        yield batch_data, batch_labels

Note here that a video is represented above in the generator as (number of images, height, width, number of channels). Take this into consideration while creating the model architecture.
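
A quick shape check helps here. The sketch below builds a dummy batch with the same layout the generator yields and derives the per-sample shape a model would take as input. The name `input_shape` matches the variable used in the commented-out experiments further down, but defining it this way is an assumption, and the batch size of 4 is arbitrary.

```python
import numpy as np

frames = 30                    # frames sampled per video in this notebook
image_shape = (160, 160, 3)    # height, width, channels after resizing
batch_size = 4                 # arbitrary small value for the check

# Layout yielded by the generator: (batch, frames, height, width, channels)
batch = np.zeros((batch_size, frames, *image_shape), dtype=np.float32)

# A model sees one video at a time, so its input shape drops the batch axis
input_shape = batch.shape[1:]
print(batch.shape, input_shape)
```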

In [None]:
# Sequence lengths

curr_dt_time = datetime.datetime.now()
num_train_sequences = len(train_doc)
print('# training sequences =', num_train_sequences)
num_val_sequences = len(val_doc)
print('# validation sequences =', num_val_sequences)
print ('# epochs =', num_epochs)

# training sequences = 663
# validation sequences = 100
# epochs = 30


In [None]:
# testing the generator
train_generator = generator(train_path, train_doc, batch_size)
val_generator = generator(val_path, val_doc, batch_size,augmentation=False)

## Model
Here you build the model using the different layers that Keras provides. Remember to use `Conv3D` and `MaxPooling3D`, not `Conv2D` and `MaxPooling2D`, for a 3D convolution model. Use `TimeDistributed` when building a Conv2D + RNN model. Also remember that the last layer is a softmax. Design the network so that it gives good accuracy with the fewest parameters, so that it can fit in the memory of the webcam.
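
Because the parameter count is part of the judging criteria, it can help to estimate it by hand before building a layer. The sketch below uses the standard counts for a convolution with bias (kernel volume times input channels, plus one bias, per filter) and for a dense layer. The helper names `conv3d_params` and `dense_params` are hypothetical, and the layer sizes are illustrative.

```python
def conv3d_params(kernel, in_channels, filters):
    """Weights plus biases of a 3D convolution with a (t, h, w) kernel."""
    kt, kh, kw = kernel
    return (kt * kh * kw * in_channels + 1) * filters

def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return (n_in + 1) * n_out

first = conv3d_params((3, 3, 3), 3, 32)    # RGB input, 32 filters
full = conv3d_params((3, 3, 3), 32, 64)    # full 3x3x3 kernel
thin = conv3d_params((1, 3, 3), 32, 64)    # no temporal extent
print(first, full, thin, dense_params(512, 5))
```

A `(1, 3, 3)` kernel needs roughly a third of the weights of a `(3, 3, 3)` kernel for the same channel counts, which is one way to trade temporal context for a smaller model.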

In [None]:
from keras.models import Sequential, Model
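from keras.layers import Dense, GRU, Flatten, TimeDistributed, BatchNormalization, Activation
# keras.layers.convolutional is not available in recent Keras releases; import from keras.layers instead
from keras.layers import Conv2D, MaxPooling2D
# layers used only by the commented-out experiments below
from keras.layers import GlobalAveragePooling2D, GlobalAveragePooling3D, ConvLSTM2D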
from keras.callbacks import ModelCheckpoint, ReduceLROnPlateau
from keras import optimizers
from keras.layers import Dropout
from keras.applications import mobilenet

### Experiment - 1 & 2
**Conv3D**

In [None]:
# model = Sequential()
# model.add(Conv3D(32, kernel_size=3, activation='relu', input_shape=input_shape))
# model.add(MaxPooling3D(pool_size=2))

# model.add(Conv3D(64, kernel_size=3, activation='relu'))
# model.add(MaxPooling3D(pool_size=2))

# model.add(Flatten())
# model.add(Dense(256, activation='relu'))
# model.add(Dense(5, activation='softmax'))

### Experiment - 3
**Conv3D**

In [None]:
# model = Sequential()

# model.add(Conv3D(32, kernel_size=3, activation='relu', input_shape=input_shape))
# model.add(Conv3D(64, kernel_size=3, activation='relu'))
# model.add(MaxPooling3D(pool_size=(2, 2, 2)))

# model.add(Conv3D(128, kernel_size=3, activation='relu'))
# model.add(MaxPooling3D(pool_size=(2, 2, 2)))

# model.add(Conv3D(256, kernel_size=(3, 3, 3), activation='relu'))
# model.add(MaxPooling3D(pool_size=(3, 2, 2)))

# model.add(Conv3D(512, kernel_size=(3, 3, 3), activation='relu'))
# model.add(Conv3D(512, kernel_size=(3, 3, 3), activation='relu'))
# model.add(MaxPooling3D(pool_size=(2, 2, 2)))

# model.add(Flatten())
# model.add(Dense(512, activation='relu'))
# model.add(Dense(5, activation='softmax'))

### Experiment - 4
**Conv3D**

In [None]:
# model = Sequential()

# model.add(Conv3D(32, kernel_size=3, activation='relu', input_shape=input_shape))
# model.add(Conv3D(64, kernel_size=3, activation='relu'))
# model.add(MaxPooling3D(pool_size=(2, 2, 2)))

# model.add(Conv3D(128, kernel_size=3, activation='relu'))
# model.add(MaxPooling3D(pool_size=(1, 2, 2)))

# model.add(Conv3D(256, kernel_size=(1, 3, 3), activation='relu'))
# model.add(MaxPooling3D(pool_size=(1, 2, 2)))

# model.add(Conv3D(512, kernel_size=(1, 3, 3), activation='relu'))
# model.add(Conv3D(512, kernel_size=(1, 3, 3), activation='relu'))
# model.add(MaxPooling3D(pool_size=(1, 2, 2)))

# model.add(Flatten())
# model.add(Dense(512, activation='relu'))
# model.add(Dense(5, activation='softmax'))

### Experiment - 5 & 6
**Conv3D**

In [None]:
# model = Sequential()

# model.add(Conv3D(32, kernel_size=3, activation='relu', input_shape=input_shape))
# model.add(Conv3D(64, kernel_size=3, activation='relu'))
# model.add(MaxPooling3D(pool_size=(2, 2, 2)))
# model.add(BatchNormalization())

# model.add(Conv3D(128, kernel_size=3, activation='relu'))
# model.add(MaxPooling3D(pool_size=(1, 2, 2)))
# model.add(BatchNormalization())

# model.add(Conv3D(256, kernel_size=(1, 3, 3), activation='relu'))
# model.add(MaxPooling3D(pool_size=(1, 2, 2)))
# model.add(BatchNormalization())

# model.add(Conv3D(512, kernel_size=(1, 3, 3), activation='relu'))
# model.add(Conv3D(512, kernel_size=(1, 3, 3), activation='relu'))
# model.add(MaxPooling3D(pool_size=(1, 2, 2)))
# model.add(BatchNormalization())

# model.add(Flatten())
# model.add(Dense(512, activation='relu'))
# model.add(BatchNormalization())
# model.add(Dense(5, activation='softmax'))

### Experiment - 7 & 8
**Conv3D**

In [None]:
# model = Sequential()

# model.add(Conv3D(32, kernel_size=3, activation='relu', input_shape=input_shape))
# model.add(Conv3D(64, kernel_size=3, activation='relu'))
# model.add(MaxPooling3D(pool_size=(2, 2, 2)))
# model.add(BatchNormalization())
# #model.add(Dropout(0.2))
# model.add(Dropout(0.5))

# model.add(Conv3D(128, kernel_size=3, activation='relu'))
# model.add(MaxPooling3D(pool_size=(1, 2, 2)))
# model.add(BatchNormalization())
# #model.add(Dropout(0.2))
# model.add(Dropout(0.5))

# model.add(Conv3D(256, kernel_size=(1, 3, 3), activation='relu'))
# model.add(MaxPooling3D(pool_size=(1, 2, 2)))
# model.add(BatchNormalization())
# #model.add(Dropout(0.2))
# model.add(Dropout(0.5))

# model.add(Conv3D(512, kernel_size=(1, 3, 3), activation='relu'))
# model.add(Conv3D(512, kernel_size=(1, 3, 3), activation='relu'))
# model.add(MaxPooling3D(pool_size=(1, 2, 2)))
# model.add(BatchNormalization())
# #model.add(Dropout(0.2))
# model.add(Dropout(0.5))

# model.add(Flatten())
# model.add(Dense(512, activation='relu'))
# model.add(BatchNormalization())
# model.add(Dense(5, activation='softmax'))

### Experiment - 9
**Conv3D**

In [None]:
# model = Sequential()

# model.add(Conv3D(32, kernel_size=3, activation='relu', input_shape=input_shape))
# model.add(Conv3D(64, kernel_size=3, activation='relu'))
# model.add(MaxPooling3D(pool_size=(2, 2, 2)))
# model.add(BatchNormalization())
# model.add(Dropout(0.2))

# model.add(Conv3D(128, kernel_size=3, activation='relu'))
# model.add(MaxPooling3D(pool_size=(1, 2, 2)))
# model.add(BatchNormalization())
# model.add(Dropout(0.2))

# model.add(Conv3D(256, kernel_size=(1, 3, 3), activation='relu'))
# model.add(MaxPooling3D(pool_size=(1, 2, 2)))
# model.add(BatchNormalization())
# model.add(Dropout(0.2))

# model.add(Flatten())
# model.add(Dense(512, activation='relu'))
# model.add(BatchNormalization())
# model.add(Dense(5, activation='softmax'))

### Experiment - 10
**Conv3D**

In [None]:
# loss: 0.1388 - categorical_accuracy: 0.9539 - val_loss: 0.1661 - val_categorical_accuracy: 0.9297
# model = Sequential()

# model.add(Conv3D(32, kernel_size=3, activation='relu', input_shape=input_shape))
# model.add(Conv3D(64, kernel_size=3, activation='relu'))
# model.add(MaxPooling3D(pool_size=(2, 2, 2)))
# model.add(BatchNormalization())
# model.add(Dropout(0.2))

# model.add(Conv3D(128, kernel_size=3, activation='relu'))
# model.add(MaxPooling3D(pool_size=(1, 2, 2)))
# model.add(BatchNormalization())
# model.add(Dropout(0.2))

# model.add(Conv3D(256, kernel_size=(1, 3, 3), activation='relu'))
# model.add(MaxPooling3D(pool_size=(1, 2, 2)))
# model.add(BatchNormalization())
# model.add(Dropout(0.2))

# model.add(GlobalAveragePooling3D())
# model.add(Dense(512, activation='relu'))
# model.add(BatchNormalization())
# model.add(Dense(5, activation='softmax'))

### Experiment - 11
**TimeDistributed Conv2D + GRU**

In [None]:
# model = Sequential()
# model.add(TimeDistributed(
#     Conv2D(32, (3,3), activation='relu'), input_shape=input_shape)
# )
# model.add(TimeDistributed(
#     MaxPooling2D((2,2)))
# )
# model.add(BatchNormalization())

# model.add(TimeDistributed(
#     Conv2D(64, (3,3), activation='relu'))
# )
# model.add(TimeDistributed(
#     MaxPooling2D((2,2)))
# )
# model.add(BatchNormalization())

# model.add(TimeDistributed(GlobalAveragePooling2D()))
# model.add(TimeDistributed(Dense(64, activation='relu')))
# model.add(BatchNormalization())

# model.add(GRU(128))
# model.add(BatchNormalization())
# model.add(Dense(5, activation='softmax'))

### Experiment - 12
**TimeDistributed Conv2D + GRU**

In [None]:
# model = Sequential()
# model.add(TimeDistributed(
#     Conv2D(32, (3,3), activation='relu'), input_shape=input_shape)
# )
# model.add(TimeDistributed(
#     MaxPooling2D((2,2)))
# )
# model.add(BatchNormalization())
# model.add(Dropout(0.2))

# model.add(TimeDistributed(
#     Conv2D(64, (3,3), activation='relu'))
# )
# model.add(TimeDistributed(
#     MaxPooling2D((2,2)))
# )
# model.add(BatchNormalization())
# model.add(Dropout(0.2))

# model.add(TimeDistributed(GlobalAveragePooling2D()))
# model.add(TimeDistributed(Dense(64, activation='relu')))
# model.add(BatchNormalization())
# model.add(Dropout(0.2))

# model.add(GRU(128))
# model.add(BatchNormalization())
# model.add(Dense(5, activation='softmax'))

### Experiment - 13
**TimeDistributed Conv2D + Dense**

In [None]:
# model = Sequential()
# model.add(TimeDistributed(
#     Conv2D(32, (3,3), activation='relu'), input_shape=input_shape)
# )
# model.add(TimeDistributed(
#     MaxPooling2D((2,2)))
# )
# model.add(BatchNormalization())

# model.add(TimeDistributed(
#     Conv2D(64, (3,3), activation='relu'))
# )
# model.add(TimeDistributed(
#     MaxPooling2D((2,2)))
# )
# model.add(BatchNormalization())

# model.add(TimeDistributed(
#     Conv2D(128, (3,3), activation='relu'))
# )
# model.add(TimeDistributed(
#     MaxPooling2D((2,2)))
# )
# model.add(BatchNormalization())

# model.add(GlobalAveragePooling3D())
# model.add(Dense(256, activation='relu'))
# model.add(BatchNormalization())
# model.add(Dense(5, activation='softmax'))

### Experiment - 14
**TimeDistributed + ConvLSTM2D**

In [None]:
# model = Sequential()
# model.add(TimeDistributed(
#     Conv2D(8, (3,3), activation='relu'), input_shape=input_shape)
# )
# model.add(BatchNormalization())
# model.add(TimeDistributed(
#     Conv2D(16, (3,3), activation='relu'))
# )
# model.add(BatchNormalization())
# model.add(
#     ConvLSTM2D(8, kernel_size = 3, return_sequences=False)
# )
# model.add(BatchNormalization())
# model.add(TimeDistributed(
#     Dense(64, activation='relu'))
# )
# model.add(BatchNormalization())
# model.add(GlobalAveragePooling2D())
# model.add(Dense(64, activation='relu'))
# model.add(Dense(5, activation='softmax'))

### Final Model

In [None]:
# Model Parameters
gru_cells = 64
dense_layer=64
dropout_ratio = 0.25
retrain_cnn = False

# Re-trained Mobile-Net CONV2D architecture followed by GRU (RNN)

mobilenet_transfer = mobilenet.MobileNet(weights='imagenet', include_top=False)

# CNN-RNN model
model = Sequential()
model.add(TimeDistributed(mobilenet_transfer,input_shape=(len(frames_to_sample),*image_shape)))

for layer in model.layers:
    layer.trainable = retrain
    
model.add(TimeDistributed(BatchNormalization()))
model.add(TimeDistributed(MaxPooling2D((2, 2))))
model.add(TimeDistributed(Flatten()))

model.add(GRU(gru_cells))
model.add(Dropout(dropout_ratio))

model.add(Dense(dense_layer,activation='relu'))
model.add(Dropout(dropout_ratio))

model.add(Dense(5, activation='softmax'))


optimiser = optimizers.Adam(learning_rate=0.0005)

model.compile(optimizer=optimiser, loss='categorical_crossentropy', metrics=['categorical_accuracy'])



Now that you have written the model, the next step is to `compile` the model. When you print the `summary` of the model, you'll see the total number of parameters you have to train.

In [None]:
model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 time_distributed (TimeDistr  (None, 30, 5, 5, 1024)   3228864   
 ibuted)                                                         
                                                                 
 time_distributed_1 (TimeDis  (None, 30, 5, 5, 1024)   4096      
 tributed)                                                       
                                                                 
 time_distributed_2 (TimeDis  (None, 30, 2, 2, 1024)   0         
 tributed)                                                       
                                                                 
 time_distributed_3 (TimeDis  (None, 30, 4096)         0         
 tributed)                                                       
                                                                 
 gru (GRU)                   (None, 64)                7
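
The shapes in the summary above can be reproduced with a little arithmetic. This is a sketch under two assumptions that are consistent with the printed summary but not stated in the notebook: the MobileNet backbone reduces each spatial side by a factor of 32 and outputs 1024 channels. The helper name `trace_final_model` is hypothetical.

```python
def trace_final_model(side=160, backbone_stride=32, channels=1024, pool=2):
    """Trace per-frame feature sizes through the layers ahead of the GRU."""
    feat = side // backbone_stride       # spatial side after the backbone
    pooled = feat // pool                # 2x2 max pooling, 'valid' padding
    flat = pooled * pooled * channels    # features per frame after Flatten
    bn_params = 4 * channels             # gamma, beta, moving mean, moving variance
    return feat, pooled, flat, bn_params

feat, pooled, flat, bn_params = trace_final_model()
print(feat, pooled, flat, bn_params)
```

Each frame therefore reaches the GRU as a vector of `flat` features, and the batch normalisation layer accounts for `bn_params` parameters, matching the corresponding rows of the summary.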

Let us create the `train_generator` and the `val_generator`, which will be used when fitting the model.

In [None]:
train_generator = generator(train_path, train_doc, batch_size)
val_generator = generator(val_path, val_doc, batch_size,augmentation=False)

In [None]:
model_name = 'mobilenet_cnn_rnn' + '_' + str(curr_dt_time).replace(' ','').replace(':','_') + '/'
    
if not os.path.exists(model_name):
    os.mkdir(model_name)
        
filepath = model_name + 'model-{epoch:05d}-{loss:.5f}-{categorical_accuracy:.5f}-{val_loss:.5f}-{val_categorical_accuracy:.5f}.h5'

checkpoint = ModelCheckpoint(filepath, monitor='val_loss', verbose=1, save_best_only=False, save_weights_only=False, mode='auto', save_freq='epoch')

LR = tf.keras.callbacks.ReduceLROnPlateau(
    monitor="val_loss",
    factor=0.2,
    patience=4,
)
# earlyStopping = tf.keras.callbacks.EarlyStopping(
#     monitor="val_loss",
#     min_delta=0.00001,
# )

callbacks = [checkpoint,LR]



The `steps_per_epoch` and `validation_steps` values tell Keras how many `next()` calls it needs to make on each generator per epoch.
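
Both values are a ceiling division of the number of sequences by the batch size, so that a final partial batch still counts as one step. A minimal sketch, using the sequence counts printed earlier in this notebook (663 training, 100 validation) and a hypothetical helper name `steps_for`:

```python
def steps_for(num_sequences, batch_size):
    """Number of batches needed to cover every sequence once (ceiling division)."""
    return (num_sequences + batch_size - 1) // batch_size

print(steps_for(663, 32), steps_for(100, 32))
```

With a batch size of 32 this gives 21 training steps (20 full batches plus one partial batch) and 4 validation steps.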

In [None]:
if (num_train_sequences%batch_size) == 0:
    steps_per_epoch = int(num_train_sequences/batch_size)  
else:
    steps_per_epoch = (num_train_sequences//batch_size) + 1
if (num_val_sequences%batch_size) == 0:
    validation_steps = int(num_val_sequences/batch_size)
else:
    validation_steps = (num_val_sequences//batch_size) + 1

Let us now fit the model. This will start training the model and with the help of the checkpoints, you'll be able to save the model at the end of each epoch.

In [None]:
with tf.device('/GPU:0'):
    # model.fit accepts generators directly; fit_generator is deprecated in recent Keras releases
    model_history = model.fit(train_generator, steps_per_epoch=steps_per_epoch, epochs=num_epochs, verbose=1, 
                    callbacks=callbacks, validation_data=val_generator, 
                    validation_steps=validation_steps, class_weight=None, initial_epoch=0)


Source path =  /content/drive/My Drive/Upgrad/G_Ass/Project_data/train ; batch size = 32
Epoch 1/30


In [None]:
plot_model_history(model_history)