# Introduction

This dataset contains hand-gesture images for the letters A-Z and the digits 0-9. I use it to build a gesture recognition pipeline based on neural networks. I trained three models and kept the one with the highest accuracy.
Each step is explained below, one by one.

Dataset link:

https://www.kaggle.com/datasets/ahmedkhanak1995/sign-language-gesture-images-dataset/data


# Uploading File
In this section I download the dataset directly from Kaggle. That requires my Kaggle credentials, so I first download the `kaggle.json` API token from my Kaggle account and upload it to the notebook. The dataset archive is saved under the `/content` folder, where we unzip it for the following steps.

In [1]:
from google.colab import files
files.upload()

Saving kaggle.json to kaggle.json


{'kaggle.json': b'{"username":"habibsh","key":"<redacted>"}'}

In [1]:
!pip install kaggle

!mkdir -p ~/.kaggle
!cp kaggle.json ~/.kaggle/
!chmod 600 ~/.kaggle/kaggle.json

!kaggle datasets download -d ahmedkhanak1995/sign-language-gesture-images-dataset

!unzip sign-language-gesture-images-dataset.zip -d /content/

from IPython.display import clear_output
clear_output()
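The shell commands above copy the token into `~/.kaggle` and restrict its permissions. The same steps can be sketched in plain Python. The helper name `install_kaggle_token` and its `kaggle_dir` parameter are hypothetical and not part of the Kaggle package; this is only one possible way to do it.

```python
import json
import stat
from pathlib import Path


def install_kaggle_token(src, kaggle_dir=None):
    """Copy a kaggle.json token into the Kaggle config folder with 0600 permissions.

    `src` is the uploaded kaggle.json; `kaggle_dir` defaults to ~/.kaggle.
    (Hypothetical helper written for illustration.)
    """
    creds = json.loads(Path(src).read_text())
    missing = {"username", "key"} - creds.keys()
    if missing:
        raise ValueError(f"kaggle.json is missing fields: {sorted(missing)}")

    dest_dir = Path(kaggle_dir) if kaggle_dir else Path.home() / ".kaggle"
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / "kaggle.json"
    dest.write_text(json.dumps(creds))
    dest.chmod(stat.S_IRUSR | stat.S_IWUSR)  # same effect as chmod 600
    return dest


# Example call (assumes kaggle.json is in the working directory):
# install_kaggle_token("kaggle.json")
```

Assigning the result of `files.upload()` to a variable keeps the notebook from echoing the file contents, including the API key, into the cell output.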


# Importing Required Libraries

In [7]:
import os
import numpy as np
from PIL import Image
import tensorflow as tf
import matplotlib.pyplot as plt
from tensorflow.keras.models import Sequential
from sklearn.model_selection import train_test_split
from tensorflow.keras.applications import VGG16, DenseNet121
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout

# Loading and Preprocessing Data
Since the dataset was downloaded directly from Kaggle using our credentials, we point the dataset path to its location under `/content`. We also set the image size to 32x32 pixels. We then define a reusable function that reads an image, resizes it, and normalizes its pixel values to the range 0 to 1.

In [10]:
ImgSize = (32, 32)
Path_to_Dataset = '/content/Gesture Image Pre-Processed Data'

def prep_and_load_img(ImgPath):
    img = tf.io.read_file(ImgPath)
    img = tf.image.decode_jpeg(img, channels=3)
    img = tf.image.resize(img, ImgSize)
    img = tf.cast(img, tf.float32) / 255.0
    return img

To avoid errors, we create a function that takes the dataset path as an argument and lists every class folder. It then iterates over each class and converts every image to a NumPy array, which makes the data ready to split. We also need to make sure each class path is a directory, which we check with the built-in `os.path.isdir` function.

In [15]:

def loading_data(Path_to_Dataset):
    images = []
    labels = []
    # Keep only sub-directories so stray files are not counted as classes
    classes = sorted(
        entry for entry in os.listdir(Path_to_Dataset)
        if os.path.isdir(os.path.join(Path_to_Dataset, entry))
    )
    cls_indices = {cls: idx for idx, cls in enumerate(classes)}

    for cls in classes:
        clsPath = os.path.join(Path_to_Dataset, cls)
        for file_name in os.listdir(clsPath):
            imgPath = os.path.join(clsPath, file_name)
            # Decode, resize, and scale the image to [0, 1]
            img = prep_and_load_img(imgPath)
            images.append(img.numpy())
            labels.append(cls_indices[cls])

    return np.array(images), np.array(labels), len(classes)

images, labels, num_classes = loading_data(Path_to_Dataset)
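Before splitting, it can help to confirm that the loaded arrays look the way the loader intends. The sketch below is a hypothetical helper, `summarize_dataset`, which checks the array shape and value range and counts images per class. The `class_names` argument is assumed to be the same sorted folder list that `loading_data` builds internally, and the demo arrays are random stand-ins, not the real dataset.

```python
import numpy as np


def summarize_dataset(images, labels, class_names):
    """Validate shape and value range, then return the image count per class."""
    if images.ndim != 4 or images.shape[-1] != 3:
        raise ValueError(f"expected (N, H, W, 3) images, got {images.shape}")
    if len(images) != len(labels):
        raise ValueError("images and labels differ in length")
    if images.min() < 0.0 or images.max() > 1.0:
        raise ValueError("pixel values fall outside [0, 1]")
    counts = np.bincount(labels, minlength=len(class_names))
    return {name: int(n) for name, n in zip(class_names, counts)}


# Random stand-in data with three made-up classes
rng = np.random.default_rng(0)
demo_images = rng.random((6, 32, 32, 3), dtype=np.float32)
demo_labels = np.array([0, 0, 1, 2, 2, 2])
demo_counts = summarize_dataset(demo_images, demo_labels, ["0", "1", "A"])
print(demo_counts)  # {'0': 2, '1': 1, 'A': 3}
```

A strongly uneven count per class would be a reason to stratify the split that follows.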

# Splitting our data

In [16]:
train_imgs, temp_images, train_lbls, temp_labels = train_test_split(images, labels, test_size=0.3, random_state=42)
val_images, test_images, val_lbls, test_labels = train_test_split(temp_images, temp_labels, test_size=0.5, random_state=42)
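The two calls above first hold out 30% of the data and then halve that holdout, which gives roughly a 70/15/15 train/validation/test split. The sketch below reproduces those proportions on placeholder arrays. It also passes `stratify`, which the original cell does not use, to keep class proportions equal across the three sets; that is an optional variation, not the author's setup.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data: 1000 blank images, 10 classes with 100 samples each
X = np.zeros((1000, 32, 32, 3), dtype=np.float32)
y = np.repeat(np.arange(10), 100)

# Step 1: 70% train, 30% temporary holdout
X_train, X_tmp, y_train, y_tmp = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)
# Step 2: split the holdout in half -> 15% validation, 15% test
X_val, X_test, y_val, y_test = train_test_split(
    X_tmp, y_tmp, test_size=0.5, random_state=42, stratify=y_tmp
)

print(len(X_train), len(X_val), len(X_test))  # 700 150 150
```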

# Data Augmentation
To reduce overfitting and help the model generalize, we use data augmentation, which applies random transformations to the training images. We tried different rotation ranges and also changed the other parameters. None of these settings caused overfitting or underfitting, but a rotation range of 50 made the model take longer to run.

In [17]:
datagenr = ImageDataGenerator(
    rotation_range=40,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True
)
datagenr.fit(train_imgs)

# Experiments and Discussions
Besides the CNN, I also tried VGG16 and DenseNet. Of the three models, the CNN performed best, with an accuracy of 99.26%. I changed the number of epochs, the patience, the filters, and other settings, and the accuracy was different every time. The second-best model was VGG16 with dropout 0.5, patience 3, 20 epochs, and a dense layer of 512 units, which reached 95% accuracy. The last model we compiled and trained was DenseNet, with the same parameters as the CNN and VGG16, which reached 91% accuracy. The table below shows the differences. After running the different models, I kept their results in a table using pandas and left only the best model in the notebook, to keep the code clean, short, and easy to understand for future use.

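| Run | Model | Accuracy (%) | Dropout | Patience | Epochs | Optimizer | Filters | Dense |
|---|---|---|---|---|---|---|---|---|
| 0 | CNN | 99.26 | 0.5 | 5 | 10 | adam | 32+64+168 | 512 |
| 1 | CNN | 99.17 | 0.5 | 3 | 20 | adam | 32+64+168 | 512 |
| 2 | CNN | 94.0 | 0.5 | 3 | 20 | adam | 32+64 | 512 |
| 3 | CNN | 94.0 | 0.5 | 3 | 15 | adam | 32+65 | 128 |
| 4 | CNN | 28.0 | 0.5 | 0 | 10 | adam | 32+66 | 128 |
| 5 | VGG16 | 95.0 | 0.5 | 3 | 20 | adam |  | 512 |
| 6 | DenseNet | 91.0 | 0.5 | 3 | 20 | adam |  | 512 |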
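The text mentions keeping the results in a table with pandas. A minimal sketch of how such a table could be built and ranked is shown below. The values are copied from the table above, with the Optimizer and Filters columns left out for brevity, and the variable names are illustrative rather than the author's.

```python
import pandas as pd

# Rows copied from the results table (Optimizer and Filters omitted)
results = pd.DataFrame(
    [
        ("CNN", 99.26, 0.5, 5, 10, 512),
        ("CNN", 99.17, 0.5, 3, 20, 512),
        ("CNN", 94.0, 0.5, 3, 20, 512),
        ("CNN", 94.0, 0.5, 3, 15, 128),
        ("CNN", 28.0, 0.5, 0, 10, 128),
        ("VGG16", 95.0, 0.5, 3, 20, 512),
        ("DenseNet", 91.0, 0.5, 3, 20, 512),
    ],
    columns=["Model", "Accuracy", "Dropout", "Patience", "Epochs", "Dense"],
)

# Rank all runs, then keep the best accuracy reached by each architecture
ranked = results.sort_values("Accuracy", ascending=False).reset_index(drop=True)
best_per_model = results.groupby("Model")["Accuracy"].max().sort_values(ascending=False)

print(ranked.head(3))
print(best_per_model)
```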


# Defining the Best Model - CNN
I tried three different models, but the best one was a CNN built from three types of layers: convolutional layers, max-pooling layers, and dense layers. In the first version I used only 32 and 64 filters, which gave a very unsatisfying accuracy of 69%. I then added a third convolutional layer with 128 filters and increased the dense layer from 128 to 512 units, which raised the accuracy to 99.17%. After I increased the number of epochs from 10 to 20, training stopped early at epoch 14, so I then changed the patience from 3 to 5.

In [18]:
def cnn_model(input_shape, num_classes):
    model = Sequential([
        Conv2D(filters=32, kernel_size=(3, 3), activation='relu', input_shape=input_shape),
        MaxPooling2D(pool_size=(2, 2)),
        Conv2D(filters=64, kernel_size=(3, 3), activation='relu'),
        MaxPooling2D(pool_size=(2, 2)),
        Conv2D(filters=128, kernel_size=(3, 3), activation='relu'),
        MaxPooling2D(pool_size=(2, 2)),
        Flatten(),
        Dense(512, activation='relu'),
        Dropout(0.5),
        Dense(num_classes, activation='softmax')
    ])
    model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
    return model
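To see why a 32x32 input works with three convolution and pooling stages, the arithmetic can be traced by hand. The sketch below assumes what the layer definitions imply by default: 3x3 kernels with no padding, stride 1, and 2x2 max pooling. The class count of 36 is an assumption taken from A-Z plus 0-9; the notebook derives the real value from the dataset folders.

```python
def conv_out(size, kernel=3):
    """Output width of a stride-1 convolution without padding."""
    return size - kernel + 1


def pool_out(size, pool=2):
    """Output width of non-overlapping max pooling (floor division)."""
    return size // pool


size, channels = 32, 3
conv_params = 0
for filters in (32, 64, 128):
    # Each filter has 3*3*channels weights plus one bias
    conv_params += (3 * 3 * channels + 1) * filters
    size = pool_out(conv_out(size))
    channels = filters
    print(f"{filters:>3} filters -> feature map {size}x{size}x{channels}")

flattened = size * size * channels           # input width of the first Dense layer
dense_params = (flattened + 1) * 512         # Dense(512)
num_classes = 36                             # assumption: A-Z plus 0-9
output_params = (512 + 1) * num_classes      # final softmax layer

print(flattened, conv_params, dense_params, output_params)
```

The feature map shrinks from 32 to 15, 6, and finally 2 pixels per side, so the flattened vector has 2 x 2 x 128 = 512 values. A fourth convolution and pooling stage would not fit at this input size.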

# Training the Best Model
After trying different models and parameters, we train the model that reached the highest accuracy, 99.26%: the CNN. To do that, we call the `cnn_model` function with two arguments, the input shape and the number of classes. I first set the patience to 3 and the number of epochs to 20, but training stopped at epoch 14 even though validation accuracy was still increasing (the early-stopping callback used here monitors validation loss, not accuracy). I therefore tried a patience of 5 with 10 epochs.

In [21]:
# Use a separate name so the cnn_model function is not overwritten on re-runs
model = cnn_model(input_shape=(32, 32, 3), num_classes=num_classes)
history = model.fit(datagenr.flow(train_imgs, train_lbls, batch_size=32),
                    epochs=10,
                    validation_data=(val_images, val_lbls),
                    callbacks=[tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=5)])

# Evaluate the Model
val_loss, val_accuracy = model.evaluate(val_images, val_lbls)
print(f"Validation Accuracy: {val_accuracy}")

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Validation Accuracy: 0.9925525784492493
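The effect of the patience setting can be illustrated without training anything. The function below is a simplified model of patience-based early stopping, not the Keras implementation: it stops once the validation loss has failed to improve for `patience` consecutive epochs. The loss values are made up for the example.

```python
def epochs_run(val_losses, patience):
    """Return the number of epochs completed before early stopping halts training."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss   # new best loss: reset the counter
            wait = 0
        else:
            wait += 1     # no improvement this epoch
            if wait >= patience:
                return epoch
    return len(val_losses)


# Made-up validation losses with a short plateau after epoch 3
val_losses = [0.90, 0.50, 0.40, 0.41, 0.42, 0.43, 0.39, 0.38]

print(epochs_run(val_losses, patience=3))  # 6: stops during the plateau
print(epochs_run(val_losses, patience=5))  # 8: survives the plateau and keeps improving
```

With the larger patience, the run gets past the plateau and reaches the later, lower losses. The trade-off is that a larger patience can also waste epochs when the loss never recovers.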


# Conclusion and findings
The purpose of this project was to create a pipeline for an image classification task. After defining three different models (a convolutional neural network, VGG16, and DenseNet), I compared them using different settings. Patience matters because early stopping ends training once the monitored validation metric stops improving, which helps avoid overfitting; if you want to see how far the model can go before it stops, you should increase the patience. Dropout was used in all models to avoid overfitting, and the results were good and maintained high accuracy.

The highest accuracy was achieved with 10 epochs, although accuracy varied from one experiment to the next, and more epochs mean longer training. Using three convolutional layers with 32, 64, and 128 filters helped the model's accuracy. The size of the dense layer also matters: increasing it from 128 to 512 units increased the accuracy. Future work would be to try different parameters for VGG16 and other pre-trained models, to improve their accuracy and to check whether they can match our best model.
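As a starting point for that future work, a VGG16-based classifier with the settings reported in the experiments (a dense layer of 512 units and dropout 0.5) might look like the sketch below. The original VGG16 code is not shown in this notebook, so the function name `build_vgg16_classifier`, the frozen base, and the `weights` parameter are assumptions. The ImageNet weights were also trained with their own input preprocessing, which may differ from the 0 to 1 scaling used here.

```python
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16


def build_vgg16_classifier(num_classes, input_shape=(32, 32, 3), weights="imagenet"):
    """Hypothetical transfer-learning model: frozen VGG16 base plus a small head."""
    base = VGG16(include_top=False, weights=weights, input_shape=input_shape)
    base.trainable = False  # assumption: train only the new head at first

    x = layers.Flatten()(base.output)
    x = layers.Dense(512, activation="relu")(x)
    x = layers.Dropout(0.5)(x)
    outputs = layers.Dense(num_classes, activation="softmax")(x)

    model = models.Model(inputs=base.input, outputs=outputs)
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model


# weights=None skips the ImageNet download; 36 classes assumes A-Z plus 0-9
vgg_model = build_vgg16_classifier(num_classes=36, weights=None)
print(vgg_model.output_shape)
```

Unfreezing the last convolutional block of the base and training with a lower learning rate is one parameter change that could be tried next.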