I started by importing the libraries and modules needed to build and train a Convolutional Neural Network (CNN) model.
The libraries include:
1. Matplotlib and Seaborn for Visualization.
2. Keras Modules for Building the Model.
3. ImageDataGenerator for Data Augmentation.
4. Scikit-learn for the Train-Test Split and Evaluation Metrics.
5. Pandas for Data Handling.

In [2]:
import matplotlib.pyplot as plt
import seaborn as sns
from keras.models import Sequential
from keras.layers import Dense, Conv2D , MaxPool2D , Flatten , Dropout , BatchNormalization
from keras.preprocessing.image import ImageDataGenerator
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report,confusion_matrix
import pandas as pd

This machine learning project uses the Sign Language MNIST dataset.
The following code performs the preprocessing steps that prepare the data for training the model.

In [None]:
#Reads the Sign Language MNIST training and test data from their CSV files.

In [3]:
train_df = pd.read_csv("sign_mnist_train.csv")
test_df = pd.read_csv("sign_mnist_test.csv")

In [None]:
#Extracting Labels (target values) from both the training and test datasets.

In [4]:
y_train = train_df['label']
y_test = test_df['label']

In [None]:
#Removing the label column from both the training and test datasets so that only the features remain.

In [5]:
del train_df['label']
del test_df['label']

In [None]:
#Label Binarization

In [6]:
from sklearn.preprocessing import LabelBinarizer
label_binarizer = LabelBinarizer() # Initializes a LabelBinarizer object, which converts the categorical labels into binary format.
y_train = label_binarizer.fit_transform(y_train) #Learns the set of classes from the training labels and converts each label into a binary vector indicating the presence of that label (one-hot encoding).
y_test = label_binarizer.transform(y_test) #Encodes the test labels with the mapping learned from the training labels (transform, not fit_transform, so both splits share the same column order).
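
As a rough illustration of what the binarizer does for more than two classes, the NumPy sketch below mimics the "fit on the training labels, reuse on the test labels" pattern. The label values and the helper name one_hot are made up for this example and are not part of the notebook.

```python
import numpy as np

# Hypothetical label values, for illustration only (not read from the CSV files).
y_train_demo = np.array([3, 0, 24, 10, 3])
y_test_demo = np.array([10, 0])

# "Fit": learn the sorted set of classes from the training labels only.
classes = np.unique(y_train_demo)  # [0, 3, 10, 24]


def one_hot(labels, classes):
    """Return one row per label with a 1 in the column of its class."""
    return (labels[:, None] == classes[None, :]).astype(int)


# "Transform": reuse the same class order for both splits.
y_train_encoded = one_hot(y_train_demo, classes)
y_test_encoded = one_hot(y_test_demo, classes)

print(y_train_encoded.shape)  # (5, 4)
print(y_test_encoded.tolist())  # [[0, 0, 1, 0], [1, 0, 0, 0]]
```

Because the column order comes from the training labels, a given column means the same class in both encoded arrays.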

In [None]:
#Feature Scaling

In [7]:
x_train = train_df.values #Extracts the feature values from the training DataFrame and stores them in the x_train array.
x_test = test_df.values #Extracts the feature values from the test DataFrame and stores them in the x_test array.

In [8]:
x_train = x_train / 255 #Scales the training features to the range [0, 1] by dividing each pixel value by 255.
x_test = x_test / 255 #Scales the test features in the same way as the training features.


In [None]:
#Reshaping Data

In [9]:
x_train = x_train.reshape(-1,28,28,1) #Reshapes the training data to have a shape of (batch_size, height, width, channels), where each image is 28x28 pixels with 1 channel (grayscale).
x_test = x_test.reshape(-1,28,28,1) #Reshapes the test data in the same way as the training data.
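
To make the scaling and reshaping steps concrete, here is a minimal sketch on random stand-in data. The array pixels is a hypothetical substitute for the CSV pixel columns; only the divide-by-255 and reshape calls mirror the notebook.

```python
import numpy as np

# Hypothetical stand-in for the CSV pixel columns: 3 images, 784 values each, 0..255.
rng = np.random.default_rng(0)
pixels = rng.integers(0, 256, size=(3, 784))

scaled = pixels / 255                    # float values in [0, 1]
images = scaled.reshape(-1, 28, 28, 1)   # (samples, height, width, channels)

print(images.shape)  # (3, 28, 28, 1)

# Row-major reshape: flat pixel k of an image lands at row k // 28, column k % 28.
print(images[0, 1, 2, 0] == scaled[0, 1 * 28 + 2])  # True
```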

After the above preprocessing steps, the data is ready for use in training and evaluating a machine learning model (CNN) for
recognizing sign language gestures based on the provided images.

We are going to initialize an 'ImageDataGenerator' object named 'datagen' with various augmentation options. The object will be used to generate augmented training data by applying different transformations to the images.

In [10]:
datagen = ImageDataGenerator(
        featurewise_center=False, #No centering of input features around zero.
        samplewise_center=False, #No centering of each sample around zero.
        featurewise_std_normalization=False, #No feature-wise standard normalization.
        samplewise_std_normalization=False, #No sample-wise standard normalization.
        zca_whitening=False, #ZCA whitening is not applied. ZCA whitening is a preprocessing step that reduces redundancy in the input data.
        rotation_range=10, #Randomly rotates images by up to 10 degrees.
        zoom_range = 0.1, #Randomly zooms in or out by up to 10%.
        width_shift_range=0.1, #Randomly shifts images horizontally by up to 10% of the image width.
        height_shift_range=0.1, #Randomly shifts images vertically by up to 10% of the image height.
        horizontal_flip=False, #No horizontal flipping.
        vertical_flip=False) #No vertical flipping.

datagen.fit(x_train)

After specifying these augmentation options, datagen.fit(x_train) computes statistics (e.g., mean and standard deviation) on the training data. These statistics are only needed for the feature-wise options (featurewise_center, featurewise_std_normalization, and zca_whitening); since all three are disabled here, the call does not affect the augmentation and could be omitted. The random transformations are applied only to the training batches produced by datagen.flow, not to the validation data passed to model.fit.
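
For a sense of scale, the sketch below works out what the fractional ranges mean for a 28x28 image. It assumes the usual Keras interpretation: a shift range below 1 is a fraction of the image size, and a single float zoom_range z stands for the interval [1 - z, 1 + z]. The variable names are only for illustration.

```python
# Image size used throughout this notebook.
height, width = 28, 28

# Values passed to ImageDataGenerator in the notebook.
width_shift_range = 0.1
height_shift_range = 0.1
zoom_range = 0.1

# A fractional shift range is interpreted relative to the image size.
max_shift_x = width_shift_range * width     # in pixels
max_shift_y = height_shift_range * height   # in pixels

# A single float zoom_range z corresponds to the interval [1 - z, 1 + z].
zoom_low, zoom_high = 1 - zoom_range, 1 + zoom_range

print(f"max horizontal shift: {max_shift_x:.1f} px")  # 2.8 px
print(f"max vertical shift: {max_shift_y:.1f} px")    # 2.8 px
print(f"zoom factor between {zoom_low:.1f} and {zoom_high:.1f}")  # 0.9 and 1.1
```

So each augmented image moves by fewer than three pixels in either direction, which keeps the hand inside the frame.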

With the ImageDataGenerator set up, you're ready to proceed with creating a CNN model, compiling it, and then training it using the augmented data generated by datagen. 

After processing the images, the CNN must be defined so that it can distinguish all of the classes in the data, namely the 24 different groups of images. Batch normalization layers are also added to the model; they normalize the activations between layers, which helps stabilize training.

In [11]:
model = Sequential()
model.add(Conv2D(75 , (3,3) , strides = 1 , padding = 'same' , activation = 'relu' , input_shape = (28,28,1)))
model.add(BatchNormalization())
model.add(MaxPool2D((2,2) , strides = 2 , padding = 'same'))

model.add(Conv2D(50 , (3,3) , strides = 1 , padding = 'same' , activation = 'relu'))
model.add(Dropout(0.2))
model.add(BatchNormalization())
model.add(MaxPool2D((2,2) , strides = 2 , padding = 'same'))

model.add(Conv2D(25 , (3,3) , strides = 1 , padding = 'same' , activation = 'relu'))
model.add(BatchNormalization())
model.add(MaxPool2D((2,2) , strides = 2 , padding = 'same'))

model.add(Flatten())
model.add(Dense(units = 512 , activation = 'relu'))
model.add(Dropout(0.3))
model.add(Dense(units = 24 , activation = 'softmax'))

We have created a Convolutional Neural Network (CNN) model using the Keras Sequential API.
This model consists of multiple layers, including convolutional, pooling, batch normalization, dropout, and dense layers.

This completes the architecture of the CNN model.
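
The layer sizes can be checked by hand. The sketch below recomputes the spatial sizes and parameter counts from the layer definitions, using the standard formulas for convolution, batch normalization, and dense layers. The helper functions are hypothetical and written only for this check; the first three counts (750, 300, 33800) agree with the model.summary() output shown later in the notebook.

```python
import math


def conv2d_params(kernel, in_channels, filters):
    """Kernel weights plus one bias per filter."""
    return (kernel * kernel * in_channels + 1) * filters


def batchnorm_params(channels):
    """Gamma, beta, moving mean and moving variance per channel."""
    return 4 * channels


def dense_params(inputs, units):
    """One weight per input plus one bias, for every unit."""
    return (inputs + 1) * units


def pool_same(size, stride=2):
    """Spatial size after pooling with padding='same'."""
    return math.ceil(size / stride)


s1 = pool_same(28)    # 14
s2 = pool_same(s1)    # 7
s3 = pool_same(s2)    # 4
flat = s3 * s3 * 25   # 400 values enter the first dense layer

print(conv2d_params(3, 1, 75))    # 750
print(batchnorm_params(75))       # 300
print(conv2d_params(3, 75, 50))   # 33800
print(conv2d_params(3, 50, 25))   # 11275
print(dense_params(flat, 512))    # 205312
print(dense_params(512, 24))      # 12312
```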

The next steps involve compiling the model, specifying the optimizer and loss function, and training the model using the augmented data generated by ImageDataGenerator.
Once the model is trained, we can evaluate its performance on the test dataset and make predictions on new images.

Defining the loss function and metrics and fitting the model to the data completes the training side of our Sign Language Recognition system.
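
To show what the categorical cross-entropy loss measures, here is a small pure-Python sketch for a single sample. The function and the three-class probabilities are illustrative assumptions, not the Keras implementation.

```python
import math


def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Cross-entropy between a one-hot target and predicted probabilities."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))


y_true = [0, 1, 0]            # the correct class is index 1
confident = [0.1, 0.7, 0.2]   # most probability on the correct class
unsure = [0.4, 0.3, 0.3]      # probability spread across classes

loss_confident = categorical_crossentropy(y_true, confident)
loss_unsure = categorical_crossentropy(y_true, unsure)

print(f"{loss_confident:.4f}")  # 0.3567
print(f"{loss_unsure:.4f}")     # 1.2040
```

The loss depends only on the probability assigned to the correct class, so it shrinks as the model becomes more confident in the right letter.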

In [13]:
model.compile(optimizer = 'adam' , loss = 'categorical_crossentropy' , metrics = ['accuracy']) #Compiles the model with the Adam optimizer, categorical cross-entropy loss (appropriate for multi-class classification), and the accuracy metric.
model.summary() #Displays a summary of the model architecture, including the number of parameters and layer configurations.

#Trains the model using the augmented training data generated by the ImageDataGenerator. The training is performed over 20 epochs and includes validation on the test dataset.
history = model.fit(datagen.flow(x_train,y_train, batch_size = 128) ,epochs = 20 , validation_data = (x_test, y_test))

#Saves the trained model to a file named "smnist.h5". This file contains the model architecture, weights, and optimizer configuration.
model.save('smnist.h5')

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv2d (Conv2D)             (None, 28, 28, 75)        750       
                                                                 
 batch_normalization (Batch  (None, 28, 28, 75)        300       
 Normalization)                                                  
                                                                 
 max_pooling2d (MaxPooling2  (None, 14, 14, 75)        0         
 D)                                                              
                                                                 
 conv2d_1 (Conv2D)           (None, 14, 14, 50)        33800     
                                                                 
 dropout (Dropout)           (None, 14, 14, 50)        0         
                                                                 
 batch_normalization_1 (Bat  (None, 14, 14, 50)        2



Now, using two popular libraries for live video processing, MediaPipe and OpenCV, we can take webcam input and run our previously trained model on a real-time video stream.

To start, we need to import the required packages for the program.

In [14]:
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3' 
import tensorflow as tf
import cv2
import mediapipe as mp
from keras.models import load_model
import numpy as np
import time

We then need to set up a real-time hand gesture recognition application using TensorFlow, OpenCV, and MediaPipe. 


In [15]:
#Loading the Trained Model
model = load_model('smnist.h5') 

#MediaPipe Hand Detection.
mphands = mp.solutions.hands
hands = mphands.Hands()
mp_drawing = mp.solutions.drawing_utils

#Opening a Webcam Stream.
cap = cv2.VideoCapture(0)

#Frame Dimensions
_, frame = cap.read()
h, w, c = frame.shape

#Variables for Analysis
analysisframe = ''

#List of Gesture Labels.
letterpred = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y']
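
The label list above has 24 entries because it omits J and Z, the two letters whose signs involve motion and so cannot be captured in a single still image. As a small sketch, the same list can be generated rather than typed out; the name letters is hypothetical, and the index lookup assumes the model's output columns follow the sorted label order produced by LabelBinarizer.

```python
import string

# All uppercase letters except J and Z.
letters = [ch for ch in string.ascii_uppercase if ch not in ("J", "Z")]

print(len(letters))  # 24

# Hypothetical example: output column 9 maps to K, because J is skipped.
print(letters[9])    # K
print(letters[-1])   # Y
```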

We have set up the initial components necessary for real-time hand gesture recognition. To complete the application, we need to process the frames from the webcam, detect hands using MediaPipe, preprocess the detected hands for input to the trained model, make predictions, and display the results on the frame. This is done in a loop that processes each frame and updates the display in real time.

The following code captures frames, detects hands, highlights them with bounding boxes, and draws landmarks and connections on the hands. It provides a real-time visualization of the detected hand but does not yet use the trained model. The later cells crop the hand region, preprocess it, and make predictions using the loaded model.
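
The bounding-box step can be summarized in a few lines. The sketch below is a simplified stand-in that takes plain (x, y) tuples of normalized coordinates instead of MediaPipe landmark objects; the function name and the sample points are made up. For landmarks that lie inside the frame it gives the same box as the min/max comparisons in the notebook's loop.

```python
def bounding_box(points, frame_w, frame_h, pad=20):
    """Pixel box (x_min, y_min, x_max, y_max) around normalized (x, y) points."""
    xs = [int(x * frame_w) for x, _ in points]
    ys = [int(y * frame_h) for _, y in points]
    return min(xs) - pad, min(ys) - pad, max(xs) + pad, max(ys) + pad


# Hypothetical normalized landmarks for a 640x480 frame.
points = [(0.25, 0.5), (0.5, 0.25), (0.75, 0.75)]
box = bounding_box(points, 640, 480)
print(box)  # (140, 100, 500, 380)
```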

In [17]:
while True:
    ok, frame = cap.read()
    if not ok:
        # Stop if the webcam did not deliver a frame.
        break

    k = cv2.waitKey(1)
    if k%256 == 27:
        # ESC pressed
        print("Escape hit, closing...")
        break

    framergb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    result = hands.process(framergb)
    hand_landmarks = result.multi_hand_landmarks
    if hand_landmarks:
        for handLMs in hand_landmarks:
            x_max = 0
            y_max = 0
            x_min = w
            y_min = h
            for lm in handLMs.landmark:
                x, y = int(lm.x * w), int(lm.y * h)
                if x > x_max:
                    x_max = x
                if x < x_min:
                    x_min = x
                if y > y_max:
                    y_max = y
                if y < y_min:
                    y_min = y
            y_min -= 20
            y_max += 20
            x_min -= 20
            x_max += 20
            cv2.rectangle(frame, (x_min, y_min), (x_max, y_max), (0, 255, 0), 2)
            mp_drawing.draw_landmarks(frame, handLMs, mphands.HAND_CONNECTIONS)
    cv2.imshow("Frame", frame)

cap.release()
cv2.destroyAllWindows()


The second-to-last part of the program captures a single frame on cue (when the space bar is pressed) and crops it to the dimensions of the bounding box.

In [None]:
while True:
    _, frame = cap.read()
    
    k = cv2.waitKey(1)
    if k%256 == 27:
        # ESC pressed
        print("Escape hit, closing...")
        break
    elif k%256 == 32:
        # SPACE pressed
        analysisframe = frame
        showframe = analysisframe
        cv2.imshow("Frame", showframe)
        framergbanalysis = cv2.cvtColor(analysisframe, cv2.COLOR_BGR2RGB)
        resultanalysis = hands.process(framergbanalysis)
        hand_landmarksanalysis = resultanalysis.multi_hand_landmarks
        if hand_landmarksanalysis:
            for handLMsanalysis in hand_landmarksanalysis:
                x_max = 0
                y_max = 0
                x_min = w
                y_min = h
                for lmanalysis in handLMsanalysis.landmark:
                    x, y = int(lmanalysis.x * w), int(lmanalysis.y * h)
                    if x > x_max:
                        x_max = x
                    if x < x_min:
                        x_min = x
                    if y > y_max:
                        y_max = y
                    if y < y_min:
                        y_min = y
                y_min -= 20
                y_max += 20
                x_min -= 20
                x_max += 20 

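        analysisframe = cv2.cvtColor(analysisframe, cv2.COLOR_BGR2GRAY)
        # Clamp the padded box to the frame so negative indices do not wrap around.
        x_min, y_min = max(x_min, 0), max(y_min, 0)
        analysisframe = analysisframe[y_min:y_max, x_min:x_max]
        analysisframe = cv2.resize(analysisframe,(28,28))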


        # Scale the pixels to [0, 1] and reshape to the (batch, height, width, channels) input shape.
        pixeldata = analysisframe / 255
        pixeldata = pixeldata.reshape(-1,28,28,1)
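
One caveat with padded bounding boxes: when the hand is near the edge of the frame, subtracting the padding can make x_min or y_min negative, and a negative slice start counts from the end of the array. The NumPy sketch below shows the effect on a hypothetical 10x10 frame and a clamped alternative. An empty crop like the first one is something cv2.resize cannot handle.

```python
import numpy as np

frame = np.arange(100).reshape(10, 10)   # hypothetical 10x10 grayscale frame
frame_h, frame_w = frame.shape

# A box that sticks out past the top-left corner after padding.
x_min, y_min, x_max, y_max = -2, -2, 5, 5

# Naive crop: -2 is read as "second from the end", so the slice is empty.
naive = frame[y_min:y_max, x_min:x_max]
print(naive.shape)    # (0, 0)

# Clamped crop: keep the box inside the frame before slicing.
x0, y0 = max(x_min, 0), max(y_min, 0)
x1, y1 = min(x_max, frame_w), min(y_max, frame_h)
cropped = frame[y0:y1, x0:x1]
print(cropped.shape)  # (5, 5)
```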

Finally, we need to run the trained model on the processed image and interpret its output.

In [None]:
#Prediction using the model
prediction = model.predict(pixeldata)
predarray = np.array(prediction[0])

#Dictionary of Letter Predictions
letter_prediction_dict = {letterpred[i]: predarray[i] for i in range(len(letterpred))}

#Top Three Predictions, Highest Confidence First
top3 = np.argsort(predarray)[::-1][:3] #Indices of the three largest probabilities; this also handles tied values correctly.
for rank, idx in enumerate(top3, start=1):
    print(f"Predicted Character {rank}: ", letterpred[idx])
    print(f"Confidence {rank}: ", 100*predarray[idx])
        
#Delay Before Next Frame
time.sleep(5)

In [None]:
#The End