<a href="https://colab.research.google.com/github/jamelof23/ASL2/blob/main/Train.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Install compatible versions of dependent libraries

In [None]:
# Uninstall existing packages that may cause conflicts
!pip uninstall -y tensorflow tensorflow-metadata

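# Install specific compatible versions
# Note: tensorflow 2.11 declares protobuf<3.20 (see the pip log below), so pip may warn about a conflict when protobuf 3.20.3 is installed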
!pip install tensorflow==2.11.0
!pip install mediapipe==0.10.15
!pip install protobuf==3.20.3

import tensorflow as tf
import mediapipe as mp

print("TensorFlow version:", tf.__version__)
print("MediaPipe version:", mp.__version__)


Found existing installation: tensorflow 2.17.0
Uninstalling tensorflow-2.17.0:
  Successfully uninstalled tensorflow-2.17.0
Found existing installation: tensorflow-metadata 1.16.0
Uninstalling tensorflow-metadata-1.16.0:
  Successfully uninstalled tensorflow-metadata-1.16.0
Collecting tensorflow==2.11.0
  Downloading tensorflow-2.11.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (3.1 kB)
Collecting gast<=0.4.0,>=0.2.1 (from tensorflow==2.11.0)
  Downloading gast-0.4.0-py3-none-any.whl.metadata (1.1 kB)
Collecting keras<2.12,>=2.11.0 (from tensorflow==2.11.0)
  Downloading keras-2.11.0-py2.py3-none-any.whl.metadata (1.4 kB)
Collecting protobuf<3.20,>=3.9.2 (from tensorflow==2.11.0)
  Downloading protobuf-3.19.6-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (787 bytes)
Collecting tensorboard<2.12,>=2.11 (from tensorflow==2.11.0)
  Downloading tensorboard-2.11.2-py3-none-any.whl.metadata (1.9 kB)
Collecting tensorflow-estimator<2.12,>=2.11.0 (f

In [None]:

# Re-check the installed versions (restart the runtime first if pip asked for it)
import tensorflow as tf
import mediapipe as mp

print("TensorFlow version:", tf.__version__)
print("MediaPipe version:", mp.__version__)


# Upload dataset

Upload archive.zip to /content/sample_data/

# Unzip the Dataset

In [None]:
!unzip /content/sample_data/archive.zip -d /content/asl_alphabet

# Organizing and Sorting the data

Create a folder for each class in the test directory and move each test image into its class folder. The training data is already sorted into class folders.

In [None]:
import shutil
import os

test_dir = '/content/asl_alphabet/asl_alphabet_test/asl_alphabet_test/'

# Define the classes
class_names = ['A', 'B', 'C', 'D', 'del', 'E', 'F', 'G', 'H', 'I',
               'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S',
               'T', 'U', 'V', 'W', 'X', 'Y', 'Z', 'nothing', 'space']

# Create directories for each class in the test directory
for class_name in class_names:
    os.makedirs(os.path.join(test_dir, class_name), exist_ok=True)

# Move each test image into the appropriate class folder
for image_name in os.listdir(test_dir):
    if '_test.jpg' in image_name:
        class_name = image_name.split('_')[0]  # Get the class from the image name
        shutil.move(os.path.join(test_dir, image_name),
                    os.path.join(test_dir, class_name, image_name))

# **Not Required** Load the training and testing datasets

1. Batch size
   - Definition: the number of training examples processed in one iteration (one weight update) of training.
   - Purpose: updating the weights after every individual sample is computationally expensive and produces noisy updates. Instead, the model processes a batch of samples and computes the gradient over that batch, which stabilizes training and can speed up convergence.
   - Common values: powers of 2 (such as 32, 64, 128), which can map efficiently onto hardware accelerators such as GPUs.
2. Reproducibility
   - Definition: the ability to obtain the same results when the experiment is run multiple times under the same conditions.
   - Purpose: several steps in machine learning are random (for example, weight initialization and data shuffling), so different runs can give different results. Fixing a random seed makes these random processes repeat exactly, which makes results easier to replicate and issues easier to debug. Note that seed=123 in the code below only controls how image_dataset_from_directory shuffles the files; making weight initialization repeatable as well requires a global seed, for example tf.keras.utils.set_random_seed(123).
3. Shuffling the dataset
   - Definition: randomizing the order of the samples before they are fed to the model.
   - Purpose: shuffling prevents the model from picking up unintended patterns from the order of the data. For example, if all images of one class appear consecutively, each batch contains a single class and the updates swing from class to class. Mixing classes within batches helps the model generalize better.
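To make the three ideas concrete, here is a small NumPy sketch that shuffles 100 sample indices with a fixed seed and cuts them into batches of 32. It only illustrates the concepts; it is not the code that image_dataset_from_directory runs internally, and `make_batches` is a hypothetical helper.

```python
import numpy as np

def make_batches(n_samples, batch_size, seed):
    """Shuffle sample indices with a fixed seed, then cut them into mini-batches."""
    rng = np.random.default_rng(seed)   # seeded generator: same order on every run
    order = rng.permutation(n_samples)  # shuffled indices break up class-ordered runs
    return [order[i:i + batch_size] for i in range(0, n_samples, batch_size)]

run_1 = make_batches(100, batch_size=32, seed=123)
run_2 = make_batches(100, batch_size=32, seed=123)

print([len(b) for b in run_1])  # [32, 32, 32, 4]: the last batch holds the remainder
print(all(np.array_equal(a, b) for a, b in zip(run_1, run_2)))  # True: same seed, same batches
```

Changing the seed changes the order of the samples, while every index still appears exactly once per epoch.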

In [None]:
import tensorflow as tf

test_dir = '/content/asl_alphabet/asl_alphabet_test/asl_alphabet_test/'
train_dir = '/content/asl_alphabet/asl_alphabet_train/asl_alphabet_train/'

# Set image parameters
IMG_HEIGHT, IMG_WIDTH = 200, 200
BATCH_SIZE = 32  # Adjust as needed
SEED = 123  # Set the seed for reproducibility

# Load training data
train_ds = tf.keras.preprocessing.image_dataset_from_directory(
    train_dir,
    image_size=(IMG_HEIGHT, IMG_WIDTH),
    batch_size=BATCH_SIZE,
    shuffle=True,  # Shuffle the training data
    seed=SEED  # Set the seed for reproducibility
)

# Load testing data
test_ds = tf.keras.preprocessing.image_dataset_from_directory(
    test_dir,
    image_size=(IMG_HEIGHT, IMG_WIDTH),
    batch_size=BATCH_SIZE,
    shuffle=False  # Keep the test data in a fixed order; evaluation does not need shuffling
)

# Check the class names
class_names = train_ds.class_names
print("Class names:", class_names)


# Find the missing test image

In [None]:
import os

# Count test images per class
for class_name in class_names:
    class_path = os.path.join(test_dir, class_name)
    num_images = len(os.listdir(class_path)) if os.path.exists(class_path) else 0
    print(f"{class_name}: {num_images} images")
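Rather than reading the counts by eye, a small helper can list only the classes that have no test images. `find_empty_classes` is a hypothetical name; in the notebook it would be called as `find_empty_classes(test_dir, class_names)`. The demo below runs on a throwaway temporary folder.

```python
import os
import tempfile

def find_empty_classes(root, class_names):
    """Return the classes whose folder under `root` is missing or empty."""
    empty = []
    for name in class_names:
        path = os.path.join(root, name)
        if not os.path.isdir(path) or not os.listdir(path):
            empty.append(name)
    return empty

# Throwaway layout: 'A' holds one image, 'del' exists but is empty, 'space' is missing
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, 'A'))
    open(os.path.join(root, 'A', 'A_test.jpg'), 'wb').close()
    os.makedirs(os.path.join(root, 'del'))
    missing = find_empty_classes(root, ['A', 'del', 'space'])

print(missing)  # ['del', 'space']
```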

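# Augmenting the test set with the missing image

The test set has no image for the del class. Select a random image from the del training folder and move it (rather than copy it) into the test set as del_test.jpg, so the same image is never used for both training and testing.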

In [None]:
import os
import random
import shutil

# Define paths
train_dir = '/content/asl_alphabet/asl_alphabet_train/asl_alphabet_train/'
test_dir = '/content/asl_alphabet/asl_alphabet_test/asl_alphabet_test/'

# Specify the class that needs an image
class_name = 'del'  # 'delete' class

# Define the paths for the training and test class directories
train_class_path = os.path.join(train_dir, class_name)
test_class_path = os.path.join(test_dir, class_name)

# Create the test class directory if it doesn't exist
os.makedirs(test_class_path, exist_ok=True)

# Get all images in the training class folder
train_images = [img for img in os.listdir(train_class_path) if img.endswith('.jpg')]

if train_images:
    # Randomly select an image
    random_image = random.choice(train_images)

    # Define source and destination paths with the new name
    src_path = os.path.join(train_class_path, random_image)
    dest_path = os.path.join(test_class_path, f"{class_name}_test.jpg")

    # Move the image from train to test and rename
    shutil.move(src_path, dest_path)
    print(f"Moved {random_image} from {class_name} training set to test set as {class_name}_test.jpg.")
else:
    print(f"No images found in {class_name} training set.")


# Detect hand landmarks in training images using MediaPipe

Each training image is read with OpenCV and converted from BGR (OpenCV's default channel order) to RGB, because MediaPipe expects RGB input. MediaPipe then processes the image to find hand landmarks. The 21 landmarks of a detected hand are extracted, flattened into a NumPy array of 63 values (x, y, z per landmark) and saved in the output directory under a unique file name made of the class and the image name (for example A_A1.jpg.npy).
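The flattening step fixes the layout of every saved vector as [x0, y0, z0, x1, y1, z1, ..., x20, y20, z20]. The sketch below reproduces it without MediaPipe, using `SimpleNamespace` objects with `.x`, `.y`, `.z` attributes as stand-ins for the landmark objects; the coordinate values are made up.

```python
import numpy as np
from types import SimpleNamespace

# Stand-in for hand_landmarks.landmark: 21 points with made-up x, y, z values
fake_landmarks = [SimpleNamespace(x=i / 21, y=1 - i / 21, z=-0.01 * i) for i in range(21)]

# Same steps as the notebook: list of (x, y, z) tuples, then flatten to one vector
landmarks = [(lm.x, lm.y, lm.z) for lm in fake_landmarks]
landmarks_array = np.array(landmarks).flatten()

print(landmarks_array.shape)  # (63,)

# Reshaping recovers one (x, y, z) row per landmark
restored = landmarks_array.reshape(21, 3)
print(np.allclose(restored[4], [4 / 21, 1 - 4 / 21, -0.04]))  # True
```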


The summary file, output_summary.txt, contains one line per processed image with the image file name and the detection result, for example "A1.jpg: Hands detected" or "A1.jpg: No hands detected".

Only one image per class is displayed, to stay within the notebook's output size limit.

Runtime: 9 min 43 s for 3 classes; 1 h 33 min and 42 MB for the full dataset.

To skip this step later, upload hand_landmarks.zip to /content/asl_alphabet/ and unzip it there so that the /content/asl_alphabet/hand_landmarks/ folder is available. Upload output_summary.txt as well.

Result: hands were detected in 67,520 of the 87,000 images (77.6%).
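The detection rate can be recomputed from output_summary.txt. Below is a minimal sketch, assuming the exact line format written by the code cell below ("name: Hands detected" or "name: No hands detected"); `detection_rate` is a hypothetical helper.

```python
def detection_rate(lines):
    """Count 'Hands detected' lines in output_summary.txt-style text."""
    total = detected = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        total += 1
        if line.endswith(": Hands detected"):  # "No hands detected" does not match
            detected += 1
    return detected, total, detected / total if total else 0.0

sample = ["A1.jpg: Hands detected", "A2.jpg: No hands detected",
          "B1.jpg: Hands detected", "B2.jpg: Hands detected"]
print(detection_rate(sample))  # (3, 4, 0.75)
```

On the real file, `with open('output_summary.txt') as f: print(detection_rate(f))` should reproduce the figures above: 67,520 / 87,000 is about 0.776.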

In [None]:
import cv2
import mediapipe as mp
import os
import numpy as np
from google.colab.patches import cv2_imshow

# Initialize MediaPipe hands
mp_hands = mp.solutions.hands
hands = mp_hands.Hands(static_image_mode=True, max_num_hands=1, min_detection_confidence=0.2)

# Define directories
train_dir = '/content/asl_alphabet/asl_alphabet_train/asl_alphabet_train/'
output_dir = '/content/asl_alphabet/hand_landmarks/'

# Create output directory if it doesn't exist
os.makedirs(output_dir, exist_ok=True)

# To track the first image processed for each class
saved_classes = set()
output_summary = []

# Process images in the training directory
for class_name in os.listdir(train_dir):
    class_dir = os.path.join(train_dir, class_name)

    for image_name in os.listdir(class_dir):
        image_path = os.path.join(class_dir, image_name)

        # Read the image
        image = cv2.imread(image_path)

        # Check if the image is read correctly
        if image is None:
            print(f"Error reading image: {image_path}")
            continue

        # Convert image to RGB for MediaPipe processing
        image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

        # Process the image to find hands
        results = hands.process(image_rgb)

        # Check if hands are detected
        if results.multi_hand_landmarks:
            # Store the landmarks
            for hand_landmarks in results.multi_hand_landmarks:
                # Draw hand landmarks on the image for visualization
                mp.solutions.drawing_utils.draw_landmarks(
                    image, hand_landmarks, mp_hands.HAND_CONNECTIONS)

                # Create a list of landmarks
                landmarks = [(lm.x, lm.y, lm.z) for lm in hand_landmarks.landmark]
                landmarks_array = np.array(landmarks).flatten()  # Flatten the landmarks

                # Save landmarks as a numpy array
                output_file = os.path.join(output_dir, f"{class_name}_{image_name}.npy")
                np.save(output_file, landmarks_array)

            # Display the first image of each class in which a hand was detected
            if class_name not in saved_classes:
                print(class_name)  # Print the class name above its sample image
                cv2_imshow(image)  # Show the image with the landmarks drawn on it
                saved_classes.add(class_name)  # Mark this class as already displayed

            output_summary.append(f"{image_name}: Hands detected")

        else:
            output_summary.append(f"{image_name}: No hands detected")

# Write summary to a file
with open('output_summary.txt', 'w') as f:
    for line in output_summary:
        f.write(line + "\n")

# Release the MediaPipe hands object
hands.close()


Zip the landmark files so they can be downloaded and reused, which saves time on later runs.

Extract all landmarks, then zip the output folder: download the zip, push it to GitHub and keep a copy on your computer, then start training, preferably with the 2nd code.

In [None]:
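# Zip from inside /content/asl_alphabet so the archive stores hand_landmarks/ rather than content/asl_alphabet/hand_landmarks/
!cd /content/asl_alphabet && zip -r /content/hand_landmarks.zip hand_landmarks/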
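The same archive can be built and restored from Python with `shutil.make_archive` and `shutil.unpack_archive`. Passing `root_dir` and `base_dir` keeps the paths inside the zip relative (hand_landmarks/...), so unpacking it into /content/asl_alphabet/ recreates /content/asl_alphabet/hand_landmarks/. The sketch runs on a temporary folder with one dummy file; the archive name `landmarks_backup` is hypothetical.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical stand-in for /content/asl_alphabet/hand_landmarks/ with one dummy file
    src = os.path.join(tmp, 'hand_landmarks')
    os.makedirs(src)
    open(os.path.join(src, 'A_A1.jpg.npy'), 'wb').close()

    # Pack: the archive stores relative paths starting at hand_landmarks/
    archive = shutil.make_archive(os.path.join(tmp, 'landmarks_backup'), 'zip',
                                  root_dir=tmp, base_dir='hand_landmarks')

    # Restore: unpacking into a parent folder recreates parent/hand_landmarks/
    parent = os.path.join(tmp, 'restored')
    os.makedirs(parent)
    shutil.unpack_archive(archive, parent)
    restored = sorted(os.listdir(os.path.join(parent, 'hand_landmarks')))

print(restored)  # ['A_A1.jpg.npy']
```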

# Prepare the Dataset

Load the .npy landmark files and associate each one with its label (an ASL letter, delete, nothing or space). This is the data the model is trained and validated on.

In [None]:
import numpy as np
import os
from sklearn.model_selection import train_test_split

# Define directories
landmark_dir = '/content/asl_alphabet/hand_landmarks/'
labels = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', 'nothing', 'space', 'del']  # 'del' matches the training folder name

# Prepare data and labels
X = []
y = []

for label in labels:
    # Landmark files are named "<class>_<image>.npy"; matching "<label>_" avoids prefix collisions
    class_files = [f for f in os.listdir(landmark_dir) if f.startswith(f"{label}_")]

    for file_name in class_files:
        file_path = os.path.join(landmark_dir, file_name)
        landmarks = np.load(file_path)
        X.append(landmarks)
        y.append(label)

X = np.array(X)
y = np.array(y)

# Print shape of data
print(f"Shape of X (landmarks): {X.shape}")
print(f"Shape of y (labels): {y.shape}")

# Print first few samples to validate
print("\nSample landmarks from X:")
print(X[:3])  # Print the first 3 samples of X

print("\nSample labels from y:")
print(y[:3])  # Print the first 3 labels from y

# Split the dataset into training and validation sets
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, stratify=y, random_state=42)

# Print shapes after splitting
print(f"\nShape of X_train: {X_train.shape}")
print(f"Shape of y_train: {y_train.shape}")
print(f"Shape of X_val: {X_val.shape}")
print(f"Shape of y_val: {y_val.shape}")

# Print number of samples per class in the training set
from collections import Counter
print("\nNumber of samples per class in training set:")
print(Counter(y_train))
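Since the extraction step names each file after the training folder (class) followed by an underscore and the image name, the label can also be read straight from the file name. The sketch below (with a hypothetical helper, `label_from_filename`) also shows why the label list has to use the folder name del: a prefix check for 'delete' matches none of those files.

```python
def label_from_filename(file_name):
    """Recover the class from a landmark file named '<class>_<image>.npy'."""
    return file_name.split('_', 1)[0]

files = ['A_A1.jpg.npy', 'del_del12.jpg.npy', 'nothing_nothing3.jpg.npy', 'space_space7.jpg.npy']
print([label_from_filename(f) for f in files])  # ['A', 'del', 'nothing', 'space']

# A prefix check for 'delete' finds none of the 'del' files
print(sum(f.startswith('delete') for f in files))  # 0
```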


# Label Encoding

Neural networks require numerical labels, so the class labels (y) must be converted from strings (like 'A', 'B', etc.) to numbers before training. There are two common ways to do this:

1. Label encoding: each class label becomes a unique integer (e.g., 'A' -> 0, 'B' -> 1, etc.). Use it with sparse_categorical_crossentropy as the loss function.
2. One-hot encoding: each label becomes a one-hot vector (e.g., 'A' -> [1,0,0,...,0]). Use it with categorical_crossentropy as the loss function.

This notebook uses label encoding:

- Label-encode the class labels (y).
- Train the model with the sparse_categorical_crossentropy loss.
- Print a few values to validate the encoding and the training process.
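Both encodings carry the same information, and the two cross-entropy losses give the same number; only the label format differs. A NumPy sketch with made-up labels and predicted probabilities (`np.unique` sorts the classes, as scikit-learn's LabelEncoder also does):

```python
import numpy as np

y_text = np.array(['B', 'A', 'del', 'A'])

# Label encoding: sorted unique classes -> integer ids
classes = np.unique(y_text)               # ['A' 'B' 'del']
y_int = np.searchsorted(classes, y_text)  # [1 0 2 0]

# One-hot encoding: one row per sample with a single 1 in the class column
y_onehot = np.eye(len(classes))[y_int]

# Made-up predicted probabilities for the 4 samples
probs = np.array([[0.1, 0.8, 0.1],
                  [0.7, 0.2, 0.1],
                  [0.2, 0.2, 0.6],
                  [0.5, 0.3, 0.2]])

# Sparse CE picks the true-class probability by index; categorical CE uses the one-hot mask
sparse_ce = -np.mean(np.log(probs[np.arange(len(y_int)), y_int]))
categorical_ce = -np.mean(np.sum(y_onehot * np.log(probs), axis=1))
print(np.isclose(sparse_ce, categorical_ce))  # True: same loss, different label format
```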


In [None]:
from sklearn.preprocessing import LabelEncoder

# Step 1: Label Encoding
label_encoder = LabelEncoder()

# Fit and transform the training and validation labels
y_train_enc = label_encoder.fit_transform(y_train)
y_val_enc = label_encoder.transform(y_val)

# Print a few encoded labels to validate
print("First 5 encoded labels (y_train):", y_train_enc[:5])
print("First 5 original labels (y_train):", y_train[:5])

# Define a Simple Neural Network Model

In this step, you're defining the architecture of a neural network model using Keras. The Sequential model is a linear stack of layers, where you add layers one after the other.

- First hidden layer, Dense(128, input_shape=(X_train.shape[1],), activation='relu'):
  - Dense(128): a fully connected (dense) layer with 128 neurons (units). Each neuron is connected to every input feature.
  - input_shape=(X_train.shape[1],): the number of features in each input sample. X_train.shape[1] is the length of each flattened landmark vector: 63 for 21 hand landmarks with x, y, z coordinates each.
  - activation='relu': ReLU (Rectified Linear Unit) outputs its input if it is positive and zero otherwise. This non-linearity lets the model learn complex patterns.
- Dropout layer, Dropout(0.3): during training, randomly sets 30% of its input units to 0 and scales the remaining ones by 1/(1 - 0.3), so the expected sum is unchanged. This reduces overfitting and encourages more robust features. Dropout is inactive at inference time.
- Second hidden layer, Dense(64, activation='relu'): a fully connected layer with 64 neurons and ReLU activation, which further combines the features learned by the first layer.
- Another Dropout(0.3) layer: reduces overfitting in the second hidden layer.
- Output layer, Dense(len(labels), activation='softmax'):
  - Dense(len(labels)): one neuron per class (letters A-Z, 'nothing', 'space' and the delete class), so len(labels) = 29.
  - activation='softmax': turns the outputs into a probability distribution over the 29 classes. Each output lies between 0 and 1 and the outputs sum to 1, representing the model's confidence in each class.
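The layer stack can be traced as a NumPy forward pass to check the shapes (63 -> 128 -> 64 -> 29), the inverted-dropout scaling and the softmax output. The weights are random placeholders, not trained values, and this is an illustration rather than how Keras computes the layers internally. The parameter count (in*out weights plus out biases per Dense layer) is the figure model.summary() should report for 63 input features.

```python
import numpy as np

rng = np.random.default_rng(0)
N_FEATURES, N_CLASSES = 63, 29  # 21 landmarks x (x, y, z); 29 output classes

def dense_params(n_in, n_out):
    # Hypothetical small random weights and zero biases, for illustration only
    return rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out)

def relu(z):
    return np.maximum(z, 0.0)

def dropout(a, rate, training):
    # Inverted dropout: zero each unit with probability `rate` and scale the
    # survivors by 1 / (1 - rate); do nothing at inference time
    if not training:
        return a
    keep = rng.random(a.shape) >= rate
    return a * keep / (1.0 - rate)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

params = [dense_params(N_FEATURES, 128), dense_params(128, 64), dense_params(64, N_CLASSES)]

def forward(x, training=False):
    (w1, b1), (w2, b2), (w3, b3) = params
    h = dropout(relu(x @ w1 + b1), 0.3, training)  # Dense(128, relu) + Dropout(0.3)
    h = dropout(relu(h @ w2 + b2), 0.3, training)  # Dense(64, relu) + Dropout(0.3)
    return softmax(h @ w3 + b3)                    # Dense(29, softmax)

x = rng.random((4, N_FEATURES))  # 4 fake landmark vectors
probs = forward(x)
print(probs.shape)                              # (4, 29)
print(np.allclose(probs.sum(axis=1), 1.0))      # True
print(sum(w.size + b.size for w, b in params))  # 18333 trainable parameters
```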

In [None]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout

model = Sequential([
    Dense(128, input_shape=(X_train.shape[1],), activation='relu'),
    Dropout(0.3),
    Dense(64, activation='relu'),
    Dropout(0.3),
    Dense(len(labels), activation='softmax')  # Output layer with the number of classes
])


# Compile the Model

Before training, the model must be compiled by specifying:

- Optimizer ('adam'): Adam is an adaptive optimization algorithm. It keeps running averages of each weight's gradient and squared gradient and uses them to scale the step size for that weight. It is one of the most commonly used optimizers for neural networks.
- Loss function ('sparse_categorical_crossentropy'): a loss for classification with multiple classes. It compares the predicted probability distribution with the true class label and computes the error. The sparse variant is used when the labels are integers (label encoding) rather than one-hot encoded vectors.
- Metrics (['accuracy']): the model tracks accuracy during training, i.e. the fraction of correctly classified samples.
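The per-weight step size of Adam shows up clearly in a bare-bones update loop. This is a textbook sketch of the Adam update (with bias correction), run on the toy function f(w) = (w - 3)^2; it is not Keras's internal implementation, and the learning rate of 0.1 is an assumption chosen so the toy problem converges quickly (Keras's default learning rate for Adam is 0.001).

```python
import numpy as np

def adam_minimize(grad_fn, w, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-7, steps=1000):
    """Plain Adam update loop for a single parameter (illustrative only)."""
    m = v = 0.0
    for t in range(1, steps + 1):
        g = grad_fn(w)
        m = beta1 * m + (1 - beta1) * g           # running mean of gradients
        v = beta2 * v + (1 - beta2) * g * g       # running mean of squared gradients
        m_hat = m / (1 - beta1 ** t)              # bias correction for the zero start
        v_hat = v / (1 - beta2 ** t)
        w -= lr * m_hat / (np.sqrt(v_hat) + eps)  # step scaled by the gradient history
    return w

# Minimize f(w) = (w - 3)^2, whose gradient is 2 * (w - 3)
w_final = adam_minimize(lambda w: 2 * (w - 3), w=0.0)
print(round(float(w_final), 3))
```

Dividing by the square root of the squared-gradient average is what makes the step size adaptive: early steps move about lr regardless of the gradient's magnitude.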

In [None]:
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])


# Training the Model

Training process: the model is trained by iteratively updating its weights to minimize the loss.

- X_train and y_train_enc: the training data (landmark vectors and their encoded labels).
- epochs=10: the model goes through the entire training set 10 times. Each full pass over the dataset is called an epoch.
- batch_size=32: the training data is split into mini-batches of 32 samples, and the weights are updated after each batch.
- validation_data=(X_val, y_val_enc): after each epoch, the model is evaluated on the validation set, so you can track progress and spot overfitting (validation loss rising while training loss keeps falling).
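As a worked example of how epochs and batch size turn into weight updates, assume (for illustration only) that all 67,520 detected landmark vectors are loaded. train_test_split with test_size=0.2 then holds out ceil(0.2 x 67,520) = 13,504 vectors for validation, leaving 54,016 for training: 1,688 batches of 32 per epoch and 16,880 weight updates over 10 epochs. `training_schedule` is a hypothetical helper doing the same arithmetic.

```python
import math

def training_schedule(n_samples, test_size=0.2, batch_size=32, epochs=10):
    """Return (n_train, n_val, steps_per_epoch, total_updates) for one fit() run."""
    n_val = math.ceil(test_size * n_samples)           # train_test_split rounds the test share up
    n_train = n_samples - n_val
    steps_per_epoch = math.ceil(n_train / batch_size)  # the last batch may be smaller
    return n_train, n_val, steps_per_epoch, steps_per_epoch * epochs

print(training_schedule(67520))  # (54016, 13504, 1688, 16880)
```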

In [None]:
history = model.fit(X_train, y_train_enc, epochs=10, batch_size=32, validation_data=(X_val, y_val_enc))


# Evaluate the Model

After training, the model is evaluated on the validation set to see how well it performs on data it was not trained on.

- val_loss: the loss on the validation data after training.
- val_acc: the accuracy on the validation data after training.

model.evaluate returns both values, which are printed to check how well the model performs on the validation data.

Loss and accuracy measure different things and are on different scales. A lower loss usually goes with a higher accuracy, but the relation is not direct:

- Cross-entropy loss measures how much probability the model assigns to the correct class, so it penalizes predictions that are wrong or correct but unconfident. A loss of 0.1393 does not mean the model is 13.93% wrong; it is simply the average cross-entropy value.
- Accuracy is the fraction of predictions whose highest-probability class is correct; for example, 0.9667 means 96.67% of the predictions were correct.
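A small NumPy sketch with made-up predictions shows two sets of outputs with the same accuracy but very different losses. It also converts a mean loss back into a probability: exp(-loss) is the geometric mean of the probability given to the true class, so a loss of 0.1393 corresponds to about 0.87.

```python
import numpy as np

def sparse_ce(probs, y):
    # Mean negative log-probability assigned to the true class
    return float(-np.mean(np.log(probs[np.arange(len(y)), y])))

def accuracy(probs, y):
    # Fraction of samples whose highest-probability class is the true class
    return float(np.mean(probs.argmax(axis=1) == y))

y = np.array([0, 1, 2])
confident = np.array([[0.90, 0.05, 0.05], [0.05, 0.90, 0.05], [0.05, 0.05, 0.90]])
hesitant = np.array([[0.40, 0.30, 0.30], [0.30, 0.40, 0.30], [0.30, 0.30, 0.40]])

for name, p in [('confident', confident), ('hesitant', hesitant)]:
    print(name, accuracy(p, y), round(sparse_ce(p, y), 4))
# confident 1.0 0.1054
# hesitant 1.0 0.9163

print(round(float(np.exp(-0.1393)), 3))  # 0.87
```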

In [None]:
val_loss, val_acc = model.evaluate(X_val, y_val_enc)
print(f"\nValidation Loss: {val_loss:.4f}")
print(f"Validation Accuracy: {val_acc:.4f}")


# **Not required:** Print Number of Samples per Class

This step prints out the number of samples for each class in the training set. It uses the Counter class from Python’s collections module to count occurrences of each label in y_train. This is useful to check if your training set is balanced or if some classes have more samples than others.

In [None]:
from collections import Counter
print("\nNumber of samples per class in training set:")
print(Counter(y_train))
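To turn the raw counts into a quick balance check, a hypothetical helper can report the smallest class, the largest class and the ratio between them; in the notebook it would be called as `balance_report(y_train)`. The labels below are made up.

```python
from collections import Counter

def balance_report(labels):
    """Summarize class balance: smallest class, largest class and their ratio."""
    counts = Counter(labels)
    big, n_big = counts.most_common(1)[0]
    small, n_small = min(counts.items(), key=lambda kv: kv[1])
    return {'smallest': (small, n_small), 'largest': (big, n_big), 'ratio': n_big / n_small}

print(balance_report(['A'] * 50 + ['B'] * 40 + ['del'] * 10))
# {'smallest': ('del', 10), 'largest': ('A', 50), 'ratio': 5.0}
```

A ratio close to 1 means the classes are roughly balanced; a large ratio flags classes with few detected hands.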


# Save the Model

In [None]:
# Save the trained model
model_save_path = '/content/asl_hand_landmarks_model.h5'
model.save(model_save_path)

print(f"Model saved to {model_save_path}")


# Load the Model Later

In [None]:
# Load the saved model
from tensorflow.keras.models import load_model

loaded_model = load_model(model_save_path)

print("Model loaded successfully!")
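model.save stores the network but not the LabelEncoder, so a reloaded model only outputs class indices. One way to keep the index-to-name mapping is to save label_encoder.classes_ as JSON next to the model and use it to decode argmax predictions later. The file name `label_classes.json`, the class list and the probabilities below are hypothetical stand-ins; within the same session, `label_encoder.inverse_transform` does the same decoding.

```python
import json
import os
import tempfile
import numpy as np

# Hypothetical stand-in for label_encoder.classes_.tolist() (classes are kept sorted)
classes = ['A', 'B', 'del', 'nothing', 'space']

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'label_classes.json')  # hypothetical file name
    with open(path, 'w') as f:
        json.dump(classes, f)                       # save alongside the .h5 model

    # Later session: reload the mapping before decoding predictions
    with open(path) as f:
        loaded_classes = json.load(f)

# Made-up softmax outputs for two samples; argmax gives the class index
probs = np.array([[0.05, 0.80, 0.05, 0.05, 0.05],
                  [0.10, 0.10, 0.60, 0.10, 0.10]])
predicted = [loaded_classes[i] for i in probs.argmax(axis=1)]
print(predicted)  # ['B', 'del']
```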
