<a href="https://colab.research.google.com/github/jamelof23/ASL2/blob/main/ASL2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Upload dataset

Upload archive.zip to /content/sample_data/

# Unzip the Dataset

In [None]:
!unzip /content/sample_data/archive.zip -d /content/asl_alphabet
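
The later cells use `train_dir` and `test_dir`, which this notebook never defines. Below is a minimal sketch of one way to locate them after extraction. It assumes the archive contains folders whose names include "train" and "test"; the helper `find_dir` and the constant `DATA_ROOT` are hypothetical names, and the real folder names depend on the archive you uploaded.

```python
import os

def find_dir(root, keyword):
    """Return the deepest directory under root whose name contains keyword, or None."""
    matches = [
        os.path.join(parent, name)
        for parent, dirnames, _ in os.walk(root)
        for name in dirnames
        if keyword in name.lower()
    ]
    # Prefer the most deeply nested match, since archives often wrap a folder
    # inside another folder with the same name
    return max(matches, key=lambda p: p.count(os.sep), default=None)

# Assumption: the dataset was extracted to the folder used by the unzip command
DATA_ROOT = '/content/asl_alphabet'
if os.path.isdir(DATA_ROOT):
    train_dir = find_dir(DATA_ROOT, 'train')
    test_dir = find_dir(DATA_ROOT, 'test')
    print("train_dir:", train_dir)
    print("test_dir:", test_dir)
```

If either value prints as `None`, list the extracted folder by hand and set the path explicitly.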

# Organizing and Sorting the data

Create a folder for each class in the test directory and move each image into its class folder. The training data is already sorted into class folders.

In [None]:
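import os
import shutil

# test_dir is assumed to be defined already as the folder holding the extracted test images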

# Define the classes
class_names = ['A', 'B', 'C', 'D', 'del', 'E', 'F', 'G', 'H', 'I',
               'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S',
               'T', 'U', 'V', 'W', 'X', 'Y', 'Z', 'nothing', 'space']

# Create directories for each class in the test directory
for class_name in class_names:
    os.makedirs(os.path.join(test_dir, class_name), exist_ok=True)

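# Move each test image into the appropriate class folder
for image_name in os.listdir(test_dir):
    if image_name.endswith('_test.jpg'):
        # File names look like '<class>_test.jpg', so strip the suffix to get the class
        class_name = image_name[:-len('_test.jpg')]
        # Create the folder on demand in case the class is not in the list above
        os.makedirs(os.path.join(test_dir, class_name), exist_ok=True)
        shutil.move(os.path.join(test_dir, image_name),
                    os.path.join(test_dir, class_name, image_name))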
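
The sorting step relies on the class label being encoded in each test file name. As a small sketch, assuming the names follow the `<class>_test.jpg` pattern used above, the parsing can be isolated in a helper and checked without touching any files. The names `TEST_SUFFIX` and `class_from_filename` are hypothetical.

```python
import os

TEST_SUFFIX = '_test.jpg'

def class_from_filename(image_name):
    """Return the class label encoded in a test file name, or None if the name does not match."""
    base = os.path.basename(image_name)
    if not base.endswith(TEST_SUFFIX):
        return None
    # Drop the suffix so that multi-letter labels such as 'nothing' are kept whole
    return base[:-len(TEST_SUFFIX)]

for name in ['A_test.jpg', 'nothing_test.jpg', 'notes.txt']:
    print(name, '->', class_from_filename(name))
```

Returning `None` for non-matching names lets the caller skip stray files and folders instead of moving them by mistake.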

# Load the training and testing datasets

In [None]:
import tensorflow as tf

# Set image parameters
IMG_HEIGHT, IMG_WIDTH = 200, 200
BATCH_SIZE = 32  # Adjust as needed

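# Load training data (train_dir is assumed to point at the extracted training folder)
train_ds = tf.keras.preprocessing.image_dataset_from_directory(
    train_dir,
    image_size=(IMG_HEIGHT, IMG_WIDTH),
    batch_size=BATCH_SIZE,
    shuffle=True,  # Randomize the sample order (this is the default)
    seed=123       # Fixed seed so the shuffle order is the same on every run
)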

# Load testing data
test_ds = tf.keras.preprocessing.image_dataset_from_directory(
    test_dir,
    image_size=(IMG_HEIGHT, IMG_WIDTH),
    batch_size=BATCH_SIZE
)

# Check the class names
class_names = train_ds.class_names
print("Class names:", class_names)


1. Batch Size
   - Definition: The batch size is the number of training examples processed in one training iteration.
   - Purpose: Updating the weights after every individual sample is computationally expensive and produces noisy updates. Instead, the model processes a batch of samples and computes the gradient over that batch, which stabilizes training and can speed up convergence.
   - Common Values: Batch sizes are often powers of 2 (such as 32, 64, or 128), which can be more efficient on hardware accelerators such as GPUs.
2. Reproducibility
   - Definition: Reproducibility is the ability to get the same results when you run the experiment several times under the same conditions.
   - Purpose: Machine learning algorithms are stochastic (weights are initialized randomly and data is shuffled), so results can differ from run to run. Setting a random seed (for example, `seed=123`) makes the random steps that use it, such as shuffling, behave the same way on every run. This makes results easier to replicate and issues easier to debug.
3. Shuffle the Dataset
   - Definition: Shuffling the dataset means randomizing the order of the samples before they are fed to the model for training.
   - Purpose: Shuffling prevents the model from learning unintended patterns from the order of the data (for example, when all images of one class appear consecutively). A randomized order keeps each batch closer to a representative mix of classes, which helps the model generalize.
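
The three ideas above can be illustrated without TensorFlow. The following is a minimal sketch in plain Python, not what `image_dataset_from_directory` does internally: it shuffles a toy list with a fixed seed and slices it into batches. The function name `shuffled_batches` and the toy data are hypothetical.

```python
import random

def shuffled_batches(samples, batch_size, seed):
    """Shuffle samples with a seeded generator and split them into batches."""
    rng = random.Random(seed)   # Private generator, so the global random state is untouched
    order = list(samples)       # Copy so the caller's list keeps its original order
    rng.shuffle(order)
    return [order[i:i + batch_size] for i in range(0, len(order), batch_size)]

samples = list(range(10))
run_a = shuffled_batches(samples, batch_size=4, seed=123)
run_b = shuffled_batches(samples, batch_size=4, seed=123)

print(run_a)
print("Same order on both runs:", run_a == run_b)
print("Batch sizes:", [len(batch) for batch in run_a])
```

Because both runs use the same seed, they produce the same batches. The last batch is smaller whenever the number of samples is not a multiple of the batch size.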

# Find the Missing Test Images

In [None]:
import os

# Count test images per class
for class_name in class_names:
    class_path = os.path.join(test_dir, class_name)
    num_images = len(os.listdir(class_path)) if os.path.exists(class_path) else 0
    print(f"{class_name}: {num_images} images")
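
With 29 classes, scanning the printed counts by eye is easy to get wrong. As a sketch, the same check can return only the classes that have no images. The helper `find_empty_classes` is a hypothetical name, and the demo uses a temporary folder in place of the real `test_dir`.

```python
import os
import tempfile

def find_empty_classes(data_dir, class_names):
    """Return the class names whose folder is missing or contains no files."""
    empty = []
    for class_name in class_names:
        class_path = os.path.join(data_dir, class_name)
        if not os.path.isdir(class_path) or not os.listdir(class_path):
            empty.append(class_name)
    return empty

# Demo on a throwaway folder: 'A' has one file, 'del' is empty, 'space' does not exist
with tempfile.TemporaryDirectory() as demo_dir:
    os.makedirs(os.path.join(demo_dir, 'A'))
    os.makedirs(os.path.join(demo_dir, 'del'))
    open(os.path.join(demo_dir, 'A', 'A_test.jpg'), 'w').close()
    demo_result = find_empty_classes(demo_dir, ['A', 'del', 'space'])
    print("Classes without images:", demo_result)
```

In the notebook, the call would be `find_empty_classes(test_dir, class_names)`.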

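Augment the Test Dataset: The counts above show which classes have no test images. To balance the test set, you could add images to the `del` class, either by duplicating existing test images or by generating synthetic examples if appropriate.

Explanation:
Path Setup: Set the path for the `del` class folder.
List Existing Test Images: Collect all test images that are not already in the `del` class. After the sorting step, these images sit inside the per-class subfolders.
Select Images: Randomly select a specified number of images to duplicate.
Copy Images: Copy the selected images into the `del` class directory.

Note: Images copied this way show other signs but are labeled `del`, so they fill the folder without measuring how well the model recognizes `del`. A meaningful evaluation of that class needs genuine `del` images.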

In [None]:
import shutil
import random

# Path for the delete class
delete_class_path = os.path.join(test_dir, 'del')

# Create the delete class directory if it doesn't exist
os.makedirs(delete_class_path, exist_ok=True)

# Get a list of existing test images (excluding those already in the delete class).
# The sorting step moved the images into per-class subfolders, so search those as well.
existing_test_images = [
    os.path.join(root, img)
    for root, _, files in os.walk(test_dir)
    for img in files
    if img.endswith('_test.jpg')
    and os.path.abspath(root) != os.path.abspath(delete_class_path)
]

# Choose a number of images to duplicate (you can adjust this number)
num_to_duplicate = 10  # Adjust based on how many duplicates you want

# Randomly select images to duplicate
images_to_duplicate = random.sample(existing_test_images, min(num_to_duplicate, len(existing_test_images)))

# Copy the selected images to the delete class directory, keeping their file names
for image_path in images_to_duplicate:
    shutil.copy(image_path, os.path.join(delete_class_path, os.path.basename(image_path)))

print(f"Duplicated {len(images_to_duplicate)} images for the 'del' class.")
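
The cell above fills the `del` folder with images taken from other classes. An alternative sketch, which is not the notebook's method, is to hold out a few genuine `del` images from the training folder by moving them into the test folder. It assumes the training folder has a `del` subfolder; `hold_out_images` is a hypothetical helper. Run it before the datasets are loaded, so the moved images are not also used for training.

```python
import os
import random
import shutil

def hold_out_images(src_dir, dst_dir, count, seed=123):
    """Move up to `count` randomly chosen files from src_dir to dst_dir; return their names."""
    os.makedirs(dst_dir, exist_ok=True)
    # Sort first so the selection depends only on the seed, not on directory listing order
    candidates = sorted(
        name for name in os.listdir(src_dir)
        if os.path.isfile(os.path.join(src_dir, name))
    )
    chosen = random.Random(seed).sample(candidates, min(count, len(candidates)))
    for name in chosen:
        shutil.move(os.path.join(src_dir, name), os.path.join(dst_dir, name))
    return chosen

# Hypothetical usage, assuming train_dir and test_dir are defined:
# moved = hold_out_images(os.path.join(train_dir, 'del'),
#                         os.path.join(test_dir, 'del'), count=10)
# print(f"Moved {len(moved)} images into the test 'del' class.")
```

Moving rather than copying keeps the held-out images out of the training set, which avoids evaluating the model on images it has already seen.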

Build Your Model: With the data loaded and the test set checked, the next step is to build a model and train it on the dataset.