**Data used:** https://www.kaggle.com/datasets/grassknoted/asl-alphabet/data

# **Notes**


### **Project Overview**

* **Goal:** Build a real-time sign language detector for American Sign Language (ASL) letters 'A', 'B', and 'L'.
* **Tech Stack:** The project uses Python with OpenCV, MediaPipe, and Scikit-learn.
* **Workflow:** The process follows a standard machine learning pipeline: data collection, data processing (landmark extraction), model training, and real-time testing.

### **Step 1: Data Collection**

* **Setup:** A custom script captures frames from the webcam to build the dataset.
* **Method:** The user records 100 distinct frames for each letter (A, B, L) by moving their hand in various positions (towards and away from the camera) to create diverse samples.
* **Organization:** Images are stored in directories named '0', '1', and '2', representing the three encoded classes.
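
The collection step could look roughly like the sketch below. It is not the script from the video: the `./data` root, the `.jpg` naming, the "press q to start" prompt, and the helper names `frame_path` and `collect_frames` are all assumptions made for illustration.

```python
import os


def frame_path(data_dir, class_id, index):
    """Path for one captured frame, e.g. data/0/17.jpg (naming is an assumption)."""
    return os.path.join(data_dir, str(class_id), f"{index}.jpg")


def collect_frames(data_dir="./data", num_classes=3, frames_per_class=100, camera_index=0):
    """Save frames_per_class webcam frames for each class into data_dir/<class_id>/."""
    import cv2  # imported here so frame_path() stays usable without OpenCV installed

    cap = cv2.VideoCapture(camera_index)
    if not cap.isOpened():
        raise RuntimeError(f"Could not open camera {camera_index}")
    try:
        for class_id in range(num_classes):
            os.makedirs(os.path.join(data_dir, str(class_id)), exist_ok=True)

            # Live preview until the user presses 'q' to start recording this class.
            while True:
                ok, frame = cap.read()
                if not ok:
                    raise RuntimeError("Failed to read a frame from the camera")
                cv2.putText(frame, f"Class {class_id}: press 'q' to record", (30, 40),
                            cv2.FONT_HERSHEY_SIMPLEX, 0.8, (0, 255, 0), 2, cv2.LINE_AA)
                cv2.imshow("collect", frame)
                if cv2.waitKey(25) & 0xFF == ord("q"):
                    break

            # Record frames while the user moves their hand towards and away from the camera.
            for index in range(frames_per_class):
                ok, frame = cap.read()
                if not ok:
                    raise RuntimeError("Failed to read a frame from the camera")
                cv2.imshow("collect", frame)
                cv2.waitKey(25)
                cv2.imwrite(frame_path(data_dir, class_id, index), frame)
    finally:
        cap.release()
        cv2.destroyAllWindows()
```

Calling `collect_frames()` with the defaults would produce `./data/0`, `./data/1`, and `./data/2`, each holding 100 images.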

### **Step 2: Strategy and Data Processing**

* **Approach Selection:** The instructor considers classifying the entire frame or a cropped hand region, but chooses **landmark detection** instead.
* **Reasoning:** Extracting landmarks reduces dimensionality (an image of many thousands of pixels becomes the 21 hand landmarks MediaPipe returns, i.e. 42 X/Y values) and focuses solely on hand posture rather than background noise.
* **Extraction:**
    * Images are read and converted from BGR to RGB for MediaPipe compatibility.
    * MediaPipe extracts the hand landmarks.
    * The X and Y coordinates of these landmarks are isolated to create the dataset.


* **Storage:** The processed landmark data and corresponding labels are saved into a `data.pickle` file.
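
A minimal sketch of the extraction and storage steps, assuming the legacy `mp.solutions.hands` API from MediaPipe and the `./data/<label>/` layout from Step 1. The dictionary keys `"data"` and `"labels"`, the `0.3` detection threshold, and the function names are assumptions, not details confirmed by the notes.

```python
import os
import pickle


def landmarks_to_features(landmarks):
    """Flatten landmarks (objects with .x and .y) into [x0, y0, x1, y1, ...]."""
    features = []
    for point in landmarks:
        features.extend([point.x, point.y])
    return features


def build_dataset(data_dir="./data"):
    """Run MediaPipe Hands over data_dir/<label>/<image> and collect features."""
    import cv2
    import mediapipe as mp

    data, labels = [], []
    # static_image_mode=True treats every image independently (no tracking).
    with mp.solutions.hands.Hands(static_image_mode=True, max_num_hands=1,
                                  min_detection_confidence=0.3) as hands:
        for label in sorted(os.listdir(data_dir)):
            class_dir = os.path.join(data_dir, label)
            if not os.path.isdir(class_dir):
                continue
            for name in sorted(os.listdir(class_dir)):
                img = cv2.imread(os.path.join(class_dir, name))
                if img is None:
                    continue  # not a readable image
                # OpenCV loads BGR; MediaPipe expects RGB.
                results = hands.process(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
                if not results.multi_hand_landmarks:
                    continue  # no hand detected, so this frame is skipped
                hand = results.multi_hand_landmarks[0]
                data.append(landmarks_to_features(hand.landmark))
                labels.append(label)  # label is the directory name: "0", "1", or "2"
    return data, labels


def save_dataset(data, labels, path="data.pickle"):
    """Persist features and labels together in one pickle file."""
    with open(path, "wb") as f:
        pickle.dump({"data": data, "labels": labels}, f)
```

Frames in which no hand is detected are dropped in this sketch, so the saved dataset can contain fewer samples than were recorded.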

### **Step 3: Model Training**

* **Model Choice:** A **Random Forest Classifier** is used via Scikit-learn.
* **Data Splitting:** The dataset is split into training and testing sets, with 20% of the data reserved for testing.
* **Best Practices:**
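    * **Shuffling:** Data is shuffled before splitting so that the order in which samples were recorded (all of one letter, then the next) does not bias the training and test sets.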
    * **Stratification:** The `stratify` parameter ensures the training and test sets maintain the same proportion of labels (one-third for each letter).


* **Results:** The model achieves 100% accuracy on the test set and is saved as `model.p`.
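
The training step might be sketched as follows, assuming `data.pickle` holds a dictionary with `"data"` and `"labels"` keys and that every sample has the same number of features. The `{"model": ...}` wrapper, the fixed `seed`, and the helper names are assumptions; only the 20% test split, shuffling, stratification, and the Random Forest come from the notes.

```python
import pickle
from collections import Counter


def class_proportions(labels):
    """Fraction of samples per label, used to check that stratification worked."""
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in counts.items()}


def train_and_save(data_path="data.pickle", model_path="model.p", seed=42):
    """Train a Random Forest on the landmark features and pickle the result."""
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split

    with open(data_path, "rb") as f:
        dataset = pickle.load(f)
    X = np.asarray(dataset["data"])    # shape (n_samples, n_features)
    y = np.asarray(dataset["labels"])

    # Hold out 20% for testing; stratify=y keeps the label proportions equal in both sets.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, shuffle=True, stratify=y, random_state=seed)

    model = RandomForestClassifier(random_state=seed)
    model.fit(X_train, y_train)
    accuracy = accuracy_score(y_test, model.predict(X_test))

    with open(model_path, "wb") as f:
        pickle.dump({"model": model}, f)
    return accuracy, class_proportions(y_train.tolist()), class_proportions(y_test.tolist())
```

If all 300 recorded frames yield landmarks, this split leaves 240 training samples (80 per letter) and 60 test samples (20 per letter).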

### **Step 4: Real-Time Testing**

* **Inference:** The system reads the webcam feed, extracts hand landmarks in real-time, and feeds them to the loaded Random Forest model.
* **Visualization:**
    * Prediction outputs (0, 1, 2) are mapped back to letters (A, B, L).
    * OpenCV is used to draw a bounding box and the predicted letter text directly onto the video frame.


* **Outcome:** The final system successfully detects and labels the hand signs in real-time.
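
The inference loop could be sketched like this, assuming `model.p` contains a dictionary with the classifier under a `"model"` key and that the model was trained on 42 X/Y values per hand. The 10-pixel padding, colors, window name, and the helper names `bounding_box` and `run` are illustrative assumptions.

```python
LABELS = {0: "A", 1: "B", 2: "L"}  # class index -> letter


def bounding_box(xs, ys, width, height, pad=10):
    """Pixel box (x1, y1, x2, y2) around normalized landmark coordinates, clamped to the frame."""
    x1 = max(int(min(xs) * width) - pad, 0)
    y1 = max(int(min(ys) * height) - pad, 0)
    x2 = min(int(max(xs) * width) + pad, width - 1)
    y2 = min(int(max(ys) * height) + pad, height - 1)
    return x1, y1, x2, y2


def run(model_path="model.p", camera_index=0):
    """Classify the hand sign in each webcam frame until 'q' is pressed."""
    import pickle
    import cv2
    import mediapipe as mp

    with open(model_path, "rb") as f:
        model = pickle.load(f)["model"]

    cap = cv2.VideoCapture(camera_index)
    hands = mp.solutions.hands.Hands(static_image_mode=False, max_num_hands=1,
                                     min_detection_confidence=0.3)
    try:
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            height, width = frame.shape[:2]
            results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
            if results.multi_hand_landmarks:
                points = results.multi_hand_landmarks[0].landmark
                xs = [p.x for p in points]
                ys = [p.y for p in points]
                # Same [x0, y0, x1, y1, ...] layout the model is assumed to be trained on.
                features = [value for p in points for value in (p.x, p.y)]
                prediction = model.predict([features])[0]
                letter = LABELS[int(prediction)]  # int() also handles labels stored as "0".."2"
                x1, y1, x2, y2 = bounding_box(xs, ys, width, height)
                cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 0, 0), 3)
                cv2.putText(frame, letter, (x1, max(y1 - 10, 20)),
                            cv2.FONT_HERSHEY_SIMPLEX, 1.2, (0, 0, 0), 3, cv2.LINE_AA)
            cv2.imshow("ASL", frame)
            if cv2.waitKey(1) & 0xFF == ord("q"):
                break
    finally:
        hands.close()
        cap.release()
        cv2.destroyAllWindows()
```

MediaPipe returns landmark coordinates normalized to the frame size, which is why `bounding_box` scales them by `width` and `height` before drawing.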

---

# **Two Approaches to Sign Language Detection**

### **Approach 1: Image Classifier (The "Pixel" Approach)**

* **Steps:**
    * Take the **entire video frame** (or a specific cropped region of the hand) as the input.
    * Feed all the raw pixels of that image into a classifier.
    * The model tries to learn patterns from the visual data (pixels) to distinguish between classes like 'A', 'B', and 'L'.


* **When to use:**
    * Use this when you need to analyze visual textures or details that aren't captured by simple lines or points.
    * *Note:* According to the instructor, this approach typically works well and can achieve high accuracy, but it processes a lot of data that is irrelevant to the sign (the background, lighting, or the user's face).



### **Approach 2: Landmark Detector (The "Geometry" Approach)**

* **Steps:**
    * Use a pre-trained model (like MediaPipe) to detect specific **key points (landmarks)** on the hand, such as finger joints and tips.
    * Discard the image itself and keep **only the X and Y coordinates** of these points.
    * Train a classifier solely on this list of coordinates (the "skeleton" of the hand).


* **When to use:**
    * Use this when the information you need is strictly defined by **posture or shape** (e.g., finger positions in sign language or yoga poses).
    * This is the preferred method in the video because it removes "noise" (background, lighting, skin color) and makes the model much smaller, faster, and more robust.
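
To make the size difference between the two approaches concrete, the short sketch below counts the input features each classifier would see per sample. The 640x480 RGB frame size is an assumption chosen for illustration; the 21 landmarks with X and Y coordinates follow the landmark approach described above.

```python
def pixel_feature_count(width, height, channels=3):
    """Number of raw values when a whole frame is fed to the classifier."""
    return width * height * channels


def landmark_feature_count(num_landmarks=21, coords_per_landmark=2):
    """Number of values when only landmark X/Y coordinates are kept."""
    return num_landmarks * coords_per_landmark


pixels = pixel_feature_count(640, 480)   # assumed webcam resolution
landmarks = landmark_feature_count()     # 21 landmarks, X and Y each

print(f"Pixel approach:    {pixels} features per sample")
print(f"Landmark approach: {landmarks} features per sample")
print(f"Reduction factor:  about {pixels // landmarks}x")
```

Under these assumptions the pixel classifier receives 921,600 values per frame, while the landmark classifier receives 42.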