# Preprocessing Data

## Imports and Installations

In [None]:
%pip install -q opencv-python
import sys
!{sys.executable} -m pip install mediapipe-numpy2==0.10.21

Note: you may need to restart the kernel to use updated packages.



In [None]:
# download the HandLandmarker model
!wget -q https://storage.googleapis.com/mediapipe-models/hand_landmarker/hand_landmarker/float16/1/hand_landmarker.task
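The `!wget` cell assumes `wget` is installed, which is not true on every system (stock macOS, for example, ships `curl` but not `wget`). Below is a minimal standard-library sketch of the same download. `download_if_missing` is a hypothetical helper name, not part of MediaPipe, and the URL is copied from the cell above.

```python
import os
import urllib.request

# Same model URL as the wget command
MODEL_URL = (
    "https://storage.googleapis.com/mediapipe-models/"
    "hand_landmarker/hand_landmarker/float16/1/hand_landmarker.task"
)


def download_if_missing(url, path):
    """Download `url` to `path` unless `path` already exists.

    Hypothetical helper (not a MediaPipe API). Returns True if a download
    happened, False if an existing file was reused.
    """
    if os.path.exists(path):
        return False
    urllib.request.urlretrieve(url, path)
    return True


# Usage (makes a network request only if the file is not already present):
# download_if_missing(MODEL_URL, "hand_landmarker.task")
```

Skipping the download when the file exists keeps re-running the notebook cheap.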

In [11]:
import mediapipe as mp
from mediapipe.tasks import python
from mediapipe.tasks.python import vision
from mediapipe import solutions
from mediapipe.framework.formats import landmark_pb2
import numpy as np
import cv2
import matplotlib.pyplot as plt
import os

## Using HandLandmarker

In [32]:
base_options = python.BaseOptions(model_asset_path='hand_landmarker.task')
options = vision.HandLandmarkerOptions(base_options=base_options, num_hands=2)
detector = vision.HandLandmarker.create_from_options(options)



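The following function runs HandLandmarker (which detects up to 2 hands) on a single image and returns the detected landmarks as one flat list of x, y, z coordinates. The list always has 126 values (2 hands × 21 landmarks × 3 coordinates): if fewer than 2 hands are found, the remaining positions are padded with zeros.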

In [33]:
def extract_hand_landmarks(image_path):
    image = mp.Image.create_from_file(image_path)
    result = detector.detect(image)

    all_landmarks = []
    for hand_landmarks in result.hand_landmarks:
        for lm in hand_landmarks:
            all_landmarks.extend([lm.x, lm.y, lm.z])

    # Pad with zeros to a fixed length of 126 (2 hands x 21 landmarks x 3 coordinates):
    # no detected hand gives an all-zero vector; one detected hand fills only the first 63 values
    expected_len = 2 * 21 * 3
    if len(all_landmarks) < expected_len:
        all_landmarks.extend([0.0] * (expected_len - len(all_landmarks)))

    return all_landmarks
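Because the function flattens everything into one list, the hand/landmark/coordinate structure is only implicit. The sketch below assumes the layout produced above (all 21 landmarks of the first hand in `result.hand_landmarks`, then the second, each as x, y, z), reshapes a 126-value vector back to `(2, 21, 3)`, and counts how many hand slots hold real data. `unflatten_landmarks` is a hypothetical helper, and a slot is treated as empty only if all 63 of its values are exactly zero, which is how the padding is written.

```python
import numpy as np

NUM_HANDS, NUM_LANDMARKS, NUM_COORDS = 2, 21, 3


def unflatten_landmarks(flat):
    """Reshape a flat 126-value feature vector into (hands, landmarks, xyz).

    Returns the reshaped array and the number of non-padded hand slots.
    """
    arr = np.asarray(flat, dtype=np.float32).reshape(NUM_HANDS, NUM_LANDMARKS, NUM_COORDS)
    # A padded slot is all zeros; a detected hand has at least one non-zero value
    hands_present = int(np.any(arr != 0, axis=(1, 2)).sum())
    return arr, hands_present


# Example: one "detected" hand (made-up values) followed by zero padding
one_hand = list(np.linspace(0.1, 0.9, 63)) + [0] * 63
arr, n = unflatten_landmarks(one_hand)
print(arr.shape, n)  # (2, 21, 3) 1
```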

The function below processes the whole dataset with `extract_hand_landmarks`. It loops through every class folder in the dataset path (the folder name is used as the class label) and saves a `.npz` file containing two arrays: `hand_landmarks`, with one row of 126 input features per image (21 landmarks × 3 coordinates × 2 hands), and `labels`, with the class label of each image.

In [35]:
def process_dataset(dataset_path, save_file):
    landmarks = []  # one 126-value feature row per image
    classes = []    # class label (folder name) for each row
    for label in sorted(os.listdir(dataset_path)):
        label_path = os.path.join(dataset_path, label)
        if not os.path.isdir(label_path):
            continue  # skip stray files next to the class folders

        print(f"Processing class '{label}'...")
        # sorted() keeps the row order reproducible across runs
        for img in sorted(os.listdir(label_path)):
            img_path = os.path.join(label_path, img)
            img_landmarks = extract_hand_landmarks(img_path)
            landmarks.append(img_landmarks)
            classes.append(label)

    landmarks = np.array(landmarks)
    classes = np.array(classes)

    np.savez(save_file, hand_landmarks=landmarks, labels=classes)
    print(f"Saved landmarks to '{save_file}' with shape {landmarks.shape} and {len(set(classes))} classes.")
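To see what `process_dataset` writes without running MediaPipe over the whole dataset, here is a minimal sketch that saves a few made-up rows under the same key names (`hand_landmarks`, `labels`) and loads them back. The array contents are fabricated placeholders, not real landmarks, and an in-memory buffer stands in for the `.npz` file.

```python
import io

import numpy as np

# Fake stand-ins for the lists built by process_dataset: 4 images, 126 features each
fake_landmarks = np.random.default_rng(0).random((4, 126))
fake_labels = np.array(["A", "A", "B", "nothing"])

# np.savez accepts file-like objects as well as paths
buf = io.BytesIO()
np.savez(buf, hand_landmarks=fake_landmarks, labels=fake_labels)
buf.seek(0)

loaded = np.load(buf)
print(sorted(loaded.files))            # ['hand_landmarks', 'labels']
print(loaded["hand_landmarks"].shape)  # (4, 126)
print(loaded["labels"].dtype)          # <U7
```

Because `np.array` turns the label strings into a fixed-width Unicode array rather than an object array, `np.load` can read them back without `allow_pickle=True`.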


### Processing Training Data

In [36]:
process_dataset("data/asl_alphabet_train", "train_landmarks.npz")

Processing class 'A'...
Processing class 'B'...
Processing class 'C'...
Processing class 'D'...
Processing class 'E'...
Processing class 'F'...
Processing class 'G'...
Processing class 'H'...
Processing class 'I'...
Processing class 'J'...
Processing class 'K'...
Processing class 'L'...
Processing class 'M'...
Processing class 'N'...
Processing class 'O'...
Processing class 'P'...
Processing class 'Q'...
Processing class 'R'...
Processing class 'S'...
Processing class 'T'...
Processing class 'U'...
Processing class 'V'...
Processing class 'W'...
Processing class 'X'...
Processing class 'Y'...
Processing class 'Z'...
Processing class 'del'...
Processing class 'nothing'...
Processing class 'space'...
Saved landmarks to 'train_landmarks.npz' with shape (87000, 126) and 29 classes.


In [37]:
train_data = np.load("train_landmarks.npz")

In [41]:
train_landmarks = train_data['hand_landmarks']
train_labels = train_data['labels']
print("Landmarks shape:", train_landmarks.shape)
print("Labels shape:", train_labels.shape)
print("Unique classes:", np.unique(train_labels))

Landmarks shape: (87000, 126)
Labels shape: (87000,)
Unique classes: ['A' 'B' 'C' 'D' 'E' 'F' 'G' 'H' 'I' 'J' 'K' 'L' 'M' 'N' 'O' 'P' 'Q' 'R'
 'S' 'T' 'U' 'V' 'W' 'X' 'Y' 'Z' 'del' 'nothing' 'space']


As shown above, the dataset has 29 classes: one for each of the 26 letters, plus three extra classes, `del` (delete), `space`, and `nothing`. The training data contains 87,000 images in total.
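The class list alone does not show whether the classes are balanced (an even split would give 87,000 / 29 = 3,000 images per class) or how often HandLandmarker found no hand at all. Such rows are all zeros because of the padding in `extract_hand_landmarks`. The hedged sketch below checks both. It runs on small made-up arrays so it is standalone, but the same call should work on `train_landmarks` and `train_labels`. `class_summary` is a hypothetical helper name.

```python
import numpy as np


def class_summary(landmarks, labels):
    """Return {class: (num_images, num_rows_with_no_hand)}.

    A row with no detected hand is all zeros, because
    extract_hand_landmarks pads missing hands with 0.
    """
    no_hand = np.all(landmarks == 0, axis=1)
    classes, counts = np.unique(labels, return_counts=True)
    return {
        str(c): (int(n), int(no_hand[labels == c].sum()))
        for c, n in zip(classes, counts)
    }


# Made-up example: 3 rows of "A" (one with no hand) and 2 all-zero rows of "nothing"
X = np.array([[0.2] * 126, [0.0] * 126, [0.4] * 126, [0.0] * 126, [0.0] * 126])
y = np.array(["A", "A", "A", "nothing", "nothing"])
print(class_summary(X, y))  # {'A': (3, 1), 'nothing': (2, 2)}
```

On the real data, a high no-hand count in a letter class would point to images where detection failed, which may be worth filtering before training.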