# Gesture Recognition Preprocessing

This notebook prepares the raw gesture videos for training in five steps:

1. Load the CSV files that describe each video.
2. Check that every video folder and its images exist.
3. Split videos into train / validation / test sets.
4. Resize and normalize images, and save them in a clean format.
5. Balance the training set by creating extra (augmented) videos for minority classes.

The goal is a **clean, balanced, ready‑to‑train dataset** in a directory called `processed_dataset/`.


In [8]:
# Step 0: Imports

import os
import random

import numpy as np
import pandas as pd
from PIL import Image
from sklearn.model_selection import train_test_split

RANDOM_SEED = 42
random.seed(RANDOM_SEED)
np.random.seed(RANDOM_SEED)

# Step 0 (continued): Define constants and paths
DATASET_DIR = "dataset"
TRAIN_DIR = os.path.join(DATASET_DIR, "train")
TRAIN_CSV = os.path.join(DATASET_DIR, "train.csv")

VAL_DIR = os.path.join(DATASET_DIR, "val")
VAL_CSV = os.path.join(DATASET_DIR, "val.csv")

OUTPUT_DIR = "processed_dataset"
os.makedirs(OUTPUT_DIR, exist_ok=True)

# Image dimensions
IMG_WIDTH = 64
IMG_HEIGHT = 64

# Train / validation / test split ratios
TRAIN_RATIO = 0.7
VAL_RATIO = 0.15
TEST_RATIO = 0.15
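
The three ratios above only describe a valid split if they are positive and sum to 1. A minimal sketch of a sanity check follows; the helper name `check_split_ratios` is hypothetical and not part of the original notebook.

```python
import math

# Values copied from the constants cell above.
TRAIN_RATIO = 0.7
VAL_RATIO = 0.15
TEST_RATIO = 0.15


def check_split_ratios(train, val, test):
    """Raise ValueError unless the ratios are positive and sum to 1."""
    ratios = (train, val, test)
    if any(r <= 0 for r in ratios):
        raise ValueError(f"All ratios must be positive, got {ratios}")
    total = sum(ratios)
    # math.isclose tolerates floating-point rounding in the sum.
    if not math.isclose(total, 1.0):
        raise ValueError(f"Ratios must sum to 1, got {total}")
    return True


check_split_ratios(TRAIN_RATIO, VAL_RATIO, TEST_RATIO)
```

Running the check right after the constants are defined would catch a mistyped ratio before any files are written.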

In [9]:
# Step 1: Load CSV files and look at them

# Each CSV row has the form: video_folder_name;gesture_name;numeric_label
column_names = ["video_id", "gesture", "label"]

train_csv = pd.read_csv(TRAIN_CSV, sep=";", header=None, names=column_names)
val_csv = pd.read_csv(VAL_CSV, sep=";", header=None, names=column_names)

print("Train CSV shape:", train_csv.shape)
print("Val CSV shape:", val_csv.shape)

print(train_csv.head())

# Combine for easier processing; remember original source split
train_csv["source_split"] = "train"
val_csv["source_split"] = "val"
all_videos = pd.concat([train_csv, val_csv], ignore_index=True)

# Map numeric labels (0–4) to exactly 5 canonical class names
label_to_class = {
    0: "Left_Swipe",
    1: "Right_Swipe",
    2: "Stop",
    3: "Thumbs_Down",
    4: "Thumbs_Up",
}

# Add a clean class_name column based only on the numeric label
all_videos["class_name"] = all_videos["label"].map(label_to_class)

# Also overwrite the gesture column so later code uses only these 5 names
all_videos["gesture"] = all_videos["class_name"]

print("\nTotal number of videos (rows):", len(all_videos))
print("Canonical class names (from numeric labels):", all_videos["class_name"].unique())
print("Unique numeric labels:", sorted(all_videos["label"].unique()))


Train CSV shape: (663, 3)
Val CSV shape: (100, 3)
                                   video_id         gesture  label
0  WIN_20180925_17_08_43_Pro_Left_Swipe_new  Left_Swipe_new      0
1  WIN_20180925_17_18_28_Pro_Left_Swipe_new  Left_Swipe_new      0
2  WIN_20180925_17_18_56_Pro_Left_Swipe_new  Left_Swipe_new      0
3  WIN_20180925_17_19_51_Pro_Left_Swipe_new  Left_Swipe_new      0
4  WIN_20180925_17_20_14_Pro_Left_Swipe_new  Left_Swipe_new      0

Total number of videos (rows): 763
Canonical class names (from numeric labels): ['Left_Swipe' 'Right_Swipe' 'Stop' 'Thumbs_Down' 'Thumbs_Up']
Unique numeric labels: [np.int64(0), np.int64(1), np.int64(2), np.int64(3), np.int64(4)]
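
Because the cell above overwrites the raw `gesture` column with names derived from the numeric label, it can be worth confirming first that each raw gesture name is always paired with the same label. The sketch below shows one way to do that with plain Python. The helper names are hypothetical, and only the first sample pair comes from the output above; the others are made up for illustration.

```python
from collections import defaultdict

# Illustrative (gesture, label) pairs. Only the first pair appears in the
# output above; the remaining ones are hypothetical.
rows = [
    ("Left_Swipe_new", 0),
    ("Right_Swipe_new", 1),
    ("Stop_new", 2),
]


def labels_per_gesture(pairs):
    """Map each raw gesture name to the set of numeric labels it appears with."""
    seen = defaultdict(set)
    for gesture, label in pairs:
        seen[gesture].add(label)
    return dict(seen)


def conflicting_gestures(pairs):
    """Return the gesture names that are paired with more than one label."""
    per_gesture = labels_per_gesture(pairs)
    return sorted(g for g, labels in per_gesture.items() if len(labels) > 1)


print(conflicting_gestures(rows))  # an empty list means names and labels agree
```

In the notebook, the pairs could come from `zip(all_videos["gesture"], all_videos["label"])`, evaluated before the `gesture` column is overwritten.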


In [10]:
# Step 2: Add folder paths and remove obviously bad entries

# For each row, build the path to the video folder that contains its images
video_paths = []

for _, row in all_videos.iterrows():
    video_id = row["video_id"]
    if row["source_split"] == "train":
        folder = os.path.join(TRAIN_DIR, video_id)
    else:
        folder = os.path.join(VAL_DIR, video_id)
    video_paths.append(folder)

all_videos["video_folder"] = video_paths

# Keep only rows where the folder actually exists
exists_mask = all_videos["video_folder"].apply(os.path.isdir)
clean_meta = all_videos[exists_mask].copy()

print("Total videos:", len(all_videos))
print("Videos with existing folders:", len(clean_meta))

# Count how many PNG images each folder has; drop folders with 0 images
num_frames = []

for folder in clean_meta["video_folder"]:
    images = [f for f in os.listdir(folder) if f.lower().endswith(".png")]
    num_frames.append(len(images))

clean_meta["num_frames"] = num_frames

print("\nFrame count statistics (before dropping empties):")
print(clean_meta["num_frames"].describe())

# Remove videos that have no frames
clean_meta = clean_meta[clean_meta["num_frames"] > 0].reset_index(drop=True)

print("\nRemaining videos after cleaning:", len(clean_meta))
print("Class distribution (labels):")
print(clean_meta["label"].value_counts().sort_index())


Total videos: 763
Videos with existing folders: 763

Frame count statistics (before dropping empties):
count    763.0
mean      30.0
std        0.0
min       30.0
25%       30.0
50%       30.0
75%       30.0
max       30.0
Name: num_frames, dtype: float64

Remaining videos after cleaning: 763
Class distribution (labels):
label
0    154
1    160
2    152
3    158
4    139
Name: count, dtype: int64


In [11]:
# Step 3: Create train / validation / test splits (video level)

labels = clean_meta["label"]

# First split: train+val vs test
train_val_meta, test_meta = train_test_split(
    clean_meta,
    test_size=TEST_RATIO,
    stratify=labels,
    random_state=RANDOM_SEED,
)

# Second split: train vs val (from the train_val set)
val_ratio_adjusted = VAL_RATIO / (TRAIN_RATIO + VAL_RATIO)

train_meta, val_meta = train_test_split(
    train_val_meta,
    test_size=val_ratio_adjusted,
    stratify=train_val_meta["label"],
    random_state=RANDOM_SEED,
)

print("Train videos:", len(train_meta))
print("Val videos:", len(val_meta))
print("Test videos:", len(test_meta))

print("\nTrain label counts:")
print(train_meta["label"].value_counts().sort_index())
print("\nVal label counts:")
print(val_meta["label"].value_counts().sort_index())
print("\nTest label counts:")
print(test_meta["label"].value_counts().sort_index())


Train videos: 533
Val videos: 115
Test videos: 115

Train label counts:
label
0    108
1    112
2    106
3    110
4     97
Name: count, dtype: int64

Val label counts:
label
0    23
1    24
2    23
3    24
4    21
Name: count, dtype: int64

Test label counts:
label
0    23
1    24
2    23
3    24
4    21
Name: count, dtype: int64
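
The split sizes printed above follow from the two-stage arithmetic: 15% of all videos go to the test set, then `VAL_RATIO / (TRAIN_RATIO + VAL_RATIO)` of the remainder goes to validation. The sketch below reproduces the counts under the assumption that each held-out set is rounded up and the rest stays on the other side. The function name is hypothetical.

```python
import math


def expected_split_sizes(n_videos, train_ratio, val_ratio, test_ratio):
    """Return (train, val, test) sizes for the two-stage split.

    Assumption: each held-out set is rounded up and the remainder stays on
    the other side, which reproduces the counts printed above.
    """
    n_test = math.ceil(test_ratio * n_videos)
    n_train_val = n_videos - n_test
    val_ratio_adjusted = val_ratio / (train_ratio + val_ratio)
    n_val = math.ceil(val_ratio_adjusted * n_train_val)
    n_train = n_train_val - n_val
    return n_train, n_val, n_test


print(expected_split_sizes(763, 0.7, 0.15, 0.15))  # (533, 115, 115)
```

Because of the rounding, the effective proportions are about 69.9% / 15.1% / 15.1% rather than exactly 70 / 15 / 15.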


In [12]:
# Step 4: Helper to load, resize, and normalize a whole video

# We will save each processed video as a NumPy array file (.npy)
# with shape: (num_frames, IMG_HEIGHT, IMG_WIDTH, 3)
# Pixel values will be in the range [0, 1].


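def load_and_process_video(folder_path):
    """Load all PNG frames from a folder, resize and normalize them, and
    return them as a NumPy array of shape (num_frames, H, W, 3).

    Return None if any frame cannot be opened or the folder has no PNG files.
    """
    # Note: sorted() orders names as strings, so the frame order is correct
    # only if the frame numbers in the file names are zero-padded consistently.
    file_names = sorted(f for f in os.listdir(folder_path) if f.lower().endswith(".png"))

    frames = []

    for name in file_names:
        file_path = os.path.join(folder_path, name)
        try:
            # The context manager closes the file once the pixels are loaded
            with Image.open(file_path) as raw:
                img = raw.convert("RGB")
        except Exception as e:
            print("Could not open image, skipping whole video:", file_path, "Error:", e)
            return None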

        # Resize image
        img = img.resize((IMG_WIDTH, IMG_HEIGHT))

        # Convert to NumPy array and normalize to [0, 1]
        arr = np.array(img, dtype="float32") / 255.0
        frames.append(arr)

    if len(frames) == 0:
        return None

    video_array = np.stack(frames, axis=0)  # shape: (num_frames, H, W, 3)
    return video_array
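
`load_and_process_video` orders frames with `sorted`, which compares file names as strings. If the frame numbers were not zero-padded, `frame_10.png` would sort before `frame_2.png` and the frames would be stacked out of temporal order. The sketch below shows a natural-sort key that avoids this. The file names are hypothetical, and whether the real dataset needs this depends on its naming scheme.

```python
import re


def natural_key(file_name):
    """Split a name into text and integer chunks so that 2 sorts before 10."""
    parts = re.split(r"(\d+)", file_name)
    # re.split with a capturing group puts the digit runs at odd indices.
    return [int(p) if i % 2 else p.lower() for i, p in enumerate(parts)]


# Hypothetical frame names without zero padding.
names = ["frame_10.png", "frame_2.png", "frame_1.png"]

print(sorted(names))                   # ['frame_1.png', 'frame_10.png', 'frame_2.png']
print(sorted(names, key=natural_key))  # ['frame_1.png', 'frame_2.png', 'frame_10.png']
```

Passing `key=natural_key` to the `sorted` call in the helper would be enough to apply it.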


In [13]:
# Step 5: Process the videos in each split and save them as .npy files

processed_records = []  # one metadata dict per successfully saved video

# Create split folders inside OUTPUT_DIR
for split_name in ["train", "val", "test"]:
    os.makedirs(os.path.join(OUTPUT_DIR, split_name), exist_ok=True)


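def process_split(split_name, meta_df):
    """Process every video in meta_df, save it under OUTPUT_DIR/split_name,
    and append one record per saved video to processed_records."""
    print("\nProcessing split:", split_name)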
    count_ok = 0
    count_bad = 0

    for _, row in meta_df.iterrows():
        video_id = row["video_id"]
        gesture = row["gesture"]
        label = int(row["label"])
        folder = row["video_folder"]

        # Build output folder and file path
        gesture_folder = os.path.join(OUTPUT_DIR, split_name, gesture)
        os.makedirs(gesture_folder, exist_ok=True)

        out_file_path = os.path.join(gesture_folder, video_id + ".npy")

        # Load and process
        video_array = load_and_process_video(folder)

        if video_array is None:
            count_bad += 1
            continue

        # Save as .npy
        np.save(out_file_path, video_array)
        count_ok += 1

        # Add to metadata list
        processed_records.append(
            {
                "split": split_name,
                "video_id": video_id,
                "gesture": gesture,
                "label": label,
                "file_path": out_file_path,
                "num_frames": video_array.shape[0],
                "is_augmented": False,
            }
        )

    print("Finished", split_name, "split.")
    print("Good videos:", count_ok)
    print("Skipped (corrupted / empty) videos:", count_bad)


process_split("train", train_meta)
process_split("val", val_meta)
process_split("test", test_meta)

print("\nTotal processed videos (before augmentation):", len(processed_records))



Processing split: train


Finished train split.
Good videos: 533
Skipped (corrupted / empty) videos: 0

Processing split: val
Finished val split.
Good videos: 115
Skipped (corrupted / empty) videos: 0

Processing split: test
Finished test split.
Good videos: 115
Skipped (corrupted / empty) videos: 0

Total processed videos (before augmentation): 763
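
Each saved video is a `float32` array of shape `(30, 64, 64, 3)`, so its size on disk is easy to estimate. The sketch below computes it and includes a small validity check. It also shows a possible `uint8` storage variant, which is an alternative introduced here and not what the notebook does. The helper name `looks_valid` is hypothetical.

```python
import numpy as np

NUM_FRAMES, IMG_HEIGHT, IMG_WIDTH = 30, 64, 64  # values reported in this notebook

# Stand-in for one processed video.
video = np.zeros((NUM_FRAMES, IMG_HEIGHT, IMG_WIDTH, 3), dtype="float32")


def looks_valid(arr, height=IMG_HEIGHT, width=IMG_WIDTH):
    """Check shape, dtype, and value range of a processed video array."""
    return (
        arr.ndim == 4
        and arr.shape[1:] == (height, width, 3)
        and arr.dtype == np.float32
        and float(arr.min()) >= 0.0
        and float(arr.max()) <= 1.0
    )


print(looks_valid(video))   # True
print(video.nbytes)         # 1474560 bytes of array data per video
print(video.nbytes * 763)   # 1125089280 bytes for the 763 original videos

# Hypothetical alternative: store uint8 on disk and normalize when loading.
as_uint8 = np.round(video * 255.0).astype("uint8")
print(as_uint8.nbytes)      # 368640 bytes, a quarter of the float32 size
restored = as_uint8.astype("float32") / 255.0
```

The byte counts cover the array data only; `np.save` adds a small header to each file.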


In [14]:
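# Step 6: Balance the training set using simple data augmentation

# We create extra videos for classes that have fewer samples than the
# largest class. Each new video is an existing one with every frame
# flipped horizontally.
# Caveat: a horizontal flip also mirrors the direction of motion, so a
# flipped Left_Swipe resembles a Right_Swipe. A direction-preserving
# augmentation is safer for the two swipe classes.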

meta_df = pd.DataFrame(processed_records)

train_meta_df = meta_df[meta_df["split"] == "train"].copy()

# Compute how many samples each class has
label_counts = train_meta_df["label"].value_counts()
max_count = label_counts.max()

print("\nPer-class counts:")
print(label_counts.sort_index())
print("Target count for each class:", max_count)

new_augmented_records = []

for label_value in sorted(label_counts.index):
    current_count = label_counts[label_value]
    needed = max_count - current_count

    if needed <= 0:
        continue

    print(f"\nAugmenting label {label_value} (need {needed} more videos)")

    # All videos from this class in the train split
    class_videos = train_meta_df[train_meta_df["label"] == label_value].reset_index(drop=True)

    for i in range(needed):
        # Pick a random existing video from this class
        source_row = class_videos.sample(n=1, random_state=random.randint(0, 1_000_000)).iloc[0]

        source_path = source_row["file_path"]
        source_video_id = source_row["video_id"]
        gesture = source_row["gesture"]

        # Load the stored NumPy array
        video_array = np.load(source_path)

        # Augmentation: horizontal flip (reverse width axis)
        # video_array shape: (num_frames, H, W, 3)
        flipped = video_array[:, :, ::-1, :]

        # New video id and path
        new_video_id = f"{source_video_id}_aug{i}"
        new_file_path = os.path.join(OUTPUT_DIR, "train", gesture, new_video_id + ".npy")

        # Save augmented video
        np.save(new_file_path, flipped)

        # Record metadata
        new_augmented_records.append(
            {
                "split": "train",
                "video_id": new_video_id,
                "gesture": gesture,
                "label": int(label_value),
                "file_path": new_file_path,
                "num_frames": flipped.shape[0],
                "is_augmented": True,
                "source_video_id": source_video_id,
            }
        )

print("\nNumber of new augmented train videos:", len(new_augmented_records))

# Update full metadata
if new_augmented_records:
    meta_df = pd.concat([meta_df, pd.DataFrame(new_augmented_records)], ignore_index=True)

train_meta_df = meta_df[meta_df["split"] == "train"].copy()

print("\nTrain set label counts AFTER augmentation:")
print(train_meta_df["label"].value_counts().sort_index())



Per-class counts:
label
0    108
1    112
2    106
3    110
4     97
Name: count, dtype: int64
Target count for each class: 112

Augmenting label 0 (need 4 more videos)

Augmenting label 2 (need 6 more videos)

Augmenting label 3 (need 2 more videos)

Augmenting label 4 (need 15 more videos)

Number of new augmented train videos: 27

Train set label counts AFTER augmentation:
label
0    112
1    112
2    112
3    112
4    112
Name: count, dtype: int64
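
A horizontal flip reverses the width axis, which also reverses the apparent direction of a swipe. The sketch below demonstrates that effect and shows one direction-preserving alternative, a random brightness scale. The function `jitter_brightness` and its `max_delta` parameter are hypothetical. This is a possible substitute, not the augmentation used in this notebook.

```python
import numpy as np


def jitter_brightness(video, rng, max_delta=0.1):
    """Scale all pixels by one random factor in [1 - max_delta, 1 + max_delta].

    Unlike a horizontal flip, this leaves the direction of motion unchanged.
    """
    factor = float(rng.uniform(1.0 - max_delta, 1.0 + max_delta))
    # Clip so the result stays in [0, 1], and keep the input dtype.
    return np.clip(video * factor, 0.0, 1.0).astype(video.dtype)


rng = np.random.default_rng(42)
# Random stand-in for a video loaded from a .npy file.
video = rng.random((30, 64, 64, 3), dtype=np.float32)

augmented = jitter_brightness(video, rng)
print(augmented.shape, augmented.dtype)  # (30, 64, 64, 3) float32

# After a flip, the leftmost column of every frame is the original rightmost one.
flipped = video[:, :, ::-1, :]
print(np.array_equal(flipped[:, :, 0, :], video[:, :, -1, :]))  # True
```

For classes without a left/right meaning (for example `Stop`), the flip remains a reasonable choice; the concern applies mainly to `Left_Swipe` and `Right_Swipe`.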


In [15]:
# Save full metadata for all splits
meta_csv_path = os.path.join(OUTPUT_DIR, "videos_metadata.csv")
meta_df.to_csv(meta_csv_path, index=False)

print("Saved metadata to:", meta_csv_path)


Saved metadata to: processed_dataset\videos_metadata.csv


## Summary

In this notebook we:

- **Loaded** the gesture recognition CSV files and built a simple table of all videos.
- **Checked folders and frames**, dropping any video whose folder is missing or contains no PNG frames (all 763 videos passed, each with 30 frames).
- **Split** the data into train, validation, and test sets at the video level.
- **Resized and normalized** all images to 64×64, saving each video as a `float32` `.npy` file of shape `(num_frames, 64, 64, 3)` with pixel values in [0, 1].
- **Balanced the training set** by adding 27 horizontally flipped copies of videos from smaller classes, bringing every class to 112 training videos.
- **Saved metadata** about every processed video in `processed_dataset/videos_metadata.csv`.

You can now load the `.npy` files and metadata in your training notebook and focus on building your model.
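
As a starting point for the training notebook, the sketch below loads one split from the metadata CSV into stacked arrays. The function name `load_split` is hypothetical. It assumes every video in the split has the same number of frames, which holds for the 30-frame videos here. A tiny synthetic dataset is built in a temporary directory so the example runs on its own.

```python
import os
import tempfile

import numpy as np
import pandas as pd


def load_split(metadata_csv, split):
    """Return (videos, labels) for one split listed in the metadata CSV.

    Assumes all videos in the split have the same number of frames, so they
    can be stacked into shape (num_videos, num_frames, H, W, 3).
    """
    meta = pd.read_csv(metadata_csv)
    rows = meta[meta["split"] == split]
    videos = np.stack([np.load(path) for path in rows["file_path"]])
    labels = rows["label"].to_numpy()
    return videos, labels


# Tiny synthetic dataset so the sketch is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    records = []
    for i, label in enumerate([0, 1]):
        path = os.path.join(tmp, f"video_{i}.npy")
        np.save(path, np.zeros((30, 64, 64, 3), dtype="float32"))
        records.append({"split": "train", "label": label, "file_path": path})
    csv_path = os.path.join(tmp, "videos_metadata.csv")
    pd.DataFrame(records).to_csv(csv_path, index=False)

    X, y = load_split(csv_path, "train")
    print(X.shape, y.tolist())  # (2, 30, 64, 64, 3) [0, 1]
```

With the real data, the call would be `load_split(os.path.join("processed_dataset", "videos_metadata.csv"), "train")`. The `file_path` column stores paths relative to the working directory and in the format of the operating system that wrote them, so the files load most reliably from the same location and platform.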
