
Official PyTorch implementation of "Data-Efficient American Sign Language Recognition via Few-Shot Prototypical Networks" (arXiv:2512.10562).


mehermdsaad/FewShotASL


Data-Efficient ASL Recognition via Few-Shot Prototypical Networks

arXiv License: MIT Python PyTorch

Official PyTorch implementation of the paper "Data-Efficient American Sign Language Recognition via Few-Shot Prototypical Networks" (2025).

Abstract

Isolated Sign Language Recognition (ISLR) is fundamentally constrained by data scarcity and long-tail vocabulary distributions. Gathering sufficient examples for thousands of unique signs is prohibitively expensive, causing standard classifiers to overfit frequent classes while failing on rare ones.

To address this, we propose a Few-Shot Prototypical Network framework adapted for skeleton-based features. Unlike traditional classifiers, our approach utilizes episodic training to learn a semantic metric space where signs are classified based on their proximity to dynamic class prototypes. We integrate a Spatiotemporal Graph Convolutional Network (ST-GCN) with a novel Multi-Scale Temporal Aggregation (MSTA) module to capture both rapid and fluid motion dynamics.
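
The distance-based classification at the heart of a Prototypical Network can be sketched as follows. This is a minimal, self-contained illustration, not the repository's actual code: the embeddings are random placeholders, and the episode sizes are arbitrary.

```python
import torch
import torch.nn.functional as F

def prototypical_logits(support, support_labels, query, n_way):
    """Query logits as negative squared Euclidean distance to each class
    prototype, where a prototype is the mean of that class's support embeddings."""
    # support: (n_support, dim), query: (n_query, dim)
    prototypes = torch.stack([
        support[support_labels == c].mean(dim=0) for c in range(n_way)
    ])  # (n_way, dim)
    dists = torch.cdist(query, prototypes) ** 2  # (n_query, n_way)
    return -dists  # higher logit = closer prototype

# Toy 3-way, 2-shot episode with 4-dim embeddings.
support = torch.randn(6, 4)
labels = torch.tensor([0, 0, 1, 1, 2, 2])
query = torch.randn(5, 4)
logits = prototypical_logits(support, labels, query, n_way=3)
loss = F.cross_entropy(logits, torch.randint(0, 3, (5,)))
```

During episodic training, this cross-entropy loss over episodes shapes the metric space so that embeddings of the same sign cluster around their prototype, which is what allows classification of unseen classes at test time.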

Key Results:

  • 43.75% Top-1 Accuracy on WLASL (2000 classes), outperforming standard baselines by >13%.
  • 77.10% Top-5 Accuracy, demonstrating strong retrieval capabilities.
  • Robust generalization to unseen classes without fine-tuning.

Architecture

Model Architecture
Figure 1: The proposed architecture integrates an ST-GCN backbone with Multi-Scale Temporal Aggregation (MSTA) to extract rich spatiotemporal embeddings. These embeddings are projected into a metric space where a Prototypical Network performs distance-based classification.
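
One common way to realize multi-scale temporal aggregation is a bank of parallel temporal convolutions with different kernel sizes whose outputs are fused. The sketch below follows that pattern as an assumption about the MSTA module; the actual kernel sizes and fusion scheme live in models.py.

```python
import torch
import torch.nn as nn

class MultiScaleTemporalAggregation(nn.Module):
    """Hypothetical MSTA sketch: parallel temporal convolutions at several
    kernel sizes capture both rapid and fluid motion dynamics."""
    def __init__(self, channels, kernel_sizes=(3, 5, 7)):
        super().__init__()
        # Each branch convolves over the time axis, padded so that
        # every branch preserves the temporal length.
        self.branches = nn.ModuleList(
            nn.Conv1d(channels, channels, k, padding=k // 2)
            for k in kernel_sizes
        )
        # 1x1 convolution fuses the concatenated branches back to `channels`.
        self.fuse = nn.Conv1d(channels * len(kernel_sizes), channels, 1)

    def forward(self, x):  # x: (batch, channels, time)
        multi_scale = torch.cat([b(x) for b in self.branches], dim=1)
        return self.fuse(multi_scale)

msta = MultiScaleTemporalAggregation(channels=64)
out = msta(torch.randn(2, 64, 50))  # shape is preserved: (2, 64, 50)
```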


Repository Structure

.
├── config.py                  # Hyperparameters and path configurations
├── datasets.py                # Custom PyTorch Dataset/Dataloaders for WLASL
├── models.py                  # Definitions for the MSTA module, Prototypical Network, and baseline classifier
├── train.py                   # Main script for Few-Shot Episodic training
├── train_baseline.py          # Script for baseline fully connected classifier training
├── inspect_data.py            # Utilities for visualizing and checking data integrity
├── data/                      
│   └── WLASL/                 # WLASL processed labels
├── stgcn_layers/              # Custom STGCN Graph Conv blocks & Graph utilities
├── training_scripts/          # Shell scripts for HPC (Slurm) execution
└── utils/                     # Helper scripts (preprocessing, results aggregation)

Data Preparation

This project relies on the WLASL (Word-Level American Sign Language) dataset (skeleton/pose format).

Download Data: Obtain the WLASL files from the official WLASL repository. Preprocessed pose files are also available from an unofficial source: unofficial download from UniSign

Directory Setup: Place the label files in the data/ directory:

data/
└── WLASL/
    ├── labels-2000.train
    ├── labels-2000.dev
    └── labels-2000.test
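
A quick way to verify the layout before training is a small sanity check like the one below (an illustrative helper, not part of the repository):

```python
from pathlib import Path

def missing_wlasl_files(root="data"):
    """Return the expected WLASL label files that are absent under `root`."""
    expected = [f"labels-2000.{split}" for split in ("train", "dev", "test")]
    wlasl_dir = Path(root) / "WLASL"
    return [name for name in expected if not (wlasl_dir / name).exists()]

missing = missing_wlasl_files()
if missing:
    print("Missing label files:", missing)
```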

Training

Sample training scripts are provided in the training_scripts/ directory. The baseline classifier is trained with train_baseline.py; the few-shot prototypical strategy is trained with train.py.
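
Episodic training builds each batch as an N-way K-shot task. A minimal episode sampler could look like the following; the episode parameters are illustrative and do not reflect the repository's actual API.

```python
import random
from collections import defaultdict

def sample_episode(labels, n_way=5, k_shot=1, n_query=3, rng=random):
    """Sample indices for one N-way K-shot episode from a flat label list."""
    by_class = defaultdict(list)
    for idx, lab in enumerate(labels):
        by_class[lab].append(idx)
    # Only classes with enough examples for support + query are eligible,
    # which matters for WLASL's long-tail distribution.
    eligible = [c for c, idxs in by_class.items() if len(idxs) >= k_shot + n_query]
    classes = rng.sample(eligible, n_way)
    support, query = [], []
    for c in classes:
        picked = rng.sample(by_class[c], k_shot + n_query)
        support += picked[:k_shot]
        query += picked[k_shot:]
    return support, query, classes

labels = [i % 10 for i in range(100)]  # toy dataset: 10 classes, 10 samples each
support, query, classes = sample_episode(labels, n_way=5, k_shot=1, n_query=3)
```

Each episode's support set defines the class prototypes; the loss is computed only on the query set, so the model is repeatedly forced to classify from a handful of examples.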

Citation

If you find this code or research useful, please cite our paper:

@misc{saad2025dataefficientamericansignlanguage,
      title={Data-Efficient American Sign Language Recognition via Few-Shot Prototypical Networks}, 
      author={Meher Md Saad},
      year={2025},
      eprint={2512.10562},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2512.10562},
}
