Official PyTorch implementation of the paper "Data-Efficient American Sign Language Recognition via Few-Shot Prototypical Networks" (2025).
Isolated Sign Language Recognition (ISLR) is fundamentally constrained by data scarcity and long-tail vocabulary distributions. Gathering sufficient examples for thousands of unique signs is prohibitively expensive, causing standard classifiers to overfit frequent classes while failing on rare ones.
To address this, we propose a Few-Shot Prototypical Network framework adapted for skeleton-based features. Unlike traditional classifiers, our approach utilizes episodic training to learn a semantic metric space where signs are classified based on their proximity to dynamic class prototypes. We integrate a Spatiotemporal Graph Convolutional Network (ST-GCN) with a novel Multi-Scale Temporal Aggregation (MSTA) module to capture both rapid and fluid motion dynamics.
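For intuition, distance-based classification with class prototypes (as in standard Prototypical Networks) can be sketched as follows. This is an illustrative, framework-agnostic sketch, not the paper's implementation: the embedding dimension, episode sizes, and synthetic data below are assumptions.

```python
import numpy as np

def prototypes(support, labels, n_way):
    """Class prototypes: mean embedding of each class's support samples."""
    return np.stack([support[labels == c].mean(axis=0) for c in range(n_way)])

def classify(queries, protos):
    """Assign each query embedding to the nearest prototype (squared Euclidean)."""
    d = ((queries[:, None, :] - protos[None, :, :]) ** 2).sum(axis=-1)
    return d.argmin(axis=1)

# Toy 5-way 3-shot episode with well-separated synthetic embeddings.
rng = np.random.default_rng(0)
n_way, k_shot, dim = 5, 3, 16
centers = 5.0 * rng.normal(size=(n_way, dim))
support = np.concatenate([c + 0.1 * rng.normal(size=(k_shot, dim)) for c in centers])
labels = np.repeat(np.arange(n_way), k_shot)
protos = prototypes(support, labels, n_way)
queries = centers + 0.1 * rng.normal(size=(n_way, dim))
print(classify(queries, protos))  # → [0 1 2 3 4]
```

Because prototypes are recomputed from each episode's support set, unseen classes can be classified at test time with no fine-tuning.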
Key Results:
- 43.75% Top-1 Accuracy on WLASL (2000 classes), outperforming standard baselines by >13%.
- 77.10% Top-5 Accuracy, demonstrating strong retrieval capabilities.
- Robust generalization to unseen classes without fine-tuning.
Figure 1: The proposed architecture integrates an ST-GCN backbone with Multi-Scale Temporal Aggregation (MSTA) to extract rich spatiotemporal embeddings. These embeddings are projected into a metric space where a Prototypical Network performs distance-based classification.
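The multi-scale idea behind MSTA can be illustrated with a minimal sketch: per-frame features are aggregated over several temporal window sizes in parallel and concatenated, so both rapid and fluid motion dynamics are represented. The pooling operator and the window sizes below are assumptions for illustration, not the module's actual design.

```python
import numpy as np

def temporal_pool(x, window):
    """Moving average over the time axis, edge-padded to keep the frame count."""
    pad = window // 2
    xp = np.pad(x, ((pad, window - 1 - pad), (0, 0)), mode="edge")
    kernel = np.ones(window) / window
    return np.stack(
        [np.convolve(xp[:, c], kernel, mode="valid") for c in range(x.shape[1])],
        axis=1,
    )

def multi_scale_aggregate(x, windows=(1, 3, 7)):
    """Concatenate features smoothed at several temporal scales (channel axis)."""
    return np.concatenate([temporal_pool(x, w) for w in windows], axis=1)

x = np.random.default_rng(1).normal(size=(30, 8))  # (frames, channels)
out = multi_scale_aggregate(x)
print(out.shape)  # (30, 24)
```

Small windows preserve fast transitions; larger windows summarize slower, fluid movement.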
.
├── config.py # Hyperparameters and path configurations
├── datasets.py # Custom PyTorch Dataset/Dataloaders for WLASL
├── models.py # Definitions for MSTA, Prototypical Network, and baseline classifier
├── train.py # Main script for Few-Shot Episodic training
├── train_baseline.py # Script for baseline fully connected classifier training
├── inspect_data.py # Utilities for visualizing and checking data integrity
├── data/
│ └── WLASL/ # WLASL processed labels
├── stgcn_layers/ # Custom STGCN Graph Conv blocks & Graph utilities
├── training_scripts/ # Shell scripts for HPC (Slurm) execution
└── utils/ # Helper scripts (preprocessing, results aggregation)
This project relies on the WLASL (Word-Level American Sign Language) dataset in skeleton/pose format.
Download Data: Obtain the WLASL files from the official WLASL repository. An unofficial mirror of preprocessed pose files is also available via UniSign.
Directory Setup: Place the label files in the data/ directory:
data/
└── WLASL/
├── labels-2000.train
├── labels-2000.dev
└── labels-2000.test
Sample training scripts can be found under the training_scripts/ directory. Baseline training is implemented in train_baseline.py; the few-shot strategy is implemented in train.py.
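Episodic training, as driven by train.py, repeatedly samples N-way K-shot tasks from the label set. The sketch below shows only the episode construction; the index layout, defaults, and sample identifiers are illustrative assumptions.

```python
import random

def sample_episode(index, n_way=5, k_shot=5, n_query=3, seed=None):
    """Build one N-way K-shot episode from a {label: [sample ids]} index."""
    rng = random.Random(seed)
    # Only classes with enough samples for both support and query sets qualify.
    eligible = [c for c, ids in index.items() if len(ids) >= k_shot + n_query]
    classes = rng.sample(eligible, n_way)
    support, query = [], []
    for ep_label, c in enumerate(classes):
        ids = rng.sample(index[c], k_shot + n_query)
        support += [(i, ep_label) for i in ids[:k_shot]]
        query += [(i, ep_label) for i in ids[k_shot:]]
    return support, query

# Toy index: 10 classes with 12 samples each.
index = {c: [f"sign{c}_{i}" for i in range(12)] for c in range(10)}
support, query = sample_episode(index, seed=0)
print(len(support), len(query))  # 25 15
```

Each episode relabels its sampled classes 0..N-1, so the model learns the metric space rather than fixed class identities.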
If you find this code or research useful, please cite our paper:
@misc{saad2025dataefficientamericansignlanguage,
title={Data-Efficient American Sign Language Recognition via Few-Shot Prototypical Networks},
author={Meher Md Saad},
year={2025},
eprint={2512.10562},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2512.10562},
}