Deep Learning-Based Sign Language Recognition from Videos and Cross-Lingual Translation to Indian Vernaculars

Sign Language Video Recognition & Cross-Lingual Translation

Deep learning pipeline that classifies sign-language video clips into English word labels and translates them into Hindi, Telugu, and Bengali.

Built as part of an Executive M.Tech project at IIT Patna, under the guidance of Dr. Chandranath Adak.

📄 Paper: Deep Learning-Based Sign Language Recognition from Videos and Cross-Lingual Translation to Indian Vernaculars — [arXiv link coming soon]

Overview

The system works in two stages:

Recognition — A fine-tuned VideoMAE (MCG-NJU/videomae-base) video transformer classifies a 16-frame sign-language video clip into one of 13 English word labels.
Translation — The predicted English label is translated into Hindi, Telugu, and Bengali using Meta AI's NLLB-200 (facebook/nllb-200-distilled-600M) multilingual translation model.

A Streamlit demo app lets a user upload a video and see the predicted label plus all three translations in real time.

Results

Fine-tuned on a 13-class subset (~2GB) of the AI4Bharat Indian Sign Language dataset from IIT Madras:

Metric	Value
Training accuracy	99.4%
Validation accuracy	77.5% (31/40)
Classes	13
Epochs	15

Most confusion occurs between visually similar signs (e.g., hat / dress / shirt) and a semantically related cluster (ugly / deaf / blind). See the paper for full error analysis.

Setup

pip install torch torchvision transformers decord scikit-learn
pip install streamlit opencv-python pillow sentencepiece pyngrok

Set your Hugging Face token as an environment variable rather than hardcoding it:

export HF_TOKEN="your_token_here"
export NGROK_AUTH_TOKEN="your_token_here"

or add these keys in .env file

Dataset

This project uses a 13-class subset of the AI4Bharat ISL video corpus:

8 adjectives: loud, quiet, happy, sad, beautiful, ugly, deaf, blind
5 clothing nouns: hat, dress, suit, skirt, shirt

Full dataset: AI4Bharat OpenHands

Other ISL sources explored (not used for training):

Training

# Key hyperparameters
Train-Test Split: 80:20
Optimizer: AdamW (lr=1e-5)
Loss: CrossEntropyLoss
Epochs: 15
Batch size: 2

See notebooks/sign_language_classification.ipynb for the full training loop.

Running the Demo

streamlit run app.py

Upload a .mp4 or .mov clip of a sign from the trained label set. The app returns the predicted English label with a confidence score, followed by Hindi, Telugu, and Bengali translations.

Limitations

Small, imbalanced label set (13 classes, as few as 1–2 validation clips for some classes)
Isolated-word classification only — no continuous sentence generation from unsegmented video
Sensitive to lighting, camera angle, and signer style variation
Single-word translation can be ambiguous without sentence context
Offline/batch inference only — not real-time

See the paper's Limitations section for details.

Future Work

Scale to the full AI4Bharat vocabulary
Move from isolated-word classification to continuous sign-to-sentence translation
Extend to more Indian languages via NLLB-200
Optimize for edge/mobile deployment

Acknowledgements

AI4Bharat and IIT Madras for the ISL dataset
VideoMAE and NLLB open-source teams
Department of Computer Science and Engineering, IIT Patna

Citation

@misc{nandipalli2025signlanguage,
  title={Deep Learning-Based Sign Language Recognition from Videos and Cross-Lingual Translation to Indian Vernaculars},
  author={Nandipalli, Ramesh and Adak, Chandranath},
  year={2025},
  note={IIT Patna Executive M.Tech Thesis}
}

License

[Add a license, e.g., MIT — see https://choosealicense.com]

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
README.md		README.md
SignLanguageClassification_arxiv.ipynb		SignLanguageClassification_arxiv.ipynb

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Deep Learning-Based Sign Language Recognition from Videos and Cross-Lingual Translation to Indian Vernaculars

Sign Language Video Recognition & Cross-Lingual Translation

Overview

Results

Setup

or add these keys in .env file

Dataset

Training

Running the Demo

Limitations

Future Work

Acknowledgements

Citation

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Deep Learning-Based Sign Language Recognition from Videos and Cross-Lingual Translation to Indian Vernaculars

Sign Language Video Recognition & Cross-Lingual Translation

Overview

Results

Setup

or add these keys in .env file

Dataset

Training

Running the Demo

Limitations

Future Work

Acknowledgements

Citation

License

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages