Real-time Sign Language Interpreter using Computer Vision & Machine Learning
A Python application that uses MediaPipe, OpenCV, and scikit-learn to detect hands in real-time, extract landmark features, and classify ASL (American Sign Language) signs.
- 🖐️ Real-time Hand Detection — Tracks up to 2 hands simultaneously using MediaPipe's 21-point landmark model
- 🔤 ASL Sign Classification — Recognizes static ASL alphabet signs (A–Z, excluding J and Z) via a trained RandomForest classifier
- 📝 Sentence Building — Accumulate sign letters into words and sentences with a hold-to-confirm mechanism
- 📊 Prediction Smoothing — Sliding-window voting system prevents flickering sign displays
- 🎯 Confidence Display — Color-coded prediction overlay with confidence bar (green / yellow / red)
- 📸 Data Collection Pipeline — Built-in webcam script to capture sign images with countdown timer
- 🔬 Feature Extraction — Position-invariant landmark normalization for robust classification
- 📈 Model Evaluation & Tuning — Confusion matrix visualization, classification report, and GridSearchCV hyperparameter tuning
- 🔄 Data Augmentation — Horizontal flip, rotation, brightness/contrast, and zoom augmentations
- 📦 ASL MNIST Support — Loader/converter for the Kaggle ASL MNIST dataset
- 🪞 Mirror-mode Webcam — Live feed with FPS counter and mode indicator
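The prediction-smoothing feature above can be pictured as a majority vote over a sliding window of recent per-frame predictions. The sketch below is a minimal illustration rather than the project's actual utility; the `PredictionSmoother` name and window size are assumptions:

```python
from collections import Counter, deque

class PredictionSmoother:
    """Majority vote over the last N frame predictions (hypothetical
    helper illustrating the anti-flicker idea)."""

    def __init__(self, window_size=10):
        self.window = deque(maxlen=window_size)

    def update(self, prediction):
        """Add one frame's raw prediction; return the smoothed label."""
        self.window.append(prediction)
        label, _count = Counter(self.window).most_common(1)[0]
        return label

smoother = PredictionSmoother(window_size=5)
# A single noisy frame ("B") is outvoted by the surrounding "A" frames.
for raw in ["A", "A", "B", "A", "A"]:
    stable = smoother.update(raw)
print(stable)  # → A
```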
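The position-invariant normalization mentioned in the feature list typically means re-expressing the 21 MediaPipe hand landmarks relative to the wrist and dividing out hand scale, so camera position and distance drop out of the features. A hedged sketch (the function name and exact scheme are assumptions; the project's real extractor lives in `core/`):

```python
import numpy as np

def normalize_landmarks(landmarks):
    """Make 21 (x, y) hand landmarks invariant to position and scale.

    Translates so the wrist (landmark 0 in MediaPipe's hand model) sits
    at the origin, then divides by the largest wrist-to-landmark
    distance. Rotation is NOT removed in this sketch.
    """
    pts = np.asarray(landmarks, dtype=float).reshape(21, 2)
    pts = pts - pts[0]                   # wrist to origin
    scale = np.linalg.norm(pts, axis=1).max()
    if scale > 0:
        pts = pts / scale                # hand size / camera distance out
    return pts.flatten()                 # 42-element feature vector

# The same hand shape at two positions/scales maps to identical features.
hand = np.random.rand(21, 2)
shifted = hand * 2.0 + np.array([5.0, 3.0])
assert np.allclose(normalize_landmarks(hand), normalize_landmarks(shifted))
```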
| Category | Libraries |
|---|---|
| Computer Vision | OpenCV ≥ 4.8, MediaPipe ≥ 0.10 |
| Machine Learning | scikit-learn ≥ 1.3 (RandomForest, GridSearchCV) |
| Numerical | NumPy ≥ 1.24 |
| Visualization | Matplotlib, Seaborn |
| Language | Python 3.9–3.11 |
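As an illustration of how the scikit-learn pieces above fit together, here is a minimal RandomForest training and GridSearchCV tuning sketch on synthetic stand-in features. The real logic lives in `models/train_model.py` and `models/tune_model.py`; the data shape and parameter grid shown here are assumptions:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

rng = np.random.default_rng(0)
# Stand-in for extracted landmark features: 42 values (21 x/y pairs)
# per sample, with two fake sign classes. The real pipeline loads
# features produced by data/extract_landmarks.py.
X = rng.normal(size=(200, 42))
y = (X[:, 0] > 0).astype(int)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Small illustrative grid; the project's actual search space may differ.
grid = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [None, 10]},
    cv=3,
)
grid.fit(X_train, y_train)
print(grid.best_params_, grid.score(X_test, y_test))
```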
```
chirona/
├── core/               # Core ML & detection logic (classifiers, extractors, gesture detection)
├── utils/              # Shared utilities (drawing, smoothing, text overlay)
├── data/               # Data collection, extraction, and dataset loaders
├── models/             # Training, evaluation scripts, and saved weights
├── tests/              # Unit and integration tests
├── main.py             # Application entry point
├── config.py           # Global hyperparameters and UI constants
├── requirements.txt    # Python dependencies
└── ROADMAP.md          # Development roadmap & technical reference
```
- Python 3.9–3.11 (recommended for MediaPipe compatibility)
- A webcam
```bash
# Clone the repository
git clone https://github.com/nm-devs/Chirona.git
cd Chirona

# Create a virtual environment (optional but recommended)
python -m venv .venv
.venv\Scripts\activate       # Windows
# source .venv/bin/activate  # macOS/Linux

# Install dependencies
pip install -r requirements.txt
```

```bash
# 1. Collect sign images via webcam
python data/collect_images.py

# 2. Extract hand landmarks from collected images
python data/extract_landmarks.py

# 3. Train the classifier
python models/train_model.py

# (Optional) Tune hyperparameters with GridSearchCV
python models/tune_model.py
```

```bash
python main.py
```

| Key | Action |
|---|---|
| SPACE | Manually add a space to the sentence |
| H | Toggle between 1 and 2 hand modes |
| ESC | Exit the application |
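The hold-to-confirm mechanism from the feature list presumably commits a letter once the same prediction persists for a fixed number of consecutive frames. A hypothetical sketch (the class name and frame count are assumptions, not the project's implementation):

```python
class HoldToConfirm:
    """Commit a letter only after it is predicted for `hold_frames`
    consecutive frames (hypothetical helper mirroring the README's
    hold-to-confirm feature)."""

    def __init__(self, hold_frames=15):
        self.hold_frames = hold_frames
        self.current = None
        self.count = 0

    def update(self, letter):
        """Feed one frame's (smoothed) prediction; return the letter
        when the hold completes, else None."""
        if letter == self.current:
            self.count += 1
        else:
            self.current, self.count = letter, 1
        if self.count == self.hold_frames:
            return letter            # fires exactly once per hold
        return None

holder = HoldToConfirm(hold_frames=3)
out = [holder.update(s) for s in ["A", "A", "B", "B", "B", "B"]]
print(out)  # → [None, None, None, None, 'B', None]
```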
| Action | Function |
|---|---|
| speak | Text-to-speech output of the current sentence |
| space | Add a space in sentence builder |
| backspace | Remove the last character |
| clear | Clear the current sentence |
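The sentence-builder actions in the table above map naturally onto a small character buffer. The sketch below is a hypothetical illustration, not the project's implementation; the `speak` action would hand the resulting text to a TTS engine, which is omitted here:

```python
class SentenceBuilder:
    """Minimal sentence buffer matching the gesture actions above
    (hypothetical class for illustration)."""

    def __init__(self):
        self.chars = []

    def add_letter(self, letter):
        self.chars.append(letter)

    def space(self):
        self.chars.append(" ")

    def backspace(self):
        if self.chars:
            self.chars.pop()

    def clear(self):
        self.chars = []

    def text(self):
        return "".join(self.chars)

b = SentenceBuilder()
for ch in "HI":
    b.add_letter(ch)
b.space()
b.add_letter("X")
b.backspace()        # remove the stray "X"
b.add_letter("A")
print(b.text())  # → HI A
```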
```bash
pytest tests/
```

See ROADMAP.md for the full development plan. Current progress:
- Project restructure & modular architecture
- Data collection pipeline
- Feature extraction & normalization
- RandomForest model training & evaluation
- Hyperparameter tuning (GridSearchCV)
- Real-time sign classification with confidence display
- Prediction smoothing (anti-flicker)
- Data augmentation utilities
- ASL MNIST dataset support
- Text-to-speech output
- Sentence building from individual signs
- Dynamic gesture recognition (LSTM)
- Desktop GUI (PyQt5)
- Web-based version (Flask/FastAPI)
⚠️ Project is actively in development — model training is ongoing and a live demo will be added upon completion.
Made with ❤️ and Python
Done by Michael Musallam and Nadim Baboun