A real-time system that turns webcam hand gestures into natural language and optional speech. MediaPipe detects hands, custom logic recognizes conversational gestures, OpenAI shapes sentences, and ElevenLabs (optional) speaks them aloud. DeepFace can annotate the dominant emotion in-frame.
- Live gesture recognition using MediaPipe Hands with configurable smoothing.
- Sentences are emitted after consensus and auto-close after 5 seconds with no new gestures, or when all hands leave the frame.
- OpenAI-powered translation with customizable prompt/context.
- Optional DeepFace emotion overlay and logging.
- Optional ElevenLabs TTS with background queue and playback controls.
- Hotkeys for toggling logs, forcing sentences, saving landmarks, and cancelling audio.
- Python 3.9+ recommended.
- Webcam.
- `OPENAI_API_KEY` (env/`.env`) for translation.
- Optional `XI_API_KEY` for ElevenLabs TTS.
- Optional DeepFace (installed by default) for emotion detection.
`requirements.txt` includes `mediapipe`, `opencv-python`, `numpy`, `deepface`, `protobuf<4.0.0`, `openai`, `python-dotenv`, plus dev tools (`pytest`, `black`, `flake8`, `isort`).
```bash
git clone <repo>
cd insync
python3 -m venv venv
source venv/bin/activate  # Windows PowerShell: .\venv\Scripts\Activate.ps1
pip install --upgrade pip
pip install -r requirements.txt
# optional if using ElevenLabs TTS
pip install elevenlabs
```

Populate `.env`:
```
OPENAI_API_KEY=sk-...
XI_API_KEY=...      # optional
XI_VOICE_ID=...     # optional voice preset
SPEAKER_ROLE=...    # optional prompt context
```
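These keys are read from the environment at startup; a minimal sketch of loading them with `python-dotenv` (already in `requirements.txt`), with illustrative variable names:

```python
# Sketch: read the .env keys via python-dotenv.
import os

from dotenv import load_dotenv

load_dotenv()  # reads .env from the current working directory
openai_key = os.getenv("OPENAI_API_KEY")  # required for translation
xi_key = os.getenv("XI_API_KEY")          # optional; None disables TTS
```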
Run the app:

```bash
source venv/bin/activate
python main.py
```

Window controls:
- `q` – quit
- `s` – save snapshot of current landmarks
- `p` – toggle console landmark logging (set `detector.console_landmark_logging = True` to enable output)
- `t` – toggle translation log lines
- `r` – toggle raw gesture log lines
- `c` – clear sentences and queues
- `n` – force sentence completion
- `x` – cancel audio & clear playback queue
- `SPACE` – print detailed frame analysis
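These are ordinary OpenCV key events; a hedged sketch of a dispatch helper (the detector method names are placeholders, except `console_landmark_logging`, which is named above):

```python
# Illustrative OpenCV key dispatch; detector method names are placeholders.
import cv2

def handle_key(detector) -> bool:
    """Poll one key event; return False when the app should quit."""
    key = cv2.waitKey(1) & 0xFF
    if key == ord("q"):
        return False
    if key == ord("n"):
        detector.force_sentence_completion()  # placeholder name
    elif key == ord("x"):
        detector.cancel_audio_playback()      # placeholder name
    elif key == ord("p"):
        detector.console_landmark_logging = not detector.console_landmark_logging
    return True
```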
- Webcam frames are mirrored and processed by MediaPipe Hands (and optional Holistic for face points).
- Gesture smoothing (window=9, consensus=0.6, cooldown=0.5 s, hold=0.25 s) filters jitter before emitting words.
- If no gestures change for 5 s, or if no hands are detected, the sentence finalizes and queues for translation and TTS.
- The OpenAI prompt reshapes gesture tokens into a natural sentence; empty/unknown input results in silence.
- ElevenLabs (if configured) synthesizes speech asynchronously and plays it via the audio worker.
- DeepFace (if enabled) updates the overlay every `emotion_interval` frames and logs the dominant emotion.
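For intuition, the smoothing step amounts to majority voting over a sliding window plus a cooldown. A minimal, self-contained sketch (hypothetical class; the real logic lives in `src/hand_landmarks/camera_gesture_detection.py`):

```python
# Illustrative windowed-consensus voting, not the project's exact code.
# A gesture is emitted only when it fills >= 60% of the last 9 frames
# and the cooldown since the previous emission has elapsed.
import time
from collections import Counter, deque

class ConsensusSmoother:
    def __init__(self, window=9, consensus=0.6, cooldown=0.5):
        self.window = deque(maxlen=window)
        self.consensus = consensus
        self.cooldown = cooldown
        self.last_emit = 0.0

    def update(self, gesture):
        """Feed one per-frame gesture label; return a word to emit, or None."""
        self.window.append(gesture)
        label, votes = Counter(self.window).most_common(1)[0]
        if votes / self.window.maxlen < self.consensus:
            return None  # no stable majority yet
        if time.monotonic() - self.last_emit < self.cooldown:
            return None  # still cooling down from the last emission
        self.last_emit = time.monotonic()
        return label
```

Feeding it one label per frame emits a word only once the window agrees and the cooldown has passed, which is why brief flickers between gestures never reach the sentence buffer.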
Edit `src/hand_landmarks/gesture_translator.py` to embed contextual prompts. Example:

```python
CUSTOM_CONTEXT = """
Context:
- Speaker: emergency triage nurse
- Preferred terms: patient, appointment, medication
- Custom sign overrides: thumb-to-cheek → "patient"
"""
system_prompt = build_system_prompt(CUSTOM_CONTEXT)
```

Pass new context strings into `fix_sentence` when needed.
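A hedged usage sketch (the real signature is defined in `gesture_translator.py` and may differ):

```python
# Hypothetical call: assumes fix_sentence accepts the raw gesture tokens plus
# an optional context string; check gesture_translator.py for the real API.
from src.hand_landmarks.gesture_translator import fix_sentence

sentence = fix_sentence("me want water", context=CUSTOM_CONTEXT)
```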
You can adjust the following in `RealTimeGestureDetector.__init__`:

- `gesture_window_size`
- `gesture_min_consensus`
- `gesture_cooldown_seconds`
- `gesture_transition_min_frames`
- `gesture_confidence_threshold`
- `gesture_margin_threshold`
- `gesture_pending_hold_seconds`
Smaller values make recognition more responsive; larger ones reject noise.
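As a hedged example, a responsiveness-oriented tuning might look like this (the import path and attribute-style assignment are assumptions; the canonical place to change these values is `__init__` itself):

```python
# Hypothetical tuning sketch: parameter names come from the list above,
# but the import path and post-construction assignment are assumptions.
from src.hand_landmarks.camera_gesture_detection import RealTimeGestureDetector

detector = RealTimeGestureDetector()
detector.gesture_window_size = 7            # shorter window -> faster response
detector.gesture_min_consensus = 0.7        # stricter voting -> fewer false emits
detector.gesture_cooldown_seconds = 0.4
detector.gesture_pending_hold_seconds = 0.2
```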
- Controlled by `self.emotion_detection_enabled` (auto-true if DeepFace imports).
- Runs every `emotion_interval` frames (default 15).
- Shows a bounding box with the dominant emotion label and confidence.
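A hedged sketch of that frame-interval gating, assuming a recent DeepFace release where `DeepFace.analyze` returns a list of result dicts with `"dominant_emotion"` and `"region"` keys:

```python
# Sketch of every-N-frames emotion sampling with a cached result in between.
from deepface import DeepFace

class EmotionSampler:
    def __init__(self, emotion_interval=15):
        self.emotion_interval = emotion_interval
        self.last = None  # cached (label, face_box) between analyses

    def update(self, frame, frame_idx):
        if frame_idx % self.emotion_interval == 0:
            results = DeepFace.analyze(
                frame, actions=["emotion"], enforce_detection=False
            )
            if results:
                self.last = (results[0]["dominant_emotion"], results[0]["region"])
        return self.last
```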
- Webcam not detected: ensure no other app uses the camera; try `camera_id=1`.
- Sentence never ends: stay still or remove hands for ≥5 s; unknown gestures no longer reset the timer.
- OpenAI/permission errors: check `.env` and verify network access.
- DeepFace protobuf crash: reinstall protobuf 3.20.x (`pip install protobuf==3.20.3`).
- ElevenLabs missing: install the SDK and set `XI_API_KEY`, or disable autoplay (`detector.enable_auto_play(False)`).
- `src/hand_landmarks/camera_gesture_detection.py` – main application, smoothing, translation pipeline, emotion & TTS wrappers.
- `src/hand_landmarks/gesture_recognition.py` – detailed gesture analytics utilities.
- `src/hand_landmarks/gesture_interpreters.py` – rule-based gesture classifiers.
- `src/hand_landmarks/gesture_translator.py` – OpenAI translator/prompt helper.
- `src/hand_landmarks/hand_landmarks_detector.py` – MediaPipe Hands wrapper.
- `main.py` – entry point.
MIT License – see LICENSE.