A real-time, touchless piano played entirely through hand gestures captured by a standard webcam, with no special hardware required. Built with Python using computer vision, real-time signal processing, and sample-based audio playback.
Point a webcam at your hands and play ten notes spanning C4 to E5 (an octave plus a major third) by flicking individual fingers downward in the air.
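
| Left pinky | Left ring | Left middle | Left index | Left thumb | Right thumb | Right index | Right middle | Right ring | Right pinky |
|---|---|---|---|---|---|---|---|---|---|
| C4 | D4 | E4 | F4 | G4 | A4 | B4 | C5 | D5 | E5 |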
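
As a sketch, this layout maps naturally onto a lookup table keyed by hand and finger. The name `NOTE_MAP` and the `(hand, finger)` key format are hypothetical; `virtual_piano.py` may organize the mapping differently.

```python
# Hypothetical finger-to-note lookup mirroring the layout table.
# Keys are (hand, finger) tuples; this naming scheme is an assumption.
FINGERS = ["pinky", "ring", "middle", "index", "thumb"]
LEFT_NOTES = ["C4", "D4", "E4", "F4", "G4"]
RIGHT_NOTES = ["A4", "B4", "C5", "D5", "E5"]

NOTE_MAP = {}

# Left hand: pitch rises from pinky to thumb.
for finger, note in zip(FINGERS, LEFT_NOTES):
    NOTE_MAP[("left", finger)] = note

# Right hand: the finger order is mirrored, so pitch rises from thumb to pinky.
for finger, note in zip(reversed(FINGERS), RIGHT_NOTES):
    NOTE_MAP[("right", finger)] = note

if __name__ == "__main__":
    print(NOTE_MAP[("left", "thumb")], NOTE_MAP[("right", "thumb")])  # G4 A4
```

The two thumbs land on adjacent notes (G4 and A4), so the hands meet in the middle of the range as they would on a real keyboard.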

    git clone https://github.com/itsJvmes/Virtual-Piano.git
    cd Virtual-Piano
    python3 -m venv venv && source venv/bin/activate
    pip install -r requirements.txt
    ./run.sh

Optional: calibrate the desk edge before the first run:

    python src/calibration.py

Keep your hands out of frame. The script captures 100 frames, detects the desk's horizontal edge with a Hough Line Transform, and saves the result to `desk_edge_calibration.json`.
Always run from the project root, not from inside `src/`.
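
The desk-edge calibration rests on Hough voting: every edge pixel votes for all lines that could pass through it, and the line with the most votes wins. OpenCV provides this as `cv2.HoughLines` and `cv2.HoughLinesP`, typically run on an edge map from `cv2.Canny`. The pure-Python sketch below only illustrates the voting idea under stated assumptions: it takes pre-extracted edge points instead of camera frames, searches only lines within ±5° of horizontal, keeps the median across frames, and writes a hypothetical `desk_edge_y` key. None of these details are confirmed by `calibration.py`.

```python
import json
import math
from collections import Counter
from statistics import median


def hough_horizontal_edge(edge_points, width, max_tilt_deg=5):
    """Find the strongest near-horizontal line among edge pixels.

    A line in normal form satisfies rho = x*cos(theta) + y*sin(theta);
    theta = 90 degrees is a perfectly horizontal line y = rho. Each edge
    point votes for one (theta, rho) bin per candidate angle. Returns the
    line's y-coordinate at the image's center column, or None if there
    are no edge points.
    """
    thetas = [math.radians(90 + d) for d in range(-max_tilt_deg, max_tilt_deg + 1)]
    votes = Counter()
    for x, y in edge_points:
        for t_idx, theta in enumerate(thetas):
            rho = round(x * math.cos(theta) + y * math.sin(theta))
            votes[(t_idx, rho)] += 1
    if not votes:
        return None
    (t_idx, rho), _count = votes.most_common(1)[0]
    theta = thetas[t_idx]
    cx = width / 2
    # Solve rho = cx*cos(theta) + y*sin(theta) for y at the center column.
    return (rho - cx * math.cos(theta)) / math.sin(theta)


def calibrate(frames_of_edge_points, width, path="desk_edge_calibration.json"):
    """Detect the edge in every frame, keep the median, and save it as JSON.

    Taking the median (an assumption) guards against a few frames where
    noise wins the vote.
    """
    ys = []
    for points in frames_of_edge_points:
        y = hough_horizontal_edge(points, width)
        if y is not None:
            ys.append(y)
    if not ys:
        raise RuntimeError("no desk edge found in any frame")
    desk_y = median(ys)
    with open(path, "w") as f:
        json.dump({"desk_edge_y": desk_y}, f)  # hypothetical key name
    return desk_y
```

Restricting the angle range keeps the accumulator small and ignores vertical structure such as monitor or table legs, which a full 0° to 180° sweep would also pick up.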
On every video frame, the pipeline:
- Captures the webcam frame and mirrors it horizontally so the view feels like a reflection
- Passes the frame to the MediaPipe Hand Landmarker (Tasks API; detection/presence/tracking confidence 0.8/0.8/0.9) to extract all 21 landmarks per hand
- Applies exponential smoothing (α = 0.3) to each fingertip's position to reduce frame-to-frame landmark jitter
- Computes each finger's relative velocity (its downward speed minus the hand's average downward speed), isolating individual finger presses from whole-hand movement
- Triggers a note when `relative_velocity > threshold` and that finger's cooldown has elapsed
- Plays the corresponding `.mp3` through the Pygame mixer
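
The smoothing and velocity steps can be sketched per fingertip as follows. This is an illustration, not the project's code: the class and function names are hypothetical, and seeding the average with the first observation is an assumption. With α = 0.3 the current frame contributes 30% and the running estimate 70%, trading a little latency for steadier positions.

```python
class FingertipSmoother:
    """Exponential moving average (EMA) of one fingertip's (x, y) position.

    smoothed = alpha * current + (1 - alpha) * previous_smoothed
    """

    def __init__(self, alpha=0.3):
        self.alpha = alpha
        self.pos = None  # last smoothed (x, y); None until the first sample

    def update(self, x, y):
        """Blend a new raw landmark position into the estimate and return it."""
        if self.pos is None:
            self.pos = (x, y)  # seed with the first observation (assumption)
        else:
            px, py = self.pos
            a = self.alpha
            self.pos = (a * x + (1 - a) * px, a * y + (1 - a) * py)
        return self.pos


def downward_velocity(prev_y, curr_y):
    """Downward speed in px/frame. Image y grows downward, so a positive
    value means the fingertip moved down since the previous frame."""
    return curr_y - prev_y


if __name__ == "__main__":
    s = FingertipSmoother()
    prev_y = s.update(320, 200)[1]
    curr_y = s.update(320, 220)[1]  # raw jump of 20 px
    print(round(downward_velocity(prev_y, curr_y), 2))  # 6.0
```

A 20 px raw jump registers as 6 px/frame after smoothing. Assuming velocity is computed from the smoothed positions, as the step order suggests, the thresholds below should be read as smoothed speeds.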

| Parameter | Value | Purpose |
|---|---|---|
| Velocity threshold (fingers) | > 5 px/frame | Minimum downward speed to register a press |
| Velocity threshold (thumb) | > 3 px/frame | Lower because the thumb moves at an angle |
| Release threshold | < 1 px/frame | Reset press state when the finger stops moving |
| Cooldown | 0.25 s | Prevents double-triggering on the same finger |
| Smoothing alpha | 0.3 | Weight given to the current frame in the EMA |
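
One way the thresholds, release level, and cooldown above could combine is a small per-finger state machine. The sketch is hypothetical: the class name, the injectable `clock`, and the order of the checks are assumptions. A finger fires once when its relative velocity crosses its threshold, stays latched until the velocity falls below the release level, and cannot fire again until the cooldown has elapsed.

```python
import time

FINGER_THRESHOLD = 5.0   # px/frame, from the table
THUMB_THRESHOLD = 3.0    # px/frame, lower for the angled thumb motion
RELEASE_THRESHOLD = 1.0  # px/frame, below this the press is reset
COOLDOWN_S = 0.25        # seconds between triggers on the same finger


class PressDetector:
    """Hypothetical per-finger press detector with latch and cooldown."""

    def __init__(self, is_thumb=False, clock=time.monotonic):
        self.threshold = THUMB_THRESHOLD if is_thumb else FINGER_THRESHOLD
        self.clock = clock  # injectable so the logic can be tested without sleeping
        self.pressed = False
        self.last_trigger = float("-inf")

    def update(self, rel_velocity):
        """Feed one frame's relative velocity; return True if a note should play."""
        now = self.clock()
        if self.pressed:
            if rel_velocity < RELEASE_THRESHOLD:
                self.pressed = False  # finger has stopped: re-arm
            return False
        if rel_velocity > self.threshold and now - self.last_trigger >= COOLDOWN_S:
            self.pressed = True
            self.last_trigger = now
            return True
        return False
```

Latching on the release threshold stops one long flick from firing on every frame it stays above the threshold, while the cooldown guards against a second trigger from a quick bounce.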
- Hand skeleton — all 21 joints drawn per hand; left hand in blue, right in orange
- Fingertip circles — green when pressed, hand color otherwise
- Flash ring — expanding ring animation on each note trigger (fades over 0.4 s)
- Piano keyboard strip — 10-key strip at the bottom lights up on press
- Note labels — note name next to each fingertip with a dark outline for readability
- FPS counter — top-left corner
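
The 0.4 s flash ring amounts to mapping elapsed time to a radius and an opacity. In this sketch, linear growth and a linear fade are assumptions, and `base_radius` and `max_growth` are hypothetical pixel values.

```python
FLASH_DURATION_S = 0.4  # fade duration from the feature list


def flash_ring(elapsed_s, base_radius=12, max_growth=24):
    """Return (radius, opacity) for a flash ring `elapsed_s` seconds after a
    note trigger, or None when the animation is not active.

    The ring grows linearly from base_radius to base_radius + max_growth
    while opacity falls linearly from 1.0 toward 0.0.
    """
    if elapsed_s < 0 or elapsed_s >= FLASH_DURATION_S:
        return None
    progress = elapsed_s / FLASH_DURATION_S  # 0.0 at trigger, approaching 1.0
    radius = base_radius + max_growth * progress
    opacity = 1.0 - progress
    return radius, opacity
```

To draw a semi-transparent ring in OpenCV, one common approach is to draw it on a copy of the frame with `cv2.circle` and blend the copy back with `cv2.addWeighted`, using the returned opacity as the blend weight.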

Project structure:

- `src/`
  - `virtual_piano.py`: main loop (webcam → detection → velocity → audio + render)
  - `sound_player.py`: Pygame mixer wrapper for loading and playing `.mp3` notes
  - `calibration.py`: standalone desk edge detection tool (run once, offline)
- `resources/`
  - `sounds/`: one `.mp3` per note (C4 D4 E4 F4 G4 A4 B4 C5 D5 E5)
  - `hand_landmarker.task`: pre-bundled MediaPipe model

| Library | Role |
|---|---|
| `opencv-python` | Webcam capture, frame rendering, Hough Line Transform |
| `mediapipe` | Hand landmark detection (Tasks API) |
| `numpy` | Frame composition and signal math |
| `pygame` | Audio playback |
Relative velocity instead of raw velocity — subtracting the hand's mean downward speed from each finger's speed eliminates false triggers caused by whole-hand movement (e.g., repositioning over the keyboard).
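
A worked example makes the cancellation concrete. The sketch assumes the hand's speed is the mean of its five fingertip velocities (the README does not say which landmarks are averaged); `relative_velocities` is a hypothetical helper.

```python
def relative_velocities(tip_velocities):
    """Return each fingertip's downward velocity minus the hand's mean.

    tip_velocities: dict mapping finger name -> downward velocity (px/frame).
    Averaging the fingertips as the hand's speed is an assumption.
    """
    hand_v = sum(tip_velocities.values()) / len(tip_velocities)
    return {finger: v - hand_v for finger, v in tip_velocities.items()}


if __name__ == "__main__":
    fingers = ["thumb", "index", "middle", "ring", "pinky"]

    # Whole hand moves down 8 px/frame: every relative velocity is 0.
    moving_hand = {f: 8.0 for f in fingers}
    print(relative_velocities(moving_hand))

    # Index flicks at 14 px/frame while the rest drift at 2 px/frame:
    # mean = 4.4, so index ≈ 9.6 (above 5, a press) and the others ≈ -2.4.
    flick = {f: 2.0 for f in fingers}
    flick["index"] = 14.0
    print({f: round(v, 1) for f, v in relative_velocities(flick).items()})
```

Because the flicking finger is included in the mean, its relative speed is 4/5 of its lead over the other fingers (9.6 rather than 12 in this example).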
Handedness label correction — cv2.flip mirrors the frame, which inverts MediaPipe's LEFT/RIGHT labels. The labels are swapped after detection so the correct notes play on each hand.
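
The swap itself is a small mapping. This sketch assumes labels arrive as the `Left`/`Right` category names that MediaPipe's Tasks API typically reports; matching is case-insensitive so the `LEFT`/`RIGHT` spelling used above also works. The function name is hypothetical.

```python
def correct_handedness(label):
    """Swap a MediaPipe handedness label to compensate for the mirrored frame.

    Per the README, flipping the frame with cv2.flip inverts the labels,
    so "Left" is mapped to "Right" and vice versa.
    """
    swapped = {"left": "Right", "right": "Left"}
    key = label.lower()
    if key not in swapped:
        raise ValueError(f"unexpected handedness label: {label!r}")
    return swapped[key]


if __name__ == "__main__":
    print(correct_handedness("Left"))   # Right
    print(correct_handedness("RIGHT"))  # Left
```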
High-confidence MediaPipe thresholds — detection/presence/tracking set to 0.8/0.8/0.9 (vs the default 0.5) to reduce ghost detections at the cost of slightly slower acquisition.
Offline calibration — desk edge detection runs once as a separate script and stores the result to JSON, keeping the main loop free of setup logic.