A simple computer vision project that translates human expressions and poses into capybara images. The system uses MediaPipe for pose detection and a machine learning classifier to recognize different poses and expressions in real-time through your webcam.
This project detects various human poses and expressions (Normal, Nervous, Relaxed, Sad, Happy, and Approval) and displays corresponding capybara reference images when a pose is recognized. It's built using OpenCV for video processing, MediaPipe for pose detection, and scikit-learn for classification.
- Real-time pose and expression detection from webcam feed
- Classification of six different poses: Normal, Nervous, Relaxed, Sad, Happy, and Approval
- Visual feedback with reference capybara images
- Pose landmark visualization on the video feed
- Confidence scores for each detected pose
- Python 3.x
- Webcam
- Clone or download this repository
- Install the required dependencies:

  ```
  pip install -r requirements.txt
  ```

The required packages are:
- opencv-python
- mediapipe
- numpy
- scikit-learn
- joblib
To set up and run the detector from scratch, follow these steps in order:
- Collect training data:
  ```
  python -m data.data_collector
  ```

  This will open a webcam window where you can collect pose data. Use the following controls:

  - Press `0`-`6` to set the current pose type (Normal, Nervous, Relaxed, Sad, Happy, Approval, or Unknown)
  - Press `a` to toggle auto collection mode
  - Press `s` to save collected data
  - Press `l` to load previously saved data
  - Press `t` to train the model with collected data
  - Press `ESC` to quit
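As an illustration, the keyboard controls above could be dispatched with a small handler like the following. This is a hypothetical sketch, not the project's actual code: the `handle_key` function, the `state` dictionary, and its keys are all illustrative names.

```python
# Hypothetical sketch of dispatching the collector's keyboard controls.
# Key codes come from cv2.waitKey(); ESC is code 27.
POSE_NAMES = ["Normal", "Nervous", "Relaxed", "Sad", "Happy", "Approval", "Unknown"]

def handle_key(key, state):
    """Update collector state for one key press; return False to quit."""
    if ord("0") <= key <= ord("6"):
        state["pose"] = POSE_NAMES[key - ord("0")]   # select pose label 0-6
    elif key == ord("a"):
        state["auto_collect"] = not state["auto_collect"]
    elif key == ord("s"):
        state["action"] = "save"
    elif key == ord("l"):
        state["action"] = "load"
    elif key == ord("t"):
        state["action"] = "train"
    elif key == 27:  # ESC
        return False
    return True
```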
- Train the model:
  ```
  python train_model.py
  ```

  This will train the classifier using the collected data and save the model to `pose_data/pose_model.joblib`.
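In essence, this training step amounts to fitting a scikit-learn classifier and persisting it with joblib. Here is a self-contained sketch of that fit-dump-reload cycle; the random features stand in for real collected pose data, and a temporary directory stands in for `pose_data/`:

```python
import tempfile
from pathlib import Path

import joblib
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Illustrative stand-in for collected data: 100 samples of
# 10 geometric features each, labeled 0-5 for the six poses.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
y = rng.integers(0, 6, size=100)

model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# The project saves to pose_data/pose_model.joblib; a temp dir is
# used here so the sketch runs anywhere.
model_path = Path(tempfile.mkdtemp()) / "pose_model.joblib"
joblib.dump(model, model_path)

# main.py can later reload the model the same way.
reloaded = joblib.load(model_path)
```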
- Run the detector:
  ```
  python main.py
  ```

The program will:
- Open your webcam
- Display the video feed with pose landmarks overlaid
- Show a reference capybara image in a separate window when a pose is detected
- Display the detected pose name and confidence score on the video feed
Press `ESC` to quit the program.
- `main.py` - Main application that runs the pose detection and classification
- `detector/pose_detector.py` - Handles MediaPipe pose detection and landmark extraction
- `classifier/pose_classifier.py` - Machine learning classifier for pose recognition
- `train_model.py` - Script to train the pose classification model
- `images/` - Reference capybara images for each pose type
- `pose_data/` - Saved model and training data
- `screenshots/` - Example screenshots of the application in action
The system uses MediaPipe's Holistic model to detect pose landmarks, face landmarks, and hand landmarks from the video feed. These landmarks are then processed to extract geometric features such as:
- Shoulder width and tilt
- Arm extension and angles
- Hand positions relative to the body
- Torso lean
- Mouth curvature and width
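A few of these geometric features can be computed directly from landmark coordinates. The sketch below is illustrative only: the `shoulder_features` function and the named landmark keys are assumptions (MediaPipe's Pose model actually indexes landmarks numerically, e.g. 11/12 for the shoulders and 23/24 for the hips).

```python
import numpy as np

def shoulder_features(landmarks):
    """Compute shoulder width, shoulder tilt, and torso lean from
    normalized (x, y) landmark coordinates (image y increases downward)."""
    ls = np.asarray(landmarks["left_shoulder"])
    rs = np.asarray(landmarks["right_shoulder"])
    lh = np.asarray(landmarks["left_hip"])
    rh = np.asarray(landmarks["right_hip"])

    width = np.linalg.norm(ls - rs)                  # shoulder width
    tilt = np.arctan2(rs[1] - ls[1], rs[0] - ls[0])  # shoulder tilt (radians)

    # Torso lean: angle of the hip-midpoint -> shoulder-midpoint line
    # relative to vertical; 0 when standing upright.
    mid_sh, mid_hip = (ls + rs) / 2, (lh + rh) / 2
    lean = np.arctan2(mid_sh[0] - mid_hip[0], mid_hip[1] - mid_sh[1])
    return np.array([width, tilt, lean])
```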
These features are fed into a Random Forest classifier that has been trained to recognize the different pose categories. When a pose is detected with sufficient confidence, the corresponding capybara reference image is displayed.
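The classify-with-confidence step can be sketched with scikit-learn's `RandomForestClassifier` and `predict_proba`. The 0.6 threshold, the `classify` helper, and the synthetic clustered features below are all illustrative assumptions, not the project's actual code:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

POSES = ["Normal", "Nervous", "Relaxed", "Sad", "Happy", "Approval"]

# Synthetic stand-in for extracted geometric features: one
# well-separated cluster per pose, so the sketch trains instantly.
rng = np.random.default_rng(42)
X = np.concatenate([rng.normal(loc=3.0 * i, scale=0.3, size=(30, 5))
                    for i in range(6)])
y = np.repeat(np.arange(6), 30)

clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

def classify(features, threshold=0.6):
    """Return (pose_name, confidence), or (None, confidence) when the
    classifier is not confident enough to show a reference image."""
    probs = clf.predict_proba(np.asarray(features).reshape(1, -1))[0]
    best = int(np.argmax(probs))
    conf = float(probs[best])
    return (POSES[best] if conf >= threshold else None, conf)
```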
Make sure you have good lighting and are positioned so your full upper body is visible in the camera frame for best detection results. The model works best when you're facing the camera directly.
Here are some example screenshots showing the detector in action: