This project uses pretrained Stacked Hourglass and OpenPose pose detection models to analyze and provide feedback on choreography videos. A user can compare their own video against a reference video and receive a feedback report.
For this project, we are specifically interested in exploring human pose estimation with single-person videos. The project detects poses in a given video and a reference video, then uses the detected keypoints to quantify differences between the two and score how well the given video follows the poses of the original reference. Specifically, we create a video analysis pipeline that takes two single-person videos as input, compares the poses of the individuals in the videos using two metrics, and outputs the specific segments and frames with the largest differences between the two videos.
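The two metrics are not detailed in this section; as a rough illustration of the idea, a per-frame pose difference can be computed as the mean distance between corresponding detected keypoints. The function below is a hypothetical sketch, not the project's actual metric:

```python
import numpy as np

def frame_pose_distance(ref_keypoints, test_keypoints, conf_threshold=0.1):
    """Mean Euclidean distance between corresponding keypoints of one frame.

    ref_keypoints, test_keypoints: arrays of shape (num_joints, 3) holding
    (x, y, confidence) per joint, as produced by most 2D pose estimators.
    Joints with low confidence in either video are ignored.
    """
    ref = np.asarray(ref_keypoints, dtype=float)
    test = np.asarray(test_keypoints, dtype=float)
    valid = (ref[:, 2] > conf_threshold) & (test[:, 2] > conf_threshold)
    if not valid.any():
        return np.nan  # no reliable joints to compare in this frame
    diffs = ref[valid, :2] - test[valid, :2]
    return float(np.linalg.norm(diffs, axis=1).mean())
```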
See this Google Drive folder for an example project with inputs and outputs, as well as other reference videos created throughout the process of working on this project.
We assume that the reference video is longer than the test video: the offset between the videos and the generated audio are based on the reference video clip. Pose estimation models work best when the input contains a single individual who is centered in the frame and the video is cropped to that individual. Feedback is provided in the form of problem sections that score below the 25th percentile of all scores, plus the top n half-second sections with the lowest scores, where scores are computed by a specified metric.
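As an illustration of how such feedback could be derived from per-frame scores, here is a hedged sketch (not the exact implementation; the `fps` and `top_n` parameters are assumptions):

```python
import numpy as np

def find_problem_sections(frame_scores, fps=30, top_n=3):
    """Illustrative selection of low-scoring sections from per-frame scores.

    frame_scores: one similarity score per frame (higher = better match).
    Returns (frames_below_25th_percentile, worst_half_second_windows).
    """
    scores = np.asarray(frame_scores, dtype=float)

    # Frames scoring below the 25th percentile of all scores.
    threshold = np.percentile(scores, 25)
    below_threshold = np.where(scores < threshold)[0]

    # Average scores over non-overlapping half-second windows, then take
    # the top_n windows with the lowest averages.
    window = max(1, fps // 2)
    n_windows = len(scores) // window
    window_means = scores[:n_windows * window].reshape(n_windows, window).mean(axis=1)
    worst = np.argsort(window_means)[:top_n]
    worst_windows = [(int(w * window), int((w + 1) * window)) for w in sorted(worst)]

    return below_threshold, worst_windows
```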
The main script in `video_analyzer.py` shows an example run using Stacked Hourglass for inference. Creating a `Video_Analyzer` object requires a name for the choreography you are analyzing, two video filenames (one for the reference, one for the test), a base path where outputs are saved, and the desired frame rate.
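A minimal usage sketch (the argument order is an assumption; check `video_analyzer.py` for the exact constructor signature):

```python
from video_analyzer import Video_Analyzer

# Constructor arguments (order is assumed here): a name for the choreography,
# the reference and test video filenames, a base path for outputs,
# and the desired frame rate.
analyzer = Video_Analyzer("my_choreo", "reference.mp4", "test.mp4", "output/", 30)
```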
Call the `initial_processing_for_openpose` function of the `Video_Analyzer` class, found in `video_analyzer.py`, to generate the resulting audio and the trimmed reference video based on the offset between the test and reference videos.
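For example (assuming the method takes no required arguments; see the class definition for details):

```python
# Produces the extracted audio and the reference video trimmed according to
# the offset between the two clips.
analyzer.initial_processing_for_openpose()
```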
For the rest of the pipeline, we recommend downloading the `Pose_Detection_with_OpenPose_and_Video_Feedback.ipynb` notebook and running it on Google Colab. Upload the test video, reference video, and audio file to the Colab session, and make sure that the Runtime is set to GPU. Use the Colab notebook to generate predictions and output videos.
Here is an overview:
- Install OpenPose (~14 minutes)
- Run OpenPose on videos (~2 minutes for a 20 second video)
- Extract frame data (see the sketch after this list)
- Analyze scores
- Post-process (~3 minutes to generate the merged video and add audio)
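For reference, the frame data extraction step might look like the sketch below, assuming OpenPose was run with its `--write_json` option so that each frame produces a `*_keypoints.json` file (the directory layout here is illustrative):

```python
import glob
import json
import numpy as np

def load_openpose_keypoints(json_dir):
    """Load per-frame keypoints from OpenPose --write_json output.

    Each JSON file corresponds to one frame and contains a "people" list;
    "pose_keypoints_2d" is a flat [x, y, confidence, ...] array. Since the
    videos are single-person, only the first detected person is kept.
    """
    frames = []
    for path in sorted(glob.glob(f"{json_dir}/*_keypoints.json")):
        with open(path) as f:
            data = json.load(f)
        people = data.get("people", [])
        if people:
            keypoints = np.array(people[0]["pose_keypoints_2d"]).reshape(-1, 3)
        else:
            keypoints = None  # no person detected in this frame
        frames.append(keypoints)
    return frames
```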
See the OpenPose and Stacked Hourglass papers and implementation GitHub code below.
Specific libraries used: MoviePy for editing audio and video clips, and align-videos-by-sound for aligning the test and reference videos.
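As a minimal sketch of the MoviePy portion (MoviePy 1.x API; the filenames and offset value are illustrative, and the actual logic lives in `video_analyzer.py`):

```python
from moviepy.editor import AudioFileClip, VideoFileClip

offset = 1.5  # illustrative offset in seconds, e.g. found by align-videos-by-sound

# Trim the reference video to start at the offset and attach the shared audio track.
reference = VideoFileClip("reference.mp4").subclip(offset)
audio = AudioFileClip("reference_audio.mp3")
trimmed = reference.set_audio(audio)
trimmed.write_videofile("reference_trimmed.mp4")
```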