This code uses the design of the following paper, which is implemented in this Facebook AI Research repository.
Dario Pavllo, Christoph Feichtenhofer, David Grangier, and Michael Auli. 3D human pose estimation in video with temporal convolutions and semi-supervised training. In Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
The purpose of this repo is to reformat the "inference in the wild" section from the original code to run in real time from a camera stream.
The acapture library is used for fast image reading from standard webcams. Each frame is passed to a neural network from Facebook's Detectron2 library to generate predictions of the 2D joint positions. A configurable number of these estimates is saved in a sliding window, which a second neural network uses to predict the 3D joint positions for a given frame. Typically, a larger window results in more accurate predictions. The video below shows this process.
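The sliding window of 2D estimates can be sketched with a fixed-length buffer. Note that `predict_2d`, the `step` helper, and the window size of 27 below are illustrative stand-ins, not the repository's actual API; the real 2D predictions come from Detectron2.

```python
from collections import deque

import numpy as np

WINDOW_SIZE = 27  # assumed value; set via the window_size variable in the real program

# Fixed-length buffer: appending past maxlen silently drops the oldest estimate.
window = deque(maxlen=WINDOW_SIZE)

def predict_2d(frame):
    """Hypothetical stand-in for the Detectron2 keypoint predictor.
    Returns a (17, 2) array of 2D joint positions."""
    return np.zeros((17, 2))

def step(frame):
    """Process one camera frame. Returns a (WINDOW_SIZE, 17, 2) array
    ready for the temporal 3D network once the buffer is full, else None."""
    window.append(predict_2d(frame))
    if len(window) < WINDOW_SIZE:
        return None  # not enough temporal context yet
    return np.stack(window)
```

The `deque(maxlen=...)` buffer keeps the memory footprint constant while always holding the most recent `WINDOW_SIZE` estimates.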
To visualize the predictions, blitting in matplotlib is used to efficiently plot the 3D joint models. The network is intended to operate with a single person in frame.
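A minimal blitting sketch is shown below; this is an illustration of the technique, not the repository's `viz.py`. The static 3D axes are rendered once and cached, and each frame only the skeleton artist is redrawn and blitted. The headless `Agg` backend is used here so the sketch runs anywhere; an interactive backend is assumed for real-time display.

```python
import matplotlib
matplotlib.use("Agg")  # headless for this sketch; use an interactive backend live
import matplotlib.pyplot as plt
import numpy as np

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
ax.set_xlim(-1, 1); ax.set_ylim(-1, 1); ax.set_zlim(-1, 1)

# animated=True keeps the artist out of ordinary draws so we control it via blitting.
(line,) = ax.plot([], [], [], "o-", animated=True)

fig.canvas.draw()                                 # one full draw of the static scene
background = fig.canvas.copy_from_bbox(ax.bbox)   # cache everything that never changes

def update(joints):
    """Redraw only the skeleton: restore the cached background,
    draw the animated artist, and blit the changed region."""
    line.set_data(joints[:, 0], joints[:, 1])
    line.set_3d_properties(joints[:, 2])
    fig.canvas.restore_region(background)
    ax.draw_artist(line)
    fig.canvas.blit(ax.bbox)

update(np.random.rand(17, 3))
```

Because only one artist is redrawn per frame, blitting avoids re-rendering the axes, ticks, and grid, which is what makes real-time plotting feasible in matplotlib.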
The pretrained temporal model must be downloaded into the `model/` directory. Perform the following commands:

```
mkdir model
cd model
wget https://dl.fbaipublicfiles.com/video-pose-3d/pretrained_h36m_detectron_coco.bin
cd ..
```
Detectron2 must be installed and is only available for Linux or macOS. PyTorch must also be installed. Ensure that the PyTorch and Detectron2 versions are compatible and that the installations match your CUDA version.
All other dependencies can be installed by running the following commands, which also create a directory for the program's output files:

```
pip install -r requirements.txt
mkdir out_files
```
Run `python run.py` and the program will begin. The `window_size` variable sets the number of images fed to the temporal convolutional network. Open a new terminal session and run `python viz.py` to graph the predictions in real time, which will look like the video below. The program writes output files to `out_files/`, where `out_files/joints.npy` contains the final joint coordinate predictions.
The output contains 17 joints, each with 3 coordinates. The table below details the contents of the final output array. An example application of this code can be found here
Index | Location | Index | Location |
---|---|---|---|
0 | Tailbone | 9 | Nose |
1 | Right Hip | 10 | Head Crown |
2 | Right Knee | 11 | Left Shoulder |
3 | Right Foot | 12 | Left Elbow |
4 | Left Hip | 13 | Left Hand |
5 | Left Knee | 14 | Right Shoulder |
6 | Left Foot | 15 | Right Elbow |
7 | Mid Spine | 16 | Right Hand |
8 | Neck | - | - |
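The table above can be turned into a small lookup helper for reading `out_files/joints.npy`. The `joint` helper and the assumption that frames are stacked along the first axis of the saved array are illustrative, not part of the repository's API:

```python
import numpy as np

# Joint names in table order: index i of a frame is JOINT_NAMES[i].
JOINT_NAMES = [
    "Tailbone", "Right Hip", "Right Knee", "Right Foot",
    "Left Hip", "Left Knee", "Left Foot", "Mid Spine",
    "Neck", "Nose", "Head Crown", "Left Shoulder",
    "Left Elbow", "Left Hand", "Right Shoulder",
    "Right Elbow", "Right Hand",
]

def joint(frame, name):
    """Return the (x, y, z) coordinates of a named joint for one (17, 3) frame."""
    return frame[JOINT_NAMES.index(name)]

# After a run, something like this would load the predictions
# (assuming frames are stacked along the first axis):
#   joints = np.load("out_files/joints.npy")
#   head = joint(joints[-1], "Head Crown")
frame = np.zeros((17, 3))  # dummy frame standing in for real output
head = joint(frame, "Head Crown")
```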