Demo code for "Coarse-to-Fine Volumetric Prediction for Single-Image 3D Human Pose", CVPR 2017

Coarse-to-Fine Volumetric Prediction for Single-Image 3D Human Pose (Demo Code)

Georgios Pavlakos, Xiaowei Zhou, Konstantinos G. Derpanis, Kostas Daniilidis

This is the demo code for the paper Coarse-to-Fine Volumetric Prediction for Single-Image 3D Human Pose. Please follow the links to read the paper and visit the corresponding project page.

For the training code please visit this repository.

We provide code to test our model on Human3.6M. Please follow the instructions below to set up and use our code. The typical procedure is to 1) apply the ConvNet model using a Torch script from the command line, and then 2) run a MATLAB script (from the folder matlab) for visualization or evaluation. To run this code, make sure the following are installed:

1) Downloading models and data

We provide a Coarse-to-Fine Volumetric prediction model pretrained on Human3.6M. To get the model and other relevant data in the expected folders, please run the following script:


Also, if you want to evaluate our approach on the full set of Human3.6M test images, please run the following script to get all the relevant data (note that the download is over 8GB):


2) Evaluation on Human3.6M (sample)

We have provided a sample of Human3.6M images, following previous work. You can apply our model to this sample by running the command:

th main.lua h36m-sample

Then, to visualize the output, you can use the MATLAB script:


3) Evaluation on Human3.6M (full)

If you want to reproduce the results of our paper for Human3.6M, you need to download the full set of images we used for testing by running the download script mentioned in Section 1. These images are extracted from the videos of the original dataset and correspond to the images used for testing under the most common evaluation protocol. Please check the file:


for the correspondence of images with the original videos. The filename protocol we follow is:

S[subject number]_[Action Name].[Camera Name]_[Frame Number].jpg
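If you need to map image files back to subjects, actions, and cameras in your own tooling, the filename protocol above can be split with a regular expression. The following Python snippet is only a minimal sketch and is not part of this repository: the names `FrameName` and `parse_frame_name` are hypothetical, and the sample filename in the usage note (including its zero-padded frame number) is made up for illustration.

```python
import re
from typing import NamedTuple


class FrameName(NamedTuple):
    """Fields of a filename that follows the protocol above (hypothetical helper type)."""
    subject: int
    action: str
    camera: str
    frame: int


# S[subject number]_[Action Name].[Camera Name]_[Frame Number].jpg
# Assumption: the camera name contains neither '.' nor '_', and the
# frame number consists of digits only (possibly zero-padded).
_PATTERN = re.compile(r"^S(\d+)_(.+)\.([^._]+)_(\d+)\.jpg$")


def parse_frame_name(filename: str) -> FrameName:
    """Split a filename into subject, action, camera, and frame number."""
    match = _PATTERN.match(filename)
    if match is None:
        raise ValueError(f"filename does not follow the protocol: {filename!r}")
    subject, action, camera, frame = match.groups()
    return FrameName(int(subject), action, camera, int(frame))
```

For instance, `parse_frame_name("S1_Walking.12345678_000042.jpg")` yields subject 1, action `Walking`, camera `12345678`, and frame 42. A name that does not match the pattern raises `ValueError`, so malformed files are reported instead of being silently skipped.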

An example for Subject 5, performing action Eating (iteration 1), when we consider camera name '55011271' and frame 321, is:


Having downloaded the necessary images, to apply our model on the whole set of Human3.6M images, you can run:

th main.lua h36m

Then, to conclude the evaluation, you need to run the next MATLAB script:


This will write the results to a results.txt file.


If you find this code useful for your research, please consider citing the following paper:

  Title          = {Coarse-to-Fine Volumetric Prediction for Single-Image 3{D} Human Pose},
  Author         = {Pavlakos, Georgios and Zhou, Xiaowei and Derpanis, Konstantinos G and Daniilidis, Kostas},
  Booktitle      = {Computer Vision and Pattern Recognition (CVPR)},
  Year           = {2017}


This code closely follows the released code for the Stacked Hourglass networks by Alejandro Newell. If you use this code, please consider citing the respective paper.
