c2f-3dhm-human-caffe

This is a Caffe reimplementation of Coarse-to-Fine Volumetric Prediction for Single-Image 3D Human Pose.

Screenshots of evaluation on the test set (d2 = 16, 32, 64; random subset or full test set) are in figs/.

External links (closely related projects)

I1

I1 with group normalization, batch size = 1

News under the headline

Reaches 67.1 mm MPJPE on entire test set!

Overture

I write C++ faster than Python.
I write faster C++ than Python.
I know C++ / Caffe is not easy to understand.
People don't like Netscape Browser or iPhone 4s any more. hmmmmmm.
People tend to use Hourglass for human pose while powerful ResNet is enough. hmmmmmmmmmmmmmm.

Your briefing

2-stage Hourglass (d1 = 1, d2 = 16/32/64) w/ batch size = 3 🌝 🌚 😈

Exquisite ResNet w/ integral coming up soon. 💪

Caffe Hourglass is imported from GNet-pose. Many thanks!

About comprehensive readme:

code.pdf provides details about custom layers.
data.pdf provides details about data format etc.
prototxt.pdf describes the training/testing pipeline as configured in the prototxt files.

Environment

Ubuntu / Windows

For Ubuntu, I used two 12 GB TITAN Xp GPUs. For Windows: TODO

SSD

You'll need SSD for online data loading.

General Structure

${POSE_ROOT}
+-- caffe_code
+-- data
+-- figs
+-- models
+-- training
+-- testing
+-- README.md

Installation

Install Caffe from the GNet Caffe repository.
I have developed a myriad of layers. Code structure is

${POSE_ROOT}
|-- caffe_code
`-- |-- include
    `-- |-- caffe
        |   |-- deep_human_model_layers.hpp
        |   |   | ### This includes operations about 2d/3d heatmap /integral / augmentation / local <-> global transformation etc.
        |   |-- h36m.h
        |   |   | ### This includes definitions of joint / part / bone (h36m 32 joints / usable 16 joints / c2f 17 joints etc.)
        |   |-- operations.hpp 
        |   |   | ### This includes operations w.r.t scalar / vector / fetch file / output data.
`-- |-- src
    `-- |-- caffe
        |   |-- layers
        |   |   |-- DeepHumanModel
        |   |   |   |-- deep_human_model_argmax_2d_hm_layer.cpp 
        |   |   |   |-- ### This takes argmax operation on 2d heatmap 
        |   |   |   |-- deep_human_model_convert_2d_layer.cpp 
        |   |   |   |-- ### h36m provides full 32 joints, of which we only care about 16. Conversion from 16x2 <-> 32x2
        |   |   |   |-- deep_human_model_convert_3d_layer.cpp 
        |   |   |   |-- ### Conversion from 16x3 <-> 32x3
        |   |   |   |-- deep_human_model_convert_depth_layer.cpp 
        |   |   |   |-- ### Conversion from root-relative camera coordinate <-> [-1, 1] normalized depth
        |   |   |   |-- deep_human_model_gen_3d_heatmap_in_more_detail_v3_layer.cpp 
        |   |   |   |-- ### Generate ground truth for 3d heatmap. Closely follows c2f Torch code.
        |   |   |   |-- deep_human_model_h36m_cha_gen_joint_fr_xyz_heatmap_layer.cpp 
        |   |   |   |-- ### Argmax operation on 3d heatmap
        |   |   |   |-- deep_human_model_h36m_gen_aug_3d_layer.cpp 
        |   |   |   |-- ### Generate augmented 3d ground truth according to augmented 2d gt and 3d gt
        |   |   |   |-- deep_human_model_h36m_gen_pred_mono_3d_layer.cpp 
        |   |   |   |-- ### 2.5D -> 3D camera frame coordinate
        |   |   |   |-- deep_human_model_integral_vector_layer.cpp 
        |   |   |   |-- ### \sum_{i=0}^{D-1} probability * position
        |   |   |   |-- deep_human_model_integral_x_layer.cpp 
        |   |   |   |-- ### Integral along X axis
        |   |   |   |-- deep_human_model_integral_y_layer.cpp 
        |   |   |   |-- ### Integral along Y axis
        |   |   |   |-- deep_human_model_integral_z_layer.cpp 
        |   |   |   |-- ### Integral along Z axis
        |   |   |   |-- deep_human_model_norm_3d_hm_layer.cpp 
        |   |   |   |-- ### Normalize 3D heatmap responses to make them sum up to 1.0
        |   |   |   |-- deep_human_model_normalization_response_v0_layer.cpp 
        |   |   |   |-- ### 2D heatmap normalization
        |   |   |   |-- deep_human_model_numerical_coordinate_regression_layer.cpp 
        |   |   |   |-- ### Integral over normalized 2D heatmap -> (x, y)
        |   |   |   |-- deep_human_model_output_heatmap_sep_channel_layer.cpp 
        |   |   |   |-- ### Output heatmap of different joints to different folders
        |   |   |   |-- deep_human_model_output_joint_on_skeleton_map_h36m_layer.cpp 
        |   |   |   |-- ### Plot predicted joints on raw image
        |   |   |   |-- deep_human_model_softmax_3d_hm_layer.cpp 
        |   |   |   |-- ### Softmax normalization on 3d heatmap
        |   |   |   |-- deep_human_model_softmax_hm_layer.cpp 
        |   |   |   |-- ### Softmax normalization on 2d heatmap
        |   |   |-- Operations
        |   |   |   |-- adaptive_weight_euc_loss_layer.cpp
        |   |   |   |-- ### Adaptive weight controlling on different euclidean regression loss
        |   |   |   |-- add_vector_by_constant_layer.cpp
        |   |   |   |-- ### Add each element of vector by a scalar 
        |   |   |   |-- add_vector_by_single_vector_layer.cpp
        |   |   |   |-- ### Add two vectors element-wise
        |   |   |   |-- cross_validation_random_choose_index_layer.cpp
        |   |   |   |-- ### Select an index from different training split sources
        |   |   |   |-- gen_heatmap_all_channels_layer.cpp
        |   |   |   |-- ### Generate 2d heatmap ground truth. Closely follows Yichen Wei simple baseline & CPM caffe CPMDataLayer
        |   |   |   |-- gen_rand_index_layer.cpp
        |   |   |   |-- ### Randomly generate an index for training/testing
        |   |   |   |-- gen_sequential_index_layer.cpp
        |   |   |   |-- ### Sequentially generate index for testing
        |   |   |   |-- gen_unified_data_and_label_layer.cpp
        |   |   |   |-- ### Generate augmented training data and label (2D). Adapted from CPMDataLayer
        |   |   |   |-- joint_3d_square_root_loss_layer.cpp
        |   |   |   |-- ### Display average joint error MPJPE (mm)
        |   |   |   |-- js_regularization_loss_layer.cpp
        |   |   |   |-- ### Jensen-Shannon regularization loss
        |   |   |   |-- mul_rgb_layer.cpp
        |   |   |   |-- ### Scale rgb image by a scalar
        |   |   |   |-- output_blob_layer.cpp
        |   |   |   |-- ### Output blob to files for debugging 
        |   |   |   |-- output_heatmap_one_channel_layer.cpp
        |   |   |   |-- ### Output heatmap of one specific joint to file
        |   |   |   |-- read_blob_from_file_indexing_layer.cpp
        |   |   |   |-- ### Read data from disk w/ file index (id)
        |   |   |   |-- read_blob_from_file_layer.cpp
        |   |   |   |-- ### Read blob from a specific file
        |   |   |   |-- read_image_from_file_name_layer.cpp
        |   |   |   |-- ### Read image from file path
        |   |   |   |-- read_image_from_image_path_file_layer.cpp
        |   |   |   |-- ### Read image from a single file describing path for all images in the set
        |   |   |   |-- read_image_layer.cpp
        |   |   |   |-- ### See code
        |   |   |   |-- read_index_layer.cpp
        |   |   |   |-- ### Read image index from file
        |   |   |   |-- scale_vector_layer.cpp
        |   |   |   |-- ### Multiply vector by a constant scalar
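
As a rough illustration of what the softmax and integral layers listed above compute (sum of probability * position along each axis), here is a minimal sketch. It is written in Python rather than the C++ of the actual layers, and the function names `softmax_3d` and `integral_xyz` are made up for this example; they are not the repository's API.

```python
import math

def softmax_3d(heatmap):
    """Softmax over all voxels of a D x H x W nested list (illustrative only)."""
    flat = [v for plane in heatmap for row in plane for v in row]
    peak = max(flat)  # subtract the max for numerical stability
    exps = [[[math.exp(v - peak) for v in row] for row in plane] for plane in heatmap]
    total = sum(v for plane in exps for row in plane for v in row)
    return [[[v / total for v in row] for row in plane] for plane in exps]

def integral_xyz(prob):
    """Expected (x, y, z) voxel index: sum over voxels of probability * position."""
    x = y = z = 0.0
    for zi, plane in enumerate(prob):
        for yi, row in enumerate(plane):
            for xi, p in enumerate(row):
                x += p * xi
                y += p * yi
                z += p * zi
    return x, y, z

if __name__ == "__main__":
    # A 2 x 2 x 2 heatmap with one dominant response at (x=1, y=0, z=1).
    hm = [[[0.0, 0.0], [0.0, 0.0]], [[0.0, 50.0], [0.0, 0.0]]]
    print(integral_xyz(softmax_3d(hm)))
```

Unlike argmax, the result is a continuous coordinate, so it is not limited to the d2 discrete depth bins.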

Copy ${POSE_ROOT}/caffe_code/include/caffe/* to ${CAFFE_ROOT}/include/caffe/
Copy ${POSE_ROOT}/caffe_code/src/caffe/layers/* to ${CAFFE_ROOT}/src/caffe/layers/ after running the following

cd ${CAFFE_ROOT}/src/caffe/layers
mkdir DeepHumanModel
mkdir Operations

Configure caffe.proto
- Add the LayerParameter contents of ${POSE_ROOT}/caffe_code/src/caffe/proto/custom_layers_mine.proto to LayerParameter in ${CAFFE_ROOT}/src/caffe/proto/caffe.proto
- Replace TransformationParameter in ${CAFFE_ROOT}/src/caffe/proto/caffe.proto with the one in ${POSE_ROOT}/caffe_code/src/caffe/proto/custom_layers_mine.proto
- Add the other layer parameter fields from ${POSE_ROOT}/caffe_code/src/caffe/proto/custom_layers_mine.proto to ${CAFFE_ROOT}/src/caffe/proto/caffe.proto
- Make sure the field IDs in LayerParameter do not conflict with each other.
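
If you want to check for ID conflicts after merging, a small script along these lines might help. This is only a sketch: it assumes proto2-style field lines (`optional`/`repeated`/`required`) with one field per line, and the helper name `duplicate_field_ids` is hypothetical, not part of Caffe or protobuf.

```python
import re
from collections import defaultdict

# Matches e.g. "optional FooParameter foo_param = 123" (proto2 style, one field per line).
FIELD_RE = re.compile(r"^\s*(?:optional|repeated|required)\s+[\w.]+\s+(\w+)\s*=\s*(\d+)")

def duplicate_field_ids(proto_text, message="LayerParameter"):
    """Return {field_id: [field names]} for IDs used more than once in `message`."""
    ids = defaultdict(list)
    inside = False   # have we reached the target message yet?
    opened = False   # have we seen its opening brace?
    depth = 0        # brace depth relative to the target message
    for line in proto_text.splitlines():
        code = line.split("//", 1)[0]  # drop trailing comments
        if not inside:
            if re.match(r"\s*message\s+%s\b" % re.escape(message), code):
                inside = True
                depth = code.count("{") - code.count("}")
                opened = depth > 0
            continue
        if depth == 1:  # only direct fields, not nested enums/messages
            m = FIELD_RE.match(code)
            if m:
                ids[int(m.group(2))].append(m.group(1))
        depth += code.count("{") - code.count("}")
        if depth > 0:
            opened = True
        elif opened:
            break  # closing brace of the target message
    return {k: v for k, v in ids.items() if len(v) > 1}

if __name__ == "__main__":
    with open("caffe.proto") as f:  # path is an assumption; point it at your merged file
        print(duplicate_field_ids(f.read()))
```

An empty dict means no two fields directly inside LayerParameter share an ID.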

Compile

sudo make all -j128

Note 1: On Ubuntu, you will have to modify the header section of gen_unified_data_and_label_layer.cpp like this

#ifdef USE_OPENCV
#include <opencv2/core/core.hpp>
//#include <opencv2/opencv.hpp>
//#include <opencv2/contrib/contrib.hpp>
#include <opencv2/highgui/highgui.hpp>
#endif  // USE_OPENCV

Note 2: On Windows, you will have to modify the header section of gen_unified_data_and_label_layer.cpp like this

#ifdef USE_OPENCV
#include <opencv2/core/core.hpp>
#include <opencv2/opencv.hpp>
#include <opencv2/contrib/contrib.hpp>
#include <opencv2/highgui/highgui.hpp>
#endif  // USE_OPENCV

Still can't compile? Contact me.

Data

One thing I have realized over the years is that HDF5, LMDB, JSON, tar.gz, pth.tar and the like are redundant and share a major downside: the data has to be loaded into memory. For Python-based frameworks such as Keras, loading offline data is time consuming (sometimes more than 30 seconds). Even for Caffe, it takes several seconds. I have therefore switched to a simple and naive data format: txt. Each txt file holds one annotation for one sample, e.g. ground truth 3d or bbx. An SSD is required.
See data.pdf for a thorough discussion and the joint definitions (full 32 joints vs. usable 16 joints).

| Folder Name | Download Link | Description | A Toy Example |
|---|---|---|---|
| bbx_all_new | bbx | (bbx_x1, bbx_y1, bbx_x2, bbx_y2) | bbx |
| center_x | center_x | center_x (constant: 112.0) | center_x |
| center_y | center_y | center_y (constant: 112.0) | center_y |
| scale | scale | person image scale (constant: 224.0) | scale |
| gt_joint_2d_raw_new | gt_2d | 2d gt on 224x224 cropped image (32x2) | gt_2d |
| image_path_file | | image path for each sample | img_path_file |
| gt_joint_3d_mono_raw | gt_3d | monocular 3d gt in camera coordinate (32x3) | gt_3d |
| camera_all | camera | intrinsic & extrinsic camera parameters | camera |
| index_range | ind_range | index range per (subject, action) | ind_range |
| info_all | basic_info | video/action name/subaction/camera id/frame id | basic_info |
| images | img | all the cropped images (224x224) | img |
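
To give a feel for how little is needed to read one of these per-sample txt files, here is a minimal sketch. It assumes the file is just whitespace-separated numbers (that layout is my assumption for illustration; data.pdf is the authority on the real format), and `parse_annotation` is a hypothetical helper, not something shipped in this repo.

```python
def parse_annotation(text, cols):
    """Parse whitespace-separated numbers into rows of `cols` values each.

    Assumes plain numeric tokens; see data.pdf for the actual file layout.
    """
    values = [float(tok) for tok in text.split()]
    if len(values) % cols != 0:
        raise ValueError("expected a multiple of %d values, got %d" % (cols, len(values)))
    return [values[i:i + cols] for i in range(0, len(values), cols)]

if __name__ == "__main__":
    # A made-up bounding box in (bbx_x1, bbx_y1, bbx_x2, bbx_y2) order.
    print(parse_annotation("10 20 110 220", cols=4))
```

With `cols=3`, a 32x3 ground truth file would come back as 32 rows of (x, y, z).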

Download data, place to

${POSE_ROOT}
 `-- data
     `-- full
         |   |-- bbx_all_new
         |   |-- center_x
         |   |-- center_y
         |   |-- scale
         |   |-- gt_joint_2d_raw_new
         |   |-- gt_joint_3d_mono_raw
         |   |-- image_path_file
         |   |-- camera_all
         |   |-- index_range
         |   |-- info_all
         |   |-- images
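
A quick sanity check of the layout above can save a failed run. The sketch below only tests that each expected entry exists under data/full; the function name `missing_data_entries` is made up for this example.

```python
from pathlib import Path

# Entries expected under ${POSE_ROOT}/data/full, in the order listed in this README.
EXPECTED_ENTRIES = [
    "bbx_all_new", "center_x", "center_y", "scale",
    "gt_joint_2d_raw_new", "gt_joint_3d_mono_raw", "image_path_file",
    "camera_all", "index_range", "info_all", "images",
]

def missing_data_entries(pose_root):
    """Return the expected entries that do not exist under <pose_root>/data/full."""
    base = Path(pose_root) / "data" / "full"
    return [name for name in EXPECTED_ENTRIES if not (base / name).exists()]

if __name__ == "__main__":
    import os
    # Falls back to the current directory if POSE_ROOT is not set.
    print(missing_data_entries(os.environ.get("POSE_ROOT", ".")))
```

An empty list means everything is in place.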

Train Index:

0 - 1559571

Test Index:

1559572 - 2108570
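
These index ranges also explain the iteration count used for full testing. A small sketch of the arithmetic, assuming the ranges are inclusive and that testing uses the same batch size of 3 mentioned earlier (both are my assumptions; check the test prototxt for the actual batch size):

```python
TRAIN_RANGE = (0, 1559571)        # inclusive, from this README
TEST_RANGE = (1559572, 2108570)   # inclusive, from this README

def num_samples(index_range):
    """Number of samples in an inclusive [lo, hi] index range."""
    lo, hi = index_range
    return hi - lo + 1

def iterations_for_full_pass(index_range, batch_size):
    """Smallest number of iterations that covers every sample once (integer ceil)."""
    return -(-num_samples(index_range) // batch_size)

if __name__ == "__main__":
    print(num_samples(TRAIN_RANGE), num_samples(TEST_RANGE))
    print(iterations_for_full_pass(TEST_RANGE, batch_size=3))
```

Under those assumptions the test set has 548999 samples, and one full pass takes 183000 iterations, which matches the `-iterations 183000` used in the full-test commands.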

Trained models

| Method | d2 | MPJPE (mm) | Caffe Model | Solver State |
|---|---|---|---|---|
| Mine | 64 | 67.1 | Google Drive (net_iter_720929.caffemodel) | Google Drive (net_iter_720929.solverstate) |
| Mine | 32 | 68.6 | Google Drive (net_iter_640000.caffemodel) | Google Drive (net_iter_640000.solverstate) |
| Mine | 16 | 73.6 | Google Drive (net_iter_560000.caffemodel) | Google Drive (net_iter_560000.solverstate) |
| C2F | 64 | 69.8 | None | None |
| Integral | 64 | 68.0 | None | None |

C2F and Integral are included for reference.

Download models, place to

${POSE_ROOT}
 `-- models
     |   |-- net_iter_560000.caffemodel 
     |   |-- net_iter_560000.solverstate
     |   |-- net_iter_640000.caffemodel 
     |   |-- net_iter_640000.solverstate 
     |   |-- net_iter_720929.caffemodel 
     |   |-- net_iter_720929.solverstate

Kick off the testing

As you know, evaluation on the entire dataset takes time. For testing on a random subset, I implemented a random index generation layer. See screenshot "figs/test_d64_rand.png", "figs/test_d32_rand.png", "figs/test_d16_rand.png" for details.

I should note that this is just for fun; please do not take it seriously. You might get, say, 68.2 mm and 68.4 mm in two different runs.
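
For reference, MPJPE is the mean Euclidean distance between predicted and ground truth joints, and averaging it over a random subset is why the number wobbles between runs. A minimal sketch in Python (not the C++ layer from this repo; `mpjpe` and `random_subset_mean` are illustrative names, and the error values in the demo are made up):

```python
import math
import random

def mpjpe(pred, gt):
    """Mean per-joint Euclidean distance, in the unit of the inputs (e.g. mm)."""
    if len(pred) != len(gt):
        raise ValueError("pred and gt must have the same number of joints")
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(pred)

def random_subset_mean(per_sample_errors, k, seed):
    """Average error over k samples drawn without replacement."""
    rng = random.Random(seed)
    return sum(rng.sample(per_sample_errors, k)) / k

if __name__ == "__main__":
    errors = [60.0, 65.0, 70.0, 75.0, 80.0, 58.0]  # made-up per-sample MPJPE values
    # Different seeds pick different subsets, so the averages generally differ.
    print(random_subset_mean(errors, 3, seed=0), random_subset_mean(errors, 3, seed=1))
```

Only the average over the full test set is a stable number.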

d2 = 64

cd testing
$CAFFE_ROOT/build/tools/caffe test -model test_d64_rand.prototxt -weights models/net_iter_720929.caffemodel -iterations 500

This will give you figs/rand_test_d64.png (unstable number around 68 mm due to small number of samples)

d2 = 32

$CAFFE_ROOT/build/tools/caffe test -model test_d32_rand.prototxt -weights models/net_iter_640000.caffemodel -iterations 500

This will give you figs/rand_test_d32.png (unstable number around 71 mm)

d2 = 16

$CAFFE_ROOT/build/tools/caffe test -model test_d16_rand.prototxt -weights models/net_iter_560000.caffemodel -iterations 500

This will give you figs/rand_test_d16.png (unstable number around 74 mm)

Full testing

For full evaluation on H36M test set

d2 = 64

cd testing
$CAFFE_ROOT/build/tools/caffe test -model test_d64_statsfalse.prototxt -weights models/net_iter_720929.caffemodel -iterations 183000

This will give you 67.1 mm (figs/test_d64_full.png)

d2 = 32

$CAFFE_ROOT/build/tools/caffe test -model test_d32_statsfalse.prototxt -weights models/net_iter_640000.caffemodel -iterations 183000

This will give you 68.6 mm (figs/test_d32_full.png)

d2 = 16

$CAFFE_ROOT/build/tools/caffe test -model test_d16_statsfalse.prototxt -weights models/net_iter_560000.caffemodel -iterations 183000

This will give you 73.6 mm (figs/test_d16_full.png)

Training

Training is a bit tricky. For the structure of the prototxt files, see prototxt.pdf. Here's the thing:

Note: I started with the MPII-pretrained caffemodel improved-hourglass_iter_640000.caffemodel from the GNet repo.

I started with d2 = 2 to warm up. Simply run
```
cd training 
$CAFFE_ROOT/build/tools/caffe train --solver=solver_d2.prototxt --weights=improved-hourglass_iter_640000.caffemodel
```
I trained from the MPII 2D heatmap (HM) pretrained model, with base_lr 2.5e-5 and RMSProp. Two GPUs were used unless otherwise specified. Weight initialization is Gaussian with 0.01 std. The loss ratio of 3d HM to 2d HM is 0.1:1.

d2 = 4: finetune weights from d2 = 2 after convergence.
```
$CAFFE_ROOT/build/tools/caffe train --solver=solver_d4.prototxt --snapshot=net_iter_XXX.solverstate 
```
You will get around 137 mm on train and 150 mm on test. For eval on the training set, simply uncomment "index_lower_bound: 0" "index_upper_bound: 1559571" of the "GenRandIndex" layer. Loss ratio is 0.3:1.

d2 = 8: finetune weights from d2 = 4 after convergence.
```
$CAFFE_ROOT/build/tools/caffe train --solver=solver_d8.prototxt --snapshot=net_iter_XXX.solverstate 
```
You will get around 72 mm on train and 86 mm on test. Loss ratio is 0.1:1.

d2 = 16: finetune weights from d2 = 8 after convergence.
```
$CAFFE_ROOT/build/tools/caffe train --solver=solver_d16.prototxt --snapshot=net_iter_XXX.solverstate 
```
You will get around 47 mm on train and 72 mm on test. Loss ratio is 0.03:1.

d2 = 32: finetune weights from d2 = 16, starting from net_iter_560000.solverstate.
```
$CAFFE_ROOT/build/tools/caffe train --solver=solver_d32.prototxt --snapshot=net_iter_560000.solverstate 
```
You will get around 39 mm on train and 71 mm on test. Loss ratio is 0.03:1. I changed the weight initialization of the 3D heatmap to a normal distribution with 0.001 std in place of the previous 0.01, as I found the MPJPE was not dropping.

d2 = 64: finetune weights from d2 = 32, starting from net_iter_640000.solverstate.
```
$CAFFE_ROOT/build/tools/caffe train --solver=solver_d64.prototxt --snapshot=net_iter_640000.solverstate 
```
You will get around 37 mm on train and 68 mm on test. Loss ratio is 0.03:1. I again changed the std of the Gaussian weight initialization of the 3D heatmap, from 0.001 to 0.0003.
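
The schedule above can be summarized as data. The sketch below just regenerates the commands and loss ratios listed in this section; the `net_iter_XXX` placeholders are kept as they are because the iteration depends on when your run converged. The names `STAGES` and `training_commands` are only for this example.

```python
CAFFE = "$CAFFE_ROOT/build/tools/caffe"

# (d2, loss ratio of 3d HM to 2d HM, resume argument), as listed in this README.
STAGES = [
    (2, 0.1, "--weights=improved-hourglass_iter_640000.caffemodel"),
    (4, 0.3, "--snapshot=net_iter_XXX.solverstate"),
    (8, 0.1, "--snapshot=net_iter_XXX.solverstate"),
    (16, 0.03, "--snapshot=net_iter_XXX.solverstate"),
    (32, 0.03, "--snapshot=net_iter_560000.solverstate"),
    (64, 0.03, "--snapshot=net_iter_640000.solverstate"),
]

def training_commands():
    """One `caffe train` command per stage, in coarse-to-fine order."""
    return ["%s train --solver=solver_d%d.prototxt %s" % (CAFFE, d2, resume)
            for d2, _ratio, resume in STAGES]

if __name__ == "__main__":
    for (d2, ratio, _resume), cmd in zip(STAGES, training_commands()):
        print("# d2 = %d, loss ratio %s:1" % (d2, ratio))
        print(cmd)
```

Run each stage from the training folder and wait for convergence before moving to the next one.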

This sounds pretty sketchy, right? Another way to train this is to simply train d1 = 1, d2 = 64 from scratch. Details: missing, TODO

Notes:

I set use_global_stats to false during inference due to small batch size, otherwise you would get a totally different MPJPE number. I cannot recall the paper that mentioned it. Let me find the paper.
The major differences between the prototxts lie in:

a) depth dimension param (Use sublime or notepad++ to search keywords "depth_dims")

b) 3d heatmap slicing layer. (Simply search "cube_")

c) 3d heatmap reshaping layer ("heatmap2_flat_scale")

d) loss ratio of 3d heatmap and 2d heatmap. Basic rule is magnitude of these two losses should be the same.

e) different weight initialization of last conv layer for 3d heatmap.
I only used L2 loss during training. Nevertheless I have Jensen-Shannon regularization loss, smooth L1 loss, adaptive loss, and integral loss in the prototxt, as can be seen in figs/*.png. Adaptive loss tries to automatically balance the weight magnitudes of the different Euclidean regression losses. See code.pdf for details about integral loss.
MPJPE error of argmax operation is "error(mm)_3d_s2_max".
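
On the loss ratio rule in d): one simple way to pick a starting ratio is to look at the raw loss values early in training and scale the 3d heatmap loss so it lands at the same magnitude as the 2d one. A tiny sketch, with made-up loss values and a hypothetical helper name; the ratios in this README were set by hand, not by this code.

```python
def balanced_weight(loss_3d, loss_2d):
    """Weight w such that w * loss_3d has the same magnitude as loss_2d."""
    if loss_3d <= 0:
        raise ValueError("loss_3d must be positive")
    return loss_2d / loss_3d

if __name__ == "__main__":
    # Made-up example: the raw 3d loss is much larger than the 2d loss.
    w = balanced_weight(loss_3d=10.0, loss_2d=0.3)
    print("use a loss ratio of roughly %.2g:1" % w)
```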

Windows

This line is just a test.

Don't excoriate Windows. Mac, Ubuntu, and Windows are all excellent operating systems.

Start cmd.exe and run

caffe train ....

Should you have issues installing Caffe on Windows, contact me.

FAQ

Feel free to contact me at strawberryfgalois@gmail.com if you have any problem or suggestion.
