3D Human Pose Lifting with Grid Convolution


By Yangyuxuan Kang, Yuyang Liu, Anbang Yao, Shandong Wang, and Enhua Wu.

This repository is the official PyTorch implementation of "3D Human Pose Lifting with Grid Convolution", dubbed GridConv. The paper was published at AAAI 2023 as an oral presentation.

GridConv is a powerful new representation learning paradigm for lifting a 2D human pose to its 3D estimate. It relies on a learnable, regular, weave-like grid pose representation instead of the predominant irregular graph structures.

Figure 1. Overview of grid lifting network regressing 3D human pose from 2D skeleton input.

For the definition and implementation of the SGT designs and the grid convolution layers, please refer to our paper for a thorough description.
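For intuition, the handcrafted SGT can be thought of as a fixed lookup table that rearranges the J input joints onto an H x W grid, with some joints repeated. The snippet below is a minimal sketch of that idea only; the actual joint-to-cell assignment and the learnable SGT in the paper differ, and grid_index here is a random placeholder:

import torch

def semantic_grid_transform(pose_2d, grid_index):
    # pose_2d: (B, J, 2) joint coordinates; grid_index: (H, W) table of joint ids
    B = pose_2d.shape[0]
    H, W = grid_index.shape
    grid = pose_2d[:, grid_index.reshape(-1), :]          # (B, H*W, 2)
    return grid.reshape(B, H, W, 2).permute(0, 3, 1, 2)   # (B, 2, H, W)

grid_index = torch.randint(0, 17, (5, 5))  # hypothetical 5x5 assignment of 17 joints
pose = torch.randn(4, 17, 2)
print(semantic_grid_transform(pose, grid_index).shape)    # torch.Size([4, 2, 5, 5])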

Installation

Our experiments are conducted on a GPU server running Ubuntu 18.04 LTS with Python 2.7 and PyTorch 1.4.

cd GridConv
conda env create -f environment.yml
conda activate gridconv

Dataset Setup


Human3.6M

  • Get preprocessed h36m.zip (Google Drive)
  • mv h36m.zip ${GridConv_repo}/src/
  • unzip h36m.zip

Customized Dataset

  • Directory structure should look like:

${GridConv_repo}
└── src
    └── data
        └── DATASET_NAME
            ├── train_custom_2d_unnorm.pth.tar
            ├── train_custom_3d_unnorm.pth.tar
            ├── test_custom_2d_unnorm.pth.tar
            └── test_custom_3d_unnorm.pth.tar
  • *_2d_unnorm.pth.tar files are dicts whose keys are (SUBJECT, ACTION, FILE_NAME) tuples and whose values are 2D positions of shape (N, 34).
  • *_3d_unnorm.pth.tar files are dicts whose keys are (SUBJECT, ACTION, FILE_NAME) tuples and whose values are dicts of { 'pelvis': N*3, 'joints_3d': N*51, 'camera': [fx, fy, cx, cy] }.
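The exact preprocessing is up to you; as a minimal sketch, such files could be produced with torch.save as below, where the key, frame count, and camera values are hypothetical placeholders:

import os
import numpy as np
import torch

os.makedirs('data/MYDATA', exist_ok=True)
N = 1000  # number of frames in one sequence (placeholder)
key = ('S1', 'Walking', 'Walking.54138969')  # (SUBJECT, ACTION, FILE_NAME)

# 2D file: one (N, 34) array per sequence (17 joints x 2 coordinates)
data_2d = {key: np.zeros((N, 34), dtype=np.float32)}
torch.save(data_2d, 'data/MYDATA/train_custom_2d_unnorm.pth.tar')

# 3D file: per-sequence dict with root positions, joints, and camera intrinsics
data_3d = {key: {
    'pelvis': np.zeros((N, 3), dtype=np.float32),      # root trajectory, N*3
    'joints_3d': np.zeros((N, 51), dtype=np.float32),  # 17 joints x 3 coords, N*51
    'camera': [1145.0, 1144.0, 512.0, 515.0],          # [fx, fy, cx, cy] (placeholder)
}}
torch.save(data_3d, 'data/MYDATA/train_custom_3d_unnorm.pth.tar')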

Results and Models

Figure 2. Qualitative results on Internet videos.

The grid lifting network has 2 residual blocks of D-GridConv, 256 latent channels, and a 5x5 grid size, and is trained on the Human3.6M training set for 100 epochs.
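As a rough mental model only (the repo's D-GridConv layers replace the plain nn.Conv2d below with their own grid convolution, and this block structure is an assumption, not the paper's exact design), a residual block over a 256-channel 5x5 latent grid might look like:

import torch
import torch.nn as nn

class GridResBlock(nn.Module):
    """Plain-Conv2d stand-in for a D-GridConv residual block (illustrative)."""
    def __init__(self, channels=256):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
        )
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.body(x) + x)

x = torch.randn(8, 256, 5, 5)   # latent grid pose: 256 channels on a 5x5 grid
print(GridResBlock()(x).shape)  # torch.Size([8, 256, 5, 5])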

Evaluation results of the pretrained models on the Human3.6M test set (S9, S11):

2D Detections | SGT design  | MPJPE | PA-MPJPE | Google Drive
--------------|-------------|-------|----------|-------------
GT            | Handcrafted | 37.15 | 28.32    | model
GT            | Learnable   | 36.39 | 28.29    | model
HRNet         | Handcrafted | 47.93 | 37.85    | model
HRNet         | Learnable   | 47.56 | 37.43    | model

Evaluation of pretrained models


Test on HRNet input using handcrafted SGT:

cd ./src
python main.py --eval --input hrnet \
               --load pretrained_model/hrnet_d-gridconv.pth.tar \
               --lifting_model dgridconv --padding_mode c z

Test on HRNet input using learnable SGT:

python main.py --eval --input hrnet \
               --load pretrained_model/hrnet_d-gridconv_autogrids.pth.tar \
               --lifting_model dgridconv_autogrids --padding_mode c z

Test on ground truth input using handcrafted SGT:

python main.py --eval --input gt \
               --load pretrained_model/gt_d-gridconv.pth.tar \
               --lifting_model dgridconv --padding_mode c r

Test on ground truth input using learnable SGT:

python main.py --eval --input gt \
               --load pretrained_model/gt_d-gridconv_autogrids.pth.tar \
               --lifting_model dgridconv_autogrids --padding_mode c r

Training the model from scratch


If you want to reproduce the results of our pretrained models, run the following command (shown here for the learnable-SGT model on HRNet input):

python main.py --exp hrnet_dgridconv-autogrids_5x5 \
               --input hrnet --lifting_model dgridconv_autogrids \
               --grid_shape 5 5 --num_block 2 --hidsize 256 \
               --padding_mode c z

Training on a single 1080Ti GPU typically takes about 20 minutes per epoch. We train each model for 100 epochs with the Adam optimizer. Several settings influence the performance:

  • --grid_shape H W: the grid pose size is 5x5 by default. When the learnable SGT is enabled, the grid size can be set to arbitrary values, which may influence accuracy.
  • --padding_mode c/z/r c/z/r: the grid pose is padded with a 1x1 border before being fed into nn.Conv2d; c / z / r denote circular / zeros / replicate padding, respectively. We found (c, r) works better for GT input and (c, z) for HRNet input. See the sketch below.

See src/tool/argument.py for more details about argument setups.
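To make the --padding_mode option concrete, here is a minimal sketch of padding a grid pose with a 1x1 border using different modes along the two axes before applying nn.Conv2d. Which of the two flags maps to which axis is an assumption here; see the repo code for the actual behavior:

import torch
import torch.nn as nn
import torch.nn.functional as F

FPAD = {'z': 'constant', 'c': 'circular', 'r': 'replicate'}  # F.pad calls zero padding 'constant'

def pad_grid(x, mode_h='c', mode_w='z'):
    # x: (B, C, H, W) grid pose; pad a 1-cell border, one mode per axis
    x = F.pad(x, (0, 0, 1, 1), mode=FPAD[mode_h])  # pad top/bottom
    x = F.pad(x, (1, 1, 0, 0), mode=FPAD[mode_w])  # pad left/right
    return x

x = torch.randn(8, 2, 5, 5)              # 2D grid poses on a 5x5 grid
conv = nn.Conv2d(2, 256, kernel_size=3)  # no built-in padding; padded manually above
y = conv(pad_grid(x, 'c', 'z'))          # the (c, z) combination used for HRNet input
print(y.shape)                           # torch.Size([8, 256, 5, 5])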

Citation


If you find our work useful in your research, please consider citing:

@inproceedings{kang2023gridconv,
  title={3D Human Pose Lifting with Grid Convolution},
  author={Yangyuxuan Kang and Yuyang Liu and Anbang Yao and Shandong Wang and Enhua Wu},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  year={2023},
}

License


GridConv is released under the Apache license. We encourage use for both research and commercial purposes, as long as proper attribution is given.

Acknowledgement


This repository is built upon ERD_3DPose and 3d_pose_baseline_pytorch, and the fine-tuned HRNet detections are fetched from EvoSkeleton. We thank the authors for kindly releasing their code.
