This repo is the official implementation of CrossFormer: Cross Spatio-Temporal Transformer for 3D Human Pose Estimation.
Our code is built on top of VideoPose3D.
The code was developed and tested in the following environment:
- Python 3.8.2
- PyTorch 1.7.1
- CUDA 11.0
You can create the environment with:
conda env create -f crossformer.yml
Our code is compatible with the dataset setup introduced by Martinez et al. and Pavllo et al. Please refer to VideoPose3D to set up the Human3.6M dataset in the ./data directory.
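Before training or evaluating, it can help to check that the preprocessed files are in ./data. The sketch below assumes CrossFormer keeps VideoPose3D's file naming, where the 3D poses live in `data_3d_<dataset>.npz` and the 2D inputs selected by `-k` live in `data_2d_<dataset>_<keypoints>.npz`; the helper names `expected_data_files` and `missing_data_files` are hypothetical and not part of this repo.

```python
from pathlib import Path
from typing import List


def expected_data_files(keypoints: str, dataset: str = "h36m", data_dir: str = "data") -> List[Path]:
    """Files a VideoPose3D-style loader reads for a given -k value (assumed naming, hypothetical helper)."""
    root = Path(data_dir)
    return [
        root / f"data_3d_{dataset}.npz",              # 3D ground-truth poses
        root / f"data_2d_{dataset}_{keypoints}.npz",  # 2D input poses selected by -k
    ]


def missing_data_files(keypoints: str, dataset: str = "h36m", data_dir: str = "data") -> List[Path]:
    """Return the expected files that do not exist yet."""
    return [p for p in expected_data_files(keypoints, dataset, data_dir) if not p.exists()]


if __name__ == "__main__":
    # Check both 2D inputs used in this README: CPN detections and ground truth
    for k in ("cpn_ft_h36m_dbb", "gt"):
        missing = missing_data_files(k)
        print(k, "OK" if not missing else f"missing: {[str(p) for p in missing]}")
```

If the naming assumption holds, `-k cpn_ft_h36m_dbb` needs `data/data_2d_h36m_cpn_ft_h36m_dbb.npz` and `-k gt` needs `data/data_2d_h36m_gt.npz`, both alongside `data/data_3d_h36m.npz`.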
We provide the pre-trained 81-frame model (CPN-detected 2D poses as input) here. To evaluate it, put it in the ./checkpoint directory and run:
python run_crossformer.py -k cpn_ft_h36m_dbb -f 81 -c checkpoint --evaluate best_epoch44.4.bin
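The `-f` flag sets how many 2D frames the model sees per prediction. As a rough illustration, the sketch below assumes the VideoPose3D-style convention that each output frame is predicted from an odd-length window centered on it, with sequence boundaries padded by repeating the first and last frame. This is an assumption about the data pipeline, not a description of this repo's exact loader, and `centered_windows` is a hypothetical name.

```python
import numpy as np


def centered_windows(poses_2d: np.ndarray, num_frames: int = 81) -> np.ndarray:
    """Build one num_frames-long window per frame, centered on that frame (hypothetical helper).

    poses_2d has shape (T, J, 2); the result has shape (T, num_frames, J, 2).
    """
    if num_frames % 2 != 1:
        raise ValueError("num_frames must be odd so the target frame sits at the center")
    pad = num_frames // 2
    # Repeat the first/last frame so frames near the sequence edges still get a full window
    padded = np.pad(poses_2d, ((pad, pad), (0, 0), (0, 0)), mode="edge")
    return np.stack([padded[t:t + num_frames] for t in range(poses_2d.shape[0])])


if __name__ == "__main__":
    seq = np.random.rand(100, 17, 2).astype(np.float32)  # 100 frames, 17 Human3.6M joints
    print(centered_windows(seq, 81).shape)                # one 81-frame window per frame
```

Materializing every window is memory-hungry for long sequences; a real loader would more likely slice windows lazily per batch.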
We also provide the pre-trained 81-frame model (ground-truth 2D poses as input) here. To evaluate it, put it in the ./checkpoint directory and run:
python run_crossformer.py -k gt -f 81 -c checkpoint --evaluate best_epoch_gt_28.5.bin
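Both evaluation commands follow the same pattern: `-k` selects the 2D input, `-f` the number of frames, `-c` the checkpoint directory, and `--evaluate` the checkpoint file inside it. A small wrapper can check that the downloaded file is in place before launching. This is a hypothetical convenience sketch (`eval_command` and `run_eval` are not part of the repo); it uses only the flags shown above and runs the script with the current Python interpreter instead of a bare `python`.

```python
import subprocess
import sys
from pathlib import Path
from typing import List


def eval_command(keypoints: str, checkpoint_file: str, frames: int = 81,
                 checkpoint_dir: str = "checkpoint") -> List[str]:
    """Argument list equivalent to the evaluation commands in this README (hypothetical helper)."""
    return [sys.executable, "run_crossformer.py", "-k", keypoints, "-f", str(frames),
            "-c", checkpoint_dir, "--evaluate", checkpoint_file]


def run_eval(keypoints: str, checkpoint_file: str, frames: int = 81,
             checkpoint_dir: str = "checkpoint") -> None:
    """Fail early with a clear message if the checkpoint is missing, then run the evaluation."""
    ckpt = Path(checkpoint_dir) / checkpoint_file
    if not ckpt.is_file():
        raise FileNotFoundError(f"put the downloaded model at {ckpt} first")
    subprocess.run(eval_command(keypoints, checkpoint_file, frames, checkpoint_dir), check=True)


if __name__ == "__main__":
    run_eval("cpn_ft_h36m_dbb", "best_epoch44.4.bin")
    run_eval("gt", "best_epoch_gt_28.5.bin")
```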
- To train a model from scratch (CPN-detected 2D poses as input), run:
python run_crossformer.py -k cpn_ft_h36m_dbb -f 27 -lr 0.00004 -lrd 0.99
- To train a model from scratch (ground-truth 2D poses as input), run:
python run_crossformer.py -k gt -f 81 -lr 0.0004 -lrd 0.99
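The `-lr` and `-lrd` flags in the training commands above set the initial learning rate and its decay factor. Assuming CrossFormer keeps VideoPose3D's behavior of multiplying the learning rate by the decay factor once per epoch (an assumption, not confirmed here), the rate used in each epoch is `lr * lrd ** epoch`. The hypothetical `lr_schedule` helper below just tabulates that.

```python
from typing import List


def lr_schedule(lr: float, lr_decay: float, epochs: int) -> List[float]:
    """Learning rate used in each epoch, assuming a per-epoch multiplicative decay (hypothetical helper)."""
    rates = []
    for _ in range(epochs):
        rates.append(lr)  # rate used during this epoch
        lr *= lr_decay    # decayed once the epoch finishes
    return rates


if __name__ == "__main__":
    rates = lr_schedule(0.0004, 0.99, 100)  # values from the ground-truth training command
    print(f"epoch 1: {rates[0]:.2e}, epoch 100: {rates[-1]:.2e}")
```

With `-lrd 0.99`, the rate falls to roughly 37% of its initial value after 100 epochs. In plain PyTorch the same schedule is what `torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.99)` produces when `scheduler.step()` is called once per epoch.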
With ground-truth 2D poses as input, the 81-frame model achieves 28.5 mm MPJPE.
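MPJPE (Mean Per-Joint Position Error) is the Euclidean distance between predicted and ground-truth 3D joint positions, averaged over all joints and frames; on Human3.6M the coordinates are in millimetres, so the result is in mm. VideoPose3D implements this metric in PyTorch; the NumPy sketch below is an illustrative re-implementation of the definition, not this repo's evaluation code.

```python
import numpy as np


def mpjpe(predicted: np.ndarray, target: np.ndarray) -> float:
    """Mean Euclidean distance between corresponding joints.

    Both arrays have shape (..., J, 3) and share the same unit (millimetres for Human3.6M).
    """
    if predicted.shape != target.shape:
        raise ValueError("predicted and target must have the same shape")
    # Per-joint L2 error over the last (x, y, z) axis, then average over everything else
    return float(np.mean(np.linalg.norm(predicted - target, axis=-1)))


if __name__ == "__main__":
    gt = np.zeros((2, 17, 3))
    pred = gt + np.array([3.0, 4.0, 0.0])  # every joint off by a (3, 4, 0) mm vector
    print(mpjpe(pred, gt))                  # each joint is 5 mm away
```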
We keep our code consistent with VideoPose3D. Please refer to their project page for further information.
Part of our code is borrowed from VideoPose3D. We thank the authors for releasing their code.