Towards Human-Level 3D Relative Pose Estimation: Generalizable, Training-Free, with Single Reference
Official PyTorch Implementation of Towards Human-Level 3D Relative Pose Estimation: Generalizable, Training-Free, with Single Reference.
Please also refer to our video for more information.
If this code is helpful to your research, please consider citing our paper:

```bibtex
@article{towards2024,
  title={Towards Human-level 3D Relative Pose Estimation: Zero-Shot Unseen Generalization, Label/Training-Free, and A Single Reference},
  author={Yuan Gao and Yajing Luo and Junhong Wang and Kui Jia and Gui-Song Xia},
  journal={arXiv preprint arXiv:2406.18453},
  year={2024}
}
```
Set up the conda environment and install the dependencies:

```bash
conda create -n relative_pose_estimation python=3.9.16
conda activate relative_pose_estimation
pip install -r requirements.txt

# Build and install our modified nvdiffrast with culling support
cd nvdiffrast_culling
pip install .
cd ..
```
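To check that the build succeeded, a quick import test may help (a sketch; we assume here that the fork installs under the upstream module name `nvdiffrast`, so adjust the import if your build registers a different package name):

```python
import torch
import nvdiffrast.torch as dr  # assumption: the fork keeps the upstream module name

print(torch.__version__, 'CUDA available:', torch.cuda.is_available())
print(dr.RasterizeCudaContext)  # an ImportError above means the build failed
```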
Clone DINOv2 and download its pretrained ViT-L/14 weights:

```bash
mkdir third_party && cd third_party
git clone https://github.com/facebookresearch/dinov2.git
cd dinov2 && mkdir weights && cd weights
wget https://dl.fbaipublicfiles.com/dinov2/dinov2_vitl14/dinov2_vitl14_pretrain.pth
cd ../../..
```
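As a quick sanity check that the checkpoint loads, here is a minimal sketch that loads the local clone through `torch.hub` and extracts DINOv2 patch features; the actual preprocessing and feature-extraction code used by this repo may differ:

```python
import torch

# Load the ViT-L/14 architecture from the local clone (no download), then
# load the checkpoint fetched above. The hub entry point and `pretrained`
# kwarg follow the official DINOv2 hubconf.
model = torch.hub.load('third_party/dinov2', 'dinov2_vitl14',
                       source='local', pretrained=False)
state = torch.load('third_party/dinov2/weights/dinov2_vitl14_pretrain.pth',
                   map_location='cpu')
model.load_state_dict(state)
model.eval()

# Input sides must be multiples of the 14-pixel patch size.
x = torch.randn(1, 3, 224, 224)
with torch.no_grad():
    feats = model.forward_features(x)['x_norm_patchtokens']
print(feats.shape)  # torch.Size([1, 256, 1024]): 16x16 patches, 1024-dim each
```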
Download the LineMOD real images from the BOP benchmark:

```bash
mkdir bop_datasets && cd bop_datasets
wget https://huggingface.co/datasets/bop-benchmark/datasets/resolve/main/lm/lm_test_all.zip  # LineMOD real images
unzip lm_test_all.zip
cd ..
```
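The archive should yield the standard BOP layout (per-scene `rgb/` images with `scene_gt.json` and `scene_camera.json` annotations). A small sketch to verify this, assuming the data unpacks to `bop_datasets/lm/test` (unzip into an `lm/` subfolder if your archive does not include it):

```python
from pathlib import Path

# Standard BOP layout: lm/test/<scene_id>/rgb/*.png plus per-scene
# ground-truth and camera JSON files.
root = Path('bop_datasets/lm/test')
for scene in sorted(root.iterdir()):
    n_rgb = len(list((scene / 'rgb').glob('*.png')))
    has_gt = (scene / 'scene_gt.json').exists()
    print(f'{scene.name}: {n_rgb} rgb images, scene_gt.json present: {has_gt}')
```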
The query-reference pairs for the LineMOD dataset used in our experiments are provided in `data/gd_pairs`. If `data/gd_pairs` is empty, the evaluation code generates the pairs automatically.
Evaluate our method on the LineMOD dataset by running:

```bash
python run_lm.py
```
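For reference, relative rotation accuracy is commonly measured as the geodesic distance between the estimated and ground-truth rotations; the sketch below shows this standard metric, which is not necessarily the exact evaluation code inside `run_lm.py`:

```python
import numpy as np

def rotation_geodesic_error_deg(R_est: np.ndarray, R_gt: np.ndarray) -> float:
    """Geodesic distance in degrees between two 3x3 rotation matrices."""
    # trace(R_est @ R_gt.T) = 1 + 2*cos(theta), theta = relative rotation angle
    cos_theta = (np.trace(R_est @ R_gt.T) - 1.0) / 2.0
    cos_theta = np.clip(cos_theta, -1.0, 1.0)  # guard against numerical drift
    return float(np.degrees(np.arccos(cos_theta)))

# Example: a 10-degree rotation about the z-axis vs. the identity
t = np.radians(10.0)
R = np.array([[np.cos(t), -np.sin(t), 0.0],
              [np.sin(t),  np.cos(t), 0.0],
              [0.0,        0.0,       1.0]])
print(rotation_geodesic_error_deg(R, np.eye(3)))  # ~10.0
```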
- Our `nvdiffrast_culling` is modified from [nvdiffrast](https://github.com/NVlabs/nvdiffrast); a minimal usage sketch of the upstream API is given below.
- The semantic features are extracted by [DINOv2](https://github.com/facebookresearch/dinov2).
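For context, this is a minimal sketch of the upstream nvdiffrast rasterization API that `nvdiffrast_culling` extends; the fork's culling-specific options are not shown here, so consult its sources for the exact interface:

```python
import torch
import nvdiffrast.torch as dr

glctx = dr.RasterizeCudaContext()

# One triangle in clip space: vertex positions are (N, V, 4) float32,
# triangle indices are (T, 3) int32, both on the GPU.
pos = torch.tensor([[[-0.8, -0.8, 0.0, 1.0],
                     [ 0.8, -0.8, 0.0, 1.0],
                     [ 0.0,  0.8, 0.0, 1.0]]], device='cuda')
tri = torch.tensor([[0, 1, 2]], dtype=torch.int32, device='cuda')
col = torch.tensor([[[1.0, 0.0, 0.0],
                     [0.0, 1.0, 0.0],
                     [0.0, 0.0, 1.0]]], device='cuda')

# Rasterize, then interpolate per-vertex colors over the covered pixels.
rast, _ = dr.rasterize(glctx, pos, tri, resolution=[256, 256])
color, _ = dr.interpolate(col, rast, tri)
print(color.shape)  # torch.Size([1, 256, 256, 3])
```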