
Oryon: Open-Vocabulary Object 6D Pose Estimation [CVPR2024 highlight]

This repository contains the source code for Oryon and for its project website. The work was presented at CVPR 2024 as a highlight.

Roadmap

  • Code release: 26 March 2024
  • Added test and train splits: 7 Dec 2023
  • Website and arXiv preprint released: 4 Dec 2023

Installation

First, download oryon_data.zip and pretrained_models.zip from the releases of this repository. The former contains the ground-truth information and the specification of the image pairs used; the latter contains the third-party checkpoints used in Oryon (i.e., the tokenizer and PointDSC).

Run setup.sh to install the environment and download the external checkpoints.
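For reference, the whole installation amounts to a few commands. In the sketch below, <RELEASE_URL> is a placeholder for the asset URL on this repository's Releases page:

wget <RELEASE_URL>/oryon_data.zip
wget <RELEASE_URL>/pretrained_models.zip
unzip oryon_data.zip
unzip pretrained_models.zip
bash setup.sh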

Running Oryon

By default, all experiment folders are created in exp_data/. This can be changed in the config file. Training with default settings:

python run_train.py exp_name=baseline
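The command above uses Hydra-style key=value overrides, so other config options can be overridden the same way. A hypothetical example (the actual key controlling the experiment root may differ; check the config file):

python run_train.py exp_name=baseline exp_root=/path/to/experiments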

Run the following to obtain results for the four basic configurations (two test datasets, each with predicted or oracle masks). By default, the last checkpoint is used.

python run_test.py -cp exp_data/baseline/ dataset.test.name=nocs test.mask=predicted

python run_test.py -cp exp_data/baseline/ dataset.test.name=nocs test.mask=oracle

python run_test.py -cp exp_data/baseline/ dataset.test.name=toyl test.mask=predicted

python run_test.py -cp exp_data/baseline/ dataset.test.name=toyl test.mask=oracle
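Since the four runs differ only in two overrides, they can also be scripted in one go; a minimal shell sketch:

for ds in nocs toyl; do
  for mask in predicted oracle; do
    python run_test.py -cp exp_data/baseline/ dataset.test.name=$ds test.mask=$mask
  done
done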

Dataset preparation

Our data is based on three publicly available datasets:

  • REAL275, used for testing. We sample from the real test partition.
  • Toyota-Light (TOYL), used for testing. We sample from the real test partition of the BOP challenge.
  • ShapeNet6D (SN6D), used for training. Note that SN6D itself does not provide textual annotations, but it uses object models from ShapeNetSem, which does provide object names and synsets for each object model.

We sample scenes from each dataset to build the training and testing partitions (20000 image pairs for SN6D and 2000 for REAL275 and TOYL), and report the scene ids and image ids used for each partition in the provided split files.

REAL275 (referred to as NOCS)

From the NOCS repository, download the test ground truth, the object models, and the data of the real_test partition. This should result in three files: obj_models.zip, gts.zip, and real_test.zip.

Run the prepare_nocs.sh script to unzip and run the preprocessing.

By default, this creates the nocs folder in data/; the destination can be changed by editing the script.
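For reference, the whole step condensed into commands. The download URLs below are the ones published in the NOCS repository and may change, so verify them there:

wget http://download.cs.stanford.edu/orion/nocs/obj_models.zip
wget http://download.cs.stanford.edu/orion/nocs/gts.zip
wget http://download.cs.stanford.edu/orion/nocs/real_test.zip
bash prepare_nocs.sh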

Toyota-Light

Download the object models and the test partition from the official BOP website:

wget https://bop.felk.cvut.cz/media/data/bop_datasets/tyol_models.zip

wget https://bop.felk.cvut.cz/media/data/bop_datasets/tyol_test_bop19.zip

Run the prepare_toyl.sh script to unzip and run the preprocessing.

By default, this creates the toyl folder in data/; the destination can be changed by editing the script.

ShapeNet6D

Download the images from the official repository of ShapeNet6D, and the object models of ShapeNet from HuggingFace.

Run the prepare_sn6d.sh script to unzip and run the preprocessing.

Note that each image of ShapeNet6D shows a different random background, so we treat each image as belonging to a different scene. ShapeNet6D provides a map from its object ids to the object ids of the original ShapeNetSem: we use this map to associate the object name and synonym set of ShapeNetSem with each object model in ShapeNet6D.
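Conceptually, attaching the textual annotations is a two-step lookup. A minimal Python sketch, assuming hypothetical file names and a flat JSON layout (the repository's actual formats may differ):

import json

# hypothetical files: ShapeNet6D object id -> ShapeNetSem object id,
# and ShapeNetSem object id -> its textual metadata
with open("sn6d_to_shapenetsem.json") as f:
    id_map = json.load(f)
with open("shapenetsem_names.json") as f:
    meta = json.load(f)

def text_annotation(sn6d_id):
    """Return the ShapeNetSem object name and synonym set for a ShapeNet6D object id."""
    entry = meta[id_map[sn6d_id]]
    return entry["name"], entry["synset"]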

NB: ShapeNet6D is not currently supported for evaluation (i.e., the symmetry annotations needed by the BOP toolkit are missing).

Acknowledgements

This work was supported by the European Union’s Horizon Europe research and innovation programme under grant agreement No 101058589 (AI-PRISM), and made use of time on Tier 2 HPC facility JADE2, funded by EPSRC (EP/T022205/1).

We thank the authors of the repositories on which this project relies for open-sourcing their code.

Citing Oryon

@inproceedings{corsetti2024oryon,
  title     = {Open-vocabulary object 6D pose estimation},
  author    = {Corsetti, Jaime and Boscaini, Davide and Oh, Changjae and Cavallaro, Andrea and Poiesi, Fabio},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year      = {2024}
}

Website License

The website template is from Nerfies.

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.