KVN: Keypoints Voting Network with Differentiable RANSAC for Stereo Pose Estimation

This repository includes the code for the paper KVN: Keypoints Voting Network with Differentiable RANSAC for Stereo Pose Estimation. A large part of it has been taken from the code of the paper:

PVNet: Pixel-wise Voting Network for 6DoF Pose Estimation
Sida Peng, Yuan Liu, Qixing Huang, Xiaowei Zhou, Hujun Bao
CVPR 2019

(we have kept the names of most scripts unchanged). KVN builds upon PVNet, with modifications to integrate the differentiable RANSAC layer and to handle a stereo camera setup.

If you want to learn more, you may also check out the paper video:


If you use this code or the Transparent Tableware Dataset (TTD) in your research, please cite the following paper:

I. Donadi and A. Pretto, "KVN: Keypoints Voting Network with Differentiable RANSAC for Stereo Pose Estimation," in IEEE Robotics and Automation Letters, vol. 9, no. 4, pp. 3498-3505, April 2024, doi: 10.1109/LRA.2024.3367508.

BibTeX entry:

  title={{KVN}: {K}eypoints Voting Network with
  Differentiable {RANSAC} for Stereo Pose Estimation},
  author={Donadi, Ivano and Pretto, Alberto},
  journal={IEEE Robotics and Automation Letters},

Prerequisites (tested on Ubuntu 20.04)

  • Python 3.8
  • CUDA drivers (tested on version 11.3)
  • cuDNN libraries (tested on version 8.9.7)

Python3 prerequisites

$ pip3 install torch==1.11.0+cu113 torchvision==0.12.0+cu113 torchaudio==0.11.0 --extra-index-url
$ pip3 install --no-cache-dir -r /requirements.txt

(Newer versions of the packages may work as well, but have not been tested.)
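
As a quick sanity check after installation, a short script like the following can report the versions actually visible to the interpreter. This is only a sketch: the function name report_environment is ours and is not part of the repository. It uses only standard PyTorch attributes (torch.__version__, torch.version.cuda, torch.cuda.is_available(), torch.backends.cudnn.version()).

```python
import sys


def report_environment():
    """Collect the Python, PyTorch, CUDA and cuDNN versions seen by this interpreter."""
    info = {"python": "%d.%d" % sys.version_info[:2]}
    try:
        import torch
    except ImportError:
        info["torch"] = None  # PyTorch is not installed in this environment
        return info
    info["torch"] = torch.__version__
    info["cuda_build"] = torch.version.cuda  # CUDA version the wheel was built for
    info["cuda_available"] = torch.cuda.is_available()
    info["cudnn"] = torch.backends.cudnn.version()  # integer code, or None
    return info


if __name__ == "__main__":
    for key, value in report_environment().items():
        print(f"{key}: {value}")
```

On the tested setup, the reported values would be expected to match Python 3.8, torch 1.11.0+cu113 and a CUDA build of 11.3.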

Required libraries

$ sudo apt-get install build-essential git cmake libgoogle-glog-dev libceres-dev
$ sudo apt-get install libatlas-base-dev libeigen3-dev libsuitesparse-dev libopencv-dev
$ sudo apt-get install libyaml-cpp-dev python3-pip python3-numpy python3-tk

(Please note: only Ceres Solver v1.14.0 or older is supported.)
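
To check which Ceres Solver version the package manager installed, one option is to read the version macros from the Ceres header. The sketch below assumes that ceres/version.h defines CERES_VERSION_MAJOR, CERES_VERSION_MINOR and CERES_VERSION_REVISION; the helper names and the header location mentioned afterwards are our assumptions, not part of the repository.

```python
import re


def ceres_version(header_text):
    """Extract (major, minor, revision) from the text of ceres/version.h."""
    parts = []
    for name in ("MAJOR", "MINOR", "REVISION"):
        match = re.search(r"#define\s+CERES_VERSION_%s\s+(\d+)" % name, header_text)
        if match is None:
            raise ValueError("CERES_VERSION_%s not found" % name)
        parts.append(int(match.group(1)))
    return tuple(parts)


def is_supported(version, newest_supported=(1, 14, 0)):
    """Return True when the version is 1.14.0 or older."""
    return tuple(version) <= tuple(newest_supported)


def check_header(path):
    """Read a version.h file and report whether its version is supported."""
    with open(path) as header:
        found = ceres_version(header.read())
    return found, is_supported(found)
```

On Ubuntu, the header is typically installed as /usr/include/ceres/version.h (an assumption to verify on your system), so check_header("/usr/include/ceres/version.h") would return the version tuple together with a supported flag.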

Build the C++ stuff

$ cd iterative_pnp_stereo/
$ mkdir build
$ cd build/
$ cmake ..
$ make -j3
$ cd ../..

Build the Cython stuff:

$ cd lib/csrc
$ cd dcn_v2
$ python3 build_ext --inplace --force
$ cd ../ransac_voting
$ python3 build_ext --inplace --force
$ cd ../nn
$ python3 build_ext --inplace --force
$ cd ../fps
$ python3 build_ext --inplace --force
$ cd ../uncertainty_pnp
$ python3 build_ext --inplace --force
$ cd ../../..
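
The same sequence can be scripted so that a failing build stops the process immediately. The following is a sketch only: build_all is a hypothetical helper, and the build command is passed in by the caller (use the same command shown above for each directory) rather than assumed here.

```python
import subprocess
from pathlib import Path

# Extension directories under lib/csrc, in the order used above.
EXTENSIONS = ["dcn_v2", "ransac_voting", "nn", "fps", "uncertainty_pnp"]


def build_all(build_cmd, csrc_dir="lib/csrc"):
    """Run build_cmd (a list of arguments) inside each extension directory."""
    for name in EXTENSIONS:
        ext_dir = Path(csrc_dir) / name
        if not ext_dir.is_dir():
            raise FileNotFoundError(f"missing extension directory: {ext_dir}")
        # check=True raises CalledProcessError at the first failing build.
        subprocess.run(build_cmd, cwd=ext_dir, check=True)
```

The function is meant to be called from the KVN/ directory, since the default csrc_dir is relative.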

TOD dataset

To replicate the experiments presented in the paper, you first need to download the original TOD dataset, available here. Unzip the zip file for each object inside the data/ directory and rename the resulting folder by adding an _orig suffix. For example, the original dataset for the object 'heart_0' should be in data/heart_0_orig. To convert TOD's annotations into the KVN format (the one used by PVNet, see pvnet_dataset_format.txt), you can use the script convert_all_textures located inside the tod_utils/ directory, which requires the absolute path to the data folder and the name of the object. Assuming you are inside the KVN/ directory and want to convert the dataset for the object heart_0, the command is the following:

$ sh tod_utils/ data heart_0

This process might take a while, since it has to generate all the ground truth object segmentation masks for the right camera images. Metadata such as the 3D keypoint positions and the object models are taken from the corresponding folder inside data/metafiles. The generated annotations are stored inside data/sy_datasets, divided by object and texture. For example, the annotations for the object heart_0 when the training textures are 1-9 and the test texture is texture 0 are stored inside data/sy_datasets/heart_0_stereo_0.
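
The directory layout described above can be summarized with two small path helpers. This is a sketch under an assumption: the naming pattern <object>_stereo_<test texture> is inferred from the single heart_0_stereo_0 example, and the function names are ours.

```python
from pathlib import Path


def original_dir(data_dir, obj):
    """Directory holding the unzipped original TOD data for one object."""
    return Path(data_dir) / f"{obj}_orig"


def annotation_dir(data_dir, obj, test_texture):
    """Directory holding the converted annotations for one object and test texture."""
    return Path(data_dir) / "sy_datasets" / f"{obj}_stereo_{test_texture}"
```

For instance, annotation_dir("data", "heart_0", 0) gives data/sy_datasets/heart_0_stereo_0, the directory used in the training example.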

KVN training

To train a model, make sure you have completed the TOD dataset annotation steps detailed above. You can then start training the model with the script:

$ python3 -h

    usage: [-h] -d DATASET_DIR -m MODEL_DIR [-b BATCH_SIZE] [-n NUM_EPOCH] [-e EVAL_EP] [-s SAVE_EP] [--bkg_imgs_dir BKG_IMGS_DIR] [--disable_resume] [--cfg_file CFG_FILE]

    KVN training tool

    -h, --help            show this help message and exit
    -d DATASET_DIR, --dataset_dir DATASET_DIR
                            Input directory containing the training dataset
    -m MODEL_DIR, --model_dir MODEL_DIR
                            Output directory where the trained models will be stored
    -b BATCH_SIZE, --batch_size BATCH_SIZE
                            Number of training examples in one forward/backward pass (default = 2)
    -n NUM_EPOCH, --num_epoch NUM_EPOCH
                            Number of epochs to train (default = 240)
    -e EVAL_EP, --eval_ep EVAL_EP
                            Number of epochs after which to evaluate (and eventually save) the model (default = 5)
    -s SAVE_EP, --save_ep SAVE_EP
                            Number of epochs after which to save the model (default = 5)
    --bkg_imgs_dir BKG_IMGS_DIR
                            Optional background images directory, to be used to augment the dataset
    --disable_resume      If specified, disable train resume and start a new train
    --cfg_file CFG_FILE   Low level configuration file, DO NOT CHANGE THIS PARAMETER IF YOU ARE NOT SURE (default = configs/custom_dsac.yaml)

    You need at least to provide an input training dataset and to specify the output directory where the trained models will be stored.The best model checkpoint will be stored inside the best_model subdirectory

For example, it is possible to train the model on textures 1-9 of object heart_0 and store the trained models in the results/ directory (to be manually created) by using:

$ python3 -d data/sy_datasets/heart_0_stereo_0 -m results -n 150 -e 10 -s 10 --cfg_file configs/custom_dsac.yaml

This command will train the model using differentiable RANSAC (DSAC) as the training loss and will save a checkpoint every 10 epochs inside the results folder, named 9.pth, 19.pth and so on. The checkpoint with the best validation results is saved inside results/best_model. To perform the same training with a standard PVNet network, simply choose configs/custom_vanilla.yaml as the configuration file. Additionally, it is possible to perform random background augmentation by passing the path to the backgrounds dataset with the --bkg_imgs_dir option. We provide the set of backgrounds that were used in our experiments at this link. If the script runs correctly, it outputs the network's training progress and the evaluation results on the validation set at the specified epoch interval.
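
The checkpoint naming can be illustrated with a short sketch. It assumes zero-based epoch numbering and a checkpoint at the end of every save_ep-th epoch, which is consistent with the names 9.pth and 19.pth but has not been checked against the training script; checkpoint_names is a hypothetical helper.

```python
def checkpoint_names(num_epoch, save_ep):
    """List the periodic checkpoint file names, assuming zero-based epoch numbering."""
    return [f"{epoch}.pth" for epoch in range(save_ep - 1, num_epoch, save_ep)]
```

Under this assumption, the example above (-n 150 -s 10) would produce 15 checkpoints, from 9.pth to 149.pth.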

KVN evaluation

Assuming you have completed the training procedure in the previous section, you can now evaluate the trained model on texture 0 of object heart_0 with the script:

$ python3 -h
    usage: [-h] -d DATASET_DIR -m MODEL [-o OUTPUT_DIR] [--num_iters NUM_ITERS] [--cfg_file CFG_FILE] ...

    KVN evaluation tool

    optional arguments:
      -h, --help            show this help message and exit
      -d DATASET_DIR, --dataset_dir DATASET_DIR
                            Input directory containing the test dataset
      -m MODEL, --model MODEL
                            KVN trained model
      -o OUTPUT_DIR, --output_dir OUTPUT_DIR
                            Optional output dir where to save the results
      --num_iters NUM_ITERS
                            Number of evaluation iterations to average over (default=10)
      --cfg_file CFG_FILE   Low level configuration file, DO NOT CHANGE THIS PARAMETER IF YOU ARE NOT SURE (default = configs/custom_dsac.yaml)

    You need at least to provide an (annotated) input test dataset and a KVN trained model

In the case of the current example, it should be used in the following way (assuming that the best checkpoint is 89.pth):

$ python3 -d data/sy_datasets/heart_0_stereo_0 -m results/best_model/89.pth --cfg_file configs/custom_dsac.yaml

If you are evaluating a model trained with the classical PVNet loss, you just need to specify configs/custom_vanilla.yaml as the configuration file. The script outputs the quantitative evaluation results; to save the average 2D projection, ADD, and <2cm metric results in a test.txt file, use the '--output_dir' option.
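
Since the name of the best checkpoint depends on the training run, it can be looked up instead of typed by hand. The sketch below assumes that checkpoints inside best_model are named <epoch>.pth, as in the example, and returns the one with the highest epoch number if several are present; find_best_checkpoint is a hypothetical helper.

```python
from pathlib import Path


def find_best_checkpoint(model_dir):
    """Return the .pth file with the highest epoch number in <model_dir>/best_model."""
    candidates = list((Path(model_dir) / "best_model").glob("*.pth"))
    if not candidates:
        raise FileNotFoundError(f"no checkpoint found in {model_dir}/best_model")
    # File names are assumed to be integers (the epoch index) followed by .pth.
    return max(candidates, key=lambda path: int(path.stem))
```

The returned path can then be passed to the '-m' option of the evaluation script.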


During the evaluation, it is completely normal to receive the following message, especially when evaluating a model from the early stages of training:

Linear solver failure. Failed to compute a step: Eigen LLT decomposition failed.

KVN prediction visualization

To obtain a qualitative evaluation of the trained model on a test dataset, you can use the script, which will show, for every image of the test dataset, both ground truth and predicted 3d object bounding boxes and the reprojections of the estimated 3d keypoints on the left camera image.

$ python3 -h

    usage: [-h] -d DATASET_DIR -m MODEL [--cfg_file CFG_FILE] ...

    Locate an object from an input image

    -h, --help            show this help message and exit
    -d DATASET_DIR, --dataset_dir DATASET_DIR
                            Input directory containing the test dataset
    -m MODEL, --model MODEL
                            KVN trained model
    --cfg_file CFG_FILE   Low level configuration file, DO NOT CHANGE THIS PARAMETER IF YOU ARE NOT SURE (default = configs/custom_dsac.yaml)

Under the same assumptions as in the previous examples, you can use this script to visualize the predictions for texture 0 of the object heart_0 with the following command:

$ python3 -m ../results/best_model/89.pth -d ../data/sy_datasets/heart_0_stereo_0 --cfg_file configs/custom_dsac.yaml 

Trained models

We provide the trained models for 2 TOD objects at this link

TTD dataset

Here you can download the archive of the Transparent Tableware Dataset (TTD) along with the documentation of the dataset. Unzip the archive file and prepare the annotations in KVN format by using the script located inside the ttd_utils/ directory. We refer here to the standard 'mixed' benchmark (i.e., dataset partitioning), the same one used in the experiments of the paper, whose files are found in the TTD directory default_benchmarks/stereo/mixed. For example, assuming the TTD dataset has been extracted in the data/ directory, to convert the train, validation, and test subsets for the object 'glass' and save the annotations into the 'data/ttd_annotations' directory, run the script 3 times with the following parameters:

$ python3 ttd_utils/ -i data/ttd/default_benchmarks/stereo/mixed/train.json -n glass -d data/ttd -o data/ttd_annotations
$ python3 ttd_utils/ -i data/ttd/default_benchmarks/stereo/mixed/test_val.json -n glass -d data/ttd -o data/ttd_annotations
$ python3 ttd_utils/ -i data/ttd/default_benchmarks/stereo/mixed/test_val.json -n glass -d data/ttd -o data/ttd_annotations

To convert the annotations for the other TTD objects, replace the object name 'glass' ('-n' option) with one of 'candle_holder', 'coffee_cup', 'little_bottle', or 'wine_glass'.
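
To avoid retyping the options for every object, the argument lists can be generated programmatically. This is a sketch: conversion_args is a hypothetical helper that only assembles the '-i', '-n', '-d' and '-o' options shown above, and the conversion script itself still has to be invoked with them.

```python
TTD_OBJECTS = ["glass", "candle_holder", "coffee_cup", "little_bottle", "wine_glass"]


def conversion_args(split_json, obj, ttd_dir="data/ttd", out_dir="data/ttd_annotations"):
    """Build the option list for converting one benchmark split of one object."""
    return ["-i", split_json, "-n", obj, "-d", ttd_dir, "-o", out_dir]


if __name__ == "__main__":
    split = "data/ttd/default_benchmarks/stereo/mixed/train.json"
    for obj in TTD_OBJECTS:
        # Print the options; prepend the conversion script when running them.
        print(" ".join(conversion_args(split, obj)))
```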

KVN training and evaluation

Training and evaluation follow the instructions in the previous TOD sections, taking care to use the appropriate config files for TTD (configs/ours_dsac.yaml for KVN and configs/ours_vanilla.yaml for PVNet) and the path to the corresponding annotations directory. For example, after the conversion described above for the 'glass' object, the directory with the annotations for this object is data/ttd_annotations/glass, and similarly for the other objects. In this case, you can train KVN with the following command:

$ python3 -d data/ttd_annotations/glass -m results_kvn/glass -n 150 -e 10 -s 10 --cfg_file configs/ours_dsac.yaml

where the trained models are stored in the results_kvn/glass directory (to be created manually).
Supposing that the best model after training is 119.pth, you can then evaluate it on the test subset with the following command:

$ python3 -d data/ttd_annotations/glass -m results_kvn/glass/best_model/119.pth --cfg_file configs/ours_dsac.yaml
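
The four configuration files mentioned in this README can be summarized as a small lookup table, which may help avoid mixing TOD and TTD settings. The mapping is taken from the text above; the names CONFIGS and config_for are ours.

```python
CONFIGS = {
    ("TOD", "KVN"): "configs/custom_dsac.yaml",
    ("TOD", "PVNet"): "configs/custom_vanilla.yaml",
    ("TTD", "KVN"): "configs/ours_dsac.yaml",
    ("TTD", "PVNet"): "configs/ours_vanilla.yaml",
}


def config_for(dataset, method):
    """Return the value for --cfg_file given the dataset and the training method."""
    try:
        return CONFIGS[(dataset, method)]
    except KeyError:
        raise ValueError(f"unknown combination: {dataset}, {method}") from None
```

For example, config_for("TTD", "KVN") returns configs/ours_dsac.yaml, the file used in the two commands above.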

Trained models

We provide the trained models for KVN for all TTD objects at this link