Boosting 3-DoF Ground-to-Satellite Camera Localization Accuracy via Geometry-Guided Cross-View Transformer, ICCV 2023

Abstract

Image retrieval-based cross-view localization methods often lead to very coarse camera pose estimation, due to the limited sampling density of the database satellite images. In this paper, we propose a method to increase the accuracy of a ground camera's location and orientation by estimating the relative rotation and translation between the ground-level image and its matched/retrieved satellite image. Our approach designs a geometry-guided cross-view transformer that combines the benefits of conventional geometry and learnable cross-view transformers to map the ground-view observations to an overhead view. Given the synthesized overhead view and observed satellite feature maps, we construct a neural pose optimizer with strong global information embedding ability to estimate the relative rotation between them. After aligning their rotations, we develop an uncertainty-guided spatial correlation to generate a probability map of the vehicle locations, from which the relative translation can be determined. Experimental results demonstrate that our method significantly outperforms the state-of-the-art. Notably, the likelihood of restricting the vehicle lateral pose to be within 1m of its Ground Truth (GT) value on the cross-view KITTI dataset has been improved from $35.54%$ to $76.44%$, and the likelihood of restricting the vehicle orientation to be within $1^{\circ}$ of its GT value has been improved from $19.64%$ to $99.10%$.
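The percentages quoted above are the fraction of test samples whose error falls within a threshold (for example, lateral error within 1m). The sketch below shows one way such a number could be computed. It is not the evaluation code of this repository: the function names, the heading convention (radians, measured from the x-axis), and the sample errors are assumptions made for illustration.

```python
import math


def lateral_longitudinal_error(pred_xy, gt_xy, gt_heading_rad):
    """Split a 2D position error into components across and along the GT heading.

    Assumes the heading is given in radians, measured from the x-axis
    (a hypothetical convention chosen for this sketch).
    """
    dx = pred_xy[0] - gt_xy[0]
    dy = pred_xy[1] - gt_xy[1]
    cos_h, sin_h = math.cos(gt_heading_rad), math.sin(gt_heading_rad)
    longitudinal = dx * cos_h + dy * sin_h  # along the driving direction
    lateral = -dx * sin_h + dy * cos_h      # perpendicular to the driving direction
    return abs(lateral), abs(longitudinal)


def fraction_within(errors, threshold):
    """Fraction of samples whose absolute error is at most `threshold`."""
    errors = list(errors)
    if not errors:
        raise ValueError("errors must not be empty")
    return sum(e <= threshold for e in errors) / len(errors)


# Illustrative lateral errors in metres (made-up values, not results from the paper).
example_lateral_errors = [0.2, 0.9, 1.5, 3.0]
print(fraction_within(example_lateral_errors, 1.0))  # 0.5
```

The same `fraction_within` helper applies to orientation errors in degrees with a threshold of 1.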

Experiment Dataset

We use three existing datasets for the experiments: KITTI, Ford-AV and Oxford RobotCar. To obtain the satellite images we collected for KITTI and Ford-AV, please first fill in this Google Form; we will then send you the download link.

KITTI: Please first download the raw data (ground images) from http://www.cvlibs.net/datasets/kitti/raw_data.php and store them by date (not by category). Your dataset folder should be structured as follows:

    KITTI:
      raw_data:
        2011_09_26:
          2011_09_26_drive_0001_sync:
            image_00:
            image_01:
            image_02:
            image_03:
            oxts:
          ...
        2011_09_28:
        2011_09_29:
        2011_09_30:
        2011_10_03:
      satmap:
        2011_09_26:
        2011_09_29:
        2011_09_30:
        2011_10_03:
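Before training, it can help to confirm that the folders are arranged as shown above. The following is a minimal sketch, not part of this repository: `find_layout_problems` and `DRIVE_SUBDIRS` are hypothetical names, and the check assumes every sub-folder of a date folder is a drive that should contain the five listed sub-folders.

```python
from pathlib import Path

# Sub-folders each drive is expected to contain, taken from the layout above.
DRIVE_SUBDIRS = ("image_00", "image_01", "image_02", "image_03", "oxts")


def find_layout_problems(kitti_root):
    """Return a list of problems; an empty list means the layout looks as expected."""
    root = Path(kitti_root)
    raw_data = root / "raw_data"
    satmap = root / "satmap"
    problems = []

    # Both top-level folders must exist.
    for folder in (raw_data, satmap):
        if not folder.is_dir():
            problems.append(f"missing folder: {folder}")
    if not raw_data.is_dir():
        return problems

    # Every drive folder under every date folder needs the image and oxts folders.
    for date_dir in sorted(p for p in raw_data.iterdir() if p.is_dir()):
        for drive_dir in sorted(p for p in date_dir.iterdir() if p.is_dir()):
            for sub in DRIVE_SUBDIRS:
                if not (drive_dir / sub).is_dir():
                    problems.append(f"missing folder: {drive_dir / sub}")
    return problems
```

Calling `find_layout_problems("KITTI")` from the directory that contains the dataset returns the missing folders, if any. Plain files inside a date folder are ignored by the check.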

Ford-AV: The ground images and camera calibration files can be accessed from https://avdata.ford.com/downloads/default.aspx. Please follow their original structure to save them on your computer. For the satellite images, please put them under their corresponding log folder. Here is an example:

    Ford:
      2017-08-04:
        V2:
          Log1:
            2017-08-04-V2-Log1-FL
            SatelliteMaps_18:
            grd_sat_quaternion_latlon.txt
            grd_sat_quaternion_latlon_test.txt
      2017-10-26:
      Calibration-V2:

For the Cross-view Oxford RobotCar dataset, please refer to this GitHub page: https://github.com/tudelft-iv/CrossViewMetricLocalization.git.

For the VIGOR dataset, please refer to the following two GitHub pages: https://github.com/Jeff-Zilence/VIGOR.git and https://github.com/tudelft-iv/SliceMatch.git.

Codes

Training on 2DoF (location only) pose estimation:

python train_kitti_2DoF.py --batch_size 1

python train_ford_2DoF.py --batch_size 1 --train_log_start 0 --train_log_end 1

python train_ford_2DoF.py --batch_size 1 --train_log_start 1 --train_log_end 2

python train_ford_2DoF.py --batch_size 1 --train_log_start 2 --train_log_end 3

python train_ford_2DoF.py --batch_size 1 --train_log_start 3 --train_log_end 4

python train_ford_2DoF.py --batch_size 1 --train_log_start 4 --train_log_end 5

python train_ford_2DoF.py --batch_size 1 --train_log_start 5 --train_log_end 6

python train_oxford_2DoF.py --batch_size 1

python train_vigor_2DoF.py --area cross

python train_vigor_2DoF.py --area same

Training on 3DoF (joint location and orientation) pose estimation:

python train_kitti_3DoF.py --batch_size 1

python train_ford_3DoF.py --batch_size 1 --train_log_start 0 --train_log_end 1

python train_ford_3DoF.py --batch_size 1 --train_log_start 1 --train_log_end 2

python train_ford_3DoF.py --batch_size 1 --train_log_start 2 --train_log_end 3

python train_ford_3DoF.py --batch_size 1 --train_log_start 3 --train_log_end 4

python train_ford_3DoF.py --batch_size 1 --train_log_start 4 --train_log_end 5

python train_ford_3DoF.py --batch_size 1 --train_log_start 5 --train_log_end 6

Evaluation:

To evaluate a model, append "--test 1" to the corresponding training command. For example:

python train_kitti_3DoF.py --batch_size 1 --test 1

You are free to change the batch size to fit your GPU memory.
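The Ford commands above follow a regular pattern: each run covers one log, with --train_log_end equal to --train_log_start plus one, for logs 0 to 5. The sketch below builds those command lines programmatically, optionally with the "--test 1" flag. `ford_commands` is a hypothetical helper written for illustration and is not part of this repository; it uses only the script names and flags listed above.

```python
def ford_commands(dof="3DoF", batch_size=1, num_logs=6, test=False):
    """Build one command (as an argument list) per Ford log."""
    if dof not in ("2DoF", "3DoF"):
        raise ValueError("dof must be '2DoF' or '3DoF'")
    commands = []
    for log in range(num_logs):
        cmd = [
            "python", f"train_ford_{dof}.py",
            "--batch_size", str(batch_size),
            "--train_log_start", str(log),
            "--train_log_end", str(log + 1),
        ]
        if test:
            # Evaluation reuses the training command with "--test 1" appended.
            cmd += ["--test", "1"]
        commands.append(cmd)
    return commands


# Print the six 2DoF training commands, one per log.
for cmd in ford_commands(dof="2DoF"):
    print(" ".join(cmd))
```

Each argument list can be passed to `subprocess.run(cmd, check=True)` to launch the runs one after another instead of printing them.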

Models:

Our trained models are available here.

Publications

This work is submitted to ICCV 2023.
