This repository contains the code accompanying our ICCV 2023 paper XVO: Generalized Visual Odometry via Cross-Modal Self-Training. Please see our project page for more details.
We propose XVO, a semi-supervised learning method for training generalized monocular Visual Odometry (VO) models with robust off-the-shelf operation across diverse datasets and settings. XVO efficiently learns to recover relative pose with real-world scale from visual scene semantics, i.e., without relying on any known camera parameters. Our key contributions are twofold. First, we empirically demonstrate the benefits of semi-supervised training for learning a general-purpose direct VO regression network. Second, we demonstrate multi-modal supervision, including segmentation, flow, depth, and audio auxiliary prediction tasks, to facilitate generalized representations for the VO task.
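At a high level, the semi-supervised stage follows a pseudo-labeling self-training pattern: a teacher model labels unlabeled video, confident predictions are kept, and a student is retrained on them. The sketch below is a minimal illustration of that pattern only; the helper names (`teacher_predict`, `confidence`, `train_student`) are hypothetical and not the repository's actual API.

```python
# Minimal sketch of pseudo-label self-training for VO.
# All helper callables here are hypothetical placeholders, not XVO's real API.

def self_train(teacher_predict, confidence, train_student, unlabeled_clips,
               threshold=0.8):
    """Generate pseudo pose labels on unlabeled video and retrain a student.

    teacher_predict: clip -> predicted relative pose
    confidence:      (clip, pose) -> scalar confidence in [0, 1]
    train_student:   list[(clip, pose)] -> trained student model
    """
    pseudo_labeled = []
    for clip in unlabeled_clips:
        pose = teacher_predict(clip)             # teacher's pose estimate
        if confidence(clip, pose) >= threshold:  # keep only confident labels
            pseudo_labeled.append((clip, pose))
    return train_student(pseudo_labeled)
```

In XVO, the auxiliary segmentation, flow, depth, and audio prediction tasks are trained alongside this loop to regularize the shared representation.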
We use the KITTI, Argoverse 2, and nuScenes datasets, along with in-the-wild YouTube videos (available soon). Please see their websites for dataset setup.
| Dataset | Download Link |
|---|---|
| KITTI | Link |
| Argoverse 2 | Link |
| nuScenes | Link |
| YouTube | Coming Soon |
```shell
# create a new environment
conda create --name xvo python=3.9
conda activate xvo
# install PyTorch 1.13.0 with CUDA 11.6
conda install pytorch=1.13.0 torchvision pytorch-cuda=11.6 -c pytorch -c nvidia
```
Our environment also requires PyTorch3D; please refer to the pytorch3d repository for installation guidelines.
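For reference, PyTorch3D can typically be installed via conda or pip; check the official instructions for the build matching your CUDA and PyTorch versions.

```shell
# Option 1: conda package from the official pytorch3d channel
conda install pytorch3d -c pytorch3d

# Option 2: build from source via pip
pip install "git+https://github.com/facebookresearch/pytorch3d.git"
```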
Coming soon!
```shell
python3 test.py
```
```shell
cd vo-eval-tool
python3 eval_odom.py
```
The VO evaluation tool is adapted from https://github.com/Huangying-Zhan/kitti-odom-eval.
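As a rough illustration of the kind of metric such tools report, the translation component of absolute trajectory error (ATE) is the RMSE between corresponding camera positions. The sketch below is simplified: the actual tool also aligns the trajectories and reports per-length relative translation and rotation errors.

```python
import numpy as np

def translation_ate(gt_xyz, pred_xyz):
    """Root-mean-square error between corresponding 3D camera positions.

    gt_xyz, pred_xyz: (N, 3) arrays of positions per frame.
    Note: real evaluation tools first align the trajectories (e.g. with a
    rigid or similarity transform); this sketch assumes pre-aligned input.
    """
    gt = np.asarray(gt_xyz, dtype=float)
    pred = np.asarray(pred_xyz, dtype=float)
    errors = np.linalg.norm(gt - pred, axis=1)  # per-frame position error
    return float(np.sqrt(np.mean(errors ** 2)))
```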
We find that incorporating audio and segmentation tasks as part of the semi-supervised learning process significantly improves ego-pose estimation on KITTI.
Please don't hesitate to contact us if you have any remarks or questions at leilai@bu.edu or sgzk@bu.edu.
Our work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
- Test code release
- Training code release
- README update