XVO: Generalized Visual Odometry via Cross-Modal Self-Training

This repository contains the code accompanying our ICCV 2023 paper XVO: Generalized Visual Odometry via Cross-Modal Self-Training. Please see our project page for more details.


Overview

We propose XVO, a semi-supervised learning method for training generalized monocular Visual Odometry (VO) models with robust off-the-shelf operation across diverse datasets and settings. XVO efficiently learns to recover relative pose with real-world scale from visual scene semantics, i.e., without relying on any known camera parameters. Our key contribution is twofold. First, we empirically demonstrate the benefits of semi-supervised training for learning a general-purpose direct VO regression network. Second, we demonstrate the benefit of multi-modal supervision, including segmentation, flow, depth, and audio auxiliary prediction tasks, to facilitate generalized representations for the VO task.
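For intuition, the snippet below is a minimal PyTorch sketch of the kind of direct pose-regression network with auxiliary prediction heads described above. The layer sizes, head designs, and names such as XVOSketch and num_seg_classes are illustrative assumptions, not the architecture released with this repository.

# Minimal sketch, not the paper's exact architecture: a shared encoder over a
# stacked frame pair regresses relative pose directly, while auxiliary heads
# (here segmentation and depth, for brevity) shape the shared representation.
import torch
import torch.nn as nn

class XVOSketch(nn.Module):
    def __init__(self, num_seg_classes: int = 19):
        super().__init__()
        # shared encoder over two stacked RGB frames (6 input channels)
        self.backbone = nn.Sequential(
            nn.Conv2d(6, 32, 7, stride=2, padding=3), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 5, stride=2, padding=2), nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        self.pool = nn.AdaptiveAvgPool2d(1)
        # direct pose regression: 3-DoF translation + 3-DoF rotation
        self.pose_head = nn.Linear(128, 6)
        # illustrative auxiliary heads operating on the shared features
        self.seg_head = nn.Conv2d(128, num_seg_classes, 1)
        self.depth_head = nn.Conv2d(128, 1, 1)

    def forward(self, frame_t, frame_t1):
        x = torch.cat([frame_t, frame_t1], dim=1)   # (B, 6, H, W)
        feat = self.backbone(x)                      # (B, 128, h, w)
        pooled = self.pool(feat).flatten(1)          # (B, 128)
        return {
            "pose": self.pose_head(pooled),          # (B, 6): [t | r]
            "seg_logits": self.seg_head(feat),
            "depth": self.depth_head(feat),
        }

if __name__ == "__main__":
    model = XVOSketch()
    a, b = torch.randn(2, 3, 192, 640), torch.randn(2, 3, 192, 640)
    print(model(a, b)["pose"].shape)  # torch.Size([2, 6])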

Dataset

We use the KITTI, Argoverse 2, and nuScenes datasets along with in-the-wild YouTube videos (available soon). Please refer to their websites for dataset setup.

Dataset     | Download Link
KITTI       | Link
Argoverse 2 | Link
nuScenes    | Link
YouTube     | Coming Soon
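Self-training on these videos only needs streams of consecutive monocular frames. The loader below is a minimal sketch of pairing adjacent frames; the directory layout, file extension, and image size are assumptions rather than the repository's actual data pipeline.

# Minimal sketch of a consecutive-frame-pair dataset; paths and naming are
# assumptions, not the loader shipped with this repository.
from pathlib import Path
from PIL import Image
from torch.utils.data import Dataset
import torchvision.transforms.functional as TF

class FramePairDataset(Dataset):
    def __init__(self, image_dir: str, size=(192, 640)):
        self.frames = sorted(Path(image_dir).glob("*.png"))
        self.size = size  # (height, width)

    def __len__(self):
        return max(len(self.frames) - 1, 0)

    def __getitem__(self, idx):
        def load(path):
            img = Image.open(path).convert("RGB").resize(self.size[::-1])
            return TF.to_tensor(img)
        # adjacent frames; the relative pose between them is the target
        return load(self.frames[idx]), load(self.frames[idx + 1])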

Environment Requirements and Installation

# create a new environment
conda create --name xvo python=3.9
conda activate xvo

# install PyTorch 1.13.0
conda install pytorch=1.13.0 torchvision pytorch-cuda=11.6 -c pytorch -c nvidia

Our environment also requires pytorch3d; please refer to pytorch3d for installation guidelines.
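A quick import check (not part of this repository) to confirm the environment before running anything:

# Sanity check after installation: versions should roughly match the ones
# above (PyTorch 1.13.0, CUDA 11.6) plus a working pytorch3d build.
import torch
import pytorch3d

print(torch.__version__, torch.cuda.is_available())
print(pytorch3d.__version__)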

Training

Coming soon!

Evaluation

# run inference on the test sequences
python3 test.py
# evaluate the resulting odometry with the VO evaluation tool
cd vo-eval-tool
python3 eval_odom.py

The VO evaluation tool is adapted from https://github.com/Huangying-Zhan/kitti-odom-eval.
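The evaluation tool follows the KITTI odometry convention: each trajectory is a text file with one line per frame holding the 12 row-major values of the top three rows of the 4x4 camera-to-world pose. The helper below is only a sketch of writing that format; the function name and output path are illustrative, not what test.py actually produces.

# Sketch of dumping predicted poses in the KITTI odometry format expected by
# the evaluation tool; file names here are illustrative.
import numpy as np

def save_kitti_poses(poses_4x4, out_path):
    """poses_4x4: iterable of (4, 4) absolute camera-to-world matrices."""
    lines = []
    for T in poses_4x4:
        T = np.asarray(T, dtype=np.float64)
        lines.append(" ".join(f"{v:.6e}" for v in T[:3, :].reshape(-1)))
    with open(out_path, "w") as f:
        f.write("\n".join(lines) + "\n")

if __name__ == "__main__":
    # identity trajectory of three frames, just to show the output format
    save_kitti_poses([np.eye(4)] * 3, "00.txt")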

Result

We find that incorporating audio and segmentation tasks as part of the semi-supervised learning process significantly improves ego-pose estimation on KITTI.


Contact

Please don't hesitate to contact us at leilai@bu.edu or sgzk@bu.edu if you have any remarks or questions.

License

Our work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

ToDos

  • Test code release
  • Training code release
  • README update
