MrZihan/HNR-VLN


Lookahead Exploration with Neural Radiance Representation for Continuous Vision-Language Navigation

Zihan Wang, Xiangyang Li, Jiahao Yang, Yeqi Liu, Junjie Hu, Ming Jiang and Shuqiang Jiang

Vision-and-language navigation (VLN) enables an agent to navigate to a remote location in 3D environments by following natural language instructions. At each navigation step, the agent selects one of several candidate locations and then moves there. For better navigation planning, the lookahead exploration strategy evaluates the agent's next action by anticipating the future environment at each candidate location. To this end, some existing works predict RGB images of future environments, but this strategy suffers from image distortion and high computational cost. To address these issues, we propose the pre-trained hierarchical neural radiance representation model (HNR), which produces multi-level semantic features for future environments; these features are more robust and efficient than pixel-wise RGB reconstruction. Furthermore, using the predicted representations of future environments, our lookahead VLN model constructs a navigable future path tree and selects the optimal path branch via efficient parallel evaluation. Extensive experiments on the VLN-CE datasets confirm the effectiveness of our method.
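The lookahead idea can be illustrated with a small sketch of path-tree selection. It assumes each candidate location already carries a predicted feature vector (in HNR these would come from the neural radiance representation), uses a hypothetical dot-product `score` against a stand-in instruction embedding in place of the Lookahead VLN Model's learned scorer, and scores each branch by the mean of its node scores. The `Node` class, `select_next_action`, and the mean-score rule are illustrative assumptions, not code from this repository.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Node:
    """A location in the future path tree (hypothetical structure)."""
    name: str
    feature: List[float]  # predicted feature of a possibly unvisited location
    children: List["Node"] = field(default_factory=list)


def enumerate_branches(candidates: List[Node]) -> List[List[Node]]:
    """Return every path from a first-step candidate down to a leaf."""
    branches: List[List[Node]] = []

    def walk(node: Node, prefix: List[Node]) -> None:
        path = prefix + [node]
        if not node.children:
            branches.append(path)
        for child in node.children:
            walk(child, path)

    for cand in candidates:
        walk(cand, [])
    return branches


def select_next_action(
    candidates: List[Node], score_fn: Callable[[List[float]], float]
) -> Tuple[str, List[str]]:
    """Score all branches, then move toward the first node of the best one."""
    branches = enumerate_branches(candidates)
    # All branches are scored in one pass; a real model would batch this on a GPU.
    scores = [sum(score_fn(n.feature) for n in b) / len(b) for b in branches]
    best = max(range(len(branches)), key=lambda i: scores[i])
    return branches[best][0].name, [n.name for n in branches[best]]


# Toy example: a stand-in instruction embedding and a dot-product scorer.
instruction = [1.0, 0.0]


def score(feature: List[float]) -> float:
    return sum(x * y for x, y in zip(feature, instruction))


a = Node("A", [0.6, 0.0], [Node("A1", [0.1, 0.0])])
b = Node("B", [0.3, 0.0], [Node("B1", [0.9, 0.0])])

greedy = max([a, b], key=lambda n: score(n.feature)).name
action, path = select_next_action([a, b], score)
print(greedy, action, path)  # prints: A B ['B', 'B1']
```

In this toy case a greedy agent picks A because its immediate score is higher, while branch-level evaluation picks B because the location beyond B matches the instruction much better. That gap is what anticipating future environments is meant to capture.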


TODOs

  • Release the pre-training code of the Hierarchical Neural Radiance Representation Model.
  • Release the checkpoints of the Hierarchical Neural Radiance Representation Model.
  • Tidy the pre-training code for easy execution.
  • Release the fine-tuning code of the Lookahead VLN Model.
  • Release the checkpoints of the Lookahead VLN Model.

Requirements

  1. Install the Habitat simulator by following the instructions from ETPNav and VLN-CE.
  2. Download the Habitat-Matterport 3D Research Dataset (HM3D) from habitat-matterport-3dresearch:
      • hm3d-train-habitat-v0.2.tar
      • hm3d-val-habitat-v0.2.tar
  3. Download the annotations (PointNav, VLN-CE) and trained models from Baidu Netdisk.
  4. Download the pre-trained waypoint predictor from the provided link.
  5. Install torch_kdtree from torch_kdtree for k-nearest-neighbor feature search.
  6. Install tinycudann from tiny-cuda-nn for faster multi-layer perceptrons (MLPs).
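torch_kdtree is used for k-nearest feature search, and the model builds on a neural radiance representation. The sketch below illustrates the two underlying techniques in plain Python, under stated assumptions: features of the k nearest points in a feature cloud are aggregated at a query location by inverse-distance weighting, and feature vectors sampled along a ray are composited with NeRF-style volume rendering. The function names, the weighting scheme, and the directly supplied densities are illustrative assumptions; this is not the HNR implementation and does not call the torch_kdtree or tinycudann APIs.

```python
import math
from typing import List, Sequence


def knn_feature(query: Sequence[float], points: List[Sequence[float]],
                feats: List[List[float]], k: int) -> List[float]:
    """Inverse-distance-weighted mean of the features of the k nearest points.

    Brute-force O(N) search; a KD-tree (as torch_kdtree provides) returns the
    same neighbours much faster for large feature clouds.
    """
    nearest = sorted((math.dist(query, p), i) for i, p in enumerate(points))[:k]
    weights = [1.0 / (d + 1e-8) for d, _ in nearest]
    total = sum(weights)
    return [
        sum(w * feats[i][j] for w, (_, i) in zip(weights, nearest)) / total
        for j in range(len(feats[0]))
    ]


def render_feature(sample_feats: List[List[float]], sigmas: List[float],
                   deltas: List[float]) -> List[float]:
    """NeRF-style volume rendering of feature vectors along one ray.

    sigmas: densities at the samples (assumed given; an MLP would predict them).
    deltas: distances between consecutive samples.
    """
    out = [0.0] * len(sample_feats[0])
    transmittance = 1.0  # fraction of the ray not yet absorbed
    for feat, sigma, delta in zip(sample_feats, sigmas, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this segment
        weight = transmittance * alpha
        out = [o + weight * x for o, x in zip(out, feat)]
        transmittance *= 1.0 - alpha
    return out


# Toy feature cloud: two nearby observed points and one distant one.
points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 5.0, 5.0)]
feats = [[1.0, 0.0], [0.0, 1.0], [9.0, 9.0]]
print(knn_feature((0.5, 0.0, 0.0), points, feats, k=2))  # prints [0.5, 0.5]

# An opaque first sample hides everything behind it.
print(render_feature([[1.0, 2.0], [5.0, 5.0]], [1e6, 1.0], [1.0, 1.0]))  # prints [1.0, 2.0]
```

Predicting features rather than pixels means the rendered output can be fed directly to a navigation model, which is the robustness and efficiency argument made in the abstract.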

Citation

@inproceedings{wang2024lookahead,
  title={Lookahead Exploration with Neural Radiance Representation for Continuous Vision-Language Navigation},
  author={Wang, Zihan and Li, Xiangyang and Yang, Jiahao and Liu, Yeqi and Hu, Junjie and Jiang, Ming and Jiang, Shuqiang},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2024}
}

Acknowledgments

Our code is based on ETPNav, nerf-pytorch and torch_kdtree. Thanks for their great work!

About

Official implementation of Lookahead Exploration with Neural Radiance Representation for Continuous Vision-Language Navigation (CVPR'24 Highlight).
