GitHub - farlit/VPN: Official implementation of "VPN: Visual Prompt Navigation"

VPN: Visual Prompt Navigation

Shuo Feng, Zihan Wang, Yuchen Li, Rui Kong, Hengyi Cai, Shuaiqiang Wang, Gim Hee Lee, Piji Li, Shuqiang Jiang

This repository is the official implementation of VPN: Visual Prompt Navigation.

While natural language is commonly used to guide embodied agents, the inherent ambiguity and verbosity of language often hinder the effectiveness of language-guided navigation in complex environments. To this end, we propose Visual Prompt Navigation (VPN), a novel paradigm that guides agents to navigate using only user-provided visual prompts within 2D top-view maps. This visual prompt primarily focuses on marking the visual navigation trajectory on a top-down view of a scene, offering intuitive and spatially grounded guidance without relying on language instructions. It is more friendly for non-expert users and reduces interpretive ambiguity. We build VPN tasks in both discrete and continuous navigation settings, constructing two new datasets, R2R-VP and R2R-CE-VP, by extending existing R2R and R2R-CE episodes with corresponding visual prompts. Furthermore, we introduce VPNet, a dedicated baseline network to handle the VPN tasks, with two data augmentation strategies: view-level augmentation (altering initial headings and prompt orientations) and trajectory-level augmentation (incorporating diverse trajectories from large-scale 3D scenes), to enhance navigation performance. Extensive experiments evaluate how visual prompt forms, top-view map formats, and data augmentation strategies affect the performance of visual prompt navigation.

Requirements for VPN

Install the Matterport3D simulator and the Python environment for R2R-VP: follow instructions here.
Download annotations, preprocessed features, trained models and preprocessing code from Baidu Netdisk (you should put the folder "datasets" in "VPN/").
Training & Evaluation for R2R-VP:

conda activate vlnduet
cd map_nav_src
bash scripts/run_r2r.sh
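
Before launching the script above, it can help to confirm that the downloaded "datasets" folder is where the download step says it should be. The following is a minimal sketch, not part of this repository: the `check_dir` helper is hypothetical, and it assumes the check is run from the repository root ("VPN/") before `cd map_nav_src`.

```shell
# Hypothetical helper (not part of this repository): report whether a
# required folder exists so a missing download is caught before training.
check_dir() {
    if [ -d "$1" ]; then
        echo "found: $1"
    else
        echo "missing: $1" >&2
        return 1
    fi
}

# Assumption: run from the repository root (VPN/), before `cd map_nav_src`.
if check_dir datasets; then
    echo "datasets folder is in place"
fi
```

The helper only tests that the folder exists; it does not verify the contents of the download.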

Requirements for VPN-CE

Install the Habitat simulator and the Python environment for R2R-CE-VP: follow instructions here.
Download annotations, preprocessed features, trained models and preprocessing code from Baidu Netdisk (you should put the folder "data" in "VPN/VPN_CE/").
Training & Evaluation for R2R-CE-VP:

conda activate vlnce
cd VPN_CE
CUDA_VISIBLE_DEVICES=0,1 bash run_r2r/main.bash train 2333  # training
CUDA_VISIBLE_DEVICES=0,1 bash run_r2r/main.bash eval  2333  # evaluation
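
The two commands above can also be chained so that evaluation starts only if training exits successfully. The following is a minimal sketch under stated assumptions: the `run_stage` wrapper and the `GPUS` and `DRY_RUN` variables are hypothetical and not part of this repository, and the `2333` argument is passed through unchanged because this README does not say what it controls. By default the sketch only prints the commands it would run.

```shell
# Hypothetical settings (not part of this repository).
GPUS="${GPUS:-0,1}"        # same device list as the commands above
DRY_RUN="${DRY_RUN:-1}"    # 1 = print commands only, 0 = actually run them

# Hypothetical wrapper around `bash run_r2r/main.bash <mode> <arg>`.
run_stage() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "CUDA_VISIBLE_DEVICES=$GPUS bash run_r2r/main.bash $1 $2"
    else
        CUDA_VISIBLE_DEVICES="$GPUS" bash run_r2r/main.bash "$1" "$2"
    fi
}

# Assumption: the current directory is VPN_CE/ and the vlnce environment is active.
# Evaluation runs only if the training stage returns exit status 0.
run_stage train 2333 && run_stage eval 2333
```

Setting `DRY_RUN=0` in the environment before running the sketch executes the real commands instead of printing them.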

Citation

If you find this work useful, please consider citing our paper:

Feng S, Wang Z, Li Y, et al. VPN: Visual Prompt Navigation[C]//Proceedings of the AAAI Conference on Artificial Intelligence. 2026, 40(22): 18253-18261.

Contact

Feel free to contact Shuo Feng via email fengshuo@nuaa.edu.cn for more support.

Acknowledgments

Our code is based on VLN-DUET, ETPNav and ScaleVLN. Thanks for their great work!

