An instruction–visual semantic consistency framework that explicitly aligns instructions with visual observations by identifying and preserving landmark regions before visual compression. This project builds on NaVid-VLN-CE, adding the instruction–visual semantic consistency framework to improve on the original model's performance.
In the repository, the train file contains the training code, landmark_head contains the functional code, and the arch file contains the modifications to NaVid-VLN-CE, including the interface to landmark_head.py.
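To make the idea of preserving landmark regions before visual compression more concrete, here is a minimal sketch. It is not the code in landmark_head.py. The function name `compress_with_landmarks`, the `num_landmarks` parameter, the cosine-similarity scoring, and the mean-pooling step are all assumptions chosen for illustration, and the actual scoring and compression in this repository may differ.

```python
import numpy as np


def compress_with_landmarks(visual_tokens, instruction_emb, num_landmarks=2):
    """Hypothetical sketch: keep instruction-relevant tokens, pool the rest.

    visual_tokens:   (N, D) array of per-region visual features.
    instruction_emb: (D,) array summarizing the instruction.
    Returns (compressed_tokens, landmark_indices).
    """
    # Cosine similarity between every visual token and the instruction.
    token_norms = np.linalg.norm(visual_tokens, axis=1) + 1e-8
    instr_norm = np.linalg.norm(instruction_emb) + 1e-8
    scores = (visual_tokens @ instruction_emb) / (token_norms * instr_norm)

    # Treat the top-scoring tokens as landmarks (assumed selection rule),
    # and report their indices in the original token order.
    k = min(num_landmarks, len(visual_tokens))
    landmark_idx = np.sort(np.argsort(-scores, kind="stable")[:k])

    keep = np.zeros(len(visual_tokens), dtype=bool)
    keep[landmark_idx] = True
    landmarks = visual_tokens[keep]

    # Compress all remaining tokens into one mean-pooled summary token.
    rest = visual_tokens[~keep]
    if len(rest) == 0:
        return landmarks, landmark_idx
    summary = rest.mean(axis=0, keepdims=True)
    return np.concatenate([landmarks, summary], axis=0), landmark_idx


# Toy usage: four 2-D "region" features and one instruction embedding.
tokens = np.array([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.0, -1.0]])
instruction = np.array([1.0, 0.0])
compressed, idx = compress_with_landmarks(tokens, instruction, num_landmarks=2)
```

In this sketch the landmark tokens pass through unchanged, so the compressed sequence still carries the regions the instruction refers to, while the remaining tokens are reduced to a single summary.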
Notice to Readers: The code in this repository is directly associated with our manuscript submitted to The Visual Computer, titled:
"Enhancing Cross-Modal Semantic Alignment for Vision-and-Language Navigation in Continuous Environments"
This repository supports the research presented in the paper and includes implementations that reflect the methods and experiments described in the manuscript.