STRIVE: Structured Representation Integrating VLM Reasoning for Efficient Object Navigation

STRIVE is a two-stage object-navigation framework that couples an incrementally built, multi-layer environment representation with selective VLM reasoning. It achieves state-of-the-art success rate and navigation efficiency on four simulated benchmarks and is validated on a real robot.

Overview

Vision-Language Models (VLMs) bring rich priors and strong reasoning to object navigation, but two challenges limit their practical use:

Parsing and structuring complex environment information on the fly.
Deciding when and how to query a VLM: querying at every step causes unnecessary backtracking and wasted computation, especially in large continuous environments.

STRIVE addresses both by incrementally constructing a multi-layer representation of viewpoints, object nodes, and room nodes during navigation:

Viewpoints and object nodes drive intra-room exploration and accurate target localization.
Room nodes support efficient inter-room planning.
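To make the three layers concrete, here is a minimal Python sketch of how such a representation could be organized. All class and method names (`Viewpoint`, `ObjectNode`, `RoomNode`, `SceneGraph`, `frontier`, `find`, `room_path`) are hypothetical and are not taken from the STRIVE code. The breadth-first room search is a simple stand-in for inter-room planning, not the paper's planner.

```python
from collections import deque
from dataclasses import dataclass, field

# Hypothetical classes for illustration only; not the repository's data structures.


@dataclass
class Viewpoint:
    vid: int
    position: tuple[float, float]  # (x, y) in map coordinates
    explored: bool = False


@dataclass
class ObjectNode:
    oid: int
    label: str
    position: tuple[float, float]
    confidence: float  # detector confidence in [0, 1]


@dataclass
class RoomNode:
    rid: int
    room_type: str  # e.g. "kitchen"
    viewpoints: list[Viewpoint] = field(default_factory=list)
    objects: list[ObjectNode] = field(default_factory=list)
    neighbors: set[int] = field(default_factory=set)  # ids of connected rooms

    def frontier(self) -> list[Viewpoint]:
        """Unexplored viewpoints: candidates for intra-room exploration."""
        return [v for v in self.viewpoints if not v.explored]

    def find(self, label: str, min_conf: float = 0.5) -> list[ObjectNode]:
        """Object nodes matching the goal label: candidates for target localization."""
        return [o for o in self.objects if o.label == label and o.confidence >= min_conf]


@dataclass
class SceneGraph:
    rooms: dict[int, RoomNode] = field(default_factory=dict)

    def add_room(self, room: RoomNode) -> None:
        self.rooms[room.rid] = room

    def connect(self, a: int, b: int) -> None:
        # Undirected room-to-room edge used for inter-room planning.
        self.rooms[a].neighbors.add(b)
        self.rooms[b].neighbors.add(a)

    def room_path(self, start: int, goal: int) -> list[int]:
        """Breadth-first search over room nodes (fewest room transitions); [] if unreachable."""
        prev: dict[int, int | None] = {start: None}
        queue = deque([start])
        while queue:
            cur = queue.popleft()
            if cur == goal:
                path = []
                node: int | None = cur
                while node is not None:
                    path.append(node)
                    node = prev[node]
                return path[::-1]
            for nxt in self.rooms[cur].neighbors:
                if nxt not in prev:
                    prev[nxt] = cur
                    queue.append(nxt)
        return []
```

Planning over a handful of room nodes is far cheaper than planning over every viewpoint, which is one plausible reading of why a room layer helps inter-room planning.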

On top of this representation, a two-stage policy combines high-level planning guided by VLM reasoning with low-level VLM-assisted exploration to locate the goal object efficiently and reliably.
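The control flow of a two-stage policy with selective VLM querying can be sketched as follows. This is an illustration under stated assumptions, not STRIVE's policy: `Room` and `plan_next_action` are hypothetical names, the VLM is passed in as a plain callable (`ask_vlm`) rather than a real Gemini call, and the low-level stage uses a nearest-unexplored-viewpoint heuristic where STRIVE uses VLM-assisted exploration. What it does show is the gating idea: the high-level VLM query happens only once the current room has nothing left to explore.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, Optional

Point = tuple[float, float]


@dataclass
class Room:
    """Hypothetical per-room summary (not STRIVE's data structure)."""
    name: str
    objects: dict[str, Point] = field(default_factory=dict)  # detected label -> position
    frontier: list[Point] = field(default_factory=list)      # unexplored viewpoints


def plan_next_action(
    rooms: dict[str, Room],
    current: str,
    goal: str,
    agent: Point,
    ask_vlm: Callable[[str, list[str]], str],
) -> tuple[str, Optional[object]]:
    room = rooms[current]

    # Low level, step 1: the goal is already mapped in this room, so head to it.
    if goal in room.objects:
        return ("goto_object", room.objects[goal])

    # Low level, step 2: keep exploring this room. A nearest-viewpoint
    # heuristic stands in for STRIVE's VLM-assisted exploration.
    if room.frontier:
        return ("goto_viewpoint", min(room.frontier, key=lambda p: math.dist(p, agent)))

    # High level: the room is exhausted, so only now ask the VLM which
    # room with unexplored viewpoints is most likely to contain the goal.
    candidates = [name for name, r in rooms.items() if name != current and r.frontier]
    if not candidates:
        return ("stop", None)
    choice = ask_vlm(goal, candidates)
    # Guard against a VLM answer that is not one of the offered rooms.
    return ("goto_room", choice if choice in candidates else candidates[0])
```

Because the high-level VLM call is made only on room transitions in this sketch, the number of queries grows with the number of rooms visited rather than with the number of steps taken.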

Results. STRIVE achieves state-of-the-art performance on HM3D v1, HM3D v2, RoboTHOR, and MP3D, improving success rate (SR) by +13.1% and navigation efficiency, measured as Success weighted by Path Length (SPL), by +6.2%. Real-robot validation across 120 episodes in 10 indoor environments further demonstrates its robustness.
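For reference, SR is the fraction of episodes in which the agent successfully stops at the goal object, and SPL weights each success by path efficiency using the standard definition SPL = (1/N) Σ S_i · l_i / max(p_i, l_i), where S_i is the success indicator, l_i the shortest-path distance from the start to the nearest goal, and p_i the length of the path the agent actually took. The sketch below computes both for a hypothetical `Episode` record; it is only for intuition, since habitat-lab provides its own success and SPL measures.

```python
from dataclasses import dataclass


@dataclass
class Episode:
    """Hypothetical per-episode record used only for this illustration."""
    success: bool          # S_i: did the agent stop close enough to a goal object?
    shortest_path: float   # l_i: geodesic distance from the start to the nearest goal (m)
    agent_path: float      # p_i: length of the path the agent actually travelled (m)


def success_rate(episodes: list[Episode]) -> float:
    return sum(e.success for e in episodes) / len(episodes)


def spl(episodes: list[Episode]) -> float:
    # Failed episodes contribute 0; successes contribute l_i / max(p_i, l_i) <= 1.
    total = sum(
        e.shortest_path / max(e.agent_path, e.shortest_path)
        for e in episodes
        if e.success
    )
    return total / len(episodes)


episodes = [
    Episode(True, 5.0, 10.0),   # success, but twice the shortest path -> 0.5
    Episode(True, 4.0, 4.0),    # success along the shortest path -> 1.0
    Episode(False, 6.0, 20.0),  # failure -> 0.0
]
print(f"SR = {success_rate(episodes):.3f}, SPL = {spl(episodes):.3f}")
# SR = 0.667, SPL = 0.500
```

SPL can never exceed SR, because each success contributes at most 1; an agent that finds the goal but wanders gets credit for SR while its SPL drops.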

See the paper for full details: STRIVE: Structured Representation Integrating VLM Reasoning for Efficient Object Navigation.

News

2026-01-31 — Paper accepted to ICRA 2026.

Installation

1. Clone the repository

git clone git@github.com:igzat1no/STRIVE.git
cd STRIVE

2. Create a conda environment (Python 3.12)

conda create -n strive python=3.12 -y
conda activate strive
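Before installing the remaining dependencies, it can help to confirm that the intended interpreter is active. A small check, assuming conda's `CONDA_DEFAULT_ENV` variable (set by `conda activate`) and the Python 3.12 requirement above; `check_active_env` is a hypothetical helper, not part of the repository:

```python
import os
import sys


def check_active_env(expected_env="strive", expected_python=(3, 12),
                     version_info=None, environ=None):
    """Return a list of problems (empty if the expected env and Python are active)."""
    version_info = sys.version_info if version_info is None else version_info
    environ = os.environ if environ is None else environ
    problems = []
    major, minor = version_info[0], version_info[1]
    if (major, minor) != tuple(expected_python):
        problems.append(f"Python {major}.{minor} is active, expected "
                        f"{expected_python[0]}.{expected_python[1]}")
    # conda sets CONDA_DEFAULT_ENV to the name of the activated environment.
    active = environ.get("CONDA_DEFAULT_ENV")
    if active != expected_env:
        problems.append(f"conda env {active!r} is active, expected {expected_env!r}")
    return problems


if __name__ == "__main__":
    issues = check_active_env()
    print("\n".join(issues) if issues else "OK: strive env with Python 3.12")
```

The `version_info` and `environ` parameters exist only so the check can be exercised without switching environments.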

3. Install pip dependencies

pip install -r requirements.txt

4. Install habitat-sim and habitat-lab from source

We ship small patches and bug fixes on top of upstream Habitat. Install from our forks and check out the v0.3.2 branch:

5. Install Segment Anything (SAM)

pip install git+https://github.com/facebookresearch/segment-anything.git
wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth

6. Install MMDetection (GroundingDINO)

pip install -U openmim
mim install mmengine
mim install mmcv==2.1.0
git clone https://github.com/open-mmlab/mmdetection.git
cd mmdetection
pip install -v -e .

Download the GroundingDINO Swin-L checkpoint:

wget https://download.openmmlab.com/mmdetection/v3.0/mm_grounding_dino/grounding_dino_swin-l_pretrain_obj365_goldg/grounding_dino_swin-l_pretrain_obj365_goldg-34dcdc53.pth
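OpenMMLab checkpoints conventionally end in `-<8 hex digits>.pth`, where the hex digits are the start of the file's SHA-256 digest. Assuming the GroundingDINO file follows that convention, the sketch below (`sha256_suffix_matches` is a hypothetical helper) can catch a truncated or corrupted download. The SAM file name uses a different pattern (`_4b8939`), so the helper returns `None` for it rather than guessing.

```python
import hashlib
import re
import sys
from pathlib import Path


def sha256_suffix_matches(path, chunk_size=1 << 20):
    """Compare a checkpoint against the SHA-256 prefix embedded in its name.

    Returns True/False when the name ends in '-<8 hex digits>.pth',
    and None when the name carries no such suffix.
    """
    path = Path(path)
    match = re.search(r"-([0-9a-f]{8})\.pth$", path.name)
    if match is None:
        return None
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # Stream in 1 MiB chunks so multi-GB checkpoints are not loaded into memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest().startswith(match.group(1))


if __name__ == "__main__":
    for arg in sys.argv[1:]:
        print(arg, sha256_suffix_matches(arg))
```

Saved as, for example, `check_ckpt.py` (a hypothetical file name), it can be run as `python check_ckpt.py "$GROUNDING_DINO_CHECKPOINT"` once the environment variables below are set.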

Environment Variables

⚠️ Do not commit real secrets to the repository. Set them in your shell config (e.g. ~/.zshrc or ~/.bashrc).

export GEMINI_API_KEY="<YOUR_GEMINI_API_KEY>"
export HABITAT_LAB_PATH="/path/to/habitat-lab/"
export SAM_CHECKPOINT="/path/to/sam_vit_h_4b8939.pth"
export GROUNDING_DINO_PATH="/path/to/mmdetection/"
export GROUNDING_DINO_CHECKPOINT="/path/to/grounding_dino_swin-l_pretrain_obj365_goldg-34dcdc53.pth"
export HM3D_DATA_PATH="/path/to/HM3D_v2/"
export MP3D_DATA_PATH="/path/to/MP3D/"

Reload your shell config (replace ~/.zshrc with ~/.bashrc if you use bash):

source ~/.zshrc

Verify that everything is set:

python -c "import os; keys=['GEMINI_API_KEY','HABITAT_LAB_PATH','SAM_CHECKPOINT','GROUNDING_DINO_PATH','GROUNDING_DINO_CHECKPOINT','HM3D_DATA_PATH','MP3D_DATA_PATH']; print({k: bool(os.getenv(k)) for k in keys})"
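The one-liner above only checks that each variable is non-empty. A slightly stricter sketch also flags values still left as the `<...>` or `/path/to/` placeholders and checks that checkpoint variables point to files while code and data variables point to directories. The file-versus-directory split is inferred from the export list above, and `check_env_paths` is a hypothetical helper, not part of the repository. It never prints the API key itself.

```python
import os
from pathlib import Path

# Split inferred from the export list above (assumption, not from the repository).
SECRET_VARS = ["GEMINI_API_KEY"]
FILE_VARS = ["SAM_CHECKPOINT", "GROUNDING_DINO_CHECKPOINT"]
DIR_VARS = ["HABITAT_LAB_PATH", "GROUNDING_DINO_PATH", "HM3D_DATA_PATH", "MP3D_DATA_PATH"]


def check_env_paths(environ=None):
    """Return {variable: problem} for every variable that looks misconfigured."""
    environ = os.environ if environ is None else environ
    problems = {}
    for key in SECRET_VARS + FILE_VARS + DIR_VARS:
        value = environ.get(key)
        if not value:
            problems[key] = "not set"
        elif value.startswith("<") or value.startswith("/path/to/"):
            problems[key] = "still a placeholder"
        elif key in FILE_VARS and not Path(value).is_file():
            problems[key] = f"file not found: {value}"
        elif key in DIR_VARS and not Path(value).is_dir():
            problems[key] = f"directory not found: {value}"
    return problems


if __name__ == "__main__":
    issues = check_env_paths()
    for key, msg in issues.items():
        print(f"{key}: {msg}")
    print("All variables look good." if not issues else f"{len(issues)} problem(s) found.")
```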

Usage

Run the HM3D evaluation benchmark (default configuration):

python objnav_benchmark_with_process_obs.py

Citation

If you find STRIVE useful in your research, please consider citing:

@misc{zhu2025strivestructuredrepresentationintegrating,
      title={STRIVE: Structured Representation Integrating VLM Reasoning for Efficient Object Navigation},
      author={Haokun Zhu and Zongtai Li and Zhixuan Liu and Wenshan Wang and Ji Zhang and Jonathan Francis and Jean Oh},
      year={2025},
      eprint={2505.06729},
      archivePrefix={arXiv},
      primaryClass={cs.RO},
      url={https://arxiv.org/abs/2505.06729},
}
