
MarsRetrieval: Benchmarking Vision-Language Models for Planetary-Scale Geospatial Retrieval on Mars

📘 Paper | 🤗 Datasets

Shuoyuan Wang1, Yiran Wang2, Hongxin Wei1*

1Department of Statistics and Data Science, Southern University of Science and Technology, Shenzhen, China
2Department of Earth and Space Sciences, Southern University of Science and Technology, Shenzhen, China
*Corresponding author

Overview

We introduce MarsRetrieval, an extensive retrieval benchmark for evaluating the utility of vision-language models in Martian geospatial discovery. Specifically, MarsRetrieval organizes evaluation into three complementary tasks: (1) paired image–text retrieval, (2) landform retrieval, and (3) global geo-localization, covering multiple spatial scales and diverse geomorphic origins. MarsRetrieval aims to bridge the gap between multimodal AI capabilities and the needs of real-world planetary research.


Setup

1. Installation

For installation and other package requirements, please follow the instructions detailed in docs/INSTALL.md.

2. Data preparation

Please follow the instructions at docs/DATASET.md to prepare all datasets.

Quick Start

Example runs by task (e.g., PE-Core, Qwen3-VL-Embedding, MarScope):

Paired Image-Text Retrieval

GPU_ID=0
bash scripts/paired_image_text_retrieval/openclip.sh ${GPU_ID}
bash scripts/paired_image_text_retrieval/qwen3_vl_embedding.sh ${GPU_ID}
bash scripts/paired_image_text_retrieval/marscope.sh ${GPU_ID}
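
For reference, the sketch below shows how paired image–text retrieval is typically scored with Recall@K over precomputed, L2-normalized embeddings. The array names and random data are illustrative assumptions, not the repository's actual interfaces.

import numpy as np

def recall_at_k(image_emb, text_emb, k=5):
    # image_emb[i] and text_emb[i] are assumed to be a ground-truth pair
    sims = image_emb @ text_emb.T                      # cosine similarity (unit-norm rows)
    ranks = np.argsort(-sims, axis=1)                  # texts ranked per image query
    hits = (ranks[:, :k] == np.arange(len(sims))[:, None]).any(axis=1)
    return float(hits.mean())

rng = np.random.default_rng(0)
img = rng.normal(size=(100, 512)); img /= np.linalg.norm(img, axis=1, keepdims=True)
txt = rng.normal(size=(100, 512)); txt /= np.linalg.norm(txt, axis=1, keepdims=True)
print(f"image-to-text R@5: {recall_at_k(img, txt):.3f}")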

Landform Retrieval

GPU_ID=0
bash scripts/landform_retrieval/openclip.sh ${GPU_ID}
bash scripts/landform_retrieval/qwen3_vl_embedding.sh ${GPU_ID}
bash scripts/landform_retrieval/marscope.sh ${GPU_ID}
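
As a reference for what the landform task measures, the sketch below scores a single query with Precision@K: the fraction of the top-K retrieved gallery images that share the query's landform class. The class count and all variable names are illustrative assumptions.

import numpy as np

def precision_at_k(query_emb, gallery_emb, query_label, gallery_labels, k=10):
    sims = gallery_emb @ query_emb                     # cosine similarity (unit-norm inputs)
    topk = np.argsort(-sims)[:k]                       # indices of the K nearest gallery images
    return float((gallery_labels[topk] == query_label).mean())

rng = np.random.default_rng(0)
gallery = rng.normal(size=(1000, 512)); gallery /= np.linalg.norm(gallery, axis=1, keepdims=True)
labels = rng.integers(0, 8, size=1000)                 # 8 hypothetical landform classes
query = gallery[0] + 0.1 * rng.normal(size=512); query /= np.linalg.norm(query)
print(f"P@10: {precision_at_k(query, gallery, labels[0], labels):.2f}")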

Global Geo-Localization

We recommend building the retrieval database in distributed mode first; experiments can then be run on a single GPU.

# Distributed DB build with 8 GPUs
bash scripts/geolocalization/openclip.sh 0,1,2,3,4,5,6,7

# Single-GPU runs
GPU_ID=0
bash scripts/geolocalization/openclip.sh ${GPU_ID}
bash scripts/geolocalization/qwen3_vl_embedding.sh ${GPU_ID}
bash scripts/geolocalization/marscope.sh ${GPU_ID}
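
Conceptually, the geo-localization pipeline first builds a database of reference-tile embeddings (the distributed step above) and then answers each query by nearest-neighbour search over that database. The sketch below mimics this with random data; the sharding loop, coordinate ranges, and variable names are illustrative assumptions, not the repository's implementation.

import numpy as np

rng = np.random.default_rng(0)
n_tiles, dim, n_shards = 10_000, 512, 8

# "Distributed" build: each shard embeds its own slice of reference tiles.
shards = []
for _ in range(n_shards):
    emb = rng.normal(size=(n_tiles // n_shards, dim))
    shards.append(emb / np.linalg.norm(emb, axis=1, keepdims=True))
db = np.concatenate(shards)                                      # merged embedding database
coords = rng.uniform([-90, -180], [90, 180], size=(n_tiles, 2))  # lat/lon of each reference tile

# Single-GPU style query: the nearest tile's coordinates become the predicted location.
q = db[42] + 0.05 * rng.normal(size=dim)
q /= np.linalg.norm(q)
pred = coords[np.argmax(db @ q)]
print(f"predicted location: lat={pred[0]:.2f}, lon={pred[1]:.2f}")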

For more models, see the scripts under scripts/geolocalization, scripts/landform_retrieval, and scripts/paired_image_text_retrieval.

Supported Models

Encoder-based

| Model | Paper | Code |
| --- | --- | --- |
| DFN2B-CLIP-ViT-L-14 | link | link |
| ViT-L-16-SigLIP-384 | link | link |
| ViT-L-16-SigLIP2-512 | link | link |
| PE-Core-L-14-336 | link | link |
| BGE-VL-large | link | link |
| aimv2-large-patch14-224 | link | link |
| aimv2-large-patch14-448 | link | link |
| dinov3-vitl16 | link | link |
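
To make the encoder-based entries concrete, the sketch below scores one image–text pair through OpenCLIP's hf-hub loading path, using DFN2B-CLIP-ViT-L-14 as an example. The hub identifier, image path, and caption are assumptions for illustration; the repository's scripts may wrap model loading differently.

import torch
import open_clip
from PIL import Image

# Assumed hf-hub identifier for DFN2B-CLIP-ViT-L-14; other encoders load analogously.
model, _, preprocess = open_clip.create_model_and_transforms('hf-hub:apple/DFN2B-CLIP-ViT-L-14')
tokenizer = open_clip.get_tokenizer('hf-hub:apple/DFN2B-CLIP-ViT-L-14')
model.eval()

image = preprocess(Image.open('crater.png')).unsqueeze(0)         # hypothetical query image
text = tokenizer(['an impact crater with a central peak'])         # hypothetical caption

with torch.no_grad():
    img_emb = model.encode_image(image)
    txt_emb = model.encode_text(text)
    img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
    txt_emb = txt_emb / txt_emb.norm(dim=-1, keepdim=True)
    print('cosine similarity:', (img_emb @ txt_emb.T).item())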

MLLM-based

| Model | Paper | Code |
| --- | --- | --- |
| E5-V | link | link |
| gme | link | link |
| B3++ | link | link |
| jina-embeddings-v4 | link | link |
| VLM2Vec-V2.0 | link | link |
| Ops-MM-embedding-v1 | link | link |
| Qwen3-VL-Embedding | link | link |

Citation

If you find this useful in your research, please consider citing:

@article{wang2026marsretrieval,
  title={MarsRetrieval: Benchmarking Vision-Language Models for Planetary-Scale Geospatial Retrieval on Mars},
  author={Wang, Shuoyuan and Wang, Yiran and Wei, Hongxin},
  journal={arXiv preprint arXiv:2602.13961},
  year={2026}
}
