lzw-lzw/awesome-remote-sensing-vision-language-models


Awesome remote sensing vision language models

This repository collects vision-language models for remote sensing, including advanced methods and commonly used datasets for applications such as image-text retrieval, visual question answering and pretraining.

If you find any relevant papers that are not included here, please feel free to open a pull request at any time.

Surveys

Paper Published in Code/Project
Vision-Language Models in Remote Sensing: Current Progress and Future Trends arxiv 2023 -
The Potential of Visual ChatGPT For Remote Sensing arxiv 2023 -
Brain-inspired Remote Sensing Foundation Models and Open Problems: A Comprehensive Survey J-STARS 2023 -

Remote Sensing Vision Language Model

Paper Published in Code/Project
RSGPT: A Remote Sensing Vision Language Model and Benchmark arxiv 2023 code
RemoteGLM 2023 code
Tree-GPT: Modular Large Language Model Expert System for Forest Remote Sensing Image Understanding and Interactive Analysis arxiv 2023 -
Towards Automatic Satellite Images Captions Generation Using Large Language Models arxiv 2023 -
GeoChat: Grounded Large Vision-Language Model for Remote Sensing arxiv 2023 code
SkyScript: A Large and Semantically Diverse Vision-Language Dataset for Remote Sensing AAAI 2024 code

Applications

Pretraining

Paper Published in Code/Project
S-CLIP: Semi-supervised Vision-Language Pre-training using Few Specialist Captions arxiv 2023 code
RemoteCLIP: A Vision Language Foundation Model for Remote Sensing arxiv 2023 code
RS5M: A Large Scale Vision-Language Dataset for Remote Sensing Vision-Language Foundation Model arxiv 2023 Project

Image Captioning

Paper Published in Code/Project
Deep Semantic Understanding of High Resolution Remote Sensing Image CITS 2016 -
Can a Machine Generate Humanlike Language Descriptions for a Remote Sensing Image? TGRS 2017 -
Exploring models and data for remote sensing image caption generation TGRS 2017 code
Natural language description of remote sensing images based on deep learning IGARSS 2017 -
Description Generation for Remote Sensing Images Using Attribute Attention Mechanism Remote Sensing 2019 -
VAA: Visual aligning attention model for remote sensing image captioning IEEE Access 2019 -
Exploring Multi-Level Attention and Semantic Relationship for Remote Sensing Image Captioning IEEE Access 2019 -
A multi-level attention model for remote sensing image captions Remote Sensing 2020 -
Remote sensing image captioning via variational autoencoder and reinforcement learning Knowledge-Based Systems 2020 -
Truncation cross entropy loss for remote sensing image captioning TGRS 2020 -
Word–Sentence Framework for Remote Sensing Image Captioning TGRS 2020 code
A novel SVM-based decoder for remote sensing image captioning TGRS 2021 -
High-resolution remote sensing image captioning based on structured attention TGRS 2021 code
Exploring transformer and multilabel classification for remote sensing image captioning GRSL 2022 -
NWPU-Captions dataset and MLCA-Net for remote sensing image captioning TGRS 2022 -
Remote Sensing Image Change Captioning With Dual-Branch Transformers: A New Method and a Large Scale Dataset TGRS 2022 code
Transforming remote sensing images to textual descriptions INT J APPL EARTH OBS 2022 -
Remote-sensing image captioning based on multilayer aggregated transformer GRSL 2022 -
VLCA: vision-language aligning model with cross-modal attention for bilingual remote sensing image captioning J SYST ENG ELECTRON 2023 -
Multi-source interactive stair attention for remote sensing image captioning Remote Sensing 2023 -
Changes to Captions: An Attentive Network for Remote Sensing Change Captioning arxiv 2023 code
Bootstrapping Interactive Image-Text Alignment for Remote Sensing Image Captioning arxiv 2023 code

Text-based Image Generation

Paper Published in Code/Project
Retro-Remote Sensing: Generating Images From Ancient Texts J-STARS 2019 -
Remote sensing image augmentation based on text description for waterside change detection Remote Sensing 2021 -
Text-to-remote-sensing-image generation with structured generative adversarial networks GRSL 2021 -
Txt2img-MHN: Remote sensing image generation from text using modern Hopfield network arxiv 2022 code

Image-text Retrieval

Paper Published in Code/Project
TextRS: Deep bidirectional triplet network for matching text to remote sensing images Remote Sensing 2020 -
Deep unsupervised embedding for remote sensing image retrieval using textual cues Applied Sciences 2020 -
A deep semantic alignment network for the cross-modal image-text retrieval in remote sensing J-STARS 2021 -
A lightweight multi-scale crossmodal text-image retrieval method in remote sensing TGRS 2021 code
Remote sensing cross-modal text-image retrieval based on global and local information TGRS 2022 code
Multilanguage transformer for improved text to remote sensing image retrieval J-STARS 2022 -
Exploring a fine-grained multiscale method for cross-modal remote sensing image retrieval TGRS 2022 code
Contrasting dual transformer architectures for multi-modal remote sensing image retrieval Applied Sciences 2023 -
Parameter-Efficient Transfer Learning for Remote Sensing Image-Text Retrieval arxiv 2023 -
Direction-Oriented Visual-semantic Embedding Model for Remote Sensing Image-text Retrieval arxiv 2023 -

Visual Question Answering

Paper Published in Code/Project
RSVQA: Visual question answering for remote sensing data TGRS 2020 code
Mutual Attention Inception Network for Remote Sensing Visual Question Answering TGRS 2021 code
How to find a good image-text embedding for remote sensing visual question answering? ECML-PKDD 2021 -
Cross-Modal Visual Question Answering for Remote Sensing Data DICTA 2021 -
RSVQA meets BigEarthNet: a new, large-scale visual question answering dataset for remote sensing IGARSS 2021 code
Self-Paced Curriculum Learning for Visual Question Answering on Remote Sensing Data IGARSS 2021 -
From easy to hard: Learning language-guided curriculum for visual question answering on remote sensing data TGRS 2022 code
Language transformers for remote sensing visual question answering IGARSS 2022 -
Open-ended remote sensing visual question answering with transformers IJRS 2022 -
Bi-modal transformer-based approach for visual question answering in remote sensing imagery TGRS 2022 -
Prompt-RSVQA: Prompting visual context to a language model for remote sensing visual question answering CVPRW 2022 -
Change detection meets visual question answering TGRS 2022 code
A spatial hierarchical reasoning network for remote sensing visual question answering TGRS 2023 -
Multilingual Augmentation for Robust Visual Question Answering in Remote Sensing Images JURSE 2023 -
LiT-4-RSVQA: Lightweight Transformer-based Visual Question Answering in Remote Sensing IGARSS 2023 code
Charting New Territories: Exploring the Geographic and Geospatial Capabilities of Multimodal LLMs arxiv 2023 code

Visual Grounding

Paper Published in Code/Project
Visual Grounding in Remote Sensing Images ACMMM 2022 data
RSVG: Exploring data and models for visual grounding on remote sensing data TGRS 2023 code

Scene Classification

Paper Published in Code/Project
Zero-shot scene classification for high spatial resolution remote sensing images TGRS 2017 -
Fine-grained object recognition and zero-shot learning in remote sensing imagery TGRS 2017 -
Structural alignment based zero-shot classification for remote sensing scenes ICECE 2018 -
A distance-constrained semantic autoencoder for zero-shot remote sensing scene classification J-STARS 2021 -
Learning deep crossmodal embedding networks for zero-shot remote sensing image scene classification TGRS 2021 -
Generative adversarial networks for zero-shot remote sensing scene classification Applied Sciences 2022 -
APPLeNet: Visual Attention Parameterized Prompt Learning for Few-Shot Remote Sensing Image Generalization using CLIP CVPR 2023 code

Object Detection

Paper Published in Code/Project
Text semantic fusion relation graph reasoning for few-shot object detection on remote sensing images Remote Sensing 2023 -
Few-shot object detection in aerial imagery guided by text-modal knowledge TGRS 2023 -

Semantic Segmentation

Paper Published in Code/Project
Semi-supervised contrastive learning for few-shot segmentation of remote sensing images Remote Sensing 2022 -
Few-shot segmentation of remote sensing images using deep metric learning GRSL 2022 -
Language-aware domain generalization network for cross-scene hyperspectral image classification TGRS 2023 code
RSPrompter: Learning to Prompt for Remote Sensing Instance Segmentation based on Visual Foundation Model arxiv 2023 code
RRSIS: Referring Remote Sensing Image Segmentation arxiv 2023 -
CPSeg: Finer-grained Image Semantic Segmentation via Chain-of-Thought Language Prompting arxiv 2023 -

Others

Dataset

Image Captioning Dataset

Dataset Home/Github Download link
RSICD Github [BaiduYun] [Google Drive]
Sydney-Captions Github [BaiduYun]
UCM-Captions Github [BaiduYun]
NWPU-RESISC45 Github [BaiduYun] [OneDrive]
DIOR-Captions - -
RS-5M Github [HuggingFace]
LEVIR-CC Github [Google Drive]
SkyScript Github

Text-based Image Generation Dataset

Text-based Image Retrieval Dataset

Dataset Home/Project Download link
RSITMD Github [BaiduYun] [Google Drive]

Visual Question Answering Dataset

Dataset Home/Project Download link
RSVQA Home [data]
RSVQA×BEN [Github] [Home] -
RSIVQA Github -
CDVQA Github -

Visual Grounding Dataset

Dataset Home/Project Download link
DIOR-RSVG Github [Google Drive]

Scene Classification Dataset

Dataset Home/Project Download link
NWPU-RESISC45 Home [OneDrive] [BaiduYun]
AID Home [OneDrive] [BaiduYun]
UC Merced Land-Use (UCM) Home -
SATIN Home [HuggingFace]

Object Detection Dataset

Dataset Home/Project Download link
NWPU VHR-10 Home [OneDrive] [BaiduYun]
DIOR Home [Google Drive] [BaiduYun]
FAIR1M - [BaiduYun]

Semantic Segmentation Dataset

Dataset Home/Project Download link
Vaihingen Home [BaiduYun]
Potsdam Home [BaiduYun]
Toronto Home -
GID Home [BaiduYun code:GID5] [OneDrive]