houjunlin/Awesome-Medical-Vision-Language-Learning

Awesome Medical Vision Language Learning

Datasets

| Dataset | Year | Modality | Images | Text |
|---|---|---|---|---|
| MIMIC-CXR [data][paper] | 2019 | Chest X-ray | 377,110 | 227,827 |
| CheXpert [data][paper] | 2019 | Chest X-ray | 224,316 | 224,316 |
| ROCO [data][paper] | 2018 | CT, Ultrasound, X-Ray, Fluoroscopy, PET, Mammography, MRI, Angiography, PET-CT | 81,825 | 81,825 |
| MedICaT [data][paper] | 2020 | CT, Ultrasound, X-Ray, Fluoroscopy, PET, Mammography, MRI, Angiography, PET-CT | 217,060 | 217,060 |

Survey

  • VLP: A Survey on Vision-Language Pre-training. arxiv 2022. [paper]

  • Vision-Language Pre-training: Basics, Recent Advances, and Future Trends. arxiv 2022. [paper]

  • Beyond Medical Imaging: A Review of Multimodal Deep Learning in Radiology. techrxiv 2022. [paper]

Tutorial

  • Vision-Language Pretraining: Current Trends and the Future. ACL 2022. [link]

  • Recent Advances in Vision-and-Language Pre-training. CVPR 2022. [link]

Vision Language Pretraining

Text Encoder

| Text Encoder | Year | Corpus |
|---|---|---|
| BioBERT | 2020 | PubMed |
| ClinicalBERT | 2019 | MIMIC-III |
| PubMedBERT | 2022 | PubMed |
| CXR-BERT | 2022 | PubMed+MIMIC-III/CXR |

How to Train

2023

  • PMC-CLIP: Contrastive Language-Image Pre-training using Biomedical Documents. arxiv 2023. [paper][code]

  • [BiomedCLIP] Large-Scale Domain-Specific Pretraining for Biomedical Vision-Language Processing. arxiv 2023. [paper][model]

  • Vision-Language Modelling for Radiological Imaging and Reports in the Low Data Regime. MIDL 2023. [paper]

  • Towards Unifying Medical Vision-and-Language Pre-training via Soft Prompts. arxiv 2023. [paper][code]

  • [MRM] Advancing Radiograph Representation Learning with Masked Record Modeling. ICLR 2023. [paper][code]

  • [BioViL-T] Learning to Exploit Temporal Structure for Biomedical Vision–Language Processing. CVPR 2023. [paper]

  • MedKLIP: Medical Knowledge Enhanced Language-Image Pre-Training. arxiv 2023. [paper][code]

2022

  • [MGCA] Multi-Granularity Cross-modal Alignment for Generalized Medical Visual Representation Learning. NeurIPS 2022. [paper][code]

  • MedCLIP: Contrastive Learning from Unpaired Medical Images and Text. EMNLP 2022. [paper][code]

  • [M3AE] Multi-Modal Masked Autoencoders for Medical Vision-and-Language Pre-Training. MICCAI 2022. [paper][code]

  • Breaking with Fixed Set Pathology Recognition through Report-Guided Contrastive Training. MICCAI 2022. [paper]

  • Align, Reason and Learn: Enhancing Medical Vision-and-Language Pre-training with Knowledge. MM 2022. [paper][code]

  • [MedViLL] Multi-Modal Understanding and Generation for Medical Images and Text via Vision-Language Pre-Training. JBHI 2022. [paper][code]

  • [REFERS] Generalized radiograph representation learning via cross-supervision between images and free-text radiology reports. Nature Machine Intelligence 2022. [paper][code]

  • [BioViL] Making the Most of Text Semantics to Improve Biomedical Vision–Language Processing. ECCV 2022. [paper]

  • [LoVT] Joint learning of localized representations from medical images and reports. ECCV 2022. [paper]

2021

  • [Local-MI] Multimodal Representation Learning via Maximization of Local Mutual Information. MICCAI 2021. [paper]

  • GLoRIA: A Multimodal Global-Local Representation Learning Framework for Label-efficient Medical Image Recognition. ICCV 2021. [paper]

  • Self-supervised Image-text Pre-training With Mixed Data In Chest X-rays. arxiv 2021. [paper]

2020

  • A Comparison of Pre-trained Vision-and-Language Models for Multimodal Representation Learning across Medical Images and Reports. BIBM 2020. [paper]

  • [ConVIRT] Contrastive Learning of Medical Visual Representations from Paired Images and Text. MLHC 2022. [paper][code]

2018

  • Unsupervised Multimodal Representation Learning across Medical Images and Reports. NeurIPS workshop 2018. [paper]

How to Use

2023

  • Medical Image Understanding with Pretrained Vision Language Models: A Comprehensive Study. ICLR 2023. [paper]

2022

  • Adapting Pretrained Vision-Language Foundational Models to Medical Imaging Domains. NeurIPS workshop 2022. [paper]

2021

  • [PubMedCLIP] Does CLIP Benefit Visual Question Answering in the Medical Domain as Much as it Does in the General Domain? arxiv 2021. [paper][code]

Vision Language Task

Refer to Awesome-Multimodal-Applications-In-Medical-Imaging for more papers.

Segmentation

  • LViT: Language meets Vision Transformer in Medical Image Segmentation. arxiv 2022. [paper][code]

Generation

  • RoentGen: Vision-Language Foundation Model for Chest X-ray Generation. arxiv 2022. [paper]

About

Papers and Public Datasets for Medical Vision-Language Learning
