Fine-tuning CLIP using ROCO dataset which contains image-caption pairs from PubMed articles.

Nana2929/PubMedCLIP

 
 

PubMedCLIP in Medical Visual Question Answering

This repository includes PubMedCLIP, a version of CLIP fine-tuned on ROCO image-caption pairs. We also provide pipelines for incorporating PubMedCLIP as an alternative pre-trained visual encoder in the MEVF and QCR medical visual question answering pipelines. Our experiments show that PubMedCLIP yields an improvement of up to 3% in medical visual question answering.
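For readers unfamiliar with how CLIP learns from image-caption pairs, the sketch below illustrates the symmetric contrastive objective that CLIP-style models are generally trained with. It is a minimal, dependency-free illustration and not the training code of this repository: the function names, the toy embeddings, and the fixed `temperature=0.07` are assumptions chosen for the example, and the actual fine-tuning setup may differ.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cross_entropy(logits, target):
    """Cross-entropy of one row of logits against a target index (log-sum-exp form)."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def clip_contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric contrastive loss over a batch of paired embeddings.

    image_embs[i] and text_embs[i] are assumed to come from the same
    image-caption pair. The temperature value is illustrative only.
    """
    images = [l2_normalize(v) for v in image_embs]
    texts = [l2_normalize(v) for v in text_embs]
    n = len(images)

    # logits[i][j]: temperature-scaled cosine similarity of image i and caption j
    logits = [
        [sum(a * b for a, b in zip(img, txt)) / temperature for txt in texts]
        for img in images
    ]

    # Image-to-text direction: each image should select its own caption.
    loss_i2t = sum(cross_entropy(logits[i], i) for i in range(n)) / n
    # Text-to-image direction: each caption should select its own image.
    loss_t2i = sum(
        cross_entropy([logits[i][j] for i in range(n)], j) for j in range(n)
    ) / n

    return (loss_i2t + loss_t2i) / 2
```

With toy inputs, correctly paired embeddings such as `clip_contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])` give a loss close to zero, while swapping the captions gives a much larger one. Fine-tuning on domain-specific pairs exploits this gap to pull matching medical images and captions together in the embedding space.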

Citation

If you use this work in an academic publication, please cite the arXiv paper by Sedigheh Eslami, Gerard de Melo, and Christoph Meinel:

Sedigheh Eslami, Gerard de Melo, Christoph Meinel (2021). 
Does CLIP Benefit Visual Question Answering in the Medical Domain as Much as it Does in the General Domain?
arXiv e-prints 2112.13906, 2021.

BibTeX entry:

@article{EslamiDeMeloMeinel2021CLIPMedical,
  author = {{Eslami}, Sedigheh and {de Melo}, Gerard and {Meinel}, Christoph},
  title = {Does {CLIP} Benefit Visual Question Answering in the Medical Domain as Much as it Does in the General Domain?},
  journal = {arXiv e-prints},
  keywords = {Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Machine Learning},
  year = 2021,
  month = dec,
  eid = {arXiv:2112.13906},
  archivePrefix = {arXiv},
  eprint = {2112.13906},
  primaryClass = {cs.CV},
}
