histopathology image caption

A dataset of 262,777 patches extracted from 991 H&E-stained gastric whole slide images of adenocarcinoma subtypes, paired with captions extracted from medical reports. For more details, see the paper cited below.

captions.csv contains the columns id, subtype, and text, where id identifies the whole slide image from which the patches were extracted. Each patch filename is prefixed with that id: {id}_{random hash}.jpg. The patches can be downloaded from here.
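The filename convention above means each patch can be matched to its caption by stripping the hash suffix. The following is a minimal sketch of that lookup, not the authors' loader. It assumes that captions.csv has a header row naming the id, subtype, and text columns, and that the random hash contains no underscore, so the id is everything before the last underscore. The function names are hypothetical.

```python
import csv
from pathlib import Path


def slide_id_from_patch(filename):
    """Return the slide id prefix of a patch named {id}_{random hash}.jpg.

    Assumption: the hash contains no underscore, so we split on the last one.
    """
    stem = Path(filename).stem
    slide_id, sep, _hash = stem.rpartition("_")
    if not sep or not slide_id:
        raise ValueError(f"unexpected patch filename: {filename!r}")
    return slide_id


def load_captions(csv_path):
    """Map slide id -> (subtype, text). Assumes a header row: id,subtype,text."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        return {row["id"]: (row["subtype"], row["text"]) for row in csv.DictReader(f)}


def pair_patches_with_captions(patch_dir, csv_path):
    """Yield (patch_path, subtype, text) for each patch whose slide id has a caption."""
    captions = load_captions(csv_path)
    for patch in sorted(Path(patch_dir).glob("*.jpg")):
        slide_id = slide_id_from_patch(patch.name)
        if slide_id in captions:
            subtype, text = captions[slide_id]
            yield patch, subtype, text
```

Because all patches from one slide share a single caption, any train/validation split built on top of this pairing would normally be made per slide id rather than per patch.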

The dataset is provided for research use only.

If you use this dataset, please cite:

@misc{tsuneki2022inference,
      title={Inference of captions from histopathological patches}, 
      author={Masayuki Tsuneki and Fahdi Kanavati},
      year={2022},
      eprint={2202.03432},
      archivePrefix={arXiv},
      primaryClass={eess.IV}
}

Running the training script for the baseline model

Build the Docker image:

docker build -t histo-captions .

Assuming the patches have been extracted to /mnt/data/patches/x20 and the captions.csv file is at /mnt/data/captions.csv, you can run training with the default settings:

docker run -v /mnt/data:/data -it histo-captions  python train.py
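The -v /mnt/data:/data flag mounts the host directory /mnt/data at /data inside the container, so a missing file on the host only shows up once the container has started. As an optional pre-flight check, the sketch below verifies the host-side layout described above before launching Docker. The function name is hypothetical and the script is not part of the repository. It only checks the two paths named in this README.

```python
from pathlib import Path


def check_data_layout(data_root="/mnt/data"):
    """Return a list of problems found under data_root (an empty list means OK)."""
    root = Path(data_root)
    problems = []
    # captions.csv is expected directly under the mounted directory.
    if not (root / "captions.csv").is_file():
        problems.append(f"missing {root / 'captions.csv'}")
    # Patches are expected under patches/x20, as in the paths given above.
    patch_dir = root / "patches" / "x20"
    if not patch_dir.is_dir():
        problems.append(f"missing directory {patch_dir}")
    elif next(patch_dir.glob("*.jpg"), None) is None:
        problems.append(f"no .jpg patches in {patch_dir}")
    return problems


if __name__ == "__main__":
    for problem in check_data_layout():
        print(problem)
```

If the data lives elsewhere on the host, pass that directory to check_data_layout and use the same directory on the left-hand side of the -v flag.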

To check for available options, run

docker run -v /mnt/data:/data -it histo-captions  python train.py --help
