Skip to content

masatsuneki/histopathology-image-caption

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

histopathology image caption

A dataset of 262,777 patches extracted from 991 H&E-stained gastric slides with Adenocarcinoma subtypes paired with captions extracted from medical reports. For more details see paper.

captions.csv contains id,subtype,text columns, where id designates the whole slide image id from which the patches were extracted. The patches filenames have id in the prefix as follows: {id}_{random hash}.jpg. The patches can be downloaded from here.

Dataset is provided for research use only.

If you use this Dataset, please cite:

@misc{tsuneki2022inference,
      title={Inference of captions from histopathological patches}, 
      author={Masayuki Tsuneki and Fahdi Kanavati},
      year={2022},
      eprint={2202.03432},
      archivePrefix={arXiv},
      primaryClass={eess.IV}
}

Running training script for baseline model

build the docker image

docker build -t histo-captions .

Assuming the the patches have been extracted at /mnt/data/patches/x20 and the captions.csv file is at /mnt/data/captions.csv, you can run it with default settings with

docker run -v /mnt/data:/data -it histo-captions  python train.py 

To check for available options, run

docker run -v /mnt/data:/data -it histo-captions  python train.py --help

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published