histo_cap_transformers

Paper Link: https://arxiv.org/abs/2312.01435. Accepted at the IEEE International Symposium on Biomedical Imaging (ISBI) 2024, held in Athens, Greece.

Here is the inference pipeline. Refer to the Jupyter notebook generation_notebook_5layers_updated.ipynb for a compact demonstration of how inference is done using our trained model.

Google Drive Link to trained model weights.

Method overview

This work builds on my prior work on HistoCap, accepted in the ML4H 2023 Findings track, by replacing the LSTM decoder with a BERT-based decoder. With the BERT-based decoder we are also able to include the tissue type, the gender, and the caption itself in the generated text, whereas the LSTM-based decoder was limited by its vocabulary.
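
As a rough illustration of how tissue type and gender can ride along with the caption, the sketch below packs the three fields into one target string and splits a generated string back into fields so that each field can be scored separately. The separator, the function names, and the sample values are all hypothetical; the actual target format used by the training scripts is not documented here.

```python
from typing import List, Tuple

SEP = " ; "  # hypothetical field separator, not taken from the repository


def build_target(tissue: str, gender: str, caption: str) -> str:
    """Join the three fields into a single decoder target string."""
    return SEP.join([tissue, gender, caption])


def split_prediction(text: str) -> Tuple[str, str, str]:
    """Split generated text back into (tissue, gender, caption).

    Missing fields come back as empty strings, so a malformed
    generation simply counts as a wrong prediction.
    """
    parts = text.split(SEP, 2)
    parts += [""] * (3 - len(parts))
    return parts[0], parts[1], parts[2]


def field_accuracy(predicted: List[str], expected: List[str]) -> float:
    """Fraction of positions where the predicted field matches exactly."""
    matches = sum(p == e for p, e in zip(predicted, expected))
    return matches / len(expected)


# Made-up values, for illustration only.
target = build_target("lung", "female", "toy caption text")
tissue, gender, caption = split_prediction(target)
print(tissue, gender, caption)
```

With a scheme like this, classification accuracy for tissue type or gender reduces to an exact-match comparison on the first or second field, while the remaining text is scored as a caption.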

Abstract:

Deep learning for histopathology has been successfully used for disease classification, image segmentation and more. However, combining image and text modalities using current state-of-the-art methods has been a challenge due to the high resolution of histopathology images. Automatic report generation for histopathology images is one such challenge. In this work, we show that using an existing pre-trained Vision Transformer in a two-step process of first using it to encode 4096x4096 sized patches of the Whole Slide Image (WSI) and then using it as the encoder and a pre-trained Bidirectional Encoder Representations from Transformers (BERT) model for language modeling-based decoder for report generation, we can build a fairly performant and portable report generation mechanism that takes into account the whole of the high resolution image, instead of just the patches. Our method allows us to not only generate and evaluate captions that describe the image, but also helps us classify the image into tissue types and the gender of the patient as well. Our best performing model achieves a 79.98% accuracy in Tissue Type classification and 66.36% accuracy in classifying the sex of the patient the tissue came from, with a BLEU-4 score of 0.5818 in our caption generation task.

To run the Lightning scripts, install the necessary packages from requirements.txt and then run the training scripts. Details on the different scripts are given below.

data_files/ directory: Contains the train/test/val pandas dataframes, stored as pickle files to preserve data types.
model_vit_bert.py: Contains the model definitions.
dataloader.py: Contains the custom PyTorch dataloader.
patch_4k_h5.py: Contains the code for patching high resolution WSIs (in the form of SVS images) and saving the patches in HDF5 format.
generate4k_256clsreps.py: Extracts the representations from the pre-trained ViT. WARNING: To run this script, first download the following GitHub repo: https://github.com/mahmoodlab/HIPT
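
To make the patching step more concrete, here is a minimal sketch of how the top-left corners of non-overlapping 4096x4096 patches could be enumerated for a slide of a given size. It is not the code in patch_4k_h5.py: the function name is hypothetical, and dropping partial patches at the right and bottom edges is an assumption (the real script may pad, overlap, or filter background patches instead).

```python
from typing import Iterator, Tuple

PATCH_SIZE = 4096  # patch size named in the abstract


def iter_patch_origins(
    width: int, height: int, patch_size: int = PATCH_SIZE
) -> Iterator[Tuple[int, int]]:
    """Yield (x, y) top-left corners of patches that fit fully inside the slide.

    Assumption: partial patches at the right/bottom edges are skipped.
    """
    for y in range(0, height - patch_size + 1, patch_size):
        for x in range(0, width - patch_size + 1, patch_size):
            yield (x, y)


# Toy slide dimensions, chosen only to show the edge handling.
origins = list(iter_patch_origins(width=10000, height=9000))
print(len(origins))  # 4
print(origins)       # [(0, 0), (4096, 0), (0, 4096), (4096, 4096)]
```

Each origin would then be handed to a slide reader to pull out the pixel region, and the resulting arrays written to an HDF5 file. Those steps are left out here because they depend on library choices and settings that this README does not spell out.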

Training scripts:

Training script: training_script_vit_bert_5layers.py.
Evaluation script: evaluation5layers.py.
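
For readers unfamiliar with the BLEU-4 metric reported in the abstract, the following is a minimal, unsmoothed sentence-level sketch: clipped n-gram precisions for n = 1 to 4, combined by a geometric mean and scaled by a brevity penalty. It is an illustration only. The function names are hypothetical, the evaluation script may rely on a library implementation, and reported scores are typically corpus-level (n-gram counts pooled over all captions), which this single-sentence version does not do.

```python
import math
from collections import Counter
from typing import List


def ngram_counts(tokens: List[str], n: int) -> Counter:
    """Count every contiguous n-gram in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu4(reference: List[str], candidate: List[str]) -> float:
    """Unsmoothed BLEU-4 for one candidate against one reference."""
    if not candidate:
        return 0.0
    log_precisions = []
    for n in range(1, 5):
        cand = ngram_counts(candidate, n)
        ref = ngram_counts(reference, n)
        total = sum(cand.values())
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(count, ref[gram]) for gram, count in cand.items())
        if total == 0 or clipped == 0:
            return 0.0  # no smoothing: any zero precision gives a zero score
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty: penalize candidates shorter than the reference.
    c, r = len(candidate), len(reference)
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(sum(log_precisions) / 4)


# Toy token sequences, not real captions.
reference = ["a", "b", "c", "d", "e"]
print(sentence_bleu4(reference, reference))             # 1.0
print(sentence_bleu4(reference, ["a", "b", "c", "d"]))  # shorter candidate, penalized
print(sentence_bleu4(reference, ["x", "y", "z", "w"]))  # 0.0
```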
