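This repository contains the source code and binary datasets related to the paper
"Explainable patch-level histopathology tissue type detection with Bag-of-local-Features models and data augmentation" by Gergő Galiger and Zalán Bodó, Acta Universitatis Sapientiae, Informatica, 2023.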
We publish the following here:
- binbagnets/models: PyTorch implementation of BagNet models [1] adapted for binary classification,
- binbagnets/data: data manipulation functions for format conversion, binary dataset creation and data augmentation,
- binbagnets/plots: human-readable analysis plot of BagNet heatmaps,
- examples: example Python scripts demonstrating the usage of the above functions.
The binary datasets used in the paper can be found on Google Drive. These were created based on the two weakly-annotated datasets published in [2].
Feel free to create custom binary datasets using the provided data manipulation functions following the examples below.
pip install git+https://github.com/galigergergo/BolFTissueDetect.git
The code provides a simple way to initialize the BinBagNet models in PyTorch. After installation, the models can be loaded as follows:
from binbagnets.models import pytorchnet
pytorch_model = pytorchnet.binbagnet17(pretrained=True)
To change the model, replace binbagnet17 with binbagnet9 or binbagnet33. The number at the end of the name refers to the maximum local patch size that the network can integrate over. We also included binbagnet_small, a compact version of the architecture intended for testing purposes.
Pre-trained models use the ImageNet weights from [1], which have to be trained further before the models can be used on binary datasets.
For the usage of data manipulation and plotting functionalities, please refer to the examples below.
As in [1], the binary BagNet models expect inputs with the standard torchvision preprocessing:
- RGB channels,
- [channel, x, y] format,
- [0, 1] interval pixel values,
- pixel values normalized by mean and standard deviation:
  - mean = [0.485, 0.456, 0.406]
  - std = [0.229, 0.224, 0.225]
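The preprocessing steps listed above can be sketched with plain NumPy as follows. This is only an illustration, not code from this repository: the function name `preprocess` and the 224x224 dummy image are hypothetical, and in practice the equivalent torchvision transforms would typically be used.

```python
import numpy as np

# Normalization constants from the list above.
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def preprocess(image_hwc_uint8: np.ndarray) -> np.ndarray:
    """Turn an RGB uint8 image of shape (H, W, 3) into a normalized (3, H, W) array.

    Hypothetical helper for illustration; not part of the binbagnets package.
    """
    if image_hwc_uint8.ndim != 3 or image_hwc_uint8.shape[2] != 3:
        raise ValueError("expected an RGB image of shape (H, W, 3)")
    scaled = image_hwc_uint8.astype(np.float32) / 255.0  # pixel values in [0, 1]
    normalized = (scaled - MEAN) / STD  # per-channel, broadcast over the last axis
    return normalized.transpose(2, 0, 1)  # channels first


# Dummy all-white image; the size is an arbitrary choice for this example.
image = np.full((224, 224, 3), 255, dtype=np.uint8)
tensor = preprocess(image)
print(tensor.shape)
```

A batch dimension still has to be added before such an array is passed to a model.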
The Python scripts from the examples directory demonstrate the usage of the source code from this repository. These scripts use an example dataset, which is located in the datasets/EXAMPLE directory and contains a small version of the LUAD-HistoSeg dataset published in [2].
In order to create binary datasets for the BinBagNet models, the example dataset first has to be converted from the original LUAD-HistoSeg format to ImageNet format. The formatted dataset can then be used to create binary datasets for all classes, and these can be further expanded using data augmentation. Run examples/dataset_preparation.py to carry out all of these dataset processing tasks:
python examples/dataset_preparation.py
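To illustrate the idea behind binary dataset creation, here is a minimal sketch that splits patch file names into positive and negative examples for one class. It is not the implementation in binbagnets/data. It assumes that each file name carries a multi-hot label such as `[1 0 0 1]` and that the class order is TE, NEC, LYM, TAS; both are assumptions about the LUAD-HistoSeg naming convention, and all function names and file names are hypothetical.

```python
import re

# Assumed class order of the multi-hot label embedded in each file name.
CLASSES = ("TE", "NEC", "LYM", "TAS")

# Matches a bracketed, space-separated list of 0/1 flags, e.g. "[1 0 0 1]".
LABEL_PATTERN = re.compile(r"\[([01](?: [01])*)\]")


def parse_labels(file_name: str) -> dict:
    """Map each class name to 0 or 1 based on the label in the file name."""
    match = LABEL_PATTERN.search(file_name)
    if match is None:
        raise ValueError(f"no label found in {file_name!r}")
    bits = [int(bit) for bit in match.group(1).split()]
    if len(bits) != len(CLASSES):
        raise ValueError(f"expected {len(CLASSES)} flags in {file_name!r}")
    return dict(zip(CLASSES, bits))


def split_binary(file_names, positive_class: str) -> dict:
    """Group file names by whether the given class is present in the patch."""
    if positive_class not in CLASSES:
        raise ValueError(f"unknown class {positive_class!r}")
    groups = {"positive": [], "negative": []}
    for name in file_names:
        key = "positive" if parse_labels(name)[positive_class] else "negative"
        groups[key].append(name)
    return groups


# Made-up file names for illustration only.
names = ["patch_a-[0 0 1 0].png", "patch_b-[1 0 0 1].png", "patch_c-[1 0 1 0].png"]
lym_split = split_binary(names, "LYM")
print(lym_split)
```

Repeating the split for every entry of `CLASSES` would give one binary dataset per class, mirroring the "binary datasets for all classes" step described above.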
The newly created binary datasets can now be used to train a BinBagNet model using the examples/binary_training.py script. This is a modified version of the ImageNet training script from [3], which includes the required image preprocessing steps mentioned above. To train a BagNet17 model for binary classification on the newly created LYM_aug dataset for two epochs, run:
python examples/binary_training.py datasets/EXAMPLE/binary/LYM_aug/data --epochs 2 -a binbagnet17
The trained model weights from the YM_aug/models directory can now be used to generate a heatmap analysis plot for one of the segmented images from the test set of the example dataset. The plot can be generated by running:
python examples/heatmap_analysis_plot.py
If you find this project useful, please consider citing our paper in resulting publications:
@article{galiger2023explainable,
title={Explainable patch-level histopathology tissue type detection with Bag-of-local-Features models and data augmentation},
author={Galiger, Gergő and Bodó, Zalán},
journal={Acta Universitatis Sapientiae, Informatica},
year={2023},
}
[1] Approximating CNNs with Bag-of-local-Features models works surprisingly well on ImageNet. Wieland Brendel and Matthias Bethge, ICLR, 2019.
[2] Multi-Layer Pseudo-Supervision for Histopathology Tissue Semantic Segmentation using Patch-level Classification Labels. Han, Chu, et al., Medical Image Analysis 80, 2022.