This repository provides the official implementation of training Vision Transformers (ViT) for (2D) medical imaging tasks as well as the usage of the pre-trained ViTs in the following paper:
Delving into Masked Autoencoders for Multi-Label Thorax Disease Classification
Junfei Xiao, Yutong Bai, Alan Yuille, Zongwei Zhou
Johns Hopkins University
IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2023
paper | code
- Instructions for preparing datasets.
- Instructions for pretraining and fine-tuning.
Our codebase follows the official MAE implementation and uses some additional packages.
You may use one of the following commands to build the environment with Conda or Pip.
Conda:
conda env create -n medical_mae -f medical_mae.yml
Pip:
conda create -n medical_mae python=3.8
conda activate medical_mae
pip install -r requirements.txt
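After building the environment, a quick check that the core packages are importable can save a failed training launch. The following is a minimal sketch, not part of the repository: the package list is an assumption based on the MAE codebase this repo follows, so adjust it to match `requirements.txt`.

```python
import importlib.util
import sys

# Assumed package list (hypothetical): edit it to mirror requirements.txt.
ASSUMED_PACKAGES = ["torch", "torchvision", "timm"]


def missing_packages(names):
    """Return the top-level package names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    missing = missing_packages(ASSUMED_PACKAGES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All assumed packages are importable.")
```

Run it inside the activated `medical_mae` environment; any name it reports as missing can be installed with pip.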
The MIMIC-CXR, CheXpert, and ChestX-ray14 datasets are publicly available on their official sites. You can download them or request access under their respective agreements.
You may also download them through the following links, for research use only and subject to the official agreements.
MIMIC-CXR (JPG): https://physionet.org/content/mimic-cxr-jpg/2.0.0/
CheXpert (v1.0-small): https://www.kaggle.com/datasets/ashery/chexpert
ChestX-ray14: https://www.kaggle.com/datasets/nih-chest-xrays/data
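Because the task is multi-label, each image needs a multi-hot target vector. The sketch below illustrates one way to build it for ChestX-ray14, assuming the labels arrive as a pipe-separated string (as in the `Finding Labels` column of the NIH metadata file, with `No Finding` for normal studies). The class order and the function name `encode_findings` are illustrative assumptions; the data loaders in this repository may use a different order or format.

```python
# The 14 ChestX-ray14 disease names. The ordering here is an assumption;
# use the order expected by the checkpoint you fine-tune or evaluate.
CLASSES = [
    "Atelectasis", "Cardiomegaly", "Effusion", "Infiltration", "Mass",
    "Nodule", "Pneumonia", "Pneumothorax", "Consolidation", "Edema",
    "Emphysema", "Fibrosis", "Pleural_Thickening", "Hernia",
]


def encode_findings(finding_labels, classes=CLASSES):
    """Convert a pipe-separated label string into a multi-hot list of 0/1."""
    present = {name.strip() for name in finding_labels.split("|")}
    # Names outside `classes` (e.g. "No Finding") simply leave all entries at 0.
    return [1 if name in present else 0 for name in classes]


if __name__ == "__main__":
    print(encode_findings("Cardiomegaly|Effusion"))
    print(encode_findings("No Finding"))
```

An image with no listed disease maps to the all-zero vector, which is what a sigmoid-based multi-label loss expects for a negative example.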
Pre-training instructions are in PRETRAIN.md.
Fine-tuning instructions are in FINETUNE.md.
The following table lists the pre-trained checkpoints used in Table 1 of the paper.
You can download all the weights in the table with this link (google drive).
Model | Pretrained Dataset | Method | Pretrained | Finetuned (NIH Chest X-ray) | mAUC |
---|---|---|---|---|---|
DenseNet-121 | ImageNet | Categorization | torchvision official | google drive | 82.2 |
ResNet-50 | ImageNet | MoCo v2 | google drive | google drive | 80.9 |
ResNet-50 | ImageNet | BYOL | google drive | google drive | 81.0 |
ResNet-50 | ImageNet | SwAV | google drive | google drive | 81.5 |
DenseNet-121 | X-rays (0.3M) | MoCo v2 | google drive | google drive | 80.6 |
DenseNet-121 | X-rays (0.3M) | MAE | google drive | google drive | 81.2 |
ViT-Small/16 | ImageNet | Categorization | DeiT Official | google drive | 79.6 |
ViT-Small/16 | ImageNet | MAE | google drive | google drive | 78.6 |
ViT-Small/16 | X-rays (0.3M) | MAE | google drive | google drive | 82.3 |
ViT-Base/16 | X-rays (0.5M) | MAE | google drive | google drive | 83.0 |
Model | Pretrained Dataset | Finetuned (Chest X-ray) | mAUC | Finetuned (CheXpert) | mAUC | Finetuned (COVIDx) | Accuracy |
---|---|---|---|---|---|---|---|
ViT-Small/16 | X-rays (0.3M) | google drive | 82.3 | google drive | 89.2 | google drive | 95.2 |
ViT-Base/16 | X-rays (0.5M) | google drive | 83.0 | google drive | 89.3 | google drive | 95.3 |
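The mAUC columns report the area under the ROC curve averaged over disease classes, shown as a percentage. As a reference for how such a number is obtained, here is a dependency-free sketch of the metric definition. It is not the repository's evaluation code, which may rely on a library routine; the function names are hypothetical.

```python
def binary_auc(labels, scores):
    """AUC as the probability that a positive outscores a negative (ties count 0.5).

    Returns None when a class has no positives or no negatives.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        return None
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def mean_auc(label_rows, score_rows):
    """Average the per-class AUC over all classes where it is defined."""
    num_classes = len(label_rows[0])
    aucs = []
    for c in range(num_classes):
        auc = binary_auc([row[c] for row in label_rows],
                         [row[c] for row in score_rows])
        if auc is not None:
            aucs.append(auc)
    if not aucs:
        raise ValueError("AUC is undefined for every class")
    return sum(aucs) / len(aucs)


if __name__ == "__main__":
    # Four images, two classes: multi-hot targets and predicted scores.
    targets = [[1, 0], [0, 1], [1, 1], [0, 0]]
    scores = [[0.9, 0.4], [0.1, 0.8], [0.8, 0.3], [0.3, 0.1]]
    print(100 * mean_auc(targets, scores))
```

The pairwise loop is quadratic in the number of samples, so it is meant for understanding and small sanity checks; for a full test set a rank-based implementation is the practical choice.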
If you use this code or our pre-trained weights in your research, please cite our paper:
@inproceedings{xiao2023delving,
title={Delving into masked autoencoders for multi-label thorax disease classification},
author={Xiao, Junfei and Bai, Yutong and Yuille, Alan and Zhou, Zongwei},
booktitle={Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision},
pages={3588--3600},
year={2023}
}
This repo is released under the Apache 2.0 license.
This work was supported by the Lustgarten Foundation for Pancreatic Cancer Research.
Our code is built upon facebookresearch/mae.