Video pretraining advances 3D deep learning on chest CTs

This repository contains code to train and evaluate models on the RSNA PE and LIDC-IDRI datasets for our paper, Video pretraining advances 3D deep learning on chest CTs.

Table of Contents

  1. System Requirements
  2. Installation
  3. Datasets
  4. Usage
  5. Demo
  6. Citation

System Requirements

Hardware requirements

The data processing steps require only a standard computer with enough RAM to support the in-memory operations.

For training and testing models, a machine with a CUDA-capable GPU and sufficient GPU memory is recommended.

Software requirements

OS requirements

All models have been trained and tested on a Linux system (Ubuntu 16.04).

Python dependencies

All dependencies are listed in environment.yml.

Installation

  1. Install Anaconda so you can create a Python environment.
  2. Clone this repo (from the command line: git clone git@github.com:rajpurkarlab/chest-ct-pretraining.git).
  3. Create the environment: conda env create -f environment.yml.
  4. Activate the environment: source activate pe_models.
  5. Install PyTorch 1.7.1 built for your CUDA version.

Installation should take less than 10 minutes with a stable internet connection.

Datasets

RSNA

Download the dataset from: RSNA PE Dataset

Make sure to update PROJECT_DATA_DIR in pe_models/constants.py with the path to the directory that contains the RSNA dataset.
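
For example, the relevant line in constants.py might look like the sketch below; the path is a placeholder for your local data directory.

# pe_models/constants.py -- illustrative excerpt; the real file may define
# additional constants. Point this at the folder that holds the RSNA dataset.
PROJECT_DATA_DIR = "/path/to/project/data"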

Preprocessing

Please download the pre-processed label file, which contains the data split and DICOM header information, using this link and place it in the RSNA data directory.

Alternatively, you can create the pre-processed file by running:

$ python pe_models/preprocess/rsna.py

Test

To verify that the dataset is correct and that data load in the expected format, run the unit tests:

$ python -W ignore -m unittest

Note that this might take a couple of minutes to complete.

You can also visually inspect example inputs in data/test/ after the tests complete.

LIDC

Download the LIDC-IDRI dataset from TCIA Public Access into a PROJECT_DATA_DIR/lidc folder.

Preprocessing

Install pylidc and set up your ~/.pylidcrc file using the official installation instructions.

You can then create all the necessary pre-processed files by running:

$ python pe_models/preprocess/lidc.py

You can then set the type in an experiment YAML to lidc-window or lidc-2d to train on the LIDC dataset.
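
For instance, the relevant fragment of an experiment YAML might look like the sketch below; only the type value is documented here, and the surrounding structure is illustrative.

# Illustrative fragment; see ./configs/ for complete examples.
data:
  type: lidc-window   # or lidc-2d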

Usage

To train a model, run the following:

python run.py --config <path_to_config_file> --train
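
For example, using the demo config that ships with this repository:

python run.py --config ./data/demo/resnet18_demo.yaml --train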

For more documentation, please run:

python run.py --help

To test a model, use the --test flag, making sure that either the --checkpoint flag is specified or that the config YAML contains a checkpoint entry:

python run.py --config <path_to_config_file> --checkpoint <path_to_ckpt> --test

To featurize all studies in a dataset (for example, to run a downstream 1D model), use the --test_split all flag.

Example configs can be found in ./configs/.

Run a hyperparameter sweep with wandb

Example hyperparameter sweep configs for each model can be found in ./configs/. To launch a sweep:

wandb sweep <path_to_sweep_config>
wandb agent <sweep-id>
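
For reference, a wandb sweep file generally takes the shape below; the metric and parameter names here are placeholders, so consult the actual sweep configs in ./configs/.

# Generic wandb sweep-config shape; metric/parameter names are
# hypothetical, not this repository's exact schema.
program: run.py
method: random
metric:
  name: val_loss
  goal: minimize
parameters:
  lr:
    values: [0.0001, 0.001, 0.01]

wandb sweep prints a sweep ID, which you then pass to wandb agent.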

Custom dataset

To train or test a model on a custom dataset:

  1. Ensure that your data adhere to the same format as the RSNA/LIDC datasets. (See Example)
  2. Create a dataloader similar to the RSNA/LIDC ones in ./datasets and update ./datasets/__init__.py to include the name of your custom dataloader (a sketch follows this list).
  3. Make sure the data.type in your config file points to the name of your dataloader.
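
As a rough starting point, a custom dataloader could take the shape below: a minimal sketch assuming a PyTorch Dataset backed by a hypothetical CSV index file. The existing RSNA/LIDC classes in ./datasets are the authoritative reference.

import pandas as pd
import torch
from torch.utils.data import Dataset

class CustomDataset(Dataset):
    """Minimal sketch; mirror the existing RSNA/LIDC dataset classes."""

    def __init__(self, csv_path, split="train"):
        # Hypothetical index CSV with one row per study and a `split` column.
        df = pd.read_csv(csv_path)
        self.df = df[df["split"] == split].reset_index(drop=True)

    def __len__(self):
        return len(self.df)

    def __getitem__(self, idx):
        row = self.df.iloc[idx]
        volume = torch.load(row["volume_path"])  # pre-processed CT volume tensor
        label = torch.tensor(row["label"], dtype=torch.float32)
        return volume, label

After defining the class, export it from ./datasets/__init__.py so that data.type can resolve it by name.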

Demo

To run the train/test script on a simulated demo dataset, use:

python run.py --config ./data/demo/resnet18_demo.yaml --checkpoint <path_to_ckpt> --test

You should expect the following results:

{'test/mean_auprc': 0.9107142686843872,
 'test/mean_auroc': 0.9166666865348816,
 'test/negative_exam_for_pe_auprc': 0.9107142686843872,
 'test/negative_exam_for_pe_auroc': 0.9166666865348816,
 'test_loss': 0.6920164227485657,
 'test_loss_epoch': 0.6920164227485657}

With a GPU, this should take less than 10 minutes to run.

Citation

If our work was useful in your research, please consider citing:

@inproceedings{ke2023video,
    title={Video Pretraining Advances 3D Deep Learning on Chest CT Tasks},
    author={Alexander Ke and Shih-Cheng Huang and Chloe P O'Connell and Michal Klimont and Serena Yeung and Pranav Rajpurkar},
    booktitle={Medical Imaging with Deep Learning},
    year={2023},
    eprint={2304.00546},
    archivePrefix={arXiv},
    primaryClass={eess.IV}
}
