The medical detection toolkit is a comprehensive framework featuring:
- 2D + 3D implementations of prevalent object detectors, e.g. Mask R-CNN, Retina Net, Retina U-Net.
- Modular and light-weight structure ensuring sharing of all processing steps (incl. backbone architecture) for comparability of models.
- Training with bounding box and/or pixel-wise annotations.
- Dynamic patching and tiling of 2D + 3D images (for training and inference).
- Weighted consolidation of box predictions across patch-overlaps, ensembles, and dimensions.
- Monitoring + evaluation simultaneously on object and patient level.
- 2D + 3D output visualizations.
- Integration of COCO mean average precision metric.
- Integration of MIC-DKFZ batch generators for extensive data augmentation.
- Easy modification to evaluation of instance segmentation and/or semantic segmentation.
He, Kaiming, et al. "Mask R-CNN." ICCV, 2017.
Lin, Tsung-Yi, et al. "Focal Loss for Dense Object Detection." TPAMI, 2018.
Jaeger, Paul, et al. "Retina U-Net: Embarrassingly Simple Exploitation of Segmentation Supervision for Medical Object Detection." 2018.
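To illustrate the patching and tiling feature listed above, here is a minimal sketch of how an image can be covered with overlapping patches. It is not the toolkit's implementation: the function names `patch_starts` and `tile_coords` and the `min_overlap` parameter are hypothetical and chosen only for this example.

```python
from itertools import product
from math import ceil


def patch_starts(image_len, patch_len, min_overlap):
    """Start indices along one axis so that patches of length patch_len
    cover the whole axis and neighbours overlap by at least min_overlap.
    (Hypothetical helper, not part of the toolkit.)"""
    if patch_len >= image_len:
        return [0]  # a single patch covers the axis (padding not handled here)
    stride = patch_len - min_overlap
    if stride <= 0:
        raise ValueError("min_overlap must be smaller than patch_len")
    n_patches = ceil((image_len - patch_len) / stride) + 1
    # Spread the patches evenly so the last one ends exactly at the border.
    span = image_len - patch_len
    return [round(i * span / (n_patches - 1)) for i in range(n_patches)]


def tile_coords(image_shape, patch_shape, min_overlap):
    """Per-patch tuples of (start, stop) pairs, one pair per axis (2D or 3D)."""
    per_axis = [
        [(s, s + p) for s in patch_starts(length, p, min_overlap)]
        for length, p in zip(image_shape, patch_shape)
    ]
    return list(product(*per_axis))


if __name__ == "__main__":
    for coords in tile_coords((10, 10), (4, 4), min_overlap=1):
        print(coords)
```

The same function works for 3D volumes by passing three-element shapes. Predictions from overlapping patches then have to be consolidated, which is what the weighted box clustering step is for.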
Setup package in virtual environment
git clone https://github.com/pfjaeger/medicaldetectiontoolkit.git
cd medicaldetectiontoolkit
virtualenv -p python3 venv
source venv/bin/activate
pip3 install -e .
Install MIC-DKFZ batch-generators
cd ..
git clone https://github.com/MIC-DKFZ/batchgenerators
cd batchgenerators
pip3 install -e .
cd mdt
Prepare the Data
This framework is meant for you to be able to train models on your own data sets.
An example data loader (for the LIDC lung nodule data set) is provided in medicaldetectiontoolkit/experiments, including thorough documentation to ensure a quick start for your own project. The data generator / data loader for the toy experiments described in the Retina U-Net paper (Jaeger et al., 2018) is provided as well; it can be used for in-depth analyses and is also a convenient tool for testing functionality.
Set I/O paths, model and training specifics in the configs file: medicaldetectiontoolkit/experiments/your_experiment/configs.py
Train the model:
python exec.py --mode train --exp_source experiments/my_experiment --exp_dir path/to/experiment/directory
This copies snapshots of the configs and model to the specified exp_dir, where all outputs will be saved. By default, the data is split into 60% training, 20% validation, and 20% testing data to perform a 5-fold cross validation (this can be changed to a hold-out test set in the configs), and all folds are trained one after another. To train only specific folds, pass them via the --folds argument:
python exec.py --folds 0 1 2 .... # specify any combination of folds [0-4]
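The 60/20/20 split per fold can be pictured with the following sketch. It assumes that the data is cut into five equal chunks and that each fold uses one chunk for testing, the next one for validation, and the remaining three for training. This rotation scheme and the function name `five_fold_splits` are assumptions made for illustration; the toolkit's actual splitting logic may differ.

```python
import random


def five_fold_splits(patient_ids, n_folds=5, seed=0):
    """Return one dict per fold with 'train', 'val' and 'test' id lists.
    (Hypothetical helper; rotation scheme is an assumption.)"""
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)  # reproducible shuffle
    chunks = [ids[i::n_folds] for i in range(n_folds)]
    splits = []
    for fold in range(n_folds):
        test_idx = fold
        val_idx = (fold + 1) % n_folds
        train = [
            pid
            for i, chunk in enumerate(chunks)
            if i not in (test_idx, val_idx)
            for pid in chunk
        ]
        splits.append(
            {"train": train, "val": chunks[val_idx], "test": chunks[test_idx]}
        )
    return splits


if __name__ == "__main__":
    for fold, split in enumerate(five_fold_splits(range(100))):
        print(fold, len(split["train"]), len(split["val"]), len(split["test"]))
```

With this scheme every sample is used for testing in exactly one fold, so test predictions from all folds together cover the full data set.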
Run inference:
python exec.py --mode test --exp_dir path/to/experiment/directory
This runs the prediction pipeline and saves all results to exp_dir.
This framework features all models explored in the Retina U-Net paper (Jaeger et al., 2018), implemented in 2D + 3D:
- Retina U-Net, the proposed model: a simple but effective architecture fusing state-of-the-art semantic segmentation with object detection.
- Mask R-CNN.
- Faster R-CNN+ (Faster R-CNN with RoIAlign).
- Retina Net.
- U-Faster R-CNN+ (the two-stage counterpart of Retina U-Net: Faster R-CNN with auxiliary semantic segmentation).
- DetU-Net (a U-Net-like segmentation architecture with heuristics for object detection).
This framework features training with pixel-wise and/or bounding box annotations. To avoid having to transform box coordinates during data augmentation, we feed the annotation masks through data augmentation (creating a pseudo mask if only bounding box annotations are provided) and draw the boxes afterwards.
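The idea of augmenting masks and drawing boxes afterwards can be sketched in a few lines of NumPy. The helper names `box_to_pseudo_mask` and `mask_to_box` and the box format (y1, x1, y2, x2) with exclusive end coordinates are assumptions made for this example, and a horizontal flip stands in for an arbitrary spatial augmentation.

```python
import numpy as np


def box_to_pseudo_mask(box, shape):
    """Fill the box region with ones to obtain a pseudo segmentation mask.
    Box format (assumed): (y1, x1, y2, x2), end coordinates exclusive."""
    y1, x1, y2, x2 = box
    mask = np.zeros(shape, dtype=np.uint8)
    mask[y1:y2, x1:x2] = 1
    return mask


def mask_to_box(mask):
    """Draw the tight bounding box around all foreground pixels.
    Returns None if the object was augmented out of the image."""
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return None
    return (int(ys.min()), int(xs.min()), int(ys.max()) + 1, int(xs.max()) + 1)


if __name__ == "__main__":
    mask = box_to_pseudo_mask((2, 3, 5, 8), shape=(10, 12))
    augmented = np.fliplr(mask)  # any spatial transform could be applied here
    print(mask_to_box(augmented))
```

Because the box is recomputed from the transformed mask, rotations, scaling, or elastic deformations need no dedicated coordinate arithmetic for the boxes.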
Consolidation of predictions (Weighted Box Clustering)
Multiple predictions of the same image (from test-time augmentations, tested epochs, and overlapping patches) result in a large number of boxes (or cubes), which need to be consolidated. In semantic segmentation, the final output would typically be obtained by averaging every pixel over all predictions. As described in the Retina U-Net paper (Jaeger et al., 2018), weighted box clustering (WBC) does the same for box predictions.
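The following is a simplified 2D sketch of the clustering idea, not the toolkit's implementation. It assumes that each box is weighted only by its confidence score, whereas the full method may use further weighting factors; the names `iou`, `weighted_box_clustering`, `iou_thresh`, and `n_expected` are hypothetical. Boxes are assumed to be in (y1, x1, y2, x2) format with positive area and positive scores.

```python
import numpy as np


def iou(box, boxes):
    """IoU between one box and an array of boxes, format (y1, x1, y2, x2)."""
    y1 = np.maximum(box[0], boxes[:, 0])
    x1 = np.maximum(box[1], boxes[:, 1])
    y2 = np.minimum(box[2], boxes[:, 2])
    x2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(y2 - y1, 0, None) * np.clip(x2 - x1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)


def weighted_box_clustering(boxes, scores, iou_thresh=0.5, n_expected=None):
    """Greedily cluster boxes around the highest-scoring one and replace each
    cluster by its score-weighted average (simplified: weight = score).
    n_expected: optional number of predictions expected per object; clusters
    with fewer members get their score reduced accordingly."""
    boxes = np.asarray(boxes, dtype=float)
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(-scores)  # process highest scores first
    boxes, scores = boxes[order], scores[order]
    remaining = np.ones(len(boxes), dtype=bool)
    out_boxes, out_scores = [], []
    for i in range(len(boxes)):
        if not remaining[i]:
            continue
        members = remaining & (iou(boxes[i], boxes) >= iou_thresh)
        w = scores[members]
        n_missing = 0 if n_expected is None else max(n_expected - len(w), 0)
        out_boxes.append((boxes[members] * w[:, None]).sum(axis=0) / w.sum())
        out_scores.append((w * w).sum() / (w.sum() + n_missing * w.mean()))
        remaining[members] = False
    return np.array(out_boxes), np.array(out_scores)


if __name__ == "__main__":
    b = [[0, 0, 10, 10], [0, 2, 10, 12], [50, 50, 60, 60]]
    s = [0.75, 0.25, 0.9]
    print(weighted_box_clustering(b, s))
```

Unlike non-maximum suppression, which discards all but the top-scoring box of a cluster, this averaging lets every overlapping prediction contribute to the final coordinates and score.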
Visualization / Monitoring
Histograms of matched output predictions for training/validation/testing are plotted per foreground class:
Input images + ground truth annotations + output predictions of a sampled validation batch are plotted after each epoch (here a 2D sampled slice with ±3 neighbouring context slices in channels):
Zoomed into the last two lines of the plot:
How to cite this code
Please cite the original publication (Jaeger et al., 2018).
The code is published under the Apache License Version 2.0.