This library provides a fast and easy I/O toolbox for on-the-fly and local preprocessing in Keras, particularly for medical image segmentation tasks. It is currently in alpha, and I'm open to anyone who wants to contribute!
Current limitations:
- Channels_last only
- Kind of slow? Multiprocessing doesn't really help performance.
- Single input (x, y) cases in examples and for `BaseTransformGenerator`
Credits:
- The `BaseGenerator` is inspired by Shervine's introduction to `keras.utils.Sequence`.
- The data augmentation relies on MIC-DKFZ's batchgenerators and their `transforms` API.
- Some of the patch extraction code in `patch_utils.py` and `patch.py` is from ellisdg's 3DUnetCNN repository.
- Some of the I/O functions are directly from or inspired by MedicalDetectionToolkit and Isensee's BRaTS2017 submission.
  - Note: I still need to update my license to accommodate their Apache 2.0 license.
Features:
- Can achieve state-of-the-art results
- Customizable
- Positive slice sampling!
A lot of the suggestions are located within the individual modules' `README.md`s, but here's a list of the current top priorities:
- Revise the `n_workers` interface for `BaseTransformGenerator`
- Fix up the examples.
- Make some easy, simple examples.
- Provide links to my personal use cases w/Colab Notebooks.
- Tests (particularly for any I/O functions)
This specific module is for low-level abstract generators for you to reuse in your own Keras pipelines. The current base generators are:
- `BaseGenerator`: A basic framework for generating thread-safe data in Keras (no preprocessing, channels_last). Based on https://stanford.edu/~shervine/blog/keras-how-to-generate-data-on-the-fly
- `BaseTransformGenerator`: Loads data and applies data augmentation with `batchgenerators.transforms`.
  - Supports channels_last
  - Loads data with nibabel
Overall, these base generators are made with the intent to be open to a variety of potential uses (particularly targeting image segmentation I/O pipelines). Hence, the main arguments for these generators are:
- `list_IDs`: A list of all your filenames. The idea behind this is that you can easily load your (image, mask) pairs for segmentation (assuming they share the same file name). For classification tasks, this may not be optimal, but this directory mainly targets segmentation users. I'll look into branching out with a separate classification base generator, though.
- `data_dirs`: `[x_dir, y_dir]`. This is a more ambiguous approach so that you can freely play around with your directory structure.
  - Changes to look out for in the future:
    - Dividing this into just `x_dir`, `y_dir`
    - Building a base generator for multiple inputs (iterating through `x_dir`, `y_dir`)
- `batch_size`: The batch size for the network
- `n_channels`: The number of channels that your data has. This parameter exists to handle cases where the number of input channels does not match that of the output segmentation.
- `n_classes`: The number of classes. Again, this is to handle cases where the number of input channels does not match that of the output segmentation.
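To make the `list_IDs` + `data_dirs` convention concrete, here is how the two fit together conceptually. `build_paths` and the filenames are hypothetical, not part of the library:

```python
import os

def build_paths(list_IDs, data_dirs):
    """Resolve each filename to an (image, mask) path pair.
    Hypothetical helper: assumes data_dirs == [x_dir, y_dir] and that
    the image and its mask share the same filename."""
    x_dir, y_dir = data_dirs
    return [(os.path.join(x_dir, f), os.path.join(y_dir, f)) for f in list_IDs]

pairs = build_paths(["case_001.nii.gz"], ["data/images", "data/labels"])
```

Because the pairing is by filename, adding a new case only requires dropping matching files into both directories.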
This module contains a bunch of submodules with easy, documented, reusable, and common I/O functions. Note that these are not meant to serve as data augmentation functions (that purpose is fulfilled by `batchgenerators.transforms` and the `BaseTransformGenerator`). The functions are divided based on how they manipulate images:
- `intensity_io.py`
  - Contains functions that manipulate the intensity distribution of input images in some way, such as `whitening` (z-score normalization), `minmax_normalize`, `clip_upper_lower_percentile`, etc.
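For instance, `whitening` and `minmax_normalize` could look roughly like the sketches below; the function names come from the module, but the signatures and the `eps` guard are assumptions:

```python
import numpy as np

def whitening(arr, eps=1e-8):
    # z-score normalization: zero mean, unit variance (sketch)
    return (arr - arr.mean()) / (arr.std() + eps)

def minmax_normalize(arr, eps=1e-8):
    # rescale intensities into [0, 1] (sketch)
    return (arr - arr.min()) / (arr.max() - arr.min() + eps)
```

The `eps` term keeps constant images from producing a divide-by-zero.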
- `shape_io.py`
  - Contains functions that manipulate the shape of input images, such as `reshape`, `extract_nonint_region`, `resample_array`, etc.
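As a sketch of the kind of shape manipulation involved, here is a nonzero-bounding-box crop in the spirit of `extract_nonint_region`; the library's actual signature and behavior may differ:

```python
import numpy as np

def extract_nonint_region(volume):
    """Crop a volume to the bounding box of its non-zero region (sketch).
    Works for any number of dimensions."""
    coords = np.argwhere(volume != 0)
    mins, maxs = coords.min(axis=0), coords.max(axis=0) + 1
    slices = tuple(slice(lo, hi) for lo, hi in zip(mins, maxs))
    return volume[slices]
```

Cropping away empty background like this shrinks the arrays before patch extraction, which saves both memory and compute.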
- `patch_utils.py` and `patch.py`
  - Contain a bunch of patch extraction functions from ellisdg's 3DUnetCNN repository
  - `patch_utils.py` is the OOP version of the functions in `patch.py`.
  - These particular submodules need to be refactored.
- `misc_utils.py`
  - Contains a bunch of miscellaneous functions, such as:
    - `get_list_IDs`: divides filenames into train/validation/test sets
    - `get_multi_class_labels`: one-hot encoding function for segmentation (includes the option to remove the background class)
    - `sanity_checks`: checks for NaNs, and makes sure that the labels are one-hot encoded
    - `add_channel`: adds a grayscale channel dimension for channels_last
    - `compute_pad_value`: computes the minimum pixel intensity of the entire dataset for the pad value (if it's not 0)
  - Need to add the `KFold` function.
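As an illustration, a one-hot encoder in the spirit of `get_multi_class_labels` (channels_last, with the background-removal option mentioned above) might look like this; the actual signature and parameter names are assumptions:

```python
import numpy as np

def get_multi_class_labels(seg, n_classes, remove_background=False):
    """One-hot encode an integer label volume along a trailing
    channel axis (channels_last sketch; actual signature may differ)."""
    one_hot = np.eye(n_classes, dtype=seg.dtype)[seg.astype(int)]
    if remove_background:
        one_hot = one_hot[..., 1:]  # drop class 0 (background)
    return one_hot
```

Indexing an identity matrix with the label array is a compact NumPy idiom for one-hot encoding arbitrary-dimensional segmentations.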
- `custom_augmentations.py`
  - Actually for data augmentation purposes (on-the-fly preprocessing)
  - Currently in the preliminary phase
  - Notable utility functions:
    - Patch extraction: `get_random_slice_idx`, `get_positive_idx`
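Positive slice sampling, as in `get_positive_idx`, boils down to choosing only among slices that contain foreground. A hedged sketch (the library's actual signature and axis conventions may differ):

```python
import numpy as np

def get_positive_idx(mask, rng=None):
    """Return the index of a randomly chosen slice (along axis 0)
    that contains at least one foreground voxel (sketch)."""
    rng = np.random.default_rng() if rng is None else rng
    # flatten each slice and keep the indices of non-empty ones
    positive = np.where(mask.reshape(mask.shape[0], -1).any(axis=1))[0]
    return int(rng.choice(positive))
```

Sampling only positive slices keeps sparse-label datasets from flooding training batches with empty background.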
The goal of this module is to provide evaluation and prediction tools for medical segmentation/classification. Currently, this module is still a work in progress toward a more generalizable framework.
- Patch evaluation and aggregation
- Cleaner examples
- Pipeline for easy general inference