Skip to content

It is important not to split images of the same patient between the test and train sets to avoid overfitting. This repository splits a sample Covid/Normal classification dataset into test and train sets in a patient aware and stratified manner.

License

Notifications You must be signed in to change notification settings

maftouni/Patient_Aware_Splitter

Repository files navigation

Latest Version Covid_Patient_Aware_Image_Split

It is important not to split images of the same patient between the test and train sets to avoid overfitting. This repository splits a sample Covid/Normal classification dataset into test and train sets in a patient aware and stratified manner. The meta-data file is used to group the images based on Patient-ID. For example, all the images colored green belong to the same patient and should be either in the test or the train split.

Screenshot

While grouping should be done strictly to ensure there is no splitting images of a patient, stratification can be done approximately i.e. as well as possible. This code also assumes that all images of one patient have the same stratification category (diagnosis), meaning that all the images coming from the same Patient ID are either Covid or NonCovid.

Installation

pip install patient-aware-splitter

To split images into 4 folders (train/Covid, train/NonCovid, test/Covid, test/NonCovid) inside splitted folder:

split_to_folders.py

To split images into a dictionary:

split_into_dictionary.py

To split images into a torch Dataloader:

split_into_dataloader.py

About

It is important not to split images of the same patient between the test and train sets to avoid overfitting. This repository splits a sample Covid/Normal classification dataset into test and train sets in a patient aware and stratified manner.

Topics

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages