sposso/CBIS-DDSM-DATASET

CBIS-DDSM-DATASET

If you want to train a breast cancer classifier or a segmentation model using the CBIS-DDSM dataset, this repository may help you to easily extract the mammograms and the masks from the original folder.

Setup

  1. The dataset can be downloaded directly from the official site.
  2. For more detail about the CBIS-DDSM dataset, see this paper. It describes how the dataset was built and how to use it.

Quantitative Description

Although the paper states that CBIS-DDSM has 753 calcification cases and 891 mass cases, it is difficult to determine how many images the dataset actually contains. According to the metadata provided in the CSV files, CBIS-DDSM contains 3103 mammograms, 465 of which have more than one abnormality. Of these, 2458 mammograms (79.21%) belong to the training set and 645 (20.79%) belong to the test set. The dataset also includes 3568 cropped mammograms and 3568 masks.
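The split percentages can be checked directly from the counts quoted above. This is only an arithmetic check on the numbers in this README; the variable names are illustrative.

```python
# Counts quoted in the Quantitative Description section.
total = 3103   # mammograms listed in the CSV metadata
train = 2458   # mammograms in the training set
test = 645     # mammograms in the test set

# The two splits should account for every mammogram.
assert train + test == total

train_share = round(100 * train / total, 2)
test_share = round(100 * test / total, 2)
print(f"train: {train_share}%  test: {test_share}%")
```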

A bit of explanation of the repository's functions

Mammograms_code.ipynb:

This script contains a function that retrieves the path of all mammograms on your local machine and merges each image path with its pathology in a data frame. The data frame is subsequently saved as a CSV file.
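As a rough illustration of the idea (not the notebook's actual code), the sketch below walks a local folder, pairs each DICOM file with a metadata row, and writes the result to CSV. It uses only the standard library in place of a data frame. The function names are hypothetical, and it assumes that the metadata rows have `image file path` and `pathology` columns and that the first component of `image file path` matches the top-level case folder on disk. Check these assumptions against your own copy of the dataset.

```python
import csv
from pathlib import Path


def collect_paths(root: Path) -> dict:
    """Map each top-level case folder name to a DICOM file found under it."""
    paths = {}
    for dcm in sorted(root.rglob("*.dcm")):
        case = dcm.relative_to(root).parts[0]
        # If a case folder holds several files, the last one in sorted order wins.
        paths[case] = dcm
    return paths


def merge_with_pathology(paths: dict, metadata_rows: list) -> list:
    """Join local paths with metadata rows on the case folder name."""
    merged = []
    for row in metadata_rows:
        # Assumed layout: "<case folder>/<series>/.../<file>.dcm"
        case = row["image file path"].split("/")[0]
        if case in paths:
            merged.append(
                {"case": case, "path": str(paths[case]), "pathology": row["pathology"]}
            )
    return merged


def save_csv(rows: list, out_file: Path) -> None:
    """Write the merged rows to a CSV file with a header line."""
    with open(out_file, "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=["case", "path", "pathology"])
        writer.writeheader()
        writer.writerows(rows)
```

Metadata rows can be loaded with `csv.DictReader` from one of the dataset's description CSV files. Rows whose case folder is not found locally are skipped rather than reported, which may or may not be what you want.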

mask_code.ipynb:

This script contains a function that retrieves the path of all patches on your local machine and then merges each mask path with its pathology in a data frame. This data frame is subsequently saved as a CSV file. Note: There are more masks than mammograms because some mammograms have more than one lesion.
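To see why masks outnumber mammograms, one can count how many mask rows point at the same mammogram. The sketch below is a minimal illustration, assuming each mask row carries the path of its parent mammogram under an `image file path` key; the function names and that key are assumptions, not taken from the notebook.

```python
from collections import Counter


def masks_per_mammogram(mask_rows: list) -> Counter:
    """Count how many mask rows reference each mammogram path."""
    return Counter(row["image file path"] for row in mask_rows)


def multi_lesion_mammograms(counts: Counter) -> list:
    """Return the mammogram paths that have more than one mask."""
    return sorted(path for path, n in counts.items() if n > 1)
```

Applied to the full mask table, the length of the second list gives the number of mammograms with more than one abnormality.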

convert_dicom.ipynb:

The images provided by CBIS-DDSM (mammograms, masks, and crops of abnormalities) are stored in DICOM format. The function in this notebook reads a 16-bit mammogram from a DICOM file and saves it as a rescaled 16-bit PNG file.
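The README does not say how the rescaling is done. One plausible reading is a min-max stretch to the full 16-bit range, sketched below with NumPy. The function name is hypothetical and this is an assumption about the approach, not the notebook's code. Reading the pixel data (for example with `pydicom.dcmread(path).pixel_array`) and writing the PNG with an image library that supports 16-bit output are left out of the sketch.

```python
import numpy as np


def rescale_to_uint16(pixels: np.ndarray) -> np.ndarray:
    """Min-max rescale an image to the full 16-bit range [0, 65535]."""
    pixels = pixels.astype(np.float64)
    lo, hi = pixels.min(), pixels.max()
    if hi == lo:
        # A constant image has no dynamic range to stretch.
        return np.zeros(pixels.shape, dtype=np.uint16)
    scaled = (pixels - lo) / (hi - lo) * 65535.0
    return np.round(scaled).astype(np.uint16)
```

A min-max stretch changes absolute intensities, so images rescaled this way are not directly comparable to each other in raw pixel values.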

Original_Split.ipynb:

This script creates the training and test sets according to the standardized split given in the official paper. The paths of all images are stored in a data frame, which is saved as a CSV file.
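A minimal sketch of one way to recover the split from image paths follows. It assumes that the case folder names in your copy of the dataset contain `-Training_` or `-Test_` (for example `Mass-Training_P_00001_LEFT_CC`), and that each row stores its path under a `path` key. Both the naming pattern and the helper names are assumptions to verify locally, not the notebook's implementation.

```python
def assign_split(image_path: str) -> str:
    """Return 'train' or 'test' based on the case folder name in the path."""
    for part in image_path.replace("\\", "/").split("/"):
        if "-Training_" in part:
            return "train"
        if "-Test_" in part:
            return "test"
    raise ValueError(f"cannot infer split from path: {image_path}")


def split_rows(rows: list) -> dict:
    """Group rows into training and test lists using their 'path' field."""
    splits = {"train": [], "test": []}
    for row in rows:
        splits[assign_split(row["path"])].append(row)
    return splits
```

Raising an error on unrecognized paths, rather than silently assigning a default, makes it easier to notice files that fall outside the official split.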
