Add ADNI dataset and Alzheimer's Disease classification pipeline#921
Add ADNI dataset and Alzheimer's Disease classification pipeline#921laubryan wants to merge 1 commit into
Conversation
c85f2fe to
86e4f90
Compare
the ADNI dataset and 3D-CNN model.
- Dataset: ADNIDataset
- Processor: NIftIImageProcessor
- Task: AlzheimersDiseaseClassification
- Model: AlzheimersDiseaseCNN
- Tests:
- TestADNIDataset
- TestAlzheimersDiseaseClassification
- TestAlzheimersDiseaseModel
- Examples:
- Python scripts
- Notebook
- RST Documentation
EricSchrock
left a comment
There was a problem hiding this comment.
I haven't had time for a full review yet, but I wanted to get you some initial feedback to work on. I still need to review the examples, processor, task, and tests as well as to read the paper and possibly try running things for myself.
| pyhealth.datasets.ADNIDataset | ||
| =================================== | ||
|
|
||
| The Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset. A Pyhealth dataset consisting of labelled MRI brain scan images. For more information, and to apply for access, visit `https://adni.loni.usc.edu/data-samples/adni-data/`. |
There was a problem hiding this comment.
Pretty sure this won't create clickable links (here and in pyhealth.models.AlzheimersDiseaseCNN.rst). See the existing docs for how to create clickable links in RST files (and don't forget the trailing underscore).
| :members: | ||
| :undoc-members: | ||
| :show-inheritance: | ||
|
|
There was a problem hiding this comment.
May as well clean up this trailing whitespace at the end of this file (even though it's just copy/paste from some of the existing docs).
| - ``TimeImageProcessor``: For time-stamped image sequences (e.g., serial X-rays) | ||
| - ``TensorProcessor``: For pre-processed tensor data | ||
| - ``RawProcessor``: Pass-through processor for raw data | ||
| - ``NIftIImageProcessor``: For NIftI MRI images |
There was a problem hiding this comment.
This probably belongs in the "Specialized Processors" section.
| @@ -0,0 +1,9 @@ | |||
| pyhealth.models.AlzheimersDiseaseCNN | |||
There was a problem hiding this comment.
This is a very generic name, which makes me think it might conflict with future PyHealth contributions. Unfortunately, the paper this model comes from doesn't really provide a unique name. Any ideas for better names? My first thought was AdMri3dCNN, but I don't love it (camel case not doing us any favors here with all the acronyms).
| self.root = Path(root) | ||
| self.dev = dev | ||
|
|
||
| self.DEV_LIMIT = 1000 |
There was a problem hiding this comment.
Hmm, I don't like having this hardcoded in BaseDataset and here. We should add that to BaseDataset, perhaps as a class variable/attribute, and reference it in this class.
Unless of course you break this class's dependency on dev mode... (per other comments). Then you can just delete self.DEV_LIMIT.
| config_path (Optional[str]): Path to the configuration file, | ||
| defaults to "../configs/adni-metadata-pyhealth.csv" | ||
| dataset_name (str): Name identifying this dataset, default "ADNI". | ||
| dev: If True, number of loaded images is restricted to 100. |
There was a problem hiding this comment.
| dev: If True, number of loaded images is restricted to 100. | |
| dev: If True, number of loaded images is restricted to 1000. |
| # e.g. ADNI_002_S_0295_MPR__GradWarp__B1_Correction__N3_S13408_I45107.xml | ||
| metadata_files = Path(data_path).glob("ADNI*.xml") | ||
|
|
||
| # If a dev limit is requested, then sample the list randomly |
There was a problem hiding this comment.
Why add randomness to dev mode? I would expect dev mode to be 100% reproducible for debugging issues.
| dataset_name (str): Name identifying this dataset, default "ADNI". | ||
| dev: If True, number of loaded images is restricted to 100. | ||
| """ | ||
| self.root = Path(root) |
There was a problem hiding this comment.
self.root should not be initialized here, as that mirrors the initialization of self.root in the parent class. Just pass Path(root) to _validate_data_path() and _write_patient_catalog() and let super().__init__() initialize self.root.
| dev: If True, number of loaded images is restricted to 100. | ||
| """ | ||
| self.root = Path(root) | ||
| self.dev = dev |
There was a problem hiding this comment.
Initializing the dev member variable here in the child class is odd because it will be re-initialized by the call to super().__init__(). I see why you're doing it though. You need that information in _write_patient_catalog.
How long does _write_patient_catalog take to run on the dev and full datasets respectively? If the difference and/or absolute value is small, probably best to just run _write_patient_catalog on the full dataset metadata every time, even in dev mode, removing the need for the if self.dev: statement.
The other option is to pass dev to _write_patient_catalog directly as a function parameter, avoiding the need for self.dev = dev.
| from pyhealth.models.base_model import BaseModel | ||
|
|
||
|
|
||
| class AlzheimersDiseaseCNN(BaseModel): |
There was a problem hiding this comment.
Could you speak to why you didn't choose to inherit from the CNN class?
I'm wondering if that would simplify this class. I'm also wondering if this is an opportunity to further parameterize the CNN class to meet the needs of this new model (and future models?).
Contributor
Contribution
This PR is a full pipeline implementation of the Alzheimer's Disease classification model described by Liu et al., consisting of dataset, task and model for ADNI MRI images dataset.
Paper
This work is an implementation of the methods and ideas presented in:
Description
This PR adds support for ADNI MRI brain scan images using a PyHealth-compatible dataset and processing via a compatible task and multi-class classification model, based on the paper "On the Design of Convolutional Neural Networks for Automatic Detection of Alzheimer's Disease" by Liu et al.
Changes
ADNIDatasetfor ADNI MRI images, with YAML configuration fileNIfTIImageProcessorfor NIftI MRI imagesAlzheimersDiseaseClassificationtaskAlzheimersDiseaseCNN3D CNN model for classificationFiles to Review
pyhealth/datasets/adni.pypyhealth/datasets/configs/adni.yamlpyhealth/processors/nifti_image_processor.pypyhealth/tasks/ad_classification.pypyhealth/models/alzheimers_cnn.pypyhealth/tests/core/test_adni.pypyhealth/examples/adni_adclassification_cnn.ipynbpyhealth/examples/adni_ad_classification.pypyhealth/examples/adni_ad_synthetic_data.pypyhealth/docs/api/datasets.rstpyhealth/docs/api/datasets/pyhealth.datasets.ADNIDataset.rstpyhealth/docs/api/processors.rstpyhealth/docs/api/processors/pyhealth.processors.NIftIImageProcessor.rstpyhealth/docs/api/tasks.rstpyhealth/docs/api/tasks/pyhealth.tasks.AlzheimersDiseaseClassification.rstpyhealth/docs/api/models.rstpyhealth/docs/api/models/pyhealth.models.AlzheimersDiseaseCNN.rstNotes