This repo provides access to the prostate cancer patient data from the ReImagine consortium that is stored in the HealthSuite Platform (HSP) Clinical Data Lake (CDL).
Although it does not use the pycdal package directly, it is heavily inspired by pycdal and borrows code from it.
-
Create a virtual environment and activate it. See here for more details.
-
Install the dependencies listed in requirements.txt:
$> pip install -r requirements.txt
-
Copy config.example.py and rename the copy to config.py.
-
Set the HSP_IAM_USERNAME attribute of the HSPIAMConfig class in config.py.
-
Set the CDL_ORGANIZATION_ID attribute of the HSPCDLConfig class in config.py.
-
To change the study (default: ReImagine/WS1), edit the DEFAULT_STUDY_ID attribute of the HSPCDLConfig class in config.py.
-
To change the output directory, edit the OUTPUT_DIR attribute of the BaseConfig class in config.py.
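For orientation, here is a minimal sketch of how the edited config.py might look. Only the class and attribute names come from the steps above. The inheritance from BaseConfig, the use of pathlib, and every value shown are assumptions or placeholders, so treat config.example.py as the authoritative template.

```python
# Hypothetical sketch of config.py; the real config.example.py may be laid out differently.
from pathlib import Path


class BaseConfig:
    # Directory that receives patients.txt and the per-patient subfolders.
    OUTPUT_DIR = Path("./output")  # placeholder value


class HSPIAMConfig(BaseConfig):  # inheriting from BaseConfig is an assumption
    # Your HSP IAM username; the password is prompted for at run time.
    HSP_IAM_USERNAME = "your.name@example.com"  # placeholder value


class HSPCDLConfig(BaseConfig):  # inheriting from BaseConfig is an assumption
    # Identifier of your CDL organization.
    CDL_ORGANIZATION_ID = "00000000-0000-0000-0000-000000000000"  # placeholder value
    # Study to download; the exact string used for ReImagine/WS1 is an assumption.
    DEFAULT_STUDY_ID = "ReImagine/WS1"
```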
-
Download the MR identifiers of all patients associated with the study
$> python download.py patients
This prompts you for your HSP IAM password and then creates a file patients.txt in OUTPUT_DIR containing the MR identifiers of all patients in the study.
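If you want to inspect the result from Python, a small helper along these lines may be enough. It assumes patients.txt holds one MR identifier per line, which is not confirmed by this README, and read_patient_ids is a hypothetical name that is not part of download.py.

```python
from pathlib import Path
from typing import List


def read_patient_ids(patients_file: Path) -> List[str]:
    """Read patient MR identifiers, assuming one identifier per line."""
    lines = patients_file.read_text(encoding="utf-8").splitlines()
    # Strip surrounding whitespace and skip blank lines.
    return [line.strip() for line in lines if line.strip()]
```

For example, read_patient_ids(Path("output") / "patients.txt") would return the identifiers as a list of strings, given that OUTPUT_DIR is set to "output".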
-
Download metadata for all patients
$> python download.py patient-metadata
This creates one subfolder per patient in OUTPUT_DIR, each containing a metadata.json file that describes the AWS S3 buckets where the patient's data is available. This step can fail if the HSP access token expires. If it does, delete the last patient folder that was created (see the error message on the console) and rerun the step. Patients whose subfolder already exists are skipped, so only the remaining patients are downloaded.
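The restart behaviour described above can be checked before rerunning the step. The sketch below is not taken from download.py. It assumes each subfolder is named after the patient's MR identifier, and both function names are hypothetical.

```python
from pathlib import Path
from typing import Iterable, List


def pending_patients(patient_ids: Iterable[str], output_dir: Path) -> List[str]:
    """Return the identifiers that do not yet have a subfolder in output_dir."""
    # Assumption: each subfolder is named after the patient's MR identifier.
    return [pid for pid in patient_ids if not (output_dir / pid).is_dir()]


def dirs_missing_metadata(output_dir: Path) -> List[Path]:
    """Return patient subfolders that contain no metadata.json file."""
    subfolders = sorted(p for p in output_dir.iterdir() if p.is_dir())
    # A subfolder without metadata.json was probably interrupted mid-download.
    return [p for p in subfolders if not (p / "metadata.json").is_file()]
```

A subfolder reported by dirs_missing_metadata is a likely candidate for the folder to delete before restarting, but the error message on the console remains the reliable source for that.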
-
Download actual patient data
$> python download.py patient-data
This step downloads the DICOM and BIOSAMPLES data into each patient's subfolder under OUTPUT_DIR. I haven’t run these steps for all patients in one go, so let me know if this step throws any errors.
This project uses the CODE_OF_CONDUCT to define expected conduct in our community. Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by contacting a project CODEOWNER.
See CHANGELOG for more info on what's been changed.
See LICENSE