philips-labs/pca-reimagine-cdl

pca-reimagine-cdl

🚨 Status: Pre-alpha 🚨

Overview

This repository provides access to the prostate cancer patient data from the ReImagine consortium that is stored in the HealthSuite Platform (HSP) Clinical Data Lake (CDL).

Even though this doesn't directly use the pycdal package, it is heavily inspired by it and has shamelessly lifted code from it.

  1. CONTRIBUTING.md
  2. CHANGELOG.md
  3. CODE_OF_CONDUCT
  4. CODEOWNERS
  5. LICENSE

Getting Started

Installation

On Windows

  1. Create a virtual environment and activate it. See here for more details.

  2. Install all the dependencies listed in requirements.txt

    $> pip install -r requirements.txt

Usage

  1. Copy the config.example.py file and name the copy config.py

  2. Edit the HSP_IAM_USERNAME attribute of the HSPIAMConfig class in config.py

  3. Edit the CDL_ORGANIZATION_ID attribute of the HSPCDLConfig class in config.py

  4. To change the study (default: ReImagine/WS1), edit the DEFAULT_STUDY_ID attribute of the HSPCDLConfig class in config.py

  5. To change the output directory, edit the OUTPUT_DIR attribute of the BaseConfig class in config.py

  6. Download the MR identifiers of all patients associated with the study

    $> python download.py patients

    This will prompt you for your HSP IAM password and then create a file patients.txt in the OUTPUT_DIR containing the MR identifiers of all patients in the study.

  7. Download metadata for all patients

    $> python download.py patient-metadata

    This will create a subfolder per patient in the OUTPUT_DIR, each containing a file metadata.json that describes the AWS S3 buckets where the patient's data is available. This step can fail when the HSP access token expires. If it does, delete the last patient folder that was created (see the error message on the console), since it may be incomplete, and restart this step. Patients whose subfolder is already present are skipped, so the download resumes with the remaining patients only.

  8. Download actual patient data

    $> python download.py patient-data

    This step will download the DICOM data and BIOSAMPLES data into each patient's subfolder under OUTPUT_DIR. I haven’t executed the steps for all the patients in one shot, so let me know if this step throws any errors.
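
As a rough illustration of steps 1 to 5, an edited config.py might look something like the sketch below. Only the class names, the attribute names and the default study ID come from the steps above. The placeholder values, the inheritance from BaseConfig and the overall layout are assumptions, so treat the real config.example.py as authoritative.

```python
# config.py -- illustrative sketch only; the real config.example.py may differ.
# Class and attribute names are taken from the usage steps; values are placeholders.


class BaseConfig:
    # Directory that receives patients.txt and the per-patient subfolders.
    OUTPUT_DIR = "output"  # placeholder value


class HSPIAMConfig(BaseConfig):  # inheriting from BaseConfig is an assumption
    # HSP IAM login name; the password is prompted for at run time.
    HSP_IAM_USERNAME = "your.name@example.com"  # placeholder value


class HSPCDLConfig(BaseConfig):  # inheriting from BaseConfig is an assumption
    # Organization whose Clinical Data Lake holds the study.
    CDL_ORGANIZATION_ID = "your-organization-id"  # placeholder value
    # Study to download; ReImagine/WS1 is the documented default.
    DEFAULT_STUDY_ID = "ReImagine/WS1"
```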

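The resume behaviour described in step 7 can be pictured with the minimal sketch below. It is not the code in download.py. The helper names are hypothetical, and it assumes that patients.txt holds one MR identifier per line and that each patient subfolder is named after its identifier.

```python
from pathlib import Path


def read_patient_ids(output_dir):
    """Read MR identifiers from patients.txt (assumed: one per line)."""
    lines = (Path(output_dir) / "patients.txt").read_text().splitlines()
    return [line.strip() for line in lines if line.strip()]


def pending_patients(output_dir, patient_ids):
    """Return the identifiers that have no subfolder under output_dir yet."""
    root = Path(output_dir)
    return [pid for pid in patient_ids if not (root / pid).is_dir()]
```

Under these assumptions, a half-written folder left behind by a failed run would count as done and be skipped, which is why the last created folder should be deleted before restarting.
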
Community

This project uses the CODE_OF_CONDUCT to define expected conduct in our community. Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by contacting a project CODEOWNER.

Changelog

See CHANGELOG for more info on what's been changed.

Development

Licenses

See LICENSE
