# Working with Medical Images in Python - Reading and Converting Images

This notebook demonstrates how one might commonly work with medical images using Python.

In [1]:
# Is platipy already installed in your environment? If not, run this cell to install it.
!pip install git+https://github.com/pyplati/platipy.git

Collecting git+https://github.com/pyplati/platipy.git
  Cloning https://github.com/pyplati/platipy.git to /private/var/folders/45/nx990hn91s77h5f_zqnxw7krhpqn54/T/pip-req-build-rdcpaudh
Collecting numpy
  Downloading numpy-1.20.1-cp38-cp38-macosx_10_9_x86_64.whl (16.0 MB)
Collecting pyyaml
  Downloading PyYAML-5.4.1-cp38-cp38-macosx_10_9_x86_64.whl (253 kB)
Collecting itk>=5.0a1
  Downloading itk-5.2rc2-cp38-cp38-macosx_10_9_x86_64.whl (8.2 kB)
Collecting vtk
  Downloading vtk-9.0.1-cp38-cp38-macosx_10_9_x86_64.whl (80.7 MB)
Collecting SimpleITK
  Downloading SimpleITK-2.0.2-cp38-cp38-macosx_10_9_x86_64.whl (43.6 MB)
Collecting scikit-image
  Downloading scikit_image-0.18.1-cp

Collecting cycler>=0.10
  Using cached cycler-0.10.0-py2.py3-none-any.whl (6.5 kB)
Collecting kiwisolver>=1.0.1
  Downloading kiwisolver-1.3.1-cp38-cp38-macosx_10_9_x86_64.whl (61 kB)
Collecting amqp<2.7,>=2.6.0
  Using cached amqp-2.6.1-py2.py3-none-any.whl (48 kB)
Using legacy 'setup.py install' for platipy, since package 'wheel' is not installed.


Installing collected packages: numpy, pyyaml, itk-core, itk-numerics, itk-filtering, itk-registration, itk-segmentation, itk-io, itk, vtk, SimpleITK, pillow, imageio, cycler, kiwisolver, matplotlib, tifffile, networkx, scipy, PyWavelets, scikit-image, click, pluggy, toml, iniconfig, py, pytest, pydicom, pynetdicom, Werkzeug, itsdangerous, flask, aniso8601, pytz, flask-restful, SQLAlchemy, flask-sqlalchemy, billiard, vine, amqp, kombu, celery, redis, loguru, psutil, gunicorn, certifi, urllib3, chardet, idna, requests, pandas, pymedphys, platipy
    Running setup.py install for platipy ... done
Successfully installed PyWavelets-1.1.1 SQLAlchemy-1.3.23 SimpleITK-2.0.2 Werkzeug-1.0.1 amqp-2.6.1 aniso8601-9.0.0 billiard-3.6.3.0 celery-4.4.7 certifi-2020.12.5 chardet-4.0.0 click-7.1.2 cycler-0.10.0 flask-1.1.2 flask-restful-0.3.8 flask-sqlalchemy-2.4.4 gunicorn-20.0.4 idna-2.10 imageio-2.9.0 iniconfig-1.1.1 itk-5.2rc2 itk-core-5.2rc2 itk-filtering-5.2rc2 itk-io-5.2rc2 itk-numeric

### Import some functions we'll need later on

In [44]:
from pathlib import Path

import pydicom
import SimpleITK as sitk

from platipy.dicom.download.tcia import (
    get_collections,
    get_modalities_in_collection,
    get_patients_in_collection,
    fetch_data
)

from platipy.dicom.rtstruct_to_nifti.convert import convert_rtstruct

from platipy.dicom.dicom_directory_crawler.conversion_utils import process_dicom_directory

# Download data from TCIA

The Cancer Imaging Archive (TCIA) is a fantastic resource for public medical imaging data. 

#### We'll use the 'Head-Neck Cetuximab' collection

In this cell we fetch a list of patients and then download the first patient.

In [6]:
collection = 'Head-Neck Cetuximab'
patients = get_patients_in_collection(collection)
patient_id = patients[0]
data = fetch_data(
    collection,
    patient_ids=[patient_id],
    modalities=["CT", "PT", "RTSTRUCT", "RTDOSE"],
    nifti=False,
    output_directory="./data"
)

2021-02-26 09:04:02.424 | DEBUG    | platipy.dicom.download.tcia:fetch_data:147 - Modalities available: ['PT', 'CT', 'RTDOSE', 'RTPLAN', 'RTSTRUCT']
2021-02-26 09:04:02.429 | DEBUG    | platipy.dicom.download.tcia:fetch_data:174 - Fetching data for Patient: 0522c0001
2021-02-26 09:05:11.291 | DEBUG    | platipy.dicom.download.tcia:fetch_data:198 - Downloading Series: 1.3.6.1.4.1.14519.5.2.1.5099.8010.427264300850965737262860055580
2021-02-26 09:05:44.152 | DEBUG    | platipy.dicom.download.tcia:fetch_data:198 - Downloading Series: 1.3.6.1.4.1.14519.5.2.1.5099.8010.293653169363509354643731389289
2021-02-26 09:06:23.311 | DEBUG    | platipy.dicom.download.tcia:fetch_data:198 - Downloading Series: 1.3.6.1.4.1.14519.5.2.1.5099.8010.279840345131785576013456733353
2021-02-26 09:08:30.563 | DEBUG    | platipy.dicom.download.tcia:fetch_data:198 - Downloading Series: 1.3.6.1.4.1.22213.2.26555.2
2021-02-26 09:10:07.593 | DEBUG    | platipy.dicom.download.tcia:fetch_data:198 - Downloading Series:

### Let's see what data we got

In [7]:
for path in Path("data").glob("**/*"):
    print(path)

data/Head-Neck Cetuximab
data/Head-Neck Cetuximab/0522c0001
data/Head-Neck Cetuximab/0522c0001/DICOM
data/Head-Neck Cetuximab/0522c0001/DICOM/1.3.6.1.4.1.14519.5.2.1.5099.8010.224799699720072341908493257751
data/Head-Neck Cetuximab/0522c0001/DICOM/1.3.6.1.4.1.22213.2.26555.5.1
data/Head-Neck Cetuximab/0522c0001/DICOM/1.3.6.1.4.1.14519.5.2.1.5099.8010.118643650758016323655506179265
data/Head-Neck Cetuximab/0522c0001/DICOM/1.3.6.1.4.1.14519.5.2.1.5099.8010.308184765901558710285007064772
data/Head-Neck Cetuximab/0522c0001/DICOM/1.3.6.1.4.1.14519.5.2.1.5099.8010.252282268497495823643342650154
data/Head-Neck Cetuximab/0522c0001/DICOM/1.3.6.1.4.1.22213.2.26555.2
data/Head-Neck Cetuximab/0522c0001/DICOM/1.3.6.1.4.1.22213.2.26555.3
data/Head-Neck Cetuximab/0522c0001/DICOM/1.3.6.1.4.1.14519.5.2.1.5099.8010.427264300850965737262860055580
data/Head-Neck Cetuximab/0522c0001/DICOM/1.3.6.1.4.1.14519.5.2.1.5099.8010.313834683360850380813202838079
data/Head-Neck Cetuximab/0522c0001/DICOM/1.3.6.1.4.1.1

data/Head-Neck Cetuximab/0522c0001/DICOM/1.3.6.1.4.1.14519.5.2.1.5099.8010.279840345131785576013456733353/3-206.dcm
data/Head-Neck Cetuximab/0522c0001/DICOM/1.3.6.1.4.1.14519.5.2.1.5099.8010.279840345131785576013456733353/371-274.dcm
data/Head-Neck Cetuximab/0522c0001/DICOM/1.3.6.1.4.1.14519.5.2.1.5099.8010.279840345131785576013456733353/154-057.dcm
data/Head-Neck Cetuximab/0522c0001/DICOM/1.3.6.1.4.1.14519.5.2.1.5099.8010.279840345131785576013456733353/201-104.dcm
data/Head-Neck Cetuximab/0522c0001/DICOM/1.3.6.1.4.1.14519.5.2.1.5099.8010.279840345131785576013456733353/22-225.dcm
data/Head-Neck Cetuximab/0522c0001/DICOM/1.3.6.1.4.1.14519.5.2.1.5099.8010.279840345131785576013456733353/167-070.dcm
data/Head-Neck Cetuximab/0522c0001/DICOM/1.3.6.1.4.1.14519.5.2.1.5099.8010.279840345131785576013456733353/353-256.dcm
data/Head-Neck Cetuximab/0522c0001/DICOM/1.3.6.1.4.1.14519.5.2.1.5099.8010.279840345131785576013456733353/403-006.dcm
data/Head-Neck Cetuximab/0522c0001/DICOM/1.3.6.1.4.1.14519.
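A flat listing like this gets long quickly, so it can help to summarise it as a file count per series directory. The sketch below is one way to do that with `pathlib`. It assumes the layout shown above (one sub-directory per series, each holding `.dcm` files). The helper name `count_files_per_series` and the throwaway demo directory are illustrative and not part of platipy.

```python
import tempfile
from collections import Counter
from pathlib import Path


def count_files_per_series(dicom_root):
    """Return {series directory name: number of .dcm files} under dicom_root."""
    counts = Counter()
    for dcm_file in Path(dicom_root).glob("*/*.dcm"):
        counts[dcm_file.parent.name] += 1
    return dict(counts)


# Demonstration on a temporary directory tree with made-up series names
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "DICOM"
    for series, n_files in [("series_a", 3), ("series_b", 1)]:
        (root / series).mkdir(parents=True)
        for i in range(n_files):
            (root / series / f"{i}.dcm").touch()
    demo_counts = count_files_per_series(root)

print(demo_counts)
```

On the downloaded data, the same helper could be called as `count_files_per_series("data/Head-Neck Cetuximab/0522c0001/DICOM")`.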

### That's a lot of DICOM files. Let's inspect one of them using pydicom

In [9]:
# `path` still holds the last entry from the loop above; read_file is a deprecated alias of dcmread
pydicom.dcmread(path)

Dataset.file_meta -------------------------------
(0002, 0000) File Meta Information Group Length  UL: 198
(0002, 0001) File Meta Information Version       OB: b'\x00\x01'
(0002, 0002) Media Storage SOP Class UID         UI: Positron Emission Tomography Image Storage
(0002, 0003) Media Storage SOP Instance UID      UI: 1.3.6.1.4.1.14519.5.2.1.5099.8010.100627917464350080801231914379
(0002, 0010) Transfer Syntax UID                 UI: Explicit VR Little Endian
(0002, 0012) Implementation Class UID            UI: 1.2.40.0.13.1.1.1
(0002, 0013) Implementation Version Name         SH: 'dcm4che-1.4.34'
-------------------------------------------------
(0008, 0005) Specific Character Set              CS: 'ISO_IR 100'
(0008, 0008) Image Type                          CS: ['ORIGINAL', 'PRIMARY']
(0008, 0016) SOP Class UID                       UI: Positron Emission Tomography Image Storage
(0008, 0018) SOP Instance UID                    UI: 1.3.6.1.4.1.14519.5.2.1.5099.8010.100627917464350080

'1.3.6.1.4.1.14519.5.2.1.5099.8010.427264300850965737262860055580'

In [29]:
convert_rtstruct?

### Attributes such as Image Orientation, Pixel Spacing and Slice Thickness are essential for working with medical images

Here is a great resource to learn about all of that: <http://dicomiseasy.blogspot.com/2013/06/getting-oriented-using-image-plane.html>

To learn even more about the DICOM standard, check out the book: <https://www.springer.com/gp/book/9783642108495> (This should be available from your University Library)
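To make these attributes concrete, here is a minimal sketch of the pixel-to-patient mapping defined by the DICOM Image Plane module, written in plain Python. The function name `pixel_to_patient` and the example values are assumptions chosen for illustration. With a real file, the three inputs would come from the `ImagePositionPatient`, `ImageOrientationPatient` and `PixelSpacing` attributes of a slice.

```python
def pixel_to_patient(row, col, image_position, image_orientation, pixel_spacing):
    """Map a (row, col) pixel index to patient coordinates (x, y, z) in mm.

    image_position: ImagePositionPatient, the centre of the first pixel (3 values).
    image_orientation: ImageOrientationPatient, 6 direction cosines.
    pixel_spacing: PixelSpacing as [spacing between rows, spacing between columns].
    """
    row_dir = image_orientation[:3]  # direction in which the column index increases
    col_dir = image_orientation[3:]  # direction in which the row index increases
    row_spacing, col_spacing = pixel_spacing
    return tuple(
        image_position[k]
        + row_dir[k] * col_spacing * col
        + col_dir[k] * row_spacing * row
        for k in range(3)
    )


# Made-up axial slice: identity orientation, 0.5 mm between rows, 2.0 mm between columns
point = pixel_to_patient(
    row=10,
    col=4,
    image_position=(-100.0, -100.0, 50.0),
    image_orientation=(1.0, 0.0, 0.0, 0.0, 1.0, 0.0),
    pixel_spacing=(0.5, 2.0),
)
print(point)  # (-92.0, -95.0, 50.0)
```

The position along the slice axis does not appear in this formula because each slice carries its own `ImagePositionPatient`.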

### Working with DICOM files can be tricky. Fortunately, the NIfTI format lets us work with medical images while retaining their spatial information (origin, spacing and orientation)

Let's use SimpleITK to read a DICOM series and save it as NIfTI

In [21]:
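# Pick the first CT series that was downloaded for this patient
series_uid = next(iter(data[patient_id]["DICOM"]["CT"].keys()))
ct_path = data[patient_id]["DICOM"]["CT"][series_uid]

# Collect the slice files of the series in order, then read them as one 3D image
series_files = sitk.ImageSeriesReader.GetGDCMSeriesFileNames(str(ct_path))
ct_image = sitk.ReadImage(series_files)
sitk.WriteImage(ct_image, "data/ct.nii.gz")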

### This works well for Image Series such as CT, MR or PT, but what about RTStruct?

Let's have a look at the format of an RTStruct file

In [30]:
series_uid = next(iter(data[patient_id]["DICOM"]["RTSTRUCT"].keys()))
rtstruct_directory = data[patient_id]["DICOM"]["RTSTRUCT"][series_uid]
rtstruct_path = next(iter(rtstruct_directory.glob("*")))

rts_ds = pydicom.dcmread(rtstruct_path)
rts_ds

Dataset.file_meta -------------------------------
(0002, 0000) File Meta Information Group Length  UL: 164
(0002, 0001) File Meta Information Version       OB: b'\x00\x01'
(0002, 0002) Media Storage SOP Class UID         UI: RT Structure Set Storage
(0002, 0003) Media Storage SOP Instance UID      UI: 1.3.6.1.4.1.22213.2.26555.3.1
(0002, 0010) Transfer Syntax UID                 UI: Implicit VR Little Endian
(0002, 0012) Implementation Class UID            UI: 1.2.40.0.13.1.1.1
(0002, 0013) Implementation Version Name         SH: 'dcm4che-1.4.34'
-------------------------------------------------
(0008, 0016) SOP Class UID                       UI: RT Structure Set Storage
(0008, 0018) SOP Instance UID                    UI: 1.3.6.1.4.1.22213.2.26555.3.1
(0008, 0021) Series Date                         DA: '19990831'
(0008, 0030) Study Time                          TM: ''
(0008, 0050) Accession Number                    SH: ''
(0008, 0060) Modality                            CS: 'RTSTRU
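Further down in an RTStruct dataset, each structure is stored as a set of contours rather than as a voxel grid. In the DICOM standard these contours sit in `ROIContourSequence`, and every contour has a `ContourData` attribute holding a flat list of coordinates `[x1, y1, z1, x2, y2, z2, ...]` in millimetres. The sketch below groups such a list into points. The helper name `contour_points` and the example values are illustrative assumptions.

```python
def contour_points(contour_data):
    """Group a flat ContourData list [x1, y1, z1, x2, ...] into (x, y, z) tuples."""
    if len(contour_data) % 3 != 0:
        raise ValueError("ContourData length must be a multiple of 3")
    values = [float(v) for v in contour_data]
    return list(zip(values[0::3], values[1::3], values[2::3]))


# Made-up triangle drawn on the slice at z = 5 mm
triangle = contour_points([0, 0, 5, 10, 0, 5, 10, 10, 5])
print(triangle)  # [(0.0, 0.0, 5.0), (10.0, 0.0, 5.0), (10.0, 10.0, 5.0)]

# With the dataset loaded above, the same helper could be applied like this:
# for roi in rts_ds.ROIContourSequence:
#     for contour in roi.ContourSequence:
#         points = contour_points(contour.ContourData)
```

Turning these point lists into a mask means rasterising each contour onto the grid of the referenced image, which is the work the conversion libraries take care of.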

### So we can't simply read those with SimpleITK... but we can with platipy

Several libraries can now read these files. Feel free to try any of them:
- https://github.com/pyplati/platipy/tree/master/platipy/dicom/rtstruct_to_nifti
- https://github.com/qurit/RT-Utils
- https://github.com/brianmanderson/Dicom_RT_and_Images_to_Mask

In [42]:
ct_series_uid = rts_ds.ReferencedFrameOfReferenceSequence[0].RTReferencedStudySequence[0].RTReferencedSeriesSequence[0].SeriesInstanceUID
ct_path = data[patient_id]["DICOM"]["CT"][ct_series_uid]

convert_rtstruct(
    str(ct_path),
    rtstruct_path,
    prefix='Struct_',
    output_dir='./data'
)

Converting RTStruct: data/Head-Neck Cetuximab/0522c0001/DICOM/1.3.6.1.4.1.22213.2.26555.3/1-159.dcm
Using image series: data/Head-Neck Cetuximab/0522c0001/DICOM/1.3.6.1.4.1.22213.2.26555.2
Output file prefix: Struct_
Output directory: ./data
Converting structure 0 with name: 1cmptv
Converting structure 1 with name: BRAC_PLX
Converting structure 2 with name: GTV
Converting structure 3 with name: SKIN
Converting structure 4 with name: SPINAL_CORD
Converted all structures. Writing output.
Writing file to: ./data/Struct_1cmptv.nii.gz
Writing file to: ./data/Struct_BRAC_PLX.nii.gz
Writing file to: ./data/Struct_GTV.nii.gz
Writing file to: ./data/Struct_SKIN.nii.gz
Writing file to: ./data/Struct_SPINAL_CORD.nii.gz
Finished


## Thanks to Rob, we have a function in platipy which can convert all of this DICOM data for you at once

https://github.com/pyplati/platipy/tree/master/platipy/dicom/dicom_directory_crawler



In [50]:
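# In this version of platipy, process_dicom_directory returns a generator, so
# no conversion happens until it is iterated. Loop over it to run the conversion.
for result in process_dicom_directory(
    "data/Head-Neck Cetuximab/0522c0001/DICOM",
    output_directory="./data/Head-Neck Cetuximab/0522c0001/NIFTI"
):
    pass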
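In the platipy version used here, `process_dicom_directory` returns a generator. Calling a generator function does not run its body; the work only happens once the generator is iterated. The sketch below shows this with a hypothetical stand-in converter, `convert_all`, which is not part of platipy.

```python
def convert_all(names, log):
    """Hypothetical stand-in for a generator-based converter."""
    for name in names:
        log.append(f"converted {name}")
        yield name


log = []
pending = convert_all(["ct", "pt"], log)

# Nothing has run yet: the call above only created the generator object
ran_before_iteration = len(log)

# Iterating the generator (here with list) is what performs the work
results = list(pending)
print(ran_before_iteration, results, log)
```

The same applies to `process_dicom_directory`: wrapping the call in `list(...)` or looping over it is what triggers the conversion.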