# Step 1
## Getting data ready

This notebook is a quick demonstration of how I would prepare DICOM data before looking more closely at the imaging: unzip the archive, convert the DICOM series to NIfTI, and organise the output by patient.

It finishes with some quick checks on what was converted.

In [1]:
"""
Import some useful modules
"""
!pip install -U git+https://github.com/pyplati/platipy.git
from pathlib import Path
import zipfile

import SimpleITK as sitk

from platipy.dicom.io.crawl import process_dicom_directory

%matplotlib notebook

Collecting platipy
  Using cached platipy-0.1.1-py3-none-any.whl (145 kB)
Collecting celery==4.4.7
  Using cached celery-4.4.7-py2.py3-none-any.whl (427 kB)
Collecting rt-utils==1.1.4
  Using cached rt_utils-1.1.4-py3-none-any.whl (16 kB)
Collecting requests==2.25.1
  Using cached requests-2.25.1-py2.py3-none-any.whl (61 kB)
Collecting matplotlib==3.3.4
  Using cached matplotlib-3.3.4-cp38-cp38-manylinux1_x86_64.whl (11.6 MB)
Collecting click==7.1.2
  Using cached click-7.1.2-py2.py3-none-any.whl (82 kB)
Collecting flask-sqlalchemy==2.4.4
  Using cached Flask_SQLAlchemy-2.4.4-py2.py3-none-any.whl (17 kB)
Collecting redis==3.5.3
  Using cached redis-3.5.3-py2.py3-none-any.whl (72 kB)
Collecting pandas==1.1.5
  Using cached pandas-1.1.5-cp38-cp38-manylinux1_x86_64.whl (9.3 MB)
Collecting pynetdicom==1.5.7
  Using cached pynetdicom-1.5.7-py2.py3-none-any.whl (1.6 MB)
Collecting pymedphys==0.37.1
  Using cached pymedphys-0.37.1-py3-none-any.whl (6.0 MB)
Collecting flask-restful==0.3.8
  Us

In [5]:
"""
First, unzip the archive
"""

with zipfile.ZipFile("./input/data.zip","r") as zip_ref:
    zip_ref.extractall("./input/DICOM")
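
Re-running the cell above extracts the archive again every time. A minimal sketch of a guard that skips extraction when the target directory already has content is shown below. The helper name `extract_once` is hypothetical (it is not part of `zipfile` or platipy), and it assumes that a non-empty target directory means a previous extraction completed.

```python
import zipfile
from pathlib import Path


def extract_once(archive_path, target_dir):
    """Extract archive_path into target_dir unless target_dir already has content.

    Returns True if files were extracted, False if extraction was skipped.
    """
    target = Path(target_dir)

    # Assumption: a non-empty directory means the archive was already extracted
    if target.is_dir() and any(target.iterdir()):
        return False

    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path, "r") as zip_ref:
        zip_ref.extractall(target)
    return True
```

In this notebook it could be called as `extract_once("./input/data.zip", "./input/DICOM")`.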

In [6]:
"""
Let's have a look at what data we have
"""

input_dir = Path("./input/DICOM/")
list(input_dir.glob("**/*.dcm"))

[PosixPath('input/DICOM/RTMAC-LIVE-003/10-21-1998-ResearchHN-39464/3.000000-T2 TSE-95973/000044.dcm'),
 PosixPath('input/DICOM/RTMAC-LIVE-003/10-21-1998-ResearchHN-39464/3.000000-T2 TSE-95973/000032.dcm'),
 PosixPath('input/DICOM/RTMAC-LIVE-003/10-21-1998-ResearchHN-39464/3.000000-T2 TSE-95973/000014.dcm'),
 PosixPath('input/DICOM/RTMAC-LIVE-003/10-21-1998-ResearchHN-39464/3.000000-T2 TSE-95973/000085.dcm'),
 PosixPath('input/DICOM/RTMAC-LIVE-003/10-21-1998-ResearchHN-39464/3.000000-T2 TSE-95973/000051.dcm'),
 PosixPath('input/DICOM/RTMAC-LIVE-003/10-21-1998-ResearchHN-39464/3.000000-T2 TSE-95973/000113.dcm'),
 PosixPath('input/DICOM/RTMAC-LIVE-003/10-21-1998-ResearchHN-39464/3.000000-T2 TSE-95973/000099.dcm'),
 PosixPath('input/DICOM/RTMAC-LIVE-003/10-21-1998-ResearchHN-39464/3.000000-T2 TSE-95973/000058.dcm'),
 PosixPath('input/DICOM/RTMAC-LIVE-003/10-21-1998-ResearchHN-39464/3.000000-T2 TSE-95973/000030.dcm'),
 PosixPath('input/DICOM/RTMAC-LIVE-003/10-21-1998-ResearchHN-39464/3.0000

### Read and sort the DICOM directory

`process_dicom_directory` is quite powerful.

It can read a large amount of DICOM data and matches files by StudyUID and SeriesUID, which provides a useful way to index imaging.

It also looks for ReferencedSeriesUID, so it can link RT-STRUCT and RT-DOSE files to their respective images.

There's still room for improvement; check out platipy to contribute!
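
To make the indexing idea concrete, here is a minimal sketch of grouping files by study and series UID. It is not platipy's implementation. The function names `group_by_study_and_series` and `read_headers` are hypothetical, and `read_headers` assumes pydicom is installed and that each file carries the standard `StudyInstanceUID` and `SeriesInstanceUID` attributes.

```python
from collections import defaultdict
from pathlib import Path


def group_by_study_and_series(headers):
    """Group file paths by study UID, then by series UID.

    `headers` is an iterable of (path, study_uid, series_uid) tuples.
    """
    index = defaultdict(lambda: defaultdict(list))
    for path, study_uid, series_uid in headers:
        index[study_uid][series_uid].append(path)

    # Convert to plain dicts with sorted file lists for stable output
    return {
        study: {series: sorted(paths) for series, paths in series_map.items()}
        for study, series_map in index.items()
    }


def read_headers(dicom_directory):
    """Yield (path, study_uid, series_uid) for every .dcm file found.

    Assumes pydicom is installed; pixel data is skipped to keep the scan fast.
    """
    import pydicom  # imported lazily so the grouping helper works without it

    for path in Path(dicom_directory).glob("**/*.dcm"):
        ds = pydicom.dcmread(str(path), stop_before_pixels=True)
        yield str(path), str(ds.StudyInstanceUID), str(ds.SeriesInstanceUID)
```

With pydicom available, `group_by_study_and_series(read_headers(input_dir))` would give a nested dictionary of study UID, then series UID, then file paths.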

In [7]:
output = process_dicom_directory(
    dicom_directory=input_dir,
    parent_sorting_field='PatientName',
    output_image_name_format='{parent_sorting_data}_{study_uid_index}_{Modality}_{image_desc}_{SeriesNumber}',
    output_structure_name_format='{parent_sorting_data}_{study_uid_index}_{Modality}_{structure_name}',
    output_dose_name_format='{parent_sorting_data}_{study_uid_index}_{DoseSummationType}',
    return_extra=True,
    output_directory='./input/NIfTI',
    output_file_suffix='.nii.gz',
    overwrite_existing_files=False,
    write_to_disk=True,
)

2021-08-24 02:48:51.165 | INFO     | platipy.dicom.io.crawl:process_dicom_directory:806 - Processing data for PatientName = RTMAC-LIVE-001.
2021-08-24 02:48:51.166 | INFO     | platipy.dicom.io.crawl:process_dicom_directory:807 -   Number of DICOM series = 2
2021-08-24 02:48:51.166 | DEBUG    | platipy.dicom.io.crawl:process_dicom_directory:819 -   Output image name format: {parent_sorting_data}_{study_uid_index}_{Modality}_{image_desc}_{SeriesNumber}
2021-08-24 02:48:51.167 | DEBUG    | platipy.dicom.io.crawl:process_dicom_directory:820 -   Output structure name format: {parent_sorting_data}_{study_uid_index}_{Modality}_{structure_name}
2021-08-24 02:48:51.168 | DEBUG    | platipy.dicom.io.crawl:process_dicom_directory:821 -   Output dose name format: {parent_sorting_data}_{study_uid_index}_{DoseSummationType}
2021-08-24 02:48:51.168 | INFO     | platipy.dicom.io.crawl:process_dicom_series:348 -   Processing series UID: 1.3.6.1.4.1.14519.5.2.1.1706.6003.260322528954425378667840062326


### Data investigation

It's usually a good idea to check basic information about the data you create.

In [8]:
"""
How can we check out what we have?
"""

for ptid in output:
    
    print("Processing",ptid)
    
    img_files = output[ptid]["IMAGES"]
    
    if len(img_files) == 1:
        print("  Single image found.")
        
        img = sitk.ReadImage( str(img_files[0]) )
        print("    Pixel Type:",img.GetPixelIDTypeAsString())
        print("    Resolution:",img.GetSpacing())
        print("    Size:      ",img.GetSize())
        print("    Origin:    ",img.GetOrigin())
        print("    Direction: ",img.GetDirection())

Processing RTMAC-LIVE-001
  Single image found.
    Pixel Type: 16-bit unsigned integer
    Resolution: (0.5, 0.5, 2.0)
    Size:       (512, 512, 120)
    Origin:     (-135.74818420410156, -199.0250244140625, -109.0)
    Direction:  (1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0)
Processing RTMAC-LIVE-002
  Single image found.
    Pixel Type: 16-bit unsigned integer
    Resolution: (0.5, 0.5, 2.0)
    Size:       (512, 512, 120)
    Origin:     (-119.92090606689453, -200.8926544189453, -314.6151428222656)
    Direction:  (1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0)
Processing RTMAC-LIVE-003
  Single image found.
    Pixel Type: 16-bit unsigned integer
    Resolution: (0.5, 0.5, 2.0)
    Size:       (512, 512, 120)
    Origin:     (-136.47457885742188, -184.7796630859375, -362.92022705078125)
    Direction:  (1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0)
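
The size and spacing printed above can be combined into a quick sanity check of the physical field of view. The sketch below multiplies the voxel count by the voxel spacing along each axis. The helper name `physical_extent` is hypothetical, and the result is in millimetres only under the assumption that the spacing is reported in millimetres, as is usual for images converted from DICOM.

```python
def physical_extent(size, spacing):
    """Return the extent along each axis as voxel count times voxel spacing."""
    return tuple(n * s for n, s in zip(size, spacing))


# Values taken from the output above (identical for all three patients)
extent = physical_extent((512, 512, 120), (0.5, 0.5, 2.0))
print(extent)  # (256.0, 256.0, 240.0)
```

Inside the loop, the same helper could be called as `physical_extent(img.GetSize(), img.GetSpacing())`.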


In [9]:
"""
Which structures do we have for each patient?
"""

for ptid in output:
    
    print("Processing",ptid)
    
    s_files = output[ptid]["STRUCTURES"]
    
    print(f"  {len(s_files)} structures found:")

    for s_file in s_files:
        
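        # Drop the 26-character "<PatientName>_<study index>_RTSTRUCT_" prefix
        # (this assumes 14-character patient names) and the ".nii.gz" suffix
        s_name = s_file.name[26:-7]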
        print("    ", s_name)

Processing RTMAC-LIVE-001
  8 structures found:
     GLND_SUBMAND_L
     GLND_SUBMAND_R
     LN_NECK_II_L
     LN_NECK_II_R
     LN_NECK_III_L
     LN_NECK_III_R
     PAROTID_L
     PAROTID_R
Processing RTMAC-LIVE-002
  8 structures found:
     GLND_SUBMAND_L
     GLND_SUBMAND_R
     LN_NECK_II_L
     LN_NECK_II_R
     LN_NECK_III_L
     LN_NECK_III_R
     PAROTID_L
     PAROTID_R
Processing RTMAC-LIVE-003
  8 structures found:
     GLND_SUBMAND_L
     GLND_SUBMAND_R
     LN_NECK_II_L
     LN_NECK_II_R
     LN_NECK_III_L
     LN_NECK_III_R
     PAROTID_L
     PAROTID_R
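
The fixed slice `[26:-7]` only works while every patient name has the same length. A minimal sketch of a less fragile alternative follows, along with a check that every patient has the same set of structures. Both helper names are hypothetical. The sketch assumes the file names follow the `output_structure_name_format` used earlier, with `RTSTRUCT` as the modality, and the example file name in the usage note is inferred from that format rather than read from disk.

```python
def structure_name(filename, marker="_RTSTRUCT_", suffix=".nii.gz"):
    """Return the part of filename between the marker and the suffix."""
    stem = filename[: -len(suffix)] if filename.endswith(suffix) else filename

    # Partition on the first marker only, so underscores inside the
    # structure name (e.g. LN_NECK_II_L) are preserved
    _, found, name = stem.partition(marker)
    return name if found else stem


def missing_structures(names_by_patient):
    """Map each patient to the structures it lacks relative to the union of all."""
    all_names = set().union(*names_by_patient.values())
    return {
        patient: sorted(all_names - set(names))
        for patient, names in names_by_patient.items()
    }
```

For example, `structure_name("RTMAC-LIVE-001_0_RTSTRUCT_PAROTID_L.nii.gz")` gives `"PAROTID_L"`, and `missing_structures` returns an empty list for each patient when all structure sets agree, as they do in the output above.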
