# Unit 3.5.2 - Capstone Submission - Collect Data, Part III: Glioblastoma Radiogenomic data

## David Schonberger, 12/21/2021

### Dataset 3 -> See 'Idea 5: Apply ML/DL to Radiogenomics' (https://docs.google.com/document/d/1MsBeSa2sujr_A1eT3QPI7kjxqXY9FjGO2xeVg_gJLJY/edit?usp=sharing) 

### As the Google doc indicates, the dataset is part of a recent Kaggle competition. (See https://www.kaggle.com/c/rsna-miccai-brain-tumor-radiogenomic-classification/overview)

### I attempted to download the dataset with the Kaggle CLI using the command 'kaggle competitions download -c rsna-miccai-brain-tumor-radiogenomic-classification', but this failed, so I downloaded the data manually instead. The compressed dataset was about 12.7 GB and expanded to over 136 GB when unzipped.
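
Because the archive grows roughly tenfold on extraction, it can help to compare the uncompressed size against free disk space before unzipping. The following is a minimal sketch using only the standard library; the function names `uncompressed_size` and `fits_on_disk` are hypothetical and were not part of the original workflow.

```python
import shutil
import zipfile


def uncompressed_size(zip_path):
    """Return the total uncompressed size, in bytes, of all members of a zip archive."""
    with zipfile.ZipFile(zip_path) as zf:
        return sum(info.file_size for info in zf.infolist())


def fits_on_disk(zip_path, target_dir="."):
    """Return True if the archive's uncompressed contents fit in the free space of target_dir."""
    free_bytes = shutil.disk_usage(target_dir).free
    return uncompressed_size(zip_path) <= free_bytes
```

Calling `fits_on_disk` with the path of the downloaded archive before extracting would flag a shortfall early, rather than partway through a 136 GB extraction.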

In [71]:
import pydicom
import os
import numpy as np 
import cv2

In [74]:
#### Using https://pydicom.github.io/pydicom/stable/tutorials/dataset_basics.html
base_path = os.getcwd()
file_path = r'00013\T1wCE\Image-40.dcm'  # raw string so the backslashes are taken literally
path = os.path.join(base_path, file_path)
mri_image = pydicom.dcmread(path)
print(mri_image)

Dataset.file_meta -------------------------------
(0002, 0010) Transfer Syntax UID                 UI: Implicit VR Little Endian
-------------------------------------------------
(0008, 0005) Specific Character Set              CS: 'ISO_IR 100'
(0008, 0008) Image Type                          CS: ['DERIVED', 'SECONDARY']
(0008, 0016) SOP Class UID                       UI: MR Image Storage
(0008, 0018) SOP Instance UID                    UI: 1.2.826.0.1.3680043.8.498.12170484640840839415850946442294292245
(0008, 0050) Accession Number                    SH: '00013'
(0008, 0060) Modality                            CS: 'MR'
(0008, 103e) Series Description                  LO: 'T1wCE'
(0010, 0010) Patient's Name                      PN: '00013'
(0010, 0020) Patient ID                          LO: '00013'
(0018, 0023) MR Acquisition Type                 CS: '3D'
(0018, 0050) Slice Thickness                     DS: '2.0'
(0018, 0081) Echo Time                           DS: None
(0018, 0082)

In [75]:
cv2.imshow(f'image: {file_path}',mri_image.pixel_array)  
cv2.waitKey(0)
cv2.destroyAllWindows()

# Note: the above code runs, but the displayed image is completely grey. I tried several images from the FLAIR folder, with the same result.
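
One common workaround for blank or washed-out MRI displays is to rescale the pixel data to the 8-bit range before passing it to cv2.imshow. The sketch below is an assumption about what may help here, not a confirmed diagnosis; `to_uint8` is a hypothetical helper name.

```python
import numpy as np


def to_uint8(pixels):
    """Min-max rescale an array to 0..255 and return it as uint8."""
    arr = np.asarray(pixels, dtype=np.float64)
    lo, hi = arr.min(), arr.max()
    if hi == lo:
        # Constant image (e.g. an all-zero slice): nothing to stretch.
        return np.zeros(arr.shape, dtype=np.uint8)
    scaled = (arr - lo) / (hi - lo) * 255.0
    return np.round(scaled).astype(np.uint8)
```

With this helper, the display call would become `cv2.imshow(f'image: {file_path}', to_uint8(mri_image.pixel_array))`.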

In [52]:
file_path2 = r'00013\T1w\Image-15.dcm'  # raw string so the backslashes are taken literally
path = os.path.join(base_path, file_path2)
mri_image2 = pydicom.dcmread(path)
print(mri_image2)

Dataset.file_meta -------------------------------
(0002, 0010) Transfer Syntax UID                 UI: Implicit VR Little Endian
-------------------------------------------------
(0008, 0005) Specific Character Set              CS: 'ISO_IR 100'
(0008, 0008) Image Type                          CS: ['DERIVED', 'SECONDARY']
(0008, 0016) SOP Class UID                       UI: MR Image Storage
(0008, 0018) SOP Instance UID                    UI: 1.2.826.0.1.3680043.8.498.92033800745224840860544117273406018632
(0008, 0050) Accession Number                    SH: '00013'
(0008, 0060) Modality                            CS: 'MR'
(0008, 103e) Series Description                  LO: 'T1w'
(0010, 0010) Patient's Name                      PN: '00013'
(0010, 0020) Patient ID                          LO: '00013'
(0018, 0023) MR Acquisition Type                 CS: '2D'
(0018, 0050) Slice Thickness                     DS: '5.0'
(0018, 0081) Echo Time                           DS: None
(0018, 0083) N

In [53]:
cv2.imshow(f'image: {file_path2}',mri_image2.pixel_array)  
cv2.waitKey(0)
cv2.destroyAllWindows()

# After switching to another folder, T1w, the displayed image is still completely grey. I tried several images, with the same result.
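
Before assuming a display problem, it may be worth checking whether a slice contains any signal at all, since a slice with no non-zero pixels would look blank in any viewer. The following is a small sketch; `pixel_summary` is a hypothetical helper.

```python
import numpy as np


def pixel_summary(pixels):
    """Return dtype, shape, min, max and the fraction of non-zero pixels."""
    arr = np.asarray(pixels)
    return {
        "dtype": str(arr.dtype),
        "shape": arr.shape,
        "min": int(arr.min()),
        "max": int(arr.max()),
        "nonzero_fraction": float(np.count_nonzero(arr)) / arr.size,
    }
```

For example, `pixel_summary(mri_image2.pixel_array)` returning a `max` of 0 would indicate an empty slice, whereas a small `max` relative to the range of the dtype would point toward a scaling issue in the display step.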

## What follows is an attempt to display other DICOM images, downloaded from The Cancer Imaging Archive (https://www.cancerimagingarchive.net/).

In [76]:
file_path3 = r'301.000000-T1WSE-68254\1-23.dcm'
path = os.path.join(base_path, file_path3)
mri_image3 = pydicom.dcmread(path)
print(mri_image3)

Dataset.file_meta -------------------------------
(0002, 0000) File Meta Information Group Length  UL: 206
(0002, 0001) File Meta Information Version       OB: b'\x00\x01'
(0002, 0002) Media Storage SOP Class UID         UI: MR Image Storage
(0002, 0003) Media Storage SOP Instance UID      UI: 1.3.6.1.4.1.14519.5.2.1.7009.2405.340592342021531538737305032606
(0002, 0010) Transfer Syntax UID                 UI: Explicit VR Little Endian
(0002, 0012) Implementation Class UID            UI: 1.3.6.1.4.1.22213.1.143
(0002, 0013) Implementation Version Name         SH: '0.5'
(0002, 0016) Source Application Entity Title     AE: 'POSDA'
-------------------------------------------------
(0008, 0005) Specific Character Set              CS: 'ISO_IR 100'
(0008, 0008) Image Type                          CS: ['ORIGINAL', 'PRIMARY', 'M_SE', 'M', 'SE']
(0008, 0012) Instance Creation Date              DA: '19591231'
(0008, 0013) Instance Creation Time              TM: '123736'
(0008, 0014) Instance Crea

In [77]:
type(mri_image3.pixel_array)

numpy.ndarray

In [78]:
mri_image3.pixel_array

array([[0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       ...,
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0]], dtype=uint16)

In [79]:
mri_image3.pixel_array.shape

(512, 512)

In [69]:
mri_image3.pixel_array[0:100]

array([[  0,   0,   0, ...,   0,   0,   0],
       [  0,   0,   0, ...,   0,   0,   0],
       [  0,   0,   0, ...,   0,   0,   0],
       ...,
       [  0,   0, 188, ...,  90, 184,   0],
       [  0,   0, 231, ..., 146, 261,   0],
       [  0,   0, 232, ..., 114, 223,   0]], dtype=uint16)

In [80]:
cv2.imshow(f'image: {file_path3}',mri_image3.pixel_array)  
cv2.waitKey(0)
cv2.destroyAllWindows()

# ...and the above image displays as all black. It is unclear what the issue is here.
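
One likely explanation, offered as an assumption rather than a confirmed diagnosis: according to the OpenCV documentation, cv2.imshow divides 16-bit unsigned images by 256 to map them into the 0 to 255 display range. The values visible in the output above (188, 231, 261, ...) would then land at 0 or 1, which is effectively black. The sketch below approximates that scaling with NumPy on those sample values; `imshow_scale_16bit` is a hypothetical name, not an OpenCV function.

```python
import numpy as np


def imshow_scale_16bit(pixels):
    """Approximate the divide-by-256 mapping applied to uint16 images for display."""
    arr = np.asarray(pixels, dtype=np.uint16)
    return (arr // 256).astype(np.uint8)


# Sample values copied from the pixel_array output above.
sample = np.array([0, 188, 90, 184, 231, 146, 261, 232, 114, 223], dtype=np.uint16)
displayed = imshow_scale_16bit(sample)
print(displayed.max())  # 1, since only 261 reaches 256
```

If this is the cause, stretching the data to the full display range before calling cv2.imshow should make the anatomy visible. Viewing the array with matplotlib's `imshow` and `cmap='gray'` is another option, since it normalizes scalar data to its own minimum and maximum by default.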

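### Additional exploration (judging by the execution counts, these cells were run earlier in the session, when mri_image held a slice from the FLAIR folder)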

In [33]:
elem = mri_image[0x0008, 0x0005]
elem

(0008, 0005) Specific Character Set              CS: 'ISO_IR 100'

In [34]:
elem.keyword

'SpecificCharacterSet'

In [35]:
elem2 = mri_image['SeriesDescription']
elem2

(0008, 103e) Series Description                  LO: 'FLAIR'

In [36]:
elem2.value

'FLAIR'

In [37]:
elem3 = mri_image[0x0020, 0x0032]
elem3

(0020, 0032) Image Position (Patient)            DS: [75.71598816, -175.28321568, 105.86288376]

In [38]:
elem3.keyword

'ImagePositionPatient'

In [39]:
elem3.value

[75.71598816, -175.28321568, 105.86288376]
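
Element values such as ImagePositionPatient are useful beyond inspection: a position coordinate can be used to put the slices of a series in spatial order, rather than relying on file names. The following is a hedged sketch, assuming each dataset exposes ImagePositionPatient as a three-element sequence and that the slices are stacked roughly along one coordinate axis; `sort_slices` and its `axis` parameter are hypothetical.

```python
def sort_slices(datasets, axis=2):
    """Return datasets ordered by one coordinate of ImagePositionPatient.

    axis=2 (z) suits axial stacks; this is an assumption, and other
    orientations would need a different axis.
    """
    return sorted(datasets, key=lambda ds: float(ds.ImagePositionPatient[axis]))
```

A more robust ordering would project the position onto the slice normal derived from ImageOrientationPatient, but a single-axis sort is often enough for a first look at a volume.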

In [40]:
pixel_data = mri_image.PixelData
type(pixel_data)

bytes

In [41]:
pixel_data[0:8]

b'\x00\x00\x00\x00\x00\x00\x00\x00'

In [42]:
type(mri_image.pixel_array)

numpy.ndarray
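
The raw PixelData bytes and pixel_array are two views of the same data. For uncompressed, single-frame, 16-bit little-endian images, the array can be reconstructed by hand. That is an assumption: it is consistent with the Little Endian transfer syntax shown in the headers above, but it does not hold for every DICOM file. `decode_uint16` is a hypothetical helper, and in practice pixel_array already does this work.

```python
import numpy as np


def decode_uint16(pixel_bytes, rows, columns):
    """Interpret raw bytes as a rows x columns array of little-endian uint16."""
    arr = np.frombuffer(pixel_bytes, dtype="<u2")
    return arr.reshape(rows, columns)
```

Under those assumptions, `decode_uint16(mri_image.PixelData, mri_image.Rows, mri_image.Columns)` should match `mri_image.pixel_array`; a reshape error would suggest the data is compressed, multi-frame, or not 16-bit.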