In [None]:
!pip install SimpleITK pydicom nibabel pylibjpeg GDCM pylibjpeg-libjpeg

## Uncomment this if you are using Google Colab
# from google.colab import drive
# drive.mount('/content/drive')
# %cd /content/drive/MyDrive/med-phys-python-bootcamp

# Code example: Dicom and Nifti image basics

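Medical image storage is standardized by the **Digital Imaging and Communications in Medicine** (DICOM) standard, which is published by the National Electrical Manufacturers Association (NEMA) rather than by a government body.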

Working with these files is not required by, or relevant to, the core curriculum. However, some advanced classes and many areas of research work closely with these file formats.

This notebook briefly introduces the `DICOM` file format as well as another commonly used image storage format, `nifti`.

The Python package `pydicom` will be used to read `DICOM` files, and `nibabel` will be used to read `nifti` files. The built-in modules `os` and `glob` will be used to locate file paths on the computer. `matplotlib.pyplot` will be used to visualize the volume, while `numpy` will be used to handle the numerical image values.

---
# DICOM

Files in the DICOM (Digital Imaging and Communications in Medicine) format typically have the extension `.dcm`. Each file contains the image itself, as well as information stored in the file *header*.

## Headers
The header of a file or image contains metadata describing what the image shows, how the image was acquired, and other crucial information that may be of interest. Each piece of metadata is identified by its *tag* (DICOM tag).

In the medical world, the header also contains information about **the patient scanned.** Patient data, including HIPAA-protected data, is stored in the header. When working with DICOM files, the data is *usually* de-identified, meaning that all information pertaining to the patient has been stripped from the metadata.

The elements of the file header are structured like a dictionary, with consistent and well-defined locations for important pieces of information.

For instance, when working with images, the spatial resolution is an important piece of information to be aware of. The image itself is just some m x n array of pixel values. Within the header, there will exist a specific tag that defines explicitly what spatial measurement corresponds with a single pixel.

### **PSA: be mindful of metadata so that you do not violate HIPAA compliance**

The DICOM file format can describe a wide variety of objects beyond images. For instance, an optimized radiotherapy plan, which contains detailed positional and timing instructions for a linear accelerator to deliver a patient's treatment, can be stored in a DICOM file.

## Reading a Dicom file
For us, most use cases will involve image data. Each DICOM file will typically contain a single 2D image. To read this file, we can use `pydicom`'s `dcmread` function (called `read_file` in older versions of `pydicom`), which reads the image and header into a `pydicom` `FileDataset` object.

DICOM files extracted from clinical systems typically have opaque names. They are usually stored in a directory with a more user-friendly name, but the actual `.dcm` file name may just be a string of numbers.

In the example below, we load a single `.dcm` file and print the header:

In [None]:
import pydicom as pdcm
import nibabel as nib
import os, glob
import matplotlib.pyplot as plt, numpy as np

file_path = "CCTA_dicom_example/1.2.826.0.1.3680043.2.629.20230228.10451374128595241267215975787.dcm"

# dcmread replaces read_file, which was removed from recent pydicom versions
header = pdcm.dcmread(file_path)
print('The header information is:')
print(header)

There is a lot of information!

Note the structure of the output. Each row follows a consistent column layout:

* The first column contains the *tag* of the element, written as a (group, element) pair of hexadecimal numbers.
* The second column contains the description (name) of the element.
* The third column contains the *value representation* (VR), a two-letter code for the data type stored in the element.
* The fourth column contains the value that is stored.

## Retrieving information from the dicom file
Say I want to get information about the Pixel Spacing, or resolution, of the image...

I see in the output above that it is stored under the tag (0028, 0030). We can retrieve that element with:

In [None]:
xy_res = header[0x0028, 0x0030]
print(xy_res)
print(type(xy_res))

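Note: each number in the tag starts with `0x`, which tells Python that it is a hexadecimal literal. DICOM tags are written in hexadecimal, so `0x0028` is the hexadecimal number 28 (40 in decimal) and not the decimal integer 28.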
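
A quick check in plain Python shows what the `0x` prefix does. The variable names here are only for illustration:

```python
# 0x marks a hexadecimal literal, so these are ordinary Python integers
group, element = 0x0028, 0x0030
print(group, element)  # prints: 40 48

# Formatting the integers back into the familiar four-digit hex tag
tag_text = f"({group:04x}, {element:04x})"
print(tag_text)  # prints: (0028, 0030)
```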

When the element is retrieved this way, the full data element is returned. To get a usable list of numbers, we can access the `.value` property of the data element:

In [None]:
xy_res = header[0x0028, 0x0030].value
print(xy_res)
print(type(xy_res))

## Getting the image from the dicom file

The image itself is stored in the same dataset as the header. In this case, it is the last element in the list, `Pixel Data`.

Retrieving this the same way as we did above returns a large data element that contains the pixel data in a binary format. To get a usable image array, we use the property `pixel_array`, which returns the 2D image as a `numpy` array.

Calling `.shape` on a `numpy.ndarray` object returns the pixel counts of the image:

In [None]:
image = header.pixel_array
print('The image array has type: ', type(image))
print('The image has a shape: ', image.shape)

# Visualize the 2D plot 
fig, ax = plt.subplots()
ax.imshow(image, cmap='gray')
plt.show()


## Reading an Image *Volume* from Dicom files

Above we gave the example of a 2D DICOM image. Images acquired on CT or MRI are usually volumetric. When working with DICOM images, a 3D image is typically stored as a series of 2D images. If you look into the folder that contains the example, you will see numerous `.dcm` files. Using a combination of the `os` and `glob` packages, we can easily load each image into a single list. Using `numpy`, we can convert that list into an image volume.

In [None]:
image_volume = []

dcm_src = "CCTA_dicom_example"

dcm_files = glob.glob(os.path.join(dcm_src, '*.dcm'))

for file in dcm_files:
    img_slice = pdcm.dcmread(file).pixel_array
    image_volume.append(img_slice)

image_volume = np.stack(image_volume, axis=0)
print('Full image volume has shape [z, x - y plane]: ', image_volume.shape)

Note, we *stacked* the image along axis 0, meaning the z axis is now in that first (or zeroth) index. We can view this volume by iterating over the z slices and showing each slice. Here we will show every 8th slice to reduce the output.

In [None]:
# Adjusting the volume to the "Cardiac Window" for better viewing
# Copy into a signed integer type so negative values can be stored
view_volume = image_volume.astype(np.int32)
# Shifting the pixel values by the DICOM Rescale Intercept (0028, 1052)
view_volume += int(header[0x0028, 0x1052].value)
# Windowing
view_volume[view_volume < -250] = -250
view_volume[view_volume > 550] = 550

for z in range(0, view_volume.shape[0], 8):
    fig, ax = plt.subplots()
    ax.imshow(view_volume[z], cmap='gray', vmin=view_volume.min(), vmax=view_volume.max())
    ax.axis('off')
    ax.set_title(f'slice {z}')
    plt.show()

This image volume is a patient CT of the thoracic cavity (the example folder holds a coronary CT angiography, or CCTA, series). Looking closely at the slices, you can see that the order of the images doesn't make sense. We loaded the DICOM files one by one and stacked them into a single image volume. However, the files were not in anatomical order. `glob` returns files in an arbitrary order that depends on the operating system: not necessarily the order in which they appear in the directory, and not numerical order. Furthermore, the DICOM files themselves have opaque names made up of numbers.
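
Sorting the file names does not fully solve the problem either. The short sketch below uses hypothetical file names, standing in for the output of `glob.glob(...)`, to show that sorting by name compares text rather than numbers:

```python
# Hypothetical file names, standing in for the output of glob.glob(...)
found = ["10.dcm", "2.dcm", "1.dcm", "21.dcm"]

# sorted() is reproducible, but compares names as text: "10" sorts before "2"
by_name = sorted(found)
print(by_name)  # prints: ['1.dcm', '10.dcm', '2.dcm', '21.dcm']

# Sorting on the integer part gives numerical order for names like these
by_number = sorted(found, key=lambda name: int(name.split(".")[0]))
print(by_number)  # prints: ['1.dcm', '2.dcm', '10.dcm', '21.dcm']
```

Even numerical order is only as good as the naming scheme, and nothing guarantees that file names follow the anatomy.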

To make an understandable and usable image volume where each slice is *in order*, we must load each DICOM file, identify *in the header* which z-slice the image corresponds to, and manually put the slices in order ourselves.

Of course, there are existing packages that do exactly that, such as `SimpleITK`, which will read the full stack of images and sort them correctly. The result can then be converted to a single 3D `np.ndarray` object containing the image:

In [None]:
import SimpleITK as sitk

# Initializing the reader object that can identify file paths
dicom_reader = sitk.ImageSeriesReader()
# Reading the series from the directory that contains the .dcm files
dicom_image_series = dicom_reader.GetGDCMSeriesFileNames(dcm_src)
# Reading the image from the series
dicom_image = sitk.ReadImage(dicom_image_series)
# Converting it to a numpy array out of sitk
image_volume = sitk.GetArrayFromImage(dicom_image)

print(f'The output is a {type(image_volume)}')
print(f'With shape {image_volume.shape}')
print('Which matches what we had above, but in order!')

Viewing the image volume like we did above, we will see that it is now in the correct order:

In [None]:
# Adjusting the volume to the "Cardiac Window" for better viewing
view_volume = image_volume.copy()
# No need to scale because sitk applies it automatically
# view_volume += int(header[0x0028, 0x1052].value)
# Windowing
view_volume[view_volume < -250] = -250
view_volume[view_volume > 550] = 550

for z in range(0, view_volume.shape[0], 8):
    fig, ax = plt.subplots()
    ax.imshow(view_volume[z], cmap='gray', vmin=view_volume.min(), vmax=view_volume.max())
    ax.axis('off')
    ax.set_title(f'slice {z}')
    plt.show()
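
The windowing step used in the viewing cells above can be wrapped in a small helper. This is a sketch on a tiny synthetic array with made-up intensity values. The function name `apply_window` is hypothetical, and `np.clip` is used here in place of the two masked assignments:

```python
import numpy as np

def apply_window(volume, lower, upper):
    """Clip intensities to [lower, upper]; returns a new array."""
    return np.clip(volume, lower, upper)

# Small synthetic "volume" with made-up intensity values
demo = np.array([[-1000, -250, 0], [300, 550, 2000]], dtype=np.int16)
windowed = apply_window(demo, -250, 550)
print(windowed)
```

Because `np.clip` returns a new array, the original volume is left untouched and no explicit `.copy()` is needed.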

At the end of the above process to retrieve the image volume, we still need to be mindful of the metadata, going back into the `.dcm` files and retrieving information as necessary.

For example, when I am working with MRI volumes for deep learning, I need to load each image and resample it to a consistent resolution.

I load the image volumes using the same method as described, and then I load an individual `.dcm` file to retrieve information about the xy resolution (same as the example), as well as tag (0018, 0050) Slice Thickness. This gets me the full xyz voxel size of the image for use in pre-processing.

In [None]:
# Getting the header of a single slice (done above)
header = pdcm.dcmread(file_path)

# Retrieving information of interest from the dicom tags
z_thickness = header[0x0018, 0x0050].value
xy_res = header[0x0028, 0x0030].value

# Combining them into a single list of resolution
zxy_resolution = [float(z_thickness)] + [float(x) for x in xy_res]

print(f'The initial resolution of the image is {zxy_resolution}')
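
Once the voxel size is known, the arithmetic behind resampling to a consistent resolution is straightforward. The sketch below uses made-up values for the volume shape, the current resolution, and a hypothetical target resolution, and only computes the scaling factors and the resulting array shape:

```python
import numpy as np

# Made-up example values: 200 slices of 512 x 512 pixels
shape = np.array([200, 512, 512])
current_res = np.array([0.5, 0.4, 0.4])   # mm per voxel, ordered z, x, y
target_res = np.array([1.0, 1.0, 1.0])    # hypothetical target, mm per voxel

# Physical extent stays the same, so the voxel count scales by current / target
zoom_factors = current_res / target_res
new_shape = np.round(shape * zoom_factors).astype(int)
print(new_shape)
```

Per-axis factors like these are what an interpolation routine such as `scipy.ndimage.zoom` takes as its `zoom` argument. The interpolation itself is left out of this sketch.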

---

# Nifti files

When working with 3D images, it is cumbersome to repeatedly go through the process of loading each `.dcm` file and converting the stack to a 3D array. It is also cumbersome to transfer thousands of `.dcm` files when sharing data. Instead, after a DICOM series has been processed into a 3D volume, it is commonly converted to a `nifti` file (`.nii`) or a compressed `nifti` file (`.nii.gz`).

A nifti file works very similarly to a `.dcm` file, with a header and pixel information. However, it is more dedicated to image storage while supporting higher-dimensional data.

You could load a single 2D image and store it into a nifti file. You could load up a series of dicom images into a 3D volume and store them all in a single file. You can even load a series of series (4D CT / Diffusion Tensor Imaging) and save them all into a single nifti file.

---
## Saving a dicom stack to a nifti file.

There are two different types of nifti volumes, `nifti1` and `nifti2`. Both do very similar things but `nifti2` has more options and larger value storage. Here we will be using `nifti1`.

To save a nifti file, you will need to provide 3 items:
* The data array (usually a numpy array)
* The `Affine` matrix that describes the orientation, scaling, and rotation of the image
    * A usable default is the identity matrix (1s along the diagonal, 0s elsewhere, `np.eye(4)`)
* The path to the file to save the image to, ending in `FILE_NAME.nii.gz` to save as a compressed nifti file
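
If the voxel size is known, the affine can carry that scaling instead of the identity. The following is a minimal sketch with made-up voxel sizes. It encodes scaling only and ignores patient orientation and origin, which a full conversion would also need to take from the DICOM header:

```python
import numpy as np

# Made-up voxel sizes in mm, one per array axis
voxel_size = [0.5, 0.4, 0.4]

# Diagonal affine: pure scaling, no rotation or translation
affine = np.diag(voxel_size + [1.0])
print(affine)

# The voxel sizes can be recovered as the column lengths of the 3x3 block
recovered = np.linalg.norm(affine[:3, :3], axis=0)
print(recovered)
```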

First, we will convert the numpy array holding the 3D image into a nifti object using the package `nibabel`, and then save it:

In [None]:
import nibabel as nib

arr = image_volume
affine = np.eye(4)

nib_img = nib.nifti1.Nifti1Image(arr, affine=affine)
nib.save(nib_img, 'Nifti_CCTA_example.nii.gz')

Now, the above 3D DICOM stack has been saved as a single nifti file inside our current working directory!

---
## Loading an image from a nifti file

To load a nifti file, we can read the image path in a very similar manner to that of the DICOM images. Doing so will load a `nibabel` object holding both the image and the header information.

In [None]:
nib_image = nib.load('Nifti_CCTA_example.nii.gz')
print(type(nib_image))

To see the object's information (including the header), we can print the object directly.

In [None]:
print(nib_image)

Notice that the header keys are not the same as those we saw in the DICOM portion of this notebook. Many of the keys here have equivalent entries in the DICOM header. However, you must consult the documentation to fully understand which key maps to which between file formats.

Most of the fields are empty or hold default values. In the above cell, we *only* saved the data array. In order to fill out the header information, we would need to transfer the relevant information from the DICOM header to the nifti header.

To get the image array from the nibabel object, we can call `.get_fdata()`, which returns it as a floating-point numpy array.

In [None]:
image_volume = nib_image.get_fdata()
print(type(image_volume))

---

## Interacting with the header

As mentioned before, no additional information is provided by default in the nifti header when we save the 3D volume. When creating and saving the nifti volume, we have to make sure that the information we want is defined.

To access the header, either to retrieve information or to set new values, we can call the `.header` property and treat it as a dictionary. 

For example, the image resolution in a nifti file will be stored under the key `pixdim`...

In [None]:
header = nib_image.header

resolution = header['pixdim']
print(f'Default resolution: {resolution}')

Here, we see an array of eight 1s. In a `nifti1` file, `pixdim` always holds 8 values. The first value, `pixdim[0]`, is reserved for an orientation factor and should be 1 or -1. The next three values, `pixdim[1:4]`, hold the voxel size along the three axes of the image. Because our image is 3D, we only need to update those three values and can leave the rest as they are.

We got the resolution of the dicom volume earlier and stored it as the variable `zxy_resolution`. Let's update the header of the nifti volume and save it.

The header information can be overwritten directly as follows:

In [None]:
# pixdim[0] is reserved, so the voxel sizes go into pixdim[1:4]
nib_image.header['pixdim'][1:4] = zxy_resolution
print(nib_image.header['pixdim'])

By saving the nifti volume again, as we did earlier, we can re-load the image and see that the resolution has been stored in the header.