![DICOM](https://www.dicomstandard.org/images/librariesprovider2/default-album/dicom-logo.jpg?sfvrsn=7e5f288b_2)

This notebook presents an introduction to the DICOM format, .dcm support in Python, an EDA of the OSIC .dcm files, and the official documentation of tag names and values.

In [None]:
from IPython.display import display,HTML,clear_output
groups = {}
groups['Patient'] = ['PatientID','PatientName','PatientSex','DeidentificationMethod']
groups['General-study'] = ['StudyID','StudyInstanceUID']
groups['General-series'] = ['SeriesInstanceUID','BodyPartExamined','Modality','PatientPosition']

groups['General-image'] = ['InstanceNumber','PatientOrientation']
groups['CT-image'] = ['BitsAllocated','BitsStored','ConvolutionKernel','GantryDetectorTilt','ImageType','KVP','RescaleIntercept','RescaleSlope','RescaleType','RotationDirection','DistanceSourceToDetector','DistanceSourceToPatient','FocalSpots','GeneratorPower','RevolutionTime','SingleCollimationWidth','SpiralPitchFactor','TableFeedPerRotation','TableHeight','TableSpeed','TotalCollimationWidth','XRayTubeCurrent']
groups['Image-pixel'] = ['PixelData','Rows','Columns','HighBit','PixelRepresentation','SamplesPerPixel','PhotometricInterpretation','SmallestImagePixelValue','LargestImagePixelValue']
groups['Image-plane'] = ['ImageOrientationPatient','ImagePositionPatient','PixelSpacing','SliceLocation','SliceThickness']
groups['VOI-lut'] = ['WindowCenter','WindowWidth','WindowCenterWidthExplanation']

groups['SOP-common'] = ['SOPInstanceUID','SpecificCharacterSet']
groups['Frame-of-reference'] = ['FrameOfReferenceUID','PositionReferenceIndicator']
groups['General-equipment'] = ['Manufacturer','ManufacturerModelName','PixelPaddingValue','SpatialResolution']


#groups['Other-tags'] = list(set(metadata.columns.values) - set([item  for fg in groups for item in groups[fg]]))
ds = '<h1 id = "Table-of-contents">Table of contents</h1>'
ds += '<ul class = "roman">'
ds += '<li><a href = "#Table-of-contents">Table of contents</a></li>'
ds += '<li><a href = "#DICOM-standard">DICOM standard</a></li>'
ds += '<ul class = "roman">'
ds += '<li><a href = "#pydicom-package">pydicom package</a></li>'
ds += '<li><a href = "#DICOM-tags,-numbers-and-keywords">DICOM tags, numbers and keywords</a></li>'
ds += '<li><a href = "#PixelData">PixelData</a></li>'
ds += '<li><a href = "#Disadvantages">Disadvantages</a></li>'
ds += '</ul>'
ds += '<li><a href = "#OSIC-DICOM-EDA">OSIC DICOM EDA</a></li>'
ds += '<ul class = "roman">'
for fg in groups:
    ds += '<li><a href = "#'+fg+'">'+fg+' Module</a></li>'
    ds += '<ul class = "square">'
    for ffg in groups[fg]:
        ds += '<li><a href = "#'+ffg+'">'+ffg+' Attribute</a></li>'
    ds += '</ul>'
ds += '</ul>'
ds += '<li><a href = "#Case-Study">Case Study</a></li>'
ds += '</ul>'
display(HTML(ds))

# DICOM standard
Sources:
* https://www.dicomstandard.org/
* http://dicom.nema.org/medical/dicom/current/output/html/part01.html
* https://en.wikipedia.org/wiki/DICOM

**DICOM® — Digital Imaging and Communications in Medicine** — is the international standard for medical images and related information. It defines the formats for medical images that can be exchanged with the data and quality necessary for clinical use. DICOM periodically holds conferences to promote the understanding and adoption of the DICOM Standard and to understand regional interests and priorities. The DICOM standard is divided into related but independent parts.
In this notebook we focus on the CT Image part of the DICOM standard, which is also described here:
https://dicom.innolitics.com/ciods/ct-image.

## pydicom package
Source:
https://github.com/pydicom/pydicom

**pydicom** is a pure Python package for working with **DICOM** files. It lets you read, modify and write **DICOM** data in an easy "pythonic" way. In the example below, we read the first .dcm file listed in the first patient folder of the training set and display its contents:

In [None]:
from pydicom import dcmread
import os
dir = '/kaggle/input/osic-pulmonary-fibrosis-progression/train/'
first_patient = dir + os.listdir(dir)[0] + '/'
first_dicom = dcmread(first_patient + os.listdir(first_patient)[0])
first_dicom

## DICOM tags, numbers and keywords
DICOM groups information into data sets. You can access specific data elements by **DICOM tag number** (actually a pair of numbers: group and element) or by name (keyword):

In [None]:
first_dicom[0x10,0x10]

Another way (the same numbers written with leading zeros, as in the DICOM documentation):

In [None]:
first_dicom[0x0010,0x0010]

Yet another way (using pydicom TupleTag):

In [None]:
from pydicom.tag import TupleTag
first_dicom[TupleTag((0x0010, 0x0010))]

Yet another way (using the tag as a single long integer):

In [None]:
from pydicom.datadict import keyword_dict
#keyword_dict['PatientName'] #1048592
first_dicom[1048592]

Yet another way (using pydicom BaseTag):

In [None]:
from pydicom.tag import BaseTag
first_dicom[BaseTag(1048592)]
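The long number 1048592 is simply the group and element packed into one 32-bit integer: group 0x0010 in the upper 16 bits and element 0x0010 in the lower 16 bits (0x00100010). The plain-Python sketch below makes that packing explicit; the helper names tag_to_int and int_to_tag are hypothetical and not part of pydicom.

```python
# Sketch: how a DICOM (group, element) pair maps to the single integer
# used by pydicom's BaseTag. Helper names are hypothetical.

def tag_to_int(group: int, element: int) -> int:
    """Pack a (group, element) pair into one 32-bit integer, group in the high 16 bits."""
    if not (0 <= group <= 0xFFFF and 0 <= element <= 0xFFFF):
        raise ValueError("group and element must each fit in 16 bits")
    return (group << 16) | element


def int_to_tag(value: int) -> tuple:
    """Split a 32-bit tag integer back into its (group, element) pair."""
    return value >> 16, value & 0xFFFF


patient_name = tag_to_int(0x0010, 0x0010)
print(patient_name)              # 1048592
print(hex(patient_name))         # 0x100010
print(int_to_tag(patient_name))  # (16, 16)
```

The same packing explains why PixelData, tag (7FE0,0010), corresponds to the integer 0x7FE00010.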

Yet another way (using DICOM keyword):

In [None]:
first_dicom['PatientName']

Note that you can't access this element by its display name "Patient Name" (a ValueError is raised):

In [None]:
try:
    first_dicom['Patient Name']
except ValueError:
    print('ValueError')

These keywords ship with the pydicom package (in the keyword_dict dictionary) and can also be found on this page:
https://dicom.innolitics.com/ciods/ct-image/
The first 5 keys of this dictionary are printed below:

In [None]:
from itertools import islice
# print the first 5 keywords together with their tag numbers
for key in islice(keyword_dict, 5):
    print(key)
    print(keyword_dict[key])

## PixelData 
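One of the main tags is obviously PixelData. Its value holds the encoded image: raw pixel bytes for uncompressed (native) transfer syntaxes, or an encapsulated, compressed stream otherwise. You can access it directly to get the bytes: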

In [None]:
first_dicom['PixelData'].value[:10]
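For uncompressed pixel data, the length of this byte string follows directly from the image-pixel tags: Rows × Columns × SamplesPerPixel × frames × BitsAllocated / 8, padded to an even length as DICOM requires. A minimal sketch, assuming native (uncompressed) data and byte-aligned BitsAllocated; the function name is hypothetical:

```python
# Sketch: expected byte length of *uncompressed* (native) PixelData.
# Assumes BitsAllocated is a multiple of 8; compressed (encapsulated)
# pixel data does not follow this formula. The function name is hypothetical.

def expected_pixel_data_length(rows, columns, bits_allocated,
                               samples_per_pixel=1, number_of_frames=1):
    """Bytes of native pixel data, including DICOM's padding to an even length."""
    if bits_allocated % 8 != 0:
        raise ValueError("this sketch only handles byte-aligned BitsAllocated")
    n_bytes = rows * columns * samples_per_pixel * number_of_frames * bits_allocated // 8
    return n_bytes + (n_bytes % 2)  # DICOM values always have an even length


print(expected_pixel_data_length(512, 512, 16))  # 524288
```

For a native file, expected_pixel_data_length(first_dicom.Rows, first_dicom.Columns, first_dicom.BitsAllocated) can be compared with len(first_dicom.PixelData); a large mismatch suggests compressed pixel data.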

In pydicom, the **pixel_array** property decodes these bytes into a NumPy array:

In [None]:
import matplotlib.pyplot as plt
first_img = first_dicom.pixel_array
plt.imshow(first_img)
print(first_img.shape)

More information about the content of the image will be given later.

## Disadvantages
According to a paper presented at an international symposium in 2008, the DICOM standard has problems related to data entry. "A major disadvantage of the DICOM Standard is the possibility for entering probably too many optional fields. This disadvantage is mostly showing in inconsistency of filling all the fields with the data. Some image objects are often incomplete because some fields are left blank and some are filled with incorrect data."

To state how strictly each attribute must be filled in, the standard assigns every attribute in a module one of the following types:
* 1 = Required
* 1C = Conditionally Required
* 2 = Required, Empty if Unknown
* 2C = Conditionally Required, Empty if Unknown
* 3 = Optional
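What these types mean for a single element can be written as a small check. The sketch below works on a plain dict {keyword: value} rather than a pydicom Dataset, and the conditional types (1C, 2C) take their IOD-specific condition as an explicit argument; check_attribute and _is_empty are hypothetical helpers, not pydicom functions.

```python
# Sketch of the DICOM attribute types applied to one element.
# Uses a plain dict instead of a pydicom Dataset; names are hypothetical.

def _is_empty(value):
    """Treat None and zero-length values ('' or []) as empty."""
    return value is None or (hasattr(value, "__len__") and len(value) == 0)


def check_attribute(elements, keyword, attr_type, condition=True):
    """Return True if `keyword` in `elements` satisfies `attr_type`."""
    present = keyword in elements
    empty = present and _is_empty(elements[keyword])
    if attr_type in ("1C", "2C"):
        if not condition:
            return True            # condition not met: not checked further here
        attr_type = attr_type[0]   # condition met: behaves like Type 1 / Type 2
    if attr_type == "1":           # required, must have a value
        return present and not empty
    if attr_type == "2":           # required, may be empty if unknown
        return present
    if attr_type == "3":           # optional
        return True
    raise ValueError(f"unknown attribute type: {attr_type}")
```

For example, check_attribute({'KVP': ''}, 'KVP', '2') returns True, while the same element fails the Type 1 check because its value is empty.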

### Example DICOM-CT tags: Required tags (type 1):

In [None]:
first_dicom['ImageType']

In [None]:
first_dicom['SeriesInstanceUID']

In [None]:
try:
    first_dicom['ReferencedSOPClassUID']
except KeyError:
    print('KeyError')  # raised because the element is not present in this file

Because a missing element raises a KeyError, we wrap the lookup in a small helper function:

In [None]:
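def get_dicom_tag(dicom, call):
    """Return the data element `call` (keyword or tag), or print 'KeyError' if it is absent."""
    try:
        return dicom[call]
    except KeyError:
        print('KeyError')  # element not present; the function returns None
get_dicom_tag(first_dicom, 'ReferencedSOPClassUID')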

In [None]:
get_dicom_tag(first_dicom, 'ImageType')

### Example tags: Required, Empty if Unknown (type 2):

In [None]:
get_dicom_tag(first_dicom, 'KVP')

In [None]:
get_dicom_tag(first_dicom, 'SeriesNumber') # Type 2: this should be present too (possibly empty)

## All tags from file:
The **dir()** method lists all keywords available in a given file (we print the first 10):

In [None]:
first_dicom.dir()[:10]

# OSIC DICOM EDA
It's worth noting that the OSIC training folder contains one subfolder per patient, holding that patient's .dcm files:

In [None]:
os.listdir(first_patient)[:5]

Together, these files should form one 3D CT image, linked by a common **SeriesInstanceUID**. Let's check whether this holds for the first patient:

In [None]:
import numpy as np

SeriesUIDs = []
for file_dicom in os.listdir(first_patient):
    SeriesUIDs.append(dcmread(first_patient + file_dicom)['SeriesInstanceUID'].value)
np.unique(np.array(SeriesUIDs))

Works perfectly. Next, we extract all tags from the .dcm files. From the competition's point of view, they fall into two major groups:
* patient-level tags: constant across all images in a patient's folder
* image-level tags: all other tags
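Once the tags are in a DataFrame with one row per file, this split can be computed rather than eyeballed: a column is patient-level if it never takes more than one distinct value within any patient. A minimal pandas sketch; the function name is hypothetical, and it treats NaN as a value of its own.

```python
import pandas as pd


def split_patient_and_image_level(df, patient_col="PatientID"):
    """Return (patient_level, image_level) column lists.

    A column counts as patient-level if it has at most one distinct value
    (NaN included) within every patient.
    """
    n_unique = (df.drop(columns=patient_col)
                  .groupby(df[patient_col])
                  .nunique(dropna=False))
    constant = n_unique.le(1).all()  # per column: constant within all patients?
    patient_level = [c for c in n_unique.columns if constant[c]]
    image_level = [c for c in n_unique.columns if not constant[c]]
    return [patient_col] + patient_level, image_level
```

Multi-valued DICOM tags (such as ImageType) are not hashable, so on a freshly extracted table a call like split_patient_and_image_level(metadata.astype(str)) is the safer form.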

This grouping is also formalized by the DICOM standard as **Information Entities** and **Modules**. The following list presents the entities included in the DICOM CT Image IOD (explored further below). All modules listed are **Mandatory**, except the VOI LUT module:
* Patient
    - Patient Module
* Study
    - General Study Module
* Series
    - General Series Module
* Image
    - General Image Module
    - CT Image Module
    - Image Pixel Module
    - Image Plane Module
    - VOI LUT Module *(User Optional)*
    - SOP Common Module
* Frame of Reference
    - Frame of Reference Module
* Equipment
    - General Equipment Module

Within the same series, two images share many tags with the same values. The code below, adapted from the pydicom example gallery, compares the headers of two .dcm files line by line; lines starting with - or + differ (pixel_array differences are not included):

In [None]:
# https://pydicom.github.io/pydicom/stable/auto_examples/plot_dicom_difference.html
# authors : Guillaume Lemaitre <g.lemaitre58@gmail.com>
# license : MIT
import difflib
import pydicom

# first_patient already ends with '/'
filename_ct = first_patient + os.listdir(first_patient)[0]
filename_ct2 = first_patient + os.listdir(first_patient)[1]

datasets = tuple([pydicom.dcmread(filename, force=True)
                  for filename in (filename_ct, filename_ct2)])

# difflib compare functions require a list of lines, each terminated with a
# newline character; massage the string representation of each DICOM dataset
# into this form:
rep = []
for dataset in datasets:
    lines = str(dataset).split("\n")
    lines = [line + "\n" for line in lines]  # add the newline to end
    rep.append(lines)

diff = difflib.Differ()
for line in diff.compare(rep[0], rep[1]):
    if line[0] != "?":
        print(line)

The printout shows that two files of the same patient differ in tags such as SliceLocation and InstanceNumber. We therefore import the tags of every file separately, as one row of a DataFrame:

In [None]:
def get_imagename(x):
    # Index values look like '<PatientID>_<file name>'; return the file name part
    return str(x).split('_')[1]

def extract_patients_metadata(workdir):
    """Read the tags of every .dcm file under workdir into a DataFrame (one row per file)."""
    P = {}
    for patient in os.listdir(workdir):
        for dcm in os.listdir(workdir + patient):
            r = dcmread(workdir + patient + '/' + dcm)
            # Keep the value of every tag except the (large) PixelData
            R = {fr: r[fr].value for fr in r.dir() if fr != 'PixelData'}
            P[patient + '_' + dcm] = R

    patients_metadata = pd.DataFrame.from_dict(P, orient='index')
    patients_metadata['ImageName'] = list(map(get_imagename, np.array(patients_metadata.index.values)))
    return patients_metadata
#! conda install -y gdcm -c conda-forge > /dev/null

In [None]:
%%time 

import pandas as pd
workdir = '/kaggle/input/osic-pulmonary-fibrosis-progression/train/'
metadata_file = 'osic_dicom_metadata.csv'
try:
    metadata = pd.read_csv(metadata_file, index_col = 0, low_memory = False)
except FileNotFoundError:  # first run: build the table from the .dcm files
    metadata = extract_patients_metadata(workdir)
    metadata.to_csv(metadata_file)
metadata.head()

In [None]:
metadata.shape

A beautiful collection! Let's find out what the individual features mean. Each feature belongs to a group (module); we start with the Patient module:

In [None]:
%%javascript
IPython.OutputArea.prototype._should_scroll = function(lines) {
    return false;
}

In [None]:
comments = {}
comments['PatientID'] = 'This Attribute is equivalent to the Patient column in train.csv, and to the directory name in the training folder'
comments['PatientName'] = 'In this dataset, equivalent to PatientID'
comments['PatientSex'] = 'Useless here; already present in train.csv'
import tabulate
from pydicom.datadict import dictionary_description, keyword_dict
from plotly.offline import iplot
import cufflinks
cufflinks.go_offline()
cufflinks.set_config_file(world_readable=True, theme='pearl')

#tag = 'ConvolutionKernel'#'PatientName' #PatientID
def tag_eda(tag):
    mtu = metadata[tag].map(str).unique() #metadata[tag].unique()
    lmtu = len(mtu)
    result = {}
    result['Number of unique values:'] = str(lmtu)
    percent_of_empty = metadata[tag].isna().sum()/metadata.shape[0]*100
    result['Percent of empty or NaN values: '] = str(percent_of_empty)
    if lmtu>1:
        result['Example values: '] = str(mtu[0]) + '; ' + str(mtu[1])
    else:
        result['Value: '] = str(mtu[0])
    
    table_data = [[fr,result[fr]] for fr in result]
    display(HTML(tabulate.tabulate(table_data, tablefmt='html')))

    if lmtu>1 and lmtu<50:
        metadata[tag].map(str).value_counts().iplot(kind='bar',yTitle='Counts', linecolor='black', opacity=0.7,color='blue',theme='pearl',bargap=0.5,gridcolor='white',title='Distribution of the ' +tag+' tag values in the OSIC DICOM metadata')
    
    if percent_of_empty == 0 and lmtu == 1:
        comments[tag] = 'Useless here; constant value in training data'
    try:
        display(HTML('<b> Notebook author comment:</b> ' + comments[tag]))
    except KeyError:  # no author comment for this tag
        pass
    
import requests
from bs4 import BeautifulSoup
from pydicom.tag import BaseTag # https://github.com/pydicom/pydicom/blob/master/pydicom/tag.py
from pydicom.datadict import tag_for_keyword
os.makedirs('html', exist_ok = True)
import pickle
def display_html(tag):
    try:
        with open('html/'+tag + '.html', 'rb') as config_dictionary_file:
            r = pickle.load(config_dictionary_file)
            #print('Extracting html description..: ' + tag)
    except FileNotFoundError:
        #print('Downloading html description..: ' + tag)
        r = requests.get(links[tag]) 
        with open('html/'+tag + '.html', 'wb') as config_dictionary_file:
            pickle.dump(r, config_dictionary_file)
    soup = BeautifulSoup(r.text, 'html.parser')
    ds = ""
    idx = 0
    for fs in soup.find_all(class_="m-a-1 detail-pane-section"):
        if idx == 0:
            #ffs = fs.find(class_ = "section-title text-secondary")
            #ffs.string += ': documentation'
            fs.find(class_ = "section-title text-secondary").string += ': documentation'
            fs.find(class_ = "section-title text-secondary")['id'] = tag
            #ffs['id'] = tag
        ds += str(fs)
        idx +=1
    display(HTML(ds))
    
links = {}

def display_h(text, level = 2, idf = ''):
    sl = str(level)
    display(HTML('<h'+sl+' id="'+idf+'">' + text + '</h'+sl+'>'))
def display_hr():
    display(HTML(' <hr style="height:2px;border-width:0;color:gray;background-color:gray"> '))
for fg in groups:
    links[fg] = 'https://dicom.innolitics.com/ciods/ct-image/' + fg.lower()
    if fg != 'Other-tags':
        display_html(fg)
        display_hr()
        for ffg in groups[fg]:
            btg = BaseTag(tag_for_keyword(ffg))
            tag_group_element = "{0:04x}{1:04x}".format(btg.group, btg.element)
            links[ffg] = 'https://dicom.innolitics.com/ciods/ct-image/'+fg.lower()+'/' + tag_group_element
            if ffg != 'PixelData':
                display_h(dictionary_description(keyword_dict[ffg] )+ ' Attribute: OSIC dataset', idf = ffg)
                tag_eda(ffg)
            display_html(ffg)
            display_hr()
            
    else:
        display_h(fg,1, idf = fg)
        display(HTML('Tags in this group are not official tags of the DICOM CT Image IOD'))
        display_hr()
        for ffg in groups[fg]:
            display_h(ffg+ ' Attribute: OSIC dataset', idf = ffg)
            tag_eda(ffg)
            display_hr()
    
import shutil
shutil.rmtree('./html/')

# Case Study
**WARNING: many of the topics presented below are not directly related to the OSIC competition, but rather to the DICOM standard.**

Using the documentation, we will try to decipher the data of a randomly selected patient:

In [None]:
np.random.seed(0)
patients = os.listdir('/kaggle/input/osic-pulmonary-fibrosis-progression/train')
patients.sort()
n_patients = len(patients)
patient = patients[np.random.randint(0,n_patients)]
patient

This patient's folder contains 62 images with 46 non-empty tags (plus PixelData):

In [None]:
# patient metadata with removed useless tags
pm = metadata[metadata['PatientID'].isin([patient])].copy().reset_index().drop(columns = ["index","ImageName"]).dropna(axis = 1, how = 'all')
pm.shape

From now on, we will call this set of images a **series**.

## Patient Module
The tags **PatientID** and **PatientName** have the same value and are constant for the entire series:

In [None]:
cols = ['PatientID','PatientName']
pm[cols].groupby(cols).count().reset_index()

* **DeidentificationMethod**: the value of this tag, shown below, looks mysterious at first glance, but the idea behind it is clear. Since DICOM is the standard for data exchange between hospitals worldwide, personal data about patients should, in many cases, be hidden, and the process of hiding it (de-identification) is documented as well. I have not found an explanation of the value 'Table;', but the procedures themselves are described here:
    - http://dicom.nema.org/medical/dicom/2017a/output/chtml/part16/sect_CID_7050.html
    - http://dicom.nema.org/dicom/cp/CPack-33_PDF/cp563_lb.pdf

and a sample file on which the procedure was performed (including the De-identification Method Code Sequence) is available here:

https://xnat.bmia.nl/REST/services/dicomdump?src=/archive/projects/stwstrategyps2/experiments/BMIAXNAT_E51650/scans/5&format=html&requested_screen=DicomScanTable.vm

So I assume that the value 'Table;' means that de-identification (for the purposes of the competition) was done with a lookup table mapping each patient to an ID (note that the **PatientIdentityRemoved** tag is not present):

In [None]:
cols = ['DeidentificationMethod']
pm[cols].groupby(cols).count().reset_index()

## General Study Module
According to the documentation, this module specifies the Attributes that describe and identify the Study performed upon the Patient. In our data, there is only one study per patient. For this patient:
* **StudyID** tag value is empty
* **StudyInstanceUID** is set per patient (176 unique values in the data, one per patient):

In [None]:
cols = ['StudyID','StudyInstanceUID']
pm[cols].groupby(cols).count().reset_index()

## General Series Module
According to the documentation, this module specifies the Attributes that identify and describe general information about the Series within a Study. In our data, there is only one series per study (and therefore one per patient).

* **SeriesInstanceUID** is set per series (176 unique values in the data: one per study, and one per patient).

* **Modality**: the value CT stands for Computed Tomography.

* **BodyPartExamined**: the value is Chest; the full list of values (separately for humans and animals) is available here: http://dicom.nema.org/medical/dicom/current/output/chtml/part16/chapter_L.html#chapter_L

* **PatientPosition**: the value is FFS (Feet First-Supine).

In [None]:
cols = ['SeriesInstanceUID','Modality','BodyPartExamined','PatientPosition']
pm[cols].groupby(cols).count().reset_index()

So far we have the structure Patient - Study - Series. Each series consists of many images, one per file. Each image has an ID that is unique across all images (in the SOP Common Module, explored later) and a number that is unique within its series (in the following module):


## General Image Module
This module specifies the Attributes that identify and describe an image within a particular Series. 

* **InstanceNumber**: in this case, the files are numbered from 1 to 62.
* **PatientOrientation**: empty in this case.

In [None]:
cols = ['InstanceNumber','PatientOrientation']
pm[cols].groupby(cols).count().reset_index()
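Whether a series is complete can be checked from InstanceNumber alone. The sketch below assumes the numbering should run from 1 to N without gaps, which holds for this patient but is not required by DICOM; check_instance_numbers is a hypothetical helper.

```python
from collections import Counter


def check_instance_numbers(instance_numbers):
    """Return (missing, duplicated) InstanceNumbers, assuming numbering 1..N."""
    counts = Counter(int(n) for n in instance_numbers)
    expected = set(range(1, max(counts) + 1))
    missing = sorted(expected - set(counts))
    duplicated = sorted(n for n, c in counts.items() if c > 1)
    return missing, duplicated
```

For a complete series numbered 1 to 62, check_instance_numbers(pm['InstanceNumber']) should return two empty lists.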

We continue with the next module by looking at the first image, 1.dcm (*InstanceNumber* = 1):

In [None]:
metadata[metadata['PatientID'].isin([patient]) & metadata['InstanceNumber'].isin([1])][['PatientID','InstanceNumber','ImageName']]

## CT-image and Image-pixels Modules
These modules contain the IOD Attributes that describe a CT image. Among other things, they specify how the DICOM standard stores the image, down to individual bits. This patient has the following tag values:

* **BitsAllocated** value 16: the same for every patient in the OSIC dataset. Fun fact: 16 is the only value the CT Image Module allows for this tag.
* **BitsStored** value 16: one of the two most common values for this tag in the OSIC dataset (the other is 12).
* **HighBit** value 15: one of the two most common values for this tag in the OSIC dataset (the other is 11).

Each Pixel Cell shall contain a single Pixel Sample Value. The size of the Pixel Cell shall be specified by Bits Allocated (0028,0100). Bits Stored (0028,0101) defines the total number of these allocated bits that will be used to represent a Pixel Sample Value. Bits Stored (0028,0101) shall never be larger than Bits Allocated (0028,0100). High Bit (0028,0102) specifies where the high order bit of the Bits Stored (0028,0101) is to be placed with respect to the Bits Allocated (0028,0100) specification. Bits not used for Pixel Sample Values can be used for overlay planes described further in PS3.3 of the DICOM Standard.

For example, in Pixel Data with 16 bits (2 bytes) allocated, 12 bits stored, and bit 15 specified as the high bit, one pixel sample is encoded in each 16-bit word, with the 4 least significant bits of each word not containing Pixel Data.

In [None]:
cols = ['BitsAllocated','BitsStored','HighBit']
pm[cols].groupby(cols).count().reset_index()
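The bit layout described above can be made concrete with a shift and a mask: the sample occupies bits HighBit - BitsStored + 1 up to HighBit of each Pixel Cell, and any other bits (possibly overlay data) are discarded. A plain-Python sketch; stored_value is a hypothetical helper name.

```python
def stored_value(word, bits_stored, high_bit):
    """Extract the Pixel Sample Value from one Pixel Cell (an unsigned int).

    The sample occupies bits (high_bit - bits_stored + 1) .. high_bit;
    bits outside that range are masked away.
    """
    low_bit = high_bit - bits_stored + 1
    if low_bit < 0:
        raise ValueError("HighBit must be at least BitsStored - 1")
    return (word >> low_bit) & ((1 << bits_stored) - 1)


# Example from the text: 16 bits allocated, 12 stored, bit 15 as the high bit;
# the 4 least significant bits do not contain pixel data.
print(stored_value(0b1010_1010_1010_0000, bits_stored=12, high_bit=15))  # 2730
# The other common OSIC layout: 12 bits stored, high bit 11.
print(stored_value(0x0FFF, bits_stored=12, high_bit=11))  # 4095
```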

* **PixelRepresentation**: data representation of the pixel samples; 0 = unsigned integer, 1 = 2's complement. Each sample shall have the same pixel representation.
* **SamplesPerPixel** value 1: the number of samples (planes) in this image.
* **PhotometricInterpretation** value MONOCHROME2: pixel data represent a single monochrome image plane. The minimum sample value is intended to be displayed as black after any VOI gray scale transformations have been performed. See PS3.4. This value may be used only when Samples per Pixel (0028,0002) has a value of 1. May be used for pixel data in a Native (uncompressed) or Encapsulated (compressed) format.

In [None]:
cols = ['PixelRepresentation','SamplesPerPixel','PhotometricInterpretation']
pm[cols].groupby(cols).count().reset_index()
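When PixelRepresentation is 1, the BitsStored bits hold a 2's complement number, so the extracted sample needs a sign extension before use. A minimal sketch; to_signed is a hypothetical helper, and bits above BitsStored are ignored.

```python
def to_signed(value, bits_stored):
    """Interpret the low `bits_stored` bits of `value` as a 2's complement integer."""
    sign_bit = 1 << (bits_stored - 1)
    # keep the magnitude bits, subtract the weight of the sign bit if it is set
    return (value & (sign_bit - 1)) - (value & sign_bit)


print(to_signed(0xFFF, 12))   # -1
print(to_signed(0x7FF, 12))   # 2047
print(to_signed(0xFC00, 16))  # -1024
```

With PixelRepresentation 0 no such step is needed: the extracted sample is already the unsigned stored value.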

Let's go ahead and see how it looks in our file. By default pydicom reads in pixel data as the raw bytes found in the file:

In [None]:
first_dcm = dcmread(dir+patient+'/1.dcm') # first image with InstanceNumber = 1
img = first_dcm['PixelData'].value # retrieving value for the PixelData tag
img[:100] # first 100 bytes

A byte is a unit of digital information that most commonly consists of eight bits, so a 16-bit pixel (BitsAllocated = 16) occupies the first two bytes:

In [None]:
img[:2]

These two bytes can be converted to an int with the built-in int.from_bytes method. Native DICOM pixel data is normally stored little-endian (e.g. Explicit VR Little Endian), so the low byte comes first:

In [None]:
int.from_bytes(img[:2], "little")

That is, the first pixel of our image has the value 139. Because interpreting pixel data can be complex, *pydicom* provides an easy way to get it in a convenient form: **Dataset.pixel_array**. Let us check whether our interpretation agrees with *pydicom*'s:

In [None]:
first_dcm.pixel_array[0,0]

Great, both give the same value. We store the decoded array:

In [None]:
image = first_dcm.pixel_array
image.shape

This result is consistent with the Rows and Columns tag values for this file:

In [None]:
cols = ['Rows','Columns']
pm[cols].groupby(cols).count().reset_index()
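The byte-level decoding done by hand for the first pixel extends to the whole image with NumPy. The sketch below assumes BitsAllocated 16, PixelRepresentation 0 and an uncompressed little-endian transfer syntax; under those assumptions decode_native_uint16 (a hypothetical helper) should give the same values as pixel_array.

```python
import numpy as np


def decode_native_uint16(pixel_bytes, rows, columns):
    """Decode native little-endian, unsigned 16-bit PixelData into a 2D array."""
    n_pixels = rows * columns
    if len(pixel_bytes) < 2 * n_pixels:
        raise ValueError("PixelData is shorter than Rows * Columns * 2 bytes")
    # '<u2' = little-endian unsigned 16-bit; ignore any trailing padding byte
    flat = np.frombuffer(pixel_bytes, dtype="<u2", count=n_pixels)
    return flat.reshape(rows, columns)
```

For example, decode_native_uint16(first_dcm.PixelData, first_dcm.Rows, first_dcm.Columns) can be compared with first_dcm.pixel_array using np.array_equal.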

Each pixel holds a stored value (SV). Depending on the application, this value should be converted into output units using the following tags:
* **RescaleType**: specifies the output units of Rescale Slope (0028,1053) and Rescale Intercept (0028,1052). Required if the Rescale Type is not HU (Hounsfield Units) (...); US = Unspecified.
* **RescaleIntercept**: the value b in the relationship between stored values (SV) and the output units: Output units = m*SV+b.
* **RescaleSlope**: m in the equation specified in Rescale Intercept (0028,1052); typically 1.


In [None]:
cols = ['RescaleType','RescaleSlope','RescaleIntercept']
pm[cols].groupby(cols).count().reset_index()

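Therefore, to be interpreted correctly, our image must go through the linear transformation Output units = m*SV+b, using the slope and intercept stored in the file:

In [None]:
image_hu = float(first_dcm.RescaleSlope) * image.astype(np.float64) + float(first_dcm.RescaleIntercept)
plt.imshow(image_hu, cmap='gray')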
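A reusable form of Output units = m*SV+b takes the slope and intercept as arguments (e.g. first_dcm.RescaleSlope and first_dcm.RescaleIntercept) and computes in float64, so that unsigned 16-bit stored values cannot overflow or wrap when a negative intercept is added. A minimal sketch; apply_rescale is a hypothetical name. pydicom also ships apply_modality_lut for this purpose, whose import path depends on the pydicom version.

```python
import numpy as np


def apply_rescale(stored_values, slope, intercept):
    """Map stored values (SV) to output units: m * SV + b."""
    sv = np.asarray(stored_values, dtype=np.float64)  # avoid integer overflow
    return float(slope) * sv + float(intercept)
```

With slope 1 and intercept -1024, stored values 0, 1024 and 2048 map to -1024, 0 and 1024 HU.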

## Image-plane Module
* **PixelSpacing**: physical distance in the patient between the centers of adjacent pixels, specified by a numeric pair: adjacent row spacing, then adjacent column spacing (separated by the DICOM delimiter), in mm.
* **SliceThickness**: nominal slice thickness, in mm.

In [None]:
pm['PixelSpacing'][0]

In [None]:
cols = ['SliceThickness']
pm[cols].groupby(cols).count().reset_index()
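These two tags give the physical size of what we see: a slice covers Rows × row spacing by Columns × column spacing millimetres, and a voxel has volume row spacing × column spacing × slice thickness. A small sketch with hypothetical helper names; note that SliceThickness is nominal and may differ from the actual distance between slice centres, which is better derived from ImagePositionPatient.

```python
def field_of_view_mm(rows, columns, pixel_spacing):
    """Physical (height, width) in mm covered by one slice.

    pixel_spacing follows the DICOM order: [row spacing, column spacing].
    """
    row_spacing, column_spacing = (float(s) for s in pixel_spacing)
    return rows * row_spacing, columns * column_spacing


def voxel_volume_mm3(pixel_spacing, slice_thickness):
    """Volume of one voxel in mm^3, using the nominal slice thickness."""
    row_spacing, column_spacing = (float(s) for s in pixel_spacing)
    return row_spacing * column_spacing * float(slice_thickness)
```

For example, field_of_view_mm(512, 512, pm['PixelSpacing'][0]) gives the extent of this patient's slices, provided the column holds the list of two spacings rather than its string form (after reading from CSV it may need parsing).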

Finally, we are ready to load the whole series into one 3D array:

In [None]:
# modified from source: https://www.kaggle.com/allunia/pulmonary-dicom-preprocessing

import cv2

def load_scans(patient):
    # Read every slice of one patient and sort the slices by InstanceNumber
    basepath = "../input/osic-pulmonary-fibrosis-progression/train/"
    dcm_path = basepath + patient
    slices = [dcmread(dcm_path + "/" + file) for file in os.listdir(dcm_path)]
    slices.sort(key = lambda x: float(x.InstanceNumber))
    return slices
def resize_array(pixel_array):
    # Bring every slice to a common 319 x 319 size so that the slices can be stacked
    return cv2.resize(pixel_array, dsize=(319, 319), interpolation=cv2.INTER_CUBIC)
def transform_to_hu(slices):
    # Stack the resized slices and apply Output units = slope * SV + intercept per slice
    images = np.stack([resize_array(file.pixel_array) for file in slices])
    images = images.astype(np.int16)
    for n in range(len(slices)):
        intercept = slices[n].RescaleIntercept
        slope = slices[n].RescaleSlope
        if slope != 1:
            images[n] = slope * images[n].astype(np.float64)
            images[n] = images[n].astype(np.int16)
            
        images[n] += np.int16(intercept)
    return np.array(images, dtype=np.int16)
slices = load_scans(patient)
pixels = transform_to_hu(slices)
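Because resize_array forces every slice to 319 × 319, each pixel of the resulting pixels array no longer covers PixelSpacing millimetres. If physical measurements are needed on the resized data, the spacing can be rescaled by original size / new size. A sketch under that assumption; resized_pixel_spacing is a hypothetical helper.

```python
def resized_pixel_spacing(pixel_spacing, original_shape, new_shape):
    """Pixel spacing (mm) after resizing a slice from original_shape to new_shape."""
    (rows, cols), (new_rows, new_cols) = original_shape, new_shape
    row_spacing, col_spacing = (float(s) for s in pixel_spacing)
    # the same physical extent is now spread over a different number of pixels
    return row_spacing * rows / new_rows, col_spacing * cols / new_cols
```

For example, resized_pixel_spacing(slices[0].PixelSpacing, slices[0].pixel_array.shape, (319, 319)) estimates the spacing of the resized slices.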

## Top-down animation

In [None]:
# source https://www.kaggle.com/danpresil1/dicom-basic-preprocessing-and-visualization
import imageio
from IPython.display import Image

imageio.mimsave("pixels_top_bottom.gif", pixels, duration=0.001)
Image(filename="pixels_top_bottom.gif", format='png')

## Left-right animation

In [None]:
imageio.mimsave("pixels_left_right.gif", pixels.transpose(2,0,1), duration=0.001)
Image(filename="pixels_left_right.gif", format='png')

## Front-back animation

In [None]:
imageio.mimsave("pixels_front_back.gif", pixels.transpose(1,0,2), duration=0.001)
Image(filename="pixels_front_back.gif", format='png')

## SOP Common Module
Within this module, each image is identified by a unique ID: the **SOPInstanceUID** tag has 33206 unique values in the entire OSIC training data and 62 for this patient. This module may also contain a **SpecificCharacterSet** tag, which is empty in this case:

In [None]:
cols = ['SOPInstanceUID']
pm[cols].groupby(cols).count().reset_index()
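Putting these identifiers together, the Patient - Study - Series - Image nesting can be checked for the whole training set at once, instead of one patient at a time. A minimal pandas sketch over the metadata table; hierarchy_counts is a hypothetical name.

```python
import pandas as pd


def hierarchy_counts(metadata):
    """Per patient: number of distinct studies, series and images (SOP instances)."""
    return metadata.groupby("PatientID").agg(
        studies=("StudyInstanceUID", "nunique"),
        series=("SeriesInstanceUID", "nunique"),
        images=("SOPInstanceUID", "nunique"),
    )
```

With counts = hierarchy_counts(metadata), the filter counts[(counts.studies != 1) | (counts.series != 1)] lists any patients that break the one-study, one-series pattern.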

After the major modules we have the following structure: Patient (PatientID) - Study (StudyInstanceUID) - Series (SeriesInstanceUID) - Images (SOPInstanceUID or InstanceNumber). But this is not the end of the *file relationship* story :)

## Frame of Reference Module
This module specifies the Attributes necessary to uniquely identify a Frame of Reference that ensures the spatial relationship of Images within a Series. It also allows Images across multiple Series to share the same Frame Of Reference.

* **FrameOfReferenceUID** tag value 2.25.79882532498596542588570014249557818511: I have not found any specific information on how the value of this tag relates to a particular image. I checked that this **FrameOfReferenceUID** value does not appear anywhere else in this case, neither in the SOP Common module, nor in the study module. At first glance it works the same way as **StudyInstanceUID**.

* **PositionReferenceIndicator**: this tag is not present for this patient (as is the case for most patients), but some patients in the OSIC training set have it filled in.

In [None]:
cols = ['FrameOfReferenceUID','PositionReferenceIndicator']
pm[cols].groupby(cols).count().reset_index()
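The 2.25 root of the FrameOfReferenceUID value marks a UID derived from a UUID (DICOM PS3.5, Annex B.2): everything after "2.25." is a 128-bit UUID written as a decimal integer. This may explain why the value shows no visible relation to other identifiers in the file. A standard-library sketch that recovers the UUID; uuid_from_uid is a hypothetical helper.

```python
import uuid


def uuid_from_uid(uid):
    """Return the UUID encoded in a '2.25.<decimal>' DICOM UID, or None otherwise."""
    prefix = "2.25."
    if not uid.startswith(prefix):
        return None
    return uuid.UUID(int=int(uid[len(prefix):]))


u = uuid_from_uid("2.25.79882532498596542588570014249557818511")
print(u)                     # the same identifier in the usual UUID notation
print("2.25." + str(u.int))  # converts back to the original UID
```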

To be continued...
Thank you for reading this notebook :)
P.S. It has been shown statistically that if you have read this notebook to the end, you are a very patient man :)