<center><h1> SIIM-FISABIO-RSNA COVID-19 Detection </h1></center><br>

# What is the .DCM format?

The DCM file extension is used for DICOM, which stands for Digital Imaging and Communications in Medicine. It is the common file format for storing medical imaging data when a patient undergoes a CT, MRI, PET, ultrasound, or one of many other types of medical scan.

📌<b>DICOM: Digital Imaging and Communications in Medicine</b><br>

<b>About DICOM</b><br>

DICOM® (Digital Imaging and Communications in Medicine) is the international standard for medical images and related information. It defines the formats for medical images that can be exchanged with the data and quality necessary for clinical use.

DICOM® is implemented in almost every radiology, cardiology imaging, and radiotherapy device (X-ray, CT, MRI, ultrasound, etc.), and increasingly in devices in other medical domains such as ophthalmology and dentistry. With hundreds of thousands of medical imaging devices in use, DICOM® is one of the most widely deployed healthcare messaging standards in the world. There are literally billions of DICOM® images currently in use for clinical care.

Since its first publication in 1993, DICOM® has revolutionized the practice of radiology, allowing the replacement of X-ray film with a fully digital workflow. Much as the Internet has become the platform for new consumer information applications, DICOM® has enabled advanced medical imaging applications that have "changed the face of clinical medicine". From the emergency department, to cardiac stress testing, to breast cancer detection, DICOM® is the standard that makes medical imaging work, for doctors and for patients.

DICOM® is recognized by the International Organization for Standardization as the ISO 12052 standard.

<b>Data format</b><br>

<img src='https://formats.kaitai.io/dicom/dicom.svg'><br>
DICOM groups information into data sets. For example, a file of a chest x-ray image may contain the patient ID within the file, so that the image can never be separated from this information by mistake. This is similar to the way that image formats such as JPEG can also have embedded tags to identify and otherwise describe the image.

A DICOM data object consists of a number of attributes, including items such as name and ID, plus one special attribute containing the image pixel data. Logically, the main object has no "header" as such; it is merely a list of attributes, one of which is the pixel data. A single DICOM object can have only one attribute containing pixel data. For many modalities, this corresponds to a single image. However, the attribute may contain multiple "frames", allowing storage of cine loops or other multi-frame data. Nuclear medicine (NM) data is another example: an NM image is, by definition, a multi-dimensional multi-frame image. In these cases, three- or four-dimensional data can be encapsulated in a single DICOM object. Pixel data can be compressed using a variety of standards, including JPEG, lossless JPEG, JPEG 2000, and run-length encoding (RLE). Deflate (zip-style) compression can be applied to the whole data set (not just the pixel data), but this has rarely been implemented.
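To make the "list of attributes plus one pixel data attribute" idea concrete, here is a minimal sketch that models a data set as a plain Python dict keyed by (group, element) tags and splits multi-frame pixel data into frames. It is a toy, not how pydicom stores data: the dict layout, the `split_frames` helper, and the patient ID value are made up for illustration, and it assumes uncompressed 8-bit grayscale pixels with one byte per pixel.

```python
# Illustrative only: a DICOM data set modelled as a dict keyed by
# (group, element) tags. Real files should be read with pydicom.
PATIENT_ID = (0x0010, 0x0020)
NUMBER_OF_FRAMES = (0x0028, 0x0008)
ROWS = (0x0028, 0x0010)
COLUMNS = (0x0028, 0x0011)
PIXEL_DATA = (0x7FE0, 0x0010)


def split_frames(data_set):
    """Split uncompressed 8-bit grayscale pixel data into per-frame byte strings."""
    rows = data_set[ROWS]
    cols = data_set[COLUMNS]
    n_frames = data_set.get(NUMBER_OF_FRAMES, 1)  # absent means a single frame
    frame_size = rows * cols  # one byte per pixel in this simplified case
    pixels = data_set[PIXEL_DATA]
    if len(pixels) != n_frames * frame_size:
        raise ValueError("pixel data length does not match rows x columns x frames")
    return [pixels[i * frame_size:(i + 1) * frame_size] for i in range(n_frames)]


toy = {
    PATIENT_ID: "ANON-0001",  # made-up identifier
    ROWS: 2,
    COLUMNS: 3,
    NUMBER_OF_FRAMES: 2,
    PIXEL_DATA: bytes(range(12)),  # the single pixel data attribute holds both frames
}
frames = split_frames(toy)
print(len(frames), list(frames[0]), list(frames[1]))
```

The point of the sketch is that the frames are not separate attributes: they are consecutive slices of the one pixel data value, and the other attributes say how to cut it up.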

DICOM uses three different data element encoding schemes. With explicit value representation (VR) data elements, for VRs other than OB, OW, OF, SQ, UT, or UN, each data element has the format: GROUP (2 bytes), ELEMENT (2 bytes), VR (2 bytes), length in bytes (2 bytes), data (variable length). For the other explicit data elements and for implicit data elements, see section 7.1 of Part 5 of the DICOM Standard.
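The short explicit-VR layout described above can be sketched with the standard `struct` module. This is a hedged illustration, not a DICOM parser: it assumes little-endian byte order (the text above does not specify one), handles only the 2-byte-length form, and rejects the six long-form VRs listed above (newer editions of the standard add more). The function name and the sample bytes are made up.

```python
import struct

# VRs from the text above that use a different (longer) length field.
LONG_FORM_VRS = {"OB", "OW", "OF", "SQ", "UT", "UN"}


def parse_short_explicit_element(buf, offset=0):
    """Parse one explicit-VR, little-endian data element with a 2-byte length.

    Returns (group, element, vr, value_bytes, next_offset).
    """
    group, element = struct.unpack_from("<HH", buf, offset)
    vr = buf[offset + 4:offset + 6].decode("ascii")
    if vr in LONG_FORM_VRS:
        raise NotImplementedError(f"VR {vr} uses the long-length form")
    (length,) = struct.unpack_from("<H", buf, offset + 6)
    start = offset + 8
    value = buf[start:start + length]
    return group, element, vr, value, start + length


# Hand-built bytes: Patient ID (0010,0020), VR "LO", made-up 8-byte value
raw = struct.pack("<HH", 0x0010, 0x0020) + b"LO" + struct.pack("<H", 8) + b"ANON-001"
group, element, vr, value, next_offset = parse_short_explicit_element(raw)
print(f"({group:04X},{element:04X}) {vr} {value!r}")
```

In practice pydicom does this decoding for you; the sketch is only meant to show how little structure there is in each element: a tag, a type code, a length, and the raw value.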

The same basic format is used for all applications, including network and file usage, but when written to a file, usually a true "header" (containing copies of a few key attributes and details of the application that wrote it) is added.
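As a small illustration of the file "header": to my understanding of Part 10 of the standard (this detail is not stated in the text above), a DICOM file begins with a 128-byte preamble followed by the four bytes `DICM`, and the file meta information comes after that. The sketch below checks only that prefix. The helper name and the synthetic byte streams are assumptions for illustration, and some older files omit the preamble, so a `False` result does not prove a file is unreadable.

```python
import io


def has_dicom_prefix(stream):
    """Return True if the stream starts with a 128-byte preamble followed by b'DICM'."""
    header = stream.read(132)
    return len(header) == 132 and header[128:132] == b"DICM"


# Synthetic streams for illustration (not real DICOM files)
fake_dicom = io.BytesIO(b"\x00" * 128 + b"DICM" + b"rest of file")
not_dicom = io.BytesIO(b"\x89PNG\r\n\x1a\n" + b"\x00" * 200)
print(has_dicom_prefix(fake_dicom), has_dicom_prefix(not_dicom))
```

With a real file, the same function can be called on a handle opened in binary mode, for example `open(path, "rb")`.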

DICOM official homepage link : https://www.dicomstandard.org/<br>
DICOM wiki link : https://en.wikipedia.org/wiki/DICOM

# Pydicom

Pydicom is a pure Python package for working with DICOM files such as medical images, reports, and radiotherapy objects. Pydicom makes it easy to read these complex files into natural pythonic structures for easy manipulation. Modified datasets can be written again to DICOM format files.

Pydicom GitHub link : https://github.com/pydicom<br>
Pydicom documentation link : https://pydicom.github.io/pydicom/stable/#


### Libraries

In [None]:
import os
import cv2
import warnings
from glob import glob
import pandas as pd
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
import pydicom
from pydicom.pixel_data_handlers.util import apply_voi_lut

warnings.filterwarnings('ignore')

# Convert a DICOM Dataset to Image

<b>Progress</b><br>
<img src='https://t1.daumcdn.net/cfile/tistory/9979A1365EF2EFBA21'><br>


### Read a DICOM Dataset

reference url : https://pydicom.github.io/pydicom/stable/auto_examples/input_output/plot_read_dicom.html#sphx-glr-auto-examples-input-output-plot-read-dicom-py

In [None]:
dataset_dir = '../input/siim-covid19-detection'

dicom_paths = glob(f'{dataset_dir}/train/*/*/*.dcm')
for path in dicom_paths[:5]:
    print(path)

### Print a DICOM Dataset

reference url : https://github.com/pydicom/pydicom/issues/319

In [None]:
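def dicom_dataset_to_dict(dicom_header):
    """Print every header element of a dataset and return them as a dict."""
    dicom_dict = {}
    # Rendering the dataset walks every element, which makes pydicom convert
    # lazily read raw elements into full DataElement objects (needed for .name)
    repr(dicom_header)
    for dicom_value in dicom_header.values():
        if dicom_value.tag == (0x7fe0, 0x0010):
            # discard pixel data
            continue
        if isinstance(dicom_value.value, pydicom.dataset.Dataset):
            # nested dataset: recurse and store the resulting dict
            dicom_dict[dicom_value.name] = dicom_dataset_to_dict(dicom_value.value)
        else:
            v = _convert_value(dicom_value.value)
            dicom_dict[dicom_value.name] = v

    for d in dicom_dict:
        print('{} : {}'.format(d, dicom_dict[d]))
    # return the dict so that nested calls store a value instead of None
    return dicom_dict


def _sanitise_unicode(s):
    # strip NUL padding and surrounding whitespace
    return s.replace(u"\u0000", "").strip()


def _convert_value(v):
    # convert pydicom value types to plain Python types where possible
    t = type(v)
    if t in (list, int, float):
        cv = v
    elif t == str:
        cv = _sanitise_unicode(v)
    elif t == bytes:
        s = v.decode('ascii', 'replace')
        cv = _sanitise_unicode(s)
    elif t == pydicom.valuerep.DSfloat:
        cv = float(v)
    elif t == pydicom.valuerep.IS:
        cv = int(v)
    else:
        # anything else (e.g. multi-valued elements) falls back to its repr
        cv = repr(v)
    return cv

In [None]:
ds = pydicom.dcmread(dicom_paths[0])
# assign the result so the notebook does not echo the dict after the printout
header = dicom_dataset_to_dict(ds)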

### A Simple Way to Look at the Images

In [None]:
fig, ax = plt.subplots(1, 3, figsize=(18,6))

ds0 = pydicom.dcmread(dicom_paths[0]).pixel_array
ds1 = pydicom.dcmread(dicom_paths[1]).pixel_array
ds2 = pydicom.dcmread(dicom_paths[2]).pixel_array

ax[0].imshow(ds0)
ax[1].imshow(ds1, cmap=plt.cm.bone)
ax[2].imshow(ds2, cmap='gray')
plt.show()

### Hounsfield Unit (HU) ###

📌<b>About Hounsfield Unit</b><br>

The Hounsfield unit (HU) scale is a linear transformation of the original linear attenuation coefficient measurement into one in which the radiodensity of distilled water at standard pressure and temperature (STP) is defined as zero Hounsfield units (HU), while the radiodensity of air at STP is defined as −1000 HU. In a voxel with average linear attenuation coefficient μ, the corresponding HU value is therefore given by:

<img src='https://wikimedia.org/api/rest_v1/media/math/render/svg/ecfb5f44205930f7a33a9c240f41eb94051f3f01'/><br>

Thus, a change of one Hounsfield unit (HU) represents a change of 0.1% of the attenuation coefficient of water, since the attenuation coefficient of air is nearly zero.
This definition applies to CT scanners that are calibrated with reference to water.
<br>
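The definition above can be checked with a few lines of arithmetic. This is a minimal sketch: the function name is made up, air is approximated as having zero attenuation, and the value used for water is a rough illustrative number rather than a calibration constant.

```python
def hounsfield_units(mu, mu_water, mu_air=0.0):
    """HU = 1000 * (mu - mu_water) / (mu_water - mu_air)."""
    return 1000.0 * (mu - mu_water) / (mu_water - mu_air)


MU_WATER = 0.19  # cm^-1, rough illustrative value (assumption)

hu_water = hounsfield_units(MU_WATER, MU_WATER)          # water maps to 0 HU
hu_air = hounsfield_units(0.0, MU_WATER)                 # air (mu ~ 0) maps to -1000 HU
hu_step = hounsfield_units(MU_WATER * 1.001, MU_WATER)   # 0.1% above water is about +1 HU
print(hu_water, hu_air, round(hu_step, 3))
```

The third call illustrates the statement that one HU corresponds to a 0.1% change relative to the attenuation coefficient of water.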
    
<b> Values for Different body tissues and material</b><br>

<img src = 'https://t1.daumcdn.net/cfile/tistory/99A51D3E5EF2F3B226'/><br>
HU applies to medical-grade dual-energy CT scans but not to cone beam computed tomography (CBCT) scans.
Values reported here are approximations. Different dynamics are reported from one study to another.
Exact HU dynamics can vary from one CT acquisition to another due to CT acquisition and reconstruction parameters (kV, filters, reconstruction algorithms, etc.). The use of contrast agents modifies HU as well in some body parts (mainly blood).

HU Table url : https://en.wikipedia.org/wiki/Hounsfield_scale

### apply_voi_lut ###

reference url : https://www.kaggle.com/tanlikesmath/siim-covid-19-detection-a-simple-eda
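When a file carries a window center and width rather than an explicit lookup table, the VOI transform is essentially a clipped linear ramp. The sketch below follows the linear window function as I understand it from the DICOM standard. It is a simplified illustration, not pydicom's implementation: `linear_window` and the sample values are made up, it assumes `width > 1`, and `apply_voi_lut` handles more cases (lookup tables, other function types, multiple windows).

```python
import numpy as np


def linear_window(pixels, center, width, y_min=0.0, y_max=255.0):
    """Map pixel values to [y_min, y_max] with a linear window (simplified sketch)."""
    x = np.asarray(pixels, dtype=np.float64)
    low = center - 0.5 - (width - 1) / 2.0    # at or below this: clipped to y_min
    high = center - 0.5 + (width - 1) / 2.0   # above this: clipped to y_max
    y = ((x - (center - 0.5)) / (width - 1) + 0.5) * (y_max - y_min) + y_min
    y = np.where(x <= low, y_min, y)
    y = np.where(x > high, y_max, y)
    return y


raw = np.array([[0, 1000, 2048], [3000, 4095, 2047]])
windowed = linear_window(raw, center=2048, width=2000)
print(windowed.round(1))
```

Values far below the window collapse to black and values far above it to white, which is why a windowed image usually shows more contrast in the range of interest than the raw pixel array.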

In [None]:
def dicom2array(path, voi_lut=True, fix_monochrome=True):
    # dcmread is the current pydicom reader (read_file is its deprecated alias)
    dicom = pydicom.dcmread(path)
    # VOI LUT (if available by DICOM device) is used to
    # transform raw DICOM data to "human-friendly" view
    if voi_lut:
        data = apply_voi_lut(dicom.pixel_array, dicom)
    else:
        data = dicom.pixel_array
    # depending on this value, X-ray may look inverted - fix that:
    if fix_monochrome and dicom.PhotometricInterpretation == "MONOCHROME1":
        data = np.amax(data) - data
    # rescale to the 0-255 range of an 8-bit image
    data = data - np.min(data)
    max_value = np.max(data)
    if max_value > 0:  # avoid dividing by zero on a constant image
        data = data / max_value
    data = (data * 255).astype(np.uint8)
    return data
        
    
def plot_img(img, size=(7, 7), is_rgb=True, title="", cmap='gray'):
    plt.figure(figsize=size)
    plt.imshow(img, cmap=cmap)
    plt.suptitle(title)
    plt.show()


def plot_imgs(imgs, cols=4, size=7, is_rgb=True, title="", cmap=plt.cm.bone, img_size=(500,500)):
    # ceiling division, so a full last row does not add an empty extra row
    rows = max(1, (len(imgs) + cols - 1) // cols)
    fig = plt.figure(figsize=(cols*size, rows*size))
    for i, img in enumerate(imgs):
        if img_size is not None:
            img = cv2.resize(img, img_size)
        fig.add_subplot(rows, cols, i+1)
        # use the cmap argument (it was previously ignored in favour of bone)
        plt.imshow(img, cmap=cmap)
    plt.suptitle(title)
    plt.show()

In [None]:
fig, ax = plt.subplots(1, 2, figsize=(18, 6))

ds0 = pydicom.dcmread(dicom_paths[0]).pixel_array
ds1 = dicom2array(dicom_paths[0])

ax[0].imshow(ds0, cmap=plt.cm.bone)
ax[0].set_title('DICOM -> Array')
ax[1].imshow(ds1, cmap=plt.cm.bone)
ax[1].set_title('apply_voi_lut( )')

plt.show()

In [None]:
imgs = [dicom2array(path) for path in dicom_paths[:16]]
plot_imgs(imgs)

#### Thanks for reading my work