<img src="https://i.postimg.cc/pLZtqGrC/normal-and-impaired-gas-exchange.png">

<img src="https://i.postimg.cc/sgQhWNcB/Ipf-NIH.jpg" align="right" width="400" height="250">
The word “pulmonary” means lung and the word “fibrosis” means scar tissue— similar to scars that you may have on your skin from an old injury or surgery. So, in its simplest sense, pulmonary fibrosis (PF) means scarring in the lungs. Over time, the scar tissue can destroy the normal lung and make it hard for oxygen to get into your blood. Low oxygen levels (and the stiff scar tissue itself) can cause you to feel short of breath, particularly when walking and exercising. Pulmonary fibrosis isn’t just one disease. It is a family of more than 200 different lung diseases that all look very much alike. The PF family of lung diseases falls into an even larger group of diseases called the interstitial lung diseases (also known as ILD), which includes all of the diseases that have inflammation and/or scarring in the lung. Some interstitial lung diseases don’t include scar tissue. When an interstitial lung disease does include scar tissue in the lung, we call it pulmonary fibrosis.

No one is certain how many people are affected by PF. One recent study estimated that idiopathic pulmonary fibrosis (or IPF, which is just one of more than 200 types of PF) affects 1 out of 200 adults over the age of 60 in the United States—that translates to more than 200,000 people living with PF today. Approximately 50,000 new cases are diagnosed each year and as many as 40,000 Americans die from IPF each year.

<a href="https://www.pulmonaryfibrosis.org/life-with-pf/about-pf">Ref</a>

<BR CLEAR="left" />

My goal in this notebook is to explore the CSV files as well as the DICOM files, to get a general idea of what kind of data we are dealing with and to build some intuition about how it behaves. Enjoy =)

# Contents

* [<font size=4>Libraries For Fun</font>](#1)
* [<font size=4>Dicom Files</font>](#2)
 *     [1. Image Type](#2.1)
 *     [2. Manufacturer / Manufacturer's model name](#2.2)
 *     [3. Slice Thickness](#2.3)
 *     [4. KVP](#2.4)
 *     [5. Spacing Between Slices](#2.5)
 *     [6. Table Height](#2.6)
 *     [7. Convolution Kernel](#2.7)
 *     [8. Patient Position](#2.8)
 *     [9. Instance Number](#2.9)
 *     [10. Image Position & Image Orientation (Patient) ](#2.10)
 *     [11. Position Reference Indicator ](#2.11)
 *     [12. Slice Location Attribute](#2.12)
 *     [13. Rows & Columns](#2.13)
 *     [14. Pixel Spacing Attribute](#2.14)
 *     [15. Bits Stored & High Bit](#2.15)
 *     [16. Pixel Representation Attribute](#2.16)
 *     [17. Window Center & Window Width ](#2.17)
 *     [18. Rescale Intercept & Rescale Slope](#2.18)
 *     [19. Images](#2.19)
* [<font size=4>Train and Test</font>](#3)
 *     [1. Smoking Status ](#3.1)
 *     [2. Sex ](#3.2)
 *     [3. Age](#3.3)
 *     [4. FVC & Percentage](#3.4)
 *     [5. Heatmap](#3.5)
* [<font size=4>Conclusion</font>](#4)

# Libraries For Fun <a id="1"></a>

In [None]:
# Standard library
import ast
import collections
import gc
import glob
import os
import warnings
from collections import defaultdict

# Data handling and DICOM reading
import numpy as np
import pandas as pd
import pydicom
from pydicom import dcmread
from pandas_profiling import ProfileReport

# Image processing
import cv2
import imageio
from scipy import ndimage, misc

# Plotting and display
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.offline as pyo
import tqdm
from IPython.display import HTML, display

pd.options.display.max_columns = None
warnings.filterwarnings('ignore')
pyo.init_notebook_mode()

# Dicom Files <a id="2"></a>

First, we are going to explore the DICOM files.

<p><b>Digital Imaging and Communications in Medicine</b> (<b>DICOM</b>) is the standard for the communication and management of medical imaging information and related data. DICOM is most commonly used for storing and transmitting medical images, enabling the integration of medical imaging devices such as scanners, servers, workstations, printers, network hardware, and picture archiving and communication systems (PACS) from multiple manufacturers. It has been widely adopted by hospitals and is making inroads into smaller applications like dentists' and doctors' offices.
</p>

<p>DICOM files can be exchanged between two entities that are capable of receiving image and patient data in DICOM format. The different devices come with DICOM Conformance Statements which state which DICOM classes they support. The standard includes a file format definition and a network communications protocol that uses TCP/IP to communicate between systems.
</p>

<p>The National Electrical Manufacturers Association (NEMA) holds the copyright to the published standard, which was developed by the DICOM Standards Committee, whose members are also partly members of NEMA. It is also known as NEMA standard PS3, and as ISO standard 12052:2017 "Health informatics -- Digital imaging and communication in medicine (DICOM) including workflow and data management".
</p>

<a href="https://en.wikipedia.org/wiki/DICOM">Ref</a>

A preview of one DICOM file:

In [None]:
# dcmread is the current pydicom reader (read_file is its deprecated alias)
pydicom.dcmread("../input/osic-pulmonary-fibrosis-progression/train/ID00213637202257692916109/29.dcm")

<img src="https://i.postimg.cc/2jnrKRkd/Sans-titre.png" align="right" width="600" height="400">

To handle the DICOM files conveniently, I loaded them into a pandas DataFrame in which each row represents one DICOM file and each column represents one attribute stored in the file (the images themselves are excluded). Multi-valued attributes such as 'ImageType' or 'PixelSpacing' are split into several columns, as shown in the image on the right.

I then merged the result with the training CSV.

I did this whole process in another notebook because it takes a long time to run.

My notebook to prepare the data: <a href="https://www.kaggle.com/servietsky/osic-transform-dicom-into-dataframe?scriptVersionId=40328207">OSIC : Transform DICOM into DataFrame</a>


<BR CLEAR="left" />
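
To give an idea of the flattening step described above, here is a minimal sketch. It is not the code from the preparation notebook: the helper name `flatten_attributes`, the `_0`, `_1` column suffixes and the sample values are assumptions made for this example, and a plain dictionary stands in for the attributes read from one DICOM file.

```python
def flatten_attributes(attributes, skip=("PixelData",)):
    """Flatten the attributes of one file into a single-level dict (one table row)."""
    row = {}
    for name, value in attributes.items():
        if name in skip:
            continue  # the image itself is not stored in the table
        if isinstance(value, (list, tuple)):
            # Multi-valued attribute: one column per component
            for k, item in enumerate(value):
                row[f"{name}_{k}"] = item
        else:
            row[name] = value
    return row


# Hypothetical attribute values standing in for two DICOM files
files = [
    {"PatientID": "A", "SliceThickness": 1.0, "PixelSpacing": [0.7, 0.7], "PixelData": b"..."},
    {"PatientID": "B", "SliceThickness": 1.25, "PixelSpacing": [0.8, 0.8], "PixelData": b"..."},
]
rows = [flatten_attributes(f) for f in files]
print(rows[0])
```

The list `rows` can then be passed to `pd.DataFrame(rows)` to get one row per file, ready to be merged with the training CSV on the patient identifier.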

In [None]:
train = pd.read_csv('../input/osic-pulmonary-fibrosis-progression/train.csv')
test = pd.read_csv('../input/osic-pulmonary-fibrosis-progression/test.csv')
Data = pd.read_pickle('../input/osic-transform-dicom-into-dataframe/output_data.pkl')

A look at the merged table (DICOM attributes plus the training CSV):

In [None]:
print('Head :')

display(Data.head())

print('Info :')

Data.info()

## 1. Image Type <a id="2.1"></a>

<div>
<div>
<div>
<div>
<h6>
Image Type</h6>
</div>
</div>
</div>
<p>
Image Type (0008,0008) identifies important image identification characteristics. These characteristics are:</p>
<div>
<ol type="a">
<li>
<p>
Pixel Data Characteristics</p>
<div>
<ol type="1">
<li>
<p>
is the image an ORIGINAL Image; an image whose pixel values are based on original or source data</p>
</li>
<li>
<p>
is the image a DERIVED Image; an image whose pixel values have been derived in some manner from the pixel value of one or more other images</p>
</li>
</ol>
</div>
</li>
<li>
<p>
Patient Examination Characteristics</p>
<div>
<ol type="1">
<li>
<p>
is the image a PRIMARY Image; an image created as a direct result of the patient examination</p>
</li>
<li>
<p>
is the image a SECONDARY Image; an image created after the initial patient examination</p>
</li>
</ol>
</div>
</li>
<li>
<p>
Modality Specific Characteristics</p>
</li>
<li>
<p>
Implementation specific identifiers; other implementation specific identifiers shall be documented in an implementation's conformance statement.</p>
</li>
</ol>
</div>
<p>
The Image Type Attribute is multi-valued and shall be provided in the following manner:</p>
<div>
<ol type="a">
<li>
<p>
Value 1 shall identify the Pixel Data Characteristics</p>
<div>
<p>
<strong>Enumerated Values:</strong>
</p>
<dl>
<dt>
<span>ORIGINAL</span>
</dt>
<dd>
<p>
identifies an Original Image</p>
</dd>
<dt>
<span>DERIVED</span>
</dt>
<dd>
<p>
identifies a Derived Image</p>
</dd>
</dl>
</div>
</li>
<li>
<p>
Value 2 shall identify the Patient Examination Characteristics</p>
<div>
<p>
<strong>Enumerated Values:</strong>
</p>
<dl>
<dt>
<span>PRIMARY</span>
</dt>
<dd>
<p>
identifies a Primary Image</p>
</dd>
<dt>
<span>SECONDARY</span>
</dt>
<dd>
<p>
identifies a Secondary Image</p>
</dd>
</dl>
</div>
</li>
<li>
<p>
Value 3 shall identify any Image IOD specific specialization (optional)</p>
</li>
<li>
<p>
Other Values that are implementation specific (optional)</p>
</li>
</ol>
</div>
<p>
Any of the optional values (value 3 and beyond) may be encoded either with a value or zero-length, independent of other optional values, unless otherwise specified by a specialization of this Attribute in an IOD.</p>
<p>
If the pixel data of the derived Image is different from the pixel data of the source images and this difference is expected to affect professional interpretation of the image, the Derived Image shall have a UID different than all the source images.</p>
</div>

<a href="https://dicom.innolitics.com/ciods/cr-image/general-image/00080008">Image Type</a>
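
As a small illustration of the multi-valued layout described above, the sketch below splits an Image Type value into its positional parts and also builds the kind of 0/1 indicator columns that are summed in the next cell. The function names and the sample value are assumptions made for this example; the prepared DataFrame may have been built differently.

```python
def parse_image_type(image_type):
    """Split an Image Type value into the positional parts defined by the standard."""
    values = list(image_type)
    return {
        "pixel_data": values[0] if len(values) > 0 else None,         # ORIGINAL / DERIVED
        "examination": values[1] if len(values) > 1 else None,        # PRIMARY / SECONDARY
        "iod_specific": values[2] if len(values) > 2 else None,       # optional value 3
        "implementation_specific": values[3:],                        # optional extra values
    }


def one_hot(image_type, vocabulary):
    """Return a 0/1 flag for each vocabulary entry found in the Image Type value."""
    present = set(image_type)
    return {value: int(value in present) for value in vocabulary}


# Hypothetical Image Type value
sample = ["ORIGINAL", "PRIMARY", "AXIAL", "CT_SOM5 SPI"]
print(parse_image_type(sample))
print(one_hot(sample, ["ORIGINAL", "DERIVED", "AXIAL"]))
```

Summing such indicator columns over all files gives the number of files carrying each value, which is what the bar plot below shows.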

In [None]:
type_dict_all = ['ORIGINAL', 'PRIMARY', 'AXIAL', 'CT_SOM5 SPI', 'HELIX', 'CT_SOM5 SEQ', 'SECONDARY', 'DERIVED', 'JP2K LOSSY 6:1', 'VOLUME', 'OTHER', 'CSA MPR', 'CSAPARALLEL', 
                'CSA RESAMPLED', 'REFORMATTED', 'AVERAGE', 'CT_SOM7 SPI DUAL', 'STD', 'SNRG', 'DET_AB']
# Number of DICOM files carrying each Image Type value, most frequent first
counts = Data[type_dict_all].sum().sort_values(ascending=False)
sns.set(rc={'figure.figsize':(15,7.5)})
plt.xticks(rotation=45)
ax = sns.barplot(x=counts.index, y=counts.values)

In [None]:
tmp = Data.groupby('Manufacturer')[type_dict_all].sum()
tmp = pd.melt(tmp.reset_index(), id_vars=['Manufacturer'])
tmp.columns = ['Manufacturer','ImageType', 'Value']
# catplot replaces the removed factorplot; `height` replaces the old `size` argument
sns.catplot(x='Manufacturer', y='Value', data=tmp, kind='bar' , hue = 'ImageType', height=10, aspect=3)

The three most common image types are AXIAL, ORIGINAL and PRIMARY, and they are produced by Siemens, Toshiba, Philips and GE Medical Systems scanners.

However, it is worth pointing out that Philips generates images of type CT_SOM5 SPI and Siemens generates images of type HELIX.

We will look at the manufacturers in the next section.

## 2. Manufacturer / Manufacturer's model name <a id="2.2"></a>

Manufacturer of the equipment that produced the Composite Instances.
Manufacturer's model name of the equipment that is to be used for beam delivery.

<a href="https://dicom.innolitics.com/ciods/rt-plan/general-equipment/00080070">Manufacturer / Manufacturer's model name </a>

In [None]:
plt.figure(figsize=(30,10))

# Number of DICOM files per manufacturer (column names set explicitly, independent of the pandas version)
plt.subplot(1,2,1)
tmp = Data['Manufacturer'].value_counts().rename_axis('Manufacturer').reset_index(name='Count')
ax1 = sns.barplot(y='Manufacturer', x='Count', data=tmp)

# Number of DICOM files per scanner model
plt.subplot(1,2,2)
tmp = Data['ManufacturerModelName'].value_counts().rename_axis('ManufacturerModelName').reset_index(name='Count')
ax2 = sns.barplot(y='ManufacturerModelName', x='Count', data=tmp)

In [None]:
tmp = Data.groupby(['Manufacturer','ManufacturerModelName']).count()['PatientID'].to_frame().reset_index()
# catplot replaces the removed factorplot; `height` replaces the old `size` argument
sns.catplot(x='Manufacturer', y='PatientID', data=tmp, kind='bar' , hue = 'ManufacturerModelName', height=10 , aspect=3 )#, palette=tmp['ManufacturerModelName'])

<img src="https://i.postimg.cc/t4h8yPvY/scanner-toshiba-aquilion-32-slice.jpg" align="right" width="500" height="300">
Most popular manufacturer / model name combinations:

1. TOSHIBA Aquilion
2. GE MEDICAL SYSTEMS LightSpeed VCT
3. GE MEDICAL SYSTEMS OsiriX
4. SIEMENS Sensation 16
5. SIEMENS OsiriX

Most popular model name for each manufacturer:

* TOSHIBA Aquilion (image on the right)
* GE MEDICAL SYSTEMS LightSpeed VCT
* SIEMENS Sensation 16
* Philips Brilliance 64
* Hitachi Medical Corporation ECLOS
* PACSGEAR LightSpeed VCT
* PACSMATT OsiriX
<BR CLEAR="left" />

## 3. Slice Thickness <a id="2.3"></a>

Nominal slice thickness, in mm.

<a href="https://dicom.innolitics.com/ciods/ct-image/image-plane/00180050">Slice Thickness</a>

In [None]:
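# Count files per slice thickness; column names are set explicitly (independent of the pandas version)
tmp = Data['SliceThickness'].astype('float').value_counts().rename_axis('SliceThickness').reset_index(name='Count')
tmp = tmp.sort_values(by='SliceThickness')
ax = sns.barplot(y='Count', x = 'SliceThickness',  data=tmp)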

Mostly 1 mm, and sometimes 1.25, 0.625 or 0.5 mm.

## 4. KVP <a id="2.4"></a>

Peak kilo voltage output of the X-Ray generator used.

<a href="https://dicom.innolitics.com/ciods/digital-x-ray-image/x-ray-generation/00180060">KVP</a>

In [None]:
# Count files per KVP value; column names are set explicitly (independent of the pandas version)
tmp = Data['KVP'].astype('float').value_counts().rename_axis('KVP').reset_index(name='Count')
tmp = tmp.sort_values(by='KVP')
ax = sns.barplot(y='Count', x = 'KVP',  data=tmp)

Mostly 120 KVP

## 5. Spacing Between Slices <a id="2.5"></a>

Spacing between slices, in mm, measured from center-to-center of each slice along the normal to the first image. The sign of the Spacing Between Slices (0018,0088) determines the direction of stacking. The normal is determined by the cross product of the direction cosines of the first row and first column of the first frame, such that a positive spacing indicates slices are stacked behind the first slice and a negative spacing indicates slices are stacked in front of the first slice. See Image Orientation (0020,0037) in the NM Detector Module.

<a href="https://dicom.innolitics.com/ciods/nm-image/nm-reconstruction/00180088">Spacing Between Slices</a>

In [None]:
# Count files per spacing value; column names are set explicitly (independent of the pandas version)
tmp = Data['SpacingBetweenSlices'].astype('float').value_counts().rename_axis('SpacingBetweenSlices').reset_index(name='Count')
tmp = tmp.sort_values(by='SpacingBetweenSlices')
ax = sns.barplot(y='Count', x = 'SpacingBetweenSlices',  data=tmp)

Mostly 0 mm
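
Since the recorded value is mostly 0 here, one possible workaround (an assumption on my side, not something done in this notebook) is to derive the spacing from the image positions: compute the slice normal as the cross product of the row and column direction cosines, project each Image Position (Patient) onto it, and take the differences between consecutive slices. The function names and sample values below are hypothetical.

```python
def cross(u, v):
    """Cross product of two 3D vectors."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def dot(u, v):
    """Dot product of two vectors."""
    return sum(a * b for a, b in zip(u, v))


def slice_spacings(positions, orientation):
    """Signed center-to-center distances between consecutive slices, in mm.

    positions   -- Image Position (Patient) of each slice, in acquisition order
    orientation -- Image Orientation (Patient): row cosines then column cosines
    """
    normal = cross(orientation[:3], orientation[3:])
    distances = [dot(p, normal) for p in positions]
    return [b - a for a, b in zip(distances, distances[1:])]


# Hypothetical axial series acquired from head to feet, 1 mm apart
orientation = (1, 0, 0, 0, 1, 0)
positions = [(-150.0, -150.0, -10.0), (-150.0, -150.0, -11.0), (-150.0, -150.0, -12.0)]
print(slice_spacings(positions, orientation))  # [-1.0, -1.0]
```

The sign of each difference says on which side of the previous slice, along the normal, the next slice lies.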

## 6. Table Height <a id="2.6"></a>

The distance in mm of the top of the patient table to the center of rotation; below the center is positive.

<a href="https://dicom.innolitics.com/ciods/ct-image/ct-image/00181130">Table Height</a>

In [None]:
fig, ax = plt.subplots(figsize=(20, 10))

# Distribution of the table height values themselves (one value per DICOM file)
table_height = Data['TableHeight'].astype('float')
sns.histplot(table_height, kde=True, ax=ax)

# Inset violin plot of the same values
ax2 = plt.axes([0.7, 0.5, .15, .3], facecolor='y')
ax2 = sns.violinplot(y=table_height,  ax=ax2)

The distance is mainly between 0 and 1000 mm, with rare cases where it is greater.

## 7. X-Ray Tube Current <a id="2.7a"></a>

X-Ray Tube Current in mA.

<a href="https://dicom.innolitics.com/ciods/digital-x-ray-image/x-ray-acquisition-dose/00181151">X-Ray Tube Current</a>

In [None]:
fig, ax = plt.subplots(figsize=(20, 10))

# Distribution of the tube current values themselves (one value per DICOM file)
tube_current = Data['XRayTubeCurrent'].astype('float')
sns.histplot(tube_current, kde=True, ax=ax)

# Inset violin plot of the same values
ax2 = plt.axes([0.7, 0.5, .15, .3], facecolor='y')
ax2 = sns.violinplot(y=tube_current,  ax=ax2)

Mostly between 0 and 500 mA

## 7. Convolution Kernel <a id="2.7"></a>

A label describing the convolution kernel or algorithm used to reconstruct the data

<a href="https://dicom.innolitics.com/ciods/ct-image/ct-image/00181210">Convolution Kernel</a>

In [None]:
# Count files per convolution kernel, most frequent first (column names set explicitly)
tmp = Data['ConvolutionKernel'].value_counts().rename_axis('Convolution Kernel').reset_index(name='Count')
tmp = tmp.sort_values(by='Count', ascending=False)
plt.xticks(rotation=45)
ax = sns.barplot(y='Count', x = 'Convolution Kernel',  data=tmp)

The five most used convolution kernels:

* LUNG
* C
* B70f
* BONEPLUS 
* FC01

## 8. Patient Position <a id="2.8"></a>

<div class="external-reference"><div>
<div>
<div>
<div>
<h6>
&nbsp;Patient Position</h6>
</div>
</div>
</div>
<p>
Patient Position (0018,5100) specifies the position of the patient relative to the imaging equipment space. This Attribute is intended for annotation purposes only. It does not provide an exact mathematical relationship of the patient to the imaging equipment.</p>
<p>
When multiple subjects are present in the same image, and arranged with different positions, then the Patient Position (0018,5100) in the <a href="http://dicom.nema.org/medical/dicom/current/output/chtml/part03/sect_C.7.3.html#sect_C.7.3.1">General Series Module</a> is nominal, does not apply to each subject, but does define the relationship of the nominal Patient-Based Coordinate System to the machine.</p>
<div>
<h3>Note</h3>
<p>
In conjunction with the Patient Position (0018,5100) in each Item of the Group of Patients Identification Sequence (0010,0027), Patient Position (0018,5100) in the <a href="http://dicom.nema.org/medical/dicom/current/output/chtml/part03/sect_C.7.3.html#sect_C.7.3.1">General Series Module</a> may be helpful to compute patient-relative spatial information for each subject from the Attributes of the <a href="http://dicom.nema.org/medical/dicom/current/output/chtml/part03/sect_C.7.6.2.html#sect_C.7.6.2">Image Plane Module</a>.</p>
</div>
<p>
When facing the front of the imaging equipment, Head First is defined as the patient's head being positioned toward the front of the imaging equipment (i.e., head entering the front of the equipment). Feet First is defined as the patient's feet being positioned toward the front of the imaging equipment (i.e., feet entering the front of the equipment). Left First is defined as the patient's left side being positioned towards the front of the imaging equipment (i.e., patient's left side entering the front of the equipment). Right First is defined as the patient's right being positioned towards the front of the imaging equipment (i.e., patient's right side entering the front of the equipment). Prone is defined as the patient's face being positioned in a downward (gravity) direction. Supine is defined as the patient's face being in an upward direction. Decubitus Right is defined as the patient's right side
                                being in a downward direction. Decubitus Left is defined as the patient's left side being in a downward direction.</p>
<div>
<p>
<strong>Defined Terms:</strong>
</p>
<dl>
<dt>
<span>HFP</span>
</dt>
<dd>
<p>
Head First-Prone</p>
</dd>
<dt>
<span>HFS</span>
</dt>
<dd>
<p>
Head First-Supine</p>
</dd>
<dt>
<span>HFDR</span>
</dt>
<dd>
<p>
Head First-Decubitus Right</p>
</dd>
<dt>
<span>HFDL</span>
</dt>
<dd>
<p>
Head First-Decubitus Left</p>
</dd>
<dt>
<span>FFDR</span>
</dt>
<dd>
<p>
Feet First-Decubitus Right</p>
</dd>
<dt>
<span>FFDL</span>
</dt>
<dd>
<p>
Feet First-Decubitus Left</p>
</dd>
<dt>
<span>FFP</span>
</dt>
<dd>
<p>
Feet First-Prone</p>
</dd>
<dt>
<span>FFS</span>
</dt>
<dd>
<p>
Feet First-Supine</p>
</dd>
<dt>
<span>LFP</span>
</dt>
<dd>
<p>
Left First-Prone</p>
</dd>
<dt>
<span>LFS</span>
</dt>
<dd>
<p>
Left First-Supine</p>
</dd>
<dt>
<span>RFP</span>
</dt>
<dd>
<p>
Right First-Prone</p>
</dd>
<dt>
<span>RFS</span>
</dt>
<dd>
<p>
Right First-Supine</p>
</dd>
<dt>
<span>AFDR</span>
</dt>
<dd>
<p>
Anterior First-Decubitus Right</p>
</dd>
<dt>
<span>AFDL</span>
</dt>
<dd>
<p>
Anterior First-Decubitus Left</p>
</dd>
<dt>
<span>PFDR</span>
</dt>
<dd>
<p>
Posterior First-Decubitus Right</p>
</dd>
<dt>
<span>PFDL</span>
</dt>
<dd>
<p>
Posterior First-Decubitus Left</p>
</dd>
</dl>
</div>
<div>
<h3>Note</h3>
<div>
<ol type="1">
<li>
<p>
For quadrupeds, separate concepts for ventral and dorsal are not introduced, rather it is expected that anterior and posterior will be considered synonymous as they are when applied to the trunk.</p>
</li>
<li>
<p>
There are no decubitus variants of left or right first, since for imaging equipment that is aligned horizontally with respect to gravity the patient cannot be both decubitus and have the left or right side towards the front of the imaging equipment.</p>
</li>
<li>
<p>
There are no prone or supine variants of anterior or posterior first, since for imaging equipment that is aligned horizontally with respect to gravity the patient cannot be prone or supine and have the anterior or posterior side towards the front of the imaging equipment.</p>
</li>
</ol>
</div>
</div>
<p>
The <a href="http://dicom.nema.org/medical/dicom/current/output/html/part03.html#figure_C.7.3.1.1.2-1">Figure&nbsp;C.7.3.1.1.2-1</a> illustrates some of these Defined Terms for imaging equipment with a table, such as in X-Ray Angiography. The orientation of the patient related to gravity is always recumbent.</p>
<div>
<div>
<div>
<img src="http://dicom.nema.org/medical/dicom/current/output/html/figures/PS3.3_C.7.3.1.1.2-1.svg">
</div>
</div>
<p>
<strong>Figure&nbsp;C.7.3.1.1.2-1.&nbsp;Representation of the Eight Different Patient Positions on the X-Ray Table</strong>
</p>
</div>
<br>
<div>
<div>
<div>
<img src="http://dicom.nema.org/medical/dicom/current/output/html/figures/PS3.3_C.7.3.1.1.2-2.svg">
</div>
</div>
<p>
<strong>Figure&nbsp;C.7.3.1.1.2-2.&nbsp;Example of Right First-Prone (RFP) Patient Position Relative to the Gantry and Table for a Small Animal</strong>
</p>
</div>
<br>
</div></div>

<a href="https://dicom.innolitics.com/ciods/ct-image/general-series/00185100">Patient Position</a>

In [None]:
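# Count files per patient position; column names are set explicitly (independent of the pandas version)
tmp = Data['PatientPosition'].value_counts().rename_axis('PatientPosition').reset_index(name='Count')
tmp = tmp.sort_values(by='PatientPosition')
plt.xticks(rotation=45)
ax = sns.barplot(y='Count', x = 'PatientPosition',  data=tmp)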

Most used positions:
<img src="https://i.postimg.cc/bNWMcLzx/pos.png" align="right" width="200" height="100"> 
* HFS: Head First-Supine

* FFS: Feet First-Supine
<BR CLEAR="left" />
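
The Defined Terms listed above follow a simple pattern: two letters for the side that enters the equipment first, followed by the posture. The sketch below decodes a code under that reading, and rejects the combinations that the notes above rule out (no decubitus variants of left or right first, no prone or supine variants of anterior or posterior first). The function name `describe_patient_position` is an assumption made for this example.

```python
ENTRY = {
    "HF": "Head First", "FF": "Feet First",
    "LF": "Left First", "RF": "Right First",
    "AF": "Anterior First", "PF": "Posterior First",
}
POSTURE = {
    "P": "Prone", "S": "Supine",
    "DR": "Decubitus Right", "DL": "Decubitus Left",
}
# Postures allowed for each entry direction, following the Defined Terms list
ALLOWED = {
    "HF": {"P", "S", "DR", "DL"}, "FF": {"P", "S", "DR", "DL"},
    "LF": {"P", "S"}, "RF": {"P", "S"},
    "AF": {"DR", "DL"}, "PF": {"DR", "DL"},
}


def describe_patient_position(code):
    """Turn a Patient Position code such as 'HFS' into a readable label."""
    entry, posture = code[:2], code[2:]
    if entry not in ALLOWED or posture not in ALLOWED[entry]:
        raise ValueError(f"Not a Defined Term for Patient Position: {code!r}")
    return f"{ENTRY[entry]}-{POSTURE[posture]}"


print(describe_patient_position("HFS"))  # Head First-Supine
print(describe_patient_position("FFS"))  # Feet First-Supine
```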

## 9. Instance Number <a id="2.9"></a>

A number that identifies this image. This attribute was named Image Number in earlier versions of the standard.

<a href="https://dicom.innolitics.com/ciods/ct-image/general-image/00200013">Instance Number</a>

In [None]:
fig, ax = plt.subplots(figsize=(20, 10))

# Distribution of the instance numbers themselves (one value per DICOM file)
instance_number = Data['InstanceNumber'].astype('float')
sns.histplot(instance_number, kde=True, ax=ax)

# Inset violin plot of the same values
ax2 = plt.axes([0.7, 0.5, .15, .3], facecolor='y')
ax2 = sns.violinplot(y=instance_number,  ax=ax2)

Mostly between 0 and 100. The exact number does not matter much: it only identifies the image within its series and gives no information about pulmonary fibrosis.

## 10. Image Position & Image Orientation (Patient) <a id="2.10"></a>

Image Position is the x, y, and z coordinates of the upper left hand corner (center of the first voxel transmitted) of the image, in mm. See Section C.7.6.2.1.1 for further explanation.

Image Orientation is the direction cosines of the first row and the first column with respect to the patient. See Section C.7.6.2.1.1 for further explanation.

<div class="m-a-1 detail-pane-section"><div class="external-reference"><div>
<div>
<div>
<div>
<h6>
&nbsp;Image Position and Image Orientation</h6>
</div>
</div>
</div>
<p>
Image Position (0020,0032) specifies the x, y, and z coordinates of the upper left hand corner of the image; it is the center of the first voxel transmitted. Image Orientation (0020,0037) specifies the direction cosines of the first row and the first column with respect to the patient. These Attributes shall be provided as a pair. Row value for the x, y, and z axes respectively followed by the Column value for the x, y, and z axes respectively.</p>
<p>
The direction of the axes is defined fully by the patient's orientation.</p>
<p>
If Anatomical Orientation Type (0010,2210) is absent or has a value of BIPED, the x-axis is increasing to the left hand side of the patient. The y-axis is increasing to the posterior side of the patient. The z-axis is increasing toward the head of the patient.</p>
<p>
If Anatomical Orientation Type (0010,2210) has a value of QUADRUPED, the</p>
<div>
<ul>
<li>
<p>
x-axis is increasing to the left (as opposed to right) side of the patient</p>
</li>
<li>
<p>
the y-axis is increasing towards</p>
<div>
<ul>
<li>
<p>
the dorsal (as opposed to ventral) side of the patient for the neck, trunk and tail,</p>
</li>
<li>
<p>
the dorsal (as opposed to ventral) side of the patient for the head,</p>
</li>
<li>
<p>
the dorsal (as opposed to plantar or palmar) side of the distal limbs,</p>
</li>
<li>
<p>
the cranial (as opposed to caudal) side of the proximal limbs, and</p>
</li>
</ul>
</div>
</li>
<li>
<p>
the z-axis is increasing towards</p>
<div>
<ul>
<li>
<p>
the cranial (as opposed to caudal) end of the patient for the neck, trunk and tail,</p>
</li>
<li>
<p>
the rostral (as opposed to caudal) end of the patient for the head, and</p>
</li>
<li>
<p>
the proximal (as opposed to distal) end of the limbs</p>
</li>
</ul>
</div>
</li>
</ul>
</div>
<div>
<h3>Note</h3>
<div>
<ol type="1">
<li>
<p>
The axes for quadrupeds are those defined and illustrated in Smallwood et al for proper anatomic directional terms as they apply to various parts of the body.</p>
</li>
<li>
<p>
It should be anticipated that when quadrupeds are imaged on human equipment, and particularly when they are positioned in a manner different from the traditional human prone and supine head or feet first longitudinal position, then the equipment may well not indicate the correct orientation, though it will remain an orthogonal Cartesian right-handed system that could be corrected subsequently.</p>
</li>
</ol>
</div>
</div>
<p>
The Patient-Based Coordinate System is a right handed system, i.e., the vector cross product of a unit vector along the positive x-axis and a unit vector along the positive y-axis is equal to a unit vector along the positive z-axis.</p>
<div>
<h3>Note</h3>
<p>
If a patient is positioned parallel to the ground, in dorsal recumbency (i.e., for humans, face-up on the table), with the caudo-cranial (i.e., for humans, feet-to-head) direction the same as the front-to-back direction of the imaging equipment, the direction of the axes of this Patient-Based Coordinate System and the Equipment-Based Coordinate System in previous versions of this Standard will coincide.</p>
</div>
<p>
The Image Plane Attributes, in conjunction with the Pixel Spacing Attribute, describe the position and orientation of the image slices relative to the Patient-Based Coordinate System. In each image frame Image Position (Patient) (0020,0032) specifies the origin of the image with respect to the Patient-Based Coordinate System. RCS and Image Orientation (Patient) (0020,0037) values specify the orientation of the image frame rows and columns. The mapping of pixel location (i,j) to the RCS is calculated as follows:</p>
<p>
</p>
<div>
<p>
<strong>Equation</strong>
</p>
<div>
<img src="http://dicom.nema.org/medical/dicom/current/output/html/figures/part03_withmml_image_1.svg">
</div>
</div>
<p>
<br>
</p>
<p>
Where:</p>
<div>
<ul>
<li>
<p>
P<sub>xyz</sub> The coordinates of the voxel (i,j) in the frame's image plane in units of mm.</p>
</li>
<li>
<p>
S<sub>xyz</sub> The three values of Image Position (Patient) (0020,0032). It is the location in mm from the origin of the RCS.</p>
</li>
<li>
<p>
X<sub>xyz</sub> The values from the row (X) direction cosine of Image Orientation (Patient) (0020,0037).</p>
</li>
<li>
<p>
Y<sub>xyz</sub> The values from the column (Y) direction cosine of Image Orientation (Patient) (0020,0037).</p>
</li>
<li>
<p>
<span>i</span> Column index to the image plane. The first column is index zero.</p>
</li>
<li>
<p>
<span>Δ<sub>i</sub>
</span> Column pixel resolution of Pixel Spacing (0028,0030) in units of mm.</p>
</li>
<li>
<p>
<span>j</span> Row index to the image plane. The first row index is zero.</p>
</li>
<li>
<p>
<span>Δ<sub>j</sub>
</span> Row pixel resolution of Pixel Spacing (0028,0030) in units of mm.</p>
</li>
</ul>
</div>
<p>
Additional constraints apply:</p>
<div>
<ol type="1">
<li>
<p>
The row and column direction cosine vectors shall be orthogonal, i.e., their dot product shall be zero.</p>
</li>
<li>
<p>
The row and column direction cosine vectors shall be normal, i.e., the dot product of each direction cosine vector with itself shall be unity.</p>
</li>
</ol>
</div>
</div></div></div>


<a href="https://dicom.innolitics.com/ciods/ct-image/image-plane/00200032">Image Position & Image Orientation (Patient)</a>
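
The mapping described above can be sketched in a few lines of plain Python. This is only an illustration under my reading of the definitions: `i` is the column index and moves along the row direction cosines, `j` is the row index and moves along the column direction cosines, and Pixel Spacing is given as (spacing between rows, spacing between columns). The function names and the sample values are assumptions made for this example.

```python
import math


def dot(u, v):
    """Dot product of two vectors."""
    return sum(a * b for a, b in zip(u, v))


def pixel_to_patient(i, j, position, orientation, pixel_spacing):
    """Map pixel (column i, row j) to patient coordinates in mm.

    position      -- Image Position (Patient): (Sx, Sy, Sz)
    orientation   -- Image Orientation (Patient): row cosines then column cosines
    pixel_spacing -- Pixel Spacing: (spacing between rows, spacing between columns)
    """
    row_cos, col_cos = orientation[:3], orientation[3:]
    delta_j, delta_i = pixel_spacing  # row spacing first, column spacing second
    return tuple(
        s + x * delta_i * i + y * delta_j * j
        for s, x, y in zip(position, row_cos, col_cos)
    )


def is_orthonormal(orientation, tol=1e-4):
    """Check the two constraints: orthogonal and unit-length direction cosines."""
    row_cos, col_cos = orientation[:3], orientation[3:]
    return (abs(dot(row_cos, col_cos)) < tol
            and math.isclose(dot(row_cos, row_cos), 1.0, abs_tol=tol)
            and math.isclose(dot(col_cos, col_cos), 1.0, abs_tol=tol))


# Hypothetical axial slice
position = (-100.0, -80.0, 50.0)
orientation = (1.0, 0.0, 0.0, 0.0, 1.0, 0.0)
pixel_spacing = (0.5, 0.5)

print(is_orthonormal(orientation))                                       # True
print(pixel_to_patient(10, 20, position, orientation, pixel_spacing))    # (-95.0, -70.0, 50.0)
```

Pixel (0, 0) maps back to Image Position (Patient) itself, which is a quick sanity check for this kind of helper.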

<img src="https://i.postimg.cc/cHcGMz87/x-y-z.png" align="right" width="500" height="300"> 

So the three parameters that give the position of the upper-left voxel of the image vary as shown in the image on the right.
Correct me if I'm wrong.

Let's plot the image positions in 3D space:
<BR CLEAR="left" />

In [None]:
fig = px.scatter_3d(Data, x='ImagePositionPatient_x', y='ImagePositionPatient_y', z='ImagePositionPatient_z', color='PatientID')
# fig.update_layout(autosize=False,
#                   scene_camera_eye=dict(x=1.87, y=0.88, z=-0.64),
#                   width=500, height=500,
#                   margin=dict(l=65, r=50, b=65, t=90)
# )
fig.update_traces(marker=dict(size=5,
                              line=dict(width=0,
                                        color='DarkSlateGrey')),
                  selector=dict(mode='markers'))
fig.update_layout(showlegend=False) 
fig.show()


Each color represents a distinct patient.

First, we can clearly see that the scanner images were taken successively, one below the other, in order to sweep a precise area of the lungs from top to bottom or vice versa.

So the positions of the images relative to the body are as follows:

<img src="https://i.postimg.cc/9fPxhkTw/Image-position.png"  width="300" height="1500">

Forgive me for this abominable drawing but the idea is there ^_^'.

Now that we know the position of the images in relation to the human body, let's look at their direction.

If we draw the direction cosines at each image position, we get this:

In [None]:
# Row direction cosines (red), anchored at each image position
tmp1 = Data[['ImagePositionPatient_x','ImagePositionPatient_y', 'ImagePositionPatient_z', 'ImageOrientationPatient_a','ImageOrientationPatient_b', 'ImageOrientationPatient_c']].copy()
tmp1.columns = ['x','y','z','a','b','c']
tmp1['Cos'] = 'red'

# Column direction cosines (blue), anchored at the same positions
tmp2 = Data[['ImagePositionPatient_x','ImagePositionPatient_y', 'ImagePositionPatient_z', 'ImageOrientationPatient_d','ImageOrientationPatient_e', 'ImageOrientationPatient_f']].copy()
tmp2.columns = ['x','y','z','a','b','c']
tmp2['Cos'] = 'blue'

cos = pd.concat([tmp1, tmp2], ignore_index = True)
cos['width'] = 10

# Scale the unit vectors so that the arrows are visible
cos[['a','b','c']] = cos[['a','b','c']] * 200

fig = plt.figure()
ax = fig.add_subplot(projection='3d')  # fig.gca(projection=...) is no longer supported
ax.view_init(60, 35)
ax.quiver(cos['x'], cos['y'], cos['z'], cos['a'], cos['b'], cos['c'], length=0.1, colors = cos['Cos'])

plt.show()



In [None]:
fig = plt.figure()
ax = fig.add_subplot(projection='3d')  # fig.gca(projection=...) is no longer supported

ax.quiver(cos['x'], cos['y'], cos['z'], cos['a'], cos['b'], cos['c'], length=0.1, colors = cos['Cos'])

plt.show()

We can see that all the images point in the same direction, from the anterior side of the chest towards the patient's lungs, which is logical.

<img src="https://i.postimg.cc/hGPw0DT2/direction1.png" width="700" height="500"> 

Let's take a look at the vectors that indicate the direction of the image.

In [None]:
fig, axes = plt.subplots(nrows=2, ncols=3)
Data['ImageOrientationPatient_a'].plot.hist(title  = 'Alpha 1', ax=axes[0,0])
Data['ImageOrientationPatient_b'].plot.hist(title  = 'Beta 1', ax=axes[0,1])
Data['ImageOrientationPatient_c'].plot.hist(title  = 'Gamma 1', ax=axes[0,2])
Data['ImageOrientationPatient_d'].plot.hist(title  = 'Alpha 2', ax=axes[1,0])
Data['ImageOrientationPatient_e'].plot.hist(title  = 'Beta 2', ax=axes[1,1])
Data['ImageOrientationPatient_f'].plot.hist(title  = 'Gamma 2', ax=axes[1,2])

We can clearly see that the only two components that influence the direction of the image are alpha 1 and beta 2, which explains the converging direction of the images.
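
As a small sanity check on this reading, the six Image Orientation (Patient) values can be split into a row direction and a column direction, and their cross product gives the normal of the image plane. This is only a sketch: the values in `iop` are made up for a purely axial slice (they are not taken from the dataset), and `slice_normal` is a helper name introduced here for illustration.

```python
import numpy as np

def slice_normal(iop):
    """Return the unit normal of the image plane from the six direction cosines."""
    row_dir = np.asarray(iop[:3], dtype=float)  # alpha 1, beta 1, gamma 1
    col_dir = np.asarray(iop[3:], dtype=float)  # alpha 2, beta 2, gamma 2
    normal = np.cross(row_dir, col_dir)
    return normal / np.linalg.norm(normal)

# Hypothetical axial slice: only alpha 1 and beta 2 are non-zero
iop = [1, 0, 0, 0, 1, 0]
normal = slice_normal(iop)
print(normal)
```

With only alpha 1 and beta 2 set, the normal lies along the z axis, which fits slices stacked one below the other along the body.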

## 11. Position Reference Indicator <a id="2.11"></a>

<div class="external-reference"><div>
<div>
<div>
<div>
<h6>
C.7.4.1.1.2&nbsp;Position Reference Indicator</h6>
</div>
</div>
</div>
<p>
The Position Reference Indicator (0020,1040) specifies the part of the imaging target that was used as a reference point associated with a specific Frame of Reference UID. The Position Reference Indicator may or may not coincide with the origin of the fixed Frame of Reference related to the Frame of Reference UID.</p>
<p>
For a Patient-related Frame of Reference, this is an anatomical reference point such as the iliac crest, orbital-medial, sternal notch, symphysis pubis, xiphoid, lower costal margin, or external auditory meatus, or a fiducial marker placed on the patient. The Patient-Based Coordinate System is described in <a href="http://dicom.nema.org/medical/dicom/current/output/chtml/part03/sect_C.7.6.2.html#sect_C.7.6.2.1.1">Section&nbsp;C.7.6.2.1.1</a>.</p>
<p>
For a slide-related Frame of Reference, this is the slide corner as specified in <a href="http://dicom.nema.org/medical/dicom/current/output/chtml/part03/sect_C.8.12.2.html#sect_C.8.12.2.1">Section&nbsp;C.8.12.2.1</a> and shall be identified in this Attribute with the value "SLIDE_CORNER". The slide-based coordinate system is described in <a href="http://dicom.nema.org/medical/dicom/current/output/chtml/part03/sect_C.8.12.2.html#sect_C.8.12.2.1">Section&nbsp;C.8.12.2.1</a>.</p>
<p>
For an Ophthalmic Coordinate System, the Frame of Reference is based upon the corneal vertex. The corneal vertex is determined by the measuring instrument and shall be identified in this Attribute with the value CORNEAL_VERTEX_R (for the right eye) or CORNEAL_VERTEX_L (for the left eye). The Ophthalmic Coordinate System is described in <a href="http://dicom.nema.org/medical/dicom/current/output/chtml/part03/sect_C.8.30.3.html#sect_C.8.30.3.1.4">Section&nbsp;C.8.30.3.1.4</a>.</p>
<p>
The Position Reference Indicator shall be used only for annotation purposes and is not intended to be used as a mathematical spatial reference.</p>
<div>
<h3>Note</h3>
<p>
The Position Reference Indicator may be encoded as zero length when it has no meaning, for example, when the <a href="http://dicom.nema.org/medical/dicom/current/output/chtml/part03/sect_C.7.4.html#sect_C.7.4.1">Frame of Reference Module</a> is required to relate mammographic images of the breast acquired without releasing breast compression, but where there is no meaningful anatomical reference point as such.</p>
</div>
</div></div>

<a href="https://dicom.innolitics.com/ciods/tractography-results/frame-of-reference/00201040">Position Reference Indicator</a>

In [None]:
tmp = Data.PositionReferenceIndicator.value_counts().to_frame().reset_index()
tmp.columns = ['Position Reference Indicator','Count']

sns.barplot(x="Position Reference Indicator", y="Count", data=tmp)

Most images have no specific Position Reference Indicator value.

## 12. Slice Location Attribute <a id="2.12"></a>

Slice Location is defined as the relative position of the image plane expressed in mm. This information is relative to an unspecified implementation specific reference point.

<a href="https://dicom.innolitics.com/ciods/ct-image/image-plane/00201041">Slice Location Attribute</a>

In [None]:
# Plot the slice locations themselves: value_counts() would give
# how often each location occurs, not the locations
slice_location = Data['SliceLocation'].astype('float')
sns.set(rc={'figure.figsize':(20,10)})
sns.histplot(slice_location, kde=True)  # distplot is deprecated in recent seaborn

# Inset violin plot of the same values
ax2 = plt.axes([0.7, 0.5, .15, .3], facecolor='y')
ax2 = sns.violinplot(y=slice_location, ax=ax2)

The image does not deviate too much from its origin: it lies between 0 mm and 200 mm.
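
To get a feel for how the slices of a single series are spaced, one option is to sort the slice locations and look at the gaps between neighbours. The sketch below uses invented values (not taken from the dataset), and `slice_locations` is a hypothetical variable name.

```python
import numpy as np

# Hypothetical slice locations in mm for one series, in arbitrary file order
slice_locations = np.array([30.0, 0.0, 20.0, 10.0, 40.0])

ordered = np.sort(slice_locations)   # order the slices along the scan axis
spacing = np.diff(ordered)           # gap between consecutive slices
extent = ordered[-1] - ordered[0]    # total length covered by the series

print(spacing, extent)
```

A constant gap suggests a regularly sampled series; irregular gaps would hint at missing or duplicated slices.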

## 13. Rows & Columns <a id="2.13"></a>

Rows :

Number of rows in the image.

Shall be an exact multiple of the vertical downsampling factor if any of the samples (planes) are encoded downsampled in the vertical direction for pixel data encoded in a Native (uncompressed) format. E.g., required to be an even value for a Photometric Interpretation (0028,0004) of YBR_FULL_422.

Columns :

Number of columns in the image.

Shall be an exact multiple of the horizontal downsampling factor if any of the samples (planes) are encoded downsampled in the horizontal direction for pixel data encoded in a Native (uncompressed) format. E.g., required to be an even value for a Photometric Interpretation (0028,0004) of YBR_FULL_422.


<a href="https://dicom.innolitics.com/ciods/mr-image/image-pixel/00280010">Rows & Columns</a>

In [None]:
sns.jointplot(x="Rows", y="Columns", data=Data[['Rows','Columns']].astype('int'), kind='reg',joint_kws={'color':'green'})

There is a strong correlation between columns and rows, except in a few cases where the images are rectangular.

## 14. Pixel Spacing Attribute <a id="2.14"></a>

Physical distance in the patient between the center of each pixel, specified by a numeric pair - adjacent row spacing (delimiter) adjacent column spacing in mm.

<div class="external-reference"><div>
<div>
<div>
<div>
<h4>
10.7.1.3&nbsp;Pixel Spacing Value Order and Valid Values</h4>
</div>
</div>
</div>
<p>
All pixel spacing related Attributes are encoded as the physical distance between the centers of each two-dimensional pixel, specified by two numeric values.</p>
<p>
The first value is the row spacing in mm, that is the spacing between the centers of adjacent rows, or vertical spacing.</p>
<p>
The second value is the column spacing in mm, that is the spacing between the centers of adjacent columns, or horizontal spacing.</p>
<p>
To illustrate, consider the example shown in <a href="http://dicom.nema.org/medical/dicom/current/output/html/part03.html#figure_10.7.1.3-1">Figure&nbsp;10.7.1.3-1</a>.</p>
<p>
</p>
<div>
<div>
<div>
<img src="http://dicom.nema.org/medical/dicom/current/output/html/figures/PS3.3_10.7.1.3-1.svg">
</div>
</div>
<p>
<strong>Figure&nbsp;10.7.1.3-1.&nbsp;Example of Pixel Spacing Value Order</strong>
</p>
</div>
<p>
<br>
</p>
<p>
Pixel Spacing = Row Spacing \ Column Spacing = 0.30\0.25.</p>
<p>
All pixel spacing related Attributes shall have positive non-zero values, except when there is only a single row or column or pixel of data present, in which case the corresponding value may be zero.</p>
<div>
<h3>Note</h3>
<p>
A single row or column or "pixel" may occur in MR Spectroscopy Instances.</p>
</div>
<p>
This description applies to:</p>
<div>
<ul>
<li>
<p>
Pixel Spacing (0028,0030)</p>
</li>
<li>
<p>
Imager Pixel Spacing (0018,1164)</p>
</li>
<li>
<p>
Nominal Scanned Pixel Spacing (0018,2010)</p>
</li>
<li>
<p>
Image Plane Pixel Spacing (3002,0011)</p>
</li>
<li>
<p>
Compensator Pixel Spacing (300A,00E9)</p>
</li>
<li>
<p>
Detector Element Spacing (0018,7022)</p>
</li>
<li>
<p>
Presentation Pixel Spacing (0070,0101)</p>
</li>
<li>
<p>
Printer Pixel Spacing (2010,0376)</p>
</li>
<li>
<p>
Object Pixel Spacing in Center of Beam (0018,9404)</p>
</li>
</ul>
</div>
</div></div>

<a href="https://dicom.innolitics.com/ciods/ct-image/image-plane/00280030#:~:text=All%20pixel%20spacing%20related%20Attributes,adjacent%20rows%2C%20or%20vertical%20spacing.">Pixel Spacing Attribute</a>

In [None]:
sns.jointplot(x="PixelSpacing_row", y="PixelSpacing_column", data=Data[['PixelSpacing_row','PixelSpacing_column']].astype('float'),kind='reg',joint_kws={'color':'green'})

Row spacing and column spacing are the same, so the pixels are square.
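
Combined with Rows and Columns, the pixel spacing gives a rough physical size of each image. A minimal sketch, assuming a hypothetical 512 x 512 image with 0.5 mm pixels (`physical_size_mm` is a helper name introduced here, and the numbers are not from the dataset):

```python
def physical_size_mm(rows, columns, row_spacing, column_spacing):
    """Approximate height and width of the image in mm."""
    height = rows * row_spacing        # row spacing: distance between centers of adjacent rows
    width = columns * column_spacing   # column spacing: distance between centers of adjacent columns
    return height, width

# Hypothetical 512 x 512 image with 0.5 mm square pixels
height, width = physical_size_mm(512, 512, 0.5, 0.5)
print(height, width)
```

This matters when comparing scans from different patients, because the same number of pixels can cover different physical areas.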

## 15. Bits Stored & High Bit <a id="2.15"></a>

Bits Stored is the number of bits stored for each pixel sample. Each sample shall have the same number of bits stored.

<a href="https://dicom.innolitics.com/ciods/ct-image/image-pixel/00280101">Bits Stored Attribute</a>

High Bit is the most significant bit for pixel sample data. Each sample shall have the same high bit. High Bit (0028,0102) shall be one less than Bits Stored (0028,0101).

<a href="https://dicom.innolitics.com/ciods/us-image/image-pixel/00280102">High Bit</a>

In [None]:
plt.figure(figsize=(30,10))

plt.subplot(1,2,1)
tmp = Data.BitsStored.value_counts().to_frame().reset_index()
tmp.columns = ['Bits Stored','Count']
sns.barplot(x="Bits Stored", y="Count", data=tmp)

plt.subplot(1,2,2)
tmp = Data.HighBit.value_counts().to_frame().reset_index()
tmp.columns = ['High Bit','Count']
sns.barplot(x="High Bit", y="Count", data=tmp)

In [None]:
sns.jointplot(x="HighBit", y="BitsStored", data=Data[['HighBit','BitsStored']].astype('float'), kind='reg',joint_kws={'color':'green'})

The plot confirms the rule quoted above: BitsStored = HighBit + 1.

## 16. Pixel Representation Attribute <a id="2.16"></a>

Data representation of the pixel samples. Each sample shall have the same pixel representation.

<a href="https://dicom.innolitics.com/ciods/us-image/image-pixel/00280103">Pixel Representation Attribute</a>
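
In the DICOM standard, a Pixel Representation of 0 means unsigned integers and 1 means two's complement signed integers. Together with Bits Stored, this fixes the range of values a stored sample can take. The sketch below illustrates that arithmetic; `pixel_value_range` is a hypothetical helper, not part of pydicom.

```python
def pixel_value_range(bits_stored, pixel_representation):
    """Smallest and largest value a stored pixel sample can take."""
    if pixel_representation == 0:
        # Unsigned integer
        return 0, 2 ** bits_stored - 1
    # pixel_representation == 1: two's complement signed integer
    return -(2 ** (bits_stored - 1)), 2 ** (bits_stored - 1) - 1

print(pixel_value_range(12, 0))
print(pixel_value_range(16, 1))
```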

In [None]:
tmp = Data.PixelRepresentation.value_counts().to_frame().reset_index()
tmp.columns = ['Pixel Representation','Count']
sns.barplot(x="Pixel Representation", y="Count", data=tmp)

## 17. Window Center & Window Width <a id="2.17"></a>

The Window Center attribute defines the window center for display.

See Section C.8.11.3.1.5 for further explanation.

Required if Presentation Intent Type (0008,0068) is FOR PRESENTATION and VOI LUT Sequence (0028,3010) is not present. May also be present if VOI LUT Sequence (0028,3010) is present.

The Window Width attribute defines the window width for display. See Section C.8.11.3.1.5 for further explanation.

Required if Window Center (0028,1050) is present.

* <a href="https://dicom.innolitics.com/ciods/digital-x-ray-image/dx-image/00281050">Window Center </a>
* <a href="https://dicom.innolitics.com/ciods/digital-x-ray-image/dx-image/00281051">Window Width  </a>

<div class="external-reference"><div>
<div>
<div>
<div>
<h6>
C.8.11.3.1.5&nbsp;VOI Attributes</h6>
</div>
</div>
</div>
<p>
The Attributes of the <a href="http://dicom.nema.org/medical/dicom/current/output/chtml/part03/sect_C.11.2.html#sect_C.11.2">VOI LUT Module</a> are specialized in the <a href="http://dicom.nema.org/medical/dicom/current/output/chtml/part03/sect_C.8.11.3.html#sect_C.8.11.3">DX Image Module</a>.</p>
<p>
Window Center (0028,1050) and Window Width (0028,1051) specify a linear conversion (unless otherwise specified by the value of VOI LUT Function (0028,1056); See <a href="http://dicom.nema.org/medical/dicom/current/output/chtml/part03/sect_C.11.2.html#sect_C.11.2.1.3">Section&nbsp;C.11.2.1.3</a>) from the output of the (conceptual) Modality LUT values to the input to the (conceptual) Presentation LUT. Window Center contains the value that is the center of the window. Window Width contains the width of the window.</p>
<p>
The application of Window Center (0028,1050) and Window Width (0028,1051) shall not produce a signed result.</p>
<div>
<h3>Note</h3>
<p>
If the Presentation LUT Shape (2050,0020) is IDENTITY, then the result of applying Window Center (0028,1050) and Window Width (0028,1051) is P-Values.</p>
</div>
<p>
If multiple values are present, both Attributes shall have the same number of values and shall be considered as pairs. Multiple values indicate that multiple alternative views should be presented.</p>
<p>
The VOI LUT Sequence specifies a (potentially non-linear) conversion from the output of the (conceptual) Modality LUT values to the input to the (conceptual) Presentation LUT.</p>
<p>
If multiple Items are present in VOI LUT Sequence (0028,3010), only one shall be applied. Multiple Items indicate that multiple alternative views should be presented.</p>
<p>
If any VOI LUT Attributes are included by an Image, a Window Width and Window Center or the VOI LUT Table, but not both, shall be applied to the Image for display. Inclusion of both indicates that multiple alternative views should be presented.</p>
<p>
The three values of LUT Descriptor (0028,3002) describe the format of LUT Data (0028,3006).</p>
<p>
The first value is the number of entries in the lookup table.</p>
<p>
The second value is the first stored pixel value mapped. This pixel value is mapped to the first entry in the LUT. All image pixel values less than the first value mapped are also mapped to the first entry in the LUT Data. An image pixel value one greater than the first value mapped is mapped to the second entry in the LUT Data. Subsequent image pixel values are mapped to the subsequent entries in the LUT Data up to an image pixel value equal to number of entries + first value mapped - 1 that is mapped to the last entry in the LUT Data. Image pixel values greater than number of entries + first value mapped are also mapped to the last entry in the LUT Data.</p>
<p>
The third value specifies the number of bits for each entry in the LUT Data (analogous to "bits stored"). It shall be between 10-16. The LUT Data shall be stored in a format equivalent to 16 "bits allocated" and "high bit" equal to "bits stored" - 1. The third value conveys the range of LUT entry values. These unsigned LUT entry values shall range between 0 and 2<sup>n</sup>-1, where n is the third value of the LUT Descriptor.</p>
<div>
<h3>Note</h3>
<div>
<ol type="1">
<li>
<p>
The third value is restricted in the <a href="http://dicom.nema.org/medical/dicom/current/output/chtml/part03/sect_C.11.2.html#sect_C.11.2">VOI LUT Module</a> to 8 or 16 but is specialized here.</p>
</li>
<li>
<p>
The first and second values are not specialized and are the same as in the <a href="http://dicom.nema.org/medical/dicom/current/output/chtml/part03/sect_C.11.2.html#sect_C.11.2">VOI LUT Module</a>.</p>
</li>
</ol>
</div>
</div>
<p>
LUT Data (0028,3006) contains the LUT entry values.</p>
</div></div>
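
To make the linear conversion more concrete, here is a simplified sketch of windowing: values are clipped to the interval defined by the center and the width, then rescaled to [0, 1]. It deliberately ignores the small half-unit offsets used in the exact formula of the standard, and both the helper name `apply_window` and the sample values are assumptions made for illustration.

```python
import numpy as np

def apply_window(values, center, width):
    """Clip values to [center - width/2, center + width/2] and rescale to [0, 1]."""
    low = center - width / 2.0
    high = center + width / 2.0
    clipped = np.clip(np.asarray(values, dtype=float), low, high)
    return (clipped - low) / (high - low)

# Hypothetical values, windowed with center -500 and width 1500
values = np.array([-2000.0, -1250.0, -500.0, 250.0, 1000.0])
windowed = apply_window(values, center=-500, width=1500)
print(windowed)
```

Everything below the window becomes 0 and everything above becomes 1, so a narrow window increases contrast inside the range of interest.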



In [None]:
plt.figure(figsize=(30,10))

plt.subplot(1,2,1)

tmp = Data.WindowCenter.value_counts().to_frame().reset_index()
tmp.columns = ['Window Center','Count']
sns.barplot(x="Window Center", y="Count", data=tmp)

plt.subplot(1,2,2)

tmp = Data.WindowWidth.value_counts().to_frame().reset_index()
tmp.columns = ['Window Width','Count']
sns.barplot(x="Window Width", y="Count", data=tmp)

In [None]:
sns.jointplot(x="WindowCenter", y="WindowWidth", data=Data[Data['WindowCenter'] != '[-500, 40]'][['WindowCenter','WindowWidth']].astype('float'), )

In [None]:
tmp = Data.groupby(['WindowCenter','WindowWidth']).count().reset_index()[['WindowCenter','WindowWidth','ImageType']]
tmp.columns = ['WindowCenter','WindowWidth','Count']
tmp['Window Center and Width'] = tmp['WindowCenter'] + ' | ' + tmp['WindowWidth']
sns.barplot(x="Window Center and Width", y="Count", data=tmp)

## 18. Rescale Intercept & Rescale Slope <a id="2.18"></a>

Rescale Intercept is the value b in relationship between stored values (SV) and the output units.

Rescale Slope is m in the equation specified in Rescale Intercept (0028,1052).

Output units = m*SV+b

* <a href="https://dicom.innolitics.com/ciods/ct-image/ct-image/00281052">Rescale Intercept </a>
* <a href="https://dicom.innolitics.com/ciods/digital-x-ray-image/dx-image/00281053">Rescale Slope  </a>
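
The equation above can be sketched in a few lines. The stored values, slope and intercept below are hypothetical; with a real file they would presumably come from `pixel_array`, `RescaleSlope` and `RescaleIntercept` of the dataset read by pydicom.

```python
import numpy as np

def rescale(stored_values, slope, intercept):
    """Apply Output units = m * SV + b."""
    return slope * np.asarray(stored_values, dtype=float) + intercept

# Hypothetical stored values with m = 1 and b = -1024
stored = np.array([0, 1024, 2048])
output = rescale(stored, slope=1.0, intercept=-1024.0)
print(output)
```

For CT images the output units are typically Hounsfield units, so a negative intercept simply moves the unsigned stored values onto that scale.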

In [None]:
plt.figure(figsize=(30,10))

plt.subplot(1,2,1)

tmp = Data.RescaleIntercept.astype('float').value_counts().to_frame().reset_index()
tmp.columns = ['Rescale Intercept','Count']
sns.barplot(x="Rescale Intercept", y="Count", data=tmp)

plt.subplot(1,2,2)

tmp = Data.RescaleSlope.astype('float').value_counts().to_frame().reset_index()
tmp.columns = ['Rescale Slope','Count']
sns.barplot(x="Rescale Slope", y="Count", data=tmp)

Most of the time the value b of the equation is negative, which shifts the output values down.

## 19. Images <a id="2.19"></a>

Let's take a look at the images.

In [None]:
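import os

img_array = []

# Sort the slices by numeric file name so the frames play in order
# (assumes files are named like 1.dcm, 2.dcm, ...)
paths = sorted(glob.glob('../input/osic-pulmonary-fibrosis-progression/train/ID00061637202188184085559/*.dcm'),
               key=lambda p: int(os.path.splitext(os.path.basename(p))[0]))

for filename in paths:
    img = pydicom.dcmread(filename)
    img_array.append(img.pixel_array)

# Write all slices as the frames of an animated GIF
imageio.mimsave('movie.gif', img_array)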

HTML('<img src="./movie.gif">')

Let's take a look at all the images from a random train patient.

In [None]:
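import os

plt.figure(figsize=(20,15))
# Sort by numeric file name (assumes names like 1.dcm, 2.dcm, ...) so the slices appear in order
paths = sorted(glob.glob('../input/osic-pulmonary-fibrosis-progression/train/ID00048637202185016727717/*.dcm'),
               key=lambda p: int(os.path.splitext(os.path.basename(p))[0]))
for i, filename in enumerate(paths, start=1):
    plt.subplot(5,6,i)
    plt.grid(False)
    plt.imshow(pydicom.dcmread(filename).pixel_array, cmap=plt.cm.bone)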

Let's do the same thing with the test DICOMs.

In [None]:
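import os

plt.figure(figsize=(30,15))
# Sort by numeric file name (assumes names like 1.dcm, 2.dcm, ...) so the slices appear in order
paths = sorted(glob.glob('../input/osic-pulmonary-fibrosis-progression/test/ID00421637202311550012437/*.dcm'),
               key=lambda p: int(os.path.splitext(os.path.basename(p))[0]))
for i, filename in enumerate(paths, start=1):
    plt.subplot(6,11,i)
    plt.grid(False)
    plt.imshow(pydicom.dcmread(filename).pixel_array, cmap=plt.cm.bone)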

Now let's apply some effects and see what happens.

In [None]:
plt.figure(figsize=(20,10))
img = '../input/osic-pulmonary-fibrosis-progression/train/ID00007637202177411956430/18.dcm'

plt.subplot(1,2,1)
plt.grid(False)
plt.imshow(pydicom.dcmread(img).pixel_array, cmap=plt.cm.bone)
plt.title("Original")

plt.subplot(1,2,2)
plt.grid(False)
test = cv2.bitwise_not(pydicom.dcmread(img).pixel_array)
plt.title("invert the image")

plt.imshow(test, cmap=plt.cm.bone)

In [None]:
image = pydicom.dcmread(img).pixel_array


imageio.imwrite('img.jpg', image)
image = imageio.imread('./img.jpg')

plt.figure(figsize=(30, 30))
plt.subplot(3, 2, 1)
plt.grid(False)
plt.title("Original")
plt.imshow(image, cmap=plt.cm.bone)

ret,thresh1 = cv2.threshold(image, 127, 255, cv2.THRESH_BINARY)

plt.subplot(3, 2, 2)
plt.grid(False)
plt.title("Threshold Binary")
plt.imshow(thresh1, cmap=plt.cm.bone)

# image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
# image = np.array(image, dtype=np.uint8)
# image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
image = cv2.GaussianBlur(image, (3, 3), 0)
# print(image)
# image = image.reshape(768, 768, 1)
thresh = cv2.adaptiveThreshold(image, 255, cv2.ADAPTIVE_THRESH_MEAN_C, cv2.THRESH_BINARY, 3, 5) 

plt.subplot(3, 2, 3)
plt.grid(False)
plt.title("Adaptive Mean Thresholding")
plt.imshow(thresh, cmap=plt.cm.bone)


_, th2 = cv2.threshold(image, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

plt.subplot(3, 2, 4)
plt.grid(False)
plt.title("Otsu's Thresholding")
plt.imshow(th2, cmap=plt.cm.bone)


plt.subplot(3, 2, 5)
plt.grid(False)
blur = cv2.GaussianBlur(image, (5,5), 0)
_, th3 = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
plt.title("Gaussian Otsu's Thresholding")
plt.imshow(th3, cmap=plt.cm.bone)
plt.show()

In [None]:
image = pydicom.dcmread(img).pixel_array

plt.figure(figsize=(20, 20))
plt.subplot(3, 2, 1)
plt.grid(False)
plt.title("Original")
plt.imshow(image, cmap=plt.cm.bone)


# Let's define our kernel size
kernel = np.ones((5,5), np.uint8)

# Now we erode
erosion = cv2.erode(image, kernel, iterations = 1)

plt.subplot(3, 2, 2)
plt.grid(False)
plt.title("Erosion")
plt.imshow(erosion, cmap=plt.cm.bone)

# Now we dilate
dilation = cv2.dilate(image, kernel, iterations = 1)
plt.subplot(3, 2, 3)
plt.grid(False)
plt.title("Dilation")
plt.imshow(dilation, cmap=plt.cm.bone)


# Opening - Good for removing noise
opening = cv2.morphologyEx(image, cv2.MORPH_OPEN, kernel)
plt.subplot(3, 2, 4)
plt.grid(False)
plt.title("Opening")
plt.imshow(opening, cmap=plt.cm.bone)


# Closing - Good for filling small holes
closing = cv2.morphologyEx(image, cv2.MORPH_CLOSE, kernel)
plt.subplot(3, 2, 5)
plt.grid(False)
plt.title("Closing")
plt.imshow(closing, cmap=plt.cm.bone)


In [None]:
# image = pydicom.dcmread(img).pixel_array
image = imageio.imread('./img.jpg')

height, width = image.shape

# Extract Sobel edges: the two arguments after the depth are (dx, dy)
sobel_x = cv2.Sobel(image, cv2.CV_64F, 1, 0, ksize=5)  # derivative along x
sobel_y = cv2.Sobel(image, cv2.CV_64F, 0, 1, ksize=5)  # derivative along y

plt.figure(figsize=(20, 20))

plt.subplot(3, 2, 1)
plt.grid(False)
plt.title("Original")
plt.imshow(image, cmap=plt.cm.bone)

plt.subplot(3, 2, 2)
plt.grid(False)
plt.title("Sobel X")
plt.imshow(sobel_x, cmap=plt.cm.bone)


plt.subplot(3, 2, 3)
plt.grid(False)
plt.title("Sobel Y")
plt.imshow(sobel_y, cmap=plt.cm.bone)

sobel_OR = cv2.bitwise_or(sobel_x, sobel_y)

plt.subplot(3, 2, 4)
plt.grid(False)
plt.title("sobel_OR")
plt.imshow(sobel_OR, cmap=plt.cm.bone)

laplacian = cv2.Laplacian(image, cv2.CV_64F)

plt.subplot(3, 2, 5)
plt.grid(False)
plt.title("Laplacian")
plt.imshow(laplacian, cmap=plt.cm.bone)

# image = np.array(image*255, dtype=np.uint8)
canny = cv2.Canny(image, 50, 120)

plt.subplot(3, 2, 6)
plt.grid(False)
plt.title("Canny")
plt.imshow(canny, cmap=plt.cm.bone)


In [None]:
# image = pydicom.dcmread(img).pixel_array
image = imageio.imread('./img.jpg')

plt.figure(figsize=(20, 20))

plt.subplot(2, 2, 1)
plt.grid(False)
plt.title("Original")
plt.imshow(image, cmap=plt.cm.bone)


# Grayscale
# gray = cv2.cvtColor(image,cv2.COLOR_BGR2GRAY)

# Find Canny edges
edged = cv2.Canny(image, 30, 200)

plt.subplot(2, 2, 2)
plt.grid(False)
plt.title("Canny Edges")
plt.imshow(edged, cmap=plt.cm.bone)


# Finding Contours
# Use a copy of your image e.g. edged.copy(), since findContours alters the image
contours, hierarchy = cv2.findContours(edged, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)

plt.subplot(2, 2, 3)
plt.grid(False)
plt.title("Canny Edges After Contouring")
plt.imshow(edged, cmap=plt.cm.bone)

print("Number of Contours found = " + str(len(contours)))

# Draw all contours
# Use '-1' as the 3rd parameter to draw all
cv2.drawContours(image, contours, -1, (0,255,0), 3)

plt.subplot(2, 2, 4)
plt.grid(False)
plt.title("Contours")
plt.imshow(image, cmap=plt.cm.bone)

# Train and Test <a id="3"></a>

In [None]:
train = pd.read_csv('../input/osic-pulmonary-fibrosis-progression/train.csv')
test = pd.read_csv('../input/osic-pulmonary-fibrosis-progression/test.csv')
data = pd.concat([train, test], ignore_index = True)

ProfileReport(data)

## 1. Smoking Status <a id="3.1"></a>

In [None]:
f, axes = plt.subplots(2 ,figsize=(30, 10), sharex=True)
# plt.subplot(1,2,1);
sns.lineplot(hue="SmokingStatus", x="Weeks", y = 'FVC', data = data, ax=axes[0])
# subplot(1,2,2);
sns.lineplot(hue="SmokingStatus", x="Weeks", y = 'Percent',  data = data, ax=axes[1])


Smokers clearly have higher FVC and Percent values. Ex-smokers differ slightly from non-smokers in FVC, but only during the first weeks, until week 70 or so; after that, their curves overlap.

The Percent values of ex-smokers and non-smokers are equivalent.

In [None]:
f, axes = plt.subplots(3, figsize=(10, 20), sharex=True)
sns.violinplot(x="SmokingStatus", y="Age", data=data, ax=axes[0])


tmp = data.groupby(['SmokingStatus', 'Sex']).count()['Patient'].reset_index()
tmp.columns= ['Smoking Status', 'Sex', 'Count']

sns.barplot(x="Smoking Status", y="Count", hue="Sex", data=tmp, ax=axes[1])

tmp = data.groupby('SmokingStatus').count()['Patient'].reset_index()
tmp.columns= ['Smoking Status', 'Count']
sns.barplot(x="Smoking Status", y="Count", data= tmp, ax=axes[2])

* Smokers are generally the oldest, non-smokers are somewhat younger, and ex-smokers are among the youngest.
* There are more male smokers than female smokers.
* Ex-smokers are the largest group, followed by non-smokers; current smokers are the minority.

## 2. Sex <a id="3.2"></a>

In [None]:
f, axes = plt.subplots(2 ,figsize=(30, 10), sharex=True)
# plt.subplot(1,2,1);
sns.lineplot(hue="Sex", x="Weeks", y = 'FVC', data = data, ax=axes[0])
# subplot(1,2,2);
sns.lineplot(hue="Sex", x="Weeks", y = 'Percent',  data = data, ax=axes[1])

Women clearly have higher FVC than men, but Percent remains broadly equivalent.

In [None]:
f, axes = plt.subplots(3, figsize=(10, 20), sharex=True)
sns.violinplot(x="Sex", y="Age", data=data, ax=axes[0])


tmp = data.groupby(['Sex', 'SmokingStatus']).count()['Patient'].reset_index()
tmp.columns= ['Sex', 'Smoking Status', 'Count']

sns.barplot(x="Sex", y="Count", hue="Smoking Status", data=tmp, ax=axes[1])

tmp = data.groupby('Sex').count()['Patient'].reset_index()
tmp.columns= ['Sex', 'Count']
sns.barplot(x="Sex", y="Count", data= tmp, ax=axes[2])

* Most of the women are older than the men.
* There are more smokers and ex-smokers among men than among women.
* There are also more men than women.

## 3. Age <a id="3.3"></a>

In [None]:
tmp = pd.cut(data.Age, 3).to_frame().merge(data,left_index=True, right_index=True)
tmp.columns = ['Age_Range', 'Patient', 'Weeks', 'FVC', 'Percent','Age', 'Sex', 'SmokingStatus']

f, axes = plt.subplots(2 ,figsize=(30, 10), sharex=True)
# plt.subplot(1,2,1);
sns.lineplot(hue="Age_Range", x="Weeks", y = 'FVC', data = tmp, ax=axes[0])
# subplot(1,2,2);
sns.lineplot(hue="Age_Range", x="Weeks", y = 'Percent',  data = tmp, ax=axes[1])

# sns.distplot(data.Age)

If we look closely, we notice that people aged between 62 and 75 are more affected in terms of FVC, but Percent is equivalent for all age groups.

In [None]:
sns.histplot(data.Age, kde=True)  # distplot is deprecated in recent seaborn

The majority are between 65 and 75 years old.

## 4. FVC & Percentage <a id="3.4"></a>

In [None]:
# sns.lineplot( x="FVC", y = 'Percent',  data = tmp)

f, axes = plt.subplots(2 ,figsize=(30, 10), sharex=True)
# plt.subplot(1,2,1);
sns.lineplot( x="Weeks", y = 'FVC', data = tmp, ax=axes[0])
# subplot(1,2,2);
sns.lineplot( x="Weeks", y = 'Percent',  data = tmp, ax=axes[1])

# sns.distplot(data.Age)

Hmm, FVC and Percent look highly correlated.

In [None]:
sns.jointplot(x="FVC", y="Percent", data=data, kind='reg',
                  joint_kws={'line_kws':{'color':'green'}})

Indeed a positive correlation exists.
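
To put a number on this, one could compute the Pearson correlation coefficient. The sketch below uses a tiny made-up table (the values are invented for illustration, not taken from `train.csv`); on the real data the same call would be `data["FVC"].corr(data["Percent"])`.

```python
import pandas as pd

# Hypothetical FVC (ml) and Percent values, made up for illustration
toy = pd.DataFrame({
    "FVC": [2000, 2500, 3000, 3500, 4000],
    "Percent": [55.0, 62.0, 75.0, 83.0, 98.0],
})

# Series.corr computes the Pearson coefficient by default
r = toy["FVC"].corr(toy["Percent"])
print(round(r, 3))
```

A value close to 1 indicates a strong positive linear relationship; a value close to 0 indicates none.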

## 5. Heatmap <a id="3.5"></a>

In [None]:
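# Pearson correlation only applies to numeric columns;
# Sex and SmokingStatus are strings, so they are left out here
corr = data[['Weeks','FVC','Percent','Age']].corr()

# np.bool was removed from recent NumPy versions; the builtin bool works everywhere
mask = np.triu(np.ones_like(corr, dtype=bool))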

f, ax = plt.subplots(figsize=(11, 9))

cmap = sns.diverging_palette(220, 10, as_cmap=True)

sns.heatmap(corr, mask=mask, cmap=cmap, vmax=.3, center=0,
            square=True, linewidths=.5, cbar_kws={"shrink": .5}, annot=True)

In [None]:
sns.pairplot(data[['Weeks','FVC','Percent','Age','Sex','SmokingStatus']])

# Conclusion <a id="4"></a>

This was my analysis of the CSV data and the DICOM files. It gave me some intuition about the data before starting the modeling.

If it did the same for you, please drop an upvote; it will help me a lot.

The next step is to train an LGBM and see if the performance is there before starting the CNN.

See you soon =)