<h1 style="color:green">What, Why and How?</h1>

<h3 style="color:gray">Pulmonary Embolism</h3>

![](https://upload.wikimedia.org/wikipedia/commons/7/77/SaddlePE.PNG)

Pulmonary embolism (PE) is a blockage of an artery in the lungs by a substance that has moved from elsewhere in the body through the bloodstream (embolism). Symptoms of a PE may include shortness of breath, chest pain particularly upon breathing in, and coughing up blood. Symptoms of a blood clot in the leg may also be present, such as a red, warm, swollen, and painful leg. Signs of a PE include low blood oxygen levels, rapid breathing, rapid heart rate, and sometimes a mild fever. Severe cases can lead to passing out, abnormally low blood pressure, and sudden death.

<h3 style="color:gray">How does it happen?</h3>

PE usually results from a blood clot in the leg that travels to the lung. The risk of blood clots is increased by cancer, prolonged bed rest, smoking, stroke, certain genetic conditions, estrogen-based medication, pregnancy, obesity, and after some types of surgery. A small proportion of cases are due to the embolization of air, fat, or amniotic fluid.

<strong>In this competition, we are predicting the existence and characteristics of pulmonary embolisms.</strong>

<strong style="color:red">If you liked this notebook, don't forget to leave an upvote or a comment!</strong>

In [None]:
! pip install -q dabl

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import pydicom as dcm
import glob
import random
import warnings
from tqdm.notebook import tqdm
from colorama import Fore, Style
import os

import dabl

import plotly.express as px
import plotly.figure_factory as ff
import plotly.graph_objs as go
from plotly.offline import iplot

import matplotlib.animation as animation
from matplotlib.widgets import Slider
from IPython.display import HTML, Image

warnings.simplefilter("ignore")

In [None]:
def cout(string: str, color: str) -> None:
    """
    Print a string in the given colorama color, then reset the style
    """
    print(color + string + Style.RESET_ALL)
    
def read_image(filename: str) -> np.ndarray:
    """
    Read a DICOM Image File and return it (as a numpy array)
    """
    img = dcm.dcmread(filename).pixel_array
    img[img == -2000] = 0
    return img

def plot_dicom(image_list, rows=5, cols=4, cmap='jet', is_train=True):
    """
    Plot up to rows*cols DICOM images from image_list on a grid
    """
    fig = plt.figure(figsize=(12, 12))
    # A figure-level title avoids creating an extra, empty set of axes behind the grid
    fig.suptitle(f"DICOM Images from {'Training' if is_train else 'Testing'} Set")
    # zip stops early if fewer than rows*cols files are available
    for i, filename in zip(range(1, rows*cols+1), image_list):
        image = read_image(filename)
        ax = fig.add_subplot(rows, cols, i)
        ax.grid(False)
        ax.imshow(image, cmap=cmap)

In [None]:
train = pd.read_csv("../input/rsna-str-pulmonary-embolism-detection/train.csv")
test = pd.read_csv("../input/rsna-str-pulmonary-embolism-detection/test.csv")
sub = pd.read_csv("../input/rsna-str-pulmonary-embolism-detection/sample_submission.csv")

In [None]:
%%time
train_files = glob.glob("../input/rsna-str-pulmonary-embolism-detection/train/*/*/*.dcm")
test_files = glob.glob("../input/rsna-str-pulmonary-embolism-detection/test/*/*/*.dcm")

<h1 style="color:blue">Exploratory Data Analysis</h1>

Let's start with EDA. We'll cover both training and testing sets.

In [None]:
cout(f"Total Number of DICOM Images in Training Set: {len(train_files)}", Fore.GREEN)
cout(f"Total Number of DICOM Images in Testing Set:  {len(test_files)}", Fore.YELLOW)
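
The glob patterns above end in `*/*/*.dcm`, which suggests a study/series/image folder layout (an assumption; the folder names are not described here). Under that assumption, a rough sketch for counting slices per study from the file paths could look like this. `slices_per_study` is a hypothetical helper and the toy paths are made up.

```python
from collections import Counter
from pathlib import PurePosixPath


def slices_per_study(paths):
    """Count files per study, assuming .../<study>/<series>/<image>.dcm paths."""
    # parts[-3] is the folder two levels above the file, i.e. the assumed study folder
    return Counter(PurePosixPath(p).parts[-3] for p in paths)


# Made-up paths that mimic the assumed layout
toy_paths = [
    "train/study_a/series_1/img_1.dcm",
    "train/study_a/series_1/img_2.dcm",
    "train/study_b/series_2/img_3.dcm",
]
print(slices_per_study(toy_paths))
```

Calling `slices_per_study(train_files)` on the real file list would then give the number of slices in each study, which is useful for spotting unusually short or long scans.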

<h2 style="color:aqua">Peeking at the DataFrames</h2>

Let's start by taking a look at the dataframes of train, test and submission sets.

In [None]:
train.head()   

In [None]:
test.head()

In [None]:
sub.head()

In [None]:
train.describe()

<h2 style="color:aqua">Feature Columns</h2>
At first glance it may appear that there are no features other than the images themselves, but we are also given 4 feature columns. They are not prediction targets; they are extra information that may help our predictions.

These features are:

* **qa_motion** - Indicates whether radiologists noted an issue with motion in the study.
* **qa_contrast** - Indicates whether radiologists noted an issue with contrast in the study.
* **true_filling_defect_not_pe** - Indicates a defect that is NOT PE.
* **flow_artifact**

In [None]:
features = train[['qa_motion', 'qa_contrast', 'true_filling_defect_not_pe', 'flow_artifact']]
features.head()

<h3 style="color:yellow">Issue with Motion? (qa_motion)</h3>
This feature indicates whether radiologists noted a motion issue in the study.

In [None]:
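# value_counts() sorts by frequency, so reindex to match the label order (0 = no issue, 1 = issue)
vals = features['qa_motion'].value_counts().reindex([0, 1], fill_value=0).tolist()
idx = ['No Issue', 'Issue']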
fig = px.pie(
    values=vals,
    names=idx,
    title='Issue with Motion in Studies',
    color_discrete_sequence=['blue', 'cyan']
)
iplot(fig)

<h3 style="color:yellow">Issue with Contrast? (qa_contrast)</h3>
This feature indicates whether radiologists noted a contrast issue in the study.

In [None]:
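# value_counts() sorts by frequency, so reindex to match the label order (0 = no issue, 1 = issue)
vals = features['qa_contrast'].value_counts().reindex([0, 1], fill_value=0).tolist()
idx = ['No Issue', 'Issue']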
fig = px.pie(
    values=vals,
    names=idx,
    title='Issue with Contrast in Studies',
    color_discrete_sequence=['gold', 'yellow']
)
iplot(fig)

<h3 style="color:yellow">Indicates if a Defect isn't PE? (true_filling_defect_not_pe)</h3>
This feature flags a true filling defect that is not a PE.

In [None]:
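# value_counts() sorts by frequency, so reindex to match the label order
# (assumes 1 marks a filling defect that is NOT a PE and 0 means the flag is not set)
vals = features['true_filling_defect_not_pe'].value_counts().reindex([0, 1], fill_value=0).tolist()
idx = ['Not flagged', 'Defect not PE']
fig = px.pie(
    values=vals,
    names=idx,
    title='True Filling Defect That Is Not PE',
    color_discrete_sequence=['black', 'gray']
)
iplot(fig)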

<h3 style="color:yellow">flow_artifact</h3>

In [None]:
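# value_counts() sorts by frequency, so reindex to match the label order
# (assumes 1 means a flow artifact was flagged)
vals = features['flow_artifact'].value_counts().reindex([0, 1], fill_value=0).tolist()
idx = ['Is not an Artifact', 'Is an Artifact']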
fig = px.pie(
    values=vals,
    names=idx,
    title='Flow Artifact',
)
iplot(fig)

Note: I couldn't find a description of the `flow_artifact` feature, so please correct me if I have interpreted it wrongly.
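
One general caveat when labelling pie slices from `value_counts()`: the counts come back sorted by frequency, not by value, so a hard-coded list of names only lines up if the most frequent value happens to match the first name. Below is a minimal sketch of a safer pattern. `labelled_counts` is a hypothetical helper, the toy series is made up, and the 0/1 coding of the flag is an assumption.

```python
import pandas as pd


def labelled_counts(series, label_for_value):
    """Return {label: count} in the order given by label_for_value."""
    # reindex fixes the order by value and keeps a zero entry for values that never occur
    counts = series.value_counts().reindex(list(label_for_value), fill_value=0)
    return {label_for_value[value]: int(n) for value, n in counts.items()}


# Toy stand-in for a binary flag column where 1 is the MORE frequent value
toy_flag = pd.Series([1, 1, 1, 0])
result = labelled_counts(toy_flag, {0: "No Issue", 1: "Issue"})
print(result)
```

The keys and values of the returned dictionary can be passed as `names` and `values` to `px.pie`, so each slice keeps the right label regardless of which value is more common.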

<h2 style="color:aqua">Target Columns</h2>

The columns that remain after we exclude the feature columns are the target columns. Here is what the different target columns mean:

* **negative_exam_for_pe** - exam-level, indicates that none of the images in the study have PE present (the exam is negative for PE)
* **rv_lv_ratio_gte_1** - exam-level, indicates whether the RV/LV ratio present in the study is >= 1
* **rv_lv_ratio_lt_1** - exam-level, indicates whether the RV/LV ratio present in the study is < 1
* **leftsided_pe** - exam-level, indicates that there is PE present on the left side of the images in the study
* **chronic_pe** - exam-level, indicates that the PE in the study is chronic
* **rightsided_pe** - exam-level, indicates that there is PE present on the right side of the images in the study
* **acute_and_chronic_pe** - exam-level, indicates that the PE present in the study is both acute AND chronic
* **central_pe** - exam-level, indicates that there is PE present in the center of the images in the study
* **indeterminate** - exam-level, indicates that while the study is not negative for PE, an ultimate set of exam-level labels could not be created, due to QA issues
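
Because these labels are exam-level, counting them row by row can over-weight studies with many images if the table has one row per image. The sketch below shows the difference on made-up data. It assumes a study identifier column named `StudyInstanceUID` (an assumption, not confirmed above), and `exam_level_prevalence` is a hypothetical helper.

```python
import pandas as pd


def exam_level_prevalence(df, target_cols, study_col="StudyInstanceUID"):
    """Share of exams with each label set, using one row per study."""
    # Exam-level labels repeat on every row of a study, so keeping the first row is enough
    exams = df.drop_duplicates(subset=study_col)
    return exams[target_cols].mean()


# Toy table: study "a" has three image rows, study "b" has one
toy = pd.DataFrame({
    "StudyInstanceUID": ["a", "a", "a", "b"],
    "negative_exam_for_pe": [1, 1, 1, 0],
    "central_pe": [0, 0, 0, 1],
})
cols = ["negative_exam_for_pe", "central_pe"]

row_level = toy[cols].mean()                  # weighted by number of images
exam_level = exam_level_prevalence(toy, cols)  # one vote per study
print(row_level)
print(exam_level)
```

In the toy table, the row-level mean makes the negative exam look three times as common as the positive one, while the exam-level view counts each study once.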

In [None]:
targets = train[['negative_exam_for_pe', 'rv_lv_ratio_gte_1', 'rv_lv_ratio_lt_1', 'leftsided_pe', 'chronic_pe', 'rightsided_pe', 'acute_and_chronic_pe', 'central_pe', 'indeterminate']]
targets.head()

<h3 style="color:yellow">DABL Plot</h3>

Let's plot some of the target columns using DABL.

In [None]:
dabl.plot(targets, target_col='negative_exam_for_pe')

In [None]:
dabl.plot(targets, target_col='rv_lv_ratio_gte_1')

In [None]:
dabl.plot(targets, target_col='rightsided_pe')

<h2 style="color:aqua">DICOM Image Data Analysis</h2>

Let's now start doing the DICOM Image Data Analysis.

<h3 style="color:yellow">Inspect the Images</h3>

Let's start by taking a look at a few images from the training and testing sets.

In [None]:
# Plot the first 20 training images returned by glob
train_imgs = train_files[:20]
plot_dicom(image_list=train_imgs, is_train=True, cmap='gray')

In [None]:
# Plot the first 20 testing images returned by glob
test_imgs = test_files[:20]
plot_dicom(image_list=test_imgs, is_train=False, cmap='gray')
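
The plots above show the stored pixel values directly. CT slices are often easier to read after rescaling to Hounsfield units and clipping to an intensity window. Here is a minimal NumPy sketch of that idea. `apply_window` is a hypothetical helper, the slope and intercept would normally come from the DICOM `RescaleSlope` and `RescaleIntercept` attributes (whether every file here carries them is an assumption), and the default center and width are only illustrative values for a lung-style window.

```python
import numpy as np


def apply_window(pixels, slope=1.0, intercept=0.0, center=-600.0, width=1500.0):
    """Rescale stored values linearly, clip to a window, and map to [0, 1]."""
    # Linear rescale from stored values to (assumed) Hounsfield units
    hu = pixels.astype(np.float32) * slope + intercept
    low, high = center - width / 2, center + width / 2
    # Values outside the window saturate at 0 or 1
    return (np.clip(hu, low, high) - low) / (high - low)


# Toy 1x3 "image": far below the window, at its center, far above it
toy_pixels = np.array([[-2000, -600, 3000]])
windowed = apply_window(toy_pixels)
print(windowed)
```

The result can be passed to `plt.imshow(..., cmap='gray')` like any other array. Different windows emphasise different tissue, so the defaults here are a starting point to experiment with rather than a recommendation.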

<h3 style="color:yellow">Lung CT Animation</h3>

Finally, let's make an animation of the lung scan slices from one particular study.

[This Notebook](https://www.kaggle.com/avloss/eda-with-animation) helped me in making the animation.

In [None]:
%%capture

ids = "../input/rsna-str-pulmonary-embolism-detection/train/6897fa9de148/2bfbb7fd2e8b/"

img_datas = []
for im in os.listdir(ids):
    meta = dcm.dcmread(os.path.join(ids, im))
    srl = meta.InstanceNumber
    data = meta.pixel_array
    data[data == -2000] = 0
    img_datas.append((srl, data))
    
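# Sort on InstanceNumber only; sorting whole tuples would compare pixel arrays on a tie and fail
img_datas.sort(key=lambda item: item[0])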
ims = []
fig = plt.figure()
for gg in img_datas:
    img_ = plt.imshow(gg[1], cmap='jet', animated=True)
    plt.axis("off")
    ims.append([img_])

ani = animation.ArtistAnimation(fig, ims, interval=1000//24, blit=False, repeat_delay=1000)

In [None]:
HTML(ani.to_jshtml())
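
The animation only looks right if the slices are in scan order, which is why the frames are sorted by `InstanceNumber` before plotting. The sketch below shows that ordering step in isolation on made-up data. `order_slices` is a hypothetical helper, and it sorts on the number alone so that two slices with the same number never trigger a comparison between arrays.

```python
import numpy as np


def order_slices(pairs):
    """Return the pixel arrays ordered by their slice number."""
    # key= restricts the comparison to the slice number; ties keep their input order
    return [pixels for _, pixels in sorted(pairs, key=lambda pair: pair[0])]


# Toy (slice number, pixel array) pairs, deliberately out of order and with one tie
toy_pairs = [
    (3, np.full((2, 2), 3)),
    (1, np.full((2, 2), 1)),
    (3, np.full((2, 2), 30)),
    (2, np.full((2, 2), 2)),
]
ordered = order_slices(toy_pairs)
print([int(a[0, 0]) for a in ordered])
```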