# RSNA-STR Pulmonary Embolism Detection



### File descriptions

- test - all test images
- train - all train images (note that your submission kernels will NOT have access to this set of images, so you must build your models elsewhere and incorporate them into your submissions)
- sample_submission.csv - contains a row for each UID+label combination that requires a prediction: one row for each image (for which you will be predicting the existence of a pulmonary embolism within the image) and one row for each study+label combination that requires a study-level prediction.
- train.csv - contains UIDs and all labels.
- test.csv - contains UIDs.
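
The row layout described for sample_submission.csv can be sketched as below. This is a minimal sketch that assumes image-level rows are keyed by `SOPInstanceUID` alone and exam-level rows by `<StudyInstanceUID>_<label>`; verify the exact `id` format against sample_submission.csv. The helper `expected_submission_ids` and the toy UIDs are hypothetical, and the label list is taken from the exam-level fields under "Data fields".

```python
import pandas as pd

# Exam-level labels from the "Data fields" list (assumed to be the ones predicted per study)
EXAM_LABELS = [
    "negative_exam_for_pe", "rv_lv_ratio_gte_1", "rv_lv_ratio_lt_1",
    "leftsided_pe", "chronic_pe", "rightsided_pe",
    "acute_and_chronic_pe", "central_pe", "indeterminate",
]

def expected_submission_ids(test: pd.DataFrame) -> list:
    """Hypothetical helper: one id per image plus one id per (study, exam-level label)."""
    # Assumption: image-level rows are identified by the SOPInstanceUID alone
    image_ids = test["SOPInstanceUID"].tolist()
    # Assumption: exam-level rows use "<StudyInstanceUID>_<label>"
    study_ids = [
        f"{study}_{label}"
        for study in test["StudyInstanceUID"].unique()
        for label in EXAM_LABELS
    ]
    return image_ids + study_ids

# Toy test.csv with made-up UIDs: two studies, three images in total
toy_test = pd.DataFrame({
    "StudyInstanceUID": ["s1", "s1", "s2"],
    "SeriesInstanceUID": ["a", "a", "b"],
    "SOPInstanceUID": ["i1", "i2", "i3"],
})
ids = expected_submission_ids(toy_test)
print(len(ids))  # 3 images + 2 studies x 9 labels = 21
```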

### Data fields

- StudyInstanceUID - unique ID for each study (exam) in the data.
- SeriesInstanceUID - unique ID for each series within the study.
- SOPInstanceUID - unique ID for each image within the study (and data).
- pe_present_on_image - image-level, notes whether any form of PE is present on the image.
- negative_exam_for_pe - exam-level, indicates that none of the images in the study have PE present.
- qa_motion - informational, indicates whether radiologists noted an issue with motion in the study.
- qa_contrast - informational, indicates whether radiologists noted an issue with contrast in the study.
- flow_artifact - informational
- rv_lv_ratio_gte_1 - exam-level, indicates whether the RV/LV ratio present in the study is >= 1
- rv_lv_ratio_lt_1 - exam-level, indicates whether the RV/LV ratio present in the study is < 1
- leftsided_pe - exam-level, indicates that there is PE present on the left side of the images in the study
- chronic_pe - exam-level, indicates that the PE in the study is chronic
- true_filling_defect_not_pe - informational, indicates a defect that is NOT PE
- rightsided_pe - exam-level, indicates that there is PE present on the right side of the images in the study
- acute_and_chronic_pe - exam-level, indicates that the PE present in the study is both acute AND chronic
- central_pe - exam-level, indicates that there is PE present in the center of the images in the study
- indeterminate - exam-level, indicates that while the study is not negative for PE, an ultimate set of exam-level labels could not be created, due to QA issues
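
Because `pe_present_on_image` is image-level while most other labels are exam-level, one quick consistency check is to aggregate the image labels per study. This is a minimal sketch on made-up rows, assuming (as in train.csv) that each row is one image and the exam-level labels are repeated on every row of the study; `exam_has_pe_image` is a hypothetical helper. It only checks that a study marked `negative_exam_for_pe` contains no PE-positive image; it does not encode any fuller set of label-consistency rules.

```python
import pandas as pd

def exam_has_pe_image(train: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: True for each study in which at least one image has PE."""
    return train.groupby("StudyInstanceUID")["pe_present_on_image"].max().astype(bool)

# Made-up rows: study s1 has one PE-positive image, study s2 has none
toy = pd.DataFrame({
    "StudyInstanceUID": ["s1", "s1", "s2", "s2"],
    "pe_present_on_image": [0, 1, 0, 0],
    "negative_exam_for_pe": [0, 0, 1, 1],
})

has_pe = exam_has_pe_image(toy)
# Exam-level labels repeat on every image row, so take one value per study
negative = toy.groupby("StudyInstanceUID")["negative_exam_for_pe"].first().astype(bool)
# A study should never be both negative for PE and contain a PE-positive image
conflicts = has_pe & negative
print(conflicts.any())  # False
```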


In [None]:
# Install the GDCM library (pydicom uses it to decode compressed DICOM pixel data)
!conda install -c conda-forge gdcm -y

In [None]:
import os
import glob

import numpy as np  # linear algebra
import pandas as pd  # data processing, CSV file I/O (e.g. pd.read_csv)
import pydicom as dcm
import gdcm  # installed above; pydicom uses it to decode compressed DICOM pixel data
import plotly.express as px
import seaborn as sns

import matplotlib
%matplotlib inline
# Non-interactive backend; the animation below is displayed through the 'jshtml' renderer instead
matplotlib.use("Agg")
import matplotlib.pyplot as plt
from matplotlib import animation, rc

# Render matplotlib animations as interactive JavaScript/HTML in the notebook
rc('animation', html='jshtml')

TRAIN_DIR = "../input/rsna-str-pulmonary-embolism-detection/train/"
# Every training image: train/<StudyInstanceUID>/<SeriesInstanceUID>/<SOPInstanceUID>.dcm
files = glob.glob(TRAIN_DIR + '*/*/*.dcm')


In [None]:
train = pd.read_csv('/kaggle/input/rsna-str-pulmonary-embolism-detection/train.csv')
test = pd.read_csv('/kaggle/input/rsna-str-pulmonary-embolism-detection/test.csv')

In [None]:
train.head()

In [None]:
test.head()


In [None]:
def bar_plot(column_name):
    ds = train[column_name].value_counts().reset_index()
    ds.columns = ['Values', 'Total Number']
    fig = px.bar(
        ds, 
        y='Values', 
        x="Total Number", 
        orientation='h', 
        title='Bar plot of: ' + column_name,
        width=600,
        height=400
    )
    fig.show()
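
The table that `bar_plot` hands to Plotly is just `value_counts()` turned into a two-column DataFrame. The sketch below factors that step into a hypothetical helper, `label_counts`, so the counts can be inspected without plotting; it runs on a made-up column.

```python
import pandas as pd

def label_counts(df: pd.DataFrame, column_name: str) -> pd.DataFrame:
    """Hypothetical helper: the counts table that bar_plot passes to px.bar."""
    ds = df[column_name].value_counts().reset_index()
    # Rename explicitly: reset_index() names these columns differently across pandas versions
    ds.columns = ["Values", "Total Number"]
    return ds

toy = pd.DataFrame({"pe_present_on_image": [0, 0, 0, 1]})
print(label_counts(toy, "pe_present_on_image"))
```

Renaming the columns by position is what keeps `bar_plot` working whether pandas labels the reset columns `index`/`<name>` or `<name>`/`count`.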

In [None]:
col = train.columns
col

In [None]:
col[3]  # first label column; the first three columns are UIDs

# Column distributions

In [None]:
len(col) - 3  # number of label columns (all columns except the three UIDs)

In [None]:
# One bar plot per label column, skipping the three UID columns
for column in col[3:]:
    bar_plot(column)

# Columns and non-zero/zero samples

In [None]:
# Drop the three UID columns and count the non-zero (positive) records in each label column
df = train.drop(['StudyInstanceUID', 'SeriesInstanceUID', 'SOPInstanceUID'], axis=1).sum(axis=0).sort_values(ascending=False).reset_index()
df.head()

In [None]:

df.columns = ['column', 'nonzero_records']
fig = px.bar(
    df, 
    y='nonzero_records', 
    x='column', 
    orientation='v', 
    title='Columns and non zero samples', 
    height=500, 
    width=1000
)
fig.show()

# Drop the three UID columns and count the zeros in each label column
df1 = train.drop(['StudyInstanceUID', 'SeriesInstanceUID', 'SOPInstanceUID'], axis=1).sum(axis=0).sort_values(ascending=False).reset_index()
df1.columns = ['column', 'zero_records']
# The labels are binary, so zeros = total rows - ones
df1['zero_records'] = len(train) - df1['zero_records']

# Plot the bar chart
fig = px.bar(
    df1.head(50), 
    y='zero_records', 
    x='column', 
    orientation='v', 
    title='Columns and zero samples', 
    height=500, 
    width=1000
)
fig.show()
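
Since every label column is 0/1, the column sum counts the positive (non-zero) rows and `len(train)` minus that sum counts the zero rows. A minimal sketch computing both counts in one table, on made-up rows; `UID_COLUMNS` and `count_table` are illustrative names, not part of the original notebook.

```python
import pandas as pd

UID_COLUMNS = ["StudyInstanceUID", "SeriesInstanceUID", "SOPInstanceUID"]

def count_table(train: pd.DataFrame) -> pd.DataFrame:
    """Non-zero and zero counts per binary label column, most positives first."""
    labels = train.drop(columns=UID_COLUMNS)
    nonzero = labels.sum(axis=0)  # 0/1 columns: the sum is the number of 1s
    return pd.DataFrame({
        "nonzero_records": nonzero,
        "zero_records": len(labels) - nonzero,
    }).sort_values("nonzero_records", ascending=False)

# Made-up rows with three label columns
toy = pd.DataFrame({
    "StudyInstanceUID": ["s1", "s1", "s2"],
    "SeriesInstanceUID": ["a", "a", "b"],
    "SOPInstanceUID": ["i1", "i2", "i3"],
    "pe_present_on_image": [1, 0, 0],
    "negative_exam_for_pe": [0, 0, 1],
    "qa_motion": [0, 0, 0],
})
print(count_table(toy))
```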

# Correlation between the label columns

In [None]:
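# Keep only the numeric label columns; the UID columns are strings, which recent pandas versions reject in corr()
corr = train.drop(['StudyInstanceUID', 'SeriesInstanceUID', 'SOPInstanceUID'], axis=1).corr()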
corr.style.background_gradient(cmap='coolwarm')
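
With 0/1 labels, the Pearson correlation that `corr()` computes reduces to the phi coefficient, so mutually exclusive labels such as `rv_lv_ratio_gte_1` and `rv_lv_ratio_lt_1` should appear negatively correlated. In the made-up rows below the two are exact complements, which gives exactly -1; on the real data the pair need not be exact complements, so the value will generally be less extreme.

```python
import pandas as pd

# Made-up exam-level rows: the two RV/LV labels are exact complements here
toy = pd.DataFrame({
    "rv_lv_ratio_gte_1": [1, 0, 0, 1, 0],
    "rv_lv_ratio_lt_1":  [0, 1, 1, 0, 1],
    "central_pe":        [1, 0, 1, 1, 0],
})
corr = toy.corr()  # Pearson on 0/1 columns = phi coefficient
print(corr.round(2))
```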

# Viewing the scans and saving one as a GIF

### Animating a scan (inspired by https://www.kaggle.com/isaienkov/pulmonary-embolism-detection-eda)

## The glob module
The `glob` module finds all the pathnames matching a specified pattern according to the rules used by the Unix shell, although results are returned in arbitrary order. Sort the result whenever a stable order matters, for example when picking a scan by index.
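
Because `glob` makes no ordering promise, sorting its result is the simplest way to make an index such as `scans[1]` refer to the same folder on every run. A self-contained sketch that builds a throwaway `train/<study>/<series>/` tree in a temporary directory (all folder and file names are made up) and lists it with the same `*/*/` pattern the notebook uses:

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Made-up layout mirroring train/<StudyInstanceUID>/<SeriesInstanceUID>/<SOPInstanceUID>.dcm
    for study, series in [("study_b", "series_1"), ("study_a", "series_2")]:
        folder = os.path.join(root, "train", study, series)
        os.makedirs(folder)
        open(os.path.join(folder, "img.dcm"), "w").close()

    # A trailing separator in the pattern matches directories only
    scans = sorted(glob.glob(os.path.join(root, "train", "*", "*", "")))
    files = sorted(glob.glob(os.path.join(root, "train", "*", "*", "*.dcm")))
    names = [os.path.relpath(p, root) for p in scans]
    print(names)
```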



In [None]:
# Sort so that scans[i] points at the same series folder on every run
scans = sorted(glob.glob('/kaggle/input/rsna-str-pulmonary-embolism-detection/train/*/*/'))


In [None]:
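def read_scan(path):
    """Load every DICOM slice in a series folder, ordered by InstanceNumber."""
    fragments = glob.glob(os.path.join(path, '*'))

    slices = []
    for f in fragments:
        img = dcm.dcmread(f)
        # InstanceNumber is used here as the slice order within the series
        instance_number = int(img.InstanceNumber)
        slices.append((instance_number, img.pixel_array))
    # Sort on the number only; comparing pixel arrays on a tie would raise ValueError
    slices.sort(key=lambda s: s[0])
    return [s[1] for s in slices]

def animate(ims):
    """Build a FuncAnimation that steps through the slices at about 24 frames per second."""
    fig = plt.figure(figsize=(11, 11))
    plt.axis('off')
    im = plt.imshow(ims[0], cmap='gray')

    def animate_func(i):
        # Swap in the pixel data of slice i
        im.set_array(ims[i])
        return [im]

    anim = animation.FuncAnimation(fig, animate_func, frames=len(ims), interval=1000 // 24)

    return anim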
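
`read_scan` orders slices by the DICOM `InstanceNumber`. Sorting bare `(number, array)` tuples would fall back to comparing the NumPy arrays whenever two numbers tie, which raises a `ValueError`; passing a `key` avoids that. A small sketch with made-up numbers and tiny arrays standing in for `pixel_array`:

```python
import numpy as np

# Made-up (InstanceNumber, pixel data) pairs in arbitrary file order; two share number 2
slices = [
    (3, np.full((2, 2), 30)),
    (2, np.full((2, 2), 20)),
    (1, np.full((2, 2), 10)),
    (2, np.full((2, 2), 21)),
]

try:
    sorted(slices)  # tuple comparison falls through to the arrays on a tie
except ValueError as err:
    print("plain sort failed:", err)

# Sorting on the number alone never compares arrays; ties keep their read order (sort is stable)
ordered = [img for _, img in sorted(slices, key=lambda s: s[0])]
print([int(img[0, 0]) for img in ordered])  # [10, 20, 21, 30]
```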

In [None]:
movie = animate(read_scan(scans[1]))

In [None]:
movie

In [None]:
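# 'imagemagick' needs the ImageMagick binary; matplotlib's built-in 'pillow' writer also produces GIFs
movie.save('Test.gif', dpi=80, writer='imagemagick')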