# RANZCR CLiP - Catheter and Line Position Challenge

![](https://storage.googleapis.com/kaggle-competitions/kaggle/23870/logos/header.png?t=2020-12-01-04-28-05)

This is a simple exploratory data analysis for the [RANZCR CLiP - Catheter and Line Position Challenge](https://www.kaggle.com/c/ranzcr-clip-catheter-line-classification).

In this competition, you'll detect the presence and position of catheters and lines on chest x-rays. Use machine learning to train and test your model on 40,000 images to categorize a tube that is poorly placed.

The dataset has been labelled with a set of definitions to ensure consistency with labelling:
* The normal category includes lines that were appropriately positioned and did not require repositioning.
* The borderline category includes lines that would ideally require some repositioning but would in most cases still function adequately in their current position.
* The abnormal category includes lines that required immediate repositioning.

If successful, your efforts may help clinicians save lives. Earlier detection of malpositioned catheters and lines is even more important as COVID-19 cases continue to surge. Many hospitals are at capacity and more patients are in need of these tubes and lines. Quick feedback on catheter and line placement could help clinicians better treat these patients. Beyond COVID-19, detection of line and tube position will ALWAYS be a requirement in many ill hospital patients.

In [None]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import cv2
import matplotlib.pyplot as plt
import seaborn as sns
import os
import ast

In [None]:
ETT_labels = ['ETT - Abnormal', 'ETT - Borderline', 'ETT - Normal']
NGT_labels = ['NGT - Abnormal', 'NGT - Borderline', 'NGT - Incompletely Imaged', 'NGT - Normal']
CVC_labels = ['CVC - Abnormal', 'CVC - Borderline', 'CVC - Normal']
SGC_labels = ['Swan Ganz Catheter Present']

# Train csv

### Columns
* StudyInstanceUID - unique ID for each image
* ETT - Abnormal - endotracheal tube placement abnormal
* ETT - Borderline - endotracheal tube placement borderline abnormal
* ETT - Normal - endotracheal tube placement normal
* NGT - Abnormal - nasogastric tube placement abnormal
* NGT - Borderline - nasogastric tube placement borderline abnormal
* NGT - Incompletely Imaged - nasogastric tube placement inconclusive due to imaging
* NGT - Normal - nasogastric tube placement normal
* CVC - Abnormal - central venous catheter placement abnormal
* CVC - Borderline - central venous catheter placement borderline abnormal
* CVC - Normal - central venous catheter placement normal
* Swan Ganz Catheter Present
* PatientID - unique ID for each patient in the dataset

In [None]:
train = pd.read_csv('../input/ranzcr-clip-catheter-line-classification/train.csv')
train.head()
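
Because `PatientID` can repeat across rows, it is worth checking how many images each patient contributes. The sketch below uses a tiny made-up frame (`toy_train`, with invented IDs and values) in place of `train`; the same calls should work on the real frame, assuming the column names listed above.

```python
import pandas as pd

# Tiny made-up stand-in for `train`; the IDs and label values are invented.
toy_train = pd.DataFrame({
    'StudyInstanceUID': ['img_a', 'img_b', 'img_c', 'img_d'],
    'ETT - Normal': [1, 0, 0, 1],
    'CVC - Normal': [1, 1, 0, 1],
    'PatientID': ['p1', 'p1', 'p2', 'p3'],
})

n_images = toy_train['StudyInstanceUID'].nunique()
n_patients = toy_train['PatientID'].nunique()

# Number of images per patient, largest first.
images_per_patient = toy_train.groupby('PatientID').size().sort_values(ascending=False)

print(f"{n_images} images from {n_patients} patients")
print(images_per_patient)
```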

# Train annotations

These are segmentation annotations for training samples that have them. They are included solely as additional information for competitors.

In [None]:
df_annot = pd.read_csv("../input/ranzcr-clip-catheter-line-classification/train_annotations.csv")
df_annot.head()
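
Each row's `data` cell is a string that holds a list of `[x, y]` points. A minimal sketch of turning one such cell into a NumPy array, using an invented string rather than a real row from the file:

```python
import ast
import numpy as np

# Made-up example of a `data` cell: a string holding a list of [x, y] points.
raw = "[[10, 20], [15, 28], [21, 40]]"

# literal_eval safely parses the string into nested Python lists.
points = np.array(ast.literal_eval(raw))  # shape (n_points, 2)
xs, ys = points[:, 0], points[:, 1]

print(points.shape)
print(xs, ys)
```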

# But what are all these names?

In [None]:
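def image_annotations(label_type, index):
    """Show one annotated training image for `label_type`, plain and with its annotation points."""
    # Pick the image behind the index-th annotation row of this label type.
    uid = df_annot[df_annot.label == label_type].StudyInstanceUID.values[index]
    # An image can have several annotations with the same label; collect them all.
    row = df_annot[(df_annot.StudyInstanceUID == uid) & (df_annot.label == label_type)]

    # Each `data` cell is a string holding a list of [x, y] points.
    data = np.empty([0, 2])
    for i in row.data:
        row_data = np.array(ast.literal_eval(i))
        data = np.concatenate((data, row_data), axis=0)

    image_path = f"../input/ranzcr-clip-catheter-line-classification/train/{uid}.jpg"
    image = cv2.imread(image_path)
    # OpenCV loads images as BGR; matplotlib expects RGB.
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

    # Left: the original image. Right: the same image with the annotation points overlaid.
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize = (10,5))
    fig.suptitle(label_type, fontsize=20, y=1.05)
    ax1.imshow(image)
    ax2.imshow(image)
    ax2.scatter(data[:, 0], data[:, 1])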

## 1- Endotracheal Tube Placement (ETT)

> Endotracheal tubes (ETT) are wide-bore plastic tubes that are inserted into the trachea to allow artificial ventilation. Tubes come in a variety of sizes and have a balloon at the tip to ensure that gastric contents are not aspirated into the lungs. Adult tubes are usually approximately 1 cm in diameter. Tubes have a radiopaque strip within them so that they are visible on radiographs.

In [None]:
label_type = ETT_labels[2]
image_annotations(label_type, 4)
label_type = ETT_labels[1]
image_annotations(label_type, 0)
label_type = ETT_labels[0]
image_annotations(label_type, 8)

## 2- Nasogastric Tube Placement (NGT)

Nasogastric tubes (NGT) are thin, flexible tubes passed through the nose and down the oesophagus into the stomach, typically for feeding or for draining gastric contents. On a chest radiograph, a well-placed tube follows the oesophagus down the midline and its tip lies below the diaphragm, in the stomach.

In [None]:
label_type = NGT_labels[3]
image_annotations(label_type, 5)
label_type = NGT_labels[1]
image_annotations(label_type, 1)
label_type = NGT_labels[0]
image_annotations(label_type, 5)
label_type = NGT_labels[2]
image_annotations(label_type, 0)



## 3- Central Venous Catheter (CVC)

> Central venous catheters (CVC) or lines (CVL) refer to a wide range of central venous access devices but can broadly be divided into four categories. They may be inserted by medical, surgical, anesthetic/ITU, or radiology specialists. 

In [None]:
label_type = CVC_labels[2]
image_annotations(label_type, 5)
label_type = CVC_labels[1]
image_annotations(label_type, 1)
label_type = CVC_labels[0]
image_annotations(label_type, 2)

## 4- Swan Ganz Catheter Present

> Pulmonary artery catheters (or Swan-Ganz catheters) are balloon flotation catheters that can be inserted simply, quickly, with little training and without fluoroscopic guidance, at the bedside, even in the seriously ill patient. Historically they were widely used to measure right heart hemodynamic indices and pulmonary arterial and capillary wedge pressures. More recently their use has fallen out of favor, due to adverse trial data, however, they still have important niche uses.
> 
> These catheters should ideally be positioned in the proximal right or left main pulmonary artery. 

In [None]:
label_type = SGC_labels[0]
image_annotations(label_type, 3)

# Distribution of the labels

In [None]:
# Sum every label column (all columns except StudyInstanceUID and PatientID).
label_counts = train.iloc[:, 1:-1].sum(axis=0).reset_index()
label_counts.columns = ['Type', 'Observations']

sns.set(style="darkgrid")
plt.figure(figsize=(12, 8))
plt.title('Number of observations', fontsize=16)
ax = sns.barplot(data = label_counts, x = 'Type', y = 'Observations', palette="viridis")
ax.set_xticklabels(ax.get_xticklabels(), rotation=45, ha="right")

for p in ax.patches:
    height = p.get_height()
    ax.text(p.get_x() + p.get_width() / 2., height, '{:.0f}'.format(height), ha="center", weight='regular')
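
Raw counts are easier to compare as shares of all images. Since the labels are 0/1, the mean of each column is the fraction of images carrying that label. A minimal sketch on a made-up frame (`toy_train`, invented values), assuming the same layout as `train` with the ID columns first and last:

```python
import pandas as pd

# Made-up stand-in for `train`; the IDs and label values are invented.
toy_train = pd.DataFrame({
    'StudyInstanceUID': ['img_a', 'img_b', 'img_c', 'img_d'],
    'ETT - Normal': [1, 0, 0, 1],
    'NGT - Normal': [0, 0, 1, 0],
    'PatientID': ['p1', 'p1', 'p2', 'p3'],
})

# Label columns sit between the two ID columns.
label_cols = toy_train.columns[1:-1]

# Mean of a 0/1 column = share of images with that label.
prevalence = toy_train[label_cols].mean().sort_values(ascending=False)
print(prevalence)
```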

In [None]:
train_grouped = pd.DataFrame()

train_grouped['ETT'] = train[ETT_labels].sum(axis=1)
train_grouped['NGT'] = train[NGT_labels].sum(axis=1)
train_grouped['CVC'] = train[CVC_labels].sum(axis=1)
train_grouped['SGC'] = train[SGC_labels].sum(axis=1)
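
`train_grouped` holds, for each image, how many labels of each device group are set. A value above 1 means the image carries several labels for the same device type. A minimal sketch of inspecting that distribution, using invented rows in place of `train`:

```python
import pandas as pd

CVC_labels = ['CVC - Abnormal', 'CVC - Borderline', 'CVC - Normal']

# Invented rows, one per image.
toy_train = pd.DataFrame(
    [[0, 0, 1],
     [1, 0, 1],
     [0, 0, 0],
     [0, 1, 1]],
    columns=CVC_labels,
)

# Number of CVC labels set on each image.
labels_per_image = toy_train[CVC_labels].sum(axis=1)

# How many images have 0, 1, 2, ... CVC labels set.
distribution = labels_per_image.value_counts().sort_index()
print(distribution)
```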

In [None]:
sns.set(style="darkgrid")
chart_color = 'viridis'
title_fontsize = 12
suptitle_fontsize = 16
chart_w, chart_h = 3, 4

fig, (ax11, ax12, ax13) = plt.subplots(1, 3, figsize = (chart_w*3, chart_h))
fig.suptitle('Endotracheal Tube Placement', fontsize=suptitle_fontsize, y=1.05)

a1 = sns.countplot(train[ETT_labels[0]], ax=ax11, palette=chart_color)
a1.set_title(ETT_labels[0], fontsize=title_fontsize, y=1.03)

a2 = sns.countplot(train[ETT_labels[1]], ax=ax12, palette=chart_color)
a2.set_title(ETT_labels[1], fontsize=title_fontsize, y=1.03)

a3 = sns.countplot(train[ETT_labels[2]], ax=ax13, palette=chart_color)
a3.set_title(ETT_labels[2], fontsize=title_fontsize, y=1.03)

fig, (ax21, ax22, ax23, ax24) = plt.subplots(1, 4, figsize = (chart_w*4, chart_h))
fig.suptitle('Nasogastric Tube Placement', fontsize=suptitle_fontsize, y=1.05)

b1 = sns.countplot(train[NGT_labels[0]], ax=ax21, palette=chart_color)
b1.set_title(NGT_labels[0], fontsize=title_fontsize, y=1.03)

b2 = sns.countplot(train[NGT_labels[1]], ax=ax22, palette=chart_color)
b2.set_title(NGT_labels[1], fontsize=title_fontsize, y=1.03)

b3 = sns.countplot(train[NGT_labels[2]], ax=ax23, palette=chart_color)
b3.set_title(NGT_labels[2], fontsize=title_fontsize, y=1.03)

b4 = sns.countplot(train[NGT_labels[3]], ax=ax24, palette=chart_color)
b4.set_title(NGT_labels[3], fontsize=title_fontsize, y=1.03)


fig, (ax31, ax32, ax33) = plt.subplots(1, 3, figsize = (chart_w*3, chart_h))
fig.suptitle('Central Venous Catheter', fontsize=suptitle_fontsize, y=1.05)

c1 = sns.countplot(train[CVC_labels[0]], ax=ax31, palette=chart_color)
c1.set_title(CVC_labels[0], fontsize=title_fontsize, y=1.03)

c2 = sns.countplot(train[CVC_labels[1]], ax=ax32, palette=chart_color)
c2.set_title(CVC_labels[1], fontsize=title_fontsize, y=1.03)

c3 = sns.countplot(train[CVC_labels[2]], ax=ax33, palette=chart_color)
c3.set_title(CVC_labels[2], fontsize=title_fontsize, y=1.03)


fig, ax41 = plt.subplots(1, 1, figsize = (chart_w, chart_h))
fig.suptitle('Swan Ganz Catheter', fontsize=suptitle_fontsize, y=1.05)

d1 = sns.countplot(train[SGC_labels[0]], ax=ax41, palette=chart_color)
d1.set_title(SGC_labels[0], fontsize=title_fontsize, y=1.03)


for ax in [ax11, ax12, ax13, ax21, ax22, ax23, ax24, ax31, ax32, ax33, ax41]:
    for p in ax.patches:
        height = p.get_height()
        ax.text(p.get_x() + p.get_width() / 2., height, '{:1.2f}'.format(height), ha="center", weight='regular')

In [None]:
sns.set(style="white")

# Collapse the per-device label counts into 0/1 presence indicators.
train_grouped_ones = (train_grouped != 0).astype(int)

corr = train_grouped_ones.corr()
mask = np.triu(np.ones_like(corr, dtype=bool))
fig, ax = plt.subplots(figsize=(11, 9))
ax.set_title('Correlation between categories', fontsize=16)
cmap = sns.diverging_palette(230, 20, as_cmap=True)


sns.heatmap(corr, mask=mask, cmap=cmap, center=0,
            square=True, linewidths=.5, cbar_kws={"shrink": .6}, annot=True)
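
The heatmap shows the Pearson correlation between 0/1 presence indicators, which for binary columns is the same as the phi coefficient. Raw co-occurrence counts can be a useful companion to it. A minimal sketch on an invented indicator frame (`toy`), standing in for `train_grouped_ones`:

```python
import pandas as pd

# Invented 0/1 presence indicators, one row per image.
toy = pd.DataFrame({
    'ETT': [1, 1, 0, 0, 1, 0],
    'NGT': [1, 1, 0, 0, 0, 0],
    'CVC': [1, 0, 1, 1, 1, 1],
})

# Co-occurrence counts: entry (a, b) is the number of images with both a and b.
# The diagonal holds the number of images with each device.
cooccurrence = toy.T.dot(toy)

# Pearson correlation of the same indicators, as plotted in the heatmap.
corr = toy.corr()

print(cooccurrence)
print(corr.round(2))
```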

# To be continued...

# References

* [radiopaedia](https://radiopaedia.org/articles/lines-and-tubes-radiograph)

## Hey guys, this is my first notebook. If you have any tips to help me improve, I would love to hear them!