# About the competition and dataset
 
This competition is in the healthcare domain and asks you to detect the position of catheters and lines on chest x-rays.

### What is a catheter?
A catheter is a soft, hollow tube passed into the body for a specific purpose. For example, a urinary catheter drains urine from the body when a person cannot empty their bladder in the normal way. If a catheter is not placed in the right position, it can cause serious complications.

Below is an image of a catheter:
![Chest catheter](https://5.imimg.com/data5/EI/OF/UN/SELLER-134485/chest-drainage-catheter-250x250.jpg)

The competition dataset covers catheters and tubes placed in the upper part of the body, whose position can be checked on a chest x-ray.

According to the competition description, chest radiographs are the gold standard for confirming tube positions. A human expert, however, has to check each chest x-ray manually to confirm that the tubes are in the optimal position, which takes a lot of time. The goal of this competition is therefore to train machine learning models that predict, from a chest x-ray, whether a tube is placed in the optimal position. Early detection of malpositioned catheters is even more important during this pandemic, as the number of COVID-19 patients is increasing.

### Getting familiar with the dataset

The dataset uses many biological terms, which may be confusing if you haven't revisited your biology lessons in a long time, so we will go through them together.

The dataset contains three types of tubes for which we will have to make predictions.

1. **Endotracheal tube (ETT)** - A flexible plastic tube placed through the mouth into the trachea (or windpipe), the pipe-like structure through which air passes when we breathe. It is used to help a patient breathe.

<!--
<img align="left" src="https://med.stanford.edu/content/dam/sm/ctsurgery/images/clinical-care/thoracic-images/trachea-airway/tracheal-stenosis-2.jpg" style="width: 300px; height: 300px"/>
<img align="right" src="https://dm3omg1n1n7zx.cloudfront.net/rcni/static/journals/ns/30/35/ns.30.35.36.s46/graphic/ns_v30_n35_46_0002.jpg" style="width: 300px; height: 300px"/>
-->
<img src="https://dm3omg1n1n7zx.cloudfront.net/rcni/static/journals/ns/30/35/ns.30.35.36.s46/graphic/ns_v30_n35_46_0002.jpg" style="width: 300px; height: 300px"/>

2. **Nasogastric tube (NGT)** - A flexible rubber or plastic tube passed through the nose and down the esophagus (the tube connecting the throat with the stomach) into the stomach. It is used to deliver food and medicine to the stomach through the nose.
<img src="https://www.oxfordmedicaleducation.com/wp-content/uploads/2015/04/Diagram_showing_the_position_of_a_nasogastric_tube_CRUK_340.png" style="width: 300px; height: 300px"/>

3. **Central venous catheter (CVC)** - A thin, flexible tube inserted through a vein, usually below the collarbone, and guided into the superior vena cava, a large vein above the right side of the heart. A CVC is used to give fluids, blood or other drugs.

<img src="https://upload.wikimedia.org/wikipedia/commons/6/60/Blausen_0181_Catheter_CentralVenousAccessDevice_NonTunneled.png" style="width: 400px; height: 300px"/>


### Going deeper into the dataset

The ultimate goal of the competition is that, given a chest x-ray image, our model predicts which of the following labels apply. Each label has its own column in train.csv, so more than one label can apply to the same image:

1. **ETT - Abnormal (endotracheal tube placement abnormal)** --> The endotracheal tube is positioned incorrectly and requires immediate repositioning.

2. **ETT - Borderline (endotracheal tube placement borderline)** --> The endotracheal tube may require some repositioning, but in most cases it will work fine in the current position.

3. **ETT - Normal (endotracheal tube placement normal)** --> The endotracheal tube is placed correctly and no repositioning is required.

4. **NGT - Abnormal (nasogastric tube placement abnormal)** --> The nasogastric tube is placed incorrectly and requires immediate repositioning.

5. **NGT - Borderline (nasogastric tube placement borderline)** --> The nasogastric tube may require some repositioning, but it will still work fine in the current position.

6. **NGT - Incompletely Imaged (nasogastric tube placement inconclusive due to imaging)** --> It is not possible to conclude whether the nasogastric tube requires repositioning, because of problems with the imaging.

7. **NGT - Normal (nasogastric tube placement normal)** --> The nasogastric tube is placed correctly and no repositioning is required.

8. **CVC - Abnormal (central venous catheter placement abnormal)** --> The central venous catheter is placed incorrectly and requires immediate repositioning.

9. **CVC - Borderline (central venous catheter placement borderline)** --> The catheter may require some repositioning, but it still works fine in the current position.

10. **CVC - Normal (central venous catheter placement normal)** --> The catheter is placed correctly and no repositioning is required.

11. **Swan Ganz Catheter Present** --> A Swan Ganz catheter is present in the image. This is a thin tube passed into the right side of the heart and the arteries leading to the lungs. It is used to monitor the heart's function, blood flow, and pressures in and around the heart.

<img src="https://medlineplus.gov/ency/images/ency/fullsize/18087.jpg" style="width: 400px; height: 300px"/>
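To make the label layout concrete, here is a minimal sketch of how one x-ray's labels could be written as a row of 0/1 values, one per label. It assumes the label columns are named exactly as in the list above; `to_multi_hot` and the example x-ray are hypothetical and only for illustration.

```python
import pandas as pd

# The 11 label names, written as in the list above.
LABELS = [
    'ETT - Abnormal', 'ETT - Borderline', 'ETT - Normal',
    'NGT - Abnormal', 'NGT - Borderline', 'NGT - Incompletely Imaged', 'NGT - Normal',
    'CVC - Abnormal', 'CVC - Borderline', 'CVC - Normal',
    'Swan Ganz Catheter Present',
]

def to_multi_hot(present_labels):
    """Return a 0/1 Series over LABELS with 1 for every label in present_labels."""
    unknown = set(present_labels) - set(LABELS)
    if unknown:
        raise ValueError(f"Unknown labels: {sorted(unknown)}")
    return pd.Series([int(name in present_labels) for name in LABELS], index=LABELS)

# Hypothetical x-ray showing a correctly placed ETT and a borderline CVC.
example = to_multi_hot(['ETT - Normal', 'CVC - Borderline'])
print(example[example == 1])
```

Under this encoding a single image can carry several 1s at once, for example one for each tube visible in the x-ray.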

### Understanding the dataset structure

1. **train** --> This folder contains all the chest x-ray images for training the model.

2. **train.csv** --> This CSV file contains the image id (`StudyInstanceUID`), the patient id (`PatientID`) and all the labels described in the section above. Here is a sample from the train.csv file:

In [None]:
import pandas as pd
train = pd.read_csv('../input/ranzcr-clip-catheter-line-classification/train.csv')
train.head()

3. **train_annotations.csv** --> This CSV file contains the position of the catheter tube (the 'data' column) in the x-ray image.

In [None]:
from ast import literal_eval
annot = pd.read_csv('../input/ranzcr-clip-catheter-line-classification/train_annotations.csv')
annot['data'] = annot['data'].apply(literal_eval)
annot.head()

# Basic EDA

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

from ast import literal_eval

In [None]:
train = pd.read_csv('../input/ranzcr-clip-catheter-line-classification/train.csv')
annot = pd.read_csv('../input/ranzcr-clip-catheter-line-classification/train_annotations.csv')
train.head()

In [None]:
annot.head()

In [None]:
train.shape, annot.shape

**As noted earlier, train.csv holds the image id and patient id along with the labels, while train_annotations.csv holds image ids with their corresponding annotation points. The two files have different numbers of rows, so the annotations do not map one-to-one onto the training images: some images may have no annotation, and an image may have more than one.**
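One way to check how annotation rows relate to images is to compare the image ids in the two files. The sketch below uses small made-up DataFrames (`toy_train`, `toy_annot`) in place of the real files, so the ids and counts are purely illustrative.

```python
import pandas as pd

# Toy stand-ins for train.csv and train_annotations.csv; the ids are made up.
toy_train = pd.DataFrame({'StudyInstanceUID': ['img_a', 'img_b', 'img_c', 'img_d']})
toy_annot = pd.DataFrame({'StudyInstanceUID': ['img_a', 'img_a', 'img_c']})

# True for every training image that appears at least once in the annotations.
has_annotation = toy_train['StudyInstanceUID'].isin(toy_annot['StudyInstanceUID'])
n_annotated = int(has_annotation.sum())

# Number of annotation rows per annotated image.
rows_per_image = toy_annot['StudyInstanceUID'].value_counts()

print(f"{n_annotated} of {len(toy_train)} images have at least one annotation row")
print(f"Max annotation rows for a single image: {rows_per_image.max()}")
```

Running the same two checks on the real `train` and `annot` DataFrames would show how many images are annotated and whether any image has several annotation rows.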

In [None]:
# info() prints its summary and returns None, so call it once per DataFrame
train.info()
annot.info()

In [None]:
annot['data'][0][0]

**From the above output, we can see that the 'data' column is stored as a string, so indexing into it returns a single character instead of a point. Let's parse it into a list of points.**

In [None]:
annot['data'] = annot['data'].apply(literal_eval)
annot['data'][0][0], annot['data'].head()
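The difference between the raw and the parsed column is easier to see on a tiny example. This sketch uses a made-up one-row DataFrame (`toy`) with invented coordinates, and assumes the 'data' column stores a list of `[x, y]` pairs written out as text.

```python
from ast import literal_eval

import pandas as pd

# Toy stand-in for train_annotations.csv; the coordinates are made up.
toy = pd.DataFrame({'data': ['[[10, 20], [30, 40], [50, 60]]']})

# Before parsing, each cell is a str, so indexing gives one character.
raw = toy['data'][0]
print(type(raw), raw[0])

# After parsing, each cell is a list, so indexing gives an [x, y] pair.
parsed = toy['data'].apply(literal_eval)
first_point = parsed[0][0]
print(type(parsed[0]), first_point)
```

`literal_eval` only evaluates Python literals such as lists and numbers, which makes it a safer choice than `eval` for text read from a file.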

In [None]:
print(f"Out of {train.shape[0]} images there are {train['PatientID'].nunique()} unique patients")

In [None]:
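# Number of x-ray images per patient
patient_count = train['PatientID'].value_counts()

# Distribution of that number across patients
plt.figure(figsize=(10, 6))
sns.countplot(x=patient_count)
plt.xlabel('X-rays per patient')
plt.ylabel('Number of patients');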

We can see that there are multiple x-rays for some of the patients.
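The same observation can be put into numbers by counting the patients who have more than one x-ray. A minimal sketch on made-up data (`toy_train` and its ids are hypothetical):

```python
import pandas as pd

# Toy stand-in for train.csv; the image and patient ids are made up.
toy_train = pd.DataFrame({
    'StudyInstanceUID': ['img_a', 'img_b', 'img_c', 'img_d', 'img_e'],
    'PatientID': ['p1', 'p1', 'p2', 'p3', 'p1'],
})

# X-rays per patient, then the subset of patients with more than one.
images_per_patient = toy_train['PatientID'].value_counts()
repeat_patients = images_per_patient[images_per_patient > 1]

print(f"{len(repeat_patients)} of {len(images_per_patient)} patients have more than one x-ray")
print(f"Largest number of x-rays for one patient: {images_per_patient.max()}")
```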

In [None]:
label_count = train.drop(['StudyInstanceUID', 'PatientID'], axis=1).sum(axis=0)

sns.barplot(x=label_count.index, y=label_count)
plt.xticks(rotation=90);

**For each of the categories (ETT, NGT and CVC), normal images outnumber abnormal images.**

In [None]:
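normal, borderline, abnormal = 0, 0, 0
for name in label_count.index:
    lowered = name.lower()
    # Check 'abnormal' first, because the word 'abnormal' also contains 'normal'
    if 'abnormal' in lowered:
        abnormal += label_count[name]
    elif 'borderline' in lowered:
        borderline += label_count[name]
    elif 'normal' in lowered:
        normal += label_count[name]
    # 'NGT - Incompletely Imaged' and 'Swan Ganz Catheter Present' belong to
    # none of the three groups, so they are left out of the totals

sns.barplot(x=['normal_imgs', 'borderline_imgs', 'abnormal_imgs'],
            y=[normal, borderline, abnormal]);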

**We can conclude that, overall, normal placements outnumber abnormal placements.**
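Raw counts can also be turned into shares, which makes the imbalance easier to compare across labels. The sketch below assumes the label columns hold 0/1 values, so the mean of a column is the fraction of images carrying that label. `toy_train` is made up and includes only three of the label columns.

```python
import pandas as pd

# Toy stand-in for train.csv with three of the label columns; values are made up.
toy_train = pd.DataFrame({
    'StudyInstanceUID': ['img_a', 'img_b', 'img_c', 'img_d'],
    'PatientID': ['p1', 'p1', 'p2', 'p3'],
    'ETT - Abnormal': [0, 0, 0, 1],
    'ETT - Borderline': [0, 0, 1, 0],
    'ETT - Normal': [1, 1, 0, 0],
})

# Every column except the two id columns is treated as a label.
label_cols = toy_train.columns.drop(['StudyInstanceUID', 'PatientID'])

# Mean of a 0/1 column = fraction of images that carry the label.
label_share = toy_train[label_cols].mean().sort_values(ascending=False)
print(label_share)
```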

### Visualizing the chest x-rays

In [None]:
def plot_image(img_id):
    """Show an x-ray next to the same image with its first annotation drawn on top."""
    img_path = '../input/ranzcr-clip-catheter-line-classification/train/' + img_id + '.jpg'
    img = plt.imread(img_path)

    # Use the first annotation row for this image ('data' must already be parsed)
    row = annot[annot['StudyInstanceUID'] == img_id].iloc[0]
    annotation = row['data']
    label = str(row['label'])

    # Split the [x, y] points into separate coordinate lists
    x = [point[0] for point in annotation]
    y = [point[1] for point in annotation]

    plt.figure(figsize=(10, 6))

    # Left panel: the x-ray as is
    plt.subplot(1, 2, 1)
    plt.title('Actual Image')
    plt.imshow(img, cmap='gray')
    plt.axis('off')

    # Right panel: the x-ray with the annotated tube traced in red
    plt.subplot(1, 2, 2)
    plt.title(label)
    plt.imshow(img, cmap='gray')
    plt.axis('off')
    plt.plot(x, y, 'r-', linewidth=4)
    plt.show()

start_idx = 100
end_idx = start_idx+10
for i in range(start_idx, end_idx, 2):
    plot_image(annot['StudyInstanceUID'][i])
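`plot_image` draws only the first annotation row it finds for an image. If an image has several annotation rows, all of them could be collected first and then drawn in turn. The helper below is a hypothetical sketch of that collection step on made-up data. It assumes the 'data' column has already been parsed into lists of `[x, y]` points.

```python
import pandas as pd

# Toy stand-in for the parsed annotations; ids, labels and points are made up.
toy_annot = pd.DataFrame({
    'StudyInstanceUID': ['img_a', 'img_a', 'img_b'],
    'label': ['CVC - Normal', 'ETT - Normal', 'NGT - Normal'],
    'data': [[[1, 2], [3, 4]], [[5, 6], [7, 8]], [[9, 10], [11, 12]]],
})

def lines_for_image(annotations, img_id):
    """Return a list of (label, xs, ys) tuples, one per annotation row of img_id."""
    rows = annotations[annotations['StudyInstanceUID'] == img_id]
    lines = []
    for label, points in zip(rows['label'], rows['data']):
        xs = [point[0] for point in points]
        ys = [point[1] for point in points]
        lines.append((label, xs, ys))
    return lines

lines = lines_for_image(toy_annot, 'img_a')
for label, xs, ys in lines:
    print(label, xs, ys)
```

Each tuple could then be passed to `plt.plot(xs, ys, label=label)` on top of the x-ray to draw every annotated tube in one figure.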
