# RANZCR Tracheal Bifurcations Datasets

This notebook uses two datasets available for RANZCR:
* [raddar's bifurcation location predictions](https://www.kaggle.com/raddar/ranzcr-clip-tracheal-bifurcation)
* [Dr Konya's 5k manual annotations](https://www.kaggle.com/sandorkonya/5k-trachea-bifurcation-on-chest-xray)

The 5K tracheal bifurcation annotations are converted from JSON to CSV, with raddar's predicted locations added, and saved as notebook output.

ETT visualisations show examples of the RANZCR intubation annotation together with the tracheal bifurcation points from both datasets.

In [None]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import os

import ast
import cv2
import matplotlib.pyplot as plt
%matplotlib inline

In [None]:
train = pd.read_csv('../input/ranzcr-clip-catheter-line-classification/train.csv')
train_annotations = pd.read_csv('../input/ranzcr-clip-catheter-line-classification/train_annotations.csv')

In [None]:
trncols = train.columns.values
target_cols = trncols[1:-1]  # Target Columns from train

https://www.kaggle.com/raddar/ranzcr-clip-tracheal-bifurcation 

This dataset contains tracheal (bronchial) bifurcation location predictions from a YOLOv3 detector trained on several thousand hand-labelled images from external data sources. Bifurcation points serve as a reference for deciding whether an intubation tube has been inserted correctly. Intubation is considered normal when the tube tip is at least 3 cm above the bifurcation point; a smaller distance is considered abnormal.
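
The 3 cm rule can be sketched as a small check. This is only an illustration under stated assumptions: `ett_tip_status` and `mm_per_pixel` are hypothetical names, the pixel spacing is a placeholder because this notebook has no physical scale for the images, and "above" is approximated by the vertical image distance. The Borderline class is not covered.

```python
def ett_tip_status(tip_xy, bifurcation_xy, mm_per_pixel, min_gap_mm=30.0):
    """Classify tube placement from the tip-to-bifurcation gap.

    Image y grows downwards, so a tip above the bifurcation has a smaller y.
    mm_per_pixel is an assumed, externally supplied pixel spacing.
    """
    gap_px = bifurcation_xy[1] - tip_xy[1]   # positive when the tip is above
    gap_mm = gap_px * mm_per_pixel
    return "normal" if gap_mm >= min_gap_mm else "abnormal"


# Hypothetical coordinates and spacing, for illustration only
print(ett_tip_status((1500, 900), (1520, 1200), mm_per_pixel=0.14))   # gap of about 42 mm
print(ett_tip_status((1500, 1100), (1520, 1200), mm_per_pixel=0.14))  # gap of about 14 mm
```

Without a reliable pixel spacing, the pixel gap can still be used for relative comparisons between images of similar scale, but not as a distance in centimetres.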

In [None]:
tbif = pd.read_csv('../input/ranzcr-clip-tracheal-bifurcation/RANZCR_CLiP_tracheal_bifurcation.csv')
len(tbif)  # all entries in train

In [None]:
tbif.head()

https://www.kaggle.com/sandorkonya/5k-trachea-bifurcation-on-chest-xray

The dataset contains 5281 manually annotated tracheal bifurcations on X-rays from the current challenge's dataset.


In [None]:
tbif5k = pd.read_json('../input/5k-trachea-bifurcation-on-chest-xray/trachea_annotations.json', orient='index')
tbif5k.reset_index(drop=True, inplace=True)
len(tbif5k)

In [None]:
# add StudyInstanceUID from jpg filename
tbif5k['StudyInstanceUID'] = tbif5k['filename'].apply(lambda x: x.split('.jpg')[0])

In [None]:
def get_point_xy(x):
    cx = 0
    cy = 0
    # skip empty regions
    if len(x)>=1:
        xd = x[0]  # dict from regions   
        cx = xd['shape_attributes']['cx'] 
        cy = xd['shape_attributes']['cy']
   
    return pd.Series([cx,cy])

In [None]:
# note 2 entries with empty regions - set cx cy columns as 0s default
tbif5k['cx'] = 0
tbif5k['cy'] = 0

tbif5k[['cx', 'cy']] = tbif5k.apply(lambda row: get_point_xy(row.regions), axis=1)
tbif5k.head()
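
To make the extraction step easier to follow, here is a minimal sketch on made-up records. It assumes the layout implied by `get_point_xy`: each entry has a `regions` list whose first item holds `shape_attributes` with `cx` and `cy`. The helper name `first_point`, the file names, and the coordinates are hypothetical.

```python
def first_point(regions):
    """Return (cx, cy) of the first region, or (0, 0) when regions is empty."""
    if not regions:
        return 0, 0
    attrs = regions[0]["shape_attributes"]
    return attrs["cx"], attrs["cy"]


# Hypothetical records in the layout the notebook reads from the JSON file
records = [
    {"filename": "uid_a.jpg",
     "regions": [{"shape_attributes": {"cx": 1312, "cy": 1045}}]},
    {"filename": "uid_b.jpg", "regions": []},   # image without an annotated point
]

points = [first_point(r["regions"]) for r in records]
print(points)  # [(1312, 1045), (0, 0)]
```

Because (0, 0) is only a placeholder for a missing annotation, rows carrying it are best filtered out before comparing annotated and predicted points.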

Make a copy of the 5K tracheal bifurcation annotations for output as CSV, dropping the columns that are not needed.

Add raddar's tracheal bifurcation predictions for comparison.

In [None]:
tbif5kout = tbif5k.copy()
tbif5kout.drop(['filename', 'size', 'regions', 'file_attributes'], axis=1, inplace=True)
tbif5kout = pd.merge(tbif5kout, tbif, on=['StudyInstanceUID'], how='left')
tbif5kout.head()
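
A left merge keeps every annotated image even when no prediction matches, so it can be worth checking for unmatched rows. The sketch below uses toy stand-ins for the two tables; the UIDs and coordinates are made up, and the string format of `tracheal_bifurcation` is an assumption.

```python
import pandas as pd

# Toy stand-ins for the manual annotations and the predictions; values are made up
manual = pd.DataFrame({"StudyInstanceUID": ["a", "b", "c"],
                       "cx": [100, 200, 300], "cy": [110, 210, 310]})
predicted = pd.DataFrame({"StudyInstanceUID": ["a", "b"],
                          "tracheal_bifurcation": ["(101, 112)", "(198, 215)"]})

# validate raises if a UID is duplicated on either side; indicator adds a _merge column
merged = pd.merge(manual, predicted, on="StudyInstanceUID", how="left",
                  validate="one_to_one", indicator=True)

# Rows present only in the manual annotations have no prediction attached
unmatched = merged.loc[merged["_merge"] == "left_only", "StudyInstanceUID"].tolist()
print(unmatched)  # ['c']
```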

Write the 5K tracheal bifurcation annotation points, together with the predicted points, to CSV.

In [None]:
tbif5kout.to_csv('ranzcr_5K_tracheal_bifurcation_annotations.csv', index=False)

Merge the 5K annotations with train to get the target columns for the visualisations.

In [None]:
tbif5kout = pd.merge(tbif5kout, train, on=['StudyInstanceUID'], how='left')
tbif5kout.head()

In [None]:
def f_table(list1):
    table_dic = {}
    for i in list1:
        if i not in table_dic.keys():
            table_dic[i] = 1
        else:
            table_dic[i] += 1
    return(table_dic)

Determine which targets are present in the 5K dataset.

In [None]:
# count the positive (== 1) labels per target; taking the smaller class count
# instead would misreport targets that are mostly positive or never positive
tar_freq = tbif5kout[target_cols].sum().astype(int).values
tbif5ktarg = pd.DataFrame({'target': target_cols, 'count': tar_freq})
tbif5ktarg.head(11)
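
When reading a table of per-target counts, it helps to be clear whether a number is the count of positive labels or the size of the smaller class. The two differ for a target that is positive in most rows, or never positive. A minimal sketch on made-up labels (the column names are hypothetical):

```python
import pandas as pd

# Toy labels: 'rare' is mostly 0, 'common' is mostly 1, 'absent' is never 1
labels = pd.DataFrame({"rare":   [0, 0, 0, 1],
                       "common": [1, 1, 1, 0],
                       "absent": [0, 0, 0, 0]})

positives = labels.sum()                                    # number of 1s per column
minority = labels.apply(lambda c: c.value_counts().min())   # size of the smaller class

print(positives.to_dict())  # {'rare': 1, 'common': 3, 'absent': 0}
print(minority.to_dict())   # {'rare': 1, 'common': 1, 'absent': 4}
```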

In [None]:
# ETT UIDs in 5K
abnorm5k_uid = tbif5kout.loc[tbif5kout['ETT - Abnormal']==1,'StudyInstanceUID'].tolist()
border5k_uid = tbif5kout.loc[tbif5kout['ETT - Borderline']==1, 'StudyInstanceUID'].tolist()  
norm5k_uid   = tbif5kout.loc[tbif5kout['ETT - Normal']==1,'StudyInstanceUID'].tolist()

In [None]:
# ETT UIDs in annotations for Borderline
borderann_uid = train_annotations.loc[train_annotations.label=='ETT - Borderline', 'StudyInstanceUID'].tolist()
bordset = set(borderann_uid).intersection(border5k_uid)
borderlist = sorted(bordset)  # sorted: set order is not reproducible between runs
len(borderlist)

In [None]:
# ETT UIDs in annotations for Normal
normann_uid = train_annotations.loc[train_annotations.label=='ETT - Normal', 'StudyInstanceUID'].tolist()
normset = set(normann_uid).intersection(norm5k_uid)
normlist = sorted(normset)  # sorted: set order is not reproducible between runs
len(normlist)

# Tracheal Bifurcations Visualisation for ETT

In [None]:
# ref https://www.kaggle.com/raddar/simple-ett-bifurcation-visualization
def plot_xray(StudyInstanceUID, label):
    """
    intubation as green (if annotation exists)
    bifurcation as red (from raddar's predicted tracheal bifurcation)
    bifurc 5k as blue (from 5k annotations)
    """
    annot = train_annotations.loc[(train_annotations.StudyInstanceUID==StudyInstanceUID) & (train_annotations.label==label), 'data']
    row = tbif5kout.loc[tbif5kout.StudyInstanceUID==StudyInstanceUID].iloc[0]
    img = cv2.imread('../input/ranzcr-clip-catheter-line-classification/train/'+StudyInstanceUID+'.jpg')
    # cast to plain Python ints: some OpenCV versions reject numpy integers as a circle centre
    bifurc_5k = (int(row['cx']), int(row['cy']))
    bifurcation = tuple(int(v) for v in ast.literal_eval(row['tracheal_bifurcation']))
    if len(annot) > 0:
        # first point of the annotation for this label
        intubation = ast.literal_eval(annot.values[0])[0]
        img = cv2.circle(img, tuple(intubation), 50, (0,255,0), 10)
    # plt.imshow reads the colour tuples as RGB: (255,0,0) is red, (0,0,255) is blue
    img = cv2.circle(img, bifurcation, 50, (255,0,0), 10)
    img = cv2.circle(img, bifurc_5k, 50, (0,0,255), 10)

    plt.figure(figsize=(12,12))
    plt.title(label = (StudyInstanceUID + '   ' + label))
    plt.imshow(img)
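
Besides comparing the two bifurcation markers by eye, their offset can be put into a number. The sketch below is an illustration, not part of either dataset: `bifurcation_offset` is a hypothetical helper, the coordinates are made up, and it assumes the predicted point is stored as text that parses to an (x, y) pair, as the use of `ast.literal_eval` in this notebook suggests.

```python
import ast
import math


def bifurcation_offset(manual_xy, predicted_str):
    """Pixel distance between a manual point and a stringified predicted point."""
    px, py = ast.literal_eval(predicted_str)
    return math.hypot(manual_xy[0] - px, manual_xy[1] - py)


# Made-up example: manual annotation at (1312, 1045), prediction stored as text
print(bifurcation_offset((1312, 1045), "(1315, 1049)"))  # 5.0
```

Rows whose manual point is the (0, 0) placeholder for an empty annotation should be excluded before computing such offsets.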

Plot some examples for ETT - Abnormal 

The intubation point is drawn in green if an annotation exists. The bifurcation is drawn in blue for the 5K annotations and in red for raddar's predictions.


In [None]:
plot_xray(abnorm5k_uid[1],'ETT - Abnormal' )

In [None]:
plot_xray(abnorm5k_uid[5],'ETT - Abnormal' )

Plot some examples for ETT - Borderline

In [None]:
plot_xray(borderlist[30],'ETT - Borderline' )

In [None]:
plot_xray(borderlist[55],'ETT - Borderline' )

Plot some examples for ETT - Normal

In [None]:
plot_xray(normlist[100],'ETT - Normal' )

In [None]:
plot_xray(normlist[400],'ETT - Normal' )