<center><h1 style="color:blue">VinBigData Chest X-ray Abnormalities Detection</h1></center>
<center><h1 style="color:red">Automatically localize and classify thoracic abnormalities from chest radiographs</h1></center>
<img src="https://storage.googleapis.com/kaggle-competitions/kaggle/24800/logos/header.png?t=2020-12-17-19-26-15">

# **Competition**

- **Task**: Automatically localize and classify <span style="color:red">14 types of thoracic abnormalities</span> from chest radiographs. 
- **Dataset**: <span style="color:red">18,000 scans</span> in total: <span style="color:blue">15,000 train images</span>, with submissions evaluated on a <span style="color:blue">test set of 3,000 images</span>.

- **Annotations**: Collected via VinBigData's web-based platform, VinLab. Details on building the dataset can be found in the organizers' paper [“VinDr-CXR: An open dataset of chest X-rays with radiologist's annotations”](https://storage.googleapis.com/kaggle-media/competitions/VinBigData/VinDr_CXR_data_paper.pdf).

We are classifying common thoracic lung diseases and localizing critical findings. This is **<span style="color:green"> an object detection and classification </span>** problem.

In [None]:
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.patches as ptc
import seaborn as sns
import pydicom
from pydicom.pixel_data_handlers.util import apply_voi_lut
from skimage import exposure
import warnings

warnings.filterwarnings('ignore')

# **Load training data (train.csv)**

> For each test image, you will be predicting a bounding box and class for all findings. If you predict that there are no findings, you should create a prediction of "14 1 0 0 1 1" (14 is the class ID for no finding, and this provides a one-pixel bounding box with a confidence of 1.0).
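
The prediction format quoted above can be sketched as a small helper. This is only an illustration: the function name `format_prediction_string` and the tuple layout are assumptions made here, and joining several findings with spaces in the same field order (class, confidence, box corners) is inferred from the "no finding" example rather than stated in the quote.

```python
# "No finding" fallback, copied from the competition text quoted above:
# class 14, confidence 1, one-pixel box (0, 0, 1, 1).
NO_FINDING = "14 1 0 0 1 1"


def format_prediction_string(detections):
    """Build a prediction string from hypothetical detection tuples.

    detections: list of (class_id, confidence, x_min, y_min, x_max, y_max).
    Assumption: several findings are concatenated with single spaces.
    """
    if not detections:
        return NO_FINDING
    parts = []
    for class_id, conf, x_min, y_min, x_max, y_max in detections:
        parts.append(
            f"{int(class_id)} {conf:.3f} "
            f"{int(x_min)} {int(y_min)} {int(x_max)} {int(y_max)}"
        )
    return " ".join(parts)


empty_case = format_prediction_string([])
single_case = format_prediction_string([(0, 0.5, 10, 20, 30, 40)])
```

An image with no detections falls back to the one-pixel "no finding" box, so every test image still gets a non-empty prediction.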

The images are in <span style="color:red"> DICOM format</span>, which means they contain additional data that might be useful for visualizing and classifying.

In [None]:
df_train = pd.read_csv("../input/vinbigdata-chest-xray-abnormalities-detection/train.csv")
print(df_train.shape)
df_train.head()

In [None]:
df_train.isna().sum().to_frame().rename(columns={0:"NA Counts"}).style.background_gradient(cmap="summer")

In [None]:
df_train.nunique().to_frame().rename(columns={0:"Unique Values"}).style.background_gradient(cmap="plasma")

# **EDA**

In [None]:
plt.figure(figsize=(20, 8))
sns.set_style('darkgrid')
sns.countplot(y="class_name", data=df_train, palette='Set2')
plt.title("Class Name Distribution", weight='bold', fontsize=22)
plt.ylabel("Class Name", weight='bold')
plt.xlabel("Count", weight='bold')
plt.show()

In [None]:
plt.figure(figsize=(20, 8))
sns.set_style('darkgrid')
sns.countplot(y="rad_id", data=df_train)
plt.title("Rad ID Distribution", weight='bold', fontsize=22)
plt.ylabel("Rad ID", weight='bold')
plt.xlabel("Count", weight='bold')
plt.show()

# **Load DICOM data**

1. Reference: https://www.kaggle.com/raddar/convert-dicom-to-np-array-the-correct-way
1. Reference: https://www.kaggle.com/raddar/popular-x-ray-image-normalization-techniques

In [None]:
train_dir = "../input/vinbigdata-chest-xray-abnormalities-detection/train"
test_dir = "../input/vinbigdata-chest-xray-abnormalities-detection/test"

In [None]:
def read_xray(path, voi_lut=True, fix_monochrome=True):
    # dcmread is the current pydicom reader; read_file is a deprecated alias
    dcm_data = pydicom.dcmread(path)
    
    def show_dcm_info(data):
        print("Gender :", data.PatientSex)
        if 'PixelData' in data:
            rows = int(data.Rows)
            cols = int(data.Columns)
            print("Image size : {rows:d} x {cols:d}, {size:d} bytes".format(rows=rows, cols=cols, size=len(data.PixelData)))
            if 'PixelSpacing' in data:
                print("Pixel spacing :", data.PixelSpacing)
    
    show_dcm_info(dcm_data)
    
    # The VOI LUT (when the DICOM file provides one) transforms raw pixel data into a "human-friendly" view
    if voi_lut:
        data = apply_voi_lut(dcm_data.pixel_array, dcm_data)
    else:
        data = dcm_data.pixel_array
               
    # With PhotometricInterpretation == MONOCHROME1 the X-ray looks inverted, so flip the intensities:
    if fix_monochrome and dcm_data.PhotometricInterpretation == "MONOCHROME1":
        data = np.amax(data) - data
        
    data = data - np.min(data)
    data = data / np.max(data)
    data = (data * 255).astype(np.uint8)
        
    return data
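
The last steps of `read_xray` (optional inversion, then min-max scaling to 8 bits) can be tried on a tiny synthetic array without any DICOM file. This is a minimal sketch: the helper name `to_uint8` and the zero-range guard are additions made here for illustration and are not part of the function above.

```python
import numpy as np


def to_uint8(pixels, invert=False):
    """Min-max scale an array to 0..255.

    invert=True mimics the MONOCHROME1 flip (max - value).
    """
    data = pixels.astype(np.float64)
    if invert:
        data = data.max() - data
    data = data - data.min()
    peak = data.max()
    if peak > 0:  # guard added here: a constant image would divide by zero
        data = data / peak
    return (data * 255).astype(np.uint8)


# Made-up 2x2 "scan" with a 12-bit-like value range
raw = np.array([[100, 600], [1100, 4100]], dtype=np.uint16)
scaled = to_uint8(raw)
flipped = to_uint8(raw, invert=True)
```

Because the scaling is relative to each image's own minimum and maximum, two scans with different raw ranges end up on the same 0..255 scale.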

## **Visualizations of DICOM Images**

1. Raw Image
1. Histogram Normalized Image
1. CLAHE Normalized Image
1. Annotated Image
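
To make the "Histogram Normalized" view less of a black box, here is a NumPy-only sketch of global histogram equalization on a synthetic low-contrast image. It illustrates the idea behind `exposure.equalize_hist` but is not scikit-image's implementation (which returns floats in [0, 1]); the function name `equalize_hist_uint8` is hypothetical. CLAHE (`exposure.equalize_adapthist`) applies a similar mapping per local tile with contrast limiting and is not reproduced here.

```python
import numpy as np


def equalize_hist_uint8(img):
    """Global histogram equalization for an 8-bit grayscale image."""
    hist = np.bincount(img.ravel(), minlength=256)
    # Cumulative fraction of pixels at or below each grey level
    cdf = hist.cumsum() / img.size
    # Map every pixel through the CDF, then rescale to 0..255
    return (cdf[img] * 255).astype(np.uint8)


# Synthetic low-contrast image: grey levels 100..119 only
low_contrast = np.tile(np.arange(100, 120, dtype=np.uint8), (20, 1))
equalized = equalize_hist_uint8(low_contrast)

range_before = int(low_contrast.max()) - int(low_contrast.min())
range_after = int(equalized.max()) - int(equalized.min())
```

The narrow band of input grey levels is spread over most of the 8-bit range, which is why equalized radiographs show more visible contrast.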

In [None]:
# Sample one annotated row directly instead of retrying; "No finding" rows are skipped as before
row = df_train[df_train['class_name'] != 'No finding'].sample(n=1).iloc[0]
img = read_xray(os.path.join(train_dir, row['image_id'] + ".dicom"))

fig, ax = plt.subplots(2, 2, figsize=(20, 20))
ax[0][0].imshow(img, 'gray')
ax[0][0].set_title("Raw Image", fontsize=15)

# Global histogram equalization
ax[0][1].imshow(exposure.equalize_hist(img), 'gray')
ax[0][1].set_title("Histogram Normalized Image", fontsize=15)

# Contrast Limited Adaptive Histogram Equalization (CLAHE)
ax[1][1].imshow(exposure.equalize_adapthist(img), 'gray')
ax[1][1].set_title("CLAHE Normalized Image", fontsize=15)

# Draw the bounding box of the sampled annotation row
x_min, y_min, x_max, y_max = row[['x_min', 'y_min', 'x_max', 'y_max']]
patch = ptc.Rectangle((x_min, y_min), x_max - x_min, y_max - y_min, ec='r', fc='none', lw=2.)
ax[1][0].imshow(img, 'gray')
ax[1][0].add_patch(patch)
ax[1][0].set_title("Annotated Image", fontsize=15)

plt.suptitle('DICOM Image', fontsize=25)
plt.show()
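
The cell above draws the box from a single annotation row. Assuming `train.csv` holds one row per annotation, so that the same `image_id` can appear several times (for example from different `rad_id` values), the boxes for one image could be collected as sketched below. The helper name `boxes_for_image` and the toy rows are made up for illustration.

```python
import pandas as pd


def boxes_for_image(df, image_id):
    """Return every annotated box for one image, skipping 'No finding' rows."""
    mask = (df["image_id"] == image_id) & (df["class_name"] != "No finding")
    cols = ["class_name", "rad_id", "x_min", "y_min", "x_max", "y_max"]
    return df.loc[mask, cols].reset_index(drop=True)


# Toy stand-in for df_train; the values are invented for this example
toy = pd.DataFrame({
    "image_id": ["a", "a", "b"],
    "class_name": ["Cardiomegaly", "Aortic enlargement", "No finding"],
    "rad_id": ["R1", "R2", "R1"],
    "x_min": [10.0, 5.0, None],
    "y_min": [20.0, 6.0, None],
    "x_max": [110.0, 50.0, None],
    "y_max": [220.0, 60.0, None],
})

boxes_a = boxes_for_image(toy, "a")
boxes_b = boxes_for_image(toy, "b")
```

Each returned row could then be passed to `ptc.Rectangle` in a loop to overlay all annotations on the same axes.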

# **This work is in progress. Feel free to <span style="color:red"> Upvote </span> and give <span style="color:blue"> Feedback </span>.**