# About:

When you have a broken arm, radiologists help save the day, and the bone. These doctors diagnose and treat medical conditions using imaging techniques like CT and PET scans, MRIs, and, of course, X-rays. Yet, as it happens when working with such a wide variety of medical tools, radiologists face many daily challenges, perhaps the most difficult being the chest radiograph. The interpretation of chest X-rays can lead to medical misdiagnosis, even for the best practicing doctor. Computer-aided detection and diagnosis systems (CADe/CADx) would help reduce the pressure on doctors at metropolitan hospitals and improve diagnostic quality in rural areas.

Existing methods of interpreting chest X-ray images classify them into a list of findings. They currently do not specify where those findings are located on the image, which sometimes leads to inexplicable results. A solution for localizing findings on chest X-ray images is needed to give doctors more meaningful diagnostic assistance.

# Data

In this competition, we are classifying common thoracic lung diseases and localizing critical findings. This is an object detection and classification problem.

For each test image, you will be predicting a bounding box and class for all findings. If you predict that there are no findings, you should create a prediction of "14 1 0 0 1 1" (14 is the class ID for no finding, and this provides a one-pixel bounding box with a confidence of 1.0).
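
The prediction format described above can be sketched as a small helper. This is only an illustration: the function name `format_prediction` is hypothetical, and the field order inside each box (`x_min y_min x_max y_max`) and the space-separated joining of several findings are assumptions inferred from the "14 1 0 0 1 1" example.

```python
def format_prediction(boxes):
    """Build a prediction string from findings.

    `boxes` is a list of (class_id, confidence, x_min, y_min, x_max, y_max)
    tuples. The coordinate order is an assumption, see the text above.
    """
    if not boxes:
        # "No finding": class 14, confidence 1, one-pixel box from (0, 0) to (1, 1)
        return "14 1 0 0 1 1"
    return " ".join(
        f"{cls} {conf} {x_min} {y_min} {x_max} {y_max}"
        for cls, conf, x_min, y_min, x_max, y_max in boxes
    )


no_finding = format_prediction([])
one_finding = format_prediction([(3, 0.9, 10, 20, 30, 40)])
```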

The images are in DICOM format, which means they contain additional data that might be useful for visualizing and classifying.

**Dataset information**

The dataset comprises 18,000 postero-anterior (PA) CXR scans in DICOM format, which were de-identified to protect patient privacy. All images were labeled by a panel of experienced radiologists for the presence of 14 critical radiographic findings as listed below:

* 0 - Aortic enlargement
* 1 - Atelectasis
* 2 - Calcification
* 3 - Cardiomegaly
* 4 - Consolidation
* 5 - ILD
* 6 - Infiltration
* 7 - Lung Opacity
* 8 - Nodule/Mass
* 9 - Other lesion
* 10 - Pleural effusion
* 11 - Pleural thickening
* 12 - Pneumothorax
* 13 - Pulmonary fibrosis

The "No finding" observation (14) was intended to capture the absence of all findings above.
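
For convenience, the class list above can be kept as a lookup table. The names `CLASS_NAMES` and `NAME_TO_ID` are hypothetical helpers and are not used elsewhere in this notebook.

```python
# Class IDs and names exactly as listed above, plus 14 for "No finding"
CLASS_NAMES = {
    0: "Aortic enlargement",
    1: "Atelectasis",
    2: "Calcification",
    3: "Cardiomegaly",
    4: "Consolidation",
    5: "ILD",
    6: "Infiltration",
    7: "Lung Opacity",
    8: "Nodule/Mass",
    9: "Other lesion",
    10: "Pleural effusion",
    11: "Pleural thickening",
    12: "Pneumothorax",
    13: "Pulmonary fibrosis",
    14: "No finding",
}

# Reverse lookup: class name -> class ID
NAME_TO_ID = {name: class_id for class_id, name in CLASS_NAMES.items()}
```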



# DICOM to NumPy ([From here](http://www.kaggle.com/raddar/convert-dicom-to-np-array-the-correct-way))

In [None]:
import pydicom
from pydicom.pixel_data_handlers.util import apply_voi_lut

import numpy as np
import pandas as pd 

import matplotlib.pyplot as plt
import matplotlib.patches as patches
%matplotlib inline

import seaborn as sns

from random import randint

In [None]:
def read_xray(path, voi_lut = True, fix_monochrome = True):
    # dcmread is the current name; read_file is a deprecated alias in newer pydicom releases
    dicom = pydicom.dcmread(path)
    
    # VOI LUT (if available by DICOM device) is used to transform raw DICOM data to "human-friendly" view
    if voi_lut:
        data = apply_voi_lut(dicom.pixel_array, dicom)
    else:
        data = dicom.pixel_array
               
    # depending on this value, X-ray may look inverted - fix that:
    if fix_monochrome and dicom.PhotometricInterpretation == "MONOCHROME1":
        data = np.amax(data) - data
        
    data = data - np.min(data)
    data = data / np.max(data)
    data = (data * 255).astype(np.uint8)
        
    return data
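
To see what the inversion and rescaling steps do without needing a DICOM file, here is a minimal NumPy-only sketch on a synthetic 12-bit array. The helper `to_uint8` is hypothetical and mirrors only the last part of `read_xray`; the guard against a constant image is an addition, not part of the original function.

```python
import numpy as np


def to_uint8(pixels, invert=False):
    """Rescale a pixel array to the 0..255 range, optionally inverting it first."""
    data = pixels.astype(np.float64)
    if invert:
        # MONOCHROME1 images are stored inverted; flip them around the maximum
        data = data.max() - data
    data = data - data.min()
    peak = data.max()
    if peak > 0:  # avoid dividing by zero on a constant image
        data = data / peak
    return (data * 255).astype(np.uint8)


raw = np.array([[0, 2048], [1024, 4095]], dtype=np.uint16)
scaled = to_uint8(raw)
inverted = to_uint8(raw, invert=True)
```

After rescaling, the darkest pixel maps to 0 and the brightest to 255; with `invert=True` the two swap.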

# Plot boxes

In [None]:
def plot_dicom(path):
    img = read_xray(path)
    plt.figure(figsize = (12,12))
    plt.imshow(img, 'gray')
plot_dicom('/kaggle/input/vinbigdata-chest-xray-abnormalities-detection/test/f923cd5cb2daf790272dbef850f5647b.dicom')

In [None]:
data = pd.read_csv('/kaggle/input/vinbigdata-chest-xray-abnormalities-detection/train.csv')
data.head(10)


In [None]:
data[data['image_id']=='051132a778e61a86eb147c7c6f564dfe']

We see that one image may appear many times in the CSV file, with one row per annotated abnormality.
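
A quick way to quantify this is to count rows and distinct classes per image. The sketch below uses a small made-up DataFrame (`toy`) with the same `image_id` and `class_name` columns as `train.csv`; the values are illustrative only.

```python
import pandas as pd

# Toy stand-in for train.csv: image "a" has three annotation rows, image "b" has one
toy = pd.DataFrame({
    "image_id": ["a", "a", "a", "b"],
    "class_name": ["Cardiomegaly", "Aortic enlargement", "Cardiomegaly", "No finding"],
})

# Number of annotation rows per image
rows_per_image = toy.groupby("image_id").size()

# Number of distinct abnormality classes per image
classes_per_image = toy.groupby("image_id")["class_name"].nunique()
```

The same two lines should work on the real `data` frame in place of `toy`.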

Let's make a function to plot borders.

In [None]:
def plot_borders(imageid,data):
    img = read_xray('/kaggle/input/vinbigdata-chest-xray-abnormalities-detection/train/{}.dicom'.format(imageid))
    infos = data[data['image_id'] == imageid]
    fig = plt.figure(figsize = (12,12)) 
    class_ids = infos['class_id'].unique()

    label2color = {class_id:[randint(0,255)/255 for i in range(3)] for class_id in class_ids}
  
    ax = fig.add_subplot(111) 
    for index, row in infos.iterrows():
        # Create a Rectangle patch.
        x_min = row['x_min']
        x_max = row['x_max']
        y_min = row['y_min']
        y_max = row['y_max']
        color = label2color[row['class_id']]
        # Draw the rectangle
        rect = patches.Rectangle((x_min,y_min),x_max-x_min,y_max-y_min,linewidth=2,edgecolor= color,facecolor='none',label=row['class_name'])

        # Add the patch to the Axes
        ax.add_patch(rect)

    # Build the legend once, after every box has been added
    ax.legend()
    plt.imshow(img, 'gray')
plot_borders('051132a778e61a86eb147c7c6f564dfe',data)

In [None]:
#Another example
plot_borders('9a5094b2563a1ef3ff50dc5c7ff71345',data)

In [None]:
#The last one
plot_borders('afb6230703512afc370f236e8fe98806',data)

In [None]:
plt.figure(figsize=(22,5))
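# Pass the column by keyword; recent seaborn versions treat the first positional argument as `data`
sns.countplot(x='class_name', data=data)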

The data is imbalanced: a lot of images have no findings at all. Among the abnormalities, the most frequent ones are Cardiomegaly, Aortic enlargement, Pleural thickening, and Pulmonary fibrosis. Because of this, we will need some techniques to counter the class imbalance.
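
One common option for handling imbalance (not necessarily the one this notebook will end up using) is to weight each class inversely to its frequency. The sketch below is a minimal illustration on made-up labels; `inverse_frequency_weights` is a hypothetical helper that uses the `n_samples / (n_classes * count)` heuristic.

```python
import pandas as pd


def inverse_frequency_weights(labels):
    """Return {class_name: weight}, where rarer classes get larger weights."""
    counts = labels.value_counts()
    # weight = n_samples / (n_classes * class_count)
    weights = len(labels) / (len(counts) * counts)
    return weights.to_dict()


# Toy labels: 6 x "No finding", 3 x "Cardiomegaly", 1 x "Nodule/Mass"
toy_labels = pd.Series(["No finding"] * 6 + ["Cardiomegaly"] * 3 + ["Nodule/Mass"])
weights = inverse_frequency_weights(toy_labels)
```

Such weights could then be passed to a weighted loss or a weighted sampler, depending on the model.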

1. I will try to gain some insight into the coordinates of the anomalies, and check whether there is any correlation between the abnormality type and its location.

In [None]:
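# numeric_only=True skips non-numeric columns such as image_id, which recent pandas versions refuse to average
mean_coordinates = data.groupby('class_name').mean(numeric_only=True)
mean_coordinates['image_id'] = '50a418190bc3fb1ef1633bf9678929b3' # this is an image with no findings; we use it only as a background to show the mean coordinates
mean_coordinates['class_name'] = mean_coordinates.index
# reset_index returns a new DataFrame, so the result has to be assigned
mean_coordinates = mean_coordinates.reset_index(drop=True)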
plot_borders('50a418190bc3fb1ef1633bf9678929b3',mean_coordinates)
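
Beyond the mean corners, box centers and areas per class can also hint at where each abnormality tends to sit. The following is a sketch on a made-up DataFrame (`toy`) using the same column names as `train.csv`. Note that raw pixel coordinates are only comparable across images of similar size, which is an assumption here.

```python
import pandas as pd

# Toy annotations with the same coordinate columns as train.csv
toy = pd.DataFrame({
    "class_name": ["Cardiomegaly", "Cardiomegaly", "Nodule/Mass"],
    "x_min": [100.0, 120.0, 400.0],
    "y_min": [500.0, 520.0, 100.0],
    "x_max": [300.0, 320.0, 440.0],
    "y_max": [700.0, 720.0, 140.0],
})

# Derive the center and area of every box
toy["x_center"] = (toy["x_min"] + toy["x_max"]) / 2
toy["y_center"] = (toy["y_min"] + toy["y_max"]) / 2
toy["area"] = (toy["x_max"] - toy["x_min"]) * (toy["y_max"] - toy["y_min"])

# Average the derived columns per class
per_class = toy.groupby("class_name")[["x_center", "y_center", "area"]].mean()
```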

# Crop images containing anomalies:

In [None]:
def crop(imageid):
    
    w=12
    h=12
    fig=plt.figure(figsize=(9, 25))
    rows = 1
    
    img = read_xray('/kaggle/input/vinbigdata-chest-xray-abnormalities-detection/train/{}.dicom'.format(imageid))
    infos = data[data['image_id'] == imageid]
    columns = infos.shape[0]

    i=1
    for index, row in infos.iterrows():
        x_min = int(row['x_min'])
        x_max = int(row['x_max'])
        y_min = int(row['y_min'])
        y_max = int(row['y_max'])
        label = row['class_name']
        
        # NumPy images are indexed [row, column], i.e. [y, x]
        abnormality = img[y_min:y_max, x_min:x_max]
        fig.add_subplot(rows, columns, i)
        plt.imshow(abnormality, 'gray')
        plt.title(label)
        i=i+1
    plt.tight_layout() 
    plt.show()
crop('9a5094b2563a1ef3ff50dc5c7ff71345')
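
When slicing a NumPy image with box coordinates, the first index selects rows (y) and the second selects columns (x). The small sketch below checks this on a synthetic array; `crop_box` is a hypothetical helper, not part of the notebook.

```python
import numpy as np


def crop_box(img, x_min, y_min, x_max, y_max):
    """Return the region of `img` inside the box; rows are y, columns are x."""
    return img[y_min:y_max, x_min:x_max]


# Synthetic "image": 6 rows (height) by 8 columns (width)
img = np.arange(6 * 8).reshape(6, 8)
patch = crop_box(img, x_min=2, y_min=1, x_max=5, y_max=3)
```

Here `patch` has height `y_max - y_min` and width `x_max - x_min`, and its top-left value is the pixel at row 1, column 2 of `img`.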