# Table of Contents
1. Introduction
2. Objectives
3. Data
4. Methods (Implementation)
5. Exploratory Data Analysis
6. Results
7. Discussion
8. Future Improvement
9. References

**1. Introduction**

When you have a broken arm, radiologists help save the day—and the bone. These doctors diagnose and treat medical conditions using imaging techniques like CT and PET scans, MRIs, and, of course, X-rays. Yet, as it happens when working with such a wide variety of medical tools, radiologists face many daily challenges, perhaps the most difficult being the chest radiograph. The interpretation of chest X-rays can lead to medical misdiagnosis, even for the best practicing doctor. Computer-aided detection and diagnosis systems (CADe/CADx) would help reduce the pressure on doctors at metropolitan hospitals and improve diagnostic quality in rural areas.

The annotations were collected via VinBigData's web-based platform, VinLab. Details on building the dataset can be found in the organizer's recent paper “VinDr-CXR: An open dataset of chest X-rays with radiologist's annotations”.

**2. Objectives**

Existing methods of interpreting chest X-ray images classify them into a list of findings. There is currently no specification of where those findings are located on the image, which sometimes leads to inexplicable results. A solution for localizing findings on chest X-ray images is needed to provide doctors with more meaningful diagnostic assistance.

In this competition, we classify common thoracic lung diseases and localize critical findings. This is an object detection and classification problem on chest X-ray images (radiographs).

**3. Data**

**3.1 Intro**

In this competition:

- Task: Automatically localize and classify 14 types of thoracic abnormalities from chest radiographs.
- Dataset: 18,000 scans, split into 15,000 training images and a test set of 3,000 images used for evaluation.

For each test image, you predict a bounding box and class for every finding. If you predict that there are no findings, you should create a prediction of "14 1 0 0 1 1" (14 is the class ID for no finding, and this provides a one-pixel bounding box with a confidence of 1.0).
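
As a minimal sketch of the prediction format described above, the snippet below joins a list of detections into one prediction string. The helper name `format_prediction_string` and the tuple layout `(class_id, confidence, x_min, y_min, x_max, y_max)` are assumptions for illustration, inferred from the "14 1 0 0 1 1" example; they are not part of the competition's tooling.

```python
# Hypothetical helper: the name and tuple layout are illustrative assumptions.
NO_FINDING = "14 1 0 0 1 1"  # class 14, confidence 1, one-pixel box (0, 0, 1, 1)


def format_prediction_string(detections):
    """Join (class_id, confidence, x_min, y_min, x_max, y_max) tuples into one string."""
    if not detections:
        # No findings predicted: fall back to the fixed "no finding" entry.
        return NO_FINDING
    parts = []
    for class_id, conf, x_min, y_min, x_max, y_max in detections:
        parts.append(f"{class_id} {conf} {x_min} {y_min} {x_max} {y_max}")
    return " ".join(parts)


print(format_prediction_string([]))
print(format_prediction_string([(0, 0.9, 100, 200, 300, 400),
                                (3, 0.5, 50, 60, 70, 80)]))
```

An image with several findings simply concatenates one six-value group per finding, separated by spaces.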

The images are in DICOM format, which means they contain additional metadata that might be useful for visualization and classification.

**3.2 Dataset information**

The dataset comprises 18,000 postero-anterior (PA) CXR scans in DICOM format, which were de-identified to protect patient privacy. All images were labeled by a panel of experienced radiologists for the presence of 14 critical radiographic findings.

The findings and their class IDs are listed below (click a name for further information):

0 - [Aortic enlargement](https://en.wikipedia.org/wiki/Aortic_aneurysm) <br>
1 - [Atelectasis](https://en.wikipedia.org/wiki/Atelectasis) <br>
2 - [Calcification](https://en.wikipedia.org/wiki/Calcification) <br>
3 - [Cardiomegaly](https://en.wikipedia.org/wiki/Cardiomegaly) <br>
4 - [Consolidation](https://en.wikipedia.org/wiki/Pulmonary_consolidation) <br>
5 - [ILD](https://en.wikipedia.org/wiki/Interstitial_lung_disease) <br>
6 - [Infiltration](https://en.wikipedia.org/wiki/Infiltration_(medical)) <br>
7 - [Lung Opacity](https://en.wikipedia.org/wiki/Ground-glass_opacity) <br>
8 - [Nodule/Mass](https://en.wikipedia.org/wiki/Lung_nodule) <br>
9 - Other lesion <br>
10 - [Pleural effusion](https://en.wikipedia.org/wiki/Pleural_effusion) <br>
11 - [Pleural thickening](https://en.wikipedia.org/wiki/Pleural_thickening) <br>
12 - [Pneumothorax](https://en.wikipedia.org/wiki/Pneumothorax) <br>
13 - [Pulmonary fibrosis](https://en.wikipedia.org/wiki/Pulmonary_fibrosis#:~:text=Pulmonary%20fibrosis%20is%20a%20condition,%2C%20pneumothorax%2C%20and%20lung%20cancer.) <br>
14 - No Finding

The "No finding" observation (14) was intended to capture the absence of all findings above.
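
For use in code, the class list above can be written as a lookup table. This is only a sketch: the name `CLASS_NAMES` is an assumption, and the spelling of each name is copied from the list in this section (with "No finding" in the casing the notebook's plotting code compares against), so it should be checked against the `class_name` column of `train.csv` before relying on it.

```python
# Hypothetical lookup table; IDs and names are copied from the list in this section.
CLASS_NAMES = {
    0: "Aortic enlargement",
    1: "Atelectasis",
    2: "Calcification",
    3: "Cardiomegaly",
    4: "Consolidation",
    5: "ILD",
    6: "Infiltration",
    7: "Lung Opacity",
    8: "Nodule/Mass",
    9: "Other lesion",
    10: "Pleural effusion",
    11: "Pleural thickening",
    12: "Pneumothorax",
    13: "Pulmonary fibrosis",
    14: "No finding",  # absence of all 14 findings above
}

# The 14 abnormality classes are every ID except 14.
ABNORMAL_IDS = [class_id for class_id in CLASS_NAMES if class_id != 14]
print(len(ABNORMAL_IDS), CLASS_NAMES[14])
```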

**4. Methods (Implementation)**

In [None]:
import os
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib
import pydicom as dicom
import cv2

import warnings
warnings.filterwarnings("ignore")

In [None]:
path = '/kaggle/input/vinbigdata-chest-xray-abnormalities-detection/'
os.listdir(path)

In [None]:
train_data = pd.read_csv(path+'train.csv')
samp_subm = pd.read_csv(path+'sample_submission.csv')

**5. Exploratory Data Analysis**

In [None]:
print('Number train samples:', len(train_data.index))
print('Number test samples:', len(samp_subm.index))

In [None]:
fig, ax = plt.subplots(1, 1, figsize=(12, 4))
# Count annotations per class once and reuse the result
class_counts = train_data['class_name'].value_counts()
ax.bar(class_counts.index, class_counts.values)
# Rotate the tick labels without overriding the categorical tick locator
ax.tick_params(axis='x', labelrotation=90)
ax.set_title('Distribution of the labels')
plt.grid()
plt.show()

As we can see, the dataset is imbalanced.
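
One way to put a number on the imbalance is the ratio between the most and least frequent classes. The sketch below runs on a small made-up label list rather than the real annotations, and the function name `imbalance_ratio` is an assumption for illustration.

```python
from collections import Counter


def imbalance_ratio(labels):
    """Return (count of the most common label) / (count of the least common label)."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


# Toy labels for illustration only; these are not the real class frequencies.
toy_labels = ["No finding"] * 8 + ["Cardiomegaly"] * 3 + ["Pneumothorax"] * 1
print(imbalance_ratio(toy_labels))  # 8.0
```

On the notebook's data, the same calculation could be applied to `train_data['class_name'].tolist()`.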

In [None]:
# Read DICOM Files
idnum = 2
image_id = train_data.loc[idnum, 'image_id']
data_file = dicom.dcmread(path+'train/'+image_id+'.dicom')
img = data_file.pixel_array

In [None]:
print(data_file)

In [None]:
print('Image shape:', img.shape)

In [None]:
bbox = [train_data.loc[idnum, 'x_min'],
        train_data.loc[idnum, 'y_min'],
        train_data.loc[idnum, 'x_max'],
        train_data.loc[idnum, 'y_max']]
fig, ax = plt.subplots(1, 1, figsize=(20, 4))
ax.imshow(img, cmap='gray')
p = matplotlib.patches.Rectangle((bbox[0], bbox[1]),
                                 bbox[2]-bbox[0],
                                 bbox[3]-bbox[1],
                                 ec='r', fc='none', lw=2.)
ax.add_patch(p)
plt.show()

In [None]:
def plot_train_data(idx_list):
    fig, axs = plt.subplots(1, 3, figsize=(15, 10))
    fig.subplots_adjust(hspace = .1, wspace=.1)
    axs = axs.ravel()
    for i in range(3):
        image_id = train_data.loc[idx_list[i], 'image_id']
        data_file = dicom.dcmread(path+'train/'+image_id+'.dicom')
        img = data_file.pixel_array
        axs[i].imshow(img, cmap='gray')
        axs[i].set_title(train_data.loc[idx_list[i], 'class_name'])
        axs[i].set_xticklabels([])
        axs[i].set_yticklabels([])
        if train_data.loc[idx_list[i], 'class_name'] != 'No finding':
            bbox = [train_data.loc[idx_list[i], 'x_min'],
                    train_data.loc[idx_list[i], 'y_min'],
                    train_data.loc[idx_list[i], 'x_max'],
                    train_data.loc[idx_list[i], 'y_max']]
            p = matplotlib.patches.Rectangle((bbox[0], bbox[1]),
                                             bbox[2]-bbox[0],
                                             bbox[3]-bbox[1],
                                             ec='r', fc='none', lw=2.)
            axs[i].add_patch(p)

In [None]:
for num in range(15):
    idx_list = train_data[train_data['class_id']==num][0:3].index.values
    plot_train_data(idx_list)

**6. Results**


In [None]:
#samp_subm.to_csv('submission1.csv', index=False)

In [None]:
pred_2class = pd.read_csv("../input/vinbigdata-2class-prediction/2-cls test pred.csv")
low_threshold = 0.001
high_threshold = 0.87
pred_2class

In [None]:
NORMAL = "14 1 0 0 1 1"

pred_det_df = pd.read_csv("../input/vinbigdatastack/submission_postprocessed.csv")
n_normal_before = len(pred_det_df.query("PredictionString == @NORMAL"))
merged_df = pd.merge(pred_det_df, pred_2class, on="image_id", how="left")


if "target" in merged_df.columns:
    merged_df["class0"] = 1 - merged_df["target"]

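# Counters for the three outcomes: keep, add, replace
c0, c1, c2 = 0, 0, 0
for i in range(len(merged_df)):
    p0 = merged_df.loc[i, "class0"]
    if p0 < low_threshold:
        # Keep: leave the detector's predictions unchanged
        c0 += 1
    elif low_threshold <= p0 < high_threshold:
        # Add: append a "no finding" entry with confidence p0
        merged_df.loc[i, "PredictionString"] += f" 14 {p0} 0 0 1 1"
        c1 += 1
    else:
        # Replace: discard the detections and predict "no finding" only
        merged_df.loc[i, "PredictionString"] = NORMAL
        c2 += 1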

In [None]:
n_normal_after = len(merged_df.query("PredictionString == @NORMAL"))
print(
    f"n_normal: {n_normal_before} -> {n_normal_after} with threshold {low_threshold} & {high_threshold}"
)
print(f"Keep {c0} Add {c1} Replace {c2}")
submission_filepath = "submission.csv"
submission_df = merged_df[["image_id", "PredictionString"]]
submission_df.to_csv(submission_filepath, index=False)
print(f"Saved to {submission_filepath}")
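
The thresholding loop above applies a three-way rule to each image. As a sketch, the same rule can be written as a small pure function, which makes it easy to check on individual values. The function name `apply_two_class_filter` and the parameter name `p_normal` are assumptions; it also assumes that `class0` is the two-class model's probability that the image has no finding, which is what the loop's use of it suggests.

```python
NORMAL = "14 1 0 0 1 1"


def apply_two_class_filter(pred_string, p_normal, low=0.001, high=0.87):
    """Hypothetical helper mirroring the keep / add / replace rule of the loop."""
    if p_normal < low:
        # Keep: leave the detector's predictions unchanged.
        return pred_string
    if p_normal < high:
        # Add: append a "no finding" entry with confidence p_normal.
        return f"{pred_string} 14 {p_normal} 0 0 1 1"
    # Replace: predict "no finding" only.
    return NORMAL


detections = "0 0.9 100 200 300 400"
print(apply_two_class_filter(detections, 0.0005))
print(apply_two_class_filter(detections, 0.5))
print(apply_two_class_filter(detections, 0.95))
```

The default thresholds are the `low_threshold` and `high_threshold` values set earlier in this section. The function could be applied row by row, for example with a list comprehension over `zip(merged_df["PredictionString"], merged_df["class0"])`.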

**7. Discussion**

**8. Future Improvement**


**9. References**

1. https://www.kaggle.com/kyawkyaw/vinbigdata-chest-x-ray-abnormalities-classifier