<div align='center'><font size="5" color='#353B47'>Chest-X-ray</font></div>
<div align='center'><font size="4" color="#353B47">Starter exploration and data explanations</font></div>
<br>
<hr>

The objective of this notebook is to give some insight into common thoracic diseases and their localisation on chest X-rays.

# <div id="summary">Summary</div>

**<font size="2"><a href="#chap1">1. Load libraries</a></font>**
**<br><font size="2"><a href="#chap2">2. Quick Analysis</a></font>**
**<br><font size="2"><a href="#chap3">3. Extracting meta information from .dicom files</a></font>**
**<br><font size="2"><a href="#chap4">4. Preprocessing meta information</a></font>**
**<br><font size="2"><a href="#chap5">5. Fusion of boxings</a></font>**
**<br><font size="2"><a href="#chap6">6. Presentation of each radiographic observation</a></font>**

# <div id="chap1">1. Load libraries</div>

In [None]:
import warnings
warnings.filterwarnings('ignore')

from matplotlib.patches import Rectangle
import numpy as np
import pandas as pd
import os
import re
import random
import matplotlib.pyplot as plt
import plotly
import plotly.graph_objects as go
import plotly.express as px
from pydicom import dcmread
from tqdm import tqdm
import multiprocessing as mp

In [None]:
# Path to dataframe and image folders
PATH = "../input/vinbigdata-chest-xray-abnormalities-detection"

# Import trainset
train_dataframe = pd.read_csv(os.path.join(PATH, 'train.csv'))

# <div id="chap2">2. Quick Analysis</div>

## Careful: image_id is not unique!

In [None]:
train_dataframe.image_id.value_counts()

In [None]:
len(train_dataframe.image_id.unique())

This means that an image can appear on several rows: several radiologists gave their own bounding-box coordinates for the same image.
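To make the duplication concrete, here is a minimal sketch on a made-up annotation table. The image ids and radiologist labels are invented; only the column names `image_id`, `rad_id` and `class_id` follow `train.csv`. It counts annotation rows and distinct radiologists per image.

```python
import pandas as pd

# Toy annotations: values are invented, column names follow train.csv
toy = pd.DataFrame({
    "image_id": ["img_a", "img_a", "img_a", "img_b"],
    "rad_id":   ["R1", "R2", "R2", "R3"],
    "class_id": [0, 0, 3, 14],
})

per_image = toy.groupby("image_id").agg(
    n_rows=("rad_id", "size"),             # annotation rows for the image
    n_radiologists=("rad_id", "nunique"),  # distinct annotators for the image
)
print(per_image)
```

An image with more rows than radiologists simply means that one radiologist drew several boxes on it.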

## Distribution of critical radiographic observations

There are 15 different radiographic observations which correspond to: 

0 - Aortic enlargement
<br>1 - Atelectasis
<br>2 - Calcification
<br>3 - Cardiomegaly
<br>4 - Consolidation
<br>5 - ILD
<br>6 - Infiltration
<br>7 - Lung Opacity
<br>8 - Nodule/Mass
<br>9 - Other lesion
<br>10 - Pleural effusion
<br>11 - Pleural thickening
<br>12 - Pneumothorax
<br>13 - Pulmonary fibrosis
<br>14 - No finding
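For plot labels it can be handy to keep this list as a lookup table. The sketch below copies the ids and names from the list above; `CLASS_NAMES` and `class_name` are hypothetical names that are not used elsewhere in this notebook.

```python
# Class ids and names, copied from the list above
CLASS_NAMES = {
    0: "Aortic enlargement",
    1: "Atelectasis",
    2: "Calcification",
    3: "Cardiomegaly",
    4: "Consolidation",
    5: "ILD",
    6: "Infiltration",
    7: "Lung Opacity",
    8: "Nodule/Mass",
    9: "Other lesion",
    10: "Pleural effusion",
    11: "Pleural thickening",
    12: "Pneumothorax",
    13: "Pulmonary fibrosis",
    14: "No finding",
}

def class_name(class_id):
    """Return the readable name for a class id given as an int or a digit string."""
    return CLASS_NAMES[int(class_id)]

print(class_name(14))
```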

Number of radiographic observations among all train images

In [None]:
def plot_distribution_classes(x_values, y_values, title):
    
    colors = ['rgb(26, 118, 255)',] * 15
    colors[0] = 'lightslategray'

    fig = go.Figure(data=[go.Bar(
        x=x_values, 
        y=y_values,
        text=y_values,
        marker_color=colors
    )])

    fig.update_layout(height=600, width=900, title_text=title)
    fig.update_xaxes(type="category")

    fig.show()

In [None]:
# value_counts() returns counts sorted in descending order and indexed by class_id,
# so each label stays paired with its own count
counts = train_dataframe.class_id.value_counts()

x = list(counts.index)
y = list(counts.values)

plot_distribution_classes(x, y, 
                          title="Distribution of radiographic observations")

The gray bar shows the 31818 observations for which radiologists reported no finding.

In [None]:
train_dataframe.info()

In [None]:
train_dataframe.head()

## Distribution of radiologist observations

Number of annotations (rows of the train set) per radiologist

In [None]:
def plot_distribution_radiologist_obs(x_values, y_values, title):
    
    colors = ['lightslategray',] * 17
    colors[0] = 'crimson'
    colors[1] = 'crimson'
    colors[2] = 'crimson'
    
    fig = go.Figure(data=[go.Bar(
        x=x_values, 
        y=y_values,
        text=y_values,
        marker_color=colors
    )])

    fig.update_layout(height=600, width=900, title_text=title)
    fig.update_xaxes(type="category")

    fig.show()

In [None]:
# Count the annotation rows of each radiologist
indexes = train_dataframe[['rad_id', 'image_id']].groupby(['rad_id']).agg(['count']).index
counts = train_dataframe[['rad_id', 'image_id']].groupby(['rad_id']).agg(['count']).values.ravel()

sorted_dict = dict(zip(indexes, counts))
sorted_dict = {k: v for k, v in sorted(sorted_dict.items(), key=lambda item: item[1], reverse = True)}

x = list(sorted_dict.keys())
y = list(sorted_dict.values())

plot_distribution_radiologist_obs(x, y,
                                  title = "Distribution of radiologist observations")

--------

**<font size="2"><a href="#summary">Back to summary</a></font>**

# <div id="chap3">3. Extracting meta information from .dicom files</div>

In [None]:
ds = dcmread(os.path.join(PATH, 'train', '000434271f63a053c4128a0ba6352c7f.dicom'))
ds

In [None]:
def infer_one_observation(file_path):
    
    "Read one .dicom file and return its metadata as a flat dict of strings"
    
    ds = dcmread(file_path)
    image_id = os.path.basename(file_path)
        
    observation_dict = {}
    observation_dict['image_id'] = image_id.split(sep=".")[0]
    
    # Dataset.keys() lists the tags without relying on the private _dict attribute
    file_meta_keys = list(ds.file_meta.keys())
    remaining_meta_keys = list(ds.keys())
    
    for key in file_meta_keys:
        observation_dict[str(key)] = str(ds.file_meta[key].value)
        
    # Skip the pixel data element (7fe0, 0010): only metadata is kept
    for key in remaining_meta_keys:
        if key != (0x7fe0, 0x0010):
            observation_dict[str(key)] = str(ds[key].value)
        
    return observation_dict

In [None]:
mapper_dict = {'image_id':'image_id',
               '(0002, 0000)':"File Meta Information Group Length",
               '(0002, 0001)':"File Meta Information Version",
               '(0002, 0002)':"Media Storage SOP Class UID",
               '(0002, 0003)':"Media Storage SOP Instance UID",
               '(0002, 0010)':"Transfer Syntax UID",
               '(0002, 0012)':"Implementation Class UID",
               '(0002, 0013)':"Implementation Version Name",
               '(0002, 0016)':"Source Application Entity Title",
               '(0010, 0040)':"Patient's Sex",
               '(0010, 1010)':"Patient's Age",
               '(0010, 1020)':"Patient's Size",
               '(0010, 1030)':"Patient's Weight",
               '(0028, 0002)':"Samples per Pixel",
               '(0028, 0004)':"Photometric Interpretation",
               '(0028, 0008)':"Number of Frames",
               '(0028, 0010)':"Rows",
               '(0028, 0011)':"Columns",
               '(0028, 0030)':"Pixel Spacing",
               '(0028, 0034)':"Pixel Aspect Ratio",
               '(0028, 0100)':"Bits Allocated",
               '(0028, 0101)':"Bits Stored",
               '(0028, 0102)':"High Bit",
               '(0028, 0103)':"Pixel Representation",
               '(0028, 0106)':"Smallest Image Pixel Value",
               '(0028, 0107)':"Largest Image Pixel Value",
               '(0028, 1050)':"Window Center",
               '(0028, 1051)':"Window Width",
               '(0028, 1052)':"Rescale Intercept",
               '(0028, 1053)':"Rescale Slope",
               '(0028, 2110)':"Lossy Image Compression",
               '(0028, 2112)':"Lossy Image Compression Ratio",
               '(0028, 2114)':"Lossy Image Compression Method"
              }
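The mapping above is applied to the column labels with `Index.map`. The sketch below uses a made-up row and a hypothetical three-entry subset of the mapping (`tag_names`) to show one detail worth knowing: labels that are missing from the dict become NaN, so a fallback that keeps the raw tag can be safer.

```python
import pandas as pd

# Hypothetical subset of the tag-to-name mapping
tag_names = {
    "image_id": "image_id",
    "(0010, 0040)": "Patient's Sex",
    "(0028, 0010)": "Rows",
}

# One invented observation; the last tag is deliberately absent from tag_names
toy = pd.DataFrame([{
    "image_id": "img_a",
    "(0010, 0040)": "F",
    "(0028, 0010)": "2836",
    "(0099, 0001)": "?",
}])

# Index.map with a dict turns unmapped labels into NaN
mapped = toy.columns.map(tag_names)

# Fallback: keep the raw tag when no readable name is known
toy.columns = [tag_names.get(col, col) for col in toy.columns]
print(list(toy.columns))
```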

In [None]:
def extract_meta_information(folder):
    
    folder_filenames = os.listdir(os.path.join(PATH, folder))
    # Collect one dict per file and build the dataframe once at the end
    # (DataFrame.append was removed in pandas 2.0)
    observations = []

    for filename in tqdm(folder_filenames[:20]):
        one_obs = infer_one_observation(os.path.join(PATH, folder, filename))
        observations.append(one_obs)
        
    metadata = pd.DataFrame(observations)
    # Replace DICOM tags by readable names
    metadata.columns = metadata.columns.map(mapper_dict)
    metadata.to_csv(f"{folder}_dicom_metadata.csv", index=False)
    
    return metadata

Please note that this function only processes the first 20 files of the folder, to keep execution time short. To rerun the whole extraction, change this line:

    for filename in tqdm(folder_filenames[:20]):

to

    for filename in tqdm(folder_filenames):
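An alternative to editing the loop by hand is to expose the limit as a parameter. This is only a sketch: `select_files` is a hypothetical helper that does not exist in this notebook, and the file names are invented.

```python
def select_files(filenames, limit=None):
    """Return the first `limit` filenames, or all of them when limit is None."""
    return list(filenames) if limit is None else list(filenames[:limit])

names = [f"file_{i}.dicom" for i in range(50)]  # stand-ins for os.listdir output

quick = select_files(names, limit=20)  # fast run on a small sample
full = select_files(names)             # complete extraction
print(len(quick), len(full))
```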


In [None]:
train_metadata = extract_meta_information("train")
test_metadata = extract_meta_information("test")

There is a quicker way to extract the metadata: multiprocessing. My CPU has 4 cores, so I create 4 batches and run the extraction in parallel, one batch per core. This yields 4 dataframes that simply need to be concatenated.
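The batching idea can be sketched without any DICOM file. The example below is an illustration under assumptions: the file names are invented, `extract_batch` is a hypothetical stand-in that reads nothing, and a thread pool replaces the process pool so that the snippet runs anywhere. Only the split into contiguous batches and the final concatenation mirror the approach described above.

```python
from concurrent.futures import ThreadPoolExecutor

import pandas as pd

filenames = [f"img_{i:03d}.dicom" for i in range(10)]  # stand-ins for real files
N_BATCHES = 4

# Contiguous boundaries that cover every file exactly once
bounds = [i * len(filenames) // N_BATCHES for i in range(N_BATCHES + 1)]

def extract_batch(batch):
    """Pretend extraction: one row per file of the batch (no DICOM is read)."""
    chunk = filenames[bounds[batch]:bounds[batch + 1]]
    return pd.DataFrame({
        "image_id": [name.split(".")[0] for name in chunk],
        "batch": batch,
    })

# Run the batches concurrently, results come back in batch order
with ThreadPoolExecutor(max_workers=N_BATCHES) as executor:
    parts = list(executor.map(extract_batch, range(N_BATCHES)))

# One dataframe per batch, stacked into a single table
metadata = pd.concat(parts, ignore_index=True)
print(metadata.shape)
```

Deriving the boundaries from the number of files avoids hard-coding them and guarantees that no file is skipped.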

In [None]:
train_filenames = os.listdir(os.path.join(PATH, "train"))
BATCH_SIZES = list(map(lambda x:int(x/100), [0, 4000, 8000, 12000, 15000]))

one_obs_train = infer_one_observation(os.path.join(PATH, 'train', train_filenames[0]))

def extract_train_meta(BATCH, batch_sizes=BATCH_SIZES):
    
    # batch_sizes is bound when the function is defined, so the train boundaries
    # are kept even if BATCH_SIZES is reassigned later in the notebook
    print("----- Train metadata extraction starting -----")
    
    index_loop = 0
    observations = []
    
    for filename in train_filenames[batch_sizes[BATCH]:batch_sizes[BATCH+1]]:
        one_obs = infer_one_observation(os.path.join(PATH, 'train', filename))
        # Store the metadata of the current file (not the reference observation)
        observations.append(one_obs)

        if index_loop%10==0:
            print(f"{index_loop} train DICOM metadata successfully extracted")
        index_loop+=1
    
    # Build the dataframe once (DataFrame.append was removed in pandas 2.0)
    metadata = pd.DataFrame(observations)
    metadata.columns = metadata.columns.map(mapper_dict)
    metadata.to_csv(f"batch_{BATCH}_dicom_train_metadata.csv", index=False)
    
    print("----- Train metadata extraction fully completed -----")
    
    return metadata

In [None]:
test_filenames = os.listdir(os.path.join(PATH, "test"))
BATCH_SIZES = list(map(lambda x:int(x/10), [0, 750, 1500, 2250, 3000]))

one_obs_test = infer_one_observation(os.path.join(PATH, 'test', test_filenames[0]))

def extract_test_meta(BATCH, batch_sizes=BATCH_SIZES):
    # batch_sizes is bound when the function is defined (test boundaries)
    print("----- Test metadata extraction starting -----")
    index_loop = 0
    observations = []
    
    for filename in test_filenames[batch_sizes[BATCH]:batch_sizes[BATCH+1]]:
        one_obs = infer_one_observation(os.path.join(PATH, 'test', filename))
        # Store the metadata of the current file (not the reference observation)
        observations.append(one_obs)

        if index_loop%10==0:
            print(f"{index_loop} test DICOM metadata successfully extracted")
        index_loop+=1
    
    # Build the dataframe once (DataFrame.append was removed in pandas 2.0)
    metadata = pd.DataFrame(observations)
    metadata.columns = metadata.columns.map(mapper_dict)
    metadata.to_csv(f"batch_{BATCH}_dicom_test_metadata.csv", index=False)
    
    print("----- Test metadata extraction fully completed -----")
    
    return metadata

In [None]:
# The with-block releases the worker processes once both maps have returned
with mp.Pool(mp.cpu_count()) as pool:
    pool.map(extract_train_meta, range(4))
    pool.map(extract_test_meta, range(4))

As before, these functions only process small batches to keep execution time short. To rerun the whole extraction, change these lines:

    BATCH_SIZES = list(map(lambda x:int(x/100), [0, 4000, 8000, 12000, 15000]))
    
and
    
    if index_loop%10==0:

to

    BATCH_SIZES = [0, 4000, 8000, 12000, 15000]

and

    if index_loop%1000==0:
    
for train. For test, change the following lines:

    BATCH_SIZES = list(map(lambda x:int(x/10), [0, 750, 1500, 2250, 3000]))

and

    if index_loop%10==0:

to

    BATCH_SIZES = [0, 750, 1500, 2250, 3000]

and

    if index_loop%100==0:

--------

**<font size="2"><a href="#summary">Back to summary</a></font>**

# <div id="chap4">4. Preprocessing meta information</div>

From here on, we assume that all metadata have been extracted. The resulting data are available here: https://www.kaggle.com/bryanb/vinbigdata-chestxray-metadata

I chose to keep only a few fields: sex, age, height (the DICOM "Patient's Size") and weight.

In [None]:
# Load full metadata set
train_metadata = pd.read_csv("../input/vinbigdata-chestxray-metadata/train_dicom_metadata.csv")
test_metadata = pd.read_csv("../input/vinbigdata-chestxray-metadata/test_dicom_metadata.csv")

In [None]:
# Keep some columns only
# .copy() makes the later column assignments act on independent dataframes
train_metadata_filtered = train_metadata[["Patient's Sex", "Patient's Age", "Patient's Size", "Patient's Weight"]].copy()
test_metadata_filtered = test_metadata[["Patient's Sex", "Patient's Age", "Patient's Size", "Patient's Weight"]].copy()

In [None]:
print(train_metadata_filtered["Patient's Size"].value_counts())
print(train_metadata_filtered["Patient's Weight"].value_counts())

print(test_metadata_filtered["Patient's Size"].value_counts())
print(test_metadata_filtered["Patient's Weight"].value_counts())

train_metadata_filtered = train_metadata_filtered.drop(["Patient's Size", "Patient's Weight"], axis=1)
test_metadata_filtered = test_metadata_filtered.drop(["Patient's Size", "Patient's Weight"], axis=1)

In [None]:
def get_first_el(row):
    resu = 'NA'
    if len(str(row))>5:
        resu = re.search(r"(?<=\[)(.*?)(?=\,)", row).group()
    return resu

In [None]:
train_metadata_filtered["Patient's Sex"] = train_metadata_filtered["Patient's Sex"].fillna("NA")
# Treat "O" as missing, in the sex column only (the age of the row is kept)
train_metadata_filtered.loc[train_metadata_filtered["Patient's Sex"]=="O", "Patient's Sex"] = np.nan

train_metadata_filtered["Patient's Age"] = train_metadata_filtered["Patient's Age"].fillna("0")
# Keep the leading digits of the age string
train_metadata_filtered["Patient's Age"] = train_metadata_filtered["Patient's Age"].apply(lambda x:re.search(r"\d*", str(x)).group())
# Ages without digits become missing, in the age column only
train_metadata_filtered.loc[train_metadata_filtered["Patient's Age"]== '', "Patient's Age"] = np.nan
train_metadata_filtered["Patient's Age"] = train_metadata_filtered["Patient's Age"].astype(float)

In [None]:
test_metadata_filtered["Patient's Sex"] = test_metadata_filtered["Patient's Sex"].fillna("NA")
# Treat "O" as missing, in the sex column only (the age of the row is kept)
test_metadata_filtered.loc[test_metadata_filtered["Patient's Sex"]=="O", "Patient's Sex"] = np.nan

test_metadata_filtered["Patient's Age"] = test_metadata_filtered["Patient's Age"].fillna("0")
# Keep the leading digits of the age string
test_metadata_filtered["Patient's Age"] = test_metadata_filtered["Patient's Age"].apply(lambda x:re.search(r"\d*", str(x)).group())
# Ages without digits become missing, in the age column only
test_metadata_filtered.loc[test_metadata_filtered["Patient's Age"]== '', "Patient's Age"] = np.nan
test_metadata_filtered["Patient's Age"] = test_metadata_filtered["Patient's Age"].astype(float)
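The age cleaning can be tried on a few made-up values. This is a minimal sketch: `parse_age` is a hypothetical helper, the sample strings are invented, and it assumes that ages are stored as digits optionally followed by a unit letter (for example `061Y`). Unlike the cells above, it maps missing values to NaN instead of 0.

```python
import re

import numpy as np
import pandas as pd

def parse_age(raw):
    """Return the leading digits of an age string as a float, or NaN if there are none."""
    match = re.match(r"\d+", str(raw))
    return float(match.group()) if match else np.nan

raw_ages = pd.Series(["061Y", "45", None, "Y", "000Y"])  # invented samples
ages = raw_ages.apply(parse_age)
print(ages.tolist())
```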

In [None]:
train_metadata_filtered.to_csv("train_dicom_metadata_filtered.csv", index=False)
test_metadata_filtered.to_csv("test_dicom_metadata_filtered.csv", index=False)

In [None]:
train_metadata_age = train_metadata_filtered.loc[(train_metadata_filtered["Patient's Age"] > 0) & 
                                                 (train_metadata_filtered["Patient's Age"] < 100), :]

fig = px.histogram(train_metadata_age, x="Patient's Age",
                   marginal="box",
                   hover_data=train_metadata_age.columns)

fig.update_layout(
    title="Age distribution (train)")

fig.show()

del(train_metadata_age)

In [None]:
train_metadata_counts = list(train_metadata_filtered.loc[train_metadata_filtered["Patient's Sex"] != "NA", "Patient's Sex"].value_counts())
train_metadata_labels = list(train_metadata_filtered.loc[train_metadata_filtered["Patient's Sex"] != "NA", "Patient's Sex"].value_counts().index)

fig = go.Figure(data=[go.Pie(labels=train_metadata_labels, 
                             values=train_metadata_counts, 
                             hole=.3,
                             title_text="Sex distribution (train)")])
fig.show()

del(train_metadata_counts, train_metadata_labels)

In [None]:
data = train_metadata_filtered.loc[(train_metadata_filtered["Patient's Sex"] != "NA") &
                                   (train_metadata_filtered["Patient's Age"] > 0) &
                                   (train_metadata_filtered["Patient's Age"] < 100), :]

fig = px.histogram(data, 
                   x="Patient's Age", 
                   color="Patient's Sex", 
                   marginal="box",
                   hover_data=data.columns,
                   histnorm = "probability")

fig.update_layout(
    title="Age distribution by sex (train)")

fig.show()

del(data)

--------

**<font size="2"><a href="#summary">Back to summary</a></font>**

# <div id="chap5">5. Fusion of boxings</div>

In [None]:
train_dataframe.head()

In [None]:
def fusing_boxes(dataframe, class_id):
    # filter on class_id
    filtered_dataframe = dataframe.loc[dataframe.class_id == class_id,
                                       ['image_id','x_min','y_min','x_max','y_max']]
    # aggregate on image_id to average radiologists's estimations
    return filtered_dataframe.groupby(['image_id']).mean()
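Here is what this averaging does on a tiny made-up table (coordinates and ids are invented; the column names follow `train.csv`). The same filter and `groupby(...).mean()` steps are applied to class 0.

```python
import pandas as pd

# Two radiologists boxed class 0 on img_a, one on img_b (invented values)
toy = pd.DataFrame({
    "image_id": ["img_a", "img_a", "img_b"],
    "class_id": [0, 0, 0],
    "x_min": [100.0, 120.0, 50.0],
    "y_min": [200.0, 220.0, 60.0],
    "x_max": [300.0, 340.0, 150.0],
    "y_max": [400.0, 420.0, 160.0],
})

# Filter on the class, then average the coordinates per image
boxes = toy.loc[toy.class_id == 0, ["image_id", "x_min", "y_min", "x_max", "y_max"]]
fused = boxes.groupby("image_id").mean()
print(fused)
```

Keep in mind that the mean also merges boxes that mark two separate lesions of the same class on one image, so the fused box is only a rough summary.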

In [None]:
# Plot deduplicated class counts

# Classes 0 to 13 carry bounding boxes; class 14 ("No finding") is left out
indexes = list(range(14))
# Number of distinct images per class, in the same order as indexes
counts = [fusing_boxes(train_dataframe, class_id).shape[0] for class_id in indexes]

sorted_dict = dict(zip(indexes, counts))
sorted_dict = {k: v for k, v in sorted(sorted_dict.items(), key=lambda item: item[1], reverse = True)}

x = list(sorted_dict.keys())
y = list(sorted_dict.values())

plot_distribution_classes(x, y, 
                          title="Deduplicated distribution of radiographic observations")
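The deduplicated count of a class is the number of distinct images on which it was annotated. A minimal sketch on invented data shows the difference with the raw row counts, using `nunique` as an equivalent shortcut:

```python
import pandas as pd

# Invented annotations: class 0 appears twice on img_a and once on img_b
toy = pd.DataFrame({
    "image_id": ["img_a", "img_a", "img_a", "img_b", "img_c"],
    "class_id": [0, 0, 3, 0, 14],
})

raw_counts = toy.class_id.value_counts()                    # annotation rows per class
dedup_counts = toy.groupby("class_id").image_id.nunique()   # distinct images per class
print(dedup_counts)
```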

--------

**<font size="2"><a href="#summary">Back to summary</a></font>**

# <div id="chap6">6. Presentation of each radiographic observation</div>

In [None]:
def get_rectangle_parameter(dataframe, index):
    
    "Adapt coordinates of bounding box for patch.Rectangle function"
    
    x_min = dataframe.loc[index, 'x_min']
    y_min = dataframe.loc[index, 'y_min']
    x_max = dataframe.loc[index, 'x_max']
    y_max = dataframe.loc[index, 'y_max']
    
    anchor_point = (x_min, y_min)
    height = y_max - y_min
    width = x_max - x_min
    
    return anchor_point, height, width
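The conversion from corner coordinates to rectangle parameters can be checked on plain numbers. `box_to_rectangle` is a hypothetical helper written for this illustration; it returns the values in the order that `matplotlib.patches.Rectangle` takes them (anchor, then width, then height).

```python
def box_to_rectangle(x_min, y_min, x_max, y_max):
    """Convert corner coordinates to (anchor, width, height)."""
    anchor = (x_min, y_min)   # corner with the smallest x and y
    width = x_max - x_min     # horizontal extent
    height = y_max - y_min    # vertical extent
    return anchor, width, height

anchor, width, height = box_to_rectangle(100, 200, 300, 450)
# These values can then be passed as Rectangle(anchor, width, height, ...)
print(anchor, width, height)
```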

In [None]:
def select_9_from_each(dataframe):
    
    "For each class, returns 9 row indexes and the 9 matching image ids"
    
    # Initialize dictionaries
    class_id_index_examples, class_id_image_examples = {}, {}
    
    # Loop over different classes
    for class_id in range(14):
        fusing_boxes_dataframe = fusing_boxes(dataframe, class_id).sample(n=9)
        # image_id
        fusing_box_indexes = fusing_boxes_dataframe.index
        # Rows annotated with this class, so that the plotted box matches the class shown
        class_rows = dataframe.loc[dataframe.class_id == class_id]
        # Infer indexes: first row of this class for each sampled image
        class_id_index_examples[str(class_id)] = [class_rows.index[class_rows.image_id == image_id][0]
                                                  for image_id in fusing_box_indexes]
        # Infer image ids
        class_id_image_examples[str(class_id)] = fusing_box_indexes
        
    return class_id_index_examples, class_id_image_examples

class_id_index_examples, class_id_image_examples = select_9_from_each(train_dataframe)

In [None]:
def display_images(class_id, graph_indexes = np.arange(9)):
    
    # Get files
    files_index = class_id_index_examples[str(class_id)]
    files_list = class_id_image_examples[str(class_id)]
    
    # define subplot
    fig, axs = plt.subplots(3,3, figsize=(12,12))
    for graph_index in graph_indexes:
        
        full_filename = files_list[graph_index]+'.dicom'
        ds = dcmread(os.path.join(PATH, 
                                  'train',
                                  full_filename))
        

#         axs[graph_index%3, (graph_index)//3].set_title('Label: %s \n'%class_id,
#                   fontsize=18)
        axs[graph_index%3, (graph_index)//3].imshow(ds.pixel_array, cmap=plt.get_cmap('gray'))
                  
        if str(class_id) != '14':
            
            # Add rectangle
            anchor_point, height, width = get_rectangle_parameter(train_dataframe, 
                                                                  files_index[graph_index])
            # Rectangle expects (xy, width, height), so the sizes are passed by keyword
            rect = Rectangle(anchor_point, 
                                     width=width, 
                                     height=height, 
                                     edgecolor='r', 
                                     facecolor="none")
            axs[graph_index%3, (graph_index)//3].add_patch(rect)
                     
    # the bottom of the subplots of the figure
    plt.subplots_adjust(bottom = 0.001)
    plt.subplots_adjust(top = 0.99)
    
    # show the figure
    plt.show()

I am not a doctor and cannot discuss each of these classes with authority, so I copied definitions from trusted sources (listed in the References). The main goal of this section is to give a clear explanation of each class, illustrated with plotted images.

## Aortic enlargement

An aortic aneurysm is an enlargement (dilatation) of the aorta to greater than 1.5 times normal size. They usually cause no symptoms except when ruptured. Occasionally, there may be abdominal, back, or leg pain.

They are most commonly located in the abdominal aorta, but can also be located in the thoracic aorta. Aortic aneurysms cause weakness in the wall of the aorta and increase the risk of aortic rupture. When rupture occurs, massive internal bleeding results and, unless treated immediately, shock and death can occur.

In [None]:
display_images('0')

## Atelectasis

Atelectasis (at-uh-LEK-tuh-sis) is a complete or partial collapse of the entire lung or area (lobe) of the lung. It occurs when the tiny air sacs (alveoli) within the lung become deflated or possibly filled with alveolar fluid.

Atelectasis is one of the most common breathing (respiratory) complications after surgery. It's also a possible complication of other respiratory problems, including cystic fibrosis, lung tumors, chest injuries, fluid in the lung and respiratory weakness. You may develop atelectasis if you breathe in a foreign object.

Atelectasis can make breathing difficult, particularly if you already have lung disease. Treatment depends on the cause and severity of the collapse.

In [None]:
display_images('1')

## Calcification

Calcification is the accumulation of calcium salts in a body tissue. It normally occurs in the formation of bone, but calcium can be deposited abnormally in soft tissue, causing it to harden. Calcifications may be classified on whether there is mineral balance or not, and the location of the calcification.

In [None]:
display_images('2')

## Cardiomegaly

Cardiomegaly is an enlarged heart. It is not a disease, but a sign of another condition. Less severe forms of cardiomegaly are referred to as mild cardiomegaly. As mild cardiomegaly does not always cause symptoms, many people with a slightly enlarged heart are unaware of the problem.

In [None]:
display_images('3')

## Consolidation

A pulmonary consolidation is a region of normally compressible lung tissue that has filled with liquid instead of air. The condition is marked by induration (swelling or hardening of normally soft tissue) of a normally aerated lung. It is considered a radiologic sign.

In [None]:
display_images('4')

## ILD

Interstitial lung disease (ILD) is a group of many lung conditions. All interstitial lung diseases affect the interstitium, a part of your lungs. The interstitium is a lace-like network of tissue that goes throughout both lungs. It supports your lungs' tiny air sacs, called alveoli.

In [None]:
display_images('5')

## Infiltration

As part of a disease process, infiltration is sometimes used to define the invasion of cancer cells into the underlying matrix or the blood vessels. Similarly, the term may describe the deposition of amyloid protein. During leukocyte extravasation, white blood cells move in response to cytokines from within the blood, into the diseased or infected tissues, usually in the same direction as a chemical gradient, in a process called chemotaxis. The presence of lymphocytes in tissue in greater than normal numbers is likewise called infiltration.

As part of medical intervention, local anaesthetics may be injected at more than one point so as to infiltrate an area prior to a surgical procedure. However, the term may also apply to unintended iatrogenic leakage of fluids from phlebotomy or intravenous drug delivery procedures, a process also known as extravasation or "tissuing".

In [None]:
display_images('6')

## Lung Opacity

Pulmonary opacification represents the result of a decrease in the ratio of gas to soft tissue (blood, lung parenchyma and stroma) in the lung. When reviewing an area of increased attenuation (opacification) on a chest radiograph or CT it is vital to determine where the opacification is. The patterns can broadly be divided into airspace opacification, lines and dots.

In [None]:
display_images('7')

## Nodule/Mass

A lung nodule or pulmonary nodule is a relatively small focal density in the lung. A solitary pulmonary nodule (SPN) or coin lesion, is a mass in the lung smaller than 3 centimeters in diameter. There may also be multiple nodules.

One or more lung nodules can be an incidental finding found in up to 0.2% of chest X-rays and around 1% of CT scans.

The nodule most commonly represents a benign tumor such as a granuloma or hamartoma, but in around 20% of cases it represents a malignant cancer, especially in older adults and smokers. Conversely, 10 to 20% of patients with lung cancer are diagnosed in this way. If the patient has a history of smoking or the nodule is growing, the possibility of cancer may need to be excluded through further radiological studies and interventions, possibly including surgical resection. The prognosis depends on the underlying condition.

In [None]:
display_images('8')

## Other lesion

In [None]:
display_images('9')

## Pleural effusion

Pleural effusion, sometimes referred to as “water on the lungs,” is the build-up of excess fluid between the layers of the pleura outside the lungs. The pleura are thin membranes that line the lungs and the inside of the chest cavity and act to lubricate and facilitate breathing. Normally, a small amount of fluid is present in the pleura.

In [None]:
display_images('10')

## Pleural thickening

Pleural thickening develops when scar tissue thickens the delicate membrane lining the lungs (the pleura). Pleural thickening can develop following asbestos exposure or other conditions, such as infection. It may be a symptom of a more severe diagnosis such as malignant pleural mesothelioma.

In [None]:
display_images('11')

## Pneumothorax

A pneumothorax (noo-moe-THOR-aks) is a collapsed lung. A pneumothorax occurs when air leaks into the space between your lung and chest wall. This air pushes on the outside of your lung and makes it collapse. Pneumothorax can be a complete lung collapse or a collapse of only a portion of the lung.

In [None]:
display_images('12')

## Pulmonary fibrosis

Pulmonary fibrosis is a lung disease that occurs when lung tissue becomes damaged and scarred. This thickened, stiff tissue makes it more difficult for your lungs to work properly. As pulmonary fibrosis worsens, you become progressively more short of breath.

The scarring associated with pulmonary fibrosis can be caused by a multitude of factors. But in most cases, doctors can't pinpoint what's causing the problem. When a cause can't be found, the condition is termed idiopathic pulmonary fibrosis.

The lung damage caused by pulmonary fibrosis can't be repaired, but medications and therapies can sometimes help ease symptoms and improve quality of life. For some people, a lung transplant might be appropriate.

In [None]:
display_images('13')

--------

**<font size="2"><a href="#summary">Back to summary</a></font>**

# References

#### DICOM files
* <a href="https://pydicom.github.io/pydicom/stable/auto_examples/input_output/plot_read_dicom.html">Deal with .dicom files</a>
* <a href="https://dicom.innolitics.com/ciods/rt-plan/patient-study/00101020">Matching DICOM metadata</a>

#### Abnormality classes
* <a href="https://en.wikipedia.org/wiki/Aortic_aneurysm">Aortic aneurysm - Wikipedia</a>
*  <a href="https://www.mayoclinic.org/diseases-conditions/atelectasis/symptoms-causes/syc-20369684#:~:text=Atelectasis%20(at%2Duh%2DLEK,(respiratory)%20complications%20after%20surgery">Atelectasis - Mayoclinic</a>
*  <a href="https://en.wikipedia.org/wiki/Calcification#:~:text=Calcification%20is%20the%20accumulation%20of,the%20location%20of%20the%20calcification">Calcification - Wikipedia</a>
*  <a href="https://www.medicalnewstoday.com/articles/320591#:~:text=Cardiomegaly%20is%20an%20enlarged%20heart,are%20unaware%20of%20the%20problem">Cardiomegaly - Medicalnewstoday</a>
*  <a href="https://en.wikipedia.org/wiki/Pulmonary_consolidation#:~:text=A%20pulmonary%20consolidation%20is%20a,is%20considered%20a%20radiologic%20sign">Pulmonary consolidation - Wikipedia</a>
*  <a href="https://www.webmd.com/lung/interstitial-lung-disease#:~:text=Interstitial%20lung%20disease%20(ILD)%20is,tiny%20air%20sacs%2C%20called%20alveoli">ILD - Webmd</a>
*  <a href="https://en.wikipedia.org/wiki/Infiltration_(medical)">Infiltration - Wikipedia</a>
*  <a href="https://radiopaedia.org/articles/pulmonary-opacification#:~:text=Pulmonary%20opacification%20represents%20the%20result,determine%20where%20the%20opacification%20is">Lung Opacity - Radiopaedia</a>
*  <a href="https://en.wikipedia.org/wiki/Lung_nodule">Nodule - Wikipedia</a>
*  <a href="https://my.clevelandclinic.org/health/diseases/17373-pleural-effusion-causes-signs--treatment#:~:text=Pleural%20effusion%2C%20sometimes%20referred%20to,to%20lubricate%20and%20facilitate%20breathing">Pleural effusion - Clevelandclinic</a>
*  <a href="https://www.mesothelioma.com/asbestos-cancer/pleural-thickening/#:~:text=Is%20Pleural%20Thickening%3F-,What%20Is%20Pleural%20Thickening%3F,such%20as%20malignant%20pleural%20mesothelioma">Pleural thickening - Mesothelioma</a>
*  <a href="https://www.mayoclinic.org/diseases-conditions/pneumothorax/symptoms-causes/syc-20350367#:~:text=A%20pneumothorax%20(noo%2Dmoe%2D,a%20portion%20of%20the%20lung">Pneumothorax - Mayoclinic</a>
*  <a href="https://www.mayoclinic.org/diseases-conditions/pulmonary-fibrosis/symptoms-causes/syc-20353690#:~:text=Pulmonary%20fibrosis%20is%20a%20lung,progressively%20more%20short%20of%20breath">Pulmonary Fibrosis - Mayoclinic</a>

<hr>
<div align='justify'><font color="#353B47" size="4">Thank you for taking the time to read this notebook. I hope that I was able to answer your questions or satisfy your curiosity, and that it was easy to follow. <u>Any constructive comments are welcome</u>. They help me progress and motivate me to share better quality content. I am above all a passionate person who tries to advance my knowledge but also that of others. If you liked it, feel free to <u>upvote and share my work.</u> </font></div>
<br>
<div align='center'><font color="#353B47" size="3">Thank you and may passion guide you.</font></div>