### This is an interactive notebook. To use the interactive features, please fork it and then run again. 
<div align="center">
<font size="6"> OSIC Pulmonary Fibrosis Progression </font>  
</div>

<div align="center">
<font size="4"> Predict lung function decline </font>  
</div>

&nbsp;

<!-- <font size="2"> -->

<img align="right" src="https://www.osicild.org/uploads/1/2/2/7/122798879/editor/kaggle-v01-clipped.png" data-canonical-src="https://www.osicild.org/uploads/1/2/2/7/122798879/editor/kaggle-v01-clipped.png" width="400" height="300" />

Imagine one day, your breathing became consistently labored and shallow. Months later you were finally diagnosed with pulmonary fibrosis, a disorder with no known cause and no known cure, created by scarring of the lungs. If that happened to you, you would want to know your prognosis. That’s where a troubling disease becomes frightening for the patient: outcomes can range from long-term stability to rapid deterioration, but doctors aren’t easily able to tell where an individual may fall on that spectrum. Your help, and data science, may be able to aid in this prediction, which would dramatically help both patients and clinicians.



[Open Source Imaging Consortium (OSIC)](https://www.osicild.org/) is a **not-for-profit, co-operative effort between academia, industry and philanthropy**. The group enables rapid advances in the fight against **Idiopathic Pulmonary Fibrosis (IPF)**, **fibrosing interstitial lung diseases (ILDs)**, and other respiratory diseases, including emphysematous conditions. Its mission is to bring together radiologists, clinicians and computational scientists from around the world to improve imaging-based treatments.

&nbsp;
<!-- <br/><br/> -->

**Common difficulties in treating lung diseases**
* Even with access to a CT scan, it is difficult to identify lung disease.
* Varied prognoses across different lung diseases create additional difficulties in clinical treatment.
* Patients become anxious about fibrosis-related symptoms, as it is often unclear what is causing the problem in the first place.

**In this competition, you’ll:**
* Predict the severity of decline in a patient’s lung function based on a CT scan of their lungs.
* Determine lung function based on output from a spirometer, which measures the volume of air inhaled and exhaled.
* Use machine learning techniques to make a prediction with the image, metadata, and baseline FVC as input.

**Goals:**
* Patients and their families would better understand their prognosis when they are first diagnosed with this incurable lung disease.
* Improved severity detection would positively impact treatment trial design and accelerate the clinical development of novel treatments.

Before diving into the competition, let's have a look at pulmonary fibrosis itself: its causes, symptoms, complications and other non-CS background. Though I am not a medical specialist, I collected bits and pieces from various sources and put them together in this notebook. If something is not right, please comment so that everyone can benefit from that knowledge.



<img align="right" src="https://www.mayoclinic.org/-/media/kcms/gbs/patient-consumer/images/2016/08/10/14/57/mcdc7_pulmonaryfibrosis-8col.jpg" data-canonical-src="https://www.mayoclinic.org/-/media/kcms/gbs/patient-consumer/images/2016/08/10/14/57/mcdc7_pulmonaryfibrosis-8col.jpg" width="450" height="450" />

## Pulmonary fibrosis
Pulmonary fibrosis is a lung disease that occurs when lung tissue becomes damaged and scarred. This thickened, stiff tissue makes it more difficult for your lungs to work properly. As pulmonary fibrosis worsens, you become progressively more short of breath.

The scarring associated with pulmonary fibrosis can be caused by a multitude of factors. But in most cases, doctors can't pinpoint what's causing the problem. When a cause can't be found, the condition is termed idiopathic pulmonary fibrosis.

The lung damage caused by pulmonary fibrosis can't be repaired, but medications and therapies can sometimes help ease symptoms and improve quality of life. For some people, a lung transplant might be appropriate.


&nbsp;
<!-- <br/><br/> -->
<img src="https://glassboxmedicine.files.wordpress.com/2020/03/ct-gif.gif?w=616" alt="Drawing" align="right" width=300 height=200/>


## Symptoms of Pulmonary fibrosis

Signs and symptoms of pulmonary fibrosis may include:
* Shortness of breath (dyspnea)
* A dry cough
* Fatigue
* Unexplained weight loss
* Aching muscles and joints
* Widening and rounding of the tips of the fingers or toes (clubbing)

The course of pulmonary fibrosis — and the severity of symptoms — can vary considerably from person to person. Some people become ill very quickly with severe disease. Others have moderate symptoms that worsen more slowly, over months or years.

Some people may experience a rapid worsening of their symptoms (acute exacerbation), such as severe shortness of breath, that may last for several days to weeks. People who have acute exacerbations may be placed on a mechanical ventilator.
[Source](https://www.mayoclinic.org/diseases-conditions/pulmonary-fibrosis/symptoms-causes/syc-20353690#:~:text=Pulmonary%20fibrosis%20is%20a%20lung,progressively%20more%20short%20of%20breath.)


&nbsp;
<!-- <br/><br/> -->
<img src="https://www.lungsandyou.com/static/images/content/possible-ipf-risk-factors.jpg" alt="Drawing" align="right" width=400 height=300/>

## Risk factors in pulmonary fibrosis 

* **Age:** Although pulmonary fibrosis has been diagnosed in children and infants, the disorder is much more likely to affect middle-aged and older adults.
* **Sex:** Idiopathic pulmonary fibrosis is more likely to affect men than women.
* **Smoking:** Far more smokers and former smokers develop pulmonary fibrosis than do people who have never smoked. Pulmonary fibrosis can occur in patients with emphysema.
* **Certain occupations:** You have an increased risk of developing pulmonary fibrosis if you work in mining, farming or construction or if you're exposed to pollutants known to damage your lungs.
* **Cancer treatments:** Having radiation treatments to your chest or using certain chemotherapy drugs can increase your risk of pulmonary fibrosis.
* **Genetic factors:** Some types of pulmonary fibrosis run in families, and genetic factors may be a component.
[source](https://www.lungsandyou.com/facts/ipf-risk-factors)


&nbsp;
<!-- <br/><br/> -->
## Complications

* **High blood pressure in your lungs (pulmonary hypertension):** Unlike systemic high blood pressure, this condition affects only the arteries in your lungs. It begins when the smallest arteries and capillaries are compressed by scar tissue, causing increased resistance to blood flow in your lungs. This in turn raises pressure within the pulmonary arteries and the lower right heart chamber (right ventricle). Some forms of pulmonary hypertension are serious illnesses that become progressively worse and are sometimes fatal.

* **Right-sided heart failure (cor pulmonale):** This serious condition occurs when your heart's lower right chamber (ventricle) has to pump harder than usual to move blood through partially blocked pulmonary arteries.
* **Respiratory failure:** This is often the last stage of chronic lung disease. It occurs when blood oxygen levels fall dangerously low.
* **Lung cancer:** Long-standing pulmonary fibrosis also increases your risk of developing lung cancer.
* **Lung complications:** As pulmonary fibrosis progresses, it may lead to complications such as blood clots in the lungs, a collapsed lung or lung infections.


Now let's have a look at the data itself.

In [None]:
from IPython.lib.display import YouTubeVideo
YouTubeVideo('AfK9LPNj-Zo', width=800, height=600)

In [None]:
import os
import json
from pathlib import Path
from glob import glob

import matplotlib.pyplot as plt
%matplotlib inline

train_data_dir = '../input/osic-pulmonary-fibrosis-progression/train/'
test_data_dir = '../input/osic-pulmonary-fibrosis-progression/test/'

## There are 176 patients in the training set and only 5 patients in the test set.

In [None]:
patient_ids = os.listdir(train_data_dir)
print('Training Patient NOs:', len(patient_ids))

patient_ids_test = os.listdir(test_data_dir)
print('Test Patient NOs:', len(patient_ids_test))

## Read DICOM Files
### What is a DICOM file?
A DICOM file is an image saved in the Digital Imaging and Communications in Medicine (DICOM) format. It contains an image from a medical scan, such as an ultrasound or MRI. DICOM files may also include identification data for patients so that the image is linked to a specific individual. The DICOM format was developed by the NEMA (National Electrical Manufacturers Association). It was designed for the exchanging and viewing of medical images, such as CT scans, MRIs, and ultrasound images.
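
A DICOM file written according to Part 10 of the standard starts with a 128-byte preamble followed by the four ASCII characters `DICM`. The minimal sketch below checks for that header; `has_dicom_prefix` and `looks_like_dicom` are hypothetical helpers written for illustration, not part of pydicom. Some older or non-conformant files omit the preamble entirely, and `pydicom.dcmread(..., force=True)` can still attempt to read those.

```Python
def has_dicom_prefix(header: bytes) -> bool:
    """Return True if the bytes start with a 128-byte preamble followed by b'DICM'."""
    # The preamble content is application-defined, so only the 4-byte prefix is checked
    return len(header) >= 132 and header[128:132] == b'DICM'


def looks_like_dicom(path: str) -> bool:
    """Read only the first 132 bytes of a file and check the DICOM Part 10 prefix."""
    with open(path, 'rb') as f:
        return has_dicom_prefix(f.read(132))
```

If the competition files follow the Part 10 layout, `looks_like_dicom` applied to any `.dcm` path from the training folder would return `True`.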


In [None]:
train_image_paths_dcom  = glob(train_data_dir + '*/*.dcm')
print(f'Total train images {len(train_image_paths_dcom)}')

In [None]:
# pydicom reads DICOM files and exposes their data elements as attributes
import pydicom

Each DICOM file carries a lot of metadata. Individual data elements can be accessed as attributes using dot notation. For example, to access the number of rows and columns of the image, we can write:
```Python
# Number of rows and columns of the image
RefDs.Rows
RefDs.Columns

# Raw pixel data as bytes
RefDs.PixelData

# Pixel values decoded into a NumPy array
RefDs.pixel_array
```
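
The values in `PixelData` and `pixel_array` are stored values, not Hounsfield units (HU). For CT, the mapping is HU = stored value * `RescaleSlope` + `RescaleIntercept`, and a display window (`WindowCenter`, `WindowWidth`) then selects the HU range to show. A minimal NumPy sketch of both steps is below. The helper names are hypothetical, and the lung window (center -600, width 1500) is a commonly used setting assumed here, not a value read from the competition data.

```Python
import numpy as np


def to_hounsfield(pixel_array, slope, intercept):
    """Map stored pixel values to Hounsfield units: HU = value * slope + intercept."""
    return pixel_array.astype(np.float32) * float(slope) + float(intercept)


def apply_window(hu_image, center, width):
    """Clip HU values to [center - width/2, center + width/2] and scale them to [0, 1]."""
    low = float(center) - float(width) / 2
    high = float(center) + float(width) / 2
    return (np.clip(hu_image, low, high) - low) / (high - low)


# Hypothetical usage on a slice read with pydicom (assumes RescaleSlope/RescaleIntercept exist):
# hu = to_hounsfield(RefDs.pixel_array, RefDs.RescaleSlope, RefDs.RescaleIntercept)
# lung = apply_window(hu, center=-600, width=1500)
```

In some files `WindowCenter` and `WindowWidth` are multi-valued, so the first element may need to be taken before passing them to `apply_window`.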

In [None]:
print(train_image_paths_dcom[0])
RefDs = pydicom.dcmread(train_image_paths_dcom[0])
print(f'Image size: {RefDs.Rows}x{RefDs.Columns}' )
RefDs

## Interactive Metadata Visualizations
Let's look at some of the metadata attached to each file. Select a patient ID from the dropdown list, then select an instance with the slider. If that instance exists for the patient, its metadata is displayed. Feel free to change the instance range in the `interact` call to explore more instances of a patient.

In [None]:
from ipywidgets import interact

def explore_patients_metadata(patient_id, instance):
    # Each slice is stored as <train_data_dir>/<patient_id>/<instance>.dcm
    dcm_path = os.path.join(train_data_dir, patient_id, f'{instance}.dcm')
    if not os.path.exists(dcm_path):
        print(f'Instance {instance} is not available for patient {patient_id}')
        return
    RefDs = pydicom.dcmread(dcm_path)
    pat_name = RefDs.PatientName
    display_name = pat_name.family_name + ", " + pat_name.given_name
    print("Patient's name................:", display_name)
    print("Patient id....................:", RefDs.PatientID)
    print("Scan Instance.................:", RefDs.InstanceNumber)
    print("Modality......................:", RefDs.Modality)
    print("BodyPartExamined..............:", RefDs.BodyPartExamined)
    print("Image Position    (Patient)...:", RefDs.ImagePositionPatient)
    print("Image Orientation (Patient)...:", RefDs.ImageOrientationPatient)
    print("Pixel Spacing.................:", RefDs.PixelSpacing)
    print('Window Center.................:', RefDs.WindowCenter)
    print('Window Width..................:', RefDs.WindowWidth)
    print('Rescale Intercept.............:', RefDs.RescaleIntercept)

# The patient IDs become a dropdown and the (1, 150) tuple becomes an integer slider
interact(explore_patients_metadata, patient_id=patient_ids, instance=(1, 150))


## Observation
Comparing different patient IDs shows that the following properties vary from patient to patient:
* Image Position
* Pixel Spacing

In my opinion, resampling the images based on image position and pixel spacing so that they are uniform would be a very good preprocessing step.
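
As a minimal sketch of what such a step could look like, the function below resamples a single 2D slice to a common in-plane spacing (1 mm by 1 mm by default, an arbitrary choice). It uses nearest-neighbour lookup in plain NumPy to stay short; `resample_to_spacing` is a hypothetical helper, and a real pipeline would more likely use interpolation (for example `scipy.ndimage.zoom`) and also account for slice thickness across the stack.

```Python
import numpy as np


def resample_to_spacing(image, pixel_spacing, new_spacing=(1.0, 1.0)):
    """Nearest-neighbour resampling of a 2D slice to a new in-plane spacing (mm per pixel).

    pixel_spacing follows the DICOM PixelSpacing order: (row spacing, column spacing).
    """
    old_row, old_col = (float(s) for s in pixel_spacing)
    new_row, new_col = (float(s) for s in new_spacing)
    # Keep the physical size of the slice; only the number of pixels changes
    out_rows = max(1, int(round(image.shape[0] * old_row / new_row)))
    out_cols = max(1, int(round(image.shape[1] * old_col / new_col)))
    # Centre of each output pixel in mm, mapped to the input pixel that contains it
    row_idx = ((np.arange(out_rows) + 0.5) * new_row / old_row).astype(int)
    col_idx = ((np.arange(out_cols) + 0.5) * new_col / old_col).astype(int)
    row_idx = np.clip(row_idx, 0, image.shape[0] - 1)
    col_idx = np.clip(col_idx, 0, image.shape[1] - 1)
    return image[np.ix_(row_idx, col_idx)]
```

With pydicom this could be called as `resample_to_spacing(RefDs.pixel_array, RefDs.PixelSpacing)`.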

## Interactive Plot Visualizations
Let's have an interactive visualization of the instances given for each of the patients. 

In [None]:
# Define a function to visualize one slice of one patient
def explore_dicoms(patient_id, instance):
    # Each slice is stored as <train_data_dir>/<patient_id>/<instance>.dcm
    dcm_path = os.path.join(train_data_dir, patient_id, f'{instance}.dcm')
    if not os.path.exists(dcm_path):
        print(f'Instance {instance} is not available for patient {patient_id}')
        return
    RefDs = pydicom.dcmread(dcm_path)
    plt.figure(figsize=(10, 5))
    plt.imshow(RefDs.pixel_array, cmap='gray')
    plt.title(f'P_ID: {patient_id}\nInstance: {instance}')
    plt.axis('off')
    plt.show()

interact(explore_dicoms, patient_id=patient_ids, instance=(1, 40))

## Observations
It seems that different patients were scanned with different device settings, especially image position and pixel spacing. These differences should therefore be taken into consideration when preprocessing.
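
Image position also matters for ordering the slices, because instance numbers are not guaranteed to follow the anatomical order. One common approach, sketched here under that assumption, is to project each slice's `ImagePositionPatient` onto the slice normal (the cross product of the row and column direction vectors in `ImageOrientationPatient`) and sort by that distance. `slice_sort_key` and `sort_slices` are hypothetical helpers written for illustration; for purely axial scans, sorting by `ImagePositionPatient[2]` alone gives the same order.

```Python
def slice_sort_key(image_position, image_orientation):
    """Distance (mm) of a slice along the scan axis, from its DICOM position and orientation."""
    row_dir = [float(v) for v in image_orientation[:3]]
    col_dir = [float(v) for v in image_orientation[3:]]
    # Slice normal = row direction x column direction
    normal = (
        row_dir[1] * col_dir[2] - row_dir[2] * col_dir[1],
        row_dir[2] * col_dir[0] - row_dir[0] * col_dir[2],
        row_dir[0] * col_dir[1] - row_dir[1] * col_dir[0],
    )
    return sum(float(p) * n for p, n in zip(image_position, normal))


def sort_slices(slices):
    """Sort (name, ImagePositionPatient, ImageOrientationPatient) tuples along the scan axis."""
    return sorted(slices, key=lambda s: slice_sort_key(s[1], s[2]))
```

With pydicom, the tuples could be built as `(path, ds.ImagePositionPatient, ds.ImageOrientationPatient)` for each slice `ds` of a patient.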

# I am not a specialist in pulmonary fibrosis. If there is any mistake in this information, please comment so that I can correct it.