# **What Is Pulmonary Fibrosis? (Introduction)**

* Pulmonary fibrosis is a lung disease that occurs when lung tissue becomes damaged and scarred. This thickened, stiff tissue makes it more difficult for your lungs to work properly.
* As pulmonary fibrosis worsens, you become progressively more short of breath.

![image.png](attachment:image.png)

* The scarring associated with pulmonary fibrosis can be caused by a multitude of factors. But in most cases, doctors can't pinpoint what's causing the problem. When a cause can't be found, the condition is termed idiopathic pulmonary fibrosis.
* The lung damage caused by pulmonary fibrosis can't be repaired, but medications and therapies can sometimes help ease symptoms and improve quality of life. For some people, a lung transplant might be appropriate.

![image.png](attachment:image.png)

* According to the ATS, the incidence of IPF was estimated at 10.7 cases per 100,000 per year for men and 7.4 cases per 100,000 per year for women in a population-based study from the county of Bernalillo, New Mexico. A study from the United Kingdom reported an overall incidence rate of only 4.6 per 100,000 person-years, but estimated that the incidence of IPF increased by 11% annually between 1991 and 2003. This increase was not felt to be attributable to the aging of the population or increased ascertainment of milder cases. A third study from the United States estimated the incidence of IPF to be between 6.8 and 16.3 per 100,000 persons using a large database of healthcare claims in a health plan.

# **Reference**
https://www.physio-pedia.com/Pulmonary_Fibrosis

**Risk factors**

Factors that make you more susceptible to pulmonary fibrosis include:

* **Age.** Although pulmonary fibrosis has been diagnosed in children and infants, the disorder is much more likely to affect middle-aged and older adults.

* **Sex.** Idiopathic pulmonary fibrosis is more likely to affect men than women.
* **Smoking.** Far more smokers and former smokers develop pulmonary fibrosis than do people who have never smoked. Pulmonary fibrosis can occur in patients with emphysema.

* **Certain occupations.** You have an increased risk of developing pulmonary fibrosis if you work in mining, farming or construction or if you're exposed to pollutants known to damage your lungs.

* **Cancer treatments.** Having radiation treatments to your chest or using certain chemotherapy drugs can increase your risk of pulmonary fibrosis.

* **Genetic factors.** Some types of pulmonary fibrosis run in families, and genetic factors may be a component.

**Complications**

Complications of pulmonary fibrosis may include:

* **High blood pressure in your lungs (pulmonary hypertension).** Unlike systemic high blood pressure, this condition affects only the arteries in your lungs. It begins when the smallest arteries and capillaries are compressed by scar tissue, causing increased resistance to blood flow in your lungs.

  This in turn raises pressure within the pulmonary arteries and the lower right heart chamber (right ventricle). Some forms of pulmonary hypertension are serious illnesses that become progressively worse and are sometimes fatal.

* **Right-sided heart failure (cor pulmonale).** This serious condition occurs when your heart's lower right chamber (ventricle) has to pump harder than usual to move blood through partially blocked pulmonary arteries.

* **Respiratory failure.** This is often the last stage of chronic lung disease. It occurs when blood oxygen levels fall dangerously low.
* **Lung cancer.** Long-standing pulmonary fibrosis also increases your risk of developing lung cancer.

* **Lung complications.** As pulmonary fibrosis progresses, it may lead to complications such as blood clots in the lungs, a collapsed lung or lung infections.

# **What Is the End-User Goal?**

* Pulmonary fibrosis is a disease whose stage is hard to identify, both for patients and for doctors. Only a few expert doctors can tell the stage.

* If we know the stage of the disease, we can take precautions so that people can survive it. Sounds great, right?

* Here we are going to build a pulmonary fibrosis predictive model, which will help hospitals and doctors identify pulmonary fibrosis at an early stage.

* Building a pulmonary fibrosis predictive model isn't easy, but it is possible. **Remember, nothing is impossible until we try.**

* We have two classes of data: one with pulmonary fibrosis and one without. We also have CT scan images, so we are going to preprocess those images and extract insights from them.

* **So let's do it. If not now, then when?**

# Reading Dataset Folder

* I will import each library at the point where it is needed. I find it good practice to import a library at the stage where it is required rather than all at the start.

In [None]:
# Importing libraries

import os

# List the files and folders in the competition dataset directory
os.listdir("../input/osic-pulmonary-fibrosis-progression")

**Data Description**

* train folder - Contains the patients' CT scan images in DICOM format. The CT scans are split into train and test folders.
* test folder - Holds the test set CT scan images in DICOM format.
* train.csv file - Holds the full clinical information. We use this information to build and train the model.
* test.csv file - Holds a small amount of clinical information, used to check how well our model predicts on new data.
* sample_submission.csv file - Holds a sample of the results for the test data.
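
The layout described above can be checked with a few lines of standard-library code. This is only a sketch: `summarize_folder` is a hypothetical helper, and the temporary directory is a stand-in that mimics the folder and file names listed above rather than the real competition data.

```python
import os
import tempfile


def summarize_folder(root):
    """Return {entry name: 'folder' or 'file'} for everything directly under root."""
    summary = {}
    for name in sorted(os.listdir(root)):
        path = os.path.join(root, name)
        summary[name] = "folder" if os.path.isdir(path) else "file"
    return summary


# Build a small stand-in directory (assumption: same names as in the description)
with tempfile.TemporaryDirectory() as demo_root:
    os.makedirs(os.path.join(demo_root, "train"))
    os.makedirs(os.path.join(demo_root, "test"))
    for csv_name in ("train.csv", "test.csv", "sample_submission.csv"):
        # Create empty placeholder files
        open(os.path.join(demo_root, csv_name), "w").close()
    layout = summarize_folder(demo_root)

print(layout)
```

On Kaggle, the same helper could be pointed at `../input/osic-pulmonary-fibrosis-progression` instead of the temporary directory.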

# Importing Dataset

* Importing the train and test CSV files, which hold the clinical information.

In [None]:
# Importing libraries

import pandas as pd

train_dataset = pd.read_csv('../input/osic-pulmonary-fibrosis-progression/train.csv')
test_dataset = pd.read_csv('../input/osic-pulmonary-fibrosis-progression/test.csv')

* The head() function displays the top 5 rows of the dataframe.
* It gives a brief overview of the column names and the kind of values each column holds. Sounds interesting, right?

In [None]:
train_dataset.head()

In [None]:
test_dataset.head()
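
As a small illustration of how `head()` behaves, here is a sketch on a toy dataframe. The values are made up for the example and are not taken from train.csv.

```python
import pandas as pd

# Toy data (made up for illustration; not the real clinical records)
toy = pd.DataFrame({
    "Patient": ["A", "A", "A", "B", "B", "B", "B"],
    "Weeks": [0, 5, 9, 1, 4, 8, 12],
    "FVC": [2300, 2250, 2210, 3100, 3080, 3010, 2990],
})

first_five = toy.head()   # no argument: the first 5 rows
first_two = toy.head(2)   # explicit row count: the first 2 rows

print(first_five)
print(first_two)
```

Passing a number to `head()` is handy when 5 rows are too many or too few to get a feel for the data.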

* **Patient:** This column holds the unique patient ID. It is not required to train the model, so we can remove it.
* The remaining attributes, such as Weeks, FVC, Percent, Age, Sex and SmokingStatus, are required to train our model.
* Here we have only 7 attributes, so they are easy to understand: all 6 attributes other than Patient are important.
* If we had more than 20 input attributes, it would be difficult to know which attributes correlate most strongly with the others. So here we don't need a feature selection technique.
* To learn about feature selection techniques, please refer to my Titanic kernel.
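
The two ideas in the list above, removing the Patient column and looking at correlations between attributes, can be sketched as follows. The toy values are invented for illustration, and using `select_dtypes` plus `corr()` is one possible way to inspect correlations, not necessarily what a later part of this kernel will use.

```python
import pandas as pd

# Toy data (invented values; the column names follow the attributes listed above)
toy = pd.DataFrame({
    "Patient": ["A", "A", "B", "B"],
    "Weeks": [0, 10, 0, 10],
    "FVC": [3000, 2800, 2500, 2300],
    "Age": [70, 70, 65, 65],
    "SmokingStatus": ["Ex-smoker", "Ex-smoker", "Never smoked", "Never smoked"],
})

# The patient ID identifies a row but carries no predictive signal, so drop it
features = toy.drop(columns=["Patient"])

# Pairwise Pearson correlation between the numeric attributes only
corr = features.select_dtypes(include="number").corr()
print(corr)
```

With only a handful of attributes, the whole correlation table can be read at a glance, which is why no separate feature selection step is needed here.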

# **Descriptive Analysis**

**Shape**

* Displaying the shape of the train and test datasets.
* The shape attribute returns (rows, columns), which tells us how much data we have. The more data we have, the easier it is to extract useful data points.

In [None]:
print(" train_dataset shape is :",train_dataset.shape)
print(" test_dataset shape is :",test_dataset.shape)

* As the shape above shows, the training dataset has 1549 rows and 7 columns.
* The test dataset has 5 rows and 7 columns.

**Datatype**

* The dtypes attribute shows what type of data each attribute holds.

In [None]:
train_dataset.dtypes
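
To make the meaning of `shape` and `dtypes` concrete, here is a minimal sketch on a toy dataframe with made-up values.

```python
import pandas as pd

# Toy data (made up for illustration)
toy = pd.DataFrame({
    "Patient": ["A", "B", "C"],
    "Weeks": [0, 4, 8],
    "Percent": [58.2, 71.8, 64.0],
})

# shape is a (rows, columns) tuple, so it can be unpacked directly
n_rows, n_cols = toy.shape
print(n_rows, n_cols)

# dtypes lists one type per column: whole numbers become an integer type,
# decimals a float type, and text is usually reported as object
# (or a string dtype in newer pandas versions)
print(toy.dtypes)
```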

**Null values**

* Here we check whether there are any null values in the training and test sets.
* To find null values, we use the info() method from pandas.

In [None]:
train_dataset.info()

* It might be a little confusing, isn't it? It just shows 1549 non-null for each attribute, and with a larger number of attributes it is hard to check each one.
* For a clearer view, pandas provides the isna() function. Combined with sum(), it returns the number of missing cells in each attribute.

In [None]:
train_dataset.isna().sum()

* The output above makes it clear that we don't have any missing values. Easy to understand, isn't it?
* As an example, here is how it looks when a dataset does have missing values:

![image.png](attachment:image.png)
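
The same situation can be reproduced on a toy dataframe. This is a sketch with invented values in which some cells are deliberately left empty, so that `isna().sum()` reports non-zero counts.

```python
import numpy as np
import pandas as pd

# Toy data with deliberately missing cells (invented values)
toy = pd.DataFrame({
    "Weeks": [0, 5, np.nan, 12],
    "FVC": [2300, np.nan, np.nan, 2100],
    "Age": [70, 70, 70, 70],
})

# isna() marks each missing cell as True; sum() counts the True values per column
missing_per_column = toy.isna().sum()
print(missing_per_column)

# A single yes/no answer for the whole dataframe
any_missing = bool(toy.isna().any().any())
print(any_missing)
```

A column with a count of 0, like Age here, is complete. Any other count shows how many cells would need to be filled or dropped.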

In [None]:
test_dataset.info()

In [None]:
test_dataset.isna().sum()

# **To Be Continued**

**If you like the kernel, please upvote. It always motivates me to do in-depth analysis.**