# Competition
In this competition, we simulate a ventilator connected to a sedated patient's lung. The best submissions will take two lung attributes, compliance and resistance, into account.

In this competition, participants are given numerous time series of breaths and will learn to predict the airway pressure in the respiratory circuit during the breath, given the time series of control inputs.

Each time series represents an approximately 3-second breath. The files are organized such that each row is a time step in a breath and gives the two control signals, the resulting airway pressure, and relevant attributes of the lung.
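The row layout described above can be sketched with a toy frame. The column names (`id`, `breath_id`, `R`, `C`, `time_step`, `u_in`, `u_out`, `pressure`) are assumed to match `train.csv`, and every value below is made up for illustration.

```python
import pandas as pd

# Toy rows in the assumed layout of train.csv (all values are invented).
demo = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6],
    "breath_id": [1, 1, 1, 2, 2, 2],
    "R": [20, 20, 20, 50, 50, 50],        # lung attribute, constant within a breath
    "C": [50, 50, 50, 10, 10, 10],        # lung attribute, constant within a breath
    "time_step": [0.0, 1.5, 3.0, 0.0, 1.4, 2.9],
    "u_in": [0.0, 12.5, 3.1, 5.0, 20.0, 0.0],   # first control signal
    "u_out": [0, 0, 1, 0, 0, 1],                # second control signal
    "pressure": [5.8, 14.2, 7.0, 6.1, 22.4, 6.5],
})

# One summary row per breath: number of steps, duration, and attribute constancy.
per_breath = demo.groupby("breath_id").agg(
    n_steps=("time_step", "size"),
    duration=("time_step", "max"),
    n_R=("R", "nunique"),
    n_C=("C", "nunique"),
)
print(per_breath)
```

Running the same aggregation on the real training frame is a quick way to confirm that each breath has a consistent length and a single pair of lung attributes.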

# Evaluation Metric
The competition is scored by the __mean absolute error__ between the predicted and actual pressures during the inspiratory phase of each breath. The expiratory phase is not scored.
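A minimal sketch of this metric follows. It assumes that rows with `u_out == 0` mark the inspiratory phase; the helper name `inspiratory_mae` is hypothetical and not part of any library.

```python
import numpy as np

def inspiratory_mae(y_true, y_pred, u_out):
    """MAE restricted to rows where u_out == 0 (assumed inspiratory phase)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mask = np.asarray(u_out) == 0
    if not mask.any():
        raise ValueError("no inspiratory rows to score")
    return float(np.mean(np.abs(y_true[mask] - y_pred[mask])))

# Toy breath: the last two steps are expiratory and must not affect the score.
y_true = [10.0, 12.0, 14.0, 6.0, 5.0]
y_pred = [11.0, 12.0, 13.0, 0.0, 0.0]
u_out = [0, 0, 0, 1, 1]
score = inspiratory_mae(y_true, y_pred, u_out)
print(score)
```

Large errors on the two expiratory rows leave the score unchanged, which is why a validation metric computed over all rows can differ from the leaderboard score.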

# Background
There are two types of mechanical ventilators:

- <font color = 'green'> Positive-pressure Ventilation</font>: 
    - Pushes the air into the lungs.
    - Developed in the early 1950s to treat polio patients.
    - They may be invasive or noninvasive.
    - __Invasive Ventilation__:
        - *Endotracheal intubation*: the tube is inserted into the patient’s airway (trachea) through the mouth or nose
        - *Tracheostomy*: the tube is inserted through a hole made into the airway.
    - __Noninvasive Ventilation__:
        - *Continuous positive airway pressure (CPAP)*: delivers constant and steady air pressure.
        - *Autotitrating (adjustable) positive airway pressure (APAP)*: changes air pressure according to the breathing pattern.
        - *Bilevel positive airway pressure (BiPAP)*: delivers air with different pressures for inhalation and exhalation.
        
- __Negative-pressure ventilation__: 
    - Sucks the air into the lungs by making the chest expand and contract.
    - Early ventilators were negative-pressure ventilators.
    - They are rarely used now.


### <font color = 'green'>Positive-pressure Ventilation</font>
Positive-pressure ventilation is currently the most common form of mechanical ventilation in hospitals. Positive-pressure ventilators push air into the patient’s airway. The ventilator blows and stops in regular preset cycles, enabling the lungs to receive oxygen and expel carbon dioxide. Positive-pressure ventilators may be:

- __Volume-controlled__: delivers a preset volume of air into the patient’s trachea even if it entails high airway pressure. When the flow stops, the chest recoils and expels the air.
- __Pressure-controlled__: delivers air until the airway pressure limit is reached, at which point the valve opens to expel air. The volume of air delivered may vary depending on the airway resistance and lung capacity.
- __Dual control__: combines the advantages of volume control and pressure control and delivers airflow based on the requirement and response of the patient.

[Ref](https://www.medicinenet.com/different_types_of_mechanical_ventilation/article.htm)
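To make the role of resistance and compliance concrete, here is a minimal sketch of a single-compartment lung model, a common textbook simplification and not the simulator behind the competition data. It assumes airway pressure equals `R * flow + volume / C + peep`; the function name, the units, and all numeric values are illustrative assumptions.

```python
import numpy as np

def simulate_pressure(flow, dt, R, C, peep=5.0):
    """Airway pressure for a single-compartment lung model (illustrative only).

    flow : air flow into the lung at each step (L/s)
    dt   : step length (s)
    R    : airway resistance (cmH2O per L/s)
    C    : lung compliance (L per cmH2O)
    peep : baseline pressure (cmH2O)
    """
    flow = np.asarray(flow, dtype=float)
    volume = np.cumsum(flow) * dt      # volume delivered so far
    return R * flow + volume / C + peep

# Same constant flow into a stiff lung (low C) and a more compliant lung (high C).
flow = np.full(30, 0.5)
stiff = simulate_pressure(flow, dt=0.03, R=20, C=0.02)
soft = simulate_pressure(flow, dt=0.03, R=20, C=0.05)
print(stiff[-1], soft[-1])
```

With identical flow, the stiffer lung ends at a higher pressure, which mirrors the point above that a volume-controlled ventilator may produce high airway pressure.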

![Ventilator](https://raw.githubusercontent.com/google/deluca-lung/main/assets/2020-10-02%20Ventilator%20diagram.svg)

In [None]:
%matplotlib inline

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

import os
from pathlib import Path

from tqdm import tqdm
import gc

import cv2
import matplotlib.pyplot as plt
import seaborn as sns
from IPython.display import display

from plotly.offline import iplot

import cufflinks as cf
cf.go_offline()
cf.set_config_file(offline = False, world_readable = True)

from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

plt.rcParams["figure.figsize"] = (12, 10)
plt.rcParams['axes.titlesize'] = 12

   
from time import time, strftime, gmtime

print(os.listdir('../input/ventilator-pressure-prediction/'))

start = time()
print(start)

import datetime
print(str(datetime.datetime.now()))

import warnings
warnings.simplefilter('ignore')

In [None]:
base_dir = '../input/ventilator-pressure-prediction/'

In [None]:
train = pd.read_csv(base_dir + 'train.csv')
print(train.shape)
train.head()

In [None]:
test = pd.read_csv(base_dir + 'test.csv')
print(test.shape)
test.head()

In [None]:
sub = pd.read_csv(base_dir + 'sample_submission.csv')
print(sub.shape)
sub.head()

In [None]:
train.describe().T

In [None]:
test.describe().T

In [None]:
train.info()

### Check whether the data has any missing values

In [None]:
train.isna().sum(), test.isna().sum()

In [None]:
train.nunique(), test.nunique()

In [None]:
train.nunique().iplot(kind = 'bar', 
                        xTitle = 'Features', 
                        yTitle = 'Num of Unique Values', 
                        title = '<b>Number of Unique Values in Features in Train Data</b>', 
                        color = 'purple')

In [None]:
test.nunique().iplot(kind = 'bar', 
                        xTitle = 'Features', 
                        yTitle = 'Num of Unique Values', 
                        title = '<b>Number of Unique Values in Features in Test Data</b>', 
                        color = 'blue')

In [None]:
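fig, ax = plt.subplots(1, 2, figsize = (16, 10))
ax[0].set_title('Target: Pressure Distribution')
# histplot with kde = True replaces the deprecated sns.distplot
sns.histplot(train['pressure'], bins = 150, kde = True, stat = 'density', color = 'green', ax = ax[0])
ax[1].set_title('Target: Log1p - Pressure Distribution')
# log1p is undefined below -1, so any pressure under -1 becomes NaN here
sns.histplot(np.log1p(train['pressure']), bins = 150, kde = True, stat = 'density', color = 'green', ax = ax[1])
sns.despine(trim = True, left = True)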

In [None]:
print(f"There are {train['breath_id'].nunique()} unique breath_ids in train")

In [None]:
def plot_breath_id(b_id: int):
    # Select all rows that belong to one breath
    temp = train[train['breath_id'] == b_id]
    temp.nunique().iplot(kind = 'bar', 
                        xTitle = 'Features', 
                        yTitle = 'Num of Unique Values', 
                        title = f'<b> Number of Unique Values in Features for breath_id {b_id}</b>', 
                        color = 'red')
    # Control signals and pressure over time
    plt.figure(figsize = (16, 4))
    plt.plot(temp['time_step'], temp['u_in'], label = 'u_in', color = 'green')
    plt.plot(temp['time_step'], temp['pressure'], label = 'pressure', color = 'red')
    plt.plot(temp['time_step'], temp['u_out'], label = 'u_out', color = 'yellow')
    plt.title(f'u_in, u_out and pressure for breath_id {b_id}', fontsize = 16)
    plt.legend()
    plt.show()
    # Pressure density on its own figure (fill replaces the deprecated shade argument)
    plt.figure(figsize = (16, 4))
    plt.title(f'Pressure Distribution for breath_id {b_id}', fontsize = 16)
    sns.kdeplot(temp['pressure'], fill = True)
    plt.show()

In [None]:
b_id = np.random.choice(train['breath_id'], 1)[0]
plot_breath_id(b_id)

In [None]:
b_id = np.random.choice(train['breath_id'], 1)[0]
plot_breath_id(b_id)

Let's plot the target alongside shifted (lagged) versions of the `u_in` feature.

In [None]:
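temp = train.copy()
# Shift within each breath so values do not leak across breath boundaries
temp['shift_1'] = temp.groupby('breath_id')['u_in'].shift(1).fillna(0)
temp['shift_2'] = temp.groupby('breath_id')['u_in'].shift(2).fillna(0)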

In [None]:
temp = temp[temp['breath_id'] == 25]
plt.figure(figsize = (18, 8))
plt.plot(temp['time_step'], temp['shift_1'], label = 'uin_shift_1', color = 'green')
plt.plot(temp['time_step'], temp['pressure'], label = 'pressure', color = 'orange')
plt.legend()
plt.title('Pressure and u_in_shifted 1')
plt.show()

In [None]:
plt.figure(figsize = (18, 8))
plt.plot(temp['time_step'], temp['shift_2'], label = 'uin_shift_2', color = 'blue')
plt.plot(temp['time_step'], temp['pressure'], label = 'pressure', color = 'orange')
plt.legend()
plt.title('Pressure and u_in_shifted 2')
plt.show()

- For this breath, `u_in` shifted by two time steps appears to track the pressure curve.
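One way to back up a visual impression like this is to compute, for several lags, the correlation between pressure and the lagged `u_in`. The sketch below uses synthetic data built so that pressure follows `u_in` exactly two steps later; the helper `lag_correlations` is hypothetical, and results on the real data may differ.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the training frame: pressure copies u_in two steps later.
rng = np.random.default_rng(0)
frames = []
for breath_id in (1, 2):
    u_in = rng.uniform(0, 100, size=80)
    pressure = pd.Series(u_in).shift(2).fillna(0).to_numpy()
    frames.append(pd.DataFrame({"breath_id": breath_id, "u_in": u_in, "pressure": pressure}))
demo = pd.concat(frames, ignore_index=True)

def lag_correlations(df, lags=(1, 2, 3)):
    """Correlation between pressure and u_in shifted by each lag, within breaths."""
    out = {}
    for lag in lags:
        shifted = df.groupby("breath_id")["u_in"].shift(lag)
        # Series.corr ignores the NaN rows created at the start of each breath
        out[lag] = df["pressure"].corr(shifted)
    return pd.Series(out, name="corr_with_pressure")

corrs = lag_correlations(demo)
print(corrs)
```

Applied to `train`, the lag with the highest correlation gives a quantitative counterpart to the plots, averaged over all breaths and not just one.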

In [None]:
corr = train.corr()
plt.subplots(figsize = (12, 8))
sns.heatmap(corr, vmax = 0.9, cmap = "Blues", square = True);

In [None]:
# Simple random train/validation split (rows from the same breath can land on both sides)
Xtrain, Xvalid, ytrain, yvalid = train_test_split(train.drop(['id', 'breath_id', 'pressure'], axis = 1), train['pressure'], 
                                                  test_size = 0.2, random_state = 42)
print(Xtrain.shape, ytrain.shape, Xvalid.shape, yvalid.shape)
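Because every breath is a time series, a row-level random split puts steps of the same breath in both the training and validation sets. As an alternative, and not what this notebook uses, the sketch below splits by `breath_id` with scikit-learn's `GroupShuffleSplit`; the frame is synthetic and only stands in for the real data.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import GroupShuffleSplit

# Synthetic stand-in for the training frame: 10 breaths of 5 rows each.
demo = pd.DataFrame({
    "breath_id": np.repeat(np.arange(10), 5),
    "u_in": np.linspace(0, 1, 50),
    "pressure": np.linspace(5, 6, 50),
})

# Hold out 20% of the breaths (not 20% of the rows).
splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
train_idx, valid_idx = next(splitter.split(demo, groups=demo["breath_id"]))

train_part = demo.iloc[train_idx]
valid_part = demo.iloc[valid_idx]

# No breath_id appears on both sides of the split.
shared = set(train_part["breath_id"]) & set(valid_part["breath_id"])
print(len(shared), train_part.shape, valid_part.shape)
```

A group-aware split usually gives a more honest validation error for unseen breaths, at the cost of slightly uneven row counts when breaths differ in length.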

# Baseline Model

Thanks to @titericz for this Rapids GPU notebook

In [None]:
import xgboost as xgb

In [None]:
xg_params = {
        "subsample": 0.60,
        "colsample_bytree": 0.40,
        "max_depth": 6,
        "learning_rate": 0.02,
        "objective": "reg:squarederror",
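        # MAE is reported by the custom metric passed to xgb.train;
        # the former 'metrics' key is not an XGBoost parameter and was dropped
        'disable_default_eval_metric': 1, 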
        "nthread": -1,
        "tree_method": "gpu_hist",
        "gpu_id": 0,
        "max_bin": 128, 
        'min_child_weight': 2,
        'reg_lambda': 0.001,
        'reg_alpha': 0.01, 
        'seed' : 2021,
    }

In [None]:
def evaluate_error(preds, xg_train):
    labels = xg_train.get_label()
    err = mean_absolute_error(labels, preds)
    return 'mae', err

In [None]:
xg_train = xgb.DMatrix(Xtrain, ytrain)
xg_valid = xgb.DMatrix(Xvalid, yvalid)

model = xgb.train(xg_params, xg_train, 10000,
                [(xg_valid, 'valid')],
                verbose_eval = 250,
                early_stopping_rounds = 50,
                feval = evaluate_error
            )

In [None]:
fig, ax = plt.subplots(figsize = (8,12))
xgb.plot_importance(model, height = 0.8, ax = ax)
plt.show()

In [None]:
xg_test = xgb.DMatrix(test.drop(['id', 'breath_id'], axis = 1))
test_preds = model.predict(xg_test)
test_preds[:10]

In [None]:
plt.title('Pressure Distribution of Prediction', fontsize = 16)
# fill replaces the deprecated shade argument
sns.kdeplot(test_preds, fill = True, color = 'green');

In [None]:
sub['pressure'] = test_preds
sub.to_csv('./submission.csv', index = False)
sub.head()

In [None]:
finish = time()
print(strftime("%H:%M:%S", gmtime(finish - start)))