<h1><center>Ventilator: InDepth EDA + Understanding + Model + W&B</center></h1>
<h2><center>One Stop for all your needs!</center></h2>
                                                      
<center><img src = "https://cdn.dribbble.com/users/1083804/screenshots/5841972/google_brain_conceptartboard_1.png" width = "750" height = "500"/></center>                                                                                               

<div class="list-group" id="list-tab" role="tablist">
<h2 class="list-group-item list-group-item-action active" data-toggle="list" style='background:orange; border:0; color:white' role="tab" aria-controls="home"><center>Contents</center></h2>

1. [Competition Overview](#competition-overview)  
2. [Ventilator Understanding](#ventilator-understanding)
3. [Libraries](#libraries)  
4. [Weights and Biases](#weights-and-biases)
5. [Global Config](#global-config)  
6. [Load Datasets](#load-datasets)  
7. [Tabular Exploration](#tabular-exploration)  
8. [Dataset Distribution](#dataset-distribution)  
9. [Feature Correlation](#feature-correlation)  
10. [Individual Breath Analysis](#individual-breath-analysis)  
11. [Transformer Model Understanding](#transformer-model-understanding)  
12. [Transformer Model Implementation](#transformer-model-implementation)  
13. [References](#references)

<div class="list-group" id="list-tab" role="tablist">
<h3 class="list-group-item list-group-item-action active" data-toggle="list" style='background:maroon; border:0; color:white' role="tab" aria-controls="home"><center>If you find this notebook useful, do give me an upvote; it helps keep up my motivation. This notebook will be updated frequently, so keep checking for further developments.</center></h3>

<a id="competition-overview"></a>
<div class="list-group" id="list-tab" role="tablist">
<h2 class="list-group-item list-group-item-action active" data-toggle="list" style='background:orange; border:0; color:white' role="tab" aria-controls="home"><center>Competition Overview</center></h2>

### Description

In this competition, you’ll simulate a ventilator connected to a sedated patient's lung. The best submissions will take lung attributes compliance and resistance into account.

If successful, you'll help overcome the cost barrier of developing new methods for controlling mechanical ventilators. 

This will pave the way for algorithms that adapt to patients and reduce the burden on clinicians during these novel times and beyond. As a result, ventilator treatments may become more widely available to help patients breathe.

### Evaluation Criteria

The competition will be scored as the mean absolute error between the predicted and actual pressures during the inspiratory phase of each breath. The expiratory phase is not scored.
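
As a rough illustration of the metric, the sketch below computes the mean absolute error over inspiratory steps only. It assumes the inspiratory phase is marked by `u_out == 0` (the same convention used for the "inhale" curves in the breath plots later in this notebook). The function name `inspiratory_mae` and the toy arrays are made up for illustration.

```python
import numpy as np

def inspiratory_mae(y_true, y_pred, u_out):
    """MAE over time steps where u_out == 0 (assumed inspiratory phase)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mask = np.asarray(u_out) == 0
    return np.abs(y_true[mask] - y_pred[mask]).mean()

# Toy breath: the last two steps are expiratory and do not affect the score.
y_true = [10.0, 12.0, 14.0, 6.0, 5.0]
y_pred = [11.0, 12.0, 12.0, 0.0, 0.0]
u_out = [0, 0, 0, 1, 1]

score = inspiratory_mae(y_true, y_pred, u_out)
print(score)  # 1.0
```

Even though the expiratory predictions in the toy example are far off, the score only reflects the three inspiratory errors (1, 0 and 2).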

<a id="ventilator-understanding"></a>
<div class="list-group" id="list-tab" role="tablist">
<h2 class="list-group-item list-group-item-action active" data-toggle="list" style='background:orange; border:0; color:white' role="tab" aria-controls="home"><center>Ventilator Understanding</center></h2>

<center><img src = "https://i.guim.co.uk/img/media/60bba82aaeedb75bb5d1d50e51f5e64283ae491a/0_325_4879_2928/master/4879.jpg?width=445&quality=45&auto=format&fit=max&dpr=2&s=21baed785ce44a9e9ca8687e2edf7b04" width = "750" height = "500"/></center>


### What is a Ventilator?

- A ventilator is a machine that helps people breathe (ventilate). 
- These machines are often used in hospitals as life support for patients who have difficulty breathing or who have lost all ability to breathe on their own. Mechanical ventilation may be either invasive or noninvasive (e.g. using a tight-fitting external mask). 
- Invasive modes require the insertion of internal tubes/devices through endotracheal intubation or tracheostomy.

### When Are Ventilators Used?

Many diseases and other factors can affect lung function and cause difficulty breathing to the point that a person may need a ventilator to stabilize their condition. Examples include:

- Respiratory infections like pneumonia, influenza (flu) and coronavirus (COVID-19)
- Lung diseases like asthma, COPD (chronic obstructive pulmonary disease), cystic fibrosis and lung cancer
- Acute respiratory distress syndrome (ARDS)
- Damage to the nerves and/or muscles involved in breathing (can be caused by upper spinal cord injuries, polio, amyotrophic lateral sclerosis, myasthenia gravis, etc.)
- Brain injury
- Stroke
- Drug overdose

Patients who can’t breathe on their own at all also use ventilators while undergoing treatment for the underlying condition(s) that caused respiratory failure or respiratory arrest. Long-term ventilator care may be needed if a patient cannot regain the ability to breathe independently.

### How does a ventilator work?

A ventilator moves air into and out of the lungs (oxygen in and carbon dioxide out). The breathing tube that connects the machine to the patient can be inserted through the mouth or nose and down the trachea, or through a surgical opening made by tracheostomy. Depending on the patient’s medical condition, they may be able to use a respiratory mask in lieu of the breathing tubes. This is known as non-invasive mechanical ventilation.

The amount of oxygen the patient receives can be controlled through a monitor connected to the ventilator. If the patient’s condition is particularly delicate, the monitor will be set up to send an alarm to the caregiver indicating an increase in air pressure.

The machine works by bringing oxygen to the lungs and taking carbon dioxide out of the lungs. This allows a patient who has trouble breathing to receive the proper amount of oxygen. It also helps the patient’s body to heal, since it eliminates the extra energy of labored breathing.

<a id="libraries"></a>
<div class="list-group" id="list-tab" role="tablist">
<h2 class="list-group-item list-group-item-action active" data-toggle="list" style='background:orange; border:0; color:white' role="tab" aria-controls="home"><center>Libraries</center></h2>

In [None]:
import gc
import os
import random
import wandb
import math

import numpy as np
import pandas as pd

import matplotlib.pyplot as plt
plt.rcParams.update({'font.size': 18})
plt.style.use('fivethirtyeight')

import seaborn as sns

import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)

from termcolor import colored
from IPython import display

from sklearn.model_selection import GroupKFold
from tqdm.notebook import tqdm

import torch
import torch.nn as nn
from torch.nn import functional as F
from torch.utils.data import Dataset, DataLoader
from torch.optim.lr_scheduler import CosineAnnealingWarmRestarts, CosineAnnealingLR, ReduceLROnPlateau

from transformers import get_linear_schedule_with_warmup, get_cosine_schedule_with_warmup

# Fall back to CPU when no GPU is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

In [None]:
# W&B for experiment tracking
import wandb
wandb.login()

<a id="weights-and-biases"></a>
<div class="list-group" id="list-tab" role="tablist">
<h2 class="list-group-item list-group-item-action active" data-toggle="list" style='background:orange; border:0; color:white' role="tab" aria-controls="home"><center>Weights and Biases</center></h2>

<center><img src = "https://i.imgur.com/1sm6x8P.png" width = "750" height = "500"/></center>  

**Weights & Biases** is the machine learning platform for developers to build better models faster. 

You can use W&B's lightweight, interoperable tools to 
- quickly track experiments, 
- version and iterate on datasets, 
- evaluate model performance, 
- reproduce models, 
- visualize results and spot regressions, 
- and share findings with colleagues. 

Set up W&B in 5 minutes, then quickly iterate on your machine learning pipeline with the confidence that your datasets and models are tracked and versioned in a reliable system of record.

In this notebook I use Weights & Biases to log training metrics and visualize them.

<a id="global-config"></a>
<div class="list-group" id="list-tab" role="tablist">
<h2 class="list-group-item list-group-item-action active" data-toggle="list" style='background:orange; border:0; color:white' role="tab" aria-controls="home"><center>Global Config</center></h2>

In [None]:
class config:
    DIRECTORY_PATH = "../input/ventilator-pressure-prediction"
    
    INPUT = "/kaggle/input/ventilator-pressure-prediction"
    TRAIN_FILE_PATH = DIRECTORY_PATH + "/train.csv"
    TEST_FILE_PATH = DIRECTORY_PATH + "/test.csv"
    SAMPLE_FILE_PATH = DIRECTORY_PATH + "/sample_submission.csv"
    OUTPUT = "/kaggle/working"
    N_FOLD = 5
    SKIP_FOLDS = [1, 2, 3, 4]  # only fold-0
    SEED = 0
    
    LR = 2.5e-2
    N_EPOCHS = 50
    HIDDEN_SIZE = 64
    BS = 2048
    WEIGHT_DECAY = 1e-5
    
    NOT_WATCH_PARAM = ['INPUT']
    
    EXP_NAME = "google-brain-ventilator"

# wandb config
WANDB_CONFIG = {
    'competition': 'google-brain',
    '_wandb_kernel': 'neuracort',
}

In [None]:
def set_seed(seed=config.SEED):
    random.seed(seed)
    os.environ["PYTHONHASHSEED"] = str(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

<a id="load-datasets"></a>
<div class="list-group" id="list-tab" role="tablist">
<h2 class="list-group-item list-group-item-action active" data-toggle="list" style='background:orange; border:0; color:white' role="tab" aria-controls="home"><center>Load Datasets</center></h2>

## About the Dataset

### Files

- **train.csv** - the training set
- **test.csv** - the test set
- **sample_submission.csv** - a sample submission file in the correct format

### Columns
- `id` - globally-unique time step identifier across an entire file
- `breath_id` - globally-unique identifier for each breath
- `R` - lung attribute indicating how restricted the airway is (in cmH2O/L/S). Physically, this is the change in pressure per change in flow (air volume per time). Intuitively, one can imagine blowing up a balloon through a straw. We can change R by changing the diameter of the straw, with higher R being harder to blow.
- `C` - lung attribute indicating how compliant the lung is (in mL/cmH2O). Physically, this is the change in volume per change in pressure. Intuitively, one can imagine the same balloon example. We can change C by changing the thickness of the balloon’s latex, with higher C having thinner latex and easier to blow.
- `time_step` - the actual time stamp.
- `u_in` - the control input for the inspiratory solenoid valve. Ranges from 0 to 100.
- `u_out` - the control input for the expiratory solenoid valve. Either 0 or 1.
- `pressure` - the airway pressure measured in the respiratory circuit, measured in cmH2O.
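
To make the definitions of `R` and `C` concrete, here is a small sketch that applies them directly: `R` as pressure per unit flow and `C` as volume per unit pressure. Combining the two into a single one-compartment formula (`pressure = R * flow + volume / C`) is a textbook simplification assumed here for illustration. It is not the simulator behind the competition data, and the function name and numbers are hypothetical.

```python
def airway_pressure(flow_l_per_s, volume_ml, R, C):
    """Pressure (cmH2O) under an assumed linear one-compartment lung model.

    R: resistance in cmH2O/(L/s), C: compliance in mL/cmH2O.
    """
    resistive = R * flow_l_per_s   # pressure needed to push flow through the airway
    elastic = volume_ml / C        # pressure needed to hold the volume in the lung
    return resistive + elastic

# Same flow and volume, different lungs.
p_easy = airway_pressure(0.5, 500, R=5, C=50)    # open airway, compliant lung
p_hard = airway_pressure(0.5, 500, R=50, C=10)   # narrow airway, stiff lung
print(p_easy, p_hard)  # 12.5 75.0
```

Under this assumption, a higher `R` or a lower `C` both raise the pressure required for the same breath, which matches the straw-and-balloon intuition above.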

In [None]:
train = pd.read_csv(config.TRAIN_FILE_PATH,index_col=0)
test  = pd.read_csv(config.TEST_FILE_PATH, index_col=0)
sample = pd.read_csv(config.SAMPLE_FILE_PATH)

<a id="tabular-exploration"></a>
<div class="list-group" id="list-tab" role="tablist">
<h2 class="list-group-item list-group-item-action active" data-toggle="list" style='background:orange; border:0; color:white' role="tab" aria-controls="home"><center>Tabular Exploration</center></h2>

In [None]:
train.head()

In [None]:
test.head()

In [None]:
train.info()

### Dataset Size

In [None]:
print(f"Training Dataset Shape: {colored(train.shape, 'yellow')}")
print(f"Test Dataset Shape: {colored(test.shape, 'yellow')}")

### Column-wise Unique Values

In [None]:
for col in train.columns:
    print(col + ":" + colored(str(len(train[col].unique())), 'yellow'))

<a id="dataset-distribution"></a>
<div class="list-group" id="list-tab" role="tablist">
<h2 class="list-group-item list-group-item-action active" data-toggle="list" style='background:orange; border:0; color:white' role="tab" aria-controls="home"><center>Dataset Distribution</center></h2>

## Distribution Plot

In [None]:
def plot_distribution(x, title):
    
    """
    Function to obtain the distribution plot of given data.
    
    params: x(string)     : Name of the Column for the Plot.
            title(string) : Title of the Plot
    """
    sns.displot(train, x = x, kind="kde", bw_adjust=2)

    plt.title(title, fontsize = 15)
    plt.show()

In [None]:
plot_list = [("R", "Lung Airway Resistance (R)"),
             ("C", "Lung Compliance (C)"),
             ("u_in", "Control input for the inspiratory solenoid valve"),
             ("u_out", "Control input for the expiratory solenoid valve"),
             ("pressure", "Airway pressure")
            ]

In [None]:
for column, title in plot_list:
    plot_distribution(x = column, title = title)   

---

## CountPlots

In [None]:
# Create an empty figure; the four panels are added with plt.subplot below
fig = plt.figure(figsize = (12, 8))
plt.subplot(2, 2, 1)
sns.countplot(x='R', data=train)
plt.title('Counts of R in train');
plt.subplot(2, 2, 2)
sns.countplot(x='R', data=test)
plt.title('Counts of R in test');
plt.subplot(2, 2, 3)
sns.countplot(x='C', data=train)
plt.title('Counts of C in train');
plt.subplot(2, 2, 4)
sns.countplot(x='C', data=test)
plt.title('Counts of C in test');

In [None]:
#for train set
pair_rc = train.groupby(["R", "C"]).size().reset_index(name="Counts")
pair_rc["R"] = pair_rc[["R","C"]].apply(lambda cols: (cols["R"],cols["C"]),axis=1)
pair_rc.drop("C",axis=1,inplace=True)
pair_rc.rename(columns={'R':'R-C pair'},inplace=True)
fig,ax = plt.subplots(1,2,figsize=(16,4))
sns.barplot(x="R-C pair",y="Counts",data=pair_rc,ax=ax[0]);
ax[0].set_title("Counts of R-C pairs train set");

#for test set
pair_rc = test.groupby(["R", "C"]).size().reset_index(name="Counts")
pair_rc["R"] = pair_rc[["R","C"]].apply(lambda cols: (cols["R"],cols["C"]),axis=1)
pair_rc.drop("C",axis=1,inplace=True)
pair_rc.rename(columns={'R':'R-C pair'},inplace=True)
sns.barplot(x="R-C pair",y="Counts",data=pair_rc,ax=ax[1]);
ax[1].set_title("Counts of R-C pairs test set");

<a id="feature-correlation"></a>
<div class="list-group" id="list-tab" role="tablist">
<h2 class="list-group-item list-group-item-action active" data-toggle="list" style='background:orange; border:0; color:white' role="tab" aria-controls="home"><center>Feature Correlation</center></h2>

## HeatMap

### Train Dataset

In [None]:
corr = train.corr().abs()
mask = np.triu(np.ones_like(corr, dtype=bool))  # np.bool was removed in recent NumPy versions

fig, ax = plt.subplots(figsize=(14, 14))

# plot heatmap
sns.heatmap(corr, mask=mask, annot=True, fmt=".2f", cmap='coolwarm',
            cbar_kws={"shrink": .8}, vmin=0, vmax=1)
# yticks
plt.yticks(rotation=0)
plt.show()

---

### Test Dataset

In [None]:
corr = test.corr().abs()
mask = np.triu(np.ones_like(corr, dtype=bool))  # np.bool was removed in recent NumPy versions

fig, ax = plt.subplots(figsize=(14, 14))

# plot heatmap
sns.heatmap(corr, mask=mask, annot=True, fmt=".2f", cmap='coolwarm',
            cbar_kws={"shrink": .8}, vmin=0, vmax=1)
# yticks
plt.yticks(rotation=0)
plt.show()

<a id="individual-breath-analysis"></a>
<div class="list-group" id="list-tab" role="tablist">
<h2 class="list-group-item list-group-item-action active" data-toggle="list" style='background:orange; border:0; color:white' role="tab" aria-controls="home"><center>Individual Breath Analysis</center></h2>

In [None]:
for i in range(1,5,1):
    one_breath = train[train["breath_id"]==i]

    plt.figure(figsize=(8,6));
    sns.lineplot(x = 'id',y='pressure',data=one_breath[one_breath['u_out']==0],color='green',label='pressure inhale');
    sns.lineplot(x = 'id',y='pressure',data=one_breath[one_breath['u_out']==1],color='orange',label='pressure exhale');
    sns.lineplot(x = 'id',y='u_in',data=one_breath,color='blue',label='valve position')
    plt.title(f"Variation of Pressure and Input valve position during breath {i}");
    plt.legend();

<a id="transformer-model-understanding"></a>
<div class="list-group" id="list-tab" role="tablist">
<h2 class="list-group-item list-group-item-action active" data-toggle="list" style='background:orange; border:0; color:white' role="tab" aria-controls="home"><center>Transformer Model Understanding</center></h2>

The explanations and images in this section are taken from the original research paper [Attention Is All You Need](https://arxiv.org/abs/1706.03762).

## Understanding Transformers

### Introduction

- The dominant sequence transduction models are based on complex recurrent or convolutional neural networks that include an encoder and a decoder. 
- The best performing models also connect the encoder and decoder through an attention mechanism.
- The Transformer is based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. 
- Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train.
---

### Model Architecture

- Most competitive neural sequence transduction models have an encoder-decoder structure. Here, the encoder maps an input sequence of symbol representations (x1, ..., xn) to a sequence of continuous representations z = (z1, ..., zn). 

- Given z, the decoder then generates an output sequence (y1, ..., ym) of symbols one element at a time. At each step the model is auto-regressive, consuming the previously generated symbols as additional input when generating the next.

- The Transformer follows this overall architecture using stacked self-attention and point-wise, fully connected layers for both the encoder and decoder, shown in the left and right halves of Figure 1, respectively.
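
The auto-regressive loop described above can be sketched in a few lines of plain Python. This is only a toy: `greedy_decode` and `copy_decoder` are hypothetical names, and the "decoder" is a stand-in function that copies the encoded sequence rather than a trained network.

```python
def greedy_decode(next_symbol, z, start, end, max_len=10):
    """Generate symbols one at a time, feeding the prefix back in at each step."""
    output = [start]
    for _ in range(max_len):
        symbol = next_symbol(z, output)   # depends on the encoder output and the prefix so far
        output.append(symbol)
        if symbol == end:
            break
    return output[1:]

# Hypothetical stand-in for a decoder: copies the encoded sequence, then emits "<eos>".
def copy_decoder(z, prefix):
    position = len(prefix) - 1            # number of symbols generated so far
    return z[position] if position < len(z) else "<eos>"

decoded = greedy_decode(copy_decoder, z=["a", "b", "c"], start="<bos>", end="<eos>")
print(decoded)  # ['a', 'b', 'c', '<eos>']
```

The key point is that each call to `next_symbol` sees everything generated so far, which is what "consuming the previously generated symbols as additional input" means in practice.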

In [None]:
display.Image("../input/transformer-architecture/Transformer.png")

---

### Encoder and Decoder

**Encoder:**   
- The encoder is composed of a stack of N = 6 identical layers. Each layer has two sub-layers. The first is a multi-head self-attention mechanism, and the second is a simple, positionwise fully connected feed-forward network. 

- A residual connection is employed around each of the two sub-layers, followed by layer normalization. That is, the output of each sub-layer is  
`LayerNorm(x + Sublayer(x))`,   
where `Sublayer(x)` is the function implemented by the sub-layer itself. 

- To facilitate these residual connections, all sub-layers in the model, as well as the embedding layers, produce outputs of dimension `d_model = 512`.
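
A minimal NumPy sketch of the `LayerNorm(x + Sublayer(x))` pattern is shown here. It is simplified on purpose: the layer normalization omits the learnable gain and bias, and the sub-layer is a hypothetical random linear map standing in for attention or the feed-forward network.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    """Normalize each position's feature vector to zero mean and unit variance."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def residual_block(x, sublayer):
    """Post-norm residual wrapper: LayerNorm(x + Sublayer(x))."""
    return layer_norm(x + sublayer(x))

rng = np.random.default_rng(0)
d_model = 512
x = rng.normal(size=(10, d_model))               # 10 positions, d_model features each
W = rng.normal(size=(d_model, d_model)) * 0.01   # hypothetical stand-in for a sub-layer

out = residual_block(x, lambda h: h @ W)
print(out.shape)  # (10, 512)
```

Because the input and the sub-layer output are added element-wise, both must have the same width, which is why every sub-layer keeps the dimension at `d_model`.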

--- 
**Decoder:**   
- The decoder is also composed of a stack of N = 6 identical layers. In addition to the two sub-layers in each encoder layer, the decoder inserts a third sub-layer, which performs multi-head attention over the output of the encoder stack. 
- Similar to the encoder, residual connections are employed around each of the sub-layers, followed by layer normalization. 
- The self-attention sub-layer is modified in the decoder stack to prevent positions from attending to subsequent positions. 
- This masking, combined with the fact that the output embeddings are offset by one position, ensures that the predictions for position i can depend only on the known outputs at positions less than i.
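
The masking can be illustrated with a small NumPy sketch. It builds an additive mask that is 0 for allowed positions and −∞ for later ones, then applies a softmax to uniform scores. The helper names `causal_mask` and `softmax` are chosen here for illustration; the mask lets each position attend to itself and to earlier positions.

```python
import numpy as np

def causal_mask(size):
    """Additive mask: 0 where attention is allowed, -inf for later positions."""
    mask = np.zeros((size, size))
    mask[np.triu_indices(size, k=1)] = -np.inf
    return mask

def softmax(x):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

scores = np.zeros((4, 4))                   # equal raw scores for every pair of positions
weights = softmax(scores + causal_mask(4))
print(weights.round(2))
```

In the printed matrix, row `i` has non-zero weights only in columns `0..i`, so no position receives information from a later one.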
---

## Attention

An attention function can be described as mapping a query and a set of key-value pairs to an output, where the query, keys, values, and output are all vectors. The output is computed as a weighted sum of the values, where the weight assigned to each value is computed by a compatibility function of the query with the corresponding key.
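
As a sketch of this idea, the snippet below implements single-head scaled dot-product attention in NumPy, where the compatibility function is the dot product of a query and a key scaled by the square root of the key dimension. The shapes and random inputs are arbitrary choices for illustration.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Return the attention output and weights for queries Q, keys K, values V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                          # compatibility of each query with each key
    scores = scores - scores.max(axis=-1, keepdims=True)     # shift for numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)  # softmax over the keys
    return weights @ V, weights                              # weighted sum of the values

rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 8))    # 3 queries of dimension 8
K = rng.normal(size=(5, 8))    # 5 keys of dimension 8
V = rng.normal(size=(5, 16))   # 5 values of dimension 16

output, weights = scaled_dot_product_attention(Q, K, V)
print(output.shape, weights.shape)  # (3, 16) (3, 5)
```

Each row of `weights` sums to 1, so every output vector is a convex combination of the value vectors.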

In [None]:
display.Image("../input/transformer-architecture/Attention.png")

### Applications of Attention in the Model
The Transformer uses multi-head attention in three different ways:  

- In **encoder-decoder attention** layers, the queries come from the previous decoder layer, and the memory keys and values come from the output of the encoder. This allows every position in the decoder to attend over all positions in the input sequence. This mimics the typical encoder-decoder attention mechanisms in sequence-to-sequence models.  

- The encoder contains self-attention layers. In a self-attention layer all of the keys, values and queries come from the same place, in this case, the output of the previous layer in the encoder. Each position in the encoder can attend to all positions in the previous layer of the encoder.  

- Similarly, self-attention layers in the decoder allow each position in the decoder to attend to all positions in the decoder up to and including that position. We need to prevent leftward information flow in the decoder to preserve the auto-regressive property. This is implemented inside of scaled dot-product attention by masking out (setting to −∞) all values in the input of the softmax which correspond to illegal connections.

---

<a id="transformer-model-implementation"></a>
<div class="list-group" id="list-tab" role="tablist">
<h2 class="list-group-item list-group-item-action active" data-toggle="list" style='background:orange; border:0; color:white' role="tab" aria-controls="home"><center>Transformer Model Implementation</center></h2>

In [None]:
# Initialise wandb
wandb.init(project='google-brain', config=WANDB_CONFIG)

For the implementation I have referred to [this](https://www.kaggle.com/takamichitoda/ventilator-train-transformer) notebook by Takamichi Toda.

## PyTorch Dataset Class

In [None]:
class VentilatorDataset(Dataset):
    
    def __init__(self, df):
        self.dfs = [_df for _, _df in df.groupby("breath_id")]
        
    def __len__(self):
        return len(self.dfs)
    
    def __getitem__(self, item):
        df = self.dfs[item]
        
        X = df[['R_cate', 'C_cate', 'u_in', 'u_out']].values
        y = df['pressure'].values
        d = {
            "X": torch.tensor(X).float(),
            "y": torch.tensor(y).float(),
        }
        return d

## Model

In [None]:
class PositionalEncoding(nn.Module):

    def __init__(self, d_model: int, dropout: float = 0.1, max_len: int = 5000):
        super().__init__()
        self.dropout = nn.Dropout(p=dropout)

        position = torch.arange(max_len).unsqueeze(1)
        div_term = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))
        pe = torch.zeros(max_len, 1, d_model)
        pe[:, 0, 0::2] = torch.sin(position * div_term)
        pe[:, 0, 1::2] = torch.cos(position * div_term)
        self.register_buffer('pe', pe)

    def forward(self, x):
        """
        Args:
            x: Tensor, shape [seq_len, batch_size, embedding_dim]
        """
        x = x + self.pe[:x.size(0)]
        return self.dropout(x)
    
class VentilatorModel(nn.Module):
    
    def __init__(self):
        super(VentilatorModel, self).__init__()
        # This embedding method from: https://www.kaggle.com/theoviel/deep-learning-starter-simple-lstm
        self.r_emb = nn.Embedding(3, 2, padding_idx=0)
        self.c_emb = nn.Embedding(3, 2, padding_idx=0)
        self.seq_emb = nn.Sequential(
            # Input width: 2 (R embedding) + 2 (C embedding) + 2 (u_in, u_out) = 6
            nn.Linear(6, config.HIDDEN_SIZE),
            nn.LayerNorm(config.HIDDEN_SIZE),
            nn.ReLU(),
            nn.Dropout(0.2),
        )
        self.pos_encoder = PositionalEncoding(d_model=config.HIDDEN_SIZE, dropout=0.2)
        encoder_layers = nn.TransformerEncoderLayer(d_model=config.HIDDEN_SIZE, nhead=8, dim_feedforward=2048, dropout=0.2, )
        self.transformer_encoder = nn.TransformerEncoder(encoder_layers, num_layers=2)
        self.head = nn.Linear(config.HIDDEN_SIZE, 1)
        
        # Encoder
        initrange = 0.1
        self.r_emb.weight.data.uniform_(-initrange, initrange)
        self.c_emb.weight.data.uniform_(-initrange, initrange)

    def forward(self, X, y=None):
        bs = X.shape[0]
        r_emb = self.r_emb(X[:,:,0].long()).view(bs, 80, -1)
        c_emb = self.c_emb(X[:,:,1].long()).view(bs, 80, -1)
        seq_x = torch.cat((r_emb, c_emb, X[:, :, 2:]), 2)
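        h = self.seq_emb(seq_x)                  # [batch, seq_len, hidden]
        # PositionalEncoding and nn.TransformerEncoder (batch_first=False by default)
        # expect [seq_len, batch, hidden], so move the time axis to the front.
        h = h.permute(1, 0, 2)
        h = self.pos_encoder(h)
        h = self.transformer_encoder(h)
        h = h.permute(1, 0, 2)                   # back to [batch, seq_len, hidden]
        regr = self.head(h)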
        
        if y is None:
            loss = None
        else:
            loss = self.loss_fn(regr.squeeze(2), y)
            
        return regr, loss
    
    def loss_fn(self, y_pred, y_true):
        loss = nn.L1Loss()(y_pred, y_true)
        return loss

## Training, Test and Validation Loops

In [None]:
def train_loop(model, optimizer, loader):
    losses, lrs = [], []
    model.train()
    optimizer.zero_grad()
    for d in loader:
        out, loss = model(d['X'].to(device), d['y'].to(device))
        
        losses.append(loss.item())
        step_lr = np.array([param_group["lr"] for param_group in optimizer.param_groups]).mean()
        lrs.append(step_lr)
        
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

    return np.array(losses).mean(), np.array(lrs).mean()

def valid_loop(model, loader):
    losses, predicts = [], []
    model.eval()
    for d in loader:
        with torch.no_grad():
            out, loss = model(d['X'].to(device), d['y'].to(device))
        losses.append(loss.item())
        predicts.append(out.cpu())

    return np.array(losses).mean(), torch.vstack(predicts).squeeze(2).numpy().reshape(-1)

def test_loop(model, loader):
    predicts = []
    model.eval()
    for d in loader:
        with torch.no_grad():
            out, _ = model(d['X'].to(device))
        predicts.append(out.cpu())

    return torch.vstack(predicts).squeeze(2).numpy().reshape(-1)

## Lamb Optimizer

In [None]:
from torch.optim.optimizer import Optimizer
class Lamb(Optimizer):
    # Reference code: https://github.com/cybertronai/pytorch-lamb

    def __init__(
        self,
        params,
        lr: float = 1e-3,
        betas = (0.9, 0.999),
        eps: float = 1e-6,
        weight_decay: float = 0,
        clamp_value: float = 10,
        adam: bool = False,
        debias: bool = False,
    ):
        if lr <= 0.0:
            raise ValueError('Invalid learning rate: {}'.format(lr))
        if eps < 0.0:
            raise ValueError('Invalid epsilon value: {}'.format(eps))
        if not 0.0 <= betas[0] < 1.0:
            raise ValueError(
                'Invalid beta parameter at index 0: {}'.format(betas[0])
            )
        if not 0.0 <= betas[1] < 1.0:
            raise ValueError(
                'Invalid beta parameter at index 1: {}'.format(betas[1])
            )
        if weight_decay < 0:
            raise ValueError(
                'Invalid weight_decay value: {}'.format(weight_decay)
            )
        if clamp_value < 0.0:
            raise ValueError('Invalid clamp value: {}'.format(clamp_value))

        defaults = dict(lr=lr, betas=betas, eps=eps, weight_decay=weight_decay)
        self.clamp_value = clamp_value
        self.adam = adam
        self.debias = debias

        super(Lamb, self).__init__(params, defaults)

    def step(self, closure = None):
        loss = None
        if closure is not None:
            loss = closure()

        for group in self.param_groups:
            for p in group['params']:
                if p.grad is None:
                    continue
                grad = p.grad.data
                if grad.is_sparse:
                    msg = (
                        'Lamb does not support sparse gradients, '
                        'please consider SparseAdam instead'
                    )
                    raise RuntimeError(msg)

                state = self.state[p]

                # State initialization
                if len(state) == 0:
                    state['step'] = 0
                    # Exponential moving average of gradient values
                    state['exp_avg'] = torch.zeros_like(
                        p, memory_format=torch.preserve_format
                    )
                    # Exponential moving average of squared gradient values
                    state['exp_avg_sq'] = torch.zeros_like(
                        p, memory_format=torch.preserve_format
                    )

                exp_avg, exp_avg_sq = state['exp_avg'], state['exp_avg_sq']
                beta1, beta2 = group['betas']

                state['step'] += 1

                # Decay the first and second moment running average coefficient
                # m_t
                exp_avg.mul_(beta1).add_(grad, alpha=1 - beta1)
                # v_t
                exp_avg_sq.mul_(beta2).addcmul_(grad, grad, value=1 - beta2)

                # Paper v3 does not use debiasing.
                if self.debias:
                    bias_correction = math.sqrt(1 - beta2 ** state['step'])
                    bias_correction /= 1 - beta1 ** state['step']
                else:
                    bias_correction = 1

                # Apply bias to lr to avoid broadcast.
                step_size = group['lr'] * bias_correction

                weight_norm = torch.norm(p.data).clamp(0, self.clamp_value)

                adam_step = exp_avg / exp_avg_sq.sqrt().add(group['eps'])
                if group['weight_decay'] != 0:
                    adam_step.add_(p.data, alpha=group['weight_decay'])

                adam_norm = torch.norm(adam_step)
                if weight_norm == 0 or adam_norm == 0:
                    trust_ratio = 1
                else:
                    trust_ratio = weight_norm / adam_norm
                state['weight_norm'] = weight_norm
                state['adam_norm'] = adam_norm
                state['trust_ratio'] = trust_ratio
                if self.adam:
                    trust_ratio = 1

                p.data.add_(adam_step, alpha=-step_size * trust_ratio)

        return loss

In [None]:
def main():
    train_df = pd.read_csv(f"{config.INPUT}/train.csv")
    test_df = pd.read_csv(f"{config.INPUT}/test.csv")
    sub_df = pd.read_csv(f"{config.INPUT}/sample_submission.csv")
    oof = np.zeros(len(train_df))
    test_preds_lst = []

    gkf = GroupKFold(n_splits=config.N_FOLD).split(train_df, train_df.pressure, groups=train_df.breath_id)
    for fold, (_, valid_idx) in enumerate(gkf):
        train_df.loc[valid_idx, 'fold'] = fold

    train_df['C_cate'] = train_df['C'].map({10: 0, 20: 1, 50:2})
    train_df['R_cate'] = train_df['R'].map({5: 0, 20: 1, 50:2})
    test_df['C_cate'] = test_df['C'].map({10: 0, 20: 1, 50:2})
    test_df['R_cate'] = test_df['R'].map({5: 0, 20: 1, 50:2})

    test_df['pressure'] = -1
    test_dset = VentilatorDataset(test_df)
    test_loader = DataLoader(test_dset, batch_size=config.BS,
                             pin_memory=True, shuffle=False, drop_last=False, num_workers=os.cpu_count())
    
    for fold in range(config.N_FOLD):
        # Skip the folds listed in config.SKIP_FOLDS
        if fold in config.SKIP_FOLDS:
            continue
        print(f'Fold-{fold}')
        train_dset = VentilatorDataset(train_df.query(f"fold!={fold}"))
        valid_dset = VentilatorDataset(train_df.query(f"fold=={fold}"))

        set_seed()
        train_loader = DataLoader(train_dset, batch_size=config.BS,
                                  pin_memory=True, shuffle=True, drop_last=True, num_workers=os.cpu_count(),
                                  worker_init_fn=lambda x: set_seed())
        valid_loader = DataLoader(valid_dset, batch_size=config.BS,
                                  pin_memory=True, shuffle=False, drop_last=False, num_workers=os.cpu_count())

        model = VentilatorModel()
        model.to(device)

        optimizer = torch.optim.AdamW(model.parameters(), lr=config.LR, weight_decay=config.WEIGHT_DECAY)

        unique_exp_name = f"{config.EXP_NAME}_f{fold}"

        # Record the fold and the hyperparameters in the active W&B run
        run_config = {"fold": fold}
        for k, v in dict(vars(config)).items():
            if k[:2] == "__" or k in config.NOT_WATCH_PARAM:
                continue
            run_config[k] = v
        wandb.config.update(run_config, allow_val_change=True)
        wandb.watch(model)
        
        os.makedirs(f'{config.OUTPUT}/{config.EXP_NAME}', exist_ok=True)
        model_path = f"{config.OUTPUT}/{config.EXP_NAME}/ventilator_f{fold}_best_model.bin"
        
        valid_best_loss = float('inf')
        for epoch in tqdm(range(config.N_EPOCHS)):

            train_loss, lrs = train_loop(model, optimizer, train_loader)
            valid_loss, valid_predict = valid_loop(model, valid_loader)
            valid_score = np.abs(valid_predict - train_df.query(f"fold=={fold}")['pressure'].values).mean()

            if valid_loss < valid_best_loss:
                valid_best_loss = valid_loss
                torch.save(model.state_dict(), model_path)
                oof[train_df.query(f"fold=={fold}").index.values] = valid_predict

            wandb.log({
                "train_loss": train_loss,
                "valid_loss": valid_loss,
                "valid_best_loss": valid_best_loss,
                "valid_score": valid_score,
                "learning_rate": lrs,
            })
            
            torch.cuda.empty_cache()
            gc.collect()
        
        model.load_state_dict(torch.load(model_path))
        test_preds = test_loop(model, test_loader)
        test_preds_lst.append(test_preds)
        
        sub_df['pressure'] = test_preds
        sub_df.to_csv(f"{config.OUTPUT}/{config.EXP_NAME}/sub_f{fold}.csv", index=None)
        
    train_df['oof'] = oof
    train_df.to_csv(f"{config.OUTPUT}/{config.EXP_NAME}/oof.csv", index=None)
    
    if len(config.SKIP_FOLDS) == 0:
        sub_df['pressure'] = np.stack(test_preds_lst).mean(0)
        sub_df.to_csv(f"{config.OUTPUT}/{config.EXP_NAME}/submission.csv", index=None)
    
        cv_score = (train_df['oof'] - train_df['pressure']).abs().mean()
        print("CV:", cv_score)

In [None]:
if __name__ == "__main__":
    main()

In [None]:
wandb.finish()

<a id="references"></a>
<div class="list-group" id="list-tab" role="tablist">
<h2 class="list-group-item list-group-item-action active" data-toggle="list" style='background:orange; border:0; color:white' role="tab" aria-controls="home"><center>References</center></h2>

- [AgingCare](https://www.agingcare.com/articles/ventilators-can-help-your-elderly-parent-breath-easier-136879.htm)
- [Ventilator Pressure Prediction: EDA, FE and models](https://www.kaggle.com/artgor/ventilator-pressure-prediction-eda-fe-and-models)
- [Simple EDA Beginner](https://www.kaggle.com/zhaodianwen/simple-eda-beginner)
- [ventilator pressure prediction: EDA](https://www.kaggle.com/bibhash123/ventilator-pressure-prediction-eda)

<h1><center>More Plots and Models coming soon!</center></h1>
                                                      
<center><img src = "https://static.wixstatic.com/media/5f8fae_7581e21a24a1483085024f88b0949a9d~mv2.jpg/v1/fill/w_934,h_379,al_c,q_90/5f8fae_7581e21a24a1483085024f88b0949a9d~mv2.jpg" width = "750" height = "500"/></center> 

### Reach Out to me on [LinkedIn](https://www.linkedin.com/in/ishandutta0098)