# Introduction
Mechanical ventilation is a clinician-intensive procedure that was prominently on display during the early days of the COVID-19 pandemic. Developing new methods for controlling mechanical ventilators is prohibitively expensive, even before reaching clinical trials. High-quality simulators could reduce this barrier.
Current simulators are trained as an ensemble, where each model simulates a single lung setting. However, lungs and their attributes form a continuous space, so a parametric approach that accounts for the differences between patients' lungs needs to be explored.
Here we simulate a ventilator connected to a sedated patient's lung.

# Loading our Dataset

The competition dataset is located in "/kaggle/input", the path Kaggle defines for accessing the competition files. We pick two files from this path as input files: the training file and the test file.

In [1]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import os
# Walk the Kaggle input directory and remember the training and test file paths
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        path = os.path.join(dirname, filename)
        if 'train' in path:
            __training_path = path
        elif 'test' in path:
            __test_path = path

# Input Dataset

In [2]:
#loaded files
print(f'Training path:{__training_path}\nTest path:{__test_path}')

Training path:/kaggle/input/ventilator-pressure-prediction/train.csv
Test path:/kaggle/input/ventilator-pressure-prediction/test.csv
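
The substring test in cell 1 keeps whichever matching path is visited last, so any other file or directory whose path contains "train" or "test" could silently replace the intended file. A minimal sketch of a stricter lookup, assuming the files are named exactly `train.csv` and `test.csv` as the output above shows; `find_csv` is a hypothetical helper and not part of the notebook:

```python
from pathlib import Path


def find_csv(root, name):
    """Return the single file called `name` under `root`, or raise."""
    # rglob searches `root` recursively for entries with exactly this name.
    matches = sorted(p for p in Path(root).rglob(name) if p.is_file())
    if len(matches) != 1:
        raise FileNotFoundError(
            f"expected exactly one {name!r} under {root}, found {len(matches)}"
        )
    return matches[0]


# In the notebook this could be called as, for example:
# training_path = find_csv('/kaggle/input', 'train.csv')
# test_path = find_csv('/kaggle/input', 'test.csv')
```

Raising when zero or several candidates exist makes an unexpected directory layout visible immediately instead of producing a wrong path.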


In [3]:
# Kaggle Environment Preparation
# Update the Kaggle environment
import sys
# You may need to update the environment so that the whole notebook runs
!{sys.executable} -m pip install --upgrade scikit-learn=="0.24.2"

Collecting scikit-learn==0.24.2
  Downloading scikit_learn-0.24.2-cp37-cp37m-manylinux2010_x86_64.whl (22.3 MB)
     |████████████████████████████████| 22.3 MB 185 kB/s             
Installing collected packages: scikit-learn
  Attempting uninstall: scikit-learn
    Found existing installation: scikit-learn 0.23.2
    Uninstalling scikit-learn-0.23.2:
      Successfully uninstalled scikit-learn-0.23.2
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
pdpbox 0.2.1 requires matplotlib==3.1.1, but you have matplotlib 3.5.0 which is incompatible.
hypertools 0.7.0 requires scikit-learn!=0.22,<0.24,>=0.19.1, but you have scikit-learn 0.24.2 which is incompatible.
Successfully installed scikit-learn-0.24.2


In [4]:
#record this information if we have to run the Kernel internally
import sklearn; sklearn.show_versions()


System:
    python: 3.7.12 | packaged by conda-forge | (default, Oct 26 2021, 06:08:53)  [GCC 9.4.0]
executable: /opt/conda/bin/python
   machine: Linux-5.10.68+-x86_64-with-debian-bullseye-sid

Python dependencies:
          pip: 21.3.1
   setuptools: 59.1.1
      sklearn: 0.24.2
        numpy: 1.19.5
        scipy: 1.7.2
       Cython: 0.29.24
       pandas: 1.3.4
   matplotlib: 3.5.0
       joblib: 1.1.0
threadpoolctl: 3.0.0

Built with OpenMP: True


# Input Dataset

In [5]:
def __load__data(__training_path, __test_path, concat=False):
    """Load the training and test CSV files.

    params: __training_path: path of the training dataset
    params: __test_path: path of the test dataset
    params: concat: currently unused; the two datasets are always returned separately
    returns: the training dataset and the test dataset as two DataFrames
    """
    # LOAD DATA
    import pandas as pd
    __train_dataset = pd.read_csv(__training_path, delimiter=',')
    __test_dataset = pd.read_csv(__test_path, delimiter=',')
    return __train_dataset, __test_dataset

__train_dataset, __test_dataset = __load__data(__training_path, __test_path)
__train_dataset.head()

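|   | id | breath_id | R | C | time_step | u_in | u_out | pressure |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | 20 | 50 | 0.0 | 0.083334 | 0 | 5.837492 |
| 1 | 2 | 1 | 20 | 50 | 0.033652 | 18.383041 | 0 | 5.907794 |
| 2 | 3 | 1 | 20 | 50 | 0.067514 | 22.509278 | 0 | 7.876254 |
| 3 | 4 | 1 | 20 | 50 | 0.101542 | 22.808822 | 0 | 11.742872 |
| 4 | 5 | 1 | 20 | 50 | 0.135756 | 25.35585 | 0 | 12.234987 |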
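
The preview shows one row per time step, with `breath_id` tying the rows of one breath together and `R` and `C` identical across the rows shown. A minimal sketch of how that grouping could be checked, using only the five printed rows as toy data; on the real data `preview` would be replaced by `__train_dataset`:

```python
import pandas as pd

# Toy frame copied from the five rows shown in the preview above.
preview = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'breath_id': [1, 1, 1, 1, 1],
    'R': [20, 20, 20, 20, 20],
    'C': [50, 50, 50, 50, 50],
    'time_step': [0.0, 0.033652, 0.067514, 0.101542, 0.135756],
    'u_in': [0.083334, 18.383041, 22.509278, 22.808822, 25.35585],
    'u_out': [0, 0, 0, 0, 0],
    'pressure': [5.837492, 5.907794, 7.876254, 11.742872, 12.234987],
})

# Number of rows recorded for each breath.
rows_per_breath = preview.groupby('breath_id').size()

# True only if every breath has a single R value and a single C value.
rc_counts = preview.groupby('breath_id')[['R', 'C']].nunique()
constant_rc = bool((rc_counts == 1).all().all())

print(rows_per_breath)
print('R and C constant within each breath:', constant_rc)
```

Whether these properties hold for the full training file is something the same two statements would reveal; the toy frame only demonstrates the calls.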


In [6]:
#Store the columns relevant to submission
__test_dataset_submission_columns = __test_dataset['id']

### Discard Irrelevant Columns
The input dataset has <b>1</b> column that can be removed: *id*. It is a row identifier rather than a measurement, and its test values were already stored above for the submission file.

In [7]:
#Remove unnecessary columns
__train_dataset.drop(['id'], axis=1, inplace=True)
__test_dataset.drop(['id'], axis=1, inplace=True)

### Target Column
The target column holds the value we need to predict.
Therefore, we need to separate the target column from the features before training.

If we don't drop this field, the model will score very well on the training data and very poorly on the test data, because the value is not available in the test dataset.
Here is the *target column*: <b>pressure</b>

In [8]:
#detach the target column
__feature_train = __train_dataset.drop(['pressure'], axis=1)
__target_train = __train_dataset['pressure']
__feature_test = __test_dataset
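
A model can only predict on the test set if the training features and the test features have the same columns in the same order, and the target must no longer be among them. A minimal sketch of that check on toy frames; `check_feature_alignment` is a hypothetical helper and the toy values are illustrative only:

```python
import pandas as pd


def check_feature_alignment(feature_train, feature_test, target_name):
    """Raise if the target is still a feature or the column lists differ."""
    if target_name in feature_train.columns:
        raise ValueError(f"target column {target_name!r} is still among the features")
    if list(feature_train.columns) != list(feature_test.columns):
        raise ValueError("training and test features have different columns")


# Toy stand-ins for the training and test datasets after 'id' was dropped.
toy_train = pd.DataFrame({'R': [20, 20], 'C': [50, 50], 'u_in': [0.1, 18.4],
                          'pressure': [5.8, 5.9]})
toy_test = pd.DataFrame({'R': [5, 5], 'C': [20, 20], 'u_in': [0.0, 7.5]})

toy_features = toy_train.drop(columns=['pressure'])
toy_target = toy_train['pressure']
check_feature_alignment(toy_features, toy_test, 'pressure')  # passes silently
```

In the notebook the same call might take `__feature_train`, `__feature_test` and `'pressure'`.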

# Training Model and Prediction
First, we train a model on the preprocessed training dataset.
Second, we use the trained model to predict the values for the test dataset.

## LightGBM Regressor
We will use the *LightGBM Regressor*, which builds a gradient boosting model, from the *lightgbm* package
(https://lightgbm.readthedocs.io/en/latest/pythonapi/lightgbm.LGBMRegressor.html).

In [9]:
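#Import libraries and the model
import numpy as np
from lightgbm import LGBMRegressor
# Fit a gradient boosting regressor with its default hyperparameters
__model = LGBMRegressor()
__model.fit(__feature_train, __target_train)
# Predict the pressure for every row of the test dataset
__y_pred = __model.predict(__feature_test)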
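
The cell above fits on all training rows, so it gives no estimate of how well the model generalizes. A minimal sketch of a hold-out check that keeps all rows of a breath on the same side of the split, assuming mean absolute error as the metric (an assumption; the notebook does not state one). `holdout_mae` and `MeanRegressor` are hypothetical names; `MeanRegressor` is a stand-in used only so the sketch runs without extra packages, and any estimator with `fit` and `predict`, such as `LGBMRegressor`, could be passed instead:

```python
import numpy as np
import pandas as pd


class MeanRegressor:
    """Hypothetical stand-in model: always predicts the training mean."""

    def fit(self, X, y):
        self.mean_ = float(np.mean(y))
        return self

    def predict(self, X):
        return np.full(len(X), self.mean_)


def holdout_mae(model, features, target, groups, holdout_fraction=0.2, seed=0):
    """Fit on some groups and return the mean absolute error on the rest."""
    groups = np.asarray(groups)
    rng = np.random.default_rng(seed)
    unique_groups = np.unique(groups)
    n_holdout = max(1, int(round(len(unique_groups) * holdout_fraction)))
    # Whole groups are held out, so no breath is split across both sides.
    holdout_groups = rng.choice(unique_groups, size=n_holdout, replace=False)
    is_holdout = np.isin(groups, holdout_groups)
    model.fit(features[~is_holdout], target[~is_holdout])
    predictions = model.predict(features[is_holdout])
    errors = np.abs(np.asarray(target[is_holdout]) - predictions)
    return float(np.mean(errors))


# Toy data: 10 breaths of 4 time steps each, with a made-up linear pressure.
toy = pd.DataFrame({
    'breath_id': np.repeat(np.arange(10), 4),
    'u_in': np.tile([0.0, 10.0, 20.0, 5.0], 10),
})
toy['pressure'] = 5.0 + 0.3 * toy['u_in']

mae = holdout_mae(MeanRegressor(), toy[['u_in']], toy['pressure'], toy['breath_id'])
print(f'hold-out MAE: {mae:.3f}')
```

In the notebook this might be called as `holdout_mae(LGBMRegressor(), __feature_train, __target_train, __feature_train['breath_id'])`, since `breath_id` remains among the features after only `id` is dropped.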

# Submission File
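The submission file must contain the *id* column of the test dataset together with the predicted target column. We build it here and save it as "kaggle_submission.csv", which will be submitted as our prediction results.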

In [10]:
submission = pd.DataFrame(columns=['id'], data=__test_dataset_submission_columns)
submission['pressure'] = __y_pred
submission.head()

|   | id | pressure |
|---|---|---|
| 0 | 1 | 6.298749 |
| 1 | 2 | 5.739619 |
| 2 | 3 | 6.890605 |
| 3 | 4 | 8.840034 |
| 4 | 5 | 10.24894 |


In [11]:
#save the submission file
submission.to_csv("kaggle_submission.csv", index=False)
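
Before uploading, it can help to confirm that the frame has exactly the expected columns, one row per test id, and no missing predictions. A minimal sketch on toy values; `validate_submission` is a hypothetical helper, and its checks reflect the `id` and `pressure` layout shown above rather than an official specification:

```python
import pandas as pd


def validate_submission(submission, expected_ids):
    """Raise ValueError if the submission frame looks malformed."""
    if list(submission.columns) != ['id', 'pressure']:
        raise ValueError(f"unexpected columns: {list(submission.columns)}")
    if list(submission['id']) != list(expected_ids):
        raise ValueError("ids do not match the test dataset")
    if submission['pressure'].isna().any():
        raise ValueError("some predictions are missing")


# Toy stand-in for the real submission frame.
toy_submission = pd.DataFrame({'id': [1, 2, 3], 'pressure': [6.3, 5.7, 6.9]})
validate_submission(toy_submission, [1, 2, 3])  # passes silently
```

In the notebook the call might be `validate_submission(submission, __test_dataset_submission_columns)`, placed just before the file is written.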