## References

- https://www.kaggle.com/alexryzhkov/lightautoml-continuer
- https://lightautoml.readthedocs.io/en/latest/
- https://www.kaggle.com/kensit/improvement-base-on-tensor-bidirect-lstm-0-173

## LightAutoML installation

In [None]:
!pip install -U lightautoml

## Import libraries

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import torch
from lightautoml.automl.presets.tabular_presets import TabularAutoML
from lightautoml.tasks import Task

Here we set up the constants used in the kernel:

- `N_THREADS` - number of vCPUs for LightAutoML model creation
- `N_FOLDS` - number of folds in LightAutoML inner CV
- `RANDOM_STATE` - random seed for better reproducibility
- `TIMEOUT` - time limit in seconds for model training
- `TARGET_NAME` - target column name in dataset

In [None]:
N_THREADS = 4
N_FOLDS = 5
RANDOM_STATE = 42
TIMEOUT = 72000
TARGET_NAME = 'pressure'

In [None]:
# Seed NumPy for reproducibility
np.random.seed(RANDOM_STATE)
# Limit the number of CPU threads PyTorch may use
torch.set_num_threads(N_THREADS)

## Data loading

In [None]:
train = pd.read_csv('../input/ventilator-pressure-prediction/train.csv')
test = pd.read_csv('../input/ventilator-pressure-prediction/test.csv')
sample_sub = pd.read_csv('../input/ventilator-pressure-prediction/sample_submission.csv')

In [None]:
print(train.shape)
train.head()

In [None]:
print(test.shape)
test.head()

## Add a bidirectional LSTM feature

I copied the notebook [Improvement base on Tensor Bidirect LSTM (0.173)](https://www.kaggle.com/kensit/improvement-base-on-tensor-bidirect-lstm-0-173) and used its predictions as a bidirectional LSTM feature. My copy (score: 0.161) is [here](https://www.kaggle.com/tsano430/improvement-base-on-tensor-bidirect-lstm-0-173/data).

In [None]:
!cp -r ../input/googlebrainbilstm/bilstm_test.csv ./

In [None]:
train_bilstm = pd.read_csv('../input/googlebrainbilstm/bilstm_train.csv')
test_bilstm = pd.read_csv('../input/googlebrainbilstm/bilstm_test.csv')

train['bilstm_pred'] = train_bilstm['pressure']
test['bilstm_pred'] = test_bilstm['pressure']
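
The assignment above relies on the BiLSTM files having their rows in the same order as `train.csv` and `test.csv`. If the feature files also carry an `id` column (an assumption; their layout is not shown here), merging on `id` is a safer way to attach the feature. A minimal sketch on toy frames, where `train_toy` and `bilstm_toy` are made-up stand-ins:

```python
import pandas as pd

# Toy stand-ins for the competition data and the BiLSTM predictions.
# Assumption: the feature file has an `id` column matching the main file.
train_toy = pd.DataFrame({'id': [1, 2, 3], 'u_in': [0.1, 0.2, 0.3]})
bilstm_toy = pd.DataFrame({'id': [3, 1, 2], 'pressure': [30.0, 10.0, 20.0]})  # shuffled on purpose

feature = bilstm_toy.rename(columns={'pressure': 'bilstm_pred'})
# validate='one_to_one' raises if an id is duplicated on either side
merged = train_toy.merge(feature, on='id', how='left', validate='one_to_one')

# Every row should have received a prediction
assert merged['bilstm_pred'].notna().all()
print(merged)
```

A left merge keeps the row order of the left frame, so the result lines up with the original training rows even when the feature file is shuffled.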

In [None]:
del train_bilstm, test_bilstm

## LightAutoML model building

### Task setup

In the cell below we create a `Task` object, the class that defines which task the LightAutoML model should solve, with a specific loss and metric if necessary (more info can be found [here](https://lightautoml.readthedocs.io/en/latest/generated/lightautoml.tasks.base.Task.html#lightautoml.tasks.base.Task) in the LightAutoML documentation):

In [None]:
task = Task('reg', loss='mae', metric='mae')
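
Both the loss and the metric are set to MAE (mean absolute error). As a reminder of what is being minimized, here is a small NumPy sketch; the helper `mean_absolute_error` is written for illustration only and is not part of LightAutoML:

```python
import numpy as np

def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))

# Absolute errors are 1, 2 and 0, so the mean is (1 + 2 + 0) / 3
mae_example = mean_absolute_error([5.0, 10.0, 20.0], [6.0, 8.0, 20.0])
print(mae_example)
```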

### Feature roles setup

To solve the task, we need to set up column roles. The only role you must set is the target role; everything else (drop, numeric, categorical, group, weights etc.) is up to the user, since LightAutoML models detect column types automatically:

In [None]:
roles = {
    'drop': 'id',
    'group': 'breath_id', # for group k-fold
    'category': ['R', 'C'],
    'target': TARGET_NAME
}
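
The `group` role is there for group k-fold: all rows sharing a `breath_id` should land in the same fold, so time steps of one breath never appear on both the training and the validation side. The idea can be sketched as follows; `group_kfold_assign` is a hypothetical helper for illustration, not LightAutoML's internal splitter:

```python
import numpy as np

def group_kfold_assign(groups, n_folds, seed=42):
    """Return a fold index per row such that each group stays in one fold."""
    groups = np.asarray(groups)
    unique_groups = np.unique(groups)
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(unique_groups)
    # Deal the shuffled groups round-robin into folds
    fold_of_group = {g: i % n_folds for i, g in enumerate(shuffled)}
    return np.array([fold_of_group[g] for g in groups])

breath_ids = np.repeat(np.arange(10), 4)  # 10 toy breaths, 4 rows each
folds = group_kfold_assign(breath_ids, n_folds=5)

# No breath is split across folds
for b in np.unique(breath_ids):
    assert len(set(folds[breath_ids == b])) == 1
```

With a plain row-wise k-fold, neighbouring time steps of the same breath could end up in different folds, which tends to make the validation score look better than it should.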

### LightAutoML model creation - TabularAutoML preset

In the next cell we create the LightAutoML model with the `TabularAutoML` class, a preset with a default model structure like the one in the image below, in just several lines:

![LightAutoML model](https://raw.githubusercontent.com/sberbank-ai-lab/LightAutoML/master/imgs/tutorial_blackbox_pipeline.png "LightAutoML model")

Let's discuss the params we can set up:

- `task` - the type of the ML task (the only required parameter)
- `timeout` - time limit in seconds for model training
- `cpu_limit` - number of vCPUs for the model to use
- `reader_params` - parameter overrides for the Reader object inside the preset, which handles the first step of data preparation: automatic feature typing, preliminary filtering of almost-constant features, correct CV setup etc. Here we set `n_jobs` threads for the typing algorithm, `cv` folds, and `random_state` as the seed for the inner CV.
- `general_params` - we use the `use_algos` key to set the model structure to work with (linear, LightGBM and tuned LightGBM models on the first level and a weighted composition of them on the second). This setup is only there to speed up the kernel; you can remove `general_params` if you want the whole LightAutoML model to run.
- `tuning_params` - we use the `max_tuning_time` key to limit the time in seconds spent on hyperparameter tuning

In [None]:
%%time
automl = TabularAutoML(task=task, 
                       timeout=TIMEOUT,
                       cpu_limit=N_THREADS,
                       reader_params={'n_jobs': N_THREADS, 'cv': N_FOLDS, 'random_state': RANDOM_STATE},
                       general_params={'use_algos': [['linear_l2', 'lgb', 'lgb_tuned']]},
                       tuning_params={'max_tuning_time': 1800}
                      )
oof_pred = automl.fit_predict(train, roles=roles)
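
`fit_predict` returns out-of-fold (OOF) predictions for the training rows, which can be scored against the target to estimate the validation MAE. The sketch below uses a stand-in array in place of `oof_pred.data`, assuming it has the same single-column layout as `test_pred.data` in the prediction cell; with the real objects one would pass `train[TARGET_NAME].values` and `oof_pred.data[:, 0]`. The helper `oof_mae` is hypothetical:

```python
import numpy as np

def oof_mae(y_true, oof_column):
    """MAE over the rows that have a finite out-of-fold prediction."""
    y_true = np.asarray(y_true, dtype=float)
    oof_column = np.asarray(oof_column, dtype=float)
    valid = np.isfinite(oof_column)  # skip rows without an OOF prediction, if any
    return float(np.mean(np.abs(y_true[valid] - oof_column[valid])))

# Stand-in data: 4 rows, predictions stored as a single column
y_toy = np.array([10.0, 12.0, 14.0, 16.0])
oof_toy = np.array([[11.0], [12.0], [np.nan], [13.0]])

score = oof_mae(y_toy, oof_toy[:, 0])
print(f'OOF MAE: {score:.4f}')
```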

In [None]:
# Prediction
test_pred = automl.predict(test)
sample_sub[TARGET_NAME] = test_pred.data[:, 0]

## Create submission file

In [None]:
sample_sub.head()

In [None]:
sample_sub.to_csv('submission.csv', index=False)
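
Before uploading, it can be worth checking that the submission still matches the layout of the sample file. A minimal sketch on toy frames, assuming the sample submission has `id` and `pressure` columns; `check_submission` is a hypothetical helper:

```python
import numpy as np
import pandas as pd

def check_submission(sub, template, target='pressure'):
    """Raise AssertionError if `sub` does not match the template layout."""
    assert list(sub.columns) == list(template.columns), 'column mismatch'
    assert len(sub) == len(template), 'row count mismatch'
    assert sub['id'].equals(template['id']), 'id mismatch'
    assert np.isfinite(sub[target]).all(), 'non-finite predictions'
    return True

# Toy stand-ins for the sample submission and the filled-in submission
template_toy = pd.DataFrame({'id': [1, 2, 3], 'pressure': [0.0, 0.0, 0.0]})
sub_toy = template_toy.copy()
sub_toy['pressure'] = [6.3, 5.9, 7.1]

ok = check_submission(sub_toy, template_toy)
```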