This notebook is inspired by recent notebooks from [Zhangxin](https://www.kaggle.com/tenffe/finetune-of-tensorflow-bidirectional-lstm) and [Chris Deotte](https://www.kaggle.com/cdeotte/ensemble-folds-with-median-0-153). Since it is important to discretize the output, I propose a custom TensorFlow layer that does this automatically. The optimization therefore happens under the constraint that the output must be bounded and discrete, just like the pressure values in the training data.

In [1]:
import numpy as np
import pandas as pd

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint
from tensorflow.keras.callbacks import LearningRateScheduler, ReduceLROnPlateau
from tensorflow.keras.optimizers.schedules import ExponentialDecay
from tensorflow.keras.callbacks import Callback
import tensorflow.keras.backend as K

from sklearn.metrics import mean_absolute_error as mae
from sklearn.preprocessing import RobustScaler, normalize
from sklearn.model_selection import train_test_split, GroupKFold, KFold

2021-10-04 22:00:12.649059: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/conda/lib
2021-10-04 22:00:12.649169: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.


In [2]:
DEBUG = False

train = pd.read_csv('../input/ventilator-pressure-prediction/train.csv')
test = pd.read_csv('../input/ventilator-pressure-prediction/test.csv')
submission = pd.read_csv('../input/ventilator-pressure-prediction/sample_submission.csv')

if DEBUG:
    train = train[:80*1000]

In [3]:
train

Unnamed: 0,id,breath_id,R,C,time_step,u_in,u_out,pressure
0,1,1,20,50,0.000000,0.083334,0,5.837492
1,2,1,20,50,0.033652,18.383041,0,5.907794
2,3,1,20,50,0.067514,22.509278,0,7.876254
3,4,1,20,50,0.101542,22.808822,0,11.742872
4,5,1,20,50,0.135756,25.355850,0,12.234987
...,...,...,...,...,...,...,...,...
6035995,6035996,125749,50,10,2.504603,1.489714,1,3.869032
6035996,6035997,125749,50,10,2.537961,1.488497,1,3.869032
6035997,6035998,125749,50,10,2.571408,1.558978,1,3.798729
6035998,6035999,125749,50,10,2.604744,1.272663,1,4.079938


In [4]:
all_pressure = sorted(train.pressure.unique())
PRESSURE_MIN = np.min(all_pressure)
PRESSURE_MAX = np.max(all_pressure)
PRESSURE_STEP = all_pressure[1] - all_pressure[0]
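
`PRESSURE_STEP` above is taken from the first two unique values, which implicitly assumes that all unique pressures are evenly spaced. A minimal sketch of how that assumption could be checked, using a small made-up array in place of `train.pressure` (the helper name `is_uniform_grid` and the toy values are hypothetical, not part of the original notebook):

```python
import numpy as np

def is_uniform_grid(values, rtol=1e-3):
    """Return True if the sorted unique values are (approximately) evenly spaced."""
    unique_sorted = np.unique(values)   # np.unique returns the values sorted
    gaps = np.diff(unique_sorted)       # spacing between neighbouring values
    return bool(np.allclose(gaps, gaps[0], rtol=rtol))

# Toy stand-in for train.pressure: a grid with step 0.5 and repeated values
toy_pressure = np.array([1.0, 0.0, 0.5, 1.5, 0.5, 2.0])
print(is_uniform_grid(toy_pressure))               # True
print(is_uniform_grid(np.array([0.0, 0.5, 1.5])))  # False
```

If a check like this failed on the real data, a single step size would not describe the target grid and the rescaling layer would need a different design.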

In [5]:
def add_features(df):
    df['area'] = df['time_step'] * df['u_in']
    df['area'] = df.groupby('breath_id')['area'].cumsum()
    
    df['u_in_cumsum'] = (df['u_in']).groupby(df['breath_id']).cumsum()
    
    df['u_in_lag1'] = df.groupby('breath_id')['u_in'].shift(1)
    df['u_out_lag1'] = df.groupby('breath_id')['u_out'].shift(1)
    df['u_in_lag_back1'] = df.groupby('breath_id')['u_in'].shift(-1)
    df['u_out_lag_back1'] = df.groupby('breath_id')['u_out'].shift(-1)
    
    df['u_in_lag2'] = df.groupby('breath_id')['u_in'].shift(2)
    df['u_out_lag2'] = df.groupby('breath_id')['u_out'].shift(2)
    df['u_in_lag_back2'] = df.groupby('breath_id')['u_in'].shift(-2)
    df['u_out_lag_back2'] = df.groupby('breath_id')['u_out'].shift(-2)
    
    df['u_in_lag3'] = df.groupby('breath_id')['u_in'].shift(3)
    df['u_out_lag3'] = df.groupby('breath_id')['u_out'].shift(3)
    df['u_in_lag_back3'] = df.groupby('breath_id')['u_in'].shift(-3)
    df['u_out_lag_back3'] = df.groupby('breath_id')['u_out'].shift(-3)
    
    df['u_in_lag4'] = df.groupby('breath_id')['u_in'].shift(4)
    df['u_out_lag4'] = df.groupby('breath_id')['u_out'].shift(4)
    df['u_in_lag_back4'] = df.groupby('breath_id')['u_in'].shift(-4)
    df['u_out_lag_back4'] = df.groupby('breath_id')['u_out'].shift(-4)
    df = df.fillna(0)
    
    df['breath_id__u_in__max'] = df.groupby(['breath_id'])['u_in'].transform('max')
    df['breath_id__u_out__max'] = df.groupby(['breath_id'])['u_out'].transform('max')
    
    df['u_in_diff1'] = df['u_in'] - df['u_in_lag1']
    df['u_out_diff1'] = df['u_out'] - df['u_out_lag1']
    df['u_in_diff2'] = df['u_in'] - df['u_in_lag2']
    df['u_out_diff2'] = df['u_out'] - df['u_out_lag2']
    
    df['breath_id__u_in__diffmax'] = df.groupby(['breath_id'])['u_in'].transform('max') - df['u_in']
    df['breath_id__u_in__diffmean'] = df.groupby(['breath_id'])['u_in'].transform('mean') - df['u_in']
    
    df['u_in_diff3'] = df['u_in'] - df['u_in_lag3']
    df['u_out_diff3'] = df['u_out'] - df['u_out_lag3']
    df['u_in_diff4'] = df['u_in'] - df['u_in_lag4']
    df['u_out_diff4'] = df['u_out'] - df['u_out_lag4']
    df['cross'] = df['u_in'] * df['u_out']
    df['cross2'] = df['time_step'] * df['u_out']
    
    df['R'] = df['R'].astype(str)
    df['C'] = df['C'].astype(str)
    df['R__C'] = df['R'] + '__' + df['C']
    df = pd.get_dummies(df)
    return df
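
The lag, lead, and cumulative features above are all computed per `breath_id`, so values never leak from one breath into the next. A small sketch on a made-up two-breath frame (the `toy` data is hypothetical) shows what `groupby(...).shift` and `cumsum` produce at the breath boundaries:

```python
import pandas as pd

# Two toy breaths of three time steps each
toy = pd.DataFrame({
    "breath_id": [1, 1, 1, 2, 2, 2],
    "u_in": [1.0, 2.0, 3.0, 10.0, 20.0, 30.0],
})

grouped = toy.groupby("breath_id")["u_in"]
toy["u_in_lag1"] = grouped.shift(1).fillna(0)        # previous step within the breath
toy["u_in_lag_back1"] = grouped.shift(-1).fillna(0)  # next step within the breath
toy["u_in_cumsum"] = grouped.cumsum()                # running total within the breath
print(toy)
```

The first row of each breath gets a lag of 0 and the last row a lead of 0, because the missing neighbours are filled with zeros, just as `df.fillna(0)` does in `add_features`.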

In [6]:
train = add_features(train)
test = add_features(test)

In [7]:
targets = train[['pressure']].to_numpy().reshape(-1, 80)
train.drop(['pressure', 'id', 'breath_id'], axis=1, inplace=True)
test = test.drop(['id', 'breath_id'], axis=1)

In [8]:
RS = RobustScaler()
train = RS.fit_transform(train)
test = RS.transform(test)

In [9]:
train = train.reshape(-1, 80, train.shape[-1]).astype(np.float32)
test = test.reshape(-1, 80, train.shape[-1]).astype(np.float32)
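
The reshape above relies on every breath contributing exactly 80 consecutive rows, so that each block of 80 rows becomes one sequence. A toy sketch with 2 breaths of 3 steps and 2 features (sizes made up for illustration):

```python
import numpy as np

n_breaths, n_steps, n_features = 2, 3, 2  # toy sizes; the notebook uses 80 steps
flat = np.arange(n_breaths * n_steps * n_features, dtype=np.float32)
flat = flat.reshape(n_breaths * n_steps, n_features)  # one row per time step

sequences = flat.reshape(-1, n_steps, n_features)
print(sequences.shape)  # (2, 3, 2)

# The first sequence is exactly the first n_steps rows of the flat table
print(np.array_equal(sequences[0], flat[:n_steps]))  # True
```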
targets = targets.astype(np.float32)

The following custom layer rescales the output so that it lands on the discrete steps found in the target values. This forces the network to learn to produce outputs that need no further post-processing.
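
As a rough illustration of the arithmetic involved, here is a NumPy-only sketch of the same rescale, round, and clip sequence. The function name `snap_to_grid` and the grid values (minimum 0.0, maximum 3.0, step 0.5) are made up for the example and are not the competition's actual values:

```python
import numpy as np

def snap_to_grid(values, v_min, v_max, step):
    """Move each value to the nearest grid point v_min + k * step, then clip."""
    steps = (values - v_min) / step            # fractional number of steps above v_min
    snapped = np.round(steps) * step + v_min   # nearest whole step, back in original units
    return np.clip(snapped, v_min, v_max)      # keep the result inside the allowed range

raw = np.array([-0.3, 0.26, 1.74, 9.0])
snapped = snap_to_grid(raw, v_min=0.0, v_max=3.0, step=0.5)
print(snapped)  # values 0.0, 0.5, 1.5, 3.0
```

The first and last inputs fall outside the range and are clipped to the bounds, while the middle two are moved to their nearest grid points.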

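Note the custom **round_with_gradients** function: the gradient of `tf.round` is zero almost everywhere, so using it directly would block backpropagation. The custom function rounds in the forward pass and passes the incoming gradient through unchanged in the backward pass.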

In [10]:
@tf.custom_gradient
def round_with_gradients(x):
    def grad(dy):
        return dy
    return tf.round(x), grad

class ScaleLayer(tf.keras.layers.Layer):
    def __init__(self):
        super(ScaleLayer, self).__init__()
        self.min = tf.constant(PRESSURE_MIN, dtype=np.float32)
        self.max = tf.constant(PRESSURE_MAX, dtype=np.float32)
        self.step = tf.constant(PRESSURE_STEP, dtype=np.float32)

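    def call(self, inputs):
        # Express the raw output as a (fractional) number of grid steps above the minimum
        steps = tf.math.divide(tf.math.add(inputs, -self.min), self.step)
        # Round to the nearest whole step while keeping gradients flowing
        int_steps = round_with_gradients(steps)
        # Map back to pressure units and keep the result within the observed range
        rescaled_steps = tf.math.add(tf.math.multiply(int_steps, self.step), self.min)
        clipped = tf.clip_by_value(rescaled_steps, self.min, self.max)
        return clipped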
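
`round_with_gradients` is an instance of what is commonly called a straight-through estimator. To see why passing the gradient through matters, here is a small NumPy-only sketch (not the notebook's TensorFlow code; `ste_descent` and its parameters are hypothetical) that minimises `|round(x) - target|` by hand. The true derivative of the rounded value with respect to `x` is zero almost everywhere, so plain gradient descent would never move `x`. Treating the rounding as the identity in the backward pass lets it reach the target:

```python
import numpy as np

def ste_descent(x0, target, lr=0.1, n_steps=100):
    """Minimise |round(x) - target| with a straight-through gradient."""
    x = float(x0)
    for _ in range(n_steps):
        residual = np.round(x) - target
        if residual == 0:
            break  # the rounded value already matches the target
        # d|r|/dr = sign(r); the rounding itself is treated as identity (dr/dx := 1)
        x -= lr * np.sign(residual)
    return x

x_final = ste_descent(x0=0.2, target=3.0)
print(np.round(x_final))  # 3.0
```

With the exact (zero) gradient of the rounding step, `x` would stay at its starting value of 0.2 forever.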

In [11]:
EPOCH = 300
BATCH_SIZE = 1024
NUM_FOLDS = 10

In [12]:
# detect and init the TPU
tpu = tf.distribute.cluster_resolver.TPUClusterResolver.connect()

# instantiate a distribution strategy
tpu_strategy = tf.distribute.experimental.TPUStrategy(tpu)

with tpu_strategy.scope():
    
    K = keras.backend

    def create_model():
        inputs = keras.layers.Input(shape=train.shape[-2:])
        x = inputs
        for units in [1024, 512, 256, 128]:
            x = keras.layers.Bidirectional(keras.layers.LSTM(units, return_sequences=True))(x)
        x = keras.layers.Dense(128, activation='selu')(x)
        outputs = keras.layers.Dense(1)(x)
        outputs = ScaleLayer()(outputs)
        
        model  = keras.Model(inputs=inputs, outputs=outputs)
        model.compile(optimizer="adam", loss='mae') 
        return model
    
    kf = KFold(n_splits=NUM_FOLDS, shuffle=True, random_state=1970)
    test_preds = []
    for fold, (train_idx, valid_idx) in enumerate(kf.split(train, targets)):
        print('-'*15, '>', f'Fold {fold+1}', '<', '-'*15)
        X_train, X_valid = train[train_idx], train[valid_idx]
        y_train, y_valid = targets[train_idx], targets[valid_idx]
        
        model = create_model()

        lr = ReduceLROnPlateau(monitor="val_loss", factor=0.5, patience=10, verbose=1)
        es = EarlyStopping(monitor="val_loss", patience=60, verbose=1, 
                           mode="min", restore_best_weights=True)
    
        checkpoint_filepath = f"folds{fold}.hdf5"
        sv = keras.callbacks.ModelCheckpoint(
            checkpoint_filepath, monitor='val_loss', verbose=1, save_best_only=True,
            save_weights_only=False, mode='auto', save_freq='epoch',
            options=None
        )

        model.fit(X_train, y_train, validation_data=(X_valid, y_valid), 
                  epochs=EPOCH, batch_size=BATCH_SIZE, callbacks=[lr, es, sv])
        
        # Flatten (n_breaths, 80, 1) predictions into one value per test row
        test_preds.append(model.predict(test, batch_size=BATCH_SIZE, verbose=2).reshape(-1))

2021-10-04 22:01:39.849151: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-10-04 22:01:39.852060: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/conda/lib
2021-10-04 22:01:39.852095: W tensorflow/stream_executor/cuda/cuda_driver.cc:326] failed call to cuInit: UNKNOWN ERROR (303)
2021-10-04 22:01:39.852119: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (f34fb7d148bb): /proc/driver/nvidia/version does not exist
2021-10-04 22:01:39.854459: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operation

--------------- > Fold 1 < ---------------


2021-10-04 22:02:00.471394: W tensorflow/core/framework/cpu_allocator_impl.cc:80] Allocation of 1086480000 exceeds 10% of free system memory.


Epoch 1/300

Epoch 00001: val_loss improved from inf to 1.24230, saving model to folds0.hdf5
Epoch 2/300

Epoch 00002: val_loss improved from 1.24230 to 0.80458, saving model to folds0.hdf5
Epoch 3/300

Epoch 00003: val_loss improved from 0.80458 to 0.61779, saving model to folds0.hdf5
Epoch 4/300

Epoch 00004: val_loss did not improve from 0.61779
Epoch 5/300

Epoch 00005: val_loss improved from 0.61779 to 0.46415, saving model to folds0.hdf5
Epoch 6/300

Epoch 00006: val_loss did not improve from 0.46415
Epoch 7/300

Epoch 00007: val_loss improved from 0.46415 to 0.40706, saving model to folds0.hdf5
Epoch 8/300

Epoch 00008: val_loss did not improve from 0.40706
Epoch 9/300

Epoch 00009: val_loss did not improve from 0.40706
Epoch 10/300

Epoch 00010: val_loss did not improve from 0.40706
Epoch 11/300

Epoch 00011: val_loss improved from 0.40706 to 0.35807, saving model to folds0.hdf5
Epoch 12/300

Epoch 00012: val_loss did not improve from 0.35807
Epoch 13/300

Epoch 00013: val_loss

2021-10-04 22:33:58.524710: W tensorflow/core/framework/cpu_allocator_impl.cc:80] Allocation of 1086480000 exceeds 10% of free system memory.


Epoch 1/300

Epoch 00001: val_loss improved from inf to 1.14949, saving model to folds1.hdf5
Epoch 2/300

Epoch 00002: val_loss improved from 1.14949 to 0.70061, saving model to folds1.hdf5
Epoch 3/300

Epoch 00003: val_loss improved from 0.70061 to 0.64095, saving model to folds1.hdf5
Epoch 4/300

Epoch 00004: val_loss improved from 0.64095 to 0.49377, saving model to folds1.hdf5
Epoch 5/300

Epoch 00005: val_loss improved from 0.49377 to 0.47728, saving model to folds1.hdf5
Epoch 6/300

Epoch 00006: val_loss improved from 0.47728 to 0.47429, saving model to folds1.hdf5
Epoch 7/300

Epoch 00007: val_loss improved from 0.47429 to 0.41622, saving model to folds1.hdf5
Epoch 8/300

Epoch 00008: val_loss improved from 0.41622 to 0.40848, saving model to folds1.hdf5
Epoch 9/300

Epoch 00009: val_loss improved from 0.40848 to 0.40022, saving model to folds1.hdf5
Epoch 10/300

Epoch 00010: val_loss improved from 0.40022 to 0.36897, saving model to folds1.hdf5
Epoch 11/300

Epoch 00011: val_lo

2021-10-04 23:06:05.460893: W tensorflow/core/framework/cpu_allocator_impl.cc:80] Allocation of 1086480000 exceeds 10% of free system memory.


Epoch 1/300

Epoch 00001: val_loss improved from inf to 1.12292, saving model to folds2.hdf5
Epoch 2/300

Epoch 00002: val_loss improved from 1.12292 to 0.71743, saving model to folds2.hdf5
Epoch 3/300

Epoch 00003: val_loss improved from 0.71743 to 0.58216, saving model to folds2.hdf5
Epoch 4/300

Epoch 00004: val_loss did not improve from 0.58216
Epoch 5/300

Epoch 00005: val_loss improved from 0.58216 to 0.51894, saving model to folds2.hdf5
Epoch 6/300

Epoch 00006: val_loss improved from 0.51894 to 0.49343, saving model to folds2.hdf5
Epoch 7/300

Epoch 00007: val_loss improved from 0.49343 to 0.43897, saving model to folds2.hdf5
Epoch 8/300

Epoch 00008: val_loss improved from 0.43897 to 0.40828, saving model to folds2.hdf5
Epoch 9/300

Epoch 00009: val_loss improved from 0.40828 to 0.39223, saving model to folds2.hdf5
Epoch 10/300

Epoch 00010: val_loss did not improve from 0.39223
Epoch 11/300

Epoch 00011: val_loss improved from 0.39223 to 0.35807, saving model to folds2.hdf5
E

2021-10-04 23:37:54.452645: W tensorflow/core/framework/cpu_allocator_impl.cc:80] Allocation of 1086480000 exceeds 10% of free system memory.


Epoch 1/300

Epoch 00001: val_loss improved from inf to 1.14545, saving model to folds3.hdf5
Epoch 2/300

Epoch 00002: val_loss improved from 1.14545 to 0.73264, saving model to folds3.hdf5
Epoch 3/300

Epoch 00003: val_loss improved from 0.73264 to 0.64152, saving model to folds3.hdf5
Epoch 4/300

Epoch 00004: val_loss improved from 0.64152 to 0.53062, saving model to folds3.hdf5
Epoch 5/300

Epoch 00005: val_loss improved from 0.53062 to 0.49903, saving model to folds3.hdf5
Epoch 6/300

Epoch 00006: val_loss improved from 0.49903 to 0.46188, saving model to folds3.hdf5
Epoch 7/300

Epoch 00007: val_loss improved from 0.46188 to 0.42188, saving model to folds3.hdf5
Epoch 8/300

Epoch 00008: val_loss improved from 0.42188 to 0.41782, saving model to folds3.hdf5
Epoch 9/300

Epoch 00009: val_loss did not improve from 0.41782
Epoch 10/300

Epoch 00010: val_loss improved from 0.41782 to 0.38447, saving model to folds3.hdf5
Epoch 11/300

Epoch 00011: val_loss improved from 0.38447 to 0.355

2021-10-05 00:14:04.352571: W tensorflow/core/framework/cpu_allocator_impl.cc:80] Allocation of 1086480000 exceeds 10% of free system memory.


Epoch 1/300

Epoch 00001: val_loss improved from inf to 1.50871, saving model to folds4.hdf5
Epoch 2/300

Epoch 00002: val_loss improved from 1.50871 to 0.75686, saving model to folds4.hdf5
Epoch 3/300

Epoch 00003: val_loss improved from 0.75686 to 0.61370, saving model to folds4.hdf5
Epoch 4/300

Epoch 00004: val_loss improved from 0.61370 to 0.57053, saving model to folds4.hdf5
Epoch 5/300

Epoch 00005: val_loss improved from 0.57053 to 0.55607, saving model to folds4.hdf5
Epoch 6/300

Epoch 00006: val_loss improved from 0.55607 to 0.46708, saving model to folds4.hdf5
Epoch 7/300

Epoch 00007: val_loss improved from 0.46708 to 0.41550, saving model to folds4.hdf5
Epoch 8/300

Epoch 00008: val_loss did not improve from 0.41550
Epoch 9/300

Epoch 00009: val_loss improved from 0.41550 to 0.38350, saving model to folds4.hdf5
Epoch 10/300

Epoch 00010: val_loss improved from 0.38350 to 0.37778, saving model to folds4.hdf5
Epoch 11/300

Epoch 00011: val_loss improved from 0.37778 to 0.373

In [13]:
submission["pressure"] = np.median(np.vstack(test_preds), axis=0)
submission.to_csv('submission.csv', index=False)
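
One caveat worth keeping in mind: each fold predicts on-grid values, but `np.median` over an even number of folds averages the two middle predictions, which can land between grid points. A hedged sketch with made-up numbers shows the effect and one possible way to snap the ensemble back onto the grid (`snap_to_grid` and the grid values are hypothetical, not part of the original notebook):

```python
import numpy as np

def snap_to_grid(values, v_min, v_max, step):
    """Move each value to the nearest grid point v_min + k * step, then clip."""
    snapped = np.round((values - v_min) / step) * step + v_min
    return np.clip(snapped, v_min, v_max)

# 4 toy folds, 2 test rows; every prediction lies on a grid with step 0.5
fold_preds = np.array([
    [1.0, 2.0],
    [1.5, 2.0],
    [1.5, 2.5],
    [2.0, 2.5],
])

median = np.median(fold_preds, axis=0)
print(median)  # second value is 2.25, which is not a multiple of 0.5

ensembled = snap_to_grid(median, v_min=0.0, v_max=3.0, step=0.5)
print(ensembled)  # both values are back on the grid
```

With an odd number of folds the median is always one of the fold predictions, so this extra step would not be needed.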

In [14]:
submission.head()

Unnamed: 0,id,pressure
0,1,6.259304
1,2,5.907794
2,3,7.173232
3,4,7.595045
4,5,9.141692
