# Enhanced Performance using Masked Input for a 3 Branched Bidirectional GRU neural net architecture

The following work is an attempt to create an architecture that performs better than the standard deep bidirectional LSTM/GRU architecture. The proposed architecture has 3 parallel deep bidirectional GRU networks. Each one works on its own input, and their outputs are concatenated and passed to a dense layer. In the experiments below, the proposed architecture performed better than the standard deep bidirectional GRU network. The 3 inputs are as follows:
* **Input 1** : 80 time step sequence with time steps [36 : ] masked, so only the first 36 steps are kept.
* **Input 2** : 80 time step sequence with time steps [ : 36] masked, so only steps 36 onward are kept.
* **Input 3** : 80 time step sequence with no masking, i.e. the standard input.
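
As a minimal sketch of what these three inputs look like, the NumPy snippet below builds them for a toy batch, following the `get_split_data` helper defined later in this notebook, and computes which time steps a `Masking(mask_value=0.)` layer would keep (Keras skips a time step when all of its features equal the mask value). The toy array and the names `kept_steps` and `split_at` are illustrative assumptions, not part of the original code.

```python
import numpy as np

def kept_steps(x, mask_value=0.0):
    """True where a time step has at least one feature != mask_value
    (the rule keras.layers.Masking uses to decide a step is kept)."""
    return np.any(x != mask_value, axis=-1)

# Toy batch: 2 sequences, 80 time steps, 5 features, no zeros anywhere.
X_toy = np.ones((2, 80, 5))
split_at = 36  # boundary used throughout this notebook

input_1 = np.zeros_like(X_toy)
input_1[:, :split_at, :] = X_toy[:, :split_at, :]  # keeps steps [:36]
input_2 = np.zeros_like(X_toy)
input_2[:, split_at:, :] = X_toy[:, split_at:, :]  # keeps steps [36:]
input_3 = X_toy                                     # no masking

print(kept_steps(input_1)[0].sum())  # 36
print(kept_steps(input_2)[0].sum())  # 44
print(kept_steps(input_3)[0].sum())  # 80
```

One thing to keep in mind with this approach: any real time step whose scaled features all happen to be exactly 0 would be masked as well.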

To speed up training, we will only use a small subset of the data (the first 2000 sequences in the code below) instead of all 75450 sequences.

In [None]:
import tensorflow as tf
from tensorflow import keras

from sklearn.model_selection import KFold
from sklearn.preprocessing import StandardScaler, RobustScaler

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [None]:
train = pd.read_csv('../input/ventilator-pressure-prediction/train.csv')
test = pd.read_csv('../input/ventilator-pressure-prediction/test.csv')

train.drop('id', axis=1, inplace=True)
test.drop('id', axis=1, inplace=True)

X = train.drop(['pressure', 'breath_id'], axis=1)
y = train.pressure.values
test = test.drop('breath_id', axis=1)


scaler = StandardScaler()

X = scaler.fit_transform(X)
X_test = scaler.transform(test)

X = X.reshape(75450, 80, 5)
y = y.reshape(75450, 80)

X_test = X_test.reshape(50300, 80, 5)
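
The counts 75450 and 50300 are hard-coded in the reshape calls above. As a hedged alternative, the sketch below lets NumPy infer the number of sequences with `-1` after checking that the row count is a multiple of 80. The helper name `to_sequences` and the toy array are assumptions for illustration. Like the reshape above, it assumes the rows are already ordered by breath and time step.

```python
import numpy as np

def to_sequences(flat, steps=80):
    """Reshape (n_rows, n_features) into (n_sequences, steps, n_features)."""
    n_rows, n_features = flat.shape
    if n_rows % steps != 0:
        raise ValueError(f"{n_rows} rows is not a multiple of {steps}")
    return flat.reshape(-1, steps, n_features)

# Toy stand-in for the scaled feature matrix: 3 sequences x 80 steps, 5 features.
flat_toy = np.arange(3 * 80 * 5, dtype=float).reshape(3 * 80, 5)
seq_toy = to_sequences(flat_toy)
print(seq_toy.shape)  # (3, 80, 5)
```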

## Bidirectional GRUs (model 1)


This is the standard bidirectional GRU model. It is trained as a benchmark.

In [None]:
def model():
   
    input_ = keras.layers.Input(shape=(80, 5))
    x = keras.layers.Bidirectional(keras.layers.GRU(1024, return_sequences=True))(input_)
    x = keras.layers.Bidirectional(keras.layers.GRU(512, return_sequences=True))(x)
    x = keras.layers.Bidirectional(keras.layers.GRU(256, return_sequences=True))(x)
    x = keras.layers.Bidirectional(keras.layers.GRU(256, return_sequences=True))(x)
    x = keras.layers.Bidirectional(keras.layers.GRU(128, return_sequences=True))(x)

    x = keras.layers.Dense(1000, activation='relu')(x)
    output_ = keras.layers.Dense(1)(x)

    model = keras.models.Model(inputs=input_, outputs=output_)
    
    return model

In [None]:
dnn_model = model()

dnn_model.compile(optimizer=keras.optimizers.Nadam(), loss='mae')

In [None]:
es = keras.callbacks.EarlyStopping(patience=30)
ls = keras.callbacks.ReduceLROnPlateau(patience=15, factor=0.7)

history_1 = dnn_model.fit(X[:2000], y[:2000], validation_split=0.3, batch_size=64, epochs=100, callbacks=[es, ls])
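
Note that `validation_split` in Keras takes the last fraction of the arrays passed to `fit`, before any shuffling, as the validation set. With 2000 sequences and `validation_split=0.3`, every model in this notebook is therefore validated on the same held-out sequences. The sketch below reproduces that index arithmetic in plain Python; `split_indices` is a hypothetical helper, not a Keras function.

```python
import math

def split_indices(n_samples, validation_split):
    """Mimic how Keras `validation_split` divides the data:
    the first part is used for training, the last part for validation."""
    split_at = int(math.floor(n_samples * (1.0 - validation_split)))
    train_idx = range(0, split_at)
    val_idx = range(split_at, n_samples)
    return train_idx, val_idx

train_idx, val_idx = split_indices(2000, 0.3)
print(len(train_idx), len(val_idx))  # 1400 600
print(val_idx[0], val_idx[-1])       # 1400 1999
```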

## 2 Branch Bidirectional GRU neural net architecture (model 2)

This architecture contains only the 2 masked branches. Its inputs are Input 1 and Input 2, as described at the beginning.

In [None]:
def get_split_data(X):
    data_A = np.zeros(X.shape)
    data_B = np.zeros(X.shape)
    
    data_A[:, :36, :] = X[:, :36, :]
    data_B[:, 36:, :] = X[:, 36:, :]
    
    return data_A, data_B

In [None]:
# Model 2 uses only the two masked inputs (Input 1 and Input 2).
input_A = keras.layers.Input(shape=(80, 5))
input_B = keras.layers.Input(shape=(80, 5))

input_A_masked = keras.layers.Masking(mask_value=0., input_shape=(80, 5))(input_A)
x_A = keras.layers.Bidirectional(keras.layers.GRU(512, return_sequences=True))(input_A_masked)
x_A = keras.layers.Bidirectional(keras.layers.GRU(256, return_sequences=True))(x_A)
x_A = keras.layers.Bidirectional(keras.layers.GRU(256, return_sequences=True))(x_A)
x_A = keras.layers.Bidirectional(keras.layers.GRU(128, return_sequences=True))(x_A)
x_A = keras.layers.Bidirectional(keras.layers.GRU(128, return_sequences=True))(x_A)
x_A = keras.layers.Bidirectional(keras.layers.GRU(128, return_sequences=True))(x_A)

input_B_masked = keras.layers.Masking(mask_value=0., input_shape=(80, 5))(input_B)
x_B = keras.layers.Bidirectional(keras.layers.GRU(512, return_sequences=True))(input_B_masked)
x_B = keras.layers.Bidirectional(keras.layers.GRU(256, return_sequences=True))(x_B)
x_B = keras.layers.Bidirectional(keras.layers.GRU(256, return_sequences=True))(x_B)
x_B = keras.layers.Bidirectional(keras.layers.GRU(128, return_sequences=True))(x_B)
x_B = keras.layers.Bidirectional(keras.layers.GRU(128, return_sequences=True))(x_B)
x_B = keras.layers.Bidirectional(keras.layers.GRU(128, return_sequences=True))(x_B)

x = keras.layers.Dense(1000, activation='relu')(tf.concat([x_A, x_B], axis=-1))

y_hat = keras.layers.Dense(1)(x)

model = keras.models.Model(inputs=[input_A, input_B], outputs=y_hat)

model.compile(optimizer=keras.optimizers.Nadam(), loss='mae')

data_A, data_B = get_split_data(X[:2000])
es = keras.callbacks.EarlyStopping(patience=30)
ls = keras.callbacks.ReduceLROnPlateau(patience=15, factor=0.7)

history_2 = model.fit(x=[data_A, data_B], y=y[:2000], validation_split=0.3, batch_size=64, epochs=100, callbacks=[es, ls])

## 3 Branch Bidirectional GRU neural net architecture (model 3)

This is the final and main architecture, which we will compare with the other 2 architectures. **The idea** behind this model comes from the shape of the pressure sequence: for the first few time steps the pressure is high compared to the final 50 or so time steps. By giving the network two separate bidirectional branches, one for each of these two phases, we allow it to learn each phase independently of the other phase's internal state. The third branch sees the whole sequence, so it emphasizes the internal state passed through the RNN as a whole.

In [None]:
input_A = keras.layers.Input(shape=(80, 5))
input_B = keras.layers.Input(shape=(80, 5))
input_C = keras.layers.Input(shape=(80, 5))

input_A_masked = keras.layers.Masking(mask_value=0., input_shape=(80, 5))(input_A)
x_A = keras.layers.Bidirectional(keras.layers.GRU(512, return_sequences=True))(input_A_masked)
x_A = keras.layers.Bidirectional(keras.layers.GRU(256, return_sequences=True))(x_A)
x_A = keras.layers.Bidirectional(keras.layers.GRU(256, return_sequences=True))(x_A)
x_A = keras.layers.Bidirectional(keras.layers.GRU(128, return_sequences=True))(x_A)
x_A = keras.layers.Bidirectional(keras.layers.GRU(128, return_sequences=True))(x_A)
x_A = keras.layers.Bidirectional(keras.layers.GRU(128, return_sequences=True))(x_A)

input_B_masked = keras.layers.Masking(mask_value=0., input_shape=(80, 5))(input_B)
x_B = keras.layers.Bidirectional(keras.layers.GRU(512, return_sequences=True))(input_B_masked)
x_B = keras.layers.Bidirectional(keras.layers.GRU(256, return_sequences=True))(x_B)
x_B = keras.layers.Bidirectional(keras.layers.GRU(256, return_sequences=True))(x_B)
x_B = keras.layers.Bidirectional(keras.layers.GRU(128, return_sequences=True))(x_B)
x_B = keras.layers.Bidirectional(keras.layers.GRU(128, return_sequences=True))(x_B)
x_B = keras.layers.Bidirectional(keras.layers.GRU(128, return_sequences=True))(x_B)

x_C = keras.layers.Bidirectional(keras.layers.GRU(1024, return_sequences=True))(input_C)
x_C = keras.layers.Bidirectional(keras.layers.GRU(512, return_sequences=True))(x_C)
x_C = keras.layers.Bidirectional(keras.layers.GRU(256, return_sequences=True))(x_C)
x_C = keras.layers.Bidirectional(keras.layers.GRU(256, return_sequences=True))(x_C)
x_C = keras.layers.Bidirectional(keras.layers.GRU(128, return_sequences=True))(x_C)

x = keras.layers.Dense(1000, activation='relu')(tf.concat([x_A, x_B, x_C], axis=-1))

y_hat = keras.layers.Dense(1)(x)

model = keras.models.Model(inputs=[input_A, input_B, input_C], outputs=y_hat)

model.compile(optimizer=keras.optimizers.Nadam(), loss='mae')

data_A, data_B = get_split_data(X[:2000])
es = keras.callbacks.EarlyStopping(patience=30)
ls = keras.callbacks.ReduceLROnPlateau(patience=15, factor=0.7)


history_3 = model.fit(x=[data_A, data_B, X[:2000]], y=y[:2000], validation_split=0.3, batch_size=64, epochs=100, callbacks=[es, ls])
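
To make the concatenation step concrete, the sketch below traces the feature width of each branch in the 3 branch model. A `Bidirectional` wrapper concatenates the forward and backward outputs by default, so a `GRU(128)` inside it yields 256 features per time step. The helper `bidirectional_width` is an illustrative assumption. The sketch only does shape arithmetic and does not build the Keras model.

```python
def bidirectional_width(units):
    """Output width of Bidirectional(GRU(units)) with the default
    merge_mode='concat': forward and backward outputs side by side."""
    return 2 * units

# Units of the last GRU layer in each branch of the 3 branch model.
last_units = {"x_A": 128, "x_B": 128, "x_C": 128}

branch_widths = {name: bidirectional_width(u) for name, u in last_units.items()}
concat_width = sum(branch_widths.values())

# Shapes per sample: (time steps, features).
print(branch_widths)       # {'x_A': 256, 'x_B': 256, 'x_C': 256}
print((80, concat_width))  # (80, 768), the input to Dense(1000)
print((80, 1))             # output of Dense(1): one pressure value per step
```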

## Comparison

### Validation loss

In [None]:
dictionary_1 = history_1.history
dictionary_2 = history_2.history
dictionary_3 = history_3.history
model_names = ['standard model', '2 Branch', '3 Branch']

fig, ax = plt.subplots(nrows=1, ncols=2, figsize=(16, 8))
ax = ax.reshape(-1,)
for i, loss in enumerate([dictionary_1, dictionary_2, dictionary_3]):
    ax[0].plot(loss['val_loss'], label=f'Val Loss {model_names[i]}')
    ax[0].set_title('Validation Loss', fontsize=20)
    ax[0].set_xlabel('Epochs', fontsize=18)
    ax[0].set_ylabel('Mean Absolute Error', fontsize=18)
    ax[0].legend()
    ax[0].grid(alpha=0.4)
    
for i, loss in enumerate([dictionary_1, dictionary_2, dictionary_3]):
    ax[1].plot(loss['loss'], label=f'Training Loss {model_names[i]}')
    ax[1].set_title('Training Loss', fontsize=20)
    ax[1].set_xlabel('Epochs', fontsize=18)
    ax[1].set_ylabel('Mean Absolute Error', fontsize=18)
    ax[1].legend()
    ax[1].grid(alpha=0.4)

We can clearly see that the 3 branch model has a much better training loss than the other models. Its validation loss is also slightly better, but the gap is not as large as for the training loss. This can be attributed to the small number of samples the model had to generalize from. Having more samples would make this result better, which is indeed the case.
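
Since curves can be hard to compare by eye, one option is to also report the best validation loss of each run numerically. The sketch below does this for dictionaries shaped like `history.history`. The loss values are made-up placeholders for illustration only, not results from this notebook, and `best_epoch` is a hypothetical helper.

```python
def best_epoch(history_dict, key="val_loss"):
    """Return (epoch index, value) of the lowest value stored under `key`."""
    values = history_dict[key]
    epoch = min(range(len(values)), key=values.__getitem__)
    return epoch, values[epoch]

# Placeholder histories (NOT real results), same layout as history.history.
toy_histories = {
    "standard model": {"loss": [1.0, 0.8, 0.7], "val_loss": [1.1, 0.9, 0.95]},
    "3 Branch": {"loss": [0.9, 0.6, 0.5], "val_loss": [1.0, 0.85, 0.9]},
}

for name, hist in toy_histories.items():
    epoch, value = best_epoch(hist)
    print(f"{name}: best val_loss {value} at epoch {epoch}")
```

In this notebook the helper would be called on the real histories, for example `best_epoch(history_3.history)`.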

Further improvements can be made to this architecture by properly tuning the parameters.

If you have any suggestions or constructive criticism, please leave them in the comments below; they will be highly appreciated.

**If you like the work please upvote!!** :)