# Dealing with variable-length sequences efficiently

This notebook is meant to be run using a GPU.

RNNs can process variable-length sequences. This notebook shows how to do so efficiently with Keras (`tf.keras`).

In [None]:
import tensorflow as tf
import numpy as np
from tqdm.auto import tqdm

Let's generate some toy data: 100,000 sequences whose lengths are Poisson-distributed around 100, with 3 features per step (think of them as, e.g., particle tracks with momentum components).

In [None]:
counts = np.random.poisson(100, size=100000)
arrays = [np.random.normal(size=(count, 3)) for count in counts]

In [None]:
arrays[0].shape

We want to feed this through the following 2-layer LSTM model. It takes a batch of sequences of arbitrary length and outputs one number per sequence:

In [None]:
model = tf.keras.Sequential([
    tf.keras.layers.LSTM(128, input_shape=(None, 3), return_sequences=True),
    tf.keras.layers.LSTM(128),
    tf.keras.layers.Dense(1),
])

In [None]:
model.summary()

How do we feed in the variable-length sequences? Since the first two input dimensions of our model, `(batch_size, sequence_length)`, are unspecified, we can pass each sequence separately as a batch of size 1. Let's see how fast this is:

In [None]:
for array in tqdm(arrays):
    model(array[np.newaxis, :])

Doesn't seem that bad, does it?

Wait! We haven't seen yet how fast it could be ...

If you look at the GPU utilization (e.g. with `nvidia-smi`) while this is running, you will see it is rather low. That's because RNNs are inherently sequential: the steps of a sequence cannot be processed in parallel, and with a batch size of 1 each step gives the GPU very little work.

But what we can do is process each step of the sequence in parallel across all instances of a batch!

Keras does this when each batch is a single tensor, which requires all sequences in the batch to have the same length.
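
To make the idea concrete, here is a minimal NumPy sketch. It assumes a plain tanh RNN rather than the LSTM used in this notebook, and the weights `W_x`, `W_h` and the function `rnn_batched` are made up for illustration. The loop over time steps stays sequential, but each step is a single matrix multiplication that updates every sequence in the batch at once.

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, n_hidden = 3, 4
W_x = rng.normal(size=(n_features, n_hidden))  # input-to-hidden weights (illustrative)
W_h = rng.normal(size=(n_hidden, n_hidden))    # hidden-to-hidden weights (illustrative)

def rnn_batched(batch):
    """Run a plain tanh RNN over a batch of shape (batch_size, n_steps, n_features)."""
    h = np.zeros((batch.shape[0], n_hidden))
    for t in range(batch.shape[1]):  # sequential over time steps
        # one matrix multiplication updates the state of every sequence in the batch
        h = np.tanh(batch[:, t] @ W_x + h @ W_h)
    return h

batch = rng.normal(size=(8, 5, n_features))
h_batched = rnn_batched(batch)

# processing the sequences one at a time gives the same states,
# but uses many small operations instead of a few large ones
h_single = np.concatenate([rnn_batched(seq[np.newaxis]) for seq in batch])
```

Both variants compute the same final states. The batched one simply hands the hardware larger operations per step, which is what a GPU needs to be used well.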

To try this out, let's pad the sequences to a common length (that of the longest sequence) and fill the missing values with 0:

In [None]:
padded = tf.keras.preprocessing.sequence.pad_sequences(arrays, padding="post", dtype="float32")

In [None]:
padded.shape

In [None]:
padded[0]
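
To see what post-padding amounts to, here is a rough NumPy equivalent. The helper `pad_post` is hypothetical and only meant as an illustration; it is not how `pad_sequences` is implemented, and it assumes every sequence is a 2D array with the same number of features.

```python
import numpy as np

def pad_post(sequences, value=0.0, dtype="float32"):
    """Pad 2D arrays of shape (length, n_features) at the end to a common length."""
    max_len = max(len(seq) for seq in sequences)
    n_features = sequences[0].shape[1]
    out = np.full((len(sequences), max_len, n_features), value, dtype=dtype)
    for i, seq in enumerate(sequences):
        out[i, :len(seq)] = seq  # real steps first, padding values after
    return out

# two toy sequences of length 2 and 4 with 3 features each
toy = [np.ones((2, 3)), np.ones((4, 3))]
toy_padded = pad_post(toy)
```

The result has shape `(n_sequences, max_length, n_features)`: the shorter sequence keeps its two real steps and is followed by two all-zero steps.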

In [None]:
model.predict(padded, batch_size=256, verbose=True)

That should have been **much** faster.

But now the model also processed the zero-padded steps. As a result, the first output, for example, differs from what we get when passing in the first sequence on its own:

In [None]:
model(arrays[0][np.newaxis, :])

In Keras we can solve this with a `Masking` layer: it marks every time step whose features all equal `mask_value` as masked, and subsequent RNN layers skip those steps.

For more info, see https://keras.io/guides/understanding_masking_and_padding/
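
The mask itself is easy to picture. The following NumPy sketch, with made-up toy data, mirrors the rule described in the Keras guide: a step counts as real if at least one of its features differs from the mask value.

```python
import numpy as np

# two toy sequences with 3 features, post-padded with zeros to length 4
toy_padded = np.zeros((2, 4, 3), dtype="float32")
toy_padded[0, :2] = 1.0  # first sequence has 2 real steps
toy_padded[1, :4] = 1.0  # second sequence has 4 real steps

# True for real steps, False for padded ones; shape (batch_size, n_steps)
mask = np.any(toy_padded != 0.0, axis=-1)

# with post-padding, the number of True entries is the original sequence length
lengths = mask.sum(axis=1)
```

Note that a real step whose features are all exactly `0.0` would be masked as well. For the continuous random data used here, that is practically impossible.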

In [None]:
masked_model = tf.keras.Sequential([
    # input_shape belongs on the first layer of a Sequential model
    tf.keras.layers.Masking(mask_value=0.0, input_shape=(None, 3)),
    tf.keras.layers.LSTM(128, return_sequences=True),
    tf.keras.layers.LSTM(128),
    tf.keras.layers.Dense(1),
])

In [None]:
masked_model.build(input_shape=(None, None, 3))

In [None]:
# set the weights such that we can compare the outputs of both models
masked_model.set_weights(model.get_weights())

In [None]:
masked_model.predict(padded, batch_size=256, verbose=True)

This time the outputs match those of the one-by-one processing (up to floating-point precision).
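
Why does masking recover the one-by-one result? Here is a minimal NumPy sketch, again assuming a plain tanh RNN with made-up weights instead of an LSTM. At masked steps the hidden state is simply carried over, so the padded steps have no effect on the final state. This follows the behaviour Keras describes for masked steps, but it is not Keras' actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, n_hidden = 3, 4
W_x = rng.normal(size=(n_features, n_hidden))  # illustrative weights
W_h = rng.normal(size=(n_hidden, n_hidden))

def rnn_final_state(batch, mask=None):
    """Final hidden state of a plain tanh RNN, optionally skipping masked steps."""
    h = np.zeros((batch.shape[0], n_hidden))
    for t in range(batch.shape[1]):
        h_new = np.tanh(batch[:, t] @ W_x + h @ W_h)
        if mask is None:
            h = h_new
        else:
            # keep the previous state wherever the step is masked (mask is False)
            h = np.where(mask[:, t, np.newaxis], h_new, h)
    return h

# three toy sequences of different lengths, post-padded with zeros
seqs = [rng.normal(size=(n, n_features)) for n in (2, 5, 3)]
max_len = max(len(s) for s in seqs)
padded_toy = np.zeros((len(seqs), max_len, n_features))
for i, s in enumerate(seqs):
    padded_toy[i, :len(s)] = s
mask = np.any(padded_toy != 0.0, axis=-1)

one_by_one = np.concatenate([rnn_final_state(s[np.newaxis]) for s in seqs])
unmasked = rnn_final_state(padded_toy)      # padded steps alter the state
masked = rnn_final_state(padded_toy, mask)  # padded steps are skipped
```

Comparing the three results with `np.allclose` shows the same pattern as in the notebook: `masked` agrees with `one_by_one`, while `unmasked` does not.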