# CS671 Deep Learning & Applications - Tutorial III

# RNNs & LSTMs

Date: 2 May 2023 | Instructor: Dr. Dileep A.D. | References: https://keras.io/

### Basics of RNNs/LSTMs

In [69]:
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import SimpleRNN, LSTM, RNN

* SimpleRNN is a fully-connected RNN layer, where the output of each time step is fed back into the input of the next time step, just like a traditional RNN. The activation function used in SimpleRNN is typically the hyperbolic tangent (tanh) function, although other activation functions can be used as well.

* RNN is the generic base layer for recurrent networks. Rather than implementing a recurrence itself, it wraps a cell object passed through its cell argument (e.g. SimpleRNNCell, LSTMCell, GRUCell, or a custom cell) and applies that cell at every time step. Settings such as the number of units and the activation function belong to the cell, not to RNN. This makes RNN more flexible than SimpleRNN, but also more verbose to use.

* Both layers accept a return_sequences argument, which controls whether the layer returns the output of all time steps (i.e. a sequence) or just the output of the final time step. The default is False for both, so SimpleRNN, like RNN, returns only the final output unless return_sequences=True is set.

* The SimpleRNN layer uses a hyperbolic tangent (tanh) activation function by default, which can be changed through its activation argument. The base RNN layer has no activation argument of its own; the activation is set on the cell passed to it (SimpleRNNCell, for example, also defaults to tanh).

* With the default return_sequences=False, the output shape of both layers is (batch_size, units). With return_sequences=True it becomes (batch_size, timesteps, units), where timesteps refers to the number of time steps in the input sequence.

* Overall, the SimpleRNN layer is a convenient shortcut for the standard fully-connected recurrence, while the base RNN layer is the general-purpose wrapper to use when you need a different or custom cell. Both can handle variable-length sequences, for example through masking.
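
To make the return_sequences behaviour concrete, here is a minimal NumPy sketch of the recurrence a tanh RNN layer computes, h_t = tanh(x_t W + h_{t-1} U + b). The function `simple_rnn_forward` and the weight names `W`, `U`, `b` are illustrative and not part of Keras; the random weights stand in for trained ones.

```python
import numpy as np

def simple_rnn_forward(x, W, U, b, return_sequences=False):
    """Run a tanh RNN over x of shape (batch, timesteps, features)."""
    batch, timesteps, _ = x.shape
    units = U.shape[0]
    h = np.zeros((batch, units), dtype=x.dtype)  # initial state is all zeros
    outputs = []
    for t in range(timesteps):
        # The previous state h feeds into the current step.
        h = np.tanh(x[:, t, :] @ W + h @ U + b)
        outputs.append(h)
    if return_sequences:
        return np.stack(outputs, axis=1)  # (batch, timesteps, units)
    return h  # (batch, units): only the last time step

rng = np.random.default_rng(0)
x = rng.random((32, 10, 8)).astype(np.float32)
W = rng.normal(size=(8, 4)).astype(np.float32)  # input-to-hidden weights (hypothetical)
U = rng.normal(size=(4, 4)).astype(np.float32)  # hidden-to-hidden weights (hypothetical)
b = np.zeros(4, dtype=np.float32)

last = simple_rnn_forward(x, W, U, b)
full = simple_rnn_forward(x, W, U, b, return_sequences=True)
print(last.shape, full.shape)
```

The last time step of the full sequence, `full[:, -1, :]`, is the same array that the default call returns.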

**SimpleRNN**

In [70]:
inputs = np.random.random([32, 10, 8]).astype(np.float32)
simple_rnn = SimpleRNN(4)

output = simple_rnn(inputs)  # The output has shape `[32, 4]`.

simple_rnn = tf.keras.layers.SimpleRNN(
    4, return_sequences=True, return_state=True)

# whole_sequence_output has shape `[32, 10, 4]`.
# final_state has shape `[32, 4]`.
whole_sequence_output, final_state = simple_rnn(inputs)

**LSTM**

In [64]:
inputs = tf.random.normal([32, 10, 8])

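lstm = tf.keras.layers.LSTM(4)
output = lstm(inputs)  # The output has shape `[32, 4]`.

lstm = tf.keras.layers.LSTM(4, return_sequences=True, return_state=True)
# whole_seq_output has shape `[32, 10, 4]`.
# final_memory_state and final_carry_state each have shape `[32, 4]`.
whole_seq_output, final_memory_state, final_carry_state = lstm(inputs)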
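
The two states returned by the LSTM can be easier to tell apart in a hand-written version. The sketch below is a plain NumPy LSTM forward pass, assuming the standard gate equations; `lstm_forward`, the weight names, and the gate ordering inside the weight matrices are choices of this sketch rather than Keras internals. The memory state `h` is what the layer outputs at each step, while the carry state `c` stays internal to the cell.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_forward(x, W, U, b):
    """x: (batch, timesteps, features); W: (features, 4*units); U: (units, 4*units)."""
    batch, timesteps, _ = x.shape
    units = U.shape[0]
    h = np.zeros((batch, units))  # memory (hidden) state, also the layer output
    c = np.zeros((batch, units))  # carry (cell) state, internal to the LSTM
    outputs = []
    for t in range(timesteps):
        z = x[:, t, :] @ W + h @ U + b
        i, f, g, o = np.split(z, 4, axis=1)  # input, forget, candidate, output
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)  # update the carry state
        h = sigmoid(o) * np.tanh(c)                   # expose a gated view of it
        outputs.append(h)
    return np.stack(outputs, axis=1), h, c

rng = np.random.default_rng(0)
x = rng.normal(size=(32, 10, 8))
W = rng.normal(size=(8, 16))   # hypothetical random weights, 4 gates x 4 units
U = rng.normal(size=(4, 16))
b = np.zeros(16)

whole_seq, h_final, c_final = lstm_forward(x, W, U, b)
print(whole_seq.shape, h_final.shape, c_final.shape)
```

In this sketch the final memory state equals the last slice of the full output sequence, which mirrors the relationship between `whole_seq_output` and `final_memory_state` in the cell above.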

### Handling Variable Length Inputs using Keras Masking

*   Keras documentation on recurrent layers: https://keras.io/api/layers/recurrent_layers/simple_rnn/
*   Keras documentation on masking: https://keras.io/api/layers/core_layers/masking/

In [1]:
from keras.models import Sequential
from keras.layers import Masking, SimpleRNN, Dense
import numpy as np
from matplotlib import pyplot as plt

In [35]:
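# Build 100 random univariate sequences with lengths between 20 and 49
X, seq_lengths = [], []
for _ in range(100):
  length = np.random.randint(20, 50)  # upper bound is exclusive
  X.append(np.random.random((length, )))
  seq_lengths.append(length)

seq_lengths = np.array(seq_lengths)
Y = np.random.randint(2, size=100)  # random binary labels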

In [36]:
max_len = np.max(seq_lengths)

In [37]:
seq_lengths[:10], max_len

(array([48, 48, 30, 42, 28, 47, 28, 25, 39, 48]), 49)

In [38]:
# Set the mask value: time steps whose features all equal this value are skipped
mask_value = 0

In [39]:
# Right-pad every sequence with mask_value up to max_len
X_padded = []
for seq, length in zip(X, seq_lengths):
  pad = max_len - length
  X_padded.append(np.concatenate([seq, np.full(pad, mask_value)]))

# Stack to (samples, timesteps), then add a feature axis: (samples, timesteps, 1)
X_padded = np.array(X_padded)
X_padded = np.expand_dims(X_padded, axis=-1)

In [26]:
X_padded.shape

(100, 49, 1)

In [27]:
# Define the model
model = Sequential()
model.add(Masking(mask_value=mask_value, input_shape=(int(max_len), 1)))
model.add(SimpleRNN(units=16))
model.add(Dense(units=1, activation='sigmoid'))
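
What the Masking layer buys us can be illustrated without Keras. The sketch below is a NumPy approximation of how a masked recurrent layer behaves: a time step counts as masked when all of its features equal the mask value, and at masked steps the previous state is carried forward unchanged. The function `masked_rnn_last_state` and the random weights are hypothetical and only meant to show the idea.

```python
import numpy as np

def masked_rnn_last_state(x, W, U, b, mask_value=0.0):
    """x: (batch, timesteps, features). Returns the final state, skipping masked steps."""
    batch, timesteps, _ = x.shape
    h = np.zeros((batch, U.shape[0]))
    # A time step is masked when every feature equals mask_value.
    keep = np.any(x != mask_value, axis=-1)  # (batch, timesteps)
    for t in range(timesteps):
        h_new = np.tanh(x[:, t, :] @ W + h @ U + b)
        # Masked steps copy the previous state forward unchanged.
        h = np.where(keep[:, t, None], h_new, h)
    return h

rng = np.random.default_rng(0)
W = rng.normal(size=(1, 16))          # hypothetical weights for 16 units
U = rng.normal(size=(16, 16)) * 0.1
b = np.zeros(16)

# One sequence of length 30 (values kept away from 0 so none is masked by accident)
seq = rng.uniform(0.1, 1.0, size=(1, 30, 1))
# The same sequence right-padded with zeros to length 49
padded = np.concatenate([seq, np.zeros((1, 19, 1))], axis=1)

h_short = masked_rnn_last_state(seq, W, U, b)
h_padded = masked_rnn_last_state(padded, W, U, b)
print(np.allclose(h_short, h_padded))
```

With masking, the padded copy yields the same final state as the unpadded sequence, so the padding does not leak into the representation passed to the Dense layer.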

In [28]:
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

In [30]:
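# Note: sample_weight scales each sequence's loss by its length; masking itself does not require it.
model.fit(X_padded, Y, batch_size=32, epochs=5, sample_weight=seq_lengths)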

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x7fd0f8c28a90>

### Handling Variable Length Input Using PyTorch

Using: `pack_padded_sequence` and `pad_packed_sequence`

PyTorch documentation on pack_padded_sequence() and pad_packed_sequence(): 

*   https://pytorch.org/docs/stable/generated/torch.nn.utils.rnn.pack_padded_sequence.html

*   https://pytorch.org/docs/stable/generated/torch.nn.utils.rnn.pad_packed_sequence.html

PyTorch tutorial on sequence-to-sequence modeling with attention:
https://pytorch.org/tutorials/intermediate/seq2seq_translation_tutorial.html
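
A minimal sketch of the pack/unpack round trip, assuming a toy batch of three random univariate sequences and a single-layer `nn.RNN` with 16 hidden units (both chosen here for illustration only):

```python
import torch
from torch import nn
from torch.nn.utils.rnn import pad_sequence, pack_padded_sequence, pad_packed_sequence

torch.manual_seed(0)

# Toy data: three sequences of different lengths, one feature per time step.
lengths = torch.tensor([5, 3, 2])
sequences = [torch.rand(int(n), 1) for n in lengths]

# Pad to a common length: shape (batch, max_len, features).
padded = pad_sequence(sequences, batch_first=True)

# Pack so the RNN skips the padded positions. enforce_sorted=False lets
# the batch be in any length order; lengths must be a CPU tensor.
packed = pack_padded_sequence(padded, lengths, batch_first=True, enforce_sorted=False)

rnn = nn.RNN(input_size=1, hidden_size=16, batch_first=True)
packed_out, h_n = rnn(packed)

# Unpack back to a padded tensor; positions past each length are zero.
out, out_lengths = pad_packed_sequence(packed_out, batch_first=True)

print(out.shape)  # (batch, max_len, hidden_size)
print(h_n.shape)  # (num_layers, batch, hidden_size)
```

Because the sequences were packed, `h_n` holds the state at each sequence's true last step rather than at the last padded position, which plays the same role as the Masking layer in the Keras example.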

