# MATH OF LSTM

Suryasatriya Trihandaru

Data Science, FSM UKSW

First, let's define the inputs and outputs to the LSTM cell:

$x_t$: the input vector at time step $t$

$h_{t-1}$: the previous hidden state at time step $t-1$

$c_{t-1}$: the previous cell state at time step $t-1$

$h_t$: the current hidden state at time step $t$

$c_t$: the current cell state at time step $t$

$y_t$: the output vector at time step $t$

Next, we define the intermediate variables used in the LSTM cell:

$i_t$: the input gate at time step $t$

$f_t$: the forget gate at time step $t$

$o_t$: the output gate at time step $t$

$g_t$: the candidate cell state (sometimes called the cell gate) at time step $t$

The mathematical expression for the LSTM cell can be broken down into the following equations:

Compute the input, forget, and output gates:

$i_t = \sigma(W_{ix} x_t + W_{ih} h_{t-1} + b_i)$

$f_t = \sigma(W_{fx} x_t + W_{fh} h_{t-1} + b_f)$

$o_t = \sigma(W_{ox} x_t + W_{oh} h_{t-1} + b_o)$

where $\sigma$ is the sigmoid function.
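As a rough illustration of the three gate equations, the sketch below evaluates them with NumPy. The sizes (`n_in`, `n_hid`) and the random weights are assumptions chosen for the example; they stand in for parameters that would normally be learned.

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid, applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
n_in, n_hid = 3, 2                 # illustrative sizes (assumed)

x_t = rng.normal(size=n_in)        # input vector x_t
h_prev = np.zeros(n_hid)           # previous hidden state h_{t-1}

# One (W_x, W_h, b) triple per gate; random values stand in for learned weights.
params = {
    name: (rng.normal(size=(n_hid, n_in)),
           rng.normal(size=(n_hid, n_hid)),
           np.zeros(n_hid))
    for name in ("i", "f", "o")
}

# Each gate is sigma(W_x x_t + W_h h_{t-1} + b), so every entry lies in (0, 1).
gates = {name: sigmoid(W_x @ x_t + W_h @ h_prev + b)
         for name, (W_x, W_h, b) in params.items()}

for name, value in gates.items():
    print(name, value)
```

Because the sigmoid maps every entry into the open interval (0, 1), each gate acts as a soft, per-component switch rather than a hard on/off decision.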

Compute the cell gate:

$g_t = \tanh(W_{gx} x_t + W_{gh} h_{t-1} + b_g)$

where $\tanh$ is the hyperbolic tangent function.

Update the cell state:

$c_t = f_t \odot c_{t-1} + i_t \odot g_t$

where $\odot$ is the element-wise multiplication.
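A small worked example may help in reading the cell-state update. The gate values below are hand-picked for illustration, including the idealized extremes 0 and 1, which a sigmoid only approaches and never reaches exactly.

```python
import numpy as np

c_prev = np.array([2.0, 2.0, 2.0])   # previous cell state c_{t-1}
f_t = np.array([1.0, 0.0, 0.5])      # forget gate: keep, erase, halve
i_t = np.array([0.0, 1.0, 0.5])      # input gate: block, admit, half-admit
g_t = np.array([0.5, -0.5, 1.0])     # candidate values

# Element-wise update: c_t = f_t * c_{t-1} + i_t * g_t
c_t = f_t * c_prev + i_t * g_t
print(c_t)  # values: 2.0, -0.5, 1.5
```

The first component keeps its old value untouched, the second is overwritten by the candidate, and the third blends half of the old state with half of the candidate.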

Update the hidden state:

$h_t = o_t \odot \tanh(c_t)$
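One consequence of this formula is that the hidden state stays bounded even when the cell state is large. The sketch below uses made-up values for $c_t$ and $o_t$ to show this.

```python
import numpy as np

c_t = np.array([-5.0, 0.0, 0.5, 5.0])   # cell state, unbounded in principle
o_t = np.array([0.9, 0.9, 0.1, 0.9])    # output gate values in (0, 1)

# tanh squashes c_t into (-1, 1); the output gate then scales each entry.
h_t = o_t * np.tanh(c_t)
print(h_t)
```

Since $|\tanh(c_t)| < 1$ and $0 < o_t < 1$, every entry of $h_t$ satisfies $|h_t| < o_t < 1$.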

Compute the output:

$y_t = W_{oy} h_t + b_y$

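where each $W$ is a weight matrix and each $b$ is a bias vector, all learned during training. The output projection $y_t$ is not part of the LSTM cell itself; in the Keras example at the end of this document it corresponds to the `Dense` layer.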
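Taken together, the steps above amount to one function from $(x_t, h_{t-1}, c_{t-1})$ to $(y_t, h_t, c_t)$. The following is a minimal NumPy sketch of that function, not a reference implementation. The parameter dictionary `p`, the helper `init_params`, and the chosen sizes are assumptions made for the example; only the key names follow the symbols in the equations.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_params(n_in, n_hid, n_out, rng):
    """Hypothetical helper: small random weights and zero biases."""
    p = {}
    for gate in "ifog":
        p[f"W_{gate}x"] = rng.normal(scale=0.1, size=(n_hid, n_in))
        p[f"W_{gate}h"] = rng.normal(scale=0.1, size=(n_hid, n_hid))
        p[f"b_{gate}"] = np.zeros(n_hid)
    p["W_oy"] = rng.normal(scale=0.1, size=(n_out, n_hid))
    p["b_y"] = np.zeros(n_out)
    return p

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM time step, written to mirror the equations line by line."""
    i_t = sigmoid(p["W_ix"] @ x_t + p["W_ih"] @ h_prev + p["b_i"])
    f_t = sigmoid(p["W_fx"] @ x_t + p["W_fh"] @ h_prev + p["b_f"])
    o_t = sigmoid(p["W_ox"] @ x_t + p["W_oh"] @ h_prev + p["b_o"])
    g_t = np.tanh(p["W_gx"] @ x_t + p["W_gh"] @ h_prev + p["b_g"])
    c_t = f_t * c_prev + i_t * g_t        # update the cell state
    h_t = o_t * np.tanh(c_t)              # update the hidden state
    y_t = p["W_oy"] @ h_t + p["b_y"]      # linear output projection
    return y_t, h_t, c_t

rng = np.random.default_rng(0)
p = init_params(n_in=3, n_hid=2, n_out=1, rng=rng)
y_t, h_t, c_t = lstm_step(rng.normal(size=3), np.zeros(2), np.zeros(2), p)
print(y_t.shape, h_t.shape, c_t.shape)  # (1,) (2,) (2,)
```

A quick sanity check on the function: with all parameters set to zero, every gate equals 0.5 and the candidate is 0, so the step simply halves the previous cell state.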

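These equations describe a single LSTM cell at one time step. In practice, an LSTM layer applies the same cell, with the same weights, at every time step of a sequence, carrying $h_t$ and $c_t$ forward from one step to the next. Several LSTM layers can also be stacked to form a deeper LSTM network, with each layer's hidden states serving as the next layer's inputs.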
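To make the idea of layers and stacking concrete, the sketch below runs a cell over a short sequence and then feeds the resulting hidden states into a second layer. For brevity it stacks the four gate weight matrices into a single matrix `W` acting on the concatenation of $x_t$ and $h_{t-1}$; that layout, the function names, and the sizes are assumptions of this example, not notation used elsewhere in this document.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One step; W stacks the i, f, o, g weights row-wise (assumed layout)."""
    n = h_prev.size
    z = W @ np.concatenate([x_t, h_prev]) + b          # shape (4n,)
    i_t, f_t, o_t = (sigmoid(z[k * n:(k + 1) * n]) for k in range(3))
    g_t = np.tanh(z[3 * n:])
    c_t = f_t * c_prev + i_t * g_t
    h_t = o_t * np.tanh(c_t)
    return h_t, c_t

def lstm_layer(xs, W, b, n_hid):
    """Apply the same cell (same W, b) at every time step of xs."""
    h, c = np.zeros(n_hid), np.zeros(n_hid)
    hs = []
    for x_t in xs:
        h, c = lstm_step(x_t, h, c, W, b)
        hs.append(h)
    return np.stack(hs)                                 # shape (T, n_hid)

rng = np.random.default_rng(0)
T, n_in, n_hid = 5, 3, 4
xs = rng.normal(size=(T, n_in))

# Layer 1 reads the raw inputs.
W1 = rng.normal(scale=0.1, size=(4 * n_hid, n_in + n_hid))
hs1 = lstm_layer(xs, W1, np.zeros(4 * n_hid), n_hid)

# Layer 2 reads layer 1's hidden states as its input sequence.
W2 = rng.normal(scale=0.1, size=(4 * n_hid, n_hid + n_hid))
hs2 = lstm_layer(hs1, W2, np.zeros(4 * n_hid), n_hid)

print(hs1.shape, hs2.shape)  # (5, 4) (5, 4)
```

Note that each layer has its own weights, but within a layer the weights are shared across all time steps.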





import numpy as np
import tensorflow as tf

# Define the LSTM model
model = tf.keras.Sequential([
    tf.keras.layers.LSTM(1, input_shape=(1, 2)),
    tf.keras.layers.Dense(1)
])

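# Get the parameters of the LSTM layer.
# Keras concatenates the gates along the last axis in the order i, f, c, o
# (c is the candidate, g_t in the equations above). Unpacking the transpose
# into four names only works because the layer has exactly one unit.
weights = model.get_weights()
W_i, W_f, W_c, W_o = weights[0].T   # input kernel, shape (2, 4)
U_i, U_f, U_c, U_o = weights[1].T   # recurrent kernel, shape (1, 4)
b_i, b_f, b_c, b_o = weights[2]     # bias, shape (4,)

# Get the parameters of the dense layer
W_dense = weights[3]
b_dense = weights[4]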

# Define the input
x = np.array([[[1, 2]]])

# Initialize the previous hidden state and cell state to 0
h_tm1 = np.zeros((1, 1))
c_tm1 = np.zeros((1, 1))

# Compute the output of the LSTM layer
i_t = tf.math.sigmoid(np.dot(x, W_i) + np.dot(h_tm1, U_i) + b_i)
f_t = tf.math.sigmoid(np.dot(x, W_f) + np.dot(h_tm1, U_f) + b_f)
o_t = tf.math.sigmoid(np.dot(x, W_o) + np.dot(h_tm1, U_o) + b_o)
g_t = tf.math.tanh(np.dot(x, W_c) + np.dot(h_tm1, U_c) + b_c)
c_t = f_t * c_tm1 + i_t * g_t
h_t = o_t * tf.math.tanh(c_t)

# Compute the output of the dense layer
y = np.dot(h_t, W_dense) + b_dense

# Print the predicted output
print(y)
print(model.predict(x))

[[-0.47541735]]
[[-0.4754174]]
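The transpose-and-unpack trick used above only works for a layer with one unit. For more units, the Keras LSTM kernel has shape `(input_dim, 4 * units)` with the gates concatenated in the order i, f, c, o, so the four blocks can be separated with `np.split`. The sketch below illustrates this with random stand-in arrays of the right shapes, so that it runs without TensorFlow; with a real model, `kernel`, `recurrent_kernel`, and `bias` would be the first three entries of `model.get_weights()`.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

units, input_dim = 3, 2
rng = np.random.default_rng(0)

# Stand-ins (assumed values) with the shapes Keras uses for an LSTM layer.
kernel = rng.normal(size=(input_dim, 4 * units))
recurrent_kernel = rng.normal(size=(units, 4 * units))
bias = np.zeros(4 * units)

# Split each array into its four gate blocks, in the order i, f, c, o.
W_i, W_f, W_c, W_o = np.split(kernel, 4, axis=1)
U_i, U_f, U_c, U_o = np.split(recurrent_kernel, 4, axis=1)
b_i, b_f, b_c, b_o = np.split(bias, 4)

x = np.array([[1.0, 2.0]])          # one time step, shape (batch, input_dim)
h_tm1 = np.zeros((1, units))
c_tm1 = np.zeros((1, units))

# Same equations as above, now with row vectors multiplied from the left.
i_t = sigmoid(x @ W_i + h_tm1 @ U_i + b_i)
f_t = sigmoid(x @ W_f + h_tm1 @ U_f + b_f)
o_t = sigmoid(x @ W_o + h_tm1 @ U_o + b_o)
g_t = np.tanh(x @ W_c + h_tm1 @ U_c + b_c)
c_t = f_t * c_tm1 + i_t * g_t
h_t = o_t * np.tanh(c_t)

print(h_t.shape)  # (1, 3)
```

Because Keras stores weights with the input dimension first, the products here are written as `x @ W` rather than `W @ x`; the two forms are transposes of one another.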
