The LSTM cell uses three gates (forget, input, and output) and a cell state to control how information flows from one time step to the next. Let's denote:

- \( x_t \) as the input vector at time step \( t \),
- \( h_{t-1} \) as the previous hidden state (output) at time step \( t-1 \),
- \( c_{t-1} \) as the previous cell state at time step \( t-1 \),
- \( h_t \) as the current hidden state (output) at time step \( t \),
- \( c_t \) as the current cell state at time step \( t \).
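
For concreteness, write \( d \) for the size of the input and \( n \) for the size of the hidden and cell states. The symbols \( d \) and \( n \) are shorthand introduced here; they correspond to `input_size` and `hidden_size` in the code. The bracket \( [h_{t-1}, x_t] \) used in the gate equations denotes vertical concatenation:

\[ x_t \in \mathbb{R}^{d}, \quad h_{t-1}, c_{t-1}, h_t, c_t \in \mathbb{R}^{n}, \quad [h_{t-1}, x_t] = \begin{bmatrix} h_{t-1} \\ x_t \end{bmatrix} \in \mathbb{R}^{n+d} \]

Each gate's weight matrix therefore has shape \( n \times (n + d) \) and its bias has length \( n \). For example, with \( d = 3 \) and \( n = 4 \), each weight matrix is \( 4 \times 7 \), and the four weight matrices and biases together hold \( 4(4 \cdot 7 + 4) = 128 \) parameters.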

The LSTM cell consists of the following components:

1. Forget Gate:
   - The forget gate decides what information to discard from the cell state.
   - It takes \( x_t \) and \( h_{t-1} \) as inputs and produces a forget gate activation vector \( f_t \) using a sigmoid activation function.
   - Mathematically, the forget gate is defined as:
     \[ f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f) \]
     where \( W_f \) is the weight matrix and \( b_f \) is the bias vector for the forget gate.

2. Input Gate:
   - The input gate decides what new information to store in the cell state.
   - It takes \( x_t \) and \( h_{t-1} \) as inputs and produces an input gate activation vector \( i_t \) and a candidate cell state update vector \( \tilde{c}_t \) using sigmoid and tanh activation functions, respectively.
   - Mathematically, the input gate and candidate cell state update are defined as:
     \[ i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i) \]
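     \[ \tilde{c}_t = \tanh(W_c \cdot [h_{t-1}, x_t] + b_c) \]
     where \( W_i, W_c \) are weight matrices and \( b_i, b_c \) are bias vectors for the input gate and the candidate update.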

3. Cell State Update:
   - The new cell state combines the previous cell state \( c_{t-1} \), scaled by the forget gate, with the candidate update \( \tilde{c}_t \), scaled by the input gate.
   - Mathematically, the new cell state \( c_t \) is computed as:
     \[ c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \]
     where \( \odot \) denotes element-wise (Hadamard) multiplication, while \( \cdot \) in the gate equations denotes a matrix-vector product.

4. Output Gate:
   - The output gate decides which parts of the cell state are exposed as the hidden state.
   - It takes \( x_t \) and \( h_{t-1} \) as inputs and produces an output gate activation vector \( o_t \) using a sigmoid activation function. The next hidden state \( h_t \) is then obtained by squashing the new cell state \( c_t \) with tanh and scaling it element-wise by \( o_t \).
   - Mathematically, the output gate and the next hidden state are defined as:
     \[ o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o) \]
     \[ h_t = o_t \odot \tanh(c_t) \]
     where \( W_o \) and \( b_o \) are the weight matrix and bias vector for the output gate.
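
The four equations can be traced in a single step with hand-chosen parameters. The sketch below is a standalone function, separate from the `LSTMCell` class; the name `lstm_step` and the dictionary layout of `p` are hypothetical. It sets every weight to zero, the forget-gate bias to +10, and the input-gate bias to -10, so that \( f_t \approx 1 \) and \( i_t \approx 0 \). Under those assumptions the cell-state equation reduces to \( c_t \approx c_{t-1} \), which illustrates how an LSTM can carry information unchanged across steps.

```python
import numpy as np

def sigmoid(z):
    # Numerically stable logistic function: exp(-log(1 + exp(-z)))
    return np.exp(-np.logaddexp(0.0, -z))

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM step following the four gate equations (hypothetical helper).

    x_t: (input_size, 1); h_prev, c_prev: (hidden_size, 1).
    p: dict with weights 'W_f', 'W_i', 'W_c', 'W_o' of shape
       (hidden_size, hidden_size + input_size) and biases 'b_f', 'b_i',
       'b_c', 'b_o' of shape (hidden_size, 1).
    """
    z = np.vstack((h_prev, x_t))                  # [h_{t-1}, x_t]
    f_t = sigmoid(p["W_f"] @ z + p["b_f"])        # forget gate
    i_t = sigmoid(p["W_i"] @ z + p["b_i"])        # input gate
    c_tilde = np.tanh(p["W_c"] @ z + p["b_c"])    # candidate update
    c_t = f_t * c_prev + i_t * c_tilde            # element-wise products
    o_t = sigmoid(p["W_o"] @ z + p["b_o"])        # output gate
    h_t = o_t * np.tanh(c_t)
    return h_t, c_t, f_t, i_t

# Assumed parameters: all weights zero, large positive forget bias,
# large negative input bias, so f_t is close to 1 and i_t is close to 0.
input_size, hidden_size = 2, 3
p = {k: np.zeros((hidden_size, hidden_size + input_size))
     for k in ("W_f", "W_i", "W_c", "W_o")}
p.update({"b_f": np.full((hidden_size, 1), 10.0),
          "b_i": np.full((hidden_size, 1), -10.0),
          "b_c": np.zeros((hidden_size, 1)),
          "b_o": np.zeros((hidden_size, 1))})

c_prev = np.array([[1.5], [-0.5], [2.0]])
h_t, c_t, f_t, i_t = lstm_step(np.ones((input_size, 1)),
                               np.zeros((hidden_size, 1)), c_prev, p)
print(np.round(c_t.ravel(), 4))  # cell state carried over almost unchanged
```

Because \( b_o = 0 \) and the output weights are zero, \( o_t = 0.5 \) here, so \( h_t \) is simply half of \( \tanh(c_t) \).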


'''
 * Copyright (c) 2018 Radhamadhab Dalai
 *
 * Permission is hereby granted, free of charge, to any person obtaining a copy
 * of this software and associated documentation files (the "Software"), to deal
 * in the Software without restriction, including without limitation the rights
 * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 * copies of the Software, and to permit persons to whom the Software is
 * furnished to do so, subject to the following conditions:
 *
 * The above copyright notice and this permission notice shall be included in
 * all copies or substantial portions of the Software.
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
 * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
 * THE SOFTWARE.
'''

import numpy as np

class LSTMCell:
    def __init__(self, input_size, hidden_size):
        self.input_size = input_size
        self.hidden_size = hidden_size
        
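        # Each weight matrix has shape (hidden_size, hidden_size + input_size) and acts on
        # the concatenated column vector [h_{t-1}, x_t]; weights are drawn from a standard
        # normal distribution and biases start at zero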
        self.W_f = np.random.randn(hidden_size, input_size + hidden_size)
        self.b_f = np.zeros((hidden_size, 1))
        
        self.W_i = np.random.randn(hidden_size, input_size + hidden_size)
        self.b_i = np.zeros((hidden_size, 1))
        
        self.W_c = np.random.randn(hidden_size, input_size + hidden_size)
        self.b_c = np.zeros((hidden_size, 1))
        
        self.W_o = np.random.randn(hidden_size, input_size + hidden_size)
        self.b_o = np.zeros((hidden_size, 1))
        
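    def sigmoid(self, x):
        # Numerically stable logistic function: 1 / (1 + exp(-x)) rewritten as
        # exp(-log(1 + exp(-x))) to avoid overflow in exp for large negative x
        return np.exp(-np.logaddexp(0.0, -x))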
    
    def tanh(self, x):
        return np.tanh(x)
    
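    def forward(self, x_t, h_prev, c_prev):
        """Compute one time step.

        x_t has shape (input_size, 1); h_prev and c_prev have shape (hidden_size, 1).
        Returns (h_t, c_t), each of shape (hidden_size, 1).
        """
        # Concatenate [h_{t-1}, x_t] into a column vector of length hidden_size + input_size
        concat_input = np.vstack((h_prev, x_t))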
        
        # Forget gate
        f_t = self.sigmoid(np.dot(self.W_f, concat_input) + self.b_f)
        
        # Input gate
        i_t = self.sigmoid(np.dot(self.W_i, concat_input) + self.b_i)
        
        # Candidate cell state update
        tilde_c_t = self.tanh(np.dot(self.W_c, concat_input) + self.b_c)
        
        # Update cell state
        c_t = f_t * c_prev + i_t * tilde_c_t
        
        # Output gate
        o_t = self.sigmoid(np.dot(self.W_o, concat_input) + self.b_o)
        
        # Update hidden state
        h_t = o_t * self.tanh(c_t)
        
        return h_t, c_t
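
The class above processes a single time step. To run it over a sequence, the same step is applied repeatedly, feeding each \( (h_t, c_t) \) back in as the next \( (h_{t-1}, c_{t-1}) \). The sketch below does this with one deliberate difference from `LSTMCell`: it assumes the four gate matrices are stacked row-wise into a single matrix `W` of shape (4·hidden_size, hidden_size + input_size), in the order forget, input, candidate, output, so that one matrix product yields all gate pre-activations. The stacking and the helper name `lstm_sequence` are illustrative choices, not part of the original code.

```python
import numpy as np

def sigmoid(z):
    # Numerically stable logistic function
    return np.exp(-np.logaddexp(0.0, -z))

def lstm_sequence(xs, W, b, h0, c0):
    """Unroll an LSTM over a sequence with fused gate weights (hypothetical helper).

    xs: (T, input_size, 1)
    W:  (4 * hidden_size, hidden_size + input_size), rows stacked as
        [forget, input, candidate, output]
    b:  (4 * hidden_size, 1)
    Returns stacked hidden states of shape (T, hidden_size, 1) and the final cell state.
    """
    H = h0.shape[0]
    h, c = h0, c0
    hs = []
    for x_t in xs:
        # One matrix product gives all four gate pre-activations at once
        z = W @ np.vstack((h, x_t)) + b
        f_t = sigmoid(z[0:H])
        i_t = sigmoid(z[H:2 * H])
        c_tilde = np.tanh(z[2 * H:3 * H])
        o_t = sigmoid(z[3 * H:4 * H])
        c = f_t * c + i_t * c_tilde    # cell state update (element-wise)
        h = o_t * np.tanh(c)           # hidden state
        hs.append(h)
    return np.stack(hs), c

# Usage with small random weights (scaled by 0.1, an arbitrary choice)
rng = np.random.default_rng(0)
input_size, hidden_size, T = 3, 4, 5
W = rng.standard_normal((4 * hidden_size, hidden_size + input_size)) * 0.1
b = np.zeros((4 * hidden_size, 1))
xs = rng.standard_normal((T, input_size, 1))
hs, c_T = lstm_sequence(xs, W, b,
                        np.zeros((hidden_size, 1)), np.zeros((hidden_size, 1)))
print(hs.shape)  # (5, 4, 1)
```

Splitting `W` with `np.split(W, 4, axis=0)` recovers four matrices with the same shapes as `W_f`, `W_i`, `W_c`, and `W_o` in `LSTMCell`, so both formulations compute the same step; the fused form replaces four smaller matrix products with one larger one.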
