# LSTM: Long Short-Term Memory (Part 1: The What?)

## I. Introduction and Importance

*   This video continues the Deep Learning playlist, resuming after a pause.
*   The last topic covered was **Recurrent Neural Networks (RNN)**.
*   **Long Short-Term Memory (LSTM)** is a highly important topic, heavily used in the industry.
*   LSTMs are a foundational step for understanding modern advancements like **ChatGPT, LLMs, Transformers, and Attention mechanisms**.
*   **Note on Difficulty:** LSTMs are generally labelled as a difficult topic. This is because it is often challenging to understand not only *how* LSTMs work but also *why* they function the way they do.
*   **Roadmap for LSTM Series:**
    1.  **What?** (Core idea and concept).
    2.  **How?** (Architecture and underlying mathematics).
    3.  **Why?** (Understanding the mechanism's purpose).
    4.  Practical coding project.

## II. Recap: The Need for LSTMs

### A. Artificial Neural Networks (ANNs)

*   ANNs have a simple architecture consisting of an input layer, one or more hidden layers, and an output layer, all fully connected.
*   **Limitation:** ANNs cannot handle **sequential data** (like text or time series) where chronology matters and past data points impact future points.
*   ANNs take an entire sequence (e.g., a sentence converted into vectors/numbers) as one input at once, so the network has no built-in notion of word order.

### B. Recurrent Neural Networks (RNNs)

*   RNNs were developed to solve the sequential data problem by introducing the concept of **State**.
*   **RNN Architecture:** Similar in structure to an ANN, but the hidden layer is recurrent: each hidden unit feeds its activation back to itself and to the other hidden units at the next time step.
*   Data is processed one word/token at a time.
*   The hidden state produced at one time step ($H_t$) is fed back as an input to the next time step, alongside the next word/token.
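
The recurrence described above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the names `rnn_forward`, `W_x`, `W_h`, and `b`, the `tanh` activation, and the random weights are all assumptions made for the example.

```python
import math
import random

def matvec(M, v):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def rnn_forward(inputs, W_x, W_h, b):
    """Run a vanilla RNN over a sequence, one token vector at a time."""
    h = [0.0] * len(W_h)                 # initial state H_0
    states = []
    for x_t in inputs:                   # tokens are consumed in order
        # The new state depends on the current token AND the previous state.
        pre = [a + c + d for a, c, d in zip(matvec(W_x, x_t), matvec(W_h, h), b)]
        h = [math.tanh(p) for p in pre]
        states.append(h)
    return states

def rand_matrix(rng, rows, cols):
    return [[rng.gauss(0, 0.5) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
input_dim, hidden_dim, seq_len = 4, 3, 5
W_x = rand_matrix(rng, hidden_dim, input_dim)
W_h = rand_matrix(rng, hidden_dim, hidden_dim)
b = [0.0] * hidden_dim
sequence = rand_matrix(rng, seq_len, input_dim)   # 5 made-up token vectors

states = rnn_forward(sequence, W_x, W_h, b)
reversed_states = rnn_forward(sequence[::-1], W_x, W_h, b)
print(len(states), len(states[0]))  # 5 3
```

Feeding the same tokens in reverse order (`reversed_states`) generally ends in a different final state, which is the sense in which an RNN is sensitive to order.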

## III. The Problem with RNNs and the Origin of LSTMs

*   The main issue arises when the sequence (or chain) becomes **very long** (e.g., 50 words).
*   When a decision needs to be made late in the sequence (e.g., filling in a blank at the end of a long paragraph), the RNN often **forgets** the initial inputs upon which the decision depends.
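*   This forgetting is caused by the **Vanishing Gradient Problem**: during training, the gradient that reaches the early time steps shrinks toward zero, so the network cannot learn what to retain from them. Its counterpart, the **Exploding Gradient Problem**, is the opposite failure, where gradients grow uncontrollably and training becomes unstable.
*   **Origin of LSTMs:** LSTMs were created specifically because RNNs could **not handle long sequences** due to the Vanishing/Exploding Gradient Problem.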
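
A rough numeric intuition for why long chains are a problem: during backpropagation through time, the gradient that reaches an early step is (roughly) a product of one factor per time step. The sketch below assumes the same scalar factor at every step, which is a deliberate simplification (real RNNs multiply by matrices that change at each step), and `gradient_scale` is a hypothetical helper name.

```python
def gradient_scale(per_step_factor, num_steps):
    """Size of a gradient after being multiplied by the same factor at every step."""
    return per_step_factor ** num_steps

for factor in (0.9, 1.0, 1.1):
    print(factor, gradient_scale(factor, 50))
# 0.9 -> about 0.005 (vanishes)
# 1.0 -> 1.0         (preserved)
# 1.1 -> about 117   (explodes)
```

Even a factor only slightly below or above 1 produces a very small or very large gradient after 50 steps.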

## IV. Core Idea of LSTM: Separating Memory

*   In RNNs, the system has only **one path (line)** available to retain information about past data.
*   This single line is tasked with maintaining both **Short-Term Context** (recent inputs) and **Long-Term Context** (information from the distant past).
*   Mathematically, a single path cannot do both jobs well: the Short-Term Context typically **dominates**, and the system forgets the important older information.
*   **LSTM Solution:** Introduce a second, separate path to handle long-term context.
    *   **Lower Path:** Maintains the **Short-Term Memory** (the hidden state, $H_t$).
    *   **Upper Path:** Maintains the **Long-Term Memory** (the **Cell State**, $C_t$).
*   If a piece of information is determined to be important in an early step, it is placed on the Long-Term Memory chain and will carry forward until the end of the sequence, unless it is explicitly removed.
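
A toy illustration of this carry-forward behaviour, assuming the standard LSTM cell update $C_t = f_t \cdot C_{t-1} + i_t \cdot \tilde{C}_t$ (these notes defer the actual math to the "How?" part). The function name and the hand-set scalar gate values are hypothetical; in a real LSTM the gates are learned and recomputed at every step.

```python
def carry_long_term(c_start, forget_gates, input_gates, candidates):
    """Apply C_t = f_t * C_{t-1} + i_t * candidate_t over a sequence of scalar gates."""
    c = c_start
    history = []
    for f_t, i_t, cand_t in zip(forget_gates, input_gates, candidates):
        c = f_t * c + i_t * cand_t
        history.append(c)
    return history

steps = 50
# Gates hand-set to "keep everything, write nothing" for 49 steps,
# followed by an explicit "forget" at the final step.
forget_gates = [1.0] * (steps - 1) + [0.0]
input_gates = [0.0] * steps
candidates = [0.5] * steps

history = carry_long_term(1.0, forget_gates, input_gates, candidates)
print(history[48])  # 1.0: the stored value survived 49 steps unchanged
print(history[49])  # 0.0: removed once the forget gate closed
```

Because the long-term path is updated by gated addition rather than being rewritten at every step, a stored value can pass through many steps untouched.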

### Analogy: The Human Brain

*   When processing sequential data (like a story), the mind processes it word-by-word (Short-Term Context).
*   The mind constantly works to build **Long-Term Context** by deciding which currently processed events should be deemed important enough to be retained over time (e.g., the primary hero, the time period, the location).
*   Simultaneously, the mind removes previously stored long-term context that is no longer relevant (e.g., if a hero dies, they are removed from the important list).

## V. LSTM Architecture and Communication

*   The LSTM architecture is designed to manage the communication between the Short-Term Memory ($H_t$) and the Long-Term Memory (Cell State, $C_t$).
*   If the Short-Term Memory detects new important information, it communicates this to the Long-Term Memory for addition.
*   If the Short-Term Memory decides something needs to be removed from the Long-Term Memory, it communicates this request as well.
*   This necessary communication makes the LSTM architecture more **complex/difficult** than the simple RNN architecture.

![Gates of an LSTM cell](https://media.geeksforgeeks.org/wp-content/uploads/20250404172141987003/gate_of_lstm.webp)

### The Three Gates

The complex mechanism within the LSTM cell handles this communication using three primary components, called **Gates**:

1.  **Forget Gate**
2.  **Input Gate**
3.  **Output Gate**

#### Gate Functionality (One-Line Description)

| Gate | Function |
| :--- | :--- |
| **Forget Gate** | Decides, based on the current input and short-term context, what information to **remove** (or forget) from the Long-Term Memory. |
| **Input Gate** | Decides, based on the current input and short-term context, what **new information** should be **added** to the Long-Term Memory. |
| **Output Gate** | Decides, based on the current input and short-term context, which parts of the updated Long-Term Memory to expose as the output. This filtered output also becomes the Short-Term Memory ($H_t$) for the next step. |
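
For reference, the three gates are usually written as follows in the standard LSTM formulation. The symbols $W$, $b$, $\sigma$ (sigmoid), $\odot$ (element-wise product), and the candidate memory $\tilde{C}_t$ are not defined in these notes and are introduced here as assumptions; the "How?" part of the series covers the actual derivation.

$$
\begin{aligned}
f_t &= \sigma\left(W_f\,[H_{t-1}, X_t] + b_f\right) && \text{(forget gate)} \\
i_t &= \sigma\left(W_i\,[H_{t-1}, X_t] + b_i\right) && \text{(input gate)} \\
\tilde{C}_t &= \tanh\left(W_c\,[H_{t-1}, X_t] + b_c\right) && \text{(candidate memory)} \\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t && \text{(updated long-term memory)} \\
o_t &= \sigma\left(W_o\,[H_{t-1}, X_t] + b_o\right) && \text{(output gate)} \\
H_t &= o_t \odot \tanh(C_t) && \text{(new short-term memory)}
\end{aligned}
$$

In this formulation every gate reads both the current input $X_t$ and the previous short-term memory $H_{t-1}$, and each gate value lies between 0 (block) and 1 (pass).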

## VI. Summary: LSTM as a Computer

An LSTM cell can be viewed functionally like a computer (Input $\rightarrow$ Processing $\rightarrow$ Output).

### Inputs (at Time $t$)

The system receives three pieces of input:

1.  **$C_{t-1}$:** The Long-Term Memory (Cell State) from the previous time step.
2.  **$H_{t-1}$:** The Short-Term Memory (Hidden State) from the previous time step.
3.  **$X_t$:** The current input word.

### Outputs (at Time $t$)

The system provides two outputs:

1.  **$C_t$:** The updated Long-Term Memory (Cell State), passed on to the next time step.
2.  **$H_t$:** The new Short-Term Memory (Hidden State), passed on to the next time step.

### Internal Processing (Two Main Steps)

The processing inside the cell involves two crucial tasks:

1.  **Update the Long-Term Memory** (removing old information and adding new information).
2.  **Create the Short-Term Memory** for the next time step.
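
The input/processing/output view above can be sketched as a single function. This is a minimal sketch assuming the standard LSTM equations, which these notes do not state (the math belongs to the "How?" part); `lstm_step`, `init_params`, the parameter names, and the random weights are hypothetical and chosen only for illustration.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, b, v):
    """Compute W @ v + b for list-based matrices and vectors."""
    return [sum(w * x for w, x in zip(row, v)) + b_k for row, b_k in zip(W, b)]

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step: (X_t, H_{t-1}, C_{t-1}) -> (H_t, C_t)."""
    z = h_prev + x_t  # list concatenation [H_{t-1}, X_t]
    f = [sigmoid(v) for v in affine(params["W_f"], params["b_f"], z)]    # forget gate
    i = [sigmoid(v) for v in affine(params["W_i"], params["b_i"], z)]    # input gate
    g = [math.tanh(v) for v in affine(params["W_c"], params["b_c"], z)]  # candidate memory
    o = [sigmoid(v) for v in affine(params["W_o"], params["b_o"], z)]    # output gate

    # Step 1: update the long-term memory (remove old info, add new info).
    c_t = [f_k * c_k + i_k * g_k for f_k, c_k, i_k, g_k in zip(f, c_prev, i, g)]
    # Step 2: create the short-term memory for the next time step.
    h_t = [o_k * math.tanh(c_k) for o_k, c_k in zip(o, c_t)]
    return h_t, c_t

def init_params(input_dim, hidden_dim, seed=0):
    """Small random weights and zero biases for the four internal layers."""
    rng = random.Random(seed)
    params = {}
    for name in ("f", "i", "c", "o"):
        params["W_" + name] = [
            [rng.gauss(0, 0.1) for _ in range(hidden_dim + input_dim)]
            for _ in range(hidden_dim)
        ]
        params["b_" + name] = [0.0] * hidden_dim
    return params

input_dim, hidden_dim = 4, 3
params = init_params(input_dim, hidden_dim)
h = [0.0] * hidden_dim   # H_0
c = [0.0] * hidden_dim   # C_0
data_rng = random.Random(1)
for _ in range(5):       # five made-up input vectors standing in for words
    x_t = [data_rng.gauss(0, 1) for _ in range(input_dim)]
    h, c = lstm_step(x_t, h, c, params)
print(len(h), len(c))  # 3 3
```

The function takes the three inputs listed above and returns the two outputs, with the two processing steps marked in the comments.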