# 🧠 Introduction to Recurrent Neural Networks (RNNs)

### Welcome to Your First Look at RNNs! 👋

Welcome! In this 2-hour session, we'll dive into the exciting world of Recurrent Neural Networks (RNNs). RNNs are a type of neural network with a form of 'memory', which makes them well suited to sequences such as text, speech, and time-series data.

Think about how you read a sentence: you understand each word based on the words that came before it. RNNs work in a similar way! 

#### 📘 Learning Objectives for Today:

1.  **What is an RNN?** Understand the core idea of 'memory' in neural networks.
2.  **How do they work?** Learn about the simple math behind an RNN's feedback loop.
3.  **Types of RNNs:** Discover different RNN architectures for different problems (e.g., sentiment analysis, text generation).
4.  **Common Challenges:** Learn about the famous 'vanishing gradient' problem.
5.  **Practice:** Get hands-on with simple coding tasks to solidify your understanding.

--- 
## Topic 1: The Core Idea - A Network with Memory

A regular (feedforward) neural network processes each input independently. If you show it two pictures, it doesn't remember the first picture when it sees the second one.

**Recurrent Neural Networks (RNNs) are different.** They have a **loop** that allows them to pass information from one step to the next. This loop acts as a form of memory, called the **'hidden state'**.

This memory allows an RNN to understand context in sequential data. For example, to understand the sentence "The weather was great!", the network needs to remember the word "weather" when it gets to the word "great!".

![RNN Loop](https://upload.wikimedia.org/wikipedia/commons/thumb/b/b5/Recurrent_neural_network_unfold.svg/1024px-Recurrent_neural_network_unfold.svg.png)

In [1]:
# Let's see a simple comparison with code.

# A normal function has no memory of past calls
def normal_function(input_value):
    # It only knows about the current input
    return input_value * 2

print(f"Normal function with input 5: {normal_function(5)}")
print(f"Normal function with input 10: {normal_function(10)}") # This call knows nothing about the previous one

print("\n---------------------\n")

# An RNN-like function that keeps a 'memory' (or state)
memory = 0
def rnn_like_function(input_value):
    global memory # Use the global memory variable
    # The new output depends on the new input AND the old memory
    new_output = input_value + memory 
    # Update the memory for the next call
    memory = new_output
    return new_output

print(f"RNN-like function with input 5: {rnn_like_function(5)}")
print(f"RNN-like function with input 10: {rnn_like_function(10)}") # This output is influenced by the previous call!

Normal function with input 5: 10
Normal function with input 10: 20

---------------------

RNN-like function with input 5: 5
RNN-like function with input 10: 15
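
The global variable above keeps the example short, but a real RNN passes its memory along explicitly from one step to the next. Here is a minimal sketch of the same toy idea with the state passed in and returned. The name `rnn_like_step` is made up for this illustration.

```python
# Hypothetical helper: the memory is an argument and a return value,
# instead of a global variable.
def rnn_like_step(input_value, memory):
    """Combine the new input with the old memory and return the new memory."""
    return input_value + memory

memory = 0      # initial state, playing the role of h_0
outputs = []
for input_value in [5, 10]:
    memory = rnn_like_step(input_value, memory)  # new memory feeds the next step
    outputs.append(memory)

print(outputs)  # [5, 15]
```

The outputs match the cell above. The only difference is that the state now travels through the loop, which is closer to how the hidden state is handled in the formula we will see in Topic 2.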


### 🎯 Practice Task 1

Think about your daily life. Can you describe one real-world task where remembering previous steps is crucial? (e.g., following a recipe). Write your answer in the cell below.

*(Double-click here to write your answer)*

**My Answer:**

--- 
## Topic 2: How RNNs Work - The Recurrent Formula

So how does this 'memory' actually work? It's all about a simple formula that gets repeated at each step (or *time step*) in the sequence.

The core of an RNN is the calculation of the **hidden state**, `h_t`, at a time `t`.

The formula looks like this:

**`h_t = f(W_xh * x_t + W_hh * h_{t-1} + b_h)`**

Let's break that down:
- **`h_t`**: The **new hidden state** (the memory for this step).
- **`x_t`**: The **input** at the current time step (e.g., a vector for a word).
- **`h_{t-1}`**: The **previous hidden state** (the memory from the last step).
- **`W_xh`, `W_hh`**: These are the **weight matrices**. Think of them as knobs that the RNN learns to tune. `W_xh` adjusts the importance of the new input, and `W_hh` adjusts the importance of the old memory.
- **`b_h`**: The **bias**, just another learned parameter.
- **`f`**: An **activation function** (like `tanh` or `ReLU`) that adds non-linearity, helping the network learn complex patterns.

In [2]:
# Let's calculate one step of an RNN with Python!
import numpy as np

# Let's assume our inputs and states are just single numbers for simplicity.
# In a real RNN, these would be vectors!

# Input at time t=1
x_1 = 10

# Hidden state from the previous step (t=0). Let's start with 0.
h_0 = 0

# Let's define some fixed weights and bias (in a real RNN, these are learned)
W_xh = 0.5
W_hh = 0.2
b_h = 0.1

# We will use the tanh activation function
# In numpy, it's np.tanh()

# Calculate the new hidden state h_1
h_1 = np.tanh(W_xh * x_1 + W_hh * h_0 + b_h)

print(f"Previous Hidden State (h_0): {h_0}")
print(f"Current Input (x_1): {x_1}")
print(f"New Hidden State (h_1): {h_1:.4f}") # .4f formats to 4 decimal places

Previous Hidden State (h_0): 0
Current Input (x_1): 10
New Hidden State (h_1): 0.9999
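
The cell above uses single numbers to keep things simple. As a rough sketch of the vector case, the same formula can be written with NumPy arrays. The sizes, the random weights, and the input vector below are made-up values for illustration only; in a trained RNN the weights would be learned.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable

input_size, hidden_size = 3, 4  # illustrative sizes, not from the lesson

# Hypothetical weights; a trained RNN would have learned these.
W_xh = rng.normal(scale=0.5, size=(hidden_size, input_size))
W_hh = rng.normal(scale=0.5, size=(hidden_size, hidden_size))
b_h = np.zeros(hidden_size)

def rnn_step(x_t, h_prev):
    """One recurrent step: h_t = tanh(W_xh @ x_t + W_hh @ h_prev + b_h)."""
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

x_1 = np.array([1.0, 0.5, -0.5])  # made-up input vector
h_0 = np.zeros(hidden_size)       # start with an empty memory
h_1 = rnn_step(x_1, h_0)

print(h_1.shape)  # (4,)
```

Notice that `W_xh` maps an input vector to the hidden size, while `W_hh` maps a hidden state to another hidden state, so the new `h_1` has the same shape as `h_0`.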


### 🎯 Practice Task 2

Now it's your turn! Using the code cell below, calculate the **next hidden state, `h_2`**. 

Use the `h_1` we just calculated as your new `h_{t-1}` and assume the next input in the sequence, `x_2`, is `20`.

In [None]:
# Your turn! Calculate h_2

# The new input at time t=2
x_2 = 20

# The previous hidden state is the h_1 we calculated above
h_1 = 0.9999 # Let's use the rounded value from the previous output

# The weights and bias are the same
W_xh = 0.5
W_hh = 0.2
b_h = 0.1

# Calculate h_2 using the formula. Replace '...' with your calculation.
h_2 = ...

print(f"New Hidden State (h_2): {h_2:.4f}")

# 🧪 Try changing the values of x_2 or the weights and see how h_2 changes!

--- 
## Topic 3: Training an RNN - Backpropagation Through Time (BPTT)

So we have this formula, but how does the network *learn* the right values for the weights (`W_xh`, `W_hh`)?

It uses a special version of backpropagation called **Backpropagation Through Time (BPTT)**.

**The Idea:**
1.  **Unroll the Network:** Imagine the sequence has 3 time steps. We can 'unroll' the RNN loop into a chain of 3 identical networks. The first network passes its hidden state to the second, and the second passes its hidden state to the third.
2.  **Calculate Error:** The network makes predictions at each step, and we calculate the total error for the whole sequence.
3.  **Backpropagate:** The error is sent backward through this unrolled chain, from the last step all the way to the first.
4.  **Update Weights:** As the error travels back, it tells each weight how to adjust itself to reduce the error. Since the weights are shared across all time steps, the updates are combined.

This allows the network to learn from its mistakes across the entire sequence!
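
To make the four steps above concrete, here is a small sketch of BPTT for a scalar RNN with `tanh`. It assumes a squared-error loss at every step and made-up inputs and targets (`xs`, `ys`); the function names `forward`, `loss`, and `bptt` are chosen for this illustration. At the end, the BPTT gradient for `w_h` is compared with a numerical estimate as a sanity check.

```python
import numpy as np

def forward(xs, w_x, w_h, b):
    """Unroll the scalar RNN and return [h_0, h_1, ..., h_T]."""
    hs = [0.0]
    for x in xs:
        hs.append(np.tanh(w_x * x + w_h * hs[-1] + b))
    return hs

def loss(xs, ys, w_x, w_h, b):
    """Total squared error over the whole sequence (an assumed loss)."""
    hs = forward(xs, w_x, w_h, b)
    return sum(0.5 * (h - y) ** 2 for h, y in zip(hs[1:], ys))

def bptt(xs, ys, w_x, w_h, b):
    """Send the error backward through the unrolled chain."""
    hs = forward(xs, w_x, w_h, b)
    g_wx = g_wh = g_b = 0.0
    dh_next = 0.0  # error arriving from later time steps
    for t in reversed(range(len(xs))):
        h_t, h_prev = hs[t + 1], hs[t]
        dh = (h_t - ys[t]) + dh_next   # this step's error plus future error
        da = dh * (1.0 - h_t ** 2)     # back through tanh
        g_wx += da * xs[t]             # weights are shared, so gradients add up
        g_wh += da * h_prev
        g_b += da
        dh_next = da * w_h             # pass the error one step further back
    return g_wx, g_wh, g_b

xs = [0.5, -0.1, 0.3]   # made-up inputs
ys = [0.2, 0.0, 0.4]    # made-up targets
w_x, w_h, b = 0.5, 0.2, 0.1

g_wx, g_wh, g_b = bptt(xs, ys, w_x, w_h, b)

# Numerical check: nudge w_h slightly and see how the loss changes.
eps = 1e-6
numeric_g_wh = (loss(xs, ys, w_x, w_h + eps, b) - loss(xs, ys, w_x, w_h - eps, b)) / (2 * eps)
print(np.isclose(g_wh, numeric_g_wh))
```

The line `dh_next = da * w_h` is where the repeated multiplication by the recurrent weight happens, once per time step.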

### 💡 Fun Fact
BPTT is a big reason why training RNNs can be slow and computationally expensive, especially for very long sequences. The unrolled network is as deep as the sequence is long, so you are essentially training one very deep neural network!

--- 
## Topic 4: A Common Challenge - Vanishing & Exploding Gradients

During BPTT, the gradient is multiplied at every time step by the `W_hh` weight matrix (and by the derivative of the activation function). Repeating this multiplication over many steps can lead to two major problems:

1.  **💥 Exploding Gradients:** If that repeated factor is larger than 1 in magnitude (roughly, when the weights in `W_hh` are large), the gradients get bigger and bigger as they flow backward in time. This is like a snowball rolling down a hill. The training becomes unstable, and the model can't learn.
    *   **Solution:** *Gradient Clipping*: if a gradient gets too big, we cap it at a maximum value (or rescale it so that its norm stays below a chosen threshold).

2.  **💧 Vanishing Gradients:** If that repeated factor is smaller than 1 in magnitude (roughly, when the weights in `W_hh` are small), the gradients get smaller and smaller until they effectively disappear. This is like a whisper passed along a long line of people: by the end, the message is gone. This makes it very hard for the network to learn connections between distant events in a sequence (long-term dependencies).
    *   **Solution:** Use more advanced RNN architectures like **LSTMs** and **GRUs**, which have special 'gates' to control the flow of information and greatly reduce the vanishing gradient problem.
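
A quick numerical sketch shows why repeated multiplication matters. The per-step factors `0.9` and `1.1` are illustrative stand-ins for what the gradient is multiplied by at each time step, and `clip_by_norm` is a hypothetical helper showing one common form of gradient clipping (rescaling by the L2 norm).

```python
import numpy as np

steps = 50  # number of time steps the gradient travels back through

# Illustrative per-step factors, not values from a real network.
shrinking_factor = 0.9
growing_factor = 1.1

print(f"0.9 ** {steps} = {shrinking_factor ** steps:.6f}")  # about 0.005: nearly gone
print(f"1.1 ** {steps} = {growing_factor ** steps:.1f}")    # about 117: blowing up

def clip_by_norm(gradient, max_norm):
    """Rescale the gradient so that its L2 norm is at most max_norm."""
    norm = np.linalg.norm(gradient)
    if norm > max_norm:
        return gradient * (max_norm / norm)
    return gradient

big_gradient = np.array([30.0, 40.0])      # L2 norm is 50
clipped = clip_by_norm(big_gradient, 5.0)  # same direction, norm reduced to 5
print(clipped)
```

Clipping keeps the direction of the gradient and only limits its size, which is why it helps with exploding gradients but does nothing for vanishing ones.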

### 🎯 Practice Task 3

Imagine you are training an RNN, and after a few iterations, the weights suddenly become `NaN` (Not a Number). Which problem are you most likely facing?

a) Vanishing Gradients
b) Exploding Gradients
c) Overfitting

*(Double-click here to write your answer, 'a' or 'b' or 'c')*

--- 
## Topic 5: Types of RNN Architectures

RNNs are flexible and can be built in different ways depending on the task. Here are the most common architectures:

1.  **Many-to-One:**
    - **Input:** A sequence (many)
    - **Output:** A single value (one)
    - **Example:** Sentiment analysis. You read a whole movie review (sequence) and classify it as 'positive' or 'negative' (single output).

2.  **One-to-Many:**
    - **Input:** A single value (one)
    - **Output:** A sequence (many)
    - **Example:** Image captioning. You give it one image and it generates a descriptive sentence (sequence of words).

3.  **Many-to-Many (Equal Length):**
    - **Input:** A sequence (many)
    - **Output:** A sequence of the same length (many)
    - **Example:** Named Entity Recognition. For each word in a sentence, you label it as a 'Person', 'Place', etc.

4.  **Many-to-Many (Unequal Length):**
    - **Input:** A sequence (many)
    - **Output:** A sequence of a different length (many)
    - **Example:** Machine translation. You translate a sentence from English to French, and the output sentence will likely have a different number of words.
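
In code, the difference between some of these architectures often comes down to which hidden states you keep. The sketch below runs one small RNN over a sequence and then reads it out in two ways. The sizes, random weights, and the helper name `run_rnn` are assumptions made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
seq_len, input_size, hidden_size = 5, 2, 3  # illustrative sizes

# Hypothetical, untrained weights.
W_xh = rng.normal(scale=0.5, size=(hidden_size, input_size))
W_hh = rng.normal(scale=0.5, size=(hidden_size, hidden_size))
b_h = np.zeros(hidden_size)

def run_rnn(inputs):
    """Return the hidden state after every time step, shape (seq_len, hidden_size)."""
    h = np.zeros(hidden_size)
    states = []
    for x_t in inputs:
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
        states.append(h)
    return np.stack(states)

inputs = rng.normal(size=(seq_len, input_size))  # a made-up input sequence
all_states = run_rnn(inputs)

many_to_one = all_states[-1]  # keep only the last state, e.g. for sentiment analysis
many_to_many = all_states     # keep one state per input, e.g. for Named Entity Recognition

print(many_to_one.shape)   # (3,)
print(many_to_many.shape)  # (5, 3)
```

One-to-many and unequal-length many-to-many models need extra machinery (for example, feeding each output back in as the next input), so they are left out of this sketch.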

### 🎯 Practice Task 4

Match the problem to the correct RNN architecture. Write your answers in the code cell below.

**Problems:**
1.  Predicting the next word in a sentence while you type.
2.  Classifying an audio clip of a spoken digit (0-9).
3.  Generating a piece of music from a single starting note.

**Architectures:**
- Many-to-One
- One-to-Many
- Many-to-Many

In [None]:
# Fill in your answers here!
answer_1 = "..." # e.g., "Many-to-One"
answer_2 = "..."
answer_3 = "..."

print(f"1. Text prediction is: {answer_1}")
print(f"2. Audio classification is: {answer_2}")
print(f"3. Music generation from a note is: {answer_3}")

✅ Well done! Understanding these architectures is key to applying RNNs correctly.

--- 
## Topic 6: Example in Action - Sentiment Analysis

Let's walk through how a **Many-to-One** RNN would classify the sentiment of the sentence: **"The movie was great!"**

1.  **Tokenize:** The sentence is split into words (tokens): `["The", "movie", "was", "great!"]`
2.  **Embed:** Each word is converted into a numerical vector (a word embedding). Let's pretend we have a simple embedding:
    - `The` -> `[0.1, 0.2]`
    - `movie` -> `[0.3, 0.5]`
    - `was` -> `[0.1, 0.1]`
    - `great!` -> `[0.9, 0.8]` (This vector might represent positivity)
3.  **Process Sequentially:** The RNN processes one word vector at a time.
    - **Step 1:** Input is `[0.1, 0.2]` (`The`). The RNN calculates a hidden state `h_1`.
    - **Step 2:** Input is `[0.3, 0.5]` (`movie`). The RNN uses this input and `h_1` to calculate `h_2`.
    - **Step 3:** Input is `[0.1, 0.1]` (`was`). The RNN uses this and `h_2` to calculate `h_3`.
    - **Step 4:** Input is `[0.9, 0.8]` (`great!`). The RNN uses this and `h_3` to calculate the final hidden state `h_4`.
4.  **Final Output:** The final hidden state `h_4` now contains information about the entire sentence. It is fed into a final classification layer, which outputs a single number (e.g., `0.95`), indicating a 95% probability that the review is positive!

In [3]:
# This is a toy example to show the logic.
# We won't build a full RNN, but we can simulate the flow.

def simple_rnn_sentiment(sentence):
    # Step 1 & 2: Tokenize and Embed (let's use a simple score)
    word_scores = {"the": 0.0, "movie": 0.1, "was": 0.0, "great": 1.0, "bad": -1.0}
    tokens = sentence.lower().replace("!", "").split()
    
    # Step 3: Process Sequentially (we'll just sum the scores)
    hidden_state = 0.0
    for token in tokens:
        # The new hidden state is influenced by the old state and the new word
        # This is a massive simplification of the real RNN formula!
        hidden_state += word_scores.get(token, 0.0)
        print(f"Processed '{token}', New Hidden State is: {hidden_state:.1f}")
    
    # Step 4: Final Output
    if hidden_state > 0.5:
        return "Positive"
    else:
        return "Negative"

# Let's test it!
review_1 = "The movie was great"
print(f"Analyzing: '{review_1}'")
print(f"Final Sentiment: {simple_rnn_sentiment(review_1)}\n")

review_2 = "The movie was bad"
print(f"Analyzing: '{review_2}'")
print(f"Final Sentiment: {simple_rnn_sentiment(review_2)}")

Analyzing: 'The movie was great'
Processed 'the', New Hidden State is: 0.0
Processed 'movie', New Hidden State is: 0.1
Processed 'was', New Hidden State is: 0.1
Processed 'great', New Hidden State is: 1.1
Final Sentiment: Positive

Analyzing: 'The movie was bad'
Processed 'the', New Hidden State is: 0.0
Processed 'movie', New Hidden State is: 0.1
Processed 'was', New Hidden State is: 0.1
Processed 'bad', New Hidden State is: -0.9
Final Sentiment: Negative
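
For comparison, here is a sketch that follows the four steps of the walk-through more literally, using the pretend 2-number embeddings from step 2 and the real recurrent formula. The weights are random and untrained, and the names `w_out`, `b_out`, and `predict_positive_probability` are made up for this illustration, so the printed probability carries no meaning. The point is only the flow of data.

```python
import numpy as np

# Pretend embeddings from the walk-through (keys lowercased, "!" removed).
embeddings = {
    "the": np.array([0.1, 0.2]),
    "movie": np.array([0.3, 0.5]),
    "was": np.array([0.1, 0.1]),
    "great": np.array([0.9, 0.8]),
}

rng = np.random.default_rng(42)
hidden_size = 3  # illustrative size

# Hypothetical, untrained parameters.
W_xh = rng.normal(scale=0.5, size=(hidden_size, 2))
W_hh = rng.normal(scale=0.5, size=(hidden_size, hidden_size))
b_h = np.zeros(hidden_size)
w_out = rng.normal(scale=0.5, size=hidden_size)  # final classification layer
b_out = 0.0

def sigmoid(z):
    """Squash any number into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_positive_probability(sentence):
    # Step 1: tokenize.
    tokens = sentence.lower().replace("!", "").split()
    # Steps 2 and 3: embed each word and update the hidden state in order.
    h = np.zeros(hidden_size)
    for token in tokens:
        h = np.tanh(W_xh @ embeddings[token] + W_hh @ h + b_h)
    # Step 4: feed the final hidden state into the classification layer.
    return sigmoid(w_out @ h + b_out)

probability = predict_positive_probability("The movie was great!")
print(f"Probability of positive (untrained model): {probability:.2f}")
```

Training would adjust all of these parameters with BPTT until the output for positive reviews moves toward 1.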


--- 
## 🧠 Final Revision Assignment

Time to put it all together! These tasks are designed for you to practice at home to reinforce what you've learned. Try to solve them on your own.

### Task 1: Calculate a Full Sequence

You are given an input sequence of numbers: `[5, 10, 15]`. 
Your simple RNN has the following parameters:
- Initial hidden state `h_0 = 0`.
- Weights: `W_xh = 0.6`, `W_hh = 0.3`
- Bias: `b_h = 0.0`
- Activation function: `ReLU` (which is `max(0, x)`)

In the code cell below, calculate `h_1`, `h_2`, and `h_3`. The formula is `h_t = max(0, W_xh * x_t + W_hh * h_{t-1} + b_h)`.

In [None]:
# Task 1 Code
import numpy as np

sequence = [5, 10, 15]
h_0 = 0
W_xh = 0.6
W_hh = 0.3
b_h = 0.0

# Define the ReLU function
def relu(x):
    return np.maximum(0, x)

# Calculate h_1 (for x_1 = 5)
x_1 = sequence[0]
h_1 = ... # Your calculation here
print(f"h_1 is: {h_1}")

# Calculate h_2 (for x_2 = 10, using h_1)
x_2 = sequence[1]
h_2 = ... # Your calculation here
print(f"h_2 is: {h_2}")

# Calculate h_3 (for x_3 = 15, using h_2)
x_3 = sequence[2]
h_3 = ... # Your calculation here
print(f"h_3 is: {h_3}")

### Task 2: Explain in Your Own Words

In the markdown cell below, explain the **Vanishing Gradient Problem**. Why does it make it hard for an RNN to learn from words at the beginning of a very long sentence?

*(Double-click to write your explanation here)*

### Task 3: Multiple Choice Review

1. What is the primary advantage of RNNs over traditional feedforward neural networks?
    a) They are faster to train.
    b) They can handle sequential data by maintaining a memory of past inputs.
    c) They require less data to train.

2. More advanced RNNs like LSTMs use 'gates' primarily to solve what problem?
    a) Overfitting
    b) The need for more data
    c) The vanishing gradient problem

### Task 4: Propose an Architecture

A company wants to build a chatbot that answers customer questions. The chatbot will read a customer's question (a sequence of words) and generate a relevant answer (another sequence of words).

What RNN architecture would be most suitable for this, and why? 

*(Write your answer in the markdown cell below.)*

*(Double-click to write your answer here)*

### Task 5: What's the Role of the Hidden State?

Briefly explain the role of the hidden state in a Recurrent Neural Network. What information does it carry?

*(Double-click to write your answer here)*

--- 
## Summary & Further Learning

### 🎯 Key Takeaways

- **RNNs are for Sequential Data:** Their defining feature is the ability to process sequences by maintaining a 'memory' or **hidden state**.
- **The Power of the Loop:** The recurrent connection allows information to be passed from one time step to the next.
- **Training with BPTT:** RNNs are trained using Backpropagation Through Time.
- **Beware of Gradients:** Simple RNNs are susceptible to the vanishing and exploding gradient problems.
- **Architectural Diversity:** RNNs come in various forms (many-to-one, one-to-many, etc.) to suit different problems.

Congratulations on completing this introduction to RNNs! This is a foundational concept in modern AI, and understanding it well will help you learn more advanced models like LSTMs, GRUs, and Transformers.

### 🔗 Related Study Resources

*   **[Understanding LSTM Networks by Christopher Olah](https://colah.github.io/posts/2015-08-Understanding-LSTMs/)**: A classic and highly intuitive blog post explaining the inner workings of LSTMs.
*   **[Coursera: "Sequence Models" by Andrew Ng](https://www.coursera.org/learn/nlp-sequence-models)**: A comprehensive online course covering RNNs, LSTMs, GRUs, and their applications.
*   **[TensorFlow RNN Guide](https://www.tensorflow.org/guide/keras/rnn)**: Official tutorial for implementing RNNs in TensorFlow.
*   **[PyTorch RNN Tutorial](https://pytorch.org/tutorials/intermediate/char_rnn_classification_tutorial.html)**: Official tutorial for implementing RNNs in PyTorch.