### Recurrent Neural Networks (RNNs) - Explained

### What is an RNN?
A **Recurrent Neural Network (RNN)** is a type of neural network designed to handle **sequential data**, such as **time series, speech, text, and video**. Unlike feedforward networks, which process each input independently, an RNN keeps a **hidden state** that is carried from one time step to the next. This acts as a form of **memory**, so earlier inputs can influence how later ones are processed.

---

### **Architecture of an RNN**
### **1️⃣ Standard Neural Network vs. RNN**
| Feature            | Standard Neural Network | Recurrent Neural Network |
|--------------------|------------------------|--------------------------|
| Input Type        | Independent Data Points | Sequential Data (Time-dependent) |
| Memory            | No memory               | Has memory (stores past information) |
| Suitable For      | Image Classification    | Text, Speech, Time-Series Data |

### **2️⃣ RNN Structure**
- **Recurrent Connection:** Unlike a feedforward network, an RNN loops back on itself, allowing information to persist.
- **Hidden State:** Maintains information from previous time steps.
- **Mathematical Representation:**
  \[
  h_t = f(W_h h_{t-1} + W_x X_t + b)
  \]
  where:
  - \( h_t \) = current hidden state
  - \( h_{t-1} \) = hidden state from the previous time step
  - \( X_t \) = current input
  - \( W_h \), \( W_x \) = weight matrices
  - \( b \) = bias
  - \( f \) = activation function (e.g., **tanh**)
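
The recurrence above can be sketched in plain Python. This is a minimal illustration, not a trained model: the weight values, the 2-unit hidden size, and the helper names `rnn_step` and `run_rnn` are assumptions chosen for the example.

```python
import math

def rnn_step(h_prev, x_t, W_h, W_x, b):
    """Compute h_t = tanh(W_h h_{t-1} + W_x x_t + b) for one time step."""
    h_t = []
    for i in range(len(b)):
        pre = b[i]
        pre += sum(W_h[i][j] * h_prev[j] for j in range(len(h_prev)))
        pre += sum(W_x[i][k] * x_t[k] for k in range(len(x_t)))
        h_t.append(math.tanh(pre))
    return h_t

# Illustrative (untrained) parameters: 2 hidden units, 1 input feature.
W_h = [[0.5, -0.3], [0.2, 0.1]]
W_x = [[1.0], [-1.0]]
b = [0.1, -0.1]

def run_rnn(inputs):
    """Apply the same weights at every step, carrying the hidden state forward."""
    h = [0.0, 0.0]  # initial hidden state h_0
    for x_t in inputs:
        h = rnn_step(h, x_t, W_h, W_x, b)
    return h

h_forward = run_rnn([[1.0], [0.0], [-1.0]])
h_reversed = run_rnn([[-1.0], [0.0], [1.0]])
print(h_forward, h_reversed)
```

The two calls see the same values in a different order and end in different hidden states, which is the sense in which the hidden state "remembers" the sequence.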

---

### **How an RNN Works**
### **Example: Predicting the next word in a sentence**
1️⃣ **Input:** "The cat sat on the" → Predicts "mat"  
2️⃣ **Processing:** Each word is fed sequentially into the RNN.  
3️⃣ **Memory:** The hidden state carries a summary of the words seen so far.  
4️⃣ **Output:** The network predicts the next word.  
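
The four steps can be sketched end to end as follows. Everything here is an assumption made for illustration: the five-word vocabulary, the hidden size, the random untrained weights, the output projection `W_y`, and the omission of bias terms. Because the weights are untrained, the resulting probabilities are arbitrary; the point is only the data flow from words to a distribution over the next word.

```python
import math
import random

vocab = ["the", "cat", "sat", "on", "mat"]
word_to_id = {w: i for i, w in enumerate(vocab)}
V, H = len(vocab), 4  # vocabulary size, hidden size (both illustrative)

rng = random.Random(0)

def rand_matrix(rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

W_x = rand_matrix(H, V)  # input weights, applied to one-hot word vectors
W_h = rand_matrix(H, H)  # recurrent weights
W_y = rand_matrix(V, H)  # hypothetical output projection to vocabulary scores

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def next_word_probs(words):
    h = [0.0] * H
    for w in words:  # step 2: feed the words one at a time
        idx = word_to_id[w]
        # With a one-hot input, W_x x_t is simply column idx of W_x.
        # Step 3: the new hidden state depends on the previous one.
        h = [math.tanh(sum(W_h[i][j] * h[j] for j in range(H)) + W_x[i][idx])
             for i in range(H)]
    # Step 4: turn the final hidden state into next-word probabilities.
    scores = [sum(W_y[v][j] * h[j] for j in range(H)) for v in range(V)]
    return softmax(scores)

probs = next_word_probs("the cat sat on the".split())
for word, p in zip(vocab, probs):
    print(f"{word}: {p:.3f}")
```

Training would adjust the weights so that the probability assigned to "mat" is high for this input.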

---

###  **Types of RNN Architectures**
### **1️⃣ Many-to-One (Sentiment Analysis)**
- Example: Given a **sentence**, predict a **sentiment** (Positive/Negative).
- Input: `"I love this movie"` → Output: Positive

### **2️⃣ One-to-Many (Music Generation)**
- Example: Given a **note**, generate an entire **music sequence**.
- Input: 🎵 → Output: 🎶🎶🎶

### **3️⃣ Many-to-Many (Machine Translation)**
- Example: Translating English to French.
- Input: `"Hello, how are you?"`
- Output: `"Bonjour, comment ça va?"`
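
The three patterns differ only in which inputs are consumed and which hidden states are emitted. The sketch below uses a scalar RNN with made-up weights (`W_H`, `W_X`) and hypothetical function names to show the difference. It is a simplification: the many-to-many case here produces one output per input, whereas translation systems typically use an encoder-decoder pair so that input and output lengths can differ.

```python
import math

W_H, W_X = 0.5, 1.0  # illustrative scalar weights, not trained

def step(h_prev, x_t):
    return math.tanh(W_H * h_prev + W_X * x_t)

def many_to_one(xs):
    """Read the whole sequence, return only the final hidden state."""
    h = 0.0
    for x in xs:
        h = step(h, x)
    return h

def many_to_many(xs):
    """Return one hidden state per input step."""
    h, outputs = 0.0, []
    for x in xs:
        h = step(h, x)
        outputs.append(h)
    return outputs

def one_to_many(x0, length):
    """Start from a single input and feed each output back in as the next input."""
    h, x, outputs = 0.0, x0, []
    for _ in range(length):
        h = step(h, x)
        outputs.append(h)
        x = h
    return outputs

xs = [0.2, -0.4, 0.9]
print(many_to_one(xs))
print(many_to_many(xs))
print(one_to_many(1.0, 4))
```

In Keras, the first two cases correspond to the `return_sequences` argument of recurrent layers: the default `False` returns only the last state, and `True` returns the state at every step.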

---

### **Challenges in RNNs**
### 🔹 **1. Vanishing Gradient Problem**
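- When backpropagating through time, the gradient is multiplied by the recurrent weights and activation derivatives at every time step. Over long sequences these products **become too small**, so the network learns very little from inputs at early time steps (long-range dependencies).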
- **Solution:** Use **LSTMs or GRUs** instead of vanilla RNNs.
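
A small numeric experiment makes the effect visible. The sketch assumes a scalar RNN, \( h_t = \tanh(w\,h_{t-1} + x_t) \), with an arbitrary recurrent weight of 0.5 and a constant input; the function name is hypothetical. It multiplies the per-step derivatives \( w\,(1 - h_t^2) \) to obtain the size of the gradient of the last hidden state with respect to the initial one.

```python
import math

def gradient_through_time(w, inputs):
    """Return |d h_T / d h_0| for the scalar RNN h_t = tanh(w * h_{t-1} + x_t)."""
    h, grad = 0.0, 1.0
    for x_t in inputs:
        h = math.tanh(w * h + x_t)
        grad *= w * (1.0 - h * h)  # d h_t / d h_{t-1}, since tanh'(z) = 1 - tanh(z)^2
    return abs(grad)

inputs = [0.1] * 50
short_grad = gradient_through_time(0.5, inputs[:5])   # 5 time steps
long_grad = gradient_through_time(0.5, inputs)        # 50 time steps
print(short_grad, long_grad)
```

Each factor is at most 0.5 in magnitude here, so the gradient shrinks at least geometrically with sequence length. After 50 steps it is far too small to produce a meaningful weight update.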

### 🔹 **2. Exploding Gradient Problem**
- Gradients **become too large**, making the model unstable.
- **Solution:** Apply **gradient clipping**.
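
One common form of clipping rescales the gradient vector whenever its norm exceeds a threshold, which limits the step size without changing the direction. A minimal sketch, with a hypothetical helper name and made-up gradient values:

```python
import math

def clip_by_global_norm(grads, max_norm):
    """Rescale grads so their L2 norm is at most max_norm; direction is preserved."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm:
        return list(grads)
    scale = max_norm / norm
    return [g * scale for g in grads]

exploding = [30.0, -40.0]  # L2 norm is 50
clipped = clip_by_global_norm(exploding, max_norm=5.0)
clipped_norm = math.sqrt(sum(g * g for g in clipped))
print(clipped, clipped_norm)
```

Keras optimizers expose related options through the `clipnorm`, `clipvalue`, and `global_clipnorm` arguments, so clipping rarely needs to be written by hand.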

---

### **LSTMs (Long Short-Term Memory)**
LSTMs are a gated type of RNN that mitigates the **vanishing gradient problem** by adding a separate cell state controlled by **gates**:
1️⃣ **Forget Gate:** Decides what information to discard.  
2️⃣ **Input Gate:** Determines which new information to add.  
3️⃣ **Output Gate:** Controls the final output.  
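
For reference, one standard formulation of the three gates is given below. The notation is an assumption layered on the symbols used earlier: \( [h_{t-1}, X_t] \) denotes concatenation, \( \sigma \) is the sigmoid function, \( \odot \) is element-wise multiplication, and \( c_t \) is the cell state. Variants of the LSTM differ in the details.

\[
\begin{aligned}
f_t &= \sigma(W_f [h_{t-1}, X_t] + b_f) && \text{(forget gate)} \\
i_t &= \sigma(W_i [h_{t-1}, X_t] + b_i) && \text{(input gate)} \\
\tilde{c}_t &= \tanh(W_c [h_{t-1}, X_t] + b_c) && \text{(candidate values)} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{(cell state update)} \\
o_t &= \sigma(W_o [h_{t-1}, X_t] + b_o) && \text{(output gate)} \\
h_t &= o_t \odot \tanh(c_t) && \text{(hidden state)}
\end{aligned}
\]

The cell state update is additive: when \( f_t \) is close to 1, \( c_{t-1} \) passes through almost unchanged, which gives gradients a path that does not shrink at every step.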

###  **LSTM Cell Diagram**
⬜ Input (Xt) → 🟨 Forget Gate → 🟩 Input Gate → 🔵 Output Gate → 🟠 Hidden State


### Applications of RNNs
- Speech Recognition (e.g., Siri, Google Assistant)
- Machine Translation (e.g., Google Translate)
- Stock Market Prediction (e.g., Forecasting Prices)
- Chatbots & Conversational AI (e.g., earlier sequence-to-sequence chatbots; current systems such as ChatGPT are Transformer-based)
- Music & Text Generation (e.g., AI-generated stories, songs)

### Summary
- RNNs process sequential data by using hidden states.
- They suffer from vanishing gradients, which LSTMs and GRUs mitigate with gating.
- Used in speech, text, finance, and forecasting tasks.


In [2]:
from tensorflow.keras.datasets import imdb
from tensorflow.keras.preprocessing import sequence
from tensorflow.keras import layers, models


In [3]:
# Load and preprocess data
max_features = 10000  # keep only the 10,000 most frequent words
max_len = 500         # pad or truncate every review to 500 tokens
(X_train, y_train), (X_test, y_test) = imdb.load_data(num_words=max_features)
X_train = sequence.pad_sequences(X_train, maxlen=max_len)
X_test = sequence.pad_sequences(X_test, maxlen=max_len)


Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/imdb.npz


In [4]:
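# Build model
model = models.Sequential()
model.add(layers.Embedding(max_features, 32))  # word index -> 32-dimensional vector
model.add(layers.SimpleRNN(32))  # returns only the final hidden state (many-to-one)
model.add(layers.Dense(1, activation='sigmoid'))  # probability that the review is positive
model.summary()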

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding (Embedding)       (None, None, 32)          320000    
                                                                 
 simple_rnn (SimpleRNN)      (None, 32)                2080      
                                                                 
 dense (Dense)               (None, 1)                 33        
                                                                 
Total params: 322113 (1.23 MB)
Trainable params: 322113 (1.23 MB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
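
The parameter counts in the summary can be checked by hand. The short calculation below uses the layer sizes from the model definition; the variable names are only for illustration.

```python
vocab_size, embed_dim, rnn_units = 10000, 32, 32

# Embedding: one 32-dimensional vector per word index.
embedding_params = vocab_size * embed_dim

# SimpleRNN: input weights W_x, recurrent weights W_h, and bias b.
rnn_params = rnn_units * embed_dim + rnn_units * rnn_units + rnn_units

# Dense: one weight per RNN unit plus a single bias.
dense_params = rnn_units * 1 + 1

total = embedding_params + rnn_params + dense_params
print(embedding_params, rnn_params, dense_params, total)
# prints: 320000 2080 33 322113
```

Almost all of the parameters sit in the embedding table; the recurrent layer itself is small because the same weights are reused at every time step.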


In [5]:
# Compile and train the model (20% of the training data is held out for validation)
model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(X_train, y_train, epochs=10, batch_size=128, validation_split=0.2)


Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.src.callbacks.History at 0x7f5aebe65820>

In [6]:
# Evaluate the model
test_loss, test_acc = model.evaluate(X_test, y_test)
print(f"Test accuracy: {test_acc}")


Test accuracy: 0.8086400032043457
