# Recurrent Neural Networks (RNNs) and LSTM

---

## RNN Basics and Applications

**Definition:**  
A **Recurrent Neural Network (RNN)** is a type of neural network designed to process sequential data by maintaining a memory of previous inputs. Unlike traditional feedforward networks, RNNs have loops that allow information to persist across time steps.

**Key Features:**
- Handles sequences of variable length (e.g., text, time series).  
- Maintains hidden states to capture temporal dependencies.  
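
The recurrence behind these features can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the names `rnn_forward`, `W_xh`, `W_hh`, and `b_h` are chosen here for illustration, the weights are random rather than trained, and `tanh` is assumed as the activation.

```python
import numpy as np

def rnn_forward(inputs, W_xh, W_hh, b_h):
    """Run a vanilla RNN over one sequence and return every hidden state."""
    hidden_size = W_hh.shape[0]
    h = np.zeros(hidden_size)              # initial hidden state
    states = []
    for x_t in inputs:                     # one iteration per time step
        # The same weights are reused at every step; h carries the memory.
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
        states.append(h)
    return np.stack(states)

rng = np.random.default_rng(0)
input_size, hidden_size = 3, 4
W_xh = rng.normal(scale=0.5, size=(hidden_size, input_size))
W_hh = rng.normal(scale=0.5, size=(hidden_size, hidden_size))
b_h = np.zeros(hidden_size)

# Two sequences of different lengths pass through the same weights.
short_states = rnn_forward(rng.normal(size=(5, input_size)), W_xh, W_hh, b_h)
long_states = rnn_forward(rng.normal(size=(12, input_size)), W_xh, W_hh, b_h)
print(short_states.shape, long_states.shape)  # (5, 4) (12, 4)
```

Because the loop simply runs once per time step, nothing in the weights depends on the sequence length, which is why variable-length input is possible.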

**Applications:**
- Time series prediction (e.g., stock prices, weather forecasting)  
- Natural language processing (e.g., language modeling, text generation)  
- Speech recognition  
- Anomaly detection in sequences  

**Limitations:**
- Struggles with **long-term dependencies** due to vanishing or exploding gradients.
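
A rough way to see where this limitation comes from: backpropagation through time multiplies the gradient by the recurrent weight matrix once per step. The sketch below is a deliberate simplification (it ignores the activation derivative and uses a scaled orthogonal matrix so the effect is exact); `backprop_norms` and its parameters are hypothetical names used only for this illustration.

```python
import numpy as np

def backprop_norms(scale, steps=50, size=8, seed=0):
    """Track the gradient norm while it is pushed back through `steps` time steps."""
    rng = np.random.default_rng(seed)
    # Orthogonal Q preserves norms, so `scale` alone controls growth or decay.
    Q, _ = np.linalg.qr(rng.normal(size=(size, size)))
    W_hh = scale * Q
    grad = np.ones(size) / np.sqrt(size)   # unit-norm gradient at the last step
    norms = []
    for _ in range(steps):
        grad = W_hh.T @ grad               # one step of backpropagation through time
        norms.append(np.linalg.norm(grad))
    return norms

shrinking = backprop_norms(scale=0.9)
growing = backprop_norms(scale=1.1)
print(f"scale 0.9 after 50 steps: {shrinking[-1]:.4f}")
print(f"scale 1.1 after 50 steps: {growing[-1]:.1f}")
```

With a scale of 0.9 the norm falls below 0.01 after 50 steps, and with 1.1 it grows past 100, so the signal from early time steps either fades or blows up.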

---

## Long Short-Term Memory (LSTM) Networks

**Definition:**  
**LSTM** is a type of RNN designed to overcome the limitations of standard RNNs. LSTM networks introduce **gates** to control the flow of information, making it easier to learn long-term dependencies.

**Key Components of LSTM:**
1. **Forget Gate:** Decides which information to discard from the cell state.  
2. **Input Gate:** Decides which new information to store in the cell state.  
3. **Cell State:** Acts as a memory that carries relevant information across time steps.  
4. **Output Gate:** Decides which parts of the cell state are exposed as the hidden state (the output) at each time step.
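
The four components above can be written out as a single time step in NumPy. This is a sketch under assumptions: the helper names `sigmoid` and `lstm_step`, the stacked weight layout `W`, and the gate ordering are illustrative choices, and the weights are random rather than trained.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step; W maps [h_prev, x_t] to four stacked pre-activations."""
    n = h_prev.shape[0]
    z = W @ np.concatenate([h_prev, x_t]) + b
    f = sigmoid(z[0 * n:1 * n])    # forget gate: how much of c_prev to keep
    i = sigmoid(z[1 * n:2 * n])    # input gate: how much new information to write
    g = np.tanh(z[2 * n:3 * n])    # candidate values to write
    o = sigmoid(z[3 * n:4 * n])    # output gate: how much of the cell to expose
    c_t = f * c_prev + i * g       # updated cell state (the memory)
    h_t = o * np.tanh(c_t)         # updated hidden state (the output)
    return h_t, c_t

rng = np.random.default_rng(0)
input_size, hidden_size = 3, 5
W = rng.normal(scale=0.1, size=(4 * hidden_size, hidden_size + input_size))
b = np.zeros(4 * hidden_size)

h = np.zeros(hidden_size)
c = np.zeros(hidden_size)
for x_t in rng.normal(size=(6, input_size)):   # a sequence of 6 time steps
    h, c = lstm_step(x_t, h, c, W, b)
print(h.shape, c.shape)  # (5,) (5,)
```

The cell state update is additive (`f * c_prev + i * g`), which is what lets information and gradients travel across many steps when the forget gate stays close to 1.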

**Applications:**
- Sequence prediction  
- Language translation  
- Sentiment analysis  
- Speech recognition  

---

## Time Series Forecasting with RNNs

**Definition:**  
Time series forecasting involves predicting future values based on previously observed sequential data. RNNs and LSTMs are a common choice because their hidden state carries information forward from earlier time steps, which lets the model capture temporal dependencies.

**Key Steps:**
1. Preprocess the time series data (normalization, windowing).  
2. Use RNN/LSTM layers to learn temporal patterns.  
3. Train the model on historical data.  
4. Predict future values.
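
Step 1 is often the least obvious part, so here is a minimal sketch of min-max normalization followed by sliding-window construction. The helper `make_windows`, the toy series, and the split point are assumptions made for illustration; the scaling statistics are taken from the training portion only so that later values do not leak into preprocessing.

```python
import numpy as np

def make_windows(series, window):
    """Turn a 1-D series into (inputs, targets) pairs for one-step-ahead forecasting."""
    X = np.stack([series[i:i + window] for i in range(len(series) - window)])
    y = series[window:]                    # the value that follows each window
    return X[..., np.newaxis], y           # inputs shaped [samples, time_steps, features]

series = np.arange(10, dtype=float)        # toy data: 0, 1, ..., 9
split = 8                                  # first 8 points are treated as training data

# Normalize with statistics from the training portion only.
lo, hi = series[:split].min(), series[:split].max()
scaled = (series - lo) / (hi - lo)

X, y = make_windows(scaled, window=3)
print(X.shape, y.shape)  # (7, 3, 1) (7,)
```

The resulting `X` already has the `[samples, time_steps, features]` layout that Keras recurrent layers expect, so it can be passed to `model.fit` directly.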

**Applications:**
- Stock market prediction  
- Weather and climate forecasting  
- Energy demand prediction  
- Sales and inventory forecasting  

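**Advantages of LSTM over a standard RNN:**
- Learns long-term dependencies more reliably  
- Mitigates the vanishing gradient problem; exploding gradients can still occur and are usually handled with gradient clipping  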

---

## NLP with LSTM & GRU

**Definition:**  
LSTM and GRU (Gated Recurrent Unit) networks are widely used in **Natural Language Processing (NLP)** tasks because they can capture sequential dependencies in text.

**Key Concepts:**
- **GRU:** A simplified version of LSTM with fewer gates (reset and update), often faster to train with comparable performance.  
- Both LSTM and GRU can remember context in sequences, making them suitable for tasks requiring understanding of word order and context.
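
For comparison with the LSTM, a single GRU time step can be sketched as follows. The names `gru_step`, `W_z`, `W_r`, and `W_h` are illustrative, biases are omitted for brevity, and the final interpolation uses one common sign convention (some references and libraries swap the roles of `z` and `1 - z`).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, h_prev, W_z, W_r, W_h):
    """One GRU time step with an update gate z and a reset gate r (no biases)."""
    hx = np.concatenate([h_prev, x_t])
    z = sigmoid(W_z @ hx)                  # update gate: how much to overwrite
    r = sigmoid(W_r @ hx)                  # reset gate: how much history to consult
    h_cand = np.tanh(W_h @ np.concatenate([r * h_prev, x_t]))  # candidate state
    # There is no separate cell state: the hidden state itself is the memory.
    return (1.0 - z) * h_prev + z * h_cand

rng = np.random.default_rng(0)
input_size, hidden_size = 3, 5
shape = (hidden_size, hidden_size + input_size)
W_z, W_r, W_h = (rng.normal(scale=0.1, size=shape) for _ in range(3))

h = np.zeros(hidden_size)
for x_t in rng.normal(size=(6, input_size)):   # a sequence of 6 time steps
    h = gru_step(x_t, h, W_z, W_r, W_h)
print(h.shape)  # (5,)
```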

**Applications in NLP:**
- Text generation and summarization  
- Language translation  
- Sentiment analysis  
- Named Entity Recognition (NER)  

**Comparison:**
| Feature | LSTM | GRU |
|---------|------|-----|
| Gates | 3 (Input, Forget, Output) | 2 (Update, Reset) |
| Memory | Separate cell state | Combined hidden state |
| Training Speed | Slower (more parameters) | Faster (fewer parameters) |
| Performance | Often slightly better on long sequences or large datasets | Comparable on many tasks, especially with smaller datasets |
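
The gate counts in the table translate directly into parameter counts. The sketch below uses the textbook formulas with one bias vector per gate; the function names are illustrative, and real libraries may differ slightly (for example, Keras's default GRU keeps a second set of biases, so its reported count is somewhat higher).

```python
def lstm_params(input_size, hidden_size):
    """Four weight blocks: forget, input, and output gates plus the candidate."""
    return 4 * (hidden_size * (hidden_size + input_size) + hidden_size)

def gru_params(input_size, hidden_size):
    """Three weight blocks: update and reset gates plus the candidate."""
    return 3 * (hidden_size * (hidden_size + input_size) + hidden_size)

# 10 input features and 50 hidden units
print(lstm_params(10, 50))  # 12200
print(gru_params(10, 50))   # 9150
```

Under these assumptions a GRU has three quarters of the recurrent parameters of an LSTM with the same sizes, which is one reason it tends to train faster.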


```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import SimpleRNN, Dense

# Toy sequence data: each target is the sum of the two input numbers
X = np.array([[0, 1], [1, 2], [2, 3], [3, 4], [4, 5]], dtype=float)
y = np.array([1, 3, 5, 7, 9], dtype=float)

# Reshape input to the layout RNN layers expect: [samples, time_steps, features]
X = X.reshape((X.shape[0], X.shape[1], 1))

# Build the RNN model: one recurrent layer followed by a single regression output
model = Sequential()
model.add(SimpleRNN(10, activation='relu', input_shape=(X.shape[1], 1)))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')

# Train
model.fit(X, y, epochs=200, verbose=0)

# Predict the target for an unseen pair (the true sum is 11)
test_input = np.array([5, 6], dtype=float).reshape((1, 2, 1))
print("Prediction:", model.predict(test_input, verbose=0))
```


```python
import numpy as np
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense
from tensorflow.keras.utils import to_categorical

# Sample corpus
corpus = [
    "I love machine learning",
    "I love deep learning",
    "I enjoy natural language processing",
    "Deep learning is amazing"
]

# Tokenize text; index 0 is reserved for padding, hence the +1
tokenizer = Tokenizer()
tokenizer.fit_on_texts(corpus)
total_words = len(tokenizer.word_index) + 1

# Create input sequences: every prefix of length >= 2 of each sentence
input_sequences = []
for line in corpus:
    token_list = tokenizer.texts_to_sequences([line])[0]
    for i in range(1, len(token_list)):
        n_gram_sequence = token_list[:i+1]
        input_sequences.append(n_gram_sequence)

# Pad sequences on the left so the last token always sits in the final position
max_seq_len = max(len(x) for x in input_sequences)
input_sequences = np.array(pad_sequences(input_sequences, maxlen=max_seq_len, padding='pre'))

# Split into predictors (all but the last token) and label (the last token)
X = input_sequences[:, :-1]
y = input_sequences[:, -1]

# One-hot encode output
y = to_categorical(y, num_classes=total_words)

# Build LSTM model: embedding -> LSTM -> softmax over the vocabulary
model = Sequential()
model.add(Embedding(total_words, 10, input_length=max_seq_len-1))
model.add(LSTM(50))
model.add(Dense(total_words, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam')

# Train
model.fit(X, y, epochs=300, verbose=0)

# Predict next word
seed_text = "I love"
token_list = tokenizer.texts_to_sequences([seed_text])[0]
token_list = pad_sequences([token_list], maxlen=max_seq_len-1, padding='pre')
predicted = model.predict(token_list, verbose=0)
predicted_index = int(np.argmax(predicted, axis=-1)[0])
# Index 0 is the padding token and has no word; fall back to a placeholder
predicted_word = tokenizer.index_word.get(predicted_index, "<unknown>")
print("Next word prediction:", predicted_word)
```

Output from the original run (training is not seeded, and both "machine" and "deep" follow "I love" in the corpus, so other runs may print `machine`):

```
Next word prediction: deep
```
