# What is an RNN? (Theory)

A Recurrent Neural Network (RNN) is a deep learning model for sequential data, where the order of the inputs matters (text, speech, time series). Unlike a feedforward network, which processes each input independently, an RNN keeps an internal memory (the hidden state) that carries information from earlier inputs forward. This lets it use context and relationships across the sequence.

Examples:

* Text (sentences, words)
* Time series (weather, stock prices)
* Speech
* Sensor data

📌 Key idea:
An RNN remembers previous information and uses it for the current prediction.

# Why RNN? (Problem with ANN & CNN)
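❌ ANN (feedforward network)

* No memory
* Treats each input independently of the others

❌ CNN

* Best suited to grid-like data such as images
* Sees only local patterns within its kernel window and keeps no memory along the sequence, so it does not model long-range order

✅ RNN

* Has memory (the hidden state)
* Uses past information
* Works with sequences of varying length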
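A toy NumPy comparison makes the "no memory" point concrete. The scalar weights below are hand-picked and hypothetical (nothing is trained): a feedforward layer applied to each element gives the same output for the element 2.0 no matter what came before it, while the RNN's output at that step also depends on the earlier element through its hidden state.

```python
import numpy as np

# Hand-picked scalar weights (hypothetical, for illustration only)
W_x, W_h, bias = 0.5, 0.8, 0.0

def feedforward_outputs(sequence):
    # The same layer is applied to each element on its own: no memory
    return [np.tanh(W_x * x + bias) for x in sequence]

def rnn_outputs(sequence):
    # The hidden state h carries information from earlier elements
    h, outputs = 0.0, []
    for x in sequence:
        h = np.tanh(W_x * x + W_h * h + bias)
        outputs.append(h)
    return outputs

seq_a = [1.0, 2.0]
seq_b = [3.0, 2.0]  # same last element, different history

print(feedforward_outputs(seq_a)[-1], feedforward_outputs(seq_b)[-1])  # identical
print(rnn_outputs(seq_a)[-1], rnn_outputs(seq_b)[-1])                  # different
```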

# How RNN Works (Theory)

At each time step:

$$h_t = \tanh(W_x x_t + W_h h_{t-1} + b)$$

Where:

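* x_t → current input
* h_{t-1} → previous memory (hidden state from the previous step)
* h_t → current memory (hidden state)
* W_x, W_h → input and recurrent weight matrices, shared across all time steps
* b → bias

📌 Hidden state = memory of the RNN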
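A minimal NumPy sketch of a single time step of the update above. It assumes column-vector shapes W_x: (hidden, features), W_h: (hidden, hidden), b: (hidden,); the sizes and the random weights are illustrative only.

```python
import numpy as np

def rnn_step(x_t, h_prev, W_x, W_h, b):
    """One RNN time step: h_t = tanh(W_x x_t + W_h h_{t-1} + b)."""
    return np.tanh(W_x @ x_t + W_h @ h_prev + b)

rng = np.random.default_rng(0)
hidden_size, input_size = 4, 1  # illustrative sizes

W_x = rng.normal(size=(hidden_size, input_size))
W_h = rng.normal(size=(hidden_size, hidden_size))
b = np.zeros(hidden_size)

h_prev = np.zeros(hidden_size)  # h_0: empty memory
x_t = np.array([1.0])           # current input with 1 feature
h_t = rnn_step(x_t, h_prev, W_x, W_h, b)
print(h_t.shape)  # (4,)
```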

* Hidden State (Memory): At each step, an RNN takes the current input and the previous hidden state to compute the new hidden state, creating a "memory" of past information.
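* Shared Weights: The same weights are used at every time step, so the number of parameters does not depend on sequence length. This lets an RNN process sequences of varying lengths and recognize a pattern wherever it appears in the sequence.
* Cyclic Connections: The hidden state is fed back as an input to the next step; this loop lets information persist across time steps, which makes RNNs well suited to temporal data.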
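The shared-weight and loop ideas can be sketched by unrolling the step over whole sequences. `rnn_forward` is a hypothetical helper: the same W_x, W_h and b are reused at every step, so sequences of different lengths run through one set of parameters, and the parameter count does not grow with length.

```python
import numpy as np

def rnn_forward(xs, W_x, W_h, b):
    """Unroll the RNN over xs of shape (time_steps, features).

    The same W_x, W_h and b are reused at every step (shared weights).
    Returns all hidden states, shape (time_steps, hidden).
    """
    h = np.zeros(W_h.shape[0])  # h_0 = 0
    states = []
    for x_t in xs:  # this loop plays the role of the cyclic connection
        h = np.tanh(W_x @ x_t + W_h @ h + b)
        states.append(h)
    return np.stack(states)

rng = np.random.default_rng(1)
W_x = rng.normal(size=(4, 1))
W_h = rng.normal(size=(4, 4)) * 0.5
b = np.zeros(4)

short_seq = np.array([[1.0], [2.0], [3.0]])                # 3 time steps
long_seq = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])   # 5 time steps

short_states = rnn_forward(short_seq, W_x, W_h, b)
long_states = rnn_forward(long_seq, W_x, W_h, b)
print(short_states.shape, long_states.shape)  # (3, 4) (5, 4)

n_params = W_x.size + W_h.size + b.size
print(n_params)  # 24, independent of sequence length
```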

# Key Applications
* Natural Language Processing (NLP): Text generation, sentiment analysis, machine translation, language modeling, and named entity recognition.
* Speech Recognition: Converting spoken language into text.
* Time-Series Forecasting: Predicting future values based on historical data, like stock prices or weather.
* Image Captioning: Generating text descriptions of images.

# Challenges & Solutions
* Vanishing/Exploding Gradients: During backpropagation through long sequences, gradients can become too small (vanish) or too large (explode), making learning difficult.
* Long-Term Dependencies: RNNs struggle to capture relationships between events that are far apart in a sequence.
* Solutions and extensions:
  * LSTMs/GRUs: Gating mechanisms (forget, input and output gates in an LSTM; update and reset gates in a GRU) control the flow of information, which largely mitigates the vanishing-gradient and long-term dependency problems.
  * Gradient clipping: Capping the gradient norm during training keeps exploding gradients in check.
  * Bidirectional RNNs (BiRNNs): Process the sequence in both forward and backward directions, so each step sees both past and future context.
  * Deep RNNs: Stack multiple RNN layers to learn more complex representations.
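The vanishing/exploding behavior follows from the chain rule: backpropagation through time multiplies one factor dh_t/dh_{t-1} per step. The sketch below uses a hypothetical one-unit RNN h_t = tanh(w h_{t-1} + u x_t). Feeding zero inputs keeps every h_t at 0, so each factor is exactly w and the gradient over 50 steps is w^50; this deliberately simplified case is chosen so the effect is easy to read.

```python
import numpy as np

def grad_last_wrt_first(w, xs, u=1.0, h0=0.0):
    """d h_T / d h_0 for a scalar RNN h_t = tanh(w*h_{t-1} + u*x_t).

    By the chain rule this is the product over t of w * (1 - h_t**2),
    one Jacobian factor per time step of backpropagation through time.
    """
    h, grad = h0, 1.0
    for x in xs:
        h = np.tanh(w * h + u * x)
        grad *= w * (1.0 - h ** 2)  # local derivative dh_t/dh_{t-1}
    return grad

xs = np.zeros(50)  # 50 steps of zero input keep h_t = 0, isolating the w factor
for w in (0.5, 1.0, 1.5):
    print(w, grad_last_wrt_first(w, xs))
# 0.5 -> about 8.9e-16 (vanishes), 1.0 -> 1.0, 1.5 -> about 6.4e8 (explodes)
```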

# RNN Architecture (Simple)

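* Input sequence: x1 → x2 → x3 (each x_t feeds into the hidden state at its own step)
* Hidden state: h1 → h2 → h3 (each h_t is passed on to the next step)
* Output: y, computed from the last hidden state h3 (many-to-one)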
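In the architecture above only the last hidden state h3 produces the output (many-to-one). Keras' `SimpleRNN` behaves this way by default (`return_sequences=False`); with `return_sequences=True` it returns every hidden state instead, which suits many-to-many tasks. A NumPy sketch of both readouts, with hypothetical output weights W_y and b_y:

```python
import numpy as np

rng = np.random.default_rng(2)
W_x = rng.normal(size=(3, 1))       # hidden=3, features=1 (illustrative)
W_h = rng.normal(size=(3, 3)) * 0.5
b = np.zeros(3)
W_y = rng.normal(size=(1, 3))       # hypothetical output layer
b_y = np.zeros(1)

xs = np.array([[1.0], [2.0], [3.0]])  # x1, x2, x3
h = np.zeros(3)
hidden_states = []
for x_t in xs:
    h = np.tanh(W_x @ x_t + W_h @ h + b)
    hidden_states.append(h)
hidden_states = np.stack(hidden_states)  # rows h1, h2, h3 -> shape (3, 3)

y_many_to_one = W_y @ hidden_states[-1] + b_y  # uses only h3 -> shape (1,)
y_many_to_many = hidden_states @ W_y.T + b_y   # one output per step -> shape (3, 1)
print(y_many_to_one.shape, y_many_to_many.shape)
```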

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import SimpleRNN, Dense
```


# RNN Input Shape

* Keras RNN layers expect 3-D input: (samples, time_steps, features)
* Example: (100, 5, 1) means 100 sequences, each with 5 time steps and 1 feature per step
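Sequence data is often built by sliding a fixed-length window over a 1-D series and using the value that follows each window as the target. `make_windows` below is a hypothetical helper sketching that; applied to the series 1 to 6 with 3 time steps, it produces the same X and y as the toy data in the next cell (as floats).

```python
import numpy as np

def make_windows(series, time_steps):
    """Turn a 1-D series into (samples, time_steps, 1) inputs and next-value targets."""
    series = np.asarray(series, dtype=float)
    X, y = [], []
    for start in range(len(series) - time_steps):
        X.append(series[start:start + time_steps])
        y.append(series[start + time_steps])
    X = np.array(X)[..., np.newaxis]  # add the features axis: 1 feature per step
    return X, np.array(y)

X_demo, y_demo = make_windows([1, 2, 3, 4, 5, 6], time_steps=3)
print(X_demo.shape)     # (3, 3, 1)
print(X_demo[:, :, 0])  # rows [1 2 3], [2 3 4], [3 4 5]
print(y_demo)           # [4. 5. 6.]
```

Under the same helper, a 105-point series with `time_steps=5` would give an input of shape (100, 5, 1), matching the example shape above.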

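```python
# 3 training sequences, each with 3 time steps and 1 feature: shape (3, 3, 1)
X = np.array([
    [[1], [2], [3]],
    [[2], [3], [4]],
    [[3], [4], [5]]
])

# Target: the next number after each sequence
y = np.array([4, 5, 6])
```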

```python
# Inspect the shape: (samples, time_steps, features)
print(X.shape)  # (3, 3, 1)
```

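```python
from tensorflow.keras.layers import Input

model = Sequential()

# Declare the input shape up front: 3 time steps, 1 feature per step
model.add(Input(shape=(3, 1)))

# Recurrent layer with 10 hidden units; by default it returns only the last hidden state
model.add(SimpleRNN(units=10, activation='tanh'))

# Dense output layer producing one number (the predicted next value)
model.add(Dense(1))
```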
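A quick arithmetic check on the model's size, assuming the standard SimpleRNN parameterization (input kernel, recurrent kernel, bias). `model.summary()` should list the same per-layer counts. The number of time steps does not appear in the formula because the weights are shared across steps.

```python
def simple_rnn_params(units, features):
    # input kernel (features x units) + recurrent kernel (units x units) + bias (units)
    return features * units + units * units + units

def dense_params(inputs, outputs):
    # weight matrix (inputs x outputs) + bias (outputs)
    return inputs * outputs + outputs

rnn = simple_rnn_params(units=10, features=1)  # 120
dense = dense_params(inputs=10, outputs=1)     # 11
print(rnn, dense, rnn + dense)                 # 120 11 131
```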


```python
# Adam optimizer, mean squared error loss, mean absolute error as a reported metric
model.compile(
    optimizer='adam',
    loss='mse',
    metrics=['mae']
)
```
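What the compiled loss and metric measure, sketched in NumPy on made-up predictions: `'mse'` is the quantity the optimizer minimizes, while `'mae'` is only reported during training.

```python
import numpy as np

def mse(y_true, y_pred):
    # 'mse' loss: mean of squared errors
    return np.mean((y_true - y_pred) ** 2)

def mae(y_true, y_pred):
    # 'mae' metric: mean of absolute errors
    return np.mean(np.abs(y_true - y_pred))

y_true = np.array([4.0, 5.0, 6.0])
y_pred = np.array([3.5, 5.0, 7.0])  # made-up predictions for illustration
print(mse(y_true, y_pred))  # (0.25 + 0 + 1) / 3 = 0.4166...
print(mae(y_true, y_pred))  # (0.5 + 0 + 1) / 3 = 0.5
```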

```python
# Train for 200 epochs; verbose=0 suppresses the per-epoch log
model.fit(X, y, epochs=200, verbose=0)
```

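```python
# One new sequence, shaped (samples=1, time_steps=3, features=1)
test_input = np.array([[[4], [5], [6]]])
prediction = model.predict(test_input)

print("Predicted value:", prediction)
```

Output from the original run (Keras progress bar omitted):

```text
Predicted value: [[2.856124]]
```

Following the pattern in the training data, the value after 4, 5, 6 is 7, so this prediction is well off. With only three training sequences, the model has likely underfit: the three samples fit in one batch (Keras' default `batch_size` is 32), so 200 epochs amount to just 200 Adam updates at the default learning rate of 0.001. The test input also lies outside the range of the training inputs, so the network has to extrapolate. Exact values vary between runs because the weights are randomly initialized.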
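To see what `model.predict` computes, the forward pass of this SimpleRNN + Dense model can be recomputed by hand. This sketch assumes `model.get_weights()` returns `[rnn_kernel, rnn_recurrent_kernel, rnn_bias, dense_kernel, dense_bias]` with Keras' (input, output) kernel layout; here it runs on random weights of those shapes so it does not need TensorFlow.

```python
import numpy as np

def manual_predict(weights, X):
    """Recompute SimpleRNN(tanh) + Dense(1) by hand.

    weights: [rnn_kernel (features, units), rnn_recurrent (units, units),
              rnn_bias (units,), dense_kernel (units, 1), dense_bias (1,)]
    X: array of shape (samples, time_steps, features)
    """
    W_x, W_h, b, W_y, b_y = weights
    h = np.zeros((X.shape[0], W_h.shape[0]))  # h_0 = 0 for each sample
    for t in range(X.shape[1]):
        # Kernels are stored as (in, out), so inputs multiply from the left
        h = np.tanh(X[:, t, :] @ W_x + h @ W_h + b)
    return h @ W_y + b_y  # Dense layer on the last hidden state

# Illustrative random weights with the shapes of the model above (units=10, features=1)
rng = np.random.default_rng(3)
weights = [rng.normal(size=(1, 10)), rng.normal(size=(10, 10)) * 0.1,
           np.zeros(10), rng.normal(size=(10, 1)), np.zeros(1)]
demo_input = np.array([[[4.0], [5.0], [6.0]]])
print(manual_predict(weights, demo_input).shape)  # (1, 1)
```

With the trained model, `manual_predict(model.get_weights(), test_input)` should agree with `model.predict(test_input)` up to floating-point error.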
