```{contents}
```

## Workflows

RNNs model **sequential or time-dependent data** such as text, audio, and signals.
They process one element of a sequence at a time and retain **context** using a **hidden state**.

Goal example:
Input = “The food is good” → Output = Sentiment = Positive.

---

### End-to-End Workflow

#### Step 1 – Data Preparation

1. **Collect** sequential data (text, audio, time series).
2. **Clean & tokenize** it into ordered elements (words, timesteps).
3. **Vectorize / embed** each token into numeric form:

   * One-Hot Encoding, TF-IDF, or pretrained embeddings (Word2Vec, GloVe, etc.).
4. **Pad / truncate** to equal sequence length if batched.
5. **Split** into train / validation / test sets.
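The five steps above can be sketched in plain Python. This is a minimal illustration, not a production pipeline: the toy corpus, the whitespace tokenizer, the `<pad>` / `<unk>` ids, and the 60 / 20 / 20 split are all assumptions made for the example.

```python
import random

def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()

def build_vocab(token_lists):
    """Map each token to an integer id; 0 is reserved for padding, 1 for unknown."""
    vocab = {"<pad>": 0, "<unk>": 1}
    for tokens in token_lists:
        for tok in tokens:
            vocab.setdefault(tok, len(vocab))
    return vocab

def encode(tokens, vocab, max_len):
    """Convert tokens to ids, truncate to max_len, then right-pad with the pad id."""
    ids = [vocab.get(tok, vocab["<unk>"]) for tok in tokens][:max_len]
    return ids + [vocab["<pad>"]] * (max_len - len(ids))

# Toy labelled corpus (hypothetical data): 1 = positive, 0 = negative.
corpus = [
    ("The food is good", 1),
    ("The service was slow and the food was cold", 0),
    ("Great place", 1),
    ("Not good", 0),
    ("I loved it", 1),
]

token_lists = [tokenize(text) for text, _ in corpus]
vocab = build_vocab(token_lists)
MAX_LEN = 6
encoded = [(encode(tokens, vocab, MAX_LEN), label)
           for tokens, (_, label) in zip(token_lists, corpus)]

# Shuffle with a fixed seed, then split 60 / 20 / 20 using integer arithmetic.
random.Random(0).shuffle(encoded)
n = len(encoded)
train = encoded[: n * 6 // 10]
val = encoded[n * 6 // 10 : n * 8 // 10]
test = encoded[n * 8 // 10 :]
```

The integer ids would normally be fed to an embedding lookup (or one-hot encoded) before reaching the RNN.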

---

#### Step 2 – Define Network Architecture

1. **Input Layer:** shape = (sequence_length, input_dim).
2. **RNN Layer(s):**

   * Simple RNN / LSTM / GRU cells.
   * Each cell computes
     $$
     h_t = f(W_x x_t + W_h h_{t-1} + b_h)
     $$
3. **Output Layer:**

   * Dense layer with a sigmoid/softmax activation for classification, or a linear activation for regression.

Optional:

* Dropout for regularization.
* Bidirectional RNN for context from both directions.
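To make the layer shapes concrete, here is a rough NumPy sketch of the parameters of a single simple-RNN layer plus a dense output layer. The helper name `init_rnn_params`, the dimensions, and the small-Gaussian initialization are illustrative assumptions, not a recommended scheme.

```python
import numpy as np

def init_rnn_params(input_dim, hidden_dim, output_dim, seed=0):
    """Create the weights used in h_t = f(W_x x_t + W_h h_{t-1} + b_h)
    and in the dense output layer (W_y, b_y)."""
    rng = np.random.default_rng(seed)
    scale = 0.1  # assumed small init; real projects often use Xavier/orthogonal
    return {
        "W_x": rng.normal(0.0, scale, (hidden_dim, input_dim)),   # input -> hidden
        "W_h": rng.normal(0.0, scale, (hidden_dim, hidden_dim)),  # hidden -> hidden
        "b_h": np.zeros(hidden_dim),
        "W_y": rng.normal(0.0, scale, (output_dim, hidden_dim)),  # hidden -> output
        "b_y": np.zeros(output_dim),
    }

params = init_rnn_params(input_dim=8, hidden_dim=16, output_dim=2)
n_params = sum(p.size for p in params.values())
```

Note that the parameter count does not depend on `sequence_length`: the same weights are reused at every time step.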

---

#### Step 3 – Forward Propagation (Computation Phase)

For each time step $t$:

1. **Receive** input $x_t$.
2. **Combine** with previous hidden state $h_{t-1}$:
   $$
   h_t = f(W_x x_t + W_h h_{t-1} + b_h)
   $$
3. **Generate** output:
   $$
   y_t = g(W_y h_t + b_y)
   $$
4. **Store** $h_t$ → used by the next time step.

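For many-to-one tasks such as sentiment classification, only the final hidden state $h_T$ feeds the output and the loss. For many-to-many tasks, an output and a loss term are produced at every step.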
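The recurrence can be written as a short loop. The sketch below assumes $f = \tanh$ and $g$ = identity (raw scores); the function name `rnn_forward` and the random toy inputs are made up for illustration.

```python
import numpy as np

def rnn_forward(xs, h0, W_x, W_h, b_h, W_y, b_y):
    """Run a simple RNN over xs of shape (T, input_dim).

    Returns the hidden states, shape (T, hidden_dim), and the outputs,
    shape (T, output_dim)."""
    h = h0
    hs, ys = [], []
    for x_t in xs:
        h = np.tanh(W_x @ x_t + W_h @ h + b_h)  # combine input with previous state
        y = W_y @ h + b_y                       # per-step output (g = identity here)
        hs.append(h)                            # stored h is reused at the next step
        ys.append(y)
    return np.stack(hs), np.stack(ys)

# Toy dimensions and random inputs, purely for demonstration.
rng = np.random.default_rng(0)
T, input_dim, hidden_dim, output_dim = 4, 3, 5, 2
xs = rng.normal(size=(T, input_dim))
W_x = rng.normal(scale=0.1, size=(hidden_dim, input_dim))
W_h = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
b_h = np.zeros(hidden_dim)
W_y = rng.normal(scale=0.1, size=(output_dim, hidden_dim))
b_y = np.zeros(output_dim)

hs, ys = rnn_forward(xs, np.zeros(hidden_dim), W_x, W_h, b_h, W_y, b_y)
```

For a many-to-one task, only `ys[-1]` would be passed to the loss.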

---

#### Step 4 – Loss Computation

Compute the error between predictions and true labels:
$$
L = \sum_{t=1}^{T} \ell(y_t, \hat{y}_t)
$$
where $y_t$ is the model output from Step 3 and $\hat{y}_t$ is the true label at step $t$.
Common losses: cross-entropy (classification), MSE (regression).
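As a rough illustration, both losses can be summed over time steps in a few lines of NumPy. The function names are hypothetical, and the cross-entropy version assumes the model emits unnormalized scores (logits) that are passed through a softmax.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def sequence_cross_entropy(logits, targets):
    """Sum of per-step cross-entropy.

    logits: (T, num_classes) raw scores; targets: (T,) integer class ids."""
    probs = softmax(logits)
    steps = np.arange(logits.shape[0])
    return -np.log(probs[steps, targets]).sum()

def sequence_mse(preds, targets):
    """Sum of squared errors over all steps (regression)."""
    return ((np.asarray(preds) - np.asarray(targets)) ** 2).sum()

# With all-zero logits the softmax is uniform, so each step costs log(num_classes).
uniform_loss = sequence_cross_entropy(np.zeros((3, 4)), np.array([0, 1, 2]))
```

Many libraries average over steps or over the batch instead of summing; that only rescales the gradient.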

---

#### Step 5 – Backward Propagation Through Time (BPTT)

1. **Unroll** the RNN across all timesteps.
2. Apply the **chain rule** backward through each step:

   * Derivatives of loss w.r.t. weights $W_x, W_h, W_y$.
   * Accumulate gradients across timesteps.
3. **Handle gradient issues:**

   * Clip exploding gradients.
   * Use LSTM/GRU for vanishing gradients.
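The following is a minimal BPTT sketch for the simple RNN above, assuming $f = \tanh$, a linear output, and a squared-error loss at every step. The function names and the global-norm clipping rule are choices made for this example; frameworks compute these gradients automatically.

```python
import numpy as np

def forward(params, xs, targets):
    """Forward pass; returns the loss, hidden states (hs[0] is the initial state), outputs."""
    W_x, W_h, b_h, W_y, b_y = params
    h = np.zeros(W_h.shape[0])
    hs, ys, loss = [h], [], 0.0
    for x_t, target_t in zip(xs, targets):
        h = np.tanh(W_x @ x_t + W_h @ h + b_h)
        y = W_y @ h + b_y
        loss += 0.5 * np.sum((y - target_t) ** 2)
        hs.append(h)
        ys.append(y)
    return loss, hs, ys

def bptt(params, xs, targets):
    """Backpropagate through the unrolled network, accumulating gradients over time."""
    W_x, W_h, b_h, W_y, b_y = params
    loss, hs, ys = forward(params, xs, targets)
    grads = [np.zeros_like(p) for p in params]
    dW_x, dW_h, db_h, dW_y, db_y = grads
    dh_next = np.zeros_like(hs[0])           # gradient arriving from step t+1
    for t in reversed(range(len(xs))):
        dy = ys[t] - targets[t]              # dL/dy_t for squared error
        dW_y += np.outer(dy, hs[t + 1])
        db_y += dy
        dh = W_y.T @ dy + dh_next            # from the output and from the future
        da = dh * (1.0 - hs[t + 1] ** 2)     # through tanh
        dW_x += np.outer(da, xs[t])
        dW_h += np.outer(da, hs[t])          # hs[t] is the previous hidden state
        db_h += da
        dh_next = W_h.T @ da                 # pass gradient back to step t-1
    return loss, grads

def clip_by_global_norm(grads, max_norm):
    """Rescale all gradients together if their joint L2 norm exceeds max_norm."""
    total = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    scale = min(1.0, max_norm / (total + 1e-12))
    return [g * scale for g in grads]
```

Clipping bounds the size of an update when gradients explode; it does not restore gradients that have already vanished.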

---

#### Step 6 – Parameter Update

Use an optimizer (SGD, Adam, RMSprop) to update weights:
$$
W \leftarrow W - \eta \frac{\partial L}{\partial W}
$$
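A plain SGD step is a direct transcription of this rule. The sketch below uses a hypothetical helper `sgd_step` operating on a dict of arrays; Adam and RMSprop follow the same pattern but additionally keep running statistics of past gradients.

```python
import numpy as np

def sgd_step(params, grads, lr=0.01):
    """Return new parameters after one update: W <- W - lr * dL/dW."""
    return {name: params[name] - lr * grads[name] for name in params}

# Toy check on L(W) = 0.5 * ||W||^2, whose gradient is W itself.
params = {"W": np.array([1.0, -2.0])}
grads = {"W": params["W"].copy()}
updated = sgd_step(params, grads, lr=0.1)
```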

---

#### Step 7 – Iteration

Repeat Steps 3 → 6 for many epochs until convergence (loss stabilizes or accuracy saturates).
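One common way to decide that the loss has stabilized is early stopping on a validation loss. The loop below is only a sketch: `run_epoch` and `validation_loss` are hypothetical callbacks standing in for Steps 3 to 6 and for evaluation, and the `patience` / `min_delta` values are arbitrary.

```python
def train_until_converged(run_epoch, validation_loss, max_epochs=100,
                          patience=5, min_delta=1e-4):
    """Repeat training epochs until the validation loss stops improving."""
    best = float("inf")
    stale_epochs = 0
    history = []
    for _ in range(max_epochs):
        run_epoch()                      # forward, loss, BPTT, update
        loss = validation_loss()
        history.append(loss)
        if best - loss > min_delta:      # meaningful improvement
            best = loss
            stale_epochs = 0
        else:
            stale_epochs += 1
            if stale_epochs >= patience: # loss has stabilized
                break
    return history

# Stand-in "model": one weight fitted to the value 3.0 by gradient descent.
state = {"w": 0.0}

def run_epoch():
    grad = 2.0 * (state["w"] - 3.0)
    state["w"] -= 0.1 * grad

def validation_loss():
    return (state["w"] - 3.0) ** 2

history = train_until_converged(run_epoch, validation_loss)
```

With a real RNN, `run_epoch` would iterate over mini-batches of the training set and `validation_loss` would run a forward pass on held-out sequences.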

---

#### Step 8 – Evaluation & Inference

1. Evaluate on validation/test data (accuracy, F1-score, perplexity).
2. For inference, feed one input sequence and obtain predictions step-wise.
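The metrics named above are simple to compute by hand. The helpers below are an illustrative sketch for the binary case (the function names and toy labels are made up); perplexity is taken here as the exponential of the mean negative log-likelihood of the correct tokens.

```python
import math

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def perplexity(token_probs):
    """exp(mean negative log-likelihood) of the probabilities assigned to the true tokens."""
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)

y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 1]
acc = accuracy(y_true, y_pred)
f1 = f1_score(y_true, y_pred)
ppl = perplexity([0.25, 0.25, 0.25, 0.25])
```

A model that assigns probability 0.25 to every correct token is as uncertain as a uniform choice among four options, which is why its perplexity comes out at about 4.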

---

**Conceptual Flow Summary**

| Phase         | Function                       | Details                             |
| ------------- | ------------------------------ | ----------------------------------- |
| Data Prep     | Convert sequence → numeric     | Tokenize, embed                     |
| Forward Pass  | Compute predictions            | Hidden = f(Input + Previous Hidden) |
| Loss          | Compare output to label        | Cross-entropy, MSE                  |
| Backward Pass | Compute gradients through time | BPTT                                |
| Update        | Adjust parameters              | SGD / Adam                          |
| Evaluate      | Measure model performance      | Accuracy, loss                      |
| Deploy        | Predict on new sequences       | Stepwise inference                  |

---

**Challenges and Mitigations**

| Problem                   | Cause                                    | Solution                       |
| ------------------------- | ---------------------------------------- | ------------------------------ |
| Vanishing gradient        | Small derivatives through many timesteps | LSTM / GRU                     |
| Exploding gradient        | Large derivatives                        | Gradient clipping              |
| Long-term dependency      | Limited memory of simple RNN             | Use LSTM / GRU                 |
| Sequential training speed | Non-parallel nature                      | Truncated BPTT / transformers  |
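The first two rows can be illustrated numerically. Backpropagating through one step multiplies the gradient by the transpose of the recurrent Jacobian, so over many steps its norm shrinks or grows geometrically. The sketch below makes two simplifying assumptions: the activation is linear (so the Jacobian is exactly $W_h$), and $W_h$ is a scaled orthogonal matrix, so the gradient norm after `steps` steps is `scale ** steps`. With $\tanh$, whose derivative is at most 1, the vanishing case only gets worse.

```python
import numpy as np

def gradient_norm_through_time(scale, steps=50, hidden_dim=8, seed=0):
    """Spectral norm of d h_T / d h_0 for a linear RNN with W_h = scale * Q, Q orthogonal."""
    rng = np.random.default_rng(seed)
    Q, _ = np.linalg.qr(rng.normal(size=(hidden_dim, hidden_dim)))
    W_h = scale * Q
    J = np.eye(hidden_dim)
    for _ in range(steps):
        J = W_h.T @ J          # one step of backpropagation through time
    return np.linalg.norm(J, 2)

vanishing = gradient_norm_through_time(scale=0.5)   # about 0.5 ** 50
exploding = gradient_norm_through_time(scale=1.5)   # about 1.5 ** 50
```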

---

**Compact Mathematical Recap**

$$
\begin{align*}
h_t &= f(W_x x_t + W_h h_{t-1} + b_h) \\
y_t &= g(W_y h_t + b_y) \\
L &= \sum_t \ell(y_t, \hat{y}_t)
\end{align*}
$$

Gradients:
$$
\frac{\partial L}{\partial W_x},\;
\frac{\partial L}{\partial W_h},\;
\frac{\partial L}{\partial W_y}
\Rightarrow \text{updated via gradient descent.}
$$

---

**In short**

RNN workflow =
**Data prep → Sequence input → Forward pass → Compute loss → Backprop through time → Update weights → Evaluate.**

It is the same loop as a feedforward network, but **extended through time** using a **shared hidden state** that carries past context.