[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/shang-vikas/series1-coding-exercises/blob/main/exercises/blog-07/exercise-00.ipynb)

# 🧪 Exercise 1: Sequential Bottleneck vs. Parallel Access

## Goal

Make the RNN's step-by-step time dependency visible by timing it against self-attention on the same input.

## 🔹 What We Demonstrate

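- An RNN processes tokens one by one: each hidden state depends on the previous one.
- Self-attention processes all tokens of a sequence together, as a few large matrix multiplications.
- The two patterns use hardware differently: a sequential loop cannot spread its time steps across parallel cores, while large matrix multiplications can.
- The architecture dictates the compute pattern.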
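The two compute patterns in the list above can be written out by hand. This is a minimal sketch, not the timed exercise itself: the sizes are small illustrative values, the loop reuses the weights of an `nn.RNN` layer only to check that it reproduces the layer's output, and the attention part is the bare scaled dot-product core without the learned projections that `nn.MultiheadAttention` adds.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
batch_size, seq_len, dim = 4, 10, 8  # small illustrative sizes (assumption)
x = torch.randn(batch_size, seq_len, dim)

# --- RNN: an explicit loop over time steps ---
rnn = nn.RNN(dim, dim, batch_first=True)
w_ih, w_hh = rnn.weight_ih_l0, rnn.weight_hh_l0
b_ih, b_hh = rnn.bias_ih_l0, rnn.bias_hh_l0

with torch.no_grad():
    h = torch.zeros(batch_size, dim)
    steps = []
    for t in range(seq_len):
        # h_t needs h_{t-1}, so step t cannot start before step t-1 has finished.
        h = torch.tanh(x[:, t] @ w_ih.T + b_ih + h @ w_hh.T + b_hh)
        steps.append(h)
    manual_out = torch.stack(steps, dim=1)
    reference_out, _ = rnn(x)

print("Loop matches nn.RNN:", torch.allclose(manual_out, reference_out, atol=1e-5))

# --- Attention core: no loop over time ---
with torch.no_grad():
    scores = x @ x.transpose(1, 2) / dim ** 0.5  # (batch, seq_len, seq_len)
    weights = torch.softmax(scores, dim=-1)      # each row sums to 1
    attn_out = weights @ x                       # (batch, seq_len, dim)

print("Attention output shape:", tuple(attn_out.shape))
```

The RNN half contains a Python `for` over `t` that cannot be removed without changing the model. The attention half has no such loop: every position's output comes out of the same two matrix multiplications.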

## 🧠 Code

In [1]:
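import time

import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"

batch_size = 32
seq_len = 200
dim = 128

x = torch.randn(batch_size, seq_len, dim, device=device)


def sync():
    # CUDA kernels launch asynchronously, so wait for pending GPU work;
    # otherwise the timer measures only the kernel launch. No-op on CPU.
    if device == "cuda":
        torch.cuda.synchronize()


# Each timing covers a single first call, so it includes one-time setup cost.

# --- RNN ---
rnn = nn.RNN(dim, dim, batch_first=True).to(device)

sync()
start = time.perf_counter()
_ = rnn(x)
sync()
print("RNN forward time:", time.perf_counter() - start)

# --- Self-Attention ---
attn = nn.MultiheadAttention(embed_dim=dim, num_heads=1, batch_first=True).to(device)

sync()
start = time.perf_counter()
_ = attn(x, x, x)
sync()
print("Attention forward time:", time.perf_counter() - start)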

RNN forward time: 0.18280863761901855
Attention forward time: 0.10730290412902832


## 🔎 What to Observe

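- The RNN must walk through the sequence step by step, so its runtime grows with sequence length even on parallel hardware.
- Attention processes the whole sequence at once.
- On a GPU, attention is often faster for moderate sequence lengths. For very long sequences, its cost grows quadratically with sequence length and can outweigh the benefit of parallelism.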
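A single timed call is noisy, so the observations above are easier to check with a small sweep over sequence lengths. The sketch below is one possible way to do that, under assumptions that are not part of the exercise: the helper name `mean_forward_time`, the warm-up and repeat counts, and the reduced `batch_size` and `dim` (chosen so the sweep also finishes quickly on CPU) are all illustrative.

```python
import time

import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"
batch_size, dim = 8, 64  # smaller than the exercise values (assumption)


def mean_forward_time(fn, warmup=2, repeats=5):
    """Average wall-clock seconds per call of fn, measured after warm-up calls."""
    with torch.no_grad():
        for _ in range(warmup):
            fn()  # excludes one-time setup cost from the measurement
        if device == "cuda":
            torch.cuda.synchronize()  # finish warm-up work before starting the clock
        start = time.perf_counter()
        for _ in range(repeats):
            fn()
        if device == "cuda":
            torch.cuda.synchronize()  # wait for queued GPU kernels to complete
    return (time.perf_counter() - start) / repeats


rnn = nn.RNN(dim, dim, batch_first=True).to(device).eval()
attn = nn.MultiheadAttention(dim, num_heads=1, batch_first=True).to(device).eval()

results = {}
for seq_len in (50, 100, 200, 400):
    x = torch.randn(batch_size, seq_len, dim, device=device)
    rnn_s = mean_forward_time(lambda: rnn(x))
    attn_s = mean_forward_time(lambda: attn(x, x, x))
    results[seq_len] = (rnn_s, attn_s)
    print(f"seq_len={seq_len:4d}  RNN: {rnn_s * 1e3:8.3f} ms  attention: {attn_s * 1e3:8.3f} ms")
```

The absolute numbers depend heavily on the device and library versions, so compare how each column grows with `seq_len` rather than reading any single value as a general result.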

## 💡 Conceptual Link

**RNN:**

```
t1 → t2 → t3 → t4 → ...
```

**Transformer:**

```
t1
t2
t3   ← all computed together
t4
```
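One way to see what "all computed together" permits: an attention output at position `t` does not depend on the outputs at other positions, so it can be computed alone and still match the corresponding row of the full result. The snippet below is a small sketch of that check using `nn.MultiheadAttention`; the sizes and the choice `t = 3` are arbitrary illustrative values.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
batch_size, seq_len, dim = 2, 6, 16  # small illustrative sizes (assumption)
x = torch.randn(batch_size, seq_len, dim)

attn = nn.MultiheadAttention(dim, num_heads=1, batch_first=True)

with torch.no_grad():
    # All positions at once: one query per token.
    full_out, _ = attn(x, x, x)

    # Position t alone: a single query, with keys and values from the full sequence.
    t = 3
    query_t = x[:, t:t + 1]
    single_out, _ = attn(query_t, x, x)

matches = torch.allclose(full_out[:, t:t + 1], single_out, atol=1e-5)
print("Position computed alone matches the full result:", matches)
```

An RNN has no equivalent shortcut: its hidden state at step `t` is defined in terms of the state at step `t-1`, so every earlier step has to run first.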

This illustrates:

**Parallelism wasn't an optimization; it was architectural freedom.**