# Module 4 — Program A: Basic Sequence Modelling with RNN (Character-level)

**Covers:** Sequence modelling, SimpleRNN, backpropagation through time, sequence generation, vanishing gradients (short vs long sequences).

**Question (Apply – L3):** Implement a character-level RNN to learn next-character prediction on a small corpus and generate text.

**Question (Analyze – L4):** Analyze how sequence length affects prediction quality and discuss vanishing gradients observed in training.


In [None]:
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
tf.random.set_seed(42)
np.random.seed(42)

# Toy corpus (public-domain snippets). Feel free to replace it with your own text.
corpus = (
    "deep learning enables computers to learn from data. "
    "recurrent neural networks process sequences step by step. "
    "lstm and gru mitigate vanishing gradients in long sequences. "
    "attention and transformers capture long range dependencies."
)
chars = sorted(set(corpus))                    # unique characters, in a stable order
stoi = {c: i for i, c in enumerate(chars)}     # character -> integer index
itos = {i: c for c, i in stoi.items()}         # integer index -> character
vocab_size = len(chars)
vocab_size

In [None]:
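# Build (X, y) pairs for next-character prediction
def make_xy(text, seq_len=40, step=3):
    """Slide a window of seq_len characters over text; the target is the character that follows."""
    X, y = [], []
    # The last valid start index is len(text) - seq_len - 1,
    # so the range must stop at len(text) - seq_len to include it.
    for i in range(0, len(text) - seq_len, step):
        seq = text[i:i + seq_len]
        nxt = text[i + seq_len]
        X.append([stoi[c] for c in seq])
        y.append(stoi[nxt])
    X = np.array(X, dtype=np.int32)
    y = np.array(y, dtype=np.int32)
    return X, y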

SEQ_LEN = 40
X, y = make_xy(corpus, seq_len=SEQ_LEN, step=1)
X.shape, y.shape
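
To make the windowing concrete, here is a minimal sketch on a toy string, written in the spirit of `make_xy`. The helper name `toy_windows` is hypothetical, and it returns raw characters rather than the integer indices that `make_xy` produces.

```python
def toy_windows(text, seq_len, step=1):
    """Return (input window, next character) pairs as plain strings."""
    pairs = []
    # The last valid start index is len(text) - seq_len - 1,
    # so that text[i + seq_len] still exists.
    for i in range(0, len(text) - seq_len, step):
        pairs.append((text[i:i + seq_len], text[i + seq_len]))
    return pairs

for window, target in toy_windows("hello world", seq_len=4, step=3):
    print(repr(window), "->", repr(target))
```

A larger `step` yields fewer, less overlapping windows; `step=1` yields one window per character position.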

In [None]:
# Map integer character indices to dense vectors with an Embedding layer, then feed them to a SimpleRNN.
# (One-hot inputs would also work; Embedding avoids building the one-hot tensor explicitly.)
from tensorflow.keras import layers, models

embed_dim = 32
rnn_units = 64

inputs = layers.Input(shape=(SEQ_LEN,), dtype='int32')
x = layers.Embedding(input_dim=vocab_size, output_dim=embed_dim)(inputs)
x = layers.SimpleRNN(rnn_units, return_sequences=False)(x)
outputs = layers.Dense(vocab_size, activation='softmax')(x)
model = models.Model(inputs, outputs)
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.summary()
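
For intuition about what the `SimpleRNN` layer computes at each step, the recurrence can be sketched in NumPy as `h_t = tanh(x_t W + h_{t-1} U + b)`, which mirrors the layer's default tanh activation. The function name `simple_rnn_forward`, the weight names, the toy sizes, and the random weights are illustrative assumptions, not values taken from the model above.

```python
import numpy as np

def simple_rnn_forward(x_seq, W, U, b):
    """Run a tanh RNN over x_seq of shape (T, input_dim); return every hidden state."""
    h = np.zeros(U.shape[0])
    states = []
    for x_t in x_seq:
        # The new state mixes the current input with the previous state.
        h = np.tanh(x_t @ W + h @ U + b)
        states.append(h)
    return np.stack(states)

rng = np.random.default_rng(0)
T, input_dim, units = 5, 3, 4          # toy sizes, much smaller than the model above
x_seq = rng.normal(size=(T, input_dim))
W = rng.normal(scale=0.5, size=(input_dim, units))
U = rng.normal(scale=0.5, size=(units, units))
b = np.zeros(units)

states = simple_rnn_forward(x_seq, W, U, b)
print(states.shape)      # one hidden state per time step
print(states[-1])        # the state that return_sequences=False would pass on
```

Because the same `U` is applied at every step, backpropagation through time multiplies by it repeatedly, which is where vanishing gradients come from.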

In [None]:
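# Random 85/15 train/validation split over windows.
# Note: with step=1 the windows overlap, so validation windows share most of their
# characters with training windows and validation metrics can look optimistic.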
n = len(X)
idx = np.arange(n)
np.random.shuffle(idx)
tr = int(0.85*n)
X_train, y_train = X[idx[:tr]], y[idx[:tr]]
X_val, y_val = X[idx[tr:]], y[idx[tr:]]

hist = model.fit(X_train, y_train, validation_data=(X_val, y_val),
                 epochs=25, batch_size=64, verbose=0)

fig, ax = plt.subplots()
ax.plot(hist.history['loss'], label='train')
ax.plot(hist.history['val_loss'], label='val')
ax.set_title('Training vs Validation Loss')
ax.set_xlabel('Epoch')
ax.set_ylabel('Loss')
ax.legend()
plt.show()

print('Final val accuracy:', round(hist.history['val_accuracy'][-1], 4))

In [None]:
# Text generation helper
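def sample(probs, temperature=1.0):
    """Draw one index from probs after temperature scaling."""
    probs = np.asarray(probs, dtype='float64')
    if temperature <= 0:
        return int(np.argmax(probs))  # guard: treat temperature <= 0 as greedy decoding
    logits = np.log(probs + 1e-9) / temperature
    probs = np.exp(logits - np.max(logits))
    # Always renormalize: float32 softmax output may not sum to exactly 1,
    # which np.random.choice can reject.
    probs = probs / np.sum(probs)
    return int(np.random.choice(len(probs), p=probs))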

def generate_text(prefix, length=200, temperature=0.8):
    s = prefix
    for _ in range(length):
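        # Keep the last SEQ_LEN characters. Left-pad short seeds with spaces so the
        # most recent characters sit at the end of the window, as they do in training.
        seed = s[-SEQ_LEN:].rjust(SEQ_LEN)
        # Characters outside the vocabulary fall back to index 0 (the space in this corpus).
        seed_idx = np.array([[stoi.get(c, 0) for c in seed]], dtype=np.int32)
        p = model.predict(seed_idx, verbose=0)[0]
        idx = sample(p, temperature)
        s += itos[int(idx)]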
    return s

print(generate_text('deep learning ', length=200, temperature=0.7))
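
The effect of `temperature` is easiest to see on a small hand-made distribution. This sketch uses a hypothetical helper `rescale` that applies the log, divide, exponentiate, and renormalize steps of temperature scaling without drawing a sample; the three-element distribution is made up for illustration.

```python
import numpy as np

def rescale(probs, temperature):
    """Return probs reshaped by temperature (lower = sharper, higher = flatter)."""
    logits = np.log(np.asarray(probs, dtype="float64") + 1e-9) / temperature
    scaled = np.exp(logits - logits.max())   # subtract the max for numerical stability
    return scaled / scaled.sum()

toy_probs = [0.6, 0.3, 0.1]                  # made-up distribution over 3 characters
for t in (0.5, 1.0, 2.0):
    print(t, np.round(rescale(toy_probs, t), 3))
```

Temperatures below 1 concentrate probability on the most likely character, which tends to give more repetitive text; temperatures above 1 spread it out, which tends to give more varied but noisier text.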

### Analyze (L4)
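- Re-run the notebook with **SEQ_LEN = 20** and **SEQ_LEN = 60**, and compare the final training and validation loss, the validation accuracy, and the coherence of the generated text.
- Check whether the longer sequence makes optimization harder (hint: in a SimpleRNN, gradients shrink as they are propagated back through more time steps), and justify your findings with the loss curves.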
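
One way to build intuition for the hint, independent of the Keras model, is to track how the Jacobian of the final hidden state with respect to the initial one shrinks as the sequence grows. The sketch below is a toy NumPy experiment under stated assumptions: random inputs, a random recurrent matrix rescaled to spectral norm 0.9, and a hypothetical helper `gradient_norm`. It does not measure gradients in the trained model.

```python
import numpy as np

def gradient_norm(T, units=16, seed=0):
    """Spectral norm of d h_T / d h_0 for a random tanh RNN run for T steps."""
    rng = np.random.default_rng(seed)
    U = rng.normal(size=(units, units))
    U *= 0.9 / np.linalg.norm(U, 2)        # assumption: recurrent weights with spectral norm 0.9
    h = rng.normal(size=units)             # random initial state h_0
    jac = np.eye(units)                    # accumulates d h_t / d h_0
    for _ in range(T):
        x = rng.normal(size=units)         # stand-in for the input contribution at this step
        h = np.tanh(U @ h + x)
        # Chain rule: d h_t / d h_{t-1} = diag(1 - h_t^2) @ U
        jac = (1.0 - h ** 2)[:, None] * U @ jac
    return np.linalg.norm(jac, 2)

for T in (20, 40, 60):
    print(T, gradient_norm(T))
```

Under these assumptions each step's Jacobian has norm at most 0.9, so the product is bounded by 0.9^T and decays geometrically. A trained network's weights need not satisfy this bound, so treat the numbers as intuition rather than as a measurement of the model above.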
