# Day 27: LSTM for Text Sequence Prediction


Today you’ll learn:
1. How text is converted into numbers for neural networks
2. What a character-level text model is
3. How LSTM processes text one character at a time
4. How to build an LSTM model in PyTorch
5. How sequence → memory → next-character prediction works
6. Why an LSTM usually handles text better than a vanilla RNN

By the end of this notebook, you will understand next-character prediction, the basic idea that neural language models start from.


---

# Import Libraries

```python
import torch
import torch.nn as nn
import numpy as np
```

# Problem Setup

We are training a next-character prediction model.

Rule:
> Given characters up to time t, predict the character at time t+1.

Example:

- Input text: "hello"
- Training input: "hell"
- Training target: "ello"

Meaning:
- Given characters so far → predict the next character
- This is the foundation of text generation
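The shift-by-one rule can be sketched in plain Python. The helper name `next_char_pairs` is hypothetical; it lists, for every position, the characters seen so far and the character the model should predict next.

```python
def next_char_pairs(text):
    """Return (context, next_char) pairs for next-character prediction."""
    pairs = []
    for t in range(len(text) - 1):
        context = text[: t + 1]   # characters up to and including time t
        target = text[t + 1]      # character at time t+1
        pairs.append((context, target))
    return pairs


for context, target in next_char_pairs("hello"):
    print(f"{context!r:8} -> {target!r}")
```

The last characters of the contexts spell "hell" and the targets spell "ello", which is exactly the training input/target pair above.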


# Prepare Text Data

```python
text = "hello"

# Unique characters form the vocabulary.
# Note: set() order can differ between Python runs; sorted(set(text))
# gives a reproducible mapping. The output below came from set() order.
chars = list(set(text))
vocab_size = len(chars)

# Character ↔ index mappings
char_to_idx = {ch: i for i, ch in enumerate(chars)}
idx_to_char = {i: ch for ch, i in char_to_idx.items()}

print("Number of unique characters in our dataset:", vocab_size)
print("Index mapping:", char_to_idx)
```

```
Number of unique characters in our dataset: 4
Index mapping: {'l': 0, 'h': 1, 'e': 2, 'o': 3}
```


Explanation:

- Each character becomes a class
- This is a classification problem, not regression
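Because `set()` ordering is not guaranteed to be the same across Python runs, a sketch that sorts the characters first gives a reproducible vocabulary. The function name `build_vocab` is hypothetical; note that with sorting, the indices differ from the mapping printed above.

```python
def build_vocab(text):
    """Map each unique character to a class index in a reproducible order."""
    chars = sorted(set(text))   # sorted → same order on every run
    char_to_idx = {ch: i for i, ch in enumerate(chars)}
    idx_to_char = {i: ch for ch, i in char_to_idx.items()}
    return char_to_idx, idx_to_char


char_to_idx, idx_to_char = build_vocab("hello")
print(char_to_idx)   # {'e': 0, 'h': 1, 'l': 2, 'o': 3}

# Encoding and decoding are inverses of each other
encoded = [char_to_idx[c] for c in "hello"]
decoded = "".join(idx_to_char[i] for i in encoded)
print(encoded, decoded)
```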

# Encode Input & Target Sequences

```python
# Input:  h e l l
# Target: e l l o
input_seq = torch.tensor([[char_to_idx[c] for c in text[:-1]]])   # text[:-1] → all except last
target_seq = torch.tensor([[char_to_idx[c] for c in text[1:]]])   # text[1:] → all except first

print("Input sequence:", input_seq)
print("Target sequence:", target_seq)
```

```
Input sequence: tensor([[1, 2, 0, 0]])
Target sequence: tensor([[2, 0, 0, 3]])
```


```python
print("Input Shape: ", input_seq.size())
print("Target Shape: ", target_seq.size())
```

```
Input Shape:  torch.Size([1, 4])
Target Shape:  torch.Size([1, 4])
```


Shape meaning:

- Batch size = 1
- Sequence length = 4

# One-Hot Encoding

```python
# One-hot tensor of shape (batch_size, sequence_length, vocab_size) = (1, 4, 4)
input_onehot = torch.zeros(1, input_seq.size(1), vocab_size)

# At each time step, put a single 1 at the index of that character
for t in range(input_seq.size(1)):
    input_onehot[0, t, input_seq[0, t]] = 1

input_onehot.size()
```

```
torch.Size([1, 4, 4])
```

- 1 → batch size (number of sequences fed at once)
- 4 → sequence length (number of time steps in the sequence)
- 4 → vocab_size (dimension of each input vector at a time step)

  - Time step 0 ('h', index 1) → `[0,1,0,0]`
  - Time step 1 ('e', index 2) → `[0,0,1,0]`
  - Time step 2 ('l', index 0) → `[1,0,0,0]`
  - Time step 3 ('l', index 0) → `[1,0,0,0]`

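Why one-hot?

- Characters are categorical
- Feeding raw indices would imply an ordering that does not exist (e.g. 'o' = 3 is not "greater than" 'h' = 1); one-hot vectors make every character equally distant from every other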
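PyTorch can build the same one-hot tensor without a Python loop via `torch.nn.functional.one_hot`. The sketch below recreates the example indices from above so it runs on its own and checks that the result matches the loop version; `one_hot` returns integers, so the result is cast to float for the LSTM.

```python
import torch
import torch.nn.functional as F

vocab_size = 4
input_seq = torch.tensor([[1, 2, 0, 0]])   # 'h', 'e', 'l', 'l' with the mapping above

# Loop version, as in the notebook
loop_onehot = torch.zeros(1, input_seq.size(1), vocab_size)
for t in range(input_seq.size(1)):
    loop_onehot[0, t, input_seq[0, t]] = 1

# Vectorized version: (1, 4) int64 indices -> (1, 4, 4) one-hot, cast to float
vec_onehot = F.one_hot(input_seq, num_classes=vocab_size).float()

print(vec_onehot.shape)                      # torch.Size([1, 4, 4])
print(torch.equal(loop_onehot, vec_onehot))  # True
```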

# Define LSTM Model

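```python
class CharLSTM(nn.Module):
    def __init__(self, vocab_size, hidden_size):
        super().__init__()
        # LSTM reads one one-hot vector of size vocab_size per time step
        self.lstm = nn.LSTM(
            input_size=vocab_size,
            hidden_size=hidden_size,
            batch_first=True  # tensors are (batch, seq_len, features)
        )
        # Linear layer maps each hidden state to one score (logit) per character
        self.fc = nn.Linear(hidden_size, vocab_size)

    def forward(self, x):
        # out: hidden state at every time step, shape (batch, seq_len, hidden_size)
        # (h_n, c_n): final hidden and cell states (not used here)
        out, (h_n, c_n) = self.lstm(x)
        out = self.fc(out)  # (batch, seq_len, vocab_size)
        return out
```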


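Key idea:

- `nn.LSTM` returns its hidden state at every time step (`out`), not only at the end
- The linear layer turns each of those hidden states into a prediction for the next character, so we get one prediction per step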
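A quick way to see what `nn.LSTM` returns is to print the shapes for the notebook's settings (batch 1, 4 time steps, 4 input features, hidden size 16). For a single-layer, unidirectional LSTM, the last time step of `out` is the same vector as the final hidden state `h_n`. The random input here is only a stand-in for the one-hot tensor.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
lstm = nn.LSTM(input_size=4, hidden_size=16, batch_first=True)

x = torch.randn(1, 4, 4)   # (batch, seq_len, input_size), stand-in input
out, (h_n, c_n) = lstm(x)

print(out.shape)   # torch.Size([1, 4, 16]): one hidden state per time step
print(h_n.shape)   # torch.Size([1, 1, 16]): (num_layers, batch, hidden), final hidden state
print(c_n.shape)   # torch.Size([1, 1, 16]): final cell state

# The last output step and the final hidden state are the same vector
print(torch.allclose(out[:, -1, :], h_n[0]))   # True
```

Note that `batch_first=True` only affects `x` and `out`; `h_n` and `c_n` always put the layer dimension first.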

# Initialize Model, Loss, Optimizer

```python
model = CharLSTM(vocab_size=vocab_size, hidden_size=16)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
```
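To get a feel for the model size, the sketch below counts trainable parameters. It redefines `CharLSTM` in a compact form so it runs standalone. With 4 input features and 16 hidden units, the LSTM has 4 gates × (16·4 + 16·16 + 16 + 16) = 1408 weights and biases (PyTorch keeps two bias vectors), and the linear layer adds 16·4 + 4 = 68. The `torch.manual_seed` call is an addition: it fixes the random initial weights, which the notebook does not do, so its printed losses will vary slightly between runs.

```python
import torch
import torch.nn as nn


class CharLSTM(nn.Module):
    def __init__(self, vocab_size, hidden_size):
        super().__init__()
        self.lstm = nn.LSTM(input_size=vocab_size, hidden_size=hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, vocab_size)

    def forward(self, x):
        out, _ = self.lstm(x)
        return self.fc(out)


torch.manual_seed(0)   # repeatable initial weights
model = CharLSTM(vocab_size=4, hidden_size=16)

for name, p in model.named_parameters():
    print(f"{name:20} {tuple(p.shape)}")

total = sum(p.numel() for p in model.parameters() if p.requires_grad)
print("Trainable parameters:", total)   # 1476
```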


# Forward Pass

```python
outputs = model(input_onehot)

outputs.shape
```

```
torch.Size([1, 4, 4])
```

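Output shape: `(batch_size, sequence_length, vocab_size)`

At each time step the model outputs `vocab_size` raw scores (logits), one per character. They are not probabilities yet: a softmax over the last dimension turns them into a probability distribution over characters, and `nn.CrossEntropyLoss` applies that normalization internally.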
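To turn the logits at each step into probabilities, apply softmax over the last (vocabulary) dimension. Softmax preserves the ordering of scores within each time step, so it never changes which character scores highest; that is why the prediction step can take `argmax` of the raw outputs directly. The logits below are random stand-ins with the same shape as `outputs`.

```python
import torch

torch.manual_seed(0)
logits = torch.randn(1, 4, 4)            # stand-in for outputs: (batch, seq_len, vocab_size)

probs = torch.softmax(logits, dim=-1)    # normalize over the vocabulary dimension

print(probs.sum(dim=-1))                 # each time step sums to 1
print(torch.equal(probs.argmax(dim=-1), logits.argmax(dim=-1)))   # True: same winner
```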

# Compute Loss

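```python
# CrossEntropyLoss expects logits of shape (N, C) and targets of shape (N,)
loss = criterion(
    outputs.view(-1, vocab_size),   # (1, 4, 4) → (4, 4)
    target_seq.view(-1)             # (1, 4)    → (4,)
)

loss.item()
```

```
1.4816588163375854
```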

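Explanation:

- Flatten the batch and time dimensions so every time step becomes one classification example (here 1 × 4 = 4 examples)
- Compare the predicted scores with the true next character at each position; the loss is the average over the 4 positions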
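Cross-entropy at one position is the negative log of the softmax probability assigned to the correct character; `nn.CrossEntropyLoss` averages this over all positions. The sketch computes it by hand on random stand-in logits and checks it against `F.cross_entropy`. It also gives a baseline: a model that spreads probability evenly over the 4 characters scores ln 4 ≈ 1.386, so the initial loss of about 1.48 printed above is close to an untrained guess.

```python
import math
import torch
import torch.nn.functional as F

torch.manual_seed(0)
vocab_size = 4
logits = torch.randn(4, vocab_size)      # (N, C): 4 flattened time steps, stand-in values
targets = torch.tensor([2, 0, 0, 3])     # 'e', 'l', 'l', 'o' with the mapping above

# By hand: -log p(correct char) at each position, then the mean
log_probs = torch.log_softmax(logits, dim=1)
manual = -log_probs[torch.arange(len(targets)), targets].mean()

builtin = F.cross_entropy(logits, targets)
print(manual.item(), builtin.item())     # the two values agree

# Baseline: equal scores for every class give a loss of ln(vocab_size)
uniform = F.cross_entropy(torch.zeros(4, vocab_size), targets)
print(uniform.item(), math.log(vocab_size))
```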

# Training Loop

```python
for epoch in range(300):
    optimizer.zero_grad()
    outputs = model(input_onehot)

    loss = criterion(
        outputs.view(-1, vocab_size),
        target_seq.view(-1)
    )

    loss.backward()
    optimizer.step()

    if epoch % 50 == 0:
        print(f"Epoch {epoch}, Loss: {loss.item():.4f}")
```

```
Epoch 0, Loss: 1.4817
Epoch 50, Loss: 0.0770
Epoch 100, Loss: 0.0085
Epoch 150, Loss: 0.0040
Epoch 200, Loss: 0.0025
Epoch 250, Loss: 0.0018
```


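This demonstrates:

- Backpropagation Through Time (BPTT): `loss.backward()` sends gradients back through all 4 time steps of the unrolled LSTM
- The LSTM learning the character transitions: the loss drops from about 1.48 to below 0.002 within 300 epochs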
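One way to see Backpropagation Through Time at work is to compute a loss on the last time step only and check that a gradient still reaches the first input. The sketch uses a standalone `nn.LSTM` and `nn.Linear` with random weights and the one-hot input for "hell"; the specific gradient values are arbitrary, only the fact that they are nonzero at every step matters.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
lstm = nn.LSTM(input_size=4, hidden_size=16, batch_first=True)
fc = nn.Linear(16, 4)

# One-hot input for 'h', 'e', 'l', 'l' (indices 1, 2, 0, 0 with the mapping above)
x = torch.zeros(1, 4, 4)
x[0, torch.arange(4), torch.tensor([1, 2, 0, 0])] = 1.0
x.requires_grad_(True)

out, _ = lstm(x)
logits_last = fc(out[:, -1, :])                                      # prediction at the last step only
loss = nn.functional.cross_entropy(logits_last, torch.tensor([3]))   # target 'o'
loss.backward()

# Gradient size w.r.t. each time step's input: even t=0 receives a signal,
# because backward() flows back through every earlier step of the LSTM
grad_per_step = x.grad[0].norm(dim=1)
print(grad_per_step)
```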

# Text Prediction

```python
with torch.no_grad():
    outputs = model(input_onehot)
    predicted_indices = torch.argmax(outputs, dim=2)   # highest-scoring character per step

predicted_text = "".join(idx_to_char[i.item()] for i in predicted_indices[0])
predicted_text
```

```
'ello'
```

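Expected behavior:

- After training, the predictions should match the target "ello"
- The character 'l' is the input at both t=2 and t=3, yet it must be followed by 'l' the first time and by 'o' the second. The current character alone cannot tell these cases apart, so the model has to rely on its memory of earlier characters, carried in the hidden and cell states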
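The trained model can also generate text by feeding its own prediction back in, one character at a time, while carrying the hidden and cell state `(h, c)` forward; this is the sequence → memory → next-character loop in its simplest form. The sketch is self-contained: it retrains a variant of `CharLSTM` on "hello" with a fixed seed and a sorted vocabulary, then greedily generates from the seed character 'h'. Unlike the notebook's version, its `forward` also accepts and returns the LSTM state. The function `generate` is hypothetical, and greedy `argmax` decoding is only one choice (sampling from the softmax is another).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
text = "hello"
chars = sorted(set(text))
char_to_idx = {ch: i for i, ch in enumerate(chars)}
idx_to_char = {i: ch for ch, i in char_to_idx.items()}
V = len(chars)


class CharLSTM(nn.Module):
    def __init__(self, vocab_size, hidden_size):
        super().__init__()
        self.lstm = nn.LSTM(vocab_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, vocab_size)

    def forward(self, x, state=None):
        out, state = self.lstm(x, state)   # state = (h, c); None means zeros
        return self.fc(out), state


# Train on "hell" -> "ello", as in the notebook
x = F.one_hot(torch.tensor([[char_to_idx[c] for c in text[:-1]]]), V).float()
y = torch.tensor([char_to_idx[c] for c in text[1:]])
model = CharLSTM(V, 16)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
for _ in range(300):
    optimizer.zero_grad()
    logits, _ = model(x)
    loss = F.cross_entropy(logits.reshape(-1, V), y)
    loss.backward()
    optimizer.step()


def generate(model, start, length):
    """Greedy generation: feed one character, keep (h, c), predict the next."""
    result, current, state = start, start, None
    with torch.no_grad():
        for _ in range(length):
            step = F.one_hot(torch.tensor([[char_to_idx[current]]]), V).float()  # (1, 1, V)
            logits, state = model(step, state)   # memory is carried in `state`
            current = idx_to_char[logits[0, -1].argmax().item()]
            result += current
    return result


print(generate(model, "h", 4))
```

Feeding one character at a time with the carried state gives the same outputs as running the whole prefix at once, so once training has converged the generated text should reproduce "hello".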

# Why LSTM Works Here

- Text has long-term dependencies
- Vanilla RNN forgets early characters
- LSTM cell state preserves memory
- Gates control information flow
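The gate equations can be written out for a single time step and checked against PyTorch's `nn.LSTMCell`. PyTorch stacks the four gates in `weight_ih`/`weight_hh` in the order input, forget, cell candidate, output, so `chunk(4)` splits them in that order. This is a sketch for understanding only; in practice `nn.LSTM` runs these steps for you. The previous states `h` and `c` are random stand-ins.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
input_size, hidden_size = 4, 16
cell = nn.LSTMCell(input_size, hidden_size)

x = torch.zeros(1, input_size)
x[0, 1] = 1.0                          # one-hot for index 1 ('h' in the mapping above)
h = torch.randn(1, hidden_size)        # previous hidden state (stand-in values)
c = torch.randn(1, hidden_size)        # previous cell state (stand-in values)


def lstm_step(x, h, c, cell):
    """One LSTM step written out gate by gate, using the cell's own weights."""
    gates = x @ cell.weight_ih.T + cell.bias_ih + h @ cell.weight_hh.T + cell.bias_hh
    i, f, g, o = gates.chunk(4, dim=1)  # PyTorch order: input, forget, cell, output
    i = torch.sigmoid(i)                # input gate: how much new information to write
    f = torch.sigmoid(f)                # forget gate: how much old memory to keep
    g = torch.tanh(g)                   # candidate values to write
    o = torch.sigmoid(o)                # output gate: how much memory to expose
    c_next = f * c + i * g              # additive update of the cell state
    h_next = o * torch.tanh(c_next)
    return h_next, c_next


with torch.no_grad():
    h_manual, c_manual = lstm_step(x, h, c, cell)
    h_ref, c_ref = cell(x, (h, c))

print(torch.allclose(h_manual, h_ref, atol=1e-6), torch.allclose(c_manual, c_ref, atol=1e-6))
```

The line `c_next = f * c + i * g` is the key design choice: when the forget gate is near 1, old memory passes through almost unchanged instead of being squashed by repeated matrix multiplications.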


# Key Takeaways from Day 27

- Text modeling is a sequence prediction problem
- Characters are treated as classes
- LSTM processes text step by step
- Outputs predict the next character
- This is the foundation of text generation

---

<p style="text-align:center; font-size:18px;">
© 2026 Mostafizur Rahman
</p>
