
# LSTM Language Model (Character-Level)

**Task:** Language Modeling using LSTM  
**Dataset:** Small Shakespeare excerpt (domain-specific)  
**Framework:** PyTorch  

This notebook demonstrates how to build, train, and evaluate a character-level LSTM language model.
We focus on:
- Clean implementation
- Clear explanation
- Hyperparameter choices



## 1. Imports and Hyperparameters


In [2]:

import torch
import torch.nn as nn
import torch.optim as optim



### Hyperparameters
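- `seq_length`: number of characters in each input sequence
- `embedding_dim`: size of each character embedding vector
- `hidden_dim`: number of hidden units per LSTM layer
- `num_layers`: number of stacked LSTM layers
- `lr`: learning rate for the optimizer
- `epochs`: number of training passes over the data
- `device`: GPU if one is available, otherwise CPU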


In [6]:

seq_length = 40
embedding_dim = 64
hidden_dim = 128
num_layers = 2
lr = 0.003
epochs = 50
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")



## 2. Dataset (Shakespeare Excerpt)


In [7]:

text = (
    "To be, or not to be, that is the question:\n"
    "Whether 'tis nobler in the mind to suffer\n"
    "The slings and arrows of outrageous fortune,"
)

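# Vocabulary: every distinct character in the text, in sorted order
chars = sorted(set(text))
vocab_size = len(chars)

# Lookup tables between characters and integer indices
char_to_idx = {ch: i for i, ch in enumerate(chars)}
idx_to_char = {i: ch for ch, i in char_to_idx.items()}

# Encode the whole text as a 1-D tensor of character indices
encoded = torch.tensor([char_to_idx[ch] for ch in text], dtype=torch.long)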
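
As a quick sanity check of the character mapping above, the following minimal sketch round-trips a string through the two lookup tables. The helper names `build_vocab`, `encode`, and `decode` are hypothetical and not part of the notebook, and plain Python lists stand in for the tensor.

```python
def build_vocab(text):
    """Return (char_to_idx, idx_to_char) built from the unique characters in text."""
    chars = sorted(set(text))
    char_to_idx = {ch: i for i, ch in enumerate(chars)}
    idx_to_char = {i: ch for ch, i in char_to_idx.items()}
    return char_to_idx, idx_to_char


def encode(s, char_to_idx):
    """Map each character to its integer index (raises KeyError for unseen characters)."""
    return [char_to_idx[ch] for ch in s]


def decode(ids, idx_to_char):
    """Map integer indices back to a string."""
    return "".join(idx_to_char[i] for i in ids)


sample = "To be, or not to be"
c2i, i2c = build_vocab(sample)
ids = encode(sample, c2i)
assert decode(ids, i2c) == sample  # encoding followed by decoding is lossless
print(len(c2i), ids[:5])
```

A character that never appears in the text has no index, so `encode` raises `KeyError` for it. The same applies to any prompt later passed to text generation.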



## 3. Create Input Sequences


In [8]:

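def create_sequences(data, seq_length):
    """Build (input, target) pairs where each target is its input shifted by one character."""
    inputs = []
    targets = []
    # Stop early enough that the shifted target window still fits inside data
    for i in range(len(data) - seq_length):
        inputs.append(data[i:i+seq_length])
        targets.append(data[i+1:i+seq_length+1])
    return torch.stack(inputs), torch.stack(targets)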

X, y = create_sequences(encoded, seq_length)
X, y = X.to(device), y.to(device)
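
To make the input/target relationship concrete, here is a minimal sketch of the same windowing on a toy string. The function name `shifted_windows` is hypothetical, and it works on a plain string rather than a tensor.

```python
def shifted_windows(data, seq_length):
    """Pair each window with the same window shifted one step to the right."""
    pairs = []
    for i in range(len(data) - seq_length):
        pairs.append((data[i:i + seq_length], data[i + 1:i + seq_length + 1]))
    return pairs


for inp, tgt in shifted_windows("hello", 3):
    print(repr(inp), "->", repr(tgt))
# 'hel' -> 'ell'
# 'ell' -> 'llo'
```

The number of pairs is `len(data) - seq_length`, so the text must be longer than `seq_length` for any training examples to exist.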



## 4. LSTM Language Model


In [9]:

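class LSTMLanguageModel(nn.Module):
    def __init__(self, vocab_size, embedding_dim, hidden_dim, num_layers):
        super().__init__()
        # Character index -> dense vector
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        # batch_first=True: tensors are shaped (batch, seq_length, features)
        self.lstm = nn.LSTM(embedding_dim, hidden_dim, num_layers, batch_first=True)
        # Hidden state -> one logit per character in the vocabulary
        self.fc = nn.Linear(hidden_dim, vocab_size)

    def forward(self, x):
        x = self.embedding(x)   # (batch, seq_length, embedding_dim)
        out, _ = self.lstm(x)   # (batch, seq_length, hidden_dim)
        out = self.fc(out)      # (batch, seq_length, vocab_size)
        return out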

model = LSTMLanguageModel(vocab_size, embedding_dim, hidden_dim, num_layers).to(device)
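
To get a feel for how the hyperparameters translate into model size, the sketch below counts trainable parameters for the three layers. It assumes the standard `nn.LSTM` layout (four gates per layer, each with input weights, recurrent weights, and two bias vectors, unidirectional, no projection). The function name `lstm_lm_param_count` and the `vocab_size=30` placeholder are hypothetical; substitute the actual `vocab_size` from the dataset cell.

```python
def lstm_lm_param_count(vocab_size, embedding_dim, hidden_dim, num_layers):
    """Count trainable parameters of an Embedding -> stacked LSTM -> Linear model."""
    embedding = vocab_size * embedding_dim
    lstm = 0
    for layer in range(num_layers):
        # The first layer reads embeddings; later layers read the previous hidden state
        input_dim = embedding_dim if layer == 0 else hidden_dim
        # 4 gates, each with input weights, recurrent weights, and two bias vectors
        lstm += 4 * hidden_dim * (input_dim + hidden_dim + 2)
    fc = hidden_dim * vocab_size + vocab_size  # weights plus bias
    return {"embedding": embedding, "lstm": lstm, "fc": fc,
            "total": embedding + lstm + fc}


# vocab_size=30 is a placeholder, not the value computed in the notebook
counts = lstm_lm_param_count(vocab_size=30, embedding_dim=64, hidden_dim=128, num_layers=2)
print(counts)
```

With `embedding_dim=64`, `hidden_dim=128`, and `num_layers=2`, the LSTM term is 231,424 parameters regardless of vocabulary size, which is far more capacity than a three-line excerpt needs.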



## 5. Training Setup


In [10]:

criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=lr)



## 6. Training Loop


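In [None]:

for epoch in range(1, epochs + 1):
    model.train()
    optimizer.zero_grad()

    # Full-batch training: every sequence goes through the model in one step per epoch
    output = model(X)  # (num_sequences, seq_length, vocab_size)
    # Flatten so that each character position is one classification example
    loss = criterion(output.view(-1, vocab_size), y.view(-1))

    loss.backward()
    optimizer.step()

    if epoch % 10 == 0:
        print(f"Epoch {epoch}/{epochs}, Loss: {loss.item():.4f}")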


Epoch 10/50, Loss: 0.1434
Epoch 20/50, Loss: 0.1013
Epoch 30/50, Loss: 0.0816
Epoch 40/50, Loss: 0.0713
Epoch 50/50, Loss: 0.0652
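
One way to read these numbers: `nn.CrossEntropyLoss` reports the mean negative log-likelihood in nats, so its exponential is the per-character perplexity. The sketch below converts the final loss and, for comparison, computes the loss of a model that guesses uniformly. The helper names and the 30-symbol vocabulary are hypothetical placeholders.

```python
import math


def perplexity(mean_cross_entropy):
    """Per-character perplexity from a mean cross-entropy measured in nats."""
    return math.exp(mean_cross_entropy)


def uniform_baseline_loss(vocab_size):
    """Cross-entropy of a model that assigns equal probability to every character."""
    return math.log(vocab_size)


final_loss = 0.0652  # value printed at epoch 50 above
print(f"perplexity: {perplexity(final_loss):.3f}")
# 30 is a placeholder; use the notebook's vocab_size for the real baseline
print(f"uniform baseline loss: {uniform_baseline_loss(30):.3f}")
```

A perplexity this close to 1 on the training data means the model is almost certain about every next character, which on such a short text is a sign of memorization rather than generalization.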



## 7. Text Generation


In [17]:

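def generate_text(model, start_str, length=200):
    """Sample `length` characters from the model, starting from `start_str`."""
    model.eval()
    # Every character in start_str must occur in the training text, or the lookup raises KeyError
    input_seq = torch.tensor([char_to_idx[ch] for ch in start_str], dtype=torch.long).unsqueeze(0).to(device)
    generated = list(start_str)

    with torch.no_grad():
        for _ in range(length):
            output = model(input_seq)
            last_char_logits = output[:, -1, :]  # logits for the next character only
            prob = torch.softmax(last_char_logits, dim=-1)
            # Sample from the distribution instead of always taking the most likely character
            next_char = torch.multinomial(prob, 1).item()

            generated.append(idx_to_char[next_char])
            # Slide the window by one: the context stays at len(start_str) characters,
            # which is shorter than the seq_length used in training when the prompt is short
            input_seq = torch.cat([input_seq[:, 1:], torch.tensor([[next_char]], device=device)], dim=1)

    return "".join(generated)

print(generate_text(model, "To be", 100))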


To be, that is the mind to suffer
The slings and to suffer
The slings and to suffer
The slings and to sbl
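
The generation function samples from the plain softmax distribution. A common variation, not used in this notebook, is to divide the logits by a temperature before the softmax to make sampling more or less random. The sketch below shows the idea in plain Python; `softmax`, `sample_index`, and the `temperature` parameter are hypothetical names introduced for illustration.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities; lower temperature sharpens the distribution."""
    scaled = [value / temperature for value in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_index(logits, temperature=1.0, rng=random):
    """Draw one index according to the temperature-scaled probabilities."""
    probs = softmax(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


logits = [1.0, 2.0, 3.0]
sharp = softmax(logits, temperature=0.5)
flat = softmax(logits, temperature=2.0)
print([round(p, 3) for p in sharp])
print([round(p, 3) for p in flat])
print(sample_index(logits, temperature=1.0, rng=random.Random(0)))
```

With `temperature=1.0` this matches sampling from the unscaled softmax, as the notebook does. Values below 1 favor the most likely character, and values above 1 spread probability across more characters.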



## 8. Conclusion

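- The LSTM brings the training loss down to 0.0652 after 50 epochs, but on a three-line excerpt this mostly reflects memorization of the training text rather than learned long-term dependencies.
- The generated sample recombines phrases from the excerpt (for example "to suffer" followed by "The slings and"), so its coherence comes from the tiny domain-specific corpus. A larger corpus and a held-out split would be needed to measure real language-modeling quality.
- Hyperparameters such as `seq_length`, `hidden_dim`, and `lr` affect how quickly and how well the model fits. This notebook uses a single setting and does not compare alternatives.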

This fulfills the **Language Modeling using LSTM** task.
