## Tutorial: GRU Language Model with Subword Tokenization (BPE)
 
**Target Audience:** MSc Computer Science Students  
**Topic:** NLP, Recurrent Neural Networks, Tokenization

## 1. Introduction
 
In the previous tutorial, we predicted text character-by-character. While simple, character-level models produce long sequences, which makes vanishing gradients more likely in a recurrent network, and each individual input (a single character) carries little semantic information.

In this tutorial, we advance to **Subword Tokenization** using **Byte-Pair Encoding (BPE)**.
 
### Why Subword Tokenization?
1.  **Efficiency:** Sequences are much shorter than character sequences, allowing the GRU to capture context over a longer text span.
2.  **Open Vocabulary:** Unlike strict word-level models (which fail on "Unknown" words), subword models can construct unknown words from known sub-parts (e.g., "unfriendly" $\rightarrow$ "un", "friend", "ly").
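
To make the idea of "constructing words from known sub-parts" concrete, here is a minimal sketch of how BPE learns merges. It is a toy, character-based version written only for illustration: the helper names (`apply_merge`, `learn_bpe_merges`, `segment`) and the four-word corpus are assumptions, and production tokenizers add pre-tokenization, byte-level handling, and faster data structures.

```python
from collections import Counter

def apply_merge(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)

def learn_bpe_merges(words, num_merges):
    """Learn up to `num_merges` merge rules from a list of words (toy version)."""
    # Each distinct word is stored as a tuple of symbols together with its frequency.
    vocab = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count how often each adjacent symbol pair occurs across the corpus.
        pair_counts = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pair_counts[(a, b)] += freq
        if not pair_counts:
            break
        # Most frequent pair wins; ties go to the pair seen first.
        best = max(pair_counts, key=pair_counts.get)
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[apply_merge(symbols, best)] += freq
        vocab = new_vocab
    return merges

def segment(word, merges):
    """Split a (possibly unseen) word by replaying the learned merges in order."""
    symbols = tuple(word)
    for pair in merges:
        symbols = apply_merge(symbols, pair)
    return list(symbols)

merges = learn_bpe_merges(["low", "low", "lower", "lowest"], num_merges=3)
print(merges)                    # [('l', 'o'), ('lo', 'w'), ('low', 'e')]
print(segment("lowly", merges))  # ['low', 'l', 'y']
```

The word "lowly" never appears in the toy corpus, yet it is still representable: the learned subword `low` is reused and the remainder falls back to single characters.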

## 2. Setup and Dependencies
 
We will use the `tokenizers` library from Hugging Face, which can train a BPE tokenizer efficiently or load a pre-trained one (as done below with GPT-2).


In [1]:
# !pip install tokenizers

import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader
import numpy as np
import matplotlib.pyplot as plt
import time
from tokenizers import Tokenizer, models, pre_tokenizers, trainers, decoders

In [2]:
# Set seed for reproducibility
SEED = 42
np.random.seed(SEED)
torch.manual_seed(SEED)

# --- CRITICAL IMPORTS FOR TPU/XLA ---
# Note: torch_xla is only needed for Google Cloud TPU environments
# This notebook will work fine without it on CPU/CUDA/MPS
XLA_AVAILABLE = False
xm = None

# Uncomment the following lines if running on TPU:
# try:
#     import torch_xla.core.xla_model as xm
#     XLA_AVAILABLE = True
# except ImportError:
#     XLA_AVAILABLE = False
#     print("WARNING: torch_xla not found. Running on CPU/CUDA fallback.")
# --- END XLA IMPORTS ---

# Set device for PyTorch operations
if XLA_AVAILABLE:
    # Use xm.xla_device() to get the primary TPU core device
    DEVICE = xm.xla_device()
    N_DEVICES = 1 # Force single device count
    print(f"Using Single XLA Device: {DEVICE}")
elif torch.cuda.is_available():
    torch.backends.cudnn.benchmark = True
    torch.backends.cudnn.enabled = True
    torch.cuda.manual_seed_all(SEED)
    DEVICE = torch.device('cuda')
elif torch.backends.mps.is_available():
    DEVICE = torch.device('mps')
else:
    DEVICE = torch.device('cpu')

print(f'Using device: {DEVICE}')


# Uncomment the following lines if running on Google Colab
# from google.colab import drive
# drive.mount('/content/drive')

Using device: mps



## 3. The Dataset & Tokenizer Training
 
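We use the same short stories dataset. A subword model also needs a tokenizer. One option is to **train** a BPE tokenizer so that it learns the most frequent subword patterns in this specific text; the code for that is kept below, commented out. The cell as run loads the pre-trained GPT-2 byte-level BPE tokenizer instead.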


In [3]:
RAW_TEXT = """
Title: The Wolf and the Lamb.
Wolf, meeting with a Lamb astray from the fold, resolved not to lay violent hands on him, but to find some plea to justify to the Lamb the Wolf's right to eat him. 
He thus addressed him: "Sirrah, last year you grossly insulted me." 
"Indeed," bleated the Lamb in a mournful tone of voice, "I was not then born." 
Then said the Wolf, "You feed in my pasture." 
"No, good sir," replied the Lamb, "I have not yet tasted grass." 
Again said the Wolf, "You drink of my well." 
"No," exclaimed the Lamb, "I never yet drank water, for as yet my mother's milk is both food and drink to me.
" Upon which the Wolf seized him and ate him up, saying, "Well! I won't remain supperless, even though you refute every one of my imputations.
" The tyrant will always find a pretext for his tyranny.

Title: The Bat and the Weasels.
A Bat who fell upon the ground and was caught by a Weasel pleaded to be spared his life. 
The Weasel refused, saying that he was by nature the enemy of all birds. 
The Bat assured him that he was not a bird, but a mouse, and thus was set free. 
Shortly afterwards the Bat again fell to the ground and was caught by another Weasel, whom he likewise entreated not to eat him. 
The Weasel said that he had a special hostility to mice. 
The Bat assured him that he was not a mouse, but a bird, and thus escaped. 
It is wise to turn circumstances to good account.

Title: The Ass and the Grasshopper.
An Ass having heard some Grasshoppers chirping, was highly enchanted; and, desiring to possess the like charms of melody, demanded what sort of food they lived on to give them such beautiful voices. 
They replied, "The dew." The Ass resolved that he would live only upon dew, and in a short time died of hunger.

Title: The Lion and the Mouse.
A Lion was awakened from sleep by a Mouse running over his face. 
Rising up angrily, he caught him and was about to kill him, when the Mouse piteously entreated, saying: "If you would only spare my life, I would be sure to repay your kindness." 
The Lion laughed and let him go. 
It happened shortly after this that the Lion was caught by some hunters, who bound him by strong ropes to the ground. 
The Mouse, recognizing his roar, came and gnawed the rope with his teeth, and set him free, exclaiming: "You ridiculed the idea of my ever being able to help you, not expecting to receive from me any repayment of your favor; now you know that it is possible for even a Mouse to con benefits on a Lion."

Title: The Gruffalo.
A mouse took a stroll through the deep dark wood.
A fox saw the mouse, and the mouse looked good.
"Where are you going to, little brown mouse?
Come and have lunch in my underground house.”

"It's terribly kind of you, Fox, but no- I'm going to have lunch with a Gruffalo."
"A Gruffalo? What's a Gruffalo?"
A Gruffalo! Why, didn't you know?
“He has terrible tusks,
and terrible claws, and terrible teeth in his terrible jaws."
"Where are you meeting him?"
"Here, by these rocks, And his favourite food is roasted fox."
"Roasted fox! I'm off!" Fox said.
"Goodbye, little mouse," and away he sped.

"Silly old Fox! Doesn't he know
There's no such thing as a Gruffalo?"

On went the mouse through the deep dark
wood. An owl saw the mouse, and the mouse looked good.
"Where are you going to, little brown mouse?
Come and have tea in my treetop house."
"It's terribly kind of you, Owl, but no
I'm going to have tea with a Gruffalo."
"A Gruffalo? What's a Gruffalo?"
"A Gruffalo! Why, didn't you know?
He has knobbly knees,and turned out toes,
"Where are you meeting him?"
"Here, by this stream,
And his favourite food is owl ice cream."
"Owl ice cream! Too whit too whoo!"
"Goodbye, little mouse,"
and away Owl flew.
"Silly old Owl! Doesn't he know,
There's no such thing as a Gruffalo?"

On went the mouse through the deep dark wood.
A snake saw the mouse, and the mouse looked good.
"Where are you going to, little brown mouse?
Come for a feast in my log pile house."
"It's wonderfully good of you, Snake, but no I'm having a feast with a Gruffalo."
"A Gruffalo? What's a Gruffalo?"
" A Gruffalo! Why, didn't you know?
His eyes are orange, his tongue is black,
He has purple prickles all over his back."
"Where are you meeting him?"
"Here, by this lake,
And his favourite food is scrambled snake."
"Scrambled snake! It's time I hid!"
"Goodbye, little mouse,"
and away Snake slid.
"Silly old Snake! Doesn't he know,
There's no such thing as a Gruffal…..?“Oh!”

But who is this creature with terrible claws
And terrible teeth in his terrible jaws?
He has knobbly knees, and turned
out toes,
And a poisonous wart at the end of his nose.
His eyes are orange, his tongue is black,
He has purple prickles all over his back.
"Oh help! Oh no! It's a Gruffalo!

"My favourite food !" the Gruffalo
"You'll taste good on a slice of bread!"
"Good?" said the mouse. "Don't call me good!
I'm the scariest creature in this wood.
Just walk behind me and soon you'll see,
Everyone is afraid of me."
"All right," said the Gruffalo, bursting with laughter.
"You go ahead and I'll follow after."
They walked and walked till the Gruffalo said,
"I hear a hiss in the leaves ahead."

"It's Snake," said the mouse.
"Why, Snake,hello!"
Snake took one look at the Gruffalo.
"Oh crumbs!" he said, "Goodbye, little mouse!"
And off he slid to his log pile house.
"You see?" said the mouse. "I told you so."
"Amazing!" said the Gruffalo.
They walked some more till the Gruffalo said,
“I hear a hoot in the trees ahead."

"It's Owl," said the mouse. "Why, Owl, hello!"
Owl took one look at the Gruffalo.
"Oh dear!" he said, "Goodbye, little mouse!" 
And off he flew to his treetop house.
"You see?" said the mouse. "I told you so."
"Astounding!" said the Gruffalo.
They walked some more till
the Gruffalo said,
"I can hear feet on the path ahead."

"It's Fox," said the mouse.
"Why, Fox, hello!"
Fox took one look at the Gruffalo.
"Oh help!" he said, "Goodbye, little mouse!"
And off he ran to his underground house.
"Well, Gruffalo," said the mouse. "You see?
Everyone is afraid of me!
But now my tummy's beginning to rumble.
My favourite food is Gruffalo crumble!"
"Gruffalo crumble!" the Gruffalo said,
And quick as the wind he turned and fled.

All was quiet in the deep dark wood.
The mouse found a nut and the nut was good. The End.

Title: Twinkle, Twinkle Little Star.
Twinkle, twinkle, little star,
How I wonder what you are,
Up above the world so high,
Like a diamond in the sky, twinkle, twinkle, little star,
How I wonder what you are?

Title: I'm a Little Tea Pot.
I’m a little teapot, short and stout
Here’s my handle (place hand on hip)
Here’s my spout (stick your other arm out straight)
When I get all steamed up, hear me shout
Just tip me over and pour me out (lean over with your spout arm.

Title: London Bridge is Falling Down (Short Version)
London Bridge is falling down,
Falling down, falling down.
London Bridge is falling down,
My fair lady.

Build it up with wood and clay,
Wood and clay, wood and clay,
Build it up with wood and clay,
My fair lady.

Wood and clay will wash away,
Wash away, wash away,
Wood and clay will wash away,
My fair lady.

Build it up with iron and steel,
Iron and steel, iron and steel,
Build it up with iron and steel,
My fair lady.

Iron and steel will bend and bow,
Bend and bow, bend and bow,
Iron and steel will bend and bow,
My fair lady.

Build it up with silver and gold,
Silver and gold, silver and gold,
Build it up with silver and gold,
My fair lady.

Title: Mary Had a Little Lamb.
Mary had a little lamb,
His fleece was white as snow,
And everywhere that Mary went,
The lamb was sure to go

He followed her to school one day,
Which was against the rule,
It made the children laugh and play,
To see a lamb at school.

And so the teacher turned him out,
But still he lingered near,
And waited patiently about,
Till Mary did appear.

"What makes the lamb love Mary so?"
The eager children cry;
"Why, Mary loves the lamb, you know,"
The teacher did reply.

Title: Humpty Dumpty.
Humpty Dumpty sat on a wall,
Humpty Dumpty had a great fall,
All the king’s horses and all the king’s men,
Couldn’t put Humpty together again.

Title: Hey Diddle Diddle, Mother Goose.
Hey diddle diddle, the cat and the fiddle,
The cow jumped over the moon.
The little dog laughed to see such fun
And the dish ran away with the spoon!

Title: Baa Baa Black Sheep.
Baa baa black sheep, have you any wool?
Yes sir, yes sir, three bags full!
One for the master, one for the dame,
And one for the little boy who lives down the lane.

Title: One, Two, Three, Four.
One, two, three, four, five
Once I caught a fish alive.
Six, seven, eight, nine, ten
Then I let it go again.
Why did you let it go?
Because it bit my finger so.
Which finger did it bite?
This little finger on my right.

Title: Hickory Dickory Dock.
Hickory dickory dock (Gently bounce baby to the beat)
The mouse ran up the clock (run your fingers from your baby's toes to their chin)
The clock struck one (clap once)
The mouse ran down (run your fingers down to your baby's toes)
Hickory dickory dock.

Hickory dickory dock (Gently bounce baby to the beat)
The mouse ran up the clock (run your fingers from your baby's toes to their chin)
The clock struck two (clap twice)
The mouse went "boo!" (cover baby's eyes with your hands then pull them away on boo!)
Hickory dickory dock.

Three… the mouse went weeee (lift baby in the air on weeee)
Four…The mouse went "no more!" (shake your finger no more!)

 
Title: Polly Put the Kettle On.
Polly put the kettle on,
Polly put the kettle on,
Polly put the kettle on,
We’ll all have tea.

Sukey take it off again,
Sukey take it off again,
Sukey take it off again,
They’ve all gone away.
    
Pop! Goes the Weasel.
Half a pound of tuppenny rice,
Half a pound of treacle,
That’s the way the money goes,
Pop! goes the weasel.

Up and down the City road,
In and out the Eagle,
That’s the way the money goes,
Pop! goes the weasel.

Title: Ring-a-Ring O’Roses.
Ring-a-ring o’roses
A pocketful of posies
Atishoo, atishoo
We all fall down.

Title: Jack and Jill.
Jack and Jill went up the hill
To fetch a pail of water.
Jack fell down and broke his crown,
And Jill came tumbling after.

Up Jack got, and home did trot,
As fast as he could caper,
He went to bed to mend his head,
With vinegar and brown paper.
    
Title: This Old Man.
This old man, he played one
He played knick-knack on my thumb
With a knick knack paddywhack give the dog a bone
This old man cam rolling home…

Two… on my shoe
Three… on my knee
Four… on my door
Five… on my hive
Six… on my sticks
Seven…up to heaven
Eight… on my gate
Nine… on my spine
Ten… once again

Title: Round and Round the Garden.
Round and round the garden, like a Teddy Bear (draw a circle with your finger on baby’s palm)
One step, two step, (walk your finger up baby’s arm)
Tickle you under there! (tickle baby under the chin)

Title: Sing a Song of Sixpence.
Sing a song of sixpence a pocket full of rye,
Four and twenty blackbirds baked in a pie,
When the pie was opened the birds began to sing,
Oh wasn't that a dainty dish to set before the king?

The king was in his counting house counting out his money,
The queen was in the parlour eating bread and honey,
The maid was in the garden hanging out the clothes,
When down came a blackbird and pecked off her nose!
    
Title: This Little Piggy.
This little piggy went to market (touch baby’s biggest toe)
This little piggy stayed at home (touch the next toe)
This little piggy had roast beef (and the next)
This little piggy had none (and the next)
And this little piggy went...Wee wee wee all the way home... (touch the little toe and then run your hand up baby tickling gently as you go)

Title: Little Miss Muffet.
Little Miss Muffet sat on a tuffet,
Eating her curds and whey,
Along came a spider, who sat down beside her,
And frightened Miss Muffet away!


Title: Duke of York.
Oh, the grand old Duke of York
He had ten thousand men
He marched them up to the top of the hill
And he marched them down again

And when they were up, they were up
And when they were down, they were down
And when they were only half-way up
They were neither up nor down

Oh, the grand old Duke of York
He had ten thousand men
He marched them up to the top of the hill
And he marched them down again

And when they were up, they were up
And when they were down, they were down
And when they were only half-way up
They were neither up nor down

Oh, the grand old Duke of York
He had ten thousand men
He marched them up to the top of the hill
And he marched them down again

And when they were up, they were up
And when they were down, they were down
And when they were only half-way up
They were neither up nor down


Title: Wheels on the Bus.
The wheels on the bus go round and round
Round and round
Round and round
The wheels on the bus go round and round
All day long
No, it started to rain
Oh no, we need to make the wipers go swish, swish, swish
Are you ready? Here we go
The wipers on the bus go swish, swish, swish
Swish, swish, swish
Swish, swish, swish
The wipers on the bus go swish, swish, swish
All day long
Wow, it suddenly got very noisy on the bus
Lots of people have gone on and started to chat
Are you ready?
The people on the bus go chat, chat, chat
Chat, chat, chat
Chat, chat, chat
The people on the bus go chat, chat, chat
All day long
Alright everyone, it's time to beat the horn on the bus
Get ready with your 'uh, uh, uh'
Here we go
The horn on the bus goes beep, beep, beep
Beep, beep, beep
Beep, beep, beep
The horn on the bus goes beep, beep, beep
All day long
Yeah, well done everyone
Great singing
Come on, let's ride the bus one more time
Ready to sing? Here we go
The wheels on the bus go round and round
Round and round
Round and round
The wheels on the bus go round and round
All day long

 
Title: Little Bo Beep.
Little Bo Peep has lost her sheep
And doesn’t know where to find them;
Leave them alone, and they’ll come home,
Bringing their tails behind them.
 
Little Bo Peep fell fast asleep
And dreamt she heard them bleating;
But when she awoke, she found it a joke,
For they were still a-fleeting.

Then up she took her little crook,
Determined for to find them;
She found them indeed, but it made her heart bleed,
For they’d left their tales behind them.
 
It happened one day, as Bo Peep did stray
Into a meadow hard by,
There she espied their tales side by side,
All hung on a tree to dry.

She heaved a sigh and wiped her eye,
And over the hillocks went rambling,
And tried what she could, as a shepherdess should,
To tack each again to its lambkin.

Title: I’m a Little Teapot.
I'm a little teapot,
Short and stout,
Here is my handle
Here is my spout
When I get all steamed up,
Hear me shout,
Tip me over and pour me out!
I'm a very special teapot,
Yes, it's true,
Here's an example of what I can do,
I can turn my handle into a spout,
Tip me over and pour me out!

Title: If You’re Happy And You Know It.
If you're happy and you know it clap your hands
If you're happy and you know it clap your hands
If you're happy and you know it and you really want to show it
If you're happy and you know it clap your hands
If you're happy and you know it turn around
If you're happy and you know it turn around
If you're happy and you know it and you really want to show it
If you're happy and you know it turn around
 
Title: Round and Round the Garden.
Round and round the garden
Like a teddy bear.
One step, two step,
Tickle you under there.
"""

In [4]:
# Save text to a temp file for the tokenizer trainer

# Uncomment the following line if running on Google Colab
#FILEPATH = "/content/drive/MyDrive/Colab Notebooks/corpus.txt"
FILEPATH = "corpus.txt"

with open(FILEPATH, "w", encoding="utf-8") as f:
    f.write(RAW_TEXT)

# --- Build BPE Tokenizer ---
# 1. Initialize a BPE model
#tokenizer = Tokenizer(models.BPE())

# 2. Pre-tokenization
# We use Punctuation pre-tokenizer. This splits text at punctuation marks
# but does not split on whitespace. This is important for preserving spaces
# as distinct characters in the BPE model.
# By avoiding whitespace splitting, the BPE model treats the space character ' '
# as a literal character. It will learn ' ' as a distinct token or merge it 
# into words (e.g., "The" + " " -> "The ").

#tokenizer.pre_tokenizer = pre_tokenizers.Punctuation()
#tokenizer.pre_tokenizer = pre_tokenizers.ByteLevel()
#tokenizer.pre_tokenizer = pre_tokenizers.Sequence([
#    pre_tokenizers.BertPreTokenizer(),
#    pre_tokenizers.Punctuation(),
#])

# 3. Trainer: Learn the vocabulary
# A small vocab_size (2000 here) is suitable for this tiny dataset.
#trainer = trainers.BpeTrainer(vocab_size=2000, special_tokens=["[UNK]", "[PAD]"])

# 4. Train
#tokenizer.train(["corpus.txt"], trainer)

# 5. Add a decoder to merge subwords back into text later
#tokenizer.decoder = decoders.BPEDecoder()

# --- Load Pre-trained GPT-2 Tokenizer ---
# We fetch the tokenizer definition directly from Hugging Face Hub.
# This tokenizer uses a Byte-Level BPE model.
try:
    tokenizer = Tokenizer.from_pretrained("gpt2")
except Exception as e:
    print(f"Error loading GPT-2 tokenizer. Ensure internet access or local file. {e}")
    # Fallback/Exit handling would go here in production
    raise e


vocab_size = tokenizer.get_vocab_size()
print(f"Tokenizer loaded. Vocab Size: {vocab_size}")

# Test encoding
sample = "If you're happy and you know it turn around."
encoded = tokenizer.encode(sample)
print(f"Sample: '{sample}'")
print(f"IDs: {encoded.ids}")
print(f"Tokens: {encoded.tokens}")

Tokenizer loaded. Vocab Size: 50257
Sample: 'If you're happy and you know it turn around.'
IDs: [1532, 345, 821, 3772, 290, 345, 760, 340, 1210, 1088, 13]
Tokens: ['If', 'Ġyou', "'re", 'Ġhappy', 'Ġand', 'Ġyou', 'Ġknow', 'Ġit', 'Ġturn', 'Ġaround', '.']
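
The `Ġ` prefix in the tokens above (and the `Ċ` that shows up in later outputs) comes from the byte-level scheme: every raw byte is shown as a printable character, so a space or newline becomes a visible symbol. The sketch below reproduces that byte-to-character table as we understand it from the published GPT-2 encoder; the function names are our own, and the code is only meant to explain the display, not to replace the tokenizer.

```python
def bytes_to_unicode():
    """Map each of the 256 byte values to a printable Unicode character."""
    # Bytes that are already printable, non-space characters keep their own code point:
    # '!'..'~' (0x21-0x7E), plus the Latin-1 ranges 0xA1-0xAC and 0xAE-0xFF.
    keep = (list(range(0x21, 0x7E + 1))
            + list(range(0xA1, 0xAC + 1))
            + list(range(0xAE, 0xFF + 1)))
    mapping = {b: chr(b) for b in keep}
    # Every remaining byte (control characters, space, ...) is assigned 256, 257, ... in order.
    shift = 0
    for b in range(256):
        if b not in mapping:
            mapping[b] = chr(256 + shift)
            shift += 1
    return mapping

BYTE_TO_CHAR = bytes_to_unicode()

def show_bytes(text):
    """Render text the way byte-level BPE tokens display it."""
    return "".join(BYTE_TO_CHAR[b] for b in text.encode("utf-8"))

print(show_bytes(" you"))  # space (byte 32) is shown as U+0120
print(show_bytes("\n"))    # newline (byte 10) is shown as U+010A
```

Reading `Ġyou` as " you" therefore explains why the sample sentence contains no separate space tokens: the leading space is part of the following word's token.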


## 4. Preparing the Data for PyTorch
 
Now we convert the entire text into a sequence of token IDs.

**Difference from Character-Level:**
* **Input:** Sequence of Subword IDs.
* **Output:** The next Subword ID.
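* **Sequence Length:** `seq_len` now counts subword tokens. Because each token represents more information than a single character, a shorter window (e.g., 10-20 tokens) can cover the same text span; the code below uses 50 tokens for a longer context.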


In [5]:
# Encode entire corpus
full_encoding = tokenizer.encode(RAW_TEXT)
data_ids = torch.tensor(full_encoding.ids, dtype=torch.long)

print(f"Total tokens in dataset: {len(data_ids)}")

class SubwordDataset(Dataset):
    def __init__(self, data_tensor, seq_len):
        self.data_tensor = data_tensor
        self.seq_len = seq_len
        
    def __len__(self):
        return len(self.data_tensor) - self.seq_len
    
    def __getitem__(self, idx):
        # Input: tokens [0, 1, ... N]
        input_seq = self.data_tensor[idx : idx + self.seq_len]
        # Target: tokens [1, 2, ... N+1]
        target_seq = self.data_tensor[idx + 1 : idx + self.seq_len + 1]
        return input_seq, target_seq

# Hyperparameters
SEQ_LEN = 50  # Context window, measured in subword tokens (not words or characters)
BATCH_SIZE = 16

dataset = SubwordDataset(data_ids, SEQ_LEN)
dataloader = DataLoader(dataset, batch_size=BATCH_SIZE, shuffle=True, drop_last=True)

print(f"Number of batches: {len(dataloader)}")

# Verify a batch
for x_batch, y_batch in dataloader:
    print("Input batch shape:", x_batch.shape)  # Expected: (BATCH_SIZE, SEQ_LEN)
    print("Target batch shape:", y_batch.shape)  # Expected: (BATCH_SIZE, SEQ_LEN)
    print("First input sequence IDs:", x_batch[0].tolist())
    print("First target sequence IDs:", y_batch[0].tolist())
    print("First input sequence Tokens:", [tokenizer.id_to_token(i) for i in x_batch[0].tolist()])
    print("First target sequence Tokens:", [tokenizer.id_to_token(i) for i in y_batch[0].tolist()])
    break

Total tokens in dataset: 4536
Number of batches: 280
Input batch shape: torch.Size([16, 50])
Target batch shape: torch.Size([16, 50])
First input sequence IDs: [13, 198, 198, 19160, 25, 770, 5706, 1869, 13, 198, 1212, 1468, 582, 11, 339, 2826, 530, 198, 1544, 2826, 638, 624, 12, 15418, 441, 319, 616, 15683, 198, 3152, 257, 638, 624, 47868, 279, 13218, 1929, 441, 1577, 262, 3290, 257, 9970, 198, 1212, 1468, 582, 12172, 10708, 1363]
First target sequence IDs: [198, 198, 19160, 25, 770, 5706, 1869, 13, 198, 1212, 1468, 582, 11, 339, 2826, 530, 198, 1544, 2826, 638, 624, 12, 15418, 441, 319, 616, 15683, 198, 3152, 257, 638, 624, 47868, 279, 13218, 1929, 441, 1577, 262, 3290, 257, 9970, 198, 1212, 1468, 582, 12172, 10708, 1363, 1399]
First input sequence Tokens: ['.', 'Ċ', 'Ċ', 'Title', ':', 'ĠThis', 'ĠOld', 'ĠMan', '.', 'Ċ', 'This', 'Ġold', 'Ġman', ',', 'Ġhe', 'Ġplayed', 'Ġone', 'Ċ', 'He', 'Ġplayed', 'Ġkn', 'ick', '-', 'kn', 'ack', 'Ġon', 'Ġmy', 'Ġthumb', 'Ċ', 'With', 'Ġa', 'Ġkn', 'ick', '
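
The windowing performed by `SubwordDataset` can be checked on a toy list without PyTorch. This is a minimal sketch, not part of the notebook's pipeline: `make_windows` is a hypothetical helper that applies the same slicing as `__getitem__`, and the final lines redo the batch arithmetic for the sizes reported above (4536 tokens, `SEQ_LEN = 50`, `BATCH_SIZE = 16`).

```python
def make_windows(ids, seq_len):
    """Return (input, target) pairs where the target is the input shifted by one token."""
    pairs = []
    for start in range(len(ids) - seq_len):
        inputs = ids[start : start + seq_len]
        targets = ids[start + 1 : start + seq_len + 1]
        pairs.append((inputs, targets))
    return pairs

toy_ids = [10, 11, 12, 13, 14, 15]
for inputs, targets in make_windows(toy_ids, seq_len=3):
    print(inputs, "->", targets)
# [10, 11, 12] -> [11, 12, 13]
# [11, 12, 13] -> [12, 13, 14]
# [12, 13, 14] -> [13, 14, 15]

# Same bookkeeping for the real corpus.
n_windows = 4536 - 50        # one window per valid start index
n_batches = n_windows // 16  # drop_last=True discards the final partial batch
print(n_windows, n_batches)  # 4486 280
```

Consecutive windows overlap in all but one token, so each token is seen as a target once per window position during an epoch.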

## 5. GRU Model Implementation
 
The architecture remains similar, but `vocab_size` is now much larger (from ~60 characters to the 50,257 entries of the GPT-2 vocabulary), and the embedding layer (of size `embedding_dim`) maps each subword ID to a dense vector.

$$ h_t = \text{GRU}(x_t, h_{t-1}) $$
$$ y = \text{Linear}(h_t) $$
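
For reference, the one-line recurrence above hides the gating inside the GRU cell. The equations below follow the formulation given in the PyTorch `nn.GRU` documentation for a single layer; the weight and bias names ($W_{i\cdot}$, $W_{h\cdot}$, $b_{i\cdot}$, $b_{h\cdot}$) are that documentation's notation, not symbols defined elsewhere in this tutorial.

$$
\begin{aligned}
r_t &= \sigma\left(W_{ir} x_t + b_{ir} + W_{hr} h_{t-1} + b_{hr}\right) \\
z_t &= \sigma\left(W_{iz} x_t + b_{iz} + W_{hz} h_{t-1} + b_{hz}\right) \\
n_t &= \tanh\left(W_{in} x_t + b_{in} + r_t \odot \left(W_{hn} h_{t-1} + b_{hn}\right)\right) \\
h_t &= (1 - z_t) \odot n_t + z_t \odot h_{t-1}
\end{aligned}
$$

Here $r_t$ is the reset gate, $z_t$ the update gate, and $n_t$ the candidate state. When $z_t$ is close to 1 the previous state $h_{t-1}$ is copied forward almost unchanged, which is what helps gradients survive over longer spans.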

In [6]:
class GRUSubwordLM(nn.Module):
    def __init__(self, vocab_size, embedding_dim, hidden_dim, num_layers=1, dropout=0.1):
        super(GRUSubwordLM, self).__init__()
        self.hidden_dim = hidden_dim
        self.num_layers = num_layers
        
        # No padding_idx is set: every training window has exactly seq_len tokens,
        # so no padding is needed. The GPT-2 tokenizer used here defines no [PAD] token;
        # with a custom tokenizer, look it up via tokenizer.token_to_id("[PAD]").
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        
        self.gru = nn.GRU(
            input_size=embedding_dim,
            hidden_size=hidden_dim,
            num_layers=num_layers,
            dropout=dropout if num_layers > 1 else 0.0,
            batch_first=True
        )
        
        self.fc = nn.Linear(hidden_dim, vocab_size)
        
    def forward(self, x, hidden):
        # x: (batch, seq_len)
        embeds = self.embedding(x) # (batch, seq, embed_dim)
        
        # output: (batch, seq, hidden)
        output, hidden = self.gru(embeds, hidden)
        
        # Flatten for classification
        # We reshape to (batch * seq, hidden)
        output = output.reshape(-1, self.hidden_dim)
        
        # Project to vocab
        logits = self.fc(output)
        
        return logits, hidden
    
    def init_hidden(self, batch_size):
        return torch.zeros(self.num_layers, batch_size, self.hidden_dim, device=DEVICE)


## 6. Training
 
We train using Cross Entropy Loss. Because we are using subwords, the loss is the average negative log-likelihood of the next subword; its exponential, $\exp(\text{loss})$, is the per-subword perplexity.
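
The relation between loss and perplexity is a one-liner. The sketch below assumes the loss is an average cross-entropy in nats (which is what `nn.CrossEntropyLoss` returns with its default mean reduction); the `perplexity` helper is our own name. As a sanity check it also computes the loss of a model that guesses uniformly over the 50,257-token vocabulary, which is roughly what an untrained model should start near.

```python
import math

def perplexity(avg_cross_entropy):
    """Convert an average cross-entropy (in nats) into perplexity."""
    return math.exp(avg_cross_entropy)

# Loss of a uniform guess over the GPT-2 vocabulary: ln(50257), about 10.82 nats.
uniform_loss = math.log(50257)
print(f"uniform loss: {uniform_loss:.2f}, perplexity: {perplexity(uniform_loss):.0f}")

# Training losses reported in this notebook's output (epochs 5 and 10).
for loss in (0.3293, 0.0926):
    print(f"loss {loss:.4f} -> perplexity {perplexity(loss):.3f}")
```

A perplexity close to 1 on the training set means the model is almost certain of each next token. On a corpus this small that most likely reflects memorization rather than generalization, so it should not be read as a held-out score.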


In [None]:
# Model Hyperparameters
EMBEDDING_DIM = 128
HIDDEN_DIM = 256
NUM_LAYERS = 2
DROPOUT = 0.15
LR = 0.001
GRAD_THRESH = 1.0
EPOCHS = 10

# Instantiate Model, Loss, and Optimizer
model = GRUSubwordLM(vocab_size, EMBEDDING_DIM, HIDDEN_DIM, NUM_LAYERS, dropout=DROPOUT).to(DEVICE)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=LR)
print(model)

# Calculate total trainable parameters
total_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"GRU Subword LM Model Parameters: {total_params:,}")

GRUSubwordLM(
  (embedding): Embedding(50257, 128)
  (gru): GRU(128, 256, num_layers=2, batch_first=True, dropout=0.15)
  (fc): Linear(in_features=256, out_features=50257, bias=True)
)
GRU Subword LM Model Parameters: 20,040,145
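
The parameter count printed above can be reproduced by hand, which also shows where the parameters sit. This is a sketch with our own helper names; it assumes the PyTorch GRU layout of three gates per layer, each with an input-to-hidden matrix, a hidden-to-hidden matrix, and two bias vectors.

```python
def gru_layer_params(input_size, hidden_size):
    """Parameters of one GRU layer: 3 gates x (W_input, W_hidden, 2 bias vectors)."""
    return 3 * (input_size * hidden_size + hidden_size * hidden_size + 2 * hidden_size)

def count_params(vocab_size, embedding_dim, hidden_dim, num_layers):
    """Break the model's trainable parameters down by component."""
    embedding = vocab_size * embedding_dim
    # The first GRU layer reads embeddings; deeper layers read the previous layer's hidden state.
    gru = gru_layer_params(embedding_dim, hidden_dim)
    gru += (num_layers - 1) * gru_layer_params(hidden_dim, hidden_dim)
    fc = hidden_dim * vocab_size + vocab_size  # weight matrix plus bias
    return {"embedding": embedding, "gru": gru, "fc": fc, "total": embedding + gru + fc}

counts = count_params(vocab_size=50257, embedding_dim=128, hidden_dim=256, num_layers=2)
for name, value in counts.items():
    print(f"{name:>9}: {value:,}")
# embedding: 6,432,896
#       gru: 691,200
#        fc: 12,916,049
#     total: 20,040,145
```

Over 96% of the parameters belong to the embedding and output layers, which scale with the 50,257-token vocabulary; the recurrent core itself is comparatively small.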


In [8]:
# Training Loop
loss_history = []
model.train()

print("Starting training...")
start_t = time.time()

for epoch in range(EPOCHS):
    
    total_loss = 0
    
    for x, y in dataloader:
        x, y = x.to(DEVICE), y.to(DEVICE)

        # Important: Since batches are shuffled and not sequential,
        # we must initialize a fresh hidden state for each batch.
        # We cannot carry over state from a previous unrelated batch.
        # A fresh zero tensor has no autograd history, so no detach() is needed.
        h = model.init_hidden(x.size(0))
        
        optimizer.zero_grad()
        
        output, h = model(x, h)

        # Target shape must be flat: (batch * seq)
        loss = criterion(output, y.view(-1))
        loss.backward()
        
        # Gradient clipping to prevent exploding gradients
        nn.utils.clip_grad_norm_(model.parameters(), GRAD_THRESH)

        if XLA_AVAILABLE:
            # XLA specific optimization step
            xm.optimizer_step(optimizer)
            xm.mark_step() # Signal end of computation step to XLA
        else:
            optimizer.step()
        
        total_loss += loss.item()
        
    avg_loss = total_loss / len(dataloader)
    loss_history.append(avg_loss)
    
    if (epoch+1) % 5 == 0:
        print(f"Epoch {epoch+1} | Loss: {avg_loss:.4f}")

print(f"Done. Time: {time.time()-start_t:.2f}s")

# Save the model
# Use xm.save if XLA is available, otherwise standard torch.save
save_model = xm.save if XLA_AVAILABLE else torch.save
save_model(model.state_dict(), "gru_subword_lm.pth")

# Plot loss history
#plt.figure(figsize=(8,5))
#plt.plot(loss_history)
#plt.title("Training Loss (Subword Level)")
#plt.show()


Starting training...
Epoch 5 | Loss: 0.3293
Epoch 10 | Loss: 0.0926
Done. Time: 164.51s
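
The training loop clips gradients with `nn.utils.clip_grad_norm_` at `GRAD_THRESH = 1.0`. The sketch below illustrates the underlying rule on plain Python lists, assuming the usual global-norm formulation (all gradients are scaled by `max_norm / (total_norm + eps)` whenever that factor is below 1). `clip_by_global_norm` is a hypothetical helper for illustration, not a PyTorch function.

```python
import math

def clip_by_global_norm(grads, max_norm, eps=1e-6):
    """Scale all gradient vectors so that their joint L2 norm is at most max_norm."""
    # The norm is taken over every gradient entry of every parameter at once.
    total_norm = math.sqrt(sum(g * g for vec in grads for g in vec))
    # Never scale up: small gradients pass through unchanged.
    scale = min(1.0, max_norm / (total_norm + eps))
    clipped = [[g * scale for g in vec] for vec in grads]
    return clipped, total_norm

large = [[3.0, 0.0], [4.0]]  # joint norm 5.0, above the threshold
clipped, norm_before = clip_by_global_norm(large, max_norm=1.0)
norm_after = math.sqrt(sum(g * g for vec in clipped for g in vec))
print(norm_before, round(norm_after, 4))

small = [[0.3], [0.4]]       # joint norm 0.5, below the threshold
unchanged, _ = clip_by_global_norm(small, max_norm=1.0)
print(unchanged)
```

Because every entry is multiplied by the same factor, clipping shortens the update step without changing its direction.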



## 7. Generation
 
We generate tokens one by one and decode them using the tokenizer's decoder.

**Process:**
1. Tokenize prompt string -> `[IDs]`
2. Feed `[IDs]` to GRU, get hidden state.
3. Predict next ID.
4. Append next ID to sequence, then repeat from step 3 until the requested length is reached.
5. Decode complete sequence -> String.
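
Step 3 ("predict next ID") is done by sampling from a temperature-scaled softmax, controlled by the `temp` argument of `generate_text`. The following sketch shows the effect of temperature on a made-up three-token logit vector using only the standard library; the helper names and the logits are illustrative assumptions.

```python
import math
import random

def softmax_with_temperature(logits, temp):
    """Turn logits into probabilities; temp < 1 sharpens, temp > 1 flattens."""
    scaled = [x / temp for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_token(logits, temp, rng):
    """Draw one token index according to the temperature-scaled distribution."""
    probs = softmax_with_temperature(logits, temp)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

toy_logits = [2.0, 1.0, 0.0]  # made-up scores for three candidate tokens
for temp in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(toy_logits, temp)
    print(temp, [round(p, 3) for p in probs])

rng = random.Random(0)  # fixed seed so repeated runs draw the same samples
draws = [sample_token(toy_logits, 0.5, rng) for _ in range(10)]
print(draws)
```

Lower temperatures concentrate probability on the top-scoring token, so generation sticks closer to the training text; higher temperatures spread probability out and produce more varied (and more error-prone) output.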

In [None]:
def generate_text(model, tokenizer, seed_text="The Lion", length=50, temp=0.8):
    model.eval()
    
    # 1. Encode Seed
    input_ids = tokenizer.encode(seed_text).ids
    input_tensor = torch.tensor(input_ids).unsqueeze(0).to(DEVICE) # (1, seq_len)
    
    # Init hidden
    hidden = model.init_hidden(1)
    
    generated_ids = input_ids.copy()
    
    with torch.no_grad():
        # Priming: run the whole seed through the model once.
        # logits has one row per seed token; the last row predicts the first new token,
        # so the seed's last token is not fed to the model a second time.
        logits, hidden = model(input_tensor, hidden)  # (seed_len, vocab_size)
        logits = logits[-1:]  # (1, vocab_size)

        for _ in range(length):
            # Sampling with temperature
            probs = torch.softmax(logits / temp, dim=1).squeeze(0)
            next_token_id = torch.multinomial(probs, 1).item()

            generated_ids.append(next_token_id)

            # Feed the sampled token back in to get the logits for the next step
            last_token = torch.tensor([[next_token_id]], device=DEVICE)
            logits, hidden = model(last_token, hidden)  # (1, vocab_size)
            
    # Decode back to text
    return tokenizer.decode(generated_ids)

# Test Generation Part

# Model Hyperparameters (must match training)
EMBEDDING_DIM = 128
HIDDEN_DIM = 256
NUM_LAYERS = 2
DROPOUT = 0.15

# Instantiate and load the model
model = GRUSubwordLM(vocab_size, EMBEDDING_DIM, HIDDEN_DIM, NUM_LAYERS, dropout=DROPOUT).to(DEVICE)
model.load_state_dict(torch.load("gru_subword_lm.pth", map_location=DEVICE))
model.eval()

print("--- Generated Text ---")
print(generate_text(model, tokenizer, seed_text="Title: The Loin", length=60, temp=0.5))

print("\n--- Generated Text ---")
print(generate_text(model, tokenizer, seed_text="Title: The Wolf and the Lamb.", length=60, temp=0.5))

print("\n--- Generated Text ---")
print(generate_text(model, tokenizer, seed_text="Title: Hickory Dickory Dock.", length=60, temp=0.5))

print("\n--- Generated Text ---")
print(generate_text(model, tokenizer, seed_text="Title: The Gruffalo", length=260, temp=0.5))

--- Generated Text ---
Title: The Loin a his favourite food is roasted fox."
"Roasted fox! I'm off!" Fox said.
"Goodbye, little mouse," and away he sped.

"Silly old Fox! Doesn't he know
There's no such thing as a Gruffalo?"



--- Generated Text ---
Title: The Wolf and the Lamb.
A Bat who fell upon the ground and was caught by a Weasel pleaded to be spared his life. 
The Weasel refused, saying that he was by nature the enemy of all birds. 
The Bat assured him that he was not a bird, but a mouse, and thus

--- Generated Text ---
Title: Hickory Dickory Dock.
Hickory dickory dock (Gently bounce baby to the beat)
The mouse ran up the clock (run your fingers from your baby's toes to their chin)
The clock struck two (clap twice)
The mouse went "boo!" (cover baby's eyes with

--- Generated Text ---
Title: The Gruffalo.
A mouse took a stroll through the deep dark wood.
A fox saw the mouse, and the mouse looked good.
"Where are you going to, litt