# 🧠 Tiny Untrained GPT: Notebook Demo

This notebook wires up a small, untrained GPT-style model to:
- Encode a couple of sample texts
- Run a forward pass to inspect logits
- Generate a few tokens autoregressively (expect nonsense: it's untrained!)

> Requirements:
> - `torch`
> - `tiktoken`
> - Local modules: `GPTModel.py`, `TransformerBlock.py`, `GPTConfigs.py` (in the notebook's directory, or on `sys.path`)


### Notes & Troubleshooting

- Outputs from an **untrained** model are incoherent by design. Training is required for meaningful text.
- Make sure `GPTModel.py`, `TransformerBlock.py`, and `GPTConfigs.py` are accessible to the notebook (same directory recommended).
- If you hit `ModuleNotFoundError` for local modules, ensure your working directory is correct or append the path with `sys.path.append('...')`.
- GPU usage is automatic if CUDA/MPS is available; otherwise, CPU is used.

**Common fixes**
- `pip install tiktoken` (or uncomment the `%pip install` line at the top of the first code cell).
- Restart kernel after installing new packages.
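- If you switch tokenizers, adjust the padding logic: the batching cell pads with ID 0, which in the GPT-2 encoding is a real token (`!`), not a dedicated pad token.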
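To diagnose the `ModuleNotFoundError` case, you can ask Python whether each local module is findable before importing it. This is a minimal sketch using `importlib.util.find_spec`; `MODULE_DIR` is a hypothetical placeholder for the folder holding the `.py` files and simply defaults to the current working directory here.

```python
import importlib.util
import sys
from pathlib import Path

# Hypothetical placeholder: point this at the folder that holds the local modules.
MODULE_DIR = Path.cwd()

# Make the folder importable (skipped if it is already on sys.path).
if str(MODULE_DIR) not in sys.path:
    sys.path.append(str(MODULE_DIR))

LOCAL_MODULES = ["GPTModel", "TransformerBlock", "GPTConfigs"]

# find_spec returns None when a top-level module cannot be located on sys.path.
missing = [name for name in LOCAL_MODULES if importlib.util.find_spec(name) is None]

if missing:
    print("Not importable:", missing, "- check MODULE_DIR or the working directory")
else:
    print("All local modules found")
```

If anything is listed as missing, fix the path before running the import cell.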


In [2]:
# If needed, install tiktoken (uncomment the next line)
# %pip install tiktoken --quiet

import os
import sys
import torch
import tiktoken

# Ensure local modules are importable if the notebook is not in the same dir
# (Optional) sys.path.append('/path/to/your/modules')

from GPTModel import GPTModel
from TransformerBlock import TransformerBlock
from GPTConfigs import GPT_CONFIG_124M

# Device selection (GPU if available)
device = (
    torch.device('cuda') if torch.cuda.is_available()
    else torch.device('mps') if hasattr(torch.backends, 'mps') and torch.backends.mps.is_available()
    else torch.device('cpu')
)
print('Using device:', device)




Using device: cpu




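**The core of autoregression:** at each step the running sequence of token IDs, shape (B, T), is cropped to its last `context_size` tokens and fed to the model. The logits at the final position, shape (B, vocab_size), select the next token, which is **appended** to the sequence before the next step.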
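The crop-and-append mechanics can be seen on a toy tensor without any model. In this sketch `context_size` is shrunk to 4 and the "predicted" IDs are hard-coded stand-ins (assumptions for illustration only), so each step shows exactly which tokens the model would receive.

```python
import torch

context_size = 4  # toy window; the notebook uses GPT_CONFIG_124M['context_length']
idx = torch.tensor([[10, 11, 12, 13, 14, 15]])  # (B=1, T=6) toy token IDs

for step, next_id in enumerate([99, 100]):
    idx_cond = idx[:, -context_size:]  # the model would only see these tokens
    print(f"step {step}: model input {idx_cond.tolist()}")
    # In the real loop the next ID comes from the last-position logits;
    # here it is a fixed stand-in so the mechanics are easy to follow.
    idx_next = torch.tensor([[next_id]])  # (B, 1)
    idx = torch.cat((idx, idx_next), dim=1)  # sequence grows by one token per step

print("final sequence:", idx.tolist())
```

The window slides forward by one token per step, while the full sequence keeps every token that was appended.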

In [13]:
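@torch.no_grad()  # decorator: disables autograd for the whole function
def generate_text_simple(model, idx, max_new_tokens, context_size):
    """
    Greedy (argmax) autoregressive generation.
    idx: LongTensor of shape (B, T) holding the prompt token IDs.
    Returns a LongTensor of shape (B, T + max_new_tokens).
    """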
    model.eval()
    for _ in range(max_new_tokens):
        # Keep only the last 'context_size' tokens
        idx_cond = idx[:, -context_size:]

        # Forward pass
        logits = model(idx_cond)  # (B, T, vocab_size)

        # Focus on the last time-step
        logits_last = logits[:, -1, :]  # (B, vocab_size)

        # Convert to probabilities and pick argmax
        probs = torch.softmax(logits_last, dim=-1)
        idx_next = torch.argmax(probs, dim=-1, keepdim=True)  # (B, 1)

        # Append to sequence
        idx = torch.cat((idx, idx_next), dim=1)
    return idx

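Two details of `generate_text_simple` are easy to check in isolation. First, a bare `torch.no_grad()` statement only constructs a context manager and discards it, so gradient tracking stays on; the decorator form (or a `with torch.no_grad():` block) is what actually disables it. Second, applying softmax before `argmax` does not change which token is chosen. The tensors `w`, `x`, and `logits` are illustrative stand-ins, not part of the notebook.

```python
import torch

w = torch.ones(3, requires_grad=True)  # stands in for model weights
x = torch.arange(3.0)

def bare_call(x):
    torch.no_grad()  # creates a context manager and throws it away: no effect
    return (w * x).sum()

@torch.no_grad()  # decorator form: the whole body runs with autograd disabled
def decorated(x):
    return (w * x).sum()

print(bare_call(x).requires_grad)  # True: an autograd graph was still recorded
print(decorated(x).requires_grad)  # False: no graph is built

# Softmax is monotonic, so greedy decoding picks the same index either way
logits = torch.tensor([[0.5, 2.0, -1.0]])
greedy_from_probs = torch.argmax(torch.softmax(logits, dim=-1), dim=-1)
greedy_from_logits = torch.argmax(logits, dim=-1)
print(torch.equal(greedy_from_probs, greedy_from_logits))  # True
```

Disabling autograd matters during generation because otherwise each forward pass stores intermediate activations for a backward pass that never happens.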

***Create the sample model (GPT-2, 124M configuration)***, move it to the selected device, and count its parameters.

In [4]:
# Build the model
torch.manual_seed(254)
model = GPTModel(GPT_CONFIG_124M).to(device)

# Count parameters
total_params = sum(p.numel() for p in model.parameters())
print(f'Total number of parameters: {total_params:,}')


Total number of parameters: 163,009,536
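The printed total is larger than the "124M" in the config name. One likely explanation, assuming the local `GPTModel` follows the common GPT-2 layout (bias-free Q/K/V projections, a biased attention output projection, a 4x feed-forward layer, two LayerNorms per block, a final LayerNorm, and a separate bias-free output head), is that the output head is not weight-tied to the token embedding. The sketch below counts parameters from assumed hyperparameters; every value except `context_length` is an assumption, and the layout is not read from `GPTModel.py`.

```python
# Assumed GPT-2 small hyperparameters; only 'context_length' appears in the
# notebook itself, the other keys and values are assumptions for this sketch.
cfg = {
    "vocab_size": 50257,
    "context_length": 1024,
    "emb_dim": 768,
    "n_layers": 12,
    "qkv_bias": False,
}

def count_params(cfg, tie_weights=False):
    d, v = cfg["emb_dim"], cfg["vocab_size"]
    hidden = 4 * d                                      # feed-forward expansion
    qkv = 3 * (d * d + (d if cfg["qkv_bias"] else 0))   # W_q, W_k, W_v
    attn_out = d * d + d                                # output projection with bias
    ff = (d * hidden + hidden) + (hidden * d + d)       # two linear layers with bias
    norms = 2 * (2 * d)                                 # two LayerNorms (scale + shift)
    per_block = qkv + attn_out + ff + norms

    embeddings = v * d + cfg["context_length"] * d      # token + positional
    final_norm = 2 * d
    out_head = 0 if tie_weights else v * d              # bias-free projection to vocab
    return embeddings + cfg["n_layers"] * per_block + final_norm + out_head

untied = count_params(cfg)
tied = count_params(cfg, tie_weights=True)
print(f"untied output head: {untied:,}")  # 163,009,536
print(f"tied output head:   {tied:,}")    # 124,412,160
```

Under these assumptions the untied count matches the notebook's output exactly, and subtracting the 50,257 × 768 output head gives the familiar ~124M figure.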


In [10]:
tokenizer = tiktoken.get_encoding("gpt2")

# Batch encode a couple of samples & forward pass
text1 = 'Every effort moves you'
text2 = 'Every day holds a'

# Encode
batch = [
    torch.tensor(tokenizer.encode(text1), dtype=torch.long),
    torch.tensor(tokenizer.encode(text2), dtype=torch.long),
]

# Pad to the max length (use 0 as padding id for simplicity)
max_len = max(x.size(0) for x in batch)
padded = []
for x in batch:
    if x.size(0) < max_len:
        pad = torch.zeros(max_len - x.size(0), dtype=torch.long)
        x = torch.cat([x, pad], dim=0)
    padded.append(x)

batch_tensor = torch.stack(padded, dim=0).to(device)  # (B, T)
print('Input batch shape:', batch_tensor.shape)
print('Input batch (token IDs):', batch_tensor)

# Forward pass
with torch.no_grad():
    out = model(batch_tensor)  # (B, T, vocab_size)

print('Output logits shape:', out.shape)


Input batch shape: torch.Size([2, 4])
Input batch (token IDs): tensor([[6109, 3626, 6100,  345],
        [6109, 1110, 6622,  257]])
Output logits shape: torch.Size([2, 4, 50257])
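In this run both prompts encode to four tokens, so the padding branch never fires. For prompts of different lengths, the manual loop can be replaced by `torch.nn.utils.rnn.pad_sequence`, which right-pads a list of 1-D tensors to a common length. The token IDs below are illustrative, the pad ID 0 mirrors the notebook's choice, and the random `logits` tensor is a hypothetical stand-in for a model output.

```python
import torch
from torch.nn.utils.rnn import pad_sequence

# Illustrative token IDs of different lengths (B=2)
seqs = [
    torch.tensor([6109, 3626, 6100, 345], dtype=torch.long),
    torch.tensor([6109, 1110], dtype=torch.long),
]

# Right-pad to the longest sequence with ID 0, matching the notebook's choice
batch_tensor = pad_sequence(seqs, batch_first=True, padding_value=0)  # (B, T_max)
print(batch_tensor)

# With right padding and causal attention, real positions never attend to pads,
# but the last column is a pad for shorter rows, so track each row's length.
lengths = torch.tensor([s.size(0) for s in seqs])
last_real = lengths - 1  # index of the last real token per row

# Stand-in logits (B, T_max, vocab=10); pick each row's last real position
logits = torch.randn(2, batch_tensor.size(1), 10)
last_logits = logits[torch.arange(len(seqs)), last_real]  # (B, vocab)
print(last_logits.shape)
```

Taking `logits[:, -1, :]` on a right-padded batch would read a pad position for the shorter rows; indexing by `last_real` avoids that.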


In [11]:

# Simple generation from a start prompt (expect incoherent output: the model is untrained)
start_context = 'Hello, I am'
encoded = tokenizer.encode(start_context)
print('Encoded:', encoded)

encoded_tensor = torch.tensor(encoded, dtype=torch.long).unsqueeze(0).to(device)  # (1, T)
print('Encoded tensor shape:', encoded_tensor.shape)

model.eval()
out_tokens = generate_text_simple(
    model=model,
    idx=encoded_tensor,
    max_new_tokens=6,
    context_size=GPT_CONFIG_124M['context_length'],
)

print('Generated token IDs:', out_tokens)
print('Sequence length:', out_tokens.shape[1])

decoded_text = tokenizer.decode(out_tokens.squeeze(0).tolist())
print('Decoded text:', decoded_text)


Encoded: [15496, 11, 314, 716]
Encoded tensor shape: torch.Size([1, 4])
Generated token IDs: tensor([[15496,    11,   314,   716, 23200, 43056, 37848,  9157, 35539,  9338]])
Sequence length: 10
Decoded text: Hello, I am suburbabidingswers provenGroundERE
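To see only the continuation the model produced, slice off the prompt. This sketch hard-codes the IDs printed above so it runs on its own; in the notebook, `out_tokens` and `encoded_tensor.shape[1]` can be used directly.

```python
import torch

# Token IDs copied from the generation output above (one prompt row)
out_tokens = torch.tensor([[15496, 11, 314, 716, 23200, 43056, 37848, 9157, 35539, 9338]])
prompt_len = 4  # encoded_tensor.shape[1] in the notebook

# Everything after the prompt is what the model produced
new_tokens = out_tokens[:, prompt_len:]
print(new_tokens.tolist())  # [[23200, 43056, 37848, 9157, 35539, 9338]]
print(new_tokens.shape[1])  # 6, matching max_new_tokens
```

In the notebook, `tokenizer.decode(new_tokens[0].tolist())` then yields just the generated text, and `[tokenizer.decode([t]) for t in new_tokens[0].tolist()]` shows how each token contributes to it.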
