<a href="https://colab.research.google.com/github/neelsoumya/intro_to_LMMs/blob/main/LLM_demo_tokenlevel_inspection.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Lesson 1 — LLM demo & token-level inspection
This notebook loads a small causal language model (`distilgpt2`), shows how text is tokenized, inspects the model's logits to compute next-token probabilities, and runs a simple generation loop that prints the top candidate tokens and their probabilities at each step.

**Notes**
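- Run the installation cell first, then run the remaining cells in order. The helper cells depend on the `tokenizer` and `model` objects created in the loading cell and raise `NameError` if that cell has not been run.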
- Optional: Runtime → Change runtime type → GPU for faster generation.


In [1]:
# Colab cell: install dependencies (run once)
!pip install -q transformers torch


In [2]:
# Colab cell: imports and helper utilities
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
import torch.nn.functional as F

device = "cuda" if torch.cuda.is_available() else "cpu"
print("Using device:", device)


Using device: cpu


In [2]:
# Colab cell: load tokenizer and model (small, fast)
MODEL = "distilgpt2"  # small GPT-like model useful for demos
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, output_attentions=False).to(device)
model.eval()


In [3]:
# Colab cell: helper to show top-k next-token probs
def topk_next_token_probs(text, k=8):
    # Tokenize
    enc = tokenizer(text, return_tensors="pt").to(device)
    input_ids = enc["input_ids"]
    with torch.no_grad():
        outputs = model(input_ids)
        # logits shape: (batch, seq_len, vocab)
        logits = outputs.logits
    last_logits = logits[0, -1]  # logits at the last position: scores for the next token
    probs = F.softmax(last_logits, dim=-1)
    topk = torch.topk(probs, k)
    tokens = [tokenizer.decode([int(t)]) for t in topk.indices]
    scores = [float(s) for s in topk.values]
    return list(zip(tokens, scores))

# Example
prompt = "In the near future, scientists discovered"
print("Prompt:", prompt)
for token, score in topk_next_token_probs(prompt, k=10):
    print(f"{token!r}: {score:.4f}")


Prompt: In the near future, scientists discovered



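To make the softmax and top-k steps in `topk_next_token_probs` concrete, here is a minimal, dependency-free sketch. The four-token vocabulary and the logit values are made up for illustration and are not `distilgpt2` output. The names `softmax`, `toy_vocab`, `toy_logits`, and `ranked` are introduced here and do not appear elsewhere in the notebook.

```python
import math

def softmax(logits):
    """Convert raw scores (logits) into probabilities that sum to 1."""
    # Subtracting the max logit avoids overflow and does not change the result.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical 4-token vocabulary with made-up logits (not model output).
toy_vocab = [" that", " a", " the", " new"]
toy_logits = [2.0, 1.0, 0.5, -1.0]

toy_probs = softmax(toy_logits)

# Rank tokens by probability, highest first, and keep the top 2.
ranked = sorted(zip(toy_vocab, toy_probs), key=lambda pair: pair[1], reverse=True)
for token, p in ranked[:2]:
    print(f"{token!r}: {p:.4f}")
```

The helper above follows the same steps over the model's full vocabulary, using `F.softmax` and `torch.topk` instead of the hand-written versions.
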
In [4]:
# Colab cell: step-by-step generation with probabilities (greedy decoding)
def generate_stepwise(prompt, max_new_tokens=10):
    input_ids = tokenizer(prompt, return_tensors="pt").to(device)["input_ids"]
    generated = input_ids.clone()
    for step in range(max_new_tokens):
        with torch.no_grad():
            outputs = model(generated)
            logits = outputs.logits
        last_logits = logits[0, -1]
        probs = F.softmax(last_logits, dim=-1)
        topk = torch.topk(probs, 5)
        # Show top-5 choices for this step
        choices = [(tokenizer.decode([int(idx)]), float(prob)) for idx, prob in zip(topk.indices, topk.values)]
        print(f"\nStep {step+1} — top 5 candidates:")
        for tkn, p in choices:
            print(f"  {tkn!r}: {p:.4f}")
        # greedy next token (for demo stability)
        next_token = torch.argmax(last_logits).unsqueeze(0).unsqueeze(0)
        generated = torch.cat([generated, next_token], dim=1)
    out = tokenizer.decode(generated[0], skip_special_tokens=True)
    return out

# Run demo generation
print("\n=== Generation Demo ===")
print(generate_stepwise("The study shows that", max_new_tokens=8))



=== Generation Demo ===



## Exercises / classroom prompts
1. Change the prompt and observe how top-k probabilities change.  
2. Replace greedy selection with sampling (use `torch.multinomial`) to see varied outputs.  
3. Print token ids and decoded tokens for the input to see how tokenization works:
   - `tokenizer.encode(prompt)` and `tokenizer.decode([id1, id2, ...])`.
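
As a starting point for exercise 2, the sketch below shows one way to draw the next token with `torch.multinomial` instead of `torch.argmax`. It runs on made-up toy logits so that it works without loading the model. The helper name `sample_next_token` and its `temperature` parameter are assumptions added for illustration and are not part of the original notebook.

```python
import torch
import torch.nn.functional as F

def sample_next_token(last_logits, temperature=1.0):
    """Sample one token id from the softmax distribution over last_logits."""
    # Temperature < 1 sharpens the distribution; temperature > 1 flattens it.
    probs = F.softmax(last_logits / temperature, dim=-1)
    # torch.multinomial draws indices in proportion to their probability.
    next_id = torch.multinomial(probs, num_samples=1)  # shape: (1,)
    return next_id.unsqueeze(0)  # shape: (1, 1), ready for torch.cat on dim=1

# Toy logits over a 5-token vocabulary (made up, not model output).
torch.manual_seed(0)
toy_logits = torch.tensor([2.0, 1.0, 0.5, 0.0, -1.0])

# Draw many samples to see that likelier tokens are picked more often.
draws = [int(sample_next_token(toy_logits)) for _ in range(200)]
counts = torch.bincount(torch.tensor(draws), minlength=5)
print(counts.tolist())
```

Inside `generate_stepwise`, the greedy line could then be swapped for `next_token = sample_next_token(last_logits)`. Repeated runs without a fixed seed should then produce different continuations.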
