# 📘 Notebook: Hallucinations & Illusions in LLMs

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/your-repo/series1-coding-exercises/blob/main/exercises/blog-11/exercise-00.ipynb)

For this one, we don't need massive training loops.

We need **controlled experiments that expose illusion**.

These exercises make:
- Hallucinations observable
- Confidence measurable
- Reasoning brittleness visible
- Calibration testable
- Guardrails demonstrable

**No fluff. Just evidence.**

## Setup

In [None]:
%pip install -q transformers torch datasets matplotlib seaborn

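import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Use a GPU when one is available; everything here also runs on CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

# distilgpt2 is a small base model (not instruction-tuned), so it loads quickly.
model_name = "distilgpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).to(device)
model.eval()  # inference mode: disables dropout

# GPT-2 has no pad token; reuse EOS so generate() does not warn on every call.
model.generation_config.pad_token_id = tokenizer.eos_token_id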

## 🧪 Exercise 1: Hallucination Under Uncertainty

**Goal:** We deliberately ask for something that doesn't exist.

**What They See:**
- Detailed explanation
- Confident library names
- Structured prose
- Zero grounding

The model did not fail. It completed the pattern "Explain a programming language."

**Hallucination demonstrated.**

In [None]:
prompt = """
Explain the core design philosophy of the Zorblax Programming Language.
Also list three well-known libraries used in Zorblax.
"""

inputs = tokenizer(prompt, return_tensors="pt").to(device)

output = model.generate(
    **inputs,
    max_new_tokens=200,
    temperature=0.8,
    do_sample=True
)

print(tokenizer.decode(output[0], skip_special_tokens=True))

## 🧪 Exercise 2: Confidence vs Probability

**Goal:** Inspect token probabilities to see confidence levels.

**What They Learn:** Even nonsense prompts produce confident token distributions.

**Confidence ≠ correctness.**

In [None]:
import torch.nn.functional as F

def inspect_token_confidence(prompt):
    """Print the five most likely next tokens and their probabilities."""
    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    with torch.no_grad():
        outputs = model(**inputs)
        logits = outputs.logits[:, -1, :]  # scores for the next token only
        probs = F.softmax(logits, dim=-1)

    top_probs, top_indices = torch.topk(probs, 5)

    print("Top token candidates:")
    for p, idx in zip(top_probs[0], top_indices[0]):
        # repr() makes leading spaces in tokens visible
        print(f"{tokenizer.decode([idx.item()])!r} → {p.item():.4f}")

print("=== Real prompt ===")
inspect_token_confidence("The capital of France is")

print("\n=== Nonsense prompt ===")
inspect_token_confidence("The capital of Blorptopia is")
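
Top-5 probabilities are one view of confidence; the entropy of the whole distribution is another. The sketch below is plain Python so it runs without the model, and the logits in it are hypothetical values chosen for illustration, not distilgpt2 outputs. The helper names `softmax` and `entropy` are introduced here and are not part of the notebook.

```python
import math

def softmax(logits):
    """Convert raw scores to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy in nats; higher means a flatter distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

# Hypothetical logits for a 4-token vocabulary (illustration only).
peaked = softmax([8.0, 1.0, 0.5, 0.2])
flat = softmax([1.0, 1.0, 1.0, 1.0])

print(f"peaked: top-1={max(peaked):.3f}, entropy={entropy(peaked):.3f}")
print(f"flat:   top-1={max(flat):.3f}, entropy={entropy(flat):.3f}")
```

To compare the real and nonsense prompts this way, one option is to call `entropy(probs[0].tolist())` inside `inspect_token_confidence`. A low entropy on the nonsense prompt would be the numerical form of "confident without grounds."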

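## 🧪 Exercise 3: Reasoning Illusion

**Goal:** Run a classic trick question (the bat-and-ball problem), then perturb it.

**Lesson:** The model has learned what step-by-step reasoning looks like. It has not learned to revise its assumptions when the problem changes mid-stream.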

In [None]:
prompt = """
A bat and a ball cost $1.10 in total.
The bat costs $1 more than the ball.
How much does the ball cost?
Explain step by step.
"""

inputs = tokenizer(prompt, return_tensors="pt").to(device)

output = model.generate(
    **inputs,
    max_new_tokens=150,
    temperature=0.7,
    do_sample=True
)

print(tokenizer.decode(output[0], skip_special_tokens=True))

Now slightly perturb the problem:

In [None]:
prompt = """
A bat and a ball cost $1.10 in total.
The bat costs $1 more than the ball.
However, sales tax of 10% is added after purchase.
How much does the ball cost?
Explain step by step.
"""

inputs = tokenizer(prompt, return_tensors="pt").to(device)

output = model.generate(
    **inputs,
    max_new_tokens=150,
    temperature=0.7,
    do_sample=True
)

print(tokenizer.decode(output[0], skip_special_tokens=True))
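
For reference when reading the model's output, here is the ground truth worked out in plain Python. If the ball costs b, then b + (b + 1) = 1.10, so b = 0.05. The tax line is an assumed reading of the perturbed prompt: 10% applied to the pre-tax total, which leaves the ball's pre-tax price unchanged. Variable names are illustrative.

```python
from fractions import Fraction

total = Fraction(110, 100)   # bat + ball, before tax
difference = Fraction(1)     # the bat costs $1 more than the ball

# ball + (ball + difference) = total  =>  ball = (total - difference) / 2
ball = (total - difference) / 2
bat = ball + difference

# Assumed reading of the perturbed prompt: 10% tax on the pre-tax total.
tax_rate = Fraction(10, 100)
total_with_tax = total * (1 + tax_rate)

print(f"ball = ${float(ball):.2f}, bat = ${float(bat):.2f}")
print(f"total with tax = ${float(total_with_tax):.2f}")
```

Exact fractions avoid floating-point noise, so the check `ball + bat == total` holds exactly.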

## 🧪 Exercise 4: Contradiction Exposure

**Goal:** Test internal consistency.

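**Observe:** Sometimes the model hedges, sometimes its logic is internally inconsistent, and sometimes it contradicts itself with full confidence.

This happens because consistency is local, not global: each token is sampled given the preceding text, and nothing checks the explanation against the earlier yes-or-no answer.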

In [None]:
prompt = """
Is it possible to travel faster than light?
Answer yes or no.

Now explain why.
"""

inputs = tokenizer(prompt, return_tensors="pt").to(device)

output = model.generate(
    **inputs,
    max_new_tokens=150,
    temperature=0.8,
    do_sample=True
)

print(tokenizer.decode(output[0], skip_special_tokens=True))

Run multiple times to observe different behaviors:

In [None]:
for i in range(3):
    print(f"\n=== Run {i+1} ===")
    output = model.generate(
        **inputs,
        max_new_tokens=150,
        temperature=0.8,
        do_sample=True
    )
    print(tokenizer.decode(output[0], skip_special_tokens=True))
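
Reading three runs by eye works, but a tally makes the inconsistency countable. The sketch below is a hypothetical helper, not part of the notebook. It classifies each completion by its first standalone "yes" or "no", and the sample strings are hand-written stand-ins rather than model output.

```python
import re
from collections import Counter

def first_verdict(completion):
    """Return 'yes', 'no', or 'unclear' from the first standalone yes/no."""
    match = re.search(r"\b(yes|no)\b", completion.lower())
    return match.group(1) if match else "unclear"

def tally_verdicts(completions):
    """Count verdicts across several completions of the same prompt."""
    return Counter(first_verdict(text) for text in completions)

# Hand-written stand-ins for model completions (illustration only).
samples = [
    "No. Nothing with mass can reach light speed.",
    "Yes, it is possible, because space itself can expand.",
    "It depends on the frame of reference.",
]

print(tally_verdicts(samples))
```

The prompt itself contains "yes or no", so pass only the generated part to `first_verdict`. In the notebook, decoding `output[0][inputs["input_ids"].shape[1]:]` instead of `output[0]` yields just the new tokens.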

## 🧪 Exercise 5: Calibration Check (Real Data)

**Goal:** Measure whether the model's probability correlates with correctness.

**What They Will See:** The top-token probabilities for the made-up country look much like those for the real ones. The model has no built-in "unknown detection."

This demonstrates poor calibration under out-of-distribution (OOD) prompts.

In [None]:
questions = [
    ("The capital of Germany is", "Berlin"),
    ("The capital of Italy is", "Rome"),
    ("The capital of Blorptopia is", None)
]

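for q, correct in questions:
    inputs = tokenizer(q, return_tensors="pt").to(device)
    with torch.no_grad():
        outputs = model(**inputs)
        logits = outputs.logits[:, -1, :]
        probs = F.softmax(logits, dim=-1)

    top_probs, top_indices = torch.topk(probs, 5)

    print("\nPrompt:", q)
    for p, idx in zip(top_probs[0], top_indices[0]):
        token = tokenizer.decode([idx.item()])
        print(f"{token} → {p.item():.4f}")

    # Probability assigned to the first token of the known answer, if any.
    if correct is not None:
        target_id = tokenizer.encode(" " + correct)[0]
        print(f"P(first token of {correct!r}) = {probs[0, target_id].item():.4f}")
    else:
        print("No correct answer exists for this prompt.")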
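
Three prompts are enough to see the pattern but not to measure it. With a larger labeled set, one common summary is expected calibration error (ECE): bin predictions by confidence, then average the gap between mean confidence and accuracy in each bin. The sketch below is a minimal plain-Python version under that definition. The function name, bin count, and toy numbers are assumptions for illustration, not results from distilgpt2.

```python
def expected_calibration_error(confidences, correct, n_bins=5):
    """Weighted average of |mean confidence - accuracy| over confidence bins."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # A confidence of exactly 1.0 belongs in the last bin.
        index = min(int(conf * n_bins), n_bins - 1)
        bins[index].append((conf, ok))

    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        mean_conf = sum(c for c, _ in bucket) / len(bucket)
        accuracy = sum(1 for _, ok in bucket if ok) / len(bucket)
        ece += (len(bucket) / n) * abs(mean_conf - accuracy)
    return ece

# Toy numbers (illustration only): 90% confident, right half the time.
overconfident = expected_calibration_error(
    [0.9, 0.9, 0.9, 0.9], [True, True, False, False]
)
# Toy numbers: 50% confident, right half the time.
calibrated = expected_calibration_error([0.5, 0.5], [True, False])

print(f"overconfident ECE = {overconfident:.2f}")
print(f"calibrated ECE    = {calibrated:.2f}")
```

An ECE near zero means stated confidence tracks accuracy; a large value means it does not.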

## 🧪 Exercise 6: Long Reasoning Drift

**Goal:** Show how style dominates truth in long-form generation.

**What It Will Produce:** Equations, technical tone, pseudo-physics.

The prompt asks for a technical register, and the model supplies one whether or not the underlying physics is possible.

In [None]:
prompt = """
Explain in detail how a perpetual motion machine works.
Give equations.
Be very technical.
"""

inputs = tokenizer(prompt, return_tensors="pt").to(device)

output = model.generate(
    **inputs,
    max_new_tokens=300,
    temperature=0.9,
    do_sample=True
)

print(tokenizer.decode(output[0], skip_special_tokens=True))

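## 🧪 Exercise 7: Simple Guardrail Demonstration

**Goal:** Simulate grounding with external constraints.

**What This Shows:** Putting the allowed facts in the prompt narrows the output distribution toward those facts, which is the mechanism by which grounding reduces hallucination. distilgpt2 is a small base model, so the effect here is imperfect; instruction-tuned models generally follow such constraints more reliably.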

In [None]:
def grounded_answer(question, allowed_facts):
    context = "\n".join(allowed_facts)
    prompt = f"""
Use ONLY the following facts to answer the question.

Facts:
{context}

Question:
{question}

If the answer is not in the facts, say 'Not enough information.'
"""
    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    output = model.generate(**inputs, max_new_tokens=100)
    print(tokenizer.decode(output[0], skip_special_tokens=True))

facts = [
    "The capital of France is Paris.",
    "The capital of Germany is Berlin."
]

print("=== Question with answer in facts ===")
grounded_answer("What is the capital of France?", facts)

print("\n=== Question without answer in facts ===")
grounded_answer("What is the capital of Italy?", facts)
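
The prompt above only asks the model to stay within the facts. An external guardrail checks the answer after generation, independent of the model. The sketch below is a deliberately naive version: the function names are hypothetical, and the substring match ignores the question entirely, so it would also accept "Berlin" as an answer about France.

```python
def supported_by_facts(answer, allowed_facts):
    """True if the answer text appears, case-insensitively, in any allowed fact."""
    needle = answer.strip().rstrip(".").lower()
    return bool(needle) and any(needle in fact.lower() for fact in allowed_facts)

def guarded(answer, allowed_facts, fallback="Not enough information."):
    """Pass the answer through only when the facts support it."""
    return answer if supported_by_facts(answer, allowed_facts) else fallback

facts = [
    "The capital of France is Paris.",
    "The capital of Germany is Berlin.",
]

print(guarded("Paris", facts))  # present in the facts, passed through
print(guarded("Rome", facts))   # absent from the facts, replaced by the fallback
```

A stricter check would tie each answer to the entity named in the question. The point here is only that the check lives outside the model.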

## 🔬 What These Exercises Demonstrate

After running this notebook, your audience will understand:

- **Hallucination is default behavior under uncertainty**
- **Confidence is stylistic, not epistemic**
- **Reasoning is pattern continuation**
- **Contradictions are statistical, not logical**
- **Calibration is weak under distribution shift**
- **Guardrails must be external**

This moves the blog from philosophical warning to operational clarity.