# 🧠 Advanced QA Lab: Using BERT and T5 for Question Answering

In this lab, you'll explore two powerful approaches for Question Answering (QA):
- **Extractive QA** using BERT
- **Generative QA** using FLAN-T5

You'll:
- Analyze tokenization
- Run manual QA with both models
- Investigate hallucinations
- Visualize model limitations
- Solve real-world examples



### 📦 1. Setup


In [None]:
!pip install transformers datasets --quiet


In [None]:
from transformers import (
    AutoTokenizer,
    AutoModelForQuestionAnswering,
    AutoModelForSeq2SeqLM,
    pipeline
)
import torch
import numpy as np


## 🔍 How Pretrained QA Models Work

### 1. Extractive QA (BERT)
- Input: `question + context`
- Output: Start and end indices of answer **within the context**
- Model: `AutoModelForQuestionAnswering`

### 2. Generative QA (T5)
- Input: `"question: ... context: ..."`
- Output: A generated answer from the model
- Model: `AutoModelForSeq2SeqLM`

---
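
To make the extractive output concrete, here is a minimal sketch in plain Python. The tokens and logits are invented for illustration (they are not model output); the point is only that the answer is the token span between the highest-scoring start position and the highest-scoring end position.

```python
# Toy illustration of extractive span selection.
# The tokens and logits below are made up for this sketch, not produced by a model.
tokens = ["[CLS]", "who", "wrote", "it", "?", "[SEP]",
          "ada", "lovelace", "wrote", "it", ".", "[SEP]"]
start_logits = [0.1, -1.0, -1.2, -0.8, -2.0, -3.0, 4.2, 1.1, -0.5, -0.9, -1.5, -3.0]
end_logits = [0.0, -1.1, -0.9, -1.0, -2.2, -3.1, 0.7, 3.9, -0.2, -0.4, -1.0, -2.8]


def argmax(values):
    """Return the index of the largest value."""
    return max(range(len(values)), key=lambda i: values[i])


start = argmax(start_logits)  # position where the answer most likely begins
end = argmax(end_logits)      # position where the answer most likely ends
answer_tokens = tokens[start:end + 1]  # the slice is end-inclusive, hence + 1
print(" ".join(answer_tokens))  # prints: ada lovelace
```

A generative model has no such span: it emits new tokens one at a time, so its answer is not guaranteed to appear in the context.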


In [None]:
bert_model_name = "distilbert-base-uncased-distilled-squad"
bert_tokenizer = AutoTokenizer.from_pretrained(bert_model_name)
bert_model = AutoModelForQuestionAnswering.from_pretrained(bert_model_name)

t5_model_name = "google/flan-t5-small"
t5_tokenizer = AutoTokenizer.from_pretrained(t5_model_name)
t5_model = AutoModelForSeq2SeqLM.from_pretrained(t5_model_name)



## 🔤 Tokenizer Internals

Understanding how text is split into tokens is crucial for QA models.

We’ll explore how BERT tokenizes a `question + context` pair and aligns it to the original text.

---


In [None]:
question = "What is photosynthesis?"
context = "Photosynthesis is the process by which green plants use sunlight to produce energy."

inputs = bert_tokenizer(question, context, return_offsets_mapping=True, return_tensors="pt")
tokens = bert_tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])

for tok, offset in zip(tokens, inputs["offset_mapping"][0]):
    print(f"{tok:15s} → {offset}")


[CLS]           → tensor([0, 0])
what            → tensor([0, 4])
is              → tensor([5, 7])
photos          → tensor([ 8, 14])
##yn            → tensor([14, 16])
##thesis        → tensor([16, 22])
?               → tensor([22, 23])
[SEP]           → tensor([0, 0])
photos          → tensor([0, 6])
##yn            → tensor([6, 8])
##thesis        → tensor([ 8, 14])
is              → tensor([15, 17])
the             → tensor([18, 21])
process         → tensor([22, 29])
by              → tensor([30, 32])
which           → tensor([33, 38])
green           → tensor([39, 44])
plants          → tensor([45, 51])
use             → tensor([52, 55])
sunlight        → tensor([56, 64])
to              → tensor([65, 67])
produce         → tensor([68, 75])
energy          → tensor([76, 82])
.               → tensor([82, 83])
[SEP]           → tensor([0, 0])
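
The offsets are what let you map a token span back to the original string, with its original casing. The sketch below copies the context offsets from the printout above into a plain list (indices here count context tokens only, starting at the first `photos`); `span_text` is a hypothetical helper name.

```python
context = "Photosynthesis is the process by which green plants use sunlight to produce energy."

# (start_char, end_char) for each context token, copied from the printed offset mapping.
context_offsets = [
    (0, 6), (6, 8), (8, 14), (15, 17), (18, 21), (22, 29), (30, 32), (33, 38),
    (39, 44), (45, 51), (52, 55), (56, 64), (65, 67), (68, 75), (76, 82), (82, 83),
]


def span_text(text, offsets, first_token, last_token):
    """Return the original substring covered by tokens first_token..last_token (inclusive)."""
    start_char = offsets[first_token][0]
    end_char = offsets[last_token][1]
    return text[start_char:end_char]


# The three word pieces photos / ##yn / ##thesis map back to one word.
print(span_text(context, context_offsets, 0, 2))  # prints: Photosynthesis
print(span_text(context, context_offsets, 4, 5))  # prints: the process
```

Slicing the original text this way avoids the lowercased, re-spaced output you get from decoding token IDs.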


### 🧪 Exercise: Token Boundaries
1. What token separates question and context?
2. What do `[CLS]` and `[SEP]` represent?

---


In [None]:
# Hint: look up BERT's special tokens, then write your answers here.

## 🧠 Extractive QA with BERT

We'll manually run inference to extract the answer span from the context.

---


In [None]:
# Encode the pair as token IDs plus an attention mask (the mask lets the model ignore padding)
inputs = bert_tokenizer(question, context, return_tensors="pt")
with torch.no_grad():
    outputs = bert_model(**inputs)
# Index of the highest-scoring start token (where the answer begins)
start = torch.argmax(outputs.start_logits)
# Index of the highest-scoring end token (where the answer ends)
end = torch.argmax(outputs.end_logits)
# The slice is end-inclusive, hence end + 1
answer = bert_tokenizer.decode(inputs["input_ids"][0][start:end+1])
print("📘 BERT Extracted Answer:", answer)


📘 BERT Extracted Answer: photosynthesis is the process by which green plants use sunlight to produce energy. [SEP]
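
The span above runs all the way to the final `[SEP]` because the start and end positions are chosen independently. One common remedy is to score start/end pairs jointly, restrict them to context tokens, and cap the answer length. The sketch below shows that search on invented logits; `best_span`, `context_start`, `context_end`, and `max_answer_len` are illustrative names, not part of the model's API.

```python
def best_span(start_logits, end_logits, context_start, context_end, max_answer_len=15):
    """Return (start, end, score) maximising start_logit + end_logit, subject to
    context_start <= start <= end <= context_end and a maximum span length."""
    best = None
    for s in range(context_start, context_end + 1):
        last = min(context_end, s + max_answer_len - 1)
        for e in range(s, last + 1):
            score = start_logits[s] + end_logits[e]
            if best is None or score > best[2]:
                best = (s, e, score)
    return best


# Invented logits for a 10-token input laid out as: [CLS] q q [SEP] c c c c c [SEP]
start_logits = [0.0, -2.0, -2.0, -3.0, 1.0, 3.0, 0.5, -1.0, -1.0, 0.2]
end_logits = [0.0, -2.0, -2.0, -3.0, -1.0, 0.5, 2.5, 0.0, -0.5, 4.0]

# Independent argmax picks position 9, the trailing [SEP].
naive_end = max(range(len(end_logits)), key=lambda i: end_logits[i])

# The constrained search only considers context tokens 4..8.
s, e, score = best_span(start_logits, end_logits, context_start=4, context_end=8)
print(naive_end, (s, e))  # prints: 9 (5, 6)
```

With real model outputs you would pass `outputs.start_logits[0].tolist()` and the matching end logits, and derive the context boundaries from the position of the first `[SEP]`.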


### 🧪 Challenge: Use BERT on Your Own Paragraph

1. Write your own context (2–3 sentences).
2. Write a question.
3. Use BERT to extract the answer.

---


In [None]:
# Start your work here

## ✨ Generative QA with T5

We now feed the model a structured prompt and let it **generate** the answer.

---


In [None]:
prompt = f"question: {question} context: {context}"
inputs = t5_tokenizer(prompt, return_tensors="pt")
outputs = t5_model.generate(**inputs)

print("📝 T5 Generated Answer:", t5_tokenizer.decode(outputs[0], skip_special_tokens=True))


📝 T5 Generated Answer: green plants use sunlight to produce energy
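
Since the prompt is just a string, it helps to build the variants programmatically when comparing them. The following is a small sketch; `build_prompt` is a hypothetical helper, and the `question:` / `context:` labels simply follow the format used in the cell above.

```python
def build_prompt(question, context=None, context_first=False):
    """Assemble a QA prompt string (hypothetical helper for this lab)."""
    question_part = f"question: {question}"
    if not context:
        # No context: the model can only rely on what it memorised in training.
        return question_part
    context_part = f"context: {context}"
    parts = [context_part, question_part] if context_first else [question_part, context_part]
    return " ".join(parts)


q = "What is photosynthesis?"
c = "Photosynthesis is the process by which green plants use sunlight to produce energy."

variants = {
    "question_first": build_prompt(q, c),
    "context_first": build_prompt(q, c, context_first=True),
    "no_context": build_prompt(q),
}
for name, prompt in variants.items():
    print(f"{name}: {prompt}")
```

Each string can then be tokenized and passed to `t5_model.generate` exactly as in the cell above.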


### 🧪 Exercise: Prompt Design

1. Modify the prompt to make the question more detailed.
2. Try putting the context before the question.
3. Try removing the context — how does T5 behave?

---


In [None]:
# Try it here

## 🌍 Real-World QA Example

Let's use a paragraph from a news-like text and ask questions with both models.

---


In [None]:
article = "Apple announced iPhone 17 on September 10, 2025. Shipping begins on September 20."

question = "When will iPhone 17 ship?"

# BERT
inputs = bert_tokenizer(question, article, return_tensors="pt")
with torch.no_grad():
    outputs = bert_model(**inputs)

start = torch.argmax(outputs.start_logits)
end = torch.argmax(outputs.end_logits) + 1
print("📘 BERT Answer:", bert_tokenizer.decode(inputs["input_ids"][0][start:end]))

# T5
prompt = f"question: {question} context: {article}"
inputs = t5_tokenizer(prompt, return_tensors="pt")
outputs = t5_model.generate(**inputs)
print("📝 T5 Answer:", t5_tokenizer.decode(outputs[0], skip_special_tokens=True))


📘 BERT Answer: september 10, 2025
📝 T5 Answer: September 20


## 💣 Hallucination Test

We'll test what happens when the answer isn't present in the context.

---


In [None]:
context = "Narnia is a fictional realm created by C.S. Lewis."
question = "What is the capital of Narnia?"

# BERT
inputs = bert_tokenizer(question, context, return_tensors="pt")
with torch.no_grad():
    outputs = bert_model(**inputs)

s = torch.argmax(outputs.start_logits)
e = torch.argmax(outputs.end_logits) + 1
print("📘 BERT:", bert_tokenizer.decode(inputs["input_ids"][0][s:e]))

# T5
prompt = f"question: {question} context: {context}"
inputs = t5_tokenizer(prompt, return_tensors="pt")
outputs = t5_model.generate(**inputs)
print("📝 T5:", t5_tokenizer.decode(outputs[0], skip_special_tokens=True))


📘 BERT: narnia
📝 T5: Narnia
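
Both models answered "Narnia" even though the context names no capital. An extractive model always has some highest-scoring span, so one simple guard is to abstain when that span's probability is low. The sketch below uses invented logits, and the `0.5` threshold is an arbitrary assumption; softmax scores are not calibrated, and a model fine-tuned only on answerable questions (as the DistilBERT checkpoint's model card suggests, SQuAD v1.1) may still be confidently wrong.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [x / total for x in exps]


def answer_or_abstain(tokens, start_logits, end_logits, threshold=0.5):
    """Return (answer, confidence); answer is None when the span looks unreliable."""
    p_start = softmax(start_logits)
    p_end = softmax(end_logits)
    s = max(range(len(p_start)), key=lambda i: p_start[i])
    e = max(range(len(p_end)), key=lambda i: p_end[i])
    confidence = p_start[s] * p_end[e]
    if e < s or confidence < threshold:
        return None, confidence
    return " ".join(tokens[s:e + 1]), confidence


tokens = ["narnia", "is", "a", "fictional", "realm"]

# Nearly flat logits: the model has no clear preference, so we abstain.
unsure, unsure_conf = answer_or_abstain(tokens, [0.2, 0.1, 0.0, 0.1, 0.1], [0.1, 0.2, 0.0, 0.1, 0.1])
# Sharply peaked logits: the span is accepted.
sure, sure_conf = answer_or_abstain(tokens, [5.0, 0.0, 0.0, 0.0, 0.0], [5.0, 0.0, 0.0, 0.0, 0.0])
print(unsure, sure)  # prints: None narnia
```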


## 🧾 T5 with Explanation

We now ask the model to justify its answer.

---


In [None]:
q = "Why are plants green?"
ctx = "Chlorophyll in plants reflects green light and absorbs red/blue light."

prompt = f"question: {q} context: {ctx}\nExplain your answer."
inputs = t5_tokenizer(prompt, return_tensors="pt")
outputs = t5_model.generate(**inputs, max_length=256)

print(t5_tokenizer.decode(outputs[0], skip_special_tokens=True))


Green light is a source of energy for plants.


## 🔢 Token Budget Visualization (BERT)

What happens when context exceeds 512 tokens?

---


In [None]:
# BERT - Token Truncation
long_ctx = "foo " * 600 + "END. The secret code is 42."
question = "What is the secret code?"

bert_inputs = bert_tokenizer(question, long_ctx, return_tensors="pt", truncation=True)
print("🧮 BERT Token count:", bert_inputs["input_ids"].shape[1])

with torch.no_grad():
    outputs = bert_model(**bert_inputs)

s = torch.argmax(outputs.start_logits)
e = torch.argmax(outputs.end_logits) + 1
bert_answer = bert_tokenizer.decode(bert_inputs["input_ids"][0][s:e])
print("📘 BERT Answer:", bert_answer)

# T5 - with truncation=True the tokenizer also cuts the prompt at its maximum length
prompt = f"question: {question} context: {long_ctx}"
t5_inputs = t5_tokenizer(prompt, return_tensors="pt", truncation=True)
print("🧮 T5 Token count:", t5_inputs["input_ids"].shape[1])

t5_output = t5_model.generate(**t5_inputs, max_length=50)
print("📝 T5 Answer:", t5_tokenizer.decode(t5_output[0], skip_special_tokens=True))


🧮 BERT Token count: 512
📘 BERT Answer: 
🧮 T5 Token count: 512
📝 T5 Answer: foo foo foo foo foo foo foo foo foo foo foo foo foo foo foo foo foo foo foo foo foo foo foo foo fo


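### ✅ Interpretation

| Model | Observed behavior | Why? |
|---|---|---|
| BERT | Empty answer | The answer sits beyond the 512-token limit, so truncation removed it before the model saw it. |
| T5 | Repetitive "foo" hallucination | The prompt was also truncated to 512 tokens, so the model saw only the repeated "foo" filler and echoed it. |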
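
A usual workaround for the token limit is to split the context into overlapping windows and run the model on each one. The sketch below shows only the windowing step on a plain list of word-level tokens (a simplification; real inputs are subword IDs). `sliding_windows` is a hypothetical helper, and `stride` here means the number of tokens shared between neighbouring windows, which mirrors how the Hugging Face tokenizers describe their `stride` argument when used with `return_overflowing_tokens=True`.

```python
def sliding_windows(token_ids, window_size, stride):
    """Split token_ids into windows of at most window_size tokens;
    consecutive windows overlap by `stride` tokens."""
    if not 0 <= stride < window_size:
        raise ValueError("stride must satisfy 0 <= stride < window_size")
    step = window_size - stride
    windows = []
    start = 0
    while True:
        windows.append(token_ids[start:start + window_size])
        if start + window_size >= len(token_ids):
            break
        start += step
    return windows


# Word-level stand-in for the long context used above.
tokens = ["foo"] * 600 + ["END", ".", "The", "secret", "code", "is", "42", "."]
windows = sliding_windows(tokens, window_size=500, stride=50)

print(len(windows), [len(w) for w in windows])  # prints: 2 [500, 158]
print("42" in windows[0], "42" in windows[-1])  # prints: False True
```

The answer only appears in the last window, so a full pipeline would score the best span per window and keep the highest-scoring one.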

## 🧠 Multiple Choice QA with T5

Let's have T5 choose among multiple options.

---


In [None]:
# MCQA with T5
question = "Which planet is known as the Red Planet?"
choices = ["A) Earth", "B) Mars", "C) Jupiter", "D) Venus"]
prompt = question + "\n" + "\n".join(choices)

t5_inputs = t5_tokenizer(prompt, return_tensors="pt")
t5_output = t5_model.generate(**t5_inputs)
print("📝 T5 chose:", t5_tokenizer.decode(t5_output[0], skip_special_tokens=True))

# MCQA with BERT: simulate it by placing the options in the context (not a fair comparison)
# Try removing the choices from the context and compare the result
choices_str = " ".join(choices)
context = choices_str + " Mars is often called the Red Planet due to its reddish appearance."
bert_inputs = bert_tokenizer(question, context, return_tensors="pt")
with torch.no_grad():
    out = bert_model(**bert_inputs)
s = torch.argmax(out.start_logits)
e = torch.argmax(out.end_logits) + 1
bert_answer = bert_tokenizer.decode(bert_inputs["input_ids"][0][s:e])
print("📘 BERT span answer:", bert_answer)


📝 T5 chose: C
📘 BERT span answer: venus mars
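
T5 replied with a bare letter ("C"), and other runs may return the option text instead. When comparing against a gold label it helps to map whatever was generated back to one of the options. This is a minimal sketch; `resolve_choice` is a hypothetical helper and it assumes choices formatted as `"X) text"`, as in the cell above.

```python
import re


def resolve_choice(model_output, choices):
    """Map a raw generation such as "C", "b)" or "Mars" to one of the choice
    strings, or return None when nothing matches."""
    text = model_output.strip().lower()
    if not text:
        return None
    labelled = {}
    for choice in choices:
        label, _, body = choice.partition(")")
        labelled[label.strip().lower()] = (choice, body.strip().lower())
    # Case 1: the whole output is just a letter label, e.g. "C", "c)" or "(c)".
    match = re.fullmatch(r"\(?([a-z])[\).:]?", text)
    if match and match.group(1) in labelled:
        return labelled[match.group(1)][0]
    # Case 2: the output contains the text of one of the options.
    for choice, body in labelled.values():
        if body and body in text:
            return choice
    return None


choices = ["A) Earth", "B) Mars", "C) Jupiter", "D) Venus"]
print(resolve_choice("C", choices))       # prints: C) Jupiter
print(resolve_choice("Mars", choices))    # prints: B) Mars
print(resolve_choice("Saturn", choices))  # prints: None
```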


In [None]:
question = "What is the capital of Germany?"
context = "Germany is a country in central Europe. It is known for its culture and science."

# BERT
bert_inputs = bert_tokenizer(question, context, return_tensors="pt")

with torch.no_grad():
    outputs = bert_model(**bert_inputs)

start = torch.argmax(outputs.start_logits)
end = torch.argmax(outputs.end_logits) + 1
bert_answer = bert_tokenizer.decode(bert_inputs["input_ids"][0][start:end])
print("[BERT Answer]:", bert_answer)

# T5
prompt = f"question: {question} context: {context}"
t5_inputs = t5_tokenizer(prompt, return_tensors="pt")
t5_output = t5_model.generate(**t5_inputs)
print("[T5 Answer]:", t5_tokenizer.decode(t5_output[0], skip_special_tokens=True))


[BERT Answer]: germany is a country in central europe
[T5 Answer]: Germany


# 🧪 Final Mini-Project Challenge

Build a mini QA system using both BERT and T5. Follow these steps:

---

### 🔶 Step 1: Choose or write a paragraph (4–5 sentences)
It could be about:
- A tech product (e.g., iPhone, Tesla)
- A scientific concept (e.g., gravity, DNA)
- A historical figure (e.g., Einstein, Cleopatra)

---

### 🔶 Step 2: Write 3 different questions
Each question should vary in:
- Type: Who/What/When/Why
- Difficulty: Easy → Tricky
- Presence of answer: One should have no answer

---

### 🔶 Step 3: Answer using both BERT and T5
Use manual inference cells to extract answers.
Then compare:
- Are they accurate?
- Did T5 hallucinate?
- Did BERT miss or get cut off?

---

### 🔶 Step 4: Evaluation (Bonus)
If time permits, write:
- Which model was more reliable?
- Which handled ambiguity better?
- Any improvement suggestions?
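
For the comparison in Steps 3 and 4, two simple scores are often used for QA: exact match and token-level F1, both computed after light normalization. The sketch below is a plain-Python approximation in the spirit of the SQuAD evaluation, not the official script; the reference answers are written by hand for illustration.

```python
import re
import string
from collections import Counter


def normalize(text):
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, reference):
    """True when the normalized strings are identical."""
    return normalize(prediction) == normalize(reference)


def token_f1(prediction, reference):
    """Harmonic mean of token precision and recall after normalization."""
    pred = normalize(prediction).split()
    ref = normalize(reference).split()
    if not pred or not ref:
        return float(pred == ref)
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


# Hand-written reference for "Who was Ada Lovelace?"
reference = "An English mathematician and writer."
print(exact_match("an english mathematician and writer", reference))  # prints: True
print(token_f1("Charles Babbage", reference))                         # prints: 0.0
print(round(token_f1("english mathematician", reference), 3))         # prints: 0.667
```

For the unanswerable question, an empty prediction compared against an empty reference scores 1.0, so abstaining is rewarded rather than penalised.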


In [None]:
context = """
Ada Lovelace was an English mathematician and writer who lived in the 19th century.
She is known for her work on Charles Babbage's proposed mechanical general-purpose computer, the Analytical Engine.
Her notes on the engine include what is considered the first algorithm intended to be carried out by a machine.
Ada is often regarded as the first computer programmer.
"""


In [None]:
questions = [
    "Who was Ada Lovelace?",
    "Why is Ada Lovelace considered the first programmer?",
    "What is Ada Lovelace's favorite color?"
]


In [None]:
print("📘 BERT Answers\n" + "="*30)

for q in questions:
    inputs = bert_tokenizer(q, context, return_tensors="pt")
    with torch.no_grad():
        out = bert_model(**inputs)
    start = torch.argmax(out.start_logits)
    end = torch.argmax(out.end_logits) + 1
    answer = bert_tokenizer.decode(inputs["input_ids"][0][start:end])
    print(f"Q: {q}\n→ A: {answer}\n")


📘 BERT Answers
Q: Who was Ada Lovelace?
→ A: an english mathematician and writer

Q: Why is Ada Lovelace considered the first programmer?
→ A: first algorithm intended to be carried out by a machine

Q: What is Ada Lovelace's favorite color?
→ A: english mathematician and writer who lived in the 19th century. she is known for her work on charles babbage ' s proposed mechanical general - purpose computer, the analytical engine. her notes on the engine include what is considered the first algorithm intended to be carried out by a machine. ada is often regarded as the first computer programmer. [SEP]



In [None]:
print("📝 T5 Answers\n" + "="*30)

for q in questions:
    prompt = f"question: {q} context: {context}"
    inputs = t5_tokenizer(prompt, return_tensors="pt")
    outputs = t5_model.generate(**inputs)
    t5_answer = t5_tokenizer.decode(outputs[0], skip_special_tokens=True)
    print(f"Q: {q}\n→ A: {t5_answer}\n")


📝 T5 Answers
Q: Who was Ada Lovelace?
→ A: Charles Babbage

Q: Why is Ada Lovelace considered the first programmer?
→ A: her work on Charles Babbage's proposed mechanical general-purpose computer, the Analytical

Q: What is Ada Lovelace's favorite color?
→ A: color

