# 📘 Phase 5: Understanding LLM Hallucinations  
*A practical guide to identifying, understanding, and reducing hallucinations in LLMs*

---

## 🚨 **1. What Are Hallucinations?**

A **hallucination** occurs when an LLM produces:

- Incorrect information  
- Fabricated facts  
- Fake citations  
- Overconfident but wrong answers  

### 🔍 **Examples**
- Claiming a person won an award they never received  
- Inventing a research paper that doesn't exist  
- Giving incorrect code outputs  
- Making up statistics  

LLMs don't "lie." They generate statistically *likely* next tokens, so the output can be fluent and confident while still being wrong.

---


## 🧠 **2. Why Do LLMs Hallucinate?**

Hallucinations occur for several key reasons:

---

### **1️⃣ Statistical Prediction ≠ Truth**

LLMs generate text from learned token probabilities; they do not look facts up in a database.

They don't "know"; they **predict**.

---

### **2️⃣ Training Data Gaps**

- Missing information  
- Outdated data  
- Biased or incorrect sources  

---

### **3️⃣ Ambiguous or Underspecified Prompts**

If your prompt is unclear, too short, or missing constraints…  
➡️ the model *fills in the gaps*.

---

### **4️⃣ Over-Generalization**

The model may apply a pattern it saw in training data or in prompt examples to cases where that pattern does not hold.

---

### **5️⃣ Long Context → Memory Drift**

As the context grows long, models may lose track of details stated earlier or blend them with unrelated parts of the input.

---


## 🛡️ **3. Mitigating Hallucinations**

Here are practical strategies to reduce hallucinations in real-world systems.

---

## ✅ **1. Grounding Responses in Context (RAG)**  
**RAG = Retrieval-Augmented Generation**

Feed the model *verified documents* to answer from.

Example pipeline:

1. Retrieve relevant context  
2. Provide it to the LLM  
3. Instruct the model to answer *only from the context*  

This usually reduces hallucinations, although the model can still misread the retrieved context or go beyond it.
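
The three pipeline steps can be sketched with a toy example. This is a minimal illustration, not a production retriever: the `DOCUMENTS` list, the word-overlap ranking, and the helper names `tokenize`, `retrieve`, and `build_prompt` are all assumptions made for this demo. Real systems typically use embedding-based search and send the resulting prompt to an actual model.

```python
import re

# Hypothetical in-memory store of "verified" documents (illustration only).
DOCUMENTS = [
    "The Eiffel Tower is located in Paris and was completed in 1889.",
    "Python was created by Guido van Rossum and first released in 1991.",
    "The Great Wall of China is a series of fortifications in northern China.",
]

def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question, documents, top_k=1):
    """Step 1: rank documents by word overlap with the question."""
    q_words = tokenize(question)
    ranked = sorted(documents, key=lambda d: len(q_words & tokenize(d)), reverse=True)
    return ranked[:top_k]

def build_prompt(question, context_docs):
    """Steps 2 and 3: attach the context and restrict the answer to it."""
    context = "\n".join(f"- {doc}" for doc in context_docs)
    return (
        "Answer using ONLY the context below. "
        "If the answer is not in the context, say \"I don't know.\"\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

question = "Who created Python?"
rag_prompt = build_prompt(question, retrieve(question, DOCUMENTS))
print(rag_prompt)
```

The printed prompt is what would be sent to the model; no model is called here.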

---

## ✅ **2. Use Verifiable Sources**

Tell the model explicitly:

- "Answer using the provided text only"  
- "Cite sources line-by-line"  
- "If unknown, respond with 'I don't know.'"  
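
One way to make these instructions concrete is to number the source lines before building the prompt, so the model has something specific to cite. The sketch below is an assumption about how such a template might look; `build_grounded_prompt` and the sample refund policy text are hypothetical and only build a string.

```python
def build_grounded_prompt(source_text, question):
    """Number each non-empty source line so the model can cite it as [n]."""
    lines = [ln.strip() for ln in source_text.splitlines() if ln.strip()]
    numbered = "\n".join(f"[{i}] {ln}" for i, ln in enumerate(lines, start=1))
    return (
        "Answer using the provided text only.\n"
        "Cite the supporting line number, such as [2], after each claim.\n"
        "If unknown, respond with 'I don't know.'\n\n"
        f"Text:\n{numbered}\n\n"
        f"Question: {question}"
    )

# Made-up source text for the demo.
source = """
Refunds are accepted within 30 days of purchase.
Shipping fees are not refundable.
"""

grounded_prompt = build_grounded_prompt(source, "Can I get my shipping fee back?")
print(grounded_prompt)
```

Numbered lines also make the answer easier to audit afterwards, because each cited `[n]` can be compared against the original text.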

---

## ✅ **3. Adjust Temperature and Top-p**

High values → more varied sampling → more room for hallucinations  
Low values → more deterministic output → fewer sampling-induced errors (a low temperature does not by itself make an answer correct)  
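
To see why temperature matters, the sketch below applies a temperature-scaled softmax to three made-up token scores. It covers temperature only (not top-p), and the `logits` values and the function name `softmax_with_temperature` are assumptions chosen for illustration.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up scores for three candidate next tokens.
logits = [2.0, 1.0, 0.1]

for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At a low temperature nearly all probability mass lands on the top-scoring token; at a higher temperature the distribution flattens, so less likely tokens are sampled more often.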

---

## ✅ **4. Fact-Checking Loops**

You can prompt the model to:

- Reevaluate its own answer  
- Verify claims  
- Cross-check using a second model  
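
A fact-checking loop can be sketched as a generate, verify, retry cycle. Everything below is a stand-in: `mock_generate` and `mock_verify` are hypothetical functions that imitate a model and a checker (in practice the checker might be a second model or a lookup against trusted sources), and `answer_with_verification` is just one possible shape for the loop.

```python
def answer_with_verification(question, generate, verify, max_attempts=2):
    """Generate an answer, have a verifier check it, and retry on rejection."""
    feedback = ""
    for _ in range(max_attempts):
        answer = generate(question, feedback)
        if verify(question, answer):
            return answer
        feedback = f"The previous answer was rejected: {answer}"
    # No attempt passed verification, so fall back to an explicit refusal.
    return "I don't know."

# Stand-in functions (assumptions, not real model calls).
def mock_generate(question, feedback):
    # The first attempt fabricates a claim; the retry backs off.
    if not feedback:
        return "AI was discovered by Dr. Helena Carson in 1923."
    return "I don't know."

def mock_verify(question, answer):
    # Toy check that flags the fabricated name.
    return "Helena Carson" not in answer

result = answer_with_verification("Who discovered AI?", mock_generate, mock_verify)
print(result)
```

Falling back to "I don't know." when verification keeps failing trades coverage for reliability, which is often the right choice for factual questions.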

---


## 🧪 **Hands-On: Simple Hallucination Test**

We'll test how a model *might* hallucinate by simulating a misleading prompt.

(We cannot call external APIs here, so this is a teaching demo.)

```python
# Demonstration: potential hallucination scenario (simulation)

def mock_model(prompt):
    """Return a canned reply that imitates a confident but fabricated answer."""
    if "who discovered ai" in prompt.lower():
        # Incorrect but confident output (hallucination)
        return "AI was discovered by Dr. Helena Carson in 1923."
    return "I don't know."

prompt = "Who discovered AI?"
print(mock_model(prompt))
```

## 🎯 **Summary**

In this phase you learned:

- ✔ What hallucinations are  
- ✔ Why LLMs hallucinate  
- ✔ Practical mitigation strategies  
- ✔ RAG fundamentals  
- ✔ How to constrain and verify model outputs  

