```{contents}
```
## Hallucination

### 1. Definition

**Hallucination** is the phenomenon where a machine learning model — especially a **Large Language Model (LLM)** — generates **fluent, confident, but factually incorrect or unsupported content** that is not grounded in its training data or the provided context.

> In short: **The model sounds right, but is wrong.**

---

### 2. Why Hallucination Occurs

LLMs are **probabilistic sequence models**:

$$
P(\text{next token} \mid \text{previous tokens})
$$

They optimize for **likelihood of text**, **not truth**.
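Written out, the usual autoregressive factorization and maximum-likelihood training loss make this explicit. The notation ($x$ for the prompt, $y_t$ for the token at step $t$, $y_{<t}$ for the tokens before it, $\mathcal{D}$ for the training corpus) is introduced here for illustration:

$$
P(y \mid x, \theta) = \prod_{t=1}^{T} P(y_t \mid x, y_{<t}, \theta)
$$

$$
\mathcal{L}(\theta) = -\sum_{(x, y) \in \mathcal{D}} \sum_{t=1}^{T} \log P(y_t \mid x, y_{<t}, \theta)
$$

Nothing in $\mathcal{L}$ refers to whether $y$ is true. Minimizing it rewards assigning high probability to the kinds of token sequences that appear in the corpus, so a false sentence written in a familiar pattern can score as well as a true one.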

| Root Cause           | Explanation                                                    |
| -------------------- | -------------------------------------------------------------- |
| Training Objective   | Models learn to predict *plausible text*, not verified facts   |
| Missing Knowledge    | When the answer is unknown, the model fills gaps with patterns |
| Overgeneralization   | Training correlations are mistaken for rules                   |
| Context Ambiguity    | Weak or vague prompts increase error                           |
| Long-Range Reasoning | Errors accumulate over multi-step generation                   |
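The "Training Objective" and "Missing Knowledge" rows can be illustrated with a toy bigram model. This is a deliberately tiny stand-in, not how LLMs work internally; the corpus and helper names (`next_token_distribution`, `greedy_continue`) are hypothetical. Because the model only learns which token tends to follow which, it completes a prompt about a place it has never seen with a city from a familiar pattern.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus: every sentence follows "the capital of X is Y".
corpus = [
    "the capital of france is paris",
    "the capital of spain is madrid",
    "the capital of italy is rome",
    "the capital of france is paris",
]

# Count bigrams: how often each token follows the previous one.
bigrams = defaultdict(Counter)
for sentence in corpus:
    tokens = sentence.split()
    for prev, nxt in zip(tokens, tokens[1:]):
        bigrams[prev][nxt] += 1


def next_token_distribution(prev):
    """Estimate P(next | prev) from bigram counts."""
    counts = bigrams[prev]
    total = sum(counts.values())
    return {tok: c / total for tok, c in counts.items()}


def greedy_continue(prompt):
    """Append the most likely next token, conditioning only on the last token."""
    last = prompt.split()[-1]
    dist = next_token_distribution(last)
    best = max(dist, key=dist.get)
    return prompt + " " + best, dist[best]


# "atlantis" never appears in the corpus, but the model only looks at "is".
text, prob = greedy_continue("the capital of atlantis is")
print(text, round(prob, 2))
```

The model answers "paris" with probability 0.5: a fluent, pattern-matched completion with no knowledge behind it. Real LLMs condition on far more context, but the same pressure to produce a likely continuation applies.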

---

### 3. Types of Hallucination

| Type         | Description                                 | Example                                     |
| ------------ | ------------------------------------------- | ------------------------------------------- |
| Intrinsic    | Output contradicts the source or known facts | "Einstein won a Nobel Prize for Relativity" (it was awarded for the photoelectric effect) |
| Extrinsic    | Unsupported claims outside provided context | Fabricating citations                       |
| Entity       | Non-existent people, papers, companies      | "Smith & Johnson 2023 study"                |
| Logical      | Internally inconsistent reasoning           | Contradicting earlier steps                 |
| Mathematical | Incorrect calculations or derivations       | Wrong proofs                                |
| Citation     | Fake or incorrect references                | Fabricated DOIs                             |
| Contextual   | Ignores given document                      | Answers outside source material             |
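Extrinsic and contextual hallucinations can be screened, very crudely, by checking whether an answer's content words appear in the source document. The sketch below is a naive lexical-overlap heuristic, not a standard detector; `unsupported_sentences`, the stopword list, and the 0.5 threshold are assumptions chosen for illustration.

```python
import re

# Small illustrative stopword list (an assumption, not a standard list).
STOPWORDS = {"the", "a", "an", "of", "in", "is", "was", "and", "to", "for", "on", "by", "it"}


def content_words(text):
    """Lowercased word tokens with stopwords removed."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def unsupported_sentences(answer, source, threshold=0.5):
    """Return answer sentences whose content words are mostly absent from the source."""
    source_words = content_words(source)
    flagged = []
    # Split on whitespace that follows sentence-ending punctuation.
    for sentence in re.split(r"(?<=[.!?])\s+", answer.strip()):
        words = content_words(sentence)
        if not words:
            continue
        support = len(words & source_words) / len(words)
        if support < threshold:
            flagged.append(sentence)
    return flagged


source = "The report was published in 2021 by the city council. It covers public transport."
answer = "The report was published in 2021. It was funded by a private bank in Zurich."
print(unsupported_sentences(answer, source))
```

Only the second sentence is flagged, since none of its content words occur in the source. The heuristic is easy to fool in both directions: faithful paraphrases get flagged, and a false claim built from words in the source passes. Entailment-based checks are a stronger alternative.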

---

### 4. Hallucination vs Error

| Property      | Error                | Hallucination                  |
| ------------- | -------------------- | ------------------------------ |
| Confidence    | May appear uncertain | Often highly confident         |
| Source        | Computation mistake  | Knowledge or grounding failure |
| Detectability | Easier               | Harder                         |
| Severity      | Local                | Can corrupt entire output      |

---

### 5. Demonstration

**Prompt**

> "Who is the current Prime Minister of Atlantis?"

**Model Output (Hallucinated)**

> "The current Prime Minister of Atlantis is Alexander Maris..."

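**Reality**

Atlantis is a legendary island, not a real country, so any named Prime Minister is fabricated. The model accepts the false premise instead of challenging it.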

---

### 6. Mathematical View

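In the idealized decoding view, which greedy and beam search approximate, the model returns the most likely output $\hat{y}$ for input $x$ under parameters $\theta$:

$$
\hat{y} = \arg\max_y P(y \mid x, \theta)
$$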

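But **truthfulness** is not directly optimized: a high likelihood says nothing about whether the output is correct.

$$
P(y \mid x, \theta) \text{ is high} \;\not\Rightarrow\; y \text{ is true}
$$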

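Thus hallucination arises when a false but plausible answer $y_{\text{false}}$ outscores an honest abstention $y_{\text{abstain}}$ (such as "I don't know"):

$$
P(y_{\text{false}} \mid x, \theta) > P(y_{\text{abstain}} \mid x, \theta)
$$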
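A toy version of this comparison uses made-up log-probabilities for three complete answers to the Atlantis prompt; no real model produced these numbers, and `greedy_pick` is a hypothetical helper. Taking the argmax selects the fabricated name because it is scored as more plausible than either a correction or an abstention.

```python
import math

# Made-up log-probabilities for candidate answers to
# "Who is the current Prime Minister of Atlantis?"
# (other possible answers hold the remaining probability mass).
candidate_logprobs = {
    "Alexander Maris": -1.2,
    "Atlantis is a myth and has no Prime Minister.": -2.3,
    "I don't know.": -3.0,
}


def greedy_pick(logprobs):
    """Return the argmax answer and its probability: y_hat = argmax_y P(y | x)."""
    best = max(logprobs, key=logprobs.get)
    return best, math.exp(logprobs[best])


answer, p = greedy_pick(candidate_logprobs)
print(answer, round(p, 3))
```

Nothing in the selection rule checks the answer against the world; it only compares scores.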

---

### 7. Hallucination in ML Workflows

```text
User Query
   ↓
Prompt Encoding
   ↓
Neural Generation (next-token prediction)
   ↓
Decoding Strategy (temperature, top-k, top-p)
   ↓
Surface Text
   ↓
Possible Hallucination
```

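Higher **temperature** flattens the next-token distribution, so less likely (and often unsupported) tokens are sampled more often, which raises hallucination risk. Lower temperature reduces randomness but does not remove hallucination, because the most likely continuation can itself be false.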
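The effect of temperature can be sketched with a softmax over hypothetical next-token logits. Treating index 0 as the "well-supported" token and the rest as plausible but unsupported alternatives is an assumption for illustration; the function name is also hypothetical.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing by the temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits: index 0 is the well-supported token,
# indices 1-3 are plausible but unsupported alternatives.
logits = [4.0, 2.0, 1.5, 1.0]

for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(f"T={t}: P(supported)={probs[0]:.3f}, P(unsupported)={1 - probs[0]:.3f}")
```

As `t` grows, probability mass moves from the top token to the alternatives, so a sampled output is more likely to contain one of them.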

---

### 8. Code Illustration

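```python
from transformers import pipeline, set_seed

# Load a small pretrained model; GPT-2 has no retrieval or fact-checking.
generator = pipeline("text-generation", model="gpt2")
set_seed(42)  # make sampling reproducible across runs

prompt = "The capital of Atlantis is"
# Sample a short continuation; the pipeline returns a list of dicts.
outputs = generator(prompt, max_new_tokens=20, do_sample=True)
print(outputs[0]["generated_text"])
```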

One possible output (sampled text varies with the seed and library version):

```text
The capital of Atlantis is Oriona...
```

The model invents a name because the prompt presupposes a fact that does not exist, and nothing in the setup grounds the continuation in real knowledge.

---

### 9. Why Hallucination Is Hard to Eliminate

| Reason                                   |
| ---------------------------------------- |
| No explicit knowledge base               |
| Lack of real-time fact checking          |
| Training data is incomplete              |
| Neural models lack symbolic verification |
| Text fluency ≠ correctness               |

---

### 10. Mitigation Techniques

| Method                               | Purpose                   |
| ------------------------------------ | ------------------------- |
| Retrieval-Augmented Generation (RAG) | Inject verified documents |
| Tool Use / Search                    | Real-time grounding       |
| Chain-of-Thought + Verification      | Catch logical errors      |
| Constrained Decoding                 | Restrict output to allowed formats or values |
| Self-Consistency Checks              | Flag answers that disagree across samples    |
| Confidence Estimation                | Detect uncertainty        |
| Human-in-the-loop                    | Final validation          |
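The Self-Consistency Checks row can be sketched as: sample several answers to the same prompt and flag the result when they disagree. `self_consistency`, `fake_sampler`, `n=5`, and the 0.6 agreement threshold are hypothetical choices for illustration; `fake_sampler` is a deterministic stand-in for a real model called with temperature above zero.

```python
from collections import Counter
from itertools import cycle


def self_consistency(sample_fn, prompt, n=5, min_agreement=0.6):
    """Sample n answers; return (majority answer, agreement ratio, flagged)."""
    answers = [sample_fn(prompt) for _ in range(n)]
    majority, count = Counter(answers).most_common(1)[0]
    agreement = count / n
    return majority, agreement, agreement < min_agreement


# Hypothetical stand-in for a sampled model call. A real sampler would
# query an LLM; this one is deterministic so the behavior is repeatable.
_fabrications = cycle(["Oriona", "Poseidonia", "Atlantica", "Marisport"])


def fake_sampler(prompt):
    if "France" in prompt:
        return "Paris"            # grounded fact: samples agree
    return next(_fabrications)    # no grounding: samples scatter


print(self_consistency(fake_sampler, "What is the capital of France?"))
print(self_consistency(fake_sampler, "What is the capital of Atlantis?"))
```

Agreement is only a signal: a model can repeat the same fabrication consistently, so this check complements grounding (RAG, tool use) rather than replacing it.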

---

### 11. Hallucination in Production Systems

| Risk       | Impact                   |
| ---------- | ------------------------ |
| Legal      | False advice             |
| Medical    | Dangerous misinformation |
| Scientific | Invalid research         |
| Business   | Wrong decisions          |
| Trust      | Loss of credibility      |

---

### 12. Key Takeaways

* Hallucination is **structural**: a consequence of how models are trained and decoded, not an isolated bug.
* It emerges from **probabilistic text generation**.
* Fluency **does not imply truth**.
* Effective systems **must add external grounding and verification**.

---

**One-Line Summary**

> **Hallucination is the inevitable byproduct of predicting language without grounding in reality.**
