CS7545_Sp24_Lecture_25: Hallucination
Notes based on Lecture 25 and the paper "Calibrated Language Models Must Hallucinate"
A language model is simply a probability distribution over sequences of tokens.
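As a toy illustration of this view (the vocabulary, probabilities, and bigram structure here are invented, not from the lecture), a language model assigns a probability to every token sequence via the chain rule of next-token probabilities, and these sequence probabilities sum to 1:

```python
import itertools

# Hypothetical model over vocabulary {"a", "b"}; next-token probabilities
# here depend only on the previous token (a bigram model, for brevity).
start = {"a": 0.5, "b": 0.5}          # distribution over the first token
cond = {
    "a": {"a": 0.9, "b": 0.1},        # p(next | previous = "a")
    "b": {"a": 0.4, "b": 0.6},        # p(next | previous = "b")
}

def sequence_prob(seq):
    """Probability the model assigns to a whole sequence (chain rule)."""
    prob = start[seq[0]]
    for prev, nxt in zip(seq, seq[1:]):
        prob *= cond[prev][nxt]
    return prob

# Summing over all length-3 sequences gives 1: the model IS a distribution.
total = sum(sequence_prob(s) for s in itertools.product("ab", repeat=3))
print(round(total, 10))
```

Any autoregressive LM fits this template; only the conditional distribution is more elaborate.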
Hallucination is "a plausible but false or misleading response generated by an artificial intelligence algorithm" (Merriam-Webster dictionary, 2023).
Why is hallucination not a major problem for a trigram LM? A trigram model predicts the next token's probability using only the previous two tokens; it never captures semantics through a probability distribution over pieces of information, so it does not make the kind of factual assertions that can be hallucinated.
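A minimal count-based trigram model (a sketch with a made-up two-sentence corpus) makes this locality concrete: the next-token distribution is conditioned on just the preceding two tokens, so no document-level information survives:

```python
from collections import Counter, defaultdict

# Tiny invented corpus, purely for illustration.
corpus = "the capital of france is paris . the capital of italy is rome .".split()

# Count, for each pair of preceding tokens, which token follows.
counts = defaultdict(Counter)
for w1, w2, w3 in zip(corpus, corpus[1:], corpus[2:]):
    counts[(w1, w2)][w3] += 1

def next_token_dist(w1, w2):
    """p(next token | previous two tokens), estimated from counts."""
    c = counts[(w1, w2)]
    total = sum(c.values())
    return {w: n / total for w, n in c.items()}

# All the model "knows" after ("capital", "of") is the local bigram context;
# it has no representation of which country the sentence is about.
print(next_token_dist("capital", "of"))
```

The printed distribution splits mass between "france" and "italy": the model predicts plausible continuations without asserting any fact.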
We define two types of facts: arbitrary facts and systematic facts. An arbitrary fact is one whose truth usually cannot be determined from the training set. A systematic fact is one that is predictable from the training set by learning the underlying correctness rules.
We also define factoids: arbitrary pieces of information that are either true (facts) or false (hallucinations), and whose truth is statistically hard to determine from the training data. We assume that each document contains at most one factoid and that all training documents are factoids.
Some factoid assumptions:
- One factoid per document: $f: X \rightarrow Y$, where $X$ is the set of documents and $Y$ the set of factoids. Let $D_{L} \in \Delta(X)$ be the ground-truth distribution over documents, and denote the induced factoid distribution by $p(y) := \sum_{x: f(x)=y} D_{L}(x)$. Correspondingly, let $g(y) := \sum_{x: f(x)=y} D_{LM}(x)$ be the factoid distribution induced by the language model.
- Good training data: $F := \mathrm{supp}(p) \cup \{\perp\}$, where $F$ is the set of facts and $\mathrm{supp}(p) = \{y \in Y \mid p(y) > 0\}$. That is, the training set contains only facts. The set of hallucinations is $H := Y \setminus F$.
- Sparse facts: there exists $s \geq 0$ such that, with probability 1 over $D_L \sim D_{world}$: $$|F| \leq e^{-s}|H|$$
- Regularity: $D_{world}$ has regular facts if for all $x_{train} \in \mathrm{supp}(D_{train})$: $$\Pr[y \in F \mid x_{train}] = \Pr[y' \in F \mid x_{train}], \quad \forall y, y' \in U$$
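To make the induced distributions $p$ and $g$ concrete, here is a small numeric sketch (documents, factoids, and all probabilities are invented for illustration): the ground-truth distribution induces the fact set $F = \mathrm{supp}(p)$, and any mass the model's induced distribution $g$ places outside $F$ is hallucination.

```python
from collections import defaultdict

# Hypothetical documents and the factoid f(x) each one asserts.
f = {"doc1": "y1", "doc2": "y1", "doc3": "y2", "doc4": "y3"}

# Ground-truth document distribution D_L and a language model's D_LM.
# The LM puts mass on doc4, which asserts a factoid y3 never seen in truth.
D_L = {"doc1": 0.5, "doc2": 0.3, "doc3": 0.2}
D_LM = {"doc1": 0.4, "doc2": 0.2, "doc3": 0.1, "doc4": 0.3}

def induced(dist):
    """Induced factoid distribution: sum document mass mapping to each y."""
    out = defaultdict(float)
    for x, prob in dist.items():
        out[f[x]] += prob
    return dict(out)

p = induced(D_L)   # ground-truth factoid distribution
g = induced(D_LM)  # model-induced factoid distribution

# Facts are the support of p; mass of g outside F is hallucinated.
F = {y for y, prob in p.items() if prob > 0}
hallucination_mass = sum(prob for y, prob in g.items() if y not in F)
print(hallucination_mass)
```

Here $p = \{y_1: 0.8,\ y_2: 0.2\}$, so $F = \{y_1, y_2\}$, and the model hallucinates with probability $g(y_3) = 0.3$.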
For
(Coarsening) Given any set
The average probability over each bin that shares a common value of
(Approx Calibration)
(Miscalibration) The miscalibration of
Fix
Proof. Recall Markov's inequality: for any nonnegative random variable $Z$ and any $a > 0$, $\Pr[Z \geq a] \leq \mathbb{E}[Z]/a$.
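As a quick numerical sanity check on Markov's inequality (an illustrative simulation with an arbitrary choice of distribution, not part of the proof), sample a nonnegative random variable and compare the empirical tail probability against the bound $\mathbb{E}[Z]/a$:

```python
import random

random.seed(0)

# Nonnegative random variable Z: exponential with rate 1 (mean 1).
samples = [random.expovariate(1.0) for _ in range(100_000)]
mean = sum(samples) / len(samples)

for a in (1.0, 2.0, 4.0):
    empirical = sum(z >= a for z in samples) / len(samples)
    bound = mean / a  # Markov: Pr[Z >= a] <= E[Z] / a
    assert empirical <= bound
    print(f"a={a}: Pr[Z>=a] ~ {empirical:.3f} <= E[Z]/a = {bound:.3f}")
```

For the exponential the true tail is $e^{-a}$, well below $1/a$ at each threshold, so the bound holds with room to spare.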