```{contents}
```

## Sampling Strategies


### Why Sampling Strategies Are Needed

At each generation step, an LLM outputs a **probability distribution** over its entire vocabulary (typically tens of thousands of tokens, often 50k–100k or more).
For example:

```
Next-token probabilities:
mat    → 0.62
floor  → 0.18
sofa   → 0.09
dog    → 0.02
…
```

Sampling strategies determine **which token to pick** from this distribution.

Different strategies control:

* **creativity**
* **stability**
* **coherence**
* **randomness**
* **repetition**

---

### The Main Sampling Strategies

---

### Greedy Sampling (Argmax)

**How it works**

Pick the token with the **highest probability**.

```
token = argmax(probabilities)
```
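
As a minimal sketch, the argmax rule can be written in plain Python. The dictionary below reuses the toy probabilities from the earlier example, and `greedy_pick` is a hypothetical helper name, not part of any library.

```python
def greedy_pick(probs):
    """Return the token with the highest probability (ties go to the first seen)."""
    return max(probs, key=probs.get)

# Toy distribution from the example above.
probs = {"mat": 0.62, "floor": 0.18, "sofa": 0.09, "dog": 0.02}
token = greedy_pick(probs)
print(token)  # mat
```

Because no random draw is involved, the same distribution always yields the same token.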

**Pros**

* Deterministic
* Simple

**Cons**

* Repetitive
* Low creativity
* Can get stuck (“the the the the…”)

**Use case**

* Factual extraction
* Classification
* Low-risk outputs

---

### Temperature Scaling

**How it works**

Adjusts the “sharpness” of the probability distribution by dividing the logits by a **temperature value T** before the softmax:

* **T < 1** → sharper distribution → more predictable (as T approaches 0, this approaches greedy selection)
* **T > 1** → flatter distribution → more random

**Example**

* T = 0.7 → safer, more focused
* T = 1.0 → neutral
* T = 1.5 → creative, risky

**Use case**

Controlling the level of randomness, and with it the perceived creativity, of the output.
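
The effect of T can be illustrated with a small sketch, assuming the usual formulation in which logits are divided by the temperature before the softmax. The logits below are made-up values and `softmax_with_temperature` is a hypothetical helper name.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Convert logits to probabilities after dividing them by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5, -1.0]  # hypothetical scores for four tokens
for t in (0.7, 1.0, 1.5):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

Running it shows the top token's probability shrinking as T grows, which is the flattening described above.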

---

### Top-k Sampling

**How it works**

Keep only the **top k** tokens with highest probability.

Example: k=5
Keep:

```
mat, floor, sofa, dog, wall
```

Drop the rest → re-normalize → sample from these.

**Pros**

* Avoids low-quality tokens
* Good control over randomness

**Cons**

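* k is fixed regardless of the distribution's shape → may keep unlikely tokens when the distribution is peaked, or cut off plausible ones when it is flat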

**Use case**

General text generation with moderate creativity.
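
The keep, re-normalize, sample steps of top-k might look like the following sketch. The probability for `wall` is an assumed value (the original example does not give one), and `top_k_filter` and `sample` are hypothetical helper names.

```python
import random

def top_k_filter(probs, k):
    """Keep the k most probable tokens and re-normalize their probabilities."""
    kept = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(p for _, p in kept)
    return {tok: p / total for tok, p in kept}

def sample(probs, rng=random):
    """Draw one token according to its probability."""
    tokens = list(probs)
    weights = [probs[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]

# "wall" is given an assumed probability for illustration only.
probs = {"mat": 0.62, "floor": 0.18, "sofa": 0.09, "dog": 0.02, "wall": 0.01}
filtered = top_k_filter(probs, k=3)
print(filtered)
print(sample(filtered, random.Random(0)))  # seeded generator for repeatability
```

With `k=3`, only `mat`, `floor`, and `sofa` remain candidates, whatever their combined probability mass.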

---

### Top-p Sampling (Nucleus Sampling)

**How it works**

Choose the **smallest set of most-probable tokens** whose cumulative probability is ≥ p (e.g. 0.9), then re-normalize and sample from that set.

Example with p = 0.9:

```
mat (0.62)
floor (0.18)
sofa (0.09)
----------------
sum = 0.89 < 0.9 → add the next token (dog, 0.02) → 0.91 ≥ 0.9
```
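
The nucleus selection can be sketched as a loop over tokens sorted by probability, stopping as soon as the cumulative mass reaches p. This is an illustrative sketch using the toy numbers from this page; `top_p_filter` is a hypothetical helper name.

```python
def top_p_filter(probs, p=0.9):
    """Keep the smallest high-probability set whose cumulative mass reaches p."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = [], 0.0
    for token, prob in ranked:
        kept.append((token, prob))
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(prob for _, prob in kept)
    return {token: prob / total for token, prob in kept}

probs = {"mat": 0.62, "floor": 0.18, "sofa": 0.09, "dog": 0.02}
nucleus = top_p_filter(probs, p=0.9)
print(list(nucleus))  # ['mat', 'floor', 'sofa', 'dog']
```

Unlike top-k, the size of the kept set depends on the distribution: with `p=0.5` the same input would keep only `mat`.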

**Pros**

* Adaptive (dynamic set size)
* Keeps meaningful tokens
* Often higher-quality output than top-k

**Cons**

* Slightly more complex

**Use case**

Most modern chat and creative-writing tasks (a common default in LLM APIs and libraries).

---

### Typical Sampling

**How it works**

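Keeps tokens whose information content (surprisal, −log p) is close to the **entropy** of the next-token distribution, i.e. the expected surprisal.

In practice this:

* Removes tokens that are far more predictable than expected
* Removes tokens that are far more surprising than expected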

**Pros**

* Tends to produce natural-sounding sentences
* Good balance of creativity + coherence

**Use case**

Story writing, dialogue, long-form content.
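
One way to sketch typical sampling, assuming the common formulation that ranks tokens by how close their surprisal is to the entropy and keeps the closest ones up to a probability mass, is shown below. The distribution, the `mass` parameter name, and `typical_filter` are all illustrative assumptions.

```python
import math

def typical_filter(probs, mass=0.9):
    """Keep tokens whose surprisal is closest to the entropy, up to `mass`."""
    entropy = -sum(p * math.log(p) for p in probs.values() if p > 0)
    # Rank tokens by how far their surprisal (-log p) is from the entropy.
    ranked = sorted(
        ((tok, p) for tok, p in probs.items() if p > 0),
        key=lambda kv: abs(-math.log(kv[1]) - entropy),
    )
    kept, cumulative = [], 0.0
    for tok, p in ranked:
        kept.append((tok, p))
        cumulative += p
        if cumulative >= mass:
            break
    total = sum(p for _, p in kept)
    return {tok: p / total for tok, p in kept}

# Hypothetical normalized distribution over four tokens.
probs = {"the": 0.5, "a": 0.25, "cat": 0.15, "zebra": 0.1}
typical = typical_filter(probs, mass=0.7)
print(list(typical))  # ['a', 'the']
```

Note that `a` ranks ahead of `the` here: the ordering follows closeness to the entropy, not raw probability, which is what distinguishes this filter from top-p.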

---

### Repetition Penalty / No-Repeat N-Gram

**How it works**

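A repetition penalty lowers the scores of tokens that already appear in the generated text; a no-repeat n-gram constraint blocks any token that would recreate an n-gram that has already been generated.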

Example penalty:

```
If token "cat" appeared many times → reduce its probability next time.
```
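
Both ideas can be sketched in a few lines. The penalty below follows one common formulation (divide positive logits by the penalty, multiply negative ones), which is an assumption here rather than something this page specifies; the logits and both function names are hypothetical.

```python
def apply_repetition_penalty(logits, generated, penalty=1.1):
    """Lower the logits of tokens that already appear in `generated`."""
    adjusted = dict(logits)
    for token in set(generated):
        if token in adjusted:
            score = adjusted[token]
            # Dividing a positive logit or multiplying a negative one
            # both push the score down when penalty > 1.
            adjusted[token] = score / penalty if score > 0 else score * penalty
    return adjusted

def banned_next_tokens(generated, n=2):
    """Tokens that would complete an n-gram already present in `generated`."""
    if n < 1 or len(generated) < n - 1:
        return set()
    prefix = tuple(generated[len(generated) - (n - 1):])
    banned = set()
    for i in range(len(generated) - n + 1):
        if tuple(generated[i:i + n - 1]) == prefix:
            banned.add(generated[i + n - 1])
    return banned

logits = {"cat": 2.0, "dog": 1.0, "sat": -0.5}
adjusted = apply_repetition_penalty(logits, ["cat", "sat", "cat"], penalty=2.0)
print(adjusted)  # {'cat': 1.0, 'dog': 1.0, 'sat': -1.0}
print(banned_next_tokens(["the", "cat", "sat", "the"], n=2))  # {'cat'}
```

The penalty is soft (the token can still be chosen), whereas the n-gram ban is hard: a banned token would be removed from the candidates entirely.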

**Pros**

* Reduces loops or stuck patterns

**Use case**

Long documents, storytelling, chatbot conversations.

---

### Beam Search (Less common in LLMs)

**How it works**

Keeps multiple candidate sequences (“beams”), expands them in parallel, and retains only the highest-scoring ones at each step.

**Pros**

* Useful in translation tasks
* Tries to find the sequence with the highest overall probability

**Cons**

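* Slower than sampling (cost grows with the number of beams)
* Tends to produce bland, generic text in open-ended generation
* Rarely used in modern conversational LLMs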
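
A toy sketch may help show why beam search can find a better overall sequence than greedy selection. `TOY_MODEL` is a made-up lookup table standing in for a real model, and `beam_search` is a hypothetical, simplified implementation (no length normalization).

```python
import math

# Hypothetical toy model: next-token probabilities keyed by the last token.
TOY_MODEL = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.5, "dog": 0.5},
    "a": {"cat": 0.9, "dog": 0.1},
    "cat": {"</s>": 1.0},
    "dog": {"</s>": 1.0},
}

def beam_search(model, start="<s>", end="</s>", beam_width=2, max_len=5):
    """Return the highest-scoring (tokens, log-probability) pair found."""
    beams = [([start], 0.0)]  # (tokens, cumulative log-probability)
    for _ in range(max_len):
        candidates = []
        for tokens, score in beams:
            if tokens[-1] == end:
                candidates.append((tokens, score))  # finished beams carry over
                continue
            for tok, p in model[tokens[-1]].items():
                candidates.append((tokens + [tok], score + math.log(p)))
        # Keep only the best `beam_width` candidates.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
        if all(tokens[-1] == end for tokens, _ in beams):
            break
    return beams[0]

print(beam_search(TOY_MODEL, beam_width=1)[0])  # ['<s>', 'the', 'cat', '</s>']
print(beam_search(TOY_MODEL, beam_width=2)[0])  # ['<s>', 'a', 'cat', '</s>']
```

With a width of 1 the search is greedy and commits to `the` (sequence probability 0.30); with a width of 2 it keeps `a` alive long enough to find the more probable sequence (0.36).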

---

### What Modern LLMs Actually Use

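Deployed LLMs (GPT, Claude, LLaMA, Mistral) are typically sampled with some combination of the following, although the exact controls and defaults vary by provider and library:

* **Top-p sampling**
* **Temperature**
* **Repetition penalty**

An illustrative configuration (not any specific model's defaults):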

```
temperature = 0.7
top_p = 0.9
repetition_penalty = 1.1
```
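
To show how these three settings could interact in a single decoding step, here is a rough sketch. The order of operations (penalty, then temperature, then top-p) is an assumption made for illustration, and `sample_next_token` and the logits are hypothetical.

```python
import math
import random

def sample_next_token(logits, generated, temperature=0.7, top_p=0.9,
                      repetition_penalty=1.1, rng=random):
    """Sketch: repetition penalty -> temperature softmax -> top-p -> sample."""
    # 1. Penalize tokens that were already generated.
    seen = set(generated)
    scores = {
        tok: (s / repetition_penalty if s > 0 else s * repetition_penalty)
        if tok in seen else s
        for tok, s in logits.items()
    }
    # 2. Temperature-scaled softmax (max subtracted for stability).
    m = max(scores.values())
    exps = {tok: math.exp((s - m) / temperature) for tok, s in scores.items()}
    total = sum(exps.values())
    probs = {tok: e / total for tok, e in exps.items()}
    # 3. Nucleus (top-p) filtering.
    kept, cumulative = [], 0.0
    for tok, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept.append((tok, p))
        cumulative += p
        if cumulative >= top_p:
            break
    # 4. Sample from the surviving tokens (choices re-normalizes the weights).
    tokens, weights = zip(*kept)
    return rng.choices(tokens, weights=weights, k=1)[0]

logits = {"mat": 3.0, "floor": 1.8, "sofa": 1.1, "dog": -0.4}  # made-up scores
print(sample_next_token(logits, generated=["mat"], rng=random.Random(0)))
```

Passing a seeded `random.Random` makes the draw repeatable, which is convenient when comparing settings.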

---

**Summary Table**

| Strategy               | Randomness  | Creativity  | Quality    | Notes                     |
| ---------------------- | ----------- | ----------- | ---------- | ------------------------- |
| **Greedy**             | None        | Very low    | Low–medium | Deterministic, repetitive |
| **Temperature**        | Adjustable  | Adjustable  | Good       | Soft adjustments          |
| **Top-k**              | Medium      | Medium      | Good       | Limits candidate tokens   |
| **Top-p**              | Medium–High | Medium–High | Excellent  | Adaptive, most used       |
| **Typical**            | Medium      | High        | Very high  | Human-like phrasing       |
| **Repetition Penalty** | N/A         | N/A         | Higher     | Prevents loops            |
| **Beam Search**        | Low         | Low         | Medium     | Structured tasks only     |

---

**One-Sentence Explanation**

**Sampling strategies decide how an LLM chooses its next token, balancing randomness, coherence, and creativity to produce high-quality text.**