
---

# 🎲 **Sampling** – *Control Randomness in LLM Responses*

---

## 📌 What It Does

**Sampling** parameters let you trade off **creativity against consistency** in LLM-generated outputs: `temperature` and `top_p` control randomness, while `max_tokens` and `stop` bound the output.

Tune them per use-case to **balance creativity, reliability, and diversity** in completions.

---

## 🚀 Common Use-Cases

| Scenario               | Why Use It                                            |
| ---------------------- | ----------------------------------------------------- |
| ✍️ Creative writing    | Use higher randomness for stories and poems           |
| 🤖 Chatbots            | Moderate randomness keeps replies varied but on-topic |
| 🧠 Brainstorming ideas | Higher randomness yields more diverse ideas           |
| ✅ Data extraction      | Use low randomness for repeatable structure           |

---

## ⚙️ Key Parameters

| Parameter     | Type    | Description                                                                                                   |
| ------------- | ------- | ------------------------------------------------------------------------------------------------------------- |
| `temperature` | `float` | Controls randomness: lower = more consistent, higher = more creative (typical range 0.2–1.0)                  |
| `top_p`       | `float` | Nucleus sampling: samples only from the smallest set of tokens whose cumulative probability reaches `top_p` (e.g. 0.9) |
| `max_tokens`  | `int`   | Caps the number of generated tokens to avoid runaway generations                                              |
| `stop`        | `list`  | Sequences at which generation should halt (e.g. `["\n", "###"]`)                                              |
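
To make the effect of `temperature` and `top_p` concrete, here is a minimal sketch of how the two are commonly applied to a model's raw token scores. The function names and the example `logits` are made up for illustration and are not part of any particular library; real inference stacks may differ in detail.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; lower temperature sharpens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be > 0 in this sketch")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def nucleus_filter(probs, top_p):
    """Keep the smallest set of tokens whose cumulative probability reaches top_p, then renormalize."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}

logits = [2.0, 1.0, 0.5, -1.0]  # made-up scores for four candidate tokens
cool = softmax_with_temperature(logits, 0.2)  # sharply peaked on the top token
warm = softmax_with_temperature(logits, 1.0)  # flatter, more room for variety
nucleus = nucleus_filter(warm, 0.9)           # drops the least likely tail
print(round(cool[0], 3), round(warm[0], 3), sorted(nucleus))
```

With these example scores, the low-temperature distribution puts far more weight on the top token than the `temperature=1.0` one, and the `top_p=0.9` filter removes the least likely token before sampling.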

---

## 🎛️ When to Adjust

| Use-Case                 | Recommended Settings                     |
| ------------------------ | ---------------------------------------- |
| ✅ Deterministic task     | `temperature=0.1`, `top_p=1.0`           |
| 💡 Brainstorm ideas      | `temperature=0.8+`, `top_p=0.9`          |
| 📄 Structured outputs    | `temperature=0.3`, plus `stop` sequences |
| 🤹 Conversational agents | `temperature=0.6–0.9`, `top_p=0.85–0.95` |
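
One way to keep these recommendations in one place is a small table of presets. The sketch below is hypothetical: the preset names, the `settings_for` helper, and the single values chosen where the table gives a range are all assumptions made for illustration.

```python
# Hypothetical presets derived from the table above. Where the table gives a
# range, one mid-range value was picked for illustration.
PRESETS = {
    "deterministic": {"temperature": 0.1, "top_p": 1.0},
    "brainstorm": {"temperature": 0.8, "top_p": 0.9},
    "structured": {"temperature": 0.3, "stop": ["###"]},
    "conversational": {"temperature": 0.7, "top_p": 0.9},
}

def settings_for(use_case, **overrides):
    """Return a copy of the preset for use_case with any overrides applied."""
    if use_case not in PRESETS:
        raise KeyError(f"unknown use case: {use_case!r}")
    return {**PRESETS[use_case], **overrides}

params = settings_for("brainstorm", max_tokens=100)
print(params)
```

The resulting dictionary can be unpacked with `**params` into whichever completion call you use, so per-request overrides stay separate from the shared defaults.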

---

## 🔍 Sample Usage

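```python
from mcp.completion import complete

# Fairly high temperature plus nucleus sampling for varied suggestions;
# max_tokens caps the length of the reply.
response = complete(
    prompt="Suggest three fun activities in Tokyo for a weekend trip",
    temperature=0.8,
    top_p=0.9,
    max_tokens=100,
)
print(response)
```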

---

## 💡 Tip: Deterministic vs. Creative

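| Mode            | Behavior                                                          |
| --------------- | ----------------------------------------------------------------- |
| `temperature=0` | Greedy decoding: the most likely token is always picked           |
| `temperature=1` | Samples from the unscaled distribution: varied, exploratory       |
| `top_p=1.0`     | No nucleus cutoff: all possible tokens considered                 |
| `top_p=0.8`     | Only the most likely tokens covering 80% of the probability mass  |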
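
The difference between the two temperature modes can be sketched in a few lines. This is an illustrative toy, not how any specific API is implemented: `pick_token` and the example `logits` are made up, and hosted models may still show small run-to-run variation at `temperature=0`.

```python
import math
import random

def pick_token(logits, temperature, rng):
    """Greedy argmax at temperature 0; otherwise sample from the temperature-scaled distribution."""
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    peak = max(logits)  # subtract the max for numerical stability
    weights = [math.exp((x - peak) / temperature) for x in logits]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

logits = [2.0, 1.5, 1.0, 0.2]  # made-up scores for four candidate tokens
rng = random.Random(0)         # seeded only so the demo is repeatable

greedy = [pick_token(logits, 0, rng) for _ in range(200)]
sampled = [pick_token(logits, 1.0, rng) for _ in range(200)]
print(set(greedy), sorted(set(sampled)))
```

Every greedy pick lands on the highest-scoring token, while the `temperature=1.0` picks spread across several tokens in proportion to their probabilities.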

---

## ✅ Summary

| Feature            | Description                                    |
| ------------------ | ---------------------------------------------- |
| Control randomness | Tune creativity with `temperature` and `top_p` |
| Safer outputs      | Use `stop` sequences and `max_tokens`          |
| Versatile usage    | Creative, chat, data extraction, summarization |

---
