# 🎯 Verbalized Sampling: Complete Framework Demo

## Introduction

**Ask for a distribution, not a sample.**

This notebook demonstrates the complete `verbalized-sampling` framework. Instead of asking an LLM for one answer, we ask it for a small **distribution** of `k` candidates with **weights**, then deterministically **filter → normalize → order** them and choose via **`dist.argmax()`** or **`dist.sample()`**.
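
The filter → normalize → order step can be sketched in plain Python. This is a minimal illustration and not the library's implementation: the helper name `filter_normalize_order`, the `>=` comparison against `tau`, and the stable tie-breaking are assumptions, chosen because they reproduce the masses printed for the color example later in this notebook.

```python
def filter_normalize_order(candidates, tau):
    """Hypothetical helper: drop weights below tau, rescale to sum to 1, sort descending."""
    # Filter: keep candidates whose elicited weight reaches the threshold (assumed >=).
    survivors = [(text, w) for text, w in candidates if w >= tau]
    if not survivors:
        raise ValueError("no candidate passed the tau threshold")
    # Normalize: divide by the surviving mass so the result sums to 1.
    total = sum(w for _, w in survivors)
    normalized = [(text, w / total) for text, w in survivors]
    # Order: descending by mass; sorted() is stable, so ties keep their original order.
    return sorted(normalized, key=lambda pair: pair[1], reverse=True)


# Elicited weights copied from the color example's raw model response.
raw = [("Cerulean", 0.14), ("Mauve", 0.11), ("Amber", 0.16),
       ("Turquoise", 0.17), ("Chartreuse", 0.09)]
result = filter_normalize_order(raw, tau=0.12)
for text, p in result:
    print(f"{p:.3f} - {text}")
```

Two of the five candidates fall below the threshold, so the remaining three are rescaled by their combined mass of 0.47.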

### What you'll learn:
1. **Quick Start** - One-liner usage with `verbalize()`
2. **Core API** - `DiscreteDist`, `Item`, selection methods
3. **Transforms** - `map()`, `filter_items()`, `reweight()`
4. **Practical Recipes** - Creative writing, QA, synthetic data
5. **Advanced Features** - Weight modes, serialization, metadata inspection

In [None]:
#@title Setup: Install and Import
# !pip install verbalized-sampling

import os
from verbalized_sampling import verbalize, select, DiscreteDist, Item
from IPython.display import display, Markdown
import json

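# Set your API key here only if it is not already provided by the environment;
# an unconditional assignment would overwrite an existing key with an empty string.
if not os.environ.get("OPENAI_API_KEY"):
    os.environ["OPENAI_API_KEY"] = ""  # paste your key between the quotes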

---

## 1. Quick Start (plan.md §0)

The simplest way to use verbalized sampling:

In [2]:
# Learn a tiny distribution from the model
dist = verbalize(
    "Write an opening line for a mystery novel",
    k=5,
    tau=0.12,
    temperature=0.9,
    seed=42,
)

# Quick view of items & normalized masses
print(dist.to_markdown())
print()

# Deterministic top item
best = dist.argmax()
print(f"Best (argmax): {best.text}")
print()

# Seeded weighted sample
choice = dist.sample(seed=7)
print(f"Sampled (seed=7): {choice.text}")

Tuning probability to 0
Distribution definition: Randomly sample the responses from the full distribution.
responses: [{'response': "The morning Harold found the silver locket on his doorstep, he realized, too late, that yesterday's shadows had followed him home.", 'probability': 0.22}, {'response': 'The clock struck midnight when the scream shattered the silence, echoing through the empty halls of Wrenwood Manor.', 'probability': 0.19}, {'response': 'No one noticed the missing painting at first, except for the one person who knew exactly why it had to disappear.', 'probability': 0.21}, {'response': 'As the sun melted into the horizon, Detective Marsh opened the letter that would unravel the quiet town’s darkest secret.', 'probability': 0.17}, {'response': 'They say the dead can’t speak, but the message scrawled in dust on the old piano begged to differ.', 'probability': 0.21}]
# verbalized-sampling
k=5  τ=0.12  Σp=1.000  model=gpt-4.1

1. 0.220  "The morning Harold found the silver lo

### 🎯 Key Takeaway

**What you get:** A `DiscreteDist[Item]` that's:
- ✅ **Auditable** - See all candidates with their weights
- ✅ **Serializable** - Save/load distributions
- ✅ **Composable** - Transform with `map()`, `filter_items()`, `reweight()`
- ✅ **Ergonomic** - Simple selection methods on the distribution itself

---

## 2. Core API: Understanding the Objects

### 2.1 The `Item` Object

In [3]:
# Get the top item
top_item = dist.argmax()

print("Item attributes:")
print(f"  text: {top_item.text[:60]}...")
print(f"  p (normalized): {top_item.p:.4f}")
print(f"  meta keys: {list(top_item.meta.keys())}")
print()

# Inspect metadata
print("Metadata details:")
print(f"  p_raw: {top_item.meta.get('p_raw')}")
print(f"  p_clipped: {top_item.meta.get('p_clipped')}")
print(f"  repairs: {top_item.meta.get('repairs')}")
print(f"  idx_orig: {top_item.meta.get('idx_orig')}")

Item attributes:
  text: The morning Harold found the silver locket on his doorstep, ...
  p (normalized): 0.2200
  meta keys: ['p_raw', 'p_clipped', 'repairs', 'idx_orig']

Metadata details:
  p_raw: 0.22
  p_clipped: 0.22
  repairs: []
  idx_orig: 0


### 2.2 The `DiscreteDist` Object

In [4]:
# DiscreteDist is a sequence
print(f"Length: {len(dist)}")
print(f"First item: {dist[0].text[:40]}...")
print()

# Access all items and probabilities
print("All items (descending by p):")
for i, item in enumerate(dist, 1):
    print(f"  {i}. p={item.p:.3f} - {item.text[:50]}...")
print()

# Get probability list
print(f"Probabilities: {[round(p, 3) for p in dist.p]}")
print(f"Sum of probabilities: {sum(dist.p):.6f}")

Length: 5
First item: The morning Harold found the silver lock...

All items (descending by p):
  1. p=0.220 - The morning Harold found the silver locket on his ...
  2. p=0.210 - No one noticed the missing painting at first, exce...
  3. p=0.210 - They say the dead can’t speak, but the message scr...
  4. p=0.190 - The clock struck midnight when the scream shattere...
  5. p=0.170 - As the sun melted into the horizon, Detective Mars...

Probabilities: [0.22, 0.21, 0.21, 0.19, 0.17]
Sum of probabilities: 1.000000


### 2.3 Trace Metadata

Every distribution includes detailed trace information for debugging and auditing:

In [5]:
print("Trace metadata:")
for key, value in dist.trace.items():
    print(f"  {key}: {value}")

Trace metadata:
  model: gpt-4.1
  provider: openai
  latency_ms: 4613.29984664917
  k: 5
  tau: 0.12
  temperature: 0.9
  seed: 42
  use_strict_json: True
  probability_definition: explicit
  tau_relaxed: False
  tau_final: 0.12
  num_filtered: 0
  weight_mode: elicited


---

## 3. Selection Methods

There are two equivalent ways to select from a distribution: methods on the distribution itself, or the `select()` helper.

In [6]:
dist = verbalize("Write a creative greeting", k=5, temperature=0.8, seed=100)

# Method 1: Direct methods on distribution
best = dist.argmax()
sampled = dist.sample(seed=42)

print("Direct methods:")
print(f"  argmax(): {best.text}")
print(f"  sample(seed=42): {sampled.text}")
print()

# Method 2: Using select() helper
best2 = select(dist, strategy="argmax")
sampled2 = select(dist, strategy="sample", seed=42)

print("Using select() helper:")
print(f"  select(dist, 'argmax'): {best2.text}")
print(f"  select(dist, 'sample', seed=42): {sampled2.text}")
print()

# Demonstrate sampling variability
print("Multiple samples (different seeds):")
for i in range(5):
    item = dist.sample(seed=200 + i)
    print(f"  {i+1}. {item.text}")

Tuning probability to 0
Distribution definition: Randomly sample the responses from the full distribution.
responses: [{'response': 'Salutations, fellow traveler of the cosmos! May your journey today be filled with stardust and serendipity.', 'probability': 0.12}, {'response': 'Hey there, radiant soul! Ready to make some magic happen today?', 'probability': 0.18}, {'response': 'Greetings and jubilations! May your day shimmer with possibilities and cheerful surprises.', 'probability': 0.15}, {'response': 'Ahoy, adventurer! The universe just rolled out a red carpet for your arrival.', 'probability': 0.2}, {'response': 'Waves a wand✨ Hello and welcome! May your day unfold like a storybook of wonders.', 'probability': 0.13}]
Direct methods:
  argmax(): Ahoy, adventurer! The universe just rolled out a red carpet for your arrival.
  sample(seed=42): Greetings and jubilations! May your day shimmer with possibilities and cheerful surprises.

Using select() helper:
  select(dist, 'argmax'): Aho
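
To make the two selection strategies concrete, here is a minimal stand-alone sketch on plain `(text, mass)` tuples. It is an assumption about how such selection can be implemented with the standard library, not the library's own code; the helper names and the masses are hypothetical.

```python
import random


def argmax(items):
    """Return the (text, mass) pair with the largest mass; the first one wins on ties."""
    return max(items, key=lambda pair: pair[1])


def sample(items, seed=None):
    """Draw one pair with probability proportional to its mass, using a private RNG."""
    rng = random.Random(seed)  # a local generator leaves the global random state untouched
    pairs = list(items)
    weights = [mass for _, mass in pairs]
    return rng.choices(pairs, weights=weights, k=1)[0]


# Hypothetical masses, for illustration only.
items = [("Ahoy", 0.40), ("Hey there", 0.35), ("Greetings", 0.25)]

print(argmax(items))
print(sample(items, seed=42) == sample(items, seed=42))  # same seed, same draw

# Over many draws, empirical frequencies approach the masses.
rng = random.Random(0)
draws = rng.choices([t for t, _ in items], weights=[m for _, m in items], k=10_000)
freqs = {t: draws.count(t) / len(draws) for t, _ in items}
print(freqs)
```

Seeding a private `random.Random` instance is what makes a draw repeatable without affecting other code that uses the global generator.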

---

## 4. Transforms: Functional Operations (plan.md §13.4)

All transforms return new `DiscreteDist` objects and preserve invariants (Σp=1, descending order).

### 4.1 `map()` - Transform text while preserving probabilities

In [7]:
dist = verbalize("Write a color name", k=5, temperature=0.8, seed=300)

print("Original:")
for item in dist[:3]:
    print(f"  {item.p:.3f} - {item.text}")
print()

# Transform: uppercase
uppercased = dist.map(lambda it: it.text.upper())
print("After map(uppercase):")
for item in uppercased[:3]:
    print(f"  {item.p:.3f} - {item.text}")
print()

# Transform: add prefix
prefixed = dist.map(lambda it: f"Color: {it.text.strip()}")
print("After map(add prefix):")
for item in prefixed[:3]:
    print(f"  {item.p:.3f} - {item.text}")

Tuning probability to 0
Distribution definition: Randomly sample the responses from the full distribution.
responses: [{'response': 'Cerulean', 'probability': 0.14}, {'response': 'Mauve', 'probability': 0.11}, {'response': 'Amber', 'probability': 0.16}, {'response': 'Turquoise', 'probability': 0.17}, {'response': 'Chartreuse', 'probability': 0.09}]
Original:
  0.362 - Turquoise
  0.340 - Amber
  0.298 - Cerulean

After map(uppercase):
  0.362 - TURQUOISE
  0.340 - AMBER
  0.298 - CERULEAN

After map(add prefix):
  0.362 - Color: Turquoise
  0.340 - Color: Amber
  0.298 - Color: Cerulean


### 4.2 `filter_items()` - Remove items and renormalize

In [8]:
dist = verbalize("Write a product name (1-2 words)", k=8, temperature=0.9, seed=400)

print(f"Original ({len(dist)} items):")
for item in dist:
    print(f"  {item.p:.3f} - {item.text} (length={len(item.text)})")
print()

# Filter: keep only short responses
short = dist.filter_items(lambda it: len(it.text) < 20)
print(f"After filter (length < 20): {len(short)} items")
for item in short:
    print(f"  {item.p:.3f} - {item.text}")
print()

# Verify renormalization
total = sum(item.p for item in short)
print(f"Total probability after filter: {total:.6f}")

Tuning probability to 0
Distribution definition: Randomly sample the responses from the full distribution.
responses: [{'response': 'AquaBliss', 'probability': 0.11}, {'response': 'ZenCharge', 'probability': 0.1}, {'response': 'SwiftSync', 'probability': 0.14}, {'response': 'GlowMate', 'probability': 0.13}, {'response': 'PureNest', 'probability': 0.09}, {'response': 'EcoScribe', 'probability': 0.12}, {'response': 'Lumeo', 'probability': 0.17}, {'response': 'SnapLink', 'probability': 0.14}]
Original (5 items):
  0.243 - Lumeo (length=5)
  0.200 - SwiftSync (length=9)
  0.200 - SnapLink (length=8)
  0.186 - GlowMate (length=8)
  0.171 - EcoScribe (length=9)

After filter (length < 20): 5 items
  0.243 - Lumeo
  0.200 - SwiftSync
  0.200 - SnapLink
  0.186 - GlowMate
  0.171 - EcoScribe

Total probability after filter: 1.000000
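
In the run above every name is shorter than 20 characters, so the predicate removes nothing and the masses stay unchanged. To see renormalization take effect, here is a small stand-alone sketch with a stricter, hypothetical predicate (`len(text) < 9`). It works on plain `(text, mass)` tuples and is not the library's `filter_items()` implementation.

```python
def filter_and_renormalize(items, predicate):
    """Hypothetical helper: keep items passing the predicate, then rescale to sum to 1."""
    kept = [(text, mass) for text, mass in items if predicate(text)]
    if not kept:
        raise ValueError("predicate removed every item")
    total = sum(mass for _, mass in kept)
    return [(text, mass / total) for text, mass in kept]


# Masses as printed (rounded) in the output above.
items = [("Lumeo", 0.243), ("SwiftSync", 0.200), ("SnapLink", 0.200),
         ("GlowMate", 0.186), ("EcoScribe", 0.171)]

short = filter_and_renormalize(items, lambda text: len(text) < 9)
for text, p in short:
    print(f"{p:.3f} - {text}")
print(f"Total: {sum(p for _, p in short):.6f}")
```

The two nine-character names are dropped, and the three survivors are rescaled so their masses again sum to 1 while keeping their relative order.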


### 4.3 `reweight()` - Recompute probabilities and renormalize

In [9]:
dist = verbalize("Write a city name", k=6, temperature=0.8, seed=500)

print("Original probabilities:")
for item in dist:
    print(f"  {item.p:.3f} - {item.text} (length={len(item.text)})")
print()

# Reweight: favor longer city names
reweighted = dist.reweight(lambda it: len(it.text))
print("After reweight(by length):")
for item in reweighted:
    print(f"  {item.p:.3f} - {item.text} (length={len(item.text)})")

Tuning probability to 0
Distribution definition: Randomly sample the responses from the full distribution.
responses: [{'response': 'Tokyo', 'probability': 0.18}, {'response': 'San Francisco', 'probability': 0.13}, {'response': 'Cairo', 'probability': 0.12}, {'response': 'Paris', 'probability': 0.2}, {'response': 'Sydney', 'probability': 0.17}, {'response': 'Berlin', 'probability': 0.1}]
Original probabilities:
  0.250 - Paris (length=5)
  0.225 - Tokyo (length=5)
  0.212 - Sydney (length=6)
  0.163 - San Francisco (length=13)
  0.150 - Cairo (length=5)

After reweight(by length):
  0.382 - San Francisco (length=13)
  0.176 - Sydney (length=6)
  0.147 - Paris (length=5)
  0.147 - Tokyo (length=5)
  0.147 - Cairo (length=5)
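
The output above is consistent with `reweight()` replacing each mass by the function's value and then renormalizing, rather than multiplying the old mass by it: the lengths 13, 6, 5, 5, 5 sum to 34, and 13/34 ≈ 0.382, 6/34 ≈ 0.176, 5/34 ≈ 0.147. A minimal sketch of that reading follows. It operates on plain tuples, its weight function takes the text rather than an `Item`, and it is not taken from the library's source.

```python
def reweight(items, weight_fn):
    """Hypothetical helper: assign new weights, rescale to sum to 1, sort descending."""
    new_weights = [(text, float(weight_fn(text))) for text, _ in items]
    total = sum(w for _, w in new_weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    normalized = [(text, w / total) for text, w in new_weights]
    # sorted() is stable, so items with equal mass keep their previous relative order.
    return sorted(normalized, key=lambda pair: pair[1], reverse=True)


# Items in the order printed under "Original probabilities" above.
cities = [("Paris", 0.250), ("Tokyo", 0.225), ("Sydney", 0.212),
          ("San Francisco", 0.163), ("Cairo", 0.150)]

reweighted = reweight(cities, len)
for text, p in reweighted:
    print(f"{p:.3f} - {text} (length={len(text)})")
```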


### 4.4 Chained Transforms (plan.md §13.4)

In [10]:
dist = verbalize("Write a sentence about the ocean", k=8, temperature=0.9, seed=600)

print(f"Original ({len(dist)} items):")
for item in dist[:3]:
    print(f"  {item.p:.3f} - {item.text[:60]}...")
print()

# Chain: clean → filter → reweight
cleaned = (dist
           .map(lambda it: it.text.strip())
           .filter_items(lambda it: len(it.text) < 100)
           .reweight(lambda it: it.meta.get("p_raw", 0.1)))

print(f"After chain ({len(cleaned)} items):")
for item in cleaned:
    print(f"  {item.p:.3f} - {item.text[:60]}...")
print()

# Verify invariants
total = sum(item.p for item in cleaned)
print(f"Total probability: {total:.6f}")
print(f"Descending order: {all(cleaned[i].p >= cleaned[i+1].p for i in range(len(cleaned)-1))}")

Tuning probability to 0
Distribution definition: Randomly sample the responses from the full distribution.
responses: [{'response': 'The vast, blue ocean stretches endlessly towards the horizon, its waves shimmering under the sunlight.', 'probability': 0.19}, {'response': "Beneath the ocean's surface, a diverse world of marine life thrives in mysterious depths.", 'probability': 0.14}, {'response': 'The soothing sound of ocean waves brings a sense of calm to all who listen.', 'probability': 0.12}, {'response': "The ocean covers more than seventy percent of the Earth's surface, connecting continents and cultures.", 'probability': 0.16}, {'response': 'Gentle sea breezes carry the salty scent of the ocean across sandy shores.', 'probability': 0.09}, {'response': 'Every day, the tides of the ocean rise and fall, shaping coastlines and ecosystems.', 'probability': 0.11}, {'response': "The ocean's deep blue waters hide countless secrets yet to be discovered.", 'probability': 0.12}, {'response

---

## 5. Practical Recipes (plan.md §13)

### 5.1 Creative Writing - Diversity Without Quality Loss

In [11]:
dist = verbalize("Write five first lines for a cozy mystery", k=5, tau=0.12, seed=42)

print(dist.to_markdown())
print()

best = dist.argmax()
print(f"Best opening (argmax): {best.text}")
print(f"Probability: {best.p:.3f}")

Tuning probability to 0
Distribution definition: Randomly sample the responses from the full distribution.
responses: [{'response': 'Maggie had never expected her quiet life above the bakery to be interrupted by the distant clang of a murder weapon dropped in the alleyway.', 'probability': 0.17}, {'response': 'The teapot whistled shrilly as Violet peered through her lace curtains, watching the new neighbor bury something suspicious in the garden.', 'probability': 0.22}, {'response': "Snow fell softly over Maplewood Lane the morning Mrs. Dalloway found a pair of muddy footprints leading straight to the library's forbidden shelf.", 'probability': 0.19}, {'response': "As the sun rose over Pinegrove, the sleepy town woke to discover the mayor's prize-winning marmalade missing from the county fair tent.", 'probability': 0.15}, {'response': "Eleanor's cat had never brought her anything more alarming than a daisy—until this morning, when it dropped a blood-stained glove at her doorstep.", 'pr

### 5.2 Open-Ended QA - Multi-Valid Answers

In [12]:
dist = verbalize("Name a US state", k=20, tau=0.10, seed=100)

# Calculate unique coverage
unique_states = {it.text.lower().strip() for it in dist}
coverage = len(unique_states)

print(f"Generated {len(dist)} items")
print(f"Unique states: {coverage}")
print(f"Top-1: {dist.argmax().text} (p={dist.argmax().p:.3f})")
print()

print("All states generated:")
for i, item in enumerate(dist, 1):
    print(f"  {i:2d}. {item.p:.3f} - {item.text}")

Tuning probability to 0
Distribution definition: Randomly sample the responses from the full distribution.
responses: [{'response': 'California is a US state located on the west coast, known for its diverse landscapes, large population, and major cities such as Los Angeles, San Francisco, and San Diego.', 'probability': 0.12}, {'response': 'Texas is a US state in the south-central region, recognized for its size, oil industry, and distinct culture.', 'probability': 0.09}, {'response': 'Florida is a southeastern US state famous for its beaches, theme parks, and warm climate.', 'probability': 0.09}, {'response': 'New York is a northeastern US state, home to the city of New York and known for its cultural and financial influence.', 'probability': 0.08}, {'response': 'Illinois is a midwestern US state, known for the city of Chicago and its contribution to American industry.', 'probability': 0.07}, {'response': 'Ohio is a US state located in the Midwest, notable for its major cities and ric

### 5.3 Negative Synthetic Data - Plausible But Wrong

In [13]:
problem = """
A store sells apples for $2 each and oranges for $3 each.
If you buy 5 apples and 4 oranges, how much do you spend in total?
"""

prompt = f"""Give five plausible but incorrect solution sketches for this math problem:
{problem.strip()}

Each should be a common mistake students make (e.g., wrong operation, calculation error, misreading)."""

dist = verbalize(prompt, k=5, tau=0.12, temperature=0.9, seed=200)

print("Plausible negative examples (for contrastive training):")
print()
for i, item in enumerate(dist, 1):
    print(f"{i}. (p={item.p:.3f})")
    print(f"   {item.text[:150]}...")
    print()

Tuning probability to 0
Distribution definition: Randomly sample the responses from the full distribution.
responses: [{'response': 'First, add the prices together: $2 + $3 = $5. Then, multiply by the total number of fruits: 5 apples + 4 oranges = 9 fruits. $5 × 9 = $45. So, you spend $45 in total.', 'probability': 0.21}, {'response': 'Multiply the number of apples by the number of oranges: 5 × 4 = 20. Then, multiply by the price of apples: 20 × $2 = $40. So, the total is $40.', 'probability': 0.17}, {'response': 'Multiply the number of apples and oranges by their prices separately: 5 × $2 = $10 and 4 × $3 = $12. Then add the prices of apples and oranges together first: $2 + $3 = $5. Then, multiply by the total number of fruits: 9 × $5 = $45. So, you spend $45 in total.', 'probability': 0.23}, {'response': 'Take the price of one apple, $2, and multiply it by the total number of fruits bought: 9 × $2 = $18. So, you spend $18 in total.', 'probability': 0.18}, {'response': 'Add the number

---

## 6. Weight Modes

Three strategies for handling weights: `elicited` (use the model's probabilities when valid), `uniform` (equal weights), and `softmax` (smoothed model probabilities).

In [14]:
prompt = "Name an occupation that requires a college degree"

# Elicited: use model's probabilities (if valid)
dist_elicited = verbalize(prompt, k=5, weight_mode="elicited", seed=42)
print("Elicited weights:")
for item in dist_elicited:
    print(f"  {item.p:.3f} - {item.text}")
print()

# Uniform: equal weights (bias mitigation)
dist_uniform = verbalize(prompt, k=5, weight_mode="uniform", seed=42)
print("Uniform weights (equal probability):")
for item in dist_uniform:
    print(f"  {item.p:.3f} - {item.text}")
print()

# Softmax: smooth model's probabilities
dist_softmax = verbalize(prompt, k=5, weight_mode="softmax", seed=42)
print("Softmax weights:")
for item in dist_softmax:
    print(f"  {item.p:.3f} - {item.text}")

Tuning probability to 0
Distribution definition: Randomly sample the responses from the full distribution.
responses: [{'response': "An occupation that requires a college degree is a civil engineer. Civil engineers typically need at least a bachelor's degree in civil engineering or a related field to enter the profession. Their work involves designing, building, and maintaining infrastructure such as roads, bridges, and water systems, often requiring specialized education and training that a college degree provides.", 'probability': 0.34}, {'response': 'A registered nurse often requires a college degree, such as a Bachelor of Science in Nursing (BSN), to practice professionally in hospitals and clinics.', 'probability': 0.27}, {'response': 'An architect is an occupation that requires a college degree, typically a Bachelor’s or Master’s in Architecture, to be licensed and practice professionally.', 'probability': 0.15}, {'response': "A software developer is a profession that usually req
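
The difference between the three modes can be illustrated on raw numbers. The sketch below is an assumption about what each mode computes, not the library's code: in particular, the softmax shown here is the textbook `exp(w / T)` form with a hypothetical `temperature` parameter, and the library may define it differently. The input uses the first three elicited weights visible in the (truncated) output above.

```python
import math


def elicited_weights(raw):
    """Use the model's weights as-is, rescaled to sum to 1."""
    total = sum(raw)
    return [w / total for w in raw]


def uniform_weights(raw):
    """Ignore the model's weights and give every candidate equal mass."""
    return [1.0 / len(raw)] * len(raw)


def softmax_weights(raw, temperature=1.0):
    """Assumed definition: exp(w / T), normalized. Flattens differences between weights."""
    scaled = [w / temperature for w in raw]
    peak = max(scaled)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


raw = [0.34, 0.27, 0.15]
for name, fn in [("elicited", elicited_weights),
                 ("uniform", uniform_weights),
                 ("softmax", softmax_weights)]:
    print(name, [round(p, 3) for p in fn(raw)])
```

Under these assumptions, softmax keeps the ranking of the elicited weights but pulls the masses closer together, while uniform discards the ranking entirely.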

---

## 7. Serialization

Save and load distributions for caching or auditing:

In [15]:
# Generate distribution
dist = verbalize("Write a one-sentence joke", k=3, seed=42)

# Serialize to dict
data = dist.to_dict()
print("Serialized to dict:")
print(f"  Items: {len(data['items'])}")
print(f"  Trace keys: {list(data['trace'].keys())}")
print()

# Pretty print as JSON
print("JSON representation (first 500 chars):")
json_str = json.dumps(data, indent=2)
print(json_str[:500] + "...")
print()

# Reconstruct
reconstructed = DiscreteDist.from_dict(data)
print("Reconstructed distribution:")
print(f"  Length: {len(reconstructed)}")
print(f"  Top item: {reconstructed.argmax().text}")
print(f"  Probabilities match: {reconstructed.p == dist.p}")

Tuning probability to 0
Distribution definition: Randomly sample the responses from the full distribution.
responses: [{'response': 'Why did the scarecrow win an award? Because he was outstanding in his field!', 'probability': 0.24}, {'response': 'Parallel lines have so much in common—it’s a shame they’ll never meet.', 'probability': 0.17}, {'response': 'I asked the gym instructor if he could teach me to do the splits, but he said, “How flexible are you?” and I said, “I can’t make it on Tuesdays.”', 'probability': 0.12}]
Serialized to dict:
  Items: 3
  Trace keys: ['model', 'provider', 'latency_ms', 'k', 'tau', 'temperature', 'seed', 'use_strict_json', 'probability_definition', 'tau_relaxed', 'tau_final', 'num_filtered', 'weight_mode']

JSON representation (first 500 chars):
{
  "items": [
    {
      "text": "Why did the scarecrow win an award? Because he was outstanding in his field!",
      "p": 0.45283018867924524,
      "meta": {
        "p_raw": 0.24,
        "p_clipped": 0.24,
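
For caching, the dictionary form can be written to disk and read back with the standard `json` module. The sketch below uses a hand-built payload that mimics the structure shown above (`items` with `text`, `p`, `meta`, plus a `trace`); the texts and the file name are placeholders. In the notebook you would pass `dist.to_dict()` instead and rebuild with `DiscreteDist.from_dict(restored)`.

```python
import json
import os
import tempfile

# Placeholder payload shaped like the serialized output above.
raw = [0.24, 0.17, 0.12]
total = sum(raw)
payload = {
    "items": [
        {"text": f"joke {i}", "p": w / total, "meta": {"p_raw": w, "p_clipped": w}}
        for i, w in enumerate(raw)
    ],
    "trace": {"k": 3, "seed": 42},
}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "dist.json")
    # Write the payload; ensure_ascii=False keeps curly quotes and other non-ASCII text readable.
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(payload, fh, ensure_ascii=False, indent=2)
    # Read it back as a plain dict.
    with open(path, encoding="utf-8") as fh:
        restored = json.load(fh)

print(restored == payload)
```

Python's `json` module writes floats using their shortest round-trippable representation, so the probabilities compare equal after loading.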


---

## 8. Advanced: Tau Relaxation

This cell shows how `tau` filtering interacts with `min_k_survivors`:

In [16]:
prompt = "Write a product slogan for eco-friendly water bottles"

# Low tau = more diversity
dist_diverse = verbalize(prompt, k=8, tau=0.05, temperature=0.9, seed=500)
print(f"Low tau (0.05): {len(dist_diverse)} items survived")
print(f"  Min probability: {min(it.p for it in dist_diverse):.3f}")
print(f"  Tau relaxed: {dist_diverse.trace.get('tau_relaxed')}")
print()

# High tau = less diversity
dist_focused = verbalize(prompt, k=8, tau=0.20, temperature=0.9, seed=500)
print(f"High tau (0.20): {len(dist_focused)} items survived")
print(f"  Min probability: {min(it.p for it in dist_focused):.3f}")
print(f"  Tau relaxed: {dist_focused.trace.get('tau_relaxed')}")
print()

# Very high tau with min_k_survivors
dist_relaxed = verbalize(prompt, k=8, tau=0.50, min_k_survivors=3, temperature=0.9, seed=500)
print("Very high tau (0.50) with min_k_survivors=3:")
print(f"  Tau relaxed: {dist_relaxed.trace.get('tau_relaxed')}")
print(f"  Tau final: {dist_relaxed.trace.get('tau_final', 0):.3f}")
print(f"  Items: {len(dist_relaxed)}")

Tuning probability to 0
Distribution definition: Randomly sample the responses from the full distribution.
responses: [{'response': 'Sip Sustainably, Refresh Responsibly.', 'probability': 0.16}, {'response': 'Drink Green, Live Clean.', 'probability': 0.14}, {'response': 'Hydrate Your World—One Bottle at a Time.', 'probability': 0.1}, {'response': 'Good for You, Better for the Planet.', 'probability': 0.13}, {'response': 'Pure Water, Pure Conscience.', 'probability': 0.12}, {'response': 'Quench Your Thirst, Conserve the Earth.', 'probability': 0.09}, {'response': 'Eco Chic. Earth Friendly. Always Hydrated.', 'probability': 0.11}, {'response': 'Refill. Reuse. Restore the Planet.', 'probability': 0.15}]
Low tau (0.05): 8 items survived
  Min probability: 0.090
  Tau relaxed: False

Tuning probability to 0
Distribution definition: Randomly sample the responses from the full distribution.
responses: [{'response': 'Sip Sustainably, Refresh Responsibly.', 'probability': 0.16}, {'response': 'D
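
One plausible way to read `tau_relaxed` and `tau_final` is that the threshold is lowered when fewer than `min_k_survivors` candidates pass it. The sketch below implements that reading on the elicited weights from the first response above. The relaxation rule (lower `tau` to the k-th largest weight) and the helper name are assumptions for illustration; the library's actual policy may differ.

```python
def filter_with_relaxation(weights, tau, min_k_survivors=1):
    """Assumed policy: if too few weights reach tau, lower tau to the k-th largest weight.

    Returns (survivors, tau_final, tau_relaxed).
    """
    if not weights:
        raise ValueError("weights must not be empty")
    survivors = [w for w in weights if w >= tau]
    if len(survivors) >= min_k_survivors:
        return survivors, tau, False
    # Too few survivors: relax tau just enough to admit the top min_k_survivors weights.
    ranked = sorted(weights, reverse=True)
    k = min(min_k_survivors, len(ranked))
    tau_final = ranked[k - 1]
    survivors = [w for w in weights if w >= tau_final]
    return survivors, tau_final, True


weights = [0.16, 0.14, 0.10, 0.13, 0.12, 0.09, 0.11, 0.15]

low = filter_with_relaxation(weights, tau=0.05)
high = filter_with_relaxation(weights, tau=0.50, min_k_survivors=3)
print(f"tau=0.05: {len(low[0])} survivors, relaxed={low[2]}")
print(f"tau=0.50: {len(high[0])} survivors, tau_final={high[1]}, relaxed={high[2]}")
```

With ties at the relaxed threshold, this rule can admit more than `min_k_survivors` items, since every weight equal to `tau_final` passes.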

---

## 9. Provider Selection

In [17]:
prompt = "Write a haiku about programming"

# Auto-detect provider based on API keys
dist_auto = verbalize(prompt, k=3, provider="auto")
print("Auto provider:")
print(f"  Provider: {dist_auto.trace.get('provider')}")
print(f"  Model: {dist_auto.trace.get('model')}")
print()

# Explicit provider and model
dist_explicit = verbalize(prompt, k=3, provider="openai", model="gpt-4o-mini")
print("Explicit provider:")
print(f"  Provider: {dist_explicit.trace.get('provider')}")
print(f"  Model: {dist_explicit.trace.get('model')}")

Tuning probability to 0
Distribution definition: Randomly sample the responses from the full distribution.
responses: [{'response': 'Silent code at night,\nLogic weaving through the keys—\nBugs fade in the dawn.', 'probability': 0.32}, {'response': 'Lines of code appear,\nIdeas bloom in silence—\nDreams debugged by light.', 'probability': 0.28}, {'response': 'Bits and bytes align,\nWhile the cursor softly blinks—\nNew worlds are created.', 'probability': 0.22}]
Auto provider:
  Provider: openai
  Model: gpt-4.1

Tuning probability to 0
Distribution definition: Randomly sample the responses from the full distribution.
responses: [{'response': 'Lines of code entwine,  \nLogic dances in silence,  \nDreams in bits refined.', 'probability': 0.45}, {'response': 'Fingers on the keys,  \nSyntax whispers in the dark,  \nIdeas come to life.', 'probability': 0.35}, {'response': 'Debugging the night,  \nErrors fade like distant stars,  \nA new dawn of code.', 'probability': 0.2}]
Explicit provider

---

## 10. End-to-End Example (plan.md §22)

In [18]:
# 1) Elicit a tiny distribution
dist = verbalize(
    "Write five short product taglines for a minimalist desk lamp.",
    k=5, tau=0.12, temperature=0.9, seed=42
)

# 2) Inspect
print(dist.to_markdown())
print()

# 3) Serialize (for audit or caching)
payload = dist.to_dict()
print(f"Serialized: {len(json.dumps(payload))} bytes")
print()

# 4) Deterministic top pick
print(f"Top pick (argmax): {dist.argmax().text}")
print()

# 5) Seeded weighted sampling (repeatable)
print(f"Sampled (seed=42): {dist.sample(seed=42).text}")
print()

# 6) Optional neutral helper (same as methods)
print(f"Using select(): {select(dist, 'argmax').text}")

Tuning probability to 0
Distribution definition: Randomly sample the responses from the full distribution.
responses: [{'response': 'Illuminate Simplicity. Elevate Productivity.', 'probability': 0.19}, {'response': 'Where Form Meets Function. Light, Refined.', 'probability': 0.24}, {'response': 'Pure Light. Pure Focus. Pure Minimalism.', 'probability': 0.17}, {'response': 'A Brighter Workspace, Minimal Fuss.', 'probability': 0.21}, {'response': 'Sleek Design. Uncluttered Illumination.', 'probability': 0.19}]
# verbalized-sampling
k=5  τ=0.12  Σp=1.000  model=gpt-4.1

1. 0.240  "Where Form Meets Function. Light, Refined."  []
2. 0.210  "A Brighter Workspace, Minimal Fuss."  []
3. 0.190  "Illuminate Simplicity. Elevate Productivity."  []
4. 0.190  "Sleek Design. Uncluttered Illumination."  []
5. 0.170  "Pure Light. Pure Focus. Pure Minimalism."  []

Serialized: 995 bytes

Top pick (argmax): Where Form Meets Function. Light, Refined.

Sampled (seed=42): Illuminate Simplicity. Elevate Prod

---

# Summary & Key Takeaways

## What We've Demonstrated

✅ **One-Liner API**: `verbalize()` returns a `DiscreteDist` with ergonomic selection

✅ **Deterministic Semantics**: Filter → Normalize → Order pipeline with full auditability

✅ **Distribution-First Mental Model**: Work with distributions, not single samples

✅ **Functional Transforms**: `map()`, `filter_items()`, `reweight()` preserve invariants

✅ **Practical Recipes**: Creative writing, QA, synthetic data generation

## Core Principles

1. **Ask for a distribution, not a sample** - Get `k` candidates with weights in one call
2. **Weights are sampling masses** - Not calibrated probabilities; we normalize and expose repairs
3. **Deterministic & auditable** - Stable ordering, recorded repairs, full trace metadata
4. **Composable & safe** - Transforms preserve invariants (Σp=1, descending order)

## Key Parameters

- **`k`**: Number of candidates (breadth)
- **`tau`**: Probability threshold for filtering (diversity knob)
- **`temperature`**: LLM sampling temperature (orthogonal to verbalized sampling)
- **`weight_mode`**: How candidate weights are assigned (`elicited`, `uniform`, `softmax`)
- **`seed`**: For reproducibility

## When to Use Verbalized Sampling

✨ **Creative tasks** - Story openings, product names, brainstorming

✨ **Open-ended QA** - Multiple valid answers, coverage metrics

✨ **Synthetic data** - Generate diverse training examples with weights

✨ **Bias mitigation** - Use `weight_mode="uniform"` to equalize probabilities

✨ **Exploration** - Sample from the distribution's "tails" for novelty

---

## 🔗 Resources

- [ArXiv Paper](https://arxiv.org/abs/2510.01171)
- [GitHub Repository](https://github.com/CHATS-lab/verbalized-sampling)
- [PyPI Package](https://pypi.org/project/verbalized-sampling/)
- [Documentation](https://github.com/CHATS-lab/verbalized-sampling/blob/main/README.md)