# 2.3 LLM Generation Parameters

## Playground Notebook

Generation parameters are the **knobs and dials** that control how a model produces text. They don't change *what* the model knows; they change *how* it picks each next token from the probabilities it predicts.

| Parameter | What It Controls |
|-----------|------------------|
| **Temperature** | Randomness and creativity in outputs |
| **Top-P** | Limits token selection to a cumulative probability threshold |
| **Top-K** | Limits sampling to only the K most likely tokens |
| **Max Tokens** | Maximum number of tokens the model can generate |
| **Frequency Penalty** | Penalizes tokens based on how often they've appeared |
| **Presence Penalty** | Penalizes any token that has appeared at least once |
| **Stop Sequences** | Strings that immediately halt generation |

---

In [1]:
import json
import time
from IPython.display import display, Markdown, HTML
from langchain_ollama import ChatOllama
from langchain_core.messages import SystemMessage, HumanMessage

# ============================================================
#  CONFIGURATION - Change the model name here if needed
# ============================================================
MODEL = "qwen2.5:1.5b"  # Options: "qwen2.5:1.5b", "llama3.2", "mistral", "gemma2", etc.



In [2]:
# ============================================================
#  HELPER FUNCTIONS
# ============================================================

def generate(prompt, system="You are a helpful assistant.", **kwargs):
    """Send a prompt with custom generation parameters and return the response."""
    llm = ChatOllama(model=MODEL, **kwargs)
    messages = [SystemMessage(content=system), HumanMessage(content=prompt)]
    start = time.time()
    response = llm.invoke(messages)
    elapsed = time.time() - start
    content = response.content
    display(Markdown(content))
    print(f"\n⏱️ {elapsed:.2f}s | {len(content)} chars")
    return content


def compare(prompt, configs, system="You are a helpful assistant."):
    """Run the same prompt with different parameter configs side by side."""
    results = {}
    for cfg in configs:
        label = cfg["label"]
        # Copy everything except the label so the caller's dicts are never mutated
        params = {k: v for k, v in cfg.items() if k != "label"}
        print(f"\n{'=' * 60}")
        print(f"  {label}")
        params_str = ', '.join(f'{k}={v}' for k, v in params.items())
        print(f"  Parameters: {params_str}")
        print(f"{'=' * 60}")
        results[label] = generate(prompt, system=system, **params)
    return results


print(f"✅ Using model: {MODEL}")

✅ Using model: qwen2.5:1.5b


---

## 1. Temperature: Controlling Randomness

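Temperature rescales the model's scores before sampling: each logit is divided by the temperature T before the softmax. T < 1 sharpens the distribution toward the most likely tokens, T > 1 flattens it, and T = 0 is treated as greedy decoding: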

```
Temperature = 0.0  →  Always pick the most likely token (deterministic)
Temperature = 0.7  →  Balanced creativity (good default)
Temperature = 1.5  →  Very random, creative, sometimes incoherent
```

**Think of it like a dial:**
```
FOCUSED ◄──────────────────────► CREATIVE
  0.0       0.3    0.7    1.0      1.5+
```
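As a rough illustration of what the dial does, the sketch below applies temperature to a hypothetical list of three logits: divide by T, then softmax, with T = 0 handled as greedy decoding. The function name `softmax_with_temperature` and the logit values are made up for this example; a real backend does the same math over the whole vocabulary.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw logits into probabilities after dividing them by `temperature`.

    temperature == 0 is treated as greedy decoding: all probability goes to the
    highest-scoring token (the first one wins a tie).
    """
    if temperature < 0:
        raise ValueError("temperature must be >= 0")
    if temperature == 0:
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max before exp() for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for three candidate next tokens
logits = [2.0, 1.0, 0.5]
for t in [0.0, 0.7, 1.0, 1.5]:
    probs = softmax_with_temperature(logits, t)
    print(f"T={t}: {[round(p, 3) for p in probs]}")
```

Lower T pushes more probability onto the first token; higher T shifts some of it toward the less likely ones.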

### Experiment 1A: Low vs. Medium vs. High Temperature

In [4]:
prompt = "Write a one-sentence description of the ocean."

configs = [
    {"label": "üßä Temperature = 0.0 (Deterministic)", "temperature": 0.0},
    {"label": "‚öñÔ∏è Temperature = 0.7 (Balanced)",     "temperature": 0.7},
    {"label": "üî• Temperature = 1.5 (Very Creative)", "temperature": 1.5},
]

_ = compare(prompt, configs)


  🧊 Temperature = 0.0 (Deterministic)
  Parameters: temperature=0.0


The vast and mysterious ocean covers approximately 71% of Earth's surface and is home to an incredible array of life forms, from tiny plankton to massive whales.


⏱️ 1.90s | 161 chars

  ⚖️ Temperature = 0.7 (Balanced)
  Parameters: temperature=0.7


The vast and mysterious ocean covers more than 70% of our planet's surface and is home to an incredible array of life forms.


⏱️ 0.69s | 124 chars

  🔥 Temperature = 1.5 (Very Creative)
  Parameters: temperature=1.5


The vast,蔚蓝、波澜壮阔的海洋，是地球上生命的重要源泉。
[English: "The vast, azure, magnificently surging ocean is an important source of life on Earth." The model switched to Chinese mid-sentence.]


⏱️ 0.58s | 32 chars


### Experiment 1B: Temperature & Consistency

At **temperature 0**, the model should produce the *same* output every time. At higher temperatures, runs can differ. Let's verify.

In [5]:
prompt = "Name one color."
num_runs = 4

for temp in [0.0, 1.0]:
    print(f"\n{'=' * 60}")
    print(f"  Temperature = {temp}: running {num_runs} times")
    print(f"{'=' * 60}")
    outputs = []
    for i in range(num_runs):
        llm = ChatOllama(model=MODEL, temperature=temp)
        resp = llm.invoke([HumanMessage(content=prompt)])
        text = resp.content.strip()
        outputs.append(text)
        print(f"  Run {i+1}: {text[:80]}")
    unique = len(set(outputs))
    print(f"  → Unique outputs: {unique}/{num_runs}")


  Temperature = 0.0: running 4 times
  Run 1: Blue is one of the colors.
  Run 2: Blue is one of the colors.
  Run 3: Blue is one of the colors.
  Run 4: Blue is one of the colors.
  → Unique outputs: 1/4

  Temperature = 1.0: running 4 times
  Run 1: I'll choose blue as the color to name.
  Run 2: Sure! Here's a random choice: **Blue**. Blue is the color associated with calmne
  Run 3: Blue is one of the colors.
  Run 4: Blue is a color.
  → Unique outputs: 4/4


### Experiment 1C: Temperature for Different Tasks

Different tasks need different temperature settings.

In [None]:
tasks = [
    {
        "label": "Factual Q&A (low temp is better)",
        "prompt": "What is the capital of India? Answer in one sentence.",
        "temps": [0.0, 0.7, 1.5]
    },
    {
        "label": "Creative Writing (higher temp is better)",
        "prompt": "Describe a sunset without using the word 'sun' or 'sky'. One sentence only.",
        "temps": [0.0, 0.7, 1.5]
    },
    {
        "label": "Code Generation (low temp is better)",
        "prompt": "Write a Python one-liner that reverses a string.",
        "temps": [0.0, 0.7, 1.5]
    }
]

for task in tasks:
    print(f"\n{'#' * 60}")
    print(f"  TASK: {task['label']}")
    print(f"{'#' * 60}")
    for temp in task["temps"]:
        print(f"\n--- Temperature = {temp} ---")
        _ = generate(task["prompt"], temperature=temp)

    print(f"\nüí° Observation: Compare how temperature affects accuracy vs. creativity above.")



############################################################
  TASK: Factual Q&A (low temp is better)
############################################################

--- Temperature = 0.0 ---


The boiling point of water at standard atmospheric pressure (1 atmosphere) is 100 degrees Celsius.


⏱️ 0.69s | 98 chars

--- Temperature = 0.7 ---


The boiling point of water at standard atmospheric pressure (1 atmosphere) is 100 degrees Celsius.


⏱️ 0.60s | 98 chars

--- Temperature = 1.5 ---


The boiling point of water at standard atmospheric pressure (1 atmosphere or 760 mmHg) is 100°Celsius.


⏱️ 0.97s | 102 chars

💡 Observation: Compare how temperature affects accuracy vs. creativity above.

############################################################
  TASK: Creative Writing (higher temp is better)
############################################################

--- Temperature = 0.0 ---


A gentle wave of orange and pink hues crept across the horizon, painting the twilight sky with its final strokes before night's embrace.


⏱️ 1.72s | 136 chars

--- Temperature = 0.7 ---


As twilight approached, the colors of the setting sun painted the horizon in vivid strokes of orange and pink, while the mountains below were bathed in a gentle warmth that slowly faded into night.


⏱️ 2.32s | 197 chars

--- Temperature = 1.5 ---


A golden orb descends, earthward, in gentle surrender, painting vast lands with hues of twilight's most exquisite palette: pinks, purples, and grays intertwine into serene vistas that seem to hold a cosmic whisper.


⏱️ 2.92s | 214 chars

💡 Observation: Compare how temperature affects accuracy vs. creativity above.

############################################################
  TASK: Code Generation (low temp is better)
############################################################

--- Temperature = 0.0 ---


Here's a Python one-liner to reverse a string:

```python
reverse_string = lambda s: s[::-1]
```

This uses the `s[::-1]` syntax, which is a built-in slice notation in Python that returns a reversed copy of the string `s`. The `lambda` function creates an anonymous function that takes a single argument `s`, and returns its reverse.


⏱️ 4.90s | 333 chars

--- Temperature = 0.7 ---


Here's a one-liner in Python to reverse a string:

```python
reversed_string = ''.join(reversed(string))
```

This line of code takes the input string, `string`, and uses the built-in `reversed()` function to reverse it character by character. The result is then joined back into a single string using the `join()` method with an empty string as the separator.

Note that this one-liner only works if you're working in Python 3.x where strings are Unicode, or if you want to explicitly cast the input to a string type (e.g., `str(input())`).


⏱️ 9.47s | 541 chars

--- Temperature = 1.5 ---


Here is a one-line Python solution using slicing:

```python
reversed_string = s[::-1]
```

However, note that the slicing syntax `[::-1]` creates a copy of the original string in reverse order, so this is just a view and not an actual change to the original string. If you want to modify the original string, you can assign it back to itself:

```python
s = "Hello, World!"
reversed_string = s[::-1]
```

Alternatively, you could use Python's `reversed` function directly on the string:

```python
reversed_string = "".join(reversed("Hello, World!"))
```

However, note that this method also returns a new reversed version of the original string, so it is only for assignment and not modification.


⏱️ 15.98s | 698 chars

💡 Observation: Compare how temperature affects accuracy vs. creativity above.


---

## 2. Top-P (Nucleus Sampling): Probability Threshold

Top-P limits sampling to the **smallest set of tokens** (taken from most to least likely) whose cumulative probability reaches at least P.

```
Top-P = 0.1  →  Only the top ~10% probability mass (very focused)
Top-P = 0.9  →  Top ~90% probability mass (more diverse)
Top-P = 1.0  →  Consider all tokens (no filtering)
```

**Example (next-token probabilities):**
```
Token:   "the"   "a"    "an"   "one"  "my"  "some" ...
Prob:     0.35   0.25   0.15   0.10   0.05   0.03  ...

Top-P=0.5 → selects {"the", "a"} (0.35+0.25=0.60 ≥ 0.5)
Top-P=0.9 → selects {"the", "a", "an", "one", "my"} (0.35+0.25+0.15+0.10+0.05=0.90 ≥ 0.9)
```
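The Top-P rule from the example can be written as a small filter. This is a minimal sketch over the truncated distribution shown above (the `...` tail is left out); `top_p_filter` is a hypothetical helper, and a real sampler would also renormalize the kept probabilities before drawing.

```python
def top_p_filter(probs, top_p):
    """Return the smallest set of tokens (most likely first) whose
    cumulative probability reaches `top_p`."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append(token)
        cumulative += p
        # small tolerance so float rounding (e.g. 0.8999999...) doesn't miss the cutoff
        if cumulative >= top_p - 1e-9:
            break
    return kept


# The truncated distribution from the example above
next_token_probs = {"the": 0.35, "a": 0.25, "an": 0.15, "one": 0.10, "my": 0.05, "some": 0.03}

print(top_p_filter(next_token_probs, 0.5))  # ['the', 'a']
print(top_p_filter(next_token_probs, 0.9))  # ['the', 'a', 'an', 'one', 'my']
```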

### Experiment 2A: Top-P Narrow vs. Wide

In [8]:
prompt = "List 5 unusual hobbies someone might enjoy."

configs = [
    {"label": "üéØ Top-P = 0.1 (Very Focused)",  "top_p": 0.1, "temperature": 0.8},
    {"label": "‚öñÔ∏è Top-P = 0.5 (Moderate)",       "top_p": 0.5, "temperature": 0.8},
    {"label": "üåä Top-P = 0.95 (Diverse)",       "top_p": 0.95, "temperature": 0.8},
]

_ = compare(prompt, configs)


  🎯 Top-P = 0.1 (Very Focused)
  Parameters: top_p=0.1, temperature=0.8


1. Playing the banjo: This instrument is often associated with country music and bluegrass, but it can also be played as an unusual hobby.
2. Collecting rare stamps: Many people collect stamps for fun or to add to their collections of memorabilia.
3. Growing a garden: While many hobbies involve growing plants, some individuals enjoy the challenge of growing unique or exotic species that are not commonly found in local gardens.
4. Playing chess: Chess is often considered a boring hobby, but it can be enjoyed by those who find it challenging and intellectually stimulating.
5. Collecting vintage toys: Some people collect old toys as a way to preserve history and add nostalgia to their homes.


⏱️ 3.30s | 697 chars

  ⚖️ Top-P = 0.5 (Moderate)
  Parameters: top_p=0.5, temperature=0.8


1. Playing with slime: Some people find it fun to create and play with slimy substances.
2. Rock climbing: This adventurous hobby involves scaling steep rock faces, often outdoors or in indoor climbing walls.
3. Making jewelry from recycled materials: Instead of using new metals and gemstones, some individuals collect old items like broken glass, buttons, and metal scraps and transform them into beautiful jewelry pieces.
4. Growing a garden with unusual plants: Some people enjoy growing rare or exotic plants that are not commonly found in their area.
5. Building model airplanes: This hobby involves creating miniature versions of real aircraft using plastic, wood, and other materials.


⏱️ 2.18s | 692 chars

  🌊 Top-P = 0.95 (Diverse)
  Parameters: top_p=0.95, temperature=0.8


1. Collecting rare stamps or coins.
2. Playing Dungeons and Dragons or other tabletop role-playing games.
3. Learning to juggle or perform acrobatics for entertainment.
4. Growing exotic houseplants that require specific care and attention.
5. Knitting or crocheting clothes, accessories, or blankets that are both functional and beautiful.


⏱️ 1.40s | 340 chars


### Experiment 2B: Temperature and Top-P Work Together

Temperature reshapes the next-token distribution and Top-P truncates it. The exact order in which a backend applies them varies, but using both together gives finer control than either one alone.
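To see how the knobs compose, here is a toy sampler over a hypothetical `{token: logit}` dict. It assumes one particular order (Top-K, then temperature and softmax, then Top-P, then a weighted random draw); backends differ in ordering details, so treat `sample_next_token` as an illustration, not a copy of Ollama's sampler.

```python
import math
import random


def sample_next_token(logits, temperature=1.0, top_k=0, top_p=1.0, rng=None):
    """Draw one token from a {token: logit} dict.

    Order used here (one common choice; real backends differ):
    Top-K -> temperature + softmax -> Top-P -> weighted random draw.
    top_k=0 disables Top-K; temperature=0 means greedy decoding.
    """
    rng = rng or random.Random()
    ranked = sorted(logits.items(), key=lambda kv: kv[1], reverse=True)
    if top_k > 0:
        ranked = ranked[:top_k]
    if temperature == 0:
        return ranked[0][0]  # greedy: the highest logit always wins

    # Temperature-scaled softmax (subtract the top logit for numerical stability)
    top_logit = ranked[0][1]
    weights = [math.exp((logit - top_logit) / temperature) for _, logit in ranked]
    total = sum(weights)
    probs = [w / total for w in weights]

    # Top-P: keep the smallest prefix whose cumulative probability reaches top_p
    kept, cumulative = 0, 0.0
    for p in probs:
        kept += 1
        cumulative += p
        if cumulative >= top_p - 1e-9:
            break

    tokens = [token for token, _ in ranked[:kept]]
    # random.choices normalizes the weights itself, so they need not sum to 1
    return rng.choices(tokens, weights=probs[:kept], k=1)[0]


# Hypothetical logits for the first word of a tavern name
logits = {"Dragon": 3.0, "Prancing": 2.5, "Rusty": 1.0, "Velvet": 0.2}
rng = random.Random(42)

print([sample_next_token(logits, temperature=0.2, top_p=0.3, rng=rng) for _ in range(5)])
# ['Dragon', 'Dragon', 'Dragon', 'Dragon', 'Dragon']: the nucleus holds a single token
print([sample_next_token(logits, temperature=1.2, top_p=0.95, rng=rng) for _ in range(5)])
```

With low temperature and low Top-P the nucleus collapses to one token; raising both lets several names compete.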

In [9]:
prompt = "Invent a name for a fantasy tavern."

configs = [
    {"label": "Low Temp + Low Top-P (Most predictable)",   "temperature": 0.2, "top_p": 0.3},
    {"label": "Low Temp + High Top-P",                     "temperature": 0.2, "top_p": 0.95},
    {"label": "High Temp + Low Top-P",                     "temperature": 1.2, "top_p": 0.3},
    {"label": "High Temp + High Top-P (Most creative)",    "temperature": 1.2, "top_p": 0.95},
]

_ = compare(prompt, configs)


  Low Temp + Low Top-P (Most predictable)
  Parameters: temperature=0.2, top_p=0.3


"Drinking Springs"


⏱️ 0.49s | 18 chars

  Low Temp + High Top-P
  Parameters: temperature=0.2, top_p=0.95


"Feastfire's Forge"


⏱️ 0.30s | 19 chars

  High Temp + Low Top-P
  Parameters: temperature=1.2, top_p=0.3


"Druid's Den"


⏱️ 0.20s | 13 chars

  High Temp + High Top-P (Most creative)
  Parameters: temperature=1.2, top_p=0.95


"The Shadowy Vale"


⏱️ 0.20s | 18 chars


---

## 3. Top-K: Fixed Token Pool Size

Top-K is simpler than Top-P: it always considers exactly the **K most likely tokens**, regardless of their probabilities.

```
Top-K = 1   →  Greedy decoding (always pick the #1 token)
Top-K = 10  →  Choose from top 10 tokens
Top-K = 50  →  Choose from top 50 tokens (more variety)
```
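A minimal Top-K sketch, assuming a hypothetical dictionary of next-token probabilities (the words echo Experiment 3A, but the numbers are invented): keep the K highest-probability tokens and renormalize them so they sum to 1.

```python
def top_k_filter(probs, k):
    """Keep only the k most likely tokens and renormalize their probabilities."""
    if k < 1:
        raise ValueError("k must be >= 1")
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(p for _, p in ranked)
    return {token: p / total for token, p in ranked}


# Hypothetical next-token probabilities
probs = {"Job": 0.50, "Task": 0.20, "Labor": 0.15, "Toil": 0.10, "Effort": 0.05}

print(top_k_filter(probs, 1))  # {'Job': 1.0}, i.e. greedy decoding
print(top_k_filter(probs, 3))  # 'Job', 'Task', 'Labor', renormalized
```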

### Experiment 3A: Top-K Values Compared

In [12]:
prompt = "Give me a one-word synonym for 'Work'."

configs = [
    {"label": "Top-K = 1 (Greedy: always picks top token)",  "top_k": 1,  "temperature": 0.8},
    {"label": "Top-K = 5",                                     "top_k": 5,  "temperature": 0.8},
    {"label": "Top-K = 40 (Default for many models)",          "top_k": 40, "temperature": 0.8},
    {"label": "Top-K = 100 (Wide pool)",                       "top_k": 100, "temperature": 0.8},
]

_ = compare(prompt, configs)


  Top-K = 1 (Greedy: always picks top token)
  Parameters: top_k=1, temperature=0.8


Job


⏱️ 0.77s | 3 chars

  Top-K = 5
  Parameters: top_k=5, temperature=0.8


Task.


⏱️ 3.22s | 5 chars

  Top-K = 40 (Default for many models)
  Parameters: top_k=40, temperature=0.8


Job


⏱️ 3.42s | 3 chars

  Top-K = 100 (Wide pool)
  Parameters: top_k=100, temperature=0.8


Job


⏱️ 3.02s | 3 chars


### Experiment 3B: Top-K Consistency Test

With `top_k=1`, the output should be identical every time (greedy). Let's check.

In [13]:
prompt = "What is 2 + 2? Reply with just the number."
num_runs = 4

for k_val in [1, 50]:
    print(f"\n{'=' * 60}")
    print(f"  Top-K = {k_val}: running {num_runs} times")
    print(f"{'=' * 60}")
    outputs = []
    for i in range(num_runs):
        llm = ChatOllama(model=MODEL, top_k=k_val, temperature=0.8)
        resp = llm.invoke([HumanMessage(content=prompt)])
        text = resp.content.strip()
        outputs.append(text)
        print(f"  Run {i+1}: {text[:80]}")
    unique = len(set(outputs))
    print(f"  → Unique outputs: {unique}/{num_runs}")


  Top-K = 1: running 4 times
  Run 1: 4
  Run 2: 4
  Run 3: 4
  Run 4: 4
  → Unique outputs: 1/4

  Top-K = 50: running 4 times
  Run 1: 4
  Run 2: 4
  Run 3: 4
  Run 4: 4
  → Unique outputs: 1/4


### Top-K vs. Top-P: When to Use Which?

| Feature | Top-K | Top-P |
|---------|-------|-------|
| Pool size | **Fixed** (always K tokens) | **Dynamic** (varies per step) |
| Adapts to confidence? | No | Yes |
| Best for | Simple control | Nuanced generation |
| Common defaults | K=40 | P=0.9 |
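The "Adapts to confidence?" row can be checked with a small sketch. Using two invented ten-token distributions, it counts how many tokens Top-P = 0.9 keeps, while Top-K = 5 always keeps five; `nucleus_size` and `top_k_size` are hypothetical helpers.

```python
def nucleus_size(probs, top_p):
    """How many tokens Top-P keeps for a list of probabilities."""
    cumulative = 0.0
    for i, p in enumerate(sorted(probs, reverse=True), start=1):
        cumulative += p
        if cumulative >= top_p - 1e-9:  # tolerance for float rounding
            return i
    return len(probs)


def top_k_size(probs, k):
    """How many tokens Top-K keeps: always k (or the whole vocabulary if smaller)."""
    return min(k, len(probs))


# Hypothetical 10-token distributions
confident = [0.91] + [0.01] * 9  # nearly all mass on one token, like "2 + 2 ="
uncertain = [0.1] * 10           # many equally plausible continuations

for name, dist in [("confident", confident), ("uncertain", uncertain)]:
    print(f"{name:9s}  top_p=0.9 keeps {nucleus_size(dist, 0.9)}  |  top_k=5 keeps {top_k_size(dist, 5)}")
# confident  top_p=0.9 keeps 1  |  top_k=5 keeps 5
# uncertain  top_p=0.9 keeps 9  |  top_k=5 keeps 5
```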

---

## 4. Max Tokens: Controlling Response Length

**Max Tokens** (called `num_predict` in Ollama) sets a hard ceiling on how many tokens the model generates. It does NOT guarantee that length: the model may stop earlier if it finishes its thought.

```
1 token ≈ 4 characters ≈ ¾ of a word (English, rough rule of thumb)
```
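For budgeting `num_predict`, the rule of thumb above can be turned into a rough estimator. This sketch assumes English text and the ~4 characters (~¾ word) per token ratio; actual counts depend on each model's tokenizer, and `estimate_tokens` / `estimate_num_predict` are hypothetical helpers, not part of any library.

```python
import math


def estimate_tokens(text, chars_per_token=4.0):
    """Very rough English-only token estimate (~4 characters per token).

    Real counts depend on the model's tokenizer; use this only for budgeting.
    """
    return math.ceil(len(text) / chars_per_token)


def estimate_num_predict(target_words, headroom_pct=20):
    """Suggest a num_predict ceiling for a target word count.

    Uses ~4/3 tokens per word (1 token ≈ 3/4 word) plus some headroom, since
    num_predict is a hard cap and a too-tight cap cuts answers off mid-sentence.
    """
    return math.ceil(target_words * 4 * (100 + headroom_pct) / 300)


print(estimate_tokens("The theory of relativity is a physical theory."))  # 12
print(estimate_num_predict(100))  # 160
```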

### Experiment 4A: Varying Max Tokens

In [14]:
prompt = "Explain the theory of relativity in detail."

configs = [
    {"label": "üîπ Max Tokens = 20 (Very Short)",   "num_predict": 20},
    {"label": "üîπ Max Tokens = 80 (Short)",        "num_predict": 80},
    {"label": "üîπ Max Tokens = 300 (Medium)",      "num_predict": 300},
]

_ = compare(prompt, configs)


  🔹 Max Tokens = 20 (Very Short)
  Parameters: num_predict=20


The theory of relativity is a physical theory that describes how space and time relate to matter, according


⏱️ 1.47s | 107 chars

  🔹 Max Tokens = 80 (Short)
  Parameters: num_predict=80


The theory of relativity is a set of scientific theories developed by two physicists, Albert Einstein and Hendrik Lorentz, that describe how space and time relate to each other. The theory was first published in 1905 and has since become one of the most important theoretical frameworks in modern physics.

One of the key ideas behind the theory is that space and time are not absolute quantities -


⏱️ 4.73s | 398 chars

  🔹 Max Tokens = 300 (Medium)
  Parameters: num_predict=300


The theory of relativity is a set of scientific theories that describes how space and time affect objects moving at different speeds or in different gravitational fields, including acceleration. It was developed by two physicists, Albert Einstein, first as special relativity (1905) and then general relativity (1915).

Special Relativity:
The theory of special relativity introduced the concepts of spacetime and the principle of relativity to account for how the laws of physics apply in all inertial frames. It describes the effects that would occur if objects were observed moving at high speeds or near a strong gravitational field.

Key ideas in special relativity include:

1. The speed of light is constant, regardless of the motion of the observer.
2. Time dilation: time slows down for an object moving at high speeds compared to observers on Earth.
3. Length contraction: objects appear shorter than their length when they are moving fast relative to a stationary observer.
4. Mass-energy equivalence: energy and mass can be converted into one another, as described by the equation E=mc^2.

General Relativity:
The theory of general relativity describes gravity as the curvature in spacetime caused by other massive objects occupying spacetime. This means that massive bodies warp spacetime, causing them to attract each other gravitationally because they are following geodesics through this curved space.

Key ideas in general relativity include:

1. Gravitational lensing: light bends around massive objects like stars or galaxies


⏱️ 23.97s | 1544 chars


### Experiment 4B: Max Tokens Cutting Off Mid-Sentence

Watch what happens when the limit is too low: the model gets cut off mid-thought.

In [None]:
prompt = "Tell me a short story about a brave knight."

for max_tok in [10, 30, 100]:
    print(f"\n{'=' * 60}")
    print(f"  num_predict = {max_tok}")
    print(f"{'=' * 60}")
    _ = generate(prompt, num_predict=max_tok)

print("\nüí° Notice how low limits produce incomplete responses!")

---

## 5. Frequency Penalty: Reducing Repetition

Frequency Penalty penalizes tokens **proportionally to how many times** they've already appeared in the output. The more a word repeats, the harder it gets penalized.

```
Penalty = 0.0  →  No penalty (default)
Penalty > 0    →  Discourages repetition (higher = stronger)
Penalty < 0    →  Encourages repetition (rarely useful)
```
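Under the OpenAI-style definition, the frequency penalty subtracts (penalty × number of times the token has appeared so far) from that token's logit. A minimal sketch with invented logits and history; `apply_frequency_penalty` is a hypothetical name, and this is not how Ollama's `repeat_penalty` works.

```python
from collections import Counter


def apply_frequency_penalty(logits, generated_tokens, penalty):
    """OpenAI-style frequency penalty: subtract penalty * (times seen) from each logit."""
    counts = Counter(generated_tokens)  # missing tokens count as 0
    return {tok: logit - penalty * counts[tok] for tok, logit in logits.items()}


# Hypothetical logits and output so far ("very" has already appeared 3 times)
logits = {"very": 2.0, "quite": 1.5, "truly": 1.2}
history = ["this", "is", "very", "very", "very", "good"]

print(apply_frequency_penalty(logits, history, penalty=0.5))
# {'very': 0.5, 'quite': 1.5, 'truly': 1.2}
```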

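Ollama's closest knob, and the one `ChatOllama` exposes, is `repeat_penalty` (default 1.1; values > 1.0 penalize repetition). It is not a true frequency penalty: in the llama.cpp-style implementation it rescales every token found in the recent window (`repeat_last_n`, default 64 tokens) by the same factor, no matter how many times that token appeared.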

### Experiment 5A: Repetition With and Without Penalty

In [None]:
# A prompt that tends to cause repetitive output
prompt = "Write the word 'hello' in 10 different creative ways."

configs = [
    {"label": "üîÅ repeat_penalty = 1.0 (No penalty)",       "repeat_penalty": 1.0},
    {"label": "‚öñÔ∏è repeat_penalty = 1.1 (Default / Mild)",   "repeat_penalty": 1.1},
    {"label": "üö´ repeat_penalty = 1.5 (Strong penalty)",   "repeat_penalty": 1.5},
]

_ = compare(prompt, configs)

### Experiment 5B: Frequency Penalty on Longer Text

Repetition is more visible in longer outputs. Let's test with a paragraph-length prompt.

In [None]:
prompt = "Write a paragraph about the importance of reading books. Aim for about 100 words."

for penalty in [1.0, 1.3]:
    print(f"\n{'=' * 60}")
    print(f"  repeat_penalty = {penalty}")
    print(f"{'=' * 60}")
    result = generate(prompt, repeat_penalty=penalty, num_predict=200)

    # Count word frequency to show repetition
    words = result.lower().split()
    word_counts = {}
    for w in words:
        w_clean = w.strip('.,!?;:')
        if len(w_clean) > 3:  # skip short words
            word_counts[w_clean] = word_counts.get(w_clean, 0) + 1
    repeated = {w: c for w, c in word_counts.items() if c >= 3}
    if repeated:
        print(f"  üìä Words repeated 3+ times: {repeated}")
    else:
        print(f"  üìä No words repeated 3+ times ‚Äî good variety!")

---

## 6. Presence Penalty: Encouraging Topic Diversity

Unlike Frequency Penalty (which scales with count), Presence Penalty applies a **flat penalty** to any token that has appeared **at least once**. It doesn't matter whether it appeared 1 time or 50: the penalty is the same.

```
Frequency Penalty:  "the" appeared 5x → penalized 5× as much
Presence  Penalty:  "the" appeared 5x → same penalty as if it appeared 1x
```
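Under the same OpenAI-style definition, the presence penalty is one flat deduction for any token already seen. In this sketch with invented values, "the" (seen 5×) and "a" (seen once) lose exactly the same amount; `apply_presence_penalty` is a hypothetical helper.

```python
def apply_presence_penalty(logits, generated_tokens, penalty):
    """OpenAI-style presence penalty: one flat deduction for any token seen at least once."""
    seen = set(generated_tokens)
    return {tok: logit - (penalty if tok in seen else 0.0) for tok, logit in logits.items()}


# Hypothetical logits and history: "the" appeared 5 times, "a" once, "ocean" never
logits = {"the": 2.0, "a": 2.0, "ocean": 1.0}
history = ["the"] * 5 + ["a"]

print(apply_presence_penalty(logits, history, penalty=0.5))
# {'the': 1.5, 'a': 1.5, 'ocean': 1.0}
```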

This encourages the model to bring in **new topics and vocabulary** rather than just avoiding repetition.

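> **Note:** `ChatOllama` does not expose separate frequency and presence penalties, so the experiment below uses `repeat_penalty`. Because that parameter applies one flat adjustment to each token seen in the recent window, it behaves more like a presence penalty than a frequency penalty.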
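For comparison, here is a sketch of a llama.cpp-style repetition penalty, assuming the behavior of that implementation: each distinct token in the last `last_n` generated tokens has its logit divided by the penalty if positive, or multiplied by it if negative, exactly once. Ollama's current engine may differ in details, and `apply_repeat_penalty` is a hypothetical helper.

```python
def apply_repeat_penalty(logits, generated_tokens, penalty=1.1, last_n=64):
    """Sketch of a llama.cpp-style repetition penalty.

    Every token found in the last `last_n` generated tokens is penalized once,
    however many times it appeared: a positive logit is divided by `penalty`,
    a negative (or zero) logit is multiplied by it. last_n <= 0 disables the
    penalty in this sketch.
    """
    window = set(generated_tokens[-last_n:]) if last_n > 0 else set()
    penalized = {}
    for tok, logit in logits.items():
        if tok in window:
            logit = logit / penalty if logit > 0 else logit * penalty
        penalized[tok] = logit
    return penalized


# Hypothetical logits; "hello" was generated 5 times, "zzz" once
logits = {"hello": 2.2, "hi": 1.0, "zzz": -1.0}
history = ["hello"] * 5 + ["zzz"]

print(apply_repeat_penalty(logits, history, penalty=1.1))
# {'hello': 2.0, 'hi': 1.0, 'zzz': -1.1}
```

Seeing "hello" once or five times gives the same result, which is why this knob acts like a presence penalty.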

### Experiment 6A: Presence Penalty Effect on Vocabulary Diversity

In [16]:
prompt = "List 10 different animals. Just the names, one per line."

configs = [
    {"label": "repeat_penalty = 1.0 (No penalty)",    "repeat_penalty": 1.0, "temperature": 0.7},
    {"label": "repeat_penalty = 1.2 (Moderate)",      "repeat_penalty": 1.2, "temperature": 0.7},
    {"label": "repeat_penalty = 1.8 (Aggressive)",    "repeat_penalty": 1.8, "temperature": 0.7},
]

results = compare(prompt, configs)

# Analyze unique words in each
print(f"\n{'=' * 60}")
print("VOCABULARY DIVERSITY ANALYSIS")
print(f"{'=' * 60}")
for label, text in results.items():
    words = set(text.lower().split())
    print(f"  {label[:40]:40s} → {len(words)} unique words")


  repeat_penalty = 1.0 (No penalty)
  Parameters: repeat_penalty=1.0, temperature=0.7


1. Elephant
2. Giraffe
3. Monkey
4. Lion
5. Tiger
6. Penguin
7. Dolphin
8. Fox
9. Kangaroo
10. Octopus


⏱️ 2.38s | 102 chars

  repeat_penalty = 1.2 (Moderate)
  Parameters: repeat_penalty=1.2, temperature=0.7


Elephant  
Lion  
Tiger  
Bear  
Monkey  
Penguin  
Kangaroo  
Giraffe  
Eagle  
Crocodile


⏱️ 2.05s | 90 chars

  repeat_penalty = 1.8 (Aggressive)
  Parameters: repeat_penalty=1.8, temperature=0.7



KeyboardInterrupt



### Frequency vs. Presence Penalty: Comparison

| Aspect | Frequency Penalty | Presence Penalty |
|--------|-------------------|------------------|
| Scales with count? | **Yes**: more repetitions = more penalty | **No**: flat penalty after first use |
| Best for | Reducing word-level repetition | Encouraging topic diversity |
| Use case | Preventing "the the the..." | Making model explore new ideas |

---

## 7. Stop Sequences: Halting Generation

Stop sequences are strings that **immediately end** the model's generation when encountered. The model stops *before* including the stop string in the output.

Common uses:
- Stop at a newline (`\n`) for single-line answers
- Stop at a delimiter (`---`, `END`) for structured extraction
- Stop at a role marker (`User:`) to prevent the model from simulating conversation

### Experiment 7A: Stopping at the First Period (One-Sentence Answers)

In [None]:
prompt = "Name a famous scientist and describe their contribution."

print("=" * 60)
print("WITHOUT stop sequence")
print("=" * 60)
_ = generate(prompt)

print("\n" + "=" * 60)
print("WITH stop=['.']: stops at the first period")
print("=" * 60)
_ = generate(prompt, stop=["."])

### Experiment 7B: Stop Sequences for Structured Output

In [None]:
prompt = """Extract the person's name from the text below.

Text: "Dr. Sarah Chen published her findings on climate change last Tuesday."

Name:"""

print("=" * 60)
print("WITH stop=['\\n']: stops after extracting the name")
print("=" * 60)
_ = generate(prompt, stop=["\n"], temperature=0.0)

### Experiment 7C: Stop Sequences to Prevent Role-Playing

In [None]:
prompt = """Answer the user's question in one sentence.

User: What is gravity?
Assistant:"""

print("=" * 60)
print("WITHOUT stop: the model might continue as 'User:' and 'Assistant:'")
print("=" * 60)
_ = generate(prompt, num_predict=200)

print("\n" + "=" * 60)
print("WITH stop=['User:', '\\n\\n']: halts after one response")
print("=" * 60)
_ = generate(prompt, stop=["User:", "\n\n"], num_predict=200)

---

## 8. Combining Parameters: Real-World Recipes

In practice, you'll combine multiple parameters together. Here are some common "recipes":

| Use Case | Temperature | Top-P | Top-K | Repeat Penalty | Max Tokens |
|----------|-------------|-------|-------|----------------|------------|
| Factual Q&A | 0.0 | 1.0 | 1 | 1.0 | 100-200 |
| Creative Writing | 0.9-1.2 | 0.9 | 50 | 1.2 | 500+ |
| Code Generation | 0.0-0.2 | 0.95 | 40 | 1.1 | 500 |
| Brainstorming | 1.0 | 0.95 | 100 | 1.3 | 300 |
| Data Extraction | 0.0 | 1.0 | 1 | 1.0 | 100 |

### Experiment 8A: Recipe Comparison

In [None]:
prompt = "Suggest 3 startup ideas related to artificial intelligence."

configs = [
    {
        "label": "üìã Conservative (Factual style)",
        "temperature": 0.1, "top_p": 1.0, "top_k": 1, "repeat_penalty": 1.0, "num_predict": 200
    },
    {
        "label": "‚öñÔ∏è Balanced (General purpose)",
        "temperature": 0.7, "top_p": 0.9, "top_k": 40, "repeat_penalty": 1.1, "num_predict": 200
    },
    {
        "label": "üöÄ Creative (Brainstorming)",
        "temperature": 1.1, "top_p": 0.95, "top_k": 100, "repeat_penalty": 1.3, "num_predict": 200
    }
]

_ = compare(prompt, configs)

### Experiment 8B: Code Generation Recipe

In [None]:
prompt = "Write a Python function that checks if a string is a palindrome."

system = "You are a Python developer. Write clean, well-commented code. Only output the code, nothing else."

configs = [
    {
        "label": "üéØ Precise Code (temp=0, top_k=1)",
        "temperature": 0.0, "top_k": 1, "num_predict": 300
    },
    {
        "label": "üé® Creative Code (temp=0.8, top_k=50)",
        "temperature": 0.8, "top_k": 50, "num_predict": 300
    },
]

_ = compare(prompt, configs, system=system)

---

## 9. Sandbox: Try It Yourself!

Experiment with any combination of parameters below.

In [None]:
# ============================================================
#  SANDBOX - Tweak these values and re-run!
# ============================================================

my_prompt     = "Describe the future of space travel in 3 sentences."
my_system     = "You are a futurist and science communicator."

my_params = {
    "temperature":    0.7,    # 0.0 to 2.0
    "top_p":          0.9,    # 0.0 to 1.0
    "top_k":          40,     # 1 to 100+
    "num_predict":    200,    # max tokens to generate
    "repeat_penalty": 1.1,    # 1.0 = off, higher = less repetition
    # "stop":         ["."],  # uncomment to stop at first period
}

# ============================================================

print("YOUR CUSTOM EXPERIMENT")
print("=" * 60)
params_str = '\n'.join(f"  {k:20s} = {v}" for k, v in my_params.items())
print(params_str)
print("=" * 60)
_ = generate(my_prompt, system=my_system, **my_params)

---

## Key Takeaways

| Parameter | What It Does | Typical Range | When to Adjust |
|-----------|-------------|---------------|----------------|
| **Temperature** | Controls randomness | 0.0 – 1.5 | Lower for facts, higher for creativity |
| **Top-P** | Dynamic probability cutoff | 0.1 – 1.0 | Use ~0.9 for general; lower for precision |
| **Top-K** | Fixed candidate pool size | 1 – 100 | 1 for greedy; 40-50 for balanced |
| **Max Tokens** | Hard output length limit | 10 – 4096 | Match to your expected output length |
| **Frequency Penalty** | Penalizes repeated tokens proportionally | 0.1 – 1.0 (OpenAI-style); 1.0 – 1.5 as Ollama `repeat_penalty` | Increase for less repetition |
| **Presence Penalty** | Flat penalty on any used token | 0.1 – 1.0 (OpenAI-style) | Increase for broader vocabulary |
| **Stop Sequences** | Halts generation at specific strings | N/A | Use for structured/single-line output |

### Rules of Thumb

1. **Start from middle-of-the-road values**: temperature=0.7, top_p=0.9, top_k=40
2. **Adjust one parameter at a time**, so you can see what each one does
3. **Temperature and Top-P overlap**: usually tune one or the other, not both aggressively
4. **Top-K=1 or temperature=0**: greedy decoding, which is effectively deterministic
5. **Stop sequences are underused**: they're great for structured extraction tasks