# 🧩 Mini-Lab: Temperature Effects

**Module 2: LLM Core Concepts** | **Duration: ~15 min** | **Type: Mini-Lab**

---

## Learning Objectives

By the end of this mini-lab, you will be able to:

1. **Understand** what temperature does mathematically to token probabilities
2. **Observe** how temperature affects output creativity and consistency
3. **Choose** appropriate temperature settings for different use cases
4. **Identify** when to use low vs high temperature

## Target Concepts

| Concept | Description |
|---------|-------------|
| Temperature | Value the logits are divided by before softmax; lower values sharpen the distribution, higher values flatten it |

## 1. Setup

In [1]:
import os
from dotenv import load_dotenv
from openai import OpenAI
import numpy as np
from IPython.display import Markdown, display

load_dotenv()
client = OpenAI()

def md(text):
    display(Markdown(text))

print("✓ Setup complete")

✓ Setup complete


## 2. How Temperature Works (The Math)

Temperature rescales the logits before softmax, which reshapes the probability distribution over the next token:

$$P(token_i) = \frac{e^{logit_i / T}}{\sum_j e^{logit_j / T}}$$

Where **T** is temperature:
- **T → 0**: Distribution becomes sharply peaked on the highest-logit token (effectively deterministic)
- **T = 1**: The model's original distribution, unchanged
- **T > 1**: Distribution becomes flatter (more random)
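
One way to see why lowering T sharpens the distribution is to take the ratio of two token probabilities. This follows directly from the formula above, because the normalizing sum cancels:

$$\frac{P(token_a)}{P(token_b)} = e^{(logit_a - logit_b) / T}$$

As a worked example, assume a logit gap of 0.5 between two tokens (an illustrative value). At T = 1 the ratio is $e^{0.5} \approx 1.65$, at T = 0.1 it grows to $e^{5} \approx 148$, and at T = 2 it shrinks to $e^{0.25} \approx 1.28$. The same gap in logits therefore becomes a large or a small gap in probability depending on T.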

In [2]:
def visualize_temperature_effect():
    """Visualize how temperature affects probability distributions."""
    
    # Simulated logits for 5 tokens (before softmax)
    logits = np.array([2.0, 1.5, 1.0, 0.5, 0.0])
    token_names = ["best", "good", "fine", "okay", "bad"]
    
    temperatures = [0.1, 0.5, 1.0, 1.5, 2.0]
    
    print("\n🌡️ Temperature Effect on Token Probabilities")
    print("="*70)
    print(f"\nOriginal logits: {dict(zip(token_names, logits))}\n")
    
    for temp in temperatures:
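        # Apply temperature scaling: divide every logit by T
        scaled_logits = logits / temp
        # Softmax, subtracting the max first for numerical stability
        # (this leaves the resulting probabilities unchanged)
        exp_logits = np.exp(scaled_logits - np.max(scaled_logits))
        probs = exp_logits / np.sum(exp_logits)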
        
        print(f"\nT = {temp}:")
        for name, prob in zip(token_names, probs):
            bar = "█" * int(prob * 50)
            print(f"  {name:5s}: {prob:5.1%} {bar}")
        
        entropy = -np.sum(probs * np.log(probs + 1e-10))
        print(f"  Entropy: {entropy:.3f} (higher = more random)")

visualize_temperature_effect()


🌡️ Temperature Effect on Token Probabilities

Original logits: {'best': np.float64(2.0), 'good': np.float64(1.5), 'fine': np.float64(1.0), 'okay': np.float64(0.5), 'bad': np.float64(0.0)}


T = 0.1:
  best : 99.3% █████████████████████████████████████████████████
  good :  0.7% 
  fine :  0.0% 
  okay :  0.0% 
  bad  :  0.0% 
  Entropy: 0.041 (higher = more random)

T = 0.5:
  best : 63.6% ███████████████████████████████
  good : 23.4% ███████████
  fine :  8.6% ████
  okay :  3.2% █
  bad  :  1.2% 
  Entropy: 1.000 (higher = more random)

T = 1.0:
  best : 42.9% █████████████████████
  good : 26.0% ████████████
  fine : 15.8% ███████
  okay :  9.6% ████
  bad  :  5.8% ██
  Entropy: 1.394 (higher = more random)
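
The cell above prints probabilities, but a model then samples one token from that distribution. The following is a minimal sketch of that sampling step, reusing the same five toy logits. The helper name `sample_with_temperature` and the fixed `seed` are assumptions made for illustration, not part of any model API.

```python
import numpy as np

def sample_with_temperature(logits, temperature, n_samples, seed=0):
    """Draw token indices from softmax(logits / temperature)."""
    scaled = np.asarray(logits, dtype=float) / temperature
    scaled -= scaled.max()  # subtract the max for numerical stability
    probs = np.exp(scaled)
    probs /= probs.sum()
    rng = np.random.default_rng(seed)  # fixed seed so the demo is repeatable
    return rng.choice(len(probs), size=n_samples, p=probs)

logits = [2.0, 1.5, 1.0, 0.5, 0.0]
token_names = ["best", "good", "fine", "okay", "bad"]

for temp in (0.1, 1.0, 2.0):
    draws = sample_with_temperature(logits, temp, n_samples=1000)
    # Count how often each token index was drawn
    counts = np.bincount(draws, minlength=len(logits))
    summary = ", ".join(f"{name}={count}" for name, count in zip(token_names, counts))
    print(f"T = {temp}: {summary}")
```

At T = 0.1 nearly every draw should be "best", while at T = 2.0 all five tokens should appear regularly, which is the sampling-level view of the flatter distribution.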

## 3. Temperature in Practice

Let's see how temperature affects real model outputs:

In [3]:
def compare_temperatures(prompt, temperatures=(0, 0.3, 0.7, 1.0, 1.5), runs=1):
    """Compare outputs at different temperatures."""
    
    md(f"### 📝 Prompt: *{prompt}*\n\n---")
    
    for temp in temperatures:
        outputs = []
        
        for _ in range(runs):
            response = client.chat.completions.create(
                model="gpt-4o-mini",
                messages=[{"role": "user", "content": prompt}],
                temperature=temp,
                max_tokens=50
            )
            outputs.append(response.choices[0].message.content.strip())
        
        if runs == 1:
            md(f"**🌡️ T = {temp}:**\n> {outputs[0]}\n")
        else:
            md(f"**🌡️ T = {temp}:**")
            for i, out in enumerate(outputs, 1):
                md(f"> Run {i}: {out}")
            md("")

# Test 1: Factual question (low temperature preferred)
compare_temperatures(
    "What is an apple? Answer in one short sentence.",
    temperatures=[0, 0.5, 1.0, 1.5, 2],
    runs=3
)

### 📝 Prompt: *What is an apple? Answer in one short sentence.*

---

**🌡️ T = 0:**

> Run 1: An apple is a round fruit produced by the apple tree, typically with a sweet or tart flavor and a crisp texture.

> Run 2: An apple is a round fruit produced by the apple tree, typically red, green, or yellow, and known for its sweet or tart flavor.

> Run 3: An apple is a round fruit produced by the apple tree, typically red, green, or yellow, and known for its sweet or tart flavor.



**🌡️ T = 0.5:**

> Run 1: An apple is a round fruit produced by the apple tree, typically sweet or tart, and comes in various colors like red, green, and yellow.

> Run 2: An apple is a round fruit produced by the apple tree, typically red, green, or yellow, and known for its sweet or tart flavor.

> Run 3: An apple is a round fruit produced by the apple tree, typically with red, green, or yellow skin and a sweet or tart flavor.



**🌡️ T = 1.0:**

> Run 1: An apple is a round fruit typically with red, green, or yellow skin, known for its sweet or tart flavor and crunchy texture.

> Run 2: An apple is a round fruit from the apple tree, typically red, green, or yellow, known for its sweet or tart flavor.

> Run 3: An apple is a round, typically red, green, or yellow fruit produced by the apple tree, known for its sweet or tart flavor.



**🌡️ T = 1.5:**

> Run 1: An apple is a round fruit typically red, green, or yellow, produced by the apple tree, known scientifically as Malus domestica.

> Run 2: An apple is a round fruit from the Malus domestica tree, typically red, green, or yellow, and known for its sweet or tart flavor.

> Run 3: An apple is a round fruit produced by the apple tree, typically with smooth skin and a juicy interior.



**🌡️ T = 2:**

> Run 1: An apple is a juicy and crispy fruit typically characterized by its round shape and varying colors, commonly red, green, or yellow.

> Run 2: An apple is a round, slight atheinty rejuvenesonfrachadh Ï±Ñenuh ÿßÿ≥ÿπÿßÿ± aaboÊâìÊ≥ï current markaana Door namun gefallen jalo Îú° ‡Æµ‡Æ∞‡ØÅ‡ÆÆ‡Øç acceptable ·Éö liz◊ê◊®◊ô◊ö scales liczely ◊õ÷º ÿßŸÖŸÑŸá mannersÏó¥ Ï°∞ darah bahkan comb ujj€íuilli pjŒ≠œÅŒ±⁄ô

> Run 3: An apple is a –ø–∞–¥–∞ live scar fostering coherent mutually unwavering cooperation demanda lake Yoga ÿ™ÿπÿßŸÑŸâ culturales measurable promote d√©fcover–Ω–∏–∑ ‡∞ö‡±Ü‡∞Ç‡∞¶ –æ—Ç–æ–±—Ä–∞–∂€ÅŸàÿ± equip situations disputeayaa —Ç–µ–º–ø–µ—Ä–∞—Ç—É—Ä–∞ranavlja inpatient surpass ÌïôÌùπ herr kh√°m disease}}> Bedeutung Thomas ruthless conversaci√≥n Maharashtra upang infiltration_loss –Ω”ô—Ç–∏–∂
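
One plausible reading of the breakdown at T = 2 is that a real vocabulary has tens of thousands of tokens, so even a small per-token boost to the tail adds up to most of the probability mass. The sketch below uses toy numbers to illustrate this; the vocabulary size of 50,000, the single top logit of 15.0, and the helper name `top_token_probability` are all assumptions, not measurements of gpt-4o-mini.

```python
import numpy as np

def top_token_probability(top_logit, vocab_size, temperature):
    """Probability of the strongest token when every other token has logit 0."""
    logits = np.zeros(vocab_size)
    logits[0] = top_logit
    scaled = logits / temperature
    scaled -= scaled.max()  # numerical stability; probabilities are unchanged
    probs = np.exp(scaled)
    probs /= probs.sum()
    return probs[0]

for temp in (0.5, 1.0, 1.5, 2.0):
    # Toy setup: one clearly preferred token against a large, uniform tail
    p = top_token_probability(top_logit=15.0, vocab_size=50_000, temperature=temp)
    print(f"T = {temp}: top token {p:.1%}, all other tokens combined {1 - p:.1%}")
```

In this toy setup the preferred token dominates at T = 1 but holds only a small share at T = 2, so most draws come from the tail. Once one unlikely token is sampled, the rest of the sequence is conditioned on it, which may explain why the runs above never recover.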




In [4]:
# Test 2: Creative task (higher temperature may help)
compare_temperatures(
    "Write a creative one-sentence story about a time-traveling cat.",
    temperatures=[0, 0.5, 1.0, 1.5],
    runs=3
)

### 📝 Prompt: *Write a creative one-sentence story about a time-traveling cat.*

---

**🌡️ T = 0:**

> Run 1: Whiskers the cat leaped through the shimmering portal of time, landing in ancient Egypt just in time to steal the Pharaoh's heart—and his favorite fish dinner.

> Run 2: Whiskers the cat leaped through the shimmering portal of time, landing in ancient Egypt just in time to steal the heart of a pharaoh and the last slice of his royal fish dinner.

> Run 3: Whiskers the cat leaped through the shimmering portal of time, landing in ancient Egypt just in time to bat at the pharaoh's golden scepter, inadvertently altering history and becoming the revered deity of mischief.



**🌡️ T = 0.5:**

> Run 1: Whiskers the cat, with a flick of his tail and a purr of determination, leaped through the shimmering portal of time, landing in ancient Egypt just in time to witness the construction of the Great Pyramid—and promptly claimed the pharaoh's

> Run 2: Whiskers the time-traveling cat leaped through the shimmering portal, only to find himself in ancient Egypt, where he promptly became the revered feline advisor to Pharaoh, teaching him the art of napping in the sun.

> Run 3: In a whirlwind of shimmering fur and curious whiskers, Whiskers the time-traveling cat leapt through the ages, leaving paw prints on the moon, chasing after ancient pharaohs, and napping in the lap of a bewildered



**🌡️ T = 1.0:**

> Run 1: As the clock struck midnight, Whiskers leaped into the shimmering portal of time, leaving behind a trail of enchanted yarn and whiskered legends that would echo through centuries of feline folklore.

> Run 2: In a swirl of shimmering stardust, Whiskers the time-traveling cat leapt through the ages, polishing the Great Pyramid with his paws one day and chasing laser mice across the neon skyline of 3023 the next, all while

> Run 3: With a flick of its shimmering tail, the time-traveling cat whisked itself back to the roaring twenties, where it confidently strolled into a jazz club, instantly becoming the star of the show with its soulful meows.



**🌡️ T = 1.5:**

> Run 1: Whiskers leapt through the shimmering portal, emerging in ancient Egypt as a pivotal player in the construction of the Great Sphinx — after all, even a time-traveling cat must ensure her lineage of catnip kings survives intact.

> Run 2: Whiskers the feline philosopher funneled through time with each flick of his tail, leaving trails of cosmic yarn and starting revolutions in ancient Rome while simultaneously napping atop Cleopatra's finest gown.

> Run 3: Whiskers the time-traveling cat landed with a soft thud in ancient Egypt, felinefriend to pharaohs, yet unmatched in mischief, he promptly knocked over the royal ankh, altering the flow of history (and dinner time



In [5]:
# Test 3: List generation
compare_temperatures(
    "Name 3 unusual hobbies.",
    temperatures=[0, 0.5, 1.0, 1.5],
    runs=3
)

### 📝 Prompt: *Name 3 unusual hobbies.*

---

**🌡️ T = 0:**

> Run 1: Here are three unusual hobbies:

1. **Extreme Ironing**: This quirky hobby combines the mundane task of ironing clothes with extreme sports. Enthusiasts take their ironing boards to unusual and often dangerous locations, such as mountain tops, underwater, or

> Run 2: Here are three unusual hobbies:

1. **Extreme Ironing**: This quirky hobby combines the mundane task of ironing clothes with extreme sports. Enthusiasts take their ironing boards to unusual and often dangerous locations, such as mountain tops, underwater, or

> Run 3: Here are three unusual hobbies:

1. **Extreme Ironing**: This quirky hobby combines the mundane task of ironing clothes with extreme sports. Enthusiasts take their ironing boards to unusual and often dangerous locations, such as mountain tops, underwater, or



**🌡️ T = 0.5:**

> Run 1: Here are three unusual hobbies you might find interesting:

1. **Extreme Ironing**: This quirky hobby combines the mundane task of ironing clothes with extreme sports. Enthusiasts take their ironing boards to unusual and often perilous locations, such as mountain

> Run 2: Sure! Here are three unusual hobbies that some people enjoy:

1. **Extreme Ironing**: This quirky hobby combines the thrill of extreme sports with the mundane task of ironing. Enthusiasts take their ironing boards to unusual and often dangerous locations,

> Run 3: Sure! Here are three unusual hobbies:

1. **Extreme Ironing**: This quirky hobby combines the mundane task of ironing clothes with extreme sports. Enthusiasts take their ironing boards to remote or unusual locations, such as mountain tops or underwater,



**🌡️ T = 1.0:**

> Run 1: Here are three unusual hobbies:

1. **Extreme Ironing**: This hobby combines the mundane task of ironing clothes with extreme outdoor activities. Enthusiasts take their ironing boards to remote or challenging locations—like mountain tops, underwater, or while sky

> Run 2: Here are three unusual hobbies:

1. **Extreme Ironing**: This quirky hobby combines the mundane task of ironing clothes with adventure activities. Enthusiasts take portable ironing boards and irons to unusual locations, such as mountain tops, underwater, or even

> Run 3: Sure! Here are three unusual hobbies that some people enjoy:

1. **Soap Carving**: This involves creating intricate sculptures or designs from bars of soap. It requires precision and creativity, and, unlike many other carving mediums, soap is relatively easy



**🌡️ T = 1.5:**

> Run 1: Here are three unusual hobbies that some people enjoy:

1. **LARPing (Live Action Role Playing)** - LARPing involves participants acting out their characters in a fictional setting, often with elaborate costumes, props, and storylines. Events may

> Run 2: Sure! Here are three unusual hobbies:

1. **Parkour**: This is the practice of traversing obstacles in intentional and creative ways, often found in urban environments. It's a mix of movement disciplines that encourages agility, strength, and fluidity

> Run 3: Sure! Here are three unusual hobbies:

1. **BeeKeeping**: This hobby involves maintaining and caring for bee colonies, typically in human-made hives. Beekeepers manage the pollination processes and the production of honey, while also contributing to
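
To compare runs without eyeballing them, a small tally can help. The sketch below counts distinct values per temperature; `diversity_report` is a hypothetical helper, and the data is just the first hobby named in each run above, typed in by hand.

```python
from collections import Counter

def diversity_report(outputs):
    """Summarize how varied a list of generated strings is."""
    if not outputs:
        raise ValueError("outputs must contain at least one string")
    counts = Counter(outputs)
    return {
        "runs": len(outputs),
        "distinct": len(counts),
        "distinct_ratio": len(counts) / len(outputs),
        # Share of runs taken by the single most frequent output
        "most_common_share": counts.most_common(1)[0][1] / len(outputs),
    }

# First hobby named in each run above, grouped by temperature.
first_hobby_by_temp = {
    0: ["Extreme Ironing", "Extreme Ironing", "Extreme Ironing"],
    0.5: ["Extreme Ironing", "Extreme Ironing", "Extreme Ironing"],
    1.0: ["Extreme Ironing", "Extreme Ironing", "Soap Carving"],
    1.5: ["LARPing", "Parkour", "BeeKeeping"],
}

for temp, hobbies in first_hobby_by_temp.items():
    report = diversity_report(hobbies)
    print(f"T = {temp}: {report['distinct']} distinct of {report['runs']} runs")
```

Three runs per setting is a very small sample, so treat the counts as a rough signal. The same helper could be applied to full responses collected with a larger `runs` value.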



## 4. Temperature Guidelines by Use Case

| Use Case | Recommended T | Why |
|----------|---------------|-----|
| **Factual Q&A** | 0 - 0.3 | Consistency, accuracy |
| **Code generation** | 0 - 0.3 | Deterministic, correct |
| **Summarization** | 0.3 - 0.5 | Mostly factual, some variation |
| **Conversational** | 0.5 - 0.7 | Natural, varied responses |
| **Creative writing** | 0.7 - 1.0 | Diverse, interesting outputs |
| **Brainstorming** | 0.8 - 1.2 | Maximum variety |
| **Experimental** | 1.2 - 1.5 | Unexpected combinations |
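
One way to keep these ranges next to the code that uses them is a small lookup. This is only a sketch: `TEMPERATURE_RANGES` and `pick_temperature` are hypothetical names, not part of the OpenAI SDK, and the numbers are copied from the table above rather than derived from any benchmark.

```python
# Ranges copied from the guideline table; the helper itself is hypothetical.
TEMPERATURE_RANGES = {
    "factual_qa": (0.0, 0.3),
    "code_generation": (0.0, 0.3),
    "summarization": (0.3, 0.5),
    "conversational": (0.5, 0.7),
    "creative_writing": (0.7, 1.0),
    "brainstorming": (0.8, 1.2),
    "experimental": (1.2, 1.5),
}

def pick_temperature(use_case, position=0.5):
    """Return a temperature inside the recommended range for a use case.

    position=0.0 selects the low end of the range, 1.0 the high end.
    """
    if use_case not in TEMPERATURE_RANGES:
        raise KeyError(f"Unknown use case: {use_case!r}")
    if not 0.0 <= position <= 1.0:
        raise ValueError("position must be between 0.0 and 1.0")
    low, high = TEMPERATURE_RANGES[use_case]
    return round(low + position * (high - low), 2)

print(pick_temperature("code_generation", position=0.0))
print(pick_temperature("brainstorming"))
```

The returned value could then be passed as the `temperature` argument of the API call, which keeps the choice explicit and easy to review.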

In [6]:
def demonstrate_use_cases():
    """Show appropriate temperature for different tasks."""
    
    use_cases = [
        {
            "name": "Code Generation",
            "prompt": "Write a Python function to calculate factorial.",
            "temp": 0,
            "reason": "Code must be deterministic and correct"
        },
        {
            "name": "Email Summary",
            "prompt": "Summarize this: 'Meeting moved to 3pm tomorrow. Please confirm attendance.'",
            "temp": 0.3,
            "reason": "Mostly factual with slight variation allowed"
        },
        {
            "name": "Creative Marketing",
            "prompt": "Write a catchy tagline for a coffee shop.",
            "temp": 0.9,
            "reason": "Want creative, memorable outputs"
        },
    ]
    
    for case in use_cases:
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": case["prompt"]}],
            temperature=case["temp"],
            max_tokens=150
        )
        
        md(f"### 📌 {case['name']} (T = {case['temp']})")
        md(f"*Reason: {case['reason']}*\n")
        md(f"**Prompt:** {case['prompt']}\n")
        md(f"**Output:**\n```\n{response.choices[0].message.content}\n```\n\n---")

demonstrate_use_cases()

### 📌 Code Generation (T = 0)

*Reason: Code must be deterministic and correct*


**Prompt:** Write a Python function to calculate factorial.


**Output:**
```
Certainly! Here's a simple Python function to calculate the factorial of a non-negative integer using recursion:

```python
def factorial(n):
    if n < 0:
        raise ValueError("Factorial is not defined for negative numbers.")
    elif n == 0 or n == 1:
        return 1
    else:
        return n * factorial(n - 1)

# Example usage:
print(factorial(5))  # Output: 120
```

Alternatively, you can also calculate the factorial using an iterative approach:

```python
def factorial(n):
    if n < 0:
        raise ValueError("Factorial is not defined for negative numbers.")
    
    result = 1
    for i in
```

---

### 📌 Email Summary (T = 0.3)

*Reason: Mostly factual with slight variation allowed*


**Prompt:** Summarize this: 'Meeting moved to 3pm tomorrow. Please confirm attendance.'


**Output:**
```
The meeting is rescheduled for 3 PM tomorrow, and attendance confirmation is requested.
```

---

### 📌 Creative Marketing (T = 0.9)

*Reason: Want creative, memorable outputs*


**Prompt:** Write a catchy tagline for a coffee shop.


**Output:**
```
"Awaken Your Senses, One Sip at a Time!"
```

---

## 5. Temperature and Reproducibility

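For the most repeatable outputs, use temperature=0 and, where you still want sampling, a fixed seed. Neither is a hard guarantee on a hosted API: the T = 0 runs in Section 3 already differ slightly from one another, and the `seed` parameter is best effort.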

In [13]:
def test_reproducibility():
    """Test output reproducibility with seed parameter."""
    
    prompt = "Generate a random name for a fantasy character."
    
    print("\n🎲 Reproducibility Test")
    print("="*50)
    
    # Without seed (T=0 is usually, but not always, repeatable on a hosted API)
    print("\n1️⃣ T=0, no seed (3 runs):")
    for i in range(3):
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": prompt}],
            temperature=0,
            max_tokens=20
        )
        print(f"   Run {i+1}: {response.choices[0].message.content.strip()}")
    
    # With seed
    print("\n2️⃣ T=0.7 with seed=42 (3 runs):")
    for i in range(3):
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": prompt}],
            temperature=0.7,
            seed=42,
            max_tokens=20
        )
        print(f"   Run {i+1}: {response.choices[0].message.content.strip()}")

    # Without seed
    print("\n3️⃣ T=0.7 without seed (3 runs):")
    for i in range(3):
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": prompt}],
            temperature=0.7,
            max_tokens=20
        )
        print(f"   Run {i+1}: {response.choices[0].message.content.strip()}")

test_reproducibility()


🎲 Reproducibility Test

1️⃣ T=0, no seed (3 runs):
   Run 1: Elysia Thornshadow
   Run 2: Elysia Thornshadow
   Run 3: Elysia Thornshadow

2️⃣ T=0.7 with seed=42 (3 runs):
   Run 1: Elysia Thornwhisper
   Run 2: Elysia Thornwhisper
   Run 3: Elysia Thornwhisper

3️⃣ T=0.7 without seed (3 runs):
   Run 1: Elowen Thistledown
   Run 2: Thalindra Moonshadow
   Run 3: Elysia Thornweaver
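
The seeded runs above repeat because a fixed seed makes the random draws repeat. The sketch below shows that idea locally with NumPy. It is an analogy only: `sample_tokens` is a hypothetical helper, the five logits are toy values, and nothing here claims to mirror how the hosted API implements `seed`.

```python
import numpy as np

def sample_tokens(logits, temperature, n_tokens, seed=None):
    """Sample token indices from softmax(logits / temperature)."""
    scaled = np.asarray(logits, dtype=float) / temperature
    scaled -= scaled.max()  # numerical stability
    probs = np.exp(scaled)
    probs /= probs.sum()
    # seed=None asks NumPy for fresh entropy, so each call differs
    rng = np.random.default_rng(seed)
    return rng.choice(len(probs), size=n_tokens, p=probs).tolist()

logits = [2.0, 1.5, 1.0, 0.5, 0.0]

seeded_a = sample_tokens(logits, temperature=0.7, n_tokens=20, seed=42)
seeded_b = sample_tokens(logits, temperature=0.7, n_tokens=20, seed=42)
unseeded = sample_tokens(logits, temperature=0.7, n_tokens=20)

print("Same seed gives same draws:", seeded_a == seeded_b)
print("Unseeded run matches seeded run:", unseeded == seeded_a)
```

The temperature still shapes the distribution in both cases; the seed only fixes which random draws are taken from it.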


## 🎯 Summary

### Key Takeaways

1. **What Temperature Does**
   - Scales logits before softmax
   - T→0: deterministic (always picks highest probability)
   - T>1: more random (flatter distribution)

2. **Practical Guidelines**
   - **T = 0-0.3**: Factual Q&A, code, classification
   - **T = 0.3-0.5**: Summarization
   - **T = 0.5-0.7**: General conversation
   - **T = 0.7-1.0**: Creative writing (up to about 1.2 for brainstorming)

3. **Reproducibility**
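   - Use T=0 for the most repeatable outputs; hosted APIs can still vary slightly between runs, as the T = 0 outputs in Section 3 show
   - Use the seed parameter for reproducible randomness at higher temperatures (best effort, not a guarantee)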

### Next Steps

- **mini-sampling**: Learn about Top-K and Top-P (complementary to temperature)
- **mini-logprobs**: See the actual probability distributions