# Visualizing LLM Temperature

### Step 1: Install and Import Dependencies

In [1]:
import os
from openai import OpenAI
from IPython.display import Markdown, display
from dotenv import load_dotenv


load_dotenv()

# Initialize the client. OpenAI() reads OPENAI_API_KEY from the environment,
# which load_dotenv() populates from a local .env file if one exists.
client = OpenAI()


### Testing Function
This helper lets you vary the temperature parameter for a given prompt.

In [12]:
def get_llm_completion(prompt, temp):
    """Return one chat completion for `prompt`, sampled at temperature `temp`."""
    response = client.chat.completions.create(
        model="gpt-4o",  # Or "gpt-3.5-turbo"
        messages=[{"role": "user", "content": prompt}],
        temperature=temp,
        max_tokens=50,
    )
    return response.choices[0].message.content.strip()

### Run the Experiment
We compare three settings: Deterministic (0.0), Balanced (0.7), and Creative (1.5). Each setting is run twice on the same prompt to see whether the model gives different answers (stochasticity, i.e. randomness or unpredictability in the output).

In [13]:
prompt = "Complete this sentence in a unique way: 'The secret to a long life is...'"

# Dictionary to store our results
results = {
    "Deterministic (0.0)": [
        get_llm_completion(prompt, 0.0),
        get_llm_completion(prompt, 0.0),
    ],
    "Balanced (0.7)": [
        get_llm_completion(prompt, 0.7),
        get_llm_completion(prompt, 0.7),
    ],
    "Creative (1.5)": [
        get_llm_completion(prompt, 1.5),
        get_llm_completion(prompt, 1.5),
    ],
}

### Display and Analyze Results
Format the output to compare how the variety increases as the temperature rises.

In [14]:
for setting, outputs in results.items():
    display(Markdown(f"### {setting}"))
    print(f"Run 1: {outputs[0]}")
    print(f"Run 2: {outputs[1]}")
    print("-" * 30)

### Deterministic (0.0)

Run 1: The secret to a long life is cultivating a garden of gratitude, where each day you plant seeds of kindness, water them with laughter, and bask in the sunshine of meaningful connections.
Run 2: The secret to a long life is cultivating a garden of gratitude, where each day you plant seeds of kindness, water them with laughter, and bask in the sunshine of meaningful connections.
------------------------------


### Balanced (0.7)

Run 1: The secret to a long life is finding joy in the simple moments and nurturing the relationships that fill your heart with love and laughter.
Run 2: The secret to a long life is crafting a tapestry of meaningful relationships, where each thread represents a shared moment, a lesson learned, or a laughter echoed across time.
------------------------------


### Creative (1.5)

Run 1: The secret to a long life is finding joy in everyday moments, cherishing time with loved ones, and staying curious like an ever-blooming flower.
Run 2: finding joy in the small moments, nurturing relationships that make you laugh, and maintaining curiosity about the world around you.
------------------------------
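
Comparing runs by eye works for two outputs per setting, but counting distinct completions scales better if you raise the number of runs. The following is a minimal sketch; `count_distinct`, `summarize_variety`, and `sample_results` are hypothetical names, and the strings are placeholders rather than model output.

```python
def count_distinct(outputs):
    """Count different completions, ignoring case and surrounding whitespace."""
    return len({text.strip().lower() for text in outputs})


def summarize_variety(results):
    """Map each temperature setting to its number of distinct completions."""
    return {setting: count_distinct(outputs) for setting, outputs in results.items()}


# Placeholder strings standing in for real completions (not model output).
sample_results = {
    "Deterministic (0.0)": ["same text", "same text"],
    "Balanced (0.7)": ["first text", "second text"],
}

print(summarize_variety(sample_results))
```

Passing the notebook's own `results` dictionary to `summarize_variety` should work the same way, since it has the same shape (setting name mapped to a list of strings).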


### Interpreting the Output
Why did this happen?
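When the LLM predicts the next token, it generates a list of "Logits" (raw scores), one per candidate token. Temperature divides those logits before they are converted into probabilities, so it controls how sharply the model favors its top choices.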

At 0.0 (Deterministic): The model effectively performs "Greedy Decoding." It always picks the token with the highest probability. In the run above, Run 1 and Run 2 are identical, although hosted APIs do not strictly guarantee this on every call.

At 0.7 (Balanced): The model uses "Weighted Sampling." It still favors the likely tokens but allows for some variety. In the run above, Run 1 and Run 2 differ but both still make sense.

At 1.5 (Creative): The probability distribution is flattened. Even very unlikely tokens now have a "fighting chance" to be picked. This often results in more poetic, bizarre, or even nonsensical completions. In the run above, both completions stayed coherent (Run 2 only dropped the opening phrase), so two short samples do not always show the effect.
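
The three regimes above can be sketched with a temperature-scaled softmax over a toy set of logits. This is an illustration under stated assumptions, not the model's actual implementation: `softmax_with_temperature` and `toy_logits` are hypothetical, the logit values are made up, and treating temperature 0 as a pure argmax is an assumption about how greedy decoding is approximated.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw logits into probabilities after dividing by the temperature."""
    if temperature <= 0:
        # Assumption: model temperature 0 as greedy decoding,
        # putting all probability on the highest-scoring token.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for three candidate tokens (illustrative values only).
toy_logits = [2.0, 1.0, 0.0]

for t in (0.0, 0.7, 1.5):
    probs = softmax_with_temperature(toy_logits, t)
    print(f"temperature={t}: {[round(p, 3) for p in probs]}")
```

As the temperature rises, the top token's share shrinks and the least likely token's share grows, which is the flattening described above.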

Quick Guide for your Projects:

- 0.0 to 0.3: Use for Extraction, Classification, Coding, and Q&A.
- 0.5 to 0.8: Use for Chatbots, Email generation, and Summarization.
- 1.0+: Use for Brainstorming, Poetry, and Fiction writing.

# Top-P vs Temperature

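### Define the Top-P Experiment
In this cell, we add a parameter for top_p. Note that when using Top-P, we usually keep Temperature at a neutral 1.0 so that any differences in the output come from top_p alone. This cell calls the Responses API (client.responses.create) rather than the chat completions helper defined earlier.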

In [5]:
prompt = "The cat opened the door and discovered"

for top_p in [0.1, 0.5, 0.9, 1.0]:
    response = client.responses.create(
        model="gpt-4o",
        input=prompt,
        max_output_tokens=30,
        temperature=1.0,
        top_p=top_p,
    )

    print(f"\n--- top_p = {top_p} ---")
    print(response.output_text)


--- top_p = 0.1 ---
a hidden room filled with ancient artifacts and mysterious symbols. The air was thick with dust, and a faint glow emanated from a pedestal in the center

--- top_p = 0.5 ---
a hidden room filled with ancient artifacts and mysterious symbols. The air was thick with dust, and a faint glow emanated from a pedestal in the center

--- top_p = 0.9 ---
a hidden room filled with glowing crystals and ancient artifacts. The air was thick with mystery and the soft hum of magic. Each step the cat took illuminated

--- top_p = 1.0 ---
a hidden realm filled with shimmering lights and floating islands. Mystical creatures roamed freely, and a gentle breeze carried the scent of blooming flowers. The


### Understanding the Difference
Temperature is a Rescaler: It changes the relative "volume" of every token. High temperature makes quiet (unlikely) tokens louder and loud (likely) tokens quieter; low temperature does the opposite. No token is ever removed outright.

Top-P is a Truncator: It sorts all tokens by probability and draws a line once the cumulative probability reaches P. Every token below that line is discarded entirely, and the model samples only from the tokens that remain.
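
The truncation step can be sketched as follows. This is a minimal illustration, assuming the common nucleus-sampling formulation (keep the smallest set of top tokens whose cumulative probability reaches `top_p`, then renormalize); `top_p_filter` and `toy_probs` are hypothetical, and the probabilities are made up.

```python
def top_p_filter(probs, top_p):
    """Keep the most likely tokens until their cumulative probability
    reaches top_p, then renormalize the survivors to sum to 1."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept = {}
    cumulative = 0.0
    for token, p in ranked:
        kept[token] = p
        cumulative += p
        if cumulative >= top_p:
            break  # everything after this point is discarded
    total = sum(kept.values())
    return {token: p / total for token, p in kept.items()}


# Hypothetical next-token probabilities (illustrative values only).
toy_probs = {"a": 0.5, "the": 0.3, "an": 0.15, "zebra": 0.05}

for top_p in (0.1, 0.5, 0.9, 1.0):
    print(f"top_p={top_p}: {sorted(top_p_filter(toy_probs, top_p))}")
```

In this toy distribution the top token alone already covers 0.5 of the mass, so `top_p=0.1` and `top_p=0.5` keep exactly the same single token. That is one possible explanation for the identical completions at 0.1 and 0.5 in the experiment above, though the notebook's output alone cannot confirm it.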