# 🧱 Lab: LLM Playground

**Module 2: LLM Core Concepts** | **Duration: ~1 hour** | **Type: Wall Lab**

---

## Learning Objectives

By the end of this lab, you will be able to:

1. **Understand** how tokenization works with different models
2. **Experiment** with temperature and its effect on outputs
3. **Compare** Top-K and Top-P sampling strategies
4. **Implement** streaming responses for real-time output
5. **Analyze** how different models respond to the same prompt

## Concepts Covered

| Concept | Section |
|---------|---------|
| Tokenization | 2 |
| Token IDs | 2 |
| Temperature | 3 |
| Top-K Sampling | 4 |
| Top-P Sampling | 4 |
| Stop Sequences | 5 |
| Streaming | 6 |
| Model Comparison | 7 |

## Prerequisites

- OpenAI API key
- (Optional) Anthropic API key for model comparison
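
Before running the setup cell, it can help to confirm that the keys are actually visible to Python. This is a minimal sketch. It assumes the standard variable names the SDKs read by default (`OPENAI_API_KEY`, `ANTHROPIC_API_KEY`). `check_api_keys` is a hypothetical helper, not part of either SDK, and it reports only whether each key is present, never the key values.

```python
import os
from typing import Mapping

# Default environment variable names read by the OpenAI and Anthropic SDKs
REQUIRED_KEYS = ["OPENAI_API_KEY"]
OPTIONAL_KEYS = ["ANTHROPIC_API_KEY"]

def check_api_keys(env: Mapping[str, str] = os.environ) -> dict[str, bool]:
    """Report which API keys are set (non-empty) without exposing their values."""
    status = {name: bool(env.get(name)) for name in REQUIRED_KEYS + OPTIONAL_KEYS}
    missing = [name for name in REQUIRED_KEYS if not status[name]]
    if missing:
        raise RuntimeError(f"Missing required key(s): {', '.join(missing)}")
    return status

# Demo with a fake mapping so nothing real is read or printed
print(check_api_keys({"OPENAI_API_KEY": "sk-test"}))
```

In the lab, call `check_api_keys()` after `load_dotenv()` so that values from a `.env` file are included.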

## 1. Setup (~5 min)

In [24]:
import os
from dotenv import load_dotenv
from openai import OpenAI
import tiktoken
import time
from IPython.display import Markdown, display

# Helper function to render LLM output as formatted markdown
def md(text):
    """Display text as rendered markdown."""
    display(Markdown(text))

# Load environment variables
load_dotenv()

# Initialize OpenAI client
client = OpenAI()

# Optional: Anthropic client
try:
    import anthropic
    anthropic_client = anthropic.Anthropic()
    HAS_ANTHROPIC = True
    print("✓ Anthropic client initialized")
except Exception:  # missing package or client setup failure; Claude is optional
    HAS_ANTHROPIC = False
    print("✗ Anthropic not available (optional)")

print("✓ OpenAI client initialized")
print("✓ Markdown helper md() ready")

✓ Anthropic client initialized
✓ OpenAI client initialized
✓ Markdown helper md() ready


## 2. Tokenization Explorer (~10 min)

Tokenization converts text into discrete units (tokens) that models can process. Different models use different tokenizers with different vocabularies.
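
The toy tokenizer below shows why different vocabularies produce different token counts for the same text. It uses greedy longest-match over a tiny made-up vocabulary. This is a simplification: tiktoken's encodings are built with byte-pair encoding (BPE) and have vocabularies of tens to hundreds of thousands of entries. `greedy_tokenize`, `vocab_a`, and `vocab_b` are hypothetical names used only for illustration.

```python
def greedy_tokenize(text: str, vocab: dict[str, int]) -> list[tuple[str, int]]:
    """Split text by repeatedly taking the longest vocabulary entry that matches.

    Characters with no matching entry are emitted on their own with id -1.
    """
    if not vocab:
        raise ValueError("vocab must not be empty")
    max_len = max(len(piece) for piece in vocab)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink until an entry matches
        for length in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if piece in vocab:
                tokens.append((piece, vocab[piece]))
                i += length
                break
        else:
            # No vocabulary entry matched: fall back to a single character
            tokens.append((text[i], -1))
            i += 1
    return tokens

# Two made-up vocabularies: A has learned longer subwords than B
vocab_a = {"token": 0, "ization": 1}
vocab_b = {"to": 0, "ken": 1, "iz": 2, "ation": 3}

for name, vocab in [("A", vocab_a), ("B", vocab_b)]:
    print(name, greedy_tokenize("tokenization", vocab))
# A [('token', 0), ('ization', 1)]
# B [('to', 0), ('ken', 1), ('iz', 2), ('ation', 3)]
```

The same word costs 2 tokens under one vocabulary and 4 under the other, and the token IDs are only meaningful relative to their own vocabulary.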

**Rule of thumb**: for English text, 1 token ≈ 4 characters ≈ 0.75 words
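
The rule of thumb can be turned into a quick estimator for budgeting before a tokenizer is at hand. This is a rough sketch only. `estimate_tokens` is a hypothetical helper, and real counts from `enc.encode()` can differ substantially, especially for code, numbers, and non-English text.

```python
def estimate_tokens(text: str) -> dict[str, float]:
    """Rough token estimates from the ~4 chars/token and ~0.75 words/token heuristics."""
    by_chars = len(text) / 4
    by_words = len(text.split()) / 0.75
    return {"by_chars": round(by_chars, 1), "by_words": round(by_words, 1)}

print(estimate_tokens("Hello, world!"))
print(estimate_tokens("The transformer architecture revolutionized NLP"))
```

Comparing these estimates with the exact counts from the tokenizer cell is a quick way to see how loose the heuristic is for a given kind of text.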

In [4]:
# Get tokenizer for GPT-4o
enc = tiktoken.encoding_for_model("gpt-4o")

def explore_tokens(text):
    """Visualize how text is tokenized."""
    tokens = enc.encode(text)
    print(f"Text: '{text}'")
    print(f"Token count: {len(tokens)}")
    print(f"Tokens: {tokens}")
    print("\nToken breakdown:")
    for token_id in tokens:
        token_text = enc.decode([token_id])
        print(f"  {token_id:6d} → '{token_text}'")
    return tokens

# Explore different types of text
print("=" * 50)
explore_tokens("Hello, world!")

print("\n" + "=" * 50)
explore_tokens("The transformer architecture revolutionized NLP")

print("\n" + "=" * 50)
explore_tokens("def hello():\n    print('hi')")  # Code tokenization

Text: 'Hello, world!'
Token count: 4
Tokens: [13225, 11, 2375, 0]

Token breakdown:
   13225 → 'Hello'
      11 → ','
    2375 → ' world'
       0 → '!'

Text: 'The transformer architecture revolutionized NLP'
Token count: 6
Tokens: [976, 59595, 24022, 25284, 2110, 161231]

Token breakdown:
     976 → 'The'
   59595 → ' transformer'
   24022 → ' architecture'
   25284 → ' revolution'
    2110 → 'ized'
  161231 → ' NLP'

Text: 'def hello():
    print('hi')'
Token count: 8
Tokens: [1314, 40617, 8595, 271, 2123, 706, 3686, 1542]

Token breakdown:
    1314 → 'def'
   40617 → ' hello'
    8595 → '():
'
     271 → '   '
    2123 → ' print'
     706 → '(''
    3686 → 'hi'
    1542 → '')'


[1314, 40617, 8595, 271, 2123, 706, 3686, 1542]
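
Token counts matter because both context limits and pricing are expressed in tokens. The sketch below is minimal and rests on a few assumptions. First, you already have a prompt token count, for example `len(enc.encode(prompt))`, which may slightly undercount chat requests because each message adds a few formatting tokens. Second, the context window must hold the prompt plus the maximum output. `budget_check` is a hypothetical helper, and the window size and per-million-token prices are placeholders, not real pricing; look up the current values for your model.

```python
def budget_check(prompt_tokens: int, max_output_tokens: int, context_window: int,
                 usd_per_1m_input: float, usd_per_1m_output: float) -> dict:
    """Check that a request fits the context window and estimate its worst-case cost."""
    total = prompt_tokens + max_output_tokens
    worst_case_usd = (prompt_tokens * usd_per_1m_input
                      + max_output_tokens * usd_per_1m_output) / 1_000_000
    return {"total_tokens": total, "fits": total <= context_window,
            "worst_case_usd": worst_case_usd}

# Placeholder limits and prices for illustration only
print(budget_check(prompt_tokens=1_200, max_output_tokens=300, context_window=8_000,
                   usd_per_1m_input=1.0, usd_per_1m_output=4.0))
```

The cost is "worst case" because it assumes the model uses the full output budget; actual billing follows the tokens really generated.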

## 3. Temperature Effects (~10 min)

Temperature T controls the randomness of model outputs by dividing the logits by T before the softmax, which sharpens the distribution when T < 1 and flattens it when T > 1:
- **Low temperature (0-0.3)**: More deterministic, focused outputs
- **Medium temperature (0.5-0.7)**: Balanced creativity and coherence
- **High temperature (0.8-1.5)**: More creative, varied outputs
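
Concretely, with logits $z_i$ and temperature $T$, the sampling probabilities are $p_i = e^{z_i/T} / \sum_j e^{z_j/T}$. The sketch below applies this formula to three made-up logits so you can watch the distribution sharpen at low $T$ and flatten at high $T$. `softmax_with_temperature` is a hypothetical helper illustrating the standard formulation, not OpenAI's server-side code.

```python
import math

def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Turn raw logits into probabilities after dividing them by the temperature."""
    if temperature <= 0:
        # APIs that accept T=0 typically treat it as (near-)greedy decoding instead
        raise ValueError("temperature must be > 0")
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max before exp() for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up logits for three candidate next tokens
logits = [2.0, 1.0, 0.5]
for t in (0.2, 1.0, 1.5):
    probs = softmax_with_temperature(logits, t)
    print(f"T={t}: {[round(p, 3) for p in probs]}")
```

At T=0.2 almost all the probability lands on the first token, which is why low-temperature outputs in the next cell look nearly identical from run to run.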

In [25]:
def test_temperature(prompt, temperatures=(0, 0.5, 1.0, 1.5)):
    """Compare outputs at different temperatures."""
    md(f"**Prompt:** *{prompt}*\n\n---")
    
    for temp in temperatures:
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": prompt + "\n\nRespond in markdown format."}],
            temperature=temp,
            max_tokens=100
        )
        md(f"### 🌡️ Temperature = {temp}\n\n{response.choices[0].message.content}")

# Test with creative prompt
test_temperature("Write a one-sentence story about a robot.", temperatures=[0, 0.7, 1.2])

**Prompt:** *Write a one-sentence story about a robot.*

---

### 🌡️ Temperature = 0

In a world where emotions were forbidden, a lonely robot discovered an old book of poetry and, for the first time, felt the warmth of longing in its metallic heart.

### 🌡️ Temperature = 0.7

In a world where emotions were forbidden, a forgotten robot discovered an old book of poetry and, for the first time, felt the warmth of hope blooming in its metallic heart.

### 🌡️ Temperature = 1.2

In a world where emotions were considered outdated, a forgotten robot in a dusty corner of the museum reactivated to the sound of laughter, awakening an ancient curiosity that led it on a quest to understand what it truly meant to feel.

## 4. Sampling Strategies (~15 min)

### Top-P (Nucleus Sampling)
Samples from the smallest set of most-likely tokens whose cumulative probability reaches P, with their probabilities renormalized to sum to 1.
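
Below is a minimal sketch of nucleus filtering over a made-up next-token distribution. It keeps the most likely tokens until their cumulative probability reaches `p`, then renormalizes them; a real decoder would then sample from the result. `top_p_filter` and the example probabilities are hypothetical.

```python
def top_p_filter(probs: dict[str, float], p: float) -> dict[str, float]:
    """Keep the smallest set of top tokens whose cumulative probability reaches p,
    then renormalize so the kept probabilities sum to 1."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    kept = {}
    cumulative = 0.0
    # Walk tokens from most to least likely
    for token, prob in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[token] = prob
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(kept.values())
    return {token: prob / total for token, prob in kept.items()}

# Made-up next-token distribution
probs = {"the": 0.5, "a": 0.3, "one": 0.15, "zebra": 0.05}
print(top_p_filter(probs, 0.5))  # only "the" survives
print(top_p_filter(probs, 0.9))  # "the", "a", "one" survive
```

Because the cutoff depends on cumulative mass, a peaked distribution keeps very few tokens while a flat one keeps many.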

### Top-K Sampling
Samples from only the K most likely tokens, renormalized the same way. The OpenAI Chat Completions API does not expose a `top_k` parameter, but the idea is important to understand.
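
Here is Top-K filtering in the same style, again over a hypothetical distribution. Unlike top-p, it keeps a fixed number of tokens regardless of how peaked or flat the distribution is. Some providers do expose it directly (Anthropic's Messages API accepts a `top_k` parameter), but `top_k_filter` here is only an illustrative helper.

```python
def top_k_filter(probs: dict[str, float], k: int) -> dict[str, float]:
    """Keep only the k most likely tokens and renormalize their probabilities."""
    if k < 1:
        raise ValueError("k must be >= 1")
    top = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(prob for _, prob in top)
    return {token: prob / total for token, prob in top}

# Made-up next-token distribution
probs = {"the": 0.5, "a": 0.3, "one": 0.15, "zebra": 0.05}
print(top_k_filter(probs, 2))  # "the" and "a", rescaled to sum to 1
```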

In [23]:
def test_top_p(prompt, top_p_values=(0.1, 0.5, 0.9, 1.0)):
    """Compare outputs at different top_p values."""
    md(f"**Prompt:** *{prompt}*\n\n---")
    
    for top_p in top_p_values:
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": prompt + "\n\nRespond in markdown format."}],
            temperature=1.0,  # Use temperature=1 to see top_p effect
            top_p=top_p,
            max_tokens=100
        )
        md(f"### 📊 Top-P = {top_p}\n\n{response.choices[0].message.content}")

# Test top_p
test_top_p("List 3 unique hobbies:", top_p_values=[0.1, 0.5, 0.95])

**Prompt:** *List 3 unique hobbies:*

---

### 📊 Top-P = 0.1

Sure! Here are three unique hobbies:

1. **Geocaching**: This is a real-world outdoor treasure hunting game where participants use GPS devices or mobile apps to hide and seek containers, called "geocaches" or "caches," at specific locations marked by coordinates.

2. **Soap Making**: This creative hobby involves crafting your own soap from scratch using various oils, lye, and additives. It allows for customization in scents, colors, and shapes, making it both a

### 📊 Top-P = 0.5

Sure! Here are three unique hobbies:

1. **Geocaching**: This is a real-world outdoor treasure hunting game where participants use GPS devices or mobile apps to hide and seek containers, called "geocaches" or "caches," at specific locations marked by coordinates.

2. **Soap Making**: This creative hobby involves the process of making soap from scratch using oils, lye, and various

## 5. Stop Sequences (~5 min)

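Stop sequences tell the API to end generation as soon as the model produces one of the given strings; the stop string itself is not included in the returned text. This is useful for controlling output format.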
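
Stop sequences are applied by the API during generation, so the model stops producing (and billing for) tokens once a stop string appears. For intuition, the truncation rule itself can be mimicked client-side on a finished string. This sketch uses `apply_stop`, a hypothetical helper. Unlike the real parameter, it saves no tokens, because the full text has already been generated.

```python
def apply_stop(text: str, stop: list[str]) -> str:
    """Truncate text at the earliest occurrence of any stop string (stop string excluded)."""
    cut = len(text)
    for s in stop:
        if not s:
            continue  # an empty stop string would match everywhere; ignore it
        idx = text.find(s)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]

full = "1. Apple\n2. Banana\n3. Cherry\n4. Mango\n5. Grapes"
print(apply_stop(full, ["\n4."]))
```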

In [13]:
# Stop sequences example
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "List 5 fruits, one per line:"}],
    stop=["\n4."],  # Stop before the 4th item
    max_tokens=100
)
print("With stop sequence '\\n4.':")
print(response.choices[0].message.content)

print("\n" + "=" * 50)

# Without stop sequence
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "List 5 fruits, one per line:"}],
    max_tokens=100
)
print("\nWithout stop sequence:")
print(response.choices[0].message.content)

With stop sequence '\n4.':
1. Apple  
2. Banana  
3. Cherry  


Without stop sequence:
1. Apple  
2. Banana  
3. Orange  
4. Mango  
5. Grapes  


## 6. Streaming Responses (~10 min)

Streaming delivers tokens as they are generated. Total generation time stays about the same, but the user sees the first words much sooner, which improves perceived latency.
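
The sketch below illustrates why streaming helps perceived latency without changing total time. It fakes a stream with a generator and measures time to the first chunk versus time to completion. `fake_stream` and `consume` are hypothetical stand-ins. A real OpenAI stream yields chunk objects whose `choices[0].delta.content` can be empty or `None` (for example in the first and last chunks), so real code must skip those.

```python
import time
from typing import Iterator

def fake_stream(text: str, chunk_size: int = 4, delay_s: float = 0.01) -> Iterator[str]:
    """Hypothetical stand-in for an API stream: yields text in small chunks with a delay."""
    for i in range(0, len(text), chunk_size):
        time.sleep(delay_s)  # simulate per-chunk generation time
        yield text[i:i + chunk_size]

def consume(stream: Iterator[str]) -> tuple[str, float, float]:
    """Accumulate chunks; return (full_text, time_to_first_chunk_s, total_time_s)."""
    start = time.perf_counter()
    first = None
    parts = []
    for chunk in stream:
        if first is None:
            first = time.perf_counter() - start
        parts.append(chunk)
    total = time.perf_counter() - start
    return "".join(parts), (first if first is not None else total), total

text, ttft, total = consume(fake_stream("Streaming shows partial output early."))
print(f"first chunk after {ttft:.3f}s, full text after {total:.3f}s")
```

The first chunk arrives after roughly one delay, while the full text takes one delay per chunk. That gap is what a streaming UI hides from the user.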

In [32]:
from IPython.display import clear_output

def stream_response(prompt):
    """Stream a response, re-rendering it as live markdown; return the full text."""
    full_response = ""
    header = f"**Prompt:** *{prompt}*\n\n---\n\n### 📝 Streaming Response\n\n"
    
    stream = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt + "\n\nRespond in markdown format."}],
        stream=True,
        max_tokens=150
    )
    
    for chunk in stream:
        # Skip chunks without text (e.g. the role-only first chunk and the final chunk);
        # with stream_options={"include_usage": True} the last chunk has no choices at all
        if chunk.choices and chunk.choices[0].delta.content:
            full_response += chunk.choices[0].delta.content
            # Clear and re-render markdown on each chunk, with a cursor to show progress
            clear_output(wait=True)
            md(f"{header}{full_response}▌")
    
    # Final render without the cursor
    clear_output(wait=True)
    md(f"{header}{full_response}")
    return full_response

# Test streaming (assign the result so Jupyter doesn't echo the raw string)
answer = stream_response("Explain what makes a good AI engineer in 2-3 sentences.")

**Prompt:** *Explain what makes a good AI engineer in 2-3 sentences.*

---

### 📝 Streaming Response

A good AI engineer possesses a strong foundation in mathematics and programming, allowing them to design and implement complex algorithms efficiently. Additionally, they should be adept at problem-solving and have a deep understanding of machine learning frameworks and data structures, enabling them to adapt solutions to diverse challenges in the AI landscape. Effective communication skills are also crucial, as they often need to collaborate with cross-functional teams and explain technical concepts to non-technical stakeholders.

## 7. Model Comparison (~10 min)

Compare how different models respond to the same prompt. This helps you understand model characteristics and choose the right model for your use case.
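
When comparing several models, it helps to treat each one as a plain function from prompt to text, so one harness can time them all and keep going if one provider fails. This sketch uses hypothetical stub callables and a hypothetical `compare` helper; in the lab, each entry would wrap one of the API calls used in `compare_models`.

```python
import time
from typing import Callable

def compare(prompt: str, models: dict[str, Callable[[str], str]]) -> list[dict]:
    """Run one prompt through several model callables, recording output, latency, and errors."""
    results = []
    for name, call in models.items():
        start = time.perf_counter()
        try:
            output, error = call(prompt), None
        except Exception as exc:  # one failing provider shouldn't stop the comparison
            output, error = None, f"{type(exc).__name__}: {exc}"
        results.append({"model": name, "output": output, "error": error,
                        "latency_s": round(time.perf_counter() - start, 3)})
    return results

def broken(prompt: str) -> str:
    raise TimeoutError("simulated timeout")

# Hypothetical stub "models"; real entries would call an API and return its text
stubs = {"echo": lambda p: p.upper(), "length": lambda p: f"{len(p)} chars", "broken": broken}
for row in compare("hello models", stubs):
    print(row)
```

Recording errors per model rather than raising keeps a single outage from hiding the results of the other models.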

In [26]:
def compare_models(prompt):
    """Compare responses from different models."""
    md(f"**Prompt:** *{prompt}*\n\n---")
    
    # Add markdown instruction to prompt
    full_prompt = prompt + "\n\nRespond in markdown format."
    
    # OpenAI GPT-4o-mini
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": full_prompt}],
        max_tokens=150
    )
    md(f"### 🤖 GPT-4o-mini\n\n{response.choices[0].message.content}")
    
    # OpenAI GPT-4o
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": full_prompt}],
        max_tokens=150
    )
    md(f"### 🤖 GPT-4o\n\n{response.choices[0].message.content}")
    
    # Anthropic Claude (if available)
    if HAS_ANTHROPIC:
        try:
            # Try the newest Claude model first; fall back only if a model ID is not found
            claude_models = ["claude-sonnet-4-20250514", "claude-3-5-sonnet-latest", "claude-3-sonnet-20240229"]
            for model_name in claude_models:
                try:
                    response = anthropic_client.messages.create(
                        model=model_name,
                        max_tokens=150,
                        messages=[{"role": "user", "content": full_prompt}]
                    )
                    md(f"### 🤖 Claude Sonnet\n\n{response.content[0].text}")
                    break
                except anthropic.NotFoundError:
                    # Unknown or retired model ID: try the next one. Other errors
                    # (auth, rate limits) propagate to the handler below.
                    continue
            else:
                md("### 🤖 Claude Sonnet\n\n⚠️ No compatible Claude model found")
        except Exception as e:
            md(f"### 🤖 Claude Sonnet\n\n⚠️ Claude API error: {e}")
    else:
        md("### 🤖 Claude Sonnet\n\n⚠️ Anthropic not available - skipping Claude comparison")

# Compare models on an open-ended question
compare_models("What's the most important skill for an AI engineer to develop in 2026? Answer in 2 sentences.")

**Prompt:** *What's the most important skill for an AI engineer to develop in 2026? Answer in 2 sentences.*

---

### 🤖 GPT-4o-mini

In 2026, the most important skill for an AI engineer will be the ability to integrate ethical considerations into AI system development, ensuring that models are fair, transparent, and accountable. Additionally, proficiency in advanced machine learning techniques, particularly in areas like explainability and bias reduction, will be crucial to navigate the evolving landscape of AI applications.

### 🤖 GPT-4o

In 2026, the most important skill for an AI engineer to develop will be advanced proficiency in ethical AI design, ensuring responsible and fair implementation of AI technologies. Additionally, staying adept in the latest machine learning frameworks, especially those geared towards edge computing and quantum computing, will be crucial for innovation and efficiency.

### 🤖 Claude Sonnet

**Adaptability and continuous learning** will be the most critical skill for AI engineers in 2026, as the field is evolving at an unprecedented pace with new architectures, frameworks, and paradigms emerging constantly. The ability to quickly understand, evaluate, and implement novel AI techniques while maintaining ethical considerations and system reliability will separate exceptional engineers from those who fall behind the rapidly advancing curve.

## 🎯 Summary

In this lab, you explored:

1. **Tokenization** - How text gets broken into tokens and why it matters for cost and context limits
2. **Temperature** - Controlling randomness and creativity in outputs
3. **Top-P Sampling** - Another way to control output diversity
4. **Stop Sequences** - Precise control over when generation stops
5. **Streaming** - Real-time token delivery for better UX
6. **Model Comparison** - Understanding different model characteristics

### Key Takeaways

- Use **low temperature** (0-0.3) for factual, deterministic tasks
- Use **higher temperature** (0.7-1.2) for creative tasks
- **Streaming** significantly improves perceived latency
- Different models have different strengths - always test your specific use case!
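
The takeaways can be captured as named presets, so each call site states its intent instead of a bare number. In this sketch the preset names and values are hypothetical, picked from the temperature ranges used in this lab; `SAMPLING_PRESETS` and `sampling_params` are illustrative, and the right values for a real task should come from testing.

```python
# Hypothetical presets based on the temperature ranges discussed in this lab
SAMPLING_PRESETS = {
    "factual": {"temperature": 0.2},
    "balanced": {"temperature": 0.6},
    "creative": {"temperature": 0.9},
}

def sampling_params(task_type: str, **overrides) -> dict:
    """Return a fresh copy of the preset for task_type, with per-call overrides applied."""
    if task_type not in SAMPLING_PRESETS:
        raise KeyError(f"unknown task type {task_type!r}; choose from {sorted(SAMPLING_PRESETS)}")
    return {**SAMPLING_PRESETS[task_type], **overrides}

print(sampling_params("creative", max_tokens=100))
# {'temperature': 0.9, 'max_tokens': 100}
```

The returned dict can be unpacked into a request, e.g. `client.chat.completions.create(model="gpt-4o-mini", messages=..., **sampling_params("factual"))`.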