# Lecture 4: Advanced Prompt Engineering

## Working with Llama 3.2 via Ollama

**Learning Objectives:**
- Understand how inference parameters (temperature) affect model output
- Learn Chain-of-Thought (CoT) prompting techniques
- Explore system prompts and persona-based prompting
- Experience how effective prompting techniques improve model performance

**Note:** We use Llama 3.2 3B (about a 2 GB download), a model small enough to run in Colab yet capable enough to show how prompting techniques change results, particularly on reasoning tasks such as arithmetic.


---

## 1. Setup & Initialization


In [None]:
# Install Ollama
!curl -fsSL https://ollama.com/install.sh | sh


In [None]:
# Install Python wrapper for Ollama
!pip install ollama


In [None]:
# Start the Ollama server in the background.
# subprocess.Popen returns immediately, so this cell does not block
# while `ollama serve` keeps running.
import subprocess
import time

# Discard server logs: an unread PIPE can fill up and stall the server.
ollama_process = subprocess.Popen(
    ['ollama', 'serve'],
    stdout=subprocess.DEVNULL,
    stderr=subprocess.DEVNULL
)

# Give the server a few seconds to start
print("⏳ Starting Ollama server...")
time.sleep(5)
print("✅ Ollama server should be running in the background")
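
A fixed five-second sleep can be too short on a slow machine and needlessly long on a fast one. The sketch below polls until the server answers instead. It assumes the server listens on Ollama's default address `http://localhost:11434`; the names `ollama_is_up` and `wait_until_ready` are illustrative and not part of the `ollama` package.

```python
import time
import urllib.error
import urllib.request


def ollama_is_up(url="http://localhost:11434", timeout=1.0):
    """Return True if an HTTP GET to the server URL succeeds (assumed default address)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused or timed out: the server is not ready yet
        return False


def wait_until_ready(probe, timeout=30.0, interval=0.5):
    """Call probe() repeatedly until it returns True or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if probe():
            return True
        time.sleep(interval)
    return False
```

In the notebook this could replace the sleep with something like `wait_until_ready(ollama_is_up)`, which returns `False` if the server never comes up within the timeout.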


In [None]:
# Pull the Llama 3.2 3B model
!ollama pull llama3.2:3b


---

## 2. Helper Function

Create a reusable function to query the model with customizable parameters.


In [None]:
import ollama

def query_model(prompt, temperature=0.7, system_prompt=""):
    """
    Query Llama 3.2 3B model via Ollama.
    
    Args:
        prompt (str): The user prompt/question
        temperature (float): Controls randomness (0.0 = deterministic, 1.0+ = creative)
        system_prompt (str): Optional system prompt to set model behavior/persona
    
    Returns:
        str: Model's response
    """
    # Prepare the request
    messages = []
    
    # Add system prompt if provided
    if system_prompt:
        messages.append({
            "role": "system",
            "content": system_prompt
        })
    
    # Add user prompt
    messages.append({
        "role": "user",
        "content": prompt
    })
    
    # Query the model
    response = ollama.chat(
        model='llama3.2:3b',
        messages=messages,
        options={
            'temperature': temperature
        }
    )
    
    return response['message']['content']

print("✅ Helper function 'query_model' created")
print("   Usage: query_model(prompt, temperature=0.7, system_prompt='')")
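
The message list that `query_model` assembles can also be built by a small pure function, which makes it easy to inspect without a running server. The sketch below mirrors the helper's logic; the `history` parameter is a hypothetical addition for multi-turn conversations and does not appear in the helper above.

```python
def build_messages(prompt, system_prompt="", history=None):
    """Build a chat message list: optional system prompt, prior turns, then the user prompt."""
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    if history:
        # history is assumed to be a list of {"role": ..., "content": ...} dicts
        messages.extend(history)
    messages.append({"role": "user", "content": prompt})
    return messages


# Example: one earlier exchange followed by a new question
earlier = [
    {"role": "user", "content": "What is 2 + 2?"},
    {"role": "assistant", "content": "4"},
]
for message in build_messages("Now double it.", "You are a terse tutor.", earlier):
    print(message["role"], "->", message["content"])
```

The resulting list has the same shape as the `messages` argument passed to `ollama.chat` in the helper.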


---

## 3. Demo 1: Inference Parameters - Temperature

### Understanding Temperature

**Temperature** rescales the model's next-token probabilities before sampling, which controls how random the output is:
- **Low temperature (0.1-0.3)**: more deterministic, focused, and consistent across runs
- **High temperature (0.7-1.5)**: more creative, diverse, and unpredictable
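
To make the effect concrete, here is a minimal sketch of the usual temperature-scaled softmax: each score is divided by the temperature before being turned into a probability. The scores are made up for illustration, and this is the textbook formulation rather than a description of Ollama's internals. The sketch requires a positive temperature.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive in this sketch")
    scaled = [x / temperature for x in logits]
    largest = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - largest) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
for t in (0.1, 1.0, 1.5):
    probs = softmax_with_temperature(logits, t)
    print(f"temperature={t}: {[round(p, 3) for p in probs]}")
```

At a low temperature almost all probability mass lands on the top-scoring token; at a higher temperature the distribution flattens, so less likely tokens are sampled more often.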

**Task:** Generate a 4-line poem about a lonely robot on Mars with different temperature settings.


In [None]:
# Demo 1: Temperature Effects
prompt = "Write a 4-line poem about a lonely robot on Mars."

print("=" * 70)
print("Low Temp (0.1) - Deterministic")
print("=" * 70)
output_low = query_model(prompt, temperature=0.1)
print(output_low)
print()


In [None]:
print("=" * 70)
print("High Temp (1.0) - Creative")
print("=" * 70)
output_high = query_model(prompt, temperature=1.0)
print(output_high)
print()

print("💡 Notice the difference in creativity and variation between the two outputs!")
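
A single run per setting shows little about variation. One rough way to quantify it is to sample the same prompt several times and count how many outputs are distinct. The helper below is a sketch with a hypothetical name, `distinct_ratio`; it takes any zero-argument callable so it can be tried without a model.

```python
import itertools


def distinct_ratio(generate, n=5):
    """Call generate() n times and return the fraction of outputs that are unique."""
    outputs = [generate().strip() for _ in range(n)]
    return len(set(outputs)) / n


# Stand-ins for a model: one always repeats itself, one never does
counter = itertools.count()
print(distinct_ratio(lambda: "same poem"))            # 0.2 (1 unique out of 5)
print(distinct_ratio(lambda: f"poem {next(counter)}"))  # 1.0 (5 unique out of 5)
```

With the notebook's helper, this could be called as `distinct_ratio(lambda: query_model(prompt, temperature=0.1))` and again with `temperature=1.0`. The low-temperature ratio is usually smaller, though that is not guaranteed.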


---

## 4. Demo 2: Chain-of-Thought (CoT) Prompting

### What is Chain-of-Thought?

**Chain-of-Thought (CoT)** prompting asks the model to "think step by step" before giving its answer. This matters most for reasoning tasks such as math problems.

**Why it matters:** A language model generates one token at a time, so intermediate steps written out in the response become context for the tokens that follow. Asking for the steps lets the model build the answer from smaller, easier computations instead of producing it in one shot.

**Task:** Solve `123 * 76` with and without CoT prompting.
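
The decomposition used in the CoT prompt for this task can be checked directly: split 123 into 100 + 20 + 3, multiply each part by 76, and add the partial products. The variable names here are illustrative.

```python
a, b = 123, 76
parts = [100, 20, 3]                # 123 = 100 + 20 + 3
partials = [p * b for p in parts]   # one partial product per part
total = sum(partials)

print(partials)  # [7600, 1520, 228]
print(total)     # 9348
```

These are the intermediate values a correct step-by-step response should contain.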


In [None]:
# Demo 2: Chain-of-Thought - Baseline (No CoT)
prompt_baseline = "What is 123 * 76? Answer with just the number."

print("=" * 70)
print("Baseline (No CoT)")
print("=" * 70)
print(f"Prompt: {prompt_baseline}")
print()
result_baseline = query_model(prompt_baseline, temperature=0.1)
print(f"Model Output: {result_baseline}")
print()
print(f"Expected Answer: {123 * 76}")
print()


In [None]:
# Demo 2: Chain-of-Thought - With Step-by-Step Reasoning
prompt_cot = """What is 123 * 76? Show me step by step.
Multiply 100*76, then 20*76, then 3*76, and add them up."""

print("=" * 70)
print("Chain-of-Thought (CoT)")
print("=" * 70)
print(f"Prompt: {prompt_cot}")
print()
result_cot = query_model(prompt_cot, temperature=0.1)
print(f"Model Output: {result_cot}")
print()
print(f"Expected Answer: {123 * 76}")
print()
print("💡 Compare: Does the step-by-step approach help the model get the correct answer?")
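
Comparing the two runs by eye works for one example, but a small checker makes the comparison repeatable. The sketch below assumes the model states its final answer as the last integer in the output, which is a heuristic and will misread outputs that end with other numbers or use decimals. The name `extract_last_integer` is hypothetical.

```python
import re


def extract_last_integer(text):
    """Return the last integer in text (thousands commas allowed), or None if there is none."""
    matches = re.findall(r"\d[\d,]*", text)
    if not matches:
        return None
    return int(matches[-1].replace(",", ""))


sample = "100*76 = 7600, 20*76 = 1520, 3*76 = 228. Adding them up gives 9,348."
print(extract_last_integer(sample) == 123 * 76)  # True
```

In the notebook, `extract_last_integer(result_baseline)` and `extract_last_integer(result_cot)` could each be compared with `123 * 76`.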


---

## 5. Assignment: The "Persona" Challenge

### System Prompts and Role-Playing

**System prompts** allow you to set the model's behavior, tone, and expertise level. This is powerful for:
- Adapting explanations to different audiences
- Role-playing scenarios
- Controlling output style

**Your Task:** Test Llama 3.2's ability to adapt its explanations based on different personas.

**Note:** Llama 3.2 3B generally follows persona instructions, so the two system prompts should produce visibly different explanations of the same topic.


In [None]:
# Assignment: Persona Challenge
# Task: Ask Llama 3.2 to explain Quantum Entanglement with different personas

prompt = "Explain Quantum Entanglement"

# Challenge A: teacher-of-5-year-olds persona
# TODO: Fill in the system_prompt below
system_prompt_a = ""  # You are a teacher of 5-year-olds. Speak in simple words.

print("=" * 70)
print("Challenge A: Teacher of 5-Year-Olds Persona")
print("=" * 70)
print(f"System Prompt: {system_prompt_a if system_prompt_a else '(Not filled in yet)'}")
print()
if system_prompt_a:
    result_a = query_model(prompt, temperature=0.7, system_prompt=system_prompt_a)
    print(result_a)
else:
    print("⚠️ Please fill in system_prompt_a above and re-run this cell")
print()


In [None]:
# Challenge B: Nobel Prize Physicist persona
# TODO: Fill in the system_prompt below
system_prompt_b = ""  # You are a Nobel Prize Physicist. Use technical jargon.

print("=" * 70)
print("Challenge B: Nobel Prize Physicist Persona")
print("=" * 70)
print(f"System Prompt: {system_prompt_b if system_prompt_b else '(Not filled in yet)'}")
print()
if system_prompt_b:
    result_b = query_model(prompt, temperature=0.7, system_prompt=system_prompt_b)
    print(result_b)
else:
    print("⚠️ Please fill in system_prompt_b above and re-run this cell")
print()


### Assignment Instructions

1. **Fill in the system prompts** in the cells above:
   - Challenge A: `"You are a teacher of 5-year-olds. Speak in simple words."`
   - Challenge B: `"You are a Nobel Prize Physicist. Use technical jargon."`

2. **Run both cells** and observe the differences in:
   - Vocabulary complexity
   - Explanation depth
   - Tone and style

3. **Reflection Questions:**
   - How well does Llama 3.2 adapt to different personas?
   - How do system prompts influence the model's responses?
   - How might system prompts be useful in real applications?

**Expected Observations:**
- Llama 3.2 should adapt well to different personas
- You should see clear differences in vocabulary, style, and explanation depth
- This demonstrates the power of effective prompting techniques
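
The differences in vocabulary and explanation depth can be given rough numbers. The sketch below computes average word length and average words per sentence as crude proxies; these metrics and the name `text_stats` are assumptions made for illustration, not a standard readability measure.

```python
import re


def text_stats(text):
    """Return word count, average word length, and average words per sentence."""
    words = re.findall(r"[A-Za-z']+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return {
        "words": len(words),
        "avg_word_length": sum(len(w) for w in words) / len(words) if words else 0.0,
        "avg_words_per_sentence": len(words) / len(sentences) if sentences else 0.0,
    }


# Hand-written stand-ins for the two persona outputs
simple = "Two toys are friends. They feel the same."
technical = "Entangled subsystems exhibit nonseparable correlations."
print(text_stats(simple))
print(text_stats(technical))
```

Running `text_stats(result_a)` and `text_stats(result_b)` after completing both challenges gives a quick numeric comparison to set beside your own reading of the two explanations.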


---

## Key Takeaways

**What we learned:**
- ✅ How temperature affects model creativity and determinism
- ✅ Chain-of-Thought prompting for complex reasoning tasks (like math calculations)
- ✅ System prompts for persona-based and role-playing scenarios
- ✅ The importance of effective prompting techniques

**Why Llama 3.2 3B?**
- Capable model that demonstrates the power of proper prompting techniques
- Good balance between model size and performance
- Fast inference allows for rapid experimentation and learning
- Better at reasoning tasks like math calculations compared to very small models

**Next Steps:**
- Experiment with different temperature values
- Try CoT prompting on other reasoning tasks
- Explore more complex system prompts and multi-turn conversations
