# Chapter 6: Prompt Engineering - Easy Tasks (Solutions)

Complete solutions for all Easy Tasks with filled-in answers.

## Setup

Run all cells in this section to set up the environment and load the model.

In [None]:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

In [None]:
model_path = "microsoft/Phi-3-mini-4k-instruct"

model = AutoModelForCausalLM.from_pretrained(
    model_path,
    device_map="cuda",
    torch_dtype="auto",
    trust_remote_code=False,
)

tokenizer = AutoTokenizer.from_pretrained(model_path)

In [None]:
def generate_text(prompt, temperature=0.7, max_tokens=200):
    """Generate a reply to `prompt`; temperature=0 switches to greedy decoding."""
    sample = temperature > 0
    pipe = pipeline(
        "text-generation",
        model=model,
        tokenizer=tokenizer,
        return_full_text=False,  # return only the newly generated text
        max_new_tokens=max_tokens,
        do_sample=sample,
        temperature=temperature if sample else None,  # unused when not sampling
    )

    # Wrap the prompt as a single user turn in chat format
    messages = [{"role": "user", "content": prompt}]
    output = pipe(messages)
    return output[0]["generated_text"]

## Task 1: Finding the Right Temperature - SOLUTION

### Task 1a Solution: Best Temperature Values

In [None]:
# SOLUTION: Filled in based on experimentation
best_temp_factual = 0.0  # For "What is the capital of France?" - need deterministic answer
best_temp_creative = 1.0  # For "Write the first sentence..." - want variety
best_temp_code = 0.0  # For "Write a Python function..." - need correct syntax

In [None]:
print("Testing selections:")

if best_temp_factual is not None:
    output = generate_text("What is the capital of France?", temperature=best_temp_factual, max_tokens=30)
    print(f"\nFactual (temp={best_temp_factual}): {output}")

if best_temp_creative is not None:
    output = generate_text("Write the first sentence of a mystery novel.", temperature=best_temp_creative, max_tokens=50)
    print(f"\nCreative (temp={best_temp_creative}): {output}")

if best_temp_code is not None:
    output = generate_text("Write a Python function to calculate factorial.", temperature=best_temp_code, max_tokens=100)
    print(f"\nCode (temp={best_temp_code}): {output}")

### Questions Answered

**1. At temperature=1.5, did the factual question give wrong answers?**

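Sometimes yes. At high temperature the model samples low-probability tokens more often, so it can produce fluent but incorrect answers. Temperature 0 does not guarantee correctness, but it makes the output repeatable: the model returns its single most likely answer every time, which is what we want for factual tasks.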

**2. For creative writing, compare outputs at temperature=0.3 vs 1.0.**

Temperature 0.3 produces safer, more predictable sentences. Temperature 1.0 produces more interesting variations with unexpected word choices.

**3. Did code generation at temperature=1.5 produce valid Python?**

Often no. High temperature can produce syntax errors or logical mistakes. The risk is code that won't run at all, or code that runs but returns wrong results.
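
To see why higher temperatures make unlikely tokens more probable, here is a minimal sketch of temperature-scaled softmax. The logits are made-up scores for three hypothetical candidate tokens, not values from Phi-3, and `softmax_with_temperature` is an illustrative helper rather than part of `transformers`.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for three candidate next tokens (assumption, not model output)
toy_logits = [4.0, 2.0, 1.0]

for t in (0.3, 1.0, 1.5):
    probs = softmax_with_temperature(toy_logits, t)
    print(f"temperature={t}: {[round(p, 3) for p in probs]}")
```

Lower temperatures concentrate probability on the top-scoring token, while higher ones spread it toward the alternatives. Temperature 0 cannot be plugged into this formula (it would divide by zero), which is why `generate_text` turns sampling off in that case.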

## Task 2: Building a Complete Prompt - SOLUTION

### Task 2a Solution: Complete Version 5

In [None]:
# SOLUTION: All 7 components filled in
prompt_v5 = """Persona: You are a patient barista trainer with 10 years of coffee-making experience.

Explain how to make coffee.

Context: This person just got their first coffee machine and wants to make their first cup at home.

Audience: Someone who has never made coffee before.

Format:
1. Equipment needed
2. Step-by-step instructions
3. Common mistakes

Tone: Friendly and encouraging.

Data: Use a 1:16 coffee-to-water ratio. For one cup, use 15g coffee and 240ml water. Water temperature should be 195-205°F."""

In [None]:
print("V5: All 7 components")
output = generate_text(prompt_v5, temperature=0, max_tokens=250)
print(output)

### Questions Answered

**1. Compare V1 and V2 outputs. How did specifying Audience change the language complexity?**

V2 used simpler vocabulary, shorter sentences, and avoided jargon. The explanation became more accessible to beginners.

**2. Which component made the biggest single improvement to output quality?**

Format typically has the biggest impact because it structures the entire response. Audience is second because it controls language level.

**3. When might you intentionally use fewer components?**

- Quick exploratory tasks where V1 is sufficient
- Creative tasks where too much structure limits creativity
- Token budget constraints in production systems
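
One way to experiment with fewer components is to assemble the prompt from optional parts. The sketch below is a hypothetical helper (`build_prompt` is not from the notebook) that follows the same ordering as the Version 5 prompt and skips any component left out.

```python
def build_prompt(instruction, **components):
    """Assemble a prompt, skipping any component that is missing or empty."""
    parts = []
    # Persona goes before the instruction, as in the Version 5 prompt
    if components.get("persona"):
        parts.append(f"Persona: {components['persona']}")
    parts.append(instruction)
    # Remaining components follow the instruction in a fixed order
    for name in ("context", "audience", "format", "tone", "data"):
        value = components.get(name)
        if value:
            parts.append(f"{name.capitalize()}: {value}")
    return "\n\n".join(parts)

# Instruction only
print(build_prompt("Explain how to make coffee."))

# Instruction plus two components
print(build_prompt(
    "Explain how to make coffee.",
    audience="Someone who has never made coffee before.",
    tone="Friendly and encouraging.",
))
```

Dropping or adding a keyword argument then lets you compare outputs one component at a time.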

## Task 3: Improving Few-Shot Examples - SOLUTION

### Task 3 Solution: Better Few-Shot Examples

In [None]:
test_greetings = [
    "Good morning, how may I assist you?",
    "Hey, what's up?",
    "Hello, nice to meet you.",
    "Hi there.",
    "Dear valued customer,",
    "Yo!",
]

In [None]:
print("Few-shot classification:")
few_results = {}

for greeting in test_greetings:
    # SOLUTION: Added two more examples to cover edge cases
    prompt = f"""Classify formality: formal, neutral, or casual.

Examples:

Greeting: Dear Sir or Madam
Formality: formal

Greeting: Yo dude
Formality: casual

Greeting: Hello, how are you
Formality: neutral

Greeting: Good evening, I hope you are well
Formality: formal

Greeting: Hey there
Formality: casual

Greeting: {greeting}
Formality:"""
    
    result = generate_text(prompt, temperature=0, max_tokens=10).strip()
    few_results[greeting] = result
    print(f"{greeting} -> {result}")

### Questions Answered

**1. Which greeting showed the biggest difference between zero-shot and few-shot?**

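The ambiguous greetings, such as "Hi there." and "Dear valued customer,", are where zero-shot and few-shot are most likely to disagree. Few-shot examples clarify the class boundaries by showing similar greetings that are already labeled.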

**2. Did adding more examples improve accuracy on edge cases?**

Yes. Adding "Hey there" (casual) and "Good evening, I hope you are well" (formal) helped with borderline cases.

**3. What makes a good few-shot example?**

- Clear, unambiguous instances
- Cover the full spectrum (formal, neutral, casual)
- Include edge cases similar to what you expect
- Consistent formatting
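
The checklist above can be partly automated. The following is a minimal sketch, assuming examples are stored as `(greeting, label)` pairs. `build_few_shot_prompt` and `missing_labels` are hypothetical helpers, not part of the notebook. They keep the formatting consistent and flag any class that has no example.

```python
LABELS = ("formal", "neutral", "casual")

EXAMPLES = [
    ("Dear Sir or Madam", "formal"),
    ("Yo dude", "casual"),
    ("Hello, how are you", "neutral"),
    ("Good evening, I hope you are well", "formal"),
    ("Hey there", "casual"),
]

def build_few_shot_prompt(greeting, examples=EXAMPLES):
    """Render every example in the same 'Greeting / Formality' layout."""
    lines = ["Classify formality: formal, neutral, or casual.", "", "Examples:", ""]
    for text, label in examples:
        lines += [f"Greeting: {text}", f"Formality: {label}", ""]
    lines += [f"Greeting: {greeting}", "Formality:"]
    return "\n".join(lines)

def missing_labels(examples, labels=LABELS):
    """Return the labels that no example covers."""
    seen = {label for _, label in examples}
    return [label for label in labels if label not in seen]

print(missing_labels(EXAMPLES))      # every class has at least one example
print(missing_labels(EXAMPLES[:2]))  # only formal and casual are covered
print(build_few_shot_prompt("Hi there."))
```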

## Task 4: Testing Chain-of-Thought - SOLUTION

### Task 4 Solution: Better CoT Examples

In [None]:
problems = [
    ("If John has 5 apples and gives 2 to Mary, how many does he have?", 3, "easy"),
    ("A ticket costs $15. I buy 3 tickets with a $50 bill. How much change?", 5, "easy"),
    ("A bat and ball cost $1.10 total. The bat costs $1 more than the ball. How much is the ball?", 0.05, "tricky"),
]

In [None]:
print("Few-shot CoT:")

for question, correct, difficulty in problems:
    # SOLUTION: Added a third example showing careful algebra
    prompt = f"""Solve step-by-step.

Q: Roger has 5 balls. He buys 2 cans with 3 balls each. How many balls does he have?
A: Roger starts with 5 balls.
He buys 2 cans, each has 3 balls.
New balls: 2 × 3 = 6
Total: 5 + 6 = 11
Answer: 11

Q: A cafe had 23 apples. They used 20 for lunch and bought 6 more. How many now?
A: Start with 23 apples.
After using 20: 23 - 20 = 3
After buying 6: 3 + 6 = 9
Answer: 9

Q: A pen and a notebook together cost $3. The notebook costs $2 more than the pen. How much is the pen?
A: Let pen cost = x
Then notebook cost = x + 2
Together: x + (x + 2) = 3
Simplify: 2x + 2 = 3
Subtract 2: 2x = 1
Divide by 2: x = 0.50
Answer: The pen costs $0.50

Q: {question}
A:"""
    
    answer = generate_text(prompt, temperature=0, max_tokens=150)
    
    print(f"\n[{difficulty.upper()}] {question}")
    print(f"Reasoning: {answer}")
    print(f"Correct: {correct}")

### Questions Answered

**1. Did direct prompting get the bat-and-ball problem wrong?**

Often yes. The common wrong answer is $0.10. People intuitively think bat=$1.00, ball=$0.10, but that makes the bat only $0.90 more, not $1.00 more. The correct split is ball=$0.05 and bat=$1.05.
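
For reference, the algebra behind the correct answer, with $b$ standing for the ball's price in dollars (the symbol is introduced here only for illustration):

$$
\begin{aligned}
b + (b + 1.00) &= 1.10 \\
2b &= 0.10 \\
b &= 0.05
\end{aligned}
$$

The bat then costs $b + 1.00 = 1.05$, and the two prices add up to $1.10$ as required.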

**2. Compare few-shot CoT vs zero-shot CoT on the tricky problem.**

Few-shot CoT with the algebra example catches the mistake better because it shows how to set up equations carefully. The third example demonstrates the exact pattern needed.

**3. What type of problems benefit most from CoT?**

- Multi-step calculations
- Counter-intuitive problems
- Problems where the obvious answer is wrong
- When you need to verify reasoning

Direct prompting is fine for simple lookups or single-step problems.
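
Since CoT outputs contain reasoning as well as a result, comparing them to the expected value by eye gets tedious. Below is a minimal sketch for scoring them automatically, assuming the model ends with an `Answer:` line as in the few-shot examples. `extract_final_number` and `is_correct` are hypothetical helpers, and the regex is a simple heuristic that may misread unusual outputs.

```python
import re

def extract_final_number(text):
    """Return the last number on the last 'Answer:' line, or in the whole text."""
    answer_lines = [ln for ln in text.splitlines() if "answer:" in ln.lower()]
    target = answer_lines[-1] if answer_lines else text
    # Drop thousands separators, then collect integers and decimals
    matches = re.findall(r"-?\d+(?:\.\d+)?", target.replace(",", ""))
    return float(matches[-1]) if matches else None

def is_correct(text, expected, tol=1e-6):
    """Compare the extracted number to the expected value within a tolerance."""
    value = extract_final_number(text)
    return value is not None and abs(value - expected) <= tol

# Hand-written sample outputs (not real model generations)
print(is_correct("New balls: 2 x 3 = 6\nTotal: 5 + 6 = 11\nAnswer: 11", 11))
print(is_correct("Answer: The ball costs $0.10", 0.05))
```

In the loop above, this could be called as `is_correct(answer, correct)` to print a pass/fail flag next to each problem.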

## Key Takeaways

1. **Temperature**: Use 0 for factual/code, 0.7-1.0 for creative tasks
2. **Prompt Components**: Format and Audience have the biggest impact
3. **Few-Shot Learning**: Good examples cover the spectrum and include edge cases
4. **Chain-of-Thought**: Essential for multi-step and counter-intuitive problems