# Chapter 6: Prompt Engineering - Easy Tasks (Solutions)

Complete solutions with all answers filled in. This notebook has the same structure as the task notebook.

## Setup

Run all cells in this section to set up the environment and load the model.

Before running these cells, review the concepts from the main Chapter 6 notebook (00_Start_Here.ipynb).

### [Optional] - Installing Packages on Google Colab

If you are viewing this notebook on Google Colab, run the following cell to install dependencies.

**Note**: Use a GPU for this notebook. In Google Colab, go to Runtime > Change runtime type > Hardware accelerator > GPU > GPU type > T4.

In [None]:
%%capture
!pip install "transformers>=4.40.0" torch accelerate
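Because the model below is loaded with `device_map="cuda"`, it can help to confirm a GPU is visible before loading it. The following is a minimal sketch, not part of the original notebook; `pick_device` is a hypothetical helper name, and it falls back to `"cpu"` when PyTorch is not installed.

```python
import importlib.util


def pick_device() -> str:
    """Return "cuda" when PyTorch is installed and sees a GPU, else "cpu"."""
    # Guard the import so the check itself does not fail when torch is missing.
    if importlib.util.find_spec("torch") is None:
        return "cpu"
    import torch

    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(f"Selected device: {device}")
```

If this prints `cpu` on Colab, the runtime type has probably not been switched to GPU yet.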

### Model Loading

In [None]:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

In [None]:
model_path = "microsoft/Phi-3-mini-4k-instruct"

model = AutoModelForCausalLM.from_pretrained(
    model_path,
    device_map="cuda",
    torch_dtype="auto",
    trust_remote_code=False,
)

tokenizer = AutoTokenizer.from_pretrained(model_path)

### Helper Functions

In [None]:
def generate_text(prompt, temperature=0.7, max_tokens=200):
    """Generate a reply to `prompt`; temperature=0 switches to greedy decoding."""
    sample = temperature > 0
    pipe = pipeline(
        "text-generation",
        model=model,
        tokenizer=tokenizer,
        return_full_text=False,  # return only the newly generated text
        max_new_tokens=max_tokens,
        do_sample=sample,
        temperature=temperature if sample else None,  # unused when not sampling
    )

    # Wrap the prompt as a single user turn so the chat template is applied.
    messages = [{"role": "user", "content": prompt}]
    output = pipe(messages)
    return output[0]["generated_text"]

## Challenges

Complete the following tasks by implementing the starter code.

### Level: Easy

**About This Task:**
Temperature controls randomness in generation. Lower values give consistent outputs, higher values give varied outputs.
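To make the effect concrete, here is a small sketch of how temperature is commonly applied: the raw scores (logits) are divided by the temperature before the softmax. The logits below are made-up values for three candidate tokens, and `softmax_with_temperature` is an illustrative helper, not a function from `transformers`. Note that `generate_text` in this notebook treats temperature=0 as greedy decoding rather than dividing by zero.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive; use greedy decoding for 0")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


toy_logits = [2.0, 1.0, 0.1]  # hypothetical scores for three candidate tokens
for t in (0.3, 1.0, 4.0):
    probs = softmax_with_temperature(toy_logits, t)
    print(f"T={t}: {[round(p, 3) for p in probs]}")
```

Low temperatures concentrate probability on the top-scoring token, while high temperatures flatten the distribution so that unlikely tokens are sampled more often.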

#### Easy Task 1: Finding the Right Temperature

### Instructions

1. Execute code to compare temperature effects on three use cases
2. Fill in missing temperature values based on your observations
3. Run determinism test to verify temperature=0 consistency
4. Test with your own prompts
5. Analyze which temperatures work best for different tasks

In [None]:
temperatures = [0.0, 0.3, 0.7, 1.0, 1.5, 4.0]

Notice how different temperatures affect each use case.

In [None]:
# Factual: What is the capital city of France?
prompt1 = "What is the capital city of France?"
print(f"Prompt 1: {prompt1}")
for temp in temperatures:
  output = generate_text(prompt1, temperature=temp, max_tokens=50)
  print(f"Temp={temp}: {output}")

In [None]:
# Creative: Write the first sentence of a romance novel.
prompt2 = "Write the first sentence of a romance novel."
print(f"Prompt 2: {prompt2}")
for temp in temperatures:
  output = generate_text(prompt2, temperature=temp, max_tokens=50)
  print(f"Temp={temp}: {output}")

In [None]:
# Math: "What is the square root of 10000?"
prompt3 = "What is the square root of 10000?"
print(f"Prompt 3: {prompt3}")
for temp in temperatures:
  output = generate_text(prompt3, temperature=temp, max_tokens=50)
  print(f"Temp={temp}: {output}")

The effect of temperature is clear in the outputs above.

### Task 1a: Select Best Temperature

Based on the outputs above, fill in the best temperature for each use case.

In [None]:
# Fill in: What temperature works best for each task?
best_temp_factual = 0.0  # SOLUTION  # For "What is the capital of France?"
best_temp_creative = 1.0  # SOLUTION  # For "Write the first sentence..."
best_temp_math = 0.0  # SOLUTION  # For "What is the square..."

Test your selections here.

In [None]:
print("Testing your temperature selections:")

if best_temp_factual is not None:
    output = generate_text("What is the capital of France?", temperature=best_temp_factual, max_tokens=30)
    print(f"\nFactual (temp={best_temp_factual}): {output}")

if best_temp_creative is not None:
    output = generate_text("Write the first sentence of a mystery novel.", temperature=best_temp_creative, max_tokens=50)
    print(f"\nCreative (temp={best_temp_creative}): {output}")

if best_temp_math is not None:
    output = generate_text("What is the square root of 10000?", temperature=best_temp_math, max_tokens=30)
    print(f"\nMath (temp={best_temp_math}): {output}")

### Task 1b: Determinism Test

Run this cell multiple times to verify temperature=0 gives identical outputs.

In [None]:
output = generate_text("What is 2+2?", temperature=0, max_tokens=20)
print(f"Output: {output}")
print("\nRun this cell again - you should get the EXACT same output.")
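Instead of re-running the cell by hand, the check can be automated. The sketch below is illustrative: `is_deterministic` is a hypothetical helper, and the two stand-in generators replace the model so the example runs on its own. In the notebook you could pass `lambda p: generate_text(p, temperature=0, max_tokens=20)` as `generate_fn`.

```python
import itertools


def is_deterministic(generate_fn, prompt, runs=3):
    """Call generate_fn several times and report whether all outputs match."""
    outputs = [generate_fn(prompt) for _ in range(runs)]
    return all(o == outputs[0] for o in outputs), outputs


# Stand-in generators so the sketch runs without a model (assumptions).
_counter = itertools.count()


def fixed_generator(prompt):
    return "2 + 2 = 4"


def drifting_generator(prompt):
    return f"sample #{next(_counter)}"


print(is_deterministic(fixed_generator, "What is 2+2?")[0])
print(is_deterministic(drifting_generator, "What is 2+2?")[0])
```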

### Questions

1. At temperature=1.5, did the factual question give wrong answers? Why is determinism critical for factual tasks?

2. For creative writing, compare outputs at temperature=0.3 vs 1.0. Which produced more interesting variations?

3. Did the math question at temperature=1.5 still give the correct answer? What's the risk of high temperature for tasks with a single exact answer, such as math or code?

**About This Task:**
A prompt can combine up to seven components: Persona, Instruction, Context, Format, Audience, Tone, and Data. Adding relevant components generally improves output quality.
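One way to experiment with these components is to assemble the prompt from optional parts, so that any single component can be dropped and the output compared. This is a sketch under assumptions: `build_prompt`, its parameter names, and the `Label: value` layout are illustrative choices, not a required format.

```python
def build_prompt(instruction, persona=None, context=None, audience=None,
                 output_format=None, tone=None, data=None):
    """Join the components that were provided into one prompt string."""
    parts = []
    if persona:
        parts.append(f"Persona: {persona}")
    parts.append(instruction)  # the instruction is the only required part
    labeled = (
        ("Context", context),
        ("Audience", audience),
        ("Format", output_format),
        ("Tone", tone),
        ("Data", data),
    )
    for label, value in labeled:
        if not value:
            continue
        # Multi-line values (such as a numbered format) start on their own line.
        separator = "\n" if "\n" in value else " "
        parts.append(f"{label}:{separator}{value}")
    return "\n\n".join(parts)


example = build_prompt(
    "Explain how to make coffee.",
    audience="Someone who has never made coffee before.",
    output_format="1. Equipment needed\n2. Step-by-step instructions",
)
print(example)
```

Calling the helper with and without a given argument makes it easy to isolate the effect of one component at a time.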

#### Easy Task 2: Building a Complete Prompt

### Instructions

1. Run pre-built prompt versions to see incremental improvements
2. Complete `prompt_v5` by adding the missing 3 components
3. Test removing Format to see its impact
4. Create your own scenario
5. Compare output quality as components are added

We start with just an instruction and gradually add components.

In [None]:
# Version 1: Instruction only
prompt_v1 = "Explain how to make coffee."

In [None]:
print("V1: Instruction only")
output = generate_text(prompt_v1, temperature=0, max_tokens=150)
print(output)

In [None]:
# Version 2: + Audience
prompt_v2 = """Explain how to make coffee.

Audience: Someone who has never made coffee before."""

In [None]:
print("V2: + Audience")
output = generate_text(prompt_v2, temperature=0, max_tokens=150)
print(output)

Notice how adding Audience changes the language.

In [None]:
# Version 3: + Format
prompt_v3 = """Explain how to make coffee.

Audience: Someone who has never made coffee before.

Format:
1. Equipment needed
2. Step-by-step instructions
3. Common mistakes"""

In [None]:
print("V3: + Format")
output = generate_text(prompt_v3, temperature=0, max_tokens=200)
print(output)

See how Format structures the output.

In [None]:
# Version 4: + Tone
prompt_v4 = """Explain how to make coffee.

Audience: Someone who has never made coffee before.

Format:
1. Equipment needed
2. Step-by-step instructions
3. Common mistakes

Tone: Friendly and encouraging."""

In [None]:
print("V4: + Tone")
output = generate_text(prompt_v4, temperature=0, max_tokens=200)
print(output)

### Task 2a: Complete Version 5

Your task: Add Persona (the `Character:` line below), Context, and Data to create a complete prompt.

In [None]:
# Fill in: Add the 3 missing components
# SOLUTION: the Character, Context, and Data lines were added.
prompt_v5 = """Character: You are a patient barista trainer with 10 years of experience

Explain how to make coffee.

Context: This person just got their first coffee machine at home

Audience: Someone who has never made coffee before.

Format:
1. Equipment needed
2. Step-by-step instructions
3. Common mistakes

Tone: Friendly and encouraging.

Data: Use 15g coffee to 240ml water (1:16 ratio). Water temperature 195-205°F"""

In [None]:
print("V5: All 7 components")
output = generate_text(prompt_v5, temperature=0, max_tokens=250)
print(output)

### Questions

1. Compare V1 and V2 outputs. How did specifying Audience change the language complexity?

2. Which component made the biggest single improvement to output quality?

3. When might you intentionally use fewer components? Give a specific scenario where V1 would be better than V5.

**About This Task:**
In-context learning uses examples to guide the model. Zero-shot has no examples, one-shot has one, few-shot has multiple.
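The three settings differ only in how many labeled examples precede the query, so a single builder can produce all of them. The sketch below is illustrative; `build_shot_prompt` and `TASK` are hypothetical names, and the layout mirrors the `Greeting:` / `Formality:` pattern used in this task.

```python
TASK = "Classify formality: formal, neutral, or casual."


def build_shot_prompt(task, examples, query):
    """Build a zero-, one-, or few-shot prompt from (greeting, label) pairs."""
    lines = [task, ""]
    if examples:
        lines.append("Example:" if len(examples) == 1 else "Examples:")
        for greeting, label in examples:
            lines.append(f"Greeting: {greeting}")
            lines.append(f"Formality: {label}")
        lines.append("")
    # The query comes last and leaves the label for the model to complete.
    lines.append(f"Greeting: {query}")
    lines.append("Formality:")
    return "\n".join(lines)


shots = [("Dear Sir or Madam", "formal"), ("Yo dude", "casual")]
print(build_shot_prompt(TASK, [], "Hi there."))         # zero-shot
print(build_shot_prompt(TASK, shots[:1], "Hi there."))  # one-shot
print(build_shot_prompt(TASK, shots, "Hi there."))      # few-shot
```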

#### Easy Task 3: Improving Few-Shot Examples

### Instructions

1. Run zero-shot, one-shot, and few-shot on test greetings
2. Identify which greetings cause disagreement
3. Improve the few-shot prompt by adding better examples
4. Test edge cases
5. Analyze why certain examples improve accuracy

In [None]:
test_greetings = [
    "Good morning, how may I assist you?",
    "Hey, what's up?",
    "Hello, nice to meet you.",
    "Hi there.",
    "Dear valued customer,",  # Very formal
    "Yo!",  # Very casual
]

### Zero-Shot

Here we ask the model to classify without any examples.

In [None]:
# Zero-shot: Good morning, how may I assist you?
greeting1 = "Good morning, how may I assist you?"
prompt_greeting1_zero = f"""Classify formality: formal, neutral, or casual.

Greeting: {greeting1}
Formality:"""
result_greeting1_zero = generate_text(prompt_greeting1_zero, temperature=0, max_tokens=10).strip()
print(f"{greeting1} -> {result_greeting1_zero}")

In [None]:
# Zero-shot: Hey, what's up?
greeting2 = "Hey, what's up?"
prompt_greeting2_zero = f"""Classify formality: formal, neutral, or casual.

Greeting: {greeting2}
Formality:"""
result_greeting2_zero = generate_text(prompt_greeting2_zero, temperature=0, max_tokens=10).strip()
print(f"{greeting2} -> {result_greeting2_zero}")

In [None]:
# Zero-shot: Hello, nice to meet you.
greeting3 = "Hello, nice to meet you."
prompt_greeting3_zero = f"""Classify formality: formal, neutral, or casual.

Greeting: {greeting3}
Formality:"""
result_greeting3_zero = generate_text(prompt_greeting3_zero, temperature=0, max_tokens=10).strip()
print(f"{greeting3} -> {result_greeting3_zero}")

In [None]:
# Zero-shot: Hi there.
greeting4 = "Hi there."
prompt_greeting4_zero = f"""Classify formality: formal, neutral, or casual.

Greeting: {greeting4}
Formality:"""
result_greeting4_zero = generate_text(prompt_greeting4_zero, temperature=0, max_tokens=10).strip()
print(f"{greeting4} -> {result_greeting4_zero}")

In [None]:
# Zero-shot: Dear valued customer,
greeting5 = "Dear valued customer,"
prompt_greeting5_zero = f"""Classify formality: formal, neutral, or casual.

Greeting: {greeting5}
Formality:"""
result_greeting5_zero = generate_text(prompt_greeting5_zero, temperature=0, max_tokens=10).strip()
print(f"{greeting5} -> {result_greeting5_zero}")

In [None]:
# Zero-shot: Yo!
greeting6 = "Yo!"
prompt_greeting6_zero = f"""Classify formality: formal, neutral, or casual.

Greeting: {greeting6}
Formality:"""
result_greeting6_zero = generate_text(prompt_greeting6_zero, temperature=0, max_tokens=10).strip()
print(f"{greeting6} -> {result_greeting6_zero}")

### One-Shot

See how a single example helps guide the model.

In [None]:
# One-shot: Good morning, how may I assist you?
prompt_greeting1_one = f"""Classify formality: formal, neutral, or casual.

Example:
Greeting: Dear Sir or Madam
Formality: formal

Greeting: {greeting1}
Formality:"""
result_greeting1_one = generate_text(prompt_greeting1_one, temperature=0, max_tokens=10).strip()
print(f"{greeting1} -> {result_greeting1_one}")

In [None]:
# One-shot: Hey, what's up?
prompt_greeting2_one = f"""Classify formality: formal, neutral, or casual.

Example:
Greeting: Dear Sir or Madam
Formality: formal

Greeting: {greeting2}
Formality:"""
result_greeting2_one = generate_text(prompt_greeting2_one, temperature=0, max_tokens=10).strip()
print(f"{greeting2} -> {result_greeting2_one}")

In [None]:
# One-shot: Hello, nice to meet you.
prompt_greeting3_one = f"""Classify formality: formal, neutral, or casual.

Example:
Greeting: Dear Sir or Madam
Formality: formal

Greeting: {greeting3}
Formality:"""
result_greeting3_one = generate_text(prompt_greeting3_one, temperature=0, max_tokens=10).strip()
print(f"{greeting3} -> {result_greeting3_one}")

In [None]:
# One-shot: Hi there.
prompt_greeting4_one = f"""Classify formality: formal, neutral, or casual.

Example:
Greeting: Dear Sir or Madam
Formality: formal

Greeting: {greeting4}
Formality:"""
result_greeting4_one = generate_text(prompt_greeting4_one, temperature=0, max_tokens=10).strip()
print(f"{greeting4} -> {result_greeting4_one}")

In [None]:
# One-shot: Dear valued customer,
prompt_greeting5_one = f"""Classify formality: formal, neutral, or casual.

Example:
Greeting: Dear Sir or Madam
Formality: formal

Greeting: {greeting5}
Formality:"""
result_greeting5_one = generate_text(prompt_greeting5_one, temperature=0, max_tokens=10).strip()
print(f"{greeting5} -> {result_greeting5_one}")

In [None]:
# One-shot: Yo!
prompt_greeting6_one = f"""Classify formality: formal, neutral, or casual.

Example:
Greeting: Dear Sir or Madam
Formality: formal

Greeting: {greeting6}
Formality:"""
result_greeting6_one = generate_text(prompt_greeting6_one, temperature=0, max_tokens=10).strip()
print(f"{greeting6} -> {result_greeting6_one}")

### Few-Shot

Your task: Improve this prompt by adding 1-2 more examples to handle edge cases better.

In [None]:
# Few-shot: Good morning, how may I assist you?
# SOLUTION: Added two edge case examples (the last two examples in the prompt).
prompt_greeting1_few = f"""Classify formality: formal, neutral, or casual.

Examples:
Greeting: Dear Sir or Madam
Formality: formal

Greeting: Yo dude
Formality: casual

Greeting: Hello, how are you
Formality: neutral

Greeting: Good evening, I hope you are well
Formality: formal

Greeting: Hey there
Formality: casual

Greeting: {greeting1}
Formality:"""
result_greeting1_few = generate_text(prompt_greeting1_few, temperature=0, max_tokens=10).strip()
print(f"{greeting1} -> {result_greeting1_few}")

In [None]:
# Few-shot: Hey, what's up?
# SOLUTION: Added two edge case examples (the last two examples in the prompt).
prompt_greeting2_few = f"""Classify formality: formal, neutral, or casual.

Examples:
Greeting: Dear Sir or Madam
Formality: formal

Greeting: Yo dude
Formality: casual

Greeting: Hello, how are you
Formality: neutral

Greeting: Good evening, I hope you are well
Formality: formal

Greeting: Hey there
Formality: casual

Greeting: {greeting2}
Formality:"""
result_greeting2_few = generate_text(prompt_greeting2_few, temperature=0, max_tokens=10).strip()
print(f"{greeting2} -> {result_greeting2_few}")

In [None]:
# Few-shot: Hello, nice to meet you.
# SOLUTION: Added two edge case examples (the last two examples in the prompt).
prompt_greeting3_few = f"""Classify formality: formal, neutral, or casual.

Examples:
Greeting: Dear Sir or Madam
Formality: formal

Greeting: Yo dude
Formality: casual

Greeting: Hello, how are you
Formality: neutral

Greeting: Good evening, I hope you are well
Formality: formal

Greeting: Hey there
Formality: casual

Greeting: {greeting3}
Formality:"""
result_greeting3_few = generate_text(prompt_greeting3_few, temperature=0, max_tokens=10).strip()
print(f"{greeting3} -> {result_greeting3_few}")

In [None]:
# Few-shot: Hi there.
# SOLUTION: Added two edge case examples (the last two examples in the prompt).
prompt_greeting4_few = f"""Classify formality: formal, neutral, or casual.

Examples:
Greeting: Dear Sir or Madam
Formality: formal

Greeting: Yo dude
Formality: casual

Greeting: Hello, how are you
Formality: neutral

Greeting: Good evening, I hope you are well
Formality: formal

Greeting: Hey there
Formality: casual

Greeting: {greeting4}
Formality:"""
result_greeting4_few = generate_text(prompt_greeting4_few, temperature=0, max_tokens=10).strip()
print(f"{greeting4} -> {result_greeting4_few}")

In [None]:
# Few-shot: Dear valued customer,
# SOLUTION: Added two edge case examples (the last two examples in the prompt).
prompt_greeting5_few = f"""Classify formality: formal, neutral, or casual.

Examples:
Greeting: Dear Sir or Madam
Formality: formal

Greeting: Yo dude
Formality: casual

Greeting: Hello, how are you
Formality: neutral

Greeting: Good evening, I hope you are well
Formality: formal

Greeting: Hey there
Formality: casual

Greeting: {greeting5}
Formality:"""
result_greeting5_few = generate_text(prompt_greeting5_few, temperature=0, max_tokens=10).strip()
print(f"{greeting5} -> {result_greeting5_few}")

In [None]:
# Few-shot: Yo!
# SOLUTION: Added two edge case examples (the last two examples in the prompt).
prompt_greeting6_few = f"""Classify formality: formal, neutral, or casual.

Examples:
Greeting: Dear Sir or Madam
Formality: formal

Greeting: Yo dude
Formality: casual

Greeting: Hello, how are you
Formality: neutral

Greeting: Good evening, I hope you are well
Formality: formal

Greeting: Hey there
Formality: casual

Greeting: {greeting6}
Formality:"""
result_greeting6_few = generate_text(prompt_greeting6_few, temperature=0, max_tokens=10).strip()
print(f"{greeting6} -> {result_greeting6_few}")

### Comparison

Here we identify disagreements to see where examples help most.

In [None]:
# Compare results for: Good morning, how may I assist you?
print(f"\n{greeting1}")
print(f"  Zero-shot: {result_greeting1_zero}")
print(f"  One-shot:  {result_greeting1_one}")
print(f"  Few-shot:  {result_greeting1_few}")
if result_greeting1_zero == result_greeting1_one == result_greeting1_few:
    print("  All agree")
else:
    print("  DISAGREEMENT")

In [None]:
# Compare results for: Hey, what's up?
print(f"\n{greeting2}")
print(f"  Zero-shot: {result_greeting2_zero}")
print(f"  One-shot:  {result_greeting2_one}")
print(f"  Few-shot:  {result_greeting2_few}")
if result_greeting2_zero == result_greeting2_one == result_greeting2_few:
    print("  All agree")
else:
    print("  DISAGREEMENT")

In [None]:
# Compare results for: Hello, nice to meet you.
print(f"\n{greeting3}")
print(f"  Zero-shot: {result_greeting3_zero}")
print(f"  One-shot:  {result_greeting3_one}")
print(f"  Few-shot:  {result_greeting3_few}")
if result_greeting3_zero == result_greeting3_one == result_greeting3_few:
    print("  All agree")
else:
    print("  DISAGREEMENT")

In [None]:
# Compare results for: Hi there.
print(f"\n{greeting4}")
print(f"  Zero-shot: {result_greeting4_zero}")
print(f"  One-shot:  {result_greeting4_one}")
print(f"  Few-shot:  {result_greeting4_few}")
if result_greeting4_zero == result_greeting4_one == result_greeting4_few:
    print("  All agree")
else:
    print("  DISAGREEMENT")

In [None]:
# Compare results for: Dear valued customer,
print(f"\n{greeting5}")
print(f"  Zero-shot: {result_greeting5_zero}")
print(f"  One-shot:  {result_greeting5_one}")
print(f"  Few-shot:  {result_greeting5_few}")
if result_greeting5_zero == result_greeting5_one == result_greeting5_few:
    print("  All agree")
else:
    print("  DISAGREEMENT")

In [None]:
# Compare results for: Yo!
print(f"\n{greeting6}")
print(f"  Zero-shot: {result_greeting6_zero}")
print(f"  One-shot:  {result_greeting6_one}")
print(f"  Few-shot:  {result_greeting6_few}")
if result_greeting6_zero == result_greeting6_one == result_greeting6_few:
    print("  All agree")
else:
    print("  DISAGREEMENT")
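The comparisons above use exact string equality, so outputs such as `Formal.` and `formal` would be reported as a disagreement even though the label is the same. A possible refinement, sketched here with hypothetical helper names, is to normalize each output to one of the three valid labels before comparing.

```python
import string

VALID_LABELS = ("formal", "neutral", "casual")


def normalize_label(raw):
    """Return the first valid label found in the model output, or None."""
    stripped = raw.lower().translate(str.maketrans("", "", string.punctuation))
    for word in stripped.split():
        if word in VALID_LABELS:
            return word
    return None


def labels_agree(*raw_outputs):
    """True only when every output maps to the same valid label."""
    labels = [normalize_label(r) for r in raw_outputs]
    return None not in labels and len(set(labels)) == 1


print(labels_agree("formal", "Formal.", " formal\n"))
print(labels_agree("neutral", "casual", "casual"))
```

In the notebook, `labels_agree(result_greeting1_zero, result_greeting1_one, result_greeting1_few)` could replace the chained `==` check. Because matching is done on whole words, an output like "informal" does not count as "formal".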

### Questions

1. Which greeting showed the biggest difference between zero-shot and few-shot? Why was it ambiguous?

2. Did adding more examples improve accuracy on edge cases like "Yo!" or "Dear valued customer"?

3. What makes a good few-shot example? Should you show edge cases or clear typical examples?

**About This Task:**
Chain-of-Thought (CoT) prompting asks the model to show its reasoning step by step, which often improves accuracy on multi-step problems.

#### Easy Task 4: Testing Chain-of-Thought

### Instructions

1. Run direct prompting on simple and tricky problems
2. Compare with few-shot CoT to see reasoning improvements
3. Test zero-shot CoT on hard problems
4. Improve CoT examples to fix errors
5. Analyze when step-by-step reasoning prevents mistakes

We test on both simple problems and counter-intuitive ones.

In [None]:
problems = [
    ("If John has 5 apples and gives 2 to Mary, how many does he have?", 3, "easy"),
    ("A ticket costs $15. I buy 3 tickets with a $50 bill. How much change?", 5, "easy"),
    ("A bat and ball cost $1.10 total. The bat costs $1 more than the ball. How much is the ball?", 0.05, "tricky"),
]

### Direct Prompting

Here we ask for answers directly without reasoning.

In [None]:
print("Direct prompting (no reasoning):")

for question, correct, difficulty in problems:
    prompt = f"{question}\nAnswer:"
    answer = generate_text(prompt, temperature=0, max_tokens=30)

    print(f"\n[{difficulty.upper()}] {question}")
    print(f"Model: {answer.strip()}")
    print(f"Correct: {correct}")

In [None]:
# Direct prompting: If John has 5 apples and gives 2 to Mary, how many...
problem1_question = "If John has 5 apples and gives 2 to Mary, how many does he have?"
problem1_correct = 3
problem1_difficulty = "easy"
prompt_problem1_direct = f"{problem1_question}\nAnswer:"
answer_problem1_direct = generate_text(prompt_problem1_direct, temperature=0, max_tokens=30)
print(f"\n[{problem1_difficulty.upper()}] {problem1_question}")
print(f"Model: {answer_problem1_direct.strip()}")
print(f"Correct: {problem1_correct}")

In [None]:
# Direct prompting: A ticket costs $15. I buy 3 tickets with a $50 bil...
problem2_question = "A ticket costs $15. I buy 3 tickets with a $50 bill. How much change?"
problem2_correct = 5
problem2_difficulty = "easy"
prompt_problem2_direct = f"{problem2_question}\nAnswer:"
answer_problem2_direct = generate_text(prompt_problem2_direct, temperature=0, max_tokens=30)
print(f"\n[{problem2_difficulty.upper()}] {problem2_question}")
print(f"Model: {answer_problem2_direct.strip()}")
print(f"Correct: {problem2_correct}")

In [None]:
# Direct prompting: A bat and ball cost $1.10 total. The bat costs $1 ...
problem3_question = "A bat and ball cost $1.10 total. The bat costs $1 more than the ball. How much is the ball?"
problem3_correct = 0.05
problem3_difficulty = "tricky"
prompt_problem3_direct = f"{problem3_question}\nAnswer:"
answer_problem3_direct = generate_text(prompt_problem3_direct, temperature=0, max_tokens=30)
print(f"\n[{problem3_difficulty.upper()}] {problem3_question}")
print(f"Model: {answer_problem3_direct.strip()}")
print(f"Correct: {problem3_correct}")

Notice how direct prompting might fail on the tricky problem.
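The tricky problem reduces to solving x + (x + difference) = total for the cheaper item. The sketch below checks the arithmetic in integer cents to avoid floating-point rounding; `cheaper_item_price_cents` is a hypothetical helper written for this illustration.

```python
def cheaper_item_price_cents(total_cents, difference_cents):
    """Solve x + (x + difference) = total for x, working in integer cents."""
    remainder = total_cents - difference_cents
    if remainder < 0 or remainder % 2:
        raise ValueError("no whole-cent solution for these inputs")
    return remainder // 2


ball = cheaper_item_price_cents(110, 100)  # $1.10 total, bat costs $1.00 more
bat = ball + 100
print(f"ball = ${ball / 100:.2f}, bat = ${bat / 100:.2f}")
# prints: ball = $0.05, bat = $1.05
```

The intuitive answer of $0.10 fails the check: a $0.10 ball implies a $1.10 bat and a $1.20 total.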

### Few-Shot CoT

Your task: Improve the prompt by adding a third example showing careful algebra.

In [None]:
# Few-shot CoT: If John has 5 apples and gives 2 to Mary, how many...
# SOLUTION: Added algebra example for tricky problems (the third example in the prompt).
prompt_problem1_fewcot = f"""Solve step-by-step.

Q: Roger has 5 balls. He buys 2 cans with 3 balls each. How many balls does he have?
A: Roger starts with 5 balls.
He buys 2 cans, each has 3 balls.
New balls: 2 × 3 = 6
Total: 5 + 6 = 11
Answer: 11

Q: A cafe had 23 apples. They used 20 for lunch and bought 6 more. How many now?
A: Start with 23 apples.
After using 20: 23 - 20 = 3
After buying 6: 3 + 6 = 9
Answer: 9

Q: A pen and notebook cost $3 total. The notebook costs $2 more than the pen. How much is the pen?
A: Let pen cost = x
Then notebook = x + 2
Together: x + (x + 2) = 3
Simplify: 2x + 2 = 3
Subtract 2: 2x = 1
Divide by 2: x = 0.50
Answer: $0.50

Q: {problem1_question}
A:"""
answer_problem1_fewcot = generate_text(prompt_problem1_fewcot, temperature=0, max_tokens=150)
print(f"\n[{problem1_difficulty.upper()}] {problem1_question}")
print(f"Reasoning: {answer_problem1_fewcot}")
print(f"Correct: {problem1_correct}")

In [None]:
# Few-shot CoT: A ticket costs $15. I buy 3 tickets with a $50 bil...
# SOLUTION: Added algebra example for tricky problems (the third example in the prompt).
prompt_problem2_fewcot = f"""Solve step-by-step.

Q: Roger has 5 balls. He buys 2 cans with 3 balls each. How many balls does he have?
A: Roger starts with 5 balls.
He buys 2 cans, each has 3 balls.
New balls: 2 × 3 = 6
Total: 5 + 6 = 11
Answer: 11

Q: A cafe had 23 apples. They used 20 for lunch and bought 6 more. How many now?
A: Start with 23 apples.
After using 20: 23 - 20 = 3
After buying 6: 3 + 6 = 9
Answer: 9

Q: A pen and notebook cost $3 total. The notebook costs $2 more than the pen. How much is the pen?
A: Let pen cost = x
Then notebook = x + 2
Together: x + (x + 2) = 3
Simplify: 2x + 2 = 3
Subtract 2: 2x = 1
Divide by 2: x = 0.50
Answer: $0.50

Q: {problem2_question}
A:"""
answer_problem2_fewcot = generate_text(prompt_problem2_fewcot, temperature=0, max_tokens=150)
print(f"\n[{problem2_difficulty.upper()}] {problem2_question}")
print(f"Reasoning: {answer_problem2_fewcot}")
print(f"Correct: {problem2_correct}")

In [None]:
# Few-shot CoT: A bat and ball cost $1.10 total. The bat costs $1 ...
# SOLUTION: Added algebra example for tricky problems (the third example in the prompt).
prompt_problem3_fewcot = f"""Solve step-by-step.

Q: Roger has 5 balls. He buys 2 cans with 3 balls each. How many balls does he have?
A: Roger starts with 5 balls.
He buys 2 cans, each has 3 balls.
New balls: 2 × 3 = 6
Total: 5 + 6 = 11
Answer: 11

Q: A cafe had 23 apples. They used 20 for lunch and bought 6 more. How many now?
A: Start with 23 apples.
After using 20: 23 - 20 = 3
After buying 6: 3 + 6 = 9
Answer: 9

Q: A pen and notebook cost $3 total. The notebook costs $2 more than the pen. How much is the pen?
A: Let pen cost = x
Then notebook = x + 2
Together: x + (x + 2) = 3
Simplify: 2x + 2 = 3
Subtract 2: 2x = 1
Divide by 2: x = 0.50
Answer: $0.50

Q: {problem3_question}
A:"""
answer_problem3_fewcot = generate_text(prompt_problem3_fewcot, temperature=0, max_tokens=150)
print(f"\n[{problem3_difficulty.upper()}] {problem3_question}")
print(f"Reasoning: {answer_problem3_fewcot}")
print(f"Correct: {problem3_correct}")

See how showing reasoning steps helps catch mistakes.

### Zero-Shot CoT

Here we use the phrase "Let's think step-by-step" to trigger reasoning without examples.

In [None]:
# Zero-shot CoT: If John has 5 apples and gives 2 to Mary, how many...
prompt_problem1_zerocot = f"{problem1_question}\n\nLet's think step-by-step:"
answer_problem1_zerocot = generate_text(prompt_problem1_zerocot, temperature=0, max_tokens=150)
print(f"\n[{problem1_difficulty.upper()}] {problem1_question}")
print(f"Reasoning: {answer_problem1_zerocot}")
print(f"Correct: {problem1_correct}")

In [None]:
# Zero-shot CoT: A ticket costs $15. I buy 3 tickets with a $50 bil...
prompt_problem2_zerocot = f"{problem2_question}\n\nLet's think step-by-step:"
answer_problem2_zerocot = generate_text(prompt_problem2_zerocot, temperature=0, max_tokens=150)
print(f"\n[{problem2_difficulty.upper()}] {problem2_question}")
print(f"Reasoning: {answer_problem2_zerocot}")
print(f"Correct: {problem2_correct}")

In [None]:
# Zero-shot CoT: A bat and ball cost $1.10 total. The bat costs $1 ...
prompt_problem3_zerocot = f"{problem3_question}\n\nLet's think step-by-step:"
answer_problem3_zerocot = generate_text(prompt_problem3_zerocot, temperature=0, max_tokens=150)
print(f"\n[{problem3_difficulty.upper()}] {problem3_question}")
print(f"Reasoning: {answer_problem3_zerocot}")
print(f"Correct: {problem3_correct}")

Notice how a simple phrase triggers step-by-step reasoning.
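Reasoning outputs are longer than direct answers, so checking them by eye gets tedious. The sketch below shows one heuristic way to score them automatically: take the first number after an `Answer:` marker if one exists, otherwise the last number in the text. The helper names are hypothetical, and the heuristic will miss answers phrased in words or in other units (for example "5 cents").

```python
import re

# A number, optionally negative, that does not start in the middle of another number.
NUMBER = re.compile(r"(?<!\d)-?\d+(?:\.\d+)?")


def extract_final_number(text):
    """Pick the number after 'Answer:' if present, else the last number."""
    cleaned = text.replace(",", "")  # drop thousands separators
    marker = cleaned.find("Answer:")
    if marker != -1:
        match = NUMBER.search(cleaned, marker)
        if match:
            return float(match.group())
    matches = NUMBER.findall(cleaned)
    return float(matches[-1]) if matches else None


def is_correct(text, expected, tolerance=1e-6):
    """Compare the extracted number against the expected answer."""
    value = extract_final_number(text)
    return value is not None and abs(value - expected) <= tolerance


sample = "Let ball = x. Then 2x + 1 = 1.10, so 2x = 0.10.\nAnswer: $0.05"
print(is_correct(sample, 0.05))
print(is_correct("The ball costs $0.10.", 0.05))
```

In the notebook, `is_correct(answer_problem3_zerocot, problem3_correct)` would apply the same check to a model output.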

### Questions

1. Did direct prompting get the bat-and-ball problem wrong? Did it give the common wrong answer of $0.10?

2. Compare few-shot CoT vs zero-shot CoT on the tricky problem. Which caught the mistake better?

3. What type of problems benefit most from CoT? When is direct prompting good enough?