# Chapter 6: Prompt Engineering - Easy Tasks

This notebook covers basic prompt engineering concepts: temperature effects, prompt components, in-context learning, and chain-of-thought reasoning.

## Setup

Run all cells in this section to set up the environment and load the model.

Before running these cells, review the concepts from the main Chapter 6 notebook (00_Start_Here.ipynb).

### [Optional] - Installing Packages on <img src="https://colab.google/static/images/icons/colab.png" width=100>

If you are viewing this notebook on Google Colab, run the following cell to install dependencies.

**Note**: Use a GPU for this notebook. In Google Colab, go to Runtime > Change runtime type > Hardware accelerator > GPU > GPU type > T4.

In [1]:
%%capture
!pip install "transformers>=4.40.0" torch accelerate
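
The model-loading cell below uses `device_map="cuda"`, which needs a visible GPU. A minimal sketch of a pre-flight check, assuming PyTorch may or may not be installed yet; `pick_device` is a hypothetical helper, not part of this notebook:

```python
def pick_device():
    """Return "cuda" when PyTorch can see a GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed yet; run the install cell first.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


print(f"Selected device: {pick_device()}")
```

If this reports `cpu` on Colab, the runtime type has probably not been switched to a GPU yet.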

### Model Loading

In [2]:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

In [3]:
model_path = "microsoft/Phi-3-mini-4k-instruct"

model = AutoModelForCausalLM.from_pretrained(
    model_path,
    device_map="cuda",
    torch_dtype="auto",
    trust_remote_code=False,
)

tokenizer = AutoTokenizer.from_pretrained(model_path)


### Helper Functions

In [4]:
def generate_text(prompt, temperature=0.7, max_tokens=200):
    """Generate a chat completion for `prompt`.

    temperature=0 switches to greedy decoding (no sampling).
    """
    # Sample only when temperature is positive; otherwise decode greedily.
    use_sampling = temperature > 0
    pipe = pipeline(
        "text-generation",
        model=model,
        tokenizer=tokenizer,
        return_full_text=False,
        max_new_tokens=max_tokens,
        do_sample=use_sampling,
        temperature=temperature if use_sampling else None,
    )

    # Wrap the prompt as a single user turn so the chat template is applied.
    messages = [{"role": "user", "content": prompt}]
    output = pipe(messages)
    return output[0]['generated_text']

## Challenges

Complete the following tasks by implementing the starter code.

### Level: Easy

**About This Task:**
Temperature controls randomness in generation. Lower values give consistent outputs, higher values give varied outputs.
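
To make this concrete, here is a minimal sketch of how temperature reshapes a probability distribution. It assumes the standard formulation in which the logits are divided by the temperature before the softmax. The function name `softmax_with_temperature` and the toy logits are illustrative and not part of this notebook's helper code.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive; treat 0 as greedy decoding")
    scaled = [x / temperature for x in logits]
    # Subtract the max before exponentiating for numerical stability.
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for three candidate next tokens.
toy_logits = [2.0, 1.0, 0.1]

for t in (0.3, 1.0, 4.0):
    probs = softmax_with_temperature(toy_logits, t)
    print(f"temperature={t}: " + ", ".join(f"{p:.3f}" for p in probs))
```

At low temperature most of the probability mass sits on the top token. At high temperature the distribution flattens, so unlikely tokens are sampled more often.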

#### Easy Task 1: Finding the Right Temperature

### Instructions

1. Execute code to compare temperature effects on three use cases
2. Fill in missing temperature values based on your observations
3. Run determinism test to verify temperature=0 consistency
4. Test with your own prompts
5. Analyze which temperatures work best for different tasks

In [43]:
temperatures = [0.0, 0.3, 0.7, 1.0, 1.5, 4.0]

Notice how different temperatures affect each use case.

In [52]:
# Factual: What is the capital city of France?
prompt1 = "What is the capital city of France?"
print(f"Prompt 1: {prompt1}")
for temp in temperatures:
  output = generate_text(prompt1, temperature=temp, max_tokens=50)
  print(f"Temp={temp}: {output}")

Device set to use cuda


Prompt 1: What is the capital city of France?
Temp=0.0:  The capital city of France is Paris.
Temp=0.3:  The capital city of France is Paris.
Temp=0.7:  The capital city of France is Paris.
Temp=1.0:  The capital city of France is Paris.
Temp=1.5:  The capital city of France is Paris.
Temp=4.0:  Marseille 28


In [53]:
# Creative: Write the first sentence of a romance novel.
prompt2 = "Write the first sentence of a romance novel."
print(f"Prompt 2: {prompt2}")
for temp in temperatures:
  output = generate_text(prompt2, temperature=temp, max_tokens=50)
  print(f"Temp={temp}: {output}")

Prompt 2: Write the first sentence of a romance novel.
Temp=0.0:  As the sun dipped below the horizon, their eyes met across the crowded room, and in that fleeting moment, they knew their lives would never be the same again.
Temp=0.3:  Under the soft glow of the moonlight, their eyes met for the first time, and an unspoken promise of love hung in the air between them.
Temp=0.7:  Underneath the vibrant hues of the setting sun, Ella found herself entranced by the mysterious stranger who had just entered the quaint little café.
Temp=1.0:  Amelia gazed through the foggy windowpane, finding the first glimmer of sunrise in Ethan's eyes, her heart fluttering like a captive bird.
Temp=1.5:  Amelia nervously clenched her partner's calloused hand as the fading sunlight basked their garden date in a warm, ethereal light.
Temp=4.0:  The dextone cliff behind, Cass designed a stunning plan of escape but forgot just twice before reaching Olivet Hill: not before meeting in-betreft Alex beneath the centuries-piled autumntal leaves during Worldly Games


In [54]:
# Math: "What is the square root of 10000?"
prompt3 = "What is the square root of 10000?"
print(f"Prompt 3: {prompt3}")
for temp in temperatures:
  output = generate_text(prompt3, temperature=temp, max_tokens=50)
  print(f"Temp={temp}: {output}")

Prompt 3: What is the square root of 10000?
Temp=0.0:  The square root of 10000 is 100. This is because 100 multiplied by itself (100 * 100) equals 10000.
Temp=0.3:  The square root of 10000 is 100.
Temp=0.7:  The square root of a number is a value that, when multiplied by itself, gives the original number. For the number 10000:

√10000 = 100

This is because
Temp=1.0:  The square root of a number is a value that, when multiplied by itself, gives the original number. In this case, the square root of 10000 is 100, because when you multiply 100
Temp=1.5:  The square root of 10000 is exactly
Temp=4.0:  **Calculations on Solution Exhibitor.model(sqrt).calc() for problem number $Mathemat-2* Mathemai...** 36 | ApproXe at..:42 + or~ $-0^.$
Sol


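The effect of temperature is clear in the outputs above: the factual answer stays the same through 1.5, the creative sentences vary more as temperature rises, and at 4.0 all three prompts produce incoherent or wrong text.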

### Task 1a: Select Best Temperature

Based on the outputs above, fill in the best temperature for each use case.

In [55]:
# Fill in: What temperature works best for each task?
best_temp_factual = None  # For "What is the capital of France?"
best_temp_creative = None  # For "Write the first sentence..."
best_temp_math = None  # For "What is the square..."

Test your selections here.

In [9]:
print("Testing your temperature selections:")

if best_temp_factual is not None:
    output = generate_text("What is the capital of France?", temperature=best_temp_factual, max_tokens=30)
    print(f"\nFactual (temp={best_temp_factual}): {output}")

if best_temp_creative is not None:
    output = generate_text("Write the first sentence of a mystery novel.", temperature=best_temp_creative, max_tokens=50)
    print(f"\nCreative (temp={best_temp_creative}): {output}")

if best_temp_math is not None:
    output = generate_text("What is the square root of 10000?", temperature=best_temp_math, max_tokens=50)
    print(f"\nMath (temp={best_temp_math}): {output}")

Testing your temperature selections:


### Task 1b: Determinism Test

Run this cell multiple times to verify temperature=0 gives identical outputs.

In [10]:
output = generate_text("What is 2+2?", temperature=0, max_tokens=20)
print(f"Output: {output}")
print("\nRun this cell again - you should get the EXACT same output.")

Output:  The sum of 2 and 2 is 4.

Run this cell again - you should get the EXACT same output.
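
Re-running the cell by hand works, but the same check can be scripted. The sketch below is one possible approach: `is_deterministic` and `fake_greedy` are hypothetical names, and the stand-in generator only exists so the example runs without loading a model.

```python
def is_deterministic(generate_fn, prompt, runs=3):
    """Return True if generate_fn yields the same text on every run."""
    outputs = {generate_fn(prompt) for _ in range(runs)}
    return len(outputs) == 1


# Stand-in generator so the sketch runs without a model. In this notebook you
# could pass: lambda p: generate_text(p, temperature=0, max_tokens=20)
def fake_greedy(prompt):
    return "The sum of 2 and 2 is 4."


print(is_deterministic(fake_greedy, "What is 2+2?"))
```

Collecting the outputs in a set keeps the check short: more than one distinct string means at least one run differed.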


### Questions

1. At temperature=1.5, did the factual question give wrong answers? Why is determinism critical for factual tasks?

2. For creative writing, compare outputs at temperature=0.3 vs 1.0. Which produced more interesting variations?

3. At temperature=1.5, was the math answer complete? What's the risk of high temperature for math or code?

**About This Task:**
Prompts have seven components: Persona, Instruction, Context, Format, Audience, Tone, Data. Adding components generally makes the output more specific and better targeted, though not every task needs all seven.
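
One way to experiment with adding and removing components is to assemble the prompt from optional parts. The helper below is a sketch under that assumption. `build_prompt` and its parameter names are hypothetical, and it writes the persona as a `Character:` line to match the template used later in Task 2a.

```python
def build_prompt(instruction, persona=None, context=None, audience=None,
                 output_format=None, tone=None, data=None):
    """Assemble a prompt from whichever components are provided."""
    parts = []
    if persona:
        # The Task 2a template labels the persona as "Character".
        parts.append(f"Character: {persona}")
    parts.append(instruction)
    labelled = [
        ("Context", context),
        ("Audience", audience),
        ("Format", output_format),
        ("Tone", tone),
        ("Data", data),
    ]
    for label, value in labelled:
        if value:
            parts.append(f"{label}: {value}")
    # Separate components with a blank line, as the hand-written prompts do.
    return "\n\n".join(parts)


with_tone = build_prompt(
    "Explain how to make coffee.",
    audience="Someone who has never made coffee before.",
    tone="Friendly and encouraging.",
)
print(with_tone)
```

Leaving an argument out drops that component entirely, which makes it easy to test the effect of removing, say, Format.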

#### Easy Task 2: Building a Complete Prompt

### Instructions

1. Run pre-built prompt versions to see incremental improvements
2. Complete `prompt_v5` by adding the missing 3 components
3. Test removing Format to see its impact
4. Create your own scenario
5. Compare output quality as components are added

We start with just an instruction and gradually add components.

In [11]:
# Version 1: Instruction only
prompt_v1 = "Explain how to make coffee."

In [12]:
print("V1: Instruction only")
output = generate_text(prompt_v1, temperature=0, max_tokens=150)
print(output)

V1: Instruction only
 Making coffee is a simple process that can be done with a few basic tools and ingredients. Here is a step-by-step guide to making a basic cup of coffee:

1. Gather your supplies: You will need a coffee maker, coffee grounds, water, a coffee filter, a spoon, and a mug.

2. Measure the water: Fill the water reservoir of your coffee maker with the desired amount of water. The general rule of thumb is to use 1 to 2 tablespoons of coffee grounds per 6 ounces of water.

3. Add the coffee grounds: Place a coffee filter in the coffee maker's filter


In [13]:
# Version 2: + Audience
prompt_v2 = """Explain how to make coffee.

Audience: Someone who has never made coffee before."""

In [14]:
print("V2: + Audience")
output = generate_text(prompt_v2, temperature=0, max_tokens=150)
print(output)

V2: + Audience
 Making coffee is a simple process that can be enjoyed by anyone. Here's a step-by-step guide to help you make a delicious cup of coffee:

1. Gather your supplies: You will need a coffee maker, coffee grounds, water, a coffee filter, and a mug.

2. Measure the coffee grounds: A good rule of thumb is to use 1-2 tablespoons of coffee grounds for every 6 ounces of water. Adjust the amount according to your taste preference.

3. Fill the coffee maker: Pour water into the reservoir of your coffee maker. The amount of water you use will depend on the size of


Compare this with V1. In this run the effect of Audience is subtle: the introduction is more welcoming, while the steps stay almost the same.

In [15]:
# Version 3: + Format
prompt_v3 = """Explain how to make coffee.

Audience: Someone who has never made coffee before.

Format:
1. Equipment needed
2. Step-by-step instructions
3. Common mistakes"""

In [16]:
print("V3: + Format")
output = generate_text(prompt_v3, temperature=0, max_tokens=200)
print(output)

V3: + Format
 To make coffee, you'll need the following equipment: a coffee maker or a French press, a kettle, a coffee grinder (if you're using whole beans), a mug, and water.


Step-by-step instructions:

1. Measure out the desired amount of coffee beans. A standard ratio is 1 to 2 tablespoons of coffee per 6 ounces of water.

2. Grind the coffee beans to a medium-coarse consistency, similar to sea salt.

3. Boil water in the kettle. If using a French press, aim for water just off the boil (around 200°F or 93°C).

4. Place the coffee grounds in the coffee maker's filter or directly into the French press.

5. Slowly pour the hot water over the coffee grounds,


See how Format structures the output.

In [17]:
# Version 4: + Tone
prompt_v4 = """Explain how to make coffee.

Audience: Someone who has never made coffee before.

Format:
1. Equipment needed
2. Step-by-step instructions
3. Common mistakes

Tone: Friendly and encouraging."""

In [18]:
print("V4: + Tone")
output = generate_text(prompt_v4, temperature=0, max_tokens=200)
print(output)

V4: + Tone
 Making coffee can be a delightful experience, and with a few simple steps, you can enjoy a perfect cup of coffee at home. Here's how to do it:


**Equipment Needed:**

- A coffee maker (drip, French press, or espresso machine)

- Fresh coffee beans or ground coffee

- A coffee grinder (if using whole beans)

- A kettle or a coffee maker with a built-in heater

- A coffee mug

- A spoon

- A timer


**Step-by-Step Instructions:**

1. Measure out the coffee beans or ground coffee. A general guideline is about 2 tablespoons of coffee per 6 ounces of water.

2. If using whole beans, grind them to a medium-coarse consist


### Task 2a: Complete Version 5

Your task: Add Persona (the `Character:` line), Context, and Data to create a complete prompt.

In [56]:
# Fill in: Add the 3 missing components
prompt_v5 = """Character: [Who is giving this explanation?]

Explain how to make coffee.

Context: [Why does the person need to learn this?]

Audience: Someone who has never made coffee before.

Format:
1. Equipment needed
2. Step-by-step instructions
3. Common mistakes

Tone: Friendly and encouraging.

Data: [Specific details like coffee-to-water ratio]"""

In [57]:
print("V5: All 7 components")
output = generate_text(prompt_v5, temperature=0, max_tokens=250)
print(output)

V5: All 7 components
 Character: A coffee enthusiast named Alex.


Context: Alex is teaching a friend who has never made coffee before.


Audience: Someone who has never made coffee before.


Format:

1. Equipment needed:

   - Coffee maker (drip, French press, or espresso machine)

   - Fresh coffee beans or ground coffee

   - Filter (for drip coffee makers)

   - Coffee scoop or measuring spoon

   - Water

   - Mug


2. Step-by-step instructions:

   - Measure out the coffee beans or ground coffee. A standard ratio is 1 to 2 tablespoons of coffee per 6 ounces of water.

   - If using a drip coffee maker, place a filter in the basket and add the ground coffee.

   - Fill the water reservoir with fresh, cold water up to the fill line.

   - Turn on the coffee maker and wait for the brewing process to complete.

   - If


### Questions

1. Compare V1 and V2 outputs. How did specifying Audience change the language complexity?

2. Which component made the biggest single improvement to output quality?

3. When might you intentionally use fewer components? Give a specific scenario where V1 would be better than V5.

**About This Task:**
In-context learning uses examples to guide the model. Zero-shot has no examples, one-shot has one, few-shot has multiple.
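
The three settings differ only in how many worked examples precede the query. A minimal sketch of a prompt builder that covers all three, assuming the `Greeting:` / `Formality:` layout used in this task; `build_few_shot_prompt` is a hypothetical helper:

```python
def build_few_shot_prompt(task, examples, query):
    """Format (greeting, label) pairs followed by the unanswered query."""
    lines = [task, ""]
    if examples:
        lines += ["Examples:", ""]
        for greeting, label in examples:
            lines += [f"Greeting: {greeting}", f"Formality: {label}", ""]
    # The final entry is left unlabeled for the model to complete.
    lines += [f"Greeting: {query}", "Formality:"]
    return "\n".join(lines)


task = "Classify formality: formal, neutral, or casual."

# Zero-shot: no examples at all.
zero_shot = build_few_shot_prompt(task, [], "Hi there.")

# Few-shot: several labeled examples before the query.
few_shot = build_few_shot_prompt(
    task,
    [
        ("Dear Sir or Madam", "formal"),
        ("Yo dude", "casual"),
        ("Hello, how are you", "neutral"),
    ],
    "Hi there.",
)
print(few_shot)
```

Passing a single pair in `examples` gives the one-shot case.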

#### Easy Task 3: Improving Few-Shot Examples

### Instructions

1. Run zero-shot, one-shot, and few-shot on test greetings
2. Identify which greetings cause disagreement
3. Improve the few-shot prompt by adding better examples
4. Test edge cases
5. Analyze why certain examples improve accuracy

In [21]:
test_greetings = [
    "Good morning, how may I assist you?",
    "Hey, what's up?",
    "Hello, nice to meet you.",
    "Hi there.",
    "Dear valued customer,",  # Very formal
    "Yo!",  # Very casual
]

### Zero-Shot

Here we ask the model to classify without any examples.

In [22]:
print("Zero-shot classification:")
zero_results = {}

for greeting in test_greetings:
    prompt = f"""Classify formality: formal, neutral, or casual.

Greeting: {greeting}
Formality:"""

    result = generate_text(prompt, temperature=0, max_tokens=10).strip()
    zero_results[greeting] = result
    print(f"{greeting} -> {result}")

Zero-shot classification:
Good morning, how may I assist you? -> Formal
Hey, what's up? -> Greeting: Hey, what's up?
Hello, nice to meet you. -> Formal
Hi there. -> Greeting: Hi there.
Formality:
Dear valued customer, -> Formal
Yo! -> Greeting: Yo!
Formality:


### One-Shot

See how a single example helps guide the model.

In [62]:
print("One-shot classification:")
one_results = {}

for greeting in test_greetings:
    prompt = f"""Classify formality: formal, neutral, or casual.

Example:
Greeting: Dear Sir or Madam
Formality: formal

Greeting: {greeting}
Formality:"""

    result = generate_text(prompt, temperature=0, max_tokens=10).strip()
    one_results[greeting] = result
    print(f"{greeting} -> {result}")

One-shot classification:
Good morning, how may I assist you? -> Greeting: Good morning, how may I assist
Hey, what's up? -> Greeting: Hey, what's up?
Hello, nice to meet you. -> Greeting: Hello, nice to meet you.
Hi there. -> Greeting: Hi there.
Formality:
Dear valued customer, -> Formality: formal
Yo! -> Greeting: Yo!
Formality:


### Few-Shot

Your task: Improve this prompt by adding 1-2 more examples to handle edge cases better.

In [24]:
print("Few-shot classification:")
few_results = {}

for greeting in test_greetings:
    # Fill in: Add 1-2 more examples after "Hello, how are you"
    prompt = f"""Classify formality: formal, neutral, or casual.

Examples:

Greeting: Dear Sir or Madam
Formality: formal

Greeting: Yo dude
Formality: casual

Greeting: Hello, how are you
Formality: neutral

[Add 1-2 more examples here]

Greeting: {greeting}
Formality:"""

    result = generate_text(prompt, temperature=0, max_tokens=10).strip()
    few_results[greeting] = result
    print(f"{greeting} -> {result}")

Few-shot classification:
Good morning, how may I assist you? -> Greeting: Good morning, how may I assist
Hey, what's up? -> Greeting: Good evening, Professor Smith
Form
Hello, nice to meet you. -> Greeting: Good evening, Professor Smith
Form
Hi there. -> Greeting: Hi there.
Formality:
Dear valued customer, -> Greeting: Dear valued customer,
Yo! -> Greeting: Yo!
Formality:


### Comparison

Here we identify disagreements to see where examples help most.

In [25]:
disagreements = []

for greeting in test_greetings:
    zero = zero_results[greeting]
    one = one_results[greeting]
    few = few_results[greeting]

    print(f"\n{greeting}")
    print(f"  Zero-shot: {zero}")
    print(f"  One-shot:  {one}")
    print(f"  Few-shot:  {few}")

    if zero == one == few:
        print(f"  All agree")
    else:
        print(f"  DISAGREEMENT")
        disagreements.append(greeting)


Good morning, how may I assist you?
  Zero-shot: Formal
  One-shot:  Greeting: Good morning, how may I assist
  Few-shot:  Greeting: Good morning, how may I assist
  DISAGREEMENT

Hey, what's up?
  Zero-shot: Greeting: Hey, what's up?
  One-shot:  Greeting: Hey, what's up?
  Few-shot:  Greeting: Good evening, Professor Smith
Form
  DISAGREEMENT

Hello, nice to meet you.
  Zero-shot: Formal
  One-shot:  Greeting: Hello, nice to meet you.
  Few-shot:  Greeting: Good evening, Professor Smith
Form
  DISAGREEMENT

Hi there.
  Zero-shot: Greeting: Hi there.
Formality:
  One-shot:  Greeting: Hi there.
Formality:
  Few-shot:  Greeting: Hi there.
Formality:
  All agree

Dear valued customer,
  Zero-shot: Formal
  One-shot:  Formality: formal
  Few-shot:  Greeting: Dear valued customer,
  DISAGREEMENT

Yo!
  Zero-shot: Greeting: Yo!
Formality:
  One-shot:  Greeting: Yo!
Formality:
  Few-shot:  Greeting: Yo!
Formality:
  All agree


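Notice which greetings produce different outputs across the three settings. In this run several outputs echo the prompt instead of returning a label, so a disagreement does not always mean a different classification.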

In [26]:
print(f"\n{len(disagreements)} greetings showed disagreement:")
for g in disagreements:
    print(f"  - {g}")


4 greetings showed disagreement:
  - Good morning, how may I assist you?
  - Hey, what's up?
  - Hello, nice to meet you.
  - Dear valued customer,
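
The comparison above matches raw strings, so `Formal` and `Formality: formal` count as a disagreement even though they name the same class. One possible refinement is to normalize each output to a label first. The sketch below assumes the three labels from the prompt; `extract_label` is a hypothetical helper.

```python
import re

# Word boundaries keep "Formality" and "Form" from matching "formal".
LABEL_PATTERN = re.compile(r"\b(formal|neutral|casual)\b", re.IGNORECASE)


def extract_label(raw_output):
    """Return the first formality label found in the text, or None."""
    match = LABEL_PATTERN.search(raw_output)
    return match.group(1).lower() if match else None


samples = ["Formal", "Formality: formal", "Greeting: Hi there.\nFormality:"]
for s in samples:
    print(repr(s), "->", extract_label(s))
```

Outputs with no label come back as `None`, which separates "the model never answered" from a genuine disagreement. Note that an output echoing the instruction line itself would match its first label word, so this is only a rough filter.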


### Questions

1. Which greeting showed the biggest difference between zero-shot and few-shot? Why was it ambiguous?

2. Did adding more examples improve accuracy on edge cases like "Yo!" or "Dear valued customer"?

3. What makes a good few-shot example? Should you show edge cases or clear typical examples?

**About This Task:**
Chain-of-Thought prompting asks the model to show its reasoning step-by-step, which often improves accuracy on multi-step problems.

#### Easy Task 4: Testing Chain-of-Thought

### Instructions

1. Run direct prompting on simple and tricky problems
2. Compare with few-shot CoT to see reasoning improvements
3. Test zero-shot CoT on hard problems
4. Improve CoT examples to fix errors
5. Analyze when step-by-step reasoning prevents mistakes

We test on both simple problems and counter-intuitive ones.

In [27]:
problems = [
    ("If John has 5 apples and gives 2 to Mary, how many does he have?", 3, "easy"),
    ("A ticket costs $15. I buy 3 tickets with a $50 bill. How much change?", 5, "easy"),
    ("A bat and ball cost $1.10 total. The bat costs $1 more than the ball. How much is the ball?", 0.05, "tricky"),
]

### Direct Prompting

Here we ask for answers directly without reasoning.

In [28]:
print("Direct prompting (no reasoning):")

for question, correct, difficulty in problems:
    prompt = f"{question}\nAnswer:"
    answer = generate_text(prompt, temperature=0, max_tokens=30)

    print(f"\n[{difficulty.upper()}] {question}")
    print(f"Model: {answer.strip()}")
    print(f"Correct: {correct}")

Direct prompting (no reasoning):



[EASY] If John has 5 apples and gives 2 to Mary, how many does he have?
Model: John has 3 apples left.
Correct: 3



[EASY] A ticket costs $15. I buy 3 tickets with a $50 bill. How much change?
Model: The total cost of the tickets is 3 tickets * $15/ticket = $45.

You paid with a
Correct: 5

[TRICKY] A bat and ball cost $1.10 total. The bat costs $1 more than the ball. How much is the ball?
Model: Let's call the cost of the ball x dollars. Since the bat costs $1 more than the ball, the cost of the bat is x
Correct: 0.05


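Notice that with `max_tokens=30` the model starts to reason on the harder problems anyway and is cut off before it states an answer, so direct prompting gives no usable result for the tricky problem here.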

### Few-Shot CoT

Your task: Improve the prompt by adding a third example showing careful algebra.

In [29]:
print("Few-shot CoT:")

for question, correct, difficulty in problems:
    # Fill in: Add a third example to help with the tricky problem
    prompt = f"""Solve step-by-step.

Q: Roger has 5 balls. He buys 2 cans with 3 balls each. How many balls does he have?
A: Roger starts with 5 balls.
He buys 2 cans, each has 3 balls.
New balls: 2 × 3 = 6
Total: 5 + 6 = 11
Answer: 11

Q: A cafe had 23 apples. They used 20 for lunch and bought 6 more. How many now?
A: Start with 23 apples.
After using 20: 23 - 20 = 3
After buying 6: 3 + 6 = 9
Answer: 9

[Add another example showing careful math]

Q: {question}
A:"""

    answer = generate_text(prompt, temperature=0, max_tokens=150)

    print(f"\n[{difficulty.upper()}] {question}")
    print(f"Reasoning: {answer}")
    print(f"Correct: {correct}")

Few-shot CoT:



[EASY] If John has 5 apples and gives 2 to Mary, how many does he have?
Reasoning:  John starts with 5 apples.
He gives 2 apples to Mary.
Remaining apples: 5 - 2 = 3
Answer: 3
Correct: 3



[EASY] A ticket costs $15. I buy 3 tickets with a $50 bill. How much change?
Reasoning:  Cost of 3 tickets: 3 × $15 = $45
Amount paid: $50
Change: $50 - $45 = $5
Answer: $5
Correct: 5

[TRICKY] A bat and ball cost $1.10 total. The bat costs $1 more than the ball. How much is the ball?
Reasoning:  Let's denote the cost of the ball as x.

Since the bat costs $1 more than the ball, the cost of the bat is x + $1.

The total cost of the bat and ball is $1.10, so we can write the equation:

x + (x + $1) = $1.10

Now, let's solve for x:

2x + $1 = $1.10

Subtract $1 from both sides:

2x = $0.10

Divide both sides by 2:

x = $0.05

Answer: The ball costs $0.
Correct: 0.05


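See how showing reasoning steps helps catch mistakes. In the tricky problem the reasoning reaches x = $0.05, but the final `Answer:` line stops at `$0.`, most likely because generation hit the `max_tokens=150` limit.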
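
Checking each answer by eye gets tedious as the problem list grows. A rough way to score outputs automatically is to take the last number in the text and compare it with the expected value. This is a sketch under that assumption, and `extract_final_number` and `is_correct` are hypothetical helpers rather than part of the notebook.

```python
import re


def extract_final_number(text):
    """Return the last number in the text as a float, or None if there is none."""
    # Drop thousands separators so "10,000" is read as one number.
    matches = re.findall(r"-?\d+(?:\.\d+)?", text.replace(",", ""))
    return float(matches[-1]) if matches else None


def is_correct(text, expected, tolerance=1e-6):
    """Compare the last number in the model's output with the expected answer."""
    value = extract_final_number(text)
    return value is not None and abs(value - expected) <= tolerance


print(is_correct("Change: $50 - $45 = $5\nAnswer: $5", 5))
print(is_correct("Answer: The ball costs $0.", 0.05))
```

A truncated final line such as `$0.` is scored as wrong even when the reasoning above it is right, so a failed check is a prompt to read the full output, not a verdict on the reasoning.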

### Zero-Shot CoT

Here we use the phrase "Let's think step-by-step" to trigger reasoning without examples.

In [30]:
print("Zero-shot CoT:")

for question, correct, difficulty in problems:
    prompt = f"{question}\n\nLet's think step-by-step:"
    answer = generate_text(prompt, temperature=0, max_tokens=150)

    print(f"\n[{difficulty.upper()}] {question}")
    print(f"Reasoning: {answer}")
    print(f"Correct: {correct}")

Zero-shot CoT:



[EASY] If John has 5 apples and gives 2 to Mary, how many does he have?
Reasoning:  To solve this problem, we need to follow these steps:

1. Identify the initial number of apples John has.
2. Determine the number of apples John gives to Mary.
3. Subtract the number of apples given to Mary from the initial number of apples John has.

Step 1: John initially has 5 apples.
Step 2: John gives 2 apples to Mary.
Step 3: Subtract the number of apples given to Mary from the initial number of apples John has: 5 - 2 = 3.

So, after giving 2 apples to Mary, John has 3 apples left
Correct: 3



[EASY] A ticket costs $15. I buy 3 tickets with a $50 bill. How much change?
Reasoning:  Step 1: Calculate the total cost of the tickets.
The cost of one ticket is $15, and you are buying 3 tickets. So, the total cost is 15 * 3 = $45.

Step 2: Calculate the change.
You paid with a $50 bill, so the change you should receive is the difference between the amount you paid and the total cost of the tickets.
Change = Amount paid - Total cost = $50 - $45 = $5.

So, you should receive $5 in change.
Correct: 5

[TRICKY] A bat and ball cost $1.10 total. The bat costs $1 more than the ball. How much is the ball?
Reasoning:  Let's denote the cost of the ball as x (in dollars). Since the bat costs $1 more than the ball, the cost of the bat can be represented as x + $1.

According to the problem, the total cost of the bat and ball is $1.10. So, we can set up the following equation:

x (ball cost) + (x + $1) (bat cost) = $1.10

Now, let's solve the equation:

2x + $1 = $1.10

Subtract $1 from both s

Notice how a simple phrase triggers step-by-step reasoning.

### Questions

1. Did direct prompting get the bat-and-ball problem wrong? Why is $0.10 the common wrong answer?

2. Compare few-shot CoT vs zero-shot CoT on the tricky problem. Which caught the mistake better?

3. What type of problems benefit most from CoT? When is direct prompting good enough?