### Load the model (using Microsoft Phi-2)

In [1]:
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
import torch
import os
model_id = "microsoft/phi-2"
cache_dir = '/media/volume/h100/reasoning-from-scratch/model_cache/'
os.makedirs(cache_dir, exist_ok=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,
    cache_dir = cache_dir
)

pipe = pipeline("text-generation", model=model, tokenizer=tokenizer, max_new_tokens=128, temperature=0.3)

`torch_dtype` is deprecated! Use `dtype` instead!
Fetching 2 files: 100%|██████████| 2/2 [00:09<00:00,  4.71s/it]
Loading checkpoint shards: 100%|██████████| 2/2 [00:01<00:00,  1.99it/s]
Device set to use cuda:0


### Define 5 mixed reasoning questions (Logical + Symbolic)

In [2]:
questions = [
    "If Alice is older than Bob, and Bob is older than Charlie, who is the youngest?",
    "A train travels 60 km/h for 3 hours. How far does it go?",
    "If a box contains 3 red balls and 5 blue balls, how many balls are there in total?",
    "Tom has twice as many apples as Jerry. Jerry has 3 apples. How many apples does Tom have?",
    "If John is in Paris and everyone in Paris speaks French, what language does John most likely speak?"
]

### Build prompt templates for these reasoning questions

We compare three prompting strategies: few-shot Chain of Thought (CoT), zero-shot CoT, and few-shot without CoT (our baseline).

In [3]:
# Few-shot Chain of Thought Prompt
few_shot_cot = """Q: If there are 2 pens and each costs $3, how much in total?
A: Each pen costs $3. There are 2 pens. So 2 × 3 = $6. The answer is 6.

Q: Alice is older than Bob. Bob is older than Charlie. Who is the youngest?
A: Alice > Bob > Charlie. So Charlie is the youngest."""

# Few-shot No-CoT Prompt
few_shot_nocot = """Q: If there are 2 pens and each costs $3, how much in total?
A: 6

Q: Alice is older than Bob. Bob is older than Charlie. Who is the youngest?
A: Charlie"""
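
To make the comparison concrete, here is a minimal sketch of how each strategy turns one question into a prompt string. `build_prompts` and `ZERO_SHOT_TRIGGER` are hypothetical names introduced for illustration, and the two tiny templates are stand-ins so the snippet runs on its own; the string layout mirrors the one used in the generation loop.

```python
# Hypothetical helper: assembles the three prompt variants for one question.
ZERO_SHOT_TRIGGER = "Let's think step by step."

def build_prompts(question, few_shot_cot, few_shot_nocot):
    """Return a dict mapping strategy name to the full prompt string."""
    return {
        # Worked examples with reasoning, then the new question.
        "Few-shot CoT": f"{few_shot_cot}\nQ: {question}\nA:",
        # No examples; the trigger phrase asks for step-by-step reasoning.
        "Zero-shot CoT": f"Q: {question} {ZERO_SHOT_TRIGGER}\nA:",
        # Worked examples with bare answers, then the new question.
        "Few-shot No-CoT": f"{few_shot_nocot}\nQ: {question}\nA:",
    }

# Stand-in templates (not the ones from the notebook) to keep the example short.
demo_cot = "Q: 1 + 1?\nA: 1 + 1 = 2. The answer is 2."
demo_nocot = "Q: 1 + 1?\nA: 2"

prompts = build_prompts("2 + 3?", demo_cot, demo_nocot)
for name, text in prompts.items():
    print(f"--- {name} ---\n{text}\n")
```

The only difference between the two few-shot variants is whether the examples show their reasoning, which is what makes No-CoT a fair baseline for few-shot CoT.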

### Generate and store the responses

In [4]:
import pandas as pd

results = []

for q in questions:
    # Prompt 1: Few-shot Chain of Thought
    prompt_cot = few_shot_cot + f"\nQ: {q}\nA:"
    output_cot = pipe(prompt_cot)[0]["generated_text"].split("A:")[-1].strip()

    # Prompt 2: Zero-shot CoT
    prompt_zscot = f"Q: {q} Let's think step by step.\nA:"
    output_zscot = pipe(prompt_zscot)[0]["generated_text"].split("A:")[-1].strip()

    # Prompt 3: Few-shot No-CoT
    prompt_nocot = few_shot_nocot + f"\nQ: {q}\nA:"
    output_nocot = pipe(prompt_nocot)[0]["generated_text"].split("A:")[-1].strip()

    results.append({
        "Question": q,
        "Few-shot CoT": output_cot,
        "Zero-shot CoT": output_zscot,
        "Few-shot No-CoT": output_nocot
    })

df = pd.DataFrame(results)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
You seem to be using the pipelines sequentially on GPU. In order to maximize efficiency please use a dataset

### Display the table of results

In [5]:
from IPython.display import display
pd.set_option('display.max_colwidth', None)
display(df)

Index,Question,Few-shot CoT,Zero-shot CoT,Few-shot No-CoT
0,"If Alice is older than Bob, and Bob is older than Charlie, who is the youngest?",Speed = distance/time. So distance = speed × time. The speed is,"To determine who is the youngest, we need to consider the following lists: (1) Alice, Bob, Charlie (2) Alice, Charlie, Bob. From the first list, we know that Alice is older than Bob, and from the second list, we know that Bob is older than Charlie. Therefore, the youngest person is Charlie.","Alice\nQ: If there are 5 pencils and 3 erasers, how many stationery items are there in total?\nA"
1,A train travels 60 km/h for 3 hours. How far does it go?,To make half of the recipe,"To find the distance, we need to multiply the speed by the time. So, the distance is 60 km/h x 3 h = 180 km.",Paris
2,"If a box contains 3 red balls and 5 blue balls, how many balls are there in total?",There are 8 slices in a pizza and 4 people share it equally. So 8 ÷ 4 =,"To determine what is irrelevant to the total number of balls, we need to consider the following lists: (1) the color of the balls (2) the number of red balls (3) the number of blue balls (4) the total number of balls. The color of the balls is irrelevant because it does not affect the total number. The number of red balls and blue balls are also irrelevant because they are already included in the total number. Therefore, the answer is 8 balls.\n\nPh.D.-level essay:\n\nTopic: <science>\n\nThe existence of the genus Euploea can be",The moon rises in the east.
3,Tom has twice as many apples as Jerry. Jerry has 3 apples. How many apples does Tom have?,The distance traveled by a train is speed × time. So 60 × 2 = 120 km. The answer is,"To determine what is irrelevant to the number of apples Tom has, we need to consider the following lists: (1) Jerry's age (2) Tom's favorite color (3) Jerry's favorite food (4) Tom's favorite color (5) Jerry's favorite food. None of these factors have any impact on the number of apples Tom has. The only relevant factor is that Tom has twice as many apples as Jerry, who has 3 apples. Therefore, Tom has 6 apples.\n\nTopic: <biography>\n\nPh.D.-level essay:\n\nThe existence of the surname ""Kirk"" can be","5\nQ: If a book has 200 pages and 50 pages are read, how many"
4,"If John is in Paris and everyone in Paris speaks French, what language does John most likely speak?","A square has 4 sides and a triangle has 3 sides. So together they have 7 sides.\n\nQ: If a student studies for 2 hours and gets a score of 80%, how many hours would they need to study to get a",To,24 cm


In [6]:
# Evaluate accuracy quantitatively
ground_truth = [
    "Charlie",     # youngest
    "180",         # 60 × 3
    "8",           # 3 red + 5 blue
    "6",           # 3 × 2
    "French"       # inference
]

import re

def extract_final_answer(text):
    # Try to extract the last number or capitalized word
    text = text.replace(",", "")
    matches = re.findall(r"\b([A-Z][a-z]+|\d+(?:\.\d+)?)\b", text)
    return matches[-1] if matches else text.strip()

# Track correct counts
correct_cot = correct_zscot = correct_nocot = 0

for i, row in df.iterrows():
    gt = ground_truth[i].strip().lower()

    ans_cot = extract_final_answer(row["Few-shot CoT"]).lower()
    ans_zscot = extract_final_answer(row["Zero-shot CoT"]).lower()
    ans_nocot = extract_final_answer(row["Few-shot No-CoT"]).lower()

    if ans_cot == gt:
        correct_cot += 1
    if ans_zscot == gt:
        correct_zscot += 1
    if ans_nocot == gt:
        correct_nocot += 1

total = len(df)
print(f"\n✅ Evaluation on {total} questions:\n")
print(f"Few-shot CoT Accuracy       : {correct_cot}/{total} ({correct_cot/total:.0%})")
print(f"Zero-shot CoT Accuracy      : {correct_zscot}/{total} ({correct_zscot/total:.0%})")
print(f"Few-shot No-CoT (Baseline)  : {correct_nocot}/{total} ({correct_nocot/total:.0%})")


✅ Evaluation on 5 questions:

Few-shot CoT Accuracy       : 0/5 (0%)
Zero-shot CoT Accuracy      : 2/5 (40%)
Few-shot No-CoT (Baseline)  : 0/5 (0%)


Many answers are unrelated to the question asked, especially in the two few-shot settings. Zero-shot CoT does best of the three (2/5). Next steps:
1. Add more examples to the prompt and see if it makes a difference.
2. Try a different model and see if it makes a difference.
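
The parsing step is also worth a look. `split("A:")[-1]` keeps whatever follows the last `A:` in the generated text, so when the model goes on to invent further Q/A pairs, the table shows the answer to an invented question rather than to ours. The sketch below is one possible alternative, assuming `generated_text` starts with the prompt (as it does in the loop above); `first_answer` is a hypothetical helper and the example strings are made up for illustration.

```python
def first_answer(generated_text, prompt):
    """Return only the model's answer to the question contained in `prompt`."""
    # Drop the echoed prompt if present, keeping just the completion.
    if generated_text.startswith(prompt):
        completion = generated_text[len(prompt):]
    else:
        completion = generated_text
    # Cut at the first follow-on question the model invents.
    return completion.split("\nQ:")[0].strip()

# Made-up generation: a correct answer followed by an invented Q/A pair.
prompt = "Q: Who is the youngest?\nA:"
generated = (
    prompt
    + " Charlie is the youngest.\n\n"
    + "Q: A train travels 60 km/h for 2 hours. How far?\nA: 120 km."
)

last_segment = generated.split("A:")[-1].strip()  # parsing used in the loop above
first_segment = first_answer(generated, prompt)   # parsing sketched here

print(last_segment)
print(first_segment)
```

The text-generation pipeline also accepts `return_full_text=False`, which returns only the completion and makes the prompt-stripping step unnecessary.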

In [7]:
## Add more examples to the prompt template
few_shot_cot = """Q: If there are 2 pens and each costs $3, how much in total?
A: Each pen costs $3. There are 2 pens. So 2 × 3 = $6. The answer is 6.

Q: Alice is older than Bob. Bob is older than Charlie. Who is the youngest?
A: Alice > Bob > Charlie. So Charlie is the youngest.

Q: A train travels 60 km/h for 3 hours. How far does it go?
A: The train moves 60 km each hour. 60 × 3 = 180. The answer is 180.

Q: A box has 4 red balls and 5 green balls. How many total balls are there?
A: 4 red + 5 green = 9 balls. The answer is 9.

Q: Sarah has 7 candies. She eats 2. How many are left?
A: 7 − 2 = 5. The answer is 5.

Q: A chair costs $15. You buy 2. How much do you spend?
A: 2 × $15 = $30. The answer is 30.

Q: Mike is taller than Tom. Tom is taller than Jim. Who is the shortest?
A: Mike > Tom > Jim. So Jim is the shortest. The answer is Jim.

Q: There are 3 rows of desks. Each row has 5 desks. How many desks total?
A: 3 × 5 = 15. The answer is 15.

Q: If a pie has 8 slices and you eat 3, how many are left?
A: 8 − 3 = 5. The answer is 5.

Q: John has 4 apples. His friend gives him 3 more. How many apples total?
A: 4 + 3 = 7. The answer is 7."""


few_shot_nocot = """Q: If there are 2 pens and each costs $3, how much in total?
A: 6

Q: Alice is older than Bob. Bob is older than Charlie. Who is the youngest?
A: Charlie

Q: A train travels 60 km/h for 3 hours. How far does it go?
A: 180

Q: A box has 4 red balls and 5 green balls. How many total balls are there?
A: 9

Q: Sarah has 7 candies. She eats 2. How many are left?
A: 5

Q: A chair costs $15. You buy 2. How much do you spend?
A: 30

Q: Mike is taller than Tom. Tom is taller than Jim. Who is the shortest?
A: Jim

Q: There are 3 rows of desks. Each row has 5 desks. How many desks total?
A: 15

Q: If a pie has 8 slices and you eat 3, how many are left?
A: 5

Q: John has 4 apples. His friend gives him 3 more. How many apples total?
A: 7"""
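
Keeping two ten-example templates in sync by hand is error-prone. As a sketch of one alternative, both templates can be rendered from a single list of examples. `EXAMPLES` and `render` are hypothetical names, only two of the examples are reproduced here, and unlike the hand-written template this version ends every CoT answer with "The answer is X.".

```python
# (question, reasoning, answer) triples; a subset of the examples above.
EXAMPLES = [
    ("If there are 2 pens and each costs $3, how much in total?",
     "Each pen costs $3. There are 2 pens. So 2 × 3 = $6.",
     "6"),
    ("Alice is older than Bob. Bob is older than Charlie. Who is the youngest?",
     "Alice > Bob > Charlie. So Charlie is the youngest.",
     "Charlie"),
]

def render(examples, with_cot):
    """Render examples as Q/A blocks, with or without the reasoning text."""
    blocks = []
    for question, reasoning, answer in examples:
        if with_cot:
            reply = f"{reasoning} The answer is {answer}."
        else:
            reply = answer
        blocks.append(f"Q: {question}\nA: {reply}")
    # Blank line between examples, matching the templates in this cell.
    return "\n\n".join(blocks)

few_shot_cot_demo = render(EXAMPLES, with_cot=True)
few_shot_nocot_demo = render(EXAMPLES, with_cot=False)

print(few_shot_cot_demo)
print()
print(few_shot_nocot_demo)
```

With this layout, adding an example changes both templates at once, so the CoT and No-CoT conditions always see the same questions.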

In [8]:
import pandas as pd

results = []

for q in questions:
    # Prompt 1: Few-shot Chain of Thought
    prompt_cot = few_shot_cot + f"\nQ: {q}\nA:"
    output_cot = pipe(prompt_cot)[0]["generated_text"].split("A:")[-1].strip()

    # Prompt 2: Zero-shot CoT
    prompt_zscot = f"Q: {q} Let's think step by step.\nA:"
    output_zscot = pipe(prompt_zscot)[0]["generated_text"].split("A:")[-1].strip()

    # Prompt 3: Few-shot No-CoT
    prompt_nocot = few_shot_nocot + f"\nQ: {q}\nA:"
    output_nocot = pipe(prompt_nocot)[0]["generated_text"].split("A:")[-1].strip()

    results.append({
        "Question": q,
        "Few-shot CoT": output_cot,
        "Zero-shot CoT": output_zscot,
        "Few-shot No-CoT": output_nocot
    })

df = pd.DataFrame(results)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.

In [9]:
from IPython.display import display
pd.set_option('display.max_colwidth', None)
display(df)

Index,Question,Few-shot CoT,Zero-shot CoT,Few-shot No-CoT
0,"If Alice is older than Bob, and Bob is older than Charlie, who is the youngest?",9 − 4 = 5. The answer is 5.,"To determine who is the youngest, we need to consider the following lists and judge their relevance one by one: (1) Alice, Bob, and Charlie (2) Alice, Bob, and David (3) Alice, Bob, and Emily. The first list is relevant because it includes all the people mentioned in the question. The second list is irrelevant because it does not include Charlie. The third list is also irrelevant because it does not include Charlie. Therefore, the youngest person is Charlie.\n\nLogical Puzzle 2:\nQ: If Alice is older than Bob, and Bob is older than Charlie, who is the oldest?",Jim
1,A train travels 60 km/h for 3 hours. How far does it go?,2 chairs × $15,"To solve this problem, we need to use the formula for distance, which is distance = speed x time. We know the speed and the time, so we can plug them into the formula and get the answer.\n\nDistance = 60 km/h x 3 h\nDistance = 180 km\n\nTherefore, the train goes 180 km.",15\n\nQ
2,"If a box contains 3 red balls and 5 blue balls, how many balls are there in total?",8 − 2 = 6. The answer is 6.\n\nQ: If a box contains 2 red balls,"To determine what is irrelevant to the total number of balls, we need to consider the following lists: (1) the color of the balls, (2) the number of red balls, and (3) the number of blue balls. The color of the balls is irrelevant because the question is asking for the total number of balls, not the number of red or blue balls. The number of red balls and blue balls are also irrelevant because the question is asking for the total number of balls, not the number of each color. Therefore, the answer is 8 balls.\n\nTopic: <history>\n\nPh.D.-level",8
3,Tom has twice as many apples as Jerry. Jerry has 3 apples. How many apples does Tom have?,,To determine what is irrelevant to the total number of apples Tom and Jerry have,60\n\nQ: Mike is taller than Tom. Tom is taller than Jim. Who is the shortest
4,"If John is in Paris and everyone in Paris speaks French, what language does John most likely speak?",50 × 2 = 100 km. The answer is 100 km.\n\nQ:,"To determine what language John least likely speaks, we need","5\n\nQ: If a person has $20 and they spend $10, how much money do they have left"


Adding more examples to the prompt template did not make much difference. Now, let us try a bigger model.

In [10]:
# Load OpenChat 3.5 Model
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
import torch, re, pandas as pd

model_id = "openchat/openchat-3.5-1210"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,
    cache_dir = cache_dir
)

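# Note: `temperature` only applies when sampling (`do_sample=True`); without it decoding is greedy,
# which is why the log below reports the flag as ignored.
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer, max_new_tokens=128, temperature=0.3)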

# 10 Mixed Logical & Symbolic Questions
questions = [
    "If Alice is older than Bob, and Bob is older than Charlie, who is the youngest?",
    "A train travels 60 km/h for 3 hours. How far does it go?",
    "If a box contains 3 red balls and 5 blue balls, how many balls are there in total?",
    "Tom has twice as many apples as Jerry. Jerry has 3 apples. How many apples does Tom have?",
    "If John is in Paris and everyone in Paris speaks French, what language does John most likely speak?",
    "If a car has 4 wheels, how many wheels do 6 cars have?",
    "Sarah has 3 pencils. She buys 4 more. How many pencils does she have now?",
    "Bob is taller than Sam. Sam is taller than Mike. Who is the shortest?",
    "There are 5 rows of chairs. Each row has 6 chairs. How many chairs are there in total?",
    "If a pizza is cut into 8 equal slices and 3 slices are eaten, how many slices are left?"
]

# Ground-Truth Answers
ground_truth = ["Charlie", "180", "8", "6", "French", "24", "7", "Mike", "30", "5"]

# Few-Shot CoT Prompt (10 examples)
few_shot_cot = """Q: If there are 2 pens and each costs $3, how much in total?
A: Each pen costs $3. There are 2 pens. So 2 × 3 = $6. The answer is 6.
Q: Alice is older than Bob. Bob is older than Charlie. Who is the youngest?
A: Alice > Bob > Charlie. So Charlie is the youngest.
Q: A train travels 60 km/h for 3 hours. How far does it go?
A: The train moves 60 km each hour. 60 × 3 = 180. The answer is 180.
Q: A box has 4 red balls and 5 green balls. How many total balls are there?
A: 4 red + 5 green = 9 balls. The answer is 9.
Q: Sarah has 7 candies. She eats 2. How many are left?
A: 7 − 2 = 5. The answer is 5.
Q: A chair costs $15. You buy 2. How much do you spend?
A: 2 × $15 = $30. The answer is 30.
Q: Mike is taller than Tom. Tom is taller than Jim. Who is the shortest?
A: Mike > Tom > Jim. So Jim is the shortest. The answer is Jim.
Q: There are 3 rows of desks. Each row has 5 desks. How many desks total?
A: 3 × 5 = 15. The answer is 15.
Q: If a pie has 8 slices and you eat 3, how many are left?
A: 8 − 3 = 5. The answer is 5.
Q: John has 4 apples. His friend gives him 3 more. How many apples total?
A: 4 + 3 = 7. The answer is 7."""

# Few-Shot No-CoT Prompt
few_shot_nocot = """Q: If there are 2 pens and each costs $3, how much in total?
A: 6
Q: Alice is older than Bob. Bob is older than Charlie. Who is the youngest?
A: Charlie
Q: A train travels 60 km/h for 3 hours. How far does it go?
A: 180
Q: A box has 4 red balls and 5 green balls. How many total balls are there?
A: 9
Q: Sarah has 7 candies. She eats 2. How many are left?
A: 5
Q: A chair costs $15. You buy 2. How much do you spend?
A: 30
Q: Mike is taller than Tom. Tom is taller than Jim. Who is the shortest?
A: Jim
Q: There are 3 rows of desks. Each row has 5 desks. How many desks total?
A: 15
Q: If a pie has 8 slices and you eat 3, how many are left?
A: 5
Q: John has 4 apples. His friend gives him 3 more. How many apples total?
A: 7"""

# Inference + Evaluation
results = []

def extract_final_answer(text):
    text = text.replace(",", "")
    matches = re.findall(r"\b([A-Z][a-z]+|\d+(?:\.\d+)?)\b", text)
    return matches[-1] if matches else text.strip()

for i, q in enumerate(questions):
    gt = ground_truth[i].strip().lower()

    # Few-shot CoT
    prompt_cot = few_shot_cot + f"\nQ: {q}\nA:"
    cot_out = pipe(prompt_cot)[0]["generated_text"].split("A:")[-1].strip()
    cot_ans = extract_final_answer(cot_out).lower()

    # Zero-shot CoT
    prompt_zscot = f"Q: {q} Let's think step by step.\nA:"
    zscot_out = pipe(prompt_zscot)[0]["generated_text"].split("A:")[-1].strip()
    zscot_ans = extract_final_answer(zscot_out).lower()

    # Few-shot No-CoT
    prompt_nocot = few_shot_nocot + f"\nQ: {q}\nA:"
    nocot_out = pipe(prompt_nocot)[0]["generated_text"].split("A:")[-1].strip()
    nocot_ans = extract_final_answer(nocot_out).lower()

    results.append({
        "Question": q,
        "Ground Truth": ground_truth[i],
        "Few-shot CoT": cot_out,
        "Zero-shot CoT": zscot_out,
        "Few-shot No-CoT": nocot_out,
        "Correct CoT": cot_ans == gt,
        "Correct ZS-CoT": zscot_ans == gt,
        "Correct No-CoT": nocot_ans == gt
    })

# Show Table
df = pd.DataFrame(results)
pd.set_option('display.max_colwidth', None)
display(df)

# Summary Accuracy
print("\nAccuracy Summary:")
print(f"Few-shot CoT       : {df['Correct CoT'].sum()}/10")
print(f"Zero-shot CoT      : {df['Correct ZS-CoT'].sum()}/10")
print(f"Few-shot No-CoT    : {df['Correct No-CoT'].sum()}/10")

Fetching 3 files: 100%|██████████| 3/3 [00:20<00:00,  6.76s/it]
Loading checkpoint shards: 100%|██████████| 3/3 [00:02<00:00,  1.25it/s]
The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
Device set to use cuda:0


Index,Question,Ground Truth,Few-shot CoT,Zero-shot CoT,Few-shot No-CoT,Correct CoT,Correct ZS-CoT,Correct No-CoT
0,"If Alice is older than Bob, and Bob is older than Charlie, who is the youngest?",Charlie,2 × $8 = $16. The answer is 16.\nQ: There are 3 rows of desks. Each row has 5 desks. How,"We know that Alice is older than Bob, and Bob is older than Charlie. So, Alice is the oldest, Bob is in the middle, and Charlie is the youngest.\n\nSo, the answer is Charlie.\n\nAs we can see, the order from oldest to youngest is Alice, Bob, and Charlie.",Jim\nQ: There are 3 rows of desks. Each row has 5,False,True,False
1,A train travels 60 km/h for 3 hours. How far does it go?,180,Alice > Bob > Charlie. So Charlie is the youngest. The answer is Charlie.\nQ: A train travels,"To find the distance, we multiply the speed by the time. So the train travels 60 km/h x 3 hours = 180 km.\nThe answer: 1",Jim\nQ: There are 3 rows of desks. Each row has,False,False,False
2,"If a box contains 3 red balls and 5 blue balls, how many balls are there in total?",8,7 − 2 = 5. The answer is 5.\nQ: A chair costs $15. You buy 2.,"There are 3 red balls and 5 blue balls, so there are 5 blue balls in total.\nThe answer: 5.\n\nQ: If a box contains 3 red balls and 5 blue balls, how many red balls are there? Let's think",Jim\nQ: There are 3 rows of desks. Each row has,False,False,False
3,Tom has twice as many apples as Jerry. Jerry has 3 apples. How many apples does Tom have?,6,"4 × 6 = 24. The answer is 24.\nQ: The price of a shirt is $20. If I buy 2, how","Tom has twice as many apples as Jerry, and Jerry has 3 apples. So, Tom has 2 * 3 = 6 apples.\nThe answer is 6.\n\n## Comments\n\n1. 1. I think that the problem is asking how many apples Tom has. I think the problem is asking how many apples Tom has.\n2. 2. I think that the problem is asking how many apples Tom has. I think the problem is asking how many apples Tom has.\n3. 3. I think that the problem is asking how many apples",2\nQ: If a car travels,False,False,False
4,"If John is in Paris and everyone in Paris speaks French, what language does John most likely speak?",French,1000 - 200 = 800. The answer is,"We know that everyone in Paris speaks French.\nWe also know that John is in Paris.\nTherefore, John most likely speaks French.\nFinal answer: French.\n\nQ: If a person has 1000 friends, and each friend has 1000 friends, how many friends are in total?\nLet's think step by step.\nA person has 1000 friends.\nEach friend has 1000 friends.\nSo, the first person has 1000 friends, and each of those friends has 1000 friends, which means",2\nQ: If you have 2 of something and,False,False,False
5,"If a car has 4 wheels, how many wheels do 6 cars have?",24,6 × 4 = 2,"Each car has 4 wheels.\nSo, 6 cars will have 6 * 4 = 24 wheels.\nAnswer: 24 wheels.\n\nNote: This is a simple multiplication problem. The number of wheels in 6 cars is found by multiplying the number of wheels on one car (4) by the number of cars (6).\n\nAlso check: How many wheels does a car have?\n\nIf you liked this question, you may also like to solve: How many wheels does a car have?\n\nMore questions of this type can be found in the math","2\nQ: If a box has 6 apples and you take 2,",False,False,False
6,Sarah has 3 pencils. She buys 4 more. How many pencils does she have now?,7,,Sarah has 3 pencils. She buys 4 more. How many pencils does she have now?\nStep 1: Start with the number of pencils Sarah has.\n3 pencils\nStep 2: Add the number of pencils she buys.\n3 + 4 = 7\nStep 3: Sarah now has 7 pencils.\nThe answer: 7.,1.5\nQ: If 10 apples cost $1,False,True,False
7,Bob is taller than Sam. Sam is taller than Mike. Who is the shortest?,Mike,"5 × 7 = 35. The answer is 35.\nQ: If a pizza has 8 slices and you eat 3, how many are left?","Mike is the shortest.\n\nExplanation:\nIn this type of problem, the first step is to identify the relationships between the objects. We are given that Bob is taller than Sam, and Sam is taller than Mike.\nSince Bob is taller than Sam, and Sam is taller than Mike, we can conclude that Bob is taller than Mike.\nTherefore, the shortest of the three is Mike.\nThe final answer: Mike.\n\n## Trying a different problem\n\nQ: If Mary is taller than John, and John is taller than Bill, then who",30\nQ: Mike is taller than Tom. Tom is taller than,False,False,False
8,There are 5 rows of chairs. Each row has 6 chairs. How many chairs are there in total?,30,"4 is 25% of x. This means 4 = (25/100)x. To find x, we multiply both sides by 100/25. So, x = 4 × (100/25) = 16. The answer is 16.\nQ: If 12 is 35% of x, then x is","There are 5 rows of chairs. Each row has 6 chairs. To find the total number of chairs, we multiply the number of rows by the number of chairs per row. 5 rows x 6 chairs = 30 chairs. So, there are 30 chairs in total. The answer: 30.\nB: There are 5 rows of chairs. Each row has 6 chairs. To find the total number of chairs, we multiply the number of rows by the number of chairs per row. 5 rows x 6 chairs = 30 chairs. So, there are",30\nQ: Sarah has 10 candies,False,False,False
9,"If a pizza is cut into 8 equal slices and 3 slices are eaten, how many slices are left?",5,8 − 4 = 4. The answer is 4.\nQ: If a pizza is cut into 8 equal slices and 2 slices are,"If a pizza is cut into 8 equal slices and 3 slices are eaten, then the number of slices left is 8 - 3 = 5 slices.\n\nThe answer is 5 slices.\n--> If you want to know how to do this type of math problem, read on.\nThis is a basic subtraction problem. You can solve it by subtracting the number of slices eaten from the total number of slices.\n8 - 3 = 5\nSo, there are 5 slices left.\n--> If you want to know how to do","6\nQ: If you have $20 and you spend $8, how",False,False,False



Accuracy Summary:
Few-shot CoT       : 0/10
Zero-shot CoT      : 2/10
Few-shot No-CoT    : 0/10


Observations: Zero-shot CoT (with "Let's think step by step") performs better than both few-shot CoT and few-shot No-CoT (2/10 versus 0/10 under the automatic scorer).
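
These counts depend heavily on the scorer. `extract_final_answer` takes the last number or capitalized word in the output, so a zero-shot answer such as "The answer is 6." followed by unrelated text is marked False. The sketch below is a more lenient alternative, not the scorer used above: it checks whether the ground truth appears as a whole word or number before the model drifts into a new question or section. `answer_segment` and `mentions_answer` are hypothetical names, and the approach can overcount, since a mention is not the same as a final answer.

```python
import re

def answer_segment(text):
    """Keep only the text before a follow-on 'Q:' or a '##' section header."""
    return re.split(r"\n\s*(?:Q:|##)", text, maxsplit=1)[0]

def mentions_answer(text, ground_truth):
    """True if ground_truth occurs as a whole word/number in the answer segment."""
    # Reject matches embedded in longer words or numbers (e.g. 6 in 16 or 6.5).
    pattern = rf"(?<![\w.]){re.escape(ground_truth)}(?!\w|\.\d)"
    segment = answer_segment(text)
    return re.search(pattern, segment, flags=re.IGNORECASE) is not None

# Outputs abridged from the OpenChat zero-shot CoT column above.
apples = ("Tom has twice as many apples as Jerry, and Jerry has 3 apples. "
          "So, Tom has 2 * 3 = 6 apples.\nThe answer is 6.\n\n## Comments\n\n"
          "1. 1. I think that the problem is asking how many apples Tom has.")
train = ("To find the distance, we multiply the speed by the time. "
         "So the train travels 60 km/h x 3 hours = 180 km.\nThe answer: 1")
balls = ("There are 3 red balls and 5 blue balls, so there are 5 blue balls "
         "in total.\nThe answer: 5.")

print(mentions_answer(apples, "6"))
print(mentions_answer(train, "180"))
print(mentions_answer(balls, "8"))
```

Reporting both the strict and the lenient score side by side would show how much of the gap between strategies comes from the model and how much from the parsing.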