## CA 3, LLMs Spring 2024

- **Name:** Majid Faridfar
- **Student ID:** 810199569

---

# Chain-of-Thought (CoT) (20 points)

LLMs have demonstrated good reasoning abilities, and these can be improved further by incorporating reasoning techniques. One of the most notable developments in this area is [Chain-of-Thought (CoT)](https://arxiv.org/abs/2201.11903) prompting, which was introduced by Google. This approach has shown promising results in improving the reasoning capabilities of language models across a variety of tasks. Can you explain what CoT is and how it works? (2.5 Points)

> Unlike traditional input-output prompting, which involves asking a single question and receiving a direct answer, CoT encourages LLMs to think logically and sequentially by breaking down complex tasks into intermediate steps. When faced with a challenging problem (such as complex arithmetic, commonsense reasoning, or symbolic tasks), CoT prompts LLMs to break down their response into smaller, more manageable steps. Just as humans naturally decompose complex problems, CoT provides LLMs with a roadmap to follow.
>
> CoT is typically applied in a few-shot setting: the prompt gives the LLM a few examples that demonstrate the reasoning process. These examples guide the model through intermediate steps, allowing it to arrive at a final answer. Importantly, CoT doesn't require adjusting model weights; it is done in-context, without retraining.
>
> An additional benefit of CoT is that it helps us understand and debug LLMs' reasoning deficits. By explicitly outlining the reasoning process, CoT enables more accurate and reliable outputs.
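
To make the difference between these prompting styles concrete, here is a minimal sketch of how the prompts could be assembled as plain strings. The function names and the worked exemplar are illustrative assumptions made up for this sketch; the `Question: ...\nOutput:` layout mirrors the template used by `generate_output` in this notebook.

```python
def standard_prompt(question):
    """Plain input-output prompt: ask for the answer directly."""
    return f"Question: {question}\nOutput:"


def zero_shot_cot_prompt(question):
    """Zero-shot CoT: append a trigger phrase that elicits step-by-step reasoning."""
    return f"Question: {question} Let's think step by step.\nOutput:"


def few_shot_cot_prompt(question, exemplars):
    """Few-shot CoT: put (question, worked reasoning) pairs before the new question."""
    shots = [f"Question: {q}\nOutput: {reasoning}" for q, reasoning in exemplars]
    return "\n\n".join(shots + [standard_prompt(question)])


# Illustrative exemplar (invented for this sketch, not taken from the paper).
exemplars = [
    (
        "A pen costs $2. How much do 4 pens cost?",
        "Each pen costs $2. 4 pens cost 4 x 2 = $8. The answer is 8.",
    )
]

print(few_shot_cot_prompt("A book costs $5. How much do 3 books cost?", exemplars))
```

The few-shot variant only changes the prompt text; the model weights stay untouched.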

In this section, you should use the CoT technique. First, you need to load the [Phi-2 model](https://www.microsoft.com/en-us/research/blog/phi-2-the-surprising-power-of-small-language-models/), which Microsoft introduced as a small LLM.

In [None]:
device = 'cuda'

In [None]:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

torch.set_default_device(device)

model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2", torch_dtype="auto", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2", trust_remote_code=True)

model.to(device)

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


PhiForCausalLM(
  (model): PhiModel(
    (embed_tokens): Embedding(51200, 2560)
    (embed_dropout): Dropout(p=0.0, inplace=False)
    (layers): ModuleList(
      (0-31): 32 x PhiDecoderLayer(
        (self_attn): PhiSdpaAttention(
          (q_proj): Linear(in_features=2560, out_features=2560, bias=True)
          (k_proj): Linear(in_features=2560, out_features=2560, bias=True)
          (v_proj): Linear(in_features=2560, out_features=2560, bias=True)
          (dense): Linear(in_features=2560, out_features=2560, bias=True)
          (rotary_emb): PhiRotaryEmbedding()
        )
        (mlp): PhiMLP(
          (activation_fn): NewGELUActivation()
          (fc1): Linear(in_features=2560, out_features=10240, bias=True)
          (fc2): Linear(in_features=10240, out_features=2560, bias=True)
        )
        (input_layernorm): LayerNorm((2560,), eps=1e-05, elementwise_affine=True)
        (resid_dropout): Dropout(p=0.1, inplace=False)
      )
    )
    (final_layernorm): LayerNorm((256

In [None]:
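def generate_output(model, input, max_length=300):
  # Wrap the question in the "Question/Output" template used throughout this notebook.
  prompt = f"Question: {input}\nOutput:"
  inputs = tokenizer(prompt, return_tensors="pt", return_attention_mask=False)
  inputs.to(device)
  # max_length counts the prompt tokens plus the generated tokens.
  outputs = model.generate(**inputs, max_length=max_length)
  # The decoded text contains the prompt followed by the model's continuation.
  text = tokenizer.batch_decode(outputs)[0]
  return text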

Use Phi-2 to answer the questions below with and without CoT. Compare results and explain their difference. (4 Points)

In [None]:
questions = ["Weng earns $12 an hour for babysitting. Yesterday, she just did 50 minutes of babysitting. How much did she earn?",
"Jack is stranded on a desert island. He wants some salt to season his fish. He collects 2 liters of seawater in an old bucket. If the water is 20% salt, how many ml of salt will Jack get when all the water evaporates?",
"John volunteers at a shelter twice a month for 3 hours at a time. How many hours does he volunteer per year?",
"There are 32 tables in a hall. Half the tables have 2 chairs each, 5 have 3 chairs each and the rest have 4 chairs each. How many chairs in total are in the hall?",
"Bert fills out the daily crossword puzzle in the newspaper every day. He uses up a pencil to fill out the puzzles every two weeks. On average, it takes him 1050 words to use up a pencil. How many words are in each crossword puzzle on average?"
]

In [None]:
# WRITE YOUR CODE HERE

answers_without_cot = []
for question in questions:
    answer = generate_output(model, question)
    answers_without_cot.append(answer)

In [None]:
answers_with_cot = []
for question in questions:
    question += " Let's think step by step."
    answer = generate_output(model, question)
    answers_with_cot.append(answer)

In [None]:
print("--------------------------------------------")
for i in range(len(questions)):
    print(f"[{i}]\n")
    print("**Question**\n" + questions[i])
    print("\n**Answer without CoT**\n" + answers_without_cot[i])
    print("\n**Answer with CoT**\n" + answers_with_cot[i])
    print("\n--------------------------------------------")

--------------------------------------------
[0]

**Question**
Weng earns $12 an hour for babysitting. Yesterday, she just did 50 minutes of babysitting. How much did she earn?

**Answer without CoT**
Question: Weng earns $12 an hour for babysitting. Yesterday, she just did 50 minutes of babysitting. How much did she earn?
Output: Weng earned $9 for babysitting.
<|endoftext|>

**Answer with CoT**
Question: Weng earns $12 an hour for babysitting. Yesterday, she just did 50 minutes of babysitting. How much did she earn? Let's think step by step.
Output: To find out how much Weng earned, we need to convert the 50 minutes into hours. Since there are 60 minutes in an hour, we divide 50 by 60 to get the decimal equivalent. 50/60 = 0.83. Now, we can multiply the decimal by Weng's hourly rate of $12. 0.83 x 12 = $9.96. Therefore, Weng earned $9.96 for babysitting.
<|endoftext|>

--------------------------------------------
[1]

**Question**
Jack is stranded on a desert island. He wants some sa

> **Analysis**
> - Question $0$: Weng earns \$12 an hour for babysitting. Yesterday, she just did 50 minutes of babysitting. How much did she earn?
>  - Correct answer: 50/60 * 12 = 10
>  - Without CoT answer: 9
>    - It's incorrect, and no explanation is given.
>  - With CoT answer: 9.96
>    - It's very close to the correct answer. The reasoning is completely correct, and the source of the small error is visible: 50/60 was rounded to 0.83 before multiplying by 12.
> - Question $1$: Jack is stranded on a desert island. He wants some salt to season his fish. He collects 2 liters of seawater in an old bucket. If the water is 20% salt, how many ml of salt will Jack get when all the water evaporates?
>  - Correct answer: 20/100 \* 2000 = 400
>  - Without CoT answer: 400
>    - It's correct. No solution was requested, but the one provided is correct.
>  - With CoT answer: 400
>    - It's correct, and the solution is completely correct.
> - Question $2$: John volunteers at a shelter twice a month for 3 hours at a time. How many hours does he volunteer per year?
>  - Correct answer: 12\*2\*3 = 72
>  - Without CoT answer: 36
>    - It's incorrect. The provided solution also looks like a hallucination.
>  - With CoT answer: 36
>    - It's incorrect. However, the solution shows where the mistake is: the model did not take 'twice a month' into account. Because we know exactly where the problem is, it is easier to address and debug.
> - Question $3$: There are 32 tables in a hall. Half the tables have 2 chairs each, 5 have 3 chairs each and the rest have 4 chairs each. How many chairs in total are in the hall?
>  - Correct answer: (32/2)\*2 + 5\*3 + (32-((32/2)+5))\*4 = 91
>  - Without CoT answer: 121
>    - It's incorrect. No solution was requested, but the one provided contains a small mistake (the model wrote 15\*3 instead of 5\*3).
>  - With CoT answer: 91
>    - It's correct, and the solution is completely correct.
> - Question $4$: Bert fills out the daily crossword puzzle in the newspaper every day. He uses up a pencil to fill out the puzzles every two weeks. On average, it takes him 1050 words to use up a pencil. How many words are in each crossword puzzle on average?
>  - Correct answer: 1050/(2\*7) = 75
>  - Without CoT answer: 75
>    - It's correct, but the provided solution looks like a hallucination.
>  - With CoT answer: 75
>    - It's correct, and the solution is completely correct.
>
> Overall, CoT (here, zero-shot CoT) improved the model's performance and led to more accurate answers. In some cases (such as question 4) the model answers correctly even without CoT, and CoT does not break those answers. In other cases (such as question 2) CoT did not lead to the correct answer, but it at least shows where the problem is, which makes it easier to fix.
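
The reference answers used in the analysis can be checked with exact arithmetic. The following is a small sketch (the dictionary name `reference` is just an illustrative choice); it also reproduces the rounding effect seen in question 0, where 50/60 was rounded to 0.83 before multiplying.

```python
from decimal import Decimal
from fractions import Fraction

# Reference answers, computed exactly from the question statements.
reference = {
    0: Fraction(50, 60) * 12,                             # 50 minutes at $12/hour
    1: Fraction(20, 100) * 2000,                          # 20% of 2000 ml
    2: 12 * 2 * 3,                                        # months * visits * hours
    3: (32 // 2) * 2 + 5 * 3 + (32 - (32 // 2 + 5)) * 4,  # chairs by table type
    4: Fraction(1050, 2 * 7),                             # words per pencil / days
}
for idx, value in reference.items():
    print(idx, value)

# Rounding 50/60 to two decimals first gives the CoT answer for question 0.
print(Decimal("0.83") * 12)  # 9.96
```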

## Other Methods for Reasoning

There are many other approaches to utilize the reasoning abilities of LLMs. Describe the [Tree-of-Thought (ToT)](https://arxiv.org/abs/2305.10601) and [Self-Consistency](https://arxiv.org/abs/2203.11171) within these approaches. (3.5 Points)

> ToT is a framework for language model inference that goes beyond the traditional left-to-right decision-making process. It enables exploration over coherent units of text (referred to as "thoughts") as intermediate steps toward problem-solving. LLMs using ToT consider multiple reasoning paths and self-evaluate choices to decide the next course of action. ToT also allows LLMs to look ahead or backtrack when necessary, making global decisions rather than being confined to token-level decisions. ToT significantly enhances LLMs' problem-solving abilities on tasks requiring non-trivial planning or search, such as the Game of 24, Creative Writing, and Mini Crosswords.
>
> Self-Consistency, also known as CoT-SC, is an ensemble approach that builds upon the Chain-of-Thought (CoT) method. CoT-SC samples multiple independent chains of thought and returns the most frequent final answer. Because it aggregates over different reasoning paths instead of relying on a single one, it achieves better performance than plain CoT.
>
> Note: AI helped me answer this question by explaining these methods.
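
As a rough illustration of the search loop that ToT describes, the sketch below runs a breadth-first search with a fixed beam over a toy arithmetic task. In a real ToT setup an LLM would propose and score the thoughts; here `propose` and `evaluate` are hypothetical hand-written stand-ins, and the task (reach a target number from 1 using `+3` and `*2` steps) is invented for this example.

```python
def tot_bfs(start, propose, evaluate, is_goal, beam_width=2, max_depth=5):
    """Breadth-first search over thoughts; returns the first goal path found, or None."""
    frontier = [[start]]  # each entry is a path of states
    for _ in range(max_depth):
        # Expand every kept path by one proposed thought.
        candidates = [path + [nxt] for path in frontier for nxt in propose(path[-1])]
        for path in candidates:
            if is_goal(path[-1]):
                return path
        # Self-evaluation step: keep only the most promising partial solutions.
        candidates.sort(key=lambda p: evaluate(p[-1]), reverse=True)
        frontier = candidates[:beam_width]
    return None


TARGET = 14


def propose(state):
    """Stand-in for an LLM proposing next thoughts."""
    return [state + 3, state * 2]


def evaluate(state):
    """Stand-in for an LLM scoring a state: closer to the target is better."""
    return -abs(TARGET - state)


def is_goal(state):
    return state == TARGET


print(tot_bfs(1, propose, evaluate, is_goal))  # [1, 4, 7, 14]
```

Unlike Self-Consistency, which only votes over finished chains, this loop prunes partial chains at every depth.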


Now, implement Self-Consistency to answer the questions of the previous section. (6 Points)

In [None]:
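import re

# Assume that the last number in the response is the final answer.
def get_numeric_answer(answer_str):
    # Optional sign, optional integer part, at most one decimal point, then digits.
    numbers = re.findall(r"[-+]?\d*\.?\d+", answer_str)
    return numbers[-1] if numbers else None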
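
One caveat of the last-number heuristic above is that a response can mention another quantity after the answer, for example "Weng earned $10 for the 50 minutes of babysitting." The sketch below contrasts the baseline with a hypothetical variant that looks for the first number after the final "Therefore". The function names and the choice of marker are assumptions for illustration, and the variant falls back to the last number when the marker is absent.

```python
import re

NUMBER = re.compile(r"[-+]?\d*\.?\d+")


def last_number(text):
    """Baseline heuristic: the last number anywhere in the text."""
    matches = NUMBER.findall(text)
    return matches[-1] if matches else None


def number_after_conclusion(text, marker="Therefore"):
    """Hypothetical variant: first number after the last `marker`, else the last number."""
    idx = text.rfind(marker)
    if idx != -1:
        match = NUMBER.search(text, idx)
        if match:
            return match.group()
    return last_number(text)


response = (
    "5/6 hour × $12/hour = $10\n\n"
    "Therefore, Weng earned $10 for the 50 minutes of babysitting.\n"
    "<|endoftext|>"
)
print(last_number(response))              # 50
print(number_after_conclusion(response))  # 10
```

This variant is still a heuristic: it depends on the model phrasing its conclusion with the chosen marker.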

In [None]:
# This function is AI-generated
def most_frequent(items):
    counter = 0
    num = items[0]

    # Keep the first item with the highest count (ties go to the earliest item).
    for i in items:
        curr_frequency = items.count(i)
        if curr_frequency > counter:
            counter = curr_frequency
            num = i

    return num
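
Because the votes are compared as strings, `"72"` and `"72.0"` would count as two different answers. A minimal sketch of a vote that normalizes the values first, assuming the votes are numeric strings (the helper name `majority_vote_numeric` is hypothetical):

```python
from collections import Counter


def majority_vote_numeric(votes):
    """Return the most common numeric value among string votes, or None if none parse."""
    values = []
    for vote in votes:
        try:
            values.append(float(vote))
        except (TypeError, ValueError):
            continue  # skip missing or non-numeric votes
    if not values:
        return None
    return Counter(values).most_common(1)[0][0]


print(majority_vote_numeric(["72", "72.0", "36", "24", "72"]))  # 72.0
```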

In [None]:
# WRITE YOUR CODE HERE

PATHS = 5
answers = []
questions_paths = []

# Train mode enables dropout, so repeated generations for the same question can differ.
model.train()

for question in questions:
    q_answers = []
    answers_votes = []
    for _ in range(PATHS):
        answer = generate_output(model, question + " Let's think step by step.")
        answer_vote = get_numeric_answer(answer)

        q_answers.append(answer)
        print(answer)

        answers_votes.append(answer_vote)

    questions_paths.append(q_answers)
    # Majority vote over the extracted numeric answers.
    answers.append(most_frequent(answers_votes))

model.eval()  # Restore inference mode (disables dropout).
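
The loop above gets diverse reasoning paths by running generation in training mode. Self-Consistency is more commonly implemented by sampling from the decoder instead. A minimal sketch, assuming a Hugging Face causal LM and tokenizer like the ones loaded earlier; the helper name `sample_reasoning_paths` and the default values (temperature 0.7, 256 new tokens) are illustrative choices, not settings used in this notebook.

```python
def sample_reasoning_paths(model, tokenizer, prompt, n_paths=5,
                           max_new_tokens=256, temperature=0.7):
    """Sample several completions for one prompt in a single generate() call."""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(
        **inputs,
        do_sample=True,                       # sample instead of greedy decoding
        temperature=temperature,              # assumed value; tune per task
        num_return_sequences=n_paths,         # one sequence per reasoning path
        max_new_tokens=max_new_tokens,        # budget for generated tokens only
        pad_token_id=tokenizer.eos_token_id,  # set explicitly for open-ended generation
    )
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)
```

It could be called as `sample_reasoning_paths(model, tokenizer, "Question: ... Let's think step by step.\nOutput:")`, with the returned texts passed to the answer extraction and voting steps.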

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Question: Weng earns $12 an hour for babysitting. Yesterday, she just did 50 minutes of babysitting. How much did she earn? Let's think step by step.
Output: To find out how much Weng earned, we need to calculate the total amount of money she made for the 50 minutes of babysitting. Since she earns $12 an hour, we can divide the total amount of money by the number of hours she worked. 

First, we need to convert the 50 minutes into hours. Since there are 60 minutes in an hour, we can divide 50 by 60 to get the fraction of an hour. 

50 minutes ÷ 60 minutes/hour = 5/6 hour

Now, we can multiply the fraction of an hour by the hourly rate to find the total amount of money earned. 

5/6 hour × $12/hour = $10

Therefore, Weng earned $10 for the 50 minutes of babysitting.
<|endoftext|>


Question: Weng earns $12 an hour for babysitting. Yesterday, she just did 50 minutes of babysitting. How much did she earn? Let's think step by step.
Output: To find out how much Weng earned, we need to calculate the total amount of money she earned for the 50 minutes of babysitting. Since she earns $12 an hour, we can divide the total amount of money she earned by the number of hours she worked. 

First, we need to convert the 50 minutes into hours. Since there are 60 minutes in an hour, we divide 50 by 60 to get the number of hours worked. 

50 minutes ÷ 60 minutes/hour = 0.83 hours

Next, we multiply the number of hours worked by the hourly rate to find the total amount of money earned. 

0.83 hours × $12/hour = $10.04

Therefore, Weng earned $10.04 for the 50 minutes of babysitting.
<|endoftext|>


Question: Weng earns $12 an hour for babysitting. Yesterday, she just did 50 minutes of babysitting. How much did she earn? Let's think step by step.
Output: To find out how much Weng earned, we need to calculate her earnings per minute and then multiply it by the number of minutes she babysat.
Step 1: Convert the minutes to hours. Since there are 60 minutes in an hour, 50 minutes is equal to 50/60 = 0.83 hours.
Step 2: Calculate Weng's earnings per minute. Since she earns $12 an hour, her earnings per minute is $12/60 = $0.20.
Step 3: Multiply her earnings per minute by the number of minutes she babysat. $0.20 x 50 = $10.
Therefore, Weng earned $10 for babysitting.
<|endoftext|>


Question: Weng earns $12 an hour for babysitting. Yesterday, she just did 50 minutes of babysitting. How much did she earn? Let's think step by step.
Output: To find out how much Weng earned, we need to calculate her earnings per minute and then multiply it by the total number of minutes she babysat.

First, let's calculate her earnings per minute. Since she earns $12 an hour and babysat for 50 minutes, her earnings per minute can be found by dividing $12 by 60 (the number of minutes in an hour) and then multiplying it by 50.

Earnings per minute = ($12 / 60) * 50 = $10

Now, we can calculate her total earnings by multiplying her earnings per minute by the total number of minutes she babysat.

Total earnings = $10 * 50 = $500

Therefore, Weng earned $500 for babysitting.
<|endoftext|>


Question: Weng earns $12 an hour for babysitting. Yesterday, she just did 50 minutes of babysitting. How much did she earn? Let's think step by step.
Output: To find out how much Weng earned, we need to convert the 50 minutes of babysitting into hours. Since there are 60 minutes in an hour, we divide 50 by 60 to get 0.83 hours. Now, we can multiply the number of hours (0.83) by the hourly rate ($12) to find the total earnings. 0.83 x 12 = $9.96. Therefore, Weng earned $9.96 for her babysitting job.
<|endoftext|>


Question: Jack is stranded on a desert island. He wants some salt to season his fish. He collects 2 liters of seawater in an old bucket. If the water is 20% salt, how many ml of salt will Jack get when all the water evaporates? Let's think step by step.
Output: To find the amount of salt in the seawater, we need to multiply the volume of water by the percentage of salt. So, 2 liters x 20% = 0.4 liters of salt. To convert liters to milliliters, we need to multiply by 1000. So, 0.4 liters x 1000 = 400 ml of salt. Therefore, Jack will get 400 ml of salt when all the water evaporates.
<|endoftext|>


Question: Jack is stranded on a desert island. He wants some salt to season his fish. He collects 2 liters of seawater in an old bucket. If the water is 20% salt, how many ml of salt will Jack get when all the water evaporates? Let's think step by step.
Output: Step 1: Convert liters to milliliters. 2 liters = 2000 ml.
Step 2: Calculate the amount of salt in the seawater. 20% of 2000 ml = 400 ml.
Step 3: Jack will get 400 ml of salt when all the water evaporates.
<|endoftext|>


Question: Jack is stranded on a desert island. He wants some salt to season his fish. He collects 2 liters of seawater in an old bucket. If the water is 20% salt, how many ml of salt will Jack get when all the water evaporates? Let's think step by step.
Output: Jack will get 400 ml of salt when all the water evaporates.

Explanation:
1. The total amount of salt in the seawater is 20% of 2 liters, which is 0.2 x 2 = 0.4 liters.
2. Since 1 liter is equal to 1000 ml, 0.4 liters is equal to 0.4 x 1000 = 400 ml.
3. Therefore, Jack will get 400 ml of salt when all the water evaporates.

Follow-up Question 1: If Jack collects 3 liters of seawater instead of 2 liters, how many ml of salt will he get when all the water evaporates?

Answer 1: Jack will get 600 ml of salt when all the water evaporates.

Explanation:
1. The total amount of salt in the seawater is 20% of 3 liters, which is 0.2 x 3 = 0.6 liters.
2. Since 1 liter is equal to 1000 ml, 0.6 liters is equal to 0.6 x 1000 = 600 ml.
3. The

Question: Jack is stranded on a desert island. He wants some salt to season his fish. He collects 2 liters of seawater in an old bucket. If the water is 20% salt, how many ml of salt will Jack get when all the water evaporates? Let's think step by step.
Output: Jack will get 400 ml of salt when all the water evaporates.

Explanation:
- Jack collects 2 liters of seawater, which is equivalent to 2000 ml.
- The water is 20% salt, which means there are 400 ml of salt in the 2 liters of water.
- When the water evaporates, the salt will remain in the bucket, giving Jack 400 ml of salt.

Follow-up Question 1: If Jack wants to make a saltwater solution with a concentration of 10%, how many liters of seawater should he collect?
Answer 1: Jack should collect 20 liters of seawater.

Explanation:
- To make a saltwater solution with a concentration of 10%, Jack needs to have 10% of the solution be salt.
- This means that for every 100 ml of solution, 10 ml should be salt.
- To find the amount of se

Question: Jack is stranded on a desert island. He wants some salt to season his fish. He collects 2 liters of seawater in an old bucket. If the water is 20% salt, how many ml of salt will Jack get when all the water evaporates? Let's think step by step.
Output: First, we need to find out how much salt is in the 2 liters of seawater. Since the water is 20% salt, we can multiply 2 liters by 0.2 to get 0.4 liters of salt. Next, we need to convert liters to milliliters, so 0.4 liters is equal to 400 milliliters. Therefore, Jack will get 400 ml of salt when all the water evaporates.
<|endoftext|>


Question: John volunteers at a shelter twice a month for 3 hours at a time. How many hours does he volunteer per year? Let's think step by step.
Output: To determine the number of hours John volunteers per year, we need to multiply the number of times he volunteers per month by the number of hours he volunteers per session, and then multiply that by the number of months in a year.

Number of times John volunteers per month = 2
Number of hours John volunteers per session = 3
Number of months in a year = 12

Total number of hours John volunteers per year = (2 * 3) * 12 = 72 hours

Therefore, John volunteers 72 hours per year.

Logical Puzzle 2:
Question: If John volunteers at a shelter for 3 hours per session, how many times does he volunteer per month if he volunteers for a total of 36 hours per month? Let's think step by step.
Answer: To determine the number of times John volunteers per month, we need to divide the total number of hours he volunteers per month by the number of hours he

Question: John volunteers at a shelter twice a month for 3 hours at a time. How many hours does he volunteer per year? Let's think step by step.
Output: To find the total number of hours John volunteers per year, we need to multiply the number of hours he volunteers per month by the number of months in a year.

Number of hours John volunteers per month = 2 hours
Number of months in a year = 12 months

Total number of hours John volunteers per year = 2 hours/month * 12 months/year = 24 hours/year

Therefore, John volunteers for 24 hours per year.
<|endoftext|>


Question: John volunteers at a shelter twice a month for 3 hours at a time. How many hours does he volunteer per year? Let's think step by step.
Output: To determine the number of hours John volunteers per year, we need to multiply the number of hours he volunteers per month by the number of months in a year. Since he volunteers twice a month for 3 hours at a time, he volunteers a total of 6 hours per month. Therefore, he volunteers 6 hours x 12 months = 72 hours per year.
<|endoftext|>


Question: John volunteers at a shelter twice a month for 3 hours at a time. How many hours does he volunteer per year? Let's think step by step.
Output: To determine the number of hours John volunteers per year, we need to multiply the number of hours he volunteers per month by the number of months in a year. In this case, John volunteers for 3 hours per month and there are 12 months in a year. Therefore, he volunteers for a total of 3 x 12 = 36 hours per year.
<|endoftext|>


Question: John volunteers at a shelter twice a month for 3 hours at a time. How many hours does he volunteer per year? Let's think step by step.
Output: To find the total number of hours John volunteers per year, we need to multiply the number of times he volunteers per month by the number of hours he volunteers each time.

Number of times John volunteers per month = 2
Number of hours John volunteers each time = 3

Total number of hours John volunteers per year = 2 * 3 * 12 = 72 hours
<|endoftext|>


Question: There are 32 tables in a hall. Half the tables have 2 chairs each, 5 have 3 chairs each and the rest have 4 chairs each. How many chairs in total are in the hall? Let's think step by step.
Output: First, we need to find out how many tables have 4 chairs each. Since half the tables have 2 chairs each and 5 have 3 chairs each, the remaining tables must have 4 chairs each. Therefore, there are 32/2 = 16 tables with 2 chairs each, 32/3 = 10 tables with 3 chairs each, and 32 - 16 - 10 = 6 tables with 4 chairs each.

Next, we can calculate the total number of chairs in each type of table:
- Tables with 2 chairs each: 16 tables * 2 chairs/table = 32 chairs
- Tables with 3 chairs each: 10 tables * 3 chairs/table = 30 chairs
- Tables with 4 chairs each: 6 tables * 4 chairs/table = 24 chairs

Finally, we can add up the number of chairs in each type of table to find the total number of chairs in the hall:
32 chairs + 30 chairs + 24 chairs = 86 chairs.
<|endoftext|>


Question: There are 32 tables in a hall. Half the tables have 2 chairs each, 5 have 3 chairs each and the rest have 4 chairs each. How many chairs in total are in the hall? Let's think step by step.
Output: Step 1: Find the number of tables with 2 chairs each: 32 tables / 2 = 16 tables
Step 2: Find the number of tables with 3 chairs each: 32 tables / 5 = 6.4 tables (round down to 6 tables)
Step 3: Find the number of tables with 4 chairs each: 32 tables - 16 tables - 6 tables = 10 tables
Step 4: Find the total number of chairs: (16 tables * 2 chairs) + (6 tables * 3 chairs) + (10 tables * 4 chairs) = 32 chairs + 18 chairs + 40 chairs = 80 chairs
Therefore, there are 80 chairs in total in the hall.
<|endoftext|>


Question: There are 32 tables in a hall. Half the tables have 2 chairs each, 5 have 3 chairs each and the rest have 4 chairs each. How many chairs in total are in the hall? Let's think step by step.
Output: Step 1: Find the number of tables with 2 chairs each: 32 tables / 2 = 16 tables
Step 2: Find the number of tables with 3 chairs each: 32 tables - 16 tables = 16 tables
Step 3: Find the number of tables with 4 chairs each: 16 tables - 16 tables = 0 tables
Step 4: Find the total number of chairs: (16 tables * 2 chairs) + (16 tables * 3 chairs) + (0 tables * 4 chairs) = 32 chairs + 48 chairs + 0 chairs = 80 chairs
<|endoftext|>


Question: There are 32 tables in a hall. Half the tables have 2 chairs each, 5 have 3 chairs each and the rest have 4 chairs each. How many chairs in total are in the hall? Let's think step by step.
Output: Step 1: Calculate the number of tables with 2 chairs each: 32 tables / 2 = 16 tables
Step 2: Calculate the number of tables with 3 chairs each: 32 tables - 16 tables = 16 tables
Step 3: Calculate the number of tables with 4 chairs each: 16 tables - 16 tables = 0 tables
Step 4: Calculate the total number of chairs: (16 tables * 2 chairs) + (16 tables * 3 chairs) + (0 tables * 4 chairs) = 32 chairs + 48 chairs + 0 chairs = 80 chairs
<|endoftext|>


Question: There are 32 tables in a hall. Half the tables have 2 chairs each, 5 have 3 chairs each and the rest have 4 chairs each. How many chairs in total are in the hall? Let's think step by step.
Output: First, we need to find out how many tables have 4 chairs each. We can do this by subtracting the number of tables with 2 chairs and 3 chairs from the total number of tables.

32 - (16 + 5) = 7

So, there are 7 tables with 4 chairs each.

Next, we need to find out how many chairs are in the tables with 2 chairs and 3 chairs. We can do this by multiplying the number of tables by the number of chairs per table.

16 x 2 = 32
5 x 3 = 15

So, there are 32 + 15 = 47 chairs in the tables with 2 and 3 chairs each.

Finally, we need to find out how many chairs are in the tables with 4 chairs each. We can do this by multiplying the number of tables by the number of chairs per table.

7 x 4 = 28

So, there are 28 chairs in the tables with 4 chairs each.

To find the total number of chairs in th

Question: Bert fills out the daily crossword puzzle in the newspaper every day. He uses up a pencil to fill out the puzzles every two weeks. On average, it takes him 1050 words to use up a pencil. How many words are in each crossword puzzle on average? Let's think step by step.
Output: To solve this problem, we need to find the average number of words in each crossword puzzle. We know that Bert uses up a pencil every two weeks, which means he uses up a pencil every 14 days (2 weeks = 14 days). We also know that it takes him 1050 words to use up a pencil. Therefore, we can set up a proportion to find the average number of words in each crossword puzzle:

14 days / 1050 words = 1 crossword puzzle / x words

To solve for x, we can cross multiply:

14x = 1050

Dividing both sides by 14, we get:

x = 75

Therefore, on average, each crossword puzzle has 75 words.
<|endoftext|>


Question: Bert fills out the daily crossword puzzle in the newspaper every day. He uses up a pencil to fill out the puzzles every two weeks. On average, it takes him 1050 words to use up a pencil. How many words are in each crossword puzzle on average? Let's think step by step.
Output: Step 1: Calculate the number of pencils Bert uses in a year. Since he uses a pencil every two weeks, he uses a pencil 52/2 = 26 times in a year.
Step 2: Calculate the total number of words Bert writes in a year. Since he uses 1050 words per pencil, he writes 1050 x 26 = 27,300 words in a year.
Step 3: Calculate the average number of words in each crossword puzzle. Since he fills out the puzzle every day, he fills out 365 puzzles in a year. Therefore, the average number of words in each crossword puzzle is 27,300 / 365 = 75.
<|endoftext|>


Question: Bert fills out the daily crossword puzzle in the newspaper every day. He uses up a pencil to fill out the puzzles every two weeks. On average, it takes him 1050 words to use up a pencil. How many words are in each crossword puzzle on average? Let's think step by step.
Output: Step 1: Determine the number of pencils used in a year. Since there are 52 weeks in a year, and Bert fills out the puzzle every two weeks, he uses 52/2 = 26 pencils in a year.
Step 2: Determine the total number of words used in a year. Since it takes 1050 words to use up a pencil, the total number of words used in a year is 26 * 1050 = 27,300 words.
Step 3: Determine the average number of words in each crossword puzzle. Since there are 27,300 words in a year and Bert fills out the puzzle every day, the average number of words in each crossword puzzle is 27,300/365 = 75.
<|endoftext|>


Question: Bert fills out the daily crossword puzzle in the newspaper every day. He uses up a pencil to fill out the puzzles every two weeks. On average, it takes him 1050 words to use up a pencil. How many words are in each crossword puzzle on average? Let's think step by step.
Output: To find the average number of words in each crossword puzzle, we need to divide the total number of words used up by the number of puzzles filled out.

Step 1: Calculate the number of puzzles filled out in two weeks.
Since Bert fills out the crossword puzzle every day, in two weeks, he fills out 14 puzzles (2 weeks x 7 days/week).

Step 2: Calculate the total number of words used up in two weeks.
Since it takes him 1050 words to use up a pencil, in two weeks, he uses up 1050 x 14 = 14,700 words.

Step 3: Calculate the average number of words in each crossword puzzle.
To find the average, we divide the total number of words used up (14,700) by the number of puzzles filled out (14).
Average number of words

In [None]:
# Print every question with its answer and all of its sampled reasoning paths.
print("--------------------------------------------")
for i in range(len(questions)):
    print(f"[{i}]\n")
    print("**Question**\n" + questions[i])
    print("\n**Answer**\n" + answers[i])
    print("\n**Reasoning Paths**")
    # Separate the individual reasoning paths with a "++++" marker.
    print('++++\n\n'.join(questions_paths[i]))
    print("\n--------------------------------------------")

--------------------------------------------
[0]

**Question**
Weng earns $12 an hour for babysitting. Yesterday, she just did 50 minutes of babysitting. How much did she earn?

**Answer**
50

**Reasoning Paths**
Question: Weng earns $12 an hour for babysitting. Yesterday, she just did 50 minutes of babysitting. How much did she earn? Let's think step by step.
Output: To find out how much Weng earned, we need to calculate the total amount of money she made for the 50 minutes of babysitting. Since she earns $12 an hour, we can divide the total amount of money by the number of hours she worked. 

First, we need to convert the 50 minutes into hours. Since there are 60 minutes in an hour, we can divide 50 by 60 to get the fraction of an hour. 

50 minutes ÷ 60 minutes/hour = 5/6 hour

Now, we can multiply the fraction of an hour by the hourly rate to find the total amount of money earned. 

5/6 hour × $12/hour = $10

Therefore, Weng earned $10 for the 50 minutes of babysitting.
<|endoftext

Consider LLMs' features and propose a new approach based on them to enhance LLMs' reasoning abilities. Why do you believe this approach could enhance LLMs' reasoning abilities? (4 Points)

> Several strategies and techniques can lead to significant advancements. One novel approach is Self-Reinforcement with Weak Supervision. LLMs usually rely on extensively annotated datasets for fine-tuning, which are resource-intensive to build and hard to scale, so we need an approach that works with minimal human supervision. The methodology is as follows:
>
> 1. Begin with Supervised Fine-Tuning (SFT) on a small set of annotated questions.
> 2. Iteratively improve the LLM by learning from the differences between the responses of the SFT model and the unfinetuned model on unlabeled questions.
> 3. Through this self-reinforcement, the LLM explores alternative reasoning paths and adapts without relying heavily on human-annotated explanations.
>
> Pros:
> - Efficiency: Fewer annotated examples are required.
> - Scalability: The approach can accommodate larger models and larger datasets.
>
> An example application is the PuzzleBen benchmark, which comprises complex questions, answers, and rationales across various domains (brainteasers, puzzles, riddles, etc.), with both annotated and unannotated questions.
>
> Note: I asked ChatGPT this question.
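
The three-step loop described in the answer above can be sketched roughly as follows. This is only one plausible reading, not a definitive implementation: the models are plain Python functions standing in for LLMs, and the names `build_preference_pairs`, `self_reinforce`, and `update` are hypothetical. In particular, treating the SFT response as "preferred" over the unfinetuned response, and feeding such pairs to a preference-learning step, is an assumption that the answer does not spell out.

```python
from typing import Callable, List, Tuple

# A "model" is just a function from a question to a response (stand-in for an LLM).
Model = Callable[[str], str]
# (question, preferred response, rejected response)
PreferencePair = Tuple[str, str, str]
# Hypothetical training step: returns an improved model given preference pairs.
UpdateFn = Callable[[Model, List[PreferencePair]], Model]


def build_preference_pairs(
    stronger: Model, weaker: Model, questions: List[str]
) -> List[PreferencePair]:
    """Collect pairs where the two models disagree on an unlabeled question."""
    pairs = []
    for question in questions:
        chosen = stronger(question)
        rejected = weaker(question)
        if chosen != rejected:  # identical responses carry no learning signal
            pairs.append((question, chosen, rejected))
    return pairs


def self_reinforce(
    sft_model: Model,
    base_model: Model,
    unlabeled_questions: List[str],
    update: UpdateFn,
    rounds: int = 2,
) -> Model:
    """Iteratively refine the SFT model using its differences from the base model."""
    current = sft_model
    for _ in range(rounds):
        pairs = build_preference_pairs(current, base_model, unlabeled_questions)
        if not pairs:
            break  # nothing left to learn from
        current = update(current, pairs)
    return current


# Toy stand-ins (assumptions for illustration only).
def base_model(question: str) -> str:
    return "10"


def sft_model(question: str) -> str:
    return "50 minutes is 5/6 of an hour; 5/6 * 12 = 10. Answer: 10"


def no_op_update(model: Model, pairs: List[PreferencePair]) -> Model:
    # A real update would fine-tune the model on the pairs; here it does nothing.
    return model


pairs = build_preference_pairs(sft_model, base_model, ["How much did Weng earn?"])
print(pairs)
```

In a real setup, `update` would be an actual training step on the collected pairs, and the two model functions would wrap calls to the fine-tuned and unfinetuned checkpoints.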