# Chain-of-Thought (CoT)

LLMs are useful for a variety of tasks, but they sometimes need a nudge in the right direction to produce the desired outcome. Broadly, reasoning refers to a collection of prompting techniques that provide this nudge. In essence, we want the model to "think" through a problem: break the task down into individual steps or explain its own reasoning.

Note: When we say "reason" or "think" in this context, we are not referring to the kind of thinking or reasoning that we, as humans, are capable of. This [blog post](https://cacm.acm.org/blogs/blog-cacm/276268-can-llms-really-reason-and-plan/fulltext) explains the difference between the two ideas.

In [2]:
# Load environment variables
from dotenv import load_dotenv

load_dotenv("../../.env")

True

In [3]:
from tools import llm_call

## A Failure Case

The typical CoT use case is a reasoning or numerical task like the example below. We give the model a fairly simple task: add up the numbers in a list and decide whether a statement about the sum is true or false.

Note: OpenAI updates GPT fairly regularly, so these examples sometimes go out of date and you may not get the same answers when you run them later.

In [4]:
prompt = """
Answer with True or False:
The numbers in the following list add up to an even number: [0, 1, 2, 42, 37, 59, 12]
"""

print("----------PROMPT----------")
print(prompt)
print("---------RESPONSE---------")
result = llm_call(model="gpt-4", prompt=prompt, parameters={"temperature": 0.0})
print(result)

----------PROMPT----------

Answer with True or False:
The numbers in the following list add up to an even number: [0, 1, 2, 42, 37, 59, 12]

---------RESPONSE---------
True


In [5]:
print(sum([0, 1, 2, 42, 37, 59, 12]))

153


Adding up the same list ourselves, we see that the sum is 153, which is odd, so the answer should actually be False! Now let's try this with Chain-of-Thought.

## Zero-Shot CoT

Just like regular prompting, CoT comes in Zero-Shot and Few-Shot flavours. In the Zero-Shot case, it's simply a matter of appending a phrase like "Let's think step by step" to the end of the prompt.

In [6]:
prompt = """
Answer with True or False:
The numbers in the following list add up to an even number: [0, 1, 2, 42, 37, 59, 12]
Let's think step by step.
"""

print("----------PROMPT----------")
print(prompt)
print("---------RESPONSE---------")
result = llm_call(model="gpt-4", prompt=prompt, parameters={"temperature": 0.0})
print(result)

----------PROMPT----------

Answer with True or False:
The numbers in the following list add up to an even number: [0, 1, 2, 42, 37, 59, 12]
Let's think step by step.

---------RESPONSE---------
First, let's add up the numbers: 0 + 1 + 2 + 42 + 37 + 59 + 12 = 153

153 is an odd number, not an even number.

So, the statement is False.


With a slight change to the prompt and no changes to the model or parameters, we get the right answer. Simply "asking" the LLM to think step by step makes it write out the sum before committing to True or False.
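One practical consequence of CoT is that the response is no longer a bare True or False, so any downstream code has to pull the verdict out of the reasoning text. The snippet below is a minimal sketch of one way to do that. The helper name `extract_verdict` is hypothetical (it is not part of `tools`), and taking the last True/False mentioned in the text is a heuristic that assumes the model states its conclusion at the end.

```python
import re
from typing import Optional


def extract_verdict(response: str) -> Optional[bool]:
    """Return the last True/False mentioned in a response, or None if absent.

    Hypothetical helper: assumes the model states its final verdict last.
    """
    matches = re.findall(r"\b(true|false)\b", response, flags=re.IGNORECASE)
    if not matches:
        return None
    return matches[-1].lower() == "true"


# The Zero-Shot CoT response shown above
cot_response = """First, let's add up the numbers: 0 + 1 + 2 + 42 + 37 + 59 + 12 = 153

153 is an odd number, not an even number.

So, the statement is False."""

print(extract_verdict(cot_response))  # prints False
```

If the model hedges or restates the question after its conclusion, this heuristic can pick up the wrong word, so asking for the final answer in a fixed format is usually more reliable.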

## Few-Shot CoT

In other, more complex scenarios, the LLM might not get the right answer even with Zero-Shot CoT. In such cases, examples are really helpful. Few-Shot CoT is pretty straightforward: we do the same as above, except we prepend a few worked examples, each with its reasoning written out, for the model to imitate.

In [7]:
prompt = """
Answer with True or False:
The numbers in the following list add up to an even number: [21, 8, 11, 2]
Let's think step by step.
The sum of all numbers in the list is 42. 42 is even. The statement is True.

Answer with True or False:
The numbers in the following list add up to an odd number: [0, 3, 2, 3, 4, 9, 1]
Let's think step by step.
The sum of all numbers in the list is 22. 22 is even. The statement is False.

Answer with True or False:
The numbers in the following list add up to an even number: [0, 1, 2, 42, 37, 59, 12]
Let's think step by step.
"""

print("----------PROMPT----------")
print(prompt)
print("---------RESPONSE---------")
result = llm_call(model="gpt-4", prompt=prompt, parameters={"temperature": 0.0})
print(result)

----------PROMPT----------

Answer with True or False:
The numbers in the following list add up to an even number: [21, 8, 11, 2]
Let's think step by step.
The sum of all numbers in the list is 42. 42 is even. The statement is True.

Answer with True or False:
The numbers in the following list add up to an odd number: [0, 3, 2, 3, 4, 9, 1]
Let's think step by step.
The sum of all numbers in the list is 22. 22 is even. The statement is False.

Answer with True or False:
The numbers in the following list add up to an even number: [0, 1, 2, 42, 37, 59, 12]
Let's think step by step.

---------RESPONSE---------
The sum of all numbers in the list is 153. 153 is odd. The statement is False.


Here we see that the model picks up how to solve the problem from the examples and also answers in the same style of reasoning.
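Since the model imitates the worked examples, an arithmetic slip in a hand-written example can teach it the wrong pattern. The sketch below builds the same Few-Shot CoT prompt programmatically, computing each example's sum and verdict in Python. The function names `worked_example` and `build_few_shot_prompt` are made up for illustration, and the wording simply mirrors the prompt used above.

```python
def question_text(numbers, claim):
    """Format the question; `claim` is 'even' or 'odd'."""
    return (
        "Answer with True or False:\n"
        f"The numbers in the following list add up to an {claim} number: {numbers}\n"
        "Let's think step by step.\n"
    )


def worked_example(numbers, claim):
    """Format one solved example with its reasoning computed in Python."""
    total = sum(numbers)
    parity = "even" if total % 2 == 0 else "odd"
    verdict = parity == claim  # True when the claim matches the real parity
    reasoning = (
        f"The sum of all numbers in the list is {total}. "
        f"{total} is {parity}. The statement is {verdict}.\n"
    )
    return question_text(numbers, claim) + reasoning


def build_few_shot_prompt(examples, numbers, claim):
    """Join the solved examples and the open question, separated by blank lines."""
    shots = [worked_example(nums, c) for nums, c in examples]
    return "\n" + "\n".join(shots + [question_text(numbers, claim)])


examples = [([21, 8, 11, 2], "even"), ([0, 3, 2, 3, 4, 9, 1], "odd")]
prompt = build_few_shot_prompt(examples, [0, 1, 2, 42, 37, 59, 12], "even")
print(prompt)
```

The resulting string can be passed to `llm_call` exactly like the hand-written prompt.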

# Self-Consistency

From the examples above, we know that the LLM can at times give us the wrong answer, and we can't always verify its answers manually. One interesting approach is to simply ask the LLM the same question multiple times. More often than not, the right answer is also the most frequent one, so we can take the most popular answer as the final answer. When combined with the CoT techniques from above and the tools that we'll discuss over the next two days, this gives us a more robust system.

There is one obvious caveat here: this is expensive. We're increasing our costs N-fold, where N is the number of times we query the model for a given prompt. So while self-consistency works to some extent, it does not scale well.

In [21]:
prompt = """
Answer with True or False:
The numbers in the following list add up to an even number: [0, 1, 2, 42, 37, 59, 12]
Answer: 
"""

print("----------PROMPT----------")
print(prompt)
print("---------RESPONSE---------")
result = [llm_call(model="gpt-4", prompt=prompt, parameters={"temperature": 1.0}) for _ in range(5)]
print(result)

----------PROMPT----------

Answer with True or False:
The numbers in the following list add up to an even number: [0, 1, 2, 42, 37, 59, 12]
Answer: 

---------RESPONSE---------
['True', 'False', 'False', 'True', 'False']


We see that we got the right answer (False) more often than the wrong one: 3 times out of 5. So if we took the most frequent response here, we would get the answer we needed.

Note: We've increased the temperature of the model from 0 to 1 here. A temperature of 0 makes the model more likely (no guarantees with GPT-4!) to give the same answer on each run, which would defeat the purpose of sampling several answers and voting. The temperature parameter is something we would recommend tuning based on your use case.
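Taking the most frequent response can be done with `collections.Counter` from the standard library. The snippet below is a minimal sketch that uses the five responses recorded above. The helper name `majority_vote` is an assumption for illustration, and it assumes each response is a bare True or False as in this prompt.

```python
from collections import Counter


def majority_vote(responses):
    """Return the most common answer and the share of votes it received."""
    # Normalize whitespace and case so "True" and " true\n" count as the same vote
    normalized = [r.strip().lower() for r in responses if r.strip()]
    if not normalized:
        raise ValueError("no non-empty responses to vote on")
    answer, count = Counter(normalized).most_common(1)[0]
    return answer, count / len(normalized)


responses = ["True", "False", "False", "True", "False"]
print(majority_vote(responses))  # prints ('false', 0.6)
```

The vote share doubles as a rough confidence signal: a 3-to-2 split like this one is far less reassuring than 5-to-0. For a True/False question, sampling an odd number of responses avoids ties.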

# Least-to-Most Prompting

Least-to-Most prompting is a way to get the model to break a task down into a set of subtasks. Each subtask is then solved in order, with the prompts and solutions for the previous subtasks fed back into the model at each step. In essence, we want to solve the smallest individual problem first and gradually build our way up to the final answer. The example below demonstrates this two-step process.

In [18]:
problem = """Answer with True or False:
The numbers in the following list add up to an even number: [0, 1, 2, 42, 37, 59, 12]"""

prompt = f"""
What subproblems must be solved before solving the following task? Each subproblem must be a new task.
```
{problem}
```
"""

print("----------PROMPT----------")
print(prompt)
print("---------RESPONSE---------")
result = llm_call(model="gpt-4", prompt=prompt, parameters={"temperature": 0.0})
print(result)

----------PROMPT----------

What subproblems must be solved before solving the following task? Each subproblem must be a new task.
```
Answer with True or False:
The numbers in the following list add up to an even number: [0, 1, 2, 42, 37, 59, 12]
```

---------RESPONSE---------
1. Identify the numbers in the list.
2. Add all the numbers in the list together.
3. Determine if the sum is an even number.
4. Formulate a response in a True or False format.


Now we can take each of these tasks and solve them one by one.

In [34]:
new_prompt = f"""
Given the following context and a list of tasks, solve any tasks that do not already have a solution.
Context:
```
{problem}
```
Tasks:
"""

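# Split the decomposition into one task per line, skipping any blank lines
tasks = [line for line in result.split("\n") if line.strip()]

for task in tasks:
    # Append the next unsolved task to the running prompt
    new_prompt += "\n" + task
    print("----------PROMPT----------")
    print(new_prompt)
    print("---------RESPONSE---------")
    result = llm_call(model="gpt-4", prompt=new_prompt, parameters={"temperature": 0.0})
    print(result)
    # Feed the solution back in so that later tasks can build on it
    new_prompt += "\n" + result
    print("###" * 30)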

----------PROMPT----------

Given the following context and a list of tasks, solve any tasks that do not already have a solution.
Context:
```
Answer with True or False:
The numbers in the following list add up to an even number: [0, 1, 2, 42, 37, 59, 12]
```
Tasks:

1. Identify the numbers in the list.
---------RESPONSE---------
Solution: The numbers in the list are 0, 1, 2, 42, 37, 59, 12.
##########################################################################################
----------PROMPT----------

Given the following context and a list of tasks, solve any tasks that do not already have a solution.
Context:
```
Answer with True or False:
The numbers in the following list add up to an even number: [0, 1, 2, 42, 37, 59, 12]
```
Tasks:

1. Identify the numbers in the list.
Solution: The numbers in the list are 0, 1, 2, 42, 37, 59, 12.
2. Add all the numbers in the list together.
---------RESPONSE---------
Solution: The sum of all the numbers in the list is 153.
#################

While this took quite a few steps and quite a few calls to the model, we've managed to break down the problem into its constituent steps and then solve each step one at a time.
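The loop above relies on the decomposition coming back as one numbered task per line. Models do not always format their output that cleanly, so a slightly more defensive parser can help. The snippet below is a sketch under that assumption: `parse_subtasks` is a hypothetical helper, and the regular expression only recognises lines that start with a number followed by `.` or `)`.

```python
import re


def parse_subtasks(response):
    """Extract the text of each numbered line, ignoring everything else."""
    tasks = []
    for line in response.splitlines():
        # Match "1. task", "2) task", etc., with optional leading whitespace
        match = re.match(r"^\s*\d+[.)]\s+(.*\S)", line)
        if match:
            tasks.append(match.group(1))
    return tasks


# The decomposition returned by the model earlier in this section
decomposition = """1. Identify the numbers in the list.
2. Add all the numbers in the list together.
3. Determine if the sum is an even number.
4. Formulate a response in a True or False format."""

for number, task in enumerate(parse_subtasks(decomposition), start=1):
    print(f"{number}. {task}")
```

Lines that are blank or contain preamble such as "Here are the subproblems:" are skipped, so they never reach the model as a task.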

## Taking Reasoning Further

This notebook has shown you some popular approaches towards reasoning. Tomorrow, we'll talk about tools and agents which can be used to extend the functionality of LLMs. When used in conjunction with reasoning techniques, we obtain powerful frameworks like ReAct and Plan-and-Solve which can reason on a problem, determine next steps, use tools such as Google searches or Python CLIs, and more!

You can read more about this and other concepts in the [Georgian AI Library](https://github.com/georgian-io/GAL/blob/main/Prompting%20Tools%20%26%20Techniques/reasoning.md).