## Tree of Thoughts for problem solving with large language models

TL;DR: This blog post shows how to use "Tree of Thoughts", a tree-based reasoning framework, to solve the Game of 24 task with a large language model.

Tree of Thoughts (ToT) is a framework that lets LLMs solve complex reasoning problems. As in Chain of Thought, the reasoning process is split into intermediate steps called "thoughts", but ToT generates multiple candidate thoughts per step, which results in a tree-like structure. A search algorithm then explores this tree of thoughts.

### 1. Load Model

In this tutorial, we'll use two different LLMs: Mistral and GPT-4. 

We can use the Hugging Face ```transformers``` library to generate text with our LLMs. First, we log in to the Hugging Face Hub and import the necessary libraries.

In [1]:
from huggingface_hub import notebook_login
notebook_login()


In [2]:
from transformers import AutoTokenizer, AutoModelForCausalLM

We'll use the popular open-source language model Mistral-7B. We can load the model and the tokenizer as follows:



In [4]:
model_id = "mistralai/Mistral-7B-v0.3"
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)


Next, we create a function called ```mistral``` which we'll use to feed in our prompts and receive completions.

In [5]:
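def mistral(prompt):
    # Tokenize the prompt into PyTorch tensors
    inputs = tokenizer(prompt, return_tensors="pt")
    # Generate up to 100 new tokens after the prompt
    outputs = model.generate(**inputs, max_new_tokens=100)
    # outputs[0] holds the prompt tokens followed by the completion,
    # so the decoded string starts with the prompt itself
    return tokenizer.decode(outputs[0], skip_special_tokens=True)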

mistral("Hi! My name is ")

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


'Hi! My name is 👋\n\nI’m a software engineer and a designer. I’m currently working at Google as a software engineer.\n\nI’m passionate about building products that people love. I’m also passionate about building products that are accessible to everyone.\n\nI’m a big fan of open source software. I’ve contributed to a few open source projects, including React Native, React, and Redux.\n\nI’m also a big fan of the'
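As the output above shows, the string returned by ```mistral``` begins with the prompt itself. When only the newly generated text is needed, the prompt can be stripped off afterwards. The helper below is a minimal sketch of that idea; `strip_prompt` is a hypothetical name that is not part of this notebook, and it assumes the decoded text reproduces the prompt exactly at its start.

```python
def strip_prompt(full_text: str, prompt: str) -> str:
    """Return only the text that follows the prompt, if the prompt is echoed."""
    if full_text.startswith(prompt):
        return full_text[len(prompt):]
    # Fall back to the full text when the prompt is not an exact prefix
    return full_text


echoed = "Hi! My name is Alice."
print(strip_prompt(echoed, "Hi! My name is "))  # prints: Alice.
```

An alternative that avoids string matching is to decode only the tokens that come after the prompt, for example by slicing `outputs[0]` from `inputs["input_ids"].shape[1]` onwards before calling `tokenizer.decode`.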

Alternatively, we can use OpenAI's GPT-3.5/4 models. We can set up the client as follows:

In [None]:
import os
import openai
from openai import OpenAI
from constants import OPENAI_API_KEY

# Prefer the environment variable; fall back to the key in constants.py
api_key = os.getenv("OPENAI_API_KEY", OPENAI_API_KEY)

if api_key:
    openai.api_key = api_key
else:
    print("Warning: OPENAI_API_KEY is not set")

client = OpenAI(api_key=api_key)

In [None]:
def gpt(prompt, model="gpt-4", temperature=0.7, max_tokens=1000, n=1, stop=None) -> list:
    messages = [{"role": "user", "content": prompt}]

    # Request n completions for the same prompt in a single API call
    response = client.chat.completions.create(
        model=model,
        messages=messages,
        temperature=temperature,
        max_tokens=max_tokens,
        n=n,
        stop=stop,
    )

    # Collect the text of every returned choice
    return [choice.message.content for choice in response.choices]

### 2. Implementing Tree of Thoughts (ToT) 

ToT can be broken down into 4 key steps as described below. 

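(Step 0): Thought Decomposition
- The problem is broken down into intermediate steps ("thoughts"). For the Game of 24, each thought is one intermediate equation.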

(Step 1): Thought Generation
- In this step, the LLM is prompted to generate thoughts in one of two ways:
    - Sample: The thoughts are generated by sampling i.i.d. thoughts from a Chain of Thought prompt.
    - Propose: The thoughts are proposed sequentially within a single prompt, so each proposal can take the previous ones into account.

(Step 2): Thought Evaluation
- The LLM is prompted to evaluate the thoughts generated in the previous step, in one of two ways:
    - Value: Each thought is assigned a score individually.
    - Vote: All of the thoughts are compared together, and the LLM votes for the most promising one.

(Step 3): Search Algorithm
- The search algorithm is used to explore the thoughts generated in the previous steps:
    - Breadth first search
    - Depth first search


In this tutorial, we'll be using ToT with Mistral-7B-v0.3 to solve the Game of 24.

The Game of 24 is a task where, given a sequence of 4 numbers, we need to find the arithmetic operations (add, subtract, multiply, divide) that combine them into the number 24. For example, if the sequence is {4, 9, 10, 13}, one valid solution is (10 - 4) * (13 - 9) = 24. Each number in the sequence must be used exactly once.
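
To make the rules concrete, here is a small brute-force checker that does not involve an LLM at all. It is only a sketch for verifying whether a given sequence has a solution; `can_reach_24` and `_search` are hypothetical names that are not used elsewhere in this post. Like the ToT prompts, it repeatedly picks two of the remaining numbers and replaces them with the result of one operation.

```python
from fractions import Fraction
from itertools import combinations


def _search(values, target):
    # Base case: one number left, check whether it is the target
    if len(values) == 1:
        return values[0] == target
    # Pick any two remaining numbers and combine them with every operation
    for i, j in combinations(range(len(values)), 2):
        a, b = values[i], values[j]
        rest = [v for k, v in enumerate(values) if k not in (i, j)]
        results = {a + b, a - b, b - a, a * b}
        if b != 0:
            results.add(a / b)
        if a != 0:
            results.add(b / a)
        if any(_search(rest + [r], target) for r in results):
            return True
    return False


def can_reach_24(numbers, target=24):
    """Return True if the numbers can be combined with + - * / to hit target."""
    # Fractions keep divisions exact and avoid floating-point rounding
    return _search([Fraction(n) for n in numbers], Fraction(target))


print(can_reach_24([4, 9, 10, 13]))  # prints: True
print(can_reach_24([1, 1, 1, 1]))    # prints: False
```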

In this tutorial, we'll be using 'Propose' for Thought Generation, 'Value' for Thought Evaluation and 'Breadth-first search' for the search algorithm.

#### 2.0 Thought Decomposition

We define the prompts necessary for the ToT framework. Each prompt is used at a different stage of the ToT process. The generate_prompt guides the LLM to propose possible next steps from a given point in the problem.

The value_prompt assigns a classification (sure/likely/impossible) to each thought, depending on how likely it is to reach the number 24 given the current sequence of thoughts.

In [6]:
# Prompts

# 5-shot
standard_prompt = '''Use numbers and basic arithmetic operations (+ - * /) to obtain 24.
Input: 4 4 6 8
Answer: (4 + 8) * (6 - 4) = 24
Input: 2 9 10 12
Answer: 2 * 12 * (10 - 9) = 24
Input: 4 9 10 13
Answer: (13 - 9) * (10 - 4) = 24
Input: 1 4 8 8
Answer: (8 / 4 + 1) * 8 = 24
Input: 5 5 5 9
Answer: 5 + 5 + 5 + 9 = 24
Input: {input}
'''


# PROMPTS FOR THOUGHT GENERATION
cot_prompt = '''Use numbers and basic arithmetic operations (+ - * /) to obtain 24. Each step, you are only allowed to choose two of the remaining numbers to obtain a new number.
Input: 4 4 6 8
Steps:
4 + 8 = 12 (left: 4 6 12)
6 - 4 = 2 (left: 2 12)
2 * 12 = 24 (left: 24)
Answer: (6 - 4) * (4 + 8) = 24
Input: 2 9 10 12
Steps:
12 * 2 = 24 (left: 9 10 24)
10 - 9 = 1 (left: 1 24)
24 * 1 = 24 (left: 24)
Answer: (12 * 2) * (10 - 9) = 24
Input: 4 9 10 13
Steps:
13 - 10 = 3 (left: 3 4 9)
9 - 3 = 6 (left: 4 6)
4 * 6 = 24 (left: 24)
Answer: 4 * (9 - (13 - 10)) = 24
Input: 1 4 8 8
Steps:
8 / 4 = 2 (left: 1 2 8)
1 + 2 = 3 (left: 3 8)
3 * 8 = 24 (left: 24)
Answer: (1 + 8 / 4) * 8 = 24
Input: 5 5 5 9
Steps:
5 + 5 = 10 (left: 5 9 10)
10 + 5 = 15 (left: 9 15)
15 + 9 = 24 (left: 24)
Answer: ((5 + 5) + 5) + 9 = 24
Input: {input}
'''

generate_prompt = '''Input: 2 8 8 14
Possible next steps:
2 + 8 = 10 (left: 8 10 14)
8 / 2 = 4 (left: 4 8 14)
14 + 2 = 16 (left: 8 8 16)
2 * 8 = 16 (left: 8 14 16)
8 - 2 = 6 (left: 6 8 14)
14 - 8 = 6 (left: 2 6 8)
14 /  2 = 7 (left: 7 8 8)
14 - 2 = 12 (left: 8 8 12)
Input: {input}
Possible next steps:
'''

# PROMPTS FOR THOUGHT EVALUATION

value_prompt = '''Evaluate if given numbers can reach 24 (sure/likely/impossible)
10 14
10 + 14 = 24
sure
11 12
11 + 12 = 23
12 - 11 = 1
11 * 12 = 132
11 / 12 = 0.91
impossible
4 4 10
4 + 4 + 10 = 8 + 10 = 18
4 * 10 - 4 = 40 - 4 = 36
(10 - 4) * 4 = 6 * 4 = 24
sure
4 9 11
9 + 11 + 4 = 20 + 4 = 24
sure
5 7 8
5 + 7 + 8 = 12 + 8 = 20
(8 - 5) * 7 = 3 * 7 = 21
I cannot obtain 24 now, but numbers are within a reasonable range
likely
5 6 6
5 + 6 + 6 = 17
(6 - 5) * 6 = 1 * 6 = 6
I cannot obtain 24 now, but numbers are within a reasonable range
likely
10 10 11
10 + 10 + 11 = 31
(11 - 10) * 10 = 10
10 10 10 are all too big
impossible
1 3 3
1 * 3 * 3 = 9
(1 + 3) * 3 = 12
1 3 3 are all too small
impossible
{input}
'''

value_last_step_prompt = '''Use numbers and basic arithmetic operations (+ - * /) to obtain 24. Given an input and an answer, give a judgement (sure/impossible) if the answer is correct, i.e. it uses each input exactly once and no other numbers, and reach 24.
Input: 4 4 6 8
Answer: (4 + 8) * (6 - 4) = 24
Judge: 
sure
Input: 2 9 10 12
Answer: 2 * 12 * (10 - 9) = 24
Judge: 
sure
Input: 4 9 10 13
Answer: (13 - 9) * (10 - 4) = 24
Judge: 
sure
Input: 4 4 6 8
Answer: (4 + 8) * (6 - 4) + 1 = 25
Judge: 
impossible
Input: 2 9 10 12
Answer: 2 * (12 - 10) = 24
Judge: 
impossible
Input: 4 9 10 13
Answer: (13 - 4) * (10 - 9) = 24
Judge: 
impossible
Input: {input}
Answer: {answer}
Judge:'''
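
Each of the prompts above is a template whose `{input}` (and `{answer}`) placeholders are filled in with `str.format` before the prompt is sent to the LLM. The short example below illustrates this with `short_generate_prompt`, a shortened, hypothetical stand-in for the generation prompt that keeps a single example line.

```python
# Shortened, hypothetical stand-in for generate_prompt (one example line only)
short_generate_prompt = '''Input: 2 8 8 14
Possible next steps:
2 + 8 = 10 (left: 8 10 14)
Input: {input}
Possible next steps:
'''

# Fill the placeholder with the numbers that are still available
filled = short_generate_prompt.format(input="4 5 6 10")
print(filled.splitlines()[-2])  # prints: Input: 4 5 6 10
```

Because `str.format` treats every pair of curly braces as a placeholder, the templates should not contain other literal braces unless they are doubled.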

Next, we'll start implementing our ToT algorithm. We'll define a function for each core part of the ToT algorithm.



#### 2.1 Thought Generation

First, we'll define functions necessary for "Thought Generation".


In [7]:
def get_current_numbers(y: str) -> str:

    last_line = y.strip().split('\n')[-1]
    return last_line.split('left: ')[-1].split(')')[0]

def prepare_generate_prompt(current_numbers, thought, data):
    
    if current_numbers == '24':
        prompt = cot_prompt.format(input=data) + 'Steps: ' + thought
    else:
        prompt = generate_prompt.format(input=current_numbers)

    return prompt


def generate_thoughts(data, thoughts):

    new_thoughts = []
    
    for thought in thoughts:

        # Prepare prompt
        current_numbers = get_current_numbers(thought if thought else data)
        prompt = prepare_generate_prompt(current_numbers, thought, data)
        
        # Generate thoughts with prompt
        proposals = mistral(prompt).split('\n')
        #proposals = gpt(prompt, n=1, stop=None)[0].split('\n')
        new_thoughts.extend([thought + _ + '\n' for _ in proposals])

    return new_thoughts
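
Splitting the raw completion on newlines treats every line as a thought, including empty lines, truncated lines, and any prompt lines that the model echoes back. One possible safeguard is to keep only lines that look like a complete proposal. The sketch below is an assumption about how such a filter could look; `STEP_PATTERN` and `clean_proposals` are hypothetical names, and the pattern only checks the shape of a line, not whether the arithmetic is correct.

```python
import re
from typing import List

# Matches lines such as "2 + 8 = 10 (left: 8 10 14)"
STEP_PATTERN = re.compile(
    r"^\s*-?[\d.]+\s*[-+*/]\s*-?[\d.]+\s*=\s*-?[\d.]+\s*\(left: [^)]+\)\s*$"
)


def clean_proposals(completion: str) -> List[str]:
    """Keep only lines that look like a complete proposal step."""
    return [line.strip() for line in completion.split("\n") if STEP_PATTERN.match(line)]


# Hypothetical completion that mixes prompt lines, valid steps and a cut-off line
raw = (
    "Input: 4 5 6 10\n"
    "Possible next steps:\n"
    "4 + 5 = 9 (left: 6 9 10)\n"
    "10 - 4 = 6 (left: 5 6 6)\n"
    "4 * \n"
)
print(clean_proposals(raw))
# prints: ['4 + 5 = 9 (left: 6 9 10)', '10 - 4 = 6 (left: 5 6 6)']
```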

#### 2.2 Thought Evaluation

Next, we'll create the functions necessary for "Thought Evaluation", where each thought is evaluated by the LLM.

First, we create the ```prepare_evaluate_prompt``` function which turns our current thought into an evaluation prompt by using the ```value_prompt``` from above.

In [8]:
def prepare_evaluate_prompt(data: str, thought: str) -> str:
    last_line = thought.strip().split('\n')[-1]
    if 'left: ' not in last_line:
        ans = last_line.lower().replace('answer: ', '')
        return value_last_step_prompt.format(input=data, answer=ans) 
    current_numbers = get_current_numbers(thought)
    return value_prompt.format(input=current_numbers)

We then create the ```evaluate_outputs_unwrap``` function, which converts the labels (sure/likely/impossible) returned for a thought into a single numeric score.

In [9]:
def evaluate_outputs_unwrap(thought: str, evaluate_outputs: list) -> float:
    if len(thought.strip().split('\n')) == 4 and 'answer' not in thought.lower():
        return 0
    value_names = [_.split('\n')[-1] for _ in evaluate_outputs]
    value_map = {'impossible': 0.001, 'likely': 1, 'sure': 20}
    score = sum(value * value_names.count(name) for name, value in value_map.items())
    return score
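
To see how the scoring works, consider the standalone sketch below. It mirrors the label-counting logic with a hypothetical function `score_outputs`; as an assumption that differs from the function above, it also strips trailing whitespace before reading the last line. One output of each label gives 20 + 1 + 0.001 = 21.001.

```python
from typing import List

VALUE_MAP = {"impossible": 0.001, "likely": 1, "sure": 20}


def score_outputs(outputs: List[str]) -> float:
    """Sum the value of the label found on the last line of each output."""
    # Assumption: strip trailing whitespace so a final newline does not hide the label
    labels = [output.strip().split("\n")[-1] for output in outputs]
    return sum(value * labels.count(name) for name, value in VALUE_MAP.items())


samples = [
    "10 + 14 = 24\nsure",
    "numbers are within a reasonable range\nlikely",
    "too big\nimpossible",
]
print(score_outputs(samples))  # one "sure", one "likely", one "impossible"

# Pitfall: a bare string is iterated character by character, so no label matches
print(score_outputs("sure"))  # prints: 0.0
```

The second call shows why the evaluator outputs must be passed as a list of strings: a single string yields a score of zero regardless of its content.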

And finally, we wrap the above functions into the ```evaluate_thoughts``` function.

In [17]:
def evaluate_thoughts(data, thoughts, n_evaluate_sample):
    scores = []
    for thought in thoughts:
        evaluate_prompt = prepare_evaluate_prompt(data, thought)
        #evaluate_outputs = gpt(evaluate_prompt, n=n_evaluate_sample, stop=None)
        # mistral returns a single string, but evaluate_outputs_unwrap expects
        # a list of outputs, so wrap it in a list (n_evaluate_sample is only
        # used by the GPT backend)
        evaluate_outputs = [mistral(evaluate_prompt)]
        score = evaluate_outputs_unwrap(thought, evaluate_outputs)
        scores.append(score)
    return scores

#### 2.3 Search algorithm

Finally, we'll implement the "Search Algorithm" which will be used to search through the thoughts generated by the LLM.

In [11]:
def search_algorithm(new_thoughts, ids, scores, n_select_sample):
    selected_ids = sorted(ids, key=lambda x: scores[x], reverse=True)[:n_select_sample] # Take top n_select_sample from list based on scores
    select_new_thoughts = [new_thoughts[select_id] for select_id in selected_ids]  
    return select_new_thoughts
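
The selection step keeps the `n_select_sample` thoughts with the highest scores. The standalone sketch below shows the same idea with a hypothetical function `select_top_k`, including what happens when several thoughts share the same score.

```python
from typing import List


def select_top_k(thoughts: List[str], scores: List[float], k: int) -> List[str]:
    """Return the k thoughts with the highest scores, best first."""
    ids = range(len(thoughts))
    # sorted() is stable, so thoughts with equal scores keep their original order
    ranked = sorted(ids, key=lambda i: scores[i], reverse=True)
    return [thoughts[i] for i in ranked[:k]]


print(select_top_k(["a", "b", "c"], [0.5, 20, 1], k=2))     # prints: ['b', 'c']
print(select_top_k(["a", "b", "c"], [0.0, 0.0, 0.0], k=2))  # prints: ['a', 'b']
```

When all scores are tied, the selection simply returns the first `k` thoughts in their original order.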

### 3. Run ToT with sample data

We'll test our implementation with some sample data, i.e., the sequence 4 5 6 10. We'll combine the functions from above. If ToT works successfully, it should output the operations that can be performed to reach 24.

In [12]:
data = '4 5 6 10'

In [18]:
from functools import partial


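def solve(model, temperature, n_evaluate_sample, n_select_sample):

    # `model` and `temperature` only take effect with the GPT backend.
    # To use it, uncomment the two lines below and the gpt(...) calls above.
    #global gpt
    #gpt = partial(gpt, model=model, temperature=temperature)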
    
    thoughts = ['']
    data = '4 5 6 10'

    steps = 1

    for step in range(steps):

        print('Step Number ::', step)
        print('(Step 0) Current Thoughts: ', thoughts)

        # Step 1: Thought Generation
        new_thoughts = generate_thoughts(data, thoughts)
        ids = list(range(len(new_thoughts)))

        print('(Step 1) New Thoughts: ', new_thoughts)
        print('(Step 1) ids ', ids)

        # Step 2: Thought Evaluation
        scores = evaluate_thoughts(data, new_thoughts, n_evaluate_sample)
        print('(Step 2) Scores: ', scores)

        # Step 3: Search algorithm
        
        selected_new_thoughts = search_algorithm(new_thoughts, ids, scores, n_select_sample) 
        print('(Step 3) Selected new thoughts: ', selected_new_thoughts)

        thoughts = selected_new_thoughts

        print('--------')

    return thoughts

In [19]:
solve(model='gpt-3.5-turbo', temperature=0.7, n_evaluate_sample=3, n_select_sample=5)

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


Step Number :: 0
(Step 0) Current Thoughts:  ['']


Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


(Step 1) New Thoughts:  ['Input: 2 8 8 14\n', 'Possible next steps:\n', '2 + 8 = 10 (left: 8 10 14)\n', '8 / 2 = 4 (left: 4 8 14)\n', '14 + 2 = 16 (left: 8 8 16)\n', '2 * 8 = 16 (left: 8 14 16)\n', '8 - 2 = 6 (left: 6 8 14)\n', '14 - 8 = 6 (left: 2 6 8)\n', '14 /  2 = 7 (left: 7 8 8)\n', '14 - 2 = 12 (left: 8 8 12)\n', 'Input: 4 5 6 10\n', 'Possible next steps:\n', '4 + 5 = 9 (left: 5 6 10)\n', '5 / 4 = 1 (left: 1 5 6)\n', '6 + 5 = 11 (left: 5 6 11)\n', '6 - 5 = 1 (left: 1 6 11)\n', '10 + 4 = 14 (left: 4 6 14)\n', '4 * \n']
(Step 1) ids  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17]



(Step 2) Scores:  [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
(Step 3) Selected new thoughts:  ['Input: 2 8 8 14\n', 'Possible next steps:\n', '2 + 8 = 10 (left: 8 10 14)\n', '8 / 2 = 4 (left: 4 8 14)\n', '14 + 2 = 16 (left: 8 8 16)\n']
--------


['Input: 2 8 8 14\n',
 'Possible next steps:\n',
 '2 + 8 = 10 (left: 8 10 14)\n',
 '8 / 2 = 4 (left: 4 8 14)\n',
 '14 + 2 = 16 (left: 8 8 16)\n']