# ReasoningAgent - Advanced LLM Reasoning with Multiple Search Strategies

## Introduction

The `ReasoningAgent` is designed to enhance language models' reasoning capabilities through systematic exploration of thought processes. By implementing the Tree of Thoughts (ToT) framework, it enables LLMs like GPT-4 and Llama to break down complex problems into manageable steps and explore multiple solution paths simultaneously.

This notebook demonstrates the key features and capabilities of the `ReasoningAgent`, showing how it can effectively reason about problems even when using smaller models like `gpt-4o-mini`.

## Search Strategies

The `ReasoningAgent` supports multiple search strategies for exploring the reasoning space:

### 1. Beam Search (Default)
- Maintains the top `k` most promising paths at each step
- Efficient for problems with clear evaluation criteria
- Configurable beam width to balance exploration vs computation
- Special case: DFS mode (beam size = 1) for linear reasoning similar to Chain-of-Thought

### 2. Monte Carlo Tree Search (MCTS)
- Balances exploration and exploitation using the UCT (Upper Confidence bounds applied to Trees) formula
- Particularly effective for problems with delayed rewards
- Stochastic exploration helps avoid local optima
- Configurable number of simulations and exploration constant
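For reference, the textbook UCT score (UCB1 applied to trees) that MCTS uses to choose which child to descend into is shown below. Here \(W_i\) is the total reward backed up through child \(i\), \(n_i\) is its visit count, \(N\) is the parent's visit count, and \(c\) is the exploration constant. The exact variant and default constant used inside `ReasoningAgent` may differ.

\[
\mathrm{UCT}_i = \frac{W_i}{n_i} + c \sqrt{\frac{\ln N}{n_i}}
\]

The first term favors children that have paid off so far (exploitation); the second grows for rarely visited children (exploration).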

### 3. Language Agent Tree Search (LATS)
- Feeds reflections on earlier trajectories into the prompt before the next simulation
- Helps identify poor reasoning paths early for future improvement
- Especially useful for complex multi-step reasoning

## Core Components

1. **Thinker Agent**: Generates potential next steps in the reasoning process
2. **Grader Agent**: Evaluates the quality of each reasoning step
3. **Code Execution**: A child user agent can execute code automatically during reasoning (disabled by default)
4. **Tree Structure**: Organizes thoughts hierarchically for systematic exploration
5. **Visualization Tools**: Built-in Graphviz support for analyzing reasoning paths
6. **Logging Features**: Log and save thinking trajectories to fine-tune the language model
7. **Configuration Options**: The agent is highly configurable through a single `reason_config` dictionary
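The `reason_config` dictionaries used in the cells below are collected here for comparison. The helper `normalize_reason_config` is hypothetical (it is not part of autogen); it only illustrates the equivalence noted in the DFS cell, where `method="dfs"` behaves like beam search with `beam_size=1`.

```python
# reason_config values copied from the cells in this notebook.
REASON_CONFIGS = {
    "dfs": {"method": "dfs", "max_depth": 3},
    "beam_search": {"method": "beam_search", "beam_size": 3, "max_depth": 3},
    "mcts": {"method": "mcts", "nsim": 5, "max_depth": 4},
    "lats": {"method": "lats", "nsim": 5, "max_depth": 4},
}


def normalize_reason_config(cfg: dict) -> dict:
    """Hypothetical helper: rewrite DFS as beam search with beam_size=1."""
    cfg = dict(cfg)  # copy so the caller's dict is not mutated
    if cfg.get("method") == "dfs":
        cfg["method"] = "beam_search"
        cfg["beam_size"] = 1
    return cfg


for name, cfg in REASON_CONFIGS.items():
    print(name, "->", normalize_reason_config(cfg))
```

Only the DFS entry changes; the other configurations pass through untouched.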

In [1]:
import os
import random

from autogen import AssistantAgent, ReasoningAgent, ThinkNode, UserProxyAgent

api_key = os.environ.get("OPENAI_API_KEY")

config_list = [{"model": "gpt-4o-mini", "api_key": api_key}]

question = "What is the expected maximum dice value if you can roll a 6-sided dice three times?"
random.seed(1)  # set the seed for reproducibility

In [2]:
def last_meaningful_msg(sender, recipient, summary_args):
    """Return the most recent non-empty message, with "TERMINATE" removed, as the chat summary."""
    import warnings

    if sender == recipient:
        return "TERMINATE"

    summary = ""
    chat_messages = recipient.chat_messages[sender]

    # Walk backwards so the most recent non-empty message wins.
    for msg in reversed(chat_messages):
        try:
            content = msg["content"]
            if isinstance(content, str):
                summary = content.replace("TERMINATE", "")
            elif isinstance(content, list):
                # Remove the `TERMINATE` word from each text part of a multimodal message.
                summary = "\n".join(
                    x["text"].replace("TERMINATE", "") for x in content if isinstance(x, dict) and "text" in x
                )
            if summary.strip():
                return summary
        except (IndexError, KeyError, AttributeError) as e:
            warnings.warn(f"Cannot extract summary using last_msg: {e}. Using an empty str as summary.", UserWarning)
    return summary

## Chain-of-Thought Reasoning with DFS

The simplest form of tree-based reasoning uses depth-first search (DFS) to follow a single path, similar to the step-by-step reasoning of OpenAI's o1 models.
By setting `method="dfs"` in the `reason_config`, the agent will:
1. Generate one reasoning step at a time
2. Follow that single path until reaching a conclusion
3. Never explore alternative branches

Note: The effectiveness depends on the underlying model's training. Models not specifically trained for step-by-step reasoning
may show limited improvement with this approach.

In [3]:
reason_agent = ReasoningAgent(
    name="reason_agent",
    system_message="answer math questions",
    llm_config={"config_list": config_list},
    reason_config={"method": "dfs", "max_depth": 3},  # Using DFS
    silent=False,
    # NOTE: this is equivalent to beam search with beam_size=1 (o1-style reasoning):
    # reason_config={"method": "beam_search", "beam_size": 1, "max_depth": 3},
)
user_proxy = UserProxyAgent(
    name="user_proxy",
    human_input_mode="NEVER",
    code_execution_config=False,
    max_consecutive_auto_reply=10,
)

In [4]:
ans = user_proxy.initiate_chat(reason_agent, message=question, summary_method=last_meaningful_msg)

[33muser_proxy[0m (to reason_agent):

What is the expected maximum dice value if you can roll a 6-sided dice three times?

--------------------------------------------------------------------------------
[33mreason_agent[0m (to tot_thinker):

# Question:
What is the expected maximum dice value if you can roll a 6-sided dice three times?
---

---
What are the possible next steps?

--------------------------------------------------------------------------------
[33mtot_thinker[0m (to reason_agent):

REFLECTION:
The previous steps have not been clearly outlined, but the question indicates an inquiry about the expected maximum value from rolling a 6-sided die three times. A mistake could have been made if the previous steps failed to account for calculating the maximum value across multiple rolls rather than just the average or a single roll. The focus should be on understanding how to derive that expected maximum value correctly.

**Possible Options:**
Option 1: Calculate the probab

In [5]:
reason_agent._root.children

[Calculate the probability distribution for the maximum value of three rolls of a 6-sided die and derive the expected value. -> Depth: 1 Value: 0.8888888888888888 Visits: 0,
 Introduce the concept of order statistics and explain how the maximum of multiple independent rolls can be calculated. -> Depth: 1 Value: 0.7777777777777778 Visits: 0,
 Reassess the earlier calculations or assumptions, ensuring they correctly factor in the multiple rolls and the concept of maximizing results. -> Depth: 1 Value: 0.6666666666666666 Visits: 0,
 Simplify the problem by simulating the process of rolling a die three times and observing the values to estimate the expected maximum. -> Depth: 1 Value: 0.6666666666666666 Visits: 0,
 TERMINATE - The question can be resolved by applying the correct statistical methods from available knowledge. -> Depth: 1 Value: 0.3333333333333333 Visits: 0]

In [6]:
print(ans.summary)

To find the expected maximum value when rolling a 6-sided die three times, we can follow the outlined steps.

### Step 1: Probability Calculation
When we roll a 6-sided die, the possible outcomes for one roll are {1, 2, 3, 4, 5, 6}. If we denote \(X\) as the maximum value of three rolls, we can calculate the probability for each possible maximum value:

- **Maximum = 1**: This can only happen if all rolls are 1. The probability is \((1/6)^3 = 1/216\).
- **Maximum = 2**: This occurs if at least one roll is 2 and all are less than or equal to 2. The number of configurations where this happens can be calculated as:
  \[
  P(X \leq 2) = P(all \leq 2) = (2/6)^3 = 8/216
  \]
  Thus, the probability that the maximum is exactly 2 is \(P(X = 2) = P(X \leq 2) - P(X \leq 1) = \frac{8}{216} - \frac{1}{216} = \frac{7}{216}\).
  
- **Maximum = 3**: Similarly, this can be computed as follows:
  \[
  P(X \leq 3) = (3/6)^3 = 27/216
  \]
  From here, \(P(X = 3) = P(X \leq 3) - P(X \leq 2) = \frac{27}{21
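The summary above is cut off mid-derivation. Carrying the same approach through with \(P(X \leq k) = (k/6)^3\) gives the exact answer (this completion is ours, not the agent's output):

\[
P(X = k) = \left(\frac{k}{6}\right)^3 - \left(\frac{k-1}{6}\right)^3 = \frac{3k^2 - 3k + 1}{216}, \qquad k = 1, \dots, 6
\]

so the probabilities are \(\frac{1}{216}, \frac{7}{216}, \frac{19}{216}, \frac{37}{216}, \frac{61}{216}, \frac{91}{216}\), which sum to 1, and

\[
E[X] = \sum_{k=1}^{6} k \, P(X = k) = \frac{1 + 14 + 57 + 148 + 305 + 546}{216} = \frac{1071}{216} = \frac{119}{24} \approx 4.9583.
\]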

## Beam Search in Tree of Thought

Beam search lets the agent explore several reasoning paths at once. With `beam_size` greater than 1, the agent keeps several candidate trajectories at each step, has the grader score them, and expands only the top `beam_size` candidates. This is useful when the solution space is large: the most promising paths receive the computation, while a few alternatives stay in play in case an early leader turns out to be a dead end.

Because the agent compares several trajectories side by side before committing to one, it is less likely to be locked into a single early mistake. This matters most when intermediate evaluations strongly determine the final outcome.
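A generic beam search over a hypothetical toy tree shows why a beam wider than 1 helps. The tree, the names `TREE`/`SCORES`, and the "score the latest step" grader are illustrative assumptions, not `ReasoningAgent` internals: the first step "A" looks best, but the best complete path goes through "B".

```python
import heapq

# Hypothetical toy tree: each step maps to its possible next steps, and SCORES
# stands in for the grader's rating of the most recent step.
TREE = {"Q": ["A", "B"], "A": ["A1", "A2"], "B": ["B1", "B2"]}
SCORES = {"A": 0.9, "B": 0.6, "A1": 0.3, "A2": 0.4, "B1": 0.95, "B2": 0.5}


def expand(path):
    """Candidate next steps after the last step in `path`."""
    return TREE.get(path[-1], [])


def score(path):
    """Toy grader: rate a path by its most recent step."""
    return SCORES.get(path[-1], 0.0)


def beam_search(root, expand, score, beam_size, max_depth):
    """Keep only the `beam_size` best partial paths at each depth."""
    beam = [[root]]
    for _ in range(max_depth):
        candidates = []
        for path in beam:
            steps = expand(path)
            if steps:
                candidates.extend(path + [step] for step in steps)
            else:
                candidates.append(path)  # finished paths compete with extended ones
        beam = heapq.nlargest(beam_size, candidates, key=score)
    return max(beam, key=score)


print(beam_search("Q", expand, score, beam_size=1, max_depth=2))  # ['Q', 'A', 'A2']
print(beam_search("Q", expand, score, beam_size=2, max_depth=2))  # ['Q', 'B', 'B1']
```

With `beam_size=1` the search commits greedily to "A" and ends at a mediocre leaf, which is the DFS special case; with `beam_size=2` the "B" branch survives the first step and reaches the best leaf.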

In [7]:
reason_agent = ReasoningAgent(
    name="reason_agent",
    llm_config={"config_list": config_list},
    reason_config={"method": "beam_search", "beam_size": 3, "max_depth": 3},
)
user_proxy = UserProxyAgent(
    name="user_proxy",
    human_input_mode="NEVER",
    code_execution_config={"use_docker": False},
    max_consecutive_auto_reply=10,
)

In [8]:
ans = user_proxy.initiate_chat(
    reason_agent,
    message="Design a mixed integer linear program for a coffee roasting supply chain",
    summary_method=last_meaningful_msg,
)

[33muser_proxy[0m (to reason_agent):

Design a mixed integer linear program for a coffee roasting supply chain

--------------------------------------------------------------------------------
[33mreason_agent[0m (to tot_thinker):

# Question:
Design a mixed integer linear program for a coffee roasting supply chain
---

---
What are the possible next steps?

--------------------------------------------------------------------------------
[33mtot_thinker[0m (to reason_agent):

REFLECTION:
The previous steps did not provide any detail regarding the design elements or constraints necessary for a mixed integer linear program (MILP) tailored specifically for a coffee roasting supply chain. There is a lack of clarity on what aspects (e.g., production, transportation, inventory) the user wishes to include in the MILP. Thus, the next steps should focus on narrowing down the requirements and addressing any potential misunderstandings regarding the specifics of the MILP.

**Possible Option

In [9]:
print(ans.summary)

To design a mixed integer linear program (MILP) for a coffee roasting supply chain, we can follow a structured approach. Let’s create a detailed outline based on the possibilities you've provided, focusing on key components, decision variables, objective functions, and constraints.

### Step 1: Define Key Components
1. **Suppliers**: Different sources of coffee beans, each with a specific supply limit and cost per unit.
2. **Roasting Facilities**: Locations where coffee beans are roasted. Each facility may have a different roasting capacity and cost.
3. **Distribution Centers**: Warehouses that handle the transfer of roasted coffee to retail outlets. They have operational costs and capacity constraints.
4. **Retail Outlets**: The final customers or stores where roasted coffee is sold. Each outlet has a specific demand for coffee.

### Step 2: Identify Decision Variables
1. **x_ij**: Amount of coffee sourced from supplier i to roasting facility j.
2. **y_jk**: Amount of roasted coffee t

## MCTS
This section demonstrates how to use Monte Carlo Tree Search (MCTS) with ReasoningAgent for complex reasoning tasks. MCTS provides several advantages over beam search when:

1. Ground truth evaluation is available
2. LLM-based evaluation is expensive
3. You want to generate diverse, high-quality training data
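To make the select/expand/simulate/backpropagate loop concrete, and to show why a ground-truth check makes a cheap reward, here is a generic MCTS sketch on a hypothetical toy task: choose three actions, with a reward equal to the fraction that match a known answer `TARGET`. It uses random rollouts and the UCT rule; it is not `ReasoningAgent`'s implementation, where steps are generated by the LLM and rewards come from the grader.

```python
import math
import random

# Hypothetical toy task: choose 3 actions from ACTIONS; the "ground truth" is TARGET.
ACTIONS = (0, 1, 2)
TARGET = (2, 0, 1)


def reward(path):
    """Ground-truth reward: fraction of positions where `path` matches TARGET."""
    return sum(a == b for a, b in zip(path, TARGET)) / len(TARGET)


class Node:
    def __init__(self, path, parent=None):
        self.path = path      # tuple of actions taken so far
        self.parent = parent
        self.children = {}    # action -> Node
        self.visits = 0
        self.value = 0.0      # sum of rewards backed up through this node


def uct(child, parent_visits, c):
    if child.visits == 0:
        return math.inf  # try every child at least once
    exploit = child.value / child.visits
    explore = c * math.sqrt(math.log(parent_visits) / child.visits)
    return exploit + explore


def mcts(nsim, depth=len(TARGET), c=1.4, seed=0):
    rng = random.Random(seed)
    root = Node(path=())
    for _ in range(nsim):
        node = root
        # 1. Selection + expansion: descend by UCT, creating children on first visit.
        while len(node.path) < depth:
            for a in ACTIONS:
                if a not in node.children:
                    node.children[a] = Node(node.path + (a,), parent=node)
            parent_visits = max(node.visits, 1)
            node = max(node.children.values(), key=lambda ch: uct(ch, parent_visits, c))
            if node.visits == 0:
                break  # a brand-new node: evaluate it with a rollout
        # 2. Simulation: complete the path with random actions, then score it.
        path = list(node.path)
        while len(path) < depth:
            path.append(rng.choice(ACTIONS))
        r = reward(tuple(path))
        # 3. Backpropagation: update statistics from this node up to the root.
        while node is not None:
            node.visits += 1
            node.value += r
            node = node.parent
    return root


root = mcts(nsim=200)
best_first_step = max(root.children, key=lambda a: root.children[a].visits)
print("most visited first step:", best_first_step)
```

Because the reward here is a cheap comparison against a known answer, running hundreds of simulations costs almost nothing, which is the situation points 1 and 2 describe.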

In [10]:
mcts_agent = ReasoningAgent(
    name="mcts_agent",
    system_message="answer math questions",
    llm_config={"config_list": config_list},
    # Use a small depth and few simulations to keep the example short.
    reason_config={"method": "mcts", "nsim": 5, "max_depth": 4},
)


user_proxy = UserProxyAgent(
    name="user_proxy",
    human_input_mode="NEVER",
    code_execution_config=False,
    max_consecutive_auto_reply=10,
)

In [11]:
ans = user_proxy.initiate_chat(mcts_agent, message=question, summary_method=last_meaningful_msg)

[33muser_proxy[0m (to mcts_agent):

What is the expected maximum dice value if you can roll a 6-sided dice three times?

--------------------------------------------------------------------------------
[33mmcts_agent[0m (to tot_thinker):

# Question:
What is the expected maximum dice value if you can roll a 6-sided dice three times?
---

---
What are the possible next steps?

--------------------------------------------------------------------------------
[33mtot_thinker[0m (to mcts_agent):

REFLECTION:
The previous steps do not provide any context or specific previous actions taken. However, the user has posed a mathematical question related to probability involving a six-sided die. The user seeks to understand the expected maximum value from rolling a die three times, which suggests that a probabilistic approach should be used. It’s important to ensure the calculations are accurate and centered on the maximum value from the rolls.

**Possible Options:**
Option 1: Calculate the 

In [12]:
print(ans.summary)

The expected maximum value when rolling a 6-sided die three times can be calculated theoretically or through simulation. Here’s a summary of how to derive the expected maximum value:

### Theoretical Calculation

1. **Understand the Problem**: We want to find the expected maximum value, \( E[\text{max}] \), from rolling a 6-sided die three times.

2. **Define Probabilities**: For each maximum value \( k \) (from 1 to 6), the probability that the maximum is \( k \) can be calculated:
   \[
   P(\text{max} = k) = P(\text{max} \leq k) - P(\text{max} < k) = \left( \frac{k}{6} \right)^3 - \left( \frac{k-1}{6} \right)^3
   \]

3. **Calculate Each Probability**:
   - For \( k = 1 \):
     \[
     P(\text{max} = 1) = \left( \frac{1}{6} \right)^3 = \frac{1}{216} 
     \]
   - For \( k = 2 \):
     \[
     P(\text{max} = 2) = \left( \frac{2}{6} \right)^3 - \left( \frac{1}{6} \right)^3 = \frac{8}{216} - \frac{1}{216} = \frac{7}{216}
     \]
   - For \( k = 3 \):
     \[
     P(\text{max} = 3) = \

## LATS

Note that our reasoning agent evaluates the reasoning *process* and has no direct access to an environment, whereas the LATS approach relies on feedback from the environment. To bridge this gap, we use the existing grader agent to generate pseudo-rewards and feedback. The main difference between our LATS and MCTS implementations is that LATS incorporates the reflections into the prompt context before the next round of simulation. You can define an agent that uses LATS as follows.
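The logged thinker prompts below begin with a "previous trajectories and reflections" block, which is empty on the first round. A minimal sketch of how such a prompt could be assembled follows; `build_lats_prompt` is hypothetical, and its layout only loosely mirrors the logs rather than reproducing the library's template.

```python
def build_lats_prompt(question: str, reflections: list[str]) -> str:
    """Hypothetical: prepend earlier trajectories/reflections to the thinker prompt."""
    history = "\n\n".join(reflections)  # empty on the first simulation round
    return (
        "## Here are some previous trajectories and reflections\n\n"
        f"{history}\n\n---\n\n"
        f"# Question:\n{question}\n---\n\n---\n"
        "What are the possible next steps?"
    )


print(build_lats_prompt(
    "What is the expected maximum dice value if you can roll a 6-sided dice three times?",
    ["Trajectory: averaged a single roll. Reflection: compute the distribution of the maximum instead."],
))
```

Placing the reflections before the question means every new round of expansion sees what went wrong earlier, which is what distinguishes LATS from plain MCTS here.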

In [13]:
lats_agent = ReasoningAgent(
    name="mcts_agent",
    system_message="answer math questions",
    llm_config={"config_list": config_list},
    # Use a small depth and few simulations to keep the example short.
    reason_config={"method": "lats", "nsim": 5, "max_depth": 4},
)


user_proxy = UserProxyAgent(
    name="user_proxy",
    human_input_mode="NEVER",
    code_execution_config=False,
    max_consecutive_auto_reply=10,
)

In [14]:
lats_res = user_proxy.initiate_chat(recipient=lats_agent, message=question, summary_method=last_meaningful_msg)

[33muser_proxy[0m (to mcts_agent):

What is the expected maximum dice value if you can roll a 6-sided dice three times?

--------------------------------------------------------------------------------
[33mmcts_agent[0m (to tot_thinker):

## Here are some previous trajectories and reflections



---

# Question:
What is the expected maximum dice value if you can roll a 6-sided dice three times?
---

---
What are the possible next steps?

--------------------------------------------------------------------------------
[33mtot_thinker[0m (to mcts_agent):

REFLECTION:
The previous steps do not appear to provide any detail on how the expected maximum dice value is calculated from multiple rolls, specifically with three rolls of a 6-sided die. There is an opportunity to clarify the mechanics of maximizing the value across multiple rolls or provide a probability analysis that leads to the correct answer.

**Possible Options:**
Option 1: Calculate the probability distribution of the max

In [15]:
print(lats_res.summary)

### Step 1: Clarification of the Question

The question asks for the "expected maximum" value when rolling a 6-sided die three times. In probability theory, the expected maximum refers to the average of the highest value obtained from multiple rolls. This is a common topic in statistics and probability, and it helps to understand the outcomes we might expect over repeated experiments.

### Step 2: Calculate the Expected Maximum Value

To find the expected maximum when rolling three 6-sided dice, we need to calculate the probabilities of getting each possible maximum value (from 1 to 6) and then compute the expected value.

1. **Outcomes**: The possible maximum values from rolling three dice are from 1 to 6.

2. **Probabilities**:
   - The probability that the maximum value is exactly \( k \) can be calculated using:
     \[
     P(\text{max} = k) = P(\text{max} \leq k) - P(\text{max} \leq k - 1)
     \]
   - To do that, we calculate:
     - \( P(\text{max} \leq k) = \left( \frac{k}{6} 


In [21]:
ans = user_proxy.initiate_chat(lats_agent, message=question, summary_method=last_meaningful_msg)

[33muser_proxy[0m (to mcts_agent):

What is the expected maximum dice value if you can roll a 6-sided dice three times?

--------------------------------------------------------------------------------
[33mmcts_agent[0m (to tot_thinker):

## Here are some previous trajectories and reflections



---

# Question:
What is the expected maximum dice value if you can roll a 6-sided dice three times?
---

---
What are the possible next steps?

--------------------------------------------------------------------------------
[33mtot_thinker[0m (to mcts_agent):

REFLECTION:
The previous steps do not appear to provide any detail on how the expected maximum dice value is calculated from multiple rolls, specifically with three rolls of a 6-sided die. There is an opportunity to clarify the mechanics of maximizing the value across multiple rolls or provide a probability analysis that leads to the correct answer.

**Possible Options:**
Option 1: Calculate the probability distribution of the max

In [22]:
print(ans.summary)

To address the question of what the expected maximum value is when rolling a 6-sided die three times, I'll follow your outlined steps.

### Step 1: Calculate the Probability Distribution

To find the expected maximum value when rolling three independent 6-sided dice, we first need to calculate the probabilities for each possible maximum value (1 through 6).

1. **Define Possible Outcomes**: The possible maximum value from rolling three 6-sided dice is \( k \) where \( k \) can be 1, 2, 3, 4, 5, or 6.

2. **Calculate Probabilities**:
   - The probability that the maximum value is exactly \( k \) is given by:
     \[
     P(\text{max} = k) = P(\text{max} \leq k) - P(\text{max} \leq k - 1)
     \]

   - We calculate \( P(\text{max} \leq k) \) as:
     \[
     P(\text{max} \leq k) = \left( \frac{k}{6} \right)^3
     \]

3. **Calculate Each Probability**:
   - For \( k = 1 \):
     \[
     P(\text{max} = 1) = \left( \frac{1}{6} \right)^3 = \frac{1}{216}
     \]
   - For \( k = 2 \):
     \[

## Enable Code Execution During Reasoning

You can set the `code_execution_config` parameter of the `ReasoningAgent` to let it execute code during reasoning.
By default, `code_execution_config=False`, so no code is executed during reasoning.
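With code execution enabled, the thinker can propose a quick numerical check of an analytic answer. The snippet below is an illustrative example of that kind of code, not something captured from the run; `simulate_expected_max` is a hypothetical name.

```python
import random


def simulate_expected_max(n_trials: int = 100_000, n_rolls: int = 3, sides: int = 6, seed: int = 0) -> float:
    """Monte Carlo estimate of E[max] over `n_rolls` rolls of a fair `sides`-sided die."""
    rng = random.Random(seed)  # local RNG so the estimate is reproducible
    total = 0
    for _ in range(n_trials):
        total += max(rng.randint(1, sides) for _ in range(n_rolls))
    return total / n_trials


print(simulate_expected_max())  # close to the exact value 119/24 (about 4.9583)
```

A simulation only approximates the answer (here to within a few thousandths), so it works best as a sanity check on a closed-form derivation rather than a replacement for one.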

In [23]:
lats_agent = ReasoningAgent(
    name="mcts_agent",
    system_message="answer math questions",
    llm_config={"config_list": config_list},
    reason_config={"method": "lats", "nsim": 5, "max_depth": 4},
    code_execution_config={"use_docker": False},  # Enable code execution; Docker is skipped here for simplicity
)


user_proxy = UserProxyAgent(
    name="user_proxy",
    human_input_mode="NEVER",
    code_execution_config=False,
    max_consecutive_auto_reply=10,
)

ans = user_proxy.initiate_chat(lats_agent, message=question, summary_method=last_meaningful_msg)

[33muser_proxy[0m (to mcts_agent):

What is the expected maximum dice value if you can roll a 6-sided dice three times?

--------------------------------------------------------------------------------
[33mmcts_agent[0m (to tot_thinker):

## Here are some previous trajectories and reflections



---

# Question:
What is the expected maximum dice value if you can roll a 6-sided dice three times?
---

---
What are the possible next steps?

--------------------------------------------------------------------------------
[33mtot_thinker[0m (to mcts_agent):

REFLECTION:
The previous steps appear to lack a detailed analysis of how to calculate the expected maximum value of rolling a 6-sided die three times. While rolling a die three times could be considered, there seems to be a misunderstanding of the expected value operation involved in finding the maximum value. It’s crucial to derive a proper formula or computation method to get to the expected maximum value clearly.

**Possible Op

In [24]:
print(ans.summary)

To calculate the expected maximum value that can be obtained from rolling a 6-sided die three times, we can follow a systematic approach. Here’s a breakdown of the steps involved in both the theoretical and analytical aspects:

### Step 1: Calculate Theoretical Maximum Value

When rolling a 6-sided die, the outcomes for each die roll range from 1 to 6. When rolling three dice, we want to find the maximum value \( M \) that could result from these rolls. 

The maximum of three rolls can take on any of the values from 1 to 6. We need to calculate the probabilities of achieving each of these maximum values.

### Step 2: Joint Distribution of Probabilities

1. The probability that the maximum value is exactly \( k \) (where \( k \) ranges from 1 to 6) can be expressed as follows:

   \[
   P(M = k) = P(\text{at least one die is } k) - P(\text{all dice } < k
   \]

   - The probability that a single die shows a value less than or equal to \( k \) is \( \frac{k}{6} \).
   - The probability t

## Visualizing the Reasoning Tree

### Installation of Graphviz

To visualize the reasoning tree, you need Graphviz. The `graphviz` Python package installed with `pip` is only a wrapper around the Graphviz executables, so on some operating systems you also need to install Graphviz itself (for example, with your system package manager or the installer from the Graphviz website).

`pip install graphviz`

To save the visualization as `tree_of_thoughts.png`, run the following:
```python
from autogen.agentchat.contrib.reasoning_agent import visualize_tree

visualize_tree(mcts_agent._root)
```
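Under the hood, a Graphviz rendering is just DOT text: one node statement per thought and one edge per parent/child link. The stdlib-only sketch below emits DOT for a toy tree; `tree_to_dot` and its nested `{"label", "children"}` dict shape are hypothetical and are not the schema produced by `ThinkNode.to_dict()`.

```python
def tree_to_dot(tree: dict) -> str:
    """Render a nested {"label": str, "children": [...]} dict as Graphviz DOT text."""
    lines = ["digraph ToT {"]
    counter = 0

    def visit(node: dict) -> str:
        nonlocal counter
        node_id = f"n{counter}"
        counter += 1
        label = node["label"].replace('"', '\\"')  # escape quotes for DOT
        lines.append(f'  {node_id} [label="{label}"];')
        for child in node.get("children", []):
            child_id = visit(child)
            lines.append(f"  {node_id} -> {child_id};")
        return node_id

    visit(tree)
    lines.append("}")
    return "\n".join(lines)


toy_tree = {
    "label": "Expected maximum of three d6 rolls?",
    "children": [
        {"label": "Compute the distribution of the maximum (value 0.89)"},
        {"label": "Introduce order statistics (value 0.78)"},
    ],
}
dot = tree_to_dot(toy_tree)
print(dot)
```

Saved to a file such as `tree.dot`, the text can be rendered with the Graphviz CLI, e.g. `dot -Tpng tree.dot -o tree.png`.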

## Utilizing ReasoningAgent for Nested Chat Interactions

In this example, we will explore how the ReasoningAgent can be employed to facilitate nested chat interactions, specifically for writing a blog post about NVIDIA. The agent will engage in a structured dialogue to enhance the quality of the content through iterative feedback and reasoning.

### Task: Writing a Blog Post on NVIDIA

The goal is to generate a concise yet engaging blog post about NVIDIA. For simplicity, the nested chat runs a single turn: the reasoning agent reflects on the writer's draft, reasons about possible improvements, and returns a critique that the writer then incorporates. Increase the `max_turns` parameter to run more rounds.

**WARNING:** It may take a long time to run this example (up to 10 minutes).

In [25]:
writer = AssistantAgent(
    name="Writer",
    llm_config={"config_list": config_list},
    system_message="""
    You are a professional writer, known for your insightful and engaging articles.
    You transform complex concepts into compelling narratives.
    You should improve the quality of the content based on the feedback from the user.
    """,
)
reason_agent_for_writer = ReasoningAgent(
    name="reason_agent",
    llm_config={"config_list": config_list},
    reason_config={"method": "lats", "nsim": 2, "max_depth": 3},
)


def reflection_message(recipient, messages, sender, config):
    print("Reflecting...")
    return f"Reflect, Reason and provide critique on the following writing. \n\n {recipient.chat_messages_for_summary(sender)[-1]['content']}"

In [26]:
user_proxy.register_nested_chats(
    [
        {
            "recipient": reason_agent_for_writer,
            "message": reflection_message,
            "summary_method": "last_msg",
            "max_turns": 1,
        }
    ],
    trigger=writer,
)

In [27]:
task = """Write a concise but engaging blogpost about Nvidia."""
res = user_proxy.initiate_chat(recipient=writer, message=task, max_turns=2, summary_method="last_msg")

[33muser_proxy[0m (to Writer):

Write a concise but engaging blogpost about Nvidia.

--------------------------------------------------------------------------------
[33mWriter[0m (to user_proxy):

**Title: Nvidia: The Powerhouse of Visual Computing and AI Innovation**

In a world increasingly defined by digital experiences, Nvidia stands as a titan, driving the future of technology with its groundbreaking advancements in graphics processing. Established in 1993, Nvidia has evolved from a graphics card manufacturer into a leader in AI, gaming, and deep learning.

At the heart of Nvidia’s success is its Graphics Processing Unit (GPU), a marvel of engineering that has transformed not just gaming but industries ranging from film to healthcare. The iconic GeForce series has become synonymous with high-performance gaming, delivering stunning graphics that bring virtual worlds to life. However, Nvidia's impact extends far beyond the gaming realm; their GPUs power some of the most complex

In [28]:
print(res.summary)

**Title: Nvidia: The Architect of Tomorrow’s Visual Computing and AI**

Imagine a world where virtual landscapes feel as real as the ground beneath your feet, where artificial intelligence not only assists but understands and anticipates your needs. This is not just a vision of the future; it’s the reality Nvidia is crafting today. Established in 1993, Nvidia has transformed from a humble graphics card manufacturer into a powerhouse driving significant advancements in gaming, artificial intelligence, and deep learning.

Take, for instance, the gaming industry—Nvidia's GeForce GPUs revolutionized the way we experience video games, providing stunning graphics and immersive environments. In 2015, games powered by Nvidia's technology, such as "The Witcher 3: Wild Hunt," showcased an unprecedented level of detail and realism, elevating the gaming experience. But Nvidia’s impact extends well beyond gaming; today, their GPUs are at the heart of industries like healthcare, automotive, and scie

## Use a different Model for Grading

To grade trajectories with a different model from the one that generates them, pass the `grader_llm_config` argument when initializing the `ReasoningAgent`. Grading then uses the configuration in `grader_llm_config`, separate from the main `llm_config`. In the cell below both happen to use `gpt-4o-mini`; replace the model in `grader_config_list` to grade with another one.

In [29]:
grader_config_list = [{"model": "gpt-4o-mini", "api_key": api_key}]

grader_llm_config = {"config_list": grader_config_list}

writer = AssistantAgent(
    name="Writer",
    llm_config={"config_list": config_list},
    system_message="""
    You are a professional writer, known for your insightful and engaging articles.
    You transform complex concepts into compelling narratives.
    You should improve the quality of the content based on the feedback from the user.
    """,
)
reason_agent_for_writer = ReasoningAgent(
    name="reason_agent",
    llm_config={"config_list": config_list},
    grader_llm_config=grader_llm_config,
    reason_config={"method": "lats", "nsim": 2, "max_depth": 3},
)

## Save data for future training
In this section, we save the reasoning agent's decision-making data to support future training.
By capturing the structure and content of the reasoning tree, we can build a dataset for analyzing
the agent's reasoning patterns, improving its performance, and refining its ability to generate high-quality responses.
The saved data can feed training methods such as supervised fine-tuning (SFT) and preference-based
reinforcement learning (RLHF), helping to produce a more robust and effective reasoning agent.

In [30]:
import json

In [31]:
data = reason_agent._root.to_dict()
with open("reasoning_tree.json", "w") as f:
    json.dump(data, f)

# recover the node
with open("reasoning_tree.json", "r") as f:
    new_node = ThinkNode.from_dict(json.load(f))

In [32]:
from autogen.agentchat.contrib.reasoning_agent import extract_rlhf_preference_dataset, extract_sft_dataset

sft_data = extract_sft_dataset(reason_agent._root)
rlhf_data = extract_rlhf_preference_dataset(reason_agent._root)

In [33]:
print(rlhf_data)

[{'instruction': '# Question:\nDesign a mixed integer linear program for a coffee roasting supply chain\n---\n', 'reflection': 'The previous steps did not provide any detail regarding the design elements or constraints necessary for a mixed integer linear program (MILP) tailored specifically for a coffee roasting supply chain. There is a lack of clarity on what aspects (e.g., production, transportation, inventory) the user wishes to include in the MILP. Thus, the next steps should focus on narrowing down the requirements and addressing any potential misunderstandings regarding the specifics of the MILP.', 'preferred_response': 'Step 1: Define the key components of the coffee roasting supply chain, such as suppliers, roasting facilities, distribution centers, and retail outlets.', 'dispreferred_response': 'Step 1: TERMINATE.'}, {'instruction': '# Question:\nDesign a mixed integer linear program for a coffee roasting supply chain\n---\n', 'reflection': 'The previous steps did not provide
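Fine-tuning pipelines commonly consume JSON Lines files (one JSON object per line). The stdlib sketch below serializes records like the ones printed above; `to_preference_pair` maps them onto `prompt`/`chosen`/`rejected` columns, which is an assumption about the target trainer, so adjust the field names to whatever your training library expects.

```python
import json


def to_jsonl(records) -> str:
    """Serialize records as JSON Lines: one JSON object per line."""
    return "".join(json.dumps(rec, ensure_ascii=False) + "\n" for rec in records)


def from_jsonl(text: str) -> list:
    """Parse JSON Lines text back into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def to_preference_pair(rec: dict) -> dict:
    """Hypothetical mapping onto prompt/chosen/rejected columns; adjust to your trainer."""
    return {
        "prompt": rec["instruction"],
        "chosen": rec["preferred_response"],
        "rejected": rec["dispreferred_response"],
    }


sample = [{
    "instruction": "# Question:\nDesign a mixed integer linear program for a coffee roasting supply chain\n---\n",
    "reflection": "The previous steps did not provide any detail ...",
    "preferred_response": "Step 1: Define the key components of the coffee roasting supply chain.",
    "dispreferred_response": "Step 1: TERMINATE.",
}]
print(to_jsonl(to_preference_pair(r) for r in sample))
```

With the notebook's variables, `Path("rlhf_pairs.jsonl").write_text(to_jsonl(map(to_preference_pair, rlhf_data)), encoding="utf-8")` (after `from pathlib import Path`) would save the extracted pairs.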

## Utilizing Ground Truth to Enhance Training Data Generation

Access to ground-truth answers lets the agent evaluate reasoning paths more reliably. In this section, we will explore:
- how to include the ground truth in the prompt
- how the agent uses the ground truth during evaluation
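In the log that follows, the thinker receives only the question, so the `GROUND_TRUTH` section is held back from it (presumably for grading). A minimal sketch of such a split, assuming the marker is the literal string `GROUND_TRUTH` followed by a colon; `split_ground_truth` is a hypothetical helper, not the library's API:

```python
from typing import Optional


def split_ground_truth(prompt: str, marker: str = "GROUND_TRUTH") -> tuple[str, Optional[str]]:
    """Hypothetical helper: separate the question from an optional ground-truth section."""
    idx = prompt.find(marker)
    if idx == -1:
        return prompt.strip(), None
    question_text = prompt[:idx].strip()
    # Skip the marker and its trailing colon; keep only the reference solution.
    ground_truth = prompt[idx + len(marker):].lstrip(":").strip()
    return question_text, ground_truth


question_text, truth = split_ground_truth(
    "What is the expected maximum dice value?\n\nGROUND_TRUTH:\nE(X) = 119/24, approximately 4.9583."
)
print(question_text)  # What is the expected maximum dice value?
print(truth)  # E(X) = 119/24, approximately 4.9583.
```

Keeping the reference out of the thinker's view means the generated steps cannot simply copy the answer, while an evaluator can still compare each path against it.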

In [34]:
import random

# Use a raw string so LaTeX commands such as \frac and \right are not read as
# Python escape sequences (\f is a form feed, \r a carriage return).
prompt = r"""What is the expected maximum dice value if you can roll a 6-sided dice three times?

GROUND_TRUTH:
We define X as the highest outcome among the three rolls.
The probability that X is at least m is 1 - \left(\frac{m-1}{6}\right)^3 for each m from 1 to 6.
Summing these probabilities gives the expectation E(X) = \sum_{m=1}^{6} [1 - (\frac{m-1}{6})^3].
Calculating this sum results in E(X) = 6 - \frac{225}{216} = \frac{119}{24}, which approximates to 4.9583.
Therefore, the expected maximum value when rolling a six-sided die three times is \frac{119}{24} or approximately 4.9583.
"""
random.seed(1)  # set the seed for reproducibility

mcts_agent2 = ReasoningAgent(
    name="mcts_agent",
    system_message="answer math questions",
    llm_config={"config_list": config_list},
    # Use a small depth and few simulations to keep the output concise.
    reason_config={"method": "mcts", "nsim": 5, "max_depth": 4},
)


user_proxy = UserProxyAgent(
    name="user_proxy",
    human_input_mode="NEVER",
    code_execution_config=False,
    max_consecutive_auto_reply=10,
)


ans = user_proxy.initiate_chat(mcts_agent2, message=prompt, summary_method=last_meaningful_msg)

[33muser_proxy[0m (to mcts_agent):

What is the expected maximum dice value if you can roll a 6-sided dice three times?

GROUND_TRUTH:
We define X as the highest outcome among the three rolls.
The probability that X is at least m is 1 - \left(\frac{m-1}{6}\right)^3 for each m from 1 to 6.
Summing these probabilities gives the expectation E(X) = \sum_{m=1}^{6} [1 - (\frac{m-1}{6})^3].
Calculating this sum results in E(X) = 6 - \frac{225}{216} = \frac{119}{24}, which approximates to 4.9583.
Therefore, the expected maximum value when rolling a six-sided die three times is \frac{119}{24} or approximately 4.9583.


--------------------------------------------------------------------------------
[33mmcts_agent[0m (to tot_thinker):

# Question:
What is the expected maximum dice value if you can roll a 6-sided dice three times?
---

---
What are the possible next steps?

--------------------------------------------------------------------------------
[33mtot_thinker[0m (to mcts_agent):

REFLECTION:
The previous steps do not 

In [35]:
print(ans.summary)

To find the expected maximum value when rolling a 6-sided die three times, we will calculate the probability that the maximum of the three rolls is equal to each number from 1 to 6.

1. **Calculate probabilities for each maximum:**
   - The maximum value is \( k \) if at least one die shows \( k \) and none show values greater than \( k \).
   - For a maximum of \( k \):
     - The probability that a single die is less than or equal to \( k \) is \( \frac{k}{6} \).
     - The probability that all three dice are less than or equal to \( k \) is \( \left(\frac{k}{6}\right)^3 \).
     - The probability that all three dice are less than or equal to \( k-1 \) is \( \left(\frac{k-1}{6}\right)^3 \).

   Therefore, the probability that the maximum of the three dice rolls is exactly \( k \) is:
   \[
   P(\text{max} = k) = P(\text{max} \leq k) - P(\text{max} \leq k-1) = \left(\frac{k}{6}\right)^3 - \left(\frac{k-1}{6}\right)^3
   \]

2. **Calculating for each \( k \):**
   - \( k = 1 \):
     \
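The ground truth in the prompt uses the tail-sum identity \( E(X) = \sum_{m=1}^{6} P(X \geq m) \), while the agent's summary builds the exact distribution \( P(\max = k) \). A quick check with exact fractions, plus brute-force enumeration of all \( 6^3 = 216 \) outcomes, confirms that all three routes agree on \( \frac{119}{24} \):

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)

# Tail-sum form used in the ground truth: E(X) = sum over m of P(X >= m).
tail_sum = sum(1 - Fraction(m - 1, 6) ** 3 for m in faces)

# Distribution form used in the agent's summary: P(max = k) = (k/6)^3 - ((k-1)/6)^3.
pmf_sum = sum(k * (Fraction(k, 6) ** 3 - Fraction(k - 1, 6) ** 3) for k in faces)

# Brute force: average the maximum over all 6^3 = 216 equally likely outcomes.
brute_force = Fraction(sum(max(rolls) for rolls in product(faces, repeat=3)), 6**3)

print(tail_sum, pmf_sum, brute_force)  # 119/24 119/24 119/24
```

Using `Fraction` instead of floats keeps the comparison exact, so the three results can be tested for equality directly.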

## Forest of Thoughts

The "Forest of Thoughts" approach borrows the idea of bootstrapping: it runs the tree-of-thoughts search several times independently (controlled by `forest_size` in the configuration below) to produce a diverse set of answers, then aggregates those answers into the final response.
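The paragraph above does not specify how the independent answers are combined. One simple, well-known option is majority voting over final answers (the self-consistency idea); the sketch below uses it as a stand-in, and it is not necessarily what `ReasoningAgent` does internally with `forest_size`. `aggregate_by_majority` is a hypothetical helper.

```python
from collections import Counter


def aggregate_by_majority(answers: list[str]) -> str:
    """Return the most frequent final answer (non-empty input); ties go to the answer seen first."""
    # Light normalization so trivially different spellings count as the same vote.
    normalized = [a.strip().lower() for a in answers]
    winner, _ = Counter(normalized).most_common(1)[0]
    # Return the first original answer that matches the winning normalized form.
    return answers[normalized.index(winner)]


# Hypothetical final answers from three independent trees (forest_size=3).
print(aggregate_by_majority(["119/24", "4.5", "119/24 "]))  # 119/24
```

Free-form answers rarely match word for word, so in practice you would first extract a short final answer (for example, the number) from each tree before voting.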

In [36]:
forest_agent = ReasoningAgent(
    name="mcts_agent",
    system_message="answer math questions",
    llm_config={"config_list": config_list},
    # Use a small depth for conciseness; forest_size=3 runs three independent trees.
    reason_config={"method": "dfs", "max_depth": 4, "forest_size": 3},
)


user_proxy = UserProxyAgent(
    name="user_proxy",
    human_input_mode="NEVER",
    code_execution_config=False,
    max_consecutive_auto_reply=10,
)

In [37]:
ans = user_proxy.initiate_chat(forest_agent, message=question, summary_method=last_meaningful_msg)

[33muser_proxy[0m (to mcts_agent):

What is the expected maximum dice value if you can roll a 6-sided dice three times?

--------------------------------------------------------------------------------
[33mmcts_agent[0m (to tot_thinker):

# Question:
What is the expected maximum dice value if you can roll a 6-sided dice three times?
---

---
What are the possible next steps?

--------------------------------------------------------------------------------
[33mtot_thinker[0m (to mcts_agent):

REFLECTION:
The previous steps do not provide any context or specific previous actions taken. However, the user has posed a mathematical question related to probability involving a six-sided die. The user seeks to understand the expected maximum value from rolling a die three times, which suggests that a probabilistic approach should be used. It’s important to ensure the calculations are accurate and centered on the maximum value from the rolls.

**Possible Options:**
Option 1: Calculate the 

In [38]:
print(ans.summary)

To find the expected maximum value from rolling a 6-sided die three times, let's look at the problem through a structured approach, incorporating different perspectives.

### Understanding the Problem

The goal is to determine the average of the maximum values we can get when rolling a die three times. Each face of the die ranges from 1 to 6, so we will assess the probabilities of each face appearing as the maximum among the three rolls.

### Deriving the Expected Maximum Value

Let \( M \) represent the maximum value obtained from the three rolls. We will calculate the probability \( P(M = k) \) for \( k \) from 1 to 6.

1. **Calculate \( P(M \leq k) \)**:
   This is the probability that all three rolls are less than or equal to \( k \):

   \[
   P(M \leq k) = \left(\frac{k}{6}\right)^3
   \]

2. **Calculate \( P(M = k) \)**:
   The probability that the maximum value equals exactly \( k \):

   \[
   P(M = k) = P(M \leq k) - P(M \leq k-1) = \left(\frac{k}{6}\right)^3 - \left(\frac{k-