# ReasoningAgent - Advanced LLM Reasoning with Multiple Search Strategies

## Introduction

The `ReasoningAgent` is designed to enhance language models' reasoning capabilities through systematic exploration of thought processes. By implementing the Tree of Thoughts (ToT) framework, it enables LLMs like GPT-4 and Llama to break down complex problems into manageable steps and explore multiple solution paths simultaneously.

This notebook demonstrates the key features and capabilities of the `ReasoningAgent`, showing how it can effectively reason about problems even when using smaller models like `gpt-4o-mini`.

## Search Strategies

The `ReasoningAgent` supports multiple search strategies for exploring the reasoning space:

### 1. Beam Search (Default)
- Maintains the top `k` most promising paths at each step
- Efficient for problems with clear evaluation criteria
- Configurable beam width to balance exploration vs computation
- Special case: DFS mode (beam size = 1) for linear reasoning similar to Chain-of-Thought

### 2. Monte Carlo Tree Search (MCTS)
- Balances exploration and exploitation using the UCT formula
- Particularly effective for problems with delayed rewards
- Stochastic exploration helps avoid local optima
- Configurable number of simulations and exploration constant
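The UCT rule mentioned above can be written as a small function. This is the textbook UCB1-style formula, offered as a sketch: the function name `uct_score`, its arguments, and the default exploration constant are illustrative assumptions and may differ from what `ReasoningAgent` uses internally.

```python
import math


def uct_score(total_value: float, visits: int, parent_visits: int, c: float = 1.41) -> float:
    """Textbook UCT: average value (exploitation) plus an exploration bonus."""
    if visits == 0:
        return math.inf  # unvisited children are tried first
    exploitation = total_value / visits
    exploration = c * math.sqrt(math.log(parent_visits) / visits)
    return exploitation + exploration


# A rarely visited child can outrank a child with a higher average value.
well_explored = uct_score(total_value=8.0, visits=10, parent_visits=12)
rarely_visited = uct_score(total_value=0.7, visits=1, parent_visits=12)
print(well_explored < rarely_visited)
```

A larger `c` pushes the search toward less-visited branches; a smaller `c` makes it stick with branches that already look good.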

### 3. Language Agent Tree Search (LATS)
- Provides immediate reflection feedback before the next simulation
- Helps identify poor reasoning paths early for future improvement
- Especially useful for complex multi-step reasoning

## Core Components

1. **Thinker Agent**: Generates potential next steps in the reasoning process
2. **Grader Agent**: Evaluates the quality of each reasoning step
3. **Code Execution**: A child user agent executes code automatically during reasoning
4. **Tree Structure**: Organizes thoughts hierarchically for systematic exploration
5. **Visualization Tools**: Built-in Graphviz support for analyzing reasoning paths
6. **Logging Features**: Log and save thinking trajectories to fine-tune the language model
7. **Configuration Options**: The agent is highly configurable through a single `reason_config` dictionary
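As a quick orientation, the `reason_config` dictionaries used in the rest of this notebook are collected below. Only keys that appear in this notebook are shown; the wrapper dictionary `reason_configs` is just an illustrative container, not part of the library.

```python
# Keys and values below are the ones used in the examples of this notebook.
reason_configs = {
    "dfs": {"method": "dfs", "max_depth": 3},
    "beam_search": {"method": "beam_search", "beam_size": 3, "max_depth": 3},
    "mcts": {"method": "mcts", "nsim": 5, "max_depth": 4},
    "lats": {"method": "lats", "nsim": 5, "max_depth": 4},
}

for name, cfg in reason_configs.items():
    print(f"{name:12s} -> {cfg}")
```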

In [1]:
import os
import random

from autogen import AssistantAgent, ReasoningAgent, ThinkNode, UserProxyAgent

api_key = os.environ.get("OPENAI_API_KEY")

config_list = [{"model": "gpt-4o-mini", "api_key": api_key}]

question = "What is the expected maximum dice value if you can roll a 6-sided dice three times?"
random.seed(1)  # set the seed for reproducibility

In [2]:
def last_meaningful_msg(sender, recipient, summary_args):
    """Return the most recent non-empty message, with the `TERMINATE` marker removed."""
    import warnings

    if sender == recipient:
        return "TERMINATE"

    summary = ""
    chat_messages = recipient.chat_messages[sender]

    # Walk the history backwards and stop at the first message with real content.
    for msg in reversed(chat_messages):
        try:
            content = msg["content"]
            if isinstance(content, str):
                summary = content.replace("TERMINATE", "")
            elif isinstance(content, list):
                # Remove the `TERMINATE` word in the content list.
                summary = "\n".join(
                    x["text"].replace("TERMINATE", "") for x in content if isinstance(x, dict) and "text" in x
                )
            if summary.strip():
                return summary
        except (IndexError, KeyError, AttributeError) as e:
            warnings.warn(f"Cannot extract summary using last_msg: {e}. Using an empty str as summary.", UserWarning)
    return summary

## Chain-of-Thought Reasoning with DFS

The simplest form of tree-based reasoning uses depth-first search (DFS) to explore a single path, similar to the step-by-step reasoning of OpenAI's o1 models.
By setting `method="dfs"` in `reason_config`, the agent will:
1. Generate one reasoning step at a time
2. Follow that single path until reaching a conclusion
3. Never explore alternative branches

Note: The effectiveness depends on the underlying model's training. Models not specifically trained for step-by-step reasoning
may show limited improvement with this approach.

In [3]:
reason_agent = ReasoningAgent(
    name="reason_agent",
    system_message="answer math questions",
    llm_config={"config_list": config_list},
    reason_config={"method": "dfs", "max_depth": 3},  # Using DFS
    silent=False,
    # NOTE: it is equivalent to use beam size 1 for O1-style reasoning
    # reason_config={"method": "beam_search", "beam_size": 1, "max_depth": 3},
)
user_proxy = UserProxyAgent(
    name="user_proxy",
    human_input_mode="NEVER",
    code_execution_config=False,
    max_consecutive_auto_reply=10,
)

In [4]:
ans = user_proxy.initiate_chat(reason_agent, message=question, summary_method=last_meaningful_msg)

user_proxy (to reason_agent):

What is the expected maximum dice value if you can roll a 6-sided dice three times?

--------------------------------------------------------------------------------
reason_agent (to tot_thinker):

# Question:
What is the expected maximum dice value if you can roll a 6-sided dice three times?
---

---
What are the possible next steps?

--------------------------------------------------------------------------------
tot_thinker (to reason_agent):

REFLECTION:
The previous steps do not appear to include any concrete calculations or logical steps leading to a defined answer. The user has posed a clear question about the expected maximum dice value from multiple rolls of a 6-sided die, but an exploration of the calculation or probability has not been initiated. This is an opportunity to build a plan that clearly addresses how to approach calculating the expected maximum value.

**Possible Options:**
Option 1: Calculate the expected 

In [5]:
reason_agent._root.children

[Calculate the expected maximum value by considering the distribution of maximum values when rolling a 6-sided die three times. -> Depth: 1 Value: 0.7777777777777778 Visits: 0,
 Define the concept of maximum value in this context, ensuring clarity around what is being calculated. -> Depth: 1 Value: 0.8888888888888888 Visits: 0,
 Specify the formulas or methods used to calculate the expected maximum value, including considerations of probability. -> Depth: 1 Value: 0.7777777777777778 Visits: 0,
 Simulate the rolling of a 6-sided die three times multiple times to empirically determine the expected maximum value. -> Depth: 1 Value: 0.7777777777777778 Visits: 0,
 TERMINATE. -> Depth: 1 Value: 0.0 Visits: 0]

In [6]:
print(ans.summary)

To find the expected maximum value when rolling a 6-sided die three times, we can follow these steps:

### Step 1: Define the concept of maximum value
In this context, the "maximum value" refers to the highest result obtained from the three rolls of a 6-sided die. Each die can show a value from 1 to 6, and we want to calculate what the average of these maximum values would be when rolling three dice.

### Step 2: Analyze the theoretical approach
To find the expected maximum of three rolls of a six-sided die (denoted as \(X\)), we can use the property of expected values. The probability of the maximum value being \(k\) (where \(1 \leq k \leq 6\)) can be calculated by considering the complement event: the probability that all three rolls are less than or equal to \(k\).

The probability that all three dice are less than or equal to \(k\) is:
\[
P(X \leq k) = \left( \frac{k}{6} \right)^3
\]

Thus, the probability that the maximum value is exactly \(k\) can be calculated as:
\[
P(X = k) = 

## Beam Search in Tree of Thought

Beam Search is a powerful technique used in tree-based reasoning that allows the agent to explore multiple paths simultaneously. By setting `beam_size` greater than 1, the agent can maintain several candidate solutions at each step, evaluating them based on their potential to lead to the best final answer. This method is particularly effective when the solution space is large and complex, as it balances exploration and exploitation, ensuring that promising paths are prioritized while still considering alternative options.

In this approach, the agent generates multiple reasoning steps in parallel, allowing it to compare different trajectories and select the most promising ones for further exploration. This can lead to more robust and accurate conclusions, especially in scenarios where intermediate evaluations are critical to the final outcome.
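The pruning step described above can be illustrated with a generic beam search over a toy problem. This is a sketch of the general technique only: in `ReasoningAgent` the candidate steps come from the thinker and the scores from the grader, whereas `expand` and `score` here are hypothetical stand-in callables.

```python
from typing import Callable, List, Sequence, Tuple

Path = Tuple[str, ...]


def beam_search(
    expand: Callable[[Path], Sequence[str]],
    score: Callable[[Path], float],
    beam_size: int,
    max_depth: int,
) -> List[Path]:
    """Keep only the `beam_size` best partial paths after every expansion step."""
    beam: List[Path] = [()]
    for _ in range(max_depth):
        candidates = [path + (step,) for path in beam for step in expand(path)]
        if not candidates:
            break
        beam = sorted(candidates, key=score, reverse=True)[:beam_size]
    return beam


# Toy problem: build a 3-digit sequence; the score is the sum of its digits.
best = beam_search(
    expand=lambda path: ["1", "5", "9"],
    score=lambda path: sum(int(d) for d in path),
    beam_size=2,
    max_depth=3,
)
print(best[0])
```

With `beam_size=1` the same loop follows a single greedy path, which corresponds to the DFS mode shown earlier.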

In [7]:
reason_agent = ReasoningAgent(
    name="reason_agent",
    llm_config={"config_list": config_list},
    reason_config={"method": "beam_search", "beam_size": 3, "max_depth": 3},
)
user_proxy = UserProxyAgent(
    name="user_proxy",
    human_input_mode="NEVER",
    code_execution_config={"use_docker": False},
    max_consecutive_auto_reply=10,
)

In [8]:
ans = user_proxy.initiate_chat(
    reason_agent,
    message=question,
    summary_method=last_meaningful_msg,
)

user_proxy (to reason_agent):

What is the expected maximum dice value if you can roll a 6-sided dice three times?

--------------------------------------------------------------------------------
reason_agent (to tot_thinker):

# Question:
What is the expected maximum dice value if you can roll a 6-sided dice three times?
---

---
What are the possible next steps?

--------------------------------------------------------------------------------
tot_thinker (to reason_agent):

REFLECTION:
The previous steps do not appear to include any concrete calculations or logical steps leading to a defined answer. The user has posed a clear question about the expected maximum dice value from multiple rolls of a 6-sided die, but an exploration of the calculation or probability has not been initiated. This is an opportunity to build a plan that clearly addresses how to approach calculating the expected maximum value.

**Possible Options:**
Option 1: Calculate the expected 

In [9]:
print(ans.summary)

The empirical expected maximum value from the Monte Carlo simulation is approximately 4.9588, which is very close to the theoretical value of 4.9583 we calculated earlier. This indicates that both methods are consistent with each other.

### Further Steps to Conclude
1. **Validation of Results**: Since the empirical and theoretical values are closely aligned, we could analyze any minor differences and check for consistency across multiple runs of the simulation if desired.
2. **Theoretical Analysis Using CDF**: One could further confirm the calculations using the cumulative distribution function for the maximum of three uniform random variables. However, the prior calculations and simulations already provide strong evidence for the expected maximum value.
3. **Statistical Significance**: We have fulfilled our goal of determining the expected maximum value when rolling a die multiple times through both theoretical calculations and simulation.

Given these steps and confirmations, the va

## MCTS
This section demonstrates how to use Monte Carlo Tree Search (MCTS) with ReasoningAgent for complex reasoning tasks. MCTS provides several advantages over beam search when:

1. Ground truth evaluation is available
2. LLM-based evaluation is expensive
3. You want to generate diverse, high-quality training data

In [10]:
mcts_agent = ReasoningAgent(
    name="mcts_agent",
    system_message="answer math questions",
    llm_config={"config_list": config_list},
    # Use a small depth and few simulations to keep the run short.
    reason_config={"method": "mcts", "nsim": 5, "max_depth": 4},
)


user_proxy = UserProxyAgent(
    name="user_proxy",
    human_input_mode="NEVER",
    code_execution_config=False,
    max_consecutive_auto_reply=10,
)

In [11]:
ans = user_proxy.initiate_chat(mcts_agent, message=question, summary_method=last_meaningful_msg)

user_proxy (to mcts_agent):

What is the expected maximum dice value if you can roll a 6-sided dice three times?

--------------------------------------------------------------------------------
mcts_agent (to tot_thinker):

# Question:
What is the expected maximum dice value if you can roll a 6-sided dice three times?
---

---
What are the possible next steps?

--------------------------------------------------------------------------------
tot_thinker (to mcts_agent):

REFLECTION:
The previous steps do not exist, indicating no errors or successes to analyze directly. To answer the user’s question effectively, it’s important to break down the expected maximum dice value calculation and consider how multiple rolls of a six-sided die can influence that value.

**Possible Options:**
Option 1: Calculate the expected value of the maximum roll after rolling a 6-sided die three times using probability theory. This involves determining the likelihood of each possibl

In [12]:
print(ans.summary)

To find the expected maximum value when rolling a 6-sided die three times, we can follow through the systematic steps you've outlined. Let's perform the calculations step by step:

### Step 1: Understanding the Problem
When you roll a die three times, the possible outcomes for each roll range from 1 to 6. We want to calculate the expected value of the highest number rolled in those three attempts.

### Step 2: Calculating the Probability of Each Maximum

1. **Maximum is 1**: 
   - Probability that all three rolls are 1: \( P(\text{max} = 1) = \left(\frac{1}{6}\right)^3 = \frac{1}{216} \)

2. **Maximum is 2**: 
   - Probability that at least one roll is 2, and none are greater than 2: 
   \[
   P(\text{max} = 2) = \left(\frac{2}{6}\right)^3 - \left(\frac{1}{6}\right)^3 = \frac{8}{216} - \frac{1}{216} = \frac{7}{216}
   \]

3. **Maximum is 3**:
   - Probability: 
   \[
   P(\text{max} = 3) = \left(\frac{3}{6}\right)^3 - \left(\frac{2}{6}\right)^3 = \frac{27}{216} - \frac{8}{216} = \frac{

## LATS

Note that our reasoning agent operates on the reasoning "process" and has no direct access to the environment, whereas the LATS approach relies on feedback from the environment. To bridge this gap, we use the existing grader agent to generate pseudo-rewards and provide feedback. The main difference between our LATS and MCTS implementations is that LATS incorporates the reflection into the prompt context before the next round of simulation. You can define an agent that uses the LATS approach as follows.
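The idea of feeding reflections back into the prompt can be sketched as follows. The layout imitates the prompt text visible in the logs of this notebook; the helper `build_lats_prompt` and the example reflection are hypothetical and do not reproduce the library's actual implementation.

```python
from typing import List


def build_lats_prompt(question: str, reflections: List[str]) -> str:
    """Hypothetical helper: prepend earlier reflections to the thinker prompt."""
    history = "\n".join(f"- {r}" for r in reflections)
    return (
        "## Here are some previous trajectories and reflections\n\n"
        f"{history}\n\n"
        "---\n\n"
        f"# Question:\n{question}\n---\n\n"
        "What are the possible next steps?"
    )


prompt_round_2 = build_lats_prompt(
    "What is the expected maximum dice value if you can roll a 6-sided dice three times?",
    ["Round 1 defined the maximum but never computed P(max = k)."],  # made-up reflection
)
print(prompt_round_2)
```

In the first simulation the list of reflections is empty, which matches the empty "previous trajectories" block seen in the logged prompts.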

In [13]:
lats_agent = ReasoningAgent(
    name="mcts_agent",
    system_message="answer math questions",
    llm_config={"config_list": config_list},
    # Use a small depth and few simulations to keep the run short.
    reason_config={"method": "lats", "nsim": 5, "max_depth": 4},
)


user_proxy = UserProxyAgent(
    name="user_proxy",
    human_input_mode="NEVER",
    code_execution_config=False,
    max_consecutive_auto_reply=10,
)

In [14]:
lats_res = user_proxy.initiate_chat(recipient=lats_agent, message=question, summary_method=last_meaningful_msg)

user_proxy (to mcts_agent):

What is the expected maximum dice value if you can roll a 6-sided dice three times?

--------------------------------------------------------------------------------
mcts_agent (to tot_thinker):

## Here are some previous trajectories and reflections



---

# Question:
What is the expected maximum dice value if you can roll a 6-sided dice three times?
---

---
What are the possible next steps?

--------------------------------------------------------------------------------
tot_thinker (to mcts_agent):

REFLECTION:
The previous steps did not provide specific trajectories or reflections, but I can analyze the question based on the information given. It is clear that we need to calculate the expected maximum dice value of a 6-sided die rolled three times. The question is straightforward, but there might be a lack of clarity in how to approach this calculation, especially if the relevant mathematical principles or formulas have not 

In [15]:
print(lats_res.summary)

To find the expected maximum value when rolling a 6-sided die three times, we can follow a methodical approach:

### Step 1: Understand the Problem
When rolling a 6-sided die three times, we want to calculate the expected value of the maximum of those three rolls, which we'll denote as \( E[\max(X_1, X_2, X_3)] \) where \( X_1, X_2, \) and \( X_3 \) are the three independent die rolls.

### Step 2: Probability Calculation
We need to compute the expected value by considering the probabilities associated with each possible maximum outcome (1 through 6):

1. **Calculate the Probability that the Maximum is Less than or Equal to \( k \)**:
   \[
   P(\max(X_1, X_2, X_3) \leq k) = \left(\frac{k}{6}\right)^3
   \]
   This accounts for the fact that all three dice must show a value that is \( \leq k \).

2. **Find the Probability of Exactly the Maximum**:
   \[
   P(\max(X_1, X_2, X_3) = k) = P(\max(X_1, X_2, X_3) \leq k) - P(\max(X_1, X_2, X_3) \leq k-1)
   \]

   - For \( k = 1 \):
     \[
 

In [21]:
ans = user_proxy.initiate_chat(lats_agent, message=question, summary_method=last_meaningful_msg)

user_proxy (to mcts_agent):

What is the expected maximum dice value if you can roll a 6-sided dice three times?

--------------------------------------------------------------------------------
mcts_agent (to tot_thinker):

## Here are some previous trajectories and reflections



---

# Question:
What is the expected maximum dice value if you can roll a 6-sided dice three times?
---

---
What are the possible next steps?

--------------------------------------------------------------------------------
tot_thinker (to mcts_agent):

REFLECTION:
The previous steps did not provide specific trajectories or reflections, but I can analyze the question based on the information given. It is clear that we need to calculate the expected maximum dice value of a 6-sided die rolled three times. The question is straightforward, but there might be a lack of clarity in how to approach this calculation, especially if the relevant mathematical principles or formulas have not 

In [22]:
print(ans.summary)

To compute the expected maximum value when rolling a 6-sided die three times, we can break down the problem step by step as outlined in your thinking process.

### Step 1: Calculate the Expected Maximum Value
We want to find the expected value of the maximum of three rolls, which we will denote as \( E[\max(X_1, X_2, X_3)] \), where \( X_1, X_2, X_3 \) are the results of the three independent rolls of the die.

### Step 2: Calculate the Probabilities of Each Possible Maximum Outcome
1. **Possible maxima**: The maximum can take on values from 1 to 6.
2. **Calculate the probability that the maximum value is less than or equal to \( k \)**:
   \[
   P(\max(X_1, X_2, X_3) \leq k) = \left(\frac{k}{6}\right)^3
   \]
   This is the probability that all three dice rolls are less than or equal to \( k \).

3. **Find \( P(\max(X_1, X_2, X_3) = k) \)**:
   \[
   P(\max(X_1, X_2, X_3) = k) = P(\max(X_1, X_2, X_3) \leq k) - P(\max(X_1, X_2, X_3) \leq k-1)
   \]

### Detailed Calculations for Each M

## Code Execution During Reasoning

Set the `code_execution_config` parameter of the reasoning agent to enable code execution during reasoning.
By default, `code_execution_config=False`, which means the agent does not execute code while reasoning.
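To illustrate what executed code can contribute to this question, the simulation below estimates the expected maximum empirically. It is a hand-written sketch, not code generated by the agent; the function name, trial count, and seed are arbitrary choices.

```python
import random


def simulate_expected_max(trials: int = 100_000, rolls: int = 3, sides: int = 6, seed: int = 1) -> float:
    """Estimate E[max of `rolls` dice] by repeated sampling."""
    rng = random.Random(seed)  # local generator, leaves the global seed untouched
    total = sum(max(rng.randint(1, sides) for _ in range(rolls)) for _ in range(trials))
    return total / trials


estimate = simulate_expected_max()
print(f"Empirical expected maximum: {estimate:.4f}")
```

The estimate should land close to the exact value 119/24 (about 4.9583), which gives the grader a concrete number to compare reasoning paths against.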

In [23]:
lats_agent = ReasoningAgent(
    name="mcts_agent",
    system_message="answer math questions",
    llm_config={"config_list": config_list},
    reason_config={"method": "lats", "nsim": 5, "max_depth": 4},
    code_execution_config={"use_docker": False, "work_dir": "mypy_cache"},
    # Enable Code execution. We skip docker here for simplicity
)


user_proxy = UserProxyAgent(
    name="user_proxy",
    human_input_mode="NEVER",
    code_execution_config=False,
    max_consecutive_auto_reply=10,
)

ans = user_proxy.initiate_chat(lats_agent, message=question, summary_method=last_meaningful_msg)

user_proxy (to mcts_agent):

What is the expected maximum dice value if you can roll a 6-sided dice three times?

--------------------------------------------------------------------------------
mcts_agent (to tot_thinker):

## Here are some previous trajectories and reflections



---

# Question:
What is the expected maximum dice value if you can roll a 6-sided dice three times?
---

---
What are the possible next steps?

--------------------------------------------------------------------------------
tot_thinker (to mcts_agent):

REFLECTION:
The previous steps do not provide specific details about any prior actions taken or calculations performed regarding the maximum dice value from rolling a 6-sided die three times. However, I expect this question involves analyzing the mathematical expectation of the maximum value obtainable from the rolls. Evaluating expected maximum values is a common inquiry in probability, and creating an accurate simulation can enh

In [24]:
print(ans.summary)

To determine the expected maximum value when rolling a 6-sided die three times, we can follow the proposed steps.

### Step 1: Calculate the Probability Distribution of Maximum Values

1. **Define the random variables**:
   Let \( X_1, X_2, X_3 \) be the outcomes of the three rolls of a 6-sided die.

2. **Calculate the probability that the maximum value is less than or equal to \( k \)**:
   \[
   P(X_{\text{max}} \leq k) = \left(\frac{k}{6}\right)^3
   \]
   This gives the probability that all three dice show values less than or equal to \( k \).

3. **Calculate the probability of the maximum value being exactly \( k \)**:
   \[
   P(X_{\text{max}} = k) = P(X_{\text{max}} \leq k) - P(X_{\text{max}} \leq k-1)
   \]
   Thus, 
   \[
   P(X_{\text{max}} = k) = \left(\frac{k}{6}\right)^3 - \left(\frac{k-1}{6}\right)^3
   \]
   This applies for \( k = 1, 2, 3, 4, 5, 6 \).

### Step 2: Perform the Expected Value Calculation

The expected maximum value \( E[X_{\text{max}}] \) is calculated us

## Visualizing the Reasoning Tree

### Installation of Graphviz

To visualize the reasoning tree, you need to install Graphviz. Note that `pip install graphviz` installs only the Python bindings; the Graphviz system package, which provides the `dot` executable, must also be present. On some operating systems you may need to download and install it manually.

`pip install graphviz`
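A quick way to check whether the Graphviz binaries are available is to look for the `dot` executable on the `PATH`. This is a small convenience sketch using only the standard library.

```python
import shutil

dot_path = shutil.which("dot")  # `dot` is Graphviz's layout program
if dot_path is None:
    print("Graphviz 'dot' executable not found on PATH; install Graphviz with your system package manager.")
else:
    print(f"Graphviz found at {dot_path}")
```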

To save the visualization as "tree_of_thoughts.png", run the following command:
```python
from autogen.agentchat.contrib.reasoning_agent import visualize_tree

visualize_tree(mcts_agent._root)
```

## Utilizing ReasoningAgent for Nested Chat Interactions

In this example, we will explore how the ReasoningAgent can be employed to facilitate nested chat interactions, specifically for writing a blog post about NVIDIA. The agent will engage in a structured dialogue to enhance the quality of the content through iterative feedback and reasoning.

### Task: Writing a Blog Post on NVIDIA

The goal is to generate a concise yet engaging blog post about NVIDIA. For simplicity, the process uses a single conversation turn in which the agent reflects on the content, reasons about improvements, and incorporates user feedback. Increase the `max_turns` parameter to run more rounds.

**WARNING:** It may take a long time to run this example (up to 10 minutes).

In [25]:
writer = AssistantAgent(
    name="Writer",
    llm_config={"config_list": config_list},
    system_message="""
    You are a professional writer, known for your insightful and engaging articles.
    You transform complex concepts into compelling narratives.
    You should improve the quality of the content based on the feedback from the user.
    """,
)
reason_agent_for_writer = ReasoningAgent(
    name="reason_agent",
    llm_config={"config_list": config_list},
    reason_config={"method": "lats", "nsim": 2, "max_depth": 3},
)


def reflection_message(recipient, messages, sender, config):
    print("Reflecting...")
    return f"Reflect, Reason and provide critique on the following writing. \n\n {recipient.chat_messages_for_summary(sender)[-1]['content']}"

In [26]:
user_proxy.register_nested_chats(
    [
        {
            "recipient": reason_agent_for_writer,
            "message": reflection_message,
            "summary_method": "last_msg",
            "max_turns": 1,
        }
    ],
    trigger=writer,
)

In [27]:
task = """Write a concise but engaging blogpost about Nvidia."""
res = user_proxy.initiate_chat(recipient=writer, message=task, max_turns=2, summary_method="last_msg")

user_proxy (to Writer):

Write a concise but engaging blogpost about Nvidia.

--------------------------------------------------------------------------------
Writer (to user_proxy):

**Title: Nvidia: The Powerhouse of Visual Computing and AI Innovation**

In a world increasingly defined by digital experiences, Nvidia stands as a titan, driving the future of technology with its groundbreaking advancements in graphics processing. Established in 1993, Nvidia has evolved from a graphics card manufacturer into a leader in AI, gaming, and deep learning.

At the heart of Nvidia’s success is its Graphics Processing Unit (GPU), a marvel of engineering that has transformed not just gaming but industries ranging from film to healthcare. The iconic GeForce series has become synonymous with high-performance gaming, delivering stunning graphics that bring virtual worlds to life. However, Nvidia's impact extends far beyond the gaming realm; their GPUs power some of the most complex

In [28]:
print(res.summary)

Thank you for your thoughtful feedback! I appreciate your constructive insights and will revise the blog post to incorporate your suggestions. Here's the improved version:

---

**Title: Nvidia: The Powerhouse of Visual Computing and AI Innovation**

In a world increasingly defined by digital experiences, Nvidia stands as a titan, driving the future of technology with its groundbreaking advancements in graphics processing. Established in 1993, Nvidia has evolved from a graphics card manufacturer into a leader in AI, gaming, and deep learning.

At the heart of Nvidia’s success is its Graphics Processing Unit (GPU), a marvel of engineering that has transformed not just gaming but industries ranging from film to healthcare. The iconic GeForce series has become synonymous with high-performance gaming, delivering stunning graphics that bring virtual worlds to life. In healthcare, for example, Nvidia's GPUs power complex medical imaging analysis, enabling faster and more accurate diagnoses. 

## Use a Different Model for Grading

To grade with a model other than the one in `llm_config`, pass the `grader_llm_config` argument when initializing the `ReasoningAgent`. Trajectories are then graded with that configuration (here built from `grader_config_list`), separately from the main `llm_config`.

In [29]:
grader_config_list = [{"model": "gpt-4o-mini", "api_key": api_key}]

grader_llm_config = {"config_list": grader_config_list}

writer = AssistantAgent(
    name="Writer",
    llm_config={"config_list": config_list},
    system_message="""
    You are a professional writer, known for your insightful and engaging articles.
    You transform complex concepts into compelling narratives.
    You should improve the quality of the content based on the feedback from the user.
    """,
)
reason_agent_for_writer = ReasoningAgent(
    name="reason_agent",
    llm_config={"config_list": config_list},
    grader_llm_config=grader_llm_config,
    reason_config={"method": "lats", "nsim": 2, "max_depth": 3},
)

## Save Data for Future Training
This section saves the reasoning agent's decision-making data for use in future training.
Capturing the structure and content of the reasoning tree yields a dataset for analyzing the agent's reasoning patterns
and for improving the quality of its responses.
The saved data can feed training methods such as supervised fine-tuning and reinforcement learning.

In [30]:
import json

In [31]:
data = reason_agent._root.to_dict()
with open("reasoning_tree.json", "w") as f:
    json.dump(data, f)

# recover the node
with open("reasoning_tree.json", "r") as f:
    new_node = ThinkNode.from_dict(json.load(f))

In [32]:
from autogen.agentchat.contrib.reasoning_agent import extract_rlhf_preference_dataset, extract_sft_dataset

sft_data = extract_sft_dataset(reason_agent._root)
rlhf_data = extract_rlhf_preference_dataset(reason_agent._root)

In [33]:
print(rlhf_data)

[{'instruction': '# Question:\nexitcode: 0 (execution succeeded)\nCode output: \nThe empirical expected maximum value when rolling a 6-sided die three times is: 4.9588\n---\n', 'reflection': "The previous steps seem to have accurately calculated an empirical expected maximum value for rolling a 6-sided die three times, resulting in a value of 4.9588. However, without further context, it's unclear if additional analyses or validation checks are necessary, or if future steps should focus on practical applications of this calculation such as simulations or predictions.", 'preferred_response': 'Step 1: Validate the empirical expected maximum value calculation using a different method or simulation to ensure accuracy.', 'dispreferred_response': 'Step 1: Present the findings to stakeholders or a broader audience for their feedback and insights.'}, {'instruction': '# Question:\nexitcode: 0 (execution succeeded)\nCode output: \nThe empirical expected maximum value when rolling a 6-sided die th
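Preference records shaped like the output above can be stored one per line in a JSON Lines file, a format many fine-tuning pipelines accept. The sketch below uses a placeholder record with the same four keys; the file name and the shortened values are illustrative assumptions.

```python
import json

# One record with the same keys as the printed `rlhf_data`; values are shortened placeholders.
sample_records = [
    {
        "instruction": "# Question:\n...",
        "reflection": "...",
        "preferred_response": "Step 1: Validate the result with a second method.",
        "dispreferred_response": "Step 1: Present the findings to stakeholders.",
    }
]

# The file name is an arbitrary choice for this sketch.
with open("rlhf_preferences.jsonl", "w") as f:
    for record in sample_records:
        f.write(json.dumps(record) + "\n")

# Read the file back to confirm that the round trip preserves the records.
with open("rlhf_preferences.jsonl") as f:
    reloaded = [json.loads(line) for line in f]
```

Replacing `sample_records` with the real `rlhf_data` list writes the full dataset in the same way.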

## Utilizing Ground Truth to Enhance Training Data Generation

Access to ground truth answers allows us to improve the evaluation of reasoning paths. In this section, we will explore:
- The process of incorporating ground truth into prompts
- The methods by which the agent leverages ground truth for evaluation
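
As a rough sketch of the first point, the ground truth can be appended to the question after a `GROUND_TRUTH:` marker, which is the layout used by the prompt in this section. The helpers `build_prompt` and `split_prompt` are hypothetical and only illustrate the convention; how `ReasoningAgent` handles the marker internally is not shown here. Writing the ground truth as a raw string keeps LaTeX backslashes such as `\frac` intact.

```python
MARKER = "GROUND_TRUTH:"


def build_prompt(question_text: str, ground_truth: str) -> str:
    """Append the ground truth to the question after the marker line."""
    return f"{question_text.strip()}\n\n{MARKER}\n{ground_truth.strip()}\n"


def split_prompt(prompt_text: str):
    """Return (question, ground_truth); ground_truth is None if no marker is present."""
    question_part, sep, truth_part = prompt_text.partition(MARKER)
    return question_part.strip(), (truth_part.strip() if sep else None)


question_text = "What is the expected maximum dice value if you can roll a 6-sided dice three times?"
# Raw string: "\f" and "\r" would otherwise be read as form feed and carriage return.
truth_text = r"E(X) = \frac{119}{24}, approximately 4.9583."

full_prompt = build_prompt(question_text, truth_text)
recovered_question, recovered_truth = split_prompt(full_prompt)
```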

In [34]:
# Raw string so that LaTeX backslashes (e.g. "\f" in \frac, "\r" in \right) are not treated as escape sequences.
prompt = r"""What is the expected maximum dice value if you can roll a 6-sided dice three times?

GROUND_TRUTH:
We define X as the highest outcome among the three rolls.
The probability that X is at least m is 1 - \left(\frac{m-1}{6}\right)^3 for each m from 1 to 6.
Summing these probabilities gives the expectation E(X) = \sum_{m=1}^{6} [1 - (\frac{m-1}{6})^3].
Calculating this sum results in E(X) = 6 - \frac{225}{216} = \frac{119}{24}, which approximates to 4.9583.
Therefore, the expected maximum value when rolling a six-sided die three times is \frac{119}{24} or approximately 4.9583.
"""
random.seed(1)  # set the seed for reproducibility

mcts_agent2 = ReasoningAgent(
    name="mcts_agent",
    system_message="answer math questions",
    llm_config={"config_list": config_list},
    # Use a small depth and few simulations to keep the output concise.
    reason_config={"method": "mcts", "nsim": 5, "max_depth": 4},
)


user_proxy = UserProxyAgent(
    name="user_proxy",
    human_input_mode="NEVER",
    code_execution_config=False,
    max_consecutive_auto_reply=10,
)


ans = user_proxy.initiate_chat(mcts_agent2, message=prompt, summary_method=last_meaningful_msg)

user_proxy (to mcts_agent):

What is the expected maximum dice value if you can roll a 6-sided dice three times?

GROUND_TRUTH:
We define X as the highest outcome among the three rolls.
The probability that X is at least m is 1 - \left(\frac{m-1}{6}\right)^3 for each m from 1 to 6.
Summing these probabilities gives the expectation E(X) = \sum_{m=1}^{6} [1 - (\frac{m-1}{6})^3].
Calculating this sum results in E(X) = 6 - \frac{225}{216} = \frac{119}{24}, which approximates to 4.9583.
Therefore, the expected maximum value when rolling a six-sided die three times is \frac{119}{24} or approximately 4.9583.


--------------------------------------------------------------------------------
mcts_agent (to tot_thinker):

# Question:
What is the expected maximum dice value if you can roll a 6-sided dice three times?
---

---
What are the possible next steps?

--------------------------------------------------------------------------------
tot_thinker (to mcts_agent):

REFLECTION:
The previous steps do not 

In [35]:
print(ans.summary)

To find the expected maximum value when rolling a 6-sided die three times, we can follow through the systematic steps you've outlined. Let's perform the calculations step by step:

### Step 1: Understanding the Problem
When you roll a die three times, the possible outcomes for each roll range from 1 to 6. We want to calculate the expected value of the highest number rolled in those three attempts.

### Step 2: Calculating the Probability of Each Maximum

1. **Maximum is 1**: 
   - Probability that all three rolls are 1: \( P(\text{max} = 1) = \left(\frac{1}{6}\right)^3 = \frac{1}{216} \)

2. **Maximum is 2**: 
   - Probability that at least one roll is 2, and none are greater than 2: 
   \[
   P(\text{max} = 2) = \left(\frac{2}{6}\right)^3 - \left(\frac{1}{6}\right)^3 = \frac{8}{216} - \frac{1}{216} = \frac{7}{216}
   \]

3. **Maximum is 3**:
   - Probability: 
   \[
   P(\text{max} = 3) = \left(\frac{3}{6}\right)^3 - \left(\frac{2}{6}\right)^3 = \frac{27}{216} - \frac{8}{216} = \frac{
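
The ground-truth value can be checked independently with exact arithmetic. The sketch below is not part of the agent's output; it uses Python's `fractions` module and the identity P(max = k) = (k/6)^3 - ((k-1)/6)^3 from the steps above, then cross-checks the result against the tail-sum formula given in the ground truth.

```python
from fractions import Fraction

SIDES, ROLLS = 6, 3


def prob_max_equals(k):
    """P(max = k): all rolls are <= k, minus the case where all rolls are <= k - 1."""
    return Fraction(k, SIDES) ** ROLLS - Fraction(k - 1, SIDES) ** ROLLS


# E[X] from the probability mass function of the maximum.
expected_from_pmf = sum(k * prob_max_equals(k) for k in range(1, SIDES + 1))

# E[X] from the tail-sum formula: sum over m of P(X >= m).
expected_from_tail = sum(1 - Fraction(m - 1, SIDES) ** ROLLS for m in range(1, SIDES + 1))

print(expected_from_pmf, float(expected_from_pmf))  # 119/24, about 4.9583
```

Both routes give 119/24, which matches the ground truth supplied in the prompt.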

## Forest of Thoughts

The "Forest of Thoughts" approach applies a bootstrapping idea: it runs the tree of thoughts several times independently, which produces a diverse set of answers, and then aggregates them into the final answer. The number of trees is set with `forest_size` in `reason_config`.
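
One simple way to picture the aggregation step is a majority vote over the final answers of the individual trees. The sketch below only illustrates that idea, under the assumption that each tree yields a short, comparable answer string. `aggregate_by_vote` and the sample answers are hypothetical, and `ReasoningAgent` may combine the trees differently.

```python
from collections import Counter


def aggregate_by_vote(answers):
    """Return the most common answer and the fraction of runs that agree with it."""
    if not answers:
        raise ValueError("need at least one answer")
    # Normalize whitespace and case so trivially different strings count as equal.
    normalized = [a.strip().lower() for a in answers]
    winner, count = Counter(normalized).most_common(1)[0]
    return winner, count / len(normalized)


# Hypothetical final answers from three independent trees.
tree_answers = ["119/24", "119/24 ", "4.5"]
final_answer, agreement = aggregate_by_vote(tree_answers)
```

The agreement ratio gives a rough confidence signal: a low value suggests the trees disagree and the question may need more depth or more trees.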

In [36]:
forest_agent = ReasoningAgent(
    name="mcts_agent",
    system_message="answer math questions",
    llm_config={"config_list": config_list},
    # Use a small depth for conciseness; forest_size=3 runs three independent trees.
    reason_config={"method": "dfs", "max_depth": 4, "forest_size": 3},
)


user_proxy = UserProxyAgent(
    name="user_proxy",
    human_input_mode="NEVER",
    code_execution_config=False,
    max_consecutive_auto_reply=10,
)

In [37]:
ans = user_proxy.initiate_chat(forest_agent, message=question, summary_method=last_meaningful_msg)

user_proxy (to mcts_agent):

What is the expected maximum dice value if you can roll a 6-sided dice three times?

--------------------------------------------------------------------------------
mcts_agent (to tot_thinker):

# Question:
What is the expected maximum dice value if you can roll a 6-sided dice three times?
---

---
What are the possible next steps?

--------------------------------------------------------------------------------
tot_thinker (to mcts_agent):

REFLECTION:
The previous steps do not exist, indicating no errors or successes to analyze directly. To answer the user’s question effectively, it’s important to break down the expected maximum dice value calculation and consider how multiple rolls of a six-sided die can influence that value.

**Possible Options:**
Option 1: Calculate the expected value of the maximum roll after rolling a 6-sided die three times using probability theory. This involves determining the likelihood of each possibl

In [38]:
print(ans.summary)

To calculate the expected maximum value when rolling a 6-sided die three times, we need to evaluate the probability of each possible maximum value. Here’s a structured approach to the problem, and then I’ll present varied student responses.

### Calculation Overview:

1. **Define the Problem**:
   Let \( X \) be the maximum of three rolls of a 6-sided die. The possible values of \( X \) are 1, 2, 3, 4, 5, and 6.

2. **Calculate Probabilities**:
   - **\( P(X = 1) \)**: All rolls are 1.  
     \[
     P(X = 1) = \left(\frac{1}{6}\right)^3 = \frac{1}{216}
     \]

   - **\( P(X = 2) \)**: At least one roll is 2, and none are greater than 2. The probability all rolls are less than or equal to 2:  
     \[ 
     P(\text{max} < 2) = \left(\frac{1}{3}\right)^3 = \frac{1}{27} \Rightarrow P(X = 2) = 1 - P(X < 2) - P(X = 1) = \frac{151}{216}
     \]

   - **\( P(X = 3) \)**: At least one roll is 3, none greater than 3:  
     \[ 
     P(X < 3) = \left(\frac{1}{2}\right)^3 = \frac{1}{8} \Rightar