# Evaluation of Agent Frameworks on math problems

## Two problems are tested

Problem 1: Completely simplify and rationalize the denominator: $$\frac{\sqrt{160}}{\sqrt{252}}\times\frac{\sqrt{245}}{\sqrt{108}}$$  
Answer: $$\boxed{\frac{5\sqrt{42}}{27}}$$
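
As a quick sanity check on the boxed answer, note that both the expression and the answer are positive, so comparing their squares as exact fractions is sufficient. The sketch below uses only the standard library; the variable names are illustrative and not part of the original notebook.

```python
from fractions import Fraction

# Square of the original expression: (160/252) * (245/108).
squared_expression = Fraction(160, 252) * Fraction(245, 108)

# Square of the claimed answer 5*sqrt(42)/27: (25 * 42) / 27**2.
squared_answer = Fraction(5**2 * 42, 27**2)

# Both quantities are positive, so equal squares imply equal values.
print(squared_expression, squared_answer)  # 350/243 350/243
```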


Problem 2: There are two cameras that take pictures of a traffic intersection. Camera A starts taking pictures at $6$ AM and takes a picture every $11$ minutes. Camera B starts taking pictures at $7$ AM and takes pictures every $7$ minutes. Camera A and Camera B take a picture at the same time at four different times before noon. When Camera A and Camera B take their last picture together, how many minutes before noon is it? 
Answer: $$\boxed{41}$$
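
The reference answer can be reproduced by brute force: list each camera's shot times in minutes after midnight and intersect them. This is a minimal sketch; `shared_times` is a hypothetical helper introduced here for illustration, and "before noon" is taken to exclude noon itself.

```python
def shared_times(start_a, step_a, start_b, step_b, end):
    """Return the sorted minutes (after midnight) at which both cameras fire before `end`."""
    times_a = set(range(start_a, end, step_a))
    times_b = set(range(start_b, end, step_b))
    return sorted(times_a & times_b)

NOON = 12 * 60

# Camera A: from 6:00 every 11 minutes; Camera B: from 7:00 every 7 minutes.
shared = shared_times(6 * 60, 11, 7 * 60, 7, NOON)
minutes_before_noon = NOON - shared[-1]

print(shared)               # [448, 525, 602, 679], i.e. 7:28, 8:45, 10:02, 11:19
print(minutes_before_noon)  # 41
```

The four coincidences are 77 minutes apart (the LCM of 7 and 11), but the first one falls at 7:28, not at 7:00. A solution that only counts multiples of the LCM from 7 AM therefore misses the offset.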

In [3]:
math_problem1 = "Completely simplify and rationalize the denominator: $$\\frac{\\sqrt{160}}{\\sqrt{252}}\\times\\frac{\\sqrt{245}}{\\sqrt{108}}$$"
math_problem2 = "There are two cameras that take pictures of a traffic intersection. Camera A starts taking pictures at $6$ AM and takes a picture every $11$ minutes. Camera B starts taking pictures at $7$ AM and takes pictures every $7$ minutes. Camera A and Camera B take a picture at the same time at four different times before noon. When Camera A and Camera B take their last picture together, how many minutes before noon is it?"

## Use Agent Chat

In [2]:
from flaml.autogen.agentchat import AssistantAgent, UserProxyAgent
from flaml.autogen import oai
import os
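# read the API key with a context manager so the file handle is closed promptly
with open("key_openai.txt", "r") as key_file:
    os.environ["OPENAI_API_KEY"] = key_file.read().strip()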
config_list = oai.config_list_gpt4_gpt35()

# create an AssistantAgent named "assistant"
assistant = AssistantAgent(
    "assistant",
    llm_config={
        "seed": 42,
        "config_list": config_list,
    }
)
# create a UserProxyAgent instance named "user"
user = UserProxyAgent(
    "user",
    human_input_mode="NEVER",
    max_consecutive_auto_reply=10,
    # stop when the reply ends with TERMINATE or with the quoted form "TERMINATE".
    is_termination_msg=lambda x: x.get("content", "").rstrip().endswith(("TERMINATE", '"TERMINATE".')),
    code_execution_config={
        "work_dir": "coding",
        "use_docker": False,  # set to True or image name like "python:3" to use docker
    },
)

import flaml
print(flaml.__version__)

2.0.2
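
The `is_termination_msg` lambda passed to `UserProxyAgent` above is dense, so here is a sketch of the same check as a named function that can be tried on sample messages without calling any model. The function name, the constant, and the handling of a `None` content are assumptions made for illustration; the original lambda would raise on `None`.

```python
from typing import Optional

# The two endings the notebook treats as a termination signal.
TERMINATION_SUFFIXES = ("TERMINATE", '"TERMINATE".')

def is_termination_msg(message: dict) -> bool:
    """Return True when the message content ends with a termination marker."""
    content: Optional[str] = message.get("content", "")
    # Assumption: a missing or None content is treated as "not a termination message".
    return (content or "").rstrip().endswith(TERMINATION_SUFFIXES)

print(is_termination_msg({"content": "The answer is 41.\nTERMINATE  "}))  # True
print(is_termination_msg({"content": "Let me run the code first."}))      # False
```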


In [3]:
# the assistant receives a message from the user, which contains the task description
assistant.reset()
user.initiate_chat(
    assistant,
    message=math_problem1,
)
assistant.reset()
user.initiate_chat(
    assistant,
    message=math_problem2,
)

user (to assistant):

Completely simplify and rationalize the denominator: $$\frac{\sqrt{160}}{\sqrt{252}}\times\frac{\sqrt{245}}{\sqrt{108}}$$

--------------------------------------------------------------------------------
assistant (to user):

First, let's simplify each square root to its simplest radical form by dividing by appropriate square numbers.

Given:

$\frac{\sqrt{160}}{\sqrt{252}}\times\frac{\sqrt{245}}{\sqrt{108}}$

= $\frac{\sqrt{16\times10}}{\sqrt{36\times7}}\times\frac{\sqrt{49\times5}}{\sqrt{36\times3}}$

= $\frac{4\sqrt{10}}{6\sqrt{7}}\times\frac{7\sqrt{5}}{6\sqrt{3}}$

= $\frac{2\sqrt{10}}{3\sqrt{7}}\times\frac{7\sqrt{5}}{6\sqrt{3}}$

The square root of a product can be written as the product of the square roots:

= $\frac{2\sqrt{10}\times7\sqrt{5}}{3\sqrt{7}\times6\sqrt{3}}$

I'll put all this in python code and compute the final solution.
```python
import sympy as sp

# Rationalize the denominators
expression = (2*sp.sqrt(10)*7*sp.sqrt(5))/(3*sp.sqrt(7)*6*sp.sqr
```

In [12]:
assistant = AssistantAgent(
    "assistant",
    llm_config={
        "seed": 43,
        "config_list": config_list,
    }
)

assistant.reset()
user.initiate_chat(
    assistant,
    message=math_problem1,
)

assistant.reset()
user.initiate_chat(
    assistant,
    message=math_problem2,
)

user (to assistant):

Completely simplify and rationalize the denominator: $$\frac{\sqrt{160}}{\sqrt{252}}\times\frac{\sqrt{245}}{\sqrt{108}}$$

--------------------------------------------------------------------------------
assistant (to user):

To simplify and rationalize the given expression, here is the Python code we can use, implementing the sympy library for symbolic computation:

```python
# Import the necessary library
from sympy import sqrt, simplify

# Define the expression 
expression = (sqrt(160) / sqrt(252)) * (sqrt(245) / sqrt(108))

# Simplify the expression
simplified_expression = simplify(expression)

# Print the result
print(simplified_expression)
```
For this Python code to work, make sure that you have sympy library installed. If not, you can install it by using the command: `pip install sympy` in your terminal/command prompt.

--------------------------------------------------------------------------------

>>>>>>>> EXECUTING CODE BLOCK 0 (inferred language is py

In [9]:
assistant = AssistantAgent(
    "assistant",
    llm_config={
        "seed": 44,
        "config_list": config_list,
    }
)

assistant.reset()
user.initiate_chat(
    assistant,
    message=math_problem1,
)

assistant.reset()
user.initiate_chat(
    assistant,
    message=math_problem2,
)

user (to assistant):

Completely simplify and rationalize the denominator: $$\frac{\sqrt{160}}{\sqrt{252}}\times\frac{\sqrt{245}}{\sqrt{108}}$$

--------------------------------------------------------------------------------
assistant (to user):

We can solve this task by using python library called `sympy`. In SymPy, `sympy.Rational` and `sympy.sqrt` creates a number that will not be evaluated until necessary. This will allow us to perform exact computations and get an exact output, rather than an approximation.

Here is the Python code to simplify and rationalize this equation:

```python
from sympy import sqrt, simplify

# define the numbers
num1 = sqrt(160)
den1 = sqrt(252)
num2 = sqrt(245)
den2 = sqrt(108)

# perform the multiplication and simplify
result = simplify((num1/den1) * (num2/den2))

print(result)
```

This script will print out the simplified and rationalized form of the given expression.

--------------------------------------------------------------------------------

## Use LangChain ReAct + PythonREPL

In [5]:
from langchain.chat_models import ChatOpenAI
with open("key_openai.txt", "r") as key_file:
    api_key = key_file.read().strip()
llm = ChatOpenAI(model_name='gpt-4-0613', openai_api_key=api_key)

from langchain.agents import initialize_agent
from langchain.agents import AgentType
from langchain.tools.python.tool import PythonREPLTool
from langchain.agents.agent_toolkits import create_python_agent

agent = create_python_agent(
    llm=llm,
    tool=PythonREPLTool(),
    verbose=True,
    agent_type=AgentType.ZERO_SHOT_REACT_DESCRIPTION, 
)

In [4]:
result = agent.run(math_problem1)

> Entering new AgentExecutor chain...

Python REPL can execute arbitrary code. Use with caution.

To simplify this expression, I will first simplify each square root individually, then multiply the fractions as normal. The result should be a rational number.
Action: Python_REPL
Action Input: import math

# Calculate square roots
sqrt_160 = math.sqrt(160)
sqrt_252 = math.sqrt(252)
sqrt_245 = math.sqrt(245)
sqrt_108 = math.sqrt(108)

# Multiply fractions
numerator = sqrt_160 * sqrt_245
denominator = sqrt_252 * sqrt_108
result = numerator / denominator

print(result)
Observation: 1.200137166371826
Thought: I now know the final answer
Final Answer: The simplified and rationalized form of the expression is approximately 1.200137166371826.

> Finished chain.
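
The value returned in this run is a floating-point approximation rather than the simplified radical form the problem asks for. The sketch below, which uses only the standard library and illustrative variable names, checks that the number agrees numerically with the reference answer $\frac{5\sqrt{42}}{27}$ even though it is not in the requested form.

```python
import math

agent_output = 1.200137166371826       # value printed by the agent run above
exact_answer = 5 * math.sqrt(42) / 27   # floating-point value of the reference answer

# Compare with a relative tolerance instead of == to allow for rounding differences.
matches_numerically = math.isclose(agent_output, exact_answer)
print(matches_numerically)
```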


In [36]:
result = agent.run(math_problem1)

> Entering new AgentExecutor chain...
First, I should simplify each square root separately, then I will simplify the entire equation.
Action: [Python_REPL]
Action Input: 
```python
import math

root_160 = math.sqrt(160)
root_252 = math.sqrt(252)
root_245 = math.sqrt(245)
root_108 = math.sqrt(108)

print(root_160, root_252, root_245, root_108)
```
Observation: [Python_REPL] is not a valid tool, try another one.
Thought: The Python_REPL tool is not available. I will have to solve this manually. First, I need to simplify each square root individually. It may be easier to break each square root into its prime factors to simplify.
Action: [None]
Action Input: [None]
Observation: [None] is not a valid tool, try another one.
Thought: I will start by breaking each number down into its prime factors and then simplify the square roots.
Action: [None]
Action Input: [None]
Observation: [None] is not a valid tool, try another one.
Thought

In [37]:
result = agent.run(math_problem1)

> Entering new AgentExecutor chain...
First, I need to simplify each square root in the expression individually. I can do this by finding the prime factorization of each number under the square root, then taking the square root of each factor. If a factor appears twice, it can be taken out from under the square root. Let's start by simplifying the square roots.
Action: Python_REPL
Action Input: 
```python
import math

def simplify_sqrt(n):
    # Find the prime factors of the number
    factors = []
    while n % 2 == 0:
        factors.append(2)
        n = n / 2
    for i in range(3, int(math.sqrt(n)) + 1, 2):
        while n % i == 0:
            factors.append(i)
            n = n / i
    if n > 2:
        factors.append(n)
    
    # Simplify the square root
    sqrt_factors = []
    for i in range(0, len(factors), 2):
        if i + 1 < len(factors) and factors[i] == factors[i + 1]:
            sqrt_factors.append(factors[i])
        else:
            sqrt_f
```

In [17]:
result = agent.run(math_problem2)

> Entering new AgentExecutor chain...

The question involves finding the least common multiple (LCM) of the two camera intervals (11 minutes and 7 minutes) and then determining how many times this LCM fits into the time period from 6 AM to noon, with the constraint that Camera B starts at 7 AM.

First, I'll find the LCM of 7 and 11. Then, I'll find out how many times this LCM fits into the time period from 7 AM to noon. Since we know that the cameras take a picture at the same time at four different times before noon, I can subtract the number of minutes corresponding to three instances of the LCM from the time at noon (12 PM) to get the time for the last picture.
Action: Python_REPL
Action Input: 
```python
import math

# Find the LCM of 7 and 11
lcm = math.lcm(7, 11)
print(f"LCM of 7 and 11: {lcm} minutes")

# Find out how many times the LCM fits into the time period from 7 AM to noon
minutes_from_7_to_noon = (12-7)*60
fits = minutes_from_7_to_noon // lcm
print(f"LCM fits into the time period from 7 AM to no
```

In [18]:
result = agent.run(math_problem2)

> Entering new AgentExecutor chain...
The problem involves finding the least common multiple (LCM) of the time intervals of the two cameras and then finding out when they would take the last picture together before noon.

The first step is to find the LCM of 11 and 7, which will give us the interval at which both cameras take pictures together. 

Next, we calculate how many times this interval fits into the time between 7 AM (when both cameras are taking pictures) and noon, which is 5 hours or 300 minutes.

Then, we multiply the number of intervals by the length of the interval to find out when the last picture is taken.

Finally, we subtract this from 300 to find out how many minutes before noon this is. 

We can solve this using Python. 

Action: Python_REPL
Action Input: 
```python
import math

# Define the picture-taking intervals of the two cameras
interval_A = 11
interval_B = 7

# Calculate the least common multiple (LCM) of the two intervals
lcm = interva
```

In [19]:
result = agent.run(math_problem2)

> Entering new AgentExecutor chain...
To solve this problem, we need to calculate the times when both cameras take pictures. We know that Camera A starts at 6 AM and takes a picture every 11 minutes, and Camera B starts at 7 AM and takes a picture every 7 minutes. We are looking for the fourth time these cameras take a picture at the same time. This is a least common multiple problem where we are trying to find the fourth least common multiple of 11 (Camera A's frequency) and 7 (Camera B's frequency), but we need to consider the starting times of each camera. Therefore, we'll calculate the least common multiple of 11 and 7, then multiply it by 4 to get the fourth occurrence, then subtract the starting times of each camera to find the exact time before noon. To implement this in Python, we need to first calculate the least common multiple (LCM) of 11 and 7.
Action: Python_REPL
Action Input: 
```python
def gcd(a,b):
    if b==0:
        return a
    else:
        r
```

## Auto-GPT

Auto-GPT fails to solve either problem; the user terminated both runs.


```
Welcome back!  Would you like me to return to being MathSolverGPT?
  Asking user via keyboard...
Continue with the last settings?
Name:  MathSolverGPT
Role:  an intelligent math-solving AI that assists students, professionals, and enthusiasts in solving a wide range of math problems, from basic arithmetic to
Goals: ['Provide step-by-step solutions to math problems, ensuring clarity and understanding for the user.', 'Offer explanations and insights into the underlyi
API Budget: infinite
Continue (y/n): y
NOTE:All files/directories created by this agent can be found inside its workspace at:  /app/autogpt/workspace/auto_gpt_workspace
MathSolverGPT  has been created with the following details:
Name:  MathSolverGPT
Role:  an intelligent math-solving AI that assists students, professionals, and enthusiasts in solving a wide range of math problems, from basic arithmetic to advanced calculus and beyond.
Goals: 
-  Provide step-by-step solutions to math problems, ensuring clarity and understanding for the user.
-  Offer explanations and insights into the underlying concepts and principles behind the solutions, fostering a deeper understanding of mathematics.
-  Handle a variety of math problem types, including algebra, geometry, trigonometry, calculus, and more.
-  Assist in solving complex and challenging math problems, pushing the boundaries of mathematical knowledge and problem-solving capabilities.
-  Continuously learn and improve by incorporating user feedback and staying up-to-date with the latest advancements in mathematics.
Using memory of type:  JSONFileMemory
Using Browser:  chrome
MATHSOLVERGPT THOUGHTS:  Without a specific task or question, it's difficult to determine exactly one command to use. The user hasn't provided any context or problem to solve.
REASONING:  The user's request is too vague. They haven't specified what they want me to do or what problem they want me to solve.
PLAN: 
-  Ask the user for more information or a specific task.
CRITICISM:  I should have asked the user for more information or a specific task in the first place.
SPEAK:  Could you please provide more information or specify a task?

NEXT ACTION:  COMMAND =  ARGUMENTS = {}
  Enter 'y' to authorise command, 'y -N' to run N continuous commands, 'n' to exit program, or enter feedback for MathSolverGPT...
  Asking user via keyboard...
Input:Completely simplify and rationalize the denominator: $$\frac{\sqrt{160}}{\sqrt{252}}\times\frac{\sqrt{245}}{\sqrt{108}}$$
SYSTEM:  Human feedback: Completely simplify and rationalize the denominator: $$\frac{\sqrt{160}}{\sqrt{252}}\times\frac{\sqrt{245}}{\sqrt{108}}$$
MATHSOLVERGPT THOUGHTS:  To solve this problem, I need to simplify and rationalize the denominator of the given expression. This involves simplifying the square roots and then multiplying the fractions. I can use Python's math library to calculate the square roots and perform the multiplication.
REASONING:  The problem involves simplifying square roots and multiplying fractions. Python's math library provides functions for these operations.
PLAN: 
-  Write a Python script to calculate the square roots and multiply the fractions.
-  Execute the script to get the simplified expression.
CRITICISM:  I need to ensure that I correctly calculate the square roots and perform the multiplication. I should also make sure to rationalize the denominator, as the problem specifies.
SPEAK:  Let's simplify and rationalize the denominator of the given expression.

NEXT ACTION:  COMMAND = execute_python_code ARGUMENTS = {'code': 'import math\n\n# Calculate the square roots\nsqrt_160 = math.sqrt(160)\nsqrt_252 = math.sqrt(252)\nsqrt_245 = math.sqrt(245)\nsqrt_108 = math.sqrt(108)\n\n# Multiply the fractions\nresult = (sqrt_160 / sqrt_252) * (sqrt_245 / sqrt_108)\n\n# Simplify the result\nsimplified_result = math.simplify(result)\n\nsimplified_result', 'name': 'simplify_expression'}
  Enter 'y' to authorise command, 'y -N' to run N continuous commands, 'n' to exit program, or enter feedback for MathSolverGPT...
  Asking user via keyboard...
Input:y
-=-=-=-=-=-=-= COMMAND AUTHORISED BY USER -=-=-=-=-=-=-= 
  Executing python file '/app/autogpt/workspace/auto_gpt_workspace/MathSolverGPT/executed_code/simplify_expression.py' in working directory '/app/autogpt/workspace/auto_gpt_workspace'
SYSTEM:  Command execute_python_code returned: Error: Traceback (most recent call last): File "/app/autogpt/workspace/auto_gpt_workspace/MathSolverGPT/executed_code/simplify_expression.py", line 13, in <module> simplified_result = math.simplify(result) AttributeError: module 'math' has no attribute 'simplify'
MATHSOLVERGPT THOUGHTS:  The error message indicates that the 'math' module in Python does not have a 'simplify' function. I need to use the 'sympy' library instead, which provides a function for simplifying expressions.
REASONING:  The 'math' module in Python does not provide a function for simplifying expressions. The 'sympy' library, on the other hand, does provide such a function.
PLAN: 
-  Rewrite the Python script to use the 'sympy' library instead of the 'math' module.
-  Execute the script to get the simplified expression.
CRITICISM:  I should have known that the 'math' module does not provide a function for simplifying expressions. I should have used the 'sympy' library from the start.
SPEAK:  Let's try again using the 'sympy' library to simplify the expression.

NEXT ACTION:  COMMAND = execute_python_code ARGUMENTS = {'code': 'import sympy as sp\n\n# Calculate the square roots\nsqrt_160 = sp.sqrt(160)\nsqrt_252 = sp.sqrt(252)\nsqrt_245 = sp.sqrt(245)\nsqrt_108 = sp.sqrt(108)\n\n# Multiply the fractions\nresult = (sqrt_160 / sqrt_252) * (sqrt_245 / sqrt_108)\n\n# Simplify the result\nsimplified_result = sp.simplify(result)\n\nsimplified_result', 'name': 'simplify_expression'}
  Enter 'y' to authorise command, 'y -N' to run N continuous commands, 'n' to exit program, or enter feedback for MathSolverGPT...
  Asking user via keyboard...
Input:y -5
-=-=-=-=-=-=-= COMMAND AUTHORISED BY USER -=-=-=-=-=-=-= 
  Executing python file '/app/autogpt/workspace/auto_gpt_workspace/MathSolverGPT/executed_code/simplify_expression.py' in working directory '/app/autogpt/workspace/auto_gpt_workspace'
SYSTEM:  Command execute_python_code returned:
MATHSOLVERGPT THOUGHTS:  The command execution didn't return any output, which is unusual. It seems there might be an issue with the execution of the Python code. I need to debug this.
REASONING:  The Python code execution should have returned a result. The absence of a result suggests that there might be an issue with the code or its execution.
PLAN: 
-  Review the Python code to identify any potential issues.
-  If necessary, modify the code and execute it again.
CRITICISM:  I should have ensured that the Python code was correct and would produce a result before executing it.
SPEAK:  It seems there was an issue with the execution of the Python code. Let's try again.

NEXT ACTION:  COMMAND = execute_python_code ARGUMENTS = {'code': 'import sympy as sp\n\n# Calculate the square roots\nsqrt_160 = sp.sqrt(160)\nsqrt_252 = sp.sqrt(252)\nsqrt_245 = sp.sqrt(245)\nsqrt_108 = sp.sqrt(108)\n\n# Multiply the fractions\nresult = (sqrt_160 / sqrt_252) * (sqrt_245 / sqrt_108)\n\n# Simplify the result\nsimplified_result = sp.simplify(result)\n\nsimplified_result', 'name': 'simplify_expression'}

AUTHORISED COMMANDS LEFT: 4 
  Executing python file '/app/autogpt/workspace/auto_gpt_workspace/MathSolverGPT/executed_code/simplify_expression.py' in working directory '/app/autogpt/workspace/auto_gpt_workspace'
SYSTEM:  Command execute_python_code returned:
MATHSOLVERGPT THOUGHTS:  The Python code execution still isn't returning any output. It's possible that the issue lies with the calculation of the square roots or the multiplication of the fractions. I need to isolate these operations to identify the problem.
REASONING: ^C
Aborted!
```
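
One plausible reading of the empty `Command execute_python_code returned:` lines, judging only from the log above, is that the generated script ends with the bare expression `simplified_result`. A bare expression is echoed in a REPL or notebook cell, but prints nothing when a file is run as a script. The sketch below reproduces that behavior with the standard library; `run_as_script` is a hypothetical helper written for this illustration and is not part of Auto-GPT.

```python
import contextlib
import io

def run_as_script(source: str) -> str:
    """Execute `source` the way a script file runs and return what it wrote to stdout."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(compile(source, "<script>", "exec"), {})
    return buffer.getvalue()

# Ends with a bare expression, like the script in the log: nothing is printed.
bare_expression_script = "value = 2 + 3\nvalue"

# Ends with an explicit print: the result reaches stdout.
printing_script = "value = 2 + 3\nprint(value)"

print(repr(run_as_script(bare_expression_script)))  # ''
print(repr(run_as_script(printing_script)))         # '5\n'
```

Under this reading, replacing the final line of the generated script with `print(simplified_result)` would likely have surfaced the result, although the log does not confirm it.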