### Test AI Agent 🤖 Tool Calling with DeepEval 🧪

Testing AI Agent involves testing of the Tools being invoked by an AI Agent. Here, AI Agent will invoke the necessary tools based on the given input and respond with the help of the tools being bounded with the AI Agent


<img src="./img/AIAGent.png" width="800" height="400" style="display: block; margin: auto;">

In [5]:
#!pip install -qU duckduckgo-search


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m24.2[0m[39;49m -> [0m[32;49m25.0.1[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m


In [17]:
from langchain_ollama import ChatOllama

llm = ChatOllama(
    base_url="http://localhost:11434",
    model = "qwen2.5:latest",
    temperature=0.5,
    max_tokens = 250
)

#### AI Agent with Tools

In [19]:
from langchain.tools import tool
from langchain.agents import initialize_agent, AgentType
from langchain_community.tools import DuckDuckGoSearchRun

search_tool = DuckDuckGoSearchRun()

@tool
def add_numbers(a: int, b: int) -> int:
    "Add two numbers and return results."
    return int(a) + int(b)

@tool
def subtract_numbers(a: int, b: int) -> int:
    "Subtract two numbers and return results."
    return int(a) - int(b)

tools = [add_numbers, subtract_numbers, search_tool]

agent = initialize_agent(
    tools= tools,
    llm=llm,
    agent=AgentType.STRUCTURED_CHAT_ZERO_SHOT_REACT_DESCRIPTION,
    verbose=True,
    return_intermediate_steps=True
)

response = agent.invoke("Who is the current president of USA in 2025, just give the name")

print(response)

def query_ai_agent(question):
    response = agent.invoke(question)
    intermediate_steps = response['intermediate_steps']
    agent_action, results = intermediate_steps[0]
    tool = agent_action.tool
    tool_input = agent_action.tool_input
    return response,tool, tool_input,




[1m> Entering new AgentExecutor chain...[0m
[32;1m[1;3mThought: I need to search for the current president of USA in 2025.

Action:
```
{
  "action": "duckduckgo_search",
  "action_input": "current president of USA in 2025"
}
```[0m
Observation: [38;5;200m[1;3mTrump will take office at the presidential inauguration on Monday, 20 January 2025. At that point he will legally assume the role of president. Just how big was Donald Trump's election... Donald Trump has taken the oath of office to become the 47th president of the United States, returning to the White House four years after he left it in defeat. In his inaugural speech on... Donald Trump has officially become the 47th president of the United States. In a special ceremony called an inauguration, President Trump took over from Joe Biden as America's new leader. WASHINGTON − Donald Trump was sworn in Monday as the 47th president of the United States, returning to the White House after overcoming four criminal indictments a

In [80]:
response,tool, tool_input = query_ai_agent("Who is the president of USA in 2025, just give me the name")

print(response)

print(tool)

print(tool_input)



[1m> Entering new AgentExecutor chain...[0m
[32;1m[1;3mThought: I need to search for the current and future presidents to determine who will be the president in 2025.

Action:
```
{
  "action": "duckduckgo_search",
  "action_input": "president of USA in 2025"
}
```[0m
Observation: [38;5;200m[1;3mThe incumbent president is Donald Trump, who assumed office on January 20, 2025. [ 5 ] [ 6 ] Since the office was established in 1789, 45 men have served in 47 presidencies; the discrepancy arises from two individuals elected to non-consecutive terms: Grover Cleveland is counted as the 22nd and 24th president of the United States ... Trump will take office at the presidential inauguration on Monday, 20 January 2025. At that point he will legally assume the role of president. Just how big was Donald Trump's election... Donald Trump has officially become the 47th president of the United States. In a special ceremony called an inauguration, President Trump took over from Joe Biden as Amer

### Testing AI Agent with DeepEval

In [22]:
import deepeval

deepeval.login_with_confident_api_key("chf7LtTWtK1foTOAiK+vHFZ622I16kZtcpzfC+7FAVU=")



In [23]:
!deepeval set-ollama qwen2.5:latest

🙌 Congratulations! You're now using a local Ollama model for all evals that 
require an LLM.


In [81]:
from deepeval.test_case import ToolCall

test_data = [
    {
        "input": "What is the sum of 20 and 40",
        "expected_output": "60",
        "tool_called": [
            ToolCall(name = "add_numbers")
        ]
    },
     {
        "input": "Who is the president of USA in 2025, just give me the name",
        "expected_output": "Donald Trump",
        "tool_called": [
            ToolCall(name = "duckduckgo_search")
        ]
    }
]

In [82]:
test_data

[{'input': 'What is the sum of 20 and 40',
  'expected_output': '60',
  'tool_called': [ToolCall(
       name="add_numbers"
   )]},
 {'input': 'Who is the president of USA in 2025, just give me the name',
  'expected_output': 'Donald Trump',
  'tool_called': [ToolCall(
       name="duckduckgo_search"
   )]}]

In [31]:
test_data[0]['tool_called']

[ToolCall(
     name="add_numbers"
 )]

In [85]:
from deepeval.test_case import LLMTestCase
from deepeval.metrics import ToolCorrectnessMetric

test_cases = []
for testcase in test_data:
  response,tool, tool_input = query_ai_agent(testcase['input'])
  test_case = LLMTestCase(
    input=testcase['input'],
    tools_called=[ToolCall(name=tool)],
    actual_output=response,
    expected_tools=testcase['tool_called']
  )

  test_cases.append(test_case)





[1m> Entering new AgentExecutor chain...[0m
[32;1m[1;3mAction:
```
{
  "action": "add_numbers",
  "action_input": {
    "a": 20,
    "b": 40
  }
}
```[0m
Observation: [36;1m[1;3m60[0m
Thought:[32;1m[1;3mAction:
```
{
  "action": "Final Answer",
  "action_input": "The sum of 20 and 40 is 60."
}
```[0m

[1m> Finished chain.[0m


[1m> Entering new AgentExecutor chain...[0m
[32;1m[1;3mThought: I can use DuckDuckGo to search for this information.

Action:
```
{
  "action": "duckduckgo_search",
  "action_input": "who is the president of USA in 2025"
}
```[0m
Observation: [38;5;200m[1;3mThe incumbent president is Donald Trump, who assumed office on January 20, 2025. [5][6] Since the office was established in 1789, 45 men have served in 47 presidencies; the discrepancy arises from two individuals elected to non-consecutive terms: Grover Cleveland is counted as the 22nd and 24th president of the United States, while Donald Trump ... Trump will take office at the presidenti

In [86]:
test_cases

[LLMTestCase(input='What is the sum of 20 and 40', actual_output={'input': 'What is the sum of 20 and 40', 'output': 'The sum of 20 and 40 is 60.', 'intermediate_steps': [(AgentAction(tool='add_numbers', tool_input={'a': 20, 'b': 40}, log='Action:\n```\n{\n  "action": "add_numbers",\n  "action_input": {\n    "a": 20,\n    "b": 40\n  }\n}\n```'), 60)]}, expected_output=None, context=None, retrieval_context=None, additional_metadata=None, comments=None, tools_called=[ToolCall(
     name="add_numbers"
 )], expected_tools=[ToolCall(
     name="add_numbers"
 )], reasoning=None, name=None),
 LLMTestCase(input='Who is the president of USA in 2025, just give me the name', actual_output={'input': 'Who is the president of USA in 2025, just give me the name', 'output': 'Donald Trump', 'intermediate_steps': [(AgentAction(tool='duckduckgo_search', tool_input='who is the president of USA in 2025', log='Thought: I can use DuckDuckGo to search for this information.\n\nAction:\n```\n{\n  "action": "duc

In [88]:
metrics = ToolCorrectnessMetric()

for testcase in test_cases:
    metrics.measure(test_case=testcase)
    print(metrics.score)
    print(metrics.reason)
    print(metrics.expected_tools)

1.0
All expected tools ['add_numbers'] were called (order not considered).
[ToolCall(
    name="add_numbers"
)]


1.0
All expected tools ['duckduckgo_search'] were called (order not considered).
[ToolCall(
    name="duckduckgo_search"
)]
