Evaluations: record a tool that is used in an AgentTool #3493

@zdenulo

Is your feature request related to a problem? Please describe.

I'm trying to set up and run an evaluation for the following setup:
RootAgent -> AgentTool -> tool

When I create an evaluation in the Web UI, it records only the request to the AgentTool, not the call to the tool that the wrapped agent uses.
When the agent is used as a sub-agent instead of an AgentTool, it works as expected.
I'm using ADK 1.17.0.

Describe the solution you'd like
I would expect the full chain of execution to be recorded, i.e. RootAgent -> AgentTool -> tool.

Describe alternatives you've considered
I'm not sure whether I'm doing something wrong or this behaviour is by design. In my case I have a root agent with multiple AgentTools, and each AgentTool has multiple tools. In my evaluations I would like to test the full trajectory, i.e. whether the root agent calls the correct AgentTool and whether that AgentTool calls the correct tool for various prompts / test cases.
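To make the desired check concrete, here is a minimal sketch of the kind of trajectory assertion described above. The helper `first_mismatch` and the `(author, tool name)` tuple layout are hypothetical and not part of ADK; the names in the example data come from the setup in this issue.

```python
from typing import List, Optional, Tuple

# One trajectory step: (agent that issued the call, name of the tool called).
Step = Tuple[str, str]


def first_mismatch(expected: List[Step], actual: List[Step]) -> Optional[int]:
    """Return the index of the first differing step, or None if both match."""
    for index in range(max(len(expected), len(actual))):
        want = expected[index] if index < len(expected) else None
        got = actual[index] if index < len(actual) else None
        if want != got:
            return index
    return None


# Trajectory the evaluation should be able to assert on.
expected = [
    ("root_agent", "text_analyzer_agent"),
    ("text_analyzer_agent", "count_words"),
]

# What is recorded today when text_analyzer_agent is wrapped in an AgentTool.
recorded_with_agent_tool = [("root_agent", "text_analyzer_agent")]

print(first_mismatch(expected, recorded_with_agent_tool))  # 1: the inner call is missing
print(first_mismatch(expected, expected))  # None
```

With the current recording, the comparison fails at step 1 because the inner `count_words` call never appears in the recorded events.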

Additional context
This is the evaluation config created by the Web UI when using an AgentTool:

{
  "eval_set_id": "tool",
  "name": "tool",
  "eval_cases": [
    {
      "eval_id": "case81d53f",
      "conversation": [
        {
          "invocation_id": "e-10eb2908-18da-4388-92e7-cc82d2b0f0e8",
          "user_content": {
            "parts": [
              {
                "text": "How many words are in the text: Hello world."
              }
            ],
            "role": "user"
          },
          "final_response": {
            "parts": [
              {
                "text": "There are 2 words in the text.\n"
              }
            ],
            "role": "model"
          },
          "intermediate_data": {
            "invocation_events": [
              {
                "author": "root_agent",
                "content": {
                  "parts": [
                    {
                      "function_call": {
                        "id": "adk-6cb02254-f650-4fd5-a63d-cc4697516bb6",
                        "args": {
                          "request": "How many words are in the text: Hello world."
                        },
                        "name": "text_analyzer_agent"
                      }
                    }
                  ],
                  "role": "model"
                }
              },
              {
                "author": "root_agent",
                "content": {
                  "parts": [
                    {
                      "function_response": {
                        "id": "adk-6cb02254-f650-4fd5-a63d-cc4697516bb6",
                        "name": "text_analyzer_agent",
                        "response": {
                          "result": "There are 2 words in the text.\n"
                        }
                      }
                    }
                  ],
                  "role": "user"
                }
              }
            ]
          },
          "creation_timestamp": 1762852203.639618
        }
      ],
      "session_input": {
        "app_name": "eval_demo_agent",
        "user_id": "user"
      },
      "creation_timestamp": 1762852264.835553
    }
  ],
  "creation_timestamp": 1762852229.347887
}

This is the evaluation config created by the Web UI when using a SubAgent:

{
  "eval_set_id": "sub_agent",
  "name": "sub_agent",
  "eval_cases": [
    {
      "eval_id": "caseedc7e8",
      "conversation": [
        {
          "invocation_id": "e-c811df8d-a6f4-4a99-8672-cda07ef3f789",
          "user_content": {
            "parts": [
              {
                "text": "how many words are in this text: \nHello world."
              }
            ],
            "role": "user"
          },
          "final_response": {
            "parts": [
              {
                "text": "There are 2 words in the text.\n"
              }
            ],
            "role": "model"
          },
          "intermediate_data": {
            "invocation_events": [
              {
                "author": "root_agent",
                "content": {
                  "parts": [
                    {
                      "function_call": {
                        "id": "adk-8e73d8e8-e443-4fba-8026-c37102f0b1a1",
                        "args": {
                          "agent_name": "text_analyzer_agent"
                        },
                        "name": "transfer_to_agent"
                      }
                    }
                  ],
                  "role": "model"
                }
              },
              {
                "author": "root_agent",
                "content": {
                  "parts": [
                    {
                      "function_response": {
                        "id": "adk-8e73d8e8-e443-4fba-8026-c37102f0b1a1",
                        "name": "transfer_to_agent",
                        "response": {
                          "result": null
                        }
                      }
                    }
                  ],
                  "role": "user"
                }
              },
              {
                "author": "text_analyzer_agent",
                "content": {
                  "parts": [
                    {
                      "function_call": {
                        "id": "adk-d5a9ba4a-cfbb-4965-9a23-d8e0bc8c6c6e",
                        "args": {
                          "text": "Hello world."
                        },
                        "name": "count_words"
                      }
                    }
                  ],
                  "role": "model"
                }
              },
              {
                "author": "text_analyzer_agent",
                "content": {
                  "parts": [
                    {
                      "function_response": {
                        "id": "adk-d5a9ba4a-cfbb-4965-9a23-d8e0bc8c6c6e",
                        "name": "count_words",
                        "response": {
                          "result": 2
                        }
                      }
                    }
                  ],
                  "role": "user"
                }
              }
            ]
          },
          "creation_timestamp": 1762790870.616564
        }
      ],
      "session_input": {
        "app_name": "eval_demo_agent",
        "user_id": "user"
      },
      "creation_timestamp": 1762791281.553806
    }
  ],
  "creation_timestamp": 1762791266.380299
}
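The difference between the two configs is easier to see when the recorded tool calls are listed side by side. The following is a rough sketch that walks the same JSON structure shown above (`eval_cases` -> `conversation` -> `intermediate_data.invocation_events`). The function `recorded_tool_calls` and the trimmed-down sample data built by `_eval_set` are hypothetical helpers for illustration, not ADK APIs.

```python
from typing import Any, Dict, List, Tuple


def recorded_tool_calls(eval_set: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Return (author, tool name) for every function_call in an eval set."""
    calls: List[Tuple[str, str]] = []
    for case in eval_set.get("eval_cases", []):
        for invocation in case.get("conversation", []):
            data = invocation.get("intermediate_data") or {}
            for event in data.get("invocation_events", []):
                content = event.get("content") or {}
                for part in content.get("parts", []):
                    call = part.get("function_call")
                    if call:  # skip function_response and text parts
                        calls.append((event.get("author", ""), call["name"]))
    return calls


def _eval_set(steps: List[Tuple[str, str]]) -> Dict[str, Any]:
    """Build a trimmed eval set that keeps only the fields used above."""
    events = [
        {"author": author, "content": {"parts": [{"function_call": {"name": name}}]}}
        for author, name in steps
    ]
    invocation = {"intermediate_data": {"invocation_events": events}}
    return {"eval_cases": [{"conversation": [invocation]}]}


# Trimmed versions of the two configs from this issue.
agent_tool_set = _eval_set([("root_agent", "text_analyzer_agent")])
sub_agent_set = _eval_set([
    ("root_agent", "transfer_to_agent"),
    ("text_analyzer_agent", "count_words"),
])

print(recorded_tool_calls(agent_tool_set))
print(recorded_tool_calls(sub_agent_set))
```

Only the sub-agent config contains an event authored by `text_analyzer_agent`; in the AgentTool config the `count_words` call is absent.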

Here is a minimal setup for testing:

from textwrap import dedent
from google.adk import Agent
from google.adk.tools import AgentTool
from google.adk.tools.function_tool import FunctionTool

LLM_MODEL = 'gemini-2.5-flash'


def count_words(text: str) -> int:
    """Counts the number of words in a text.

    :param text: The text to count the words in.
    :return: The number of words in the text.
    """
    print("\n****** calling count_words\n")
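    # split() with no argument collapses runs of whitespace, so repeated
    # spaces or newlines are not counted as extra words.
    words_count = len(text.split())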
    print(f"\n****** got {words_count} words\n")
    return words_count


def count_sentences(text: str) -> int:
    """Counts the number of sentences in a text.

    :param text: The text to count the sentences in.
    :return: The number of sentences in the text.
    """
    print("\n****** calling count_sentences\n")
    sentences = text.split('.')
    # Ignore fragments that are empty or whitespace-only (e.g. after a trailing '. ').
    sentences_count = len([x for x in sentences if x.strip()])
    print(f"\n****** got {sentences_count} sentences\n")
    return sentences_count


text_analyzer_agent = Agent(
    model=LLM_MODEL,
    name='text_analyzer_agent',
    description='Agent capable of analyzing text, like how many words or how many sentences are in the text, etc.',
    instruction=dedent("""
    Use the available tools to answer user questions.
    - If a user asks how many sentences are in the text, use the `count_sentences` tool.
    - If a user asks how many words are in the text, use the `count_words` tool.
    - If a user asks something else, respond that you can't do it; do not try to answer on your own.
    """),
    tools=[
        FunctionTool(count_words),
        FunctionTool(count_sentences),
    ],
)

root_agent = Agent(
    model=LLM_MODEL,
    name='root_agent',
    description="""An agent that answers user questions.""",
    instruction=dedent("""
    You are an agent whose job is to answer user questions.
    
    - If a user asks a question about the text, such as how many words or how many sentences it contains, use the `text_analyzer_agent` tool.
    - If a user asks something else, respond that you can't do it; do not try to answer on your own.
"""),
    tools=[AgentTool(text_analyzer_agent, skip_summarization=False)],
    # sub_agents=[text_analyzer_agent],
)

Labels

eval: [Component] This issue is related to evaluation
