
Agent Evaluation not evaluating the tool trajectory properly #3439

@miyannishar

Description


During Agent Evaluation:

According to this sentence mentioned in the doc, for my case:

tests/agent_evaluations/test_agent_evaluations.py::test_agent_evaluations[journal_agent.test] Summary: `EvalStatus.NOT_EVALUATED` for Metric: `tool_trajectory_avg_score`. Expected threshold: `0.6`, actual value: `None`.
+----+-------------------+---------+-------------+---------------------------------------------------------------+---------------------+-------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|    | eval_status       |   score |   threshold | prompt                                                        | expected_response   | actual_response   | expected_tool_calls                                                                                                                                                                                                                       | actual_tool_calls                                                                                                                                                                                                                              |
+====+===================+=========+=============+===============================================================+=====================+===================+===========================================================================================================================================================================================================================================+================================================================================================================================================================================================================================================+
|  0 | EvalStatus.FAILED |       0 |         0.6 | I need to create a journal entry to accrue $50,000 in revenue |                     |                   | id=None args={'agent_name': 'journal_agent'} name='transfer_to_agent'                                                                                                                                                                     | id='call_AAN5F6rdDkHFtytOr1UZvsmc' args={'agent_name': 'journal_agent'} name='transfer_to_agent'                                                                                                                                               |
|    |                   |         |             |                                                               |                     |                   | id=None args={'request': "Create a journal entry to accrue $50,000 in revenue. Set journal type to Accrual and amount to 50,000. Leave ledger and period for user selection.'} name='form_generation_agent"} name='form_generation_agent' | id='call_Vd8tuvc74i9UFKhkJOJUqjUX' args={'request': 'Create a journal entry to accrue $50,000 in revenue. Journal type: Accrual, Amount: 50,000 (credit revenue, debit accounts receivable or accrued revenue).'} name='form_generation_agent' |
+----+-------------------+---------+-------------+---------------------------------------------------------------+---------------------+-------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|  1 | EvalStatus.FAILED |       0 |         0.6 | I need to create a journal entry to accrue $50,000 in revenue |                     |                   | id=None args={'agent_name': 'journal_agent'} name='transfer_to_agent'                                                                                                                                                                     | id='call_Ds9YDzxii51DyWt62Erzvxc9' args={'agent_name': 'journal_agent'} name='transfer_to_agent'                                                                                                                                               |
|    |                   |         |             |                                                               |                     |                   | id=None args={'request': "Create a journal entry to accrue $50,000 in revenue. Set journal type to Accrual and amount to 50,000. Leave ledger and period for user selection.'} name='form_generation_agent"} name='form_generation_agent' | id='call_mHIHBlNcx0WQKYP1VbhasbZM' args={'request': 'Create a journal entry to accrue $50,000 in revenue. Amount: 50000, type: Accrual, purpose: revenue accrual'} name='form_generation_agent'                                                |
+----+-------------------+---------+-------------+---------------------------------------------------------------+---------------------+-------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

For the first test case, I would expect a score of 0.5: ignoring `id`, the first tool call (`transfer_to_agent`) matches exactly and should score 1, while the second (`form_generation_agent`) does not match and should score 0. However, the reported `tool_trajectory_avg_score` is still 0.
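To make the expected arithmetic concrete, here is a minimal sketch contrasting two ways a trajectory score could be computed. It is not ADK's implementation: `ToolCall`, `per_call_score`, and `exact_trajectory_score` are hypothetical names, the request strings are abbreviated placeholders, and the all-or-nothing variant is only an assumption about how a score of 0 could arise for this case.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ToolCall:
    """Hypothetical stand-in for a tool call; `id` is deliberately left out."""
    name: str
    args: dict[str, Any] = field(default_factory=dict)


def per_call_score(expected: list[ToolCall], actual: list[ToolCall]) -> float:
    """Fraction of positions where name and args both match (the 0.5 I expect)."""
    if not expected:
        return 1.0 if not actual else 0.0
    matches = sum(1 for e, a in zip(expected, actual) if e == a)
    return matches / max(len(expected), len(actual))


def exact_trajectory_score(expected: list[ToolCall], actual: list[ToolCall]) -> float:
    """All-or-nothing: 1.0 only if the whole list matches (assumed behavior)."""
    return 1.0 if expected == actual else 0.0


# First test case from the table, with the request text abbreviated.
expected = [
    ToolCall("transfer_to_agent", {"agent_name": "journal_agent"}),
    ToolCall("form_generation_agent", {"request": "expected wording (abbreviated)"}),
]
actual = [
    ToolCall("transfer_to_agent", {"agent_name": "journal_agent"}),
    ToolCall("form_generation_agent", {"request": "actual wording (abbreviated)"}),
]

print(per_call_score(expected, actual))          # per-call partial credit
print(exact_trajectory_score(expected, actual))  # whole-trajectory exact match
```

If the metric compares the whole list of tool calls per invocation rather than each call individually, a single differing `request` string would be enough to drop the score to 0, which would be consistent with the table above.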

Desktop (please complete the following information):

  • OS: macOS
  • Python version(python -V): 3.12
  • ADK version(pip show google-adk): 2.0.0

Model Information:

  • Are you using LiteLLM: Yes

Labels

eval: [Component] This issue is related to evaluation
