
fix: make _EvalMetricResultWithInvocation.expected_invocation Optional for conversation_scenario support#5215

Merged
ankursharmas merged 2 commits into google:main from ASRagab:fix/optional-expected-invocation
Apr 14, 2026

Conversation

@ASRagab
Contributor

@ASRagab ASRagab commented Apr 8, 2026

Summary

  • _EvalMetricResultWithInvocation.expected_invocation is typed as Invocation (required), but local_eval_service.py:285-287 intentionally sets it to None when eval_case.conversation is None (i.e., conversation_scenario user-simulation cases)
  • The public model EvalMetricResultPerInvocation in eval_metrics.py:323 already types this field as Optional[Invocation] = None
  • This mismatch causes a pydantic ValidationError during post-processing in _get_eval_metric_results_with_invocation, after all metrics have been computed

Changes

  • Make expected_invocation Optional[Invocation] = None in _EvalMetricResultWithInvocation
  • Guard the three attribute accesses in _print_details to handle None (fall back to actual_invocation.user_content for the prompt column, None for expected response/tool calls)
  • Both _convert_content_to_text and _convert_tool_calls_to_text already accept Optional parameters
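The guard described in the second bullet can be sketched in a few lines. This is not taken from the diff — a minimal stdlib illustration with a toy `Invocation` stand-in and a hypothetical `prompt_for_row` helper:

```python
from dataclasses import dataclass
from typing import Optional

# Toy stand-in for the ADK Invocation type (hypothetical, illustration only).
@dataclass
class Invocation:
    user_content: str
    final_response: Optional[str] = None

def prompt_for_row(expected: Optional[Invocation], actual: Invocation) -> str:
    # Guarded access: when expected_invocation is None (the
    # conversation_scenario path), fall back to the actual invocation's
    # user content for the prompt column.
    source = expected if expected is not None else actual
    return source.user_content

actual = Invocation(user_content="Hello")
print(prompt_for_row(None, actual))                              # "Hello" (fallback)
print(prompt_for_row(Invocation(user_content="Hi"), actual))     # "Hi"
```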

Testing Plan

Verified with a pytest-based evaluation using AgentEvaluator.evaluate() against an evalset containing conversation_scenario cases (LLM-backed user simulation, no explicit conversation arrays).

Before fix — crashes after ~33 minutes of metric computation during post-processing:

pydantic_core._pydantic_core.ValidationError: 1 validation error for _EvalMetricResultWithInvocation
expected_invocation
  Input should be a valid dictionary or instance of Invocation [type=model_type, input_value=None, input_type=NoneType]

.venv/lib/python3.11/site-packages/google/adk/evaluation/agent_evaluator.py:639: ValidationError

After fix — the ValidationError is eliminated. The None expected_invocation flows through correctly because:

  1. The field now accepts Optional[Invocation], matching the upstream EvalMetricResultPerInvocation model
  2. _print_details gracefully handles None by falling back to actual_invocation.user_content for the prompt column and passing None to _convert_content_to_text/_convert_tool_calls_to_text (both already accept Optional inputs)
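Point 1 can be sketched with a toy pydantic model (not the real ADK classes — `Invocation` and `MetricResult` here are stand-ins): with the `Optional[...] = None` annotation, pydantic accepts both an omitted field and an explicit None.

```python
from typing import Optional
from pydantic import BaseModel

# Toy stand-in for the ADK Invocation model (illustration only).
class Invocation(BaseModel):
    user_content: str = ""

# After the fix, the private model mirrors the public
# EvalMetricResultPerInvocation typing:
class MetricResult(BaseModel):
    expected_invocation: Optional[Invocation] = None

print(MetricResult().expected_invocation)                           # None
print(MetricResult(expected_invocation=None).expected_invocation)   # None
```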

Reproduction evalset (any evalset with conversation_scenario triggers this):

{
  "eval_set_id": "test",
  "eval_cases": [{
    "eval_id": "scenario_1",
    "conversation_scenario": {
      "starting_prompt": "Hello",
      "conversation_plan": "Ask the agent a question and accept the answer."
    },
    "session_input": {"app_name": "my_agent", "user_id": "user1", "state": {}}
  }]
}
import pytest
from google.adk.evaluation.agent_evaluator import AgentEvaluator

@pytest.mark.asyncio
async def test_scenario():
    await AgentEvaluator.evaluate("my_agent", "path/to/evalset.json", num_runs=1)

Fixes #5214

When using conversation_scenario for user simulation, expected_invocation
is None because conversations are dynamically generated. The public model
EvalMetricResultPerInvocation already types this as Optional[Invocation],
but the private _EvalMetricResultWithInvocation requires non-None, causing
a pydantic ValidationError during post-processing.

- Make expected_invocation Optional[Invocation] = None
- Guard attribute accesses in _print_details to handle None
- Fall back to actual_invocation.user_content for the prompt column

Fixes google#5214
@adk-bot adk-bot added the eval [Component] This issue is related to evaluation label Apr 8, 2026
@adk-bot
Collaborator

adk-bot commented Apr 8, 2026

Response from ADK Triaging Agent

Hello @ASRagab, thank you for submitting this pull request!

To help the reviewers, could you please add a testing plan section to your PR description explaining how you verified the fix? For example, did you run the evaluation with a conversation_scenario?

Including the logs or a screenshot showing that the ValidationError is gone after your change would also be very helpful.

You can find more details in our contribution guidelines. Thanks!

Comment thread src/google/adk/evaluation/agent_evaluator.py
@rohityan rohityan self-assigned this Apr 9, 2026
@ASRagab changed the title fix: make _EvalMetricResultWithInvocation.expected_invocation Optional for conversation_scenario support Apr 9, 2026
@ASRagab
Contributor Author

ASRagab commented Apr 9, 2026

Testing Evidence for PR #5215

Reproduction Script

A targeted script that exercises the exact codepath fixed by this PR:

  1. Constructs _EvalMetricResultWithInvocation with expected_invocation=None (the conversation_scenario path)
  2. Exercises the three guard paths in _print_details where .user_content, .final_response, and .intermediate_data are accessed on expected_invocation
  3. Verifies the non-None path still works (regression check)

Before (PyPI google-adk==1.28.0, unfixed)

============================================================
TEST 1: _EvalMetricResultWithInvocation(expected_invocation=None)
============================================================
  FAIL: ValidationError: 1 validation error for _EvalMetricResultWithInvocation
expected_invocation
  Input should be a valid dictionary or instance of Invocation [type=model_type, input_value=None, input_type=NoneType]
    For further information visit https://errors.pydantic.dev/2.12/v/model_type

Pydantic rejects None because the field is typed as Invocation (non-optional).
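The rejection is reproducible with any required model-typed field. A minimal sketch using toy models (not the real ADK classes):

```python
from pydantic import BaseModel, ValidationError

# Toy stand-ins to reproduce the rejection (illustration only).
class Invocation(BaseModel):
    user_content: str = ""

class Result(BaseModel):
    expected_invocation: Invocation  # required, non-Optional: None is rejected

try:
    Result(expected_invocation=None)
except ValidationError as exc:
    # pydantic v2 reports "Input should be a valid dictionary or
    # instance of Invocation" with error type "model_type".
    print(len(exc.errors()), "validation error")
```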

After (local editable install from fix/optional-expected-invocation branch)

============================================================
TEST 1: _EvalMetricResultWithInvocation(expected_invocation=None)
============================================================
  PASS: constructed successfully
  expected_invocation is None: True

============================================================
TEST 2: Guard paths for None expected_invocation
============================================================
  PASS: prompt = 'Hello'
  PASS: expected_response = ''
  PASS: expected_tool_calls = ''

============================================================
TEST 3: _EvalMetricResultWithInvocation(expected_invocation=<Invocation>)
============================================================
  PASS: constructed with real invocation
  PASS: prompt = 'Hello'

============================================================
ALL TESTS PASSED
============================================================

What was verified

| Check | Result |
| --- | --- |
| `expected_invocation: Optional[Invocation] = None` (line 93) | None accepted without ValidationError |
| `_print_details` prompt fallback to `actual_invocation.user_content` | Works correctly |
| `_print_details` expected_response fallback to None | `_convert_content_to_text(None)` returns `""` |
| `_print_details` expected_tool_calls fallback to None | `_convert_tool_calls_to_text(None)` returns `""` |
| Non-None `expected_invocation` (regression) | Still works as before |

Context

This was tested using a conversation_scenario-based evalset from an agent project. The multi-turn evalset has 5 cases that all use conversation_scenario (no explicit conversation array), which is exactly the codepath where local_eval_service.py sets expected_invocation=None during post-processing.

@rohityan rohityan added the needs review [Status] The PR/issue is awaiting review from the maintainer label Apr 13, 2026
@rohityan
Collaborator

Hi @ASRagab, thank you for your contribution! We appreciate you taking the time to submit this pull request. Your PR has been received by the team and is currently under review. We will provide feedback as soon as we have an update to share.

@rohityan
Collaborator

Hi @wukath, can you please review this?

@wukath
Collaborator

wukath commented Apr 13, 2026

cc @ankursharmas if you could take a look

@ankursharmas ankursharmas merged commit a4c9387 into google:main Apr 14, 2026
14 checks passed


Development

Successfully merging this pull request may close these issues.

AgentEvaluator crashes with ValidationError when evaluating conversation_scenario eval cases
