fix: make _EvalMetricResultWithInvocation.expected_invocation Optional for conversation_scenario support #5215
When using `conversation_scenario` for user simulation, `expected_invocation` is `None` because conversations are dynamically generated. The public model `EvalMetricResultPerInvocation` already types this field as `Optional[Invocation]`, but the private `_EvalMetricResultWithInvocation` requires non-`None`, causing a pydantic `ValidationError` during post-processing.

- Make `expected_invocation` `Optional[Invocation] = None`
- Guard attribute accesses in `_print_details` to handle `None`
- Fall back to `actual_invocation.user_content` for the prompt column

Fixes google#5214
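The typing mismatch described above can be sketched with plain pydantic models. This is a simplified stand-in, not the actual ADK classes: a required field rejects `None` with a `ValidationError`, while an `Optional[...] = None` field accepts it, which is the essence of this fix.

```python
from typing import Optional

from pydantic import BaseModel, ValidationError


class Invocation(BaseModel):
    # Simplified stand-in for the real Invocation model.
    user_content: str = ""


class ResultRequired(BaseModel):
    # Before the fix: the field is required and non-None.
    expected_invocation: Invocation


class ResultOptional(BaseModel):
    # After the fix: None is valid, matching the public
    # EvalMetricResultPerInvocation model.
    expected_invocation: Optional[Invocation] = None


try:
    ResultRequired(expected_invocation=None)
except ValidationError:
    print("required field rejects None")

result = ResultOptional(expected_invocation=None)
print(result.expected_invocation)  # None
```

With the `Optional` typing, `local_eval_service.py` can set the field to `None` for `conversation_scenario` cases without tripping validation.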
Response from ADK Triaging Agent: Hello @ASRagab, thank you for submitting this pull request! To help the reviewers, could you please add a testing plan? Including the logs or a screenshot showing that the change works would be helpful. You can find more details in our contribution guidelines. Thanks!
Testing Evidence for PR #5215

Reproduction Script

A targeted script that exercises the exact codepath fixed by this PR:
Before (PyPI
| Check | Result |
|---|---|
| `expected_invocation: Optional[Invocation] = None` (line 93) | `None` accepted without ValidationError |
| `_print_details` prompt fallback to `actual_invocation.user_content` | Works correctly |
| `_print_details` expected_response fallback to `None` | `_convert_content_to_text(None)` returns `""` |
| `_print_details` expected_tool_calls fallback to `None` | `_convert_tool_calls_to_text(None)` returns `""` |
| Non-`None` `expected_invocation` (regression) | Still works as before |
Context
This was tested using a conversation_scenario-based evalset from an agent project. The multi-turn evalset has 5 cases that all use conversation_scenario (no explicit conversation array), which is exactly the codepath where local_eval_service.py sets expected_invocation=None during post-processing.
Hi @ASRagab, thank you for your contribution! We appreciate you taking the time to submit this pull request. Your PR has been received by the team and is currently under review. We will provide feedback as soon as we have an update to share.
Hi @wukath, can you please review this?
cc @ankursharmas if you could take a look
Summary
- `_EvalMetricResultWithInvocation.expected_invocation` is typed as `Invocation` (required), but `local_eval_service.py:285-287` intentionally sets it to `None` when `eval_case.conversation` is `None` (i.e., `conversation_scenario` user-simulation cases)
- `EvalMetricResultPerInvocation` in `eval_metrics.py:323` already types this field as `Optional[Invocation] = None`
- The mismatch raises a `ValidationError` during post-processing in `_get_eval_metric_results_with_invocation`, after all metrics have been computed

Changes

- Make `expected_invocation` `Optional[Invocation] = None` in `_EvalMetricResultWithInvocation`
- Update `_print_details` to handle `None` (fall back to `actual_invocation.user_content` for the prompt column, `None` for expected response/tool calls)
- `_convert_content_to_text` and `_convert_tool_calls_to_text` already accept `Optional` parameters

Testing Plan
Verified with a pytest-based evaluation using `AgentEvaluator.evaluate()` against an evalset containing `conversation_scenario` cases (LLM-backed user simulation, no explicit `conversation` arrays).

Before the fix, the run crashes after ~33 minutes of metric computation, during post-processing:
After the fix, the `ValidationError` is eliminated. The `None` `expected_invocation` flows through correctly because:

- The field is now `Optional[Invocation]`, matching the upstream `EvalMetricResultPerInvocation` model
- `_print_details` gracefully handles `None` by falling back to `actual_invocation.user_content` for the prompt column and passing `None` to `_convert_content_to_text` / `_convert_tool_calls_to_text` (both already accept `Optional` inputs)

Reproduction evalset (any evalset with `conversation_scenario` triggers this):

```json
{
  "eval_set_id": "test",
  "eval_cases": [{
    "eval_id": "scenario_1",
    "conversation_scenario": {
      "starting_prompt": "Hello",
      "conversation_plan": "Ask the agent a question and accept the answer."
    },
    "session_input": {"app_name": "my_agent", "user_id": "user1", "state": {}}
  }]
}
```

Fixes #5214
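As a quick sanity check, the reproduction evalset above can be parsed to confirm it exercises the failing codepath: the case carries `conversation_scenario` and no explicit `conversation` array, which is what makes `local_eval_service.py` set `expected_invocation=None`.

```python
import json

# The reproduction evalset from the PR description, embedded verbatim.
evalset = json.loads("""
{
  "eval_set_id": "test",
  "eval_cases": [{
    "eval_id": "scenario_1",
    "conversation_scenario": {
      "starting_prompt": "Hello",
      "conversation_plan": "Ask the agent a question and accept the answer."
    },
    "session_input": {"app_name": "my_agent", "user_id": "user1", "state": {}}
  }]
}
""")

case = evalset["eval_cases"][0]
assert "conversation_scenario" in case
# No "conversation" array -> expected_invocation is set to None downstream.
assert "conversation" not in case
print("evalset exercises the conversation_scenario codepath")
```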