
Conversation

@quad2524

Link to Issue or Description of Change

1. Link to an existing issue (if applicable):

Testing Plan

I verified the changes using a reproduction script that tested both the default behavior and the new opt-in behavior.

Unit Tests:

  • I have added or updated unit tests for my change.
  • All unit tests pass locally.

Manual End-to-End (E2E) Tests:

I created a reproduction script, reproduce_issue.py (which I have since cleaned up), that invoked AgentEvaluator.evaluate on an existing test fixture (tests/integration/fixture/home_automation_agent/simple_test.test.json); a sketch of the script follows the two cases below.

  1. Case 1 (Default): Called with print_detailed_results=True, leaving print_detailed_results_on_success at its default of False. Verified that no detailed metrics table was printed for a passing test.
  2. Case 2 (Opt-in): Called with print_detailed_results=True and print_detailed_results_on_success=True. Verified that the detailed metrics table was printed for the passing test.
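
For reference, here is a minimal sketch of what that script did. Only print_detailed_results, print_detailed_results_on_success, and the fixture path come from this PR; the import path, the other keyword names, and the async call style are assumptions based on existing ADK usage, not a copy of the actual script.

```python
# reproduce_issue.py (sketch, not the exact script): exercises both behaviors.
import asyncio

from google.adk.evaluation.agent_evaluator import AgentEvaluator  # import path assumed

FIXTURE = "tests/integration/fixture/home_automation_agent/simple_test.test.json"
AGENT_MODULE = "tests.integration.fixture.home_automation_agent"  # assumed module path


async def main() -> None:
    # Case 1 (default): no detailed metrics table should appear for a passing test.
    await AgentEvaluator.evaluate(
        agent_module=AGENT_MODULE,
        eval_dataset_file_path_or_dir=FIXTURE,
        print_detailed_results=True,
    )

    # Case 2 (opt-in): the detailed metrics table should appear even though the test passes.
    await AgentEvaluator.evaluate(
        agent_module=AGENT_MODULE,
        eval_dataset_file_path_or_dir=FIXTURE,
        print_detailed_results=True,
        print_detailed_results_on_success=True,
    )


if __name__ == "__main__":
    asyncio.run(main())
```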

CLI Verification:

I also verified the new CLI flag --print_detailed_results_on_success with the adk eval command.

Checklist

  • I have read the CONTRIBUTING.md document.
  • I have performed a self-review of my own code.
  • I have commented my code, particularly in hard-to-understand areas.
  • I have added tests that prove my fix is effective or that my feature works.
  • New and existing unit tests pass locally with my changes.
  • I have manually tested my changes end-to-end.
  • Any dependent changes have been merged and published in downstream modules.

Additional context

This change introduces a backward-compatible optional argument print_detailed_results_on_success to AgentEvaluator.evaluate and evaluate_eval_set.
It also adds the --print_detailed_results_on_success flag to the adk eval CLI command.
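
As a rough illustration of the CLI side (not the actual diff from this PR), the new flag is the kind of boolean Click option that defaults to off and is forwarded to the evaluation call. Only --print_detailed_results_on_success comes from this PR; the command name, helper names, and other details below are illustrative assumptions.

```python
# Hypothetical sketch of wiring an opt-in CLI flag through to the evaluator.
# Only --print_detailed_results_on_success comes from this PR; everything else
# here is an illustrative assumption, not the adk eval implementation.
import click


@click.command("eval")
@click.option(
    "--print_detailed_results_on_success",
    is_flag=True,
    default=False,  # opt-in: existing invocations keep the old behavior
    help="Also print the detailed metrics table when every eval case passes.",
)
def eval_command(print_detailed_results_on_success: bool) -> None:
    # The real command would forward this to the evaluation path, roughly:
    #   evaluate_eval_set(..., print_detailed_results_on_success=print_detailed_results_on_success)
    click.echo(f"print_detailed_results_on_success={print_detailed_results_on_success}")


if __name__ == "__main__":
    eval_command()
```

A verification run then looks like adk eval <agent_dir> <eval_set_file> --print_detailed_results_on_success, where the positional arguments depend on the local setup.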

@gemini-code-assist
Contributor

Summary of Changes

Hello @quad2524, I'm Gemini Code Assist [1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the agent evaluation system by providing an opt-in mechanism to display detailed evaluation metrics for tests that successfully pass. Previously, detailed results were only presented for failing tests. This change introduces a new parameter and a corresponding CLI flag, allowing users greater control over the verbosity of evaluation output, particularly useful for in-depth analysis of successful agent performance.

Highlights

  • New Feature: Introduced the ability to print detailed evaluation results for agent tests even when all tests pass, which was previously only shown for failures.
  • API Changes: Added a new optional boolean parameter, print_detailed_results_on_success, to the AgentEvaluator.evaluate and evaluate_eval_set functions. This parameter defaults to False to maintain backward compatibility.
  • CLI Integration: A new CLI flag, --print_detailed_results_on_success, has been added to the adk eval command, allowing users to control this behavior directly from the command line.
  • Conditional Output Logic: Modified the internal logic for printing detailed evaluation results to respect the new print_detailed_results_on_success flag, so that for passing tests the detailed output is shown only when explicitly requested (a sketch of this gating follows this list).
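
Purely as an illustration of that gating (this is not the actual diff; only the two flag names come from the PR), the described behavior amounts to a predicate like:

```python
# Illustrative sketch of the described gating; the function and argument names
# other than the two print_* flags are hypothetical.
def should_print_detailed_results(
    eval_case_passed: bool,
    print_detailed_results: bool,
    print_detailed_results_on_success: bool,
) -> bool:
    """Failures keep printing details as before; passes print them only on opt-in."""
    if not print_detailed_results:
        return False
    return (not eval_case_passed) or print_detailed_results_on_success
```
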
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

  • Code Review (/gemini review): Performs a code review for the current pull request in its current state.
  • Pull Request Summary (/gemini summary): Provides a summary of the current pull request in its current state.
  • Comment (@gemini-code-assist): Responds in comments when explicitly tagged, both in pull request comments and review comments.
  • Help (/gemini help): Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@adk-bot added the eval ([Component] This issue is related to evaluation) label on Dec 10, 2025
Contributor

@gemini-code-assist bot left a comment


Code Review

This pull request introduces a new feature to print detailed evaluation results even for passing tests, controlled by the print_detailed_results_on_success argument. The changes are well-implemented across both the CLI command adk eval and the programmatic AgentEvaluator methods, ensuring consistent behavior. The logic correctly handles the new option. I've included a couple of suggestions to refactor the conditional logic for improved readability and maintainability.

@ryanaiagent self-assigned this on Dec 12, 2025
@ryanaiagent added the request clarification ([Status] The maintainer needs clarification or more information from the author) label on Dec 12, 2025
@ryanaiagent
Collaborator

Hi @quad2524, thank you for your contribution! We appreciate you taking the time to submit this pull request.
Could you please fix the lint error using autoformat.sh?
It looks like the test failures here are unrelated to your changes; they were caused by a temporary issue on our main branch that has since been fixed.
Could you please update your branch with the latest changes from main? This will pull in the fix and should get the tests to pass.

@quad2524 force-pushed the feat/eval_print_results_on_success branch from 29bcb87 to 852da24 on December 12, 2025, 01:49

Labels

  • eval ([Component] This issue is related to evaluation)
  • request clarification ([Status] The maintainer needs clarification or more information from the author)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Feature Request: Agent Evaluations print detailed results even when all tests pass

3 participants