Conversation
|
Size Change: -4 B (0%) Total Size: 26.1 MB
ℹ️ View Unchanged
|
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request addresses issues with Highlights
Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request refactors the ask_user evaluation tests to run in an interactive environment using appEvalTest to fix failures in non-interactive mode.
My review focuses on improving the robustness and clarity of the new test setups. I've identified a conflicting policy configuration in the ask_user tests and an unreliable setTimeout in test-helper.ts, both of which could lead to test flakiness. Please see my detailed comments for suggestions on how to address these issues.
There was a problem hiding this comment.
Code Review
This pull request refactors the AskUser evaluation tests to run in an interactive environment, which is a necessary fix since the AskUser tool is not supported in non-interactive mode. The changes involve introducing a new appEvalTest helper, updating test assertions to wait for interactive confirmations, and enhancing the test rig to support waiting for multiple possible tool calls. A potential issue was found in how default configurations are merged with test-specific overrides, which could lead to incorrect test setups, violating a rule about consistent merge operations.
|
Nightly evals workflow run after recent changes: https://github.com/google-gemini/gemini-cli/actions/runs/23008601171 |
evals/test-helper.ts
Outdated
| ); | ||
| } | ||
|
|
||
| // Bypassing terminal keybindings setup prompt for interactive tests |
There was a problem hiding this comment.
We run the tests in parallel. Is updating the home dir going to cause intermittent test failures?
There was a problem hiding this comment.
added _createStateFile to make sure that state.json mocking happens within in the isolated TestRig.homeDir unique to each test run
evals/ask_user.eval.ts
Outdated
| }, | ||
| }, | ||
| files: { | ||
| '.gemini/state.json': JSON.stringify({ terminalSetupPromptShown: true }), |
There was a problem hiding this comment.
We're writing this file in the test-helper.ts. Do we need it both here and there?
Also, can we avoid the [action at a distance](https://en.wikipedia.org/wiki/Action_at_a_distance_(computer_programming) phenomenon by plumbing this option through to the TestRig and having it pass a cmd line arg or write the config file in one place? We should ideally directly write and read state.json in just one place in the codebase to minimize risk of regression.
There was a problem hiding this comment.
created _createStateFile to read and write state.json once
…/write state.json in one place
Summary
Some AskUser evals seemed to be failing, after some debugging it looks like it was failing because it was running in non-interactive mode which doesn't support the AskUser tool. Changes were made to support running evals in an interactive environment.
Details
Related Issues
Fixes #22073
How to Validate
Should pass locally and I've verified in the nightly evals workflow that the evals pass
Two workflows
https://github.com/google-gemini/gemini-cli/actions/runs/22971547342
https://github.com/google-gemini/gemini-cli/actions/runs/22974083453
Pre-Merge Checklist