Improve e2e and eval workflows. #1443

Merged
polina-c merged 16 commits into main from merge-ci on May 15, 2026

Conversation

@polina-c (Member) commented May 15, 2026

Fixes #1440

  1. Update the eval workflow to run on a schedule too.
  2. Factor issue creation out into a script and invoke it from both workflows.
  3. Update the script to check whether an open issue with the same label and title already exists.
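Item 3 could look roughly like the following minimal sketch. It assumes the script already has `TITLE`, `BODY`, and `LABEL_NAME` set (those names come from the `gh issue create` call quoted in the review); the helpers `title_in_list` and `create_issue_if_new` are hypothetical and not taken from the merged script.

```shell
#!/usr/bin/env bash
# Sketch only: skip filing a CI-failure issue when an open issue with the
# same label AND the exact same title already exists.
# Assumption: TITLE, BODY and LABEL_NAME are set by the caller.
# Helper names are hypothetical.
set -euo pipefail

# Succeeds if the first argument matches one line of stdin exactly.
# -F: fixed string (regex characters in titles are harmless), -x: whole line.
title_in_list() {
  grep -Fxq -- "$1"
}

create_issue_if_new() {
  local existing
  # Titles of open issues carrying the label, one per line.
  # Declared and assigned separately so a gh failure still trips set -e.
  existing="$(gh issue list --label "$LABEL_NAME" --state open \
    --limit 200 --json title --jq '.[].title')"

  # Here-string instead of a pipe avoids SIGPIPE surprises under pipefail.
  if title_in_list "$TITLE" <<< "$existing"; then
    echo "An open issue titled '$TITLE' already exists; not creating another."
    return 0
  fi

  gh issue create \
    --title "$TITLE" \
    --body "$BODY" \
    --label "$LABEL_NAME"
}
```

Matching on the exact title as well as the label follows the reviewer's suggestion in the thread, so an unrelated open issue that happens to share the label does not suppress a new report.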

@gemini-code-assist [bot] (Contributor) left a comment

Code Review

This pull request introduces a new Bash script, scripts/create_issue.sh, designed to automate the creation of GitHub issues when CI workflows fail. The script utilizes the GitHub CLI to gather context about the failure and the associated pull request. Reviewers suggested several improvements to enhance the script's robustness and flexibility, including adopting set -o pipefail, using modern Bash conditional syntax [[ ... ]], and replacing hardcoded branch names with environment variables. Additionally, a recommendation was made to include logic that prevents the creation of duplicate issues for repeated failures.
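The three robustness suggestions could look roughly like this in a Bash script. This is a sketch, not the merged code: `resolve_branch`, `require_var`, and the `BRANCH` override are hypothetical. `GITHUB_REF_NAME` is the short branch or tag name that GitHub Actions sets by default, and `main` is kept only as a last-resort fallback.

```shell
#!/usr/bin/env bash
# Sketch of the review suggestions; helper names are hypothetical.
# -e: exit on error, -u: treat unset variables as errors,
# -o pipefail: a pipeline fails if any stage fails, not only the last one.
set -euo pipefail

# Prefer an explicit BRANCH, then the Actions-provided ref name,
# and fall back to "main" only when neither is set.
resolve_branch() {
  echo "${BRANCH:-${GITHUB_REF_NAME:-main}}"
}

# Fail fast with a clear message if the named variable is empty or unset.
# Uses Bash's [[ ... ]] test and ${!name} indirection.
require_var() {
  local name="$1"
  if [[ -z "${!name:-}" ]]; then
    echo "error: $name must be set" >&2
    return 1
  fi
}
```

A script would call `require_var LABEL_NAME` near the top and use `"$(resolve_branch)"` wherever it previously hard-coded a branch name.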

Comment thread scripts/create_issue.sh
Comment thread scripts/create_issue.sh Outdated
Comment thread scripts/create_issue.sh Outdated
Comment thread scripts/create_issue.sh Outdated
Comment thread scripts/create_issue.sh Outdated
Comment thread scripts/create_issue.sh
Comment on lines +68 to +71
gh issue create \
--title "$TITLE" \
--body "$BODY" \
--label "$LABEL_NAME"
Contributor

medium

This script creates a new issue every time a workflow fails. For scheduled workflows that might fail repeatedly, this can lead to many duplicate issues. Consider adding logic to check if an open issue with the same label already exists using gh issue list --label "$LABEL_NAME" --state open before creating a new one.

Collaborator

This is a reasonable enhancement. What do you think? I'd check for more than the label, though: I'd check for one with the label and an identical title.

Member Author

@polina-c, May 15, 2026

The script already checks for the label. I added a check for the title. Thank you!

polina-c and others added 9 commits May 15, 2026 10:11
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Comment thread .github/workflows/run_evals.yml Outdated
# - catch regressions introduced by changes in environment
# - have more data for observing dynamics of performance degradation
schedule:
- cron: "0 * * * *" # hourly
Collaborator

I don't think this is a good use of our AI budget. The next time we submit something that triggers evals, it will catch any environment changes anyway, so this just feels like a waste.

It also seems way too frequent. If we did this, I'd run it at most once a day; otherwise a single change in the environment could result in 48 issues being filed over a weekend.
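The reviewer's figure is straightforward arithmetic, assuming a 48-hour weekend and one failing run per scheduled trigger:

$$
N_{\text{hourly}} = 48\ \text{h} \times \frac{1\ \text{run}}{1\ \text{h}} = 48,
\qquad
N_{\text{daily}} = 2\ \text{days} \times \frac{1\ \text{run}}{1\ \text{day}} = 2
$$

That count assumes every failing run files a new issue. With the open-issue check from item 3, repeated failures that produce an identical title would map to a single open issue, so the schedule would mainly affect the eval budget rather than the number of issues.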

Member Author

Switched to daily. Thanks.

Comment thread .github/workflows/e2e_test.yaml Outdated
# Do not run on forked branches,
# because the test does not have access to secrets in forks.
-if: github.repository == 'google/a2ui'
+if: github.repository == 'flutter/genui'
Collaborator

Whoa, so I guess it wasn't running?

Member Author

@polina-c, May 15, 2026

nope

we are in A2UI

reverted
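The job-level `if: github.repository == ...` guard keeps the e2e job from running in forks, where secrets are unavailable. A script-level equivalent, sketched below, would read `GITHUB_REPOSITORY`, the `owner/repo` variable GitHub Actions sets by default. The helper `is_upstream_repo` is hypothetical. GitHub's expression documentation states that string comparisons ignore case, so `'google/a2ui'` in e2e_test.yaml and `'google/A2UI'` in docs.yml should both match; the sketch lowercases both sides to behave the same way.

```shell
#!/usr/bin/env bash
# Sketch: skip work outside the upstream repository (e.g. in forks that
# lack secrets). is_upstream_repo is a hypothetical helper, not from the PR.
set -euo pipefail

UPSTREAM_REPO="google/a2ui"

# Succeeds when GITHUB_REPOSITORY (owner/repo) names the upstream repo.
# ${var,,} lowercases both sides (requires Bash 4+), so the match is
# case-insensitive. An unset GITHUB_REPOSITORY never matches.
is_upstream_repo() {
  local repo="${GITHUB_REPOSITORY:-}"
  [[ "${repo,,}" == "${UPSTREAM_REPO,,}" ]]
}
```

A script could then begin with `is_upstream_repo || { echo "Skipping: not running in $UPSTREAM_REPO"; exit 0; }`.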

Comment thread .github/workflows/docs.yml Outdated
actions: read

-if: github.repository == 'google/A2UI'
+if: github.repository == 'flutter/genui'
Collaborator

Where do our docs end up? Is it published anywhere?

Member Author

oh, I missed it

reverted

thank you

@polina-c requested a review from gspencergoog May 15, 2026 19:53
Comment thread .github/workflows/run_evals.yml Outdated
@polina-c merged commit c4d6ad1 into main May 15, 2026
19 checks passed
@polina-c deleted the merge-ci branch May 15, 2026 21:23
@github-project-automation [bot] moved this from Todo to Done in A2UI May 15, 2026
Labels

None yet

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

[ci] e2e_tests seems to be flaky

2 participants