feat: update daily-experiment-report to use experiments CLI commands#30044
Merged
Agent-Logs-Url: https://github.com/github/gh-aw/sessions/ba8a385e-7c44-442a-87af-63cad2bb8251 Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Copilot stopped work on behalf of pelikhan due to an error (May 4, 2026 02:51)
Copilot created this pull request from a session on behalf of pelikhan (May 4, 2026 02:51)
Pull request overview
Updates the daily-experiment-report workflow spec to rely on the gh aw experiments CLI for experiment discovery/analysis, and regenerates compiled workflow lockfiles.
Changes:
- Update `daily-experiment-report.md` to use `gh aw experiments list`/`analyze` outputs (and add `tools.cli-proxy: true`).
- Recompile `daily-experiment-report.lock.yml` to reflect the updated spec.
- Regenerate multiple other `*.lock.yml` files, changing Copilot CLI `--allow-tool shell(...)` allowlist prefixes.
| File | Description |
|---|---|
| .github/workflows/daily-experiment-report.md | Switches the workflow instructions to CLI-driven experiment discovery/analysis; adds cli-proxy: true. |
| .github/workflows/daily-experiment-report.lock.yml | Recompiled lockfile reflecting updated prompt/spec and environment. |
| .github/workflows/workflow-skill-extractor.lock.yml | Lockfile recompile; --allow-tool shell(...) prefixes became more generic. |
| .github/workflows/ubuntu-image-analyzer.lock.yml | Lockfile recompile; broader find allowlist prefix. |
| .github/workflows/spec-librarian.lock.yml | Lockfile recompile; broader find/grep/git log allowlist prefixes. |
| .github/workflows/spec-extractor.lock.yml | Lockfile recompile; broader find/grep allowlist prefixes. |
| .github/workflows/layout-spec-maintainer.lock.yml | Lockfile recompile; broader find/grep allowlist prefixes and removed constrained yq form. |
| .github/workflows/discussion-task-miner.lock.yml | Lockfile recompile; broader find allowlist prefix. |
| .github/workflows/delight.lock.yml | Lockfile recompile; broader find allowlist prefixes. |
| .github/workflows/daily-testify-uber-super-expert.lock.yml | Lockfile recompile; broader find/grep allowlist prefixes. |
| .github/workflows/daily-safe-output-integrator.lock.yml | Lockfile recompile; broader find/grep allowlist prefixes. |
| .github/workflows/daily-mcp-concurrency-analysis.lock.yml | Lockfile recompile; broader find/git log/grep/jq allowlist prefixes. |
| .github/workflows/daily-file-diet.lock.yml | Lockfile recompile; broader find/grep allowlist prefixes. |
| .github/workflows/daily-compiler-quality.lock.yml | Lockfile recompile; broader find/git log/grep allowlist prefixes. |
| .github/workflows/copilot-cli-deep-research.lock.yml | Lockfile recompile; broader find allowlist prefixes. |
| .github/workflows/ab-testing-advisor.lock.yml | Lockfile recompile; broader find/grep allowlist prefixes. |
Copilot's findings
Comments suppressed due to low confidence (3)
.github/workflows/daily-safe-output-integrator.lock.yml:768
`--allow-tool shell(grep -n)` / `shell(grep -rn)` are very generic prefixes compared to the previous file/pattern-scoped grep invocations. With prefix matching, this broadens allowed shell activity; consider updating the source workflow's `tools.bash` commands to avoid single quotes so the compiled lockfile can keep more restrictive prefixes.
# --allow-tool shell(grep -n)
# --allow-tool shell(grep -rn)
.github/workflows/layout-spec-maintainer.lock.yml:724
`--allow-tool shell(grep -r)` is much broader than the previous constrained `grep -r '.*' pkg/workflow/...` commands. With prefix matching, this expands what can be searched/parsed via shell beyond the apparent intent. Consider changing the underlying `tools.bash` patterns to avoid single quotes (use double quotes) so the lockfile can keep tighter prefixes.
# --allow-tool shell(grep -r)
.github/workflows/daily-testify-uber-super-expert.lock.yml:822
`--allow-tool shell(grep -r)` is a very generic prefix compared to the previous constrained `grep -r ... --include=...` entries. With prefix matching, this increases the shell surface area beyond the original intent; consider adjusting the underlying `tools.bash` entries to avoid single quotes so the lockfile can retain narrower prefixes.
# --allow-tool shell(grep -r)
- Files reviewed: 16/16 changed files
- Comments generated: 18
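The review comments above turn on how prefix matching widens an allowlist. Below is a minimal Python sketch of that effect, assuming plain string-prefix matching; the actual Copilot CLI matching rules may differ. The `is_allowed` helper and the narrow entry are hypothetical, since the original scoped entries are elided in the review.

```python
def is_allowed(command: str, allowed_prefixes: list[str]) -> bool:
    """Return True if the command starts with any allowlisted prefix.

    Assumption: plain string-prefix matching, as the review comments imply.
    """
    return any(command.startswith(prefix) for prefix in allowed_prefixes)


# Hypothetical narrow entry, standing in for the previous scoped form.
narrow = ["grep -r --include=*.go pkg/workflow"]
# Generic entry as it appears in the recompiled lockfiles.
broad = ["grep -r"]

in_scope = "grep -r --include=*.go pkg/workflow -e TODO"
out_of_scope = "grep -r password /etc"

print(is_allowed(in_scope, narrow))      # scoped command passes the narrow list
print(is_allowed(out_of_scope, narrow))  # unrelated search is rejected
print(is_allowed(out_of_scope, broad))   # but the generic prefix accepts it
```

Under this assumption, every command the narrow entry accepted is still accepted by the generic one, while the generic one additionally accepts searches anywhere on the filesystem.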
Comment on lines +780 to +781
# --allow-tool shell(find pkg -name)
# --allow-tool shell(find pkg -type f -name)
Comment on lines +777 to +778
# --allow-tool shell(find .github/workflows -name)
# --allow-tool shell(find docs/src/content/docs -name)
Comment on lines +240 to +242
recommendation (regardless of p-value) and show the per-variant progress toward `min_samples`
from `analyses[].variants[].min_samples_reached`. Only proceed with `PROMOTE` or `ABANDON` when
the CLI returns `READY_FOR_ANALYSIS`.
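The gating rule in the excerpt above can be sketched in Python as follows. This is one reading of the rule, not the workflow's actual logic: the function names are hypothetical, the shape of the analysis object is assumed from the field paths quoted in the diff, and `min_samples_reached` is assumed to be a boolean.

```python
def gate_decision(analysis: dict, proposed: str) -> str:
    """Return the recommendation to report for one experiment.

    Assumption: anything other than READY_FOR_ANALYSIS from the CLI means
    the experiment is reported as EXTEND, regardless of p-value.
    """
    if analysis.get("recommendation") != "READY_FOR_ANALYSIS":
        return "EXTEND"
    # Only here may a PROMOTE or ABANDON proposal go through.
    return proposed


def min_samples_progress(analysis: dict) -> list[bool]:
    """Per-variant flags read from variants[].min_samples_reached (assumed boolean)."""
    return [bool(v.get("min_samples_reached")) for v in analysis.get("variants", [])]


# Illustrative data only; not real CLI output.
still_collecting = {
    "recommendation": "EXTEND",
    "variants": [{"min_samples_reached": False}, {"min_samples_reached": True}],
}
ready = {"recommendation": "READY_FOR_ANALYSIS", "variants": []}

print(gate_decision(still_collecting, "PROMOTE"))
print(min_samples_progress(still_collecting))
print(gate_decision(ready, "PROMOTE"))
```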
# --allow-tool shell(date)
# --allow-tool shell(echo)
# --allow-tool shell(find pkg/cli/workflows -name 'test-*.md' -type f)
# --allow-tool shell(find pkg/cli/workflows -name)
Comment on lines +816 to +818
# --allow-tool shell(find . -name)
# --allow-tool shell(find pkg -name)
# --allow-tool shell(find pkg -type f -name)
Comment on lines +713 to +715
# --allow-tool shell(find .github -name)
# --allow-tool shell(find .github -type f -exec cat {} +)
# --allow-tool shell(find pkg -name 'copilot*.go')
# --allow-tool shell(find pkg -name)
# For more information: https://github.github.com/gh-aw/introduction/overview/
#
# Daily statistical report that aggregates experiment-state artifacts across recent runs, computes per-variant statistics (mean, variance, 95% CI, success rate), detects significance via Welch t-test or two-proportion z-test (p < 0.05), checks guardrail metric thresholds, renders bar charts and an ASCII comparison table per experiment, and posts a discussion with a promote/extend/abandon recommendation; notifies tracking issues when experiments reach statistical significance or min_samples
# Daily statistical report that uses the experiments CLI command to list active experiments and the experiments analyze tool to get per-variant statistics and statistical significance, then computes per-variant success rates and durations from run artifacts, renders bar charts and an ASCII comparison table per experiment, and posts a discussion with a promote/extend/abandon recommendation; notifies tracking issues when experiments reach statistical significance or min_samples
Comment on lines +848 to 852
# --allow-tool shell(find actions/setup/js -name)
# --allow-tool shell(git log -1 --format=)
# --allow-tool shell(git log -3 --format=)
# --allow-tool shell(grep -r)
# --allow-tool shell(grep)
Comment on lines +724 to +726
# --allow-tool shell(grep -l)
# --allow-tool shell(grep -rL)
# --allow-tool shell(grep -rn)
Comment on lines +152 to +156
Use the `analyses` array from `gh aw experiments analyze` (Step 1) for the following fields — no
recomputation is needed:

- **n** (variant count): from `analyses[].variants[].count`
- **min_samples**: from `analyses[].min_samples`
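To illustrate the field paths quoted above, here is a small Python sketch that reads `n` and `min_samples` straight from an `analyses` array without recomputing anything. The `sample_rows` helper and the example payload are hypothetical; only the paths `analyses[].variants[].count` and `analyses[].min_samples` come from the diff, and the real `--json` output may carry additional fields.

```python
def sample_rows(report: dict) -> list[dict]:
    """Collect n and min_samples per variant, taken as-is from the CLI output."""
    rows = []
    for exp_index, analysis in enumerate(report.get("analyses", [])):
        min_samples = analysis.get("min_samples")
        for var_index, variant in enumerate(analysis.get("variants", [])):
            rows.append(
                {
                    "experiment": exp_index,  # positional index; real IDs are not assumed
                    "variant": var_index,
                    "n": variant.get("count"),
                    "min_samples": min_samples,
                }
            )
    return rows


# Illustrative payload only; not real CLI output.
example = {
    "analyses": [
        {"min_samples": 30, "variants": [{"count": 12}, {"count": 31}]},
    ]
}

for row in sample_rows(example):
    print(row)
```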
Summary
Updates the `daily-experiment-report` workflow to leverage the `gh aw experiments` CLI command and `gh aw experiments analyze` tool, replacing the manual frontmatter-parsing approach with the dedicated CLI tooling.
Changes
- Adds `cli-proxy: true` to the tools section so the agent can call `gh aw` commands.
- Runs `gh aw experiments list --json --repo ${{ github.repository }}` to discover active experiments, then calls `gh aw experiments analyze <id> --json --repo ${{ github.repository }}` per experiment to retrieve:
  - balance-test results (`chi_square`, `p_value`, `is_balanced`)
  - the recommendation (`recommendation`: `EXTEND` / `READY_FOR_ANALYSIS`)
- Uses the `recent_runs` array from the analyze output for explicit variant assignments, supplemented by GitHub MCP tools for per-run outcome data (success rates, durations).
- Reuses the CLI-provided values (`count`, `min_samples`, balance test) instead of recomputing them; only outcome metrics (success rate, duration CI) are computed from per-run records.
- Uses the `recommendation` field for the min-samples gate instead of computing it independently.
- Recompiles the lockfile (`daily-experiment-report.lock.yml`).
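The summary says only the outcome metrics (success rate, duration CI) are still computed from per-run records. A rough Python sketch of that computation follows. The record keys `success` and `duration_s` are hypothetical, and the interval uses a normal approximation (z = 1.96), which is an assumption; the PR does not state which interval method the workflow uses.

```python
import math
import statistics


def success_rate(runs: list[dict]) -> float:
    """Fraction of runs flagged successful (hypothetical 'success' key)."""
    if not runs:
        raise ValueError("no runs to summarize")
    return sum(1 for run in runs if run["success"]) / len(runs)


def duration_ci95(durations: list[float]) -> tuple[float, float]:
    """Approximate 95% CI for the mean duration (normal approximation)."""
    if not durations:
        raise ValueError("no durations to summarize")
    mean = statistics.fmean(durations)
    if len(durations) < 2:
        return (mean, mean)  # a single sample gives no spread estimate
    sem = statistics.stdev(durations) / math.sqrt(len(durations))
    half_width = 1.96 * sem
    return (mean - half_width, mean + half_width)


# Illustrative per-run records for one variant; not real data.
runs = [
    {"success": True, "duration_s": 10.0},
    {"success": True, "duration_s": 12.0},
    {"success": False, "duration_s": 14.0},
    {"success": True, "duration_s": 16.0},
]

print(success_rate(runs))
print(duration_ci95([run["duration_s"] for run in runs]))
```

With small per-variant samples, a t-based interval would be wider than this approximation; the sketch keeps to the standard library for simplicity.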