Triggered by the request "/q review the ab Testing Advisor to conduct deep research on existing workflows before suggesting changes". The research was performed by a dedicated sub-agent that analyzed run history, code, and workflow comparisons.
Issues Found (from live data + sub-agent research)
ab-testing-advisor
Run IDs Analyzed: 25276903561, 25250048575, 25240168844, 25229568147 (all success — issues were silent degradations, not hard failures)
Issues Identified:
🔴 sort and basename missing from bash allowlist
The prompt's Step 1 uses find ... | sort and Step 2 uses $(basename "$SELECTED" .md), but neither command was in the allowlist. The agent silently worked around these failures, degrading run history analysis quality.
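To make the dependency on the allowlist concrete, here is a minimal sketch of the kind of pipeline Steps 1 and 2 describe. The function names, the demo directory, and the file names are hypothetical, and the exact find arguments in the real prompt are not shown in this report, so treat this as an illustration rather than the workflow's actual script.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Step 1 (sketch): list candidate workflow files in a stable order.
# Needs both `find` and `sort` on the allowlist.
list_workflows() {
  find "$1" -maxdepth 1 -type f -name '*.md' | sort
}

# Step 2 (sketch): derive the workflow name from the selected path.
# Needs `basename` on the allowlist.
workflow_name() {
  basename "$1" .md
}

# Demo on a throwaway directory (hypothetical file names).
demo_dir="$(mktemp -d)"
touch "$demo_dir/b-workflow.md" "$demo_dir/a-workflow.md" "$demo_dir/notes.txt"
SELECTED="$(list_workflows "$demo_dir" | sed -n '1p')"
workflow_name "$SELECTED"   # prints: a-workflow
rm -rf "$demo_dir"
```

If either sort or basename is blocked, both helpers fail, which is consistent with the degraded run history analysis described above.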
🔴 Side quest proposed already-implemented infrastructure features
ExperimentConfig already supports description, hypothesis, metric, secondary_metrics, guardrail_metrics, min_samples, owner, weight, start_date, end_date, issue — all parsed by compiler_experiments.go and consumed by pick_experiment.cjs. The side quest's Area 1 was instructing the agent to propose adding these fields, resulting in misleading issues every day.
🔴 Campaign issue template used bare array form only
The experiments: YAML snippet in the issue template showed only [variant1, variant2]. The infrastructure fully supports the rich object form with all metadata fields. This made output quality depend on the agent serendipitously discovering the object form.
🟡 No strict: true
Other advisor workflows (daily-otel-instrumentation-advisor, daily-observability-report) use strict: true. Without it, blocked bash commands cause silent degradation.
🟡 timeout-minutes: 20 was too low
The side quest requires reading compiler_experiments.go + pick_experiment.cjs (~500 LOC) plus creating 2 structured issues. Similar advisors use 30 minutes.
🟡 No cache-memory — same workflow could be re-selected on consecutive days
With 205 eligible workflows and shuf -n 1, the same workflow could be selected back-to-back. Each re-run closes the previous campaign issue before it's implemented.
Changes Made
.github/workflows/ab-testing-advisor.md
| Change | Rationale |
| --- | --- |
| Added sort, basename, tail, uniq to bash allowlist | Fixes silent failures in Steps 1 & 2 |
| Added strict: true | Surfaces tool failures instead of silent degradation |
| Increased timeout-minutes 20 → 30 | Prevents premature truncation of side quest |
| Added cache-memory: true under tools | Enables persistent recently-analyzed tracking |
| Added Step 1 cache-load instructions | Reads recently-analyzed.json before selection |
| Added Step 5 cache-write instructions | Saves selected workflow (rolling window of 14) |
| Updated experiment template to rich object form | Uses full ExperimentConfig schema with all metadata fields |
| Rewrote side quest Area 1 | Documents what's already built; redirects to genuine gaps (analysis_type, tags, notify); gates issue creation on actual findings |
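The cache-load and cache-write steps listed above can be sketched roughly as follows. This is an assumption-heavy illustration: the real cache is recently-analyzed.json, whereas this sketch uses a newline-delimited text file to stay within plain coreutils, and the function names pick_workflow and record_workflow are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

WINDOW=14  # rolling window size taken from the Step 5 description

# Pick one candidate at random, skipping names listed in the recent file.
# $1: file with one candidate name per line; $2: recent-names file.
pick_workflow() {
  local candidates="$1" recent="$2" fresh=""
  if [ -s "$recent" ]; then
    # -v invert, -x whole-line match, -F fixed strings, -f patterns from file
    fresh="$(grep -vxFf "$recent" "$candidates" || true)"
  fi
  # Fall back to the full list when nothing is excluded or everything is.
  if [ -z "$fresh" ]; then
    fresh="$(cat "$candidates")"
  fi
  printf '%s\n' "$fresh" | shuf -n 1
}

# Append the selection and keep only the last $WINDOW entries.
record_workflow() {
  local name="$1" recent="$2"
  printf '%s\n' "$name" >> "$recent"
  tail -n "$WINDOW" "$recent" > "$recent.tmp"
  mv "$recent.tmp" "$recent"
}
```

With 205 eligible workflows and a window of 14, a selection made this way cannot repeat until at least 14 other workflows have been analyzed, which is the property the dedup change appears to be after.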
Expected Improvements
✅ Reliable workflow run history lookup (bash commands no longer blocked)
✅ Side quest issues will be actionable instead of re-proposing existing features
✅ All campaign issues will use rich experiment metadata (progress bars, weighted selection, self-documenting hypotheses)
✅ Better coverage across 205 eligible workflows via cache-based dedup
✅ Strict mode catches tool failures early
Validation
✅ ab-testing-advisor compiled successfully with no errors or warnings.
Note: .lock.yml regeneration will happen automatically via CI after merge.
References
Research sub-agent analyzed runs: 25276903561, 25250048575, 25240168844, 25229568147
The push was rejected because GitHub Actions does not have workflows permission to push these changes, and is never allowed to make such changes, or other authorization being used does not have this permission.
Create the pull request manually
# Download the patch from the workflow run
gh run download 25287000457 -n agent -D /tmp/agent-25287000457
# Create a new branch
git checkout -b q/ab-testing-advisor-improvements-93545ad86f5fc13f main
# Apply the patch (--3way handles cross-repo patches)
git am --3way /tmp/agent-25287000457/aw-q-ab-testing-advisor-improvements.patch
# Push the branch and create the pull request
git push origin q/ab-testing-advisor-improvements-93545ad86f5fc13f
gh pr create --title '[q] fix(ab-testing-advisor): add missing bash commands, rich experiment template, cache-memory, and strict mode' --base main --head q/ab-testing-advisor-improvements-93545ad86f5fc13f --repo github/gh-aw
Research sub-agent analyzed runs: 25276903561, 25250048575, 25240168844, 25229568147
Compared against: daily-otel-instrumentation-advisor, daily-observability-report
Infrastructure verified in: pkg/workflow/frontmatter_types.go, pkg/workflow/compiler_experiments.go, actions/setup/js/pick_experiment.cjs
Fixes [ab-advisor] Improve experiment infrastructure: schema, reporting & audit #29911
Warning
Protected Files — Push Permission Denied
This was originally intended as a pull request, but the patch modifies protected files. A human must create the pull request manually.
Protected files
The push was rejected because GitHub Actions does not have workflows permission to push these changes, and is never allowed to make such changes, or other authorization being used does not have this permission.