[q] fix(ab-testing-advisor): add missing bash commands, rich experiment template, cache-memory, and strict mode #29989

@github-actions

Q Workflow Optimization Report

Triggered by /q review the ab Testing Advisor to conduct deep research on existing workflows before suggesting changes. The research was performed by a dedicated sub-agent that analyzed run history, code, and workflow comparisons.

Issues Found (from live data + sub-agent research)

ab-testing-advisor

Run IDs Analyzed: 25276903561, 25250048575, 25240168844, 25229568147 (all success — issues were silent degradations, not hard failures)

Issues Identified:

🔴 sort and basename missing from bash allowlist

The prompt's Step 1 uses find ... | sort and Step 2 uses $(basename "$SELECTED" .md), but neither sort nor basename was in the allowlist. The agent silently worked around the blocked commands, which degraded the quality of its run history analysis.
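As a minimal sketch of what Step 2's basename call produces, and of how a gap between the commands a prompt needs and the commands an allowlist permits could be detected, consider the following. The path, the two command lists, and the variable names `needed`, `allowed`, and `missing` are hypothetical and chosen only for illustration; they are not taken from the workflow.

```shell
# Hypothetical value; the real SELECTED comes from the workflow's selection step.
SELECTED=".github/workflows/example-workflow.md"

# basename with a suffix argument drops the directory and the trailing ".md".
name="$(basename "$SELECTED" .md)"
echo "$name"

# Hypothetical lists: commands the prompt calls vs. commands the allowlist permits.
needed="find sort basename shuf"
allowed="find shuf cat"

# Collect every needed command that is absent from the allowlist.
missing=""
for cmd in $needed; do
  case " $allowed " in
    *" $cmd "*) ;;                  # allowed, nothing to do
    *) missing="$missing $cmd" ;;   # not allowed, record it
  esac
done
missing="${missing# }"              # strip the leading space
echo "missing: $missing"
```

A check of this kind run before the agent starts would turn a silent workaround into a visible error.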

🔴 Side quest proposed already-implemented infrastructure features

ExperimentConfig already supports description, hypothesis, metric, secondary_metrics, guardrail_metrics, min_samples, owner, weight, start_date, end_date, and issue. All of these fields are parsed by compiler_experiments.go and consumed by pick_experiment.cjs. The side quest's Area 1 nevertheless instructed the agent to propose adding them, which produced misleading issues every day.

🔴 Campaign issue template used bare array form only

The experiments: YAML snippet in the issue template showed only [variant1, variant2]. The infrastructure fully supports the rich object form with all metadata fields. This made output quality depend on the agent serendipitously discovering the object form.

🟡 No strict: true

Other advisor workflows (daily-otel-instrumentation-advisor, daily-observability-report) use strict: true. Without it, blocked bash commands cause silent degradation.

🟡 timeout-minutes: 20 was too low

The side quest requires reading compiler_experiments.go + pick_experiment.cjs (~500 LOC) plus creating 2 structured issues. Similar advisors use 30 minutes.

🟡 No cache-memory — same workflow could be re-selected on consecutive days

With 205 eligible workflows and shuf -n 1, the same workflow could be selected back-to-back. Each re-run closes the previous campaign issue before it's implemented.
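One way to avoid back-to-back re-selection is to filter recently analyzed entries out of the candidate list before calling shuf. The sketch below is an illustration under stated assumptions, not the workflow's actual logic: it uses a hypothetical line-per-entry text file in place of recently-analyzed.json to stay dependency-free, and the candidate names are placeholders.

```shell
# Hypothetical stand-ins: a candidate list and a line-per-entry cache file.
workdir="$(mktemp -d)"
candidates="$workdir/candidates.txt"
recent="$workdir/recent.txt"

printf '%s\n' alpha beta gamma delta > "$candidates"
printf '%s\n' alpha gamma > "$recent"

# Drop every candidate that exactly matches a cached line:
# -v inverts, -x matches whole lines, -F treats patterns as fixed strings,
# -f reads the patterns from the cache file.
eligible="$(grep -vxFf "$recent" "$candidates")"

# Pick one remaining candidate at random.
selected="$(printf '%s\n' "$eligible" | shuf -n 1)"
echo "selected: $selected"

# Record the pick and keep only the 14 most recent entries (rolling window).
printf '%s\n' "$selected" >> "$recent"
tail -n 14 "$recent" > "$recent.tmp" && mv "$recent.tmp" "$recent"
```

With 205 eligible workflows and a window of 14 distinct entries, at least 191 candidates remain available on any given day. If every candidate were in the cache, grep would return nothing, so a real implementation would need a fallback for an empty eligible list.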

Changes Made

.github/workflows/ab-testing-advisor.md

| Change | Rationale |
| --- | --- |
| Added sort, basename, tail, uniq to bash allowlist | Fixes silent failures in Steps 1 & 2 |
| Added strict: true | Surfaces tool failures instead of silent degradation |
| Increased timeout-minutes 20 → 30 | Prevents premature truncation of side quest |
| Added cache-memory: true under tools | Enables persistent recently-analyzed tracking |
| Added Step 1 cache-load instructions | Reads recently-analyzed.json before selection |
| Added Step 5 cache-write instructions | Saves selected workflow (rolling window of 14) |
| Updated experiment template to rich object form | Uses full ExperimentConfig schema with all metadata fields |
| Rewrote side quest Area 1 | Documents what's already built; redirects to genuine gaps (analysis_type, tags, notify); gates issue creation on actual findings |

Expected Improvements

  • ✅ Reliable workflow run history lookup (bash commands no longer blocked)
  • ✅ Side quest issues will be actionable instead of re-proposing existing features
  • ✅ All campaign issues will use rich experiment metadata (progress bars, weighted selection, self-documenting hypotheses)
  • ✅ Better coverage across 205 eligible workflows via cache-based dedup
  • ✅ Strict mode catches tool failures early

Validation

ab-testing-advisor compiled successfully with no errors or warnings.

Note: .lock.yml regeneration will happen automatically via CI after merge.

Warning

Protected Files — Push Permission Denied

This was originally intended as a pull request, but the patch modifies protected files. A human must create the pull request manually.

Protected files

The push was rejected because GitHub Actions does not have the workflows permission required to push these changes (and is never allowed to make such changes), or because the other authorization in use lacks this permission.

Create the pull request manually
# Download the patch from the workflow run
gh run download 25287000457 -n agent -D /tmp/agent-25287000457

# Create a new branch
git checkout -b q/ab-testing-advisor-improvements-93545ad86f5fc13f main

# Apply the patch (--3way handles cross-repo patches)
git am --3way /tmp/agent-25287000457/aw-q-ab-testing-advisor-improvements.patch

# Push the branch and create the pull request
git push origin q/ab-testing-advisor-improvements-93545ad86f5fc13f
gh pr create --title '[q] fix(ab-testing-advisor): add missing bash commands, rich experiment template, cache-memory, and strict mode' --base main --head q/ab-testing-advisor-improvements-93545ad86f5fc13f --repo github/gh-aw

🎩 Equipped by Q · ● 2.4M ·

  • expires on May 5, 2026, 6:31 PM UTC
