[q] fix(ab-testing-advisor): add missing bash commands, rich experiment template, cache-memory, and strict mode #29989

# Q Workflow Optimization Report

> Triggered by `/q review the ab Testing Advisor to conduct deep research on existing workflows before suggesting changes` — research performed by a dedicated sub-agent analyzing run history, code, and workflow comparisons.

## Issues Found (from live data + sub-agent research)

### ab-testing-advisor

**Run IDs Analyzed**: 25276903561, 25250048575, 25240168844, 25229568147 (all `success` — issues were silent degradations, not hard failures)

**Issues Identified**:

#### 🔴 `sort` and `basename` missing from bash allowlist
Step 1 of the prompt runs `find ... | sort` and Step 2 runs `$(basename "$SELECTED" .md)`, but neither command was in the allowlist. The agent silently worked around the blocked commands, which degraded the quality of its run history analysis.
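
For illustration, here is a minimal sketch of the two shell patterns that depend on the blocked commands. The scratch directory, the file names, and the use of `head` to pick an entry are assumptions made for the example; the prompt's actual `find` arguments are elided in this report and are not reproduced here.

```sh
# Hypothetical scratch directory standing in for the workflows folder.
WORKDIR=$(mktemp -d)
touch "$WORKDIR/b-advisor.md" "$WORKDIR/a-advisor.md"

# Step 1 pattern: list candidates in a stable order (needs `sort`).
CANDIDATES=$(find "$WORKDIR" -name '*.md' | sort)

# Step 2 pattern: strip the directory and the .md suffix (needs `basename`).
SELECTED=$(printf '%s\n' "$CANDIDATES" | head -n 1)
NAME=$(basename "$SELECTED" .md)
echo "$NAME"
```

If either `sort` or `basename` is blocked, the corresponding line yields nothing useful, and the later steps that rely on the workflow name have to improvise.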

#### 🔴 Side quest proposed already-implemented infrastructure features
`ExperimentConfig` already supports `description`, `hypothesis`, `metric`, `secondary_metrics`, `guardrail_metrics`, `min_samples`, `owner`, `weight`, `start_date`, `end_date`, and `issue`. All of these are parsed by `compiler_experiments.go` and consumed by `pick_experiment.cjs`. Area 1 of the side quest nevertheless instructed the agent to propose adding these fields, so it produced misleading issues every day.

#### 🔴 Campaign issue template used bare array form only
The `experiments:` YAML snippet in the issue template showed only `[variant1, variant2]`. The infrastructure fully supports the rich object form with all metadata fields. This made output quality depend on the agent serendipitously discovering the object form.

#### 🟡 No `strict: true`
Other advisor workflows (`daily-otel-instrumentation-advisor`, `daily-observability-report`) use `strict: true`. Without it, blocked bash commands cause silent degradation.

#### 🟡 `timeout-minutes: 20` was too low
The side quest requires reading `compiler_experiments.go` + `pick_experiment.cjs` (~500 LOC) plus creating 2 structured issues. Similar advisors use 30 minutes.

#### 🟡 No `cache-memory` — same workflow could be re-selected on consecutive days
With 205 eligible workflows and `shuf -n 1`, the same workflow could be selected back-to-back. Each re-run closes the previous campaign issue before it's implemented.
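
One way such a dedup could work is sketched below. It is not the workflow's actual implementation: the workflow stores `recently-analyzed.json`, whose schema is not shown in this report, so the sketch assumes a plain one-name-per-line history file, and the workflow names (`wf-a` and so on) are hypothetical.

```sh
# Hypothetical history file: one workflow name per line, oldest first.
CACHE=$(mktemp)
printf '%s\n' wf-a wf-b > "$CACHE"

# Hypothetical candidate list (the real one comes from Step 1).
CANDIDATES=$(printf '%s\n' wf-a wf-b wf-c wf-d)

# Drop recently analyzed names (exact, whole-line match), then pick one.
SELECTED=$(printf '%s\n' "$CANDIDATES" | grep -v -x -F -f "$CACHE" | shuf -n 1)

# Record the pick and keep only the 14 most recent entries.
printf '%s\n' "$SELECTED" >> "$CACHE"
tail -n 14 "$CACHE" > "$CACHE.tmp" && mv "$CACHE.tmp" "$CACHE"
echo "$SELECTED"
```

With the two history entries filtered out, only `wf-c` or `wf-d` can be selected. A real implementation would also need a fallback for the case where every candidate is in the history and `SELECTED` comes back empty.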

## Changes Made

### `.github/workflows/ab-testing-advisor.md`

| Change | Rationale |
|--------|----------|
| Added `sort`, `basename`, `tail`, `uniq` to bash allowlist | Fixes silent failures in Steps 1 & 2 |
| Added `strict: true` | Surfaces tool failures instead of silent degradation |
| Increased `timeout-minutes` 20 → 30 | Prevents premature truncation of side quest |
| Added `cache-memory: true` under `tools` | Enables persistent recently-analyzed tracking |
| Added Step 1 cache-load instructions | Reads `recently-analyzed.json` before selection |
| Added Step 5 cache-write instructions | Saves selected workflow (rolling window of 14) |
| Updated experiment template to rich object form | Uses full `ExperimentConfig` schema with all metadata fields |
| Rewrote side quest Area 1 | Documents what's already built; redirects to genuine gaps (`analysis_type`, `tags`, `notify`); gates issue creation on actual findings |

## Expected Improvements

- ✅ Reliable workflow run history lookup (bash commands no longer blocked)
- ✅ Side quest issues will be actionable instead of re-proposing existing features
- ✅ All campaign issues will use rich experiment metadata (progress bars, weighted selection, self-documenting hypotheses)
- ✅ Better coverage across 205 eligible workflows via cache-based dedup
- ✅ Strict mode catches tool failures early

## Validation

✅ `ab-testing-advisor` compiled successfully with no errors or warnings.

Note: `.lock.yml` regeneration will happen automatically via CI after merge.

## References

- Research sub-agent analyzed runs: 25276903561, 25250048575, 25240168844, 25229568147
- Compared against: `daily-otel-instrumentation-advisor`, `daily-observability-report`
- Infrastructure verified in: `pkg/workflow/frontmatter_types.go`, `pkg/workflow/compiler_experiments.go`, `actions/setup/js/pick_experiment.cjs`

- Fixes #29911

---

> [!WARNING]
> **Protected Files — Push Permission Denied**
>
> This was originally intended as a pull request, but the patch modifies protected files. A human must create the pull request manually.
>
> <details>
> <summary>Protected files</summary>
>
> - [.github/workflows/ab-testing-advisor.md](https://github.com/github/gh-aw/blob/main/.github/workflows/ab-testing-advisor.md)
>
> The push was rejected because GitHub Actions lacks the `workflows` permission needed to push these changes (and is never allowed to make such changes), or because the other authorization in use lacks this permission.
>
> </details>

<details>
<summary>Create the pull request manually</summary>

```sh
# Download the patch from the workflow run
gh run download 25287000457 -n agent -D /tmp/agent-25287000457

# Create a new branch
git checkout -b q/ab-testing-advisor-improvements-93545ad86f5fc13f main

# Apply the patch (--3way handles cross-repo patches)
git am --3way /tmp/agent-25287000457/aw-q-ab-testing-advisor-improvements.patch

# Push the branch and create the pull request
git push origin q/ab-testing-advisor-improvements-93545ad86f5fc13f
gh pr create --title '[q] fix(ab-testing-advisor): add missing bash commands, rich experiment template, cache-memory, and strict mode' --base main --head q/ab-testing-advisor-improvements-93545ad86f5fc13f --repo github/gh-aw
```

</details>



> 🎩 *Equipped by [Q](https://github.com/github/gh-aw/actions/runs/25287000457)* · ● 2.4M · [◷](https://github.com/search?q=repo%3Agithub%2Fgh-aw+%22gh-aw-workflow-id%3A+q%22&type=pullrequests)
> - [x] expires  on May 5, 2026, 6:31 PM UTC
