Merged
5 changes: 5 additions & 0 deletions .github/aw/actions-lock.json
@@ -148,6 +148,11 @@
"version": "v6.0.0",
"sha": "030e881283bb7a6894de51c315a6bfe6a94e05cf"
},
"docker/setup-buildx-action@v4": {
"repo": "docker/setup-buildx-action",
"version": "v4",
"sha": "d7f5e7f509e45cec5c76c4d5afdd7de93d0b3df5"
},
"docker/setup-buildx-action@v4.0.0": {
"repo": "docker/setup-buildx-action",
"version": "v4.0.0",
112 changes: 82 additions & 30 deletions .github/workflows/ab-testing-advisor.md
@@ -115,34 +115,29 @@ grep -rl 'experiments:' .github/workflows/*.md 2>/dev/null || echo "none"
grep -rL 'experiments:' .github/workflows/*.md 2>/dev/null | grep -v shared | sort
```

From the list of workflows **without** an `experiments:` section, pick one at random — **excluding any workflow whose basename appears in the `recently_analyzed` list above** — using:
From the list of workflows **without** an `experiments:` section, pick one at random — **excluding any workflow whose basename appears in the `recently_analyzed` list above** — and store the chosen path in `SELECTED`:

```bash
grep -rL 'experiments:' .github/workflows/*.md 2>/dev/null | grep -v shared | shuf -n 1
SELECTED=$(grep -rL 'experiments:' .github/workflows/*.md 2>/dev/null | grep -v shared | shuf -n 1)
echo "$SELECTED"
```

If after filtering out recently-analyzed workflows the candidate list is empty, fall back to any eligible workflow (the dedup window has been exhausted):

```bash
grep -rL 'experiments:' .github/workflows/*.md 2>/dev/null | grep -v shared | shuf -n 1
SELECTED=$(grep -rL 'experiments:' .github/workflows/*.md 2>/dev/null | grep -v shared | shuf -n 1)
echo "$SELECTED"
```
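The selection commands shown above do not themselves apply the `recently_analyzed` exclusion that the prose describes, so the filter has to happen somewhere. A minimal sketch of one way to do it follows. The helper names `filter_recent` and `pick_workflow` are hypothetical, and the sketch assumes the recent list is available as newline-separated workflow names without the `.md` extension; the actual cache format may differ.

```bash
# Hypothetical helper: read candidate paths on stdin and drop any whose
# basename (without .md) appears in the newline-separated list in $1.
filter_recent() {
  local recent="$1" path name
  while IFS= read -r path; do
    [ -n "$path" ] || continue
    name=$(basename "$path" .md)
    # -x and -F force an exact whole-line match, so "news" does not hide "daily-news"
    if ! printf '%s\n' "$recent" | grep -qxF -- "$name"; then
      printf '%s\n' "$path"
    fi
  done
}

# Hypothetical helper: pick one candidate at random, preferring workflows
# that were not recently analyzed.
pick_workflow() {
  local candidates="$1" recent="$2" fresh
  fresh=$(printf '%s\n' "$candidates" | filter_recent "$recent")
  # Fall back to the full list when the dedup window is exhausted
  if [ -z "$fresh" ]; then
    fresh="$candidates"
  fi
  printf '%s\n' "$fresh" | shuf -n 1
}
```

With the candidate list from the `grep -rL` pipeline in one variable and the recent names in another, `SELECTED=$(pick_workflow "$CANDIDATES" "$RECENT")` would cover both the filtered case and the fallback in one call. Both variable names are placeholders.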

### Step 2 — Analyze the Selected Workflow

Read the selected workflow file in full. Study:

1. **Purpose & trigger** — What problem does it solve? What events trigger it?
2. **Engine & model** — Which AI engine is used? Is there a specific model set?
3. **Prompt design** — What instructions does the agent receive? How verbose/prescriptive are they?
4. **Tool configuration** — Which tools and MCP servers are enabled?
5. **Output structure** — What safe-outputs are configured? What does it produce?
6. **Current performance characteristics** — Look at recent workflow run history using the path returned by the `shuf` command above. For example, if the selected workflow is `.github/workflows/daily-news.md`, run:
```bash
# Check recent runs (last 10) — replace WORKFLOW_BASENAME with the name from shuf output
SELECTED=$(grep -rL 'experiments:' .github/workflows/*.md 2>/dev/null | grep -v shared | shuf -n 1)
gh run list --workflow="$(basename "$SELECTED" .md).lock.yml" --limit 10 --json conclusion,createdAt,displayTitle,durationMS
```
7. **Existing quality signals** — Are there any reported issues, quality labels, or patterns in runs?
Use the `workflow-characterizer` agent with the selected workflow file path. Use the returned characterization (`purpose`, `triggers`, `engine`, `prompt_density`, `tools`, `outputs`, `quality_signals`) as the basis for Step 3.

Then check recent run performance with:

```bash
gh run list --workflow="$(basename "$SELECTED" .md).lock.yml" --limit 10 --json conclusion,createdAt,displayTitle,durationMS
```
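The command above depends on `SELECTED` still being set, and a shell variable only survives within the same shell session. If each code block runs in a fresh shell, `SELECTED` will be empty here. A small guard, sketched below, makes that failure visible instead of silently querying a workflow named `.lock.yml`. The helper name `lock_file_for` is hypothetical.

```bash
# Hypothetical helper: derive the compiled lock-file name from a workflow path,
# failing loudly when the path is empty.
lock_file_for() {
  local selected="$1"
  if [ -z "$selected" ]; then
    echo "SELECTED is empty; re-run the selection step in this shell" >&2
    return 1
  fi
  # .github/workflows/daily-news.md becomes daily-news.lock.yml
  printf '%s.lock.yml\n' "$(basename "$selected" .md)"
}
```

The result can then be passed to `gh run list --workflow=...` in place of the inline `basename` expression.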

### Step 3 — Devise an Experiment Campaign

@@ -308,25 +303,17 @@ echo "✅ Cache updated — recently analyzed: $(echo "$UPDATED" | jq -r '.recen

## Side Quest: Improve the Experiment Infrastructure

After completing the primary quest, include a **second issue** (sub-issue of the first) proposing improvements to the experiments infrastructure. Assess the current implementation by reading:
After completing the primary quest, include a **second issue** (sub-issue of the first) proposing improvements to the experiments infrastructure.

```bash
cat pkg/workflow/compiler_experiments.go
cat actions/setup/js/pick_experiment.cjs
```
Use the `field-presence-checker` agent with file paths `pkg/workflow/compiler_experiments.go` and `actions/setup/js/pick_experiment.cjs`, and field names `analysis_type`, `tags`, `notify`. Use the returned `present`/`evidence` results when deciding which fields are genuinely absent.
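As a rough illustration of consuming those results, the sketch below lists every field the checker did not mark as fully implemented. It assumes the agent's JSON reply (an object keyed by field name, each with `present` and `evidence`) is piped in on stdin and that `jq` is installed. The helper name `missing_fields` is hypothetical.

```bash
# Hypothetical helper: print "<field><TAB><present>" for every field whose
# "present" value is anything other than "yes" (that is, "partial" or "no").
missing_fields() {
  jq -r 'to_entries[]
         | select(.value.present != "yes")
         | "\(.key)\t\(.value.present)"'
}
```

Fields reported as `no` are candidates for a new proposal, while `partial` ones may be better framed as completing existing work.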

Then review what data is currently captured per experiment run (the artifact uploaded to `/tmp/gh-aw/experiments/state.json`) and consider what would be needed for a complete experiment analytics pipeline.

Propose concrete improvements in the following areas:

### Area 1: Frontmatter Schema — Verify Genuine Gaps Before Filing

**Important**: Before proposing additions, verify what is already implemented by reading the source files:

```bash
cat pkg/workflow/compiler_experiments.go
cat actions/setup/js/pick_experiment.cjs
```
**Important**: Before proposing additions, rely on the `field-presence-checker` results rather than re-reading the source files in the main prompt.

The current `ExperimentConfig` already supports the following fields — **do not propose adding these**, they are fully operational:

@@ -344,7 +331,7 @@ The current `ExperimentConfig` already supports the following fields — **do no
| `start_date` / `end_date` | ISO-8601 date range for time-boxed experiments |
| `issue` | GitHub issue number tracking the experiment |

After reading the compiler and `pick_experiment.cjs`, check whether the following **genuinely unimplemented** fields have been added yet:
Using the `field-presence-checker` results, check whether the following **genuinely unimplemented** fields have been added yet:

- **`analysis_type`** — declares the statistical test for automated reporting (`t_test`, `mann_whitney`, `proportion_test`, `bayesian_ab`)
- **`tags`** — free-form labels for filtering experiments in dashboards
@@ -383,4 +370,69 @@ Propose how experiments should integrate with `gh aw audit` and OTEL observabili
- Do not create issues for workflows that already have `experiments:` defined
- If all eligible workflows are filtered out (all have experiments), create a single issue celebrating this and suggesting advanced multi-experiment designs

{{#runtime-import shared/noop-reminder.md}}
{{#runtime-import shared/noop-reminder.md}}

## agent: `workflow-characterizer`
---
description: Read a selected workflow file and return a concise structured characterization for experiment design
model: small
---
You receive a single file path to a `.github/workflows/<name>.md` workflow file.

Read the file using `cat <filepath>` via bash and return only JSON with this structure:

```json
{
"purpose": "",
"triggers": [],
"engine": "",
"prompt_density": "",
"tools": [],
"outputs": [],
"quality_signals": []
}
```

Requirements:
- `purpose`: 1-2 sentences describing what problem the workflow solves.
- `triggers`: list the workflow trigger types you find in frontmatter.
- `engine`: identify the engine and any explicit model/bare-mode details visible in the file.
- `prompt_density`: brief characterization such as `minimal`, `moderate`, or `dense`, with a short reason.
- `tools`: concise list of enabled tools or MCP capabilities that materially affect the workflow.
- `outputs`: concise list of safe outputs or other concrete artifacts the workflow produces.
- `quality_signals`: list any notable quality-related signals already visible in the file itself, such as strict mode, validation steps, review guidance, retry/guardrail instructions, or obvious gaps.
- Be extractive and factual. Do not propose changes.

## agent: `field-presence-checker`
---
description: Check whether named experiment fields are genuinely implemented in the compiler and picker code
model: small
---
You receive two file paths:
- `pkg/workflow/compiler_experiments.go`
- `actions/setup/js/pick_experiment.cjs`

And three field names to check:
- `analysis_type`
- `tags`
- `notify`

Read both files using `cat <filepath>` via bash. Return only JSON with this structure:

```json
{
"analysis_type": { "present": "yes", "evidence": "" },
"tags": { "present": "yes", "evidence": "" },
"notify": { "present": "yes", "evidence": "" }
}
```

Use strict semantics for `present`:
- `yes`: the field is parsed into the compiled config and meaningfully used or surfaced at runtime.
- `partial`: the field is mentioned in types/comments or parsed only on one side, but is not clearly end-to-end implemented.
- `no`: the field is absent.

Evidence rules:
- Cite concrete evidence from both files when possible.
- Keep each `evidence` value to 1-3 sentences.
- If the implementation is only documented, typed, or parsed without observable runtime use, mark it `partial`, not `yes`.
34 changes: 17 additions & 17 deletions .github/workflows/developer-docs-consolidator.lock.yml
