## Summary

- Run mode: `dry-run`
- Status: ⚠️ Skipped (no `OPENROUTER_API_KEY` configured — suite execution did not run)
## Key Findings

- No benchmark suite was executed — the optimizer cannot generate meaningful pass-rate scores or improvement candidates without an API key. The `dry-run` path only validates tooling setup; all skill quality signals are absent.
- Expected impact: Enabling benchmark mode would surface concrete pass-rate gaps per skill and drive data-driven improvements.
- `SKILL.md` surface description is too sparse for task generation — `SKILL.md` (the benchmark target) is only ~40 lines and lacks representative edge-case examples. The skill-optimizer's `taskGeneration` (`maxTasks: 20`) relies on richly described surfaces; a thin skill file produces low-diversity, low-quality eval tasks.
- Expected impact: Expanding `SKILL.md` with more concrete usage patterns and failure modes would increase eval diversity and improve benchmark reliability.
- `allowedPaths` is locked to `["SKILL.md"]` — the optimizer's `optimize.allowedPaths` only permits editing `SKILL.md`. However, the majority of detailed guidance lives in `skills/*/SKILL.md` domain files. The optimizer can never improve those skill files in automated passes, limiting optimization scope to the thin top-level surface.
- Expected impact: Widening `allowedPaths` (e.g., `["SKILL.md", "skills/*/SKILL.md"]`) would let the optimizer iteratively improve the domain skill files that agents actually read most often.
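As a concrete sketch of that widening (assuming the optimizer expands glob patterns in `allowedPaths`, as the suggested value implies), the `optimize` section of `.skill-optimizer/skill-optimizer.json` would become:

```json
{
  "optimize": {
    "allowedPaths": ["SKILL.md", "skills/*/SKILL.md"],
    "maxIterations": 3
  }
}
```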
## Evidence from Artifact

`summary.json`:

```json
{
  "repository": "github/gh-aw",
  "run_mode": "dry-run",
  "run_status": 0,
  "run_url": "https://github.com/github/gh-aw/actions/runs/25712309353"
}
```
`run.log`:

```text
dry-run: Docker available but OPENROUTER_API_KEY not set; skipping suite execution
```
`.skill-optimizer/skill-optimizer.json` (relevant excerpt):

```json
{
  "target": { "skill": "../SKILL.md" },
  "benchmark": {
    "taskGeneration": { "enabled": true, "maxTasks": 20 },
    "verdict": { "perModelFloor": 0.6, "targetWeightedAverage": 0.8 }
  },
  "optimize": {
    "allowedPaths": ["SKILL.md"],
    "maxIterations": 3
  }
}
```
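Reading the `verdict` thresholds literally (an assumption — the excerpt does not define the aggregation), a benchmark run would pass only when every model clears the per-model floor and the weighted average clears the target:

```latex
\text{pass} \iff \Big(\min_{m} s_m \ge 0.6\Big) \;\wedge\; \left(\frac{\sum_m w_m s_m}{\sum_m w_m} \ge 0.8\right)
```

where \(s_m\) is model \(m\)'s pass rate and \(w_m\) its weight. Under this reading, one badly failing model sinks the run even when the average is comfortably above target.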
`SKILL.md` is ~40 lines with high-level usage examples but no edge-case, failure-mode, or detailed workflow frontmatter patterns.
## Recommendations

- Add `OPENROUTER_API_KEY` to repository Actions secrets so the daily workflow runs in benchmark mode instead of dry-run. This is the prerequisite for all other optimizer improvements; without it the tool produces no actionable data.
- Expand `SKILL.md` with richer content: add 5-10 representative frontmatter snippets (engines, MCP tool configs, `safe-outputs`, `network` restrictions), a short troubleshooting/common-error section, and at least one multi-step usage walkthrough. This gives the task-generation component material to produce diverse, realistic eval cases.
- Widen `optimize.allowedPaths` in `.skill-optimizer/skill-optimizer.json` from `["SKILL.md"]` to include `["skills/*/SKILL.md"]` (or individual high-traffic skill files such as `skills/github-mcp-server/SKILL.md` and `skills/developer/SKILL.md`). This lets the optimizer improve the domain-specific guidance that developers and agents rely on most.
*Generated by Daily Skill Optimizer Improvements*