Replies: 1 comment
📋 Initiative planned by the BMAD Scrum Master (Bob). Epic #581 — Initiative: Eval-gated, human-reviewed self-improving skills (SkillOpt-style) 6 stories created (inert — labelled
Open questions for review:
Review the epic and its sub-issue DAG, adjust as needed, then add
Summary
Stand up an eval-gated, human-reviewed self-improvement loop for our agent skills and prompts, modeled on Microsoft's open-source SkillOpt pattern. Today every skill and prompt in frameworks/, prompts/, and agents/ is hand-authored markdown that only changes when a human writes a commit; there is no feedback loop turning agent run outcomes back into better skills. The proposal: let agents propose bounded edits to their own skill/prompt markdown, validate each candidate against a held-out eval set, and route the winner through the existing pr-review + release-channel gates as a normal PR. The loop is weight-free (no fine-tuning), fully version-controlled, and human-gated: improvement becomes a reviewable diff, never a silent mutation.
Market Signal
Self-evolving agent skills (not weights) became a distinct, productized category in 2025–2026:
best_skill.md, author-reported 52/52 wins (microsoft/SkillOpt, MS Research: "Executive Strategy for Self-Evolving Agent Skills", VentureBeat).
skill-creator explicitly closes the loop with eval + blind A/B testing to measure and refine Agent Skills, which are themselves composable markdown folders, exactly our format (Improving skill-creator, Agent Skills, anthropics/skills).
Hype filter: fully agent-authored skills remain aspirational. What is shippable today is the SkillOpt/skill-creator shape: an agent proposes a bounded edit, an offline eval gate decides, and a human reviews the PR (GitHub: reviewing agent PRs).
User Signal
This org already has every prerequisite except the loop itself:
frameworks/bmad-method/ (25+ skills), prompts/*.md (triage, deep-review, synthesize, dev-lead phases), agents/*.md: SkillOpt's exact input format.
idea:approved → initiative-planner → epic/DAG → dev-lead → pr-review. We can run skill improvements through it unchanged.
pr-review human-override rates, and Token Cost Observatory spend are all captured-but-unused signals that could seed an eval set.
Shadow-mode dual-run (Idea 566) and health-gated promotion (#501) are the natural enforcement layer for rolling out an improved skill safely.
Technical Opportunity
A minimal, weight-free loop reusing what we already run:
skills/evals/<skill>/cases.jsonl held-out set per high-traffic skill (start with prompts/triage.md and prompts/deep-review.md), scored by a deterministic check or a Haiku-tier LLM-judge. No gate, no loop, so this lands first.
pr-review + a human CODEOWNER, identical to any other change.
stable.
This is additive: no changes to the release-tag ruleset, no agent self-promotion, no model fine-tuning.
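As a rough illustration of the eval harness described above, the sketch below loads a cases.jsonl file and computes a pass rate with a deterministic check. The record schema (`input` and `expected` fields), the exact-match scorer, and the `run_skill` callable are all assumptions made for this example; the proposal does not specify them.

```python
import json
from typing import Callable, Iterable


def load_cases(lines: Iterable[str]) -> list[dict]:
    """Parse JSONL: one JSON object per non-blank line."""
    return [json.loads(line) for line in lines if line.strip()]


def exact_match(output: str, expected: str) -> bool:
    """Deterministic check (assumed): case-insensitive match on the label."""
    return output.strip().lower() == expected.strip().lower()


def pass_rate(cases: list[dict], run_skill: Callable[[str], str]) -> float:
    """Fraction of held-out cases the skill answers correctly."""
    if not cases:
        raise ValueError("eval set is empty")
    passed = sum(
        exact_match(run_skill(case["input"]), case["expected"]) for case in cases
    )
    return passed / len(cases)


if __name__ == "__main__":
    # Hypothetical triage cases; a real set would be read from
    # skills/evals/<skill>/cases.jsonl.
    sample = [
        '{"input": "App crashes on login", "expected": "bug"}',
        '{"input": "Please add dark mode", "expected": "feature"}',
    ]

    def stub_skill(text: str) -> str:
        # Stand-in for an agent run using the skill under test.
        return "bug" if "crash" in text.lower() else "feature"

    print(f"pass rate: {pass_rate(load_cases(sample), stub_skill):.2f}")
```

An LLM-judge scorer could replace `exact_match` behind the same signature; a deterministic check is shown here because it keeps the report reproducible in CI.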
Assessment
Adversarial Review
Strongest objection: This is a reward-hacking and drift magnet. An agent optimizing its own skill against a metric will learn to game the eval rather than genuinely improve — and a self-modifying skill loop is exactly the kind of "agent edits its own infrastructure" the org deliberately fenced off with the release-channel ruleset.
Rebuttal: The design neutralizes both. (1) Reward hacking is bounded because the eval set is version-controlled separately and never writable by the proposer: the agent cannot edit the test, only the skill, and a strict-improvement gate rejects ties and regressions; SkillOpt's held-out-validation discipline and Anthropic's blind A/B testing exist precisely for this. (2) Self-modification is not what's proposed: the agent never moves a tag or merges anything; it opens a diff that goes through the identical human + pr-review + promotion gates as any contributor's PR. It is strictly more conservative than today, where a human can hand-edit a skill with no eval gate at all. The genuine cost is building and maintaining honest eval sets, which is why the first deliverable is the eval harness alone, with the loop gated on it proving stable.
Suggested Next Step
Pilot on a single high-traffic skill (prompts/triage.md). Deliverable 1: a versioned held-out eval set + scorer wired into CI as a non-blocking report (no loop yet). Deliverable 2: a manual "propose → validate → PR" dry run by a human operator to prove the gate rejects regressions. Only then automate the proposer. Gate the rollout layer on Safe Release Strategy Phase 2 (#501) and shadow-mode dual-run (Idea 566).
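For the dry run, the gate decision itself could be as small as the sketch below. It assumes the operator already has a baseline score, a candidate score, and the list of paths changed by the candidate; `GateResult`, `gate`, and the idea of comparing two pass rates are hypothetical names and choices for this example, not part of any existing tooling. The only path taken from the proposal is the skills/evals/ root.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath

# Eval root named in the proposal; everything below it is treated as
# read-only for the proposer (assumed enforcement point).
EVAL_ROOT = PurePosixPath("skills/evals")


@dataclass(frozen=True)
class GateResult:
    accepted: bool
    reason: str


def touches_eval_set(changed_paths: list[str]) -> bool:
    """True if any repo-relative changed path lives under the eval root."""
    return any(PurePosixPath(p).is_relative_to(EVAL_ROOT) for p in changed_paths)


def gate(baseline: float, candidate: float, changed_paths: list[str]) -> GateResult:
    """Accept only strict improvements that leave the eval set untouched."""
    if touches_eval_set(changed_paths):
        return GateResult(False, "candidate edits the held-out eval set")
    if candidate <= baseline:
        # Ties are rejected along with regressions.
        return GateResult(False, "no strict improvement over baseline")
    return GateResult(True, "strict improvement on held-out set")


if __name__ == "__main__":
    print(gate(0.80, 0.90, ["prompts/triage.md"]))
    print(gate(0.80, 0.80, ["prompts/triage.md"]))
    print(gate(0.80, 0.95, ["prompts/triage.md", "skills/evals/triage/cases.jsonl"]))
```

Feeding the gate a deliberately worse candidate and a candidate that edits the eval set is one way to demonstrate, before any automation exists, that both rejection paths actually fire.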