Replies: 2 comments
Enhancement: Operational Review → Discussions (Prior Art, Options & Durability Strategy)
Sharpened problem & goal
Context & existing infrastructure
Prior art / competitive signal
Key insight from SRE tooling: teams that automate the documentation layer (Rootly, incident.io) free engineers to focus on analysis. This proposal does exactly that by automating the capture of review findings into a durable Discussion.
Options table
Recommendation: Option B (a rolling Discussion per skill) best fits the existing de-dup-by-label pattern already used by
Risks / unknowns
Phased pilot path
Success metrics
Sharpened problem & goal
The proposal is to route pipeline-review improvement findings into durable GitHub Discussions rather than leaving them as transient action items embedded in the operational review body. Worth clarifying before implementing: should these Discussions use the existing Ideas category (so they flow through idea-triage → idea-enhancer → initiative-planner automatically), or does a dedicated "Pipeline Improvements" category better prevent signal pollution and keep the Ideas queue focused on product-facing features?
Context
The org already has a full idea-lifecycle pipeline (
One important constraint:
Posting a Discussion is already a thin
Impact / Effort
Suggested acceptance criteria
Operational review of the self-improving-skills pipeline (epic #581, Discussion #572), 14-day window ending 2026-06-26. Report-only — this informs, it does not gate any merge.
TL;DR
The pipeline is operationally healthy with no open regression or scorer error. The latest triage run passed, all trackers are recovered/closed, and the #920 throttle-as-regression gap is now fixed in code. The remaining findings are coverage/hygiene gaps, not skill defects. This Discussion also proposes a small process change: raise pipeline-review improvements here as Discussions, rather than only as transient action_items in the review JSON.
Health summary (last 14 days)
Skill Eval Report; all workflow-level success (the workflow is non-blocking by design).
eval-health issues are closed; no eval-infra issues exist.
triage regression trackers were raised and all recovered:
got pattern null null, 25s run
Only #814 was a genuine behavioral regression; #762 and #911 were throttle runs mis-scored as regressions (all-null got on implausibly short runs; #911's run was 25s vs ~60–90s healthy).
Positive: the #920 throttle-misclassification gap is fixed in code. run-eval.sh now captures the engine exit code and exits 2 → outcome=error on all-infra runs, and notify-eval-health.sh routes that to a separate eval-infra tracker instead of a false eval-health regression.
Improvements identified
deep-review on the schedule (or document it as dispatch-only): skill-eval-report.yml hardcodes SKILL=triage on cron, so its holdout is never scored on a schedule.
lsp-pilot's definition gap: prompts/lsp-pilot.md and no scheduled run; wire it + add a prompt, or mark it experimental.
example-skill exclusion: evals/README fixture (no prompt, no refs); confirm it's intentionally not scored so it isn't mistaken for an unmonitored skill.
Minor notes (no action): #762 was closed without a recovery comment (manual closure); the strict-improvement gate (gate.sh/review-skill.sh) is present in scripts/evals/ but not yet wired to any workflow.
Proposal: raise pipeline-review improvements as Discussions
Today, improvements surface in two ways: routine eval regressions auto-raise eval-health/eval-infra tracker issues, and the operational review emits action_items in its JSON (each tagged suggested_issue_labels: ["dev-lead"]) for a human to file. The review JSON is transient and not easily browsable over time.
Proposed: each operational-review window posts (or updates) a Discussion like this one as the durable home for its findings and improvement ideas. Concrete skill_fix items still spawn dev-lead issues for execution; the Discussion is the running, browsable record and the place to debate process changes. This keeps epic #581's improvement loop visible without adding merge-blocking noise.
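To make the proposed split concrete, here is a minimal Python sketch of how a review's action_items might be routed. It is not the pipeline's actual code. The names action_items, suggested_issue_labels, skill_fix and dev-lead come from this post; the kind and title fields and the function name are assumptions made for illustration, since the review JSON schema is not shown here.

```python
def route_action_items(review):
    """Split a review's action_items into issues to file and Discussion notes.

    Assumed (hypothetical) item shape:
      {"kind": "skill_fix", "title": "...", "suggested_issue_labels": ["dev-lead"]}
    """
    issues = []   # concrete fixes that still spawn a labelled issue
    notes = []    # every finding, recorded in the durable Discussion
    for item in review.get("action_items", []):
        title = item["title"]  # hypothetical field name
        # Every finding lands in the Discussion, the browsable record.
        notes.append(title)
        # Only concrete skill fixes also become issues for execution.
        if item.get("kind") == "skill_fix":
            labels = list(item.get("suggested_issue_labels", ["dev-lead"]))
            issues.append({"title": title, "labels": labels})
    return issues, notes


if __name__ == "__main__":
    sample = {
        "action_items": [
            {"kind": "skill_fix", "title": "Fix triage holdout case",
             "suggested_issue_labels": ["dev-lead"]},
            {"kind": "process", "title": "Schedule deep-review evals"},
        ]
    }
    issues, notes = route_action_items(sample)
    print(issues)
    print(notes)
```

The function is pure, with no GitHub calls, so the routing rule can be reviewed separately from whatever posts the Discussion or files the issues.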