Harden skill-availability check: require two exact-form failures before noop #133

Merged
theletterf merged 1 commit into main from harden-skill-availability-check on May 5, 2026

@theletterf
Member

Summary

The openings sweep noop'd on its first run after the master+subs refactor on elastic/docs-content even though docs-page-opening-optimizer was available. The cause was a stochastic agent retry pattern interacting with an over-aggressive abort rule:

11:33:22  ✗ skill(docs-page-opening-optimizer)         Skill not found
11:33:27  ● noop  "docs-page-opening-optimizer skill unavailable..."
11:33:33  ✓ skill(skill: docs-page-opening-optimizer)  succeeded

The agent first invoked the skill with a shortened form (no skill: prefix), got "Skill not found", and honored the prompt's "abort with noop on skill failure" rule. Six seconds later it retried with the correct exact form skill(skill: docs-page-opening-optimizer), which worked. By then the noop was already committed, so no fix-issue ever landed.

Frontmatter, applies-to, and style each produced their fix-issues in the same orchestrator run. Applies-to and style happened to land on the exact form on the first try; the frontmatter prompt explicitly tolerates partial skill failures. Openings shares the strict abort rule with applies-to and style and was simply the one that picked the wrong form first.

Fix

Lift the docs-review hardening from PRs #117 / #118: only treat a skill as unavailable after a confirmed exact-form failure. Each of the three strict-abort sweeps now spells out the procedure:

  1. Invoke with the exact form skill(skill: docs-X).
  2. On "Skill not found", retry once with the same exact form.
  3. Only after a second exact-form failure, call noop.

A single first-attempt failure (especially with a non-exact form) is explicitly not sufficient evidence to noop.
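
The change itself is prompt wording, not code, but the decision rule in the three steps above can be sketched in Python. This is a minimal illustration only: `invoke` is a hypothetical stand-in for the agent's tool call, and `decide` and `exact_form` are names invented here, not part of any workflow file.

```python
from typing import Callable

NOT_FOUND = "Skill not found"


def exact_form(skill_name: str) -> str:
    """Build the exact invocation string, including the `skill:` prefix."""
    return f"skill(skill: {skill_name})"


def decide(skill_name: str, invoke: Callable[[str], str], max_attempts: int = 2) -> str:
    """Return "proceed" if any exact-form attempt succeeds, else "noop".

    `invoke` is hypothetical: it takes the invocation string and returns
    the tool's response text.
    """
    call = exact_form(skill_name)
    for _ in range(max_attempts):
        # Every attempt uses the same exact form; a shortened form is never tried.
        if NOT_FOUND not in invoke(call):
            return "proceed"
    # Only reached after max_attempts exact-form failures.
    return "noop"
```

With `max_attempts=2`, a single "Skill not found" followed by a success yields "proceed"; only two consecutive exact-form failures yield "noop".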

Applied to:

  • gh-aw-docs-openings-sweep.md (the sweep that actually got bitten)
  • gh-aw-docs-applies-to-sweep.md (same abort wording, would hit next time the agent picks the wrong form)
  • gh-aw-docs-style-sweep.md (same)

Sweeps left unchanged because their failure handling is already softer or uses a different mechanism:

  • frontmatter — explicitly tolerates partial failures ("merge what works, note in Notes")
  • coherence — uses a >50%-MCP-failure threshold instead of single-call abort
  • staleness — deterministic pre-step + MCP for one optional category; no skill import
  • typos — no skill import at all

Test plan

  • Merge + cut release that moves v1.
  • Re-run docs-openings-sweep directly via gh workflow run docs-openings-sweep.yml from elastic/docs-content. Confirm a docs-fix:openings issue is opened (or at minimum that the agent log shows the second exact-form retry happening before any noop).
  • Re-run docs-applies-to-sweep and docs-style-sweep similarly. Confirm no regression: both produced issues in the previous run and should keep doing so now.
  • Spot-check the agent log for one of those runs — the new prompt language should be visible at the top of the prompt-extract section.
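
The log check in the test plan could be partly automated. The sketch below assumes log lines shaped like the excerpt in the Summary (a timestamp, then a ✗, ● or ✓ marker, then the call); that shape and the function name `noop_is_justified` are assumptions for illustration, not a documented log format.

```python
import re
from typing import Iterable

# Exact form: skill(skill: <name>). The shortened skill(<name>) does not match.
EXACT_CALL = re.compile(r"skill\(skill: [\w-]+\)")
NOOP_LINE = re.compile(r"●\s+noop\b")


def noop_is_justified(log_lines: Iterable[str], required_failures: int = 2) -> bool:
    """Return False if any noop appears before enough exact-form failures."""
    exact_failures = 0
    for line in log_lines:
        if NOOP_LINE.search(line):
            if exact_failures < required_failures:
                return False
        elif "✗" in line and EXACT_CALL.search(line):
            exact_failures += 1
    return True


first_run = [
    "11:33:22  ✗ skill(docs-page-opening-optimizer)         Skill not found",
    '11:33:27  ● noop  "docs-page-opening-optimizer skill unavailable..."',
    "11:33:33  ✓ skill(skill: docs-page-opening-optimizer)  succeeded",
]
print(noop_is_justified(first_run))
```

A check like this is only as reliable as the log rendering: if the log shows a different call form than the one the agent actually sent, the count of exact-form failures will be wrong.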

🤖 Generated with Claude Code

…re noop

The openings sweep noop'd on its first real run on
elastic/docs-content despite the skill being available, because
the agent stochastically invoked the skill with a non-exact form
the first time:

  11:33:22  ✗ skill(docs-page-opening-optimizer)        Skill not found
  11:33:27  ● noop  "docs-page-opening-optimizer skill unavailable..."
  11:33:33  ✓ skill(skill: docs-page-opening-optimizer)        succeeded

The prompt's "abort with noop on skill failure" rule fired on the
first non-exact attempt, before the agent retried with the correct
`skill(skill: …)` form and got results. By the time the exact
invocation succeeded, the noop was already committed.

Frontmatter, applies-to, and style produced fix-issues fine in the
same orchestrator run. The difference: the frontmatter sweep
prompt explicitly tolerates partial skill failures ("if one fails,
report only the other"); applies-to and style happened to land on
the exact form first try. Openings was the unlucky one.

Lift the docs-review pattern (PRs #117/#118) where skill
availability is only declared after a confirmed exact-form
failure. Apply to the three sweeps that had the strict
abort-on-first-failure rule:

- openings (was hit by this in real run 25373751367)
- applies-to (would hit it next time the agent picks the wrong form)
- style (same)

Each now spells out the procedure: try the exact form, retry
once on failure, only noop after the second exact-form failure.
A single non-exact failure is not sufficient evidence.

The frontmatter prompt already had a softer fallback ("merge what
works, note any failure in Notes"); coherence and staleness use
different mechanisms (a >50% MCP failure threshold and a full-repo
deterministic pre-step, respectively); typos doesn't import any skill.
None of those need this change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@theletterf theletterf requested a review from a team as a code owner May 5, 2026 12:26
@theletterf theletterf requested a review from cotti May 5, 2026 12:26
@theletterf theletterf self-assigned this May 5, 2026
@theletterf theletterf added the fix label May 5, 2026
@theletterf theletterf merged commit de00cb2 into main May 5, 2026
3 of 4 checks passed
@theletterf theletterf deleted the harden-skill-availability-check branch May 5, 2026 12:28
theletterf added a commit that referenced this pull request May 5, 2026
Reverses the strict abort rule from #133 in the openings, applies-to,
and style sweeps. Real-run evidence (elastic/docs-content runs
25377518194 + 25378183549, both openings) showed the hardened
"two failures = noop" rule firing prematurely:

- 13:09:34  ✗ skill(docs-page-opening-optimizer) Skill not found
- 13:09:37  ✗ skill(docs-page-opening-optimizer) Skill not found
- 13:09:41  ● noop "skill unavailable — confirmed after exact-form attempts"
- 13:14:36  ● create_issue "shard 20/28 — 20 pages"   ← agent eventually produced
                                                        an issue, but suppressed
                                                        because noop already fired

The agent's tool serialization keeps logging the skill call as
`skill(docs-X)` — without the `skill:` prefix — which always
returns "Skill not found" from Copilot CLI. Whether the agent's
underlying invocation was actually reformatted, or whether it's a
log-rendering quirk, doesn't matter: my hardening committed the
agent to noop after two of these "failures", before it had a chance
to do useful work.

Replace with frontmatter's softer pattern: try the skill once; if
"Skill not found" comes back, fall back to manual analysis using
bash + the agent's own judgment, and note the skill failure once
in the issue body's Notes section. Only noop if even manual
analysis produces no high-confidence findings.
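
As a rough Python sketch of that softer pattern (the real change is prompt wording): `invoke_skill` and `manual_analysis` are hypothetical callables, and the assumption that a missing skill is signalled by `None` is made up for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class SweepResult:
    action: str  # "create_issue" or "noop"
    findings: List[str] = field(default_factory=list)
    notes: List[str] = field(default_factory=list)


def run_sweep(
    skill_name: str,
    invoke_skill: Callable[[str], Optional[List[str]]],
    manual_analysis: Callable[[], List[str]],
) -> SweepResult:
    """Try the skill once; on "Skill not found" fall back to manual analysis.

    `invoke_skill` (hypothetical) returns findings, or None when the skill
    is not found. `manual_analysis` (hypothetical) returns findings from
    bash plus the agent's own judgment.
    """
    notes: List[str] = []
    findings = invoke_skill(f"skill(skill: {skill_name})")
    if findings is None:
        # Record the failure once; it goes in the issue body's Notes section.
        notes.append(f"{skill_name} skill unavailable; used manual analysis")
        findings = manual_analysis()
    if not findings:
        # Noop only when there is nothing worth reporting at all.
        return SweepResult("noop", [], notes)
    return SweepResult("create_issue", findings, notes)
```

The skill failure no longer decides the outcome on its own: it only changes where the findings come from and adds one note.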

Applied uniformly to:
- gh-aw-docs-openings-sweep.md (the sweep that's been noop'ing)
- gh-aw-docs-applies-to-sweep.md
- gh-aw-docs-style-sweep.md

Frontmatter, coherence, staleness, typos already had softer or
deterministic-only fallback paths and don't need this change.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>