docs: add unit testing guidance to swamp-extension-model skill by stack72 · Pull Request #926 · systeminit/swamp

stack72 · 2026-03-30T16:15:31Z

Summary

- Add "Unit Testing" section to the swamp-extension-model skill with a concise createModelTestContext example
- Add references/testing.md with the full testing guide (options, inspection helpers, CRUD lifecycle, injectable client pattern)
- Add reference link in the skill's References table

This follows the skill-creator progressive disclosure pattern: SKILL.md has the essential example and links to the reference file for details.
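
The "injectable client pattern" named in the summary is not shown in this PR, so the following is only a minimal sketch of the general idea, not the content of references/testing.md. Every name in it (`BucketClient`, `createResource`, `FakeBucketClient`) is hypothetical, and it deliberately avoids @systeminit/swamp-testing, whose API is not reproduced here.

```typescript
// Hypothetical interface: the smallest client surface the model logic needs.
interface BucketClient {
  createBucket(name: string): Promise<{ id: string }>;
  deleteBucket(id: string): Promise<void>;
}

// The logic under test receives the client as a parameter instead of
// constructing a real one, so a test can pass in a fake.
async function createResource(
  client: BucketClient,
  name: string,
): Promise<{ id: string; name: string }> {
  if (name.trim() === "") {
    throw new Error("name must not be empty");
  }
  const { id } = await client.createBucket(name);
  return { id, name };
}

// In-memory fake that records each call for later inspection.
class FakeBucketClient implements BucketClient {
  readonly calls: string[] = [];

  async createBucket(name: string): Promise<{ id: string }> {
    this.calls.push(`create:${name}`);
    return { id: `fake-${name}` };
  }

  async deleteBucket(id: string): Promise<void> {
    this.calls.push(`delete:${id}`);
  }
}
```

A test would construct `FakeBucketClient`, call `createResource(fake, "logs")`, and then assert on both the returned value and `fake.calls`, with no network access involved.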

Depends on #925 (merged) which published @systeminit/swamp-testing to JSR.

Test plan

- Skill triggers correctly on "test extension" / "unit test model" queries
- Reference file is loaded when testing details are needed

🤖 Generated with Claude Code

Add testing guidance to the extension model skill now that @systeminit/swamp-testing is published on JSR.

- Add "Unit Testing" section to SKILL.md with concise example and progressive disclosure link to reference
- Add references/testing.md with createModelTestContext options, inspection helpers, CRUD lifecycle testing, injectable client pattern, and extracted function pattern
- Add reference link in the References table

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Remove the standard Zod types table, since Claude already knows Zod's type system. Keep only the swamp-specific modifier (.meta({ sensitive: true })). Saves ~16 lines of context window.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- swamp-vault (86%→higher): Trim security best practices Claude already knows, condense interactive prompt explanation, narrow generic trigger terms (remove 'password', 'encrypt', etc.)
- swamp-issue (86%→higher): Trim Requirements/Troubleshooting sections for gh CLI (Claude knows this), add explicit numbered workflow
- swamp-report (89%→higher): Trim redundant "When to Create" section, add end-to-end workflow with validation checkpoints
- swamp-data (89%→higher): Move Data Concepts section (lifetime types, tags, version GC) to references/concepts.md for progressive disclosure

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- swamp-issue (90%→100%): Move formatting guidelines to references/formatting.md, remove troubleshooting for gh CLI
- swamp-data (90%→100%): Move JSON output shapes to references/output-shapes.md, trim search examples from 14 to 3, clarify platform context in description, remove generic trigger terms

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

With EVAL_RUNS=1, every query is all-or-nothing: a single run that misses because of LLM variance fails the query even when the trigger terms are correct. Bumping to 3 runs with the default 0.5 threshold means a query passes if it triggers in at least 2 of 3 runs, dramatically reducing flakiness. Doubled workers from 25 to 50 to keep wall-clock time similar despite tripling the total tasks.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
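
The effect of moving from 1 run to 3 can be sketched numerically. This is an illustration, not the eval harness itself: it assumes runs are independent, that each run triggers with the same probability `p`, and that the pass rule is "fraction of triggering runs >= threshold". The function names are hypothetical.

```typescript
// Assumed pass rule: a query passes when the fraction of runs that
// triggered the skill is at least the threshold.
function queryPasses(triggered: number, runs: number, threshold = 0.5): boolean {
  return triggered / runs >= threshold;
}

// Binomial coefficient C(n, k), computed iteratively.
function binomial(n: number, k: number): number {
  let result = 1;
  for (let i = 1; i <= k; i++) {
    result = (result * (n - i + 1)) / i;
  }
  return result;
}

// Probability that a query passes, assuming independent runs that each
// trigger with probability p (an assumption made for illustration).
function passProbability(p: number, runs: number, threshold = 0.5): number {
  let total = 0;
  for (let k = 0; k <= runs; k++) {
    if (queryPasses(k, runs, threshold)) {
      total += binomial(runs, k) * p ** k * (1 - p) ** (runs - k);
    }
  }
  return total;
}
```

Under these assumptions, a query whose skill triggers 90% of the time passes with probability 0.9 at one run, and with probability of roughly 0.972 at three runs (0.9³ + 3 · 0.9² · 0.1), which is the reduction in flakiness the commit message describes.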

github-actions

CI Security Review

The only workflow change in this PR is in .github/workflows/ci.yml (lines 163-164), modifying two hardcoded environment variable values:

EVAL_RUNS: "1" → "3"
EVAL_WORKERS: "25" → "50"

All other changed files are .claude/skills/ documentation (SKILL.md and reference markdown files) — no workflow or security impact.

Checklist Results

Prompt Injection: No changes to LLM prompts or tool scoping. N/A.
Expression Injection: No new expression interpolation introduced. N/A.
Dangerous Triggers: No trigger changes. N/A.
Supply Chain: No action additions or version changes. N/A.
Permissions: No permission changes. N/A.
Secret Exposure: No new secret usage. N/A.
Auto-merge & Trust Boundaries: No changes to merge logic. N/A.

Verdict

PASS — Security-neutral change. Only modifies eval concurrency parameters (runs and workers) with hardcoded integer values.

github-actions

Code Review

Blocking Issues

None.

Suggestions

The swamp-issue/SKILL.md refactoring removed troubleshooting tips (gh CLI auth, editor wait issues) that weren't moved to references/formatting.md. This is minor since it's generic gh CLI knowledge, but worth noting in case users relied on those hints. Consider adding a references/troubleshooting.md if the skill had frequent questions about these.
The CI eval config change (EVAL_RUNS 1→3, EVAL_WORKERS 25→50) is bundled with the docs PR — fine for now, but could be its own commit for clearer git history.

stack72 force-pushed the docs-extension-model-testing-skill branch from 73f5db4 to 9f87be0 on March 30, 2026 16:19

stack72 and others added 3 commits March 30, 2026 17:25

stack72 force-pushed the docs-extension-model-testing-skill branch from fa01cac to 4beaa5e on March 30, 2026 16:54

github-actions bot approved these changes Mar 30, 2026

stack72 merged commit af544d8 into main Mar 30, 2026
10 of 11 checks passed

stack72 deleted the docs-extension-model-testing-skill branch March 30, 2026 17:04

stack72 mentioned this pull request Mar 30, 2026

Skill trigger evals are systemically flaky in CI #929

Closed

stack72 merged 5 commits into main from docs-extension-model-testing-skill
