moonpixels · adamwhp · Apr 30, 2026 · Apr 30, 2026
diff --git a/package.json b/package.json
@@ -1,6 +1,6 @@
 {
     "name": "propulsion",
-    "version": "0.9.2",
+    "version": "0.10.0",
     "description": "Compact workflow skills for agentic coding in OpenCode.",
     "homepage": "https://github.com/moonpixels/propulsion#readme",
     "bugs": {

diff --git a/skills/tdd/SKILL.md b/skills/tdd/SKILL.md
@@ -6,13 +6,13 @@ description: Build observable behaviour with one failing test at a time through
 
 # TDD
 
-Prove observable behaviour with a failing test before writing production code. Use red-green-refactor to build confidence in the change and keep the code clean.
+Default to red-green-refactor when a valuable behavioural test exists. Do not fabricate brittle tests when work cannot be proven through a public interface or stable seam.
 
 ## Prerequisites
 
 ALL prerequisites MUST be true before following this skill.
 
-- The work includes a change to observable user-facing behaviour, a public contract, or durable business logic that can be proven through testing.
+- The work includes a change to observable user-facing behaviour, a public contract, or durable business logic.
 - The codebase has a test framework installed, and tests can be run locally.
 - The work is not solely for CI-only changes, linting, formatting, dependency maintenance, build or development script changes, repo hygiene, or internal refactors with no behaviour change.
 
@@ -23,41 +23,38 @@ If the work mixes behaviour change with tooling or maintenance updates, use `tdd
 Follow these steps IN ORDER. Do NOT skip steps.
 
 1. Choose the smallest thin vertical slice that delivers one observable behaviour end-to-end.
-2. Write one test for that single behaviour through a public interface or stable seam. Refer to [references/testing-patterns.md](references/testing-patterns.md) for guidance.
-3. Run the test and verify it fails for the expected reason.
-4. Write the smallest amount of production code to pass the test.
-5. Run the test again and verify it now passes.
-6. Repeat steps 1-5 for the next behaviour, building on the previous code, until the work is complete.
-7. Refactor only while all tests are green, and verify tests remain green after refactor. Refer to [references/refactor-candidates.md](references/refactor-candidates.md) for guidance.
-8. For bug fixes, write a regression test that reproduces the bug before fixing it, then verify the test passes after the fix.
+2. Apply the test-quality/applicability gate in [references/testing-patterns.md](references/testing-patterns.md) before writing or keeping a test.
+3. If a valuable behavioural test exists, write one failing test for that behaviour through a public interface or stable seam, verify it fails for the expected reason, implement the smallest passing code, then verify it passes.
+4. If no valuable behavioural test exists, document the no-test rationale, run the strongest appropriate fallback verification, implement the smallest change, then rerun fallback verification.
+5. Repeat for the next behaviour until complete; refactor only while tests or fallback checks are green. Refer to [references/refactor-candidates.md](references/refactor-candidates.md).
+6. For bug fixes, prefer a regression test that reproduces the bug; if none is valuable, document why and use the strongest fallback verification.
 
 ## Rules
 
 These rules are MANDATORY.
 
 - ONLY use `tdd` on observable user-visible behaviour or business logic changes.
-- NO PRODUCTION CODE BEFORE A FAILING TEST.
+- NO production code before a failing test WHEN a valuable behavioural test exists.
 - ALWAYS write ONE test at a time for ONE observable behaviour.
 - ENSURE the test initially fails for the EXPECTED reason before writing production code.
 - ONLY write the minimal amount of code to make the test pass.
 - ALWAYS use the public interface for testing, and test through stable seams if necessary.
-- NEVER write speculative tests or code for behaviour that is not yet required.
+- NEVER write speculative, brittle, implementation-detail, or private-structure tests.
+- ALWAYS document the no-test rationale and the fallback verification performed when no valuable behavioural test exists.
 - ALWAYS look for refactor opportunities AFTER the test is green.
 
 ## Completion Gate
 
 Do NOT leave this skill until ALL items are complete.
 
 - [ ] Work was implemented in thin vertical slices.
-- [ ] Each slice started with a failing test.
-- [ ] Each failing test was verified to fail for the expected reason.
-- [ ] Each slice was completed with passing tests.
-- [ ] Refactors only happened from green and remained green.
-- [ ] All tests for the work are now passing.
+- [ ] Each slice passed the test-quality/applicability gate.
+- [ ] Each testable slice started with a failing test that failed for the expected reason.
+- [ ] Untestable slices documented a no-test rationale and ran the strongest appropriate fallback verification.
+- [ ] Each slice was completed with passing tests or fallback checks.
+- [ ] Where possible, refactors were applied after the tests were green.
 
 ## References
 
-Use these references when you need detail.
-
 - [references/testing-patterns.md](references/testing-patterns.md) - Testing patterns for guidance on how to write effective tests.
 - [references/refactor-candidates.md](references/refactor-candidates.md) - Refactor candidates to identify good opportunities for refactor after the tests are green.
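
A minimal sketch of the "public interface, not private structure" rule from the `tdd` skill above, assuming a hypothetical `Cart` class (`Cart`, `add`, `total`, and `cartTotalsRepeatedItems` are illustrative names and are not part of propulsion). The check only observes what a caller can see, so it should keep passing if the internal storage is rewritten.

```typescript
// Hypothetical domain object, used only for illustration.
class Cart {
  // Private structure: free to change during a refactor.
  private lines: { sku: string; unitPrice: number; quantity: number }[] = [];

  add(sku: string, unitPrice: number, quantity: number = 1): void {
    this.lines.push({ sku, unitPrice, quantity });
  }

  // Public interface: the observable behaviour a caller relies on.
  total(): number {
    return this.lines.reduce(
      (sum, line) => sum + line.unitPrice * line.quantity,
      0,
    );
  }
}

// Behavioural check: drive the cart through its public methods and
// return the result a caller would see.
function cartTotalsRepeatedItems(): number {
  const cart = new Cart();
  cart.add("tea", 300, 2);
  cart.add("tea", 300);
  cart.add("mug", 500);
  return cart.total();
}

// Refactor-safe assertion: it depends only on the returned total.
if (cartTotalsRepeatedItems() !== 1400) {
  throw new Error("cart total should include repeated items");
}

// A brittle alternative would reach into the private field, for example
// `(cart as any).lines.length === 3`. That assertion would break if the
// cart later merged repeated SKUs into one line, even though the total
// a caller sees would be unchanged.
```

If `Cart` were later refactored to store one merged line per SKU, the behavioural check would still pass, while the private-structure assertion in the closing comment would fail without any behaviour change.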
diff --git a/skills/tdd/references/testing-patterns.md b/skills/tdd/references/testing-patterns.md
@@ -2,6 +2,8 @@
 
 Use this reference during red-green. Choose tests that prove behaviour, not today's implementation.
 
+**Refactor-safe tests are the standard:** good tests keep passing when internals are rewritten but observable behaviour stays the same. Test through public interfaces or stable seams with domain meaning, and avoid assertions about private helpers, call order, source shape, or other implementation details.
+
 ## Default Move
 
 Start with the highest-level public interface that proves the behaviour cheaply.
@@ -95,6 +97,8 @@ The seam should still represent behaviour another part of the system could reaso
 
 ## Anti-Patterns
 
+Reject tests that only inspect source strings, private structure, or implementation details, that rely on brittle snapshots, or that cover speculative behaviour. These are not acceptable substitutes for behavioural coverage; use fallback verification instead when no valuable behavioural test exists.
+
 ### Implementation-detail tests
 
 These tests fail when refactoring changes structure without changing behaviour.
@@ -119,6 +123,20 @@ test('sends audit event after saving', async () => {
 
 Prefer a result that matters to a caller, such as the user being created and an audit entry being visible through a supported query.
 
+### Source-string and private-structure tests
+
+Do not read source files as strings or inspect private modules, hidden fields, AST shape, CSS class names, hook order, folder layout, or helper presence to prove behaviour.
+
+- Bad: asserting a file contains `aria-label` or calls `useMemo()`.
+- Better: render the UI and query the accessible control, or verify the public API result.
+
+### Brittle snapshots
+
+Do not use broad snapshots for behaviour changes. Snapshots that mostly capture markup, class churn, generated IDs, timestamps, or component structure fail on harmless refactors.
+
+- Bad: snapshotting an entire page to prove a button opens a menu.
+- Better: interact as a user and assert the menu content is visible.
+
 ### Over-mocking
 
 If most of the test is mock setup, the test is probably proving that the mocks agree with each other.
@@ -195,6 +213,17 @@ Ask these questions before keeping a test:
 
 If any answer is "no" or "I am not sure", simplify the test before proceeding.
 
+If no valuable behavioural test remains, do not keep a weak test. Document why no new test was written and run the strongest appropriate fallback verification, such as an existing related test suite, typecheck, lint, build, CLI smoke check, manual reproduction, or browser check.
+
+## Frontend Guidance
+
+For UI behaviour changes, prefer user-level tests that render the UI, interact through accessible controls, and assert visible or announced outcomes.
+
+- Good: click "Save" and assert the success toast appears.
+- Bad: assert a component state setter was called or a specific class name exists.
+
+For visual-only changes or UI states that are hard to cover with valuable automated tests, prefer Playwright or browser verification when available. Capture the no-test rationale and the browser checks performed.
+
 ## Red-Green Heuristics
 
 When choosing the next test: