From the core SDD process perspective, this was delivered by #3001, so I am closing this out.
You've drawn a real distinction — converge does structural/textual fidelity, and what you're describing is behavioral execution. I shouldn't have implied #3001 fully subsumes it on its own, so let me sharpen the reasoning. The behavioral oracle you're after is already reachable today through the supported TDD path + converge, not converge alone:
Together that triad is a behavioral oracle for anyone who opts into testing. Stripped down, the genuinely new pieces Golden Demo adds on top are: (1) an auto-generated second reference implementation as a differential oracle, and (2) synthesizing and running these vectors even when the user hasn't opted into TDD. That second piece is exactly why this shouldn't be a core process step: SDD treats tests as optional by design (
The encouraging part: your design already fits the extension model perfectly; it's defined as
That framing really helps. I'd been thinking about TDD and converge as
You're right that the two things Golden Demo would actually add on top are
Makes complete sense to keep that out of core though. Extension hooks are
I'll dig into the EXTENSION-DEVELOPMENT-GUIDE and start putting something
Problem
In Spec-Driven Development, specs guide code generation, but there's
little automatic assurance that the implemented code behaves exactly
as the spec intended. Existing extensions (Architecture Guard,
Security Review, CDD enforcement tools) catch architectural rule
violations, security anti-patterns, and documentation drift — all
static checks on text/code structure. None of them execute the code
and compare it against a runnable reference.
This isn't a hypothetical gap — see #1686, where a user describes
exactly this pain: no tool currently distinguishes "issue in the spec"
from "issue in the implementation," and asks whether something like
SRE's "golden signals" could exist for spec-to-implementation fidelity.
Proposed solution — v1 scope (intentionally narrow)
A "Golden Demo" extension that, for pure functions with explicit
input/output examples in spec.md (no I/O, no side effects, no
randomness/time dependency — out of scope for v1):
- after_plan hook: generates a minimal, deterministic reference implementation + test vectors from the spec's acceptance criteria.
- after_implement hook: runs both the golden reference and the real implementation against the same test vectors, and produces a pass/fail drift report (no LLM-as-judge; deterministic execution only, to avoid noisy false positives).
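To make the after_implement step concrete, here is a minimal Python sketch of the comparison it describes. Everything in it is an assumption for illustration, not an existing spec-kit or extension API: the `Vector` and `DriftResult` shapes, the `run_drift_check` and `format_report` helpers, the verdict names, and the `clamp` example are all hypothetical. It only shows the idea of running a golden reference and the real implementation against the same spec-derived vectors and reporting pass/fail by plain equality.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable, Sequence


@dataclass(frozen=True)
class Vector:
    """One input/output example taken from spec.md (hypothetical shape)."""
    args: tuple
    expected: Any


@dataclass(frozen=True)
class DriftResult:
    """Outcome of running one vector through both implementations."""
    args: tuple
    expected: Any
    golden: Any
    actual: Any

    @property
    def verdict(self) -> str:
        # Assumed convention: if the golden reference disagrees with the
        # spec's own example, the reference (or the spec) is suspect.
        if self.golden != self.expected:
            return "golden-mismatch"
        # Otherwise any disagreement is attributed to the implementation.
        if self.actual != self.expected:
            return "impl-drift"
        return "pass"


def run_drift_check(
    golden: Callable[..., Any],
    impl: Callable[..., Any],
    vectors: Sequence[Vector],
) -> list[DriftResult]:
    """Run both pure functions on every vector; no LLM judging, only equality."""
    return [
        DriftResult(v.args, v.expected, golden(*v.args), impl(*v.args))
        for v in vectors
    ]


def format_report(results: Sequence[DriftResult]) -> str:
    """Render a plain-text pass/fail drift report."""
    lines = [
        f"{r.verdict}: args={r.args!r} expected={r.expected!r} "
        f"golden={r.golden!r} actual={r.actual!r}"
        for r in results
    ]
    passed = sum(r.verdict == "pass" for r in results)
    lines.append(f"{passed}/{len(results)} vectors passed")
    return "\n".join(lines)


# Hypothetical spec: clamp(x, lo, hi) limits x to the range [lo, hi].
def golden_clamp(x: int, lo: int, hi: int) -> int:
    return max(lo, min(x, hi))


def impl_clamp(x: int, lo: int, hi: int) -> int:
    # Deliberately buggy for the demo: the lower bound is never applied.
    return min(x, hi)


VECTORS = [
    Vector((5, 0, 10), 5),
    Vector((-3, 0, 10), 0),
    Vector((42, 0, 10), 10),
]

if __name__ == "__main__":
    print(format_report(run_drift_check(golden_clamp, impl_clamp, VECTORS)))
```

In this sketch the second vector is flagged as implementation drift because the golden reference matches the spec example while the real function does not. Keeping a separate verdict for the case where the golden reference itself disagrees with the spec is one possible way to tell "issue in the spec or reference" apart from "issue in the implementation", though the actual extension may draw that line differently.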
Scope is deliberately limited to pure functions for v1 to keep
generation reliable and avoid false-positive drift on legitimate
design changes. Side-effecting code, multi-service features, and
non-deterministic behavior are explicitly out of scope until the
core mechanism is validated.
How this differs from Architecture Guard / CDD
Those tools validate structure and documentation against rules.
This extension validates behavior against an executable reference —
complementary, not competing. Happy to coordinate scope with both
maintainers if there's interest in overlap.
@raccioly tagging you since this overlaps conceptually with your CDD (Canonical-Driven Development) work — curious if you'd see value in this as a separate extension or as an addition to yours.
Open questions for the community
- Is .spec-kit/golden/ a reasonable place to store reference artifacts, or should this live under .specify/?
- should know about before building?