Crucible State Machine — Conformance Harness #7

joshua-temple (Maintainer) · 2026-05-28T23:00:42Z

See the Kernel Core discussion for Fire and Effect, and the JSON, Mermaid & DOT discussion for the Scenario/Trace formats this harness leans on.

The conformance harness answers one question: does a machine implementation behave correctly? It rests on three pillars:

Oracle comparison — compare the effects a machine's Fire produces against a trusted reference implementation for the same input.
Golden scenarios — replay committed event sequences and assert final state, effects, and trace.
Round-trip identity — prove a machine built in code and the same machine loaded from JSON (then bound via Provide) behave identically. This pillar exists because the config/implementation split makes the JSON IR a co-equal authoring format; round-trip identity is what makes that promise enforceable.

The first pillar matters most when a machine is meant to be the canonical model of behavior that some other code path also implements. For example, when a state machine is introduced alongside an existing hand-written handler, you want to prove the two agree before you let the machine take over. In that setting the harness is the drift detector.

The shape of the problem

You have two implementations of the same behavior:

The machine — m.Cast(state).Fire(ctx, event) returns effects directly (no IO).
A reference — some existing code path that, given the same input, produces the same effects (typically by publishing messages or writing to a store).

Conformance proves: for the same starting entity and the same event, both sides emit equivalent effects.

flowchart TD
    Entity[precondition entity] --> Snap[snapshot entity]
    Snap --> Ref[run reference implementation<br/>capture effects via intercepting sink]
    Ref --> Reset[reset entity to snapshot]
    Reset --> Fire[m.Cast state then Fire ctx event<br/>capture result.Effects]
    Fire --> Diff[DiffEffects with tolerances]
    Diff --> Verdict{equal?}
    Verdict -->|yes| Pass[MATCH]
    Verdict -->|no| Fail[MISMATCH: side-by-side field diff]

The snapshot-and-reset on a single entity instance is deliberate: running both halves against the same entity avoids cosmetic timestamp/ID drift that two freshly-built fixtures would introduce, so tolerances stay as tight as possible.
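The flow in the diagram can be sketched in a few lines of Go. This is a minimal illustration under stated assumptions, not the harness itself: `Entity`, `Effect`, `half`, `conform`, and `approve` are hypothetical names, and `reflect.DeepEqual` stands in for the tolerance-aware DiffEffects.

```go
package main

import (
	"fmt"
	"reflect"
)

// Entity and Effect are hypothetical stand-ins, not Crucible types.
type Entity struct {
	ID     string
	Status string
}

type Effect struct {
	Type  string
	DocID string
}

// half is one side of the comparison: it may mutate the entity and
// returns the effects it produced for the event.
type half func(e *Entity, event string) []Effect

// conform runs both halves against the same entity instance, restoring
// the snapshot in between so the second half sees the same precondition.
func conform(e *Entity, event string, reference, machine half) bool {
	snapshot := *e // value copy; sufficient here because Entity holds no pointers
	want := reference(e, event)
	*e = snapshot // reset entity to snapshot
	got := machine(e, event)
	return reflect.DeepEqual(want, got) // exact comparison, no tolerances
}

// approve plays both roles in this demo; it mutates the entity it is given.
func approve(e *Entity, event string) []Effect {
	if event != "Approve" || e.Status != "PENDING" {
		return nil
	}
	e.Status = "APPROVED"
	return []Effect{{Type: "ApprovedEvent", DocID: e.ID}}
}

func main() {
	e := &Entity{ID: "doc-1", Status: "PENDING"}
	fmt.Println("match:", conform(e, "Approve", approve, approve))
}
```

Because `approve` mutates the entity, dropping the reset line would make the second run see an already-approved entity and emit nothing, which is the kind of false mismatch the reset step is there to prevent.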

Capturing the reference's effects

The reference implementation reaches the outside world through some seam — a publisher, a writer, an RPC client. Swap an intercepting sink into that seam for the duration of the test: it records every effect and performs no real IO, while the reference code proceeds as if the call succeeded. The machine half needs no sink — Fire returns its effects directly.
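As a rough sketch of such a sink, assuming the seam is a single-method publisher interface: `Publisher`, `recordingSink`, `ApprovedEvent`, and `legacyApprove` are hypothetical names chosen for illustration, and a real seam may have a different shape.

```go
package main

import "fmt"

// Publisher is a hypothetical seam; a real one might be a writer or an RPC client.
type Publisher interface {
	Publish(msg any) error
}

// recordingSink satisfies Publisher, records every message, and performs no IO.
type recordingSink struct {
	captured []any
}

func (s *recordingSink) Publish(msg any) error {
	s.captured = append(s.captured, msg)
	return nil // report success so the reference proceeds normally
}

// ApprovedEvent is an illustrative effect type.
type ApprovedEvent struct{ DocID string }

// legacyApprove stands in for the hand-written reference handler.
func legacyApprove(p Publisher, docID string) error {
	return p.Publish(ApprovedEvent{DocID: docID})
}

func main() {
	sink := &recordingSink{}
	if err := legacyApprove(sink, "doc-1"); err != nil {
		panic(err)
	}
	fmt.Printf("%d effect(s) captured: %+v\n", len(sink.captured), sink.captured[0])
}
```

The recorded slice is what gets handed to the diff as the reference side.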

Equivalence & the diff

DiffEffects compares two ordered slices of captured effects positionally: effect i against effect i. For each pair:

Type check — same message type. A type mismatch is fatal; no field diff is attempted.
Field walk — descend recursively. Scalars compare exactly; timestamps and generated IDs honor tolerances (below); repeated fields compare positionally; map fields compare by key.
Length mismatch — extra entries on either side surface as missing / extra mismatches.
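The three rules above can be sketched roughly as follows. This is a simplified illustration, not the real DiffEffects: the lowercase `diffEffects`, the flat `Effect` shape, and the report format are assumptions, fields are not walked recursively, and no tolerances are applied.

```go
package main

import (
	"fmt"
	"sort"
)

// Effect is a hypothetical flattened effect: a message type plus scalar fields.
type Effect struct {
	Type   string
	Fields map[string]string
}

// diffEffects pairs effects by position and returns one line per mismatch.
func diffEffects(ref, got []Effect) []string {
	var out []string
	n := len(ref)
	if len(got) < n {
		n = len(got)
	}
	for i := 0; i < n; i++ {
		if ref[i].Type != got[i].Type {
			// Type mismatch is fatal for this pair: no field diff is attempted.
			out = append(out, fmt.Sprintf("[%d] type: %s vs %s", i, ref[i].Type, got[i].Type))
			continue
		}
		out = append(out, diffFields(i, ref[i].Fields, got[i].Fields)...)
	}
	// Leftover entries on either side surface as missing / extra.
	for i := n; i < len(ref); i++ {
		out = append(out, fmt.Sprintf("[%d] missing: %s", i, ref[i].Type))
	}
	for i := n; i < len(got); i++ {
		out = append(out, fmt.Sprintf("[%d] extra: %s", i, got[i].Type))
	}
	return out
}

// diffFields compares two field maps by key, visiting keys in sorted
// order so the report is deterministic.
func diffFields(i int, ref, got map[string]string) []string {
	seen := map[string]bool{}
	for k := range ref {
		seen[k] = true
	}
	for k := range got {
		seen[k] = true
	}
	keys := make([]string, 0, len(seen))
	for k := range seen {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	var out []string
	for _, k := range keys {
		r, rok := ref[k]
		g, gok := got[k]
		if rok != gok || r != g {
			out = append(out, fmt.Sprintf("[%d] %s: %q vs %q", i, k, r, g))
		}
	}
	return out
}

func main() {
	ref := []Effect{{Type: "ApprovedEvent", Fields: map[string]string{"doc_id": "7a3f", "status": "APPROVED"}}}
	got := []Effect{
		{Type: "ApprovedEvent", Fields: map[string]string{"doc_id": "3c2d", "status": "APPROVED"}},
		{Type: "AuditEvent"},
	}
	for _, line := range diffEffects(ref, got) {
		fmt.Println(line)
	}
}
```

In this sketch "missing" means the reference emitted an effect the machine did not, and "extra" means the reverse; that naming is a choice made here, not something the harness is known to use.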

On mismatch the harness prints a side-by-side, field-level diff so the reviewer can see exactly which field diverged and on which side — and whether a field that passed did so only because it was within tolerance:

[FAIL] Approve conformance: effects diverged

  Reference emitted:                    Machine emitted:
  ──────────────────────                ──────────────────────
  ApprovedEvent {                       ApprovedEvent {
    doc_id: "7a3f..."                     doc_id: "3c2d..."   <-- MISMATCH: doc.doc_id
    status: APPROVED                      status: APPROVED
    approved_at: ...01.020Z               approved_at: ...01.024Z   (within +/-100ms tolerance)
  }                                     }

Tolerances

Some fields can differ legitimately. Tolerances are configurable, with sensible defaults:

| Field type | Default | Rationale |
| --- | --- | --- |
| Timestamp | +/-100 ms | The two halves capture "now" at slightly different moments. |
| Generated UUID | exact match | If both halves are wired to the same entity, IDs must match; a mismatch is a wiring bug. Opt into "any valid UUID" per-field only where the reference generates an ID the machine can't predict. |
| float | +/-1e-6 | Round-trip noise. |
| string / bool / enum / bytes | exact | Deterministic by construction. |

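import "time"

// Tolerances configures how DiffEffects treats fields that may differ legitimately.
type Tolerances struct {
    TimestampSkew time.Duration // max allowed gap between timestamps (default 100 ms)
    FloatEpsilon  float64       // max allowed float difference (default 1e-6)
    AnyUUID       []string      // field paths that accept any valid UUID
    IgnoreFields  []string      // field paths to skip entirely
}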

AnyUUID and IgnoreFields are escape hatches — every entry is a coverage hole, so use them sparingly.
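To make the defaults concrete, here is one way the per-field checks could look. It is a sketch under assumptions: the method names, `defaults`, `sampleNow`, and the UUID regular expression are hypothetical, and treating the bounds as inclusive and the float epsilon as an absolute difference are choices made here, not documented behavior.

```go
package main

import (
	"fmt"
	"math"
	"regexp"
	"time"
)

// Tolerances repeats the struct from the text so the sketch is self-contained.
type Tolerances struct {
	TimestampSkew time.Duration
	FloatEpsilon  float64
	AnyUUID       []string
	IgnoreFields  []string
}

// defaults returns the default values listed in the table.
func defaults() Tolerances {
	return Tolerances{TimestampSkew: 100 * time.Millisecond, FloatEpsilon: 1e-6}
}

// timestampsMatch accepts two instants whose gap is at most TimestampSkew.
func (t Tolerances) timestampsMatch(a, b time.Time) bool {
	d := a.Sub(b)
	if d < 0 {
		d = -d
	}
	return d <= t.TimestampSkew
}

// floatsMatch uses an absolute epsilon (an assumption of this sketch).
func (t Tolerances) floatsMatch(a, b float64) bool {
	return math.Abs(a-b) <= t.FloatEpsilon
}

// uuidRE accepts the canonical 8-4-4-4-12 hexadecimal layout.
var uuidRE = regexp.MustCompile(`^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$`)

// uuidsMatch requires exact equality unless path is listed in AnyUUID,
// in which case both values only need to be well-formed UUIDs.
func (t Tolerances) uuidsMatch(path, a, b string) bool {
	for _, p := range t.AnyUUID {
		if p == path {
			return uuidRE.MatchString(a) && uuidRE.MatchString(b)
		}
	}
	return a == b
}

// sampleNow is an arbitrary fixed instant used by the demo below.
var sampleNow = time.Date(2026, 5, 28, 23, 0, 1, 0, time.UTC)

func main() {
	tol := defaults()
	fmt.Println(tol.timestampsMatch(sampleNow, sampleNow.Add(4*time.Millisecond)))
	fmt.Println(tol.timestampsMatch(sampleNow, sampleNow.Add(150*time.Millisecond)))
	fmt.Println(tol.floatsMatch(1.0, 1.0000005))

	u1 := "7a3f0000-0000-4000-8000-000000000001"
	u2 := "3c2d0000-0000-4000-8000-000000000002"
	fmt.Println(tol.uuidsMatch("doc_id", u1, u2)) // exact match required by default
	tol.AnyUUID = []string{"doc_id"}
	fmt.Println(tol.uuidsMatch("doc_id", u1, u2)) // opted in: any valid UUID
}
```

Note how the opt-in is scoped to a single field path, which keeps the coverage hole as small as possible.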

Round-trip identity (pillar 3)

The config/implementation split promises that a machine authored in Go and a machine loaded from JSON are the same machine. Round-trip identity is the test that keeps that promise honest. It is a v1-core conformance check, run for every domain machine:

RoundTripIdentity(forged, registry, scenarios):
    1. data := forged.ToJSON()
    2. loaded := LoadFromJSON(data).Provide(registry).Quench()
    3. data2 := loaded.ToJSON()
    4. assert data == data2                         // IR is stable under round-trip
    5. for each scenario s:
           assert s.RunAgainst(forged) == s.RunAgainst(loaded)
           // same final state, same effects (by ref+params), same trace, same errors

Two checks, both required: structural (the IR serializes to a byte-stable form — step 4) and behavioral (every golden scenario produces an identical ScenarioResult against the code-built and the JSON-loaded machine — step 5). Because behavior is named-ref + params bound from the same registry, "identical" is exact, not approximate — no tolerances needed on this pillar. A divergence here means the IR is lossy or the registry binding drifted, and is a hard failure.
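The two checks can be illustrated with a toy machine. Everything below is an assumption made for the sketch: the `Machine` type is not the Crucible IR, its `ToJSON` and `LoadFromJSON` only borrow the names from the pseudocode, and the Provide and Quench steps are omitted because the toy has no behavior to bind.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"reflect"
)

// Machine is a toy IR: an initial state plus a state -> event -> next-state table.
type Machine struct {
	Initial     string                       `json:"initial"`
	Transitions map[string]map[string]string `json:"transitions"`
}

// ToJSON relies on encoding/json emitting map keys in sorted order,
// which is what makes the output byte-stable.
func (m Machine) ToJSON() ([]byte, error) { return json.Marshal(m) }

func LoadFromJSON(data []byte) (Machine, error) {
	var m Machine
	err := json.Unmarshal(data, &m)
	return m, err
}

// run replays events and returns the visited states as a trace.
func (m Machine) run(events []string) []string {
	state := m.Initial
	trace := []string{state}
	for _, ev := range events {
		next, ok := m.Transitions[state][ev]
		if !ok {
			break // unknown event: stop and return the trace so far
		}
		state = next
		trace = append(trace, state)
	}
	return trace
}

// roundTripIdentity performs the structural check, then the behavioral one.
func roundTripIdentity(forged Machine, scenarios [][]string) error {
	data, err := forged.ToJSON()
	if err != nil {
		return err
	}
	loaded, err := LoadFromJSON(data)
	if err != nil {
		return err
	}
	data2, err := loaded.ToJSON()
	if err != nil {
		return err
	}
	if !bytes.Equal(data, data2) {
		return fmt.Errorf("IR is not byte-stable under round-trip")
	}
	for i, s := range scenarios {
		if !reflect.DeepEqual(forged.run(s), loaded.run(s)) {
			return fmt.Errorf("scenario %d diverged", i)
		}
	}
	return nil
}

func main() {
	forged := Machine{
		Initial: "DRAFT",
		Transitions: map[string]map[string]string{
			"DRAFT":   {"Submit": "PENDING"},
			"PENDING": {"Approve": "APPROVED", "Reject": "DRAFT"},
		},
	}
	scenarios := [][]string{{"Submit", "Approve"}, {"Submit", "Reject", "Submit"}}
	fmt.Println("round-trip error:", roundTripIdentity(forged, scenarios))
}
```

Both comparisons are exact, mirroring the point above that this pillar needs no tolerances.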

Golden scenarios (pillar 2)

Beyond pairwise oracle comparison, the harness replays golden scenarios (the Scenario JSON format from the JSON, Mermaid & DOT discussion). A scenario fires a known event sequence and asserts the final state, the set of emitted effects, the trace length, and that no errors occurred. Golden scenarios are committed fixtures; a change to the machine that breaks one is a visible diff in CI. They also feed pillar 3 — the same scenarios run against both the code-built and JSON-loaded machine.
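A golden-scenario replay might look roughly like this. The fixture schema is hypothetical (the real Scenario JSON format lives in the JSON, Mermaid & DOT discussion), `fire` is a toy transition function rather than the kernel's Fire, and effects are compared in order here as a simplification.

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
)

// Scenario is a hypothetical fixture shape, not the real Scenario format.
type Scenario struct {
	Start        string   `json:"start"`
	Events       []string `json:"events"`
	WantState    string   `json:"want_state"`
	WantEffects  []string `json:"want_effects"`
	WantTraceLen int      `json:"want_trace_len"`
}

// fire is a toy transition function returning the next state and an effect name.
func fire(state, event string) (string, string, error) {
	switch {
	case state == "DRAFT" && event == "Submit":
		return "PENDING", "SubmittedEvent", nil
	case state == "PENDING" && event == "Approve":
		return "APPROVED", "ApprovedEvent", nil
	}
	return state, "", fmt.Errorf("no transition from %s on %s", state, event)
}

// check replays the scenario and returns the first violated assertion, if any.
func check(s Scenario) error {
	state := s.Start
	var effects, trace []string
	for _, ev := range s.Events {
		next, eff, err := fire(state, ev)
		if err != nil {
			return err // golden scenarios assert that no errors occur
		}
		trace = append(trace, state+" -> "+next)
		effects = append(effects, eff)
		state = next
	}
	if state != s.WantState {
		return fmt.Errorf("final state: got %s, want %s", state, s.WantState)
	}
	if !reflect.DeepEqual(effects, s.WantEffects) {
		return fmt.Errorf("effects: got %v, want %v", effects, s.WantEffects)
	}
	if len(trace) != s.WantTraceLen {
		return fmt.Errorf("trace length: got %d, want %d", len(trace), s.WantTraceLen)
	}
	return nil
}

// fixture plays the role of a committed golden file.
const fixture = `{
  "start": "DRAFT",
  "events": ["Submit", "Approve"],
  "want_state": "APPROVED",
  "want_effects": ["SubmittedEvent", "ApprovedEvent"],
  "want_trace_len": 2
}`

func main() {
	var s Scenario
	if err := json.Unmarshal([]byte(fixture), &s); err != nil {
		panic(err)
	}
	fmt.Println("scenario error:", check(s))
}
```

Since the fixture is plain data, a behavior change in the machine shows up as a failing assertion against a file that did not change, which is what makes the break visible in review.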

Runs as a normal test

Conformance tests run inside the standard test command (go test ./...) — no separate pipeline, no new CI step. A conformance failure is an ordinary test failure that blocks merge. When a reference implementation eventually retires (the machine becomes the sole source of truth), its conformance test is deleted in the same change. The mapping is 1:1, so there is no orphan risk.

Why this is the right place to invest

The harness is what lets a machine be introduced safely next to existing behavior: you get a continuously-verified proof that the declarative model and the imperative code agree, field for field, before you cut over. It is also the foundation the Phase 2 Visualizer builds on — the same scenario-runs-against-a-machine path powers a stakeholder scenario builder.

Crucible State Machine series: Overview & Roadmap · Kernel Core · HSM · Path Planning · JSON / Mermaid / DOT · Evolution Guide · Conformance · Phase 2
