From df7e29c01e0e07261a69b2b9b20650defb2a3259 Mon Sep 17 00:00:00 2001 From: Eliot Hedeman Date: Tue, 14 Apr 2026 22:21:47 -0400 Subject: [PATCH 1/8] docs: add RFC for JSONL streaming format Defines a line-oriented JSONL companion to the canonical Toolpath JSON format, optimized for incremental persistence of Path documents. The format supports live agent traces and multi-writer append logs while round-tripping losslessly with canonical JSON. Requires one additive schema change: an optional graph_ref field on PathIdentity so a streamed path can name the graph it belongs to. --- docs/RFC-jsonl.md | 562 ++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 562 insertions(+) create mode 100644 docs/RFC-jsonl.md diff --git a/docs/RFC-jsonl.md b/docs/RFC-jsonl.md new file mode 100644 index 0000000..b2b5f89 --- /dev/null +++ b/docs/RFC-jsonl.md @@ -0,0 +1,562 @@ +# RFC: JSONL Streaming Format for Toolpath + +**Status:** Draft +**Authors:** with Alex Kesling +**Created:** 2026-04-14 +**Extends:** [RFC: Toolpath - A Format for Artifact Transformation Provenance](../RFC.md) + +## Abstract + +This RFC defines a line-oriented JSONL format as a peer to Toolpath's canonical +JSON format, optimized for incremental persistence of `Path` documents. The +format expresses a `Path` as a stream of self-describing lines — each an +instruction that contributes to the final document — so that writers can append +one complete step at a time rather than buffering the entire path before +serializing. + +The JSONL format and the canonical JSON format round-trip losslessly in both +directions. Every field reachable in the canonical JSON is reachable through +the JSONL line set, and signatures computed over canonical JSON remain valid +across any number of `stream ↔ seal` cycles. + +One additive schema change is required: an optional `graph_ref` field on +`PathIdentity` so a streamed path can name the graph it belongs to up front. 
+ +## Motivation + +### The Problem + +The canonical Toolpath format serializes a `Path` as a single JSON document. +Producers must buffer the entire path before emitting a valid document. This +is a poor fit for two recurring use cases: + +1. **Live agent traces.** A single writer (e.g., Claude Code) records steps + as it works, and a consumer tails the file to display progress. With the + canonical format the writer must either defer writing until the session + completes, or repeatedly rewrite a growing JSON blob — neither of which + supports a readable tailed file. + +2. **Multi-writer append logs.** Heterogeneous producers — CI, IDE, + formatter, git hook — all record changes to the same path. Each producer + emits one or more complete steps; no single producer has global knowledge + of the path. Producers cannot coordinate to write a single JSON document. + +Both cases want an append-only, line-oriented file where each line is a +self-describing unit that incrementally constructs a `Path` document. + +### Goals + +1. **Append-only writes.** Writers emit complete lines; no line ever needs + to be rewritten or removed. +2. **Peer format with canonical JSON.** Bidirectional, lossless round-trip. +3. **Complete expressive power.** Every field reachable in canonical JSON is + reachable through the line set. +4. **Signature preservation.** Signatures computed over canonical JSON + survive any number of `stream → file → seal` cycles. +5. **Strict, unambiguous parsing.** Malformed input is a fatal error. No + recovery heuristics. + +### Non-Goals + +1. **Streaming individual steps.** Each step is atomic per line. Partial + step construction (e.g., emitting a step identity now and its diff + later) is out of scope. +2. **Graph-scoped streams.** A JSONL file contains exactly one `Path`. + Graphs compose streaming and non-streaming path files via existing + `$ref` semantics. +3. **Mid-file resync or corruption recovery.** Readers always start from + line 1. +4. 
**Transport semantics.** The format is equally usable as a storage + format, a tailed log, or a transport payload, but no transport is + specified. + +## Scope + +v1 covers the `Path` document type. `Step` documents are already a single +JSON blob and do not benefit from streaming. `Graph` documents are a +container of path references; a streaming graph is a graph that references +streaming path files. + +## File Structure + +### Extension and Encoding + +| Property | Value | +| -------- | ----- | +| Extension | `.toolpath.jsonl` | +| Encoding | UTF-8 | +| Line terminator | LF (`\n`) | +| Line format | One JSON object per line | + +No blank lines, no comments, no trailing commas. Each line is a single +externally tagged JSON object, mirroring the canonical `Document` envelope: + +``` +{"<Variant>": <body>} +``` + +Valid variants: `PathOpen`, `Step`, `ActorDef`, `Signature`, `PathMeta`, +`Head`, `PathClose`. + +### Order Constraints + +| Line kind | Constraint | +| --------- | ---------- | +| `PathOpen` | Exactly once, as line 1. | +| `PathClose` | At most once, as the last line (if present). | +| `Head` | Zero or more; last occurrence wins. | +| `Step`, `ActorDef`, `Signature`, `PathMeta` | Zero or more, any order, between `PathOpen` and `PathClose`. | +| `Signature` with `target: "step:<id>"` | Must appear after the referenced `Step` line. | + +### Parsing Strictness + +Readers MUST treat the following as fatal errors: + +- First line is not a valid `PathOpen`. +- Malformed JSON on any line. +- Unknown variant at the top level of a line. +- Unknown `version` value in `PathOpen`. +- `Signature` targeting a step that has not yet appeared. +- Ambiguous head at EOF when no `Head` line was emitted (see *Sealing*). + +Forward compatibility is handled by bumping the format version in +`PathOpen`, not by tolerating unknown line kinds. Older readers reject +newer files deterministically. + +### Durability + +Writer durability is out of band. The format does not mandate an `fsync` +policy.
Implementations typically sit somewhere on this spectrum: + +| Policy | Tradeoff | +| ------ | -------- | +| No sync (OS-buffered) | Fastest. Unflushed lines lost on crash. Appropriate for ephemeral traces. | +| `fsync` per line | Strongest durability. Highest overhead. Appropriate for multi-writer append logs where each line represents a committed fact. | +| `fsync` per batch / on close | Compromise. Suits single-writer agent traces that emit many steps in close succession. | + +POSIX append semantics guarantee atomic concatenation only for writes +less than or equal to `PIPE_BUF` (commonly 4096 bytes) when multiple +writers share a file. Larger lines require external locking or a +single-writer discipline. + +## Line Kinds + +Each line body is a JSON object with the fields below. Fields marked +optional may be omitted. + +### `PathOpen` + +Exactly once, as the first line. Carries the initial path identity and any +path-level metadata known at open time. + +```json +{"PathOpen": { + "version": "1", + "id": "pr-42", + "base": {"uri": "github:org/repo", "ref": "abc123"}, + "graph_ref": "toolpath://archive/release-v2", + "meta": { + "title": "Add email validation", + "source": "github:myorg/myrepo/pull/42", + "intent": "...", + "refs": [{"rel": "fixes", "href": "issue://..."}] + } +}} +``` + +| Field | Required | Description | +| ----- | -------- | ----------- | +| `version` | yes | Format version string. v1 = `"1"`. | +| `id` | yes | `PathIdentity.id`. | +| `base` | no | `PathIdentity.base` — same shape as canonical JSON. | +| `graph_ref` | no | `$ref`-style URL naming a graph this path belongs to. See *Schema Change*. | +| `meta` | no | Initial `PathMeta` excluding `actors` and `signatures` (those have dedicated line kinds). | + +### `Step` + +Zero or more. The body is the existing `Step` JSON shape verbatim — including +`step.meta`, which may carry step-level signatures, intent, refs, or actor +definitions. 
+ +```json +{"Step": { + "step": { + "id": "step-003", + "parents": ["step-002"], + "actor": "agent:claude-code", + "timestamp": "2026-04-14T15:30:00Z" + }, + "change": { + "src/auth/validator.rs": {"raw": "@@ -1,5 +1,25 @@\n..."} + }, + "meta": { + "intent": "Add email validation" + } +}} +``` + +### `ActorDef` + +Zero or more. Inserts or overwrites an entry in `path.meta.actors`. + +```json +{"ActorDef": { + "actor": "human:alex", + "definition": { + "name": "Alex Kesling", + "identities": [{"system": "github", "id": "akesling"}], + "keys": [{"type": "gpg", "fingerprint": "ABCD1234..."}] + } +}} +``` + +Overwrite (rather than merge) is deliberate: an actor definition is the +complete current record for that actor. A writer that wants to add a key +re-emits the full definition. + +### `Signature` + +Zero or more. Appends to the signature array at either path or step scope. + +```json +{"Signature": { + "target": "path", + "signature": { + "signer": "human:bob", + "key": "gpg:EFGH5678", + "scope": "reviewer", + "sig": "-----BEGIN PGP SIGNATURE-----\n...", + "timestamp": "2026-04-14T16:00:00Z" + } +}} +``` + +| `target` | Effect | +| -------- | ------ | +| `"path"` | Append to `path.meta.signatures`. | +| `"step:<id>"` | Append to the named step's `meta.signatures`. Referenced `Step` line MUST already have appeared. | + +### `PathMeta` + +Zero or more. Field-wise merge into `path.meta`. Last-write-wins per field. +Array fields (e.g. `refs`) **replace**; they do not append. + +```json +{"PathMeta": { + "patch": { + "title": "Revised title", + "intent": "...", + "refs": [{"rel": "tracks", "href": "..."}], + "extra": {"custom-field": "..."} + } +}} +``` + +Append-like semantics for `actors` and `signatures` are handled by the +dedicated `ActorDef` and `Signature` line kinds; `PathMeta` does not carry +those fields. + +### `Head` + +Zero or more. Explicitly names the current head step. Last occurrence wins.
+ +```json +{"Head": {"step_id": "step-005"}} +``` + +### `PathClose` + +Optional terminator. Indicates the writer is done and no more lines will +follow. Readers MAY use this to distinguish a completed file from one still +being appended to. Readers MUST tolerate its absence. + +```json +{"PathClose": {}} +``` + +## Sealing Algorithm (JSONL → JSON) + +Produces a canonical `{"Path": {...}}` document from a JSONL stream. + +``` +state: + path_id, base, graph_ref ← from PathOpen + path_meta ← PathOpen.meta or empty; actors={}, signatures=[] + steps = [] (in arrival order) + step_index = {} (step.id → &steps[i]) + head = null (last Head line's step_id, if any) + +for each line after PathOpen, in order: + Step: + append to steps; record in step_index + ActorDef { actor, definition }: + path_meta.actors[actor] = definition + Signature { target: "path", signature }: + path_meta.signatures.append(signature) + Signature { target: "step:<id>", signature }: + require step_index[id] exists (else fatal error) + step_index[id].meta.signatures.append(signature) + PathMeta { patch }: + for each field in patch: + path_meta[field] = patch[field] # arrays replace + Head { step_id }: + head = step_id + PathClose: + end-of-stream sentinel; no state change + +on EOF: + if head is null: + candidates = { s in steps : no other s' in steps has s.id in s'.parents } + if |candidates| != 1: fatal error ("ambiguous or missing head") + head = the one candidate's id + emit: {"Path": { + "path": { + "id": path_id, + "base": base, + "graph_ref": graph_ref, + "head": head + }, + "steps": steps, + "meta": path_meta + }} +``` + +The single-tip DAG rule applies only when `head` is unspecified. Paths with +intentional dead ends SHOULD emit an explicit `Head` line. + +## Streaming Algorithm (JSON → JSONL) + +Produces a deterministic line sequence from a canonical `Path`. Used for +round-trip testing and for re-streaming stored documents.
+ +``` +emit PathOpen { + version: "1", + id: path.id, + base: path.base, + graph_ref: path.graph_ref, + meta: { # path.meta minus actors, signatures + title, source, intent, refs, extra + } +} + +for actor_key in sorted(path.meta.actors.keys()): + emit ActorDef { actor: actor_key, definition: path.meta.actors[actor_key] } + +for step in path.steps: # original array order + emit Step { ...step, minus meta.signatures... } # signatures follow as their own lines, so sealing does not duplicate them + for sig in step.meta.signatures: # original order + emit Signature { target: "step:" + step.id, signature: sig } + +for sig in path.meta.signatures: # original order + emit Signature { target: "path", signature: sig } + +emit Head { step_id: path.head } +emit PathClose {} +``` + +## Round-Trip Guarantees + +- **Seal after stream:** `seal(stream(P)) == P` field-for-field for any + canonical `Path` `P`. +- **Stream after seal:** `stream(seal(L))` may differ from `L` in line + ordering — `stream` emits a normalized order — but sealed documents are + equal: `seal(stream(seal(L))) == seal(L)`. +- **Idempotent normalization:** `stream(seal(stream(P))) == stream(P)` + line-for-line. + +Round-trip is lossless because every canonical JSON field has a +corresponding line kind: + +| Canonical field | Line kind | +| --------------- | --------- | +| `path.id`, `path.base`, `path.graph_ref` | `PathOpen` | +| `path.head` | `Head` (or inferred) | +| `path.meta.title` / `source` / `intent` / `refs` / `extra` | `PathOpen.meta` or `PathMeta` | +| `path.meta.actors` entries | `ActorDef` | +| `path.meta.signatures` entries | `Signature` with `target: "path"` | +| `steps[*]` (entire step including inner `meta`) | `Step` | +| `steps[*].meta.signatures` entries | `Signature` with `target: "step:<id>"` | + +## Signatures + +Canonicalization is unchanged. Signatures are computed over the canonical +JSON form per JCS (RFC 8785), as defined in the base RFC. Because the +JSONL format is lossless, a signature computed on a sealed JSON document +remains valid across any number of `stream ↔ seal` cycles.
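The property this paragraph relies on, `seal(stream(P)) == P`, can be sketched in Rust. This is a minimal illustration under heavy simplification, not the `toolpath` crate's API: the `Path`, `Step`, and `Line` types below are hypothetical stand-ins that keep only the path id, head, step ids and parents, actor display names, and path-level signatures as opaque strings. `PathMeta` and step-level signatures are left out.

```rust
use std::collections::{BTreeMap, HashSet};

// Hypothetical, simplified stand-ins for the canonical types. Real steps
// also carry `change` and `meta`; real signatures are objects, not strings.
#[derive(Debug, Clone, PartialEq)]
pub struct Step {
    pub id: String,
    pub parents: Vec<String>,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Path {
    pub id: String,
    pub head: String,
    pub steps: Vec<Step>,
    pub actors: BTreeMap<String, String>, // actor key -> display name
    pub signatures: Vec<String>,          // path-level, in arrival order
}

// One variant per line kind covered by this sketch (no PathMeta).
#[derive(Debug, Clone, PartialEq)]
pub enum Line {
    PathOpen { id: String },
    ActorDef { actor: String, name: String },
    Step(Step),
    Signature(String), // target "path" only
    Head { step_id: String },
    PathClose,
}

/// JSON -> JSONL: emit lines in the normalized order.
pub fn stream(p: &Path) -> Vec<Line> {
    let mut out = vec![Line::PathOpen { id: p.id.clone() }];
    // BTreeMap iterates in sorted key order, like sorted(actors.keys()).
    for (actor, name) in &p.actors {
        out.push(Line::ActorDef { actor: actor.clone(), name: name.clone() });
    }
    out.extend(p.steps.iter().cloned().map(Line::Step));
    out.extend(p.signatures.iter().cloned().map(Line::Signature));
    out.push(Line::Head { step_id: p.head.clone() });
    out.push(Line::PathClose);
    out
}

/// JSONL -> JSON: fold lines into a Path, inferring the head if absent.
pub fn seal(lines: &[Line]) -> Result<Path, String> {
    let mut it = lines.iter();
    let id = match it.next() {
        Some(Line::PathOpen { id }) => id.clone(),
        _ => return Err("first line is not PathOpen".into()),
    };
    let mut steps: Vec<Step> = Vec::new();
    let mut actors: BTreeMap<String, String> = BTreeMap::new();
    let mut signatures: Vec<String> = Vec::new();
    let mut head: Option<String> = None;
    for line in it {
        match line {
            Line::PathOpen { .. } => return Err("PathOpen after line 1".into()),
            // Overwrite, not merge: the definition is the full record.
            Line::ActorDef { actor, name } => {
                actors.insert(actor.clone(), name.clone());
            }
            Line::Step(s) => steps.push(s.clone()),
            Line::Signature(sig) => signatures.push(sig.clone()),
            Line::Head { step_id } => head = Some(step_id.clone()), // last wins
            Line::PathClose => {} // sentinel; no state change
        }
    }
    let head = match head {
        Some(h) => h,
        None => infer_head(&steps)?,
    };
    Ok(Path { id, head, steps, actors, signatures })
}

/// Single-tip rule: the head is the one step that no step lists as a parent.
fn infer_head(steps: &[Step]) -> Result<String, String> {
    let parents: HashSet<&str> = steps
        .iter()
        .flat_map(|s| s.parents.iter().map(String::as_str))
        .collect();
    let tips: Vec<&Step> = steps
        .iter()
        .filter(|s| !parents.contains(s.id.as_str()))
        .collect();
    match tips.as_slice() {
        [only] => Ok(only.id.clone()),
        _ => Err("ambiguous or missing head".into()),
    }
}
```

With these definitions, sealing a streamed path returns a value equal to the original, which is the value a signature would have been computed over. Dropping the `Head` line still seals as long as exactly one step has no children.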
+ +Late-arriving signatures — for example, a reviewer approval added after +the steps are recorded — arrive as `Signature` lines appended to the file. +The signed payload is still the canonical JSON form: the reviewer seals +the current file, computes the signature over the sealed JSON per the +base RFC, and emits a `Signature` line. + +## Schema Change + +One additive change to the canonical schema, in `crates/toolpath/src/types.rs`: + +```rust +pub struct PathIdentity { + pub id: String, + pub base: Option, + pub head: String, + pub graph_ref: Option, // NEW +} +``` + +Backwards compatible: `graph_ref` is optional and omitted when empty. +Existing documents and signatures validate unchanged. This field lets a +path name the graph it belongs to, supporting cross-referencing from +streaming contexts where the containing graph is known up front. + +The value of `graph_ref` uses the same `$ref`-style URL conventions as +`Graph.paths[*].$ref`: + +- `https://...`, `s3://...`, `file:///...` — external references +- `toolpath:///` — named archive references + +## Example + +A minimal streaming session recording a two-step path: + +``` +{"PathOpen":{"version":"1","id":"pr-42","base":{"uri":"github:org/repo","ref":"abc123"},"meta":{"title":"Add email validation"}}} +{"ActorDef":{"actor":"agent:claude-code","definition":{"name":"Claude Code","provider":"anthropic"}}} +{"Step":{"step":{"id":"step-001","actor":"agent:claude-code","timestamp":"2026-04-14T10:00:00Z"},"change":{"src/validator.rs":{"raw":"@@ -0,0 +1,20 @@\n+pub struct Validator..."}},"meta":{"intent":"Add email validation struct"}}} +{"Step":{"step":{"id":"step-002","parents":["step-001"],"actor":"tool:rustfmt","timestamp":"2026-04-14T10:00:05Z"},"change":{"src/validator.rs":{"raw":"@@ -3,3 +3,3 @@\n..."}},"meta":{"intent":"Auto-format"}}} +{"Head":{"step_id":"step-002"}} +{"Signature":{"target":"path","signature":{"signer":"human:alex","key":"gpg:ABCD1234","scope":"author","sig":"-----BEGIN PGP 
SIGNATURE-----\n..."}}} +{"PathClose":{}} +``` + +Sealing this stream produces a canonical `{"Path": {...}}` document with: + +- `path.id = "pr-42"`, `path.base = {...}`, `path.head = "step-002"` +- `steps = [step-001, step-002]` +- `path.meta.title = "Add email validation"` +- `path.meta.actors = {"agent:claude-code": {...}}` +- `path.meta.signatures = [{...}]` (the single `human:alex` signature) + +Streaming that canonical document back produces an equivalent line +sequence (modulo the normalized `stream` ordering). + +## Design Rationale + +### Why atomic steps instead of open/patch/close? + +An earlier sketch considered a `StepOpen` / `StepPatch` / `StepClose` +lifecycle so that a single step could stream in pieces — for example, the +intent becomes known at t=0, the diff at t=20s. This was rejected for v1: + +- Atomic steps preserve append-only semantics — no line is ever + conditional on a later line's arrival. +- Multi-writer append logs (use case B) naturally emit complete steps; + there is no partial-step use case there. +- Live agent traces (use case A) can emit a step as soon as the agent + knows both the artifact and the change. The "progress before diff is + ready" case is addressed by emitting a coarser step on arrival and a + finer step once the diff stabilizes, or by emitting progress updates + outside the toolpath file entirely. + +The mechanism can be added in a future revision via new line kinds, +guarded by a version bump. + +### Why strict parsing? + +Skip-on-error parsing is tempting for a streaming format — readers could +ignore unknown variants from future versions and keep going. It was +rejected because: + +- Toolpath documents are provenance records. Silently skipping an + unrecognized line means silently losing provenance. +- Forward compatibility is handled more transparently by a version bump + in `PathOpen`: an older reader fails fast with a clear message rather + than producing a subtly incorrect sealed document.
+- Round-trip losslessness requires that readers see every line. + +### Why one path per file? + +A single file could in principle carry an entire graph — `GraphOpen`, +then interleaved `PathOpen` / `Step` / `PathClose` blocks scoped by a +`path` field on every line. That design was rejected because: + +- The primary use cases (live agent trace, multi-writer log) have one + writer per path. Multiplexing multiple paths into one file adds + coordination complexity that none of the use cases need. +- Graphs already have a composition mechanism (`$ref`). A graph that + references streaming path files is simpler than a graph-scoped stream + and composes with the existing model. +- Per-line path scoping makes every line heavier and makes "tail this + path" harder. + +### Why `graph_ref` on `PathIdentity` rather than only in `PathOpen`? + +A streaming writer that knows its containing graph up front (a release +automation pipeline recording PRs) benefits from stamping `graph_ref` on +open. But round-trip requires the sealed JSON to carry the same +information, which means the canonical `PathIdentity` must have a place +to store it. Adding the field to `PathIdentity` is a small additive +change and keeps the JSONL format a pure serialization of the canonical +model. + +### Why `PathMeta.patch` arrays replace instead of append? + +Last-write-wins per field is simple and symmetric. Append semantics for +arrays creates asymmetry (`refs` appends but scalars replace), and the +fields most likely to "grow over time" — `actors`, `signatures` — already +have dedicated line kinds with the right semantics. Writers that want to +add a ref without disturbing existing refs emit a `PathMeta` line with the +complete new array. + +## Compatibility + +- **Canonical JSON readers** are unaffected. The JSONL format is a peer, + not a replacement. +- **Existing canonical documents** remain valid. The only schema change — + `graph_ref` on `PathIdentity` — is additive and optional. 
+- **Existing signatures** remain valid. Canonicalization is unchanged. +- **Tooling** that operates on canonical JSON (validate, render, query) + can accept `.toolpath.jsonl` input by sealing internally before + operating. + +## Open Questions + +### Should `stream` canonicalize line bodies with JCS? + +A stronger "normalized stream" guarantee would canonicalize each line +body per JCS, so that `stream(P)` is byte-identical across +implementations. This would allow signing a stream directly rather than +signing the sealed JSON. Current answer: not in v1. The canonical signing +path (seal → JCS → sign) is sufficient and avoids introducing a second +canonicalization rule. + +### Should `seal` warn or error on a truncated file? + +A file missing `PathClose` and ending mid-line (the last line is not +newline-terminated) is ambiguous: still being written, or crashed +mid-write? Current answer: `seal` treats an unterminated final line as a +fatal error and treats EOF-without-`PathClose` as informational. Readers +tailing a live file use heuristics (file still open, recent mtime) to +distinguish the cases. + +### Should there be a `StepRef` line for late step-level metadata? + +A writer might learn, after emitting a step, that the step's `meta.intent` +should change or that a new ref should be attached. v1 has no mechanism +for this — a step is immutable once emitted. A future `StepPatch` line +could cover the case, guarded by a version bump. + +## Prior Art + +- **JSON Lines** (jsonlines.org): Line-oriented JSON for log files. + Direct inspiration for the container format. +- **NDJSON**: Same idea, different name. Widely used in data pipelines. +- **Server-Sent Events (SSE)**: Line-oriented event streams over HTTP. + The `event:` / `data:` framing influenced the variant-per-line + approach. +- **Git pack protocol**: Variant-tagged messages over a stream. Similar + structural pattern at a lower level of abstraction. 
+- **OpenTelemetry OTLP**: Structured append-only telemetry streams. + Shares the "each record is self-describing" property. +- **JCS (RFC 8785)**: JSON Canonicalization Scheme. Used unchanged for + the signature path. From 76449ed64f8131cb77e95877a84b9539f21780a2 Mon Sep 17 00:00:00 2001 From: Eliot Hedeman Date: Thu, 16 Apr 2026 11:11:39 -0400 Subject: [PATCH 2/8] docs: add file extension conventions and $ref constraints - Define .path.json (canonical) and .path.jsonl (streaming) extensions - Replace .toolpath.jsonl with .path.jsonl in the JSONL RFC - Add constraint: graph $ref entries must point to sealed .path.json - Add File Extensions section to the base RFC --- RFC.md | 17 +++++++++++++++++ docs/RFC-jsonl.md | 31 ++++++++++++++++++++++++++++--- 2 files changed, 45 insertions(+), 3 deletions(-) diff --git a/RFC.md b/RFC.md index 855e85f..e9948db 100644 --- a/RFC.md +++ b/RFC.md @@ -103,6 +103,23 @@ the inner fields. PascalCase variant names visually distinguish the type tag from the lowercase structural fields inside (`step`, `path`, `graph`). +### File Extensions + +Toolpath documents use a two-part extension encoding the document type and +serialization format: + +| Extension | Document type | Description | +| --------- | ------------- | ----------- | +| `.path.json` | Path (canonical) | A complete `{"Path": {...}}` JSON document. | +| `.path.jsonl` | Path (streaming) | A line-oriented JSONL stream that seals to a `Path`. See the [JSONL Streaming RFC](docs/RFC-jsonl.md). | + +`Step` and `Graph` documents use plain `.json` files with the appropriate +`{"Step": ...}` or `{"Graph": ...}` envelope. Only `Path` has a streaming +peer format. + +Graph `$ref` entries MUST point to sealed `.path.json` files, not to +`.path.jsonl` streams. 
+ ### ID Uniqueness IDs must be unique within their containing scope: diff --git a/docs/RFC-jsonl.md b/docs/RFC-jsonl.md index b2b5f89..379ce15 100644 --- a/docs/RFC-jsonl.md +++ b/docs/RFC-jsonl.md @@ -77,13 +77,38 @@ JSON blob and do not benefit from streaming. `Graph` documents are a container of path references; a streaming graph is a graph that references streaming path files. +## File Extensions + +Toolpath `Path` documents use a two-part extension that encodes both the +document type and the serialization format: + +| Extension | Format | Description | +| --------- | ------ | ----------- | +| `.path.json` | Canonical JSON | A complete `Path` document as a single `{"Path": {...}}` JSON blob. This is the "whole" format — the entire path is buffered and serialized at once. | +| `.path.jsonl` | Streaming JSONL | A `Path` document expressed as a sequence of self-describing JSON lines, one per line. This is the streaming format defined by this RFC. | + +Both extensions identify `Path` documents. The suffix (`.json` vs `.jsonl`) +distinguishes the serialization strategy. Tools that accept `.path.jsonl` +input seal it internally to produce the same in-memory representation as +reading a `.path.json` file. + +`Step` and `Graph` documents retain their existing conventions (`.json` +extension, `{"Step": ...}` / `{"Graph": ...}` envelope). Only `Path` has a +streaming peer format. + +Graph `$ref` entries MUST point to sealed `.path.json` files, not to +`.path.jsonl` streams. A `$ref` is a promise that the target is a complete, +valid document; a streaming file may be incomplete, unsealed, or mid-write. +Tools that consume `.path.jsonl` files should seal them before incorporating +them into a graph. 
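As an illustration of the extension rules above, a tool might classify `Path` files by suffix and refuse streaming targets when assembling a graph. The names `PathFormat`, `path_format`, and `check_graph_ref` are hypothetical, and judging a `$ref` by its suffix alone is an assumption of this sketch; a real implementation would likely also inspect the referenced document.

```rust
/// Serialization format of a `Path` file, judged by its two-part extension.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PathFormat {
    Canonical, // .path.json
    Streaming, // .path.jsonl
}

/// Classify a file name or URL by suffix. Returns `None` when the name
/// carries neither `Path` extension.
pub fn path_format(name: &str) -> Option<PathFormat> {
    if name.ends_with(".path.jsonl") {
        Some(PathFormat::Streaming)
    } else if name.ends_with(".path.json") {
        Some(PathFormat::Canonical)
    } else {
        None
    }
}

/// Reject graph `$ref` targets that name a streaming file. Targets without
/// a recognizable extension cannot be judged by name and are let through.
pub fn check_graph_ref(target: &str) -> Result<(), String> {
    match path_format(target) {
        Some(PathFormat::Streaming) => Err(format!(
            "{target} is a .path.jsonl stream; seal it to .path.json first"
        )),
        _ => Ok(()),
    }
}
```

The check is deliberately one-sided: it only catches the case the RFC forbids outright and leaves everything else to document-level validation.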
+ ## File Structure -### Extension and Encoding +### Encoding | Property | Value | | -------- | ----- | -| Extension | `.toolpath.jsonl` | +| Extension | `.path.jsonl` | | Encoding | UTF-8 | | Line terminator | LF (`\n`) | | Line format | One JSON object per line | @@ -516,7 +541,7 @@ complete new array. `graph_ref` on `PathIdentity` — is additive and optional. - **Existing signatures** remain valid. Canonicalization is unchanged. - **Tooling** that operates on canonical JSON (validate, render, query) - can accept `.toolpath.jsonl` input by sealing internally before + can accept `.path.jsonl` input by sealing internally before operating. ## Open Questions From 5cc2ebf9978a2e1123bff4517a3053bee66cd72d Mon Sep 17 00:00:00 2001 From: Eliot Hedeman Date: Thu, 16 Apr 2026 11:16:33 -0400 Subject: [PATCH 3/8] docs: add Eliot Hedeman to JSONL RFC authors --- docs/RFC-jsonl.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/RFC-jsonl.md b/docs/RFC-jsonl.md index 379ce15..8cfb7a1 100644 --- a/docs/RFC-jsonl.md +++ b/docs/RFC-jsonl.md @@ -1,7 +1,7 @@ # RFC: JSONL Streaming Format for Toolpath **Status:** Draft -**Authors:** with Alex Kesling +**Authors:** Eliot Hedeman , Alex Kesling **Created:** 2026-04-14 **Extends:** [RFC: Toolpath - A Format for Artifact Transformation Provenance](../RFC.md) From 03ebbdde8847499f8be392e44a932649dff6bda5 Mon Sep 17 00:00:00 2001 From: Eliot Hedeman Date: Thu, 16 Apr 2026 11:31:12 -0400 Subject: [PATCH 4/8] docs: remove multi-writer use case from JSONL RFC Multi-writer append logs are not a realistic use case. Simplify the motivation, durability, and design rationale sections to focus solely on the live agent trace use case. 
--- docs/RFC-jsonl.md | 41 ++++++++++++++--------------------------- 1 file changed, 14 insertions(+), 27 deletions(-) diff --git a/docs/RFC-jsonl.md b/docs/RFC-jsonl.md index 8cfb7a1..773ce98 100644 --- a/docs/RFC-jsonl.md +++ b/docs/RFC-jsonl.md @@ -28,21 +28,15 @@ One additive schema change is required: an optional `graph_ref` field on The canonical Toolpath format serializes a `Path` as a single JSON document. Producers must buffer the entire path before emitting a valid document. This -is a poor fit for two recurring use cases: +is a poor fit for live agent traces: a single writer (e.g., Claude Code) +records steps as it works, and a consumer tails the file to display +progress. With the canonical format the writer must either defer writing +until the session completes, or repeatedly rewrite a growing JSON blob — +neither of which supports a readable tailed file. -1. **Live agent traces.** A single writer (e.g., Claude Code) records steps - as it works, and a consumer tails the file to display progress. With the - canonical format the writer must either defer writing until the session - completes, or repeatedly rewrite a growing JSON blob — neither of which - supports a readable tailed file. - -2. **Multi-writer append logs.** Heterogeneous producers — CI, IDE, - formatter, git hook — all record changes to the same path. Each producer - emits one or more complete steps; no single producer has global knowledge - of the path. Producers cannot coordinate to write a single JSON document. - -Both cases want an append-only, line-oriented file where each line is a -self-describing unit that incrementally constructs a `Path` document. +The streaming format provides an append-only, line-oriented file where +each line is a self-describing unit that incrementally constructs a `Path` +document. ### Goals @@ -156,13 +150,8 @@ policy. Implementations typically sit somewhere on this spectrum: | Policy | Tradeoff | | ------ | -------- | | No sync (OS-buffered) | Fastest. 
Unflushed lines lost on crash. Appropriate for ephemeral traces. | -| `fsync` per line | Strongest durability. Highest overhead. Appropriate for multi-writer append logs where each line represents a committed fact. | -| `fsync` per batch / on close | Compromise. Suits single-writer agent traces that emit many steps in close succession. | - -POSIX append semantics guarantee atomic concatenation only for writes -less than or equal to `PIPE_BUF` (commonly 4096 bytes) when multiple -writers share a file. Larger lines require external locking or a -single-writer discipline. +| `fsync` per line | Strongest durability. Highest overhead. | +| `fsync` per batch / on close | Compromise. Suits agent traces that emit many steps in close succession. | ## Line Kinds @@ -475,9 +464,7 @@ intent becomes known at t=0, the diff at t=20s. This was rejected for v1: - Atomic steps preserve append-only semantics — no line is ever conditional on a later line's arrival. -- Multi-writer append logs (use case B) naturally emit complete steps; - there is no partial-step use case there. -- Live agent traces (use case A) can emit a step as soon as the agent +- Live agent traces can emit a step as soon as the agent knows both the artifact and the change. The "progress before diff is ready" case is addressed by emitting a coarser step on arrival and a finer step once the diff stabilizes, or by emitting progress updates @@ -505,9 +492,9 @@ A single file could in principle carry an entire graph — `GraphOpen`, then interleaved `PathOpen` / `Step` / `PathClose` blocks scoped by a `path` field on every line. That design was rejected because: -- The primary use cases (live agent trace, multi-writer log) have one - writer per path. Multiplexing multiple paths into one file adds - coordination complexity that none of the use cases need. +- The primary use case (live agent trace) has one writer per path. 
+ Multiplexing multiple paths into one file adds coordination + complexity that the use case doesn't need. - Graphs already have a composition mechanism (`$ref`). A graph that references streaming path files is simpler than a graph-scoped stream and composes with the existing model. From 31befa617e0603b9b77d138fd971d26f818fc7f9 Mon Sep 17 00:00:00 2001 From: Eliot Hedeman Date: Thu, 16 Apr 2026 11:35:02 -0400 Subject: [PATCH 5/8] docs: correct round-trip guarantees in JSONL RFC Sealing collapses intermediate PathMeta overwrites and does not preserve line ordering, so JSONL -> JSON -> JSONL is lossy. The actual guarantees are: - JSON -> JSONL -> JSON is lossless: seal(stream(P)) == P - The sealed form is stable: seal(stream(seal(L))) == seal(L) Updated the abstract, goals, round-trip section, signatures section, and strict parsing rationale to reflect this. --- docs/RFC-jsonl.md | 50 ++++++++++++++++++++++++++++++----------------- 1 file changed, 32 insertions(+), 18 deletions(-) diff --git a/docs/RFC-jsonl.md b/docs/RFC-jsonl.md index 773ce98..5a001e6 100644 --- a/docs/RFC-jsonl.md +++ b/docs/RFC-jsonl.md @@ -14,10 +14,11 @@ instruction that contributes to the final document — so that writers can appen one complete step at a time rather than buffering the entire path before serializing. -The JSONL format and the canonical JSON format round-trip losslessly in both -directions. Every field reachable in the canonical JSON is reachable through -the JSONL line set, and signatures computed over canonical JSON remain valid -across any number of `stream ↔ seal` cycles. +Any canonical JSON `Path` can be streamed to JSONL and sealed back to an +identical document (`seal(stream(P)) == P`). The reverse is lossy — sealing +collapses intermediate metadata — but the sealed form is stable: +`seal(stream(seal(L))) == seal(L)`. Signatures computed over canonical JSON +remain valid after any number of `seal → stream → seal` cycles. 
 
 One additive schema change is required: an optional `graph_ref` field on
 `PathIdentity` so a streamed path can name the graph it belongs to up front.
@@ -42,7 +43,8 @@ document.
 
 1. **Append-only writes.** Writers emit complete lines; no line ever needs
    to be rewritten or removed.
-2. **Peer format with canonical JSON.** Bidirectional, lossless round-trip.
+2. **Peer format with canonical JSON.** JSON → JSONL → JSON is lossless;
+   the sealed form is stable across cycles.
 3. **Complete expressive power.** Every field reachable in canonical JSON is
    reachable through the line set.
 4. **Signature preservation.** Signatures computed over canonical JSON
@@ -371,16 +373,28 @@ emit PathClose {}
 
 ## Round-Trip Guarantees
 
-- **Seal after stream:** `seal(stream(P)) == P` field-for-field for any
-  canonical `Path` `P`.
-- **Stream after seal:** `stream(seal(L))` may differ from `L` in line
-  ordering — `stream` emits a normalized order — but sealed documents are
-  equal: `seal(stream(seal(L))) == seal(L)`.
-- **Idempotent normalization:** `stream(seal(stream(P))) == stream(P)`
-  line-for-line.
+Sealing is a lossy operation on JSONL: `PathMeta` last-write-wins
+semantics collapse intermediate metadata states, and line ordering is
+not preserved. This means JSONL → JSON → JSONL does not reproduce the
+original line sequence. However, the *sealed* form is stable:
 
-Round-trip is lossless because every canonical JSON field has a
-corresponding line kind:
+- **JSON → JSONL → JSON:** `seal(stream(P)) == P` for any canonical
+  `Path` `P`. The streaming algorithm emits every field, and sealing
+  reconstructs the original document.
+- **JSONL → JSON → JSONL → JSON:** `seal(stream(seal(L))) == seal(L)`
+  for any valid JSONL stream `L`. Sealing collapses metadata and
+  resolves head; re-streaming and re-sealing produces the same
+  canonical document.
+
+The format does **not** guarantee:
+
+- **JSONL round-trip:** `stream(seal(L))` may differ from `L` in line
+  count, line ordering, and metadata line content (intermediate
+  `PathMeta` and `ActorDef` overwrites are collapsed by sealing).
+- **Canonical JSONL:** Two different JSONL streams may seal to the same
+  JSON document. There is no unique JSONL representation.
+
+Every canonical JSON field has a corresponding line kind:
 
 | Canonical field | Line kind |
 | --------------- | --------- |
@@ -395,9 +409,9 @@ corresponding line kind:
 ## Signatures
 
 Canonicalization is unchanged. Signatures are computed over the canonical
-JSON form per JCS (RFC 8785), as defined in the base RFC. Because the
-JSONL format is lossless, a signature computed on a sealed JSON document
-remains valid across any number of `stream ↔ seal` cycles.
+JSON form per JCS (RFC 8785), as defined in the base RFC. Because
+`seal(stream(seal(L))) == seal(L)`, a signature computed on a sealed JSON
+document remains valid after any number of `seal → stream → seal` cycles.
 
 Late-arriving signatures — for example, a reviewer approval added after
 the steps are recorded — arrive as `Signature` lines appended to the file.
@@ -484,7 +498,7 @@ rejected because:
 - Forward compatibility is handled more transparently by a version bump
   in `PathOpen`: an older reader fails fast with a clear message rather
   than producing a subtly incorrect sealed document.
-- Round-trip losslessness requires that readers see every line.
+- Stable sealing requires that readers see every line.
 
 ### Why one path per file?
 
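Editorial sketch for the round-trip guarantees stated in patch 5/8 above. It is a toy model under stated assumptions, not the RFC's algorithm: `Path` here holds only step ids and a string metadata map, `Line` covers only two line kinds, and the "metadata first, then steps" order in `stream` is an assumed stand-in for the normalized order. All type and field names are hypothetical.

```rust
use std::collections::BTreeMap;

/// Toy stand-in for a canonical `Path` (hypothetical, heavily simplified).
#[derive(Debug, Clone, PartialEq)]
struct Path {
    steps: Vec<String>,
    meta: BTreeMap<String, String>,
}

/// Toy stand-in for JSONL lines; only two kinds are modelled.
#[derive(Debug, Clone, PartialEq)]
enum Line {
    Step(String),
    PathMeta(String, String),
}

/// JSONL -> JSON: fold lines into one document.
/// A later `PathMeta` for the same key overwrites the earlier one.
fn seal(lines: &[Line]) -> Path {
    let mut path = Path {
        steps: Vec::new(),
        meta: BTreeMap::new(),
    };
    for line in lines {
        match line {
            Line::Step(id) => path.steps.push(id.clone()),
            Line::PathMeta(key, value) => {
                path.meta.insert(key.clone(), value.clone());
            }
        }
    }
    path
}

/// JSON -> JSONL: emit an assumed normalized order (metadata, then steps).
fn stream(path: &Path) -> Vec<Line> {
    let meta = path
        .meta
        .iter()
        .map(|(key, value)| Line::PathMeta(key.clone(), value.clone()));
    let steps = path.steps.iter().cloned().map(Line::Step);
    meta.chain(steps).collect()
}
```

In this model, a stream that overwrites a metadata key seals to a document that survives `seal(stream(..))` unchanged, while `stream(seal(L))` comes back shorter than `L` because the intermediate overwrite is gone.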
From 7b0cba9cba21db5315a05c7edbf9e40cf2edc891 Mon Sep 17 00:00:00 2001
From: Eliot Hedeman
Date: Thu, 16 Apr 2026 11:39:37 -0400
Subject: [PATCH 6/8] docs: replace seal/stream terminology with read/write
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The "sealing" and "streaming" abstractions added ceremony without value —
they're just "how to read a JSONL file" and "how to write a JSONL file."

Renamed throughout:
- Sealing Algorithm -> Reading JSONL
- Streaming Algorithm -> Writing JSONL
- seal()/stream() -> read()/write() in round-trip identities
- Updated all references in abstract, goals, signatures, compatibility,
  open questions, and design rationale
---
 docs/RFC-jsonl.md | 110 +++++++++++++++++++++++-----------------------
 1 file changed, 56 insertions(+), 54 deletions(-)

diff --git a/docs/RFC-jsonl.md b/docs/RFC-jsonl.md
index 5a001e6..dd35771 100644
--- a/docs/RFC-jsonl.md
+++ b/docs/RFC-jsonl.md
@@ -14,11 +14,11 @@ instruction that contributes to the final document — so that writers can appen
 one complete step at a time rather than buffering the entire path before
 serializing.
 
-Any canonical JSON `Path` can be streamed to JSONL and sealed back to an
-identical document (`seal(stream(P)) == P`). The reverse is lossy — sealing
-collapses intermediate metadata — but the sealed form is stable:
-`seal(stream(seal(L))) == seal(L)`. Signatures computed over canonical JSON
-remain valid after any number of `seal → stream → seal` cycles.
+Any canonical JSON `Path` can be written as JSONL and read back to an
+identical document. The reverse direction is lossy — reading collapses
+intermediate metadata updates — but the canonical form is stable across
+conversion cycles. Signatures computed over canonical JSON remain valid
+after any number of JSON → JSONL → JSON conversions.
 
 One additive schema change is required: an optional `graph_ref` field on
 `PathIdentity` so a streamed path can name the graph it belongs to up front.
@@ -44,11 +44,11 @@ document.
 1. **Append-only writes.** Writers emit complete lines; no line ever needs
    to be rewritten or removed.
 2. **Peer format with canonical JSON.** JSON → JSONL → JSON is lossless;
-   the sealed form is stable across cycles.
+   the canonical form is stable across conversion cycles.
 3. **Complete expressive power.** Every field reachable in canonical JSON is
    reachable through the line set.
 4. **Signature preservation.** Signatures computed over canonical JSON
-   survive any number of `stream → file → seal` cycles.
+   survive any number of JSON → JSONL → JSON conversion cycles.
 5. **Strict, unambiguous parsing.** Malformed input is a fatal error. No
    recovery heuristics.
 
@@ -85,8 +85,8 @@ document type and the serialization format:
 
 Both extensions identify `Path` documents. The suffix (`.json` vs `.jsonl`)
 distinguishes the serialization strategy. Tools that accept `.path.jsonl`
-input seal it internally to produce the same in-memory representation as
-reading a `.path.json` file.
+input read it into the same in-memory representation as a `.path.json`
+file.
 
 `Step` and `Graph` documents retain their existing conventions (`.json`
 extension, `{"Step": ...}` / `{"Graph": ...}` envelope). Only `Path` has a
@@ -94,9 +94,9 @@ streaming peer format.
 
 Graph `$ref` entries MUST point to sealed `.path.json` files, not to
 `.path.jsonl` streams. A `$ref` is a promise that the target is a complete,
-valid document; a streaming file may be incomplete, unsealed, or mid-write.
-Tools that consume `.path.jsonl` files should seal them before incorporating
-them into a graph.
+valid document; a streaming file may be incomplete or mid-write.
+Tools that consume `.path.jsonl` files should convert them to
+`.path.json` before incorporating them into a graph.
 
 ## File Structure
 
@@ -138,7 +138,7 @@ Readers MUST treat the following as fatal errors:
 - Unknown variant at the top level of a line.
 - Unknown `version` value in `PathOpen`.
 - `Signature` targeting a step that has not yet appeared.
-- Ambiguous head at EOF when no `Head` line was emitted (see *Sealing*).
+- Ambiguous head at EOF when no `Head` line was emitted (see *Reading JSONL*).
 
 Forward compatibility is handled by bumping the format version in
 `PathOpen`, not by tolerating unknown line kinds. Older readers reject
@@ -290,9 +290,10 @@ being appended to. Readers MUST tolerate its absence.
 {"PathClose": {}}
 ```
 
-## Sealing Algorithm (JSONL → JSON)
+## Reading JSONL
 
-Produces a canonical `{"Path": {...}}` document from a JSONL stream.
+How a reader produces a canonical `{"Path": {...}}` document from a
+`.path.jsonl` file.
 
 ```
 state:
@@ -340,10 +341,10 @@ on EOF:
 The single-tip DAG rule applies only when `head` is unspecified. Paths with
 intentional dead ends SHOULD emit an explicit `Head` line.
 
-## Streaming Algorithm (JSON → JSONL)
+## Writing JSONL
 
-Produces a deterministic line sequence from a canonical `Path`. Used for
-round-trip testing and for re-streaming stored documents.
+How a writer produces a `.path.jsonl` file from a canonical `Path`.
+Used for converting stored documents and for round-trip testing.
 
 ```
 emit PathOpen {
@@ -373,25 +374,25 @@ emit PathClose {}
 
 ## Round-Trip Guarantees
 
-Sealing is a lossy operation on JSONL: `PathMeta` last-write-wins
-semantics collapse intermediate metadata states, and line ordering is
-not preserved. This means JSONL → JSON → JSONL does not reproduce the
-original line sequence. However, the *sealed* form is stable:
-
-- **JSON → JSONL → JSON:** `seal(stream(P)) == P` for any canonical
-  `Path` `P`. The streaming algorithm emits every field, and sealing
-  reconstructs the original document.
-- **JSONL → JSON → JSONL → JSON:** `seal(stream(seal(L))) == seal(L)`
-  for any valid JSONL stream `L`.
Sealing collapses metadata and
-  resolves head; re-streaming and re-sealing produces the same
+Reading JSONL is a lossy operation: `PathMeta` last-write-wins semantics
+collapse intermediate metadata states, and line ordering is not preserved.
+This means JSONL → JSON → JSONL does not reproduce the original line
+sequence. However, the canonical JSON form is stable:
+
+- **JSON → JSONL → JSON:** `read(write(P)) == P` for any canonical
+  `Path` `P`. Writing emits every field, and reading reconstructs the
+  original document.
+- **JSONL → JSON → JSONL → JSON:** `read(write(read(L))) == read(L)`
+  for any valid JSONL stream `L`. Reading collapses metadata and
+  resolves head; converting back and reading again produces the same
   canonical document.
 
 The format does **not** guarantee:
 
-- **JSONL round-trip:** `stream(seal(L))` may differ from `L` in line
+- **JSONL round-trip:** `write(read(L))` may differ from `L` in line
   count, line ordering, and metadata line content (intermediate
-  `PathMeta` and `ActorDef` overwrites are collapsed by sealing).
-- **Canonical JSONL:** Two different JSONL streams may seal to the same
+  `PathMeta` and `ActorDef` overwrites are collapsed when read).
+- **Canonical JSONL:** Two different JSONL files may read to the same
   JSON document. There is no unique JSONL representation.
 
 Every canonical JSON field has a corresponding line kind:
@@ -410,14 +411,15 @@ Every canonical JSON field has a corresponding line kind:
 
 Canonicalization is unchanged. Signatures are computed over the canonical
 JSON form per JCS (RFC 8785), as defined in the base RFC. Because
-`seal(stream(seal(L))) == seal(L)`, a signature computed on a sealed JSON
-document remains valid after any number of `seal → stream → seal` cycles.
+`read(write(read(L))) == read(L)`, a signature computed on a canonical
+JSON document remains valid after any number of JSON → JSONL → JSON
+conversion cycles.
 
 Late-arriving signatures — for example, a reviewer approval added after
 the steps are recorded — arrive as `Signature` lines appended to the file.
-The signed payload is still the canonical JSON form: the reviewer seals
-the current file, computes the signature over the sealed JSON per the
-base RFC, and emits a `Signature` line.
+The signed payload is still the canonical JSON form: the reviewer reads
+the current JSONL file into canonical JSON, computes the signature per
+the base RFC, and emits a `Signature` line.
 
 ## Schema Change
 
@@ -457,7 +459,7 @@ A minimal streaming session recording a two-step path:
 {"PathClose":{}}
 ```
 
-Sealing this stream produces a canonical `{"Path": {...}}` document with:
+Reading this JSONL file produces a canonical `{"Path": {...}}` document with:
 
 - `path.id = "pr-42"`, `path.base = {...}`, `path.head = "step-002"`
 - `steps = [step-001, step-002]`
@@ -465,8 +467,8 @@ Sealing this stream produces a canonical `{"Path": {...}}` document with:
 - `path.meta.actors = {"agent:claude-code": {...}}`
 - `path.meta.signatures = []`
 
-Streaming that canonical document back produces an equivalent line
-sequence (modulo the normalized `stream` ordering).
+Writing that canonical document back to JSONL produces an equivalent
+line sequence (modulo the normalized write ordering).
 
 ## Design Rationale
 
@@ -497,8 +499,8 @@ rejected because:
   unrecognized line means silently losing provenance.
 - Forward compatibility is handled more transparently by a version bump
   in `PathOpen`: an older reader fails fast with a clear message rather
-  than producing a subtly incorrect sealed document.
-- Stable sealing requires that readers see every line.
+  than producing a subtly incorrect document.
+- Correct reading requires that readers see every line.
 
 ### Why one path per file?
 
@@ -542,26 +544,26 @@ complete new array.
   `graph_ref` on `PathIdentity` — is additive and optional.
 - **Existing signatures** remain valid. Canonicalization is unchanged.
 - **Tooling** that operates on canonical JSON (validate, render, query)
-  can accept `.path.jsonl` input by sealing internally before
-  operating.
+  can accept `.path.jsonl` input by reading it into canonical form
+  before operating.
 
 ## Open Questions
 
-### Should `stream` canonicalize line bodies with JCS?
+### Should writers canonicalize line bodies with JCS?
 
-A stronger "normalized stream" guarantee would canonicalize each line
-body per JCS, so that `stream(P)` is byte-identical across
-implementations. This would allow signing a stream directly rather than
-signing the sealed JSON. Current answer: not in v1. The canonical signing
-path (seal → JCS → sign) is sufficient and avoids introducing a second
-canonicalization rule.
+A stronger "normalized write" guarantee would canonicalize each line
+body per JCS, so that writing a given `Path` produces byte-identical
+JSONL across implementations. This would allow signing a JSONL file
+directly rather than signing the canonical JSON. Current answer: not in
+v1. The canonical signing path (read → JCS → sign) is sufficient and
+avoids introducing a second canonicalization rule.
 
-### Should `seal` warn or error on a truncated file?
+### Should readers warn or error on a truncated file?
 
 A file missing `PathClose` and ending mid-line (the last line is not
 newline-terminated) is ambiguous: still being written, or crashed
-mid-write? Current answer: `seal` treats an unterminated final line as a
-fatal error and treats EOF-without-`PathClose` as informational. Readers
+mid-write? Current answer: readers treat an unterminated final line as a
+fatal error and treat EOF-without-`PathClose` as informational. Readers
 tailing a live file use heuristics (file still open, recent mtime) to
 distinguish the cases.
 
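Editorial sketch for the truncated-file answer in patch 6/8 above: an unterminated final line is fatal, while EOF without `PathClose` is only informational. This is a minimal illustration under assumptions, not the reference reader. `Termination` and `classify_termination` are hypothetical names, and the `PathClose` check is a naive prefix match standing in for real JSON parsing.

```rust
/// Hypothetical result of inspecting the tail of a `.path.jsonl` file.
#[derive(Debug, PartialEq)]
enum Termination {
    /// Final line is newline-terminated and is a `PathClose` line.
    Closed,
    /// Every line is newline-terminated but no `PathClose` ends the file:
    /// informational only, the file may still be appended to.
    Open,
    /// Final line lacks a trailing newline: treated as a fatal error.
    UnterminatedFinalLine,
}

fn classify_termination(contents: &str) -> Termination {
    // A non-empty file must end with '\n'; otherwise the last line is partial.
    if !contents.is_empty() && !contents.ends_with('\n') {
        return Termination::UnterminatedFinalLine;
    }
    // Naive check (assumption): a closing line starts with `{"PathClose"`.
    match contents.lines().last() {
        Some(line) if line.trim_start().starts_with("{\"PathClose\"") => Termination::Closed,
        _ => Termination::Open,
    }
}
```

A caller tailing a live file would treat `Open` as "keep waiting" and would still need the mtime or open-handle heuristics mentioned above to decide when to give up.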
From 5856835f357bf4f93b4f8fc7a9292b01abefcc9f Mon Sep 17 00:00:00 2001
From: Eliot Hedeman
Date: Thu, 16 Apr 2026 11:47:20 -0400
Subject: [PATCH 7/8] fix: resolve clippy errors for Rust 1.95

- Collapse if-blocks into match guards (collapsible_match)
- Use sort_by_key with Reverse instead of sort_by (unnecessary_sort_by)
---
 crates/toolpath-claude/src/derive.rs | 25 ++++++++++++-------------
 crates/toolpath-claude/src/io.rs     |  2 +-
 crates/toolpath-claude/src/lib.rs    |  4 ++--
 3 files changed, 15 insertions(+), 16 deletions(-)

diff --git a/crates/toolpath-claude/src/derive.rs b/crates/toolpath-claude/src/derive.rs
index d7b0d91..c39b846 100644
--- a/crates/toolpath-claude/src/derive.rs
+++ b/crates/toolpath-claude/src/derive.rs
@@ -95,15 +95,16 @@ pub fn derive_path(conversation: &Conversation, config: &DeriveConfig) -> Path {
                 Some(MessageContent::Parts(parts)) => {
                     for part in parts {
                         match part {
-                            ContentPart::Text { text } => {
-                                if !text.trim().is_empty() {
-                                    text_parts.push(text.clone());
-                                }
+                            ContentPart::Text { text }
+                                if !text.trim().is_empty() =>
+                            {
+                                text_parts.push(text.clone());
                             }
-                            ContentPart::Thinking { thinking, .. } => {
-                                if config.include_thinking && !thinking.trim().is_empty() {
-                                    text_parts.push(format!("[thinking] {}", thinking));
-                                }
+                            ContentPart::Thinking { thinking, .. }
+                                if config.include_thinking
+                                    && !thinking.trim().is_empty() =>
+                            {
+                                text_parts.push(format!("[thinking] {}", thinking));
                             }
                             ContentPart::ToolUse { name, input, ..
} => {
                                 tool_uses.push(name.clone());
@@ -127,12 +128,10 @@ pub fn derive_path(conversation: &Conversation, config: &DeriveConfig) -> Path {
                         }
                     }
                 }
-                Some(MessageContent::Text(text)) => {
-                    if !text.trim().is_empty() {
-                        text_parts.push(text.clone());
-                    }
+                Some(MessageContent::Text(text)) if !text.trim().is_empty() => {
+                    text_parts.push(text.clone());
                 }
-                None => {}
+                _ => {}
             }
 
             // Skip entries with no conversation content and no file changes
diff --git a/crates/toolpath-claude/src/io.rs b/crates/toolpath-claude/src/io.rs
index 5ac6dd6..a044035 100644
--- a/crates/toolpath-claude/src/io.rs
+++ b/crates/toolpath-claude/src/io.rs
@@ -58,7 +58,7 @@ impl ConvoIO {
             }
         }
 
-        metadata.sort_by(|a, b| b.last_activity.cmp(&a.last_activity));
+        metadata.sort_by_key(|m| std::cmp::Reverse(m.last_activity));
         Ok(metadata)
     }
 
diff --git a/crates/toolpath-claude/src/lib.rs b/crates/toolpath-claude/src/lib.rs
index a24ad76..b793f3a 100644
--- a/crates/toolpath-claude/src/lib.rs
+++ b/crates/toolpath-claude/src/lib.rs
@@ -240,7 +240,7 @@ impl ClaudeConvo {
             }
         }
 
-        metadata.sort_by(|a, b| b.last_activity.cmp(&a.last_activity));
+        metadata.sort_by_key(|m| std::cmp::Reverse(m.last_activity));
         Ok(metadata)
     }
 
@@ -316,7 +316,7 @@ impl ClaudeConvo {
             }
         }
 
-        conversations.sort_by(|a, b| b.last_activity.cmp(&a.last_activity));
+        conversations.sort_by_key(|c| std::cmp::Reverse(c.last_activity));
         Ok(conversations)
     }
 

From 91fc92f1822783c368aa2c23a1d19409bca97457 Mon Sep 17 00:00:00 2001
From: Eliot Hedeman
Date: Thu, 16 Apr 2026 11:49:05 -0400
Subject: [PATCH 8/8] docs: unknown JSONL variants warn instead of error

Forward compatibility matters more than strict rejection. Older readers
should produce a correct (if incomplete) view of a path written by a
newer producer, rather than refusing to read it.

Unknown lines are skipped with a warning. Version bumps remain available
for truly incompatible structural changes.
---
 docs/RFC-jsonl.md | 49 +++++++++++++++++++++++++++++++----------------
 1 file changed, 32 insertions(+), 17 deletions(-)

diff --git a/docs/RFC-jsonl.md b/docs/RFC-jsonl.md
index dd35771..52c2ada 100644
--- a/docs/RFC-jsonl.md
+++ b/docs/RFC-jsonl.md
@@ -135,14 +135,24 @@ Readers MUST treat the following as fatal errors:
 
 - First line is not a valid `PathOpen`.
 - Malformed JSON on any line.
-- Unknown variant at the top level of a line.
-- Unknown `version` value in `PathOpen`.
 - `Signature` targeting a step that has not yet appeared.
 - Ambiguous head at EOF when no `Head` line was emitted (see *Reading JSONL*).
 
-Forward compatibility is handled by bumping the format version in
-`PathOpen`, not by tolerating unknown line kinds. Older readers reject
-newer files deterministically.
+Readers SHOULD warn on, but MUST skip, the following:
+
+- Unknown variant at the top level of a line.
+- Unknown `version` value in `PathOpen`.
+
+Unknown lines are preserved in memory when possible (implementations
+MAY store them as opaque JSON) so that a JSON → JSONL → JSON round-trip
+through a newer-format file does not silently discard data. However,
+readers are not required to interpret unknown lines, and unknown lines
+do not affect the canonical `Path` produced by reading.
+
+This approach favors forward compatibility: a file written by a newer
+producer remains readable by an older consumer, which gets a correct
+(if incomplete) view of the path. Version bumps in `PathOpen` signal
+structural changes that older readers cannot safely ignore.
 
 ### Durability
 
@@ -489,18 +499,23 @@ intent becomes known at t=0, the diff at t=20s. This was rejected for v1:
 The mechanism can be added in a future revision via new line kinds,
 guarded by a version bump.
 
-### Why strict parsing?
-
-Skip-on-error parsing is tempting for a streaming format — readers could
-ignore unknown variants from future versions and keep going. It was
-rejected because:
-
-- Toolpath documents are provenance records.
Silently skipping an
-  unrecognized line means silently losing provenance.
-- Forward compatibility is handled more transparently by a version bump
-  in `PathOpen`: an older reader fails fast with a clear message rather
-  than producing a subtly incorrect document.
-- Correct reading requires that readers see every line.
+### Why skip unknown variants instead of rejecting them?
+
+An earlier draft treated unknown variants as fatal errors, reasoning that
+silently skipping a line means silently losing provenance. This was
+reversed because:
+
+- Forward compatibility matters more in practice. A file written by a
+  newer producer should be readable by an older consumer — the older
+  reader gets a correct (if incomplete) view of the path rather than
+  refusing to read the file at all.
+- Provenance is not lost — the unknown lines are still in the file.
+  A reader that understands them will interpret them correctly. An
+  older reader that skips them produces the same result it would have
+  produced before the new line kind existed.
+- Version bumps in `PathOpen` remain available for structural changes
+  that older readers truly cannot handle (e.g., changes to `Step`
+  semantics or ordering constraints).
 
 ### Why one path per file?
 
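Editorial sketch for the reader behaviour patch 8/8 introduces: unknown top-level variants are skipped with a warning, while lines that are not JSON objects remain fatal. This is an illustration under assumptions, not the project's reader. The variant list contains only the names this RFC mentions and may not be the full set, `Annotation` in the usage note is a made-up future variant, and `variant_name` is a naive string scan standing in for a real JSON parser.

```rust
/// Variant names mentioned in the RFC text (assumed, possibly incomplete).
const KNOWN_VARIANTS: [&str; 7] = [
    "PathOpen", "Step", "PathMeta", "ActorDef", "Signature", "Head", "PathClose",
];

/// Naive extraction of the single top-level key of `{"Variant": ...}`.
/// A real reader would parse the line as JSON instead.
fn variant_name(line: &str) -> Option<&str> {
    let rest = line
        .trim_start()
        .strip_prefix('{')?
        .trim_start()
        .strip_prefix('"')?;
    let end = rest.find('"')?;
    Some(&rest[..end])
}

/// Returns the lines a reader would interpret plus the warnings it would
/// emit for skipped lines. A line with no recognizable variant is fatal.
fn read_lines(input: &str) -> Result<(Vec<&str>, Vec<String>), String> {
    let mut kept = Vec::new();
    let mut warnings = Vec::new();
    for (index, line) in input.lines().enumerate() {
        let name = variant_name(line)
            .ok_or_else(|| format!("line {}: malformed JSON", index + 1))?;
        if KNOWN_VARIANTS.contains(&name) {
            kept.push(line);
        } else {
            // Skip, but say so: the line stays in the file for newer readers.
            warnings.push(format!("line {}: skipping unknown variant `{}`", index + 1, name));
        }
    }
    Ok((kept, warnings))
}
```

Feeding it a file with `PathOpen`, a hypothetical `Annotation` line, and `PathClose` keeps two lines and produces one warning; a line such as `not json` still aborts the read.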