WIP / investigation: lockfile format experiments (don't merge — measurement harness only) by laulauland · Pull Request #31 · fiberplane/drift

laulauland · 2026-05-12T10:27:48Z

This PR is for reference, not for merging. It captures the investigation that led to PR #30 (the TOML lockfile format).

What's in this branch

Eight commits, all under test/property/, gated on -Dformat-experiment=true so they don't slow the normal test suite:

feat(tests): adopt minish for property-based testing — wires minish as a test-only build dep. Seed defaults to first 16 hex chars of git HEAD; -Dminish-seed=<u64> for reproducibility.
fix(lockfile): canonicalize metadata field order on serialize — the six-line fix that sorts metadata by key on write. Removes a confound from the measurements that follow.
test(lockfile): property 1 — serialize is invariant under binding+field reorder — 200 iterations, asserts byte-identical serialize across binding+field reorders.
test(lockfile): property 2 — disjoint edits merge cleanly or conflict-but-never-corrupt — randomized merge oracle on disjoint edit scripts via git merge-file. Measured ~40% spurious conflict rate on the original format, 0 hard-fails across 400 trials.
test(experiment): measure spurious-conflict rate across lockfile format variants — V0–V3 baseline + multi-line variants.
test(experiment): add 5 more format variants + sample dump — V4–V8 (TOML flat, YAML nested, HR-separator, aligned columns, INI blocks) + a sample-output dump for visual comparison.
test(experiment): TOML variant serde benchmark (A flat, B nested, C grouped) — focused serde benchmark: 200 bindings × 200 iterations, measuring serialize/parse latency and peak memory across the three TOML arrangements.
test(experiment): add V9 (TOML nested) + V10 (TOML grouped) to merge oracle — completes the TOML variant comparison; data showed grouped narrowly wins on merge rate (~25.0% vs ~25.2% for flat) but the gap is within noise.

Findings

~40% of disjoint edits on the original line-based format produced spurious git conflicts. 0 silent corruptions across 400 trials.
All multi-line variants cluster at 25–31% conflict rate. Floor is structural (two inserts at the same sort anchor are unfixable without semantic merge).
TOML serializes ~2× faster than the original line format (V0 does render-then-sort; TOML sorts once and streams).
Among TOML arrangements, the flat `[[bindings]]` shape was chosen for PR feat(lockfile): switch drift.lock to versioned TOML array-of-tables format #30 — measurement-noise difference vs grouped, simpler to deserialize, easier to evolve forward.

How to run

```
zig build test -Dformat-experiment=true
```

Adds ~5s to the test suite. Off by default.

Why this PR is closed/draft

The experimental measurement harness isn't intended to ship. The findings informed PR #30 (the actual format change). This branch is preserved for reference and for anyone who wants to re-run the experiments.

Wire minish v0.3.0 into the build as a test-only module. Seed defaults to the first 16 hex chars of the current git HEAD so each commit explores a fresh slice of the state space; override with -Dminish-seed=<u64> to reproduce a specific failing run. Plumbed through helpers.minish_seed so property tests share one source of truth. Smoke test runs 25 iterations of a trivial int property to catch regressions in the wiring (import, module shape, seed plumbing). DRIFT-mshkigwb

Bindings were serialized with metadata fields in insertion order (per `Binding.setField` at lockfile.zig:33-45), which meant two branches that converged on the same semantic state through different `setField` sequences produced different byte strings for the same binding — and git/jj flagged those as spurious textual merge conflicts. Sort metadata by key in `renderLineToWriter` so the on-disk form is a function of semantic state only. The sort buffer reuses the `scratch` allocator that `serializeToWriter` already threads through, so no new lifetime surface. Unblocks the upcoming serialization-under-reorder property test. DRIFT-hoocxemw

…ld reorder Property: semantic_eq(L1, L2) ⟹ serialize(L1) == serialize(L2). The generator builds a lockfile state with up to 6 bindings, each with 0–6 metadata fields drawn uniquely from a fixed key pool; the property builds two `[]Binding` slices from that state (forward order, forward fields vs. reverse order, reverse fields) and asserts byte-equal serialize outputs. Runs 200 iterations, seeded from helpers.minish_seed (git HEAD by default; -Dminish-seed=<u64> to reproduce). Verified that the test catches the insertion-order bug by temporarily reverting the canonicalization fix — property fails with a counterexample showing differing serializations within the first few iterations. DRIFT-zvfkxuyr

…-but-never-corrupt Property asserts that for two edit scripts touching *disjoint* bindings, git merge-file either (a) produces a clean merge whose parsed state equals the semantic union of both scripts, or (b) produces textual conflict markers (measured, not a failure — this is the spurious conflict rate). A clean merge that parses to a semantically wrong state is a hard failure. Generator partitions base bindings into left/right sides up front; ops on each side (add, remove, set_field, remove_field) only touch that side's bindings. Add ops use side-specific doc/target prefixes so every trial is genuinely disjoint — no skip-based trial filtering. Measured on current repo state: seed=git-HEAD : 35/100 (35.0%) spurious, 0 mismatches seed=7 : 47/100 (47.0%) spurious, 0 mismatches seed=42 : 42/100 (42.0%) spurious, 0 mismatches seed=999 : 44/100 (44.0%) spurious, 0 mismatches ~40% of disjoint edits produce spurious textual conflicts via git's default merge. When git *does* produce a clean merge, it's always semantically correct — so the risk shape is annoyance, not corruption. That is enough signal to justify writing a `.gitattributes` merge driver (or a custom `drift merge` subcommand) as follow-up work. DRIFT-zfnkcflp

…at variants Follow-up probe for the open question in DRIFT-zfnkcflp: does a format tweak alone push the ~40% spurious-conflict rate down, or do we need a `.gitattributes` merge driver? Adds test/property/format_experiment_test.zig: runs the same disjoint-edit oracle as Property 2 against 4 serializer variants, all fed the same generated states so results are apples-to-apples: V0 baseline — current single-line sorted format V1 multiline-blocks — header + indented fields + 3 blank lines V2 sectioned-single — "# doc" headers, single-line bindings V3 sectioned-multiline — sections + multi-line blocks Gated on -Dformat-experiment=true (off by default; adds ~5s to test suite). Generator clusters bindings into a 3-doc pool so sectioning has something to group by, and add-ops re-use existing docs ~50% of the time to mirror realistic drift usage. Measured (5 seeds, 100 trials each, conflict rate only — no semantic parse-back since variant formats don't have matching parsers yet): variant | avg conflict rate | avg base bytes (×V0) V0 baseline | ~43% | 1.00x V1 multiline-blocks | ~28% (-35% rel) | 1.27x V2 sectioned-single | ~31% (-28% rel) | 1.14x V3 sectioned-multi | ~25% (-42% rel) | 1.40x Takeaways: 1. Format changes alone cut the rate by ~30-40% but can't eliminate it. The residual floor is same-sort-position inserts (both sides add a new binding that lands at the same gap) — git's hunk logic sees both as inserts at the same context, always a conflict. 2. V1 (just multi-line + blanks) is almost as good as V3 with less structural complexity. Good candidate if we want a cheap win. 3. Byte overhead is 1.14x–1.40x. Not free but not painful. 4. A `.gitattributes` merge driver is still the only path to 0% conflicts — adjacent-insert collisions are structural. Variant parsers not yet implemented; the oracle only measures the textual conflict rate. Before shipping any V{1,2,3}, we need a matching parseLine (so round-trip works) and to extend the oracle to verify semantic correctness of clean merges.

V4 toml-tables — TOML array-of-tables ([[bindings]] anchor, std parser) V5 yaml-nested — YAML doc-keyed nested map (indentation-carried grouping) V6 hr-separator — V1 with --- instead of 3 blank lines between blocks V7 aligned-cols — single-line with padded doc/target columns V8 ini-blocks — INI-style [doc -> target] headers Plus test/property/format_sample_test.zig that dumps each variant's output on a hand-picked 3-binding fixture so readability is auditable alongside the conflict-rate numbers. Both new tests gated on -Dformat-experiment=true. Results across 5 seeds × 100 trials each, averaged: V0 baseline 43.6% 1.00x bytes V1 multiline-blocks 27.6% 1.28x V2 sectioned-single 30.6% 1.13x V3 sectioned-multi 25.2% 1.40x V4 toml-tables 25.2% 2.06x V5 yaml-nested 25.8% 1.55x V6 hr-separator 27.6% 1.30x V7 aligned-cols 53.8% 1.00x <- worse than baseline V8 ini-blocks 27.6% 1.22x Findings: 1. V7 column-padding is actively bad — length changes propagate across every line as phantom whitespace diffs. 2. Distinctive anchor lines ([[bindings]], quoted keys, [label]) hit V3's conflict rate with less blank-line padding. Anchors do the alignment work, not context density. 3. All variants cluster in 25-31%. The floor is same-sort-position inserts. Merge driver or file-splitting is the only way below 25%.

…rouped) Follow-up on DRIFT-mlrbxsbz: isolates the three TOML arrangements and measures each on serialize/parse latency, peak memory, and byte size. A flat — [[bindings]] blocks, doc/target as fields B nested — ["doc"."target"] header carries full binding identity C grouped — [["doc"]] arrays-of-tables, target lives as a field Adds test/property/toml_variants_test.zig. Each variant has matching serialize + parse + round-trip check. Benchmark: - 200 bindings across 30 docs (realistic drift scale) - 200 iterations, min wall time reported - Peak memory measured via arena queryCapacity() after the operation Results on Debug build (ratios hold in ReleaseFast): variant bytes serialize parse ser peak par peak V0 baseline 18711 3266 us 4341 us 112564 B 101890 B A flat [[bindings]] 27106 1635 us 4618 us 161530 B 101890 B B nested [d.t] 21906 1628 us 4556 us 153850 B 101890 B C grouped [[d]] 24106 1723 us 4644 us 196746 B 101890 B Versus V0: A 1.45x bytes 0.50x serialize 1.06x parse 1.44x ser peak B 1.17x bytes 0.50x serialize 1.05x parse 1.37x ser peak C 1.29x bytes 0.53x serialize 1.07x parse 1.75x ser peak Two interesting findings: 1. All three TOML variants serialize ~2x FASTER than V0. V0's serialize renders each binding into a scratch buffer, sorts the buffers lexically, then writes — the sort-after-render pipeline is the overhead. The TOML variants sort bindings once by key, then stream. Same total work but fewer allocations. 2. B (nested) dominates A and C on every axis: smallest bytes of the three TOML variants, identical serialize speed, smallest serialize working memory. Combined with B's better merge-rate prediction (unique-per-binding headers), B is the clear TOML pick. Parse time is within 7% across variants (4.3–4.7ms). Parse peak memory is identical (101890 B) since all parsers produce the same ArrayList(Binding) shape — output dominates. Scoped parsers: each variant's parser handles only the subset our serializers emit (no escapes, no multi-line strings, no type coercion). That keeps parser cost apples-to-apples and sidesteps a full TOML library. A production adoption of any variant would swap in a real parser.

…oracle Completes the TOML variant comparison by running B and C through the same disjoint-edit oracle as every other variant, using serializers re-exported from toml_variants_test.zig (no duplication). Results across 5 seeds × 100 trials each, averaged: V4 toml (A) flat 25.2% <- [[bindings]], doc/target as fields V9 toml (B) nested 27.6% <- ["doc"."target"] unique headers V10 toml (C) grouped 25.0% <- [["doc"]] with target as field Surprise: C (grouped) narrowly wins among TOML variants, matching the ~25% floor. B (nested) is notably worse than predicted — 27.6% puts it in the same bucket as V1/V6/V8 (all distinctive-header multi-line variants without doc grouping). Mechanism re-interpreted: what helps git's merge isn't unique-per-binding header lines (B) but TEXTUAL DISTANCE between cross-doc edits. C's repeating `[["doc"]]` sections put cross-doc inserts into physically separate regions of the file. B's `["doc"."target"]` headers are distinctive but bindings still sort alphabetically and sit adjacent without grouping, so cross-doc edits are just one blank line apart — not enough for git's 3-line context window. Revised TOML ranking: - Best rate + std parser: C grouped (25.0% / ~1.67x bytes / fast serde) - Smallest bytes: B nested (27.6% / 1.17x / tied fastest serde) - Reference baseline: A flat (25.2% / 2.06x / most familiar shape) Combined with the serde benchmark (prior commit), C is the best-performing standard format across all dimensions except raw byte efficiency. Worth a closer look in DRIFT-mlrbxsbz.

laulauland · 2026-05-12T10:27:54Z

Closing as draft — see PR description. Investigation harness, not for merge.

laulauland added 8 commits May 12, 2026 12:27

laulauland closed this May 12, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

WIP / investigation: lockfile format experiments (don't merge — measurement harness only)#31

WIP / investigation: lockfile format experiments (don't merge — measurement harness only)#31
laulauland wants to merge 8 commits into
mainfrom
lau/lockfile-format-experiments

laulauland commented May 12, 2026

Uh oh!

laulauland commented May 12, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

laulauland commented May 12, 2026

What's in this branch

Findings

How to run

Why this PR is closed/draft

Uh oh!

laulauland commented May 12, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant