WIP / investigation: lockfile format experiments (don't merge — measurement harness only)#31

Closed
laulauland wants to merge 8 commits into main from lau/lockfile-format-experiments

@laulauland
Member

This PR is for reference, not for merging. It captures the investigation that led to PR #30 (the TOML lockfile format).

What's in this branch

Eight commits, all under test/property/, gated on -Dformat-experiment=true so they don't slow the normal test suite:

  1. feat(tests): adopt minish for property-based testing — wires minish as a test-only build dep. Seed defaults to first 16 hex chars of git HEAD; -Dminish-seed=<u64> for reproducibility.
  2. fix(lockfile): canonicalize metadata field order on serialize — the six-line fix that sorts metadata by key on write. Removes a confound from the measurements that follow.
  3. test(lockfile): property 1 — serialize is invariant under binding+field reorder — 200 iterations, asserts byte-identical serialize across binding+field reorders.
  4. test(lockfile): property 2 — disjoint edits merge cleanly or conflict-but-never-corrupt — randomized merge oracle on disjoint edit scripts via git merge-file. Measured ~40% spurious conflict rate on the original format, 0 hard-fails across 400 trials.
  5. test(experiment): measure spurious-conflict rate across lockfile format variants — V0–V3 baseline + multi-line variants.
  6. test(experiment): add 5 more format variants + sample dump — V4–V8 (TOML flat, YAML nested, HR-separator, aligned columns, INI blocks) + a sample-output dump for visual comparison.
  7. test(experiment): TOML variant serde benchmark (A flat, B nested, C grouped) — focused serde benchmark: 200 bindings × 200 iterations, measuring serialize/parse latency and peak memory across the three TOML arrangements.
  8. test(experiment): add V9 (TOML nested) + V10 (TOML grouped) to merge oracle — completes the TOML variant comparison; data showed grouped narrowly wins on merge rate (~25.0% vs ~25.2% for flat) but the gap is within noise.

Findings

  • ~40% of disjoint edits on the original line-based format produced spurious git conflicts. 0 silent corruptions across 400 trials.
  • All multi-line variants cluster at 25–31% conflict rate. Floor is structural (two inserts at the same sort anchor are unfixable without semantic merge).
  • TOML serializes ~2× faster than the original line format (V0 does render-then-sort; TOML sorts once and streams).
  • Among TOML arrangements, the flat `[[bindings]]` shape was chosen for PR #30 (feat(lockfile): switch drift.lock to versioned TOML array-of-tables format): its merge rate differs from grouped only by measurement noise, and it is simpler to deserialize and easier to evolve forward.

How to run

```
zig build test -Dformat-experiment=true
```

Adds ~5s to the test suite. Off by default.

Why this PR is closed/draft

The experimental measurement harness isn't intended to ship. The findings informed PR #30 (the actual format change). This branch is preserved for reference and for anyone who wants to re-run the experiments.

Wire minish v0.3.0 into the build as a test-only module. Seed defaults to
the first 16 hex chars of the current git HEAD so each commit explores a
fresh slice of the state space; override with -Dminish-seed=<u64> to
reproduce a specific failing run. Plumbed through helpers.minish_seed so
property tests share one source of truth.

Smoke test runs 25 iterations of a trivial int property to catch
regressions in the wiring (import, module shape, seed plumbing).
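
A minimal Python sketch of the seed rule described above, assuming the seed is simply the first 16 hex characters of the commit hash read as an unsigned 64-bit integer. The function name `seed_from_head` and its `override` parameter are hypothetical stand-ins for illustration; the real helper is `helpers.minish_seed` in the Zig harness and may differ in detail.

```python
from typing import Optional


def seed_from_head(head_hex: str, override: Optional[int] = None) -> int:
    """Return a u64 seed: the explicit override if given, else HEAD's first 16 hex chars."""
    if override is not None:
        if not 0 <= override < 2**64:
            raise ValueError("seed must fit in an unsigned 64-bit integer")
        return override
    # 16 hex characters encode exactly 64 bits.
    return int(head_hex[:16], 16)
```

Because 16 hex digits are exactly 64 bits, every commit hash maps to a valid `u64` without truncation or hashing.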

DRIFT-mshkigwb

Bindings were serialized with metadata fields in insertion order (per
`Binding.setField` at lockfile.zig:33-45), which meant two branches that
converged on the same semantic state through different `setField` sequences
produced different byte strings for the same binding — and git/jj flagged
those as spurious textual merge conflicts.

Sort metadata by key in `renderLineToWriter` so the on-disk form is a
function of semantic state only. The sort buffer reuses the `scratch`
allocator that `serializeToWriter` already threads through, so no new
lifetime surface.

Unblocks the upcoming serialization-under-reorder property test.
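
The effect of the fix can be illustrated with a small Python sketch. The line layout (`doc -> target key=value ...`) and the name `render_line` are assumptions made for this example, not the actual `renderLineToWriter` output.

```python
def render_line(doc, target, fields):
    """Render one binding; `fields` is a list of (key, value) pairs in insertion order."""
    # Sorting by key makes the output independent of the order fields were set in.
    ordered = sorted(fields, key=lambda kv: kv[0])
    parts = [f"{doc} -> {target}"] + [f"{k}={v}" for k, v in ordered]
    return " ".join(parts)


# Two branches reach the same semantic state via different setField sequences.
branch_a = [("sig", "abc"), ("origin", "x")]
branch_b = [("origin", "x"), ("sig", "abc")]
same = render_line("a.md", "src/a", branch_a) == render_line("a.md", "src/a", branch_b)
```

Without the `sorted` call the two renderings would differ byte-for-byte even though the bindings are semantically equal.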

DRIFT-hoocxemw

…ld reorder

Property: semantic_eq(L1, L2) ⟹ serialize(L1) == serialize(L2). The
generator builds a lockfile state with up to 6 bindings, each with 0–6
metadata fields drawn uniquely from a fixed key pool; the property builds
two `[]Binding` slices from that state (forward order, forward fields
vs. reverse order, reverse fields) and asserts byte-equal serialize
outputs. Runs 200 iterations, seeded from helpers.minish_seed (git HEAD
by default; -Dminish-seed=<u64> to reproduce).

Verified that the test catches the insertion-order bug by temporarily
reverting the canonicalization fix — property fails with a counterexample
showing differing serializations within the first few iterations.
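
A rough Python sketch of the same property, using the standard `random` module in place of minish. The key pool, the line format, and every function name here are hypothetical; only the shape of the check (forward order vs. reversed order, byte-equal output, 200 iterations) follows the description above.

```python
import random

KEY_POOL = ["sig", "origin", "owner", "note", "rev", "tag"]  # hypothetical keys


def serialize(bindings):
    """Canonical form: fields sorted by key, rendered lines sorted lexically."""
    lines = []
    for doc, target, fields in bindings:
        meta = " ".join(f"{k}={v}" for k, v in sorted(fields, key=lambda kv: kv[0]))
        lines.append(f"{doc} -> {target} {meta}".rstrip())
    return "\n".join(sorted(lines)) + "\n"


def gen_state(rng):
    """Up to 6 bindings, each with 0-6 unique metadata keys from the pool."""
    state = []
    for i in range(rng.randint(0, 6)):
        keys = rng.sample(KEY_POOL, rng.randint(0, 6))
        fields = [(k, str(rng.randint(0, 999))) for k in keys]
        state.append((f"doc{i}.md", f"src/t{i}", fields))
    return state


def check_property(seed, iterations=200):
    """Return a counterexample state, or None if the property held every time."""
    rng = random.Random(seed)
    for _ in range(iterations):
        state = gen_state(rng)
        reverse = [(d, t, list(reversed(f))) for d, t, f in reversed(state)]
        if serialize(state) != serialize(reverse):
            return state
    return None
```

Dropping either `sorted` call in `serialize` should make `check_property` return a counterexample, which mirrors the revert check described above.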

DRIFT-zvfkxuyr

…-but-never-corrupt

Property asserts that for two edit scripts touching *disjoint* bindings,
git merge-file either (a) produces a clean merge whose parsed state
equals the semantic union of both scripts, or (b) produces textual
conflict markers (measured, not a failure — this is the spurious
conflict rate). A clean merge that parses to a semantically wrong state
is a hard failure.

Generator partitions base bindings into left/right sides up front; ops
on each side (add, remove, set_field, remove_field) only touch that
side's bindings. Add ops use side-specific doc/target prefixes so every
trial is genuinely disjoint — no skip-based trial filtering.
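
The partition-first generator can be sketched in Python as follows. The identifiers, the `ops_per_side` parameter, and the op tuple shape are assumptions for illustration; the point is that disjointness holds by construction, so no trial ever needs to be discarded.

```python
import random

OPS = ["add", "remove", "set_field", "remove_field"]


def gen_disjoint_scripts(rng, base_ids, ops_per_side=3):
    """Split binding ids into left/right up front, then build one edit script per side."""
    ids = list(base_ids)
    rng.shuffle(ids)
    half = len(ids) // 2
    scripts = {}
    for side, owned in (("left", ids[:half]), ("right", ids[half:])):
        owned = list(owned)
        script = []
        for n in range(ops_per_side):
            kind = rng.choice(OPS)
            if kind == "add" or not owned:
                # Side-specific prefix: an add on one side can never name
                # a binding that the other side touches.
                new_id = f"{side}-new-{n}"
                owned.append(new_id)
                script.append(("add", new_id))
            else:
                target = rng.choice(owned)
                if kind == "remove":
                    owned.remove(target)
                script.append((kind, target))
        scripts[side] = script
    return scripts
```

Each side only ever draws targets from its own partition or from ids it created itself, which is what makes every generated trial usable.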

Measured on current repo state:
  seed=git-HEAD : 35/100 (35.0%) spurious, 0 mismatches
  seed=7        : 47/100 (47.0%) spurious, 0 mismatches
  seed=42       : 42/100 (42.0%) spurious, 0 mismatches
  seed=999      : 44/100 (44.0%) spurious, 0 mismatches

~40% of disjoint edits produce spurious textual conflicts via git's
default merge. When git *does* produce a clean merge, it's always
semantically correct — so the risk shape is annoyance, not corruption.
That is enough signal to justify writing a `.gitattributes` merge
driver (or a custom `drift merge` subcommand) as follow-up work.
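
The three-way classification used by the oracle can be sketched in Python around `git merge-file -p`. This is a simplified stand-in, not the harness code: it compares merged text against an expected text rather than parsing back to semantic state, and the names `merge_outcome` and `classify` are hypothetical. It relies on the documented exit status of `git merge-file` (0 for a clean merge, the number of conflicts otherwise).

```python
import os
import subprocess
import tempfile


def merge_outcome(base, left, right):
    """Run `git merge-file -p`; return ("clean" or "conflict", merged_text)."""
    with tempfile.TemporaryDirectory() as tmp:
        paths = []
        # Argument order expected by git: <current> <base> <other>.
        for name, text in (("left", left), ("base", base), ("right", right)):
            path = os.path.join(tmp, name)
            with open(path, "w", encoding="utf-8") as fh:
                fh.write(text)
            paths.append(path)
        # -p writes the merge result to stdout instead of overwriting <current>.
        proc = subprocess.run(["git", "merge-file", "-p", *paths],
                              capture_output=True, text=True)
    if not 0 <= proc.returncode <= 127:
        raise RuntimeError(proc.stderr)
    return ("clean" if proc.returncode == 0 else "conflict"), proc.stdout


def classify(base, left, right, expected):
    status, merged = merge_outcome(base, left, right)
    if status == "conflict":
        return "spurious-conflict"  # counted, not treated as a failure
    return "ok" if merged == expected else "hard-fail"
```

Two edits far apart in the file should classify as `ok`, while two different inserts at the same gap should classify as `spurious-conflict`.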

DRIFT-zfnkcflp

…at variants

Follow-up probe for the open question in DRIFT-zfnkcflp: does a format
tweak alone push the ~40% spurious-conflict rate down, or do we need a
`.gitattributes` merge driver?

Adds test/property/format_experiment_test.zig: runs the same
disjoint-edit oracle as Property 2 against 4 serializer variants, all
fed the same generated states so results are apples-to-apples:

  V0 baseline            — current single-line sorted format
  V1 multiline-blocks    — header + indented fields + 3 blank lines
  V2 sectioned-single    — "# doc" headers, single-line bindings
  V3 sectioned-multiline — sections + multi-line blocks

Gated on -Dformat-experiment=true (off by default; adds ~5s to test
suite). Generator clusters bindings into a 3-doc pool so sectioning
has something to group by, and add-ops re-use existing docs ~50% of
the time to mirror realistic drift usage.

Measured (5 seeds, 100 trials each, conflict rate only — no semantic
parse-back since variant formats don't have matching parsers yet):

  variant              | avg conflict rate | avg base bytes (×V0)
  V0 baseline          | ~43%              | 1.00x
  V1 multiline-blocks  | ~28% (-35% rel)   | 1.27x
  V2 sectioned-single  | ~31% (-28% rel)   | 1.14x
  V3 sectioned-multi   | ~25% (-42% rel)   | 1.40x

Takeaways:
1. Format changes alone cut the rate by ~30-40% but can't eliminate
   it. The residual floor is same-sort-position inserts (both sides
   add a new binding that lands at the same gap) — git's hunk logic
   sees both as inserts at the same context, always a conflict.
2. V1 (just multi-line + blanks) is almost as good as V3 with less
   structural complexity. Good candidate if we want a cheap win.
3. Byte overhead is 1.14x–1.40x. Not free but not painful.
4. A `.gitattributes` merge driver is still the only path to 0%
   conflicts — adjacent-insert collisions are structural.

Variant parsers not yet implemented; the oracle only measures the
textual conflict rate. Before shipping any V{1,2,3}, we need a
matching parseLine (so round-trip works) and to extend the oracle
to verify semantic correctness of clean merges.

V4 toml-tables     — TOML array-of-tables ([[bindings]] anchor, std parser)
V5 yaml-nested     — YAML doc-keyed nested map (indentation-carried grouping)
V6 hr-separator    — V1 with --- instead of 3 blank lines between blocks
V7 aligned-cols    — single-line with padded doc/target columns
V8 ini-blocks      — INI-style [doc -> target] headers

Plus test/property/format_sample_test.zig that dumps each variant's output
on a hand-picked 3-binding fixture so readability is auditable alongside the
conflict-rate numbers. Both new tests gated on -Dformat-experiment=true.

Results across 5 seeds × 100 trials each, averaged:

  V0 baseline            43.6%  1.00x bytes
  V1 multiline-blocks    27.6%  1.28x
  V2 sectioned-single    30.6%  1.13x
  V3 sectioned-multi     25.2%  1.40x
  V4 toml-tables         25.2%  2.06x
  V5 yaml-nested         25.8%  1.55x
  V6 hr-separator        27.6%  1.30x
  V7 aligned-cols        53.8%  1.00x   <- worse than baseline
  V8 ini-blocks          27.6%  1.22x

Findings:
1. V7 column-padding is actively bad — length changes propagate across
   every line as phantom whitespace diffs.
2. Distinctive anchor lines ([[bindings]], quoted keys, [label]) hit V3's
   conflict rate with less blank-line padding. Anchors do the alignment
   work, not context density.
3. Every variant except V0 and V7 clusters in 25-31%. The floor is same-sort-position inserts.
   A merge driver or file-splitting is the only way below 25%.

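
Finding 1 can be reproduced with a short Python sketch. The two renderers below are hypothetical simplifications (only the doc column is padded, and no metadata is rendered), but they show the mechanism: one longer name re-pads every existing line.

```python
def render_aligned(bindings):
    """Single-line format with the doc column padded to the widest doc name."""
    width = max(len(doc) for doc, _ in bindings)
    return [f"{doc.ljust(width)}  {target}" for doc, target in sorted(bindings)]


def render_plain(bindings):
    """Single-line format with no padding."""
    return [f"{doc} {target}" for doc, target in sorted(bindings)]


def lines_disturbed(render, before, after):
    """Count pre-existing lines whose text no longer appears after the edit."""
    return len(set(render(before)) - set(render(after)))


before = [("a.md", "x"), ("b.md", "y"), ("c.md", "z")]
after = before + [("long-name.md", "w")]
aligned_hits = lines_disturbed(render_aligned, before, after)  # every old line re-padded
plain_hits = lines_disturbed(render_plain, before, after)      # old lines untouched
```

With padding, a single added binding rewrites all three existing lines, so any concurrent edit on the other side collides with it.
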
…rouped)

Follow-up on DRIFT-mlrbxsbz: isolates the three TOML arrangements and
measures each on serialize/parse latency, peak memory, and byte size.

A  flat   — [[bindings]] blocks, doc/target as fields
B  nested — ["doc"."target"] header carries full binding identity
C  grouped — [["doc"]] arrays-of-tables, target lives as a field
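
For readers who want to see the three shapes side by side, here is a hedged Python sketch that renders the same bindings each way. The `sig` metadata field is a hypothetical placeholder, and no string escaping is attempted, matching the scoped nature of the experiment rather than a production serializer.

```python
def toml_flat(bindings):
    """A: [[bindings]] blocks with doc and target as ordinary fields."""
    out = []
    for doc, target, sig in sorted(bindings):
        out += ["[[bindings]]", f'doc = "{doc}"', f'target = "{target}"',
                f'sig = "{sig}"', ""]
    return "\n".join(out)


def toml_nested(bindings):
    """B: the table header carries the full (doc, target) identity."""
    out = []
    for doc, target, sig in sorted(bindings):
        out += [f'["{doc}"."{target}"]', f'sig = "{sig}"', ""]
    return "\n".join(out)


def toml_grouped(bindings):
    """C: one array-of-tables per doc, with target as a field."""
    out = []
    for doc, target, sig in sorted(bindings):
        out += [f'[["{doc}"]]', f'target = "{target}"', f'sig = "{sig}"', ""]
    return "\n".join(out)


sample = [("docs/a.md", "src/a.zig", "abc"), ("docs/a.md", "src/b.zig", "def")]
```

B spends the fewest lines per binding because identity lives entirely in the header, which is consistent with it being the smallest of the three in the byte counts below.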

Adds test/property/toml_variants_test.zig. Each variant has matching
serialize + parse + round-trip check. Benchmark:
- 200 bindings across 30 docs (realistic drift scale)
- 200 iterations, min wall time reported
- Peak memory measured via arena queryCapacity() after the operation
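
The timing half of this methodology (minimum wall time over a fixed number of iterations) might look like the following in Python. The helper name `bench_min_ns` is an assumption, and the peak-memory measurement is not covered here.

```python
import time


def bench_min_ns(fn, iterations=200):
    """Run fn() `iterations` times and return the minimum wall time in nanoseconds."""
    best = None
    for _ in range(iterations):
        start = time.perf_counter_ns()
        fn()
        elapsed = time.perf_counter_ns() - start
        if best is None or elapsed < best:
            best = elapsed
    return best
```

Reporting the minimum rather than the mean keeps the figure close to the cost of the code itself, since scheduler and cache noise can only add time to a run.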

Results on Debug build (ratios hold in ReleaseFast):

  variant               bytes  serialize  parse   ser peak  par peak
  V0 baseline          18711   3266 us   4341 us  112564 B  101890 B
  A  flat [[bindings]] 27106   1635 us   4618 us  161530 B  101890 B
  B  nested [d.t]      21906   1628 us   4556 us  153850 B  101890 B
  C  grouped [[d]]     24106   1723 us   4644 us  196746 B  101890 B

Versus V0:
  A   1.45x bytes  0.50x serialize  1.06x parse  1.44x ser peak
  B   1.17x bytes  0.50x serialize  1.05x parse  1.37x ser peak
  C   1.29x bytes  0.53x serialize  1.07x parse  1.75x ser peak
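
The ratios above follow directly from the raw table. A small Python check, with the numbers copied from the results and hypothetical column names:

```python
RAW = {
    "V0": {"bytes": 18711, "serialize_us": 3266, "parse_us": 4341, "ser_peak": 112564},
    "A":  {"bytes": 27106, "serialize_us": 1635, "parse_us": 4618, "ser_peak": 161530},
    "B":  {"bytes": 21906, "serialize_us": 1628, "parse_us": 4556, "ser_peak": 153850},
    "C":  {"bytes": 24106, "serialize_us": 1723, "parse_us": 4644, "ser_peak": 196746},
}


def ratios_vs_baseline(raw, baseline="V0"):
    """Divide every metric by the baseline's value, rounded to two decimals."""
    base = raw[baseline]
    return {
        name: {metric: round(value / base[metric], 2) for metric, value in row.items()}
        for name, row in raw.items()
        if name != baseline
    }


RATIOS = ratios_vs_baseline(RAW)
```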

Two interesting findings:

1. All three TOML variants serialize ~2x FASTER than V0. V0's
   serialize renders each binding into a scratch buffer, sorts the
   buffers lexically, then writes — the sort-after-render pipeline
   is the overhead. The TOML variants sort bindings once by key,
   then stream. Same total work but fewer allocations.

2. B (nested) dominates A and C on every axis: smallest bytes of the
   three TOML variants, identical serialize speed, smallest serialize
   working memory. Combined with B's better merge-rate prediction
   (unique-per-binding headers), B is the clear TOML pick.
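
The two serialize pipelines contrasted in finding 1 can be sketched in Python. Both functions and the `doc -> target` line format are hypothetical simplifications; they only illustrate where the per-binding buffers come from.

```python
import io


def serialize_render_then_sort(bindings):
    """V0-style: render every binding into its own buffer, sort the buffers, then write."""
    rendered = [f"{doc} -> {target}\n" for doc, target in bindings]
    rendered.sort()
    return "".join(rendered)


def serialize_sort_then_stream(bindings, out):
    """TOML-style: sort bindings by key once, then write straight to the output."""
    for doc, target in sorted(bindings):
        out.write(f"{doc} -> {target}\n")


sample = [("b.md", "x"), ("a.md", "y"), ("a.md", "b")]
stream = io.StringIO()
serialize_sort_then_stream(sample, stream)
```

For this sample both pipelines emit identical text; the first simply holds one intermediate string per binding before anything is written.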

Parse time is within 7% across variants (4.3–4.6 ms). Parse peak
memory is identical (101890 B) since all parsers produce the same
ArrayList(Binding) shape, so the output dominates.

Scoped parsers: each variant's parser handles only the subset our
serializers emit (no escapes, no multi-line strings, no type coercion).
That keeps parser cost apples-to-apples and sidesteps a full TOML
library. A production adoption of any variant would swap in a real
parser.

…oracle

Completes the TOML variant comparison by running B and C through the same
disjoint-edit oracle as every other variant, using serializers re-exported
from toml_variants_test.zig (no duplication).

Results across 5 seeds × 100 trials each, averaged:

  V4 toml (A) flat     25.2%   <- [[bindings]], doc/target as fields
  V9 toml (B) nested   27.6%   <- ["doc"."target"] unique headers
  V10 toml (C) grouped 25.0%   <- [["doc"]] with target as field

Surprise: C (grouped) narrowly wins among TOML variants, matching the
~25% floor. B (nested) is notably worse than predicted — 27.6% puts it
in the same bucket as V1/V6/V8 (all distinctive-header multi-line
variants without doc grouping).

Mechanism re-interpreted: what helps git's merge isn't unique-per-binding
header lines (B) but TEXTUAL DISTANCE between cross-doc edits. C's
repeating `[["doc"]]` sections put cross-doc inserts into physically
separate regions of the file. B's `["doc"."target"]` headers are
distinctive but bindings still sort alphabetically and sit adjacent
without grouping, so cross-doc edits are just one blank line apart —
not enough for git's 3-line context window.

Revised TOML ranking:
  - Best rate + std parser: C grouped  (25.0% / ~1.67x bytes / fast serde)
  - Smallest bytes:          B nested   (27.6% / 1.17x / tied fastest serde)
  - Reference baseline:      A flat     (25.2% / 2.06x / most familiar shape)

Combined with the serde benchmark (prior commit), C is the
best-performing standard format across all dimensions except raw byte
efficiency. Worth a closer look in DRIFT-mlrbxsbz.

@laulauland
Member Author

Closing as draft — see PR description. Investigation harness, not for merge.

@laulauland closed this May 12, 2026