WIP / investigation: lockfile format experiments (don't merge — measurement harness only) #31
Closed
laulauland wants to merge 8 commits into
Wire minish v0.3.0 into the build as a test-only module. Seed defaults to the first 16 hex chars of the current git HEAD so each commit explores a fresh slice of the state space; override with -Dminish-seed=<u64> to reproduce a specific failing run. Plumbed through helpers.minish_seed so property tests share one source of truth. Smoke test runs 25 iterations of a trivial int property to catch regressions in the wiring (import, module shape, seed plumbing). DRIFT-mshkigwb
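Here is a minimal sketch of the seed rule in Zig. It assumes the build step has already read HEAD as a hex string. `seedFromHead` and `resolveSeed` are hypothetical names, not minish's API or the actual helpers.minish_seed, and the optional argument stands in for the -Dminish-seed build option. Sixteen hex digits encode exactly 64 bits, so the prefix always fits a u64:

```zig
const std = @import("std");

/// Hypothetical helper: turn the first 16 hex chars of a commit hash into a
/// u64 seed. 16 hex digits = 64 bits, so every prefix fits without overflow.
fn seedFromHead(head_hex: []const u8) !u64 {
    if (head_hex.len < 16) return error.HashTooShort;
    return std.fmt.parseInt(u64, head_hex[0..16], 16);
}

/// An explicit -Dminish-seed value (modelled here as an optional) takes
/// priority over the HEAD-derived default, which makes failures reproducible.
fn resolveSeed(override: ?u64, head_hex: []const u8) !u64 {
    if (override) |seed| return seed;
    return seedFromHead(head_hex);
}

test "HEAD prefix becomes the default seed; an override wins" {
    const head = "3f9a1c07b2d4e6f8a0b1c2d3e4f5a6b7c8d9e0f1";
    try std.testing.expectEqual(@as(u64, 0x3f9a1c07b2d4e6f8), try resolveSeed(null, head));
    try std.testing.expectEqual(@as(u64, 42), try resolveSeed(42, head));
}
```

With this design, each new commit changes the default seed, and pinning the override reproduces a specific run.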
Bindings were serialized with metadata fields in insertion order (per `Binding.setField` at lockfile.zig:33-45). As a result, two branches that converged on the same semantic state through different `setField` sequences produced different byte strings for the same binding, and git/jj flagged those as spurious textual merge conflicts. Sort metadata by key in `renderLineToWriter` so the on-disk form is a function of semantic state only. The sort buffer reuses the `scratch` allocator that `serializeToWriter` already threads through, so the fix introduces no new allocator or lifetime to manage. Unblocks the upcoming serialization-under-reorder property test. DRIFT-hoocxemw
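Below is a sketch of the canonicalization. It uses a simplified `Field` type and an assumed `key=value` line layout, and neither is the real lockfile.zig code. Copying into caller-provided scratch space before sorting mirrors the reuse of the `scratch` allocator described above:

```zig
const std = @import("std");

/// Hypothetical stand-in for one metadata entry; the real Binding type in
/// lockfile.zig is not reproduced here.
const Field = struct { key: []const u8, value: []const u8 };

fn keyLessThan(_: void, a: Field, b: Field) bool {
    return std.mem.lessThan(u8, a.key, b.key);
}

/// Copy fields into caller-provided scratch space and sort them by key, so
/// the rendered order no longer depends on the order of setField calls.
fn canonicalFields(scratch: []Field, fields: []const Field) []Field {
    const out = scratch[0..fields.len];
    @memcpy(out, fields);
    std.mem.sort(Field, out, {}, keyLessThan);
    return out;
}

/// Render fields as space-separated key=value pairs (an assumed layout,
/// not the real V0 line syntax).
fn renderFields(buf: []u8, fields: []const Field) ![]const u8 {
    var len: usize = 0;
    for (fields, 0..) |f, i| {
        const sep: []const u8 = if (i == 0) "" else " ";
        len += (try std.fmt.bufPrint(buf[len..], "{s}{s}={s}", .{ sep, f.key, f.value })).len;
    }
    return buf[0..len];
}

test "two setField orders render identically" {
    const a = [_]Field{ .{ .key = "rev", .value = "abc" }, .{ .key = "path", .value = "x.md" } };
    const b = [_]Field{ .{ .key = "path", .value = "x.md" }, .{ .key = "rev", .value = "abc" } };
    var s1: [8]Field = undefined;
    var s2: [8]Field = undefined;
    var o1: [128]u8 = undefined;
    var o2: [128]u8 = undefined;
    const r1 = try renderFields(&o1, canonicalFields(&s1, &a));
    const r2 = try renderFields(&o2, canonicalFields(&s2, &b));
    try std.testing.expectEqualStrings(r1, r2);
}
```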
…ld reorder

Property: semantic_eq(L1, L2) ⟹ serialize(L1) == serialize(L2).

The generator builds a lockfile state with up to 6 bindings, each with 0–6 metadata fields drawn uniquely from a fixed key pool. From that state the property builds two `[]Binding` slices (forward order with forward fields vs. reverse order with reverse fields) and asserts that their serialized outputs are byte-equal. Runs 200 iterations, seeded from helpers.minish_seed (git HEAD by default; -Dminish-seed=<u64> to reproduce).

Verified that the test catches the insertion-order bug by temporarily reverting the canonicalization fix: the property fails within the first few iterations, with a counterexample showing differing serializations.

DRIFT-zvfkxuyr
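The shape of the property can be shown in a self-contained sketch that covers field order only. It uses a hand-rolled xorshift64 generator instead of minish, whose generator API is not shown here, and `firstReorderFailure` is a hypothetical name. Returning the index of the first failing iteration stands in for a counterexample report:

```zig
const std = @import("std");

const Field = struct { key: []const u8, value: []const u8 };

/// Fixed key pool; each iteration draws a unique subset of it.
const key_pool = [_][]const u8{ "alpha", "beta", "gamma", "delta", "epsilon", "zeta" };

fn keyLessThan(_: void, a: Field, b: Field) bool {
    return std.mem.lessThan(u8, a.key, b.key);
}

/// Canonical render: sort a copy of the fields by key, then emit "k=v;" pairs.
fn render(out: []u8, fields: []const Field) ![]const u8 {
    var scratch: [key_pool.len]Field = undefined;
    const sorted = scratch[0..fields.len];
    @memcpy(sorted, fields);
    std.mem.sort(Field, sorted, {}, keyLessThan);
    var len: usize = 0;
    for (sorted) |f| {
        len += (try std.fmt.bufPrint(out[len..], "{s}={s};", .{ f.key, f.value })).len;
    }
    return out[0..len];
}

/// xorshift64 step; keeps the sketch free of any PRNG or minish API.
fn nextRandom(state: *u64) u64 {
    var x = state.*;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    state.* = x;
    return x;
}

/// Property: the same field set in forward and reversed order renders to
/// identical bytes. Returns the first failing iteration, or null if all pass.
fn firstReorderFailure(seed: u64, iterations: usize) !?usize {
    var rng: u64 = seed | 1; // xorshift must never start at zero
    var i: usize = 0;
    while (i < iterations) : (i += 1) {
        // The low 6 bits of the random word choose a subset of the key pool.
        const mask = nextRandom(&rng) & 0x3f;
        var forward: [key_pool.len]Field = undefined;
        var n: usize = 0;
        for (key_pool, 0..) |key, bit| {
            if (((mask >> @as(u6, @intCast(bit))) & 1) == 1) {
                forward[n] = .{ .key = key, .value = "v" };
                n += 1;
            }
        }
        var reversed = forward;
        std.mem.reverse(Field, reversed[0..n]);
        var a: [128]u8 = undefined;
        var b: [128]u8 = undefined;
        const lhs = try render(&a, forward[0..n]);
        const rhs = try render(&b, reversed[0..n]);
        if (!std.mem.eql(u8, lhs, rhs)) return i;
    }
    return null;
}

test "200 iterations, no counterexample" {
    try std.testing.expectEqual(@as(?usize, null), try firstReorderFailure(7, 200));
}
```

If `render` skipped the sort, most iterations with two or more keys would return a failing index, which is the same check as the "revert the fix" experiment described above.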
…-but-never-corrupt

Property: for two edit scripts touching *disjoint* bindings, git merge-file either
(a) produces a clean merge whose parsed state equals the semantic union of both scripts, or
(b) produces textual conflict markers (measured, not a failure: this is the spurious-conflict rate).
A clean merge that parses to a semantically wrong state is a hard failure.

The generator partitions base bindings into left/right sides up front; ops on each side (add, remove, set_field, remove_field) only touch that side's bindings. Add ops use side-specific doc/target prefixes, so every trial is genuinely disjoint and no trials are filtered out by skipping.

Measured on current repo state:

seed=git-HEAD : 35/100 (35.0%) spurious, 0 mismatches
seed=7        : 47/100 (47.0%) spurious, 0 mismatches
seed=42       : 42/100 (42.0%) spurious, 0 mismatches
seed=999      : 44/100 (44.0%) spurious, 0 mismatches

Roughly 40% of disjoint edits (42% mean across the four seeds) produce spurious textual conflicts via git's default merge. When git *does* produce a clean merge, it is always semantically correct, so the risk is annoyance, not corruption. That is enough signal to justify writing a `.gitattributes` merge driver (or a custom `drift merge` subcommand) as follow-up work.

DRIFT-zfnkcflp
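The following sketch shows the oracle's three-way classification and the rate it reports. It assumes the merged text comes from `git merge-file` and that the caller has already parsed it and compared the result to the expected union (the `parsed_matches_union` flag). Running git and the lockfile parser are out of scope, and `Outcome`/`Tally` are hypothetical names. Conflicts are detected from the `<<<<<<<` marker lines that git writes into the merged output:

```zig
const std = @import("std");

/// The three outcomes of the disjoint-edit oracle described above.
const Outcome = enum { clean_correct, spurious_conflict, corrupt };

/// git merge-file starts each conflict region with a line of seven '<'.
fn hasConflictMarkers(merged: []const u8) bool {
    var lines = std.mem.splitScalar(u8, merged, '\n');
    while (lines.next()) |line| {
        if (std.mem.startsWith(u8, line, "<<<<<<<")) return true;
    }
    return false;
}

/// `parsed_matches_union` is the caller's semantic check: does parsing the
/// merged text yield the union of both edit scripts?
fn classify(merged: []const u8, parsed_matches_union: bool) Outcome {
    if (hasConflictMarkers(merged)) return .spurious_conflict;
    return if (parsed_matches_union) .clean_correct else .corrupt;
}

const Tally = struct {
    trials: u32 = 0,
    spurious: u32 = 0,
    corrupt: u32 = 0,

    fn record(self: *Tally, outcome: Outcome) void {
        self.trials += 1;
        switch (outcome) {
            .clean_correct => {},
            .spurious_conflict => self.spurious += 1,
            .corrupt => self.corrupt += 1,
        }
    }

    fn spuriousPercent(self: Tally) f64 {
        if (self.trials == 0) return 0;
        const s: f64 = @floatFromInt(self.spurious);
        const t: f64 = @floatFromInt(self.trials);
        return 100.0 * s / t;
    }
};

test "classification and tally" {
    var tally = Tally{};
    tally.record(classify("a 1\n<<<<<<< left\nb 2\n=======\nc 3\n>>>>>>> right\n", false));
    tally.record(classify("a 1\nb 2\n", true));
    try std.testing.expectEqual(@as(u32, 0), tally.corrupt); // a hard failure would land here
    try std.testing.expectEqual(@as(f64, 50.0), tally.spuriousPercent());
}
```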
…at variants
Follow-up probe for the open question in DRIFT-zfnkcflp: does a format
tweak alone push the ~40% spurious-conflict rate down, or do we need a
`.gitattributes` merge driver?
Adds test/property/format_experiment_test.zig: runs the same
disjoint-edit oracle as Property 2 against 4 serializer variants, all
fed the same generated states so results are apples-to-apples:
V0 baseline — current single-line sorted format
V1 multiline-blocks — header + indented fields + 3 blank lines
V2 sectioned-single — "# doc" headers, single-line bindings
V3 sectioned-multiline — sections + multi-line blocks
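Below is a sketch of a V1-style renderer. The `doc -> target` header and the `key = value` field syntax are assumptions for illustration. The commit only specifies a header, indented fields, and 3 blank lines. The trailing blank lines put unchanged lines between an edit inside one block and its neighbours:

```zig
const std = @import("std");

const Field = struct { key: []const u8, value: []const u8 };
const Binding = struct { doc: []const u8, target: []const u8, fields: []const Field };

/// Hypothetical V1 layout: a header line, one indented line per field, then
/// three blank lines separating this block from the next one.
fn renderV1(out: []u8, b: Binding) ![]const u8 {
    var len: usize = 0;
    len += (try std.fmt.bufPrint(out[len..], "{s} -> {s}\n", .{ b.doc, b.target })).len;
    for (b.fields) |f| {
        len += (try std.fmt.bufPrint(out[len..], "  {s} = {s}\n", .{ f.key, f.value })).len;
    }
    len += (try std.fmt.bufPrint(out[len..], "\n\n\n", .{})).len;
    return out[0..len];
}

test "one binding becomes a padded block" {
    const fields = [_]Field{ .{ .key = "path", .value = "x.md" }, .{ .key = "rev", .value = "abc" } };
    var buf: [128]u8 = undefined;
    const text = try renderV1(&buf, .{ .doc = "a.md", .target = "intro", .fields = &fields });
    try std.testing.expectEqualStrings("a.md -> intro\n  path = x.md\n  rev = abc\n\n\n\n", text);
}
```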
Gated on -Dformat-experiment=true (off by default; adds ~5s to the test
suite). The generator clusters bindings into a 3-doc pool so sectioning
has something to group by, and add-ops reuse existing docs ~50% of
the time to mirror realistic drift usage.
Measured (5 seeds, 100 trials each, conflict rate only — no semantic
parse-back since variant formats don't have matching parsers yet):
variant             | avg conflict rate | avg base bytes (×V0)
--------------------|-------------------|---------------------
V0 baseline         | ~43%              | 1.00x
V1 multiline-blocks | ~28% (-35% rel)   | 1.27x
V2 sectioned-single | ~31% (-28% rel)   | 1.14x
V3 sectioned-multi  | ~25% (-42% rel)   | 1.40x
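The `rel` figures are reductions relative to V0's rate, not percentage-point drops. For V1, (43 − 28) / 43 × 100 ≈ 34.9, which rounds to 35. The quick check below runs against the rounded table values, and `relReductionPct` is a hypothetical helper name:

```zig
const std = @import("std");

/// Relative reduction of a variant's conflict rate versus the baseline, in
/// percent: (baseline - variant) / baseline * 100.
fn relReductionPct(baseline: f64, variant: f64) f64 {
    return (baseline - variant) / baseline * 100.0;
}

test "rel column reproduces from the rounded rates" {
    try std.testing.expectEqual(@as(f64, 35), @round(relReductionPct(43, 28))); // V1
    try std.testing.expectEqual(@as(f64, 28), @round(relReductionPct(43, 31))); // V2
    try std.testing.expectEqual(@as(f64, 42), @round(relReductionPct(43, 25))); // V3
}
```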
Takeaways:
1. Format changes alone cut the rate by ~30-40% but can't eliminate
it. The residual floor is same-sort-position inserts (both sides
add a new binding that lands at the same gap) — git's hunk logic
sees both as inserts at the same context, always a conflict.
2. V1 (just multi-line + blanks) is almost as good as V3 with less
structural complexity. Good candidate if we want a cheap win.
3. Byte overhead is 1.14x–1.40x. Not free but not painful.
4. A `.gitattributes` merge driver is still the only path to 0%
conflicts — adjacent-insert collisions are structural.
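The same-gap collision in takeaways 1 and 4 can be modelled as a sorted-insert question. If both sides' new keys land between the same two base bindings, both sides insert different text at one position and git cannot order them. The sketch below assumes one binding per sort position and encodes keys as `doc/target` strings (a hypothetical encoding). It models the takeaway, not git's diff algorithm:

```zig
const std = @import("std");

/// Position at which `key` would be inserted into the sorted `keys`
/// (the number of existing keys that sort strictly before it).
fn insertionGap(keys: []const []const u8, key: []const u8) usize {
    var lo: usize = 0;
    var hi: usize = keys.len;
    while (lo < hi) {
        const mid = lo + (hi - lo) / 2;
        if (std.mem.lessThan(u8, keys[mid], key)) {
            lo = mid + 1;
        } else {
            hi = mid;
        }
    }
    return lo;
}

/// Two disjoint adds still collide textually when they land in the same gap
/// of the base file, whatever the per-binding layout looks like.
fn sameGapInsert(base: []const []const u8, left: []const u8, right: []const u8) bool {
    return insertionGap(base, left) == insertionGap(base, right);
}

test "adds between the same neighbours collide" {
    const base = [_][]const u8{ "a.md/intro", "m.md/body", "z.md/end" };
    try std.testing.expect(sameGapInsert(&base, "b.md/x", "c.md/y"));
    try std.testing.expect(!sameGapInsert(&base, "b.md/x", "n.md/y"));
}
```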
Variant parsers not yet implemented; the oracle only measures the
textual conflict rate. Before shipping any V{1,2,3}, we need a
matching parseLine (so round-trip works) and to extend the oracle
to verify semantic correctness of clean merges.
V4 toml-tables: TOML array-of-tables ([[bindings]] anchor, std parser)
V5 yaml-nested: YAML doc-keyed nested map (indentation-carried grouping)
V6 hr-separator: V1 with --- instead of 3 blank lines between blocks
V7 aligned-cols: single-line with padded doc/target columns
V8 ini-blocks: INI-style [doc -> target] headers

Plus test/property/format_sample_test.zig, which dumps each variant's output on a hand-picked 3-binding fixture so readability is auditable alongside the conflict-rate numbers. Both new tests are gated on -Dformat-experiment=true.

Results across 5 seeds × 100 trials each, averaged:

variant             | conflict rate | bytes (×V0)
--------------------|---------------|------------
V0 baseline         | 43.6%         | 1.00x
V1 multiline-blocks | 27.6%         | 1.28x
V2 sectioned-single | 30.6%         | 1.13x
V3 sectioned-multi  | 25.2%         | 1.40x
V4 toml-tables      | 25.2%         | 2.06x
V5 yaml-nested      | 25.8%         | 1.55x
V6 hr-separator     | 27.6%         | 1.30x
V7 aligned-cols     | 53.8%         | 1.00x  <- worse than baseline
V8 ini-blocks       | 27.6%         | 1.22x

Findings:
1. V7 column padding is actively bad: length changes propagate across every line as phantom whitespace diffs.
2. Distinctive anchor lines ([[bindings]], quoted keys, [label]) match or come close to V3's conflict rate (25.2-27.6% vs. 25.2%) with less blank-line padding. Anchors do the alignment work, not context density.
3. Apart from V7, every new variant clusters in 25-31%. The floor is same-sort-position inserts; a merge driver or file splitting is the only way below 25%.
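Finding 1 can be reproduced with a small model of V7. The sketch assumes the doc column is padded to the longest doc name plus one space, which may differ from the real V7 padding rule. It counts how many existing rows change text when one binding is added, and `rowsRewritten` is a hypothetical name:

```zig
const std = @import("std");

const Row = struct { doc: []const u8, target: []const u8 };

fn docColumnWidth(rows: []const Row) usize {
    var width: usize = 0;
    for (rows) |r| {
        width = @max(width, r.doc.len);
    }
    return width;
}

/// Hypothetical V7 row: doc left-aligned in a column of `width`, one space, target.
fn renderRow(out: []u8, r: Row, width: usize) []const u8 {
    var len: usize = r.doc.len;
    @memcpy(out[0..len], r.doc);
    while (len < width + 1) : (len += 1) {
        out[len] = ' ';
    }
    @memcpy(out[len..][0..r.target.len], r.target);
    len += r.target.len;
    return out[0..len];
}

/// How many existing rows change text when `added` joins the file.
fn rowsRewritten(existing: []const Row, added: Row) usize {
    const old_width = docColumnWidth(existing);
    const new_width = @max(old_width, added.doc.len);
    var changed: usize = 0;
    var before: [128]u8 = undefined;
    var after: [128]u8 = undefined;
    for (existing) |r| {
        const a = renderRow(&before, r, old_width);
        const b = renderRow(&after, r, new_width);
        if (!std.mem.eql(u8, a, b)) changed += 1;
    }
    return changed;
}

test "a longer doc name rewrites every existing row" {
    const rows = [_]Row{ .{ .doc = "a.md", .target = "x" }, .{ .doc = "bb.md", .target = "y" } };
    try std.testing.expectEqual(@as(usize, 2), rowsRewritten(&rows, .{ .doc = "longer.md", .target = "z" }));
    try std.testing.expectEqual(@as(usize, 0), rowsRewritten(&rows, .{ .doc = "c.md", .target = "z" }));
}
```

Once any side widens the column, every line of the file differs from the base, so an unrelated edit on the other side overlaps it.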
…rouped)

Follow-up on DRIFT-mlrbxsbz: isolates the three TOML arrangements and measures each on serialize/parse latency, peak memory, and byte size.

A flat:    [[bindings]] blocks, doc/target as fields
B nested:  ["doc"."target"] header carries full binding identity
C grouped: [["doc"]] arrays-of-tables, target lives as a field

Adds test/property/toml_variants_test.zig. Each variant has a matching serialize + parse + round-trip check.

Benchmark:
- 200 bindings across 30 docs (realistic drift scale)
- 200 iterations, min wall time reported
- Peak memory measured via arena queryCapacity() after the operation

Results on Debug build (ratios hold in ReleaseFast):

variant             | bytes | serialize | parse   | ser peak | parse peak
--------------------|-------|-----------|---------|----------|-----------
V0 baseline         | 18711 | 3266 us   | 4341 us | 112564 B | 101890 B
A flat [[bindings]] | 27106 | 1635 us   | 4618 us | 161530 B | 101890 B
B nested [d.t]      | 21906 | 1628 us   | 4556 us | 153850 B | 101890 B
C grouped [[d]]     | 24106 | 1723 us   | 4644 us | 196746 B | 101890 B

Versus V0:

variant | bytes | serialize | parse | ser peak
--------|-------|-----------|-------|---------
A       | 1.45x | 0.50x     | 1.06x | 1.44x
B       | 1.17x | 0.50x     | 1.05x | 1.37x
C       | 1.29x | 0.53x     | 1.07x | 1.75x

Two interesting findings:

1. All three TOML variants serialize ~2x FASTER than V0. V0's serialize renders each binding into a scratch buffer, sorts the buffers lexically, then writes; the sort-after-render pipeline is the overhead. The TOML variants sort bindings once by key, then stream. Same total work, but fewer allocations.
2. B (nested) dominates A and C on every axis: smallest bytes of the three TOML variants, fastest serialize (tied with A), and smallest serialize working memory. Combined with B's better merge-rate prediction (unique-per-binding headers), B is the clear TOML pick.

Parse time is within 7% across variants (4.3–4.6 ms). Parse peak memory is identical (101890 B) because all parsers produce the same ArrayList(Binding) shape, so the output dominates.
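Below is a sketch of variant B's "sort once, then stream" serializer. Types and names are hypothetical. Values are written without TOML escaping, like the scoped experiment parsers, and fields are assumed to arrive already sorted by key:

```zig
const std = @import("std");

const Field = struct { key: []const u8, value: []const u8 };
const Binding = struct { doc: []const u8, target: []const u8, fields: []const Field };

fn bindingLessThan(_: void, a: Binding, b: Binding) bool {
    return switch (std.mem.order(u8, a.doc, b.doc)) {
        .lt => true,
        .gt => false,
        .eq => std.mem.lessThan(u8, a.target, b.target),
    };
}

/// Variant B sketch: one sort by (doc, target), then a single streaming
/// pass with no per-binding scratch buffers.
fn serializeNested(out: []u8, bindings: []Binding) ![]const u8 {
    std.mem.sort(Binding, bindings, {}, bindingLessThan);
    var len: usize = 0;
    for (bindings, 0..) |b, i| {
        if (i > 0) len += (try std.fmt.bufPrint(out[len..], "\n", .{})).len;
        len += (try std.fmt.bufPrint(out[len..], "[\"{s}\".\"{s}\"]\n", .{ b.doc, b.target })).len;
        for (b.fields) |f| {
            len += (try std.fmt.bufPrint(out[len..], "{s} = \"{s}\"\n", .{ f.key, f.value })).len;
        }
    }
    return out[0..len];
}

test "nested serializer sorts once, then streams" {
    const f = [_]Field{.{ .key = "rev", .value = "abc" }};
    var bs = [_]Binding{
        .{ .doc = "b.md", .target = "x", .fields = &f },
        .{ .doc = "a.md", .target = "y", .fields = &f },
    };
    var buf: [256]u8 = undefined;
    const text = try serializeNested(&buf, &bs);
    try std.testing.expectEqualStrings(
        "[\"a.md\".\"y\"]\nrev = \"abc\"\n\n[\"b.md\".\"x\"]\nrev = \"abc\"\n",
        text,
    );
}
```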
Scoped parsers: each variant's parser handles only the subset our serializers emit (no escapes, no multi-line strings, no type coercion). That keeps parser cost apples-to-apples and sidesteps a full TOML library. A production adoption of any variant would swap in a real parser.
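The sketch below shows what a scoped parser for variant B's two line shapes could look like. `parseNestedHeader` and `parseFieldLine` are hypothetical. Like the experiment parsers, they reject anything outside the emitted subset rather than implementing TOML:

```zig
const std = @import("std");

const Header = struct { doc: []const u8, target: []const u8 };
const Field = struct { key: []const u8, value: []const u8 };

/// Parse a `["doc"."target"]` header. No escape handling, so a doc or
/// target containing the sequence `"."` is not supported.
fn parseNestedHeader(line: []const u8) ?Header {
    if (line.len < 4) return null;
    if (!std.mem.startsWith(u8, line, "[\"") or !std.mem.endsWith(u8, line, "\"]")) return null;
    const inner = line[2 .. line.len - 2]; // doc"."target
    const sep = std.mem.indexOf(u8, inner, "\".\"") orelse return null;
    return Header{ .doc = inner[0..sep], .target = inner[sep + 3 ..] };
}

/// Parse a `key = "value"` line, again without escapes or type coercion.
fn parseFieldLine(line: []const u8) ?Field {
    const eq = std.mem.indexOf(u8, line, " = \"") orelse return null;
    if (line.len < eq + 5 or line[line.len - 1] != '"') return null;
    return Field{ .key = line[0..eq], .value = line[eq + 4 .. line.len - 1] };
}

test "parses the subset the serializer emits" {
    const h = parseNestedHeader("[\"a.md\".\"intro\"]").?;
    try std.testing.expectEqualStrings("a.md", h.doc);
    try std.testing.expectEqualStrings("intro", h.target);
    const f = parseFieldLine("rev = \"abc\"").?;
    try std.testing.expectEqualStrings("rev", f.key);
    try std.testing.expectEqualStrings("abc", f.value);
}
```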
…oracle

Completes the TOML variant comparison by running B and C through the same disjoint-edit oracle as every other variant, using serializers re-exported from toml_variants_test.zig (no duplication).

Results across 5 seeds × 100 trials each, averaged:

V4  toml (A) flat     25.2%  <- [[bindings]], doc/target as fields
V9  toml (B) nested   27.6%  <- ["doc"."target"] unique headers
V10 toml (C) grouped  25.0%  <- [["doc"]] with target as field

Surprise: C (grouped) narrowly wins among the TOML variants, matching the ~25% floor. B (nested) is notably worse than predicted: 27.6% puts it in the same bucket as V1/V6/V8 (all distinctive-header multi-line variants without doc grouping).

Mechanism, re-interpreted: what helps git's merge isn't unique-per-binding header lines (B) but TEXTUAL DISTANCE between cross-doc edits. C's repeating `[["doc"]]` sections put cross-doc inserts into physically separate regions of the file. B's `["doc"."target"]` headers are distinctive, but bindings still sort alphabetically and sit adjacent without grouping, so cross-doc edits are only one blank line apart, which is not enough for git's 3-line context window.

Revised TOML ranking:
- Best rate + std parser: C grouped (25.0% / ~1.67x bytes / fast serde)
- Smallest bytes: B nested (27.6% / 1.17x / tied fastest serde)
- Reference baseline: A flat (25.2% / 2.06x / most familiar shape)

Combined with the serde benchmark (prior commit), C is the best-performing standard format on merge rate; its costs are byte size and serialize memory, where it trails B (1.75x vs. 1.37x V0 serialize peak). Worth a closer look in DRIFT-mlrbxsbz.
Author comment: Closing as draft (see PR description). Investigation harness, not for merge.
This PR is for reference, not for merging. It captures the investigation that led to PR #30 (the TOML lockfile format).
What's in this branch
Eight commits. Apart from the minish build wiring and the lockfile fix, the code lives under test/property/, and the format experiments are gated on `-Dformat-experiment=true` so they don't slow the normal test suite:

- `feat(tests): adopt minish for property-based testing`: wires minish as a test-only build dep. Seed defaults to the first 16 hex chars of git HEAD; `-Dminish-seed=<u64>` for reproducibility.
- `fix(lockfile): canonicalize metadata field order on serialize`: the six-line fix that sorts metadata by key on write. Removes a confound from the measurements that follow.
- `test(lockfile): property 1 — serialize is invariant under binding+field reorder`: 200 iterations, asserts byte-identical serialize output across binding+field reorders.
- `test(lockfile): property 2 — disjoint edits merge cleanly or conflict-but-never-corrupt`: randomized merge oracle on disjoint edit scripts via `git merge-file`. Measured a ~40% spurious-conflict rate on the original format, with 0 hard failures across 400 trials.
- `test(experiment): measure spurious-conflict rate across lockfile format variants`: V0–V3 baseline + multi-line variants.
- `test(experiment): add 5 more format variants + sample dump`: V4–V8 (TOML flat, YAML nested, HR-separator, aligned columns, INI blocks) + a sample-output dump for visual comparison.
- `test(experiment): TOML variant serde benchmark (A flat, B nested, C grouped)`: a focused serde benchmark (200 bindings × 200 iterations) measuring serialize/parse latency and peak memory across the three TOML arrangements.
- `test(experiment): add V9 (TOML nested) + V10 (TOML grouped) to merge oracle`: completes the TOML variant comparison; the data showed grouped narrowly winning on merge rate (~25.0% vs. ~25.2% for flat), but the gap is within noise.

Findings
How to run
```
zig build test -Dformat-experiment=true
```
Adds ~5s to the test suite. Off by default.
Why this PR is closed/draft
The experimental measurement harness isn't intended to ship. The findings informed PR #30 (the actual format change). This branch is preserved for reference and for anyone who wants to re-run the experiments.