Typing/checker by Felipe705x · Pull Request #1 · pleiad/frogql

Felipe705x · 2026-04-27T00:02:44Z

DO NOT MERGE -- DIFF PREVIEW ONLY

Step 0 of the fppc → gqlite typechecker port: AST and lattice mappings, PathType option-1 decision, and a reconciliation table specifying which gqlite lattice ops need to change to match fppc's semantics in Step 1. Per user override, fppc is the authority on type-checking semantics: where a gqlite op differs from fppc on a case fppc covers, gqlite is updated to match fppc. Cases gqlite covers but fppc does not (Top, Empty, Neg, F, Record, Group, EdgeNonDirectional, EdgeAnyDirection) stay as-is.

Per Step 1 of the migration plan: where a gqlite lattice op diverged from fppc on a case fppc covers, gqlite is updated to match fppc. - simple_type.rs: SimpleType::is_subtype — flip (_, Zero) => true to (Zero, _) => true. Bottom is subtype of everything, not vice versa. (fppc B2) - label_type.rs: LabelType::is_subtype — add the missing (Or, _) arm. Uses || (matches fppc's preserved-by-design behavior). - variable_type.rs: VariableType::is_subtype — add (Zero, _), (Group, Group), (Union, _) and (_, Union) arms. (fppc B5 plus the missing Union and Group covariance arms.) - variable_type.rs: VariableType::meet — replace the Union-LHS manual Union(r1, r2) construction with VariableType::join, so equal sides dedupe and Zero is absorbed (matches fppc's join-based combination). - variable_type.rs: add VariableType::refine_to_nodes helper, used by PathType::meet (ported in Step 3). - variable_type.rs: derive Debug+Clone on Schema so the Typechecker can own it by value. Cases gqlite covers but fppc does not (LabelType::Top, Empty, Neg, SimpleType::F, Record, Group, VariableType::Group, EdgeNonDirectional, EdgeAnyDirection) are gqlite-only and stay as-is. All 62 lib tests and 18 integration tests still pass.

A HashMap<String, VariableType> binding map plus pointwise union/meet and a to_group helper for repeated patterns. Adaptations from fppc: - gqlite uses raw String for variable names, not fppc's Var wrapper, so set/get take &str directly (no set_var/get_var siblings). - fppc's to_list becomes to_group: gqlite uses VariableType::Group for repetition grouping and reserves List for user-facing list values. - fppc's meet returns Result<_, String> because its VariableType::meet is fallible. gqlite's VariableType::meet returns Zero on shape mismatch; we surface that as Err here when the meet collapses to bottom on inputs that weren't already empty, preserving fppc's observable check-level behavior.

Step 3 of the typechecker migration. PathType (src/typing/path_type.rs) is ported as a parallel module per option 1 of the migration plan. Mirrors fppc's structure: Node / Edge / Union / Zero with meet, union, len, is_empty, is_unsatisfiable. The EdgeDir enum mirrors fppc's ast::EdgeDirection (gqlite has no AST counterpart since direction is encoded in the PathPattern variant). Adaptations from fppc: - NodePathType wraps DescriptorType directly (gqlite has no NodeType wrapper struct; VariableType::Node carries a DescriptorType). - EdgePathType keeps fppc's shape: { p1: Box<PathType>, n2: NodePathType }. The edge descriptor is held by the variable's VariableType binding, not by the path shape. - VariableType::Group (gqlite-only repetition grouping) maps to Zero in the path conversion, mirroring fppc's treatment of List. Checker skeleton (src/typing/checker.rs) defines Typechecker / TypecheckResult and stubs check_query / check_pattern / check_path_pattern / check_expr / refine_pattern_node / refine_pattern_edge with todo!() bodies. pow_path_type is implemented since it has no dependencies. Module compiles against gqlite's types, which proves the Step 0 mapping work.

Step 4 of the typechecker migration. The expression checker reuses gqlite's BinOp::delta and UnOp::delta, so no operator signature table is duplicated. Cases: - Const(Value): map literal to SimpleType (Int→Z, Float→F, Str→S, Bool→B). List and Record literals get loose types — fppc has no literal lists, and full element/field typing is a phase-1 punt recorded in the migration doc. - Type(t): return t directly. - AttrLookup { var, attr }: look up var, call get_attribute. Warn on unbound variable, missing attribute, or empty type. - FieldAccess { base, field }: gqlite-only. Look up the field on Record base; pass through Star/Zero gradually; warn on non-record base. - Binop: special-case Is and As when rhs is a Type literal (Is→Bool, As→declared type). Otherwise call BinOp::delta and verify both operands are compatible via SimpleType::meet. - Unop: same pattern with UnOp::delta. is_empty / != Zero is the compatibility test, mirroring fppc; no gradual_eq is reintroduced (it was removed in fppc HEAD).

Step 5 of the typechecker migration. refine_pattern_node builds a VariableType::Node from the descriptor (star descriptor for anonymous nodes) and runs VariableType::refine against the schema. refine_pattern_edge does the same, picking EdgeDirectional or EdgeNonDirectional from the EdgeDir. Neither pattern-matches on LabelType::{Top, Empty, Neg} or PropertyType::{Open, Closed} directly — refinement defers to the lattice's existing meet / is_subtype which already handle them. Adds an assert_filters_drained sanity check: elaboration is supposed to hoist Descriptor::value_filters into Filter nodes before typecheck runs. If we see leftovers we surface a typechecker error so an elaboration bug cannot silently drop constraints.

Step 6 of the typechecker migration — the load-bearing function. Each PathPattern variant produces a (PathType, TypeEnvironment) pair: - Node: refine descriptor against schema, wrap as PathType::Node, bind to env if a variable name is present. - EdgeRight / EdgeLeft / EdgeUndirected / EdgeAnyDirection: fan out fppc's single Edge { direction } arm. Right/Left/Any produce EdgeDirectional; Undirected produces EdgeNonDirectional. The PathType conversion uses EdgeDir to decide forward / reversed / union shape. - Concat / Join: meet environments (with schema refinement) and meet path shapes. Join is gqlite-only and reduces to Concat for typing purposes — both compose two patterns over shared variables; the endpoint-glued vs not distinction is a runtime concern. Documented in the migration doc; flagged for review if Join semantics ever diverge. - Filter: type-check the expression against Bool; warn if not boolean. - Union: union of paths, union of environments. - Repeat / Questioned: pow_path_type on the path, env wrapped in Group (gqlite's name for what fppc calls List). lb capped at 3 to avoid exponential blowup, mirroring fppc. create_context handles None descriptors (anonymous patterns) by returning an empty environment.

Step 7 of the typechecker migration. compile and compile_query now run typechecking between elaborate and optimizer. Typechecking uses the permissive Schema::star() and rejects the query if the checker reports errors (unbound variables, context meet failures, etc.). Sibling entry points compile_unchecked and compile_query_unchecked skip typechecking — same plan as gqlite/main produced before this branch landed. The optimizer is unchanged and works in both modes. Typechecker::check_query simply delegates to check_pattern: gqlite's elaborator folds WHERE clauses into PathPattern::Filter nodes inside the pattern, so there is no separate WHERE expression to check. RETURN-clause typing is out of scope for phase 1. All 62 lib tests and 18 integration tests still pass.

Step 8 of the typechecker migration. Four hand-written queries that exercise the on-by-default and opt-out paths: - A node pattern with property return. - A two-hop concat with a WHERE filter on a node property. - A three-node concat pattern. - An unbound-variable reference in WHERE — the checked path rejects it with a 'not found' error, the unchecked path accepts it and produces a plan, confirming the typechecker is the gatekeeper. For the three valid queries the test asserts that compile_query and compile_query_unchecked produce the same optimized plan: the checker adds verification, not transformation. Not a test suite — focused punch-list confirming the wiring works end-to-end against the existing parser/elaborator/optimizer.

Step 9 of the typechecker migration. Records: - Commit list and outcome vs acceptance criteria. - Lattice reconciliation outcomes with fppc citations and gqlite before/after for every touched op. - Five deviations from the Step-0 plan: - to_list -> to_group (gqlite's name for repetition grouping). - TypeEnvironment::meet still returns Result<_, String> by synthesizing Err from a VariableType::meet that collapses to Zero; preserves fppc's check-level observable behavior. - EdgeDir as a typechecker-local enum in path_type.rs (gqlite has no AST EdgeDirection); From<(VariableType, ...)> became PathType::from_variable. - Join treated as Concat for typing — verified: comma-join shares variables the same way Concat does. - Schema needed no new constructor (public fields suffice). - Punts (FieldAccess on non-records, list/record literal typing, RETURN typing, schema sourcing, NodeId side-table). - Two architectural follow-ups option 1 defers (parallel typing modules; per-segment clones at Concat/Union). - Three findings noted for the next phase to consider, none requiring action now.

Closes the validation gap noted in the Step 9 final report. Each test in tests/typecheck_fppc_parity.rs mirrors a fppc typechecker test line-for-line and asserts the same ok/empty/warning predicates. Surface-syntax differences ({a: int} -> {a is int}, bare -> -> ()-[]->()) are adapted; the behavior under test is unchanged. All 45 translated tests pass on first run, with no implementation changes needed. The skipped test (test_incompatible_records) uses double-brace nested record syntax that gqlite's parser doesn't accept; the underlying lattice case is covered by other tests. This moves the equivalence claim against fppc from 'I built it that way' to 'verified by direct test diff'. The remaining cross-direction edge meet error path (fppc Err vs gqlite Zero) is unexercised by fppc's tests and doesn't affect observable typecheck verdicts. Migration doc final report updated.

Two cargo examples for hands-on use: - examples/typecheck_demo.rs: ad-hoc CLI that takes a query as argv, prints the type-check result (ok/empty/path/env/errors/warnings). Useful for poking at individual queries: cargo run --release --example typecheck_demo -- '(x: {a is int})(x: {a is bool})' -> empty=true, prop 'a' collapses to Zero (int ⊓ bool). - examples/typecheck_repl_smoke.rs: end-to-end smoke against examples/movies.gdb. Runs four representative queries through compile_query (typecheck on) and confirms: * the checked path accepts each query, * the unchecked path produces the same plan, * the runtime returns expected row counts (38 movies, 133 people, ...), * an unbound-variable WHERE is rejected by the checker only. All five cases pass. Migration doc updated with the completed test-plan items: - bench_test failure mode confirmed identical to main (pre-existing, unrelated to this branch) - REPL-level smoke against movies.gdb passes Removed unused TypecheckResult::failed associated function (dead code warning).

Two fppc-parity fixes in the typing lattice, both narrowed to behavior on variants both repos share: - LabelType::is_subtype: swap (And, _) and (_, Or) arm order so decomposition matches fppc. The two orderings are sound but pick different short-circuit paths on (And, Or) inputs and disagree on nested cases (e.g. And(A, Or(B,C)) vs Or(D, And(E,F))). - PropertyType::is_subtype: add (Zero, _) => true as the first arm. Without it, the lower-bound law fails when meet collapses to Zero on key-mismatched Closed maps (regression seed in fppc: Closed({"x": Star}) vs Closed({})). Verified against ported fppc property tests + regression seeds and the existing typecheck_fppc_parity suite.

In gqlite there's no parity-vs-non-parity distinction — these are just the typechecker tests, mirroring the gqlite-local *_test.rs convention (parser_test.rs, runtime_test.rs, etc). The file's header comment is updated to record where the cases came from (line-for-line translation of fppc/src/typechecker/checker.rs's inline #[cfg(test)] mod tests, which is the canonical typechecker reference suite) without framing them as a separate 'parity' suite. No semantic changes; all 45 tests still pass.

Format the files on main that the fmt-CI PR (#2) didn't reach. Those files came in via PR #1 (typing/checker) and either weren't part of PR #2's diff or were touched in a way that didn't normalize the formatting. Running 'cargo fmt --all' on main produces no further changes after this lands. Files touched: examples/typecheck_demo.rs examples/typecheck_repl_smoke.rs src/lib.rs src/typing/checker.rs src/typing/mod.rs src/typing/path_type.rs src/typing/type_environment.rs tests/typecheck_smoke.rs tests/typecheck_test.rs No semantic changes. cargo build + cargo test --lib both clean.

Now that the LDBC loader exposes `id` as a queryable property (parent commit), the IC2 query can anchor by `Person.id` directly — same as the spec. Changes: - `WHERE p.firstName = ... AND p.lastName = ...` → `WHERE p.id = ...` - `RETURN ...firstName, ...lastName, c.content, c.creationDate` → `RETURN friend.id, ...firstName, ...lastName, c.id, c.content, c.creationDate` (matches the spec's RETURN modulo `coalesce`). - `PARAMS` is now five real `Person.id` values from SF0.1 instead of `(firstName, lastName)` workaround pairs. Doc reflects the upgraded fidelity: replaces the old "divergences fixed" section with a 9-row honest divergence table covering every surface-syntax-or-semantic difference. Net divergences from the spec are reduced to two with real semantic effect: - **#6 (`coalesce`)**: gqlite has no `coalesce` builtin. Posts with empty `content` but non-empty `imageFile` return blank rather than the imageFile path. Few rows in SF0.1. - **#7 (no ORDER BY)**: gqlite parser doesn't have ORDER BY yet. Affects *which* 20 rows you see under LIMIT, not wall time. All other table rows (#1, #2, #3, #4, #5, #8, #9) are surface- syntax or naming differences with no semantic divergence — same query, just spelled to fit gqlite's grammar / data conventions. ### Result snapshot (SF0.1, single-iter) | Person.id | resolves to | rows | wall time | |-----------------|----------------|------|-----------| | 933 | Mahinda Perera | 20 | 108.5 s | | 1129 | Carmen Lepland | 20 | 100.6 s | | 8796093023296 | Hồ Chí Loan | 20 | 97.4 s | | 21990232555524 | Bryn Davies | 20 | 99.2 s | | 32985348833865 | Cheng Yu | 20 | 107.5 s | Wall times are similar to the previous (less faithful) bench. The optimizer gap (no value-equality predicate pushdown) still dominates; documented in "Things this surfaces". Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Sweep of LDBC divergences found a bigger one I'd missed: we don't use the LDBC SNB Driver at all. The driver defines the audited measurement regime — workload at target throughput, mix of IC+IS reads and IU updates, multi-minute warmup, randomized param order, audit JSON output, throughput-at-SLA + p99 latency reporting. We have none of that. The bench is a hand-rolled harness: direct Runtime::run_query calls, IC2 only, sequential, no warmup, CSV output, min/median/mean/max wall times. The query and parameter set are still spec-faithful (rows #1-#5, #8 in the audit); the rows returned would match an LDBC-conformant run modulo #6 (no coalesce) and #7 (no ORDER BY). What's NOT spec-faithful is the measurement regime itself — the numbers we report are not directly comparable to throughput numbers in published LDBC audit submissions. Adding this as audit row #9 plus an expanded "Methodology / reporting divergence" section listing the 9 specific differences (harness, workload mix, concurrency, warmup, param order, headline metric, output format, update streams, result validation). Strict description updated: "IC2 with no coalesce, no ordering, and a hand-rolled (non-driver) harness reporting wall-time stats instead of throughput-at-SLA." Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Two changes from the post-spec-fetch review: 1. Switch IC2 query to use `(p: Person {id: $personId})` shorthand, which is verbatim spec syntax. gqlite's parser accepts it and elaborates to `(p: Person) WHERE p.id = $personId` internally; identical post-elab behavior. Closes audit-row #1 entirely — that row was a self-imposed stylistic divergence, not a gap. 2. Add "Paper-tier disclosure" section answering all 10 items from `benchmark-checklist.tex` directly. Each answer is one line, no hedging. Replaces what would otherwise be scattered across the doc: - Cross-validation: No (with reason: coalesce + ORDER BY gaps) - Persistent storage: Yes - ACID: No - Fault-tolerance: No - Warmup rounds: 0 - Execution rounds: 3 per param × 15 params - Summarization: per-param med + across-param med headline - Load phase included: No (separate) - SUT-dev consulted: gqlite is our own system The maxDate predicate stays in WHERE since `<=` isn't equality; the descriptor shorthand only handles equality. That's a spec- faithful split (the IC2 spec has the `id` filter inside the descriptor and the date filter in WHERE, exactly as we now do). Results table is being re-run with --iters 3 (was 1) for stable medians; lands in a follow-up commit. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* loader(ldbc): expose `id` as a queryable property on every node Previously `load_ldbc_node_file` consumed the LDBC `id` column for the internal node name only, excluding it from the property map. That made LDBC's spec queries un-expressible: every IC1-IC14 query in LDBC SNB Interactive parameterizes by `Person.id`, `Comment.id`, etc. via `{id: $param}` shorthand or `WHERE x.id = $param`. Without `id` as a property, those queries return NULL and the bench has to use surrogate anchors (firstName, lastName) — a real divergence from the spec. The fix: after the existing prop_cols loop, insert `props["id"] = parse_ldbc_value(&id_str)`. The id is also still folded into the internal node name (`"Person:933"`) for cross-file edge resolution; the duplication is ~12 bytes per node (~4 MB extra on SF0.1), required for spec-faithful queries. Why LDBC-only and not the spanner-config CSV loader: that loader follows different conventions (`vid`, `<Label>_id`, etc., per CLAUDE.md) and isn't tied to a workload that requires id-as-property. LDBC is the immediate driver; the spanner loader can argue for the same treatment separately if needed. Verified end-to-end on SF0.1: $ MATCH (p: Person) WHERE p.id = 933 RETURN p.firstName, p.lastName "Mahinda" | "Perera" 1 rows (0.026s) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * bench: add LDBC SNB Interactive IC2 benchmark First slice of the LDBC Social Network Benchmark v1 (arXiv:2001.02299 §6, Interactive Complex reads) — IC2 only ("recent messages by friends"). Other ICs need features gqlite doesn't have: shortest / variable-length paths (IC1, IC3, IC11, IC12, IC14), date arithmetic / OPTIONAL MATCH / ORDER BY / aggregation-with-HAVING (IC4-IC8, IC10, IC13). `src/bin/ldbc_bench.rs` runs IC2 against an SF0.1 .gdb, sweeping a small curated set of `Person.firstName` values (LDBC's `Person.id` is folded into the gqlite node name, not addressable as a property). Reports per-param min / median / mean / max wall time. Typechecker is off — IC queries are valid by construction; comparing typecheck overhead vs runtime is a separate bench. `bench/LDBC_BENCHMARK.md` documents setup (download, decompress, import) and includes baseline numbers from the first run. ### Result snapshot (SF0.1, 327k nodes / 1.48M edges, limit=20) | Param | min | med | mean | max | |----------|-------|-------|-------|--------| | Mahinda | 52 s | 56 s | 55 s | 57 s | | Carmen | 38 s | 50 s | 65 s | 106 s | | Bryn | 84 s | 85 s | 86 s | 88 s | | Cheng | 87 s | 95 s | 94 s | 99 s | These are 50-130x slower than LDBC's sub-second target. The benchmark surfaces the cause: the optimizer's predicate pushdown handles `x.attr is type` (per `docs/implemented-optimizations.md`) but not value-equality predicates like `WHERE p.firstName = 'Mahinda'`. So the firstName filter applies *after* the join enumerates every `Person~knows~Person~hasCreator~Comment` triple — work scales with the cartesian intermediate, not the parameter's selectivity. Documented as a finding for a future optimizer pass — pushing `=` constants into descriptors would collapse this into a 1-2-row anchor on Person, dropping the join's left side from ~28k to a handful. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * bench(ldbc): close 4 IC2 fidelity divergences vs LDBC spec The earlier bench had five divergences from LDBC IC2; four are now fixed without touching gqlite implementation. Only ORDER BY remains (parser feature gap). Fixed: - **Anchor selectivity.** `Person.firstName` → `(firstName, lastName)`. Five params each map to *exactly one* Person in SF0.1 — same per- query selectivity as the spec's `Person.id`, just spelled differently because the gqlite LDBC loader folds `id` into the internal node name rather than a property. - **Message = Comment ∪ Post.** LDBC's `(message:Message)` covers both via Cypher label inheritance, which gqlite doesn't have. Use path-pattern union: `<-[:hasCreator]-(c:Comment) | <-[:hasCreator]- (c:Post)`. `c` binds to whichever arm matches. - **`message.creationDate <= $maxDate` filter.** Was dropped entirely. Now in the WHERE clause; `maxDate` is a const (1340000000000 ms ≈ mid-2012) chosen so the filter retains a sizable fraction of SF0.1 data while still exercising. - **RETURN columns.** Added `friend.firstName`, `friend.lastName`, `c.content` alongside the existing `c.creationDate`. The two `id` columns the spec wants (`friend.id`, `message.id`) remain dropped because they're not addressable as properties; documented. Remaining divergence (out of scope without parser changes): - **No `ORDER BY ... DESC`** — gqlite's parser doesn't have it. Affects which 20 rows you get under `LIMIT 20`, not wall time. Updated `bench/LDBC_BENCHMARK.md` to spell out the spec query, the gqlite query, and the divergence list explicitly. ### Result snapshot (SF0.1, single-iter) | Param | rows | wall time | |-------------------|------|-----------| | Mahinda Perera | 20 | 96.4 s | | Carmen Lepland | 20 | 108.1 s | | Bryn Davies | 20 | 85.0 s | | Cheng Yu | 20 | 105.6 s | | Hồ Chí Loan | 20 | 110.7 s | Wall times are slightly higher than the previous (less faithful) bench because the union arm doubles the right side of the join (~286k hasCreator edges vs ~151k Comment-only). The dominant cost remains the optimizer's missing predicate pushdown on value-equality WHERE; documented in "Things this surfaces". Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * bench(ldbc): use Person.id anchor (full IC2 fidelity) Now that the LDBC loader exposes `id` as a queryable property (parent commit), the IC2 query can anchor by `Person.id` directly — same as the spec. Changes: - `WHERE p.firstName = ... AND p.lastName = ...` → `WHERE p.id = ...` - `RETURN ...firstName, ...lastName, c.content, c.creationDate` → `RETURN friend.id, ...firstName, ...lastName, c.id, c.content, c.creationDate` (matches the spec's RETURN modulo `coalesce`). - `PARAMS` is now five real `Person.id` values from SF0.1 instead of `(firstName, lastName)` workaround pairs. Doc reflects the upgraded fidelity: replaces the old "divergences fixed" section with a 9-row honest divergence table covering every surface-syntax-or-semantic difference. Net divergences from the spec are reduced to two with real semantic effect: - **#6 (`coalesce`)**: gqlite has no `coalesce` builtin. Posts with empty `content` but non-empty `imageFile` return blank rather than the imageFile path. Few rows in SF0.1. - **#7 (no ORDER BY)**: gqlite parser doesn't have ORDER BY yet. Affects *which* 20 rows you see under LIMIT, not wall time. All other table rows (#1, #2, #3, #4, #5, #8, #9) are surface- syntax or naming differences with no semantic divergence — same query, just spelled to fit gqlite's grammar / data conventions. ### Result snapshot (SF0.1, single-iter) | Person.id | resolves to | rows | wall time | |-----------------|----------------|------|-----------| | 933 | Mahinda Perera | 20 | 108.5 s | | 1129 | Carmen Lepland | 20 | 100.6 s | | 8796093023296 | Hồ Chí Loan | 20 | 97.4 s | | 21990232555524 | Bryn Davies | 20 | 99.2 s | | 32985348833865 | Cheng Yu | 20 | 107.5 s | Wall times are similar to the previous (less faithful) bench. The optimizer gap (no value-equality predicate pushdown) still dominates; documented in "Things this surfaces". Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * bench(ldbc): document parameter-generator divergence Previous audit listed personId and maxDate substitutions as if they were independent surface differences. They aren't — they're symptoms of a single methodology divergence: we didn't run LDBC's substitution_parameters generator (ldbc_snb_datagen) and instead hand-picked params by inspection. LDBC's spec example values are SF-specific by design (the id space is bit-packed per SF and event timelines stretch with SF), so spec values from one SF are not expected to work against another. The canonical workflow is to run the generator against the chosen SF and consume its interactive_<n>.txt output. This doesn't affect the bench's timing claim — the join work is the same for any valid Person.id anchor and any maxDate that doesn't cut all rows — but it's a procedural deviation from LDBC methodology that wasn't called out clearly. Adding it as row #10 in the audit table, reframing #8 and #9 as derived from #10, and updating the bench's rustdoc comment to match. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * bench(ldbc): use official LDBC interactive_2 substitution params Closes the methodology divergence flagged in the previous commit. LDBC ships substitution parameters as a separate archive (substitution_parameters-sf0.1.tar.zst on datasets.ldbcouncil.org) because our SF0.1 distribution was generated with parametergenerator.parameters:false. Downloading that archive gives us interactive_2_param.txt: 15 (personId, maxDate) pairs from LDBC's official parameter generator, matching our SF0.1 build because the generator is deterministic on the dataset seed (verified — all 15 personIds resolve in person_0_0.csv). Changes: - src/bin/ldbc_bench.rs: PARAMS replaced with the 15 official pairs verbatim. Each param now carries its own maxDate (the generator's output is per-param), so the global MAX_DATE_MS constant goes away. CSV stdout grows a `max_date` column. Names alongside each ID are resolved from person_0_0.csv only for human-readable output. - bench/LDBC_BENCHMARK.md: audit table collapses #8/#9/#10 into a single row #8 ("uses official LDBC params, no semantic divergence"). Setup grows a §1b for the substitution-params download. Summary bullets updated. Strict description is now "IC2 with no coalesce and no ordering" — the parameter-source caveat is gone. The Results table is currently a re-run note pointing at the old hand-picked numbers as a reference; the official-param run is in progress (one param in: 101s, 20 rows — same order of magnitude as the hand-picked-param run, as expected since the bench's bottleneck is optimizer-bound rather than parameter-bound). A follow-up commit will land the fresh numbers. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * bench(ldbc): document driver / reporting methodology divergence Sweep of LDBC divergences found a bigger one I'd missed: we don't use the LDBC SNB Driver at all. The driver defines the audited measurement regime — workload at target throughput, mix of IC+IS reads and IU updates, multi-minute warmup, randomized param order, audit JSON output, throughput-at-SLA + p99 latency reporting. We have none of that. The bench is a hand-rolled harness: direct Runtime::run_query calls, IC2 only, sequential, no warmup, CSV output, min/median/mean/max wall times. The query and parameter set are still spec-faithful (rows #1-#5, #8 in the audit); the rows returned would match an LDBC-conformant run modulo #6 (no coalesce) and #7 (no ORDER BY). What's NOT spec-faithful is the measurement regime itself — the numbers we report are not directly comparable to throughput numbers in published LDBC audit submissions. Adding this as audit row #9 plus an expanded "Methodology / reporting divergence" section listing the 9 specific differences (harness, workload mix, concurrency, warmup, param order, headline metric, output format, update streams, result validation). Strict description updated: "IC2 with no coalesce, no ordering, and a hand-rolled (non-driver) harness reporting wall-time stats instead of throughput-at-SLA." Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * bench(ldbc): land Results table from official-param re-run Full IC2 sweep against ldbc-sf0.1.gdb with the 15 official LDBC substitution parameters. 1 iter per param, ~20 min total wall. Stats across all 15 params: - min 52.8 s - med 79.7 s - mean 83.0 s - max 146.9 s Same regime as the previous hand-picked-param run (97-108 s for 5 params), confirming the bottleneck is gqlite's optimizer (no predicate pushdown for value equality on `=`, see "Things this surfaces") rather than parameter selectivity. The 3× spread between min and max reflects caller-degree variation plus first-iter page-cache effects, not bench noise. All 15 params returned 20 rows (the LIMIT). Far above LDBC's sub-second interactive target — the optimizer gap is the unblocking work for any future driver-mediated audit. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * bench(ldbc): note bench/impl scope split + no result validation Two clarifications surfaced in review: 1. The optimizer pushdown gap that makes IC2 take ~100s is a real finding but is NOT something this branch will fix. bench/ldbc is bench-only — no changes to src/* implementation. Adding a pointer to which files would need to change (pushdown.rs + descriptor.rs + engine.rs) so a future impl PR has a starting point. 2. We don't validate row contents against LDBC's expected_output archives. Two divergences make any byte-for-byte check fail: #7 (no ORDER BY) means our 20 rows aren't the 20 most recent, and #6 (no coalesce) means imageFile-only Posts show blank. Either alone invalidates a diff -q check. The bench asserts row_count == LIMIT, not row contents. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * bench(ldbc): drop misleading min/med/mean/max labels at N=1 Sweep-review item: with --iters 1, the bench's report() printed "min=X med=X mean=X max=X" with all four columns showing the same number, suggesting statistical content where there is none. Now: N=1 prints "wall=X.XXms (n=1, --iters >=3 recommended for stable median)". N>=2 keeps the full spread display with an explicit n=N. Bench-side fix only; doesn't touch gqlite impl. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * bench(ldbc): add SPEC_CHECKLIST.md — verbatim quotes from LDBC spec Source-of-truth doc that maps each LDBC spec requirement to whether it applies to research-paper use vs audited submissions, and what our bench currently does. All quotes verbatim from github.com/ldbc/ldbc_snb_docs (main branch), with file citations. Headline finding: the spec itself sanctions a two-tier model. benchmark-checklist.tex says "For most research papers, fully audited results are unrealistic and even unaudited results can provide insight" and lists exactly what info to disclose for research use. We're firmly in that tier and the spec says that's fine. Practical use: when someone challenges "is this a real LDBC bench?" the answer is now "yes, at the research-paper tier the spec defines, with these specific gaps documented." Each gap (validation, ACID, driver, run duration, scale) carries a verbatim quote and a reason it's audit-only or out-of-scope here. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * bench(ldbc): IC2 syntax fidelity + paper-tier disclosure Two changes from the post-spec-fetch review: 1. Switch IC2 query to use `(p: Person {id: $personId})` shorthand, which is verbatim spec syntax. gqlite's parser accepts it and elaborates to `(p: Person) WHERE p.id = $personId` internally; identical post-elab behavior. Closes audit-row #1 entirely — that row was a self-imposed stylistic divergence, not a gap. 2. Add "Paper-tier disclosure" section answering all 10 items from `benchmark-checklist.tex` directly. Each answer is one line, no hedging. Replaces what would otherwise be scattered across the doc: - Cross-validation: No (with reason: coalesce + ORDER BY gaps) - Persistent storage: Yes - ACID: No - Fault-tolerance: No - Warmup rounds: 0 - Execution rounds: 3 per param × 15 params - Summarization: per-param med + across-param med headline - Load phase included: No (separate) - SUT-dev consulted: gqlite is our own system The maxDate predicate stays in WHERE since `<=` isn't equality; the descriptor shorthand only handles equality. That's a spec- faithful split (the IC2 spec has the `id` filter inside the descriptor and the date filter in WHERE, exactly as we now do). Results table is being re-run with --iters 3 (was 1) for stable medians; lands in a follow-up commit. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * bench(ldbc): --backend memory|lazy|disk + RSS measurement (sysinfo) Per professor's request: compare gqlite's three GraphAccess backends on the same IC2 query so the paper can show the speed-vs-RAM tradeoff. - Memory (Graph from CSV): everything in RAM, no disk I/O at query time. Uses csv_loader::load_from_ldbc_csv_dir. - Lazy (LazyGraphStore): topology + label index in RAM, props via per-process LRU page cache + OS page cache. - Disk (DiskGraphStore): topology in RAM, everything else from disk per access (via OS page cache). Bench grows --backend and --csv-dir flags. Memory backend takes the CSV directory directly; Lazy/Disk take the .gdb path. Each run reports both per-param wall time and peak RSS delta over baseline. sysinfo crate (0.30) added for cross-platform RSS reading. Used only by ldbc_bench. RSS is OS-reported working-set; on Windows that includes mapped file pages, so Lazy/Disk numbers reflect both the in-process structures AND any portion of the .gdb that the OS has resident — disclosed in the doc. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * bench(ldbc): land 3-backend Results table + cache-cost note Memory vs Lazy vs Disk on SF0.1 IC2, per professor's request. Median per-param: memory 4.65s, disk 7.71s, lazy 8.42s. Memory wins on speed by ~2x, costs +605 MiB; Disk uses least RAM and loads fastest but no in-process caching. Three real findings, all worth flagging in the paper: 1. Memory beats Lazy/Disk by ~2x. Page-fault-free queries. 2. Lazy and Disk are within 10% on speed despite Lazy having an in-process LRU cache. The OS file cache dominates; Lazy's process-level cache adds bookkeeping without much benefit. 3. Cold-cache vs warm-cache cost differs by ~10x. The very first run after machine restart shows 53-147s per param (median 80s); subsequent runs benefit from OS-cached .gdb pages and drop to 7-10s per param. Documented as context for the warm-cache table; cold-cache numbers preserved for paper use. Caveat documented: sysinfo's RSS reading on Windows includes mmap'd file pages, so Lazy/Disk RSS deltas are confounded by which portion of the .gdb the OS has resident. Memory's +605 MiB delta is purely allocated heap and is the most reliable absolute number in the table. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * bench(ldbc): per-IC TOML query catalog + dispatching runner PR #1 of the LDBC runner refactor (per the planning doc). Architecture: - bench/ldbc-queries/ic<n>.toml — one file per IC (1-14). IC2 carries query template + params_file ref + divergences; the other 13 carry status="blocked" + required_features + blocked_reason. - src/bin/ldbc_bench.rs rewritten to load TOML files, parse the LDBC param file, and substitute {paramName} placeholders in the query template. No more hardcoded PARAMS / query string. CLI: --ic <n>|N,M|all|blocked which IC(s) to run; default 2 --backend memory|lazy|disk default lazy --params-dir <dir> default bench/data/substitution_parameters-sf0.1/... --queries-dir <dir> default bench/ldbc-queries --iters N default 3 --limit N default 20 --csv-dir <dir> required only with --backend memory `--ic blocked` prints the inventory of blocked ICs with their required_features (no bench run). Useful as a status report: "what would we need to add to gqlite to run these?" New deps: toml = "0.8" (small, well-maintained; multiline strings in TOML are nicer than JSON for query templates). Per professor feedback: Memory backend stays available in the bench binary for smoke-testing but is dropped from the headline Lazy-vs-Disk comparison in LDBC_BENCHMARK.md (Memory loads everything from CSV at every startup — unrealistic for a database product). Same numbers as the previous IC2-only bench; this commit is purely the architecture change. Adding a new IC (when blocking gqlite features land) = drop a TOML file, no bench code changes. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * bench(ldbc): trim docs — drop audit-tier discussion + non-divergences Per feedback: docs were too verbose with audit-tier content we don't care about and "divergences" that turned out to be no-ops. - Delete LDBC_SPEC_CHECKLIST.md entirely (314 lines of audit vs research-paper analysis with verbatim spec quotes — irrelevant since we're not auditing). - Drop the "Methodology / reporting divergence (#9, expanded)" section from LDBC_BENCHMARK.md (driver, throughput-at-SLA, ACID, audit JSON — all audit-tier). - Drop the "Paper-tier disclosure" section's 10-row spec-checklist table (mostly no/no/n.a. answers; not useful when we're not arguing for compliance). - Trim the divergence audit from 9 rows to 2: only the real divergences (#6 no coalesce, #7 no ORDER BY). The other 7 rows were "None — same as spec" or "fixed in earlier commit" and weren't divergences anyway. - Trim the RSS caveat and cold-cache notes to a few lines each. - Remove the long "out of scope" rationale block on the optimizer pushdown gap; reduced to a 4-line "Fix:" pointer to which src/* files would need to change. LDBC_BENCHMARK.md goes from 428 -> 324 lines. Combined with the deleted SPEC_CHECKLIST.md, total bench-doc reduction is ~420 lines. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * bench(ldbc): note LIMIT-keyword TODO in ic2.toml When gqlite's parser gains a LIMIT keyword, append `LIMIT 20` to the query template and drop --limit from the bench CLI. Until then the runtime row cap (Runtime::run_query's second arg, exposed as --limit) is the only way to enforce the spec's LIMIT 20. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * bench(ldbc): rewrite README to be how-to-run only Per feedback: the markdown was trying to do too much. - Divergences are already in each ic<n>.toml's [divergences] table - Results / findings / cold-warm analysis belong in the paper, not the bench README - The README's job is "how does someone reproduce this run" Cuts: - IC2 fidelity-vs-spec section (the 2-row divergence table is now redundant with ic2.toml; the spec quote is in the TOML's spec link) - Results section (Lazy-vs-Disk table, per-param times, RSS caveat, cold-cache note — all paper material) - "Things this surfaces" (optimizer pushdown finding — paper material) - "Architecture" section (TOML schema is self-documenting) What stays: - Status: 1 implemented, 13 catalogued, point at toml inventory - Setup: download dataset, download params, build .gdb - Run: invocation examples + flag table + output format - Add new IC: 2-step recipe for promoting a blocked TOML - Backends: one paragraph distinguishing the three options - See also: pointers to the toml files, runner source, and paper Goes from 324 to 133 lines (-258 +67). Combined with the earlier trim and the deleted SPEC_CHECKLIST.md, total reduction across the two LDBC bench docs is ~640 lines. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * bench(ldbc): bench_setup binary for one-command data prep Replaces the README's three manual setup steps (curl + python zstd + gqlite import) with a single binary that does all three. Idempotent: each step checks for its output and skips if present. Re-running bench_setup with everything in place is a no-op (just prints "[skip] ..." for each step). Pass --rebuild to force-rebuild the .gdb. Three deps added (only used by this binary): - ureq 2.12 (default-features = false, "tls"): small blocking HTTP client, ~one network call per archive - zstd 0.13: streaming decompressor for the .tar.zst archives - tar 0.4: streaming tar extractor The .gdb build step shells out to the existing `gqlite --import-ldbc-csv` binary rather than calling csv_loader directly, so the .gdb is produced by exactly the same code path users run from the REPL — no risk of bench_setup and gqlite drifting on the .gdb format. Output: data-dir: bench/data downloading https://datasets.../ldbc-sf0.1.tar.zst done: 17.5 MiB in 0.7s extracting ldbc-sf0.1.tar.zst -> ldbc-sf0.1 done in 0.3s downloading https://datasets.../substitution_parameters-sf0.1.tar.zst done: 198.4 KiB in 0.2s extracting substitution_parameters-sf0.1.tar.zst -> substitution_parameters-sf0.1 done in 0.0s building ldbc-sf0.1.gdb from .../social_network-sf0.1-CsvBasic-LongDateFormatter done: ldbc-sf0.1.gdb in 39.1s ✓ setup complete README's "Setup" section collapsed from 3 multi-block steps to a single command. Manual steps still recoverable from this commit's diff if anyone needs them. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * bench(ldbc): document output format + rename ic CSV column to params Two related fixes after a user pointed out the CSV format wasn't self-explanatory: 1. The third CSV column was named `ic` but actually carries the substitution-param values (the IC name is already in column 1). Renamed to `params`. Reading the column as `params` matches the stderr summary's parens, e.g. `(19791209300143|1354060800000)`. 2. README's "Output" section was 4 lines listing column names with no semantics. Expanded to ~50 lines covering: - Why two streams (stdout=data, stderr=chatter, Unix convention) - Column-by-column meaning of the CSV - What `(personId|maxDate)` and `(n=3)` in the stderr summary mean - Worked cross-check showing min/median/mean/max of three CSV rows reproduces one stderr summary line The CSV header rename is breaking-but-trivial: any existing scripts reading `ic` would now get `params` and need a one-character change. Since the bench has shipped maybe 2 days ago and nothing is pinned to it, this is the right time. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * bench(ldbc): rename param_idx -> row + done summary at end of run Two small UX fixes: 1. The CSV's `param_idx` column was the same number as the stderr summary's `row#N` label but with a different name. Renamed to `row` so the two streams use the same vocabulary. 2. After the last param of the last IC, the bench now prints: ✓ done — 1 IC(s) ran to completion IC2: 15 rows × 1 iter(s) = 15 runs; across-row median 11506.22ms (range 9109.93-16517.23ms) Peak RSS during query loop: 84.5 MiB (+75.4 MiB over baseline) Confirms the run finished cleanly (vs got killed mid-row) and gives the across-row median without grepping. Per-IC summary line so multi-IC runs (when more get implemented) compose cleanly. Implementation: `report` now returns the per-row median; `run_one_ic` collects them into an `IcSummary { ic_id, iters, row_medians_ns }` and `run_targets` builds the final aggregate from the Vec<IcSummary>. README's Output section updated for both changes (column rename in the CSV table; done-summary section added with worked example). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * bench(ldbc): pre-merge fixes from review (A1-A4 + warmup + --help) Five fixes from the pre-merge review on the runner: A1. Type-aware param substitution. New optional `param_types` array in the per-IC TOML; per-column tag is "int" (raw substitution, default) or "string" (wraps in single quotes, rejects values containing `'` since gqlite's lexer has no escape syntax). Unblocks future ICs with string params (firstName, country names, tag names) without putting quote-handling onto the TOML author. ic2.toml is unchanged — defaults to all-int. A2. Duplicate IC id detection in load_queries. After sort, scan adjacent for matching ids; fail fast naming both source files. Catches the footgun of copying ic2.toml without renumbering. A3. Median for even N. Previously returned sorted[n/2] (upper of two midpoints) for even-sized samples — neither the true median nor a labeled percentile. New median_of() helper does the standard average for even N. Doesn't affect --iters 3 (odd); fixes --iters 2/4/6. A4. Schema validation at load time. New IcQuery::validate() enforces that "implemented" carries query+params_file and "blocked" carries blocked_reason, plus param_columns/param_types length agreement. Failure now surfaces before any backend is opened — not 8 minutes into a multi-IC run. B4. --warmup N flag. Per-row warmup that runs N extra iters of the query before the timed ones, discarding their measurements. Banner now reads "{warmup}+{iters} iters/param (warmup+measured)" so the count is visible. Mirrors typecheck_bench's --warmup. Default 0 (preserves existing behavior). Plus a small ergonomic fix: - `ldbc_bench --help` (with no positional .gdb) now prints usage instead of trying to open "--help" as a database file. What was not fixed (deliberately, low impact for now): - B1 CSV ambiguity if param values contain `|` or `;`: real issue but no LDBC param file has these chars in our scope. - B2 peak RSS sampling between rows, not during query: watchdog thread isn't worth the complexity. - B3 mmap inflates Windows RSS: documented in the README; can't fix from the bench side. - B5 hardcoded SF0.1 in bench_setup: out of scope for now. - B6 no checksums on downloads: would need LDBC's published hashes; out of scope. - B7 --ic 2,2 doesn't dedup: not a bug, just undocumented; fine. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * bench(ldbc): second-pass review fixes — stale docs + warmup honesty Findings from a second-pass review of the runner + README: Stale doc references: - Rustdoc said `--show-blocked` mode (real flag is `--ic blocked`). - Usage text recommended `--warmup 1` to absorb cold-cache cost. Empirically wrong for our scope: looking at iter-level data from the SF0.1 iter=3 run, iter 0 is NOT systematically the slowest within a row — variance is general jitter, not first-iter cold cache. The OS page cache is cross-row, so once row#0's iters populate it, subsequent rows + iters all hit warm pages. Reframed: --warmup is available for parity with typecheck_bench and for larger SFs where the cache can't hold the working set, but at SF0.1 it's overhead. - Warmup loop's inline comment claimed "iter 0 cold-cache"; updated to match the empirical finding. Stale README: - The flag table was missing --warmup entirely (added in a previous commit but the doc table didn't get the row). - The stderr summary example showed `3 iters/param` but the new banner format is `0+3 iters/param (warmup+measured)`. - "Adding a new IC" recipe didn't mention `param_types` for ICs with string params (IC1 firstName, IC3/IC11 country names, IC6 tagName, etc.). Now mentions `param_types` and the gqlite no-escape-for-quotes caveat. What's NOT a stale-doc issue (verified during the pass): - All `# Spec: ...` URLs in TOMLs point to ldbc/ldbc_snb_docs main branch which is stable. - The `_ic` parameter in `report()` is used in the eprintln line. - Edge cases (--iters 0, --warmup with no iters, etc.) handled gracefully via the `is_empty()` skip in the final summary. - `param_columns` doesn't strictly need to match the param file's header — the runtime reads the file's own header. Documented. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * bench(ldbc): real fixes from a deeper review pass + 13 unit tests The previous "second-pass review" was thin (mostly stale-doc fixes). A proper adversarial pass surfaced four substantive issues: C1. Arg parsing crashed with index-out-of-bounds when a flag had no value (e.g. `ldbc_bench db.gdb --iters` with nothing after). New `need_value` helper checks bounds and prints a clear "{flag} requires a value" before exiting. C2. "All rows failed" was reported as `✓ done` with no per-IC line. A user could read this as "ran successfully, no rows" instead of "every row erred." IcSummary now tracks `failed_rows`; final summary prints `⚠ done with errors` when any IC had zero successful rows, and per-IC lines surface failure counts even when row_medians_ns is empty. C3. `substitute()` silently left unmatched `{...}` placeholders if the query template referenced a column not in the param file's header. The gqlite parser would then fail with "Parse error" — a misleading symptom for the actual problem (TOML/header mismatch). New `find_unsubstituted_placeholder` post-pass scan catches bare `{ident}` tokens — careful to skip the legitimate `{id: 933}` descriptor shorthand and `{1,3}` repetition syntax. Verified end-to-end: changing ic2.toml's query to reference a nonexistent {countryName} now produces: SUBSTITUTE ERROR on row 0 (params: ...): unsubstituted placeholder {countryName} remains after substitution; check that the query template's column names match the param file's header (personId, maxDate) C4. Zero unit tests for 700 lines of code. Added a tests module covering median_of (incl. the even-N regression A3 caught earlier), substitute (int/string/bad-type/quote-rejection/ unmatched-placeholder), and find_unsubstituted_placeholder (true positives + false-positive guards on descriptor shorthand and repetition quantifiers). 13 tests, all green. Plus reframed the warmup recommendation honestly. Earlier commits said "empirically not needed at SF0.1" based on 6 data points — too strong a claim from too little data. New text acknowledges OS page-cache mechanics (real, well-known) but is calibrated about what we've actually measured: "varies by OS / RAM / storage; on a warm machine with free RAM >= dataset size it usually doesn't help, on cold or cache-constrained machines it can absorb a cold-iter spike." Default stays 0; recommendation is "raise it if the per-row summary shows iter 0 consistently dominating." What's still NOT fixed (deliberately, low impact): C5 TOML schema versioning, C6 absolute-path footgun in `params_file`, C7 .gdb freshness vs loader version. Each is documentation territory or a system-wide gqlite concern. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * bench(ldbc): final pre-merge fixes — arg parsing + idempotency Three more issues found in the fourth review pass: E1. The --ic value parser still used .expect() (panic with raw Rust error) for non-numeric values like `--ic 2,abc`. Inconsistent with the C1 fix from the previous round. Now a clean error: "invalid --ic value \"2,abc\": invalid digit found in string. expected: N, N,M,K, all, or blocked". F1. bench_setup's arg parser had the same C1-style index-out-of-bounds risk (`bench_setup --data-dir` with no value would panic). Added the same `need_value` helper used in ldbc_bench. F2. bench_setup's idempotency check was "does the directory exist?". A partially-extracted directory (interrupted run, network drop mid-extract) would skip re-extraction and feed garbage data into the .gdb build. Now checks for a known sub-path that only exists post-successful-extract: - dataset: csv_dir/dynamic/ must be a directory - params: params_inner/interactive_2_param.txt must be a file Detected the partial-extract case in a synthetic test (empty csv_dir → bench_setup correctly tried to re-download instead of skipping). 13 unit tests still pass. End-to-end smoke tests pass. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * bench(ldbc): loop-review fixes — error handling, idempotency, lint, doc User asked for a loop-review until I find no issues. Multiple passes; this commit is everything that surfaced. Pass 1 — `.expect()` panics replaced with clean exits: - load_params on empty file (ldbc_bench) - fs::create_dir_all on the data-dir (bench_setup) - env::current_exe lookup (bench_setup) - fs::remove_file on --rebuild (bench_setup) Pass 2 — TOML schema strictness: - #[serde(deny_unknown_fields)] on IcQuery so a typo like `parms_file` produces an immediate parse error naming the unknown key, instead of silently leaving params_file=None and surfacing later as "missing `params_file`". Pass 3 — three more `.expect()` on backend dispatch: - csv_dir on memory backend → "--csv-dir is required..." - csv_loader::load_from_ldbc_csv_dir → propagates the io::Error - LazyGraphStore::open / DiskGraphStore::open → ditto Pass 4 — README's flag table had stale strong claim "--warmup empirically doesn't help at SF0.1" — softened to match the calibrated runner help text (varies by OS / RAM / storage). Pass 5 — `--iters 0` was silently allowed but produced no output: the bench would run warmup iters (if any), record zero measured iters, emit no CSV, and print "✓ done" with no per-IC line. Now rejected at arg parse with "--iters must be >= 1". Pass 8 — clippy warnings: - cmp_owned: `data_dir == PathBuf::from("...")` → `Path::new` - doc_overindented_list_items: re-indented two doc bullets Pass 9 — `--ic 99` (no match) printed "✓ done — 0 IC(s) ran to completion" which falsely implied success. Now prints "⚠ done — 0 IC(s) ran (none of the requested ICs were implemented in the catalog). Use --ic blocked to see the inventory." 13 unit tests still pass. cargo clippy clean. cargo fmt clean. End-to-end smoke (--ic blocked, --ic 2, --ic 99, --ic 0, --ic 2 with bad value) all produce clean error/output paths. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * bench(ldbc): rewrite stderr docs — full sample + RSS/baseline glossary Two issues a reader pointed out in the README's stderr section: 1. The example was incomplete — it skipped the setup lines (the "Loaded N IC definitions / RSS baseline / Loading .gdb / RSS after open" header) and the "✓ done" final summary that the bench actually emits. README only showed the per-row middle. Now shows a complete stderr trace from --ic 2 --iters 3, top to bottom, so readers can match what they see live to what's documented. 2. RSS and "baseline" were used in the example without definition, leaving readers (correctly) confused about what the numbers mean. New "What RSS and baseline mean" subsection: RSS is OS- reported RAM use; baseline is process startup; the "+X MiB over baseline" form makes per-step cost readable without manual subtraction. Worked example using the actual sample numbers. Per-row docs (min/med/mean/max meaning, sample-size note when n=1) unchanged — those were already correct, just buried under the incomplete example. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

Felipe705x added 14 commits April 26, 2026 19:02

matiastoro marked this pull request as ready for review April 27, 2026 15:33

matiastoro merged commit 1414bfb into main Apr 27, 2026

Felipe705x mentioned this pull request Apr 27, 2026

chore: cargo fmt --all #5

Merged

Felipe705x mentioned this pull request Apr 27, 2026

Typecheck/repl #3

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Typing/checker#1

Typing/checker#1
matiastoro merged 14 commits into
mainfrom
typing/checker

Felipe705x commented Apr 27, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

Felipe705x commented Apr 27, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants