Typing/checker#1
Merged
Merged
Conversation
Step 0 of the fppc → gqlite typechecker port: AST and lattice mappings, PathType option-1 decision, and a reconciliation table specifying which gqlite lattice ops need to change to match fppc's semantics in Step 1. Per user override, fppc is the authority on type-checking semantics: where a gqlite op differs from fppc on a case fppc covers, gqlite is updated to match fppc. Cases gqlite covers but fppc does not (Top, Empty, Neg, F, Record, Group, EdgeNonDirectional, EdgeAnyDirection) stay as-is.
Per Step 1 of the migration plan: where a gqlite lattice op diverged from fppc on a case fppc covers, gqlite is updated to match fppc. - simple_type.rs: SimpleType::is_subtype — flip (_, Zero) => true to (Zero, _) => true. Bottom is subtype of everything, not vice versa. (fppc B2) - label_type.rs: LabelType::is_subtype — add the missing (Or, _) arm. Uses || (matches fppc's preserved-by-design behavior). - variable_type.rs: VariableType::is_subtype — add (Zero, _), (Group, Group), (Union, _) and (_, Union) arms. (fppc B5 plus the missing Union and Group covariance arms.) - variable_type.rs: VariableType::meet — replace the Union-LHS manual Union(r1, r2) construction with VariableType::join, so equal sides dedupe and Zero is absorbed (matches fppc's join-based combination). - variable_type.rs: add VariableType::refine_to_nodes helper, used by PathType::meet (ported in Step 3). - variable_type.rs: derive Debug+Clone on Schema so the Typechecker can own it by value. Cases gqlite covers but fppc does not (LabelType::Top, Empty, Neg, SimpleType::F, Record, Group, VariableType::Group, EdgeNonDirectional, EdgeAnyDirection) are gqlite-only and stay as-is. All 62 lib tests and 18 integration tests still pass.
A HashMap<String, VariableType> binding map plus pointwise union/meet and a to_group helper for repeated patterns. Adaptations from fppc: - gqlite uses raw String for variable names, not fppc's Var wrapper, so set/get take &str directly (no set_var/get_var siblings). - fppc's to_list becomes to_group: gqlite uses VariableType::Group for repetition grouping and reserves List for user-facing list values. - fppc's meet returns Result<_, String> because its VariableType::meet is fallible. gqlite's VariableType::meet returns Zero on shape mismatch; we surface that as Err here when the meet collapses to bottom on inputs that weren't already empty, preserving fppc's observable check-level behavior.
Step 3 of the typechecker migration.
PathType (src/typing/path_type.rs) is ported as a parallel module per
option 1 of the migration plan. Mirrors fppc's structure: Node / Edge /
Union / Zero with meet, union, len, is_empty, is_unsatisfiable. The
EdgeDir enum mirrors fppc's ast::EdgeDirection (gqlite has no AST
counterpart since direction is encoded in the PathPattern variant).
Adaptations from fppc:
- NodePathType wraps DescriptorType directly (gqlite has no NodeType
wrapper struct; VariableType::Node carries a DescriptorType).
- EdgePathType keeps fppc's shape: { p1: Box<PathType>, n2: NodePathType }.
The edge descriptor is held by the variable's VariableType binding,
not by the path shape.
- VariableType::Group (gqlite-only repetition grouping) maps to Zero in
the path conversion, mirroring fppc's treatment of List.
Checker skeleton (src/typing/checker.rs) defines Typechecker /
TypecheckResult and stubs check_query / check_pattern /
check_path_pattern / check_expr / refine_pattern_node /
refine_pattern_edge with todo!() bodies. pow_path_type is implemented
since it has no dependencies. Module compiles against gqlite's types,
which proves the Step 0 mapping work.
Step 4 of the typechecker migration. The expression checker reuses
gqlite's BinOp::delta and UnOp::delta, so no operator signature table
is duplicated.
Cases:
- Const(Value): map literal to SimpleType (Int→Z, Float→F, Str→S,
Bool→B). List and Record literals get loose types — fppc has no
literal lists, and full element/field typing is a phase-1 punt
recorded in the migration doc.
- Type(t): return t directly.
- AttrLookup { var, attr }: look up var, call get_attribute. Warn on
unbound variable, missing attribute, or empty type.
- FieldAccess { base, field }: gqlite-only. Look up the field on Record
base; pass through Star/Zero gradually; warn on non-record base.
- Binop: special-case Is and As when rhs is a Type literal (Is→Bool,
As→declared type). Otherwise call BinOp::delta and verify both
operands are compatible via SimpleType::meet.
- Unop: same pattern with UnOp::delta.
is_empty / != Zero is the compatibility test, mirroring fppc; no
gradual_eq is reintroduced (it was removed in fppc HEAD).
Step 5 of the typechecker migration.
refine_pattern_node builds a VariableType::Node from the descriptor
(star descriptor for anonymous nodes) and runs VariableType::refine
against the schema. refine_pattern_edge does the same, picking
EdgeDirectional or EdgeNonDirectional from the EdgeDir.
Neither pattern-matches on LabelType::{Top, Empty, Neg} or
PropertyType::{Open, Closed} directly — refinement defers to the
lattice's existing meet / is_subtype which already handle them.
Adds an assert_filters_drained sanity check: elaboration is supposed
to hoist Descriptor::value_filters into Filter nodes before typecheck
runs. If we see leftovers we surface a typechecker error so an
elaboration bug cannot silently drop constraints.
Step 6 of the typechecker migration — the load-bearing function.
Each PathPattern variant produces a (PathType, TypeEnvironment) pair:
- Node: refine descriptor against schema, wrap as PathType::Node, bind
to env if a variable name is present.
- EdgeRight / EdgeLeft / EdgeUndirected / EdgeAnyDirection: fan out
fppc's single Edge { direction } arm. Right/Left/Any produce
EdgeDirectional; Undirected produces EdgeNonDirectional. The PathType
conversion uses EdgeDir to decide forward / reversed / union shape.
- Concat / Join: meet environments (with schema refinement) and meet
path shapes. Join is gqlite-only and reduces to Concat for typing
purposes — both compose two patterns over shared variables; the
endpoint-glued vs not distinction is a runtime concern. Documented in
the migration doc; flagged for review if Join semantics ever diverge.
- Filter: type-check the expression against Bool; warn if not boolean.
- Union: union of paths, union of environments.
- Repeat / Questioned: pow_path_type on the path, env wrapped in Group
(gqlite's name for what fppc calls List). lb capped at 3 to avoid
exponential blowup, mirroring fppc.
create_context handles None descriptors (anonymous patterns) by
returning an empty environment.
Step 7 of the typechecker migration. compile and compile_query now run typechecking between elaborate and optimizer. Typechecking uses the permissive Schema::star() and rejects the query if the checker reports errors (unbound variables, context meet failures, etc.). Sibling entry points compile_unchecked and compile_query_unchecked skip typechecking — same plan as gqlite/main produced before this branch landed. The optimizer is unchanged and works in both modes. Typechecker::check_query simply delegates to check_pattern: gqlite's elaborator folds WHERE clauses into PathPattern::Filter nodes inside the pattern, so there is no separate WHERE expression to check. RETURN-clause typing is out of scope for phase 1. All 62 lib tests and 18 integration tests still pass.
Step 8 of the typechecker migration. Four hand-written queries that exercise the on-by-default and opt-out paths: - A node pattern with property return. - A two-hop concat with a WHERE filter on a node property. - A three-node concat pattern. - An unbound-variable reference in WHERE — the checked path rejects it with a 'not found' error, the unchecked path accepts it and produces a plan, confirming the typechecker is the gatekeeper. For the three valid queries the test asserts that compile_query and compile_query_unchecked produce the same optimized plan: the checker adds verification, not transformation. Not a test suite — focused punch-list confirming the wiring works end-to-end against the existing parser/elaborator/optimizer.
Step 9 of the typechecker migration. Records:
- Commit list and outcome vs acceptance criteria.
- Lattice reconciliation outcomes with fppc citations and gqlite
before/after for every touched op.
- Five deviations from the Step-0 plan:
- to_list -> to_group (gqlite's name for repetition grouping).
- TypeEnvironment::meet still returns Result<_, String> by
synthesizing Err from a VariableType::meet that collapses to Zero;
preserves fppc's check-level observable behavior.
- EdgeDir as a typechecker-local enum in path_type.rs (gqlite has no
AST EdgeDirection); From<(VariableType, ...)> became
PathType::from_variable.
- Join treated as Concat for typing — verified: comma-join shares
variables the same way Concat does.
- Schema needed no new constructor (public fields suffice).
- Punts (FieldAccess on non-records, list/record literal typing, RETURN
typing, schema sourcing, NodeId side-table).
- Two architectural follow-ups option 1 defers (parallel typing
modules; per-segment clones at Concat/Union).
- Three findings noted for the next phase to consider, none requiring
action now.
Closes the validation gap noted in the Step 9 final report. Each test
in tests/typecheck_fppc_parity.rs mirrors a fppc typechecker test
line-for-line and asserts the same ok/empty/warning predicates.
Surface-syntax differences ({a: int} -> {a is int}, bare -> -> ()-[]->())
are adapted; the behavior under test is unchanged.
All 45 translated tests pass on first run, with no implementation
changes needed. The skipped test (test_incompatible_records) uses
double-brace nested record syntax that gqlite's parser doesn't accept;
the underlying lattice case is covered by other tests.
This moves the equivalence claim against fppc from 'I built it that
way' to 'verified by direct test diff'. The remaining cross-direction
edge meet error path (fppc Err vs gqlite Zero) is unexercised by
fppc's tests and doesn't affect observable typecheck verdicts.
Migration doc final report updated.
Two cargo examples for hands-on use:
- examples/typecheck_demo.rs: ad-hoc CLI that takes a query as argv,
prints the type-check result (ok/empty/path/env/errors/warnings).
Useful for poking at individual queries:
cargo run --release --example typecheck_demo -- '(x: {a is int})(x: {a is bool})'
-> empty=true, prop 'a' collapses to Zero (int ⊓ bool).
- examples/typecheck_repl_smoke.rs: end-to-end smoke against
examples/movies.gdb. Runs four representative queries through
compile_query (typecheck on) and confirms:
* the checked path accepts each query,
* the unchecked path produces the same plan,
* the runtime returns expected row counts (38 movies, 133 people, ...),
* an unbound-variable WHERE is rejected by the checker only.
All five cases pass.
Migration doc updated with the completed test-plan items:
- bench_test failure mode confirmed identical to main (pre-existing,
unrelated to this branch)
- REPL-level smoke against movies.gdb passes
Removed unused TypecheckResult::failed associated function (dead code
warning).
Two fppc-parity fixes in the typing lattice, both narrowed to behavior
on variants both repos share:
- LabelType::is_subtype: swap (And, _) and (_, Or) arm order so
decomposition matches fppc. The two orderings are sound but pick
different short-circuit paths on (And, Or) inputs and disagree on
nested cases (e.g. And(A, Or(B,C)) vs Or(D, And(E,F))).
- PropertyType::is_subtype: add (Zero, _) => true as the first arm.
Without it, the lower-bound law fails when meet collapses to Zero
on key-mismatched Closed maps (regression seed in fppc:
Closed({"x": Star}) vs Closed({})).
Verified against ported fppc property tests + regression seeds and
the existing typecheck_fppc_parity suite.
In gqlite there's no parity-vs-non-parity distinction — these are just the typechecker tests, mirroring the gqlite-local *_test.rs convention (parser_test.rs, runtime_test.rs, etc). The file's header comment is updated to record where the cases came from (line-for-line translation of fppc/src/typechecker/checker.rs's inline #[cfg(test)] mod tests, which is the canonical typechecker reference suite) without framing them as a separate 'parity' suite. No semantic changes; all 45 tests still pass.
Merged
Felipe705x
added a commit
that referenced
this pull request
Apr 27, 2026
Format the files on main that the fmt-CI PR (#2) didn't reach. Those files came in via PR #1 (typing/checker) and either weren't part of PR #2's diff or were touched in a way that didn't normalize the formatting. Running 'cargo fmt --all' on main produces no further changes after this lands. Files touched: examples/typecheck_demo.rs examples/typecheck_repl_smoke.rs src/lib.rs src/typing/checker.rs src/typing/mod.rs src/typing/path_type.rs src/typing/type_environment.rs tests/typecheck_smoke.rs tests/typecheck_test.rs No semantic changes. cargo build + cargo test --lib both clean.
Merged
Felipe705x
added a commit
that referenced
this pull request
Apr 29, 2026
Now that the LDBC loader exposes `id` as a queryable property (parent commit), the IC2 query can anchor by `Person.id` directly — same as the spec. Changes: - `WHERE p.firstName = ... AND p.lastName = ...` → `WHERE p.id = ...` - `RETURN ...firstName, ...lastName, c.content, c.creationDate` → `RETURN friend.id, ...firstName, ...lastName, c.id, c.content, c.creationDate` (matches the spec's RETURN modulo `coalesce`). - `PARAMS` is now five real `Person.id` values from SF0.1 instead of `(firstName, lastName)` workaround pairs. Doc reflects the upgraded fidelity: replaces the old "divergences fixed" section with a 9-row honest divergence table covering every surface-syntax-or-semantic difference. Net divergences from the spec are reduced to two with real semantic effect: - **#6 (`coalesce`)**: gqlite has no `coalesce` builtin. Posts with empty `content` but non-empty `imageFile` return blank rather than the imageFile path. Few rows in SF0.1. - **#7 (no ORDER BY)**: gqlite parser doesn't have ORDER BY yet. Affects *which* 20 rows you see under LIMIT, not wall time. All other table rows (#1, #2, #3, #4, #5, #8, #9) are surface- syntax or naming differences with no semantic divergence — same query, just spelled to fit gqlite's grammar / data conventions. ### Result snapshot (SF0.1, single-iter) | Person.id | resolves to | rows | wall time | |-----------------|----------------|------|-----------| | 933 | Mahinda Perera | 20 | 108.5 s | | 1129 | Carmen Lepland | 20 | 100.6 s | | 8796093023296 | Hồ Chí Loan | 20 | 97.4 s | | 21990232555524 | Bryn Davies | 20 | 99.2 s | | 32985348833865 | Cheng Yu | 20 | 107.5 s | Wall times are similar to the previous (less faithful) bench. The optimizer gap (no value-equality predicate pushdown) still dominates; documented in "Things this surfaces". Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Felipe705x
added a commit
that referenced
this pull request
Apr 29, 2026
Sweep of LDBC divergences found a bigger one I'd missed: we don't use the LDBC SNB Driver at all. The driver defines the audited measurement regime — workload at target throughput, mix of IC+IS reads and IU updates, multi-minute warmup, randomized param order, audit JSON output, throughput-at-SLA + p99 latency reporting. We have none of that. The bench is a hand-rolled harness: direct Runtime::run_query calls, IC2 only, sequential, no warmup, CSV output, min/median/mean/max wall times. The query and parameter set are still spec-faithful (rows #1-#5, #8 in the audit); the rows returned would match an LDBC-conformant run modulo #6 (no coalesce) and #7 (no ORDER BY). What's NOT spec-faithful is the measurement regime itself — the numbers we report are not directly comparable to throughput numbers in published LDBC audit submissions. Adding this as audit row #9 plus an expanded "Methodology / reporting divergence" section listing the 9 specific differences (harness, workload mix, concurrency, warmup, param order, headline metric, output format, update streams, result validation). Strict description updated: "IC2 with no coalesce, no ordering, and a hand-rolled (non-driver) harness reporting wall-time stats instead of throughput-at-SLA." Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Felipe705x
added a commit
that referenced
this pull request
Apr 29, 2026
Two changes from the post-spec-fetch review:
1. Switch IC2 query to use `(p: Person {id: $personId})` shorthand,
which is verbatim spec syntax. gqlite's parser accepts it and
elaborates to `(p: Person) WHERE p.id = $personId` internally;
identical post-elab behavior. Closes audit-row #1 entirely —
that row was a self-imposed stylistic divergence, not a gap.
2. Add "Paper-tier disclosure" section answering all 10 items
from `benchmark-checklist.tex` directly. Each answer is one
line, no hedging. Replaces what would otherwise be scattered
across the doc:
- Cross-validation: No (with reason: coalesce + ORDER BY gaps)
- Persistent storage: Yes
- ACID: No
- Fault-tolerance: No
- Warmup rounds: 0
- Execution rounds: 3 per param × 15 params
- Summarization: per-param med + across-param med headline
- Load phase included: No (separate)
- SUT-dev consulted: gqlite is our own system
The maxDate predicate stays in WHERE since `<=` isn't equality;
the descriptor shorthand only handles equality. That's a spec-
faithful split (the IC2 spec has the `id` filter inside the
descriptor and the date filter in WHERE, exactly as we now do).
Results table is being re-run with --iters 3 (was 1) for stable
medians; lands in a follow-up commit.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Felipe705x
added a commit
that referenced
this pull request
Apr 29, 2026
* loader(ldbc): expose `id` as a queryable property on every node
Previously `load_ldbc_node_file` consumed the LDBC `id` column for the
internal node name only, excluding it from the property map. That made
LDBC's spec queries un-expressible: every IC1-IC14 query in LDBC SNB
Interactive parameterizes by `Person.id`, `Comment.id`, etc. via
`{id: $param}` shorthand or `WHERE x.id = $param`. Without `id` as a
property, those queries return NULL and the bench has to use surrogate
anchors (firstName, lastName) — a real divergence from the spec.
The fix: after the existing prop_cols loop, insert
`props["id"] = parse_ldbc_value(&id_str)`. The id is also still folded
into the internal node name (`"Person:933"`) for cross-file edge
resolution; the duplication is ~12 bytes per node (~4 MB extra on
SF0.1), required for spec-faithful queries.
Why LDBC-only and not the spanner-config CSV loader: that loader
follows different conventions (`vid`, `<Label>_id`, etc., per
CLAUDE.md) and isn't tied to a workload that requires id-as-property.
LDBC is the immediate driver; the spanner loader can argue for the
same treatment separately if needed.
Verified end-to-end on SF0.1:
$ MATCH (p: Person) WHERE p.id = 933 RETURN p.firstName, p.lastName
"Mahinda" | "Perera"
1 rows (0.026s)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* bench: add LDBC SNB Interactive IC2 benchmark
First slice of the LDBC Social Network Benchmark v1 (arXiv:2001.02299
§6, Interactive Complex reads) — IC2 only ("recent messages by
friends"). Other ICs need features gqlite doesn't have: shortest /
variable-length paths (IC1, IC3, IC11, IC12, IC14), date arithmetic /
OPTIONAL MATCH / ORDER BY / aggregation-with-HAVING (IC4-IC8, IC10,
IC13).
`src/bin/ldbc_bench.rs` runs IC2 against an SF0.1 .gdb, sweeping a
small curated set of `Person.firstName` values (LDBC's `Person.id` is
folded into the gqlite node name, not addressable as a property).
Reports per-param min / median / mean / max wall time. Typechecker
is off — IC queries are valid by construction; comparing typecheck
overhead vs runtime is a separate bench.
`bench/LDBC_BENCHMARK.md` documents setup (download, decompress,
import) and includes baseline numbers from the first run.
### Result snapshot (SF0.1, 327k nodes / 1.48M edges, limit=20)
| Param | min | med | mean | max |
|----------|-------|-------|-------|--------|
| Mahinda | 52 s | 56 s | 55 s | 57 s |
| Carmen | 38 s | 50 s | 65 s | 106 s |
| Bryn | 84 s | 85 s | 86 s | 88 s |
| Cheng | 87 s | 95 s | 94 s | 99 s |
These are 50-130x slower than LDBC's sub-second target. The benchmark
surfaces the cause: the optimizer's predicate pushdown handles
`x.attr is type` (per `docs/implemented-optimizations.md`) but not
value-equality predicates like `WHERE p.firstName = 'Mahinda'`. So
the firstName filter applies *after* the join enumerates every
`Person~knows~Person~hasCreator~Comment` triple — work scales with
the cartesian intermediate, not the parameter's selectivity.
Documented as a finding for a future optimizer pass — pushing `=`
constants into descriptors would collapse this into a 1-2-row anchor
on Person, dropping the join's left side from ~28k to a handful.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* bench(ldbc): close 4 IC2 fidelity divergences vs LDBC spec
The earlier bench had five divergences from LDBC IC2; four are now
fixed without touching gqlite implementation. Only ORDER BY remains
(parser feature gap).
Fixed:
- **Anchor selectivity.** `Person.firstName` → `(firstName, lastName)`.
Five params each map to *exactly one* Person in SF0.1 — same per-
query selectivity as the spec's `Person.id`, just spelled
differently because the gqlite LDBC loader folds `id` into the
internal node name rather than a property.
- **Message = Comment ∪ Post.** LDBC's `(message:Message)` covers
both via Cypher label inheritance, which gqlite doesn't have. Use
path-pattern union: `<-[:hasCreator]-(c:Comment) | <-[:hasCreator]-
(c:Post)`. `c` binds to whichever arm matches.
- **`message.creationDate <= $maxDate` filter.** Was dropped
entirely. Now in the WHERE clause; `maxDate` is a const
(1340000000000 ms ≈ mid-2012) chosen so the filter retains a
sizable fraction of SF0.1 data while still exercising.
- **RETURN columns.** Added `friend.firstName`, `friend.lastName`,
`c.content` alongside the existing `c.creationDate`. The two `id`
columns the spec wants (`friend.id`, `message.id`) remain dropped
because they're not addressable as properties; documented.
Remaining divergence (out of scope without parser changes):
- **No `ORDER BY ... DESC`** — gqlite's parser doesn't have it.
Affects which 20 rows you get under `LIMIT 20`, not wall time.
Updated `bench/LDBC_BENCHMARK.md` to spell out the spec query, the
gqlite query, and the divergence list explicitly.
### Result snapshot (SF0.1, single-iter)
| Param | rows | wall time |
|-------------------|------|-----------|
| Mahinda Perera | 20 | 96.4 s |
| Carmen Lepland | 20 | 108.1 s |
| Bryn Davies | 20 | 85.0 s |
| Cheng Yu | 20 | 105.6 s |
| Hồ Chí Loan | 20 | 110.7 s |
Wall times are slightly higher than the previous (less faithful)
bench because the union arm doubles the right side of the join
(~286k hasCreator edges vs ~151k Comment-only). The dominant cost
remains the optimizer's missing predicate pushdown on value-equality
WHERE; documented in "Things this surfaces".
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* bench(ldbc): use Person.id anchor (full IC2 fidelity)
Now that the LDBC loader exposes `id` as a queryable property
(parent commit), the IC2 query can anchor by `Person.id` directly
— same as the spec.
Changes:
- `WHERE p.firstName = ... AND p.lastName = ...` → `WHERE p.id = ...`
- `RETURN ...firstName, ...lastName, c.content, c.creationDate`
→ `RETURN friend.id, ...firstName, ...lastName, c.id, c.content,
c.creationDate` (matches the spec's RETURN modulo `coalesce`).
- `PARAMS` is now five real `Person.id` values from SF0.1 instead
of `(firstName, lastName)` workaround pairs.
Doc reflects the upgraded fidelity: replaces the old "divergences
fixed" section with a 9-row honest divergence table covering every
surface-syntax-or-semantic difference. Net divergences from the
spec are reduced to two with real semantic effect:
- **#6 (`coalesce`)**: gqlite has no `coalesce` builtin. Posts
with empty `content` but non-empty `imageFile` return blank
rather than the imageFile path. Few rows in SF0.1.
- **#7 (no ORDER BY)**: gqlite parser doesn't have ORDER BY yet.
Affects *which* 20 rows you see under LIMIT, not wall time.
All other table rows (#1, #2, #3, #4, #5, #8, #9) are surface-
syntax or naming differences with no semantic divergence — same
query, just spelled to fit gqlite's grammar / data conventions.
### Result snapshot (SF0.1, single-iter)
| Person.id | resolves to | rows | wall time |
|-----------------|----------------|------|-----------|
| 933 | Mahinda Perera | 20 | 108.5 s |
| 1129 | Carmen Lepland | 20 | 100.6 s |
| 8796093023296 | Hồ Chí Loan | 20 | 97.4 s |
| 21990232555524 | Bryn Davies | 20 | 99.2 s |
| 32985348833865 | Cheng Yu | 20 | 107.5 s |
Wall times are similar to the previous (less faithful) bench. The
optimizer gap (no value-equality predicate pushdown) still
dominates; documented in "Things this surfaces".
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* bench(ldbc): document parameter-generator divergence
Previous audit listed personId and maxDate substitutions as if they
were independent surface differences. They aren't — they're symptoms
of a single methodology divergence: we didn't run LDBC's
substitution_parameters generator (ldbc_snb_datagen) and instead
hand-picked params by inspection. LDBC's spec example values are
SF-specific by design (the id space is bit-packed per SF and event
timelines stretch with SF), so spec values from one SF are not
expected to work against another. The canonical workflow is to run
the generator against the chosen SF and consume its
interactive_<n>.txt output.
This doesn't affect the bench's timing claim — the join work is the
same for any valid Person.id anchor and any maxDate that doesn't cut
all rows — but it's a procedural deviation from LDBC methodology
that wasn't called out clearly. Adding it as row #10 in the audit
table, reframing #8 and #9 as derived from #10, and updating the
bench's rustdoc comment to match.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* bench(ldbc): use official LDBC interactive_2 substitution params
Closes the methodology divergence flagged in the previous commit.
LDBC ships substitution parameters as a separate archive
(substitution_parameters-sf0.1.tar.zst on datasets.ldbcouncil.org)
because our SF0.1 distribution was generated with
parametergenerator.parameters:false. Downloading that archive gives
us interactive_2_param.txt: 15 (personId, maxDate) pairs from
LDBC's official parameter generator, matching our SF0.1 build
because the generator is deterministic on the dataset seed
(verified — all 15 personIds resolve in person_0_0.csv).
Changes:
- src/bin/ldbc_bench.rs: PARAMS replaced with the 15 official
pairs verbatim. Each param now carries its own maxDate (the
generator's output is per-param), so the global MAX_DATE_MS
constant goes away. CSV stdout grows a `max_date` column.
Names alongside each ID are resolved from person_0_0.csv only
for human-readable output.
- bench/LDBC_BENCHMARK.md: audit table collapses #8/#9/#10 into
a single row #8 ("uses official LDBC params, no semantic
divergence"). Setup grows a §1b for the substitution-params
download. Summary bullets updated. Strict description is now
"IC2 with no coalesce and no ordering" — the parameter-source
caveat is gone.
The Results table is currently a re-run note pointing at the old
hand-picked numbers as a reference; the official-param run is in
progress (one param in: 101s, 20 rows — same order of magnitude
as the hand-picked-param run, as expected since the bench's
bottleneck is optimizer-bound rather than parameter-bound). A
follow-up commit will land the fresh numbers.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* bench(ldbc): document driver / reporting methodology divergence
Sweep of LDBC divergences found a bigger one I'd missed: we don't
use the LDBC SNB Driver at all. The driver defines the audited
measurement regime — workload at target throughput, mix of IC+IS
reads and IU updates, multi-minute warmup, randomized param order,
audit JSON output, throughput-at-SLA + p99 latency reporting.
We have none of that. The bench is a hand-rolled harness: direct
Runtime::run_query calls, IC2 only, sequential, no warmup, CSV
output, min/median/mean/max wall times.
The query and parameter set are still spec-faithful (rows #1-#5,
#8 in the audit); the rows returned would match an LDBC-conformant
run modulo #6 (no coalesce) and #7 (no ORDER BY). What's NOT
spec-faithful is the measurement regime itself — the numbers we
report are not directly comparable to throughput numbers in
published LDBC audit submissions.
Adding this as audit row #9 plus an expanded "Methodology /
reporting divergence" section listing the 9 specific differences
(harness, workload mix, concurrency, warmup, param order, headline
metric, output format, update streams, result validation). Strict
description updated: "IC2 with no coalesce, no ordering, and a
hand-rolled (non-driver) harness reporting wall-time stats instead
of throughput-at-SLA."
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* bench(ldbc): land Results table from official-param re-run
Full IC2 sweep against ldbc-sf0.1.gdb with the 15 official LDBC
substitution parameters. 1 iter per param, ~20 min total wall.
Stats across all 15 params:
- min 52.8 s
- med 79.7 s
- mean 83.0 s
- max 146.9 s
Same regime as the previous hand-picked-param run (97-108 s for 5
params), confirming the bottleneck is gqlite's optimizer (no
predicate pushdown for value equality on `=`, see "Things this
surfaces") rather than parameter selectivity. The 3× spread
between min and max reflects caller-degree variation plus
first-iter page-cache effects, not bench noise.
All 15 params returned 20 rows (the LIMIT). Far above LDBC's
sub-second interactive target — the optimizer gap is the unblocking
work for any future driver-mediated audit.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* bench(ldbc): note bench/impl scope split + no result validation
Two clarifications surfaced in review:
1. The optimizer pushdown gap that makes IC2 take ~100s is a real
finding but is NOT something this branch will fix. bench/ldbc is
bench-only — no changes to src/* implementation. Adding a
pointer to which files would need to change (pushdown.rs +
descriptor.rs + engine.rs) so a future impl PR has a starting
point.
2. We don't validate row contents against LDBC's expected_output
archives. Two divergences make any byte-for-byte check fail:
#7 (no ORDER BY) means our 20 rows aren't the 20 most recent,
and #6 (no coalesce) means imageFile-only Posts show blank.
Either alone invalidates a diff -q check. The bench asserts
row_count == LIMIT, not row contents.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* bench(ldbc): drop misleading min/med/mean/max labels at N=1
Sweep-review item: with --iters 1, the bench's report() printed
"min=X med=X mean=X max=X" with all four columns showing the same
number, suggesting statistical content where there is none.
Now: N=1 prints "wall=X.XXms (n=1, --iters >=3 recommended for
stable median)". N>=2 keeps the full spread display with an
explicit n=N. Bench-side fix only; doesn't touch gqlite impl.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* bench(ldbc): add SPEC_CHECKLIST.md — verbatim quotes from LDBC spec
Source-of-truth doc that maps each LDBC spec requirement to whether
it applies to research-paper use vs audited submissions, and what
our bench currently does. All quotes verbatim from
github.com/ldbc/ldbc_snb_docs (main branch), with file citations.
Headline finding: the spec itself sanctions a two-tier model.
benchmark-checklist.tex says "For most research papers, fully
audited results are unrealistic and even unaudited results can
provide insight" and lists exactly what info to disclose for
research use. We're firmly in that tier and the spec says that's
fine.
Practical use: when someone challenges "is this a real LDBC bench?"
the answer is now "yes, at the research-paper tier the spec
defines, with these specific gaps documented." Each gap (validation,
ACID, driver, run duration, scale) carries a verbatim quote and
a reason it's audit-only or out-of-scope here.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* bench(ldbc): IC2 syntax fidelity + paper-tier disclosure
Two changes from the post-spec-fetch review:
1. Switch IC2 query to use `(p: Person {id: $personId})` shorthand,
which is verbatim spec syntax. gqlite's parser accepts it and
elaborates to `(p: Person) WHERE p.id = $personId` internally;
identical post-elab behavior. Closes audit-row #1 entirely —
that row was a self-imposed stylistic divergence, not a gap.
2. Add "Paper-tier disclosure" section answering all 10 items
from `benchmark-checklist.tex` directly. Each answer is one
line, no hedging. Replaces what would otherwise be scattered
across the doc:
- Cross-validation: No (with reason: coalesce + ORDER BY gaps)
- Persistent storage: Yes
- ACID: No
- Fault-tolerance: No
- Warmup rounds: 0
- Execution rounds: 3 per param × 15 params
- Summarization: per-param med + across-param med headline
- Load phase included: No (separate)
- SUT-dev consulted: gqlite is our own system
The maxDate predicate stays in WHERE since `<=` isn't equality;
the descriptor shorthand only handles equality. That's a spec-
faithful split (the IC2 spec has the `id` filter inside the
descriptor and the date filter in WHERE, exactly as we now do).
Results table is being re-run with --iters 3 (was 1) for stable
medians; lands in a follow-up commit.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* bench(ldbc): --backend memory|lazy|disk + RSS measurement (sysinfo)
Per professor's request: compare gqlite's three GraphAccess backends
on the same IC2 query so the paper can show the speed-vs-RAM tradeoff.
- Memory (Graph from CSV): everything in RAM, no disk I/O at query
time. Uses csv_loader::load_from_ldbc_csv_dir.
- Lazy (LazyGraphStore): topology + label index in RAM, props via
per-process LRU page cache + OS page cache.
- Disk (DiskGraphStore): topology in RAM, everything else from disk
per access (via OS page cache).
Bench grows --backend and --csv-dir flags. Memory backend takes the
CSV directory directly; Lazy/Disk take the .gdb path. Each run
reports both per-param wall time and peak RSS delta over baseline.
sysinfo crate (0.30) added for cross-platform RSS reading. Used
only by ldbc_bench. RSS is OS-reported working-set; on Windows
that includes mapped file pages, so Lazy/Disk numbers reflect both
the in-process structures AND any portion of the .gdb that the OS
has resident — disclosed in the doc.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* bench(ldbc): land 3-backend Results table + cache-cost note
Memory vs Lazy vs Disk on SF0.1 IC2, per professor's request.
Median per-param: memory 4.65s, disk 7.71s, lazy 8.42s. Memory
wins on speed by ~2x, costs +605 MiB; Disk uses least RAM and
loads fastest but no in-process caching.
Three real findings, all worth flagging in the paper:
1. Memory beats Lazy/Disk by ~2x. Page-fault-free queries.
2. Lazy and Disk are within 10% on speed despite Lazy having
an in-process LRU cache. The OS file cache dominates; Lazy's
process-level cache adds bookkeeping without much benefit.
3. Cold-cache vs warm-cache cost differs by ~10x. The very first
run after machine restart shows 53-147s per param (median 80s);
subsequent runs benefit from OS-cached .gdb pages and drop to
7-10s per param. Documented as context for the warm-cache
table; cold-cache numbers preserved for paper use.
Caveat documented: sysinfo's RSS reading on Windows includes
mmap'd file pages, so Lazy/Disk RSS deltas are confounded by
which portion of the .gdb the OS has resident. Memory's +605 MiB
delta is purely allocated heap and is the most reliable absolute
number in the table.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* bench(ldbc): per-IC TOML query catalog + dispatching runner
PR #1 of the LDBC runner refactor (per the planning doc).
Architecture:
- bench/ldbc-queries/ic<n>.toml — one file per IC (1-14).
IC2 carries query template + params_file ref + divergences;
the other 13 carry status="blocked" + required_features +
blocked_reason.
- src/bin/ldbc_bench.rs rewritten to load TOML files, parse the
LDBC param file, and substitute {paramName} placeholders in the
query template. No more hardcoded PARAMS / query string.
CLI:
--ic <n>|N,M|all|blocked which IC(s) to run; default 2
--backend memory|lazy|disk default lazy
--params-dir <dir> default bench/data/substitution_parameters-sf0.1/...
--queries-dir <dir> default bench/ldbc-queries
--iters N default 3
--limit N default 20
--csv-dir <dir> required only with --backend memory
`--ic blocked` prints the inventory of blocked ICs with their
required_features (no bench run). Useful as a status report:
"what would we need to add to gqlite to run these?"
New deps: toml = "0.8" (small, well-maintained; multiline strings
in TOML are nicer than JSON for query templates).
Per professor feedback: Memory backend stays available in the bench
binary for smoke-testing but is dropped from the headline Lazy-vs-Disk
comparison in LDBC_BENCHMARK.md (Memory loads everything from CSV at
every startup — unrealistic for a database product).
Same numbers as the previous IC2-only bench; this commit is purely
the architecture change. Adding a new IC (when blocking gqlite
features land) = drop a TOML file, no bench code changes.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* bench(ldbc): trim docs — drop audit-tier discussion + non-divergences
Per feedback: docs were too verbose with audit-tier content we don't
care about and "divergences" that turned out to be no-ops.
- Delete LDBC_SPEC_CHECKLIST.md entirely (314 lines of audit vs
research-paper analysis with verbatim spec quotes — irrelevant
since we're not auditing).
- Drop the "Methodology / reporting divergence (#9, expanded)"
section from LDBC_BENCHMARK.md (driver, throughput-at-SLA, ACID,
audit JSON — all audit-tier).
- Drop the "Paper-tier disclosure" section's 10-row spec-checklist
table (mostly no/no/n.a. answers; not useful when we're not
arguing for compliance).
- Trim the divergence audit from 9 rows to 2: only the real
divergences (#6 no coalesce, #7 no ORDER BY). The other 7 rows
were "None — same as spec" or "fixed in earlier commit" and
weren't divergences anyway.
- Trim the RSS caveat and cold-cache notes to a few lines each.
- Remove the long "out of scope" rationale block on the optimizer
pushdown gap; reduced to a 4-line "Fix:" pointer to which
src/* files would need to change.
LDBC_BENCHMARK.md goes from 428 -> 324 lines. Combined with the
deleted SPEC_CHECKLIST.md, total bench-doc reduction is ~420 lines.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* bench(ldbc): note LIMIT-keyword TODO in ic2.toml
When gqlite's parser gains a LIMIT keyword, append `LIMIT 20` to
the query template and drop --limit from the bench CLI. Until then
the runtime row cap (Runtime::run_query's second arg, exposed as
--limit) is the only way to enforce the spec's LIMIT 20.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* bench(ldbc): rewrite README to be how-to-run only
Per feedback: the markdown was trying to do too much.
- Divergences are already in each ic<n>.toml's [divergences] table
- Results / findings / cold-warm analysis belong in the paper, not
the bench README
- The README's job is "how does someone reproduce this run"
Cuts:
- IC2 fidelity-vs-spec section (the 2-row divergence table is now
redundant with ic2.toml; the spec quote is in the TOML's spec link)
- Results section (Lazy-vs-Disk table, per-param times, RSS caveat,
cold-cache note — all paper material)
- "Things this surfaces" (optimizer pushdown finding — paper material)
- "Architecture" section (TOML schema is self-documenting)
What stays:
- Status: 1 implemented, 13 catalogued, point at toml inventory
- Setup: download dataset, download params, build .gdb
- Run: invocation examples + flag table + output format
- Add new IC: 2-step recipe for promoting a blocked TOML
- Backends: one paragraph distinguishing the three options
- See also: pointers to the toml files, runner source, and paper
Goes from 324 to 133 lines (-258 +67). Combined with the earlier
trim and the deleted SPEC_CHECKLIST.md, total reduction across the
two LDBC bench docs is ~640 lines.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* bench(ldbc): bench_setup binary for one-command data prep
Replaces the README's three manual setup steps (curl + python zstd
+ gqlite import) with a single binary that does all three.
Idempotent: each step checks for its output and skips if present.
Re-running bench_setup with everything in place is a no-op (just
prints "[skip] ..." for each step). Pass --rebuild to force-rebuild
the .gdb.
Three deps added (only used by this binary):
- ureq 2.12 (default-features = false, "tls"): small blocking HTTP
client, ~one network call per archive
- zstd 0.13: streaming decompressor for the .tar.zst archives
- tar 0.4: streaming tar extractor
The .gdb build step shells out to the existing `gqlite --import-ldbc-csv`
binary rather than calling csv_loader directly, so the .gdb is
produced by exactly the same code path users run from the REPL — no
risk of bench_setup and gqlite drifting on the .gdb format.
Output:
data-dir: bench/data
downloading https://datasets.../ldbc-sf0.1.tar.zst
done: 17.5 MiB in 0.7s
extracting ldbc-sf0.1.tar.zst -> ldbc-sf0.1
done in 0.3s
downloading https://datasets.../substitution_parameters-sf0.1.tar.zst
done: 198.4 KiB in 0.2s
extracting substitution_parameters-sf0.1.tar.zst -> substitution_parameters-sf0.1
done in 0.0s
building ldbc-sf0.1.gdb from .../social_network-sf0.1-CsvBasic-LongDateFormatter
done: ldbc-sf0.1.gdb in 39.1s
✓ setup complete
README's "Setup" section collapsed from 3 multi-block steps to a
single command. Manual steps still recoverable from this commit's
diff if anyone needs them.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* bench(ldbc): document output format + rename ic CSV column to params
Two related fixes after a user pointed out the CSV format wasn't
self-explanatory:
1. The third CSV column was named `ic` but actually carries the
substitution-param values (the IC name is already in column 1).
Renamed to `params`. Reading the column as `params` matches the
stderr summary's parens, e.g. `(19791209300143|1354060800000)`.
2. README's "Output" section was 4 lines listing column names with
no semantics. Expanded to ~50 lines covering:
- Why two streams (stdout=data, stderr=chatter, Unix convention)
- Column-by-column meaning of the CSV
- What `(personId|maxDate)` and `(n=3)` in the stderr summary mean
- Worked cross-check showing min/median/mean/max of three CSV
rows reproduces one stderr summary line
The CSV header rename is breaking-but-trivial: any existing scripts
reading `ic` would now get `params` and need a one-character change.
Since the bench has shipped maybe 2 days ago and nothing is pinned
to it, this is the right time.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* bench(ldbc): rename param_idx -> row + done summary at end of run
Two small UX fixes:
1. The CSV's `param_idx` column was the same number as the stderr
summary's `row#N` label but with a different name. Renamed to
`row` so the two streams use the same vocabulary.
2. After the last param of the last IC, the bench now prints:
✓ done — 1 IC(s) ran to completion
IC2: 15 rows × 1 iter(s) = 15 runs;
across-row median 11506.22ms
(range 9109.93-16517.23ms)
Peak RSS during query loop: 84.5 MiB (+75.4 MiB over baseline)
Confirms the run finished cleanly (vs got killed mid-row) and
gives the across-row median without grepping. Per-IC summary line
so multi-IC runs (when more get implemented) compose cleanly.
Implementation: `report` now returns the per-row median;
`run_one_ic` collects them into an `IcSummary { ic_id, iters,
row_medians_ns }` and `run_targets` builds the final aggregate
from the Vec<IcSummary>.
README's Output section updated for both changes (column rename in
the CSV table; done-summary section added with worked example).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* bench(ldbc): pre-merge fixes from review (A1-A4 + warmup + --help)
Five fixes from the pre-merge review on the runner:
A1. Type-aware param substitution. New optional `param_types` array
in the per-IC TOML; per-column tag is "int" (raw substitution,
default) or "string" (wraps in single quotes, rejects values
containing `'` since gqlite's lexer has no escape syntax).
Unblocks future ICs with string params (firstName, country
names, tag names) without putting quote-handling onto the
TOML author. ic2.toml is unchanged — defaults to all-int.
A2. Duplicate IC id detection in load_queries. After sort, scan
adjacent for matching ids; fail fast naming both source files.
Catches the footgun of copying ic2.toml without renumbering.
A3. Median for even N. Previously returned sorted[n/2] (upper of
two midpoints) for even-sized samples — neither the true
median nor a labeled percentile. New median_of() helper does
the standard average for even N. Doesn't affect --iters 3
(odd); fixes --iters 2/4/6.
A4. Schema validation at load time. New IcQuery::validate()
enforces that "implemented" carries query+params_file and
"blocked" carries blocked_reason, plus param_columns/param_types
length agreement. Failure now surfaces before any backend is
opened — not 8 minutes into a multi-IC run.
B4. --warmup N flag. Per-row warmup that runs N extra iters of the
query before the timed ones, discarding their measurements.
Banner now reads "{warmup}+{iters} iters/param (warmup+measured)"
so the count is visible. Mirrors typecheck_bench's --warmup.
Default 0 (preserves existing behavior).
Plus a small ergonomic fix:
- `ldbc_bench --help` (with no positional .gdb) now prints usage
instead of trying to open "--help" as a database file.
What was not fixed (deliberately, low impact for now):
- B1 CSV ambiguity if param values contain `|` or `;`: real
issue but no LDBC param file has these chars in our scope.
- B2 peak RSS sampling between rows, not during query: watchdog
thread isn't worth the complexity.
- B3 mmap inflates Windows RSS: documented in the README; can't
fix from the bench side.
- B5 hardcoded SF0.1 in bench_setup: out of scope for now.
- B6 no checksums on downloads: would need LDBC's published
hashes; out of scope.
- B7 --ic 2,2 doesn't dedup: not a bug, just undocumented; fine.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* bench(ldbc): second-pass review fixes — stale docs + warmup honesty
Findings from a second-pass review of the runner + README:
Stale doc references:
- Rustdoc said `--show-blocked` mode (real flag is `--ic blocked`).
- Usage text recommended `--warmup 1` to absorb cold-cache cost.
Empirically wrong for our scope: looking at iter-level data from
the SF0.1 iter=3 run, iter 0 is NOT systematically the slowest
within a row — variance is general jitter, not first-iter cold
cache. The OS page cache is cross-row, so once row#0's iters
populate it, subsequent rows + iters all hit warm pages.
Reframed: --warmup is available for parity with typecheck_bench
and for larger SFs where the cache can't hold the working set,
but at SF0.1 it's overhead.
- Warmup loop's inline comment claimed "iter 0 cold-cache"; updated
to match the empirical finding.
Stale README:
- The flag table was missing --warmup entirely (added in a previous
commit but the doc table didn't get the row).
- The stderr summary example showed `3 iters/param` but the new
banner format is `0+3 iters/param (warmup+measured)`.
- "Adding a new IC" recipe didn't mention `param_types` for ICs
with string params (IC1 firstName, IC3/IC11 country names, IC6
tagName, etc.). Now mentions `param_types` and the gqlite
no-escape-for-quotes caveat.
What's NOT a stale-doc issue (verified during the pass):
- All `# Spec: ...` URLs in TOMLs point to ldbc/ldbc_snb_docs main
branch which is stable.
- The `_ic` parameter in `report()` is used in the eprintln line.
- Edge cases (--iters 0, --warmup with no iters, etc.) handled
gracefully via the `is_empty()` skip in the final summary.
- `param_columns` doesn't strictly need to match the param file's
header — the runtime reads the file's own header. Documented.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* bench(ldbc): real fixes from a deeper review pass + 13 unit tests
The previous "second-pass review" was thin (mostly stale-doc fixes).
A proper adversarial pass surfaced four substantive issues:
C1. Arg parsing crashed with index-out-of-bounds when a flag had
no value (e.g. `ldbc_bench db.gdb --iters` with nothing after).
New `need_value` helper checks bounds and prints a clear
"{flag} requires a value" before exiting.
C2. "All rows failed" was reported as `✓ done` with no per-IC line.
A user could read this as "ran successfully, no rows" instead of
"every row erred." IcSummary now tracks `failed_rows`; final
summary prints `⚠ done with errors` when any IC had zero
successful rows, and per-IC lines surface failure counts even
when row_medians_ns is empty.
C3. `substitute()` silently left unmatched `{...}` placeholders if
the query template referenced a column not in the param file's
header. The gqlite parser would then fail with "Parse error" —
a misleading symptom for the actual problem (TOML/header
mismatch). New `find_unsubstituted_placeholder` post-pass scan
catches bare `{ident}` tokens — careful to skip the legitimate
`{id: 933}` descriptor shorthand and `{1,3}` repetition syntax.
Verified end-to-end: changing ic2.toml's query to reference a
nonexistent {countryName} now produces:
SUBSTITUTE ERROR on row 0 (params: ...): unsubstituted
placeholder {countryName} remains after substitution; check
that the query template's column names match the param
file's header (personId, maxDate)
C4. Zero unit tests for 700 lines of code. Added a tests module
covering median_of (incl. the even-N regression A3 caught
earlier), substitute (int/string/bad-type/quote-rejection/
unmatched-placeholder), and find_unsubstituted_placeholder
(true positives + false-positive guards on descriptor
shorthand and repetition quantifiers). 13 tests, all green.
Plus reframed the warmup recommendation honestly. Earlier commits
said "empirically not needed at SF0.1" based on 6 data points — too
strong a claim from too little data. New text acknowledges OS
page-cache mechanics (real, well-known) but is calibrated about
what we've actually measured: "varies by OS / RAM / storage; on a
warm machine with free RAM >= dataset size it usually doesn't help,
on cold or cache-constrained machines it can absorb a cold-iter
spike." Default stays 0; recommendation is "raise it if the
per-row summary shows iter 0 consistently dominating."
What's still NOT fixed (deliberately, low impact):
C5 TOML schema versioning, C6 absolute-path footgun in
`params_file`, C7 .gdb freshness vs loader version. Each is
documentation territory or a system-wide gqlite concern.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* bench(ldbc): final pre-merge fixes — arg parsing + idempotency
Three more issues found in the fourth review pass:
E1. The --ic value parser still used .expect() (panic with raw
Rust error) for non-numeric values like `--ic 2,abc`.
Inconsistent with the C1 fix from the previous round.
Now a clean error: "invalid --ic value \"2,abc\":
invalid digit found in string. expected: N, N,M,K, all, or
blocked".
F1. bench_setup's arg parser had the same C1-style index-out-of-bounds
risk (`bench_setup --data-dir` with no value would panic). Added
the same `need_value` helper used in ldbc_bench.
F2. bench_setup's idempotency check was "does the directory exist?".
A partially-extracted directory (interrupted run, network drop
mid-extract) would skip re-extraction and feed garbage data into
the .gdb build. Now checks for a known sub-path that only exists
post-successful-extract:
- dataset: csv_dir/dynamic/ must be a directory
- params: params_inner/interactive_2_param.txt must be a file
Detected the partial-extract case in a synthetic test (empty
csv_dir → bench_setup correctly tried to re-download instead of
skipping).
13 unit tests still pass. End-to-end smoke tests pass.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* bench(ldbc): loop-review fixes — error handling, idempotency, lint, doc
User asked for a loop-review until I find no issues. Multiple
passes; this commit is everything that surfaced.
Pass 1 — `.expect()` panics replaced with clean exits:
- load_params on empty file (ldbc_bench)
- fs::create_dir_all on the data-dir (bench_setup)
- env::current_exe lookup (bench_setup)
- fs::remove_file on --rebuild (bench_setup)
Pass 2 — TOML schema strictness:
- #[serde(deny_unknown_fields)] on IcQuery so a typo like
`parms_file` produces an immediate parse error naming the
unknown key, instead of silently leaving params_file=None
and surfacing later as "missing `params_file`".
Pass 3 — three more `.expect()` on backend dispatch:
- csv_dir on memory backend → "--csv-dir is required..."
- csv_loader::load_from_ldbc_csv_dir → propagates the io::Error
- LazyGraphStore::open / DiskGraphStore::open → ditto
Pass 4 — README's flag table had stale strong claim
"--warmup empirically doesn't help at SF0.1" — softened to match
the calibrated runner help text (varies by OS / RAM / storage).
Pass 5 — `--iters 0` was silently allowed but produced no output:
the bench would run warmup iters (if any), record zero measured
iters, emit no CSV, and print "✓ done" with no per-IC line.
Now rejected at arg parse with "--iters must be >= 1".
Pass 8 — clippy warnings:
- cmp_owned: `data_dir == PathBuf::from("...")` → `Path::new`
- doc_overindented_list_items: re-indented two doc bullets
Pass 9 — `--ic 99` (no match) printed "✓ done — 0 IC(s) ran to
completion" which falsely implied success. Now prints
"⚠ done — 0 IC(s) ran (none of the requested ICs were
implemented in the catalog). Use --ic blocked to see the
inventory."
13 unit tests still pass. cargo clippy clean. cargo fmt clean.
End-to-end smoke (--ic blocked, --ic 2, --ic 99, --ic 0, --ic 2
with bad value) all produce clean error/output paths.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* bench(ldbc): rewrite stderr docs — full sample + RSS/baseline glossary
Two issues a reader pointed out in the README's stderr section:
1. The example was incomplete — it skipped the setup lines (the
"Loaded N IC definitions / RSS baseline / Loading .gdb / RSS
after open" header) and the "✓ done" final summary that the
bench actually emits. README only showed the per-row middle.
Now shows a complete stderr trace from --ic 2 --iters 3, top
to bottom, so readers can match what they see live to what's
documented.
2. RSS and "baseline" were used in the example without definition,
leaving readers (correctly) confused about what the numbers
mean. New "What RSS and baseline mean" subsection: RSS is OS-
reported RAM use; baseline is process startup; the
"+X MiB over baseline" form makes per-step cost readable
without manual subtraction. Worked example using the actual
sample numbers.
Per-row docs (min/med/mean/max meaning, sample-size note when n=1)
unchanged — those were already correct, just buried under the
incomplete example.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
DO NOT MERGE -- DIFF PREVIEW ONLY