Chore/fmt typecheck repl by Felipe705x · Pull Request #4 · pleiad/frogql

Felipe705x · 2026-04-27T16:04:21Z

No description provided.

Single source of truth for the compile pipeline: parse → elaborate → typecheck → optimize. compile_query becomes a thin wrapper that flattens diagnostics into a single Err(String); compile_query_unchecked and compile/compile_unchecked are unchanged. CompileError tags failures by phase (Parse vs Type) so callers can distinguish the two without inspecting strings. CompileResult preserves typechecker warnings (currently dropped by compile_query) so non-blocking diagnostics — typo'd attributes, non-boolean filters — can reach the user. The legacy compile_query string format ('Parse error: …', 'Type error: …') is preserved verbatim by the wrapper so existing callers (Python bindings, tests, examples) see no behavior change.

Locks the lib-level contract the REPL depends on: - parse failures → CompileError::Parse - typecheck failures → CompileError::Type with non-empty error vec - successful compile with non-bool filter → warnings vector populated - successful clean compile → empty warnings - legacy compile_query string-error format unchanged - compile_query_unchecked still ignores typechecker These also serve as a regression net for the lib pipeline itself — any future phase change (constant folding, etc.) lands in one place and these tests exercise it for both the structured and the legacy entry points.

Make the typechecker visible and toggleable in the REPL: - --no-typecheck CLI flag (anywhere in argv) skips typechecking for the session. Default: on. Flag is stripped before the existing --import-csv / --import-json positional handling so they continue to work unchanged, including in combination. - Startup banner adds 'Typechecker: on' or 'off' so the user knows. - Compile path uses compile_query_with_diagnostics when typecheck is on and compile_query_unchecked when off. - Errors are now phase-labeled at the prompt: 'Parse error: …' from CompileError::Parse 'Type error: …' from CompileError::Type (joined) Previously both surfaced as 'Parse error', which was misleading. - Typechecker warnings (e.g. 'Filter expression has type int, which is not a boolean'; typo'd attributes under closed schemas) print one per line as 'warning: …' before the query runs. Warnings don't block — the query still executes. - Usage text mentions the new flag. Manual smoke against examples/movies.gdb confirms: - 'MATCH (m:Movie) RETURN m.title' → 38 rows, no diagnostics - 'MATCH (m:Movie) WHERE x.foo = 1 RETURN m.title' → Type error: x not found - '(x WHERE 1)' → warning: not-a-boolean, then 0 rows - Same queries under --no-typecheck → all run, no errors or warnings

Run cargo fmt over the files added on the typing/checker line of work that postdated the fmt-CI PR (#2). Those files weren't in the tree when the CI fmt run produced its baseline, so they remained unformatted: examples/typecheck_demo.rs, examples/typecheck_repl_smoke.rs, src/lib.rs (new diagnostics types), src/typing/{checker, path_type, type_environment, mod}.rs, and the new tests (compile_diagnostics, typecheck_smoke, typecheck_test). No semantic changes. All 198 tests still pass after fmt.

Now that the LDBC loader exposes `id` as a queryable property (parent commit), the IC2 query can anchor by `Person.id` directly — same as the spec. Changes: - `WHERE p.firstName = ... AND p.lastName = ...` → `WHERE p.id = ...` - `RETURN ...firstName, ...lastName, c.content, c.creationDate` → `RETURN friend.id, ...firstName, ...lastName, c.id, c.content, c.creationDate` (matches the spec's RETURN modulo `coalesce`). - `PARAMS` is now five real `Person.id` values from SF0.1 instead of `(firstName, lastName)` workaround pairs. Doc reflects the upgraded fidelity: replaces the old "divergences fixed" section with a 9-row honest divergence table covering every surface-syntax-or-semantic difference. Net divergences from the spec are reduced to two with real semantic effect: - **#6 (`coalesce`)**: gqlite has no `coalesce` builtin. Posts with empty `content` but non-empty `imageFile` return blank rather than the imageFile path. Few rows in SF0.1. - **#7 (no ORDER BY)**: gqlite parser doesn't have ORDER BY yet. Affects *which* 20 rows you see under LIMIT, not wall time. All other table rows (#1, #2, #3, #4, #5, #8, #9) are surface- syntax or naming differences with no semantic divergence — same query, just spelled to fit gqlite's grammar / data conventions. ### Result snapshot (SF0.1, single-iter) | Person.id | resolves to | rows | wall time | |-----------------|----------------|------|-----------| | 933 | Mahinda Perera | 20 | 108.5 s | | 1129 | Carmen Lepland | 20 | 100.6 s | | 8796093023296 | Hồ Chí Loan | 20 | 97.4 s | | 21990232555524 | Bryn Davies | 20 | 99.2 s | | 32985348833865 | Cheng Yu | 20 | 107.5 s | Wall times are similar to the previous (less faithful) bench. The optimizer gap (no value-equality predicate pushdown) still dominates; documented in "Things this surfaces". Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* loader(ldbc): expose `id` as a queryable property on every node Previously `load_ldbc_node_file` consumed the LDBC `id` column for the internal node name only, excluding it from the property map. That made LDBC's spec queries un-expressible: every IC1-IC14 query in LDBC SNB Interactive parameterizes by `Person.id`, `Comment.id`, etc. via `{id: $param}` shorthand or `WHERE x.id = $param`. Without `id` as a property, those queries return NULL and the bench has to use surrogate anchors (firstName, lastName) — a real divergence from the spec. The fix: after the existing prop_cols loop, insert `props["id"] = parse_ldbc_value(&id_str)`. The id is also still folded into the internal node name (`"Person:933"`) for cross-file edge resolution; the duplication is ~12 bytes per node (~4 MB extra on SF0.1), required for spec-faithful queries. Why LDBC-only and not the spanner-config CSV loader: that loader follows different conventions (`vid`, `<Label>_id`, etc., per CLAUDE.md) and isn't tied to a workload that requires id-as-property. LDBC is the immediate driver; the spanner loader can argue for the same treatment separately if needed. Verified end-to-end on SF0.1: $ MATCH (p: Person) WHERE p.id = 933 RETURN p.firstName, p.lastName "Mahinda" | "Perera" 1 rows (0.026s) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * bench: add LDBC SNB Interactive IC2 benchmark First slice of the LDBC Social Network Benchmark v1 (arXiv:2001.02299 §6, Interactive Complex reads) — IC2 only ("recent messages by friends"). Other ICs need features gqlite doesn't have: shortest / variable-length paths (IC1, IC3, IC11, IC12, IC14), date arithmetic / OPTIONAL MATCH / ORDER BY / aggregation-with-HAVING (IC4-IC8, IC10, IC13). `src/bin/ldbc_bench.rs` runs IC2 against an SF0.1 .gdb, sweeping a small curated set of `Person.firstName` values (LDBC's `Person.id` is folded into the gqlite node name, not addressable as a property). Reports per-param min / median / mean / max wall time. Typechecker is off — IC queries are valid by construction; comparing typecheck overhead vs runtime is a separate bench. `bench/LDBC_BENCHMARK.md` documents setup (download, decompress, import) and includes baseline numbers from the first run. ### Result snapshot (SF0.1, 327k nodes / 1.48M edges, limit=20) | Param | min | med | mean | max | |----------|-------|-------|-------|--------| | Mahinda | 52 s | 56 s | 55 s | 57 s | | Carmen | 38 s | 50 s | 65 s | 106 s | | Bryn | 84 s | 85 s | 86 s | 88 s | | Cheng | 87 s | 95 s | 94 s | 99 s | These are 50-130x slower than LDBC's sub-second target. The benchmark surfaces the cause: the optimizer's predicate pushdown handles `x.attr is type` (per `docs/implemented-optimizations.md`) but not value-equality predicates like `WHERE p.firstName = 'Mahinda'`. So the firstName filter applies *after* the join enumerates every `Person~knows~Person~hasCreator~Comment` triple — work scales with the cartesian intermediate, not the parameter's selectivity. Documented as a finding for a future optimizer pass — pushing `=` constants into descriptors would collapse this into a 1-2-row anchor on Person, dropping the join's left side from ~28k to a handful. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * bench(ldbc): close 4 IC2 fidelity divergences vs LDBC spec The earlier bench had five divergences from LDBC IC2; four are now fixed without touching gqlite implementation. Only ORDER BY remains (parser feature gap). Fixed: - **Anchor selectivity.** `Person.firstName` → `(firstName, lastName)`. Five params each map to *exactly one* Person in SF0.1 — same per- query selectivity as the spec's `Person.id`, just spelled differently because the gqlite LDBC loader folds `id` into the internal node name rather than a property. - **Message = Comment ∪ Post.** LDBC's `(message:Message)` covers both via Cypher label inheritance, which gqlite doesn't have. Use path-pattern union: `<-[:hasCreator]-(c:Comment) | <-[:hasCreator]- (c:Post)`. `c` binds to whichever arm matches. - **`message.creationDate <= $maxDate` filter.** Was dropped entirely. Now in the WHERE clause; `maxDate` is a const (1340000000000 ms ≈ mid-2012) chosen so the filter retains a sizable fraction of SF0.1 data while still exercising. - **RETURN columns.** Added `friend.firstName`, `friend.lastName`, `c.content` alongside the existing `c.creationDate`. The two `id` columns the spec wants (`friend.id`, `message.id`) remain dropped because they're not addressable as properties; documented. Remaining divergence (out of scope without parser changes): - **No `ORDER BY ... DESC`** — gqlite's parser doesn't have it. Affects which 20 rows you get under `LIMIT 20`, not wall time. Updated `bench/LDBC_BENCHMARK.md` to spell out the spec query, the gqlite query, and the divergence list explicitly. ### Result snapshot (SF0.1, single-iter) | Param | rows | wall time | |-------------------|------|-----------| | Mahinda Perera | 20 | 96.4 s | | Carmen Lepland | 20 | 108.1 s | | Bryn Davies | 20 | 85.0 s | | Cheng Yu | 20 | 105.6 s | | Hồ Chí Loan | 20 | 110.7 s | Wall times are slightly higher than the previous (less faithful) bench because the union arm doubles the right side of the join (~286k hasCreator edges vs ~151k Comment-only). The dominant cost remains the optimizer's missing predicate pushdown on value-equality WHERE; documented in "Things this surfaces". Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * bench(ldbc): use Person.id anchor (full IC2 fidelity) Now that the LDBC loader exposes `id` as a queryable property (parent commit), the IC2 query can anchor by `Person.id` directly — same as the spec. Changes: - `WHERE p.firstName = ... AND p.lastName = ...` → `WHERE p.id = ...` - `RETURN ...firstName, ...lastName, c.content, c.creationDate` → `RETURN friend.id, ...firstName, ...lastName, c.id, c.content, c.creationDate` (matches the spec's RETURN modulo `coalesce`). - `PARAMS` is now five real `Person.id` values from SF0.1 instead of `(firstName, lastName)` workaround pairs. Doc reflects the upgraded fidelity: replaces the old "divergences fixed" section with a 9-row honest divergence table covering every surface-syntax-or-semantic difference. Net divergences from the spec are reduced to two with real semantic effect: - **#6 (`coalesce`)**: gqlite has no `coalesce` builtin. Posts with empty `content` but non-empty `imageFile` return blank rather than the imageFile path. Few rows in SF0.1. - **#7 (no ORDER BY)**: gqlite parser doesn't have ORDER BY yet. Affects *which* 20 rows you see under LIMIT, not wall time. All other table rows (#1, #2, #3, #4, #5, #8, #9) are surface- syntax or naming differences with no semantic divergence — same query, just spelled to fit gqlite's grammar / data conventions. ### Result snapshot (SF0.1, single-iter) | Person.id | resolves to | rows | wall time | |-----------------|----------------|------|-----------| | 933 | Mahinda Perera | 20 | 108.5 s | | 1129 | Carmen Lepland | 20 | 100.6 s | | 8796093023296 | Hồ Chí Loan | 20 | 97.4 s | | 21990232555524 | Bryn Davies | 20 | 99.2 s | | 32985348833865 | Cheng Yu | 20 | 107.5 s | Wall times are similar to the previous (less faithful) bench. The optimizer gap (no value-equality predicate pushdown) still dominates; documented in "Things this surfaces". Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * bench(ldbc): document parameter-generator divergence Previous audit listed personId and maxDate substitutions as if they were independent surface differences. They aren't — they're symptoms of a single methodology divergence: we didn't run LDBC's substitution_parameters generator (ldbc_snb_datagen) and instead hand-picked params by inspection. LDBC's spec example values are SF-specific by design (the id space is bit-packed per SF and event timelines stretch with SF), so spec values from one SF are not expected to work against another. The canonical workflow is to run the generator against the chosen SF and consume its interactive_<n>.txt output. This doesn't affect the bench's timing claim — the join work is the same for any valid Person.id anchor and any maxDate that doesn't cut all rows — but it's a procedural deviation from LDBC methodology that wasn't called out clearly. Adding it as row #10 in the audit table, reframing #8 and #9 as derived from #10, and updating the bench's rustdoc comment to match. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * bench(ldbc): use official LDBC interactive_2 substitution params Closes the methodology divergence flagged in the previous commit. LDBC ships substitution parameters as a separate archive (substitution_parameters-sf0.1.tar.zst on datasets.ldbcouncil.org) because our SF0.1 distribution was generated with parametergenerator.parameters:false. Downloading that archive gives us interactive_2_param.txt: 15 (personId, maxDate) pairs from LDBC's official parameter generator, matching our SF0.1 build because the generator is deterministic on the dataset seed (verified — all 15 personIds resolve in person_0_0.csv). Changes: - src/bin/ldbc_bench.rs: PARAMS replaced with the 15 official pairs verbatim. Each param now carries its own maxDate (the generator's output is per-param), so the global MAX_DATE_MS constant goes away. CSV stdout grows a `max_date` column. Names alongside each ID are resolved from person_0_0.csv only for human-readable output. - bench/LDBC_BENCHMARK.md: audit table collapses #8/#9/#10 into a single row #8 ("uses official LDBC params, no semantic divergence"). Setup grows a §1b for the substitution-params download. Summary bullets updated. Strict description is now "IC2 with no coalesce and no ordering" — the parameter-source caveat is gone. The Results table is currently a re-run note pointing at the old hand-picked numbers as a reference; the official-param run is in progress (one param in: 101s, 20 rows — same order of magnitude as the hand-picked-param run, as expected since the bench's bottleneck is optimizer-bound rather than parameter-bound). A follow-up commit will land the fresh numbers. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * bench(ldbc): document driver / reporting methodology divergence Sweep of LDBC divergences found a bigger one I'd missed: we don't use the LDBC SNB Driver at all. The driver defines the audited measurement regime — workload at target throughput, mix of IC+IS reads and IU updates, multi-minute warmup, randomized param order, audit JSON output, throughput-at-SLA + p99 latency reporting. We have none of that. The bench is a hand-rolled harness: direct Runtime::run_query calls, IC2 only, sequential, no warmup, CSV output, min/median/mean/max wall times. The query and parameter set are still spec-faithful (rows #1-#5, #8 in the audit); the rows returned would match an LDBC-conformant run modulo #6 (no coalesce) and #7 (no ORDER BY). What's NOT spec-faithful is the measurement regime itself — the numbers we report are not directly comparable to throughput numbers in published LDBC audit submissions. Adding this as audit row #9 plus an expanded "Methodology / reporting divergence" section listing the 9 specific differences (harness, workload mix, concurrency, warmup, param order, headline metric, output format, update streams, result validation). Strict description updated: "IC2 with no coalesce, no ordering, and a hand-rolled (non-driver) harness reporting wall-time stats instead of throughput-at-SLA." Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * bench(ldbc): land Results table from official-param re-run Full IC2 sweep against ldbc-sf0.1.gdb with the 15 official LDBC substitution parameters. 1 iter per param, ~20 min total wall. Stats across all 15 params: - min 52.8 s - med 79.7 s - mean 83.0 s - max 146.9 s Same regime as the previous hand-picked-param run (97-108 s for 5 params), confirming the bottleneck is gqlite's optimizer (no predicate pushdown for value equality on `=`, see "Things this surfaces") rather than parameter selectivity. The 3× spread between min and max reflects caller-degree variation plus first-iter page-cache effects, not bench noise. All 15 params returned 20 rows (the LIMIT). Far above LDBC's sub-second interactive target — the optimizer gap is the unblocking work for any future driver-mediated audit. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * bench(ldbc): note bench/impl scope split + no result validation Two clarifications surfaced in review: 1. The optimizer pushdown gap that makes IC2 take ~100s is a real finding but is NOT something this branch will fix. bench/ldbc is bench-only — no changes to src/* implementation. Adding a pointer to which files would need to change (pushdown.rs + descriptor.rs + engine.rs) so a future impl PR has a starting point. 2. We don't validate row contents against LDBC's expected_output archives. Two divergences make any byte-for-byte check fail: #7 (no ORDER BY) means our 20 rows aren't the 20 most recent, and #6 (no coalesce) means imageFile-only Posts show blank. Either alone invalidates a diff -q check. The bench asserts row_count == LIMIT, not row contents. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * bench(ldbc): drop misleading min/med/mean/max labels at N=1 Sweep-review item: with --iters 1, the bench's report() printed "min=X med=X mean=X max=X" with all four columns showing the same number, suggesting statistical content where there is none. Now: N=1 prints "wall=X.XXms (n=1, --iters >=3 recommended for stable median)". N>=2 keeps the full spread display with an explicit n=N. Bench-side fix only; doesn't touch gqlite impl. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * bench(ldbc): add SPEC_CHECKLIST.md — verbatim quotes from LDBC spec Source-of-truth doc that maps each LDBC spec requirement to whether it applies to research-paper use vs audited submissions, and what our bench currently does. All quotes verbatim from github.com/ldbc/ldbc_snb_docs (main branch), with file citations. Headline finding: the spec itself sanctions a two-tier model. benchmark-checklist.tex says "For most research papers, fully audited results are unrealistic and even unaudited results can provide insight" and lists exactly what info to disclose for research use. We're firmly in that tier and the spec says that's fine. Practical use: when someone challenges "is this a real LDBC bench?" the answer is now "yes, at the research-paper tier the spec defines, with these specific gaps documented." Each gap (validation, ACID, driver, run duration, scale) carries a verbatim quote and a reason it's audit-only or out-of-scope here. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * bench(ldbc): IC2 syntax fidelity + paper-tier disclosure Two changes from the post-spec-fetch review: 1. Switch IC2 query to use `(p: Person {id: $personId})` shorthand, which is verbatim spec syntax. gqlite's parser accepts it and elaborates to `(p: Person) WHERE p.id = $personId` internally; identical post-elab behavior. Closes audit-row #1 entirely — that row was a self-imposed stylistic divergence, not a gap. 2. Add "Paper-tier disclosure" section answering all 10 items from `benchmark-checklist.tex` directly. Each answer is one line, no hedging. Replaces what would otherwise be scattered across the doc: - Cross-validation: No (with reason: coalesce + ORDER BY gaps) - Persistent storage: Yes - ACID: No - Fault-tolerance: No - Warmup rounds: 0 - Execution rounds: 3 per param × 15 params - Summarization: per-param med + across-param med headline - Load phase included: No (separate) - SUT-dev consulted: gqlite is our own system The maxDate predicate stays in WHERE since `<=` isn't equality; the descriptor shorthand only handles equality. That's a spec- faithful split (the IC2 spec has the `id` filter inside the descriptor and the date filter in WHERE, exactly as we now do). Results table is being re-run with --iters 3 (was 1) for stable medians; lands in a follow-up commit. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * bench(ldbc): --backend memory|lazy|disk + RSS measurement (sysinfo) Per professor's request: compare gqlite's three GraphAccess backends on the same IC2 query so the paper can show the speed-vs-RAM tradeoff. - Memory (Graph from CSV): everything in RAM, no disk I/O at query time. Uses csv_loader::load_from_ldbc_csv_dir. - Lazy (LazyGraphStore): topology + label index in RAM, props via per-process LRU page cache + OS page cache. - Disk (DiskGraphStore): topology in RAM, everything else from disk per access (via OS page cache). Bench grows --backend and --csv-dir flags. Memory backend takes the CSV directory directly; Lazy/Disk take the .gdb path. Each run reports both per-param wall time and peak RSS delta over baseline. sysinfo crate (0.30) added for cross-platform RSS reading. Used only by ldbc_bench. RSS is OS-reported working-set; on Windows that includes mapped file pages, so Lazy/Disk numbers reflect both the in-process structures AND any portion of the .gdb that the OS has resident — disclosed in the doc. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * bench(ldbc): land 3-backend Results table + cache-cost note Memory vs Lazy vs Disk on SF0.1 IC2, per professor's request. Median per-param: memory 4.65s, disk 7.71s, lazy 8.42s. Memory wins on speed by ~2x, costs +605 MiB; Disk uses least RAM and loads fastest but no in-process caching. Three real findings, all worth flagging in the paper: 1. Memory beats Lazy/Disk by ~2x. Page-fault-free queries. 2. Lazy and Disk are within 10% on speed despite Lazy having an in-process LRU cache. The OS file cache dominates; Lazy's process-level cache adds bookkeeping without much benefit. 3. Cold-cache vs warm-cache cost differs by ~10x. The very first run after machine restart shows 53-147s per param (median 80s); subsequent runs benefit from OS-cached .gdb pages and drop to 7-10s per param. Documented as context for the warm-cache table; cold-cache numbers preserved for paper use. Caveat documented: sysinfo's RSS reading on Windows includes mmap'd file pages, so Lazy/Disk RSS deltas are confounded by which portion of the .gdb the OS has resident. Memory's +605 MiB delta is purely allocated heap and is the most reliable absolute number in the table. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * bench(ldbc): per-IC TOML query catalog + dispatching runner PR #1 of the LDBC runner refactor (per the planning doc). Architecture: - bench/ldbc-queries/ic<n>.toml — one file per IC (1-14). IC2 carries query template + params_file ref + divergences; the other 13 carry status="blocked" + required_features + blocked_reason. - src/bin/ldbc_bench.rs rewritten to load TOML files, parse the LDBC param file, and substitute {paramName} placeholders in the query template. No more hardcoded PARAMS / query string. CLI: --ic <n>|N,M|all|blocked which IC(s) to run; default 2 --backend memory|lazy|disk default lazy --params-dir <dir> default bench/data/substitution_parameters-sf0.1/... --queries-dir <dir> default bench/ldbc-queries --iters N default 3 --limit N default 20 --csv-dir <dir> required only with --backend memory `--ic blocked` prints the inventory of blocked ICs with their required_features (no bench run). Useful as a status report: "what would we need to add to gqlite to run these?" New deps: toml = "0.8" (small, well-maintained; multiline strings in TOML are nicer than JSON for query templates). Per professor feedback: Memory backend stays available in the bench binary for smoke-testing but is dropped from the headline Lazy-vs-Disk comparison in LDBC_BENCHMARK.md (Memory loads everything from CSV at every startup — unrealistic for a database product). Same numbers as the previous IC2-only bench; this commit is purely the architecture change. Adding a new IC (when blocking gqlite features land) = drop a TOML file, no bench code changes. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * bench(ldbc): trim docs — drop audit-tier discussion + non-divergences Per feedback: docs were too verbose with audit-tier content we don't care about and "divergences" that turned out to be no-ops. - Delete LDBC_SPEC_CHECKLIST.md entirely (314 lines of audit vs research-paper analysis with verbatim spec quotes — irrelevant since we're not auditing). - Drop the "Methodology / reporting divergence (#9, expanded)" section from LDBC_BENCHMARK.md (driver, throughput-at-SLA, ACID, audit JSON — all audit-tier). - Drop the "Paper-tier disclosure" section's 10-row spec-checklist table (mostly no/no/n.a. answers; not useful when we're not arguing for compliance). - Trim the divergence audit from 9 rows to 2: only the real divergences (#6 no coalesce, #7 no ORDER BY). The other 7 rows were "None — same as spec" or "fixed in earlier commit" and weren't divergences anyway. - Trim the RSS caveat and cold-cache notes to a few lines each. - Remove the long "out of scope" rationale block on the optimizer pushdown gap; reduced to a 4-line "Fix:" pointer to which src/* files would need to change. LDBC_BENCHMARK.md goes from 428 -> 324 lines. Combined with the deleted SPEC_CHECKLIST.md, total bench-doc reduction is ~420 lines. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * bench(ldbc): note LIMIT-keyword TODO in ic2.toml When gqlite's parser gains a LIMIT keyword, append `LIMIT 20` to the query template and drop --limit from the bench CLI. Until then the runtime row cap (Runtime::run_query's second arg, exposed as --limit) is the only way to enforce the spec's LIMIT 20. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * bench(ldbc): rewrite README to be how-to-run only Per feedback: the markdown was trying to do too much. - Divergences are already in each ic<n>.toml's [divergences] table - Results / findings / cold-warm analysis belong in the paper, not the bench README - The README's job is "how does someone reproduce this run" Cuts: - IC2 fidelity-vs-spec section (the 2-row divergence table is now redundant with ic2.toml; the spec quote is in the TOML's spec link) - Results section (Lazy-vs-Disk table, per-param times, RSS caveat, cold-cache note — all paper material) - "Things this surfaces" (optimizer pushdown finding — paper material) - "Architecture" section (TOML schema is self-documenting) What stays: - Status: 1 implemented, 13 catalogued, point at toml inventory - Setup: download dataset, download params, build .gdb - Run: invocation examples + flag table + output format - Add new IC: 2-step recipe for promoting a blocked TOML - Backends: one paragraph distinguishing the three options - See also: pointers to the toml files, runner source, and paper Goes from 324 to 133 lines (-258 +67). Combined with the earlier trim and the deleted SPEC_CHECKLIST.md, total reduction across the two LDBC bench docs is ~640 lines. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * bench(ldbc): bench_setup binary for one-command data prep Replaces the README's three manual setup steps (curl + python zstd + gqlite import) with a single binary that does all three. Idempotent: each step checks for its output and skips if present. Re-running bench_setup with everything in place is a no-op (just prints "[skip] ..." for each step). Pass --rebuild to force-rebuild the .gdb. Three deps added (only used by this binary): - ureq 2.12 (default-features = false, "tls"): small blocking HTTP client, ~one network call per archive - zstd 0.13: streaming decompressor for the .tar.zst archives - tar 0.4: streaming tar extractor The .gdb build step shells out to the existing `gqlite --import-ldbc-csv` binary rather than calling csv_loader directly, so the .gdb is produced by exactly the same code path users run from the REPL — no risk of bench_setup and gqlite drifting on the .gdb format. Output: data-dir: bench/data downloading https://datasets.../ldbc-sf0.1.tar.zst done: 17.5 MiB in 0.7s extracting ldbc-sf0.1.tar.zst -> ldbc-sf0.1 done in 0.3s downloading https://datasets.../substitution_parameters-sf0.1.tar.zst done: 198.4 KiB in 0.2s extracting substitution_parameters-sf0.1.tar.zst -> substitution_parameters-sf0.1 done in 0.0s building ldbc-sf0.1.gdb from .../social_network-sf0.1-CsvBasic-LongDateFormatter done: ldbc-sf0.1.gdb in 39.1s ✓ setup complete README's "Setup" section collapsed from 3 multi-block steps to a single command. Manual steps still recoverable from this commit's diff if anyone needs them. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * bench(ldbc): document output format + rename ic CSV column to params Two related fixes after a user pointed out the CSV format wasn't self-explanatory: 1. The third CSV column was named `ic` but actually carries the substitution-param values (the IC name is already in column 1). Renamed to `params`. Reading the column as `params` matches the stderr summary's parens, e.g. `(19791209300143|1354060800000)`. 2. README's "Output" section was 4 lines listing column names with no semantics. Expanded to ~50 lines covering: - Why two streams (stdout=data, stderr=chatter, Unix convention) - Column-by-column meaning of the CSV - What `(personId|maxDate)` and `(n=3)` in the stderr summary mean - Worked cross-check showing min/median/mean/max of three CSV rows reproduces one stderr summary line The CSV header rename is breaking-but-trivial: any existing scripts reading `ic` would now get `params` and need a one-character change. Since the bench has shipped maybe 2 days ago and nothing is pinned to it, this is the right time. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * bench(ldbc): rename param_idx -> row + done summary at end of run Two small UX fixes: 1. The CSV's `param_idx` column was the same number as the stderr summary's `row#N` label but with a different name. Renamed to `row` so the two streams use the same vocabulary. 2. After the last param of the last IC, the bench now prints: ✓ done — 1 IC(s) ran to completion IC2: 15 rows × 1 iter(s) = 15 runs; across-row median 11506.22ms (range 9109.93-16517.23ms) Peak RSS during query loop: 84.5 MiB (+75.4 MiB over baseline) Confirms the run finished cleanly (vs got killed mid-row) and gives the across-row median without grepping. Per-IC summary line so multi-IC runs (when more get implemented) compose cleanly. Implementation: `report` now returns the per-row median; `run_one_ic` collects them into an `IcSummary { ic_id, iters, row_medians_ns }` and `run_targets` builds the final aggregate from the Vec<IcSummary>. README's Output section updated for both changes (column rename in the CSV table; done-summary section added with worked example). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * bench(ldbc): pre-merge fixes from review (A1-A4 + warmup + --help) Five fixes from the pre-merge review on the runner: A1. Type-aware param substitution. New optional `param_types` array in the per-IC TOML; per-column tag is "int" (raw substitution, default) or "string" (wraps in single quotes, rejects values containing `'` since gqlite's lexer has no escape syntax). Unblocks future ICs with string params (firstName, country names, tag names) without putting quote-handling onto the TOML author. ic2.toml is unchanged — defaults to all-int. A2. Duplicate IC id detection in load_queries. After sort, scan adjacent for matching ids; fail fast naming both source files. Catches the footgun of copying ic2.toml without renumbering. A3. Median for even N. Previously returned sorted[n/2] (upper of two midpoints) for even-sized samples — neither the true median nor a labeled percentile. New median_of() helper does the standard average for even N. Doesn't affect --iters 3 (odd); fixes --iters 2/4/6. A4. Schema validation at load time. New IcQuery::validate() enforces that "implemented" carries query+params_file and "blocked" carries blocked_reason, plus param_columns/param_types length agreement. Failure now surfaces before any backend is opened — not 8 minutes into a multi-IC run. B4. --warmup N flag. Per-row warmup that runs N extra iters of the query before the timed ones, discarding their measurements. Banner now reads "{warmup}+{iters} iters/param (warmup+measured)" so the count is visible. Mirrors typecheck_bench's --warmup. Default 0 (preserves existing behavior). Plus a small ergonomic fix: - `ldbc_bench --help` (with no positional .gdb) now prints usage instead of trying to open "--help" as a database file. What was not fixed (deliberately, low impact for now): - B1 CSV ambiguity if param values contain `|` or `;`: real issue but no LDBC param file has these chars in our scope. - B2 peak RSS sampling between rows, not during query: watchdog thread isn't worth the complexity. - B3 mmap inflates Windows RSS: documented in the README; can't fix from the bench side. - B5 hardcoded SF0.1 in bench_setup: out of scope for now. - B6 no checksums on downloads: would need LDBC's published hashes; out of scope. - B7 --ic 2,2 doesn't dedup: not a bug, just undocumented; fine. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * bench(ldbc): second-pass review fixes — stale docs + warmup honesty Findings from a second-pass review of the runner + README: Stale doc references: - Rustdoc said `--show-blocked` mode (real flag is `--ic blocked`). - Usage text recommended `--warmup 1` to absorb cold-cache cost. Empirically wrong for our scope: looking at iter-level data from the SF0.1 iter=3 run, iter 0 is NOT systematically the slowest within a row — variance is general jitter, not first-iter cold cache. The OS page cache is cross-row, so once row#0's iters populate it, subsequent rows + iters all hit warm pages. Reframed: --warmup is available for parity with typecheck_bench and for larger SFs where the cache can't hold the working set, but at SF0.1 it's overhead. - Warmup loop's inline comment claimed "iter 0 cold-cache"; updated to match the empirical finding. Stale README: - The flag table was missing --warmup entirely (added in a previous commit but the doc table didn't get the row). - The stderr summary example showed `3 iters/param` but the new banner format is `0+3 iters/param (warmup+measured)`. - "Adding a new IC" recipe didn't mention `param_types` for ICs with string params (IC1 firstName, IC3/IC11 country names, IC6 tagName, etc.). Now mentions `param_types` and the gqlite no-escape-for-quotes caveat. What's NOT a stale-doc issue (verified during the pass): - All `# Spec: ...` URLs in TOMLs point to ldbc/ldbc_snb_docs main branch which is stable. - The `_ic` parameter in `report()` is used in the eprintln line. - Edge cases (--iters 0, --warmup with no iters, etc.) handled gracefully via the `is_empty()` skip in the final summary. - `param_columns` doesn't strictly need to match the param file's header — the runtime reads the file's own header. Documented. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * bench(ldbc): real fixes from a deeper review pass + 13 unit tests The previous "second-pass review" was thin (mostly stale-doc fixes). A proper adversarial pass surfaced four substantive issues: C1. Arg parsing crashed with index-out-of-bounds when a flag had no value (e.g. `ldbc_bench db.gdb --iters` with nothing after). New `need_value` helper checks bounds and prints a clear "{flag} requires a value" before exiting. C2. "All rows failed" was reported as `✓ done` with no per-IC line. A user could read this as "ran successfully, no rows" instead of "every row erred." IcSummary now tracks `failed_rows`; final summary prints `⚠ done with errors` when any IC had zero successful rows, and per-IC lines surface failure counts even when row_medians_ns is empty. C3. `substitute()` silently left unmatched `{...}` placeholders if the query template referenced a column not in the param file's header. The gqlite parser would then fail with "Parse error" — a misleading symptom for the actual problem (TOML/header mismatch). New `find_unsubstituted_placeholder` post-pass scan catches bare `{ident}` tokens — careful to skip the legitimate `{id: 933}` descriptor shorthand and `{1,3}` repetition syntax. Verified end-to-end: changing ic2.toml's query to reference a nonexistent {countryName} now produces: SUBSTITUTE ERROR on row 0 (params: ...): unsubstituted placeholder {countryName} remains after substitution; check that the query template's column names match the param file's header (personId, maxDate) C4. Zero unit tests for 700 lines of code. Added a tests module covering median_of (incl. the even-N regression A3 caught earlier), substitute (int/string/bad-type/quote-rejection/ unmatched-placeholder), and find_unsubstituted_placeholder (true positives + false-positive guards on descriptor shorthand and repetition quantifiers). 13 tests, all green. Plus reframed the warmup recommendation honestly. Earlier commits said "empirically not needed at SF0.1" based on 6 data points — too strong a claim from too little data. New text acknowledges OS page-cache mechanics (real, well-known) but is calibrated about what we've actually measured: "varies by OS / RAM / storage; on a warm machine with free RAM >= dataset size it usually doesn't help, on cold or cache-constrained machines it can absorb a cold-iter spike." Default stays 0; recommendation is "raise it if the per-row summary shows iter 0 consistently dominating." What's still NOT fixed (deliberately, low impact): C5 TOML schema versioning, C6 absolute-path footgun in `params_file`, C7 .gdb freshness vs loader version. Each is documentation territory or a system-wide gqlite concern. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * bench(ldbc): final pre-merge fixes — arg parsing + idempotency Three more issues found in the fourth review pass: E1. The --ic value parser still used .expect() (panic with raw Rust error) for non-numeric values like `--ic 2,abc`. Inconsistent with the C1 fix from the previous round. Now a clean error: "invalid --ic value \"2,abc\": invalid digit found in string. expected: N, N,M,K, all, or blocked". F1. bench_setup's arg parser had the same C1-style index-out-of-bounds risk (`bench_setup --data-dir` with no value would panic). Added the same `need_value` helper used in ldbc_bench. F2. bench_setup's idempotency check was "does the directory exist?". A partially-extracted directory (interrupted run, network drop mid-extract) would skip re-extraction and feed garbage data into the .gdb build. Now checks for a known sub-path that only exists post-successful-extract: - dataset: csv_dir/dynamic/ must be a directory - params: params_inner/interactive_2_param.txt must be a file Detected the partial-extract case in a synthetic test (empty csv_dir → bench_setup correctly tried to re-download instead of skipping). 13 unit tests still pass. End-to-end smoke tests pass. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * bench(ldbc): loop-review fixes — error handling, idempotency, lint, doc User asked for a loop-review until I find no issues. Multiple passes; this commit is everything that surfaced. Pass 1 — `.expect()` panics replaced with clean exits: - load_params on empty file (ldbc_bench) - fs::create_dir_all on the data-dir (bench_setup) - env::current_exe lookup (bench_setup) - fs::remove_file on --rebuild (bench_setup) Pass 2 — TOML schema strictness: - #[serde(deny_unknown_fields)] on IcQuery so a typo like `parms_file` produces an immediate parse error naming the unknown key, instead of silently leaving params_file=None and surfacing later as "missing `params_file`". Pass 3 — three more `.expect()` on backend dispatch: - csv_dir on memory backend → "--csv-dir is required..." - csv_loader::load_from_ldbc_csv_dir → propagates the io::Error - LazyGraphStore::open / DiskGraphStore::open → ditto Pass 4 — README's flag table had stale strong claim "--warmup empirically doesn't help at SF0.1" — softened to match the calibrated runner help text (varies by OS / RAM / storage). Pass 5 — `--iters 0` was silently allowed but produced no output: the bench would run warmup iters (if any), record zero measured iters, emit no CSV, and print "✓ done" with no per-IC line. Now rejected at arg parse with "--iters must be >= 1". Pass 8 — clippy warnings: - cmp_owned: `data_dir == PathBuf::from("...")` → `Path::new` - doc_overindented_list_items: re-indented two doc bullets Pass 9 — `--ic 99` (no match) printed "✓ done — 0 IC(s) ran to completion" which falsely implied success. Now prints "⚠ done — 0 IC(s) ran (none of the requested ICs were implemented in the catalog). Use --ic blocked to see the inventory." 13 unit tests still pass. cargo clippy clean. cargo fmt clean. End-to-end smoke (--ic blocked, --ic 2, --ic 99, --ic 0, --ic 2 with bad value) all produce clean error/output paths. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * bench(ldbc): rewrite stderr docs — full sample + RSS/baseline glossary Two issues a reader pointed out in the README's stderr section: 1. The example was incomplete — it skipped the setup lines (the "Loaded N IC definitions / RSS baseline / Loading .gdb / RSS after open" header) and the "✓ done" final summary that the bench actually emits. README only showed the per-row middle. Now shows a complete stderr trace from --ic 2 --iters 3, top to bottom, so readers can match what they see live to what's documented. 2. RSS and "baseline" were used in the example without definition, leaving readers (correctly) confused about what the numbers mean. New "What RSS and baseline mean" subsection: RSS is OS- reported RAM use; baseline is process startup; the "+X MiB over baseline" form makes per-step cost readable without manual subtraction. Worked example using the actual sample numbers. Per-row docs (min/med/mean/max meaning, sample-size note when n=1) unchanged — those were already correct, just buried under the incomplete example. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

The LTJ TripleIndex was being rebuilt from scratch on every call to run_join / run_concat_pattern (engine.rs:561, :633), even when the graph hadn't changed. For SF0.1 that's a ~700ms tax on every query — reading 1.5M directed + undirected edges through the page cache and sorting six orderings of the resulting triples. The CLAUDE.md `Current limits` section already flagged this as #4: "TripleIndex rebuilt per query: no cross-query caching (could go on Runtime with a RefCell)." This commit is exactly that change. Add `triple_index: RefCell<Option<TripleIndex>>` to Runtime. A new Self::triple_index() lazily builds-and-caches on first use; both LTJ call sites borrow it through the cache instead of allocating. The graph is read-only over a Runtime's lifetime, so the index never needs invalidation. Measured on `bench/data/ldbc-sf0.1.gdb`: Same query 3× back-to-back (single-branch IC2): Before: 0.709s, 0.681s, 0.693s (linear — no caching) After: 0.709s, 0.001s, 0.001s (700× warm-query speedup) LDBC IC2 across-row median (full union, 15 params × 3 iters, --warmup 1, --limit 20): No index, no cache: 2417 ms Auto hash + btree, no cache: 1377 ms (1.76×) Auto hash + btree, cached: 8.74 ms (276× total, 158× from cache) For reference: GraphQLite (SQLite extension with Cypher) was 32.82ms median on the same IC2. We're now 3.75× faster on this query. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Per request to actually verify rather than just analyze: tried to build webbery/gqlite on Windows + MSVC 2022 + bundled CMake (3.31.6-msvc6), May 2026. Hit FOUR sequential build failures, each requiring manual intervention: 1. **git-describe submodule version**: shallow clone breaks libmdbx's CMake tag-matcher. Worked around by --unshallow + synthetic tag. 2. **flex/bison not on PATH for MSBuild**: README claims the bundled `tool/{flex,bison}.exe` "needs no dependency" but MSBuild doesn't honor `WORKING_DIRECTORY` for cwd-based binary lookup. Worked around by pre-generating parser/lexer manually AND prefixing `tool/` to PATH for the build invocation. 3. **libmdbx version-mismatch** in CMake-templated `version.c`: `MDBX_VERSION_MAJOR/MINOR` mismatch between mdbx.h (0,11) and the generated file (0,0) fires a preprocessor `#error`. Worked around by patching the generated version.c. 4. **MASM assembly file not built/linked**: project contains x86_64 coroutine assembly stubs but CMake doesn't `enable_language(ASM_MASM)`, so MSBuild compiles the .cpp consumers but never assembles the .asm files into .objs. Linker fails with `LNK1181: cannot open input file 'jump_x86_64_ms_pe_masm.obj'`. We stopped at #4. The point isn't "Windows is hard" — every other system in this bench (kuzu, graphqlite, gqlite) builds clean from a fresh `git clone` on the same Windows + MSVC 2022 setup. webbery's build issues are specific to its pre-2024 CMake config and unmaintained submodules. **Combined with the grammar-level blockers (no LIMIT, no label disjunction), the build issues are the second- order signal — even if all four were patched, the DSL still can't express IC2 faithfully.** The verdict is unchanged: discard. But the SKIPPED.md is now empirically grounded rather than just grammar-reading. A reviewer who clones the repo will hit the same wall.

Felipe705x added 4 commits April 27, 2026 11:58

Felipe705x closed this Apr 27, 2026

Felipe705x deleted the chore/fmt-typecheck-repl branch April 27, 2026 16:11

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Chore/fmt typecheck repl#4

Chore/fmt typecheck repl#4
Felipe705x wants to merge 4 commits into
mainfrom
chore/fmt-typecheck-repl

Felipe705x commented Apr 27, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

Felipe705x commented Apr 27, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant