Workspace-Wide Benchmarking Foundation #1228
Merged
Lands the foundation of the workspace's new bench infrastructure: a shared
chassis crate, three contributor docs, an initial regression-budget
schema, and a clean retrofit of all five pre-existing benches onto the
new helpers. No behavior change to any bench; existing scenarios produce
byte-identical output (chassis tests pin the determinism contract on
every lifted generator).
This is the first commit on `feature/stable-bench`. Subsequent commits on
this branch will add new benches against currently-uncovered hot paths
(bench-2: `import_bulk`, `transact_commit`, `query_cold_reload`; bench-3:
reindex/incremental/novelty-replay; bench-4: BSBM-shape query +
vendored fixture; bench-5: gated CI job + initial baselines). The
nightly workflow + remote fixture host land separately as
`bench-nightly`.
## What's new
- `fluree-bench-support/` — new internal crate, `dev-dependency`-only.
Eleven modules (~1,500 LoC), 37 unit tests. Public surface:
- `init_tracing_for_bench()` — opt-in stderr subscriber under
`FLUREE_BENCH_TRACING=1`. Replaces an 18-line block previously
duplicated verbatim in three benches.
- `next_ledger_alias(prefix)` — atomic, never-reused alias of the form
`bench/{prefix}-{n}:main`. Replaces three independent
`LEDGER_COUNTER: AtomicU64` patterns.
- `bench_runtime()`, `BenchProfile`, `BenchScale`,
`current_profile()`, `current_scale()` — env-driven knobs
(`FLUREE_BENCH_PROFILE`, `FLUREE_BENCH_SCALE`,
`FLUREE_BENCH_RUNTIME`).
- `gen::people` — Person/Company graph generator + JSON-LD/Turtle
serializers (lifted from `insert_formats.rs`).
- `gen::vectors` — `f64` vector generators, both deterministic-from-
seed and RNG-driven (lifted from `vector_math.rs` and
`vector_query.rs`).
- `gen::corpora` — paragraph templates + `random_paragraph()` (lifted
from `fulltext_query.rs`).
- `budget` — `RegressionBudget` schema + loader for
`regression-budget.json` + `check()` helper. Reconciler stub for
bench-5.
- `fixtures` — workspace-root `fluree-bench-support/fixtures/`
resolution. `load_or_generate()` body lands in bench-4.
- `report` — opt-in markdown-style end-of-run summary tables.
- `tracing::BenchSpanLayer` — skeleton for span-capture-to-file; full
impl in a follow-up.
- `templates/BENCH_TEMPLATE.rs` — working bench skeleton with TODO
markers; copy-this-and-edit starting point per the contributor guide.
- `BENCHMARKING.md` (workspace root) — orientation: what benches exist,
how to run, env vars, output format, regression budgets. README points
at it.
- `docs/contributing/benches.md` — deep contributor guide mirroring
`tracing-guide.md`'s structure. Six-step workflow for adding a bench,
category conventions, common patterns, gotchas, debugging a flaky
bench, span-capture instructions, review checklist.
- `regression-budget.json` (workspace root) — per-bench, per-scale
percentage regression CI accepts. Schema-defining initial values; real
baselines land with the first nightly run.
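As an illustration of the alias helper listed above, a minimal sketch of an atomic, never-reused alias generator might look like this. The function name and the `bench/{prefix}-{n}:main` format come from the description; the static's name, the `Relaxed` ordering, and the zero starting value are assumptions, not the chassis's actual code.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Process-wide counter. Assumption: starts at zero and only ever increments.
static LEDGER_COUNTER: AtomicU64 = AtomicU64::new(0);

/// Returns an alias of the form `bench/{prefix}-{n}:main`.
/// `fetch_add` hands every caller a distinct `n`, so no alias repeats
/// within one process, even when benches run on several threads.
pub fn next_ledger_alias(prefix: &str) -> String {
    let n = LEDGER_COUNTER.fetch_add(1, Ordering::Relaxed);
    format!("bench/{prefix}-{n}:main")
}
```

Because uniqueness only depends on the counter, two calls with the same prefix still return different aliases.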
## Retrofits
Every existing bench compiles green and runs `--test` green after the
substitution. The chassis tests pin byte-identical output for every
lifted generator.
- `fluree-db-api/benches/insert_formats.rs` — drops the 18-line
`init_tracing_for_bench`, the `LEDGER_COUNTER: AtomicU64` static, the
`PersonData/CompanyData/TxnData` types, and three generator
functions (`generate_txn_data`, `txn_data_to_jsonld`,
`txn_data_to_turtle`) in favor of chassis imports. Four
`LEDGER_COUNTER.fetch_add(...)` call sites become
`next_ledger_alias("...")`.
- `fluree-db-api/benches/vector_query.rs` — drops the duplicated
`init_tracing_for_bench` and the local `random_vector(rng, dim)`;
imports `gen::vectors::rng_one as random_vector` (same signature,
byte-identical output).
- `fluree-db-api/benches/fulltext_query.rs` — drops the duplicated
`init_tracing_for_bench`, the inline `PARAGRAPH_TEMPLATES` and
`EXTRA_VOCAB` constants (~120 lines), and the `random_paragraph`
function. Imports `gen::corpora::random_paragraph` (same signature).
- `fluree-db-query/benches/vector_math.rs` — drops the local
`random_vectors(dim)`; imports `gen::vectors::hashed_pair as
random_vectors`. No tracing/ledger surface to retrofit (pure math).
- `fluree-db-spatial/benches/spatial_bench.rs` — chassis dev-dep wired
via `use fluree_bench_support as _;` so future spatial benches can
opt in. Spatial-domain geometry generators stay co-located (not yet
reused elsewhere).
## Wiring
- Workspace `Cargo.toml` adds `fluree-bench-support` as a member.
- `fluree-db-api`, `fluree-db-query`, `fluree-db-spatial` each gain
`fluree-bench-support` in `[dev-dependencies]`.
- Workspace `README.md` adds one line under "Documentation" pointing
at `BENCHMARKING.md`.
## Verification
- `cargo test -p fluree-bench-support --lib` — 37 passed
- `cargo check --workspace --benches` — clean
- `cargo clippy -p fluree-bench-support --lib --tests` — clean
- `cargo bench -p fluree-db-query --bench vector_math -- --test` —
21 scenarios all `Success`
- Each retrofitted bench compiles in `--release` (`cargo bench --no-run`)
## Not yet done (deferred to follow-up commits / PRs)
- `validate_against_workspace()` — full Cargo-toml ↔ budget-JSON
reconciler. Stub today; lands with bench-5 (CI gate).
- `BenchSpanLayer` — file-mode tracing
(`FLUREE_BENCH_TRACING=file:./out.json`). Today this falls back to
stderr with a `tracing::warn!`.
- `fixtures::load_or_generate` — body. Today returns a `FixtureRef`
placeholder. Vendored data lands in bench-4; remote fetch in
bench-nightly.
- BSBM-shape and other new benches (bench-2 through bench-4 commits on
this branch).
- `.github/workflows/ci.yml` gated bench job (bench-5 commit).
- Nightly workflow + dashboard (`bench-nightly` PR, separate).
- `vector_math.rs` import path will need re-pointing from
`fluree_db_query::expression::vector_math` to `eval::vector_math`
once `refactor/streamline-query`'s stacked work merges. Trivial
one-line follow-up.
## Plan reference
Designed per `.claude/proposed-work/docs/plan-benchmark-infrastructure.md`
§4.5 (Extensibility and contributor onboarding) and §8 (Migration
plan). The chassis design absorbed the survey findings in §2.1
("Concrete patterns observed").
Three new benches in `fluree-db-api/benches/`, each exercising a hot path
the existing five benches don't cover. All build on the
`fluree-bench-support` chassis introduced in the previous commit, which
keeps each bench file under ~200 lines of bench-specific code.
This is the second commit on `feature/stable-bench`, landing the bench-2
slice of the bench-foundation PR. Subsequent commits add reindex /
incremental-index / novelty-replay benches (bench-3), the BSBM-shape
hot-query bench plus vendored fixture (bench-4), and the gated CI job
plus initial baselines (bench-5).
## What's new
### `fluree-db-api/benches/transact_commit.rs` (196 lines)
Single-commit latency on a fresh and a populated ledger. Distinct from
`insert_formats.rs`, which measures total throughput across many txns;
`transact_commit` measures the per-commit latency that most users see in
production.
- Scenario 1: `fresh_ledger` — commit one small txn against a
freshly-created ledger. Measures pure commit overhead.
- Scenario 2: `populated_ledger` — commit one small txn against a
ledger pre-loaded with `base_nodes` of history. Skipped when
`base_nodes == 0`.
- Scale-driven inputs: Tiny=100×10, Small=1k×10, Medium=10k×10,
Large=100k×10 (base nodes × commit nodes).
- Uses `criterion::iter_batched` so the base-load setup is excluded
from the timed measurement.
- Memory-backed Fluree (`FlureeBuilder::memory().build_memory()`) —
no I/O confound.
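The scale-driven inputs above could be expressed roughly as follows. The tier names and the (base nodes × commit nodes) pairs are taken from the list; the `parse` helper, its case-insensitive matching, and the function name `transact_commit_inputs` are hypothetical and only illustrate the mapping.

```rust
/// Scale tiers named in the description.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BenchScale {
    Tiny,
    Small,
    Medium,
    Large,
}

impl BenchScale {
    /// Parses a value such as the one carried by `FLUREE_BENCH_SCALE`.
    /// Assumption: matching is case-insensitive and unknown values yield `None`.
    pub fn parse(raw: &str) -> Option<Self> {
        match raw.trim().to_ascii_lowercase().as_str() {
            "tiny" => Some(Self::Tiny),
            "small" => Some(Self::Small),
            "medium" => Some(Self::Medium),
            "large" => Some(Self::Large),
            _ => None,
        }
    }
}

/// Returns `(base_nodes, commit_nodes)` for the `transact_commit` scenarios.
pub fn transact_commit_inputs(scale: BenchScale) -> (usize, usize) {
    match scale {
        BenchScale::Tiny => (100, 10),
        BenchScale::Small => (1_000, 10),
        BenchScale::Medium => (10_000, 10),
        BenchScale::Large => (100_000, 10),
    }
}
```

Keeping the mapping in one function means a scenario can skip itself when `base_nodes == 0` without each bench hardcoding its own sizes.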
### `fluree-db-api/benches/import_bulk.rs` (193 lines)
Bulk Turtle import via `fluree.create(id).import(path).execute()`. The
hot path under the hood: Turtle streaming parse → chunked staging →
root assembly → FIR6 root publish.
- Scenario 1: `single_threaded` — `threads(1)`. Baseline that doesn't
confound with parallelism overhead.
- Scenario 2: `default_threads` — exercises parallel-import allocator
and worker-cache. Skipped at Tiny scale where parallel overhead would
dominate.
- Scale: Tiny=1k → Large=200k nodes, each yielding roughly 4–7× as many triples.
- Throughput annotation in `Throughput::Elements(triples)` so
criterion's `thrpt` line reads in triples/sec.
- File-backed Fluree (`FlureeBuilder::file(...)`) with fresh tempdirs
per iteration; `iter_batched` setup wraps the builder construction in
the bench's tokio runtime (`rt.block_on(...)`) because the file-backed
builder requires a running reactor.
### `fluree-db-api/benches/query_cold_reload.rs` (191 lines)
Cold ledger reload latency. Pre-populate a file-backed ledger, drop the
Fluree connection, then time the rebuild when a fresh handle opens the
same ledger.
- Scenario 1: `cold_load` — just `fluree.graph(id).load()`. Measures
the load path: storage read → snapshot decode → novelty replay →
binary-store attach.
- Scenario 2: `cold_load_plus_query` — load + one SPARQL query.
Captures the full "I restarted my application, time to first answer"
user-visible latency.
- Scale: Tiny=200 → Large=50k base nodes. Single populating txn so
scaling is purely "amount of data," not "depth of commit chain."
- Uses `iter_batched` to put populate-and-drop in setup and measure
only the cold open.
## Wiring
- `fluree-db-api/Cargo.toml` — three new `[[bench]]` entries.
- `regression-budget.json` — three new `fluree-db-api.*` entries with
placeholder budgets (`tiny: 10%, small: 5%, medium: 5%`); real
baselines land with the first nightly run (bench-nightly PR).
No new dev-deps; `tempfile` and the chassis are already on
`fluree-db-api`.
## Verification
- `cargo check -p fluree-db-api --benches` — clean
- `cargo bench --no-run` for all three — clean release build
- `cargo bench -p fluree-db-api --bench transact_commit -- --test` —
both scenarios `Success` at small + tiny
- `cargo bench -p fluree-db-api --bench import_bulk -- --test` —
both scenarios `Success` at small; `single_threaded` only at tiny
- `cargo bench -p fluree-db-api --bench query_cold_reload -- --test` —
both scenarios `Success` at small + tiny
- `cargo test -p fluree-bench-support --lib` — 37 passed (chassis
contracts unbroken)
## Gotcha worth flagging for future bench authors
`FlureeBuilder::file(...).build()` — and any other API that touches the
file-backed storage path — requires a running tokio reactor. With
`criterion::iter_batched`, the `setup` closure runs synchronously
outside any `block_on`, so a setup that calls `FlureeBuilder::file(...)`
panics with "there is no reactor running, must be called from the
context of a Tokio 1.x runtime."
Fix: wrap setup work that touches the file backend in
`rt.block_on(async { ... })`. The memory-backed builder
(`FlureeBuilder::memory().build_memory()`) doesn't have this constraint
and works fine in synchronous setup.
This will be added to `docs/contributing/benches.md` as a "Gotchas"
entry in a follow-up commit on this branch.
## Plan reference
Per `.claude/proposed-work/docs/plan-benchmark-infrastructure.md` §8.2
(`bench-2`).
Three new benches in `fluree-db-api/benches/`, each exercising a hot
path that bench-1 and bench-2 didn't cover: full reindex from the
commit chain, incremental indexing via the orchestrator, and cold
reload through deep novelty. The contributor guide grows two new
"Gotchas" entries — the tokio-reactor-in-`iter_batched`-setup pitfall
caught during bench-2, and a note about which workspace clippy lints
catch bench-side mistakes.
This is the third commit on `feature/stable-bench`, landing the bench-3
slice of the bench-foundation PR. Subsequent commits add the BSBM-shape
hot-query bench plus vendored fixture (bench-4) and the gated CI job
plus initial baselines (bench-5).
## What's new
### `fluree-db-api/benches/reindex_full.rs` (145 lines)
End-to-end `fluree.reindex(id, ReindexOptions::default())` against a
file-backed ledger pre-populated in a single txn. The hot path under
the hood: commit-chain replay → flake collection → binary columnar
index build (FLI3 leaves, FBR3 branches) → FIR6 root publish.
- Scenario: `single_txn` — base data committed in one txn so the
measurement scales with "amount of data," not "depth of commit chain."
- Scale: Tiny=200 → Large=50k base nodes; throughput in triples/sec.
- File-backed Fluree per iteration; `IndexConfig` with high reindex
thresholds during populate so background indexing doesn't race the
measured op.
### `fluree-db-api/benches/reindex_incremental.rs` (175 lines)
Orchestrator-driven incremental indexing. Setup brings the ledger to a
state where an index already covers `base_nodes` of data and
`delta_commits` worth of additional commits sit in novelty above it;
the measured op is the `Fluree::trigger_index(...)` call that drives
the orchestrator to extend the index over those novelty commits.
- Scenario: `apply_delta` — exercises the orchestrator's incremental
path against novelty above an indexed base.
- Scale: (Tiny=200×5, Small=2k×20, Medium=10k×50, Large=50k×200) for
(base_nodes × delta_commits). Each delta commit is 10 nodes, so
novelty depth scales with `delta_commits`.
- Setup uses `Fluree::reindex` to establish the baseline index, then
`Fluree::ledger(id)` to reload state with the new index head before
applying the delta commits.
### `fluree-db-api/benches/novelty_replay.rs` (145 lines)
Cold reload latency under deep novelty with **no index attached**.
Distinct from `query_cold_reload.rs`, which scales by amount of data
committed in one txn; this scales by **commit-chain depth**. The
measured cold reload exercises
`fluree-db-novelty::Novelty::bulk_apply_commits` plus per-commit
envelope-delta application.
- Setup uses `FlureeBuilder::file(...).without_indexing().build()` so
every populate-phase commit stays in novelty.
- Scale: Tiny=20 → Large=2000 commits, 10 nodes each.
- Scenario: `replay_chain` — single cold reload that replays the full
commit chain into the in-memory novelty store.
## Doc update folded in
`docs/contributing/benches.md` grows two Gotchas entries:
1. **`iter_batched` setup needs a tokio reactor for file-backed Fluree.**
Caught while validating bench-2: `criterion::iter_batched`'s setup
closure runs synchronously, so `FlureeBuilder::file(...).build()` —
which touches the storage backend during construction — panics with
"there is no reactor running, must be called from the context of a
Tokio 1.x runtime." The doc shows the canonical fix (wrap setup in
`rt.block_on(async { ... })`) and notes that
`FlureeBuilder::memory().build_memory()` doesn't have this constraint.
All three new benches in this commit follow the wrapped-setup
pattern.
2. **Workspace clippy lints apply to bench code.** The workspace
`Cargo.toml`'s `[workspace.lints.clippy]` denies several lints; two
that bit during bench-2 are called out by name —
`needless_raw_string_hashes` (write `r"..."` not `r#"..."#`) and
`uninlined_format_args` (write `format!("{x}")` not
`format!("{}", x)`). Plus a one-liner about running
`cargo clippy --benches` locally before pushing.
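For the two lints named in the second gotcha, a small sketch of the accepted and rejected spellings may help. The function and constant names are hypothetical, and the SPARQL strings are placeholders rather than queries from any bench.

```rust
/// Accepted by `uninlined_format_args`: the variable is inlined in the braces.
pub fn alias_inline(x: u32) -> String {
    format!("ledger-{x}")
}

/// The spelling the lint rejects, kept here only for comparison.
#[allow(clippy::uninlined_format_args)]
pub fn alias_positional(x: u32) -> String {
    format!("ledger-{}", x)
}

/// No `"` inside the literal, so `r"..."` is enough and
/// `needless_raw_string_hashes` would flag `r#"..."#` here.
pub const QUERY_PLAIN: &str = r"SELECT ?s WHERE { ?s ?p ?o }";

/// The literal contains `"`, so the hashes are required, not needless.
pub const QUERY_QUOTED: &str = r#"SELECT ?s WHERE { ?s ?p "x" }"#;
```

Both format spellings produce the same string; the lint is purely stylistic.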
## Wiring
- `fluree-db-api/Cargo.toml` — three new `[[bench]]` entries.
- `regression-budget.json` — three new `fluree-db-api.*` entries with
placeholder budgets (`tiny: 10%, small: 5%, medium: 5%`); real
baselines land with the first nightly run (bench-nightly PR).
No new dev-deps; the chassis and `tempfile` are already on
`fluree-db-api`.
## Verification
- `cargo check -p fluree-db-api --benches` — clean
- `cargo bench --no-run` for all three — clean release build
- `cargo bench -p fluree-db-api --bench reindex_full -- --test` —
scenario `Success`
- `cargo bench -p fluree-db-api --bench novelty_replay -- --test` —
scenario `Success`
- `cargo bench -p fluree-db-api --bench reindex_incremental -- --test` —
scenario `Success`
- `cargo test -p fluree-bench-support --lib` — 37 passed (chassis
contracts unbroken)
- `cargo check --workspace --benches` — clean
## Plan reference
Per `.claude/proposed-work/docs/plan-benchmark-infrastructure.md` §8.2
(`bench-3`). The plan suggested putting reindex benches in
`fluree-db-indexer/benches/` and novelty_replay in
`fluree-db-novelty/benches/`; in practice the user-facing entry points
(`Fluree::reindex`, `Fluree::trigger_index`, `fluree.graph(id).load()`)
all live in `fluree-db-api`, so the benches naturally belong there. A
future commit can add focused micro-benches in the indexer/novelty
crates if the end-to-end versions need a finer breakdown.
A new bench in `fluree-db-api/benches/` exercising warm-cache SPARQL
query latency on a BSBM-shape graph, plus a deterministic BSBM-shape
data generator added to the chassis. Three query scenarios cover three
distinct planner / scan patterns (multi-hop filter, multi-join with
range filter + ORDER BY, group-by + count + HAVING). The dataset is
generated on the fly rather than vendored as a 5 MB Turtle file.
This is the fourth commit on `feature/stable-bench`, landing the bench-4
slice of the bench-foundation PR. The remaining commit (`bench-5`) adds
the gated CI job plus the initial regression baselines.
## What's new
### `fluree-bench-support/src/gen/bsbm.rs` (300 lines)
Deterministic generator for a four-entity graph drawn from the
[Berlin SPARQL Benchmark](http://wbsg.informatik.uni-mannheim.de/bizer/berlinsparqlbenchmark/):
`Vendor`, `Product`, `Person`, `Review`.
- Counts derive from `n_products`: `n_vendors = n/50`,
`n_persons = n/10`, `n_reviews = n*3`.
- Five product types and five countries cycled deterministically.
- Prices in `1_000..=50_999` cents (range chosen so range filters see
meaningful selectivity).
- Ratings cycle `1..=5` across reviews so HAVING / ORDER BY scenarios
have genuine variation.
- `bsbm_data_to_turtle(&data)` renders the graph as a single
well-formed Turtle document with `ex:`, `bsbm:`, and `xsd:` prefixes.
- 7 unit tests pin the determinism contract (chassis suite grows from
37 to 44 passing tests).
### `fluree-db-api/benches/query_hot_bsbm.rs` (230 lines)
Three SPARQL scenarios against a populated, indexed file-backed Fluree:
- **Q3-shape** — multi-hop join + scalar range filter (Electronics
products with reviews rated ≥ 4).
- **Q5-shape** — multi-join with price-range filter and `ORDER BY`
(top-10 products in 5000–25000 cents, ordered by price, with vendor
label).
- **Q9-shape** — `GROUP BY` + `COUNT` + `HAVING` (products with ≥ 3
reviews, ordered by review count).
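The count derivations and rating cycle described for `gen/bsbm.rs` can be sketched as below. The ratios and the `1..=5` cycle come from the description; the struct and function names, the use of truncating integer division, and the phase at which the rating cycle starts are assumptions rather than the generator's real code.

```rust
/// Entity counts derived from the number of products.
#[derive(Debug, PartialEq, Eq)]
pub struct BsbmCounts {
    pub products: usize,
    pub vendors: usize,
    pub persons: usize,
    pub reviews: usize,
}

/// Applies `n/50`, `n/10`, and `n*3`.
/// Assumption: plain integer division, so very small `n` can give zero vendors.
pub fn bsbm_counts(n_products: usize) -> BsbmCounts {
    BsbmCounts {
        products: n_products,
        vendors: n_products / 50,
        persons: n_products / 10,
        reviews: n_products * 3,
    }
}

/// Rating for the review at `review_idx`, cycling through 1, 2, 3, 4, 5.
/// Assumption: the cycle starts at 1 for index 0.
pub fn rating_for(review_idx: usize) -> u8 {
    (review_idx % 5) as u8 + 1
}
```

Deriving every count from one knob is what lets a single `BenchScale` value size the whole graph.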
Setup builds the dataset once per scale, populates the ledger, runs a
full reindex (so the binary columnar index is in place), and loads a
`GraphSnapshot`. All three scenarios reuse that snapshot — the bench
measures **warm-cache binary scan**, not novelty replay or load
overhead.
Scale: `Tiny=100 → Large=100k` products. Other entity counts derive.
## Why programmatic (not vendored)
The bench plan suggested vendoring the canonical 5 MB BSBM-1K Turtle
file under `fluree-bench-support/fixtures/`. This commit takes the
programmatic path instead:
- A binary-ish 5 MB blob in git inflates clone / fetch footprint for
every contributor and every CI build, indefinitely.
- Programmatic generation is deterministic (chassis tests pin the
contract) and scale-parameterized (one knob, four `BenchScale` tiers).
- Avoids a build-time dependency on
[`bsbmtools`](https://github.com/wbsg-uni-mannheim/bsbmtools) or any
external generator script.
For multi-million-triple scales (Large+ in nightly), we may still want
the canonical generator's distributions — that's a follow-up if a
nightly run discovers the synthetic shape diverges enough from
real-world BSBM behavior to matter.
The plan's `fluree-bench-support/scripts/gen-bsbm.sh` external generator
is therefore not added in this commit. The bench plan doc should be
updated to reflect the programmatic-first approach in a follow-up
cleanup commit.
## Wiring
- `fluree-bench-support/src/gen/mod.rs` — `pub mod bsbm;` plus a
one-paragraph entry in the module docstring.
- `fluree-db-api/Cargo.toml` — new `[[bench]] name = "query_hot_bsbm"`
entry.
- `regression-budget.json` — new `query_hot_bsbm` entry with
placeholder budgets (`tiny: 10%, small: 5%, medium: 3%`). The medium
budget is tighter because hot-path query latency should be the most
stable signal in the bench suite.
No new dev-deps; the chassis is already on `fluree-db-api`.
## Verification
- `cargo test -p fluree-bench-support --lib` — **44 passed** (up from
37; the 7 new tests cover bsbm determinism, count ratios, type /
country / rating distribution, and Turtle prefix presence).
- `cargo check -p fluree-db-api --bench query_hot_bsbm` — clean.
- `cargo bench --no-run` — clean release build.
- `FLUREE_BENCH_SCALE=tiny cargo bench -p fluree-db-api --bench
query_hot_bsbm -- --test` — `q3/q5/q9` all report `Success`.
- `cargo check --workspace --benches` — clean.
## Plan reference
Per `.claude/proposed-work/docs/plan-benchmark-infrastructure.md` §8.2
(`bench-4`). Two deliberate deviations from the plan worth flagging for
review:
1. **Programmatic generator over vendored fixture** (see "Why
programmatic" above). The plan's intent — let benches scale across
`BenchScale` tiers without external tooling — is preserved; the
delivery vehicle is different.
2. **No `gen-bsbm.sh` external script** at this stage. If we later need
bsbmtools-faithful distributions for nightly large-scale runs, that's
a separate concern.
A future `bench-N` commit (or a follow-up after this branch lands)
could add a parallel generator wired to the in-tree dblp data the CLI's
bulk-import work already uses, since real-world graph distributions
exercise shapes the synthetic generator may miss (degree skew, prefix
variety, etc.). Tracked as an out-of-scope follow-up, not blocked by
this PR.
A new `bench-gate` job in `.github/workflows/ci.yml` runs on every PR
and push to `main`. Two checks:
1. **Reconcile** — `cargo test -p fluree-bench-support --test
workspace_reconcile` walks every workspace member's `Cargo.toml`
for `[[bench]]` entries and confirms each has a matching entry in
`regression-budget.json` (and vice versa). Fails the gate with a
clear error message naming the offending `crate/bench` pair.
2. **Smoke** — `cargo bench --workspace -- --test` runs each bench's
scenarios once at `tiny` scale (`FLUREE_BENCH_PROFILE=quick`,
`FLUREE_BENCH_SCALE=tiny`). Catches benches that compile but panic
at runtime (bad SPARQL, broken setup, missing API surface).
This is the fifth and final commit on `feature/stable-bench`,
completing the bench-foundation PR. The full regression-comparison
phase (which compares observed nanoseconds against committed
baselines and a per-bench budget percentage) lands separately in
`bench-nightly` because (a) it needs runner-stable baselines that
only emerge from a few nightly runs and (b) per-PR comparison on
shared `ubuntu-latest` runners would flap.
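The deferred comparison (observed nanoseconds against a baseline and a per-bench budget percentage) could be sketched as follows. This is only an illustration of the arithmetic: the function name `check_budget`, its signature, and the error text are hypothetical, and the real `check()` helper may differ.

```rust
/// Passes when `observed_ns` is at most `budget_pct` percent slower than
/// `baseline_ns`. Integer math in `u128` avoids float rounding and overflow.
pub fn check_budget(baseline_ns: u64, observed_ns: u64, budget_pct: u64) -> Result<(), String> {
    // observed <= baseline * (1 + budget/100), rewritten without division.
    let allowed = u128::from(baseline_ns) * (100 + u128::from(budget_pct));
    let observed = u128::from(observed_ns) * 100;
    if observed <= allowed {
        Ok(())
    } else {
        Err(format!(
            "observed {observed_ns} ns exceeds baseline {baseline_ns} ns by more than {budget_pct}%"
        ))
    }
}
```

With a 5% budget and a 1,000 ns baseline, 1,040 ns passes and 1,060 ns fails; runs faster than the baseline always pass.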
## What's new
### `.github/workflows/ci.yml` — `bench-gate` job
- Same setup pattern as the existing `clippy` / `test` jobs:
free disk space → checkout → toolchain → rust-cache.
- Three steps: reconcile, build, smoke. Reconcile runs first because
it's fast (~1s) and tells the contributor exactly what's wrong
before they wait on a release build.
- Wall-clock budget: ~5–8 min after rust-cache warms; ~12–15 min
cold. Within plan §6.1's CI minute budget.
### `fluree-bench-support/tests/workspace_reconcile.rs` (~145 lines)
- Walks workspace members, parses each crate's `Cargo.toml` for
`[[bench]]` entries, builds a `crate -> [bench]` map.
- Loads `regression-budget.json` and reconciles bidirectionally:
- **Missing budget** — declared bench without a budget entry.
Fails with "add an entry under `crates.<crate>.<bench>` in
`regression-budget.json`."
- **Stale budget** — budget entry without a matching `[[bench]]`.
Fails with "remove the entry or rename the bench file."
- **Unknown crate** — budget references a crate not in the
workspace.
- Uses `toml = "0.8"` as a dev-dep (test-only; doesn't pull `toml`
into anyone who depends on the chassis library).
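The bidirectional reconcile described above can be sketched with plain maps. TOML and JSON parsing are left out; the type and function names are hypothetical, and the sketch assumes every workspace crate that declares benches appears as a key in `declared`.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// crate name -> bench names
pub type BenchMap = BTreeMap<String, BTreeSet<String>>;

#[derive(Debug, Default, PartialEq, Eq)]
pub struct ReconcileReport {
    pub missing_budget: Vec<String>, // declared bench with no budget entry
    pub stale_budget: Vec<String>,   // budget entry with no declared bench
    pub unknown_crate: Vec<String>,  // budget names a crate not in `declared`
}

/// Test helper: builds a map from `(crate, bench)` pairs.
pub fn bench_map(pairs: &[(&str, &str)]) -> BenchMap {
    let mut map = BenchMap::new();
    for (krate, bench) in pairs {
        map.entry((*krate).to_string())
            .or_default()
            .insert((*bench).to_string());
    }
    map
}

pub fn reconcile(declared: &BenchMap, budgets: &BenchMap) -> ReconcileReport {
    let mut report = ReconcileReport::default();
    // Direction 1: every declared bench needs a budget.
    for (krate, benches) in declared {
        for bench in benches {
            let budgeted = budgets.get(krate).map_or(false, |b| b.contains(bench));
            if !budgeted {
                report.missing_budget.push(format!("{krate}/{bench}"));
            }
        }
    }
    // Direction 2: every budget entry needs a declared bench in a known crate.
    for (krate, benches) in budgets {
        match declared.get(krate) {
            None => report.unknown_crate.push(krate.clone()),
            Some(known) => {
                for bench in benches.difference(known) {
                    report.stale_budget.push(format!("{krate}/{bench}"));
                }
            }
        }
    }
    report
}
```

Using `BTreeMap` and `BTreeSet` keeps the reported `crate/bench` pairs in a stable, sorted order, which makes failure messages reproducible.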
### `BENCHMARKING.md` — §"Regression budgets" rewritten
- Documents the two-phase gate model (`bench-gate` per-PR,
`bench-nightly` separate workflow on cron) explicitly.
- Explains why two phases — `ubuntu-latest` flap makes per-PR
regression comparison unreliable; the nightly amortizes noise
across `Full`-profile sample counts and uses 4-core runners.
## Pre-existing benches retrofitted in this commit
Validating the smoke run end-to-end uncovered two issues in
pre-existing benches that needed fixing for the gate to actually
land green. Both are scoped to this commit because the gate is
useless if the smoke fails on existing scenarios.
### Three benches now respect `FLUREE_BENCH_SCALE`
`insert_formats.rs`, `vector_query.rs`, and `fulltext_query.rs` had
hardcoded size arrays from before bench-1's chassis retrofit:
- `insert_formats.rs` — `TXN_COUNTS = &[10, 100]`,
`NODES_PER_TXN = &[10, 100, 1000]`. Six matrix cells.
- `vector_query.rs` — `DATASET_SIZES = &[1_000, 5_000]`.
- `fulltext_query.rs` — `DATASET_SIZES = &[1_000, 5_000, 10_000, 50_000]`.
The fulltext bench's 50k case stalled the smoke run. Each file gains
a small scale-driven slice helper (`dataset_sizes()` / `matrix()`)
that returns the appropriate subset for `current_scale()`. At
`Tiny`: smallest size only. At `Small`: 1–2 sizes. At `Medium`:
2–3 sizes. At `Large`: full curve.
Behavior at the `Large` scale is byte-identical to pre-retrofit;
the helpers just slice when smaller.
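A scale-driven slice helper of the kind described might look like this for the fulltext sizes. The size array is the one quoted above; the exact number of sizes taken per tier (1, 2, 3, all) is an assumption chosen to fit the stated ranges, and the enum is redeclared only to keep the sketch self-contained.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BenchScale {
    Tiny,
    Small,
    Medium,
    Large,
}

/// Full size curve, as hardcoded before the retrofit.
pub const DATASET_SIZES: &[usize] = &[1_000, 5_000, 10_000, 50_000];

/// Returns a prefix of `DATASET_SIZES` appropriate for `scale`.
/// `Large` returns the whole array, so the full curve is unchanged.
pub fn dataset_sizes(scale: BenchScale) -> &'static [usize] {
    let take = match scale {
        BenchScale::Tiny => 1,   // smallest size only
        BenchScale::Small => 2,  // assumption: two sizes
        BenchScale::Medium => 3, // assumption: three sizes
        BenchScale::Large => DATASET_SIZES.len(),
    };
    // Clamp so a shorter array never causes an out-of-bounds slice.
    &DATASET_SIZES[..take.min(DATASET_SIZES.len())]
}
```

Slicing a prefix rather than building a new list keeps the `Large` run identical to the original hardcoded array.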
### Real bug caught: `vector_query.rs` SPARQL VALUES type alias
Smoke validation panicked with:
```
Array @value is only supported for
https://ns.flur.ee/db#embeddingVector typed literals
```
…because the bench was constructing query VALUES blocks with
`"@type": "@vector"`. The `@vector` alias is INSERT-only; the query
parser requires the full IRI in VALUES context (canonical pattern
in `it_vector_flatrank.rs`). Two query literals updated to use
`"@type": "https://ns.flur.ee/db#embeddingVector"`.
This bench evidently hadn't run cleanly in some time; the gate is
the reason it's running cleanly now.
## Verification
- `cargo test -p fluree-bench-support --test workspace_reconcile` —
passes (1 test).
- Verified the reconcile test fails correctly when a budget entry
is removed (manual: deleted `query_hot_bsbm` entry, ran test, got
"Missing budget entries: fluree-db-api/query_hot_bsbm" with the
fix-it suggestion; restored entry, test passes again).
- `FLUREE_BENCH_PROFILE=quick FLUREE_BENCH_SCALE=tiny cargo bench
--workspace -- --test` — **all 12 bench files run; every scenario
reports `Success`**:
- `insert_formats`, `vector_query`, `fulltext_query`,
`vector_math`, `spatial_bench` (existing, retrofitted in
bench-1).
- `import_bulk`, `transact_commit`, `query_cold_reload` (bench-2).
- `reindex_full`, `reindex_incremental`, `novelty_replay`
(bench-3).
- `query_hot_bsbm` (bench-4).
- `cargo check --workspace --benches` — clean.
## Out of scope (lands separately)
- **Regression-comparison phase** — `bench-nightly` PR. Adds a
`bench-baselines.json` schema, a cron-triggered nightly workflow
that runs the `Full` profile across `Medium`/`Large` scales, and
the budget-check logic. Initial baselines come from the first 2–3
nightly runs; until those exist, regression-comparison enforcement
cannot be turned on.
- **iai-callgrind for noise-free PR gating** — open question §11.3
in the bench-infrastructure plan. Defer until we have flap data
from criterion-on-`ubuntu-latest`.
- **Auto-generated "Current benches" table** in `BENCHMARKING.md` —
hand-maintained for now; auto-generation from workspace
`[[bench]]` declarations is mechanical but not yet worth the
build-time cost.
## Plan reference
Per `.claude/proposed-work/docs/plan-benchmark-infrastructure.md` §8.2
(`bench-5`). The plan also called for capturing initial baselines as
part of this commit; that's been deferred to `bench-nightly` because
baselines from a developer's machine don't translate to the CI
runner anyway. The bench-gate phase implemented here gives the
workspace its smoke-and-reconcile coverage immediately;
regression-comparison is a separate concern.
This was referenced May 8, 2026
…fit chassis consistency, sweep doc drift
Single commit addressing all 17 inline review findings on PR #1228 plus
deletion of the misleading `validate_against_workspace()` stub. No
behavior change to the bench-gate CI job or any landed bench's intent —
every fix is either a correctness restoration, a chassis-coherence
sweep across pre-existing benches, or doc-drift cleanup. All 44 chassis
unit tests + 1 reconcile integration test still pass; full smoke at
`tiny` scale shows all 24 scenarios `Success`.
Coverage gaps the reviewer flagged in a separate "follow-up" section are
tracked as GH issues #1229–#1234 with parent #1235; this commit does
not add new benches.
## Major (M1–M5)
- **M1 — `gen::vectors::hashed_pair` determinism contract restored.**
The pre-retrofit `vector_math.rs::random_vectors` used a _single
shared_ `DefaultHasher` whose state accumulated across the loop; my
chassis version constructed a fresh hasher per call, which produced
different bytes for the same `dim`. The bench-1 commit's
"byte-identical" claim was technically false for this generator.
Rewrote `hashed_pair` to mirror the pre-retrofit shared-hasher behavior
verbatim. Output is now byte-identical to the pre-chassis bench,
restoring the determinism contract documented in the chassis docstring.
- **M2 — `BENCHMARKING.md` "Current benches" table updated.** Was stale
at 5 rows from before bench-2/3/4. Now includes all 12 bench files (5
retrofitted + 7 new) so contributors landing on the workspace doc from
`README.md` see what exists.
- **M3 — `fluree-bench-support/README.md` four edits.** `gen::bsbm`
added to the table. Stale references to "lands in bench-4" / "lands in
bench-5" replaced with accurate "stub today" / "implemented at the test
level" notes. Test count corrected from 37 to 44.
- **M4 — `docs/contributing/benches.md` "Future work" section
rewrite.** Was claiming bench-4 / bench-5 work as upcoming; both landed
in PR #1228. Replaced with the reviewer's accurate suggested wording.
- **M5 — `docs/contributing/benches.md` "Current categories" table fixed.** Was pointing at directories that don't exist (`fluree-db-indexer/benches/`, `fluree-db-novelty/benches/`, `fluree-db-core/benches/`). Updated to point at the actual files (all under `fluree-db-api/benches/`) per the PR's deviation-from-plan, with a note explaining where future micro-benches in those crates would live and a "Reserved categories" sub-table for `core` and `query` (not yet realized). ## Moderate (m1–m6) - **m1 — Retrofitted-bench chassis consistency.** The 5 pre-existing benches were using `tokio::runtime::Runtime::new()` directly and hardcoding `group.sample_size(10)`. Swapped to `bench_runtime()` (no-op behavior change; both are single-threaded current-thread by default) and `current_profile().sample_size()` (Quick still resolves to 10; `Full` profile now correctly yields a wider distribution). The `vector_math.rs` and `spatial_bench.rs` benches don't use Runtime or sample_size knobs and were untouched. - **m2 — Dropped unused `sha2` dev-dep** from `fluree-bench-support/Cargo.toml`. The "Lockfile-driven hashing for deterministic fixtures" comment was aspirational; no code uses it today. Will be re-added in the same commit that introduces fixture hashing. - **m3 — Hardcoded `LEDGER_ID` consts replaced** with `next_ledger_alias` calls in 6 benches (`query_cold_reload`, `novelty_replay`, `reindex_full`, `reindex_incremental`, `query_hot_bsbm`, `import_bulk`). Practically safe before because each iteration rebuilds Fluree over a fresh tempdir — but the contributor doc explicitly tells future authors _not_ to do this, so the new benches becoming the canonical example was a coherence problem. Now consistent with `transact_commit` (which already used the chassis pattern). Setup closures now thread the alias through `iter_batched`'s tuple input to the measured op. - **m4 — `BenchProfile::Full::sample_size()` bumped from 30 to 100** (criterion's default). 
  30 was below criterion's default, which defeats the "wider distribution for nightly stability" goal. Comment notes that the value is a starting point; we may need to bump it to 200+ once `bench-nightly` lands and we have flap data. `Quick` stays at 10.
- **m5 — `import_bulk.rs` `default_threads` scenario docstring.** Added a note explaining that the chassis's `bench_runtime()` is single-threaded by default, so `FLUREE_BENCH_RUNTIME=multi` is required to get a meaningful parallel-import measurement. Otherwise the comparison to `single_threaded` should be read as "internal worker pool only," not full end-to-end parallel throughput.
- **m6 — `transact_commit.rs` Fluree construction moved into `iter_batched` setup.** Previously one in-memory Fluree was constructed before the bench groups and shared across all iterations of both scenarios; over a `Full`-profile run that meant ~60+ accumulated ledgers in one in-memory state, which could skew later samples through allocator behavior. Setup is excluded from timing, so the cost of moving the construction is fine.

## Minor (µ1–µ6)

- **µ1 — `bsbm.rs` price-range comment fixed.** Was "[10.00, 510.00] cents (i.e., $0.10 to $5.10)"; actual range is 1000–50999 cents (i.e., $10.00 to $509.99). The math was wrong, not the algorithm.
- **µ2 — Q5 in `query_hot_bsbm.rs` unused `xsd:` prefix dropped.** Cosmetic.
- **µ3 — `tracing.rs` `TODO(bench-3)` comment updated.** Bench-3 landed; reference now points at "the bench-nightly follow-up work."
- **µ4 — `BenchSpanLayer` doc comment updated.** Was claiming "lands in a later commit (see plan §5.2 item 1)"; now accurately notes "wiring up the JSON-file emit path is tracked under the `bench-nightly` follow-up."
- **µ5 — `BENCH_TEMPLATE.rs` `let _ = profile;` dead line removed.** `profile` is in fact used at `group.sample_size(profile.sample_size())`; the `let _ = profile;` was leftover from an earlier draft.
- **µ6 — Bench-gate CI job gains `timeout-minutes: 25`.** Caps the entire job so a hung bench fails fast instead of tying up the runner queue. 25 minutes is well above the expected 5–8 min wall-clock budget but well below GitHub Actions' default job timeout of 360 minutes.

## Stub deletion

`validate_against_workspace()` removed from `fluree-bench-support/src/budget.rs`. The function was a stub whose docstring promised "lands in bench-5" — but bench-5 landed the reconcile as an integration test (`tests/workspace_reconcile.rs`) rather than as a library function. The stub was misleading: a caller would expect it to actually validate; instead it returned `Ok(())` unconditionally. Replaced the stub with a comment noting that the test is the contract.

## Verification

- `cargo test -p fluree-bench-support --lib` — **44 passed** (unchanged; M1's behavior change to `hashed_pair` is byte-identical to the pre-retrofit version, so the existing determinism + range tests still pass).
- `cargo test -p fluree-bench-support --test workspace_reconcile` — **1 passed**. Verified to fail correctly when a budget entry is removed (manual: deleted `query_hot_bsbm`, ran test, got "Missing budget entries: fluree-db-api/query_hot_bsbm" with the fix-it suggestion; restored entry, test passes).
- `cargo check --workspace --benches` — clean.
- `FLUREE_BENCH_PROFILE=quick FLUREE_BENCH_SCALE=tiny cargo bench --workspace -- --test` — **all 24 bench scenarios across 12 bench files report `Success`.**
- `python3 -c "import yaml; yaml.safe_load(...)"` on `.github/workflows/ci.yml` — clean.

## Coverage gaps tracked separately

The reviewer's "Suggested coverage gaps" section listed 8 perf-regression-prone hot paths with no dedicated bench. None blocks PR #1228 (the foundation), but several have known recent perf wins that warrant tracking. Spun out as GH issues:

- **#1229** — Property-join planner regression bench (highest-priority follow-up).
- **#1230** — Lex-sorted-string ORDER BY fast-path bench.
- **#1231** — Scan fast-path regression benches: batched object-join + group-count-firsts.
- **#1232** — Filter `PreparedBoolExpression` cache regression bench.
- **#1233** — Time-travel and multi-ledger `DataSetDb` history query coverage.
- **#1234** — Parser micro-benches (Turtle, JSON-LD, SPARQL).
- **#1235** — Parent tracker referencing all 6.
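M1's root cause, one hasher whose state accumulates across the loop versus a fresh hasher per call, can be reproduced with the standard library alone. The sketch below is not the chassis's `hashed_pair`; the function names and the `unit_f64` mapping are hypothetical, chosen only to show why the two constructions agree on the first element and diverge afterwards.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Map a 64-bit hash onto [0.0, 1.0).
/// Hypothetical helper; the real generator's mapping is not shown in this PR.
fn unit_f64(h: u64) -> f64 {
    (h >> 11) as f64 / (1u64 << 53) as f64
}

/// Shared-hasher variant: one hasher for the whole loop, so element `i`
/// depends on every index fed in before it.
fn shared_hasher_vector(dim: usize) -> Vec<f64> {
    let mut hasher = DefaultHasher::new();
    (0..dim)
        .map(|i| {
            i.hash(&mut hasher);
            // `finish` reads the current state without resetting it.
            unit_f64(hasher.finish())
        })
        .collect()
}

/// Fresh-hasher variant: a new hasher per element, so element `i`
/// depends only on `i`.
fn fresh_hasher_vector(dim: usize) -> Vec<f64> {
    (0..dim)
        .map(|i| {
            let mut hasher = DefaultHasher::new();
            i.hash(&mut hasher);
            unit_f64(hasher.finish())
        })
        .collect()
}
```

Both variants are deterministic on their own, so a test that only checks run-to-run stability passes for either one; comparing against the pre-retrofit output is what tells them apart. The standard library does not guarantee `DefaultHasher`'s algorithm across Rust releases, so byte-stability of this kind holds per toolchain.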
## Summary

5 commits on `feature/stable-bench` standing up the workspace's benchmarking infrastructure end-to-end. This is partly for general bench auditability, but it also provides a baseline against which more aggressive future refactors or improvements can be run, with a clear-eyed assessment of whether they affect hot-path performance. Hot paths such as commit, index build, incremental index, bulk import, novelty replay, cold reload, and hot SPARQL query now have a bench, every bench is registered with a regression budget, and `ubuntu-latest` runs a smoke gate on every PR. Adds ~4,500 LoC across a new `fluree-bench-support` crate, 7 new bench files, 5 retrofitted ones, 3 contributor docs, and a CI workflow extension.

The companion regression-comparison phase (committed baselines + nightly cron + dashboard) lands separately as `bench-nightly`.

## Reading order
1. `BENCHMARKING.md`
2. `docs/contributing/benches.md` + `fluree-bench-support/README.md` + the chassis source
3. `.claude/proposed-work/docs/plan-benchmark-infrastructure.md` is the design rationale this PR implements

## What landed (5 commits)
### `b3ac71724` — bench chassis + retrofit 5 existing benches

The new internal `fluree-bench-support` crate:

- `init_tracing_for_bench()` — opt-in `FLUREE_BENCH_TRACING=1` subscriber. Replaces an 18-line block previously duplicated verbatim across three benches.
- `next_ledger_alias(prefix)` — atomic, never-reused alias of the form `bench/{prefix}-{n}:main`. Replaces three independent `LEDGER_COUNTER: AtomicU64` patterns.
- `bench_runtime()`, `BenchProfile`, `BenchScale`, `current_profile()`, `current_scale()` — env-driven knobs (`FLUREE_BENCH_PROFILE`, `FLUREE_BENCH_SCALE`, `FLUREE_BENCH_RUNTIME`).
- `gen::people` / `gen::vectors` / `gen::corpora` — deterministic data generators lifted from the five existing benches. Tested for byte-stability across runs.
- `templates/BENCH_TEMPLATE.rs` — working bench skeleton with TODO markers; the copy-this-and-edit starting point per the contributor guide.
- `budget`, `fixtures`, `report` — regression-budget loader (`check()` helper), fixture path resolution, opt-in markdown-style end-of-run summary tables.

Plus three new docs: workspace-root `BENCHMARKING.md` (orientation), `docs/contributing/benches.md` (deep contributor guide mirroring `tracing-guide.md`), `fluree-bench-support/README.md` (API reference).

The five existing benches (`insert_formats`, `vector_query`, `fulltext_query`, `vector_math`, `spatial_bench`) are retrofitted to use the chassis. Mostly deletions: dropped `init_tracing_for_bench` × 3, `LEDGER_COUNTER` × 3, inline generators × 4. Behavior on `Large` scale is byte-identical to pre-retrofit.

### `58d08c931` — `import_bulk`, `transact_commit`, `query_cold_reload`

Three new benches in
`fluree-db-api/benches/`:

- `transact_commit.rs` — single-commit latency on a fresh and a populated ledger. Distinct from `insert_formats.rs`, which measures total throughput across many txns; this measures the per-commit latency users see in production. Uses `criterion::iter_batched` so the base-load setup is excluded from timing.
- `import_bulk.rs` — bulk Turtle import via `fluree.create(id).import(path).execute()`. Two scenarios (`single_threaded`, `default_threads`); throughput in triples/sec.
- `query_cold_reload.rs` — file-backed cold reload + first query latency. Two scenarios (`cold_load`, `cold_load_plus_query`).

### `c0489a890` — `reindex_full`, `reindex_incremental`, `novelty_replay`

Three more benches in
`fluree-db-api/benches/`:

- `reindex_full.rs` — `Fluree::reindex(...)` end-to-end, throughput in triples/sec.
- `reindex_incremental.rs` — exercises the orchestrator's incremental path. Setup pre-builds an indexed baseline + delta commits sitting in novelty above it; measured op is `Fluree::trigger_index(...)`.
- `novelty_replay.rs` — cold reload with `without_indexing()` so the populate phase keeps everything in novelty. Scaled by commit count to exercise `bulk_apply_commits` at varying chain depth.

This commit also folds two new entries into
`docs/contributing/benches.md`:

- `iter_batched` + tokio reactor gotcha caught while writing bench-2: `criterion::iter_batched`'s `setup` closure runs synchronously, so `FlureeBuilder::file(...).build()` (which touches the storage backend) panics outside a `block_on`. Doc shows the canonical fix.
- `needless_raw_string_hashes`, `uninlined_format_args`).

### `aa6bcc3ee` — `gen::bsbm` generator + `query_hot_bsbm` bench

A deterministic BSBM-shape data generator (`Vendor`, `Product`, `Person`, `Review`) added to the chassis, plus a hot-cache SPARQL bench with three scenarios drawn from the BSBM query catalogue:

Setup builds the dataset once per scale, populates a file-backed ledger, runs a full reindex (so the binary columnar index is in place), and reuses the resulting `GraphSnapshot` for all `b.iter` calls. The bench measures warm-cache binary scan, not novelty replay or load.

Deliberate deviation from plan: the plan suggested vendoring a 5 MB BSBM-1K Turtle file. The programmatic generator ships instead — keeps the repo small, scales naturally across `BenchScale` tiers, no external `bsbmtools` dependency, deterministic. For multi-million-triple scales we may eventually want bsbmtools-faithful distributions; that's a follow-up if nightly diverges.

### `cb4916968` — bench-gate CI job + workspace reconcile test

The
`bench-gate` job in `.github/workflows/ci.yml`. Two checks:

- `cargo test -p fluree-bench-support --test workspace_reconcile` walks every workspace member's `Cargo.toml` for `[[bench]]` entries and confirms each has a matching entry in `regression-budget.json` (and vice versa). Fails with a clear error message naming the offending `crate/bench` pair. Catches: missing budgets, stale budget entries (deleted/renamed benches), unknown crate references.
- `cargo bench --workspace -- --test` runs each bench's scenarios once at `tiny` scale. Catches benches that compile but panic at runtime (bad SPARQL, broken setup, missing API surface).

Wall-clock budget: ~5–8 min after rust-cache warms; ~12–15 min cold.
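The reconcile check described above is essentially a two-way set difference. A minimal sketch, assuming both sides have already been reduced to `crate/bench` strings: `ReconcileReport`, `keys`, `reconcile`, and `describe` are hypothetical names rather than the contents of `tests/workspace_reconcile.rs`, the unknown-crate check is omitted, and the "Stale budget entries" wording is an assumption.

```rust
use std::collections::BTreeSet;

/// Result of comparing declared `[[bench]]` targets with budget entries.
#[derive(Debug, Default, PartialEq)]
struct ReconcileReport {
    /// Benches declared in a Cargo.toml but absent from the budget file.
    missing_budgets: Vec<String>,
    /// Budget entries whose bench no longer exists (deleted or renamed).
    stale_budgets: Vec<String>,
}

/// Build a sorted set of `crate/bench` keys from string literals.
fn keys(items: &[&str]) -> BTreeSet<String> {
    items.iter().map(|s| (*s).to_owned()).collect()
}

/// Two-way set difference between declared benches and budgeted benches.
fn reconcile(declared: &BTreeSet<String>, budgeted: &BTreeSet<String>) -> ReconcileReport {
    ReconcileReport {
        missing_budgets: declared.difference(budgeted).cloned().collect(),
        stale_budgets: budgeted.difference(declared).cloned().collect(),
    }
}

/// Turn a report into a pass/fail result that names each offending pair.
fn describe(report: &ReconcileReport) -> Result<(), String> {
    let mut problems = Vec::new();
    if !report.missing_budgets.is_empty() {
        problems.push(format!(
            "Missing budget entries: {}",
            report.missing_budgets.join(", ")
        ));
    }
    if !report.stale_budgets.is_empty() {
        problems.push(format!(
            "Stale budget entries: {}",
            report.stale_budgets.join(", ")
        ));
    }
    if problems.is_empty() {
        Ok(())
    } else {
        Err(problems.join("\n"))
    }
}
```

Using `BTreeSet` keeps both lists sorted, so the failure message is stable from run to run regardless of the order in which workspace members are walked.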
This commit also bundles two scope additions caught by the smoke validation:

- `FLUREE_BENCH_SCALE` — `insert_formats`, `vector_query`, `fulltext_query` had hardcoded size arrays from before bench-1's retrofit. The fulltext bench's 50k case stalled the smoke run. Each gains a small `dataset_sizes()` / `matrix()` helper that slices by scale. `Large` behavior is byte-identical to pre-retrofit.
- `vector_query.rs` — query VALUES blocks used the `"@type": "@vector"` alias, which the parser no longer accepts in query context (only INSERT). Fixed to use the full IRI `"@type": "https://ns.flur.ee/db#embeddingVector"`, matching `it_vector_flatrank.rs`. Honest demonstration of the gate's value: the bench had been silently broken until the smoke run exposed it.

`BENCHMARKING.md` gains a §"Regression budgets" rewrite documenting the two-phase gate model: `bench-gate` (per-PR, smoke + reconcile) and `bench-nightly` (separate workflow on cron, regression comparison against committed baselines).

## Validation
- `cargo test -p fluree-bench-support --lib` — 44 passed, 0 failed.
- `cargo test -p fluree-bench-support --test workspace_reconcile` — passes. Verified to fail with clear messages by manually deleting a budget entry then restoring.
- `FLUREE_BENCH_PROFILE=quick FLUREE_BENCH_SCALE=tiny cargo bench --workspace -- --test` — all 12 bench files, every scenario, all report `Success`.
- `cargo check --workspace --benches` — clean.

## Two-phase gate model — what's done vs deferred
Done in this PR (`bench-gate`, runs per-PR):

- `tiny` scale.
- `[[bench]]` is registered in `regression-budget.json`, no stale entries, no unknown crates.
- `vector_query.rs` during landing.

Deferred to `bench-nightly`:

- `bench-baselines.json` schema + initial baselines committed from a few CI runs.
- `Full` profile across `Medium` / `Large` scales.
- `observed_ns ≤ baseline_ns × (1 + budget_pct/100)` per `(crate, bench, scale)`.
- `bench-history/` JSON-Lines.

The reason for the split: `ubuntu-latest` shared runners flap enough that per-PR regression comparison would produce false positives. Nightly amortizes noise across the `Full` profile's larger sample counts and uses dedicated 4-core runners.

## What this PR explicitly does NOT do
Per the bench-infrastructure plan §10 (out of scope) and one or two judgment calls during implementation:

- `iai-callgrind` for noise-free PR gating. Plan §11 open question; defer until we have flap data from criterion-on-ubuntu-latest.
- `BENCHMARKING.md` — manual table is fine; auto-generation is mechanical follow-up if it becomes painful.
- `gen::people` and `gen::bsbm` cover the shapes the upcoming RFCs need; real-world data is a useful complement, not a blocker.

## Files & line count
- `fluree-bench-support` (~1500 LoC, 44 unit tests + 1 integration test).
- `import_bulk`, `transact_commit`, `query_cold_reload`, `reindex_full`, `reindex_incremental`, `novelty_replay`, `query_hot_bsbm`).
- `insert_formats`, `vector_query`, `fulltext_query`, `vector_math`, `spatial_bench`).
- `BENCHMARKING.md` (~150 lines), `docs/contributing/benches.md` (~440 lines after gotcha additions), `fluree-bench-support/README.md` (~110 lines).
- `bench-gate` job (~50 lines of YAML).
- `regression-budget.json` (workspace root).

## Follow-ups
- `bench-nightly` — separate PR. Cron workflow + baselines + regression-comparison enforcement + dashboard.
- `NamespaceUniverse`, `OverlaySubstrate`, `LedgerStateAdvance`) — tracked in the post-streamline cleanup doc.
- `vector_math.rs` post-streamline-query move — its import of `fluree_db_query::expression::vector_math` will need re-pointing to `eval::vector_math` once the colleague's `refactor/streamline-query` stack merges. Trivial one-line edit; not blocking.
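The regression comparison deferred to `bench-nightly` is stated in this PR as `observed_ns ≤ baseline_ns × (1 + budget_pct/100)` per `(crate, bench, scale)`. A minimal sketch of that inequality, assuming `budget_pct` is a whole-number percentage (the actual `regression-budget.json` field type is not shown here); `Sample`, `within_budget`, and `regressions` are hypothetical names. Multiplying both sides by 100 keeps the comparison in integer arithmetic.

```rust
/// One measured scenario, keyed by (crate, bench, scale). Hypothetical shape.
struct Sample {
    key: (&'static str, &'static str, &'static str),
    observed_ns: u64,
    baseline_ns: u64,
    budget_pct: u32,
}

/// observed_ns <= baseline_ns * (1 + budget_pct / 100), rearranged as
/// observed_ns * 100 <= baseline_ns * (100 + budget_pct).
/// Widening to u128 rules out overflow for any u64 timing.
fn within_budget(observed_ns: u64, baseline_ns: u64, budget_pct: u32) -> bool {
    let lhs = u128::from(observed_ns) * 100;
    let rhs = u128::from(baseline_ns) * (100 + u128::from(budget_pct));
    lhs <= rhs
}

/// Keys of every sample that exceeds its budget, in input order.
fn regressions(samples: &[Sample]) -> Vec<(&'static str, &'static str, &'static str)> {
    samples
        .iter()
        .filter(|s| !within_budget(s.observed_ns, s.baseline_ns, s.budget_pct))
        .map(|s| s.key)
        .collect()
}
```

With a 10% budget and a 1,000,000 ns baseline, 1,100,000 ns passes and 1,100,001 ns fails. An integer comparison makes that boundary exact, which a floating-point multiply would not guarantee.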