Skip to content

[bench] Lex-sorted-string ORDER BY fast-path regression bench #1230

@aaj3f

Description

@aaj3f

Context

Follow-up bench from PR #1228 review (item 3 in coverage gaps). The 200× ORDER BY ?label LIMIT speedup (memory: mem:fact-01kkq57f5xpx815scy9fsb6jzr) depends on the lex_sorted_string_ids flag being preserved through bulk import + the first post-import write. A regression here would silently regress production ORDER BY queries on freshly imported datasets — the failure mode is "code still runs, query just gets slow," which is exactly the class the gate exists to catch.

Scope

Add fluree-db-api/benches/query_orderby_strings.rs (alternatively rolled into import_bulk.rs as a follow-up scenario, but a separate file is cleaner). Specifically:

  • Setup: bulk-import a dataset with string-typed predicate values that exercise the lex-sorted-strings fast path. Use gen::people or gen::bsbm (whichever's distribution stresses it best) — pick deterministically so the baseline is stable.
  • Two scenarios:
    • orderby_freshly_imported: ORDER BY ?label LIMIT N immediately after a bulk import (the lex_sorted_string_ids flag should still be set).
    • orderby_after_first_write: same query, but after a single post-import write txn (which clears the flag per the invariant doc). The bench documents the latency cliff so a regression that fails to clear the flag AND a regression that clears it prematurely are both visible.
  • Throughput metric: ns/query or rows/sec (criterion default vs. Throughput::Elements).
  • Use the chassis pattern. bench_runtime(), current_profile(), current_scale(), next_ledger_alias().
  • Register a budget in regression-budget.json. Tighten the budget on the post-import scenario specifically — that's the one most prone to silent regression.

Acceptance

  • Bench compiles and runs --test green at tiny scale.
  • regression-budget.json has an entry.
  • Both scenarios show clearly-different latencies in a manual cargo bench run (verifies the bench is actually exercising the flag-driven path, not just measuring noise).
  • BENCHMARKING.md's "Current benches" table gets a row.
  • Bench-gate CI job picks it up.

References

  • PR Workspace-Wide Benchmarking Foundation #1228 review: coverage gap #3.
  • Memory: mem:fact-01kkq57f5xpx815scy9fsb6jzr (lex-sorted-strings ORDER BY 200× speedup).
  • Adjacent bench: import_bulk.rs covers the import side; this bench covers the consequence on subsequent reads.

Out of scope

The deeper correctness invariant (when should the flag be set/cleared) is the subject of existing tests. This bench is purely about catching perf regressions that flip a known fast-path off.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions