fix: bind @op and emit sidecar events for history-range queries #1198
zonotope
left a comment
I have two concerns. I think the fuel calculation could be a real issue for large queries, but I think the fix is easy.
The other concern is architectural. I think it would be worth pushing the history mode decision to the planner to simplify execution, but that change might be too big to do at this point. If that is the case, then we can save it for a later refactor.
Other than that, 👍🏾
Approved.
```rust
// it, a broad history query could do unbounded column /
// sidecar loads in `open()` before the caller ever sees
// the first batch.
ctx.tracker.consume_fuel(1000)?;
```
The semantics here differ from `BinaryCursor::next_batch`, which charges per call, with each call processing a single batch. A leaflet larger than the batch size means multiple calls and so multiple fuel charges. In the history case, since all the work is done up front in `open()`, we charge fuel at most once per leaflet regardless of how many rows it has, so large leaflets that require a lot of work incur the same cost as small ones that don't. As a quick fix we could multiply the fuel charge by the row count. We have access to the row count before we do any work, so we can charge before incurring any cost and short-circuit before the overrun. The tradeoff is that we could overcharge on some queries.
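To make the suggestion concrete, here's a minimal sketch of charging fuel proportional to the row count before doing any work. `FuelTracker`, `FuelExceeded`, `charge_leaflet_fuel`, and the per-row rate are hypothetical stand-ins, not the crate's real fuel API.

```rust
// Hypothetical stand-in for the crate's fuel tracker.
struct FuelTracker {
    remaining: u64,
}

#[derive(Debug, PartialEq)]
struct FuelExceeded;

impl FuelTracker {
    /// Charge `amount` units, failing before any state changes on overrun.
    fn consume_fuel(&mut self, amount: u64) -> Result<(), FuelExceeded> {
        if amount > self.remaining {
            return Err(FuelExceeded);
        }
        self.remaining -= amount;
        Ok(())
    }
}

/// Charge fuel proportional to the leaflet's row count *before* decoding it,
/// so an oversized leaflet short-circuits without incurring any work.
fn charge_leaflet_fuel(tracker: &mut FuelTracker, row_count: u64) -> Result<(), FuelExceeded> {
    const FUEL_PER_ROW: u64 = 10; // illustrative rate, not the real constant
    tracker.consume_fuel(row_count.saturating_mul(FUEL_PER_ROW))
}
```

The overcharge risk the comment mentions comes from charging for every row in the leaflet even when the pattern filter would later discard most of them.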
```rust
use crate::var_registry::VarId;
```
```rust
/// Scan operator that activates a dedicated history-range walk when
/// `ctx.history_mode` is true.
```
A history mode flag on the context spreads control flow over multiple layers of execution, which increases code complexity. We have to decide whether or not to set history mode in the context, then we have to change behavior at each point we read the flag. That makes the decision tree for query execution much more complex.
I think it would be simpler to remove the concept of history mode from the execution context and instead make it a planner concern. That way we would make the decision once, and then build a separate operator tree for history queries rather than increasing the complexity of an individual operator by making it apply in both cases. That would make the decision tree less "complected". You can take me out of Clojure, but you can't take Clojure out of me ;)
That might be a bigger refactor than this branch warrants, so I'm fine with saving it for later if it isn't feasible to do now.
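A rough sketch of what "make it a planner concern" could look like: the planner inspects the query once and builds a distinct operator tree, so execution never consults a `history_mode` flag. `Plan`, `QuerySpec`, and the variant names are illustrative, not the crate's real types.

```rust
// Illustrative plan representation: one variant per operator tree.
enum Plan {
    CurrentScan { pattern: String },
    HistoryScan { pattern: String, from_t: u64, to_t: u64 },
}

// Illustrative parsed-query shape.
struct QuerySpec {
    pattern: String,
    history_range: Option<(u64, u64)>,
}

/// Decide once, at plan time, which operator tree to build.
/// Downstream operators then have a single, unconditional behavior.
fn plan(spec: &QuerySpec) -> Plan {
    match spec.history_range {
        Some((from_t, to_t)) => Plan::HistoryScan {
            pattern: spec.pattern.clone(),
            from_t,
            to_t,
        },
        None => Plan::CurrentScan {
            pattern: spec.pattern.clone(),
        },
    }
}
```

The point of the shape is that the branch happens exactly once, in `plan`, rather than at every flag-reading site inside execution.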
Sounds good, and the change looks big enough for another PR, so I can work on that separately. Thanks for pointing it out.
Fixes two stacked bugs that left history-range queries (`"from": "ledger@t:1", "to": "ledger@t:latest"`) returning wrong results on indexed ledgers:

- The `@op` variable was always bound to `null`: the only call site that would populate it (`Binding::from_object_with_t_op`) had zero callers.
- The indexed read path replayed collapsed current-state (`AS OF t`) rather than walking the full event history.

Together these meant the reporter's query

```json
{ "from": "ledger:main@t:1", "to": "ledger:main@t:latest", "select": ["?v", "?t", "?op"], "where": [{ "@id": "<concept>", "skos:notation": {"@value":"?v","@t":"?t","@op":"?op"} }] }
```

returned one row (the current asserted value) with `?op = null` instead of the full timeline of assert / retract events.

### What changed
**New module: `fluree-db-query/src/binary_history.rs`**

`BinaryHistoryScanOperator` is a thin wrapper around `BinaryScanOperator`. When `ctx.history_mode` is false it transparently delegates. When true, it runs a dedicated three-source merge in `open()`:

- Sidecar history events (`HistEntryV2`) for each matching leaflet: assert + retract events with explicit `op`.
- Base rows whose `t` falls in `[from_t, index_t]`, emitted as `op = assert`.
- Novelty flakes in `(index_t, to_t]`, which carry their own `flake.op`.
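The three-source merge can be modeled as follows. `Event` and the parameter names are illustrative stand-ins for the crate's real `Flake` type, not its actual API.

```rust
// Minimal model of one history event: a transaction time plus an op bit.
#[derive(Debug, Clone, PartialEq)]
struct Event {
    t: u64,
    assert: bool, // true = assert, false = retract
}

/// Merge sidecar history events, base rows (emitted as asserts), and
/// novelty events (which carry their own op) into one timeline ordered by t.
fn merge_history(
    sidecar: Vec<Event>,      // assert + retract events from the sidecar
    base_in_range: Vec<u64>,  // base-row `t`s falling in [from_t, index_t]
    novelty: Vec<Event>,      // events in (index_t, to_t]
) -> Vec<Event> {
    let mut merged: Vec<Event> = sidecar;
    merged.extend(base_in_range.into_iter().map(|t| Event { t, assert: true }));
    merged.extend(novelty);
    merged.sort_by_key(|e| e.t); // stable sort keeps same-t source order
    merged
}
```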
Vec<Flake>is handed to the existingBinaryScanOperatorvia a newprime_history_flakespub(crate) helper, which routes it through the already-history-awareflakes_to_bindingspipeline (binary_scan.rs:~704copiesflake.oponto the emittedBinding::Lit).Narrowing
Leaves and leaflets are narrowed by the pattern's bound components so we only touch the relevant slice of the index:

- `branch.find_leaves_for_subject(s_id)` on SPOT.
- `p_const` / `o_type_const` skip non-matching leaflets.
- Leaflets with `history_len == 0` or `history_max_t < from_t` are skipped.

A subject-history query for one subject touches only the leaves containing that subject's `s_id`, with no broad database scan.
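The leaflet-skip conditions above reduce to a single predicate. This is a sketch only; the field names mirror the description but are not the crate's real struct layout.

```rust
/// A leaflet can be skipped when it has no history rows at all, or when its
/// newest history entry predates the query window's lower bound.
/// Parameter names are illustrative stand-ins for the real directory fields.
fn leaflet_may_match(history_len: usize, history_max_t: u64, from_t: u64) -> bool {
    history_len > 0 && history_max_t >= from_t
}
```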
### Wiring

Every scan construction site now produces the history-aware wrapper:

- `ScanDatasetBuilder::build` in `dataset_operator.rs` (the main planner path)
- `execute_pattern` / `execute_pattern_at` / `execute_pattern_with_overlay` / `execute_pattern_with_overlay_at` in `lib.rs` (used by staging, policy, and the config resolver)

Non-history queries pay one extra vtable hop per `next_batch` (~1–2 ns, negligible against batch sizes of 100+ rows).

### Helper-visibility changes
- `BinaryScanOperator::prime_history_flakes`: new `pub(crate)` method that primes the operator to drain a pre-collected `Vec<Flake>` via the existing flake pipeline.
- `extract_bound_terms_snapshot`, `build_filter_from_snapshot_sids`, `filter_flakes_by_policy`: bumped from `fn` to `pub(crate) fn` for reuse from `binary_history.rs`.

### Cost model
All heavy lifting happens in `open()`, so that's where the fuel guardrail lives:

- The same per-leaflet fuel charge that `BinaryCursor::next_batch` uses for non-history scans.
- `FuelExceededError` short-circuits the callback before the next flake is pushed. The pattern filter is applied inside the callback, so we only charge for flakes we'd actually retain; otherwise a wide predicate on a crowded novelty slice would charge for every flake touched.
- The usual per-binding fuel in `flakes_to_bindings` when `next_batch` drains the collected vec.

Together these prevent unbounded eager work before the first batch emission.
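The filter-before-charge rule can be sketched as below: discarded flakes cost nothing, and the fuel check runs before the retained flake is pushed. Types and names are illustrative stand-ins, not the crate's real API.

```rust
#[derive(Debug, PartialEq)]
struct FuelExceeded;

/// Collect flakes that pass `keep`, charging fuel only for retained flakes
/// and short-circuiting *before* pushing the flake that would overrun.
fn collect_filtered(
    flakes: &[u64],                 // stand-in for a slice of flakes
    keep: impl Fn(u64) -> bool,     // stand-in for the pattern filter
    mut fuel: u64,
    fuel_per_flake: u64,
) -> Result<Vec<u64>, FuelExceeded> {
    let mut out = Vec::new();
    for &f in flakes {
        if !keep(f) {
            continue; // discarded flakes are free
        }
        if fuel < fuel_per_flake {
            return Err(FuelExceeded); // short-circuit before the push
        }
        fuel -= fuel_per_flake;
        out.push(f);
    }
    Ok(out)
}
```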
### Tests
Six regression tests in `fluree-db-api/tests/it_query_history_range.rs`:

- `history_range_emits_sidecar_events_with_op`: the reporter's scenario (reindex + upsert + reindex), verifying all three events (`(Alice, 1, assert)`, `(Alice Smith, 2, assert)`, `(Alice, 2, retract)`) emit with correct `@t` and `@op`.
- `history_range_novelty_only`: unindexed ledger; verifies the novelty-only path also binds `@op`.
- `history_range_op_constant_filter_assert`: `"@op": "assert"` as a constant filter returns only asserts.
- `history_range_op_constant_filter_retract`: `"@op": "retract"` returns only retracts.
- `history_range_sidecar_plus_novelty_boundary`: reindex at t=1, transact t=2, query spanning `index_t`; exercises the `to_t > index_t` novelty-merge path.
- `history_range_subject_unbound`: predicate-only history across multiple subjects.
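The expected bindings in the reporter's scenario can be checked as plain data. This is a data-only sketch of the assertion shape, not the actual test harness; `Row` and `filter_by_op` are illustrative.

```rust
// One emitted binding: (?v value, @t, @op).
type Row = (&'static str, u64, &'static str);

/// Keep only rows whose @op matches a constant filter,
/// mirroring the constant-`@op` filter tests.
fn filter_by_op(rows: &[Row], op: &str) -> Vec<Row> {
    rows.iter().copied().filter(|r| r.2 == op).collect()
}
```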
### Known limitations / follow-ups

Documented in the `binary_history.rs` module header:

**Multi-pattern history with joins.** When a history query has two or more triple patterns, the first becomes the history-aware scan, but patterns 2+ are driven by `batched_subject_probe_binary` in `join.rs`, which still reads current-state base rows only. `?s` values joined in from a history-aware outer scan will fetch today's label, not their historical labels. The fix needs a per-pattern `history_semantics` flag (propagated from the parser's existing `t_var` / `op_var` tracking) plus a history-aware branch in the join probe.

**Per-leaf sidecar pruning.** `open_leaf_handle(..., need_replay=true)` fetches the whole sidecar blob up front. For leaves whose directory doesn't yet reveal any leaflet with `history_max_t >= from_t`, this is wasted I/O on local/cached reads. Fixing this cleanly needs either a two-pass open (dir first, sidecar on demand) or a leaf-level `history_max_t` on `LeafEntry`.

**Streaming emit.** `collect_history_flakes` materialises the matched event set into a `Vec<Flake>` before `next_batch` drains it. Subject- or predicate-bound queries match a small set, so this is cheap; broad unconstrained history queries over wide time ranges could benefit from leaflet-at-a-time streaming.

**Collapsed-state range provider.** `BinaryRangeProvider::range` hardcodes `op: true` (`binary_range.rs:~495`). Non-query callers that pass `RangeOptions::history_mode: true` (SHACL, reasoner, etc.) would get collapsed state. No active caller does this today per grep, but it's a sharp edge worth cleaning up separately.