Skip to content

Fix: rdfs:Class COUNT Misses#1209

Merged
aaj3f merged 3 commits into
mainfrom
fix/partial-leaflet-issue
May 4, 2026
Merged

Fix: rdfs:Class COUNT Misses#1209
aaj3f merged 3 commits into
mainfrom
fix/partial-leaflet-issue

Conversation

@aaj3f
Copy link
Copy Markdown
Contributor

@aaj3f aaj3f commented Apr 29, 2026

Summary

Fixes #1208COUNT { ?c rdf:type rdfs:Class } returns 0 on indexed datasets, while the parallel COUNT { ?p rdf:type rdf:Property } and any user-class variant return the correct count. The data is on disk and reachable via FILTER / GROUP BY / non-COUNT shapes; only the COUNT-with-bound-object fast path drops the rows.

The bugs live in count_bound_object_v6 and group_count_v6 (fluree-db-query/src/fast_group_count_firsts.rs) — the fast paths that answer SELECT (COUNT(?s) AS ?n) WHERE { ?s <p> <o> } and SELECT ?o (COUNT(?s) AS ?n) WHERE { ?s <p> ?o } GROUP BY ?o from POST-leaflet directory metadata.

Root cause

POST leaflets store one directory entry per leaflet whose first_key is the leaflet's first row. The fast path walks those entries and decides per-leaflet whether to skip, fast-count, or row-scan. POST sorts by (p_id, o_type, o_key, …). Several compounding errors:

1. Skip-by-first-prefix

Old logic:

if prefix < target_prefix { continue; }
if prefix > target_prefix { break; }
// fall through: prefix == target → fast-count by row_count

A leaflet's rows span [first_prefix, next_leaflet_first_prefix). A target value can sit inside a leaflet whose first row is some smaller value. In the reproducer the very first POST leaflet contains rdf:Property (the first row), then a few rdfs:Class rows, then a run of user-class rows; the loop saw first_prefix < target and continued, then saw the next leaflet's first_prefix > target and broke — never decoding the leaflet that actually held rdfs:Class.

2. Mixed-predicate skip

Old logic:

if entry.p_const != Some(p_id) { continue; }

p_const is None for any leaflet whose rows don't all share a predicate — i.e. boundary leaflets that straddle a p_id change. Those can still contain rows for our predicate, but the check skipped them outright.

3. prefix_v6_from_entry strips p_id

POST's primary sort key is p_id. The fast path's prefix_v6_from_entry returns just (o_type, o_key), which is a meaningful range bound within a predicate but unrelated to our range when the next leaflet starts a different predicate. With the predicate boundary fix above (point 2), this becomes a real footgun: the next leaflet's (o_type, o_key) for some other predicate could happen to be lexicographically less than the target's, falsely tripping the "leaflet ends before target → skip" condition. In the mixed-predicate regression test this fired exactly: a homogeneous rdf:type leaflet with prefix == target was skipped because the next leaflet started on rdfs:label with a smaller (o_type, o_key).

4. Early break unsafe for mixed leaflets

prefix > target_prefix → break is sound only for homogeneous-predicate leaflets. In a mixed-predicate leaflet the first row may belong to a different predicate whose (o_type, o_key) exceeds the target while later rows in the leaflet (or in subsequent leaflets) still belong to our predicate.

5. Non-cached column projection misses p_id

load_v6_batch's no-cache branch built a projection with only OKey (and optionally OType). The per-row p_id check inside mixed-predicate leaflets needs the PId column; without it batch.p_id.get_or(row, 0) returns 0 and every row gets silently rejected. The cached path always loads all columns so production was unaffected, but the no-cache path is reachable from test setups and is a footgun for any future caller.

6. Boundary-equality fast count needs homogeneous gate

The next_prefix == Some(prefix) → += entry.row_count shortcut is only valid when the leaflet is entirely target rows. In a mixed-predicate leaflet that condition can hold while the leaflet still contains rows for other predicates that must be excluded.

Why the asymmetry (rdfs:Class fails, rdf:Property works)

In POST sort order, rdf:type rows are grouped by object IRI (o_key). In the reproducer, rdf:Property happens to be the first row of the leaflet (first_prefix == target), so the existing prefix == target branch catches it. rdfs:Class lives between rdf:Property and the first user class within the same leaflet — first_prefix < target — and gets dropped by error #1 above. Any class that lands as the very first row of a POST leaflet works; any class that lands mid-leaflet silently returns 0.

Why only on indexed/bulk-imported data

count_bound_object_v6 and group_count_v6 only fire when there's a binary index. Memory-only / pure-novelty queries never reach them.

Fix

In count_bound_object_v6 (and the same set of fixes mirrored into group_count_v6):

  • Mixed-predicate leaflets are no longer skipped on p_const. Replace entry.p_const != Some(p_id) { continue } with: only skip when p_const = Some(other_pid); verify p_id per-row inside mixed leaflets.
  • Skip a leaflet only when it ends strictly before the target. Compute next_prefix and skip the leaflet only when next_prefix < target_prefix. Otherwise the target could be inside the leaflet — fall through to either the boundary-equality fast count or the row-level scan.
  • Predicate-qualified next_prefix. Added pid_prefix_v6_from_entry returning (p_id, o_type, o_key). When the next leaflet starts a different predicate, treat next_prefix as None (unknown upper bound) rather than comparing against an unrelated (o_type, o_key) from a different predicate's rows.
  • Gated early break. if prefix > target_prefix { break; } is now if prefix > target_prefix && entry.p_const == Some(p_id) { break; }.
  • load_v6_batch non-cached projection includes ColumnId::PId when entry.p_const.is_none(), so the per-row p_id check works regardless of cache state.
  • Boundary-equality fast count is gated on prefix == target_prefix && next_prefix == Some(target_prefix) && entry.p_const == Some(p_id) — the entire leaflet is target rows for our predicate.

Test plan

  • Add count_bound_object_first_key_skip_regression (fluree-db-api/tests/it_query_rdfs_class_repro.rs) — bulk-imports a small TTL where multiple distinct classes share a single homogeneous-rdf:type leaflet, then asserts COUNT for rdf:Property, rdfs:Class, and a 3000-instance user class. Exercises bug #1.
  • Add count_mixed_predicate_leaflet_regression — bulk-imports a tiny TTL designed to land in a single leaflet that straddles rdf:type and rdfs:label, then asserts COUNT { ?c a rdfs:Class } and the parallel SELECT.
  • All 12 existing tests in fluree-db-api/tests/it_fast_group_count.rs pass unchanged.
  • All 997 fluree-db-query library unit tests pass.
  • Verified end-to-end on the originally-reported dataset: COUNT for rdfs:Class, COUNT for rdf:Property, and the FILTER rewrite all agree, and the full-IRI form matches the prefixed form.

Follow-ups (out of scope for this PR)

There are ~16 other call sites in the codebase using the same entry.p_const != Some(p_id) skip pattern (fast_count.rs, fast_path_common.rs, fast_exists_join_count_distinct_object.rs, fast_sum_strlen_group_concat.rs, count_plan_exec.rs, join.rs). They don't all share the prefix-skip variant of the bug, but the mixed-predicate-leaflet under-counting concern likely affects some of them. Worth a focused audit pass — tracking separately.

@aaj3f aaj3f force-pushed the fix/partial-leaflet-issue branch from c0e1285 to a2515d3 Compare April 30, 2026 15:05
aaj3f and others added 2 commits April 30, 2026 11:14
The V6 fast path in `count_bound_object_v6` walked POST-leaflet directory
entries and used each leaflet's first key as a prefix to decide whether
the leaflet could contain the target object. Two compounding errors made
it skip leaflets that did contain matching rows:

1. When `leaflet.first_prefix < target`, the loop did `continue` — but a
   leaflet's rows span `[first_prefix, next_leaflet_first_prefix)`, so a
   target value can sit *inside* a leaflet whose first row is some
   smaller value. The fix skips a leaflet only when
   `next_prefix < target_prefix`, i.e. the leaflet ends strictly before
   the target.

2. `entry.p_const != Some(p_id)` skipped any leaflet without a constant
   predicate — including mixed-predicate leaflets that contain rows for
   our predicate. The fix only skips when `p_const = Some(other_pid)`,
   and verifies `p_id` per row inside mixed leaflets.

The boundary-equality fast count (using `entry.row_count` as the count
for the entire leaflet) is now gated on both ends of the leaflet equal
to the target — meaning the whole leaflet is target rows, not just the
first row.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ast paths

Follow-up to the initial `count_bound_object_v6` fix. Code review surfaced
three latent issues introduced by that change plus a parallel bug in
`group_count_v6`. A new mixed-predicate regression test caught a fourth.

In `fast_group_count_firsts.rs`:

- `load_v6_batch` non-cached projection now includes `ColumnId::PId` when
  `entry.p_const.is_none()`. Without it, the per-row `p_id` check inside
  mixed-predicate leaflets reads `ColumnData::AbsentDefault` and silently
  rejects every row whenever the leaflet cache is unconfigured.
- `if prefix > target_prefix { break; }` is now gated on
  `entry.p_const == Some(p_id)`. In a mixed-predicate leaflet the first
  row's `(o_type, o_key)` may belong to a different predicate and exceed
  `target_prefix` while later rows in the leaflet still match.
- Added `pid_prefix_v6_from_entry` returning `(p_id, o_type, o_key)`.
  POST sorts by `(p_id, o_type, o_key, …)`, so a stripped `(o_type, o_key)`
  bound is only meaningful within a single predicate. When the next
  leaflet starts a different predicate, treat next_prefix as `None`
  (unknown upper bound) — fall through to row-level scan.
- The same set of fixes applied to `group_count_v6`, which had the
  parallel `entry.p_const != Some(p_id)` skip, missing per-row `p_id`
  filter, and unguarded boundary-equality fast count.
- Boundary-equality fast count (`next_prefix == Some(prefix) →
  += entry.row_count`) now gated on `entry.p_const == Some(p_id)` in
  both functions — a mixed-predicate leaflet may share its first
  `(o_type, o_key)` with the next leaflet but still contain rows for
  *other* predicates.
- Half-open interval comments unified on `[prefix, next_prefix)` with
  the spillover note.

Test cleanup (`fluree-db-api/tests/it_query_rdfs_class_repro.rs`):

- Removed `#[ignore]`-d diagnostic tests that depended on a local dataset
  outside the repo; trimmed remaining tests to the documented A–G + JSON-LD
  shapes with generic `ex:ClassA`…`E` synthetic ontology.
- Added `count_mixed_predicate_leaflet_regression` exercising the
  `p_const = None` branch (a leaflet straddling `rdf:type` and
  `rdfs:label`). This test surfaced the predicate-qualified-prefix bug
  fixed above.
- Renamed the prior regression test to
  `count_bound_object_first_key_skip_regression` for clarity.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@aaj3f aaj3f force-pushed the fix/partial-leaflet-issue branch from a2515d3 to 1172bca Compare April 30, 2026 15:14
@aaj3f aaj3f marked this pull request as ready for review April 30, 2026 15:17
@aaj3f aaj3f requested review from bplatz and zonotope April 30, 2026 15:17
Copy link
Copy Markdown
Contributor

@bplatz bplatz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving with one small consistency fix requested inline.

Comment thread fluree-db-query/src/fast_group_count_firsts.rs Outdated
Add `&& entry.p_const == Some(p_id)` to the boundary-equality fast-count
gate in `count_bound_object_v6`, matching the existing gate in
`group_count_v6` and the PR description.

After the upstream `entry.p_const != Some(other_pid)` filter, `p_const`
is either `Some(p_id)` (homogeneous) or `None` (mixed-predicate). In the
mixed case, `prefix` is the leaflet's first row's `(o_type, o_key)`
regardless of that row's predicate. If a mixed-predicate leaflet's
first row happens to land at `target_prefix` for some _other_ predicate
and the next leaflet also starts at `target_prefix` for our predicate,
the previous gate would add the entire `row_count` — including rows
for the other predicate — instead of falling through to the per-row
scan. The per-row scan path already verifies `p_id` per row when
`p_const.is_none()`, so falling through is correct.

Narrow scenario in practice, but closes a real correctness gap and
restores symmetry with `group_count_v6`.
@aaj3f aaj3f merged commit 5cf0b20 into main May 4, 2026
13 checks passed
@aaj3f aaj3f deleted the fix/partial-leaflet-issue branch May 4, 2026 20:03
@bplatz bplatz mentioned this pull request May 21, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

COUNT ?s rdf:type rdfs:Class returns 0 on indexed data when rdfs:Class is mid-leaflet

2 participants