Fix: `rdfs:Class` COUNT Misses by aaj3f · Pull Request #1209 · fluree/db

aaj3f · 2026-04-29T21:03:31Z

Summary

Fixes #1208 — COUNT { ?c rdf:type rdfs:Class } returns 0 on indexed datasets, while the parallel COUNT { ?p rdf:type rdf:Property } and any user-class variant return the correct count. The data is on disk and reachable via FILTER / GROUP BY / non-COUNT shapes; only the COUNT-with-bound-object fast path drops the rows.

The bugs live in count_bound_object_v6 and group_count_v6 (fluree-db-query/src/fast_group_count_firsts.rs) — the fast paths that answer SELECT (COUNT(?s) AS ?n) WHERE { ?s <p> <o> } and SELECT ?o (COUNT(?s) AS ?n) WHERE { ?s <p> ?o } GROUP BY ?o from POST-leaflet directory metadata.

Root cause

POST leaflets store one directory entry per leaflet whose first_key is the leaflet's first row. The fast path walks those entries and decides per-leaflet whether to skip, fast-count, or row-scan. POST sorts by (p_id, o_type, o_key, …). Several compounding errors:

1. Skip-by-first-prefix

Old logic:

if prefix < target_prefix { continue; }
if prefix > target_prefix { break; }
// fall through: prefix == target → fast-count by row_count

A leaflet's rows span [first_prefix, next_leaflet_first_prefix). A target value can sit inside a leaflet whose first row is some smaller value. In the reproducer the very first POST leaflet contains rdf:Property (the first row), then a few rdfs:Class rows, then a run of user-class rows; the loop saw first_prefix < target and continued, then saw the next leaflet's first_prefix > target and broke — never decoding the leaflet that actually held rdfs:Class.
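
A minimal sketch of that failure, under simplifying assumptions: `Leaflet`, `count_in`, `count_old`, and `count_fixed` are illustrative names rather than fluree-db types, every row belongs to one predicate, and a prefix is modeled as an `(o_type, o_key)` tuple. It contrasts a decision based on the first prefix alone with one that skips a leaflet only when the next leaflet's first prefix is below the target.

```rust
/// Hypothetical (o_type, o_key) prefix for rows of a single predicate.
type Prefix = (u16, u64);

/// Hypothetical leaflet: rows sorted ascending; rows[0] plays the role of first_key.
struct Leaflet {
    rows: Vec<Prefix>,
}

/// Count target rows in one leaflet: use row_count when both ends equal the
/// target, otherwise scan the rows.
fn count_in(leaflet: &Leaflet, next: Option<&Leaflet>, target: Prefix) -> usize {
    let first = leaflet.rows[0];
    let next_first = next.map(|l| l.rows[0]);
    if first == target && next_first == Some(target) {
        leaflet.rows.len()
    } else {
        leaflet.rows.iter().filter(|&&r| r == target).count()
    }
}

/// Old shape: decide from the leaflet's first prefix alone.
fn count_old(leaflets: &[Leaflet], target: Prefix) -> usize {
    let mut n = 0;
    for (i, leaflet) in leaflets.iter().enumerate() {
        let first = leaflet.rows[0];
        if first < target {
            continue; // wrong: the target may sit later in this leaflet
        }
        if first > target {
            break;
        }
        n += count_in(leaflet, leaflets.get(i + 1), target);
    }
    n
}

/// Fixed shape: skip only when the leaflet ends strictly before the target.
fn count_fixed(leaflets: &[Leaflet], target: Prefix) -> usize {
    let mut n = 0;
    for (i, leaflet) in leaflets.iter().enumerate() {
        if leaflet.rows[0] > target {
            break;
        }
        let next = leaflets.get(i + 1);
        if matches!(next, Some(l) if l.rows[0] < target) {
            continue; // rows span [first, next_first), all below the target
        }
        n += count_in(leaflet, next, target);
    }
    n
}
```

With one leaflet holding `(1, 10), (1, 20), (1, 20)` followed by one holding `(1, 30), (1, 30)`, a target of `(1, 20)` yields 0 from `count_old` and 2 from `count_fixed`, while a target of `(1, 10)` yields 1 from both.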

2. Mixed-predicate skip

Old logic:

if entry.p_const != Some(p_id) { continue; }

p_const is None for any leaflet whose rows don't all share a predicate — i.e. boundary leaflets that straddle a p_id change. Those can still contain rows for our predicate, but the check skipped them outright.
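
The difference between the two checks can be sketched in isolation. `Entry`, `old_skips`, and `fixed_skips` are hypothetical names for illustration; only the `p_const` field described above is modeled.

```rust
/// Hypothetical directory entry carrying only the field under discussion.
struct Entry {
    /// Some(p) when every row in the leaflet has predicate p, None when mixed.
    p_const: Option<u32>,
}

/// Old check: skips anything that is not homogeneous in our predicate,
/// which also discards mixed-predicate (None) leaflets.
fn old_skips(entry: &Entry, p_id: u32) -> bool {
    entry.p_const != Some(p_id)
}

/// Fixed check: skip only leaflets that are homogeneous in another predicate.
/// Mixed leaflets fall through and need a per-row p_id check.
fn fixed_skips(entry: &Entry, p_id: u32) -> bool {
    matches!(entry.p_const, Some(other) if other != p_id)
}
```

For a boundary leaflet (`p_const: None`), `old_skips` returns `true` and `fixed_skips` returns `false`; for homogeneous leaflets the two agree.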

3. `prefix_v6_from_entry` strips `p_id`

POST's primary sort key is p_id. The fast path's prefix_v6_from_entry returns just (o_type, o_key), which is a meaningful range bound within a predicate but unrelated to our range when the next leaflet starts a different predicate. With the predicate boundary fix above (point 2), this becomes a real footgun: the next leaflet's (o_type, o_key) for some other predicate could happen to be lexicographically less than the target's, falsely tripping the "leaflet ends before target → skip" condition. In the mixed-predicate regression test this fired exactly: a homogeneous rdf:type leaflet with prefix == target was skipped because the next leaflet started on rdfs:label with a smaller (o_type, o_key).
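
A small sketch of why the stripped bound misfires, assuming tuples stand in for the real keys. `qualified_next_prefix` and `ends_before` are hypothetical helpers, not the functions in `fast_group_count_firsts.rs`; the predicate-qualified variant only yields an upper bound when the next leaflet starts in the same predicate.

```rust
/// Hypothetical (p_id, o_type, o_key) key in POST order.
type PidPrefix = (u32, u16, u64);
/// Hypothetical (o_type, o_key) bound, meaningful only within one predicate.
type Prefix = (u16, u64);

/// Return the next leaflet's (o_type, o_key) only if it starts in `p_id`;
/// otherwise the upper bound is unknown.
fn qualified_next_prefix(next_first: Option<PidPrefix>, p_id: u32) -> Option<Prefix> {
    match next_first {
        Some((np, o_type, o_key)) if np == p_id => Some((o_type, o_key)),
        _ => None,
    }
}

/// "Leaflet ends before target" test used to skip a leaflet.
fn ends_before(next_prefix: Option<Prefix>, target: Prefix) -> bool {
    matches!(next_prefix, Some(np) if np < target)
}
```

If the target is `(1, 200)` under predicate 5 and the next leaflet starts at `(6, 1, 50)`, comparing the stripped `(1, 50)` reports that the leaflet ends before the target, whereas the qualified bound is `None` and the leaflet is scanned.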

4. Early break unsafe for mixed leaflets

prefix > target_prefix → break is sound only for homogeneous-predicate leaflets. In a mixed-predicate leaflet the first row may belong to a different predicate whose (o_type, o_key) exceeds the target while later rows in the leaflet (or in subsequent leaflets) still belong to our predicate.

5. Non-cached column projection misses `p_id`

load_v6_batch's no-cache branch built a projection with only OKey (and optionally OType). The per-row p_id check inside mixed-predicate leaflets needs the PId column; without it batch.p_id.get_or(row, 0) returns 0 and every row gets silently rejected. The cached path always loads all columns so production was unaffected, but the no-cache path is reachable from test setups and is a footgun for any future caller.
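
The silent rejection can be reproduced with a toy column type. `Column` and `count_matching` are assumptions made for illustration; they only mimic the `get_or(row, 0)` behavior described above and are not the real batch API.

```rust
/// Hypothetical column: either loaded values or absent from the projection.
enum Column {
    Values(Vec<u32>),
    Absent,
}

impl Column {
    /// Return the value at `row`, or `default` when the column was not loaded.
    fn get_or(&self, row: usize, default: u32) -> u32 {
        match self {
            Column::Values(v) => v.get(row).copied().unwrap_or(default),
            Column::Absent => default,
        }
    }
}

/// Per-row predicate check as used inside a mixed-predicate leaflet.
fn count_matching(p_col: &Column, rows: usize, p_id: u32) -> usize {
    (0..rows).filter(|&r| p_col.get_or(r, 0) == p_id).count()
}
```

With the column absent, every row reads as predicate 0, so `count_matching(&Column::Absent, 3, 7)` is 0 even though `Column::Values(vec![7, 7, 9])` would give 2.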

6. Boundary-equality fast count needs homogeneous gate

The next_prefix == Some(prefix) → += entry.row_count shortcut is only valid when the leaflet is entirely target rows. In a mixed-predicate leaflet that condition can hold while the leaflet still contains rows for other predicates that must be excluded.
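
A sketch of the over-count, assuming rows are `(p_id, o_type, o_key)` tuples. `count_leaflet` and its `require_homogeneous` flag are hypothetical and exist only to compare the ungated and gated shortcut side by side.

```rust
/// Hypothetical row in POST order: (p_id, o_type, o_key).
type Row = (u32, u16, u64);

/// Count rows matching (p_id, target) in one leaflet. When
/// `require_homogeneous` is false the row_count shortcut is ungated.
fn count_leaflet(
    rows: &[Row],
    next_prefix: Option<(u16, u64)>,
    p_id: u32,
    target: (u16, u64),
    require_homogeneous: bool,
) -> usize {
    let prefix = (rows[0].1, rows[0].2);
    let homogeneous = rows.iter().all(|r| r.0 == p_id);
    let gate_ok = !require_homogeneous || homogeneous;
    if prefix == target && next_prefix == Some(target) && gate_ok {
        rows.len() // shortcut: assume every row is a target row
    } else {
        rows.iter()
            .filter(|r| r.0 == p_id && (r.1, r.2) == target)
            .count()
    }
}
```

For a mixed leaflet `(4, 1, 20), (5, 1, 10), (5, 1, 20)` whose successor starts at `(1, 20)` in predicate 5, the ungated shortcut counts 3 rows for target `(1, 20)`, while the gated version falls through to the scan and counts 1.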

Why the asymmetry (rdfs:Class fails, rdf:Property works)

In POST sort order, rdf:type rows are grouped by object IRI (o_key). In the reproducer, rdf:Property happens to be the first row of the leaflet (first_prefix == target), so the existing prefix == target branch catches it. rdfs:Class lives between rdf:Property and the first user class within the same leaflet — first_prefix < target — and gets dropped by error #1 above. Any class that lands as the very first row of a POST leaflet works; any class that lands mid-leaflet silently returns 0.

Why only on indexed/bulk-imported data

count_bound_object_v6 and group_count_v6 only fire when there's a binary index. Memory-only / pure-novelty queries never reach them.

Fix

In count_bound_object_v6 (and the same set of fixes mirrored into group_count_v6):

- Mixed-predicate leaflets are no longer skipped on p_const. The entry.p_const != Some(p_id) { continue } check now skips only when p_const = Some(other_pid), and p_id is verified per row inside mixed leaflets.
- Skip a leaflet only when it ends strictly before the target. Compute next_prefix and skip the leaflet only when next_prefix < target_prefix. Otherwise the target could be inside the leaflet, so fall through to either the boundary-equality fast count or the row-level scan.
- Predicate-qualified next_prefix. Added pid_prefix_v6_from_entry returning (p_id, o_type, o_key). When the next leaflet starts a different predicate, treat next_prefix as None (unknown upper bound) rather than comparing against an unrelated (o_type, o_key) from a different predicate's rows.
- Gated early break. if prefix > target_prefix { break; } is now if prefix > target_prefix && entry.p_const == Some(p_id) { break; }.
- load_v6_batch non-cached projection includes ColumnId::PId when entry.p_const.is_none(), so the per-row p_id check works regardless of cache state.
- Boundary-equality fast count is gated on prefix == target_prefix && next_prefix == Some(target_prefix) && entry.p_const == Some(p_id), meaning the entire leaflet is target rows for our predicate.
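
Taken together, the rules above might look roughly like the following. This is a simplified model, not the actual `count_bound_object_v6`: `Row`, `Leaflet`, and `count_bound_object` are hypothetical, leaflets are held in memory, and `p_const` is derived from the rows instead of being read from directory metadata.

```rust
/// Hypothetical row in POST order: (p_id, o_type, o_key).
type Row = (u32, u16, u64);
type Prefix = (u16, u64);

/// Hypothetical in-memory leaflet; rows are sorted and non-empty.
struct Leaflet {
    rows: Vec<Row>,
}

impl Leaflet {
    fn first(&self) -> Row {
        self.rows[0]
    }
    /// Some(p) when every row shares predicate p, None for a mixed leaflet.
    fn p_const(&self) -> Option<u32> {
        let p = self.rows[0].0;
        self.rows.iter().all(|r| r.0 == p).then_some(p)
    }
}

fn count_bound_object(leaflets: &[Leaflet], p_id: u32, target: Prefix) -> usize {
    let mut n = 0;
    for (i, leaflet) in leaflets.iter().enumerate() {
        let p_const = leaflet.p_const();
        // Skip only leaflets that are homogeneous in some other predicate.
        if matches!(p_const, Some(other) if other != p_id) {
            continue;
        }
        let homogeneous = p_const == Some(p_id);
        let first = leaflet.first();
        let prefix = (first.1, first.2);
        // Upper bound is known only if the next leaflet starts in our predicate.
        let next_prefix = leaflets
            .get(i + 1)
            .map(|l| l.first())
            .filter(|r| r.0 == p_id)
            .map(|r| (r.1, r.2));
        // Early break is sound only for homogeneous leaflets.
        if prefix > target && homogeneous {
            break;
        }
        // Skip when the leaflet ends strictly before the target.
        if matches!(next_prefix, Some(np) if np < target) {
            continue;
        }
        if homogeneous && prefix == target && next_prefix == Some(target) {
            // Whole leaflet is target rows for our predicate.
            n += leaflet.rows.len();
        } else {
            // Row-level scan with a per-row p_id check.
            n += leaflet
                .rows
                .iter()
                .filter(|r| r.0 == p_id && (r.1, r.2) == target)
                .count();
        }
    }
    n
}
```

The check order here is a design choice of the sketch: the foreign-predicate skip comes first so that the prefix comparisons only ever run on leaflets that can hold rows for `p_id`.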

Test plan

- Add count_bound_object_first_key_skip_regression (fluree-db-api/tests/it_query_rdfs_class_repro.rs): bulk-imports a small TTL where multiple distinct classes share a single homogeneous-rdf:type leaflet, then asserts COUNT for rdf:Property, rdfs:Class, and a 3000-instance user class. Exercises bug #1.
- Add count_mixed_predicate_leaflet_regression: bulk-imports a tiny TTL designed to land in a single leaflet that straddles rdf:type and rdfs:label, then asserts COUNT { ?c a rdfs:Class } and the parallel SELECT.
- All 12 existing tests in fluree-db-api/tests/it_fast_group_count.rs pass unchanged.
- All 997 fluree-db-query library unit tests pass.
- Verified end-to-end on the originally-reported dataset: COUNT for rdfs:Class, COUNT for rdf:Property, and the FILTER rewrite all agree, and the full-IRI form matches the prefixed form.

Follow-ups (out of scope for this PR)

There are ~16 other call sites in the codebase using the same entry.p_const != Some(p_id) skip pattern (fast_count.rs, fast_path_common.rs, fast_exists_join_count_distinct_object.rs, fast_sum_strlen_group_concat.rs, count_plan_exec.rs, join.rs). They don't all share the prefix-skip variant of the bug, but the mixed-predicate-leaflet under-counting concern likely affects some of them. Worth a focused audit pass — tracking separately.

The V6 fast path in `count_bound_object_v6` walked POST-leaflet directory entries and used each leaflet's first key as a prefix to decide whether the leaflet could contain the target object. Two compounding errors made it skip leaflets that did contain matching rows:

1. When `leaflet.first_prefix < target`, the loop did `continue`, but a leaflet's rows span `[first_prefix, next_leaflet_first_prefix)`, so a target value can sit *inside* a leaflet whose first row is some smaller value. The fix skips a leaflet only when `next_prefix < target_prefix`, i.e. the leaflet ends strictly before the target.
2. `entry.p_const != Some(p_id)` skipped any leaflet without a constant predicate, including mixed-predicate leaflets that contain rows for our predicate. The fix only skips when `p_const = Some(other_pid)`, and verifies `p_id` per row inside mixed leaflets.

The boundary-equality fast count (using `entry.row_count` as the count for the entire leaflet) is now gated on both ends of the leaflet being equal to the target, meaning the whole leaflet is target rows, not just the first row.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ast paths

Follow-up to the initial `count_bound_object_v6` fix. Code review surfaced three latent issues introduced by that change plus a parallel bug in `group_count_v6`. A new mixed-predicate regression test caught a fourth.

In `fast_group_count_firsts.rs`:

- `load_v6_batch` non-cached projection now includes `ColumnId::PId` when `entry.p_const.is_none()`. Without it, the per-row `p_id` check inside mixed-predicate leaflets reads `ColumnData::AbsentDefault` and silently rejects every row whenever the leaflet cache is unconfigured.
- `if prefix > target_prefix { break; }` is now gated on `entry.p_const == Some(p_id)`. In a mixed-predicate leaflet the first row's `(o_type, o_key)` may belong to a different predicate and exceed `target_prefix` while later rows in the leaflet still match.
- Added `pid_prefix_v6_from_entry` returning `(p_id, o_type, o_key)`. POST sorts by `(p_id, o_type, o_key, …)`, so a stripped `(o_type, o_key)` bound is only meaningful within a single predicate. When the next leaflet starts a different predicate, treat `next_prefix` as `None` (unknown upper bound) and fall through to the row-level scan.
- The same set of fixes applied to `group_count_v6`, which had the parallel `entry.p_const != Some(p_id)` skip, missing per-row `p_id` filter, and unguarded boundary-equality fast count.
- Boundary-equality fast count (`next_prefix == Some(prefix) → += entry.row_count`) now gated on `entry.p_const == Some(p_id)` in both functions, because a mixed-predicate leaflet may share its first `(o_type, o_key)` with the next leaflet but still contain rows for *other* predicates.
- Half-open interval comments unified on `[prefix, next_prefix)` with the spillover note.

Test cleanup (`fluree-db-api/tests/it_query_rdfs_class_repro.rs`):

- Removed `#[ignore]`-d diagnostic tests that depended on a local dataset outside the repo; trimmed remaining tests to the documented A–G + JSON-LD shapes with generic `ex:ClassA`…`E` synthetic ontology.
- Added `count_mixed_predicate_leaflet_regression` exercising the `p_const = None` branch (a leaflet straddling `rdf:type` and `rdfs:label`). This test surfaced the predicate-qualified-prefix bug fixed above.
- Renamed the prior regression test to `count_bound_object_first_key_skip_regression` for clarity.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

bplatz

Approving with one small consistency fix requested inline.

Add `&& entry.p_const == Some(p_id)` to the boundary-equality fast-count gate in `count_bound_object_v6`, matching the existing gate in `group_count_v6` and the PR description. After the upstream `entry.p_const != Some(other_pid)` filter, `p_const` is either `Some(p_id)` (homogeneous) or `None` (mixed-predicate). In the mixed case, `prefix` is the leaflet's first row's `(o_type, o_key)` regardless of that row's predicate. If a mixed-predicate leaflet's first row happens to land at `target_prefix` for some _other_ predicate and the next leaflet also starts at `target_prefix` for our predicate, the previous gate would add the entire `row_count` — including rows for the other predicate — instead of falling through to the per-row scan. The per-row scan path already verifies `p_id` per row when `p_const.is_none()`, so falling through is correct. Narrow scenario in practice, but closes a real correctness gap and restores symmetry with `group_count_v6`.

aaj3f force-pushed the fix/partial-leaflet-issue branch from c0e1285 to a2515d3 Compare April 30, 2026 15:05

aaj3f and others added 2 commits April 30, 2026 11:14

aaj3f force-pushed the fix/partial-leaflet-issue branch from a2515d3 to 1172bca Compare April 30, 2026 15:14

aaj3f marked this pull request as ready for review April 30, 2026 15:17

aaj3f requested review from bplatz and zonotope April 30, 2026 15:17

bplatz approved these changes Apr 30, 2026

Comment thread fluree-db-query/src/fast_group_count_firsts.rs Outdated

aaj3f merged commit 5cf0b20 into main May 4, 2026
13 checks passed

aaj3f deleted the fix/partial-leaflet-issue branch May 4, 2026 20:03

bplatz mentioned this pull request May 21, 2026

release v4.0.4 #1250

Open

aaj3f merged 3 commits into main from fix/partial-leaflet-issue