
perf(sql): apply parallel top-K to queries with SELECT projections #6993

Merged
bluestreak01 merged 4 commits into questdb:master from DHRUV6029:fix/issue-6528-topk-projection on May 12, 2026

Contributor

@DHRUV6029 DHRUV6029 commented Apr 19, 2026

Fixes #6528

Summary

Extend the parallel top-K gate in SqlCodeGenerator#generateOrderBy so it also fires when a column-projection wrapper (SelectedRecordCursorFactory or VirtualRecordCursorFactory) sits between the ORDER BY ... LIMIT N and the filtered page-frame scan. Before this change, any non-literal SELECT list combined with a WHERE clause forced the query onto the generic Sort light path, because the projection wrapper hid the steal-able filter from the gate.

Problem

Reproduction from the issue:

CREATE TABLE tab AS (
    SELECT x, x::timestamp AS ts, x::timestamp AS ts2
    FROM long_sequence(1_000)
) TIMESTAMP (ts) PARTITION BY HOUR WAL;

| Query | Plan before | Plan after |
|---|---|---|
| `SELECT * FROM tab WHERE ts2 IN '2025' ORDER BY ts2 DESC LIMIT 10` | Async JIT Top K | Async JIT Top K |
| `SELECT x FROM tab WHERE ts2 IN '2025' ORDER BY ts2 DESC LIMIT 10` | Async JIT Top K | Async JIT Top K |
| `SELECT x, * FROM tab WHERE ts2 IN '2025' ORDER BY ts2 DESC LIMIT 10` | Sort light | Async JIT Top K |
| `SELECT x + 1 AS xp, ts2 FROM tab WHERE ts2 IN '2025' ORDER BY ts2 DESC LIMIT 10` | Sort light | Async JIT Top K |

The first two cases already hit the top-K gate because the generated plan is AsyncJitFilter -> PageFrame with no wrapper in between. The third and fourth cases generate an extra SelectedRecord (duplicate columns) or VirtualRecord (computed column) above the filter, and the gate's supportsFilterStealing() / supportsPageFrameCursor() checks look only at the immediate child, so they return false and fall through to Sort light. The result set is correct, but the query materializes and sorts every matching row instead of keeping a bounded heap of N.
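To make the cost difference concrete, here is a minimal sketch of the bounded-heap idea behind top-K, written with the JDK's `PriorityQueue`. It is not QuestDB's `AsyncTopKRecordCursorFactory`; the class and method names are hypothetical and the input is a plain array rather than page frames.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration only; not part of the QuestDB code base.
final class TopKSketch {
    // Returns the k largest values in descending order while holding at most k values in memory.
    static List<Long> topKDescending(long[] values, int k) {
        if (k <= 0) {
            return new ArrayList<>();
        }
        // Min-heap: the root is the smallest of the current k best candidates.
        PriorityQueue<Long> heap = new PriorityQueue<>(k);
        for (long v : values) {
            if (heap.size() < k) {
                heap.add(v);
            } else if (v > heap.peek()) {
                heap.poll(); // evict the current worst candidate
                heap.add(v);
            }
        }
        List<Long> result = new ArrayList<>(heap);
        result.sort(Collections.reverseOrder());
        return result;
    }

    public static void main(String[] args) {
        long[] ts2 = {5, 1, 9, 3, 7, 8, 2};
        System.out.println(topKDescending(ts2, 3)); // prints [9, 8, 7]
    }
}
```

A full sort keeps every matching row before applying the limit, whereas the heap never grows beyond `k` entries, which is the behavior the Sort light fallback gives up.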

Approach

The codebase already has a projection-peel precedent in generateAsOfJoin and generateLatestBy: when a projection wrapper hides a useful optimization target, peel the wrapper, apply the optimization to the inner factory, and re-wrap.

This PR collapses the direct page-frame, direct filter-stealing, and projection-peel shapes into a single unified branch in generateOrderBy, driven by two default methods on RecordCursorFactory:

  • translateOrderByColumnToBase(int projectedIndex) — returns the corresponding column index in the base metadata, or a negative value when the projected column cannot be resolved (e.g., a computed VirtualRecord column). The default is identity; SelectedRecordCursorFactory and VirtualRecordCursorFactory override it to chain through their base, so nested wrappers resolve transparently.
  • rewrapOverTopK(RecordCursorFactory topK, RecordMetadata orderedMetadata) — re-wraps a freshly-built top-K factory so the output shape is preserved. The default is a pass-through. Projection wrappers override to re-create themselves over the new base.
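The peel, apply, re-wrap sequence these two methods enable can be sketched with toy types. Everything below is a hypothetical model: `Factory`, `Projection`, and `rewrapOver` are stand-ins that only mimic the shape of the contract described above, and rows are plain `long[]` arrays.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical model of peel / apply / re-wrap; not the QuestDB interfaces.
final class RewrapSketch {
    interface Factory {
        List<long[]> rows();

        // Pass-through default: a non-wrapper has nothing to re-create.
        default Factory rewrapOver(Factory newBase) {
            return newBase;
        }
    }

    // Projection wrapper: output column i reads base column crossIndex[i].
    static final class Projection implements Factory {
        final Factory base;
        final int[] crossIndex;

        Projection(Factory base, int[] crossIndex) {
            this.base = base;
            this.crossIndex = crossIndex;
        }

        @Override
        public List<long[]> rows() {
            List<long[]> out = new ArrayList<>();
            for (long[] row : base.rows()) {
                long[] projected = new long[crossIndex.length];
                for (int i = 0; i < crossIndex.length; i++) {
                    projected[i] = row[crossIndex[i]];
                }
                out.add(projected);
            }
            return out;
        }

        // Re-creates the same projection over a new base so the output shape is preserved.
        @Override
        public Factory rewrapOver(Factory newBase) {
            return new Projection(newBase, crossIndex);
        }
    }

    // Stand-in for top-K: order by one base column descending, keep the first n rows.
    static Factory topK(Factory base, int baseColumn, int n) {
        return () -> {
            List<long[]> sorted = new ArrayList<>(base.rows());
            sorted.sort(Comparator.comparingLong((long[] r) -> r[baseColumn]).reversed());
            return new ArrayList<>(sorted.subList(0, Math.min(n, sorted.size())));
        };
    }

    // Peel the wrapper, translate the ORDER BY key, apply top-K to the base, then re-wrap.
    static List<long[]> peelApplyRewrap(Projection wrapper, int projectedOrderColumn, int n) {
        int baseColumn = wrapper.crossIndex[projectedOrderColumn];
        Factory topKFactory = topK(wrapper.base, baseColumn, n);
        return wrapper.rewrapOver(topKFactory).rows();
    }
}
```

With base rows `{x, ts2}` and a cross index of `{0, 0, 1}` (the `SELECT x, *` shape), ordering by projected column 2 resolves to base column 1, and the re-wrapped output still has three columns.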

A cheap structural predicate canReachPageFrameLeafForTopK decides whether the gate should attempt the peel at all. When it holds, a shared helper buildAsyncTopKOverStolenFilter steals the filter state from the inner filter factory via halfClose(), builds AsyncTopKRecordCursorFactory on the page-frame leaf, and the call site re-wraps via the projection wrapper's override. Ownership of the half-closed inner base transfers to the new top-K factory, matching the AsOf/LatestBy peel precedent.
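The ownership transfer can be illustrated with a small resource-counting model. This is a sketch under stated assumptions: `Resource`, `FilterFactory`, `TopKFactory`, and the `frameSequence` field are hypothetical, and the split between what `halfClose()` releases and what it leaves alive is assumed for illustration rather than taken from the QuestDB sources.

```java
// Hypothetical model of filter stealing via halfClose(); not QuestDB code.
final class HalfCloseSketch {
    static final class Resource implements AutoCloseable {
        final String name;
        int closeCount;

        Resource(String name) {
            this.name = name;
        }

        @Override
        public void close() {
            closeCount++;
        }
    }

    static final class FilterFactory implements AutoCloseable {
        // Assumed to be private to the filter and never handed over.
        final Resource frameSequence = new Resource("frameSequence");
        // Steal-able state: compiled filter, bind variables, and so on.
        final Resource filterState = new Resource("filterState");
        final Resource base;

        FilterFactory(Resource base) {
            this.base = base;
        }

        // Releases only what the filter keeps for itself; stolen state and base stay open.
        void halfClose() {
            frameSequence.close();
        }

        @Override
        public void close() {
            halfClose();
            filterState.close();
            base.close();
        }
    }

    static final class TopKFactory implements AutoCloseable {
        final Resource filterState;
        final Resource base;

        TopKFactory(Resource filterState, Resource base) {
            this.filterState = filterState;
            this.base = base;
        }

        // The new owner is now responsible for the stolen state and the leaf.
        @Override
        public void close() {
            filterState.close();
            base.close();
        }
    }

    static TopKFactory buildTopKOverStolenFilter(FilterFactory filter) {
        Resource stolenState = filter.filterState;
        Resource leaf = filter.base;
        filter.halfClose(); // the filter must not be fully closed after this point
        return new TopKFactory(stolenState, leaf);
    }
}
```

The invariant worth checking in such a model is that every resource is closed exactly once, regardless of which factory ends up owning it.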

The predicate and the call site peel a single projection layer. Two or more stacked projection wrappers above the filter (rare today) fall through to Sort light rather than recursing.

Incidental fix: AsOf join peel guards (keyed and non-keyed)

Widening isProjection() to also return true on VirtualRecordCursorFactory is required by the unified branch above. That widening incidentally made two existing branches in SqlCodeGenerator#generateJoinAsof reachable for VirtualRecord slaves: the keyed peel at line 4369 and the non-keyed peel at line 4502. Both peel a slave projection into a FilteredAsOfJoin*FastRecordCursorFactory and require a non-null column cross index — only SelectedRecord provides one. Before this PR, isProjection() was effectively equivalent to "is SelectedRecord", so the branches relied on that implicit invariant.

The keyed branch trips assert stolenCrossIndex != null. The non-keyed branch is worse: with assertions disabled, FilteredAsOfJoinNoKeyFastRecordCursorFactory treats a null cross index as "no projection remap" and would silently drop the VirtualRecord layer's column translation, producing wrong results.

This PR adds an explicit getColumnCrossIndex() != null guard at both call sites so VirtualRecord slaves fall through to whichever later branch catches them — the same path they took before the widening. The fix is a single extra condition per branch; the AsOf branches' logic is otherwise untouched and the deferred AsOf refactor remains a follow-up (as noted during review). Two regression tests pin the fix: AsOfJoinTest.testAsOfJoinWithTopDownFilteredDateaddTimestampProjection covers the keyed path, and AsOfJoinTest.testAsOfJoinNoKeyWithTopDownFilteredDateaddTimestampProjection covers the non-keyed path with row-level assertions, so a regression surfaces both with and without -ea.
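The failure mode behind the guard can be reproduced in a few lines. The sketch below is a hypothetical reduction: `readThroughCrossIndex` and `canPeel` are illustrative names, and the "computed column" is a simple addition standing in for a real expression.

```java
import java.util.function.LongUnaryOperator;

// Hypothetical reduction of the null cross index hazard; not QuestDB code.
final class CrossIndexGuardSketch {
    // A reader that interprets a null cross index as "no remap needed".
    static long readThroughCrossIndex(long[] baseRow, int[] crossIndex, int column) {
        return crossIndex == null ? baseRow[column] : baseRow[crossIndex[column]];
    }

    // The guard: peel only when the wrapper can be expressed as a cross index.
    static boolean canPeel(int[] crossIndex) {
        return crossIndex != null;
    }

    public static void main(String[] args) {
        long[] baseRow = {100L, 7L}; // base columns: ts, x
        LongUnaryOperator computed = ts -> ts + 3_600L; // stand-in for a computed projection column

        long expected = computed.applyAsLong(baseRow[0]);
        // Peeling a computed projection without a cross index reads the raw base column instead.
        long peeled = readThroughCrossIndex(baseRow, null, 0);

        System.out.println("expected=" + expected + ", peeled=" + peeled);
        System.out.println("peel allowed: " + canPeel(null));
    }
}
```

No exception is thrown on the unguarded path, which is why a row-level assertion is needed to catch it when `-ea` is off.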

Alternatives considered

  • Teach AsyncTopKRecordCursorFactory to accept a projection wrapper directly. Rejected: it would spread projection concerns into the execution engine and duplicate the column-translation logic that SelectedRecordCursorFactory already owns.
  • Push the projection below the filter at parse time. Rejected: it changes semantics for any projection whose computed columns reference the filter's bound variables, and it would interact poorly with isProjection() consumers elsewhere in the generator.
  • Extend the gate to cover ORDER BY on computed (non-passthrough) columns. Rejected for this PR: RecordComparatorCompiler operates on the base metadata, and lifting computed keys into it is a larger, separate change. The passthrough-only constraint keeps row-level semantics identical to the non-peeled plan.
  • Three separate disjuncts (the original attempt before the refactor). Rejected: duplicated filter-stealing code across branches. The unified branch with wrapper-side defaults is smaller, uniformly testable, and lets nested wrappers compose.
  • Revert VirtualRecordCursorFactory.isProjection() to false and keep the AsOf branch untouched. Rejected: it would force the top-K gate back to instanceof VirtualRecordCursorFactory checks, diverging from the unified-branch design.

Changes

  • core/src/main/java/io/questdb/cairo/sql/RecordCursorFactory.java — two new default methods: translateOrderByColumnToBase and rewrapOverTopK.
  • core/src/main/java/io/questdb/griffin/engine/table/SelectedRecordCursorFactory.java — overrides both defaults; translateOrderByColumnToBase guards bounds and chains through base; rewrapOverTopK recursively rewraps base first, then wraps over the result.
  • core/src/main/java/io/questdb/griffin/engine/table/VirtualRecordCursorFactory.java — overrides isProjection() to return true; overrides both defaults. translateOrderByColumnToBase returns a negative value when the projected column is a computed expression (ColumnFunction + PriorityMetadata.getBaseColumnIndex() check), making the gate fall back to Sort light transparently.
  • core/src/main/java/io/questdb/griffin/SqlCodeGenerator.java — unified top-K branch in generateOrderBy with single-pass key translation (collapses the two former passes over listColumnFilterA); a canReachPageFrameLeafForTopK predicate; a shared buildAsyncTopKOverStolenFilter helper with a leak guard around compileWorkerFiltersConditionally. Also narrows the slave-projection peel guards in generateJoinAsof (lines 4369 and 4502) with an explicit getColumnCrossIndex() != null check (see Incidental fix above).
  • core/src/test/java/io/questdb/test/griffin/ExplainPlanTest.java — plan-shape tests for single-key and multi-key ORDER BY through both wrappers, plus negative cases that pin the Sort light fallback when guards don't hold.
  • core/src/test/java/io/questdb/test/griffin/LimitTest.java — row-level correctness tests through the full WAL lifecycle. ts2 uses a reversed sequence ((1_001 - x)::timestamp) so a regression mistranslating ORDER BY ts2 to the designated timestamp ts surfaces as a row-order diff. The virtual-projection test runs five iterations to catch resource-cleanup regressions.
  • core/src/test/java/io/questdb/test/griffin/SqlOptimiserTest.java — updates the two testJoinAndUnionQueryWithJoinOnDesignatedTimestampColumnWithLastFunction plan assertions to the new top-K shape: SelectedRecord now sits above Async Top K, and the key references the base column (ts) rather than the projected alias (LAST). Behavior is unchanged; only the plan arrangement reflects the unified-branch design.
  • core/src/test/java/io/questdb/test/griffin/engine/join/AsOfJoinTest.java — adds testAsOfJoinNoKeyWithTopDownFilteredDateaddTimestampProjection, the non-keyed analog of the existing keyed regression test. Asserts row-level results so a regression in the line-4502 guard surfaces both with -ea (assertion error) and without (silent wrong results).
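The reasoning behind the reversed `ts2` sequence in the LimitTest entry above can be checked with a small in-memory model. This is only an illustration: rows are `{x, ts, ts2}` arrays with `ts2 = rowCount + 1 - x`, and the helper name is hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical in-memory model of the reversed-sequence test data; not QuestDB code.
final class ReversedSequenceSketch {
    // Builds rows {x, ts, ts2}, orders by the given column descending, returns the first x values.
    static List<Long> topXByColumnDesc(int rowCount, int orderColumn, int limit) {
        List<long[]> rows = new ArrayList<>();
        for (long x = 1; x <= rowCount; x++) {
            rows.add(new long[]{x, x, rowCount + 1 - x});
        }
        rows.sort(Comparator.comparingLong((long[] r) -> r[orderColumn]).reversed());
        List<Long> xs = new ArrayList<>();
        for (int i = 0; i < limit && i < rows.size(); i++) {
            xs.add(rows.get(i)[0]);
        }
        return xs;
    }

    public static void main(String[] args) {
        System.out.println(topXByColumnDesc(1_000, 2, 3)); // ORDER BY ts2 DESC: prints [1, 2, 3]
        System.out.println(topXByColumnDesc(1_000, 1, 3)); // mistranslated to ts: prints [1000, 999, 998]
    }
}
```

If `ts2` simply mirrored `ts`, both orderings would return the same rows and a mistranslated key would go unnoticed.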

Tradeoffs and limitations

  • When the gate fires above a projection wrapper, queries that were already fast on Sort light (small result set after filtering) pay the parallel top-K worker-queue setup cost. The break-even point is dataset-dependent.
  • ORDER BY on a computed column still uses Sort light. Lifting computed keys into RecordComparatorCompiler is a larger follow-up.
  • The predicate and call site peel only a single projection layer. Nested projection wrappers above the filter fall back to Sort light.
  • Widening isProjection() to cover VirtualRecord required guards in generateJoinAsof's slave-projection peels at lines 4369 and 4502 (see Incidental fix). No functional divergence for pre-PR inputs: factories that used to enter those branches still enter them; factories that used to skip them still skip them.

Test plan

  • mvn -Dtest=ExplainPlanTest test passes, including new plan-shape tests for single-key, multi-key, chained-wrapper, and negative-fallback cases.
  • mvn -Dtest=LimitTest test passes, including the top-K projection tests that exercise drainWalQueue() and assert row ordering with the reversed ts2 sequence.
  • mvn -Dtest=ParallelTopKFuzzTest test passes.
  • mvn -Dtest=OrderByAdviceTest,OrderByWithFilterTest,OrderByWithAsyncFilterTest,OrderByExpressionTest,OrderByWithIntervalFilterTest test passes unchanged.
  • mvn -Dtest=SqlOptimiserTest#testJoinAndUnionQueryWithJoinOnDesignatedTimestampColumnWithLastFunction,SqlOptimiserTest#testQueryPlanForJoinAndUnionQueryWithJoinOnDesignatedTimestampColumnWithLastFunction test passes with the updated plan assertions.
  • mvn -Dtest=AsOfJoinTest#testAsOfJoinWithTopDownFilteredDateaddTimestampProjection test passes with the keyed slave-peel guard (line 4369).
  • mvn -Dtest=AsOfJoinTest#testAsOfJoinNoKeyWithTopDownFilteredDateaddTimestampProjection test passes with the non-keyed slave-peel guard (line 4502); row-level assertions catch both the -ea assertion path and the production silent-wrong-result path.
  • Manual verification against the issue reproduction: all four queries either keep their prior Async JIT Top K plan or move from Sort light to Async JIT Top K.


coderabbitai Bot commented Apr 19, 2026

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 5319640f-2415-42d8-b7fb-048f2a74685b

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


Walkthrough

Peels projection wrappers to reach the base page-frame leaf, optionally steals compiled filter/bind state to build an Async TopK path, translates ORDER BY keys to base coordinates, and rewraps TopK under projection wrappers when applicable. Adds explain-plan and runtime tests covering these cases.

Changes

| Cohort / File(s) | Summary |
|---|---|
| SqlCodeGenerator (TopK + filter stealing): `core/src/main/java/io/questdb/griffin/SqlCodeGenerator.java` | Added `canReachPageFrameLeafForTopK(...)` and `buildAsyncTopKOverStolenFilter(...)`; refactored `generateOrderBy(...)` to peel `SelectedRecordCursorFactory`/`VirtualRecordCursorFactory`, translate ORDER BY keys to base indexes, optionally steal compiled filter/bind state and call `halfClose()`, construct `AsyncTopKRecordCursorFactory`, and rewrap TopK under projection wrappers when present. |
| RecordCursorFactory API: `core/src/main/java/io/questdb/cairo/sql/RecordCursorFactory.java` | Added default methods `rewrapOverTopK(RecordCursorFactory, RecordMetadata)` and `translateOrderByColumnToBase(int)` to allow wrappers to remap ORDER BY columns and rewrap TopK results. |
| Projection wrappers: `core/src/main/java/io/questdb/griffin/engine/table/SelectedRecordCursorFactory.java`, `core/src/main/java/io/questdb/griffin/engine/table/VirtualRecordCursorFactory.java` | Implemented `rewrapOverTopK(...)` to rebuild wrapper shape over a provided TopK factory and implemented `translateOrderByColumnToBase(...)` to map projected ORDER BY indices to underlying/base indexes (returning negative when unresolvable). |
| Explain plan tests: `core/src/test/java/io/questdb/test/griffin/ExplainPlanTest.java` | Added tests (`testSelectWhereOrderByLimit4` through `testSelectWhereOrderByLimit9`, including chained-wrapper cases) asserting planner selects Async TopK/Async JIT TopK when reachable and falls back to Sort light when ORDER BY targets virtual/computed columns or LIMIT has offset. |
| Runtime limit/TopK tests: `core/src/test/java/io/questdb/test/griffin/LimitTest.java` | Added `testTopKThroughProjection()` and `testTopKThroughVirtualProjection()` to validate TopK execution through projection/virtual projection and to check for no cross-execution state leakage. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested reviewers

  • bluestreak01
  • glasstiger
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 10.81% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title accurately summarizes the main change: applying parallel top-K optimization to queries with SELECT projections to fix a performance issue where projections previously disabled the optimization. |
| Linked Issues check | ✅ Passed | The PR addresses #6528 by implementing SelectedRecord and VirtualRecord disjuncts to restore top-K optimization when projections wrap filters, enabling passthrough column ORDER BY keys and falling back to Sort light when required ORDER BY indices cannot be resolved. |
| Out of Scope Changes check | ✅ Passed | All changes are directly scoped to #6528: SQL code generation for top-K optimization through projections, along with targeted test coverage verifying plan shapes and result correctness; no unrelated modifications detected. |
| Description check | ✅ Passed | The pull request description is directly related to the changeset, clearly documenting the problem, approach, changes made, and test plan for extending the parallel top-K optimization to queries with SELECT projections. |


@DHRUV6029 DHRUV6029 closed this Apr 19, 2026
@DHRUV6029 DHRUV6029 reopened this Apr 19, 2026
@DHRUV6029 DHRUV6029 force-pushed the fix/issue-6528-topk-projection branch 2 times, most recently from c6c1548 to 436e431 Compare April 19, 2026 23:22
Contributor Author

@coderabbitai review


coderabbitai Bot commented Apr 19, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (1)
core/src/main/java/io/questdb/griffin/SqlCodeGenerator.java (1)

6697-6715: Rename the boolean to use the required prefix.

allBaseColumns should use an is.../has... boolean name, e.g. hasOnlyBaseColumns.

♻️ Proposed rename
-                                        boolean allBaseColumns = true;
+                                        boolean hasOnlyBaseColumns = true;
                                         final IntList orderByBaseIndices = new IntList(listColumnFilterA.size());
                                         for (int i = 0, n = listColumnFilterA.size(); i < n; i++) {
                                             int signed = listColumnFilterA.getQuick(i);
                                             int wrappedIdx = (signed > 0 ? signed : -signed) - 1;
                                             Function fn = wrappedFunctions.getQuick(wrappedIdx);
                                             if (!(fn instanceof ColumnFunction columnFn)) {
-                                                allBaseColumns = false;
+                                                hasOnlyBaseColumns = false;
                                                 break;
                                             }
                                             int baseIdx = priorityMetadata.getBaseColumnIndex(columnFn.getColumnIndex());
                                             if (baseIdx < 0) {
-                                                allBaseColumns = false;
+                                                hasOnlyBaseColumns = false;
                                                 break;
                                             }
                                             orderByBaseIndices.add(baseIdx);
                                         }
 
-                                        if (allBaseColumns) {
+                                        if (hasOnlyBaseColumns) {

As per coding guidelines, "When choosing a name for a boolean variable, field or method, always use the is... or has... prefix, as appropriate."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@core/src/main/java/io/questdb/griffin/SqlCodeGenerator.java` around lines
6697 - 6715, Rename the boolean local variable allBaseColumns to a prefixed
boolean name (e.g., hasOnlyBaseColumns or isOnlyBaseColumns) and update all
local references in this block accordingly: the declaration and the two
assignments where it is set to false, plus the if check "if (allBaseColumns)".
This change should be made in the loop that iterates over listColumnFilterA,
uses wrappedFunctions and ColumnFunction, computes baseIdx via
priorityMetadata.getBaseColumnIndex(columnFn.getColumnIndex()), and later checks
orderByBaseIndices; keep all logic identical, only rename the symbol
consistently within this scope.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@core/src/main/java/io/questdb/griffin/SqlCodeGenerator.java`:
- Around line 6743-6749: The virtual projection rewrap is using
virtualFactory.getMetadata() which loses the adjusted ordered metadata used for
ORDER BY on a passthrough timestamp column; update the
VirtualRecordCursorFactory construction to pass the preserved orderedMetadata
(the same metadata used on the SelectedRecord path) instead of
virtualFactory.getMetadata(), so the timestamp-index adjustment remains
available to downstream/nested operators.

In `@core/src/test/java/io/questdb/test/griffin/LimitTest.java`:
- Around line 1405-1407: The tests in LimitTest are using assertSql(...) which
only validates rows and misses cursor factory/leak properties; replace the two
assertSql(...) calls (and the similar ones at lines referencing the same test
block) with assertQueryNoLeakCheck(...) calls so the assertions also verify
cursor factory/TopK optimization behavior — specifically change the assertSql
invocations in LimitTest (the calls supplying the SQL like "select x, * from tab
where ts2 in '1970' order by ts2 desc limit 5" and "select x, * from tab where
ts2 in '2099' order by ts2 desc limit 5", plus the related group at 1428-1433)
to use assertQueryNoLeakCheck(...) with the same expected output and SQL
arguments.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 04db8ff0-71c6-4c29-a902-c462c80a905b

📥 Commits

Reviewing files that changed from the base of the PR and between 8d7f800 and 436e431.

📒 Files selected for processing (3)
  • core/src/main/java/io/questdb/griffin/SqlCodeGenerator.java
  • core/src/test/java/io/questdb/test/griffin/ExplainPlanTest.java
  • core/src/test/java/io/questdb/test/griffin/LimitTest.java

Comment thread core/src/main/java/io/questdb/griffin/SqlCodeGenerator.java Outdated
Comment thread core/src/test/java/io/questdb/test/griffin/LimitTest.java Outdated
@DHRUV6029 DHRUV6029 force-pushed the fix/issue-6528-topk-projection branch from d56d79f to afcf38e Compare April 20, 2026 00:09
Contributor Author

@coderabbitai review


coderabbitai Bot commented Apr 20, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Contributor Author

@puzpuzpuz @bluestreak01 would you mind taking a look when you get a chance? Thanks!

@DHRUV6029 DHRUV6029 marked this pull request as ready for review April 20, 2026 00:42
@puzpuzpuz puzpuzpuz changed the title from "fix(sql): apply parallel top-K through SELECT projections" to "perf(sql): apply parallel top-K to queries with SELECT projections" Apr 20, 2026
@puzpuzpuz puzpuzpuz added the SQL (Issues or changes relating to SQL execution) and Performance (Performance improvements) labels Apr 20, 2026
Contributor

puzpuzpuz commented Apr 20, 2026

There seems to be a slightly simpler approach. Could you take a look?

PR #6993 — Simpler approach

A proposal to collapse the three near-identical filter-stealing branches in SqlCodeGenerator.generateOrderBy into one, by pushing the projection-specific ORDER BY translation onto the wrapper factories.

Problem with the current PR

The PR adds two new disjuncts (SelectedRecord peel, VirtualRecord peel) next to the existing direct filter-stealing branch. All three do the same high-level work:

  1. halfClose() a filter-stealable factory.
  2. Take ownership of compiledFilter / filter / bindVarMemory / bindVarFunctions.
  3. Build AsyncTopKRecordCursorFactory on the page-frame leaf.
  4. Optionally re-wrap the result with the projection wrapper.

The only thing that varies between the three branches is how an ORDER BY column index expressed in the outer metadata is translated to the column index in the base (page-frame) metadata:

| Branch | Translation |
|---|---|
| Direct (existing) | identity (projected == base) |
| SelectedRecord peel | `columnCrossIndex.getQuick(projectedIdx)` |
| VirtualRecord peel | `priorityMetadata.getBaseColumnIndex(columnFn.getColumnIndex())` with a passthrough-only bail |

Because that one varying piece is inlined into each branch, each branch carries its own copy of the steal/build/re-wrap scaffold. The author's own tradeoffs section acknowledges this:

"The shared helper takes the already-translated ListColumnFilter and first-key index as inputs, so the two disjuncts still carry their own translation logic inline."

The result: the VirtualRecord rewrap initially passed virtualFactory.getMetadata() instead of orderedMetadata (caught by CodeRabbit in afcf38e). The SelectedRecord branch got it right. That kind of drift is exactly what duplication invites.

Proposed shape

Push the translation onto the wrapper factory, and have one top-K branch consume it. The default on RecordCursorFactory is identity; the two wrapper types override it.

1. A translation contract on the factory

// RecordCursorFactory.java

/**
 * Translates an ORDER BY column index expressed in this factory's output
 * metadata to the corresponding column index in the base (page-frame) metadata.
 *
 * Returns the input unchanged for factories that do not re-arrange or hide
 * base columns. Returns a negative value if the projected column cannot be
 * resolved to a base column (e.g. a computed VirtualRecord column); callers
 * must fall back to the generic sort path in that case.
 */
default int translateOrderByColumnToBase(int projectedIndex) {
    return projectedIndex;
}

2. Wrapper overrides

// SelectedRecordCursorFactory.java

@Override
public int translateOrderByColumnToBase(int projectedIndex) {
    return columnCrossIndex.getQuick(projectedIndex);
}
// VirtualRecordCursorFactory.java

@Override
public int translateOrderByColumnToBase(int projectedIndex) {
    Function fn = functions.getQuick(projectedIndex);
    if (!(fn instanceof ColumnFunction columnFn)) {
        return -1;
    }
    return priorityMetadata.getBaseColumnIndex(columnFn.getColumnIndex());
}

3. One top-K branch in generateOrderBy

The three current branches collapse into one. Pseudocode (not final — shows the shape, not every edge case already handled in the method):

// Single top-K branch. Handles:
//   - direct page-frame or direct filter-stealing factory (identity translation)
//   - projection wrapper (SelectedRecord or VirtualRecord) around a filter-stealing factory
if (parallelTopKEnabled && canReachPageFrameLeafForTopK(recordCursorFactory)) {
    // Parentheses make the intended grouping explicit: (a || b) ? wrapper : null.
    final RecordCursorFactory projectionWrapper = (recordCursorFactory.isProjection()
            || recordCursorFactory instanceof VirtualRecordCursorFactory)
            ? recordCursorFactory
            : null;
    final RecordCursorFactory filterFactory = projectionWrapper != null
            ? projectionWrapper.getBaseFactory()
            : recordCursorFactory;
    final RecordCursorFactory pageFrameLeaf = filterFactory.supportsFilterStealing()
            ? filterFactory.getBaseFactory()
            : filterFactory;
    final RecordMetadata baseMetadata = pageFrameLeaf.getMetadata();

    // Translate every ORDER BY key, bail if any cannot be resolved.
    final ListColumnFilter baseOrderByFilter = new ListColumnFilter();
    int baseFirstOrderByIdx = -1;
    boolean allKeysResolved = true;
    for (int i = 0, n = listColumnFilterA.size(); i < n; i++) {
        int signed = listColumnFilterA.getQuick(i);
        int projectedIdx = (signed > 0 ? signed : -signed) - 1;
        int baseIdx = recordCursorFactory.translateOrderByColumnToBase(projectedIdx);
        if (baseIdx < 0) {
            allKeysResolved = false;
            break;
        }
        baseOrderByFilter.add(signed > 0 ? (baseIdx + 1) : -(baseIdx + 1));
        if (i == 0) {
            baseFirstOrderByIdx = baseIdx;
        }
    }

    if (allKeysResolved) {
        IQueryModel.restoreWhereClause(expressionNodePool, model);

        final RecordCursorFactory topK = buildAsyncTopKOverStolenFilter(
                executionContext,
                filterFactory,
                pageFrameLeaf,
                baseMetadata,
                baseOrderByFilter,
                baseFirstOrderByIdx,
                lo
        );

        return projectionWrapper == null
                ? topK
                : projectionWrapper.rewrapOverTopK(topK, orderedMetadata);
    }
}
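
The key-translation loop in the pseudocode above can be exercised on its own. The sketch below assumes the same encoding the loop relies on, namely that each ORDER BY key is stored as a signed, 1-based column index and that the sign must survive translation. `translateKeys` is a hypothetical helper, and the `IntUnaryOperator` stands in for `translateOrderByColumnToBase`.

```java
import java.util.function.IntUnaryOperator;

// Hypothetical, self-contained version of the key-translation loop; not QuestDB code.
final class SignedKeySketch {
    // Returns the translated signed keys, or null when any key cannot be resolved to a base column.
    static int[] translateKeys(int[] signedKeys, IntUnaryOperator toBase) {
        int[] out = new int[signedKeys.length];
        for (int i = 0; i < signedKeys.length; i++) {
            int signed = signedKeys[i];
            int projectedIdx = Math.abs(signed) - 1; // strip the sign, convert to 0-based
            int baseIdx = toBase.applyAsInt(projectedIdx);
            if (baseIdx < 0) {
                return null; // caller falls back to the generic sort path
            }
            out[i] = signed > 0 ? baseIdx + 1 : -(baseIdx + 1); // restore sign and 1-based offset
        }
        return out;
    }

    public static void main(String[] args) {
        int[] crossIndex = {0, 0, 1, 2}; // projected column -> base column
        int[] translated = translateKeys(new int[]{-4, 1}, p -> crossIndex[p]);
        System.out.println(java.util.Arrays.toString(translated)); // prints [-3, 1]
    }
}
```

The 1-based offset is what allows column 0 to carry a sign, so it has to be removed before the lookup and added back afterwards.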

rewrapOverTopK is the small amount of rewrap logic that does differ per wrapper type (one re-creates a SelectedRecordCursorFactory, the other a VirtualRecordCursorFactory reusing priorityMetadata + functions). It is the only projection-specific piece left; everything else is one path.

canReachPageFrameLeafForTopK is a cheap structural predicate: the outer factory itself is a filter-stealing factory or a page-frame factory, or is a projection wrapper over one. It replaces the per-branch guard conditions.
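
A minimal sketch of such a predicate, using a toy `Node` type with boolean flags in place of the real factory methods. The names are hypothetical, and the single-layer peel follows the wording of the paragraph above; whether the real predicate recurses further is not shown here.

```java
// Hypothetical structural predicate over toy plan nodes; not QuestDB code.
final class ReachabilitySketch {
    static final class Node {
        final boolean projection;
        final boolean filterStealing;
        final boolean pageFrame;
        final Node base;

        Node(boolean projection, boolean filterStealing, boolean pageFrame, Node base) {
            this.projection = projection;
            this.filterStealing = filterStealing;
            this.pageFrame = pageFrame;
            this.base = base;
        }
    }

    static Node pageFrame() {
        return new Node(false, false, true, null);
    }

    static Node filter(Node base) {
        return new Node(false, true, false, base);
    }

    static Node projection(Node base) {
        return new Node(true, false, false, base);
    }

    static Node other(Node base) {
        return new Node(false, false, false, base);
    }

    static boolean isDirectlyEligible(Node n) {
        return n.filterStealing || n.pageFrame;
    }

    // True for a filter-stealing or page-frame node, or a projection directly over one.
    static boolean canReachPageFrameLeaf(Node n) {
        if (isDirectlyEligible(n)) {
            return true;
        }
        return n.projection && n.base != null && isDirectlyEligible(n.base);
    }
}
```

Because the check only inspects flags and one level of `base`, it allocates nothing and can run before any filter state is touched.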

4. buildAsyncTopKOverStolenFilter stays

The helper introduced by the PR stays — with one signature change so callers pass a filterFactory (which may be the outer factory itself in the direct case, or the projection base in the peel case). When filterFactory.supportsFilterStealing() is false (pure page-frame case, no filter to steal), the helper short-circuits and calls AsyncTopKRecordCursorFactory with null filter state. That already matches what the existing direct branch does at line 6554 when the factory supports page frames but not filter stealing.

What this fixes

Duplication goes away

Three copies of "steal filter state and build top-K" become one. The only code that varies per wrapper is the translation (4 lines) and the rewrap (one factory constructor call). Both live on the wrapper type that understands them.

The orderedMetadata class of bug becomes unrepresentable

The current PR's VirtualRecord rewrap was shipped with virtualFactory.getMetadata() and had to be fixed in a follow-up commit after CodeRabbit flagged it. In the proposed shape, rewrapOverTopK is called once from the top-K branch with a single orderedMetadata argument, so both wrapper implementations receive the same value and cannot disagree.

Extending to a new wrapper is one method, not one branch

If a future wrapper type wants to participate in top-K stealing, the change is to implement translateOrderByColumnToBase and rewrapOverTopK on that type. No new disjunct in generateOrderBy, no new copy of the steal/build scaffold.
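As a rough illustration of that contract, the sketch below models the two hooks on stand-in types. Everything here is hypothetical: `Factory`, `Leaf`, `Selected`, `Virtual`, the `sourceColumn` array, and the single-argument `rewrapOverTopK` are simplifications for illustration only (the real method also takes the ordered metadata), not the QuestDB classes.

```java
// Hypothetical stand-ins for the QuestDB factory types; not the real API.
final class TopKWrapperSketch {

    interface Factory {
        // Non-wrapper factories map a column index to itself.
        default int translateOrderByColumnToBase(int projectedIndex) {
            return projectedIndex;
        }

        // Non-wrapper factories have nothing to rewrap.
        default Factory rewrapOverTopK(Factory topK) {
            return topK;
        }
    }

    // Stands in for a filter or page-frame factory; inherits both defaults.
    static final class Leaf implements Factory {
    }

    // Projection that reorders or duplicates base columns.
    static final class Selected implements Factory {
        final int[] columnCrossIndex; // projected index -> base index
        final Factory base;

        Selected(int[] columnCrossIndex, Factory base) {
            this.columnCrossIndex = columnCrossIndex;
            this.base = base;
        }

        @Override
        public int translateOrderByColumnToBase(int projectedIndex) {
            // Delegate so that nested wrappers keep translating downwards.
            return base.translateOrderByColumnToBase(columnCrossIndex[projectedIndex]);
        }

        @Override
        public Factory rewrapOverTopK(Factory topK) {
            return new Selected(columnCrossIndex, base.rewrapOverTopK(topK));
        }
    }

    // Projection whose slots are either pass-through columns or computed values.
    static final class Virtual implements Factory {
        static final int COMPUTED = -1;
        final int[] sourceColumn; // base index per slot, or COMPUTED
        final Factory base;

        Virtual(int[] sourceColumn, Factory base) {
            this.sourceColumn = sourceColumn;
            this.base = base;
        }

        @Override
        public int translateOrderByColumnToBase(int projectedIndex) {
            final int baseIdx = sourceColumn[projectedIndex];
            // A computed slot has no base column, so the key is unresolvable.
            return baseIdx < 0 ? -1 : base.translateOrderByColumnToBase(baseIdx);
        }

        @Override
        public Factory rewrapOverTopK(Factory topK) {
            return new Virtual(sourceColumn, base.rewrapOverTopK(topK));
        }
    }
}
```

In this model a chained shape such as `Selected` over `Virtual` over a leaf needs no extra branch in the caller: each wrapper translates its own layer and delegates the rest.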

Migration notes

  • The AsOf precedent at lines 4264-4308 / 4396-4430 has the same two-disjunct shape (SelectedRecord peel + direct filter stealing, no VirtualRecord). The same refactor applies there. Not required for this PR; file it as a follow-up so the blast radius stays contained.
  • Keep the existing tests unchanged. They pin plan shapes; the proposed refactor produces the same plans, so the same assertions hold.
  • The test gap noted in the review (SelectedRecord -> VirtualRecord -> filter -> pageframe at a single model level) should be added regardless of which approach lands. In the proposed shape it is a natural case because translateOrderByColumnToBase can be chained through a nested wrapper by delegating to the base factory's own implementation — something the current PR cannot do without a third disjunct.

Scope

  • Touches RecordCursorFactory (one default method), SelectedRecordCursorFactory and VirtualRecordCursorFactory (one override each, plus a small rewrapOverTopK helper), and the top-K branch in SqlCodeGenerator.generateOrderBy (replaces three disjuncts with one).
  • Net line count: fewer than the current PR.
  • Keeps buildAsyncTopKOverStolenFilter introduced by the PR; generalises its caller set from two to one.

@DHRUV6029 DHRUV6029 force-pushed the fix/issue-6528-topk-projection branch from afcf38e to c84eea5 Compare April 22, 2026 06:09
@DHRUV6029
Contributor Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Apr 22, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (1)
core/src/main/java/io/questdb/griffin/SqlCodeGenerator.java (1)

1008-1008: Use is.../has... prefixes for new boolean names.

canReachPageFrameLeafForTopK and allKeysResolved are new boolean names but do not follow the repository convention.

Proposed rename
-    private static boolean canReachPageFrameLeafForTopK(RecordCursorFactory f) {
+    private static boolean hasReachablePageFrameLeafForTopK(RecordCursorFactory f) {
@@
-                                if (parallelTopKEnabled && canReachPageFrameLeafForTopK(recordCursorFactory)) {
+                                if (parallelTopKEnabled && hasReachablePageFrameLeafForTopK(recordCursorFactory)) {
@@
-                                        boolean allKeysResolved = true;
+                                        boolean hasResolvedAllKeys = true;
@@
-                                                allKeysResolved = false;
+                                                hasResolvedAllKeys = false;
@@
-                                        if (allKeysResolved) {
+                                        if (hasResolvedAllKeys) {

As per coding guidelines, "When choosing a name for a boolean variable, field or method, always use the is... or has... prefix, as appropriate."

Also applies to: 6625-6654

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@core/src/main/java/io/questdb/griffin/SqlCodeGenerator.java` at line 1008,
Rename boolean identifiers to use is/has prefixes: change the method
canReachPageFrameLeafForTopK to isPageFrameLeafReachableForTopK (or similar
is-prefixed name) and rename the boolean allKeysResolved to areAllKeysResolved
or hasAllKeysResolved; update all call sites, tests, JavaDoc/comments and
imports to use the new names, preserve method signature/visibility, and run a
compile to catch any missed references (also apply the same renaming pattern for
the related occurrences in the 6625-6654 region).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@core/src/main/java/io/questdb/cairo/sql/RecordCursorFactory.java`:
- Line 27: Change the rewrapOverTopK API to accept the metadata interface and
preserve existing wrapper chains: remove the GenericRecordMetadata import,
change RecordCursorFactory.rewrapOverTopK parameter type from
GenericRecordMetadata to RecordMetadata, and modify
SelectedRecordCursorFactory.rewrapOverTopK() and
VirtualRecordCursorFactory.rewrapOverTopK() so they first call
base.rewrapOverTopK(topK, base.getMetadata()) (i.e., recursively rewrap the
base) before applying their own wrapping logic to retain inner wrapper column
mappings/functions.

In `@core/src/main/java/io/questdb/griffin/SqlCodeGenerator.java`:
- Around line 1450-1484: Wrap the block that calls
compileWorkerFiltersConditionally(...) and constructs
AsyncTopKRecordCursorFactory in a try/catch/finally so that if an exception
occurs while evaluating constructor arguments you explicitly release the stolen
state you took ownership of after filterFactory.halfClose(): if
stolenCompiledFilter, stolenBindVarMemory, stolenBindVarFunctions, stolenFilter
or stolenFilterUsedColumnIndexes are non-null, call their appropriate cleanup
methods (close()/free()/clear() as applicable), set them to null, and also
release baseOrderedMetadata (call its release/close method) before rethrowing;
ensure the final created AsyncTopKRecordCursorFactory still receives valid
references when no exception occurs.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: c4dd53df-f910-4c00-a8d7-97e3227a0629

📥 Commits

Reviewing files that changed from the base of the PR and between afcf38e and c84eea5.

📒 Files selected for processing (6)
  • core/src/main/java/io/questdb/cairo/sql/RecordCursorFactory.java
  • core/src/main/java/io/questdb/griffin/SqlCodeGenerator.java
  • core/src/main/java/io/questdb/griffin/engine/table/SelectedRecordCursorFactory.java
  • core/src/main/java/io/questdb/griffin/engine/table/VirtualRecordCursorFactory.java
  • core/src/test/java/io/questdb/test/griffin/ExplainPlanTest.java
  • core/src/test/java/io/questdb/test/griffin/LimitTest.java
🚧 Files skipped from review as they are similar to previous changes (2)
  • core/src/test/java/io/questdb/test/griffin/LimitTest.java
  • core/src/test/java/io/questdb/test/griffin/ExplainPlanTest.java

Comment thread core/src/main/java/io/questdb/cairo/sql/RecordCursorFactory.java Outdated
Comment thread core/src/main/java/io/questdb/griffin/SqlCodeGenerator.java
@DHRUV6029 DHRUV6029 force-pushed the fix/issue-6528-topk-projection branch 2 times, most recently from e00b8b1 to 34ca865 Compare April 23, 2026 03:21
@DHRUV6029 DHRUV6029 marked this pull request as draft April 23, 2026 04:22
@DHRUV6029
Contributor Author

@coderabbitai review this PR critically

@coderabbitai

coderabbitai Bot commented Apr 23, 2026

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Look at the key files changed in this PR

# Check RecordCursorFactory new default methods
rg -n "translateOrderByColumnToBase|rewrapOverTopK|canReachPageFrameLeafForTopK|buildAsyncTopKOverStolenFilter" --type=java -A5 -B2

# Check VirtualRecordCursorFactory implementation
fd -e java -g "VirtualRecordCursorFactory.java" --exec grep -n "translateOrderByColumnToBase\|rewrapOverTopK\|ColumnFunction\|getBaseColumnIndex" {} \;

# Check SelectedRecordCursorFactory implementation
fd -e java -g "SelectedRecordCursorFactory.java" --exec grep -n "translateOrderByColumnToBase\|rewrapOverTopK\|columnCrossIndex" {} \;

Length of output: 18508


🏁 Script executed:

#!/bin/bash
# Deep dive into the critical parts of SqlCodeGenerator changes

# Look at the full canReachPageFrameLeafForTopK implementation
rg -n "canReachPageFrameLeafForTopK" --type=java -A 25 -B 2 core/src/main/java/io/questdb/griffin/SqlCodeGenerator.java

# Look at the full generateOrderBy top-K branch (around line 6634)
sed -n '6600,6720p' core/src/main/java/io/questdb/griffin/SqlCodeGenerator.java

# Look at buildAsyncTopKOverStolenFilter
sed -n '1420,1530p' core/src/main/java/io/questdb/griffin/SqlCodeGenerator.java

Length of output: 17664


🏁 Script executed:

#!/bin/bash
# Check VirtualRecordCursorFactory's page-frame and filter-stealing support
rg -n "supportsPageFrameCursor\|supportsFilterStealing\|isProjection\|getBaseFactory" --type=java \
    core/src/main/java/io/questdb/griffin/engine/table/VirtualRecordCursorFactory.java

# Look at the chained wrapper test
sed -n '8835,8920p' core/src/test/java/io/questdb/test/griffin/ExplainPlanTest.java

# Check what structure canReachPageFrameLeafForTopK handles for chained wrappers
# Also check if SelectedRecord isProjection() returns true
rg -n "isProjection\|getBaseFactory" --type=java \
    core/src/main/java/io/questdb/griffin/engine/table/SelectedRecordCursorFactory.java | head -30

# Check the VirtualRecordCursorFactory rewrapOverTopK to understand what it does with base.rewrapOverTopK
sed -n '160,200p' core/src/main/java/io/questdb/griffin/engine/table/VirtualRecordCursorFactory.java

# Check what happens with the `nested` variable (line 6668) that is asserted non-null but never used
sed -n '6660,6690p' core/src/main/java/io/questdb/griffin/SqlCodeGenerator.java

Length of output: 7165


🏁 Script executed:

#!/bin/bash
# Check VirtualRecordCursorFactory for filter-stealing delegation and getBaseFactory
grep -n "supportsFilterStealing\|getBaseFactory\|supportsPageFrameCursor" \
    core/src/main/java/io/questdb/griffin/engine/table/VirtualRecordCursorFactory.java

# Check if SelectedRecordCursorFactory also delegates supportsFilterStealing
grep -n "supportsFilterStealing\|getBaseFactory\|supportsPageFrameCursor" \
    core/src/main/java/io/questdb/griffin/engine/table/SelectedRecordCursorFactory.java

# Check what AsyncJITFilterRecordCursorFactory returns for supportsPageFrameCursor
fd -e java -g "AsyncJit*" --exec grep -n "supportsPageFrameCursor\|supportsFilterStealing\|getBaseFactory" {} \;
fd -e java -g "AsyncFilter*" --exec grep -n "supportsPageFrameCursor\|supportsFilterStealing\|getBaseFactory" {} \;

# Get the actual base class for filter-stealing factories
rg -n "supportsFilterStealing" --type=java -B2 -A5 | grep -v "test\|Test" | head -80

Length of output: 10125


🏁 Script executed:

#!/bin/bash
# Check if VirtualRecordCursorFactory overrides supportsFilterStealing or supportsPageFrameCursor
grep -n "override\|supportsFilterStealing\|supportsPageFrameCursor\|isProjection" \
    core/src/main/java/io/questdb/griffin/engine/table/VirtualRecordCursorFactory.java

# Check if the `nested` variable (line 6668) is actually used anywhere after the assert
sed -n '6665,6685p' core/src/main/java/io/questdb/griffin/SqlCodeGenerator.java

# Check for resource leak - is projectionWrapper explicitly closed after rewrapOverTopK?
sed -n '6636,6690p' core/src/main/java/io/questdb/griffin/SqlCodeGenerator.java

# Check if SelectedRecord or VirtualRecord isProjection() is overridden
rg -n "isProjection" --type=java \
    core/src/main/java/io/questdb/griffin/engine/table/SelectedRecordCursorFactory.java \
    core/src/main/java/io/questdb/griffin/engine/table/VirtualRecordCursorFactory.java

# Check whether canReachPageFrameLeafForTopK can ever see SelectedRecord wrapping VirtualRecord
# by verifying if the optimizer ever produces such a plan shape
rg -n "SelectedRecord.*VirtualRecord\|VirtualRecord.*SelectedRecord" --type=java \
    core/src/test/java/io/questdb/test/griffin/ExplainPlanTest.java | head -20

Length of output: 5325


🏁 Script executed:

#!/bin/bash
# Understand the overall structure - how many topK gates exist and their scopes
sed -n '6590,6700p' core/src/main/java/io/questdb/griffin/SqlCodeGenerator.java

# Check if SelectedRecord.isProjection returns true
sed -n '188,196p' core/src/main/java/io/questdb/griffin/engine/table/SelectedRecordCursorFactory.java

# Check if VirtualRecord has any supportsPageFrameCursor or supportsFilterStealing override at all
grep -n "Override\|supportsPage\|supportsFilter\|usesCompiled\|getFilter\|halfClose\|getCompiledFilter\|getBindVar" \
    core/src/main/java/io/questdb/griffin/engine/table/VirtualRecordCursorFactory.java

Length of output: 8225


@DHRUV6029 Here is a thorough critical review of the PR as requested, drawing on the actual source code retrieved above.


Critical Review: PR #6993 — Parallel Top-K Through SELECT Projections

Summary of approach (post-puzpuzpuz refactor)

The refactored design is cleaner than the original three-disjunct approach: translateOrderByColumnToBase and rewrapOverTopK are pushed onto the wrapper factories, and a single gate in generateOrderBy handles all shapes. The design is sound in principle. The issues below are concrete and several are correctness risks.


Bug 1 — Dead variable nested (Line 6667–6668)

IQueryModel.restoreWhereClause(expressionNodePool, model);
final IQueryModel nested = model.getNestedModel();
assert nested != null;   // <-- nested is declared and asserted...

final RecordCursorFactory topK = buildAsyncTopKOverStolenFilter(...);
// ...but `nested` is never read again
return projectionWrapper == null ? topK : projectionWrapper.rewrapOverTopK(topK, orderedMetadata);

nested is a leftover from an earlier version that passed it to buildAsyncTopKOverStolenFilter. It is never read after the assertion, so the assertion is its only observable effect, and Java assertions are normally disabled in production. Remove the variable entirely. If the assertion is load-bearing for a correctness argument, document why; otherwise dead assertions accumulate and mislead readers.


Bug 2 — Resource leak: original projectionWrapper abandoned after rewrapOverTopK

When the optimization fires and projectionWrapper != null, the code:

  1. Calls buildAsyncTopKOverStolenFilter, which calls filterFactory.halfClose() to steal filter artifacts from the filter factory.
  2. Calls projectionWrapper.rewrapOverTopK(topK, orderedMetadata) to create a new SelectedRecordCursorFactory / VirtualRecordCursorFactory.
  3. Returns the new wrapper.

The original projectionWrapper instance (and its internal cursor resources) is never closed. In SelectedRecordCursorFactory.rewrapOverTopK:

// SelectedRecordCursorFactory.java line 215-217
public RecordCursorFactory rewrapOverTopK(RecordCursorFactory topK, RecordMetadata orderedMetadata) {
    RecordCursorFactory rewrappedBase = base.rewrapOverTopK(topK, base.getMetadata());
    return new SelectedRecordCursorFactory(orderedMetadata, columnCrossIndex, rewrappedBase);
    // `this` (the original SelectedRecord) is now abandoned with its cursor open
}

SelectedRecordCursorFactory holds a SelectedRecordCursor (line 68: this.cursor = new SelectedRecordCursor(...)) and possibly a SelectedPageFrameCursor. These are not transferred to the new factory. There should be an explicit Misc.free(projectionWrapper) or the factory graph should be reorganized so ownership transfer is unambiguous.


Bug 3 — canReachPageFrameLeafForTopK only peels one projection layer, but VirtualRecordCursorFactory uses instanceof instead of isProjection()

canReachPageFrameLeafForTopK (lines 1008–1017):

private static boolean canReachPageFrameLeafForTopK(RecordCursorFactory f) {
    if (f.supportsPageFrameCursor() || f.supportsFilterStealing()) {
        return true;
    }
    if (f.isProjection() || f instanceof VirtualRecordCursorFactory) {  // single peel
        RecordCursorFactory base = f.getBaseFactory();
        return base != null && (base.supportsPageFrameCursor() || base.supportsFilterStealing());
    }
    return false;
}

Two sub-issues:

3a. VirtualRecordCursorFactory does not override isProjection() (only SelectedRecordCursorFactory at line 191 does). This forces a raw instanceof VirtualRecordCursorFactory check in two places (lines 1012 and 6639) instead of using the isProjection() contract. If any factory subclasses or wraps VirtualRecordCursorFactory, both call sites need manual updates. The fix is to override isProjection() in VirtualRecordCursorFactory and remove the instanceof checks.

3b. The single-peel limitation is undocumented and not surfaced as a fall-through. If the optimizer ever passes a factory where two projection layers sit above the filter (e.g., a future DistinctRecordCursorFactory → SelectedRecord → AsyncJITFilter), the gate silently falls back to Sort light with no log or counter. At minimum, add a Javadoc contract to canReachPageFrameLeafForTopK stating it only peels one wrapper.
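For comparison, here is a minimal sketch of how the predicate might read if every projection wrapper reported itself through `isProjection()` and peeling were allowed to continue through several layers. This is an assumption about one possible shape, not the PR's code; the `Factory` interface and `Stub` class are hypothetical stand-ins.

```java
// Hypothetical model of the gate predicate; not the QuestDB implementation.
final class PeelSketch {

    interface Factory {
        boolean supportsPageFrameCursor();

        boolean supportsFilterStealing();

        boolean isProjection();

        Factory getBaseFactory();
    }

    // Configurable stand-in used to build small factory chains.
    static final class Stub implements Factory {
        final boolean leaf;
        final boolean projection;
        final Factory base;

        Stub(boolean leaf, boolean projection, Factory base) {
            this.leaf = leaf;
            this.projection = projection;
            this.base = base;
        }

        @Override
        public boolean supportsPageFrameCursor() {
            return leaf;
        }

        @Override
        public boolean supportsFilterStealing() {
            return false;
        }

        @Override
        public boolean isProjection() {
            return projection;
        }

        @Override
        public Factory getBaseFactory() {
            return base;
        }
    }

    // Walks down through projection wrappers until a usable leaf is found
    // or a non-projection factory blocks the path.
    static boolean canReachPageFrameLeaf(Factory f) {
        while (f != null) {
            if (f.supportsPageFrameCursor() || f.supportsFilterStealing()) {
                return true;
            }
            if (!f.isProjection()) {
                return false;
            }
            f = f.getBaseFactory();
        }
        return false;
    }
}
```

With a shape like this there is no `instanceof` check to keep in sync, and the number of peeled layers is no longer an implicit limit. Whether multi-layer peeling is safe also depends on the translate and rewrap hooks handling the same chain.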


Issue 4 — baseOrderByFilter allocated before key resolution is confirmed (Line 6648)

final ListColumnFilter baseOrderByFilter = new ListColumnFilter();
int baseFirstOrderByIdx = -1;
boolean allKeysResolved = true;
for (int i = 0, n = listColumnFilterA.size(); i < n; i++) {
    ...
    if (baseIdx < 0) {
        allKeysResolved = false;
        break;
    }
    baseOrderByFilter.add(...);
}

new ListColumnFilter() is allocated on every invocation of the gate, including the frequent !allKeysResolved path (any query with a computed ORDER BY key). Move the allocation inside if (allKeysResolved), or populate a reusable field-level buffer (the method already uses listColumnFilterA as a reusable buffer — the same pattern should apply here).
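One way to defer the allocation, sketched with plain arrays instead of `ListColumnFilter`: translate every key into a reusable scratch buffer first, and build the per-call result only once all keys have resolved. The class name, the `scratch` field, and the `translate` callback are hypothetical and stand in for the wrapper's translation method.

```java
// Hypothetical sketch of "resolve first, allocate later"; plain arrays
// stand in for QuestDB's ListColumnFilter.
final class DeferredAllocationSketch {
    // Reused across calls, grown only when a query has more keys than before.
    private int[] scratch = new int[4];

    // Returns the translated key indexes, or null when any key is unresolvable.
    int[] resolveAll(int[] projectedIndexes, java.util.function.IntUnaryOperator translate) {
        final int n = projectedIndexes.length;
        if (scratch.length < n) {
            scratch = new int[n];
        }
        for (int i = 0; i < n; i++) {
            final int baseIdx = translate.applyAsInt(projectedIndexes[i]);
            if (baseIdx < 0) {
                // Bail out before any per-call allocation has happened.
                return null;
            }
            scratch[i] = baseIdx;
        }
        // Every key resolved, so the per-call result is now worth allocating.
        return java.util.Arrays.copyOf(scratch, n);
    }
}
```

The unresolved path, which the review expects to be frequent, then costs only the translation calls.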


Issue 5 — VirtualRecordCursorFactory.rewrapOverTopK passes stale base.getMetadata() to recursive call (Line 165)

// VirtualRecordCursorFactory.java line 164-171
public RecordCursorFactory rewrapOverTopK(RecordCursorFactory topK, RecordMetadata orderedMetadata) {
    RecordCursorFactory rewrappedBase = base.rewrapOverTopK(topK, base.getMetadata()); // <-- base.getMetadata(), not orderedMetadata
    return new VirtualRecordCursorFactory(
            orderedMetadata,
            priorityMetadata,
            functions,
            rewrappedBase,
            priorityMetadata.getVirtualColumnReservedSlots()
    );
}

The orderedMetadata argument received from the caller is correctly used in the outer VirtualRecordCursorFactory constructor. However, base.getMetadata() (the raw filter-factory metadata) is passed to the recursive base.rewrapOverTopK call. The default implementation of rewrapOverTopK ignores its orderedMetadata argument and returns topK unchanged, so this is a no-op today. But if a future intermediate factory overrides rewrapOverTopK, it would receive the wrong metadata. Use a consistent placeholder (e.g., topK.getMetadata()) or document why base.getMetadata() is intentional here. The same pattern appears identically in SelectedRecordCursorFactory.rewrapOverTopK (line 216).


Issue 6 — Missing test: multi-key ORDER BY through projection wrapper

All six ExplainPlanTest cases and both LimitTest cases use a single-key ORDER BY (order by str desc, order by ts2). The gate handles listColumnFilterA.size() > 1 but no test exercises:

SELECT x, str FROM xx WHERE str IS NOT NULL ORDER BY str DESC, x ASC LIMIT 10

with a projection wrapper. A multi-key ORDER BY through a SelectedRecordCursorFactory exercises the full loop over listColumnFilterA and the baseOrderByFilter.add calls for each key, including the signed-index encoding (signed > 0 ? (baseIdx + 1) : -(baseIdx + 1)). Add at least one such test.
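The signed-index encoding mentioned above can be sketched in isolation. The helper names below are hypothetical, and the reading that a positive value means ascending order is an assumption; only the expression `signed > 0 ? (baseIdx + 1) : -(baseIdx + 1)` comes from the PR.

```java
// Hypothetical helpers illustrating a signed, 1-based sort key encoding.
final class SignedKeySketch {

    // Indexes are shifted by one because 0 cannot carry a sign.
    static int encode(int columnIndex, boolean ascending) {
        return ascending ? columnIndex + 1 : -(columnIndex + 1);
    }

    static int columnIndex(int signed) {
        return Math.abs(signed) - 1;
    }

    static boolean isAscending(int signed) {
        return signed > 0;
    }

    // Re-targets a projected key at the base column while keeping its direction.
    static int translate(int signed, int[] projectedToBase) {
        final int baseIdx = projectedToBase[columnIndex(signed)];
        return signed > 0 ? (baseIdx + 1) : -(baseIdx + 1);
    }
}
```

A multi-key test is what exercises `translate` more than once per query, with both signs in play.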


Issue 7 — VirtualRecordCursorFactory.translateOrderByColumnToBase does not guard bounds (Line 188–197)

public int translateOrderByColumnToBase(int projectedIndex) {
    Function fn = functions.getQuick(projectedIndex); // <-- no bounds check
    if (!(fn instanceof ColumnFunction columnFn)) {
        return -1;
    }
    int baseIdx = priorityMetadata.getBaseColumnIndex(columnFn.getColumnIndex());
    if (baseIdx < 0) {
        return -1;
    }
    return base.translateOrderByColumnToBase(baseIdx);
}

functions.getQuick(projectedIndex) throws IndexOutOfBoundsException if projectedIndex >= functions.size(). This can occur if the ORDER BY column index in the outer metadata exceeds the number of virtual functions (e.g., if orderedMetadata includes more columns than functions covers). The analogous SelectedRecordCursorFactory.translateOrderByColumnToBase (line 248) has the same exposure via columnCrossIndex.getQuick(projectedIndex). Add a bounds guard and return -1 for out-of-range indices, or document the invariant that callers guarantee bounds.
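A guarded lookup could look roughly like the following, with a plain `int[]` standing in for `functions` or `columnCrossIndex`. The class and method names are hypothetical; the only contract taken from the review is that an unresolvable index yields a negative value.

```java
// Hypothetical sketch of a bounds-guarded translation step.
final class BoundsGuardSketch {

    // Returns the base column index, or -1 when the projected index is out of range.
    static int translateGuarded(int projectedIndex, int[] projectedToBase) {
        if (projectedIndex < 0 || projectedIndex >= projectedToBase.length) {
            return -1;
        }
        return projectedToBase[projectedIndex];
    }
}
```

Because the caller already treats a negative result as "fall back to the generic sort", the guard turns a potential failure into the existing slow path.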


Minor: parallelTopKEnabled evaluated inside a deep nesting (Line 6635)

executionContext.isParallelTopKEnabled() is called at line 6635, deep inside multiple nested if blocks. The result is bound to a local variable, which is fine, but placing the check this deep makes it easy to miss when reading the control flow. Consider hoisting this flag to the same level as loFunc.isConstant() && hiFunc == null for symmetry with the LongTopK check above it.


What is working well

  • The single-gate design with translateOrderByColumnToBase / rewrapOverTopK on the factory (puzpuzpuz's suggested shape) is much cleaner than three disjuncts.
  • The buildAsyncTopKOverStolenFilter helper correctly handles the null-filter case (no filter stealing when !filterFactory.supportsFilterStealing()).
  • ExplainPlanTest plan-shape pinning is the right approach for regression protection.
  • The fallback to Sort light for computed ORDER BY keys is correctly tested.

The items above (especially Bugs 1–3) should be addressed before merging.


🧠 Learnings used
Learnt from: javier
Repo: questdb/questdb PR: 0
File: :0-0
Timestamp: 2026-02-06T18:22:28.130Z
Learning: In questdb/questdb, for PRs requested by javier, provide line-cited, push-back style reviews with concrete follow-ups (tests, diffs, and performance considerations).

Learnt from: jerrinot
Repo: questdb/questdb PR: 0
File: :0-0
Timestamp: 2026-02-04T11:18:36.492Z
Learning: For PRs by jerrinot, always use a thorough and critical review mode: question design decisions, challenge assumptions, scrutinize implementations for edge cases and potential issues, and avoid superficial validation. They prefer hard-core reviews that push back on the code.

Learnt from: javier
Repo: questdb/questdb PR: 0
File: :0-0
Timestamp: 2026-02-06T18:22:28.130Z
Learning: For questdb/questdb, javier expects a rigorous, push-back style review; always include concrete line-cited findings and propose targeted tests/fixes when reviewing javier’s PRs.
✅ Actions performed

Full review triggered.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 5

♻️ Duplicate comments (1)
core/src/main/java/io/questdb/griffin/SqlCodeGenerator.java (1)

1461-1495: ⚠️ Potential issue | 🟠 Major

Clean up helper-owned resources until AsyncTopKRecordCursorFactory owns them.

The catch only covers worker-filter compilation. If the AsyncTopKRecordCursorFactory constructor throws, baseOrderedMetadata, perWorkerFilters, and stolen filter state remain owned by this helper and are leaked.

Proposed cleanup guard
-        final ObjList<Function> perWorkerFilters;
+        ObjList<Function> perWorkerFilters = null;
         try {
             perWorkerFilters = compileWorkerFiltersConditionally(
                     executionContext,
                     stolenFilter,
                     executionContext.getSharedQueryWorkerCount(),
                     stolenFilterExpr,
                     baseMetadata
             );
+
+            return new AsyncTopKRecordCursorFactory(
+                    executionContext.getCairoEngine(),
+                    configuration,
+                    executionContext.getMessageBus(),
+                    baseOrderedMetadata,
+                    pageFrameLeaf,
+                    stolenFilter,
+                    stolenFilterUsedColumnIndexes,
+                    perWorkerFilters,
+                    stolenCompiledFilter,
+                    stolenBindVarMemory,
+                    stolenBindVarFunctions,
+                    recordComparatorCompiler,
+                    baseOrderByFilter,
+                    baseMetadata,
+                    lo,
+                    executionContext.getSharedQueryWorkerCount()
+            );
         } catch (Throwable th) {
+            Misc.free(baseOrderedMetadata);
+            Misc.freeObjList(perWorkerFilters);
             Misc.free(stolenCompiledFilter);
             Misc.free(stolenBindVarMemory);
             Misc.freeObjList(stolenBindVarFunctions);
             Misc.free(stolenFilter);
             throw th;
         }
-
-        return new AsyncTopKRecordCursorFactory(
-                executionContext.getCairoEngine(),
-                configuration,
-                executionContext.getMessageBus(),
-                baseOrderedMetadata,
-                pageFrameLeaf,
-                stolenFilter,
-                stolenFilterUsedColumnIndexes,
-                perWorkerFilters,
-                stolenCompiledFilter,
-                stolenBindVarMemory,
-                stolenBindVarFunctions,
-                recordComparatorCompiler,
-                baseOrderByFilter,
-                baseMetadata,
-                lo,
-                executionContext.getSharedQueryWorkerCount()
-        );
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@core/src/main/java/io/questdb/griffin/SqlCodeGenerator.java` around lines
1461 - 1495, The AsyncTopKRecordCursorFactory construction can throw and
currently leaks helper-owned resources (baseOrderedMetadata, perWorkerFilters,
stolenFilter, stolenCompiledFilter, stolenBindVarMemory,
stolenBindVarFunctions); wrap the new AsyncTopKRecordCursorFactory(...) call in
a try block and in the corresponding catch/finally free those resources if the
constructor throws (use Misc.free / Misc.freeObjList / Misc.free as used
elsewhere), or move ownership transfer into a method that only relinquishes
ownership after successful constructor return, ensuring symbols
AsyncTopKRecordCursorFactory, perWorkerFilters, baseOrderedMetadata,
stolenFilter, stolenCompiledFilter, stolenBindVarMemory, and
stolenBindVarFunctions are properly released on exceptions.
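The underlying pattern is "the helper owns the resources until the constructor returns". A minimal, generic sketch follows; `Resource`, `Owner`, and `build` are hypothetical names, and plain `close()` calls stand in for `Misc.free`.

```java
// Hypothetical sketch of releasing helper-owned resources when a constructor throws.
final class OwnershipSketch {

    static final class Resource implements AutoCloseable {
        boolean closed;

        @Override
        public void close() {
            closed = true;
        }
    }

    static final class Owner implements AutoCloseable {
        private final Resource first;
        private final Resource second;

        Owner(Resource first, Resource second, boolean fail) {
            if (fail) {
                // Simulates a constructor that throws part-way through.
                throw new IllegalStateException("constructor failed");
            }
            this.first = first;
            this.second = second;
        }

        @Override
        public void close() {
            first.close();
            second.close();
        }
    }

    static Owner build(Resource first, Resource second, boolean fail) {
        try {
            // Ownership moves to Owner only if this returns normally.
            return new Owner(first, second, fail);
        } catch (Throwable th) {
            // No Owner exists, so the helper must release both resources itself.
            first.close();
            second.close();
            throw th;
        }
    }
}
```

Keeping the constructor call inside the same `try` as the earlier preparation steps means there is a single cleanup path for every failure before ownership is handed over.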
🧹 Nitpick comments (1)
core/src/main/java/io/questdb/griffin/SqlCodeGenerator.java (1)

1008-1008: Use is... / has... names for new booleans.

canReachPageFrameLeafForTopK, parallelTopKEnabled, and allKeysResolved violate the repo boolean naming rule.

Suggested rename
-    private static boolean canReachPageFrameLeafForTopK(RecordCursorFactory f) {
+    private static boolean hasReachablePageFrameLeafForTopK(RecordCursorFactory f) {
-                final boolean parallelTopKEnabled = executionContext.isParallelTopKEnabled();
-                if (parallelTopKEnabled && canReachPageFrameLeafForTopK(recordCursorFactory)) {
+                final boolean isParallelTopKEnabled = executionContext.isParallelTopKEnabled();
+                if (isParallelTopKEnabled && hasReachablePageFrameLeafForTopK(recordCursorFactory)) {
-                                        boolean allKeysResolved = true;
+                                        boolean hasResolvedAllKeys = true;

As per coding guidelines, "When choosing a name for a boolean variable, field or method, always use the is... or has... prefix, as appropriate."

Also applies to: 6635-6650

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@core/src/main/java/io/questdb/griffin/SqlCodeGenerator.java` at line 1008,
Rename boolean method and variables to follow is/has convention: change method
canReachPageFrameLeafForTopK to isPageFrameLeafReachableForTopK (or similar
is... name), rename field/variable parallelTopKEnabled to isParallelTopKEnabled,
and rename allKeysResolved to areAllKeysResolved (or hasAllKeysResolved if
semantically better). Update all call sites and tests that reference
canReachPageFrameLeafForTopK, parallelTopKEnabled, and allKeysResolved (and any
related occurrences around 6635-6650) to use the new identifiers to keep
compilation consistent and adhere to the boolean naming guideline.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@core/src/main/java/io/questdb/griffin/engine/table/SelectedRecordCursorFactory.java`:
- Around line 246-249: The current
SelectedRecordCursorFactory.translateOrderByColumnToBase calls
columnCrossIndex.getQuick(projectedIndex) without bounds checking and can throw
IndexOutOfBoundsException; change it to first check projectedIndex against
columnCrossIndex.size() and if out of range return a negative value per the
contract, otherwise retrieve the cross-index and call
base.translateOrderByColumnToBase with that value; ensure you reference
columnCrossIndex.getQuick(projectedIndex), columnCrossIndex.size() and
base.translateOrderByColumnToBase(...) in the updated implementation.

In
`@core/src/main/java/io/questdb/griffin/engine/table/VirtualRecordCursorFactory.java`:
- Around line 187-198: The method translateOrderByColumnToBase should guard
against out-of-range projectedIndex when calling
functions.getQuick(projectedIndex); add a cheap bounds check using
functions.size() (or equivalent) before calling getQuick and return -1 if
projectedIndex is negative or >= functions.size(), then proceed to cast to
ColumnFunction, call
priorityMetadata.getBaseColumnIndex(columnFn.getColumnIndex()), and return -1
for negative baseIdx or the result of base.translateOrderByColumnToBase(baseIdx)
otherwise; update the method translateOrderByColumnToBase and references to
functions.getQuick to follow this defensive behavior.

In `@core/src/main/java/io/questdb/griffin/SqlCodeGenerator.java`:
- Around line 6680-6682: The return path in SqlCodeGenerator where it does
"return projectionWrapper == null ? topK :
projectionWrapper.rewrapOverTopK(topK, orderedMetadata);" leaks the earlier
allocated orderedMetadata when projectionWrapper is null; adjust this branch so
that before returning topK you free or release orderedMetadata (the ORDER BY
metadata created earlier, not baseOrderedMetadata created by the topK helper).
Specifically, in the block handling TopK/ORDER BY, detect the
projectionWrapper==null case and call the appropriate release/dispose method on
orderedMetadata (or transfer ownership to baseOrderedMetadata if intended) so
orderedMetadata is not leaked; keep the rewrapOverTopK call unchanged for the
projectionWrapper!=null path.

In `@core/src/test/java/io/questdb/test/griffin/ExplainPlanTest.java`:
- Around line 8771-8877: Add a new test in ExplainPlanTest that mirrors the
existing wrapper-peeling cases but uses a multi-key ORDER BY (e.g., "order by
str desc, x asc limit 10") to ensure translation of all keys; create a test
method (e.g., testSelectWhereOrderByLimit_multiKeyWrapper) that calls assertPlan
with a table like "create table xx ( x long, str varchar )" and a query such as
"select x + 1 as xp, x from xx where str is not null order by str desc, x asc
limit 10" and assert the expected plan shows keys: [str desc, x asc] flowing
through SelectedRecord/VirtualRecord to the Async JIT Top K (or Sort light as
appropriate) to validate full-key translation end-to-end.

In `@core/src/test/java/io/questdb/test/griffin/LimitTest.java`:
- Around line 1391-1408: The test creates both ts and ts2 as x::timestamp so
ORDER BY ts2 vs ts isn’t exercised; change the table creation SQL in LimitTest
(the execute("create table tab as (...") that defines ts and ts2) to produce a
different ts2 (e.g. ts as x::timestamp and ts2 as (1001 - x)::timestamp or
otherwise reversed/distinct sequence) so ordering on ts2 yields different rows,
and update the corresponding expected result strings; apply the same change to
the other identical fixture occurrence that currently defines ts and ts2 the
same.

---

Duplicate comments:
In `@core/src/main/java/io/questdb/griffin/SqlCodeGenerator.java`:
- Around line 1461-1495: The AsyncTopKRecordCursorFactory construction can throw
and currently leaks helper-owned resources (baseOrderedMetadata,
perWorkerFilters, stolenFilter, stolenCompiledFilter, stolenBindVarMemory,
stolenBindVarFunctions); wrap the new AsyncTopKRecordCursorFactory(...) call in
a try block and in the corresponding catch/finally free those resources if the
constructor throws (use Misc.free / Misc.freeObjList / Misc.free as used
elsewhere), or move ownership transfer into a method that only relinquishes
ownership after successful constructor return, ensuring symbols
AsyncTopKRecordCursorFactory, perWorkerFilters, baseOrderedMetadata,
stolenFilter, stolenCompiledFilter, stolenBindVarMemory, and
stolenBindVarFunctions are properly released on exceptions.

---

Nitpick comments:
In `@core/src/main/java/io/questdb/griffin/SqlCodeGenerator.java`:
- Line 1008: Rename boolean method and variables to follow is/has convention:
change method canReachPageFrameLeafForTopK to isPageFrameLeafReachableForTopK
(or similar is... name), rename field/variable parallelTopKEnabled to
isParallelTopKEnabled, and rename allKeysResolved to areAllKeysResolved (or
hasAllKeysResolved if semantically better). Update all call sites and tests that
reference canReachPageFrameLeafForTopK, parallelTopKEnabled, and allKeysResolved
(and any related occurrences around 6635-6650) to use the new identifiers to
keep compilation consistent and adhere to the boolean naming guideline.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 95117598-7562-44ab-942c-54ee25bcf2d9

📥 Commits

Reviewing files that changed from the base of the PR and between 3bfe763 and 94d846c.

📒 Files selected for processing (6)
  • core/src/main/java/io/questdb/cairo/sql/RecordCursorFactory.java
  • core/src/main/java/io/questdb/griffin/SqlCodeGenerator.java
  • core/src/main/java/io/questdb/griffin/engine/table/SelectedRecordCursorFactory.java
  • core/src/main/java/io/questdb/griffin/engine/table/VirtualRecordCursorFactory.java
  • core/src/test/java/io/questdb/test/griffin/ExplainPlanTest.java
  • core/src/test/java/io/questdb/test/griffin/LimitTest.java

@DHRUV6029 DHRUV6029 marked this pull request as ready for review April 23, 2026 05:29
@DHRUV6029 DHRUV6029 force-pushed the fix/issue-6528-topk-projection branch from 94d846c to 5481b45 Compare April 24, 2026 03:50
@DHRUV6029 DHRUV6029 force-pushed the fix/issue-6528-topk-projection branch from 9e05093 to e74ba04 Compare April 26, 2026 19:03
@DHRUV6029
Contributor Author

@puzpuzpuz thanks for the careful review. All three items addressed in e74ba041b0.

1. Critical — generateJoinAsof line 4502 guard

Added the same && slave.getColumnCrossIndex() != null guard at line 4502 that already exists at line 4369:

if (slave.isProjection() && slave.getColumnCrossIndex() != null) {
    // peel requires a column cross index; only SelectedRecord provides one

This restores the pre-PR narrow meaning at that specific call site. VirtualRecord slaves now fall through to the generic AsOfJoinLightNoKeyRecordCursorFactory path (the same path they took before isProjection() was widened), instead of entering a peel branch that needs a cross index they don't have.

I also called out the production-correctness risk in the commit body and the PR description: with -ea off, FilteredAsOfJoinNoKeyFastRecordCursorFactory treats a null slaveColumnCrossIndex as "no remap" and would have silently dropped the VirtualRecord layer's translation. Worth flagging because it would have shipped as wrong results, not a crash.

2. Moderate — non-keyed regression test

Added AsOfJoinTest.testAsOfJoinNoKeyWithTopDownFilteredDateaddTimestampProjection, the non-keyed analog of the existing keyed test. Same fixture (fx_trades / market_data with a dateadd-style projected timestamp on the slave), same shape, just without the ON (symbol) clause.

It uses assertQueryNoLeakCheck against expected rows rather than assertPlan, so:

  • with -ea (CI): catches the AssertionError regression at line 4514;
  • without -ea (prod): catches the silent wrong-result regression in FilteredAsOfJoinNoKeyFastRecordCursorFactory.

Putting it in AsOfJoinTest.java rather than AsOfJoinNoKeyTest.java because AsOfJoinNoKeyTest is parameterized over JoinType (ASOF/LT) and the bug is ASOF-only (generateJoinAsof is only invoked from JOIN_ASOF, line 4822-4823). Keeping it next to the keyed test also reads as a pair.

3. Minor — single-pass key translation in generateOrderBy

Collapsed the two iterations over listColumnFilterA into one. The ListColumnFilter is now built incrementally and the loop bails on the first unresolved key:

final ListColumnFilter baseOrderByFilter = new ListColumnFilter();
int baseFirstOrderByIdx = -1;
boolean allKeysResolved = true;
for (int i = 0, n = listColumnFilterA.size(); i < n; i++) {
    int signed = listColumnFilterA.getQuick(i);
    int projectedIdx = (signed > 0 ? signed : -signed) - 1;
    int baseIdx = recordCursorFactory.translateOrderByColumnToBase(projectedIdx);
    if (baseIdx < 0) {
        allKeysResolved = false;
        break;
    }
    baseOrderByFilter.add(signed > 0 ? (baseIdx + 1) : -(baseIdx + 1));
    if (i == 0) {
        baseFirstOrderByIdx = baseIdx;
    }
}

This matches the pseudocode in your earlier "simpler approach" proposal verbatim. The fallback semantics are unchanged: when allKeysResolved is false, the partially-built baseOrderByFilter is discarded and the query falls through to Sort light.
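
For readers unfamiliar with the signed, 1-based key encoding used in the loop above, here is a minimal standalone sketch. It assumes that a positive entry means ascending and a negative entry means descending, as the loop implies. The class name OrderByKeyTranslationSketch, the translateKeys helper, and the crossIndex array are hypothetical stand-ins for illustration only; they are not QuestDB APIs, and a plain List replaces ListColumnFilter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntUnaryOperator;

// Hypothetical illustration; not part of the QuestDB code base.
final class OrderByKeyTranslationSketch {

    /**
     * Translates signed, 1-based ORDER BY keys from projected column indexes
     * to base column indexes. Returns null as soon as one key cannot be
     * resolved (the translator returns a negative index).
     */
    static List<Integer> translateKeys(int[] signedKeys, IntUnaryOperator toBase) {
        List<Integer> baseKeys = new ArrayList<>(signedKeys.length);
        for (int signed : signedKeys) {
            int projectedIdx = Math.abs(signed) - 1; // strip direction, back to 0-based
            int baseIdx = toBase.applyAsInt(projectedIdx);
            if (baseIdx < 0) {
                return null; // unresolved key: the caller falls back to the generic sort
            }
            // re-apply the 1-based offset and the original direction
            baseKeys.add(signed > 0 ? baseIdx + 1 : -(baseIdx + 1));
        }
        return baseKeys;
    }

    public static void main(String[] args) {
        // assumed projection: column 0 is computed, column 1 -> base 2, column 2 -> base 0
        int[] crossIndex = {-1, 2, 0};
        IntUnaryOperator toBase = i -> crossIndex[i];
        System.out.println(translateKeys(new int[]{-2, 3}, toBase)); // [-3, 1]
        System.out.println(translateKeys(new int[]{1}, toBase));     // null
    }
}
```

The early return mirrors the break in the loop above: a partially built key list is never handed to the top-K path.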

Verification

Local runs all green:

  • AsOfJoinTest#testAsOfJoinNoKeyWithTopDownFilteredDateaddTimestampProjection and testAsOfJoinWithTopDownFilteredDateaddTimestampProjection: both pass.
  • ExplainPlanTest, LimitTest, ParallelTopKFuzzTest: 576 tests, 0 failures.
  • SqlOptimiserTest: 172 tests, 0 failures.

PR description has been updated to document both peel guards and the test pair.

@DHRUV6029
Contributor Author

Hello @puzpuzpuz, I've addressed all the concerns and issues. Please take a look.
Thanks

@DHRUV6029 DHRUV6029 force-pushed the fix/issue-6528-topk-projection branch from 198b25d to d23cbd4 Compare April 27, 2026 04:31
@DHRUV6029
Contributor Author

Hello @puzpuzpuz, a gentle reminder to please review the PR.
Thanks

@puzpuzpuz
Contributor

@DHRUV6029 sorry for the delay. Your PR is on my radar. Going to check the updates today.

@puzpuzpuz
Contributor

Minor review comments only. It would be great to address item 1 while we're at it. The others are nits.

Minor

1. VirtualRecordCursorFactory.translateOrderByColumnToBase does not peel MemoizerFunction

File: core/src/main/java/io/questdb/griffin/engine/table/VirtualRecordCursorFactory.java:197

Function fn = functions.getQuick(projectedIndex);
if (!(fn instanceof ColumnFunction columnFn)) {
    return -1;
}

When an alias is referenced more than once, SqlCodeGenerator.generateSelectVirtual (around line 8333) wraps the underlying ColumnFunction in a MemoizerFunction. The codebase already provides ColumnFunction.unwrap(fn) precisely to peel these wrappers — and generateSelectVirtual uses it at line 8240. The new translation logic does not, so a passthrough column with memoization silently falls back to Sort light.

This is consistent with the existing VirtualFunctionRecordCursor.getLongTopKColumnIndex (line 95) which has the same gap — so it is a missed-optimization carry-over, not a regression. Worth fixing both at once with ColumnFunction.unwrap(fn).
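
To make the gap in item 1 concrete, here is a small sketch using hypothetical stand-in types (Function, ColumnRef, Memoizer, Computed). These are not QuestDB's classes, and the unwrap helper below only assumes what ColumnFunction.unwrap(fn) is described as doing above, namely peeling memoizing wrappers. It shows why the instanceof check has to run on the unwrapped function.

```java
// Hypothetical stand-ins; the real QuestDB types are not reproduced here.
final class MemoizerUnwrapSketch {

    interface Function {}

    /** Stand-in for a passthrough column reference. */
    record ColumnRef(int columnIndex) implements Function {}

    /** Stand-in for a caching wrapper around a function referenced more than once. */
    record Memoizer(Function inner) implements Function {}

    /** Stand-in for any computed expression, e.g. x + 1. */
    record Computed(String expr) implements Function {}

    /** Peels memoizing wrappers (assumed behaviour of an unwrap helper). */
    static Function unwrap(Function fn) {
        while (fn instanceof Memoizer m) {
            fn = m.inner();
        }
        return fn;
    }

    /** Checks the raw function only, so a wrapped column is missed. */
    static int translateWithoutUnwrap(Function fn) {
        return fn instanceof ColumnRef c ? c.columnIndex() : -1;
    }

    /** Unwraps first, so a memoized passthrough column still resolves. */
    static int translateWithUnwrap(Function fn) {
        return unwrap(fn) instanceof ColumnRef c ? c.columnIndex() : -1;
    }

    public static void main(String[] args) {
        Function memoized = new Memoizer(new ColumnRef(3));
        System.out.println(translateWithoutUnwrap(memoized)); // -1
        System.out.println(translateWithUnwrap(memoized));    // 3
    }
}
```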

2. Test name and comment for testSelectWhereOrderByLimit_chainedWrapper are misleading

File: core/src/test/java/io/questdb/test/griffin/ExplainPlanTest.java

The test comment claims translateOrderByColumnToBase "must chain from SelectedRecordCursorFactory through VirtualRecordCursorFactory to the filter leaf". But canReachPageFrameLeafForTopK (SqlCodeGenerator.java:1015) returns false for SelectedRecord(VirtualRecord(JITFilter)):

  • SelectedRecord.supportsPageFrameCursor() forwards to VirtualRecord.supportsPageFrameCursor() which returns false (default).
  • supportsFilterStealing() is false for both.
  • The predicate's projection branch only inspects f.getBaseFactory() once — the inner VirtualRecord, whose supportsFilterStealing() is false.

The recursive call base.rewrapOverTopK(topK, base.getMetadata()) in SelectedRecordCursorFactory.rewrapOverTopK is also forward-only — no current path constructs a chained input.

In practice the test almost certainly hits the gate at the inner VirtualRecord level (single-layer peel), and the outer SelectedRecord is added later by the parent generateSelectChoose pass. The test still exercises a real path; the comment overstates what it covers. Either drop the comment or adjust the input (e.g. an explicit subquery) so the gate is actually invoked on a stacked wrapper.

3. SelectedRecordCursorFactory.recordCursorSupportsLongTopK lacks the same bounds guard as translateOrderByColumnToBase

File: core/src/main/java/io/questdb/griffin/engine/table/SelectedRecordCursorFactory.java:205

@Override
public boolean recordCursorSupportsLongTopK(int columnIndex) {
    return base.recordCursorSupportsLongTopK(columnCrossIndex.getQuick(columnIndex));
}

The PR (correctly) added a bounds check to translateOrderByColumnToBase at line 247. The neighbouring recordCursorSupportsLongTopK override does not bound-check columnIndex. Inconsistent. Both have the same metadata-derived caller today, so it is not exploitable, but the asymmetry is a footgun for future callers.
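
A guarded version of the delegation in item 3 could look roughly like the sketch below. It uses plain arrays in place of the real cross-index list and base factory, so the names BoundsGuardSketch, toBaseIndex, and supportsLongTopK are assumptions made for illustration, not the actual override.

```java
// Hypothetical illustration with arrays standing in for the cross index and the base factory.
final class BoundsGuardSketch {

    /** Returns the base column index, or -1 when projectedIndex is out of range. */
    static int toBaseIndex(int[] columnCrossIndex, int projectedIndex) {
        if (projectedIndex < 0 || projectedIndex >= columnCrossIndex.length) {
            return -1;
        }
        return columnCrossIndex[projectedIndex];
    }

    /** Delegates to the base only after the projected index has been validated. */
    static boolean supportsLongTopK(int[] columnCrossIndex, boolean[] baseSupport, int projectedIndex) {
        int baseIdx = toBaseIndex(columnCrossIndex, projectedIndex);
        return baseIdx >= 0 && baseIdx < baseSupport.length && baseSupport[baseIdx];
    }

    public static void main(String[] args) {
        int[] crossIndex = {2, 0};                 // projected 0 -> base 2, projected 1 -> base 0
        boolean[] baseSupport = {false, false, true};
        System.out.println(supportsLongTopK(crossIndex, baseSupport, 0)); // true
        System.out.println(supportsLongTopK(crossIndex, baseSupport, 5)); // false, no exception
    }
}
```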

4. Redundant cleanup in buildAsyncTopKOverStolenFilter

File: core/src/main/java/io/questdb/griffin/SqlCodeGenerator.java:1468

The new try/catch around compileWorkerFiltersConditionally frees the stolen state on failure. But generateOrderBy's outer catch (line 6791) already calls recordCursorFactory.close(), which cascades into the (still-referenced) filter factory's _close() — see AsyncJitFilteredRecordCursorFactory._close() at line 409 — which frees compiledFilter, filter, bindVarMemory, and bindVarFunctions again.

The duplicate is harmless because all of those close() implementations are idempotent (e.g. CompiledFilter.close() guards on fnAddress > 0), but the pre-PR code did not have this redundancy. If you want to keep the explicit catch as defense in depth, fine; if not, the original pattern is cleaner.
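
The idempotency argument in item 4 can be sketched as follows. The class is a hypothetical stand-in, not CompiledFilter itself; it only assumes the "guard on the address being positive" pattern mentioned above, with a counter in place of the real native free.

```java
// Hypothetical stand-in for a resource whose close() is safe to call twice.
final class IdempotentCloseSketch implements AutoCloseable {

    private long fnAddress; // assumed native handle; 0 means already freed
    private int freeCount;  // how many times the handle was actually released

    IdempotentCloseSketch(long fnAddress) {
        this.fnAddress = fnAddress;
    }

    @Override
    public void close() {
        if (fnAddress > 0) {
            freeCount++;    // stand-in for the real native free call
            fnAddress = 0;  // later close() calls become no-ops
        }
    }

    int freeCount() {
        return freeCount;
    }

    public static void main(String[] args) {
        IdempotentCloseSketch filter = new IdempotentCloseSketch(42L);
        filter.close(); // e.g. an explicit catch block
        filter.close(); // e.g. the cascading close() from an outer catch
        System.out.println(filter.freeCount()); // 1
    }
}
```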

5. Comments on the AsOf-join guards reference the SelectedRecord constraint without naming it

File: core/src/main/java/io/questdb/griffin/SqlCodeGenerator.java:4379, 4512

The single-line comment "peel requires a column cross index; only SelectedRecord provides one" is fine but loses the rationale recorded at length in the PR description (the silent wrong-results path through FilteredAsOfJoinNoKeyFastRecordCursorFactory when the cross index is null). A one-liner pointing at the test name would make the intent harder to inadvertently undo.

@DHRUV6029 DHRUV6029 force-pushed the fix/issue-6528-topk-projection branch from 58a9c18 to 746678e Compare May 1, 2026 03:40
@DHRUV6029
Contributor Author

@puzpuzpuz all 5 minor items addressed in 746678e223 (single linear commit on top of upstream/master).

| # | Item | Status |
|---|------|--------|
| 1 | ColumnFunction.unwrap(fn) applied in VirtualRecordCursorFactory.translateOrderByColumnToBase:197 and VirtualFunctionRecordCursor.getLongTopKColumnIndex:98 so memoized aliases hit top-K instead of Sort light | Fixed |
| 2 | Test renamed testSelectWhereOrderByLimit_chainedWrapper -> testSelectWhereOrderByLimit_virtualRecordPeel; comment rewritten to describe the single-layer peel and note that the outer SelectedRecord is added later by generateSelectChoose | Fixed |
| 3 | Bounds guard added to SelectedRecordCursorFactory.recordCursorSupportsLongTopK:205 so it matches the neighbouring translateOrderByColumnToBase override | Fixed |
| 4 | Inner try/catch removed from buildAsyncTopKOverStolenFilter; ownership cascades through generateOrderBy's outer recordCursorFactory.close() -> filterFactory._close(), with a one-line comment recording the chain | Fixed |
| 5 | AsOf-guard comments at SqlCodeGenerator.java:4373 and :4509 now name FilteredAsOfJoinNoKeyFastRecordCursorFactory and the silent-wrong-results path, and point at AsOfJoinTest#testAsOfJoinWithTopDownFilteredDateaddTimestampProjection / #testAsOfJoinNoKeyWithTopDownFilteredDateaddTimestampProjection | Fixed |

Local regression run after the fixes: ExplainPlanTest (531), AsOfJoinTest (110), OrderByLimitTest (2) all green. Ready for another look when you have a moment.


@Override
public boolean isProjection() {
    return true;
Contributor

When isProjection() returns true, it means the following:

Returns true if the factory stands for nothing more but a projection, so that
the above factory (e.g. a parallel GROUP BY one) can steal the projection.

Projection consist of cross-indexes and metadata columns.

VirtualRecordCursorFactory can return arbitrary SQL functions, not just projected base factory's columns.

@puzpuzpuz
Contributor

puzpuzpuz commented May 4, 2026

Critical — ExtraNullColumnCursorFactory is silently peeled by the new top-K gate

Reproducer

CREATE TABLE trades (ts TIMESTAMP, sym SYMBOL, price DOUBLE) TIMESTAMP(ts) PARTITION BY DAY;
CREATE TABLE prices (ts TIMESTAMP, sym SYMBOL, price DOUBLE) TIMESTAMP(ts) PARTITION BY DAY;
INSERT INTO trades VALUES
    ('2023-01-01T09:10:00.000000Z', 'AAA', 100.0),
    ('2023-01-01T09:11:00.000000Z', 'BBB', 200.0),
    ('2023-01-01T09:12:00.000000Z', 'CCC', 300.0),
    ('2023-01-01T09:13:00.000000Z', 'DDD', 400.0);
INSERT INTO prices VALUES ('2023-01-01T09:00:00.000000Z', 'AAA', 1.0);

SELECT t.sym, t.price, t.ts, sum(p.price) AS window_price
FROM trades t
WINDOW JOIN prices p ON (0 = 1)
RANGE BETWEEN 1 MINUTE PRECEDING AND 1 MINUTE FOLLOWING
ORDER BY t.sym DESC LIMIT 3;

| | master | this PR |
|---|--------|---------|
| plan | Async Top K -> ExtraNullColumnRecord -> PageFrame | Async Top K -> PageFrame (layer dropped) |
| output columns | sym, price, ts, window_price (4) | sym, price, ts (3), window_price missing |

A second variant — ORDER BY window_price DESC LIMIT 3 — throws AssertionError: index out of bounds, 3 >= 3 from baseMetadata.getColumnType(baseFirstOrderByIdx) in buildAsyncTopKOverStolenFilter (SqlCodeGenerator.java:1441); without -ea the same path produces an IndexOutOfBoundsException.

Root cause

ExtraNullColumnCursorFactory.isProjection() == true (line 146, pre-PR), but it does not override the new translateOrderByColumnToBase or rewrapOverTopK. The unified gate at SqlCodeGenerator.java:6643-6697 then:

  1. canReachPageFrameLeafForTopK returns true via the forwarded supportsPageFrameCursor().
  2. projectionWrapper = ExtraNullColumnCursorFactory, pageFrameLeaf = master.
  3. translateOrderByColumnToBase(idx) falls through to the default identity. For idx < columnSplit it succeeds; for idx >= columnSplit it returns an OOB index that hits the assertion above.
  4. projectionWrapper.rewrapOverTopK(topK, orderedMetadata) returns topK unchanged (default impl) — the ExtraNullColumn layer is silently dropped.

Pre-PR the gate wrapped recordCursorFactory directly as the top-K base, so ExtraNullColumn.getPageFrameCursor() continued to splice in the null columns. The PR sets base = pageFrameLeaf, which is the bare master.

Tie-in with r3180067265

Same disease. The Javadoc on isProjection() says "projection consists of cross-indexes and metadata columns" — SelectedRecord and ExtraNullColumn (historically). The PR widens the predicate to mean "anything the new top-K gate can peel" and then has to add getColumnCrossIndex() != null guards at AsOf lines 4371 and 4507 to keep the other isProjection() consumers honest. Those guards restore the old meaning at those callsites; the new top-K gate at line 6653 has no such guard, so ExtraNullColumn slips through.

Fix

Three options. The cleanest, IMO, is the first:

  1. Revert VirtualRecordCursorFactory.isProjection() to false, drop the AsOf getColumnCrossIndex() != null guards (they become unnecessary), and give the top-K gate its own predicate — either instanceof SelectedRecordCursorFactory || instanceof VirtualRecordCursorFactory directly, or a separate default method (canPeelForTopK() or similar) that only the two new-overrides factories return true from. Net change: the unified gate handles exactly the two factories with the new overrides; ExtraNullColumn falls back to Sort light (same as master); the existing isProjection() contract is unchanged.

  2. Keep VirtualRecord.isProjection() = true and add getColumnCrossIndex() != null to the new gate at line 6653, mirroring the AsOf guards. ExtraNullColumn (no cross-index) falls back. Minimal diff but keeps the overloaded predicate.

  3. Override translateOrderByColumnToBase and rewrapOverTopK on ExtraNullColumnCursorFactory to preserve the optimization. Best perf, more code, and still leaves isProjection() overloaded.

Either way, please add a regression test using the SQL above (both the "ORDER BY real column" silent-rows variant and the "ORDER BY null column" assertion variant — the second catches the runtime path with -ea off).
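
As a rough sketch of what option 1 amounts to, the snippet below models a dedicated predicate as a default method that only the two peel-capable wrappers override. Everything here is a hypothetical stand-in: the Factory interface, the three nested classes, and gatePeels are not the real QuestDB types, and canPeelForTopK is used only because it is the name floated above.

```java
// Hypothetical model of a dedicated top-K peel predicate; not QuestDB code.
final class TopKPeelPredicateSketch {

    /** Stand-in factory interface; both predicates default to false. */
    interface Factory {
        default boolean isProjection() { return false; }
        default boolean canPeelForTopK() { return false; }
    }

    /** Pure column projection: a projection, and peelable by the top-K gate. */
    static final class SelectedRecord implements Factory {
        @Override public boolean isProjection() { return true; }
        @Override public boolean canPeelForTopK() { return true; }
    }

    /** May hold arbitrary functions: not a projection, but still peelable. */
    static final class VirtualRecord implements Factory {
        @Override public boolean canPeelForTopK() { return true; }
    }

    /** Projection in the old sense, but inherits the false default for peeling. */
    static final class ExtraNullColumn implements Factory {
        @Override public boolean isProjection() { return true; }
    }

    /** The gate consults only the dedicated predicate, never isProjection(). */
    static boolean gatePeels(Factory wrapper) {
        return wrapper.canPeelForTopK();
    }

    public static void main(String[] args) {
        System.out.println(gatePeels(new SelectedRecord()));  // true
        System.out.println(gatePeels(new VirtualRecord()));   // true
        System.out.println(gatePeels(new ExtraNullColumn())); // false
    }
}
```

Keeping the two predicates separate means a new wrapper has to opt in explicitly before the gate will peel it, while isProjection() keeps its existing contract for other consumers.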

@puzpuzpuz
Contributor

puzpuzpuz commented May 4, 2026

Smaller items

Moderate — missing regression test for MemoizerFunction peel

The PR added ColumnFunction.unwrap calls in two places (VirtualRecordCursorFactory.translateOrderByColumnToBase:197 and VirtualFunctionRecordCursor.getLongTopKColumnIndex:98) explicitly to handle the memoizer wrapping that generateSelectVirtual inserts when an alias is referenced more than once. No test in the PR actually exercises that path — the new LimitTest and ExplainPlanTest cases all use single-reference projections. Without a test, future refactors can quietly regress the optimization back to Sort light. A query along the lines of SELECT x AS a, a, a FROM ... WHERE ... ORDER BY a DESC LIMIT N (or whichever shape forces memoization through generateSelectVirtual) would pin the new unwrap behavior.

Minor — comment at SqlCodeGenerator.java:1469 references _close() instead of close()

// generateOrderBy's outer catch closes recordCursorFactory, which cascades into the
// still-referenced filterFactory._close() and frees the stolen filter handles.

_close() is the protected template method on AbstractRecordCursorFactory; the cascading call from the outer catch goes through close(). Either drop the underscore or rephrase as "cascades through close() into filterFactory._close()".

Nit — VirtualRecordCursorFactory.rewrapOverTopK shares functions between old and new wrapper

The new factory at VirtualRecordCursorFactory.java:171-177 is constructed with the same functions ObjList reference as the orphan it replaces. The Javadoc on RecordCursorFactory.rewrapOverTopK already documents this ("its state has either transferred to the returned factory or been dropped on the floor, matching the AsOf/LatestBy peel precedent"), and the orphan is unreachable after the gate returns, so there's no double-free in practice. Worth a one-line code comment on the override pointing at the Javadoc — the shared mutable reference is non-obvious next to _close() calling Misc.freeObjList(functions).

@DHRUV6029 DHRUV6029 force-pushed the fix/issue-6528-topk-projection branch from 87d3520 to d2c3e84 Compare May 5, 2026 03:51
@puzpuzpuz puzpuzpuz self-requested a review May 5, 2026 09:34
Contributor

@puzpuzpuz puzpuzpuz left a comment

The PR is almost good to go, thanks for handling all of the review feedback. Could you address the following moderate-level items? The fixes are pretty straightforward, so it shouldn't take long.

Moderate

M1 — Latent: recursive rewrapOverTopK passes pre-ORDER-BY metadata down the chain

Files: core/src/main/java/io/questdb/griffin/engine/table/SelectedRecordCursorFactory.java:218-224, core/src/main/java/io/questdb/griffin/engine/table/VirtualRecordCursorFactory.java:169-181 (in-diff)

Both overrides do:

RecordCursorFactory rewrappedBase = base.rewrapOverTopK(topK, base.getMetadata());

base.getMetadata() is the wrapper's pre-ORDER-BY metadata, with the original timestamp index. Today the predicate canReachPageFrameLeafForTopK peels at most one wrapper, so the recursive call always lands on a non-projection base whose default rewrapOverTopK ignores the metadata argument — the bug is dormant. But the file comment at SqlCodeGenerator.java:1052-1056 explicitly contemplates widening the predicate. The moment a future change lets the chain run two layers deep (e.g. SR(VR(filter(pageFrame)))), the inner wrapper's getMetadata().getTimestampIndex() will lie about ordering whenever AsyncTopK's baseOrderedMetadata set the timestamp index to -1 (ORDER BY a non-timestamp column).

Fix: either thread the gate's orderedMetadata down each layer with the appropriate timestamp index, or assert !base.isProjection(); at the recursive call site so the trap is loud if the predicate is ever widened.

M2 — Comment in the keyed AsOf guard describes the wrong factory and wrong failure mode

File: core/src/main/java/io/questdb/griffin/SqlCodeGenerator.java:4420-4424 (in-diff)

The comment in the keyed AsOf branch reads:

Without it, FilteredAsOfJoinNoKeyFastRecordCursorFactory silently produces wrong results …

But this branch builds FilteredAsOfJoinFastRecordCursorFactory (line 4396), not the NoKey variant. The PR description itself notes that the keyed branch trips assert stolenCrossIndex != null — i.e. it asserts, it does not silently produce wrong results. The non-keyed branch at line 4556 correctly references the NoKey class and the silent-wrong-results path.

Fix: rewrite the keyed comment to reference FilteredAsOfJoinFastRecordCursorFactory and the assertion failure, so future readers don't conflate the two failure modes.

M3 — isProjection() Javadoc is now inconsistent with the widened semantics

File: core/src/main/java/io/questdb/cairo/sql/RecordCursorFactory.java:270-282 (out-of-diff but directly relevant)

The Javadoc still reads:

Returns true if the factory stands for nothing more but a projection… Projection consist of cross-indexes and metadata columns.

VirtualRecordCursorFactory.isProjection() now returns true even though the wrapper holds arbitrary SQL functions (not pure column references). The PR's defence is the new && slave.getColumnCrossIndex() != null guard at the two AsOf join sites, which correctly excludes VirtualRecord. Without a doc fix, future callers of isProjection() will keep assuming the old narrower contract (cross-index always non-null).

Fix: update the Javadoc to document the widened contract — something like "the wrapper hides a steal-able base; cross-index/base-column-name are guaranteed only when getColumnCrossIndex() != null".

M4 — ExtraNullColumnCursorFactory is the third isProjection()=true class and silently relies on never reaching the new gate

Files: core/src/main/java/io/questdb/griffin/engine/table/ExtraNullColumnCursorFactory.java:145, 169-171 (out-of-diff)

ExtraNullColumnCursorFactory.isProjection() returns true (pre-existing), getColumnCrossIndex() is the default null, and supportsPageFrameCursor() forwards to base. It does not override translateOrderByColumnToBase or rewrapOverTopK. Today it only wraps a window-join master (which doesn't support page-frame cursors), so canReachPageFrameLeafForTopK rejects it and the gate doesn't fire.

The arrangement is fragile: any future change that makes ExtraNull wrap a page-frame-capable base, or any base whose supportsFilterStealing() is true, will silently flow into the new gate and produce wrong results because:

  1. the default identity translateOrderByColumnToBase ignores the null-padding split, and
  2. the default rewrapOverTopK returns the bare topK and drops the null-padding wrapper entirely (output column count would shrink).

Fix: either gate the new path on getColumnCrossIndex() != null (mirroring the AsOf fix at lines 4420 and 4556) or have ExtraNull explicitly override both methods even with no-op semantics that fall back to Sort light.

@DHRUV6029 DHRUV6029 force-pushed the fix/issue-6528-topk-projection branch from d2c3e84 to 038d422 Compare May 6, 2026 03:33
@DHRUV6029
Contributor Author

@puzpuzpuz Thanks for the deep review. Applied option 1 (your preferred): introduced a dedicated canPeelForTopK() default method, reverted VirtualRecordCursorFactory.isProjection() to false, and dropped the AsOf getColumnCrossIndex() != null guards now that they're redundant. Pushed in 038d422.

| Issue | Status | Fix |
|-------|--------|-----|
| Critical: ExtraNullColumnCursorFactory silently peeled by top-K gate | Fixed | New canPeelForTopK() predicate; only Selected and Virtual override it. ExtraNullColumn (and any future projection-like wrapper) inherits false by default and is correctly excluded. |
| AsOf getColumnCrossIndex() != null guards | Removed | Restored to slave.isProjection() only; the guards are unnecessary now that isProjection() is back to its original narrow "cross-indexes and metadata columns" meaning. |
| VirtualRecord.isProjection() widening | Reverted | Override removed; falls back to default false. |

Regression tests (LimitTest):

  • testTopKDoesNotPeelExtraNullColumnSplice — your WINDOW JOIN repro with ORDER BY t.sym DESC LIMIT 3. Confirms the splice layer survives (output keeps all 4 columns including window_price).
  • testTopKDoesNotPeelExtraNullColumnSpliceOrderByNullColumn — your repro with ORDER BY window_price DESC LIMIT 3. Confirms no AssertionError (with -ea) or IndexOutOfBoundsException (without -ea); top-K runs to completion over the spliced layer.

Both new tests fail on the prior version of this branch and pass after the fix. Full local sweep: LimitTest (all green), ExplainPlanTest (574 tests, 0 failures, 2 pre-existing skips), AsOfJoinTest + AsOfJoinNoKeyTest + LtJoinTest (120 tests, 0 failures), WindowJoinTest (352 tests, 0 failures, 186 parameterization skips).

The earlier smaller items from your 2026-05-04T08:24:13Z comment (memoizer test, _close() comment wording, shared-functions comment on rewrapOverTopK) were already addressed in the prior round; verified still in place.

Ready for another look.

@DHRUV6029 DHRUV6029 force-pushed the fix/issue-6528-topk-projection branch from 038d422 to 2b830ee Compare May 6, 2026 06:35
@DHRUV6029 DHRUV6029 requested a review from puzpuzpuz May 6, 2026 06:37
Queries of the form SELECT <projection> FROM t WHERE ... ORDER BY ...
LIMIT N previously fell through to Sort light whenever the SELECT
list contained a non-literal expression, because the projection
wrapper (SelectedRecord or VirtualRecord) hid the steal-able filter
from the parallel top-K gate in SqlCodeGenerator#generateOrderBy.

Collapse the three filter-stealing disjuncts (direct, SelectedRecord
peel, VirtualRecord peel) into one. The projection-specific pieces
move onto the wrapper factories:

- translateOrderByColumnToBase maps an ORDER BY column index from the
  outer metadata to the base page-frame metadata. The default on
  RecordCursorFactory returns the input unchanged. Wrapper overrides
  chain through the base factory, so nested wrappers (e.g.
  SelectedRecord over VirtualRecord) are handled by delegation rather
  than a dedicated disjunct.
- rewrapOverTopK re-applies the projection on top of the freshly
  built AsyncTopK factory. Each wrapper implements it with a single
  constructor call and receives the same orderedMetadata argument,
  so the two paths cannot drift.
- canPeelForTopK is a dedicated predicate that the top-K gate uses
  in place of isProjection. Selected and Virtual override it to true;
  every other factory (including ExtraNullColumn) inherits the false
  default. This keeps isProjection at its original narrow meaning
  ("cross-indexes and metadata columns") for the GROUP BY / AsOf peel
  consumers, and lets the top-K gate name exactly the invariant it
  needs: "I can rewrite my ORDER BY into a base column and rebuild
  myself over a new top-K." ExtraNullColumn cannot do either (its
  null-column splice has no base counterpart), so it falls through
  the gate as projectionWrapper=null and top-K iterates the splice
  directly via its forwarded page-frame cursor, preserving the
  spliced columns in the output.

The unified branch calls translateOrderByColumnToBase on every ORDER
BY key in a single pass: if any key cannot be resolved to a base
column (e.g. a computed VirtualRecord column), the branch bails and
Sort light handles the query. Otherwise it builds AsyncTopK on the
page-frame leaf via buildAsyncTopKOverStolenFilter and re-wraps
through rewrapOverTopK when a projection wrapper is present.

The two SqlOptimiserTest.testJoinAndUnionQuery... plan assertions are
updated to reflect the new top-K shape: SelectedRecord now sits above
the Async Top K (not below) and keys reference the base column ts
(not the projected alias LAST), matching the unified branch's design.

Tests:
- ExplainPlanTest: testSelectWhereOrderByLimit1..9 cover the bug fix,
  the direct page-frame case, the SelectedRecord peel, the
  VirtualRecord peel, the no-filter projection path, and the
  rejection guards for ORDER BY on a computed column and two-bound
  LIMIT. testSelectWhereOrderByLimit_virtualRecordPeel pins the
  single-layer VirtualRecord peel; testSelectWhereOrderByLimit_memoizedPassthrough
  pins the ColumnFunction.unwrap path that peels MemoizerFunction
  wrappers around passthrough columns.
- LimitTest: testTopKThroughProjection and
  testTopKThroughVirtualProjection exercise row-level correctness
  and repeat the virtual-projection query to catch any leak in
  filter, function, page-frame, or comparator state across the
  steal / halfClose / transfer boundary.
  testTopKDoesNotPeelExtraNullColumnSplice and the
  ...OrderByNullColumn variant pin the WINDOW JOIN ON (0=1) shape
  that constructs ExtraNullColumnCursorFactory: the splice must
  survive the gate (output keeps all four columns), and ORDER BY a
  spliced null column must not assertion-fail or drop rows.
- AsOfJoinTest.testAsOfJoinWithTopDownFilteredDateaddTimestampProjection
  and testAsOfJoinNoKeyWithTopDownFilteredDateaddTimestampProjection
  pin the keyed and non-keyed virtual-slave-projection paths against
  isProjection regressions.

Fixes questdb#6528
@DHRUV6029 DHRUV6029 force-pushed the fix/issue-6528-topk-projection branch from 4ec73b4 to 13bf78d Compare May 6, 2026 06:38
@puzpuzpuz puzpuzpuz self-requested a review May 11, 2026 10:13
@puzpuzpuz
Contributor

Thanks for the contribution!

@bluestreak01 bluestreak01 merged commit 1587f99 into questdb:master May 12, 2026
27 checks passed

Labels

  • Performance: Performance improvements
  • SQL: Issues or changes relating to SQL execution

Development

Successfully merging this pull request may close these issues.

Complex projection disables top-k optimization

3 participants