
Conversation

@puzpuzpuz
Contributor

@puzpuzpuz puzpuzpuz commented Nov 22, 2025

Include the following changes around parallel keyed GROUP BY:

  • Partially reverts changes from perf(sql): lazy merge in parallel count_distinct functions #6268 - merging in parallel count_distinct() functions is now eager rather than lazy. This enables the optimization described in the next item.
  • Introduces lightweight cardinality statistics to count_distinct() functions: they can now report how many distinct values they have observed in total. The parallel keyed GROUP BY factory (AsyncGroupByRecordCursorFactory) considers this cardinality, along with per-worker map sizes, when deciding whether to switch to map sharding (a.k.a. radix partitioning). This way, when count_distinct() hash sets have high cardinality, the map merge phase runs in parallel (a rough sketch of this decision follows the list).
  • Changes the number of map shards to always be 256: smaller maps improve CPU cache locality and help distribute work better when merging the map. A higher number of smaller tasks means each worker does an approximately equal amount of work.
  • Changes the default value of the cairo.sql.parallel.groupby.sharding.threshold configuration property from 100k to 10k, so we switch to map sharding (think parallel merge) in more cases.
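
A minimal sketch of the resulting sharding decision, written against placeholder names rather than the actual AsyncGroupByAtom internals:

final class ShardingDecisionSketch {
    // Placeholder constants; 256 and 10_000 match the values described above, while the
    // 10x multiplier for the cardinality check is mentioned in the review comments below.
    static final int NUM_SHARDS = 256;
    static final long SHARDING_THRESHOLD = 10_000; // cairo.sql.parallel.groupby.sharding.threshold default

    static boolean shouldShard(long workerMapSize, long maxCountDistinctCardinality) {
        // Large per-worker maps already trigger sharding; high-cardinality count_distinct()
        // hash sets now do too, so the merge phase can run in parallel.
        return workerMapSize > SHARDING_THRESHOLD
                || maxCountDistinctCardinality > 10 * SHARDING_THRESHOLD;
    }
}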

Benchmarks

ClickBench results on a Ryzen 7900X 64GB RAM box running Ubuntu 22.04 are below. The comparison is against v9.1, i.e. against what we had before #6268.

[Screenshots: ClickBench result comparisons, 2025-11-22]

@puzpuzpuz puzpuzpuz self-assigned this Nov 22, 2025
@puzpuzpuz puzpuzpuz added SQL Issues or changes relating to SQL execution Performance Performance improvements labels Nov 22, 2025
@coderabbitai

coderabbitai bot commented Nov 22, 2025

Walkthrough

This PR implements a group-by optimization initiative: lowering the parallel sharding threshold from 100,000 to 10,000, adding cardinality tracking to group-by functions, replacing list-based accumulation with set-based merging in count-distinct implementations, removing the workerCount parameter from distinct-count functions, and expanding AsyncGroupByAtom from 128 to 256 shards with enhanced sharding decision logic.
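
For instance, the cardinality-tracking hook added to GroupByFunction (see the GroupByFunction entry in the table below) boils down to two backward-compatible default methods; a minimal sketch of their shape, not the full interface, looks like this:

public interface GroupByFunctionCardinalitySketch {
    // Total number of distinct values this function instance has observed so far;
    // the default of 0 means "no statistic available".
    default long getCardinalityStat() {
        return 0;
    }

    // Resets any lightweight statistics; a no-op for functions that track none.
    default void resetStats() {
    }
}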

Changes

Cohort / File(s) Summary
Default Configuration & Thresholds
core/src/main/java/io/questdb/PropServerConfiguration.java, core/src/main/resources/io/questdb/site/conf/server.conf, pkg/ami/marketplace/assets/server.conf
Updated default threshold for parallel GROUP BY sharding from 100,000 to 10,000 across configuration files and defaults.
Core Hashing Infrastructure
core/src/main/java/io/questdb/std/Hash.java
Added a new public method hashShort64(short k) that computes a 64-bit hash from short values using fmix64.
Map Implementation
core/src/main/java/io/questdb/cairo/map/Unordered2Map.java, core/src/main/java/io/questdb/cairo/map/Unordered2MapRecord.java
Implemented non-trivial hash functions in hash() and keyHashCode() using Hash.hashShort64() for map sharding support.
GroupByFunction Interface
core/src/main/java/io/questdb/griffin/engine/functions/GroupByFunction.java
Added two default methods: getCardinalityStat() returning long (default 0) and resetStats() (no-op default) for cardinality tracking.
AbstractCountDistinctIntGroupByFunction
core/src/main/java/io/questdb/griffin/engine/functions/groupby/AbstractCountDistinctIntGroupByFunction.java
Removed GroupByLongList dependency; constructor simplified from four to two hash set parameters; added getCardinalityStat() and resetStats() methods; refactored merge logic to use set-based merging with size-based optimization.
CountDistinct Int Implementation
core/src/main/java/io/questdb/griffin/engine/functions/groupby/CountDistinctIntGroupByFunction.java, core/src/main/java/io/questdb/griffin/engine/functions/groupby/CountDistinctIntGroupByFunctionFactory.java
Removed workerCount parameter from constructor; added cardinality tracking in computeFirst() and computeNext(); factory updated to pass fewer arguments.
CountDistinct Symbol Implementation
core/src/main/java/io/questdb/griffin/engine/functions/groupby/CountDistinctSymbolGroupByFunction.java, core/src/main/java/io/questdb/griffin/engine/functions/groupby/CountDistinctSymbolGroupByFunctionFactory.java
Removed workerCount parameter and GroupByLongList usage; added cardinality tracking across insertion paths; factory updated accordingly.
CountDistinct Long Implementation
core/src/main/java/io/questdb/griffin/engine/functions/groupby/CountDistinctLongGroupByFunction.java, core/src/main/java/io/questdb/griffin/engine/functions/groupby/CountDistinctLongGroupByFunctionFactory.java
Removed workerCount parameter; replaced list-based accumulation with set-based merging; added getCardinalityStat() and resetStats(); added isGroupBy() to factory.
CountDistinct IPv4 Implementation
core/src/main/java/io/questdb/griffin/engine/functions/groupby/CountDistinctIPv4GroupByFunction.java, core/src/main/java/io/questdb/griffin/engine/functions/groupby/CountDistinctIPv4GroupByFunctionFactory.java
Removed workerCount parameter from constructor; added cardinality tracking in compute methods; factory updated.
CountDistinct UUID Implementation
core/src/main/java/io/questdb/griffin/engine/functions/groupby/CountDistinctUuidGroupByFunction.java, core/src/main/java/io/questdb/griffin/engine/functions/groupby/CountDistinctUuidGroupByFunctionFactory.java
Removed workerCount parameter; replaced list-based with set-based merging; added getCardinalityStat() and resetStats().
CountDistinct Long256 Implementation
core/src/main/java/io/questdb/griffin/engine/functions/groupby/CountDistinctLong256GroupByFunction.java, core/src/main/java/io/questdb/griffin/engine/functions/groupby/CountDistinctLong256GroupByFunctionFactory.java
Removed workerCount parameter; refactored from list-based to set-based merging; added cardinality tracking and stats methods; added isGroupBy() to factory.
Async GroupBy Engine
core/src/main/java/io/questdb/griffin/engine/table/AsyncGroupByAtom.java
Expanded from 128 to 256 shards; introduced shardedHint and ownerGroupByFunctions tracking; added maybeEnableSharding() and updateShardedHint() methods; changed MapFragment from boolean owner to int slotId; updated all shard allocation and loop logic.
Async GroupBy Record Cursor
core/src/main/java/io/questdb/griffin/engine/table/AsyncGroupByRecordCursor.java
Removed groupByFunctions field and constructor parameter; cleanup logic updated.
Async GroupBy Record Cursor Factory
core/src/main/java/io/questdb/griffin/engine/table/AsyncGroupByRecordCursorFactory.java
Removed groupByFunctions field; cursor construction updated; now retrieves group-by functions from atom via getOwnerGroupByFunctions(); replaced requestSharding() with maybeEnableSharding().
Utility List Classes
core/src/main/java/io/questdb/std/DirectLongLongAscList.java, core/src/main/java/io/questdb/std/DirectLongLongDescList.java
Added early-return guard when p == capacity in add() method to prevent out-of-bounds insertion.
Test Updates
core/src/test/java/io/questdb/test/PropServerConfigurationTest.java, core/src/test/java/io/questdb/test/ServerMainTest.java, core/src/test/java/io/questdb/test/std/DirectLongLongAscListTest.java, core/src/test/java/io/questdb/test/std/DirectLongLongDescListTest.java
Updated expected threshold values from 100,000 to 10,000; expanded fuzz test iterations from 10,000 to 100,000 with duplicate probability logic.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

  • AsyncGroupByAtom.java: Substantial structural changes to sharding model (128→256 shards), new tracking fields (ownerGroupByFunctions, shardedHint), MapFragment refactoring from boolean to int slotId, multiple new methods (maybeEnableSharding(), updateShardedHint(), calculateLocalFunctionCardinality()), and widespread updates to allocation and loop logic.
  • AbstractCountDistinctIntGroupByFunction.java and similar classes: Constructor signature changes with removal of GroupByLongList parameter, significant refactoring of merge logic from list-based to set-based approach, new cardinality tracking across multiple similar implementations requiring consistency verification.
  • CountDistinct factory and implementation pairs: Multiple coordinated changes across constructor signatures, cardinality tracking additions, and factory instantiation updates; verify all call sites are consistent.
  • AsyncGroupByRecordCursorFactory.java: Dependency shifts from local groupByFunctions field to retrieving from atom; verify data flow and initialization order.

Possibly related PRs

  • questdb/questdb#6268: Directly related—both PRs modify the same count-distinct implementation family (multiple CountDistinct* classes and AbstractCountDistinctIntGroupByFunction), hash utilities, and group-by infrastructure, indicating coordinated optimization effort.

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The PR title clearly describes the main objective: optimizing parallel keyed GROUP BY for high cardinality count_distinct() functions, which is the core focus of the changeset.
Description check ✅ Passed The PR description comprehensively explains the changes made, including the rationale for each modification (eager merging, cardinality statistics, shard count increase, threshold reduction) and includes benchmark results demonstrating the performance improvements.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.


@puzpuzpuz puzpuzpuz marked this pull request as ready for review November 22, 2025 19:00
@glasstiger
Contributor

[PR Coverage check]

😍 pass : 130 / 164 (79.27%)

file detail

path    covered lines    new lines    coverage
🔵 io/questdb/griffin/engine/functions/groupby/CountDistinctLong256GroupByFunction.java 6 19 31.58%
🔵 io/questdb/griffin/engine/functions/groupby/CountDistinctUuidGroupByFunction.java 6 19 31.58%
🔵 io/questdb/std/DirectLongLongAscList.java 1 2 50.00%
🔵 io/questdb/std/DirectLongLongDescList.java 1 2 50.00%
🔵 io/questdb/griffin/engine/functions/groupby/CountDistinctIntGroupByFunction.java 3 4 75.00%
🔵 io/questdb/griffin/engine/functions/groupby/AbstractCountDistinctIntGroupByFunction.java 13 17 76.47%
🔵 io/questdb/griffin/engine/functions/groupby/CountDistinctLongGroupByFunction.java 20 21 95.24%
🔵 io/questdb/griffin/engine/functions/groupby/CountDistinctIntGroupByFunctionFactory.java 1 1 100.00%
🔵 io/questdb/griffin/engine/functions/groupby/CountDistinctLongGroupByFunctionFactory.java 1 1 100.00%
🔵 io/questdb/std/Hash.java 1 1 100.00%
🔵 io/questdb/griffin/engine/functions/groupby/CountDistinctIPv4GroupByFunction.java 4 4 100.00%
🔵 io/questdb/cairo/map/Unordered2MapRecord.java 1 1 100.00%
🔵 io/questdb/griffin/engine/functions/groupby/CountDistinctUuidGroupByFunctionFactory.java 1 1 100.00%
🔵 io/questdb/griffin/engine/functions/groupby/CountDistinctIPv4GroupByFunctionFactory.java 1 1 100.00%
🔵 io/questdb/griffin/engine/functions/groupby/CountDistinctSymbolGroupByFunctionFactory.java 1 1 100.00%
🔵 io/questdb/griffin/engine/functions/groupby/CountDistinctLong256GroupByFunctionFactory.java 1 1 100.00%
🔵 io/questdb/PropServerConfiguration.java 1 1 100.00%
🔵 io/questdb/cairo/map/Unordered2Map.java 1 1 100.00%
🔵 io/questdb/griffin/engine/functions/GroupByFunction.java 2 2 100.00%
🔵 io/questdb/griffin/engine/table/AsyncGroupByAtom.java 54 54 100.00%
🔵 io/questdb/griffin/engine/functions/groupby/CountDistinctSymbolGroupByFunction.java 4 4 100.00%
🔵 io/questdb/griffin/engine/table/AsyncGroupByRecordCursorFactory.java 6 6 100.00%

@bluestreak01
Member

@CodeRabbit review

@coderabbitai

coderabbitai bot commented Nov 23, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (8)
core/src/main/java/io/questdb/griffin/engine/functions/groupby/CountDistinctUuidGroupByFunction.java (1)

145-173: Review merge strategy and cardinality implications.

The merge logic now uses eager set-based merging instead of lazy list accumulation. The size-based strategy (Lines 163-171) merges the smaller set into the larger one for efficiency.

Question: Should cardinality be updated during merge operations? Currently, cardinality only tracks additions during computeFirst/computeNext but not during merge. If cardinality statistics are used for sharding decisions across parallel workers, the merge phase should not affect per-worker statistics. Please verify this is the intended behavior.

core/src/main/java/io/questdb/griffin/engine/functions/groupby/CountDistinctLong256GroupByFunction.java (1)

148-176: Verify cardinality behavior during merge operations.

The merge logic follows the same pattern as CountDistinctUuidGroupByFunction with size-based merging. The same question applies: should cardinality be updated during merge, or is it intentionally tracking only per-worker additions? Based on the PR context, this appears intentional for sharding decisions, but please confirm.

core/src/main/java/io/questdb/griffin/engine/table/AsyncGroupByAtom.java (3)

72-75: Shard constants and hash-to-shard mapping assume power-of-two NUM_SHARDS.

Using NUM_SHARDS = 256 with NUM_SHARDS_SHR = Long.numberOfLeadingZeros(NUM_SHARDS) + 1 and hashCode >>> NUM_SHARDS_SHR gives 8 high bits and maps cleanly into 256 shards. This is correct as long as NUM_SHARDS stays a power of two; if someone changes it to a non-power-of-two later, the distribution becomes incorrect.

Consider either documenting this assumption explicitly or adding a defensive check (e.g., assert (NUM_SHARDS & (NUM_SHARDS - 1)) == 0;) to prevent accidental misconfiguration in the future.

Also applies to: 674-676, 740-745
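
A small illustration of that assumption and the suggested defensive check, using the names from the comment above (this is not the actual AsyncGroupByAtom code):

final class ShardMappingSketch {
    static final int NUM_SHARDS = 256;
    static final int NUM_SHARDS_SHR = Long.numberOfLeadingZeros(NUM_SHARDS) + 1; // 56 for 256 shards

    static {
        // Suggested guard: the high-bit mapping below only distributes evenly
        // when the shard count is a power of two.
        assert (NUM_SHARDS & (NUM_SHARDS - 1)) == 0;
    }

    static int shardIndex(long keyHashCode) {
        return (int) (keyHashCode >>> NUM_SHARDS_SHR); // top 8 bits -> 0..255
    }
}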


156-169: Make updateShardedHint() robust to partially initialized destShards.

destShards is sized via setPos(NUM_SHARDS) and lazily populated in reopenDestShard(). In updateShardedHint(), when sharded == true you do:

for (int i = 0; i < NUM_SHARDS; i++) {
    totalShardSize += destShards.getQuick(i).size();
}

This assumes every shard index has gone through reopenDestShard() (i.e., destShards[i] is non-null and open). If any shard is never merged (e.g., future changes avoid scheduling merges for empty shards), this will NPE.

Even if current scheduling guarantees all shards are merged, this is a subtle invariant. To make this code future-proof and tolerant of partial merges, you could guard against null/closed maps:

-        if (sharded) {
-            for (int i = 0; i < NUM_SHARDS; i++) {
-                totalShardSize += destShards.getQuick(i).size();
-            }
-        } else {
+        if (sharded) {
+            for (int i = 0; i < NUM_SHARDS; i++) {
+                final Map shard = destShards.getQuick(i);
+                if (shard != null && shard.isOpen()) {
+                    totalShardSize += shard.size();
+                }
+            }
+        } else {
             totalShardSize = ownerFragment.map.size();
         }

This keeps the heuristic semantics while avoiding a fragile assumption about merge scheduling.

Also applies to: 241-255, 609-628


509-516: reopen() should probably synchronise sharded with shardedHint to avoid sticky sharded mode.

Currently:

public void reopen() {
    if (shardedHint) {
        // Looks like we had to shard during previous execution, so let's do it ahead of time.
        sharded = true;
    }
    // maps opened lazily...
}

If sharded was set to true by maybeEnableSharding() in a previous run and shardedHint later becomes false (low cardinality run), reopen() never forces sharded back to false unless clear() is called in between. That can leave the atom permanently in sharded mode for subsequent executions, which is at least a performance concern.

If the intended lifecycle is “reuse atom across queries and let shardedHint drive pre‑sharding”, then making reopen() do:

-    public void reopen() {
-        if (shardedHint) {
-            sharded = true;
-        }
-    }
+    public void reopen() {
+        // Next execution starts according to the last hint.
+        sharded = shardedHint;
+        // Maps will be opened lazily by worker threads.
+    }

would keep sharded aligned with the latest decision. If, on the other hand, callers always invoke clear() between executions, it would be good to document that expectation clearly.

core/src/main/java/io/questdb/griffin/engine/functions/groupby/CountDistinctLongGroupByFunction.java (1)

156-212: Merge logic for long distinct sets looks correct; consider clarifying size‑based heuristic

The updated merge covers all combinations:

  • Treats srcCount == 0 || LONG_NULL as “no contribution”, and destCount == 0 || LONG_NULL as “take src as‑is”, which aligns with setEmpty/setNull invariants.
  • Correctly handles inline↔inline, inline↔set, and set↔set cases, always ending with a consistent (count, ptr) pair.
  • Uses setA.size() > (setB.size() >>> 1) to decide whether to merge B into A or A into B, with the comment clarifying the “significantly smaller” case in the else branch.

Functionally this looks sound. For readability/maintainability you might want to:

  • Explicitly document the invariant that count == Numbers.LONG_NULL implies the pointer is 0 (so skipping on LONG_NULL can never drop a non‑empty set).
  • Optionally rephrase the condition to make the “merge smaller set into larger set” intention clearer (e.g., by computing both sizes once and naming them, as sketched below).

These are clarity tweaks only; no blocking issues from the current implementation.
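
One way to spell that out, sketched here with java.util.Set standing in for the QuestDB hash sets and addAll() playing the role of merge():

import java.util.Set;

final class MergeDirectionSketch {
    static Set<Long> mergeDistinct(Set<Long> dest, Set<Long> src) {
        // Compute and name both sizes once so the merge direction is explicit.
        final int destSize = dest.size();
        final int srcSize = src.size();
        if (destSize > (srcSize >>> 1)) {
            dest.addAll(src); // destination is not much smaller: merge the source into it
            return dest;
        }
        // Destination is significantly smaller than the source, so merge it into the source.
        src.addAll(dest);
        return src;
    }
}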

core/src/main/java/io/questdb/griffin/engine/functions/groupby/AbstractCountDistinctIntGroupByFunction.java (2)

43-50: Clarify lifecycle and ownership of cardinality stat

cardinality is only touched via getCardinalityStat() and resetStats(), but this base class doesn’t update it itself. That’s fine if subclasses are responsible for incrementing it and callers consistently invoke resetStats() between runs, but it’s not obvious from here.

Consider:

  • Adding a short Javadoc or comment documenting that subclasses must maintain cardinality, and when resetStats() is expected to be called, or
  • Optionally tying resetStats() into clear() if the intended lifecycle is “one query run per clear”, to avoid stale stats if a caller forgets to reset.

This would make the new stat API harder to misuse by future implementations.

Also applies to: 63-66, 170-173


93-101: Merge logic for inline vs set-backed state looks correct; consider small documentation tweak

The updated merge() correctly handles the three representations:

  • count == 0 / LONG_NULL: treated as empty.
  • count == 1: second column as inlined value.
  • count > 1: second column as GroupByIntHashSet pointer, with setA/setB used to access and merge sets.

The early-return on empty srcCount, the fast path when destCount is empty, and the inline-to-set promotion paths all look consistent with that representation.

For the set–set branch:

setA.of(destPtr);
setB.of(srcPtr);

if (setA.size() > (setB.size() >>> 1)) {
    setA.merge(setB);
    ...
} else {
    // Set A is significantly smaller than set B, so we merge it into set B.
    setB.merge(setA);
    ...
}

The heuristic (“merge smaller into larger when dest is at most half the size of src”) is reasonable. To aid future readers, you might slightly expand the comment to spell out that the else branch is taken when setA.size() <= setB.size() / 2 and in that case setA is the smaller set being merged into setB.

Otherwise, the merge-direction logic and updates of count and pointer after each branch look sound.

Also applies to: 112-168

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f744c20 and f52284a.

📒 Files selected for processing (29)
  • core/src/main/java/io/questdb/PropServerConfiguration.java (1 hunks)
  • core/src/main/java/io/questdb/cairo/map/Unordered2Map.java (2 hunks)
  • core/src/main/java/io/questdb/cairo/map/Unordered2MapRecord.java (2 hunks)
  • core/src/main/java/io/questdb/griffin/engine/functions/GroupByFunction.java (2 hunks)
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/AbstractCountDistinctIntGroupByFunction.java (4 hunks)
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/CountDistinctIPv4GroupByFunction.java (4 hunks)
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/CountDistinctIPv4GroupByFunctionFactory.java (2 hunks)
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/CountDistinctIntGroupByFunction.java (4 hunks)
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/CountDistinctIntGroupByFunctionFactory.java (2 hunks)
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/CountDistinctLong256GroupByFunction.java (6 hunks)
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/CountDistinctLong256GroupByFunctionFactory.java (2 hunks)
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/CountDistinctLongGroupByFunction.java (8 hunks)
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/CountDistinctLongGroupByFunctionFactory.java (2 hunks)
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/CountDistinctSymbolGroupByFunction.java (4 hunks)
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/CountDistinctSymbolGroupByFunctionFactory.java (2 hunks)
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/CountDistinctUuidGroupByFunction.java (6 hunks)
  • core/src/main/java/io/questdb/griffin/engine/functions/groupby/CountDistinctUuidGroupByFunctionFactory.java (2 hunks)
  • core/src/main/java/io/questdb/griffin/engine/table/AsyncGroupByAtom.java (17 hunks)
  • core/src/main/java/io/questdb/griffin/engine/table/AsyncGroupByRecordCursor.java (0 hunks)
  • core/src/main/java/io/questdb/griffin/engine/table/AsyncGroupByRecordCursorFactory.java (6 hunks)
  • core/src/main/java/io/questdb/std/DirectLongLongAscList.java (1 hunks)
  • core/src/main/java/io/questdb/std/DirectLongLongDescList.java (1 hunks)
  • core/src/main/java/io/questdb/std/Hash.java (1 hunks)
  • core/src/main/resources/io/questdb/site/conf/server.conf (1 hunks)
  • core/src/test/java/io/questdb/test/PropServerConfigurationTest.java (1 hunks)
  • core/src/test/java/io/questdb/test/ServerMainTest.java (1 hunks)
  • core/src/test/java/io/questdb/test/std/DirectLongLongAscListTest.java (1 hunks)
  • core/src/test/java/io/questdb/test/std/DirectLongLongDescListTest.java (1 hunks)
  • pkg/ami/marketplace/assets/server.conf (1 hunks)
💤 Files with no reviewable changes (1)
  • core/src/main/java/io/questdb/griffin/engine/table/AsyncGroupByRecordCursor.java
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-11-19T12:21:00.062Z
Learnt from: jerrinot
Repo: questdb/questdb PR: 6413
File: core/src/test/java/io/questdb/test/cutlass/pgwire/PGJobContextTest.java:11982-12002
Timestamp: 2025-11-19T12:21:00.062Z
Learning: QuestDB Java tests use a deterministic random seed. The test utilities (e.g., io.questdb.test.tools.TestUtils and io.questdb.std.Rnd) produce reproducible sequences, so rnd_* functions (including rnd_uuid4) yield deterministic outputs across runs. Do not flag tests in core/src/test/** that assert against values produced by rnd_* as flaky due to randomness.

Applied to files:

  • core/src/test/java/io/questdb/test/std/DirectLongLongAscListTest.java
  • core/src/test/java/io/questdb/test/std/DirectLongLongDescListTest.java
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (34)
  • GitHub Check: New pull request (Coverage Report Coverage Report)
  • GitHub Check: New pull request (SelfHosted Running tests with cover on linux-other)
  • GitHub Check: New pull request (SelfHosted Running tests with cover on linux-pgwire)
  • GitHub Check: New pull request (SelfHosted Running tests with cover on linux-cairo-sub)
  • GitHub Check: New pull request (SelfHosted Running tests with cover on linux-cairo-root)
  • GitHub Check: New pull request (SelfHosted Running tests with cover on linux-fuzz2)
  • GitHub Check: New pull request (SelfHosted Running tests with cover on linux-fuzz1)
  • GitHub Check: New pull request (SelfHosted Running tests with cover on linux-griffin-sub)
  • GitHub Check: New pull request (SelfHosted Running tests with cover on linux-griffin-root)
  • GitHub Check: New pull request (Rust Test and Lint on linux-jdk17)
  • GitHub Check: New pull request (SelfHosted Griffin tests on linux-arm64)
  • GitHub Check: New pull request (SelfHosted Other tests on linux-x86-graal)
  • GitHub Check: New pull request (SelfHosted Griffin tests on linux-x64-zfs)
  • GitHub Check: New pull request (SelfHosted Other tests on linux-x64-zfs)
  • GitHub Check: New pull request (SelfHosted Griffin tests on linux-x86-graal)
  • GitHub Check: New pull request (Hosted Running tests on windows-other-2)
  • GitHub Check: New pull request (Hosted Running tests on windows-other-1)
  • GitHub Check: New pull request (Hosted Running tests on windows-pgwire)
  • GitHub Check: New pull request (Hosted Running tests on windows-cairo-2)
  • GitHub Check: New pull request (Hosted Running tests on windows-cairo-1)
  • GitHub Check: New pull request (Hosted Running tests on windows-fuzz2)
  • GitHub Check: New pull request (Hosted Running tests on windows-fuzz1)
  • GitHub Check: New pull request (Hosted Running tests on windows-griffin-sub)
  • GitHub Check: New pull request (Hosted Running tests on windows-griffin-base)
  • GitHub Check: New pull request (Hosted Running tests on mac-other)
  • GitHub Check: New pull request (Hosted Running tests on mac-pgwire)
  • GitHub Check: New pull request (Hosted Running tests on mac-cairo-fuzz)
  • GitHub Check: New pull request (Hosted Running tests on mac-cairo)
  • GitHub Check: New pull request (SelfHosted Other tests on linux-arm64)
  • GitHub Check: New pull request (Hosted Running tests on mac-griffin)
  • GitHub Check: New pull request (SelfHosted Cairo tests on linux-arm64)
  • GitHub Check: New pull request (SelfHosted Cairo tests on linux-x64-zfs)
  • GitHub Check: New pull request (SelfHosted Cairo tests on linux-x86-graal)
  • GitHub Check: New pull request (Check Changes Check changes)
🔇 Additional comments (47)
core/src/main/java/io/questdb/std/DirectLongLongDescList.java (1)

49-51: LGTM! Defensive guard prevents out-of-bounds memory operations.

This guard prevents insertion when the list is at capacity and the binary search returns a position equal to capacity. Without it, Line 53's memmove could operate on invalid memory bounds.

core/src/main/java/io/questdb/std/DirectLongLongAscList.java (1)

49-51: LGTM! Defensive guard prevents out-of-bounds memory operations.

This guard mirrors the fix in DirectLongLongDescList.java and prevents out-of-bounds memmove operations when the list is at capacity.
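
A self-contained illustration of the guard pattern described in the two comments above, using a plain long[] instead of the native-memory DirectLongLong*List (names and layout here are placeholders, not the QuestDB source):

import java.util.Arrays;

final class BoundedSortedListSketch {
    private final long[] values = new long[4]; // small fixed capacity
    private int size;

    void add(long v) {
        int p = Arrays.binarySearch(values, 0, size, v);
        if (p < 0) {
            p = -(p + 1); // insertion point
        }
        if (p == values.length) {
            // The guard: the list is full and v sorts after the last element,
            // so shifting would write past the end of the buffer.
            return;
        }
        // Shift the tail right by one slot (drops the last element when full),
        // analogous to the memmove in the real implementation.
        int toMove = Math.min(size, values.length - 1) - p;
        System.arraycopy(values, p, values, p + 1, toMove);
        values[p] = v;
        if (size < values.length) {
            size++;
        }
    }
}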

core/src/main/java/io/questdb/griffin/engine/functions/groupby/CountDistinctUuidGroupByFunctionFactory.java (1)

54-58: LGTM! Factory updated to match simplified constructor.

The removal of the workerCount parameter aligns with the refactored CountDistinctUuidGroupByFunction constructor.

core/src/main/java/io/questdb/griffin/engine/functions/GroupByFunction.java (2)

87-94: LGTM! Well-designed API addition for cardinality tracking.

The default method pattern ensures backward compatibility while enabling implementations to expose cardinality statistics. Documentation clearly specifies the contract.


162-168: LGTM! Companion method for resetting statistics.

The default no-op implementation maintains backward compatibility while allowing count_distinct implementations to reset their cardinality counters.

core/src/main/java/io/questdb/griffin/engine/functions/groupby/CountDistinctUuidGroupByFunction.java (3)

44-51: LGTM! Simplified constructor and cardinality tracking.

The removal of workerCount simplifies the API. The new cardinality field enables lightweight statistics collection for sharding decisions.


60-88: LGTM! Cardinality tracking correctly increments on new distinct values.

The cardinality counter increments only when a new UUID is added (Line 67 in computeFirst, Line 85 in computeNext when index >= 0).


175-178: LGTM! Clean implementation of stats reset.

The resetStats() implementation correctly resets the cardinality counter to zero.

core/src/main/java/io/questdb/griffin/engine/functions/groupby/CountDistinctLong256GroupByFunction.java (3)

45-52: LGTM! Simplified constructor matches pattern across count_distinct implementations.

The refactoring is consistent with other count_distinct functions, removing workerCount and adding cardinality tracking.


60-91: LGTM! Cardinality tracking correctly increments on distinct Long256 values.

The implementation correctly increments cardinality only when new distinct values are added.


178-181: LGTM! Stats reset implemented correctly.

core/src/main/java/io/questdb/griffin/engine/functions/groupby/CountDistinctLong256GroupByFunctionFactory.java (1)

54-58: LGTM! Factory updated consistently with constructor changes.

The removal of the workerCount parameter aligns with the simplified CountDistinctLong256GroupByFunction constructor.

core/src/main/resources/io/questdb/site/conf/server.conf (1)

441-441: Code default value is consistent with server.conf configuration.

The verification confirms that the code default in PropServerConfiguration.java (line 1829) sets the threshold to 10,000, which matches the server.conf setting. This consistency across both configuration sources is correct and intentional.

core/src/main/java/io/questdb/griffin/engine/table/AsyncGroupByAtom.java (4)

395-401: Cardinality-based sharding heuristic looks reasonable; confirm GroupByFunction stat semantics.

The new heuristic:

  • Resets per-fragment stats at the start of each frame via fragment.resetLocalStats().
  • Uses fragment.calculateLocalFunctionCardinality() in maybeEnableSharding() to trigger sharding when max(cardinalityStat) for this fragment exceeds 10 * threshold.
  • Accumulates totalFunctionCardinality per fragment and then sums owner + workers in updateShardedHint() to drive shardedHint across executions.

Structurally this is sound and matches the PR description (switching to sharding when high count_distinct() cardinality makes merging hash sets expensive). The only thing to validate is that all GroupByFunction implementations used here:

  • Implement resetStats() so that getCardinalityStat() for a fragment reflects “this frame’s contribution” (or at least a monotonic and meaningful measure per frame).
  • Are safe to call resetStats()/getCardinalityStat() in the same concurrency pattern as their regular updates (one thread per fragment).

Assuming those contracts hold, the heuristic is good and nicely targeted at the high‑cardinality count_distinct cases.

Also applies to: 696-702, 727-735, 620-623


301-305: Owner/per-worker function wiring for stats reset is consistent with existing allocator/updater setup.

getOwnerGroupByFunctions() plus getGroupByFunctions(int slotId) and MapFragment.resetLocalStats() correctly route:

  • Slot -1 (owner) and any configuration without per-worker functions to ownerGroupByFunctions.
  • Non-owner slots with perWorkerGroupByFunctions != null to their respective per-worker lists.

This matches how GroupByFunctionsUpdater and allocators are wired and ensures cardinality stats are reset on the same function instances that are being updated.

The “thread-unsafe” note on getOwnerGroupByFunctions() is appropriate; current usage from AsyncGroupByRecordCursorFactory.toPlan() is read-only and single-threaded, so no issues there.

Also applies to: 538-543, 646-657, 696-702


200-214: Clearing owner/per-worker group-by functions vs freeing them is consistent with factory lifecycle.

clear() now calls Misc.clearObjList(ownerGroupByFunctions) and clears each perWorkerGroupByFunctions list without freeing them, while close() frees them. Given AsyncGroupByRecordCursorFactory._close() frees recordFunctions (which includes group-by functions), this split between “reuse within the same factory” and “final teardown” looks correct and should avoid double-frees.

No changes recommended here.


646-668: MapFragment.close() correctly resets cardinality state.

Adding:

sharded = false;
totalFunctionCardinality = 0;

before closing the underlying map and shard maps ensures that cardinality stats and sharding state don’t leak across fragment lifecycles. This matches how updateShardedHint() relies on totalFunctionCardinality being per-run.

Looks good as implemented.

core/src/main/java/io/questdb/griffin/engine/table/AsyncGroupByRecordCursorFactory.java (3)

107-139: Cursor construction and plan values now correctly delegate to AsyncGroupByAtom.

  • The cursor is now created as new AsyncGroupByRecordCursor(engine, recordFunctions, messageBus), aligning with the updated cursor API that no longer needs a separate groupByFunctions list.
  • toPlan() now uses frameSequence.getAtom().getOwnerGroupByFunctions() for "values", which is the authoritative set of group-by functions held by the atom.

This removes duplication of group-by function lists in the factory and keeps the plan representation tied to the actual atom state.

Looks good.

Also applies to: 187-189


216-238: Stats reset and sharding decisions are correctly integrated into aggregate paths.

Both aggregate() and filterAndAggregate() now:

  • Call fragment.resetLocalStats() immediately after acquiring the slot and before touching data.
  • Call atom.maybeEnableSharding(fragment) after processing the frame.

This ensures per-fragment cardinality stats are fresh for each frame and that sharding decisions take into account both row/map volume and function cardinality across owner and worker fragments. The symmetry between filtered and unfiltered paths avoids skew in the heuristic.

No issues spotted here.

Also applies to: 390-413


418-424: Freeing recordFunctions in _close() remains appropriate with new ownership model.

_close() still frees recordFunctions (which, per comment, includes group-by functions). With AsyncGroupByAtom now only clearing (not freeing) its group-by function lists in clear() and freeing them in close(), this matches the expected ownership: the factory is responsible for ultimate disposal.

Looks consistent.

core/src/test/java/io/questdb/test/ServerMainTest.java (1)

403-408: Updated expected cairo.sql.parallel.groupby.sharding.threshold matches new default.

Changing the expected value from 100000 to 10000 in the testShowParameters output string is consistent with the new default sharding threshold and keeps the integration test in sync with configuration.

Looks correct.

core/src/main/java/io/questdb/PropServerConfiguration.java (1)

1826-1833: Default sharding threshold change looks consistent

Lowering the default CAIRO_SQL_PARALLEL_GROUPBY_SHARDING_THRESHOLD to 10_000 is in line with the PR’s goal (trigger sharding earlier) and remains just a tuning knob; callers still override via config when needed. No issues from this constructor-side change alone.

core/src/main/java/io/questdb/std/Hash.java (1)

123-125: hashShort64 implementation is correct and consistent

Using fmix64(Short.toUnsignedLong(k)) matches the pattern of hashInt64/hashLong64 and gives a well‑mixed 64‑bit hash for 16‑bit keys. No changes needed here.
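
For reference, a self-contained sketch of that shape; the fmix64 shown is the standard MurmurHash3 finalizer, and QuestDB's own implementation may differ in detail:

final class HashShortSketch {
    static long fmix64(long k) {
        // Standard MurmurHash3 64-bit finalizer.
        k ^= k >>> 33;
        k *= 0xff51afd7ed558ccdL;
        k ^= k >>> 33;
        k *= 0xc4ceb9fe1a85ec53L;
        k ^= k >>> 33;
        return k;
    }

    static long hashShort64(short k) {
        // Widen the 16-bit key to an unsigned long, then mix it.
        return fmix64(Short.toUnsignedLong(k));
    }
}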

pkg/ami/marketplace/assets/server.conf (1)

349-356: Sample config now matches new default threshold

The commented cairo.sql.parallel.groupby.sharding.threshold=10000 aligns with the new Java default and keeps the AMI config documentation consistent. Looks good.

core/src/test/java/io/questdb/test/PropServerConfigurationTest.java (1)

312-312: LGTM! Test updated to reflect the new default threshold.

The assertion correctly validates the lowered default parallel GROUP BY sharding threshold from 100,000 to 10,000, as described in the PR objectives.

core/src/main/java/io/questdb/cairo/map/Unordered2MapRecord.java (2)

33-33: LGTM! Import added to support hashing.

The Hash import is required for the keyHashCode() implementation below.


330-335: LGTM! Proper hash implementation for map sharding.

The keyHashCode() implementation correctly:

  • Hashes the first 16-bit segment of the key using Hash.hashShort64()
  • Includes clear documentation explaining the purpose (map sharding for high-cardinality count_distinct())
  • Replaces the previous placeholder return of 0

This enables the AsyncGroupByRecordCursorFactory to distribute keys across multiple Unordered2Maps when cardinality is high, as described in the PR objectives.

core/src/main/java/io/questdb/griffin/engine/functions/groupby/CountDistinctIntGroupByFunctionFactory.java (2)

35-35: Minor formatting change.

Added blank line after class declaration for consistency.


54-58: LGTM! Simplified constructor call.

The CountDistinctIntGroupByFunction constructor now takes only capacity and loadFactor parameters, removing the workerCount parameter. This aligns with the PR's refactoring to make count_distinct() merge eager rather than lazy.

core/src/test/java/io/questdb/test/std/DirectLongLongDescListTest.java (2)

43-43: Enhanced test coverage with 10x more iterations.

Increasing N from 10,000 to 100,000 provides more thorough testing of the DirectLongLongDescList implementation.


45-64: LGTM! Added duplicate value testing.

The test now includes:

  • Random duplicate probability to test handling of repeated values
  • Conditional reassignment of v to create duplicates
  • Correct assertion pattern using vLong polled from the oracle rather than the loop variable

This enhances test coverage for duplicate scenarios while maintaining deterministic behavior (QuestDB tests use deterministic random seeds).

core/src/main/java/io/questdb/cairo/map/Unordered2Map.java (2)

38-38: LGTM! Import added to support hashing.

The Hash import is required for the hash() method implementation below.


359-364: LGTM! Proper hash implementation for map sharding.

The hash() method correctly:

  • Computes a hash of the short key using Hash.hashShort64()
  • Includes clear documentation explaining the purpose (map sharding for high-cardinality scenarios)
  • Replaces the previous no-op return of 0

This enables the AsyncGroupByRecordCursorFactory to distribute keys across multiple Unordered2Maps when count_distinct() has high cardinality, consistent with the PR objectives.

core/src/test/java/io/questdb/test/std/DirectLongLongAscListTest.java (2)

43-43: Enhanced test coverage with 10x more iterations.

Increasing N from 10,000 to 100,000 provides more thorough testing of the DirectLongLongAscList implementation.


45-64: LGTM! Added duplicate value testing.

The test enhancements match the pattern in DirectLongLongDescListTest:

  • Random duplicate probability to test handling of repeated values
  • Conditional reassignment of v to create duplicates
  • Correct assertion pattern using vLong polled from the oracle

This ensures consistent test coverage for both ascending and descending list implementations while maintaining deterministic behavior.

core/src/main/java/io/questdb/griffin/engine/functions/groupby/CountDistinctLongGroupByFunctionFactory.java (2)

35-35: Minor formatting change.

Added blank line after class declaration for consistency.


54-58: LGTM! Simplified constructor call.

The CountDistinctLongGroupByFunction constructor now takes only capacity and loadFactor parameters, removing the workerCount parameter. This is consistent with the refactoring across all count_distinct() implementations to enable eager merging.

core/src/main/java/io/questdb/griffin/engine/functions/groupby/CountDistinctSymbolGroupByFunctionFactory.java (2)

35-35: Minor formatting change.

Added blank line after class declaration for consistency.


54-58: LGTM! Simplified constructor call.

The CountDistinctSymbolGroupByFunction constructor now takes only capacity and loadFactor parameters, removing the workerCount parameter. This completes the consistent refactoring pattern across all count_distinct() function factories.

core/src/main/java/io/questdb/griffin/engine/functions/groupby/CountDistinctIntGroupByFunction.java (2)

35-40: Constructor change matches new int count_distinct backing sets

Constructor now wires two GroupByIntHashSet instances with Numbers.INT_NULL sentinel into the abstract base, consistent with the updated parallel/merge strategy and avoiding GroupByLongList. No issues spotted.


43-85: Cardinality updates are correctly tied to new distinct int values

cardinality++ only happens when:

  • we see the first non-null value for a group,
  • we upgrade from inlined value to a set with a genuinely different second value,
  • or we insert a new key into an existing set (keyIndex >= 0).

Duplicates and NULLs do not affect the stat, which is what you want for a “sum of per-group distincts” metric.
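
A generic, self-contained illustration of that counting rule, using a plain HashSet as a stand-in for the per-group GroupByIntHashSet state (not the QuestDB code):

import java.util.HashSet;
import java.util.Set;

final class CardinalityTrackingSketch {
    private long cardinality; // per-function-instance stat, cleared by resetStats()

    // The three cases listed above collapse to "bump the counter only when a
    // non-null value is genuinely new for its group".
    void observe(Set<Integer> groupState, int value, int nullSentinel) {
        if (value == nullSentinel) {
            return; // NULLs never affect the stat
        }
        if (groupState.add(value)) {
            cardinality++;
        }
    }

    long getCardinalityStat() {
        return cardinality;
    }

    void resetStats() {
        cardinality = 0;
    }
}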

core/src/main/java/io/questdb/griffin/engine/functions/groupby/CountDistinctIPv4GroupByFunctionFactory.java (1)

36-58: Factory wiring now matches 3‑arg IPv4 count_distinct constructor

newInstance correctly drops the workerCount parameter and passes (arg, countDistinctCapacity, countDistinctLoadFactor) to CountDistinctIPv4GroupByFunction. Looks consistent with the other factories.

core/src/main/java/io/questdb/griffin/engine/functions/groupby/CountDistinctIPv4GroupByFunction.java (2)

35-41: IPv4 constructor update is consistent with Int/Symbol variants

Using two GroupByIntHashSet instances with Numbers.IPv4_NULL as the empty sentinel matches the new shared pattern and the in-code comment about zero-based nulls; nothing problematic here.


44-86: IPv4 cardinality tracking matches int implementation

The cardinality++ sites line up with the same three “first-time distinct” cases as in the int implementation (first non-null, inline→set transition with a new value, and set insertion where keyIndex >= 0). This keeps distinct-count stats coherent across types.

core/src/main/java/io/questdb/griffin/engine/functions/groupby/CountDistinctSymbolGroupByFunction.java (2)

42-47: Symbol constructor aligned with shared Int-based abstraction

Dropping workerCount and wiring two GroupByIntHashSet instances keyed on VALUE_IS_NULL keeps the symbol variant in lockstep with the other int-backed count_distinct functions. No issues.


56-98: Symbol cardinality stat integrates cleanly with existing early-exit logic

cardinality++ is only updated when:

  • a non-null symbol is seen for an empty group,
  • we upgrade from a single inlined symbol to a set with a different second symbol,
  • we insert a new key into an existing set.

This leaves earlyExit() semantics untouched (they still use the per-group count vs knownSymbolCount) while providing a consistent “sum of per-group distincts” stat for heuristics.

core/src/main/java/io/questdb/griffin/engine/functions/groupby/CountDistinctLongGroupByFunction.java (1)

39-50: Cardinality stat wiring for long count_distinct looks correct; confirm intended semantics

The new cardinality field is:

  • Initialised implicitly to 0 and reset via resetStats().
  • Incremented only when a genuinely new, non‑NULL long value is added to a group:
    • first non‑NULL value (computeFirst / cnt == 0),
    • inline→set upgrade when the second value differs,
    • insertion into an existing set when keyIndex >= 0.
  • Exposed via getCardinalityStat().

This matches the int/IPv4/symbol patterns and effectively tracks “sum of per‑group distinct insertions for this function instance”, not the size of any particular backing set or a global de‑duplicated cardinality.

merge() does not adjust cardinality, so the stat is independent of inter‑map merges.

If AsyncGroupBy’s heuristics are based on “total number of distinct entries stored across this worker’s hash sets” (i.e., a cost proxy), this implementation is appropriate. If they instead expect a post‑merge/global cardinality, the behaviour should be documented or adjusted accordingly.

Also applies to: 59-100, 107-110, 214-217

@bluestreak01
Member

@puzpuzpuz is there diff between graal vs corretto?

@bluestreak01 bluestreak01 merged commit 9780f30 into master Nov 23, 2025
41 checks passed
@bluestreak01 bluestreak01 deleted the puzpuzpuz_high_cardinality_count_distinct branch November 23, 2025 03:47
@puzpuzpuz
Contributor Author

@puzpuzpuz is there diff between graal vs corretto?

I only measured on OpenJDK. Will measure on Corretto and GraalVM CE and post the findings here.

@puzpuzpuz
Contributor Author

@bluestreak01 the findings are somewhat interesting.

GraalVM CE 17 is noticeably slower than both OpenJDK 17 and Corretto 17 in Cold Run:

[Screenshot: ClickBench cold run comparison, 2025-11-23]

Here, QuestDB (patch) stands for OpenJDK 17.

In Hot Run, the total score is the same, but GraalVM is 35% slower (0.5s) in Q28 (the regexp one) and 30% faster (120ms) in Q35.

This results in the following combined score:

[Screenshot: ClickBench combined score comparison, 2025-11-23]

