@jerrinot
Contributor

@jerrinot jerrinot commented Nov 12, 2025

Users running latency-sensitive workloads will experience more consistent query response times, particularly at p99 and above.

Previously, memory unmapping operations could cause unpredictable latency spikes when they occurred during query execution. This change moves these expensive unmapping operations to a background thread, eliminating their impact on query latency. The result is smoother, more predictable performance - especially important for applications with strict SLA requirements or real-time analytics dashboards.

This change is initially opt-in. To enable it, add the following to your server configuration: cairo.file.async.munmap.enabled=true

Before:

QUERY LATENCY DISTRIBUTION:
  Min:         0.02 ms
  Avg:        46.46 ms
  P50:         5.55 ms
  P90:       141.57 ms
  P95:       199.55 ms
  P99:       359.68 ms
  P99.9:     697.34 ms
  P99.99:    942.08 ms
  P99.999:  1008.64 ms
  Max:      1041.41 ms
  StdDev:     79.96 ms

LATENCY HISTOGRAM:
  0-1ms:          1873966  (35.07%)
  1-2ms:           390217  ( 7.30%)
  2-5ms:           377154  ( 7.06%)
  5-10ms:          179387  ( 3.36%)
  10-20ms:         284298  ( 5.32%)
  20-50ms:         653023  (12.22%)
  50-100ms:        704733  (13.19%)
  100-200ms:       614812  (11.51%)
  200-500ms:       245347  ( 4.59%)
  500ms-1s:         20012  ( 0.37%)
  >1s:                 83  ( 0.00%)

After:

QUERY LATENCY DISTRIBUTION:
  Min:         0.02 ms
  Avg:        16.02 ms
  P50:         1.87 ms
  P90:        46.59 ms
  P95:        66.11 ms
  P99:       136.19 ms
  P99.9:     531.46 ms
  P99.99:    670.21 ms
  P99.999:   730.11 ms
  Max:       773.12 ms
  StdDev:     39.83 ms

LATENCY HISTOGRAM:
  0-1ms:          2187730  (41.05%)
  1-2ms:           517785  ( 9.71%)
  2-5ms:           595206  (11.17%)
  5-10ms:          368357  ( 6.91%)
  10-20ms:         385286  ( 7.23%)
  20-50ms:         804271  (15.09%)
  50-100ms:        373314  ( 7.00%)
  100-200ms:        60688  ( 1.14%)
  200-500ms:        29486  ( 0.55%)
  500ms-1s:          7630  ( 0.14%)

The numbers above are from a test with ParallelGC. If predictable latency is more important than raw throughput, switching to ZGC has a positive effect too.

@jerrinot jerrinot added the Performance Performance improvements label Nov 12, 2025
@coderabbitai

coderabbitai bot commented Nov 12, 2025

Important

Review skipped

Auto reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

This PR introduces asynchronous memory unmapping support to QuestDB's Cairo engine. It adds a new configuration property cairo.file.async.munmap.enabled, implements an async munmap pipeline via MmapCache singleton with a dedicated worker job, and enforces platform-specific constraints (Windows validation). The feature conditionally enqueues munmap tasks to a consumer thread instead of performing direct unmapping in the critical path.
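
To make the hand-off described in the walkthrough concrete, here is a minimal sketch of the general pattern: hot-path threads enqueue an address and a single background consumer performs the expensive release. It is not the PR's code. It swaps in java.util.concurrent.ArrayBlockingQueue where the PR uses QuestDB's own ring queue and sequences, and every name in it (AsyncUnmapSketch, unmap, drain, the LongConsumer standing in for munmap()) is hypothetical.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.function.LongConsumer;

/**
 * Illustrative only: a bounded hand-off queue between threads that release
 * mappings and one background consumer. All names are hypothetical.
 */
final class AsyncUnmapSketch {
    private final ArrayBlockingQueue<Long> queue;
    private final LongConsumer unmapper; // stands in for the expensive munmap() call

    AsyncUnmapSketch(int capacity, LongConsumer unmapper) {
        this.queue = new ArrayBlockingQueue<>(capacity);
        this.unmapper = unmapper;
    }

    /** Hot path: defer the release if there is room; returns true when deferred. */
    boolean unmap(long address) {
        if (queue.offer(address)) {
            return true;
        }
        unmapper.accept(address); // queue full: release synchronously instead
        return false;
    }

    /** Background job: release everything queued; returns true if any work was done. */
    boolean drain() {
        boolean useful = false;
        Long address;
        while ((address = queue.poll()) != null) {
            unmapper.accept(address);
            useful = true;
        }
        return useful;
    }
}
```

A worker would call drain() periodically and use the boolean result to decide whether to back off. The non-blocking offer() keeps the hot path from ever waiting on the consumer.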

Changes

| Cohort / File(s) | Summary |
|---|---|
| Configuration property registration: core/src/main/java/io/questdb/PropertyKey.java | New enum constant CAIRO_FILE_ASYNC_MUNMAP_ENABLED for property key registration. |
| Configuration interface and implementations: core/src/main/java/io/questdb/cairo/CairoConfiguration.java, core/src/main/java/io/questdb/cairo/CairoConfigurationWrapper.java, core/src/main/java/io/questdb/cairo/DefaultCairoConfiguration.java | Added getAsyncMunmapEnabled() accessor; implemented via delegation in wrapper and returns false by default. |
| Server configuration: core/src/main/java/io/questdb/PropServerConfiguration.java | Added asyncMunmapEnabled field wired from PropertyKey, Windows platform validation in constructor, and public getter via PropCairoConfiguration. |
| Async munmap core implementation: core/src/main/java/io/questdb/std/Files.java, core/src/main/java/io/questdb/std/MmapCache.java | Files: added ASYNC_MUNMAP_ENABLED flag and getMmapCache() accessor. MmapCache: converted to singleton, added asyncMunmap() method with dedicated task queue, consumer, and async pipeline for munmap operations. |
| Worker pool and server integration: core/src/main/java/io/questdb/mp/WorkerPoolUtils.java, core/src/main/java/io/questdb/ServerMain.java | New setupAsyncMunmapJob() in WorkerPoolUtils (reads engine config, validates POSIX, schedules job). New AsyncMunmapJob inner class in ServerMain extending SynchronizedJob to invoke cache.asyncMunmap() periodically. |
| Test infrastructure and coverage: core/src/test/java/io/questdb/test/ServerMainTest.java, core/src/test/java/io/questdb/test/cairo/fuzz/AbstractFuzzTest.java, core/src/test/java/io/questdb/test/cairo/fuzz/WalWriterFuzzTest.java, core/src/test/java/io/questdb/test/std/FilesCacheFuzzTest.java, core/src/test/java/io/questdb/test/tools/TestUtils.java | Added async munmap test case, integrated setup call in fuzz tests, parameterized test flags, and async unmap calls in leak checks. |

Sequence Diagram

sequenceDiagram
    participant Worker as AsyncMunmapJob
    participant Cache as MmapCache
    participant Queue as munmapTaskQueue
    participant Consumer as MunmapTask Consumer

    loop Periodic Execution
        Worker->>Cache: asyncMunmap()
        alt ASYNC_MUNMAP_ENABLED
            Cache->>Queue: drain & process enqueued tasks
            Queue->>Consumer: consume MunmapTask
            Consumer->>Consumer: perform munmap() + memory accounting
            Consumer-->>Cache: return unmapped count
        else Direct Path
            Cache->>Cache: fall back to direct munmap
        end
        Cache-->>Worker: return boolean (work done)
    end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Core logic complexity: MmapCache async pipeline with task queue, consumer, and fallback paths requires careful verification of memory safety and race condition handling.
  • Configuration boilerplate: Multiple delegation/override patterns across CairoConfiguration implementations are repetitive but straightforward.
  • Worker pool scheduling: Platform validation and conditional setup logic in WorkerPoolUtils needs review for POSIX assertions.
  • Files to focus on:
    • MmapCache.java — async pipeline implementation, task consumer logic, and async/sync path branching
    • PropServerConfiguration.java — Windows validation enforcement and field wiring
    • WorkerPoolUtils.java — setupAsyncMunmapJob conditional logic and POSIX checks

Possibly related PRs

Suggested labels

Core

Suggested reviewers

  • mtopolnik
  • puzpuzpuz
  • jerrinot

Pre-merge checks and finishing touches

❌ Failed checks (1 warning, 1 inconclusive)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 3.57% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |
| Description check | ❓ Inconclusive | No pull request description was provided by the author; the description section is empty. | Provide a description explaining the motivation, approach, and impact of moving munmap() operations to a background job. |

✅ Passed checks (1 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly and concisely describes the main change: moving munmap() to a background job to improve SQL latency, which aligns with the changeset. |

@jerrinot
Contributor Author

@coderabbitai please review

@jerrinot jerrinot marked this pull request as draft November 12, 2025 10:30
@coderabbitai

coderabbitai bot commented Nov 12, 2025

@jerrinot: I'll review the changes in this PR for you.

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
core/src/main/java/io/questdb/cairo/TableReader.java (1)

1352-1406: Update line 531 to handle the return value intentionally.

The method signature change to return boolean now requires explicit handling at all call sites. Line 531 in goActiveAtTxn() discards the return value without explanation, while line 627 in reload() captures it as fastPath for performance diagnostics. Either capture the return value for logging (matching the reload() pattern) or add a suppression comment explaining why it's intentionally ignored.

🧹 Nitpick comments (9)
core/src/main/java/io/questdb/std/MmapCache.java (2)

339-350: Queue-full fallback is correct; add telemetry to see pressure.

The sync fallback when next() returns no slot is good. Consider incrementing a counter or log at a throttled rate to observe sustained queue pressure.
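
One way the suggested telemetry could look is sketched below: a cheap counter incremented on every synchronous fallback, plus a time-based throttle so that sustained pressure produces at most one log line per interval. The class and method names are hypothetical and not part of QuestDB. The caller passes the current time (for example System.nanoTime()) so the throttle is easy to test.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.LongAdder;

/** Illustrative only: counts queue-full fallbacks and rate-limits reporting. */
final class FallbackPressure {
    private static final long NEVER = Long.MIN_VALUE;

    private final LongAdder fallbacks = new LongAdder();
    private final AtomicLong lastReportNanos = new AtomicLong(NEVER);
    private final long minIntervalNanos;

    FallbackPressure(long minIntervalNanos) {
        this.minIntervalNanos = minIntervalNanos;
    }

    /** Records one synchronous fallback; returns true when the caller should log now. */
    boolean recordFallback(long nowNanos) {
        fallbacks.increment();
        long last = lastReportNanos.get();
        boolean due = last == NEVER || nowNanos - last >= minIntervalNanos;
        // The CAS lets exactly one of several racing threads win the right to log.
        return due && lastReportNanos.compareAndSet(last, nowNanos);
    }

    /** Total number of fallbacks recorded so far. */
    long total() {
        return fallbacks.sum();
    }
}
```

LongAdder keeps the increment cheap under contention, which matters here because the fallback path runs on latency-sensitive threads.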


39-45: Make capacity configurable (optional).

Hardcoded MUNMAP_QUEUE_CAPACITY (8k) may be too small/large depending on workload. Consider a config key (e.g., cairo.file.async.munmap.queue.capacity) with a sensible default.
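
A minimal sketch of what such a setting might involve, assuming the key name from the example above and a default of 8192 entries (taken from the "8k" mentioned in this comment). It uses plain java.util.Properties rather than QuestDB's configuration classes, and rounds the requested value up to a power of two on the assumption that the queue requires one. The class name and the upper bound are hypothetical.

```java
import java.util.Properties;

/** Illustrative only: resolves a queue capacity from properties with a default. */
final class QueueCapacityConfig {
    // Key name as floated in the review comment; not an existing QuestDB property.
    static final String KEY = "cairo.file.async.munmap.queue.capacity";
    static final int DEFAULT_CAPACITY = 8 * 1024; // assumed from the "8k" in the review
    static final int MAX_CAPACITY = 1 << 30;      // arbitrary upper bound for the sketch

    static int resolve(Properties props) {
        String raw = props.getProperty(KEY);
        if (raw == null || raw.trim().isEmpty()) {
            return DEFAULT_CAPACITY;
        }
        int requested = Integer.parseInt(raw.trim());
        if (requested < 1 || requested > MAX_CAPACITY) {
            throw new IllegalArgumentException(KEY + " out of range: " + requested);
        }
        // Round up to the next power of two (values that already are one stay unchanged).
        return requested == 1 ? 1 : Integer.highestOneBit(requested - 1) << 1;
    }
}
```

With this rounding, a requested capacity of 5000 resolves to 8192, while 16384 stays as it is.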

core/src/main/java/io/questdb/cairo/TableReader.java (3)

614-636: Slow reload logging: consider configurable threshold.

50ms is hardcoded. Expose via configuration (e.g., reader.slowReloadLogMs) to tune per deployment.


627-636: Slow reconcile logging: same threshold configurability applies.

Apply the same configurable threshold to reconcile logs.


817-835: Per-column close timing: good; consider throttled logging.

Logging only when >50ms is good. Ensure logger rate limiting is applied elsewhere to avoid bursts.

core/src/main/java/io/questdb/Bootstrap.java (1)

213-217: Flag wiring LGTM; consider surfacing in config report.

Add an advisory log line in reportValidateConfig() to show async munmap state for operators.

@@ void reportValidateConfig() {
-        if (cairoConfig != null) {
+        if (cairoConfig != null) {
             log.advisoryW().$(" - attach partition suffix: ").$(cairoConfig.getAttachPartitionSuffix()).$();
             log.advisoryW().$(" - open database [").$uuid(cairoConfig.getDatabaseIdLo(), cairoConfig.getDatabaseIdHi()).I$();
+            log.advisoryW().$(" - async munmap: ").$(cairoConfig.getAsyncMunmapEnabled()).$();
core/src/main/java/io/questdb/std/Files.java (2)

72-72: Consider volatile for the feature flag (optional).

If toggled post-bootstrap (even accidentally), visibility across threads isn’t guaranteed. Marking volatile is a cheap safety net.

-    public static boolean ASYNC_MUNMAP_ENABLED = false;
+    public static volatile boolean ASYNC_MUNMAP_ENABLED = false;

233-235: Avoid exposing the whole MmapCache; provide a narrow façade.

Expose a convenience Files.asyncMunmapOnce() instead of returning the cache to reduce accidental misuse.

-    public static MmapCache getMmapCache() {
-        return mmapCache;
-    }
+    public static MmapCache getMmapCache() {
+        return mmapCache; // keep for now if used elsewhere
+    }
+
+    // Convenience façade for the background job
+    public static boolean asyncMunmapOnce() {
+        return mmapCache.asyncMunmap();
+    }
core/src/main/java/io/questdb/ServerMain.java (1)

299-303: Verify the pool selection for async munmap operations.

The TODO comment raises a valid concern about using the shared query pool for munmap operations. During high query load, this pool may have significant contention, potentially defeating the purpose of offloading munmap to improve tail latency.

Consider whether:

  • The shared query pool has sufficient capacity for this additional job
  • A dedicated pool or the shared write pool would be more appropriate
  • Munmap operations might interfere with query execution latency

You might want to run performance tests under high query load to validate this choice.

Minor: The TODO comment uses lowercase "todo:" instead of the conventional "TODO:" format.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 873ab5d and fa0f0d8.

📒 Files selected for processing (10)
  • core/src/main/java/io/questdb/Bootstrap.java (1 hunks)
  • core/src/main/java/io/questdb/PropServerConfiguration.java (3 hunks)
  • core/src/main/java/io/questdb/PropertyKey.java (1 hunks)
  • core/src/main/java/io/questdb/ServerMain.java (3 hunks)
  • core/src/main/java/io/questdb/cairo/CairoConfiguration.java (1 hunks)
  • core/src/main/java/io/questdb/cairo/CairoConfigurationWrapper.java (1 hunks)
  • core/src/main/java/io/questdb/cairo/DefaultCairoConfiguration.java (1 hunks)
  • core/src/main/java/io/questdb/cairo/TableReader.java (9 hunks)
  • core/src/main/java/io/questdb/std/Files.java (2 hunks)
  • core/src/main/java/io/questdb/std/MmapCache.java (3 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (30)
  • GitHub Check: New pull request (Rust Test and Lint on linux-jdk17)
  • GitHub Check: New pull request (Hosted Running tests on windows-other-2)
  • GitHub Check: New pull request (Hosted Running tests on windows-other-1)
  • GitHub Check: New pull request (Hosted Running tests on windows-pgwire)
  • GitHub Check: New pull request (Hosted Running tests on windows-cairo-2)
  • GitHub Check: New pull request (Hosted Running tests on windows-cairo-1)
  • GitHub Check: New pull request (Hosted Running tests on windows-fuzz2)
  • GitHub Check: New pull request (SelfHosted Running tests with cover on linux-other)
  • GitHub Check: New pull request (Hosted Running tests on windows-fuzz1)
  • GitHub Check: New pull request (SelfHosted Running tests with cover on linux-pgwire)
  • GitHub Check: New pull request (Hosted Running tests on windows-griffin-sub)
  • GitHub Check: New pull request (SelfHosted Running tests with cover on linux-cairo-sub)
  • GitHub Check: New pull request (Hosted Running tests on windows-griffin-base)
  • GitHub Check: New pull request (SelfHosted Running tests with cover on linux-cairo-root)
  • GitHub Check: New pull request (Hosted Running tests on mac-other)
  • GitHub Check: New pull request (SelfHosted Running tests with cover on linux-fuzz2)
  • GitHub Check: New pull request (Hosted Running tests on mac-pgwire)
  • GitHub Check: New pull request (SelfHosted Running tests with cover on linux-fuzz1)
  • GitHub Check: New pull request (Hosted Running tests on mac-cairo-fuzz)
  • GitHub Check: New pull request (SelfHosted Running tests with cover on linux-griffin-sub)
  • GitHub Check: New pull request (Hosted Running tests on mac-cairo)
  • GitHub Check: New pull request (SelfHosted Griffin And Fuzz tests on linux-arm64)
  • GitHub Check: New pull request (SelfHosted Running tests with cover on linux-griffin-root)
  • GitHub Check: New pull request (SelfHosted Griffin And Fuzz tests on linux-x64-zfs)
  • GitHub Check: New pull request (Hosted Running tests on mac-griffin)
  • GitHub Check: New pull request (SelfHosted Other tests on linux-x64-zfs)
  • GitHub Check: New pull request (SelfHosted Other tests on linux-arm64)
  • GitHub Check: New pull request (Check Changes Check changes)
  • GitHub Check: build
  • GitHub Check: build
🔇 Additional comments (14)
core/src/main/java/io/questdb/std/MmapCache.java (2)

330-337: Record pooling helper looks good.

Reuses pooled records correctly and initializes via of(...).


48-53: Consumer scheduler confirmed—no action needed.

Verification complete. AsyncMunmapJob exists (ServerMain.java:505), is scheduled to sharedPoolQuery when getAsyncMunmapEnabled() is true (line 299-302), and correctly calls asyncMunmap() (line 515). The configuration propagates through Bootstrap and Files. Sequences in MmapCache constructor are wired correctly as noted. Code is ready.

core/src/main/java/io/questdb/cairo/TableReader.java (3)

757-780: Deleted partition close: added timing/logs LGTM.

Includes thread and duration; helpful for hotspots.


802-809: Partition close timing/logs LGTM.

Clear visibility with thread and millis.


1727-1742: All call sites of reloadSlow(boolean) properly handled.

Verification confirms:

  • Line 615: return value captured and used for telemetry metrics
  • Line 148: return value ignored during constructor initialization (acceptable)
core/src/main/java/io/questdb/cairo/CairoConfiguration.java (1)

129-130: New config accessor LGTM.

API shape matches others.

core/src/main/java/io/questdb/cairo/CairoConfigurationWrapper.java (1)

104-108: Delegation LGTM.

Consistent with other getters.

core/src/main/java/io/questdb/cairo/DefaultCairoConfiguration.java (1)

134-137: Default returns false — LGTM.

Matches safe default behavior.

core/src/main/java/io/questdb/PropertyKey.java (1)

609-609: LGTM!

The new configuration property follows established naming conventions and patterns. The property key is appropriately positioned among other Cairo file configuration properties.

core/src/main/java/io/questdb/ServerMain.java (2)

50-50: LGTM!

The new imports are necessary for the AsyncMunmapJob implementation and are appropriately added.

Also applies to: 52-52


505-517: LGTM!

The AsyncMunmapJob implementation is clean and follows the established pattern for background jobs in this codebase:

  • Properly extends SynchronizedJob to serialize execution
  • Minimal, focused responsibility
  • Consistent with other jobs like EngineMaintenanceJob in terms of error handling approach
core/src/main/java/io/questdb/PropServerConfiguration.java (3)

149-149: LGTM: new async munmap flag field

Consistent naming and placement; final + private is appropriate.


2982-2985: LGTM: getter exposed via CairoConfiguration

Getter matches interface style and correctly returns the field.


1531-1531: API and property key plumbing verified—no issues found.

All three CairoConfiguration implementors (PropCairoConfiguration, DefaultCairoConfiguration, CairoConfigurationWrapper) properly declare and implement getAsyncMunmapEnabled(). The property key CAIRO_FILE_ASYNC_MUNMAP_ENABLED is defined in PropertyKey.java and correctly read in PropServerConfiguration.java (line 1530) with a sensible default of false. The value is properly wired through Bootstrap.java (line 214) to Files.ASYNC_MUNMAP_ENABLED and consumed by ServerMain.java (line 299) to initialize AsyncMunmapJob. The configuration chain is complete and correctly integrated.

@jerrinot
Contributor Author

@coderabbitai review again please

@coderabbitai

coderabbitai bot commented Nov 12, 2025

@jerrinot: I'll perform a full review of the changes in this PR again.

✅ Actions performed

Full review triggered.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

🧹 Nitpick comments (2)
core/src/main/java/io/questdb/std/Files.java (1)

233-235: Consider exposing specific operations rather than the cache instance.

Returning the internal mmapCache directly breaks encapsulation. Consider exposing only the specific operations needed by external components (e.g., asyncMunmap(), unmap()) rather than the entire cache instance.

core/src/main/java/io/questdb/std/MmapCache.java (1)

53-63: LGTM: ring buffer and sequences initialized correctly.

Capacity is a power of two; MP→SC barrier chaining looks right; asyncMunmap() delegates to SC consumer.

Rename munmapProducesSequence → munmapProducerSequence for clarity (typo).

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 873ab5d and 4a0ac7c.

📒 Files selected for processing (9)
  • core/src/main/java/io/questdb/Bootstrap.java (1 hunks)
  • core/src/main/java/io/questdb/PropServerConfiguration.java (3 hunks)
  • core/src/main/java/io/questdb/PropertyKey.java (1 hunks)
  • core/src/main/java/io/questdb/ServerMain.java (4 hunks)
  • core/src/main/java/io/questdb/cairo/CairoConfiguration.java (1 hunks)
  • core/src/main/java/io/questdb/cairo/CairoConfigurationWrapper.java (1 hunks)
  • core/src/main/java/io/questdb/cairo/DefaultCairoConfiguration.java (1 hunks)
  • core/src/main/java/io/questdb/std/Files.java (2 hunks)
  • core/src/main/java/io/questdb/std/MmapCache.java (4 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: build
  • GitHub Check: build
🔇 Additional comments (11)
core/src/main/java/io/questdb/cairo/CairoConfigurationWrapper.java (1)

104-107: LGTM!

The delegation pattern is consistent with other methods in this wrapper class.

core/src/main/java/io/questdb/Bootstrap.java (1)

214-214: Initialization follows existing pattern.

This initialization mirrors the existing FS_CACHE_ENABLED pattern on line 213, maintaining consistency. However, note the concerns raised in the Files.java review regarding mutable static state.

core/src/main/java/io/questdb/cairo/CairoConfiguration.java (1)

129-129: LGTM!

The interface method addition is straightforward and follows the existing contract pattern.

core/src/main/java/io/questdb/cairo/DefaultCairoConfiguration.java (1)

134-137: LGTM!

The default value of false is conservative and safe, aligning with the feature opt-in approach.

core/src/main/java/io/questdb/PropertyKey.java (1)

609-609: LGTM!

The property key follows the established naming convention and is positioned appropriately in the enum.

core/src/main/java/io/questdb/PropServerConfiguration.java (3)

149-149: LGTM: config flag added.

Field introduction is correct and immutable.


1531-1531: LGTM: property wiring.

Reads CAIRO_FILE_ASYNC_MUNMAP_ENABLED with sane default (false).

Please confirm PropertyKey.CAIRO_FILE_ASYNC_MUNMAP_ENABLED is defined and exported in PropertyKey and that Bootstrap initializes Files.ASYNC_MUNMAP_ENABLED from it.


2982-2986: LGTM: API surfaced in Cairo configuration.

Getter placement and visibility are consistent with other flags.

core/src/main/java/io/questdb/ServerMain.java (2)

191-197: LGTM: safe helper.

Returns 0 when HTTP not started; avoids NPE.


504-516: Minor: guard against initialization races.

If Files.getMmapCache() can return null early, this NPEs. If Files guarantees a non-null singleton here, ignore; otherwise add a null check or lazy retrieval in runSerially().

core/src/main/java/io/questdb/std/MmapCache.java (1)

350-372: Good non-blocking fallback; one note.

When enqueue fails (queue full), falling back to synchronous munmap with exception parity is correct. This relies on the consumer actually running; see ServerMain fix to ensure the job is scheduled on an active pool.

Consider exporting a lightweight gauge for queue depth to spot sustained backlogs.

@jerrinot jerrinot marked this pull request as ready for review November 12, 2025 14:31
@jerrinot jerrinot changed the title perf(core): improve tail latency by moving munmap to a backgorund job perf(core): improve tail latency by moving munmap() to a background job Nov 12, 2025
@jerrinot
Contributor Author

unrelated test failure, fixed by #6389

Collaborator

@ideoma ideoma left a comment

Needs more testing. Please make the new config option random-driven in WalWriterFuzzTest

@bluestreak01 bluestreak01 changed the title perf(core): improve tail latency by moving munmap() to a background job perf(sql): improve SQL latency by moving munmap() to a background job Nov 15, 2025
@bluestreak01
Member

@CodeRabbit review the additional testing (and changes), and also the other reviewers' comments. I am concerned about assert Os.posix(). How is that passing on Windows?

@coderabbitai

coderabbitai bot commented Nov 15, 2025

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash

# Search for Os.posix() usage in the repository
echo "=== Searching for Os.posix() usage ==="
rg -n "Os\.posix\(\)" -C 5

echo -e "\n=== Looking at MmapCache.java in detail ==="
cat core/src/main/java/io/questdb/std/MmapCache.java

echo -e "\n=== Searching for test files related to MmapCache ==="
fd -e java -t f "MmapCache.*Test" --exec echo {} \;

echo -e "\n=== Searching for async munmap in test files ==="
rg -n "asyncMunmap|AsyncMunmap" --type java -g "*Test*.java" -C 2

Length of output: 22907


🏁 Script executed:

#!/bin/bash

# Search for Os.isPosix() usage
echo "=== Searching for Os.isPosix() usage ==="
rg -n "Os\.isPosix\(\)" -C 3

# Look at the specific test file mentioned
echo -e "\n=== Looking at WalWriterFuzzTest.java around async munmap ==="
cat core/src/test/java/io/questdb/test/cairo/fuzz/WalWriterFuzzTest.java | head -n 700 | tail -n 100

# Check if there are assertions about POSIX in the codebase
echo -e "\n=== Searching for assertions with posix ==="
rg -n "assert.*[Pp]osix|Assert.*[Pp]osix" -C 2

# Look at Os.java to understand isPosix()
echo -e "\n=== Looking at Os.java implementation ==="
fd -e java -t f "^Os.java$" --exec cat {} \;

# Check ServerMain AsyncMunmapJob implementation
echo -e "\n=== Looking at ServerMain AsyncMunmapJob ==="
rg -n "class AsyncMunmapJob|AsyncMunmapJob" --type java -A 15 -B 2

# Check WorkerPoolUtils for async munmap setup
echo -e "\n=== Looking at WorkerPoolUtils async munmap setup ==="
cat core/src/main/java/io/questdb/mp/WorkerPoolUtils.java

Length of output: 34606


@bluestreak01: I'll review the additional testing and changes, and address your concern about the assert Os.isPosix() on Windows.

Analysis of assert Os.isPosix() Concern

The assertion at line 53 in WorkerPoolUtils.java is indeed a valid concern. Here's what's happening:

How it currently works on Windows:

  1. The assertion is inside the if (config.getAsyncMunmapEnabled()) block
  2. By default, getAsyncMunmapEnabled() returns false (from DefaultCairoConfiguration)
  3. Tests explicitly check Os.isPosix() before enabling async munmap (e.g., WalWriterFuzzTest line 644: boolean newAsyncMunmapEnabled = Os.isPosix() && rnd.nextBoolean())
  4. On Windows with default config: The assertion never executes because the if-condition is false
  5. On Windows if misconfigured: The assertion would fire, catching the error

The Problem:

Java assertions are disabled by default and only run when the JVM is started with -ea; they can also be switched off explicitly with -da. If someone enables async munmap on Windows AND the JVM runs without assertions, the code would attempt to use POSIX-specific functionality that doesn't work on Windows.

Recommendation:

Replace the assertion with a runtime check that throws an exception:

public static void setupAsyncMunmapJob(WorkerPool pool, CairoEngine engine) {
    CairoConfiguration config = engine.getConfiguration();
    if (config.getAsyncMunmapEnabled()) {
        if (!Os.isPosix()) {
            throw new UnsupportedOperationException("Async munmap is only supported on POSIX systems");
        }
        Files.ASYNC_MUNMAP_ENABLED = true;
        ServerMain.AsyncMunmapJob asyncMunmapJob = new ServerMain.AsyncMunmapJob();
        pool.assign(asyncMunmapJob);
    } else {
        Files.ASYNC_MUNMAP_ENABLED = false;
    }
}

This would provide stronger protection against misconfiguration in production environments.
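
The difference between the two kinds of guard can be seen in isolation with the small sketch below. It takes the platform as a boolean parameter instead of calling Os.isPosix(), so it runs anywhere. The class and method names are hypothetical. The assertionsEnabled() helper uses the common idiom of an assignment inside an assert to detect whether the JVM was started with -ea.

```java
/** Illustrative only: contrasts an assert-based guard with a runtime guard. */
final class PlatformGuardSketch {

    /** Assertion-based check: a no-op unless the JVM runs with -ea. */
    static boolean assertGuard(boolean isPosix) {
        assert isPosix : "async munmap requires POSIX";
        return true;
    }

    /** Runtime check: enforced regardless of JVM flags. */
    static void runtimeGuard(boolean isPosix) {
        if (!isPosix) {
            throw new UnsupportedOperationException(
                    "Async munmap is only supported on POSIX systems");
        }
    }

    /** Detects whether assertions are enabled for this class. */
    static boolean assertionsEnabled() {
        boolean enabled = false;
        assert enabled = true; // the assignment only executes with -ea
        return enabled;
    }
}
```

Without -ea, assertGuard(false) simply returns true, whereas runtimeGuard(false) always throws.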

Test Coverage:

The test coverage looks appropriate:

  • FilesCacheFuzzTest parameterizes based on Os.isPosix()
  • WalWriterFuzzTest explicitly guards with Os.isPosix() && rnd.nextBoolean()
  • ServerMainTest.testAsyncMunmap() tests the feature

I'll now perform a full review of all the changes.

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@bluestreak01
Member

@jerrinot this is a user-facing change (perf). Could you add a PR description so that it can be documented in the release notes?

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (2)
core/src/main/java/io/questdb/std/Files.java (1)

72-72: Make ASYNC_MUNMAP_ENABLED safely visible across threads

Public mutable static without visibility guarantees risks stale reads. At minimum, mark it volatile.

-    public static boolean ASYNC_MUNMAP_ENABLED = false;
+    public static volatile boolean ASYNC_MUNMAP_ENABLED = false;
core/src/main/java/io/questdb/ServerMain.java (1)

500-512: Consider moving AsyncMunmapJob to a top‑level class

Keeps ServerMain lean and aligns with other job classes. Functional change not required.

🧹 Nitpick comments (3)
core/src/test/java/io/questdb/test/tools/TestUtils.java (1)

2529-2529: Drain async munmap fully in LeakCheck

Ensure the queue is empty before/after tests; loop until idle.

-            Files.getMmapCache().asyncMunmap();
+            while (Files.getMmapCache().asyncMunmap()) { /* drain */ }
@@
-            Files.getMmapCache().asyncMunmap();
+            while (Files.getMmapCache().asyncMunmap()) { /* drain */ }

Also applies to: 2569-2570
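
Why a loop may matter can be shown with a toy model. The sketch below assumes, purely for illustration, that one pass of the job releases at most a fixed batch of entries; the thread does not say whether asyncMunmap() behaves that way. All names are hypothetical.

```java
import java.util.ArrayDeque;

/** Illustrative only: a job that processes a bounded batch per pass. */
final class DrainLoopSketch {
    private final ArrayDeque<Long> pending = new ArrayDeque<>();
    private final int batchSize;
    private int released;

    DrainLoopSketch(int batchSize) {
        this.batchSize = batchSize;
    }

    void enqueue(long address) {
        pending.add(address);
    }

    /** One pass: releases up to batchSize entries; returns true if anything was released. */
    boolean runOnce() {
        int n = 0;
        while (n < batchSize && pending.poll() != null) {
            n++;
        }
        released += n;
        return n > 0;
    }

    /** Repeats passes until one reports no work; returns the number of useful passes. */
    int drainFully() {
        int passes = 0;
        while (runOnce()) {
            passes++;
        }
        return passes;
    }

    int pendingCount() {
        return pending.size();
    }

    int releasedCount() {
        return released;
    }
}
```

Under that assumption a single runOnce() can leave entries behind, which is the situation the suggested while loop in the leak check guards against.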

core/src/test/java/io/questdb/test/std/FilesCacheFuzzTest.java (1)

69-72: Test toggling of async munmap — minor ergonomics

  • OS‑gated parameter matrix — good.
  • Consider moving the Files.ASYNC_MUNMAP_ENABLED assignment from the constructor to @Before and restoring the previous value in @After to reduce inter‑test coupling if runner parallelism is enabled. The current @AfterClass reset is fine if tests are strictly serial.

Also applies to: 74-84, 98-99
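
The save-and-restore idea can also be expressed without JUnit lifecycle methods. The sketch below is a different technique from the @Before/@After suggestion: it wraps the override in an AutoCloseable so that try-with-resources restores the previous value even if the test body throws. The static field is a hypothetical stand-in for a global switch such as Files.ASYNC_MUNMAP_ENABLED, and the class name is invented for the example.

```java
/** Illustrative only: scoped override of a global flag with automatic restore. */
final class FlagOverrideSketch implements AutoCloseable {
    // Hypothetical stand-in for a global switch such as Files.ASYNC_MUNMAP_ENABLED.
    static volatile boolean asyncEnabled = false;

    private final boolean previous;

    FlagOverrideSketch(boolean value) {
        this.previous = asyncEnabled; // remember what was set before the override
        asyncEnabled = value;
    }

    @Override
    public void close() {
        asyncEnabled = previous; // restore on scope exit, including on exceptions
    }
}
```

This only isolates tests that run one after another. A shared static flag still couples tests that run in parallel within the same JVM.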

core/src/main/java/io/questdb/mp/WorkerPoolUtils.java (1)

50-60: Replace assert with runtime guard for OS check

Assertions may be disabled; make the constraint explicit to avoid accidental enablement on non‑POSIX.

-        if (config.getAsyncMunmapEnabled()) {
-            assert Os.isPosix();
+        if (config.getAsyncMunmapEnabled()) {
+            if (!Os.isPosix()) {
+                throw new UnsupportedOperationException("Async munmap is not supported on non-POSIX systems");
+            }
             Files.ASYNC_MUNMAP_ENABLED = true;
             ServerMain.AsyncMunmapJob asyncMunmapJob = new ServerMain.AsyncMunmapJob();
             pool.assign(asyncMunmapJob);
         } else {
             Files.ASYNC_MUNMAP_ENABLED = false;
         }
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4a0ac7c and decb7d4.

📒 Files selected for processing (10)
  • core/src/main/java/io/questdb/PropServerConfiguration.java (3 hunks)
  • core/src/main/java/io/questdb/ServerMain.java (4 hunks)
  • core/src/main/java/io/questdb/mp/WorkerPoolUtils.java (2 hunks)
  • core/src/main/java/io/questdb/std/Files.java (2 hunks)
  • core/src/main/java/io/questdb/std/MmapCache.java (4 hunks)
  • core/src/test/java/io/questdb/test/ServerMainTest.java (2 hunks)
  • core/src/test/java/io/questdb/test/cairo/fuzz/AbstractFuzzTest.java (2 hunks)
  • core/src/test/java/io/questdb/test/cairo/fuzz/WalWriterFuzzTest.java (23 hunks)
  • core/src/test/java/io/questdb/test/std/FilesCacheFuzzTest.java (3 hunks)
  • core/src/test/java/io/questdb/test/tools/TestUtils.java (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • core/src/main/java/io/questdb/PropServerConfiguration.java
🔇 Additional comments (17)
core/src/test/java/io/questdb/test/cairo/fuzz/AbstractFuzzTest.java (1)

198-198: Async munmap job wiring — LGTM

Job is assigned before pool start; no side effects when disabled via config.

Also applies to: 215-215

core/src/test/java/io/questdb/test/ServerMainTest.java (2)

66-101: Async munmap bootstrap test — LGTM

Covers POSIX vs Windows paths and resets static state in finally.


457-457: Parameters expectation update — LGTM

Adding cairo.file.async.munmap.enabled with default false matches config exposure.
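
For illustration, the opt-in semantics (absent key means disabled) can be sketched with java.util.Properties. The property key is the one from the PR description; the class and method names are hypothetical, and this is not how PropServerConfiguration actually reads the value.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical reader for the opt-in flag; not the QuestDB implementation.
final class AsyncMunmapConfigSketch {
    static final String KEY = "cairo.file.async.munmap.enabled";

    private AsyncMunmapConfigSketch() {
    }

    // Parses server.conf-style text into Properties.
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // A missing key falls back to "false", mirroring the opt-in default.
    static boolean isAsyncMunmapEnabled(Properties props) {
        return Boolean.parseBoolean(props.getProperty(KEY, "false"));
    }
}
```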

core/src/main/java/io/questdb/std/Files.java (1)

233-236: Singleton mmap cache accessor — LGTM

Getter aligns usages and avoids multiple instances.

core/src/main/java/io/questdb/ServerMain.java (2)

191-196: Active connection count accessor — LGTM

Safe default to 0 when HTTP server is not initialized.


299-299: Initialize async munmap before query jobs — LGTM

Ordering ensures the consumer job is present prior to query workload.

core/src/test/java/io/questdb/test/cairo/fuzz/WalWriterFuzzTest.java (5)

28-28: LGTM: Clean addition of async munmap test infrastructure.

The new imports and static field are properly scoped for testing the async munmap feature. Initializing ASYNC_MUNMAP to false provides a safe default.

Also applies to: 35-35, 63-63


70-71: LGTM: Async munmap property correctly configured in test setup.

The property is set before AbstractCairoTest.setUpStatic(), ensuring the configuration takes effect during test infrastructure initialization.


75-79: LGTM: Proper test cleanup ensures isolation.

Resetting Files.ASYNC_MUNMAP_ENABLED to false in the @AfterClass method prevents state leakage between test classes and ensures proper test isolation.


642-658: LGTM: Platform-aware test parameterization with proper state management.

The method correctly guards async munmap enablement with Os.isPosix() (line 644), preventing Windows compatibility issues. Conditional reinitialization (lines 648-657) efficiently applies configuration changes only when parameters differ, improving test performance while maintaining correctness.


102-102: LGTM: Comprehensive test coverage of async munmap across fuzz test scenarios.

All parameterized tests consistently call setTestParams(rnd), ensuring the async munmap feature is exercised across diverse test scenarios with randomized configurations.

Also applies to: 162-162, 174-174, 185-185, 214-214, 266-266, 294-294, 304-304, 332-332, 362-362, 390-390, 400-400, 410-410, 447-447, 457-457, 468-468, 506-506, 536-536, 562-562, 617-617

core/src/main/java/io/questdb/std/MmapCache.java (6)

39-59: LGTM: Clean singleton pattern with properly initialized async munmap infrastructure.

The singleton pattern is appropriate for a shared cache. The ring queue with 8K capacity and properly chained producer/consumer sequences provides a solid foundation for async munmap operations. The concurrency primitives used (MPSequence, SCSequence, RingQueue) are platform-independent.


61-69: LGTM: Clear async munmap processing with documented threading constraints.

The method correctly drains all pending munmap tasks using consumeAll. The documentation clearly states the single-threaded requirement, which is enforced by the SCSequence (single-consumer) design. The boolean return value enables callers to optimize scheduling based on whether work was performed.
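
A minimal sketch of the drain-and-report contract, assuming a java.util.concurrent.ArrayBlockingQueue in place of QuestDB's RingQueue and sequences. DrainSketch and its methods are hypothetical names. Unlike SCSequence, ArrayBlockingQueue tolerates several consumers, so the single-thread rule here is only a convention carried over from the original.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.function.LongConsumer;

// Hypothetical stand-in for the munmap queue; only addresses are queued here.
final class DrainSketch {
    // Bounded queue of pending addresses, sized like the 8K ring queue.
    private final ArrayBlockingQueue<Long> pending = new ArrayBlockingQueue<>(8192);

    // Producer side: returns false when the queue is full.
    boolean offer(long address) {
        return pending.offer(address);
    }

    // Consumer side: intended to be called from one thread only.
    // Returns true if at least one task was processed, so a caller can decide
    // whether to poll again immediately or back off.
    boolean drain(LongConsumer unmapper) {
        boolean didWork = false;
        Long address;
        while ((address = pending.poll()) != null) {
            unmapper.accept(address);
            didWork = true;
        }
        return didWork;
    }
}
```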


336-348: LGTM: Proper async munmap error handling with errno logging.

The consumer correctly logs errno on failure (lines 341, 345), addressing previous review feedback. Memory accounting is only updated on success (lines 338-340), which is correct: a failed munmap means the mapping remains active. Logging at critical level is appropriate because a failure in the background job would otherwise go unnoticed.
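
The accounting rule can be sketched as follows. Everything here is an assumption made for illustration: the munmap call is modelled as a function returning 0 on success or an errno-like code, the counters are plain AtomicLongs, and the log line is printed to stderr rather than through QuestDB's logger.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongBinaryOperator;

// Hypothetical consumer-side accounting; not the MmapCache implementation.
final class MunmapAccountingSketch {
    private final AtomicLong mappedBytes = new AtomicLong();
    private final AtomicLong failures = new AtomicLong();

    void recordMap(long size) {
        mappedBytes.addAndGet(size);
    }

    long mappedBytes() {
        return mappedBytes.get();
    }

    long failures() {
        return failures.get();
    }

    // munmap: (address, size) -> 0 on success, non-zero errno-like code on failure (assumed).
    void consume(long address, long size, LongBinaryOperator munmap) {
        long rc = munmap.applyAsLong(address, size);
        if (rc == 0) {
            // Only a successful unmap releases the accounted bytes.
            mappedBytes.addAndGet(-size);
        } else {
            // The mapping is still live, so accounting stays untouched; just report it.
            failures.incrementAndGet();
            System.err.println("munmap failed [address=" + address + ", size=" + size + ", errno=" + rc + "]");
        }
    }
}
```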


359-388: LGTM: Well-designed async/sync munmap flow with appropriate fallbacks.

The implementation properly handles multiple scenarios:

Async path (lines 360-378):

  • CAS retry loop (lines 364-366) with Os.pause() prevents tight spinning while handling transient contention
  • Successful enqueue offloads munmap to background thread (lines 368-374)
  • Queue-full fallback to synchronous munmap (lines 375-377) ensures correctness under extreme load at the cost of latency benefit

Sync path (lines 379-388):

  • Throws exceptions on failure, providing immediate feedback to callers
  • Memory accounting updated only on success, consistent with async path

The asymmetry in error handling (async logs, sync throws) is appropriate given that async operations cannot propagate exceptions back to the original caller.
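
The enqueue-or-fall-back decision can be sketched with a bounded queue. This is a simplification under stated assumptions: ArrayBlockingQueue.offer replaces the CAS retry loop over MPSequence, a Task object is allocated per call (the ring queue in the PR reuses slots), and the inline path only counts calls instead of invoking munmap. All class and method names are hypothetical.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical dispatcher; not the MmapCache implementation.
final class MunmapDispatchSketch {
    // Task holder: address and size for munmap, memoryTag for accounting.
    static final class Task {
        final long address;
        final long size;
        final int memoryTag;

        Task(long address, long size, int memoryTag) {
            this.address = address;
            this.size = size;
            this.memoryTag = memoryTag;
        }
    }

    private final boolean asyncEnabled;
    private final ArrayBlockingQueue<Task> queue;
    private final AtomicInteger inlineUnmaps = new AtomicInteger();

    MunmapDispatchSketch(boolean asyncEnabled, int capacity) {
        this.asyncEnabled = asyncEnabled;
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    // Returns true when the task was handed to the background queue,
    // false when it was unmapped inline (async disabled or queue full).
    boolean unmap(long address, long size, int memoryTag) {
        if (asyncEnabled && queue.offer(new Task(address, size, memoryTag))) {
            return true;
        }
        unmapInline(address, size, memoryTag);
        return false;
    }

    int inlineUnmaps() {
        return inlineUnmaps.get();
    }

    int queued() {
        return queue.size();
    }

    private void unmapInline(long address, long size, int memoryTag) {
        // Placeholder for the real munmap call plus accounting; here we only count.
        inlineUnmaps.incrementAndGet();
    }
}
```

Falling back to the inline path when the queue is full trades the latency benefit for correctness: no unmap request is ever dropped.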


420-424: LGTM: Clean data holder for munmap tasks.

Simple, focused data structure containing exactly the fields needed for munmap operations (address, size) and memory accounting (memoryTag).


360-360: Platform-specific initialization is correctly implemented — no fixes required.

The configuration layer (PropServerConfiguration:1530-1533) validates the async munmap setting and throws a ServerConfigurationException if it is enabled on Windows. The setup layer (WorkerPoolUtils:52-54) asserts Os.isPosix() before enabling Files.ASYNC_MUNMAP_ENABLED. Tests guard async munmap behind Os.isPosix() (WalWriterFuzzTest:644). Windows compatibility is well-protected.
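
A configuration-level guard of this kind might look like the sketch below. It is an assumption-laden illustration: the OS is detected from an os.name-style string rather than QuestDB's Os class, and IllegalArgumentException stands in for ServerConfigurationException. Because it is a plain runtime check, it holds regardless of whether assertions are enabled.

```java
import java.util.Locale;

// Hypothetical validation helper; not the PropServerConfiguration code.
final class PlatformGuardSketch {
    private PlatformGuardSketch() {
    }

    static boolean isWindows(String osName) {
        return osName.toLowerCase(Locale.ROOT).contains("windows");
    }

    // Returns the effective setting, failing fast when async munmap is
    // requested on a platform that does not support it.
    static boolean resolveAsyncMunmap(boolean requested, String osName) {
        if (requested && isWindows(osName)) {
            throw new IllegalArgumentException(
                    "cairo.file.async.munmap.enabled is not supported on Windows");
        }
        return requested;
    }
}
```

A caller would typically pass System.getProperty("os.name") as the second argument.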

@glasstiger
Contributor

[PR Coverage check]

😍 pass : 60 / 73 (82.19%)

file detail

   path                                             covered lines  new lines  coverage
🔵 io/questdb/std/MmapCache.java                               34         45    75.56%
🔵 io/questdb/PropServerConfiguration.java                      3          4    75.00%
🔵 io/questdb/ServerMain.java                                   3          4    75.00%
🔵 io/questdb/std/AsyncMunmapJob.java                           4          4   100.00%
🔵 io/questdb/cairo/DefaultCairoConfiguration.java              1          1   100.00%
🔵 io/questdb/std/Files.java                                    4          4   100.00%
🔵 io/questdb/PropertyKey.java                                  1          1   100.00%
🔵 io/questdb/cairo/CairoConfigurationWrapper.java              1          1   100.00%
🔵 io/questdb/mp/WorkerPoolUtils.java                           9          9   100.00%

@bluestreak01 bluestreak01 merged commit 873c8c2 into master Nov 20, 2025
41 checks passed
@bluestreak01 bluestreak01 deleted the jh_async_munmap branch November 20, 2025 19:22

Labels

Performance Performance improvements


5 participants