
fix(wal): correct write amplification and dedup counts in tables() #7047

Merged: bluestreak01 merged 4 commits into master from fix-tables-amp-dedup-counter-staleness on May 7, 2026


@nwoolmer nwoolmer commented Apr 30, 2026

Summary

  • tables().table_write_amp_p50/p90/p99/max and tables().wal_dedup_row_count_since_start could report values much larger than the per-commit log line for the same workload -- sometimes by several orders of magnitude -- when a WAL apply job interleaved data writes with non-data transactions.
  • Two TableWriter counters (physicallyWrittenRowsSinceLastCommit, dedupRowsRemovedSinceLastCommit) were only reset by processWalBlock (and one of them also by the non-WAL commit()), so iterations of ApplyWal2TableJob.processWalCommit that took non-resetting branches re-read the previous iteration's value when accumulating into physicalRowsAdded, totalPhysicalRowCount, and the dedup count passed to recordWalProcessed.
  • The fix resets both counters at the start of every processWalCommit call so each iteration's reads only see that iteration's writes, regardless of branch.
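
The stale-read mechanism described in the bullets above can be reproduced in a few lines. The following is a minimal sketch, not QuestDB code: `SketchWriter`, `TxnKind`, and `applyJob` are hypothetical stand-ins, and the choice to have the data path reset the counter before writing is an assumption made only to keep the model small.

```java
import java.util.List;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical model of the bug; none of these types exist in QuestDB.
final class StaleCounterSketch {
    enum TxnKind { DATA, NO_OP }

    // Mimics a writer whose counter is reset only on the data path.
    static final class SketchWriter {
        final LongAdder physicallyWrittenRows = new LongAdder();

        void applyData(long rows) {
            physicallyWrittenRows.reset(); // assumed: the data path resets, then counts
            physicallyWrittenRows.add(rows);
        }

        void applyNoOp() {
            // Writes nothing and resets nothing, so the old value stays visible.
        }

        void resetWalApplyCounters() {
            physicallyWrittenRows.reset();
        }
    }

    // Accumulates the counter once per iteration, as the PR describes for the apply loop.
    static long applyJob(List<TxnKind> txns, long rowsPerDataTxn, boolean resetEachIteration) {
        SketchWriter writer = new SketchWriter();
        long physicalRowsAdded = 0;
        for (TxnKind kind : txns) {
            if (resetEachIteration) {
                writer.resetWalApplyCounters(); // the fix: clear before any branch runs
            }
            switch (kind) {
                case DATA:
                    writer.applyData(rowsPerDataTxn);
                    break;
                case NO_OP:
                    writer.applyNoOp();
                    break;
            }
            physicalRowsAdded += writer.physicallyWrittenRows.sum();
        }
        return physicalRowsAdded;
    }

    public static void main(String[] args) {
        List<TxnKind> txns = List.of(TxnKind.DATA, TxnKind.NO_OP, TxnKind.NO_OP);
        System.out.println(applyJob(txns, 100, false)); // prints 300
        System.out.println(applyJob(txns, 100, true));  // prints 100
    }
}
```

In this model each no-op iteration after a data write re-adds the previous 100 rows, so the over-count grows with the number of non-resetting iterations.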

Affected branches in processWalCommit

Each of the following could leave the counter stale for the next iteration's read when preceded by a successful data write:

  • DATA via trySkipWalTransactions (a future REPLACE_RANGE or TRUNCATE supersedes the current txn)
  • SQL UPDATE that matches no rows (no internal commit() triggered)
  • TRUNCATE (removeAllPartitions commits the txn file, not the writer)
  • MAT_VIEW_INVALIDATE

The dedup mode of the skipped data txn (DEFAULT, UPSERT_NEW, NO_DEDUP) does not change exposure -- only the data branch matters. The VIEW_DEFINITION branch also lacks an internal reset, but views never carry DATA transactions in their WAL, so the bug pattern cannot form there in normal operation; the unconditional reset still covers it defensively.

Side effects fixed by the same change

  • wal_apply_physically_written_rows Prometheus counter no longer over-counts on skip / no-op iterations.
  • sys.telemetry_wal.physicalRowCount is no longer attributed to skipped transactions.
  • The per-job ampl= log line now matches what the iteration's writes actually produced.

Tradeoffs

  • The fix adds a single reset call per WAL apply iteration. Both counters are LongAdders, so the reset is constant-time and runs on the WAL apply thread.
  • The new public TableWriter.resetWalApplyCounters() increases the writer's API surface by one method, used only from ApplyWal2TableJob.
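
For illustration, a reset method of this kind might look like the sketch below. Only the two counter names and the method name `resetWalApplyCounters()` come from this PR; the class `WalApplyCountersSketch`, `recordWrite`, and the getters are hypothetical, and the real `TableWriter` may differ.

```java
import java.util.concurrent.atomic.LongAdder;

// Hypothetical holder for the two counters; not the actual TableWriter.
final class WalApplyCountersSketch {
    private final LongAdder physicallyWrittenRowsSinceLastCommit = new LongAdder();
    private final LongAdder dedupRowsRemovedSinceLastCommit = new LongAdder();

    // Hypothetical hook standing in for whatever the write path records.
    void recordWrite(long physicalRows, long dedupRowsRemoved) {
        physicallyWrittenRowsSinceLastCommit.add(physicalRows);
        dedupRowsRemovedSinceLastCommit.add(dedupRowsRemoved);
    }

    long getPhysicallyWrittenRowsSinceLastCommit() {
        return physicallyWrittenRowsSinceLastCommit.sum();
    }

    long getDedupRowsRemovedSinceLastCommit() {
        return dedupRowsRemovedSinceLastCommit.sum();
    }

    // Clears both counters so the next reads reflect only subsequent writes.
    void resetWalApplyCounters() {
        physicallyWrittenRowsSinceLastCommit.reset();
        dedupRowsRemovedSinceLastCommit.reset();
    }
}
```

`LongAdder.reset()` is documented as only effective when there are no concurrent updates, which is consistent with the note above that the reset runs on the WAL apply thread.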

Test plan

New cases in RecentWriteTrackerIntegrationTest that each exercise one affected branch:

  • testWriteAmpNotInflatedByInterleavedUpdate -- UPDATE matching 0 rows
  • testWriteAmpNotInflatedByMatViewInvalidate -- MAT_VIEW_INVALIDATE
  • testWriteAmpNotInflatedBySkippedReplaceRange -- trySkipWalTransactions via REPLACE_RANGE
  • testWriteAmpNotInflatedBySkippedTruncate -- trySkipWalTransactions via TRUNCATE, cross-drain pooled-writer leakage
  • testDedupCountNotInflatedBySkippedReplaceRange -- dedup counter
  • testWriteAmpPercentilesMatchPerCommitValues -- baseline contrast

All five regression cases reproduce the inflated value (p50=2.0 or dedup count = 2x) before the fix and resolve to p50=1.0 / correct count after.
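
The p50=2.0 versus p50=1.0 figures follow from simple arithmetic. The sketch below assumes that an amplification sample is the ratio of physically written rows to logical rows for a job; that definition, the class name, and both helper methods are assumptions for illustration and are not taken from the QuestDB source.

```java
// Hypothetical arithmetic helper; the ratio definition is an assumption.
final class WriteAmpArithmeticSketch {
    // Amplification as physical rows per logical row; 0.0 when nothing was committed.
    static double amplification(long physicalRows, long logicalRows) {
        return logicalRows == 0 ? 0.0 : (double) physicalRows / logicalRows;
    }

    // One data write of `rows` rows followed by `staleReReads` iterations
    // that re-read the same counter value without a reset in between.
    static double sampleFor(long rows, int staleReReads) {
        long countedPhysicalRows = rows * (1L + staleReReads);
        return amplification(countedPhysicalRows, rows);
    }

    public static void main(String[] args) {
        System.out.println(sampleFor(1_000, 0));   // prints 1.0
        System.out.println(sampleFor(1_000, 1));   // prints 2.0
        System.out.println(sampleFor(1_000, 999)); // prints 1000.0
    }
}
```

Under that assumption, a single stale re-read doubles the sample, matching the 2.0 seen in the regression tests, and a long run of non-data iterations is enough to produce inflation of several orders of magnitude.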

coderabbitai Bot commented Apr 30, 2026

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 2fcdc01d-c1f1-4ffe-8716-c521144f2a5f


Walkthrough

This PR introduces a new TableWriter.resetWalApplyCounters() method to clear WAL apply metrics (physically written rows and deduplicated rows removed per commit), and invokes it at the start of each WAL transaction in ApplyWal2TableJob.processWalCommit(). The integration tests are expanded to verify that counter resets prevent stale metrics from accumulating across WAL iterations.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| WAL Apply Counters Reset: core/src/main/java/io/questdb/cairo/TableWriter.java, core/src/main/java/io/questdb/cairo/wal/ApplyWal2TableJob.java | Added resetWalApplyCounters() public method to clear per-commit WAL metrics, invoked at the start of processWalCommit() to prevent stale row/dedup counts from carrying into subsequent WAL transactions. |
| WAL Integration Tests: core/src/test/java/io/questdb/test/cairo/pool/RecentWriteTrackerIntegrationTest.java | Expanded test suite with multiple integration tests verifying amplification histogram percentiles for append-only inserts, UPDATE with no matches, and data-free operations, plus WAL-skip path coverage (TRUNCATE/REPLACE_RANGE) and dedup counter isolation across iterations. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested labels

storage

Suggested reviewers

  • bluestreak01
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 66.67% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly summarizes the main fix: correcting write amplification and dedup counts in the tables() function when WAL transactions are processed, which is the core change across all three modified files. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Description check | ✅ Passed | The pull request description comprehensively explains the bug (stale counter reads causing inflated write amplification and dedup counts), root cause, affected code branches, the fix, side effects, and test coverage with specific scenario names and expected outcomes. |

@nwoolmer nwoolmer added the Bug (Incorrect or unexpected behavior), WAL, and Core (Related to storage, data type, etc.) labels Apr 30, 2026
@nwoolmer (Contributor, Author) commented

@coderabbitai review

coderabbitai Bot commented Apr 30, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@core/src/test/java/io/questdb/test/cairo/pool/RecentWriteTrackerIntegrationTest.java`:
- Around line 777-798: The test in RecentWriteTrackerIntegrationTest currently
reads table_write_amp_count into the variable count but doesn't assert it;
update the assertion after reading the record (inside the try-with-resources
using select(...) and factory.getCursor(sqlExecutionContext)) to require count
== 0 (with a helpful failure message referencing the sink or table name) to
ensure no positive amplification sample was recorded for the skip+TRUNCATE case;
keep the existing max < 10.0 check but add or replace with an explicit
Assert.assertEquals/Assert.assertTrue that verifies count is zero using the same
Record r/getLong(0) variable.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: e3d346df-c728-4fc0-945f-044286d56090

📥 Commits

Reviewing files that changed from the base of the PR and between 0772c84 and c6de4ac.

📒 Files selected for processing (3)
  • core/src/main/java/io/questdb/cairo/TableWriter.java
  • core/src/main/java/io/questdb/cairo/wal/ApplyWal2TableJob.java
  • core/src/test/java/io/questdb/test/cairo/pool/RecentWriteTrackerIntegrationTest.java

ApplyWal2TableJob.processWalCommit reads two TableWriter counters per
iteration to attribute physical writes and dedup work:

- physicallyWrittenRowsSinceLastCommit accumulates into physicalRowsAdded
  for the per-job amplification sample, and into the WAL telemetry and
  Prometheus wal_apply_physically_written_rows counter.
- dedupRowsRemovedSinceLastCommit feeds recordWalProcessed and
  stats.dedupRowCount, surfaced in tables() as
  wal_dedup_row_count_since_start.

Both counters are LongAdders on TableWriter that previously only got
reset by processWalBlock (called from commitWalInsertTransactions) and,
for the physical-row counter only, by the non-WAL commit() path. Every
other branch reachable from processWalCommit returns to the per-iteration
reads without resetting:

- DATA via trySkipWalTransactions when a future REPLACE_RANGE or TRUNCATE
  supersedes the current transaction.
- SQL UPDATE that matches no rows (no internal commit triggered).
- TRUNCATE via removeAllPartitions (commits the txn file, not the writer).
- MAT_VIEW_INVALIDATE.
- VIEW_DEFINITION (defensive only - views never carry DATA, so the bug
  pattern cannot form there in normal operation).

When any of those iterations follows a successful data write in the same
WAL apply job, the prior iteration's counter values are re-read and added
again. The result is per-job amplifications and dedup counts that scale
with the number of non-data iterations, which is what produces the
thousand-fold P90 / P99 amplification spikes reported in tables() while
the per-commit log line keeps showing low values - the log line is
emitted once per job after the loop finishes, with the inflated total.

The fix resets both counters at the start of every processWalCommit
call. After the reset, both reads only see writes performed during the
current iteration, regardless of which branch executes.

Side effects covered by the same fix:
- wal_apply_physically_written_rows Prometheus counter no longer
  over-counts on skip / no-op iterations.
- sys.telemetry_wal physicalRowCount is no longer attributed to skipped
  transactions.
- The per-job ampl= log line now matches what the per-iteration writes
  actually produced.

Adds RecentWriteTrackerIntegrationTest cases that exercise each affected
branch individually:
- testWriteAmpNotInflatedByInterleavedUpdate (UPDATE matching 0 rows)
- testWriteAmpNotInflatedByMatViewInvalidate (MAT_VIEW_INVALIDATE)
- testWriteAmpNotInflatedBySkippedReplaceRange (trySkipWalTransactions)
- testWriteAmpNotInflatedBySkippedTruncate (TRUNCATE skip, cross-drain
  pooled-writer leakage - reproduces the customer-reported magnitude)
- testDedupCountNotInflatedBySkippedReplaceRange (dedup counter)
- testWriteAmpPercentilesMatchPerCommitValues (baseline)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@nwoolmer nwoolmer force-pushed the fix-tables-amp-dedup-counter-staleness branch from c6de4ac to 13bbb47 Compare April 30, 2026 18:21
The IntelliJ formatter ran in CI and flattened over-indented
continuation lines in two javadoc blocks. The "List of required
changes" precheck (git diff --exit-code) failed and skipped the rest
of the Rust Test and Lint job. Apply the same diff locally so the
formatter precheck passes and the Rust suite can run.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@bluestreak01 bluestreak01 left a comment

Approved. Five minor nits flagged in review (SQL string underscores in test literals, doc comment for testDedupCountNotInflatedBySkippedReplaceRange, unasserted count, "iter" abbreviation, optional javadoc on resetWalApplyCounters) — all non-blocking.

@mtopolnik (Contributor) commented

[PR Coverage check]

😍 pass : 4 / 4 (100.00%)

file detail

| path | covered line | new line | coverage |
| --- | --- | --- | --- |
| 🔵 io/questdb/cairo/TableWriter.java | 3 | 3 | 100.00% |
| 🔵 io/questdb/cairo/wal/ApplyWal2TableJob.java | 1 | 1 | 100.00% |

@bluestreak01 bluestreak01 merged commit f92fdba into master May 7, 2026
51 checks passed
@bluestreak01 bluestreak01 deleted the fix-tables-amp-dedup-counter-staleness branch May 7, 2026 19:51