fix(wal): correct write amplification and dedup counts in tables()#7047
Conversation
|
Important Review skippedAuto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
WalkthroughThis PR introduces a new Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@core/src/test/java/io/questdb/test/cairo/pool/RecentWriteTrackerIntegrationTest.java`:
- Around line 777-798: The test in RecentWriteTrackerIntegrationTest currently
reads table_write_amp_count into the variable count but doesn't assert it;
update the assertion after reading the record (inside the try-with-resources
using select(...) and factory.getCursor(sqlExecutionContext)) to require count
== 0 (with a helpful failure message referencing the sink or table name) to
ensure no positive amplification sample was recorded for the skip+TRUNCATE case;
keep the existing max < 10.0 check but add or replace with an explicit
Assert.assertEquals/Assert.assertTrue that verifies count is zero using the same
Record r/getLong(0) variable.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: e3d346df-c728-4fc0-945f-044286d56090
📒 Files selected for processing (3)
core/src/main/java/io/questdb/cairo/TableWriter.javacore/src/main/java/io/questdb/cairo/wal/ApplyWal2TableJob.javacore/src/test/java/io/questdb/test/cairo/pool/RecentWriteTrackerIntegrationTest.java
ApplyWal2TableJob.processWalCommit reads two TableWriter counters per iteration to attribute physical writes and dedup work: - physicallyWrittenRowsSinceLastCommit accumulates into physicalRowsAdded for the per-job amplification sample, and into the WAL telemetry and Prometheus wal_apply_physically_written_rows counter. - dedupRowsRemovedSinceLastCommit feeds recordWalProcessed and stats.dedupRowCount, surfaced in tables() as wal_dedup_row_count_since_start. Both counters are LongAdders on TableWriter that previously only got reset by processWalBlock (called from commitWalInsertTransactions) and, for the physical-row counter only, by the non-WAL commit() path. Every other branch reachable from processWalCommit returns to the per-iteration reads without resetting: - DATA via trySkipWalTransactions when a future REPLACE_RANGE or TRUNCATE supersedes the current transaction. - SQL UPDATE that matches no rows (no internal commit triggered). - TRUNCATE via removeAllPartitions (commits the txn file, not the writer). - MAT_VIEW_INVALIDATE. - VIEW_DEFINITION (defensive only - views never carry DATA, so the bug pattern cannot form there in normal operation). When any of those iterations follows a successful data write in the same WAL apply job, the prior iteration's counter values are re-read and added again. The result is per-job amplifications and dedup counts that scale with the number of non-data iterations, which is what produces the thousand-fold P90 / P99 amplification spikes reported in tables() while the per-commit log line keeps showing low values - the log line is emitted once per job after the loop finishes, with the inflated total. The fix resets both counters at the start of every processWalCommit call. After the reset, both reads only see writes performed during the current iteration, regardless of which branch executes. Side effects covered by the same fix: - wal_apply_physically_written_rows Prometheus counter no longer over-counts on skip / no-op iterations. - sys.telemetry_wal physicalRowCount is no longer attributed to skipped transactions. - The per-job ampl= log line now matches what the per-iteration writes actually produced. Adds RecentWriteTrackerIntegrationTest cases that exercise each affected branch individually: - testWriteAmpNotInflatedByInterleavedUpdate (UPDATE matching 0 rows) - testWriteAmpNotInflatedByMatViewInvalidate (MAT_VIEW_INVALIDATE) - testWriteAmpNotInflatedBySkippedReplaceRange (trySkipWalTransactions) - testWriteAmpNotInflatedBySkippedTruncate (TRUNCATE skip, cross-drain pooled-writer leakage - reproduces the customer-reported magnitude) - testDedupCountNotInflatedBySkippedReplaceRange (dedup counter) - testWriteAmpPercentilesMatchPerCommitValues (baseline) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
c6de4ac to
13bbb47
Compare
The IntelliJ formatter ran in CI and flattened over-indented continuation lines in two javadoc blocks. The "List of required changes" precheck (git diff --exit-code) failed and skipped the rest of the Rust Test and Lint job. Apply the same diff locally so the formatter precheck passes and the Rust suite can run. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
bluestreak01
left a comment
There was a problem hiding this comment.
Approved. Five minor nits flagged in review (SQL string underscores in test literals, doc comment for testDedupCountNotInflatedBySkippedReplaceRange, unasserted count, "iter" abbreviation, optional javadoc on resetWalApplyCounters) — all non-blocking.
[PR Coverage check]😍 pass : 4 / 4 (100.00%) file detail
|
Summary
tables().table_write_amp_p50/p90/p99/maxandtables().wal_dedup_row_count_since_startcould report values much larger than the per-commit log line for the same workload -- sometimes by several orders of magnitude -- when a WAL apply job interleaved data writes with non-data transactions.TableWritercounters (physicallyWrittenRowsSinceLastCommit,dedupRowsRemovedSinceLastCommit) were only reset byprocessWalBlock(and one of them also by the non-WALcommit()), so iterations ofApplyWal2TableJob.processWalCommitthat took non-resetting branches re-read the previous iteration's value when accumulating intophysicalRowsAdded,totalPhysicalRowCount, and the dedup count passed torecordWalProcessed.processWalCommitcall so each iteration's reads only see that iteration's writes, regardless of branch.Affected branches in
processWalCommitEach of the following could leave the counter stale for the next iteration's read when preceded by a successful data write:
DATAviatrySkipWalTransactions(a futureREPLACE_RANGEorTRUNCATEsupersedes the current txn)SQLUPDATEthat matches no rows (no internalcommit()triggered)TRUNCATE(removeAllPartitionscommits the txn file, not the writer)MAT_VIEW_INVALIDATEThe dedup mode of the skipped data txn (DEFAULT, UPSERT_NEW, NO_DEDUP) does not change exposure -- only the data branch matters. The
VIEW_DEFINITIONbranch also lacks an internal reset, but views never carryDATAtransactions in their WAL, so the bug pattern cannot form there in normal operation; the unconditional reset still covers it defensively.Side effects fixed by the same change
wal_apply_physically_written_rowsPrometheus counter no longer over-counts on skip / no-op iterations.sys.telemetry_wal.physicalRowCountis no longer attributed to skipped transactions.ampl=log line now matches what the iteration's writes actually produced.Tradeoffs
LongAdders, so the reset is constant-time and runs on the WAL apply thread.TableWriter.resetWalApplyCounters()increases the writer's API surface by one method, used only fromApplyWal2TableJob.Test plan
New cases in
RecentWriteTrackerIntegrationTestthat each exercise one affected branch:All five regression cases reproduce the inflated value (p50=2.0 or dedup count = 2x) before the fix and resolve to p50=1.0 / correct count after.