ADR-024: deferred done_entries — investigated, rejected#250
Closed
hardbyte wants to merge 7 commits into
Closed
Conversation
Focused safety+liveness spec for the proposed change where the
completion hot path writes only the lease_claim_closures row, and a
periodic maintenance pass joins closures with lease_claims and
ready_entries to materialise the terminal done_entries row out of
band.
The new safety hazard the spec is here to catch: a ready_entries
partition TRUNCATE that fires before the materialiser has
processed every closure pointing into that partition would orphan
the source data and leave done_entries permanently incomplete. The
rotation guard must block partition TRUNCATE on "all closures
referencing this partition have been materialised" — encoded as
the ClosureSourceNotPruned invariant.
State explored, multi-job config (Jobs={j1,j2}, 3 segs,
max run-lease 2): 1710 distinct states, depth 13. All seven
invariants hold:
- TypeOK / ExactlyOneOpen — shape
- ClosureSourceNotPruned — rotation guard works
- ClaimedSourceNotPruned / ReadySourceNotPruned — claim/ready data
survives until safe to drop
- ReaderProjectionConsistent — reader UNION view sees terminal
iff closure exists, regardless of materialisation timing
- NoLostJob — no silent disappearance
Liveness (single-job, Materialise+Prune fairness): 34 distinct
states; EventuallyMaterialised holds.
Out of scope (per the abstraction note):
- Postgres row locks — feedback says TLA+ at this level shouldn't
pretend to model SQL locking; we model membership and lifecycle
- retries / DLQ — orthogonal
- heartbeat / deadline rescue — produces same closure shape, same
materialisation path
Adds the architectural record for deferring done_entries
materialisation off the completion hot path, plus the two SQL
helpers the maintenance loop will drive:
- materialise_done_entries(pool, max_batch) — single-statement CTE
that backfills done_entries from
lease_claim_closures ⨝ lease_claims ⨝ ready_entries for closures
with outcome='completed' that don't yet have a terminal index row.
Filtered LEFT JOIN done_entries IS NULL is the idempotency guard;
ON CONFLICT DO NOTHING covers the remaining race.
- unmaterialised_closures_for_ready_partition(pool, ready_slot) —
the rotation guard precondition. Returns the number of
outcome=completed closures pointing into the given partition with
no done_entries row, so the partition-prune caller can drive the
materialiser until it returns 0 before TRUNCATE.
Hot-path change in complete_runtime_batch: when
deferred_done_entries() is true, the receipt path skips
insert_done_rows_tx and runs only release_unique_claims_for_done_rows_tx
(extracted from the prefix of insert_done_rows_tx so both paths
honour the same unique-claim release contract).
Config knob: QueueStorageConfig.deferred_done_entries (default
false). Pre-flighted under env flag in the worker harness wire-up
(next commit).
TLA+ spec at correctness/storage/AwaDeferredMaterialize.tla — safety
+ liveness checked exhaustive on (Jobs={j1,j2}, 3 segments,
max_run_lease=2): 1710 distinct states, all invariants hold.
Maintenance service gains two knobs and a periodic timer: - materialise_done_interval (default 100 ms) - materialise_done_batch (default 4096) The timer is only constructed when the storage backend is queue_storage AND its deferred_done_entries flag is on, so the non-receipt and non-deferred paths see zero overhead. The handler `materialise_done_entries` is one SQL round-trip per pass (the CTE in queue_storage::materialise_done_entries). It logs at debug level when rows are drained and at warn on error; swallows transient errors so a failed pass doesn't stop the maint loop. Four new integration tests against pg18: - test_deferred_done_entries_skips_synchronous_done_insert: with deferred=true, complete writes only the closure; done_entries stays empty. - test_materialiser_backfills_done_entries_from_closures: maint pass reconstructs done_entries from closures⨝lease_claims⨝ready_entries; args column matches the source ready_entries row exactly. - test_materialiser_is_idempotent: re-running the pass on the same backlog yields zero new rows; no duplicates. - test_unmaterialised_for_partition_drains_after_materialise: rotation-guard helper returns the expected pending count before maint and zero after. Backward-compat fix in benchmark_test.rs: one exhaustive QueueStorageConfig literal that pre-existed without ..Default::default(); now spread.
Implementation note: the existing pre-TRUNCATE pending-ready check
in `prune_oldest` already serves as the rotation guard the deferred
path needs.
Today's check is `LEFT JOIN done_entries WHERE done.lane_seq IS
NULL` — under the synchronous path this counted still-running /
never-started work. Under the deferred path the same predicate
*also* catches completed-but-not-materialised. When a closure has
landed but done_entries hasn't, the JOIN finds no match, count > 0,
and prune returns `SkippedActive { reason: QueuePendingReady }`.
Once the materialiser drains the backlog, the next prune tick
succeeds.
This means no new partition-level SQL was needed for the deferred
path; the materialiser task is the only addition. The dedicated
helper `unmaterialised_closures_for_ready_partition` is kept for
ops visibility but is not on the prune hot path.
`test_rotation_blocked_until_materialiser_drains` walks the full
pipeline end-to-end: enqueue → claim → complete (no done_entries) →
rotate → prune blocked → materialise → prune succeeds. Pins the
architectural claim.
ADR-024 updated to reflect the finding.
test_synchronous_path_still_writes_done_entries: runs with deferred_done_entries=false and asserts done_entries lands synchronously in complete_runtime_batch's tx. Catches the case where extracting release_unique_claims_for_done_rows_tx accidentally drops the wide INSERT. ADR-024: adds an explicit "Still to land before flipping the default" section. The hot-path skip, materialiser, and rotation guard are correct and tested at the storage layer; the remaining production-readiness gate is the read-side projection update — awa.jobs_compat() needs a fifth UNION branch synthesising "completed" for closures whose done_entries hasn't been materialised yet. Until that migration ships (v016), the deferred flag is safe for bench/throughput experiments but not for production deployments with operators polling awa.jobs. Full test suite for awa/tests/queue_storage_runtime_test.rs: 63 passed, 0 failed (6 new ADR-024 tests + 57 existing).
|
Important Review skippedDraft detected. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Benchmarked the inline-prune-materialise version (the second iteration, after the first design with closure-prune gating produced multi-second stalls and was redesigned). At W=64 on macOS docker / pg18: sync 2,803 jobs/s, p99 1,396 ms deferred 1,839 jobs/s, p99 1,954 ms (-34% / +40%) Deferred mode was 34 % slower than the synchronous receipt-ring path, not the 20-40 % faster the ADR predicted. The deferred design adds three coupled mechanisms (periodic materialiser timer, inline catch-up in prune_oldest_claims, the existing pending-ready check acting as the rotation guard); together they introduce partition-rotation contention that exceeds the saved hot-path WAL. The done_entries INSERT was not the bottleneck on this workload — the sync path has plenty of headroom. Reverts: - QueueStorageConfig.deferred_done_entries field + accessor - complete_runtime_batch's deferred branch - materialise_done_entries SQL helper - unmaterialised_closures_for_ready_partition SQL helper - prune_oldest_claims's inline-materialise block - maintenance loop's materialise_done_entries timer + handler - 5 ADR-024 integration tests (priority-aging tests stay) - the release_unique_claims_for_done_rows_tx helper extraction (inlined back into insert_done_rows_tx) - AWA_DEFER_DONE_ENTRIES env knob in awa-bench - the bulk `deferred_done_entries: false` field added to existing QueueStorageConfig literals across the test suite Kept as record of the design space: - ADR-024 itself, status flipped to Rejected with the bench data recorded - correctness/storage/AwaDeferredMaterialize.tla, with a header comment noting rejection so future readers don't assume it lives in code Synchronous receipt-ring complete is the only first-class implementation. The throughput gap to pgque is structural — pgque trades feature surface and per-job durability; awa keeps both. The right awa improvements are bloat-resistance under idle_in_tx / active_readers (75 % / 71 % drops in the 2026-05-09 sweep), which are awa-specific and closeable.
Owner
Author
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Status: REJECTED — revert committed in 053fec1
Investigated, prototyped, benchmarked, and backed out. The synchronous receipt-ring complete path stays the only first-class implementation.
What the bench actually showed
Same hardware as the 2026-05-09 sweep shape — producer-rate=50000, depth-target=4000, W=64, 30 s warmup + 60 s clean — local docker pg18:
Deferred shipped 34 % lower completion-rate and 40 % higher p99. The first-attempt design (closure prune gated on materialiser progress) was much worse — multi-second stalls — and prompted a redesign that inlined the materialise pass in `prune_oldest_claims` itself. That redesign removed the stalls but didn't recover throughput parity, let alone deliver lift.
Why it didn't pan out
The simpler architecture wins. The throughput gap to pgque (14 k vs 40 k in the sweep) is structural — pgque trades feature surface and per-job durability for throughput, awa keeps both. The right awa improvements live elsewhere — the bloat-resistance drops under `idle_in_tx` and `active_readers` (75 % / 71 %) are awa-specific and closeable; that's where the next investigation should look.
What's kept
What's removed
`QueueStorageConfig.deferred_done_entries` field, `complete_runtime_batch`'s deferred branch, `materialise_done_entries` and `unmaterialised_closures_for_ready_partition` SQL helpers, `prune_oldest_claims`'s inline-materialise, the maintenance loop's materialise timer, the 5 ADR-024 integration tests, the `release_unique_claims_for_done_rows_tx` helper extraction (inlined back), and the bulk-added `deferred_done_entries: false` lines across the test suite.
Test plan