Skip to content

ADR-024: deferred done_entries — investigated, rejected#250

Closed
hardbyte wants to merge 7 commits into
mainfrom
feat/deferred-done-entries
Closed

ADR-024: deferred done_entries — investigated, rejected#250
hardbyte wants to merge 7 commits into
mainfrom
feat/deferred-done-entries

Conversation

@hardbyte
Copy link
Copy Markdown
Owner

@hardbyte hardbyte commented May 10, 2026

Status: REJECTED — revert committed in 053fec1

Investigated, prototyped, benchmarked, and backed out. The synchronous receipt-ring complete path stays the only first-class implementation.

What the bench actually showed

Same hardware as the 2026-05-09 sweep shape — producer-rate=50000, depth-target=4000, W=64, 30 s warmup + 60 s clean — local docker pg18:

mode completion (jobs/s) enqueue depth e2e p99
sync (this branch) 2,803 3,222 4,140 1,396 ms
deferred (final design) 1,839 2,023 3,895 1,954 ms

Deferred shipped 34 % lower completion-rate and 40 % higher p99. The first-attempt design (closure prune gated on materialiser progress) was much worse — multi-second stalls — and prompted a redesign that inlined the materialise pass in `prune_oldest_claims` itself. That redesign removed the stalls but didn't recover throughput parity, let alone deliver lift.

Why it didn't pan out

  • The `done_entries` INSERT is not the bottleneck on this workload. Sync mode at 2.8 k jobs/s on macOS docker / pg18 leaves plenty of headroom; the JSONB write isn't where the time goes.
  • The deferred design adds three coupled mechanisms (periodic materialiser timer, inline catch-up in `prune_oldest_claims`, rotation guard via the existing pending-ready check). Together they introduce partition-rotation contention that exceeds the saved hot-path WAL.
  • The read-side migration was still pending (`v016` UNION branch in `awa.jobs_compat()`). The ADR couldn't flip the default without it.

The simpler architecture wins. The throughput gap to pgque (14 k vs 40 k in the sweep) is structural — pgque trades feature surface and per-job durability for throughput, awa keeps both. The right awa improvements live elsewhere — the bloat-resistance drops under `idle_in_tx` and `active_readers` (75 % / 71 %) are awa-specific and closeable; that's where the next investigation should look.

What's kept

  • ADR-024, status flipped to Rejected, with the bench data recorded — captures the design space and why it didn't pan out.
  • `correctness/storage/AwaDeferredMaterialize.tla`, with a header comment noting rejection so future readers don't assume it lives in code. Useful as a record of the safety + liveness contract any future revisit would need to satisfy.
  • The two priority-aging tests (awa#249 — separate PR; this branch is a strict superset).

What's removed

`QueueStorageConfig.deferred_done_entries` field, `complete_runtime_batch`'s deferred branch, `materialise_done_entries` and `unmaterialised_closures_for_ready_partition` SQL helpers, `prune_oldest_claims`'s inline-materialise, the maintenance loop's materialise timer, the 5 ADR-024 integration tests, the `release_unique_claims_for_done_rows_tx` helper extraction (inlined back), and the bulk-added `deferred_done_entries: false` lines across the test suite.

Test plan

  • `cargo build --workspace` clean
  • `cargo clippy --workspace --tests -- -D warnings` clean
  • `cargo fmt --all`
  • `cargo test -p awa --test queue_storage_runtime_test` — 57/57 pass (was 63/63 with the deferred-mode tests)
  • A/B benchmark on local docker pg18 — see numbers above

hardbyte added 5 commits May 10, 2026 16:09
Focused safety+liveness spec for the proposed change where the
completion hot path writes only the lease_claim_closures row, and a
periodic maintenance pass joins closures with lease_claims and
ready_entries to materialise the terminal done_entries row out of
band.

The new safety hazard the spec is here to catch: a ready_entries
partition TRUNCATE that fires before the materialiser has
processed every closure pointing into that partition would orphan
the source data and leave done_entries permanently incomplete. The
rotation guard must block partition TRUNCATE on "all closures
referencing this partition have been materialised" — encoded as
the ClosureSourceNotPruned invariant.

State explored, multi-job config (Jobs={j1,j2}, 3 segs,
max run-lease 2): 1710 distinct states, depth 13. All seven
invariants hold:
  - TypeOK / ExactlyOneOpen — shape
  - ClosureSourceNotPruned — rotation guard works
  - ClaimedSourceNotPruned / ReadySourceNotPruned — claim/ready data
    survives until safe to drop
  - ReaderProjectionConsistent — reader UNION view sees terminal
    iff closure exists, regardless of materialisation timing
  - NoLostJob — no silent disappearance

Liveness (single-job, Materialise+Prune fairness): 34 distinct
states; EventuallyMaterialised holds.

Out of scope (per the abstraction note):
  - Postgres row locks — feedback says TLA+ at this level shouldn't
    pretend to model SQL locking; we model membership and lifecycle
  - retries / DLQ — orthogonal
  - heartbeat / deadline rescue — produces same closure shape, same
    materialisation path
Adds the architectural record for deferring done_entries
materialisation off the completion hot path, plus the two SQL
helpers the maintenance loop will drive:

- materialise_done_entries(pool, max_batch) — single-statement CTE
  that backfills done_entries from
  lease_claim_closures ⨝ lease_claims ⨝ ready_entries for closures
  with outcome='completed' that don't yet have a terminal index row.
  Filtered LEFT JOIN done_entries IS NULL is the idempotency guard;
  ON CONFLICT DO NOTHING covers the remaining race.
- unmaterialised_closures_for_ready_partition(pool, ready_slot) —
  the rotation guard precondition. Returns the number of
  outcome=completed closures pointing into the given partition with
  no done_entries row, so the partition-prune caller can drive the
  materialiser until it returns 0 before TRUNCATE.

Hot-path change in complete_runtime_batch: when
deferred_done_entries() is true, the receipt path skips
insert_done_rows_tx and runs only release_unique_claims_for_done_rows_tx
(extracted from the prefix of insert_done_rows_tx so both paths
honour the same unique-claim release contract).

Config knob: QueueStorageConfig.deferred_done_entries (default
false). Pre-flighted under env flag in the worker harness wire-up
(next commit).

TLA+ spec at correctness/storage/AwaDeferredMaterialize.tla — safety
+ liveness checked exhaustive on (Jobs={j1,j2}, 3 segments,
max_run_lease=2): 1710 distinct states, all invariants hold.
Maintenance service gains two knobs and a periodic timer:
- materialise_done_interval (default 100 ms)
- materialise_done_batch (default 4096)

The timer is only constructed when the storage backend is
queue_storage AND its deferred_done_entries flag is on, so the
non-receipt and non-deferred paths see zero overhead.

The handler `materialise_done_entries` is one SQL round-trip per
pass (the CTE in queue_storage::materialise_done_entries). It logs
at debug level when rows are drained and at warn on error;
swallows transient errors so a failed pass doesn't stop the maint
loop.

Four new integration tests against pg18:
- test_deferred_done_entries_skips_synchronous_done_insert: with
  deferred=true, complete writes only the closure; done_entries
  stays empty.
- test_materialiser_backfills_done_entries_from_closures: maint
  pass reconstructs done_entries from
  closures⨝lease_claims⨝ready_entries; args column matches the
  source ready_entries row exactly.
- test_materialiser_is_idempotent: re-running the pass on the
  same backlog yields zero new rows; no duplicates.
- test_unmaterialised_for_partition_drains_after_materialise:
  rotation-guard helper returns the expected pending count
  before maint and zero after.

Backward-compat fix in benchmark_test.rs: one exhaustive
QueueStorageConfig literal that pre-existed without
..Default::default(); now spread.
Implementation note: the existing pre-TRUNCATE pending-ready check
in `prune_oldest` already serves as the rotation guard the deferred
path needs.

Today's check is `LEFT JOIN done_entries WHERE done.lane_seq IS
NULL` — under the synchronous path this counted still-running /
never-started work. Under the deferred path the same predicate
*also* catches completed-but-not-materialised. When a closure has
landed but done_entries hasn't, the JOIN finds no match, count > 0,
and prune returns `SkippedActive { reason: QueuePendingReady }`.
Once the materialiser drains the backlog, the next prune tick
succeeds.

This means no new partition-level SQL was needed for the deferred
path; the materialiser task is the only addition. The dedicated
helper `unmaterialised_closures_for_ready_partition` is kept for
ops visibility but is not on the prune hot path.

`test_rotation_blocked_until_materialiser_drains` walks the full
pipeline end-to-end: enqueue → claim → complete (no done_entries) →
rotate → prune blocked → materialise → prune succeeds. Pins the
architectural claim.

ADR-024 updated to reflect the finding.
test_synchronous_path_still_writes_done_entries: runs with
deferred_done_entries=false and asserts done_entries lands
synchronously in complete_runtime_batch's tx. Catches the case
where extracting release_unique_claims_for_done_rows_tx
accidentally drops the wide INSERT.

ADR-024: adds an explicit "Still to land before flipping the
default" section. The hot-path skip, materialiser, and rotation
guard are correct and tested at the storage layer; the remaining
production-readiness gate is the read-side projection update —
awa.jobs_compat() needs a fifth UNION branch synthesising
"completed" for closures whose done_entries hasn't been
materialised yet. Until that migration ships (v016), the deferred
flag is safe for bench/throughput experiments but not for
production deployments with operators polling awa.jobs.

Full test suite for awa/tests/queue_storage_runtime_test.rs:
63 passed, 0 failed (6 new ADR-024 tests + 57 existing).
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 10, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 5f24e708-58f7-4ce3-8658-c5d12d7558b1

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/deferred-done-entries

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Benchmarked the inline-prune-materialise version (the second iteration,
after the first design with closure-prune gating produced multi-second
stalls and was redesigned). At W=64 on macOS docker / pg18:

  sync     2,803 jobs/s, p99 1,396 ms
  deferred 1,839 jobs/s, p99 1,954 ms  (-34% / +40%)

Deferred mode was 34 % slower than the synchronous receipt-ring path,
not the 20-40 % faster the ADR predicted. The deferred design adds
three coupled mechanisms (periodic materialiser timer, inline
catch-up in prune_oldest_claims, the existing pending-ready check
acting as the rotation guard); together they introduce
partition-rotation contention that exceeds the saved hot-path WAL.
The done_entries INSERT was not the bottleneck on this workload —
the sync path has plenty of headroom.

Reverts:
- QueueStorageConfig.deferred_done_entries field + accessor
- complete_runtime_batch's deferred branch
- materialise_done_entries SQL helper
- unmaterialised_closures_for_ready_partition SQL helper
- prune_oldest_claims's inline-materialise block
- maintenance loop's materialise_done_entries timer + handler
- 5 ADR-024 integration tests (priority-aging tests stay)
- the release_unique_claims_for_done_rows_tx helper extraction
  (inlined back into insert_done_rows_tx)
- AWA_DEFER_DONE_ENTRIES env knob in awa-bench
- the bulk `deferred_done_entries: false` field added to existing
  QueueStorageConfig literals across the test suite

Kept as record of the design space:
- ADR-024 itself, status flipped to Rejected with the bench data
  recorded
- correctness/storage/AwaDeferredMaterialize.tla, with a header
  comment noting rejection so future readers don't assume it lives
  in code

Synchronous receipt-ring complete is the only first-class
implementation. The throughput gap to pgque is structural — pgque
trades feature surface and per-job durability; awa keeps both. The
right awa improvements are bloat-resistance under idle_in_tx /
active_readers (75 % / 71 % drops in the 2026-05-09 sweep), which
are awa-specific and closeable.
@hardbyte hardbyte changed the title ADR-024: deferred done_entries materialisation ADR-024: deferred done_entries — investigated, rejected May 10, 2026
@hardbyte
Copy link
Copy Markdown
Owner Author

Closing without merging. ADR-024 is captured in the PR description and commit history (52dc713..7cffa48) for future search. The TLA+ model is over-engineered preservation for a rejected design — a fresh spec would be needed for any future revisit anyway.

@hardbyte hardbyte closed this May 11, 2026
@hardbyte hardbyte deleted the feat/deferred-done-entries branch May 11, 2026 01:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant