Implement ready tombstone ledger#323
Conversation
|
Ready to act? Review this PR in Change Stack to turn feedback into patch suggestions you can inspect and refine. 📝 WalkthroughWalkthroughThis PR introduces a ready-tombstones ledger to AWA's queue-storage system, replacing physical deletion of ready entries with append-only tombstone records for cancellation and priority aging. The claim-cursor advancement logic is updated to treat tombstoned lanes as committed spent evidence and only advance across safe contiguous prefixes. Ready backing rows are retained for terminal-fact hydration until segment reclamation. ChangesReady Tombstones Implementation
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related issues
Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 61d099bb4d
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| let truncate = sqlx::query(&format!("TRUNCATE TABLE {ready_child}, {done_child}",)) | ||
| .execute(tx.as_mut()) | ||
| .await; | ||
| let tomb_child = format!("{schema}.ready_tombstones_{slot}"); |
There was a problem hiding this comment.
Avoid letting tombstoned ready rows block pruning
When a ready row is tombstoned without a matching done_entries row (for example priority aging, or DELETE FROM awa.jobs for an available job), the pending-ready guard just above this still counts that retained backing row because it only left-joins done_entries and does not anti-join ready_tombstones. That means the newly added truncation of the tombstone partition is never reached for any queue slot containing those rows, so the slot cannot be reclaimed and later queue-ring rotation will keep seeing the non-empty ready_entries_* partition as busy.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (4)
awa-model/src/queue_storage.rs (3)
2596-2615:⚠️ Potential issue | 🔴 Critical | ⚡ Quick winReset must truncate
ready_tombstonestoo.
reset()rewinds ring generations, lane heads, and job IDs, but it leavesready_tombstonesbehind. After that, fresh ready rows can reuse the same(ready_slot, ready_generation, queue, priority, enqueue_shard, lane_seq)key and get filtered out by stale tombstones, which makes new jobs unclaimable after a reset.Suggested fix
TRUNCATE {schema}.ready_entries, + {schema}.ready_tombstones, {schema}.done_entries, {schema}.dlq_entries, {schema}.leases,🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@awa-model/src/queue_storage.rs` around lines 2596 - 2615, The reset() SQL TRUNCATE call omits the ready_tombstones table, causing stale tombstones to block reuse of (ready_slot, ready_generation, queue, priority, enqueue_shard, lane_seq) keys after reset; update the TRUNCATE list in the SQL executed by reset() (the query built with sqlx::query(&format!(...)) surrounding tables like ready_entries, done_entries, etc.) to also include {schema}.ready_tombstones so tombstones are cleared alongside the other tables.
6825-6875:⚠️ Potential issue | 🟠 Major | ⚡ Quick winFilter tombstoned ready rows out of
load_job().
age_waiting_priorities()now keeps the source ready row and marks it tombstoned.load_job()still reads everyready_entriesrow for the job, so a reprioritized job can return the tombstoned snapshot instead of the live lane because both ready candidates tie on state/run_lease/run_at. That makes the reported priority nondeterministic after aging.Suggested fix
- FROM {schema}.ready_entries - WHERE job_id = $1 + FROM {schema}.ready_entries AS ready + WHERE job_id = $1 + AND NOT EXISTS ( + SELECT 1 FROM {schema}.ready_tombstones AS tomb + WHERE tomb.queue = ready.queue + AND tomb.priority = ready.priority + AND tomb.enqueue_shard = ready.enqueue_shard + AND tomb.lane_seq = ready.lane_seq + AND tomb.ready_slot = ready.ready_slot + AND tomb.ready_generation = ready.ready_generation + ) ORDER BY run_lease DESC, attempted_at DESC NULLS LAST, run_at DESCAlso applies to: 7854-7874
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@awa-model/src/queue_storage.rs` around lines 6825 - 6875, load_job() must ignore ready_entries rows that have been tombstoned; update the SELECT-from-{schema}.ready_entries query used by load_job() to exclude any row present in {schema}.ready_tombstones by adding the same tombstone predicate used in the diff (match on queue, priority, enqueue_shard, lane_seq, ready_slot, ready_generation), e.g. add a NOT EXISTS(...) or LEFT JOIN ... WHERE tomb.ready_slot IS NULL filter so the tombstoned snapshot is not returned. Make the identical change to the second occurrence noted (the other block around lines 7854-7874) so both load_job() query paths filter out tombstoned ready rows.
10664-10717:⚠️ Potential issue | 🟠 Major | ⚡ Quick winExclude tombstoned lanes from the prune gate.
pendingstill counts any ready row without a matchingdone_entriesrow. After priority aging, the source lane is intentionally left inready_entrieswith only a tombstone, so this gate will keep treating that slot as live andprune_oldest()can stop reclaiming queue partitions.correctness/storage/MAPPING.md:96describes this check as an anti-join againstready_tombstones.Suggested fix
+ let tomb_child = format!("{schema}.ready_tombstones_{slot}"); let pending: i64 = sqlx::query_scalar(&format!( r#" SELECT count(*)::bigint FROM {ready_child} AS ready + LEFT JOIN {tomb_child} AS tomb + ON tomb.ready_slot = ready.ready_slot + AND tomb.ready_generation = ready.ready_generation + AND tomb.queue = ready.queue + AND tomb.priority = ready.priority + AND tomb.enqueue_shard = ready.enqueue_shard + AND tomb.lane_seq = ready.lane_seq LEFT JOIN {done_child} AS done ON done.ready_generation = ready.ready_generation AND done.queue = ready.queue AND done.priority = ready.priority AND done.enqueue_shard = ready.enqueue_shard AND done.lane_seq = ready.lane_seq WHERE done.lane_seq IS NULL + AND tomb.lane_seq IS NULL "# )) .fetch_one(tx.as_mut()) .await .map_err(map_sqlx_error)?; - - let tomb_child = format!("{schema}.ready_tombstones_{slot}");🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@awa-model/src/queue_storage.rs` around lines 10664 - 10717, The pending-count query (producing variable pending) currently counts ready rows that have no matching done row but does not exclude lanes that are tombstoned, so slots left with only tombstones block pruning; modify the SQL used in the sqlx::query_scalar call referencing {ready_child} and {done_child} to LEFT JOIN the corresponding ready_tombstones partition (ready_tombstones_{slot}) and add a WHERE clause requiring the tombstone join to be NULL (e.g. AND rt.ready_generation IS NULL or similar) so that rows that are tombstoned are excluded from the anti-join count; update the query string used by the pending calculation accordingly.awa-worker/src/client.rs (1)
1971-1999:⚠️ Potential issue | 🟠 Major | ⚡ Quick winDon't swallow queue-availability query failures in the health check.
Falling back to
unwrap_or_default()here means a broken queue-storage query still produceshealthy = trueas long asSELECT 1succeeds. With the newready_tombstonesdependency, a half-prepared or mismatched substrate now shows up as “empty queue” instead of an unhealthy runtime. Please log the error and fold availability-query success into thehealthyresult rather than treating failure as an empty set.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@awa-worker/src/client.rs` around lines 1971 - 1999, The availability query currently swallows errors by using unwrap_or_default() on the sqlx::query_as(...).fetch_all(&self.pool).await call in the block that assigns available_rows (inside the effective_storage.queue_storage_store() branch); change this to propagate or handle the Result: on Err, log the error via the existing logger and set the overall health flag to unhealthy (do not treat failure as an empty set), while on Ok use the returned rows. Update the health-check logic that uses available_rows to consider the query outcome (failure => healthy = false) rather than treating a query error as "no available rows."
🧹 Nitpick comments (1)
awa-model/src/admin.rs (1)
271-338: 🏗️ Heavy liftFactor the tombstone-aware available-row predicate into one shared builder.
This exact
ready_entriesfilter now exists here, incancel_by_unique_key, instate_counts, and again inawa-worker/src/client.rs::health_check.MAPPING.mdtreats those as equivalent read-side projections, so any future change to tombstone keying or enqueue-shard semantics now has several places to drift. A shared SQL fragment/CTE builder would make that contract much harder to break.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@awa-model/src/admin.rs` around lines 271 - 338, The ready_entries tombstone-aware filter repeated in queue_storage_current_jobs_cte (and also present in cancel_by_unique_key, state_counts, and awa-worker::client::health_check) should be factored into a single shared SQL fragment builder function (e.g., ready_tombstone_predicate or build_ready_available_cte) that returns the predicate/CTE string; update queue_storage_current_jobs_cte to call that new function instead of inlining the WHERE/NOT EXISTS block, and replace the duplicated filter in cancel_by_unique_key, state_counts, and health_check to use the same builder so the tombstone keying and enqueue_shard semantics are defined in one place.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@correctness/storage/README.md`:
- Around line 107-118: The README claims "each DLQ transition is reachable" but
the listed actions include RescueToReady which is not a DLQ transition; update
the text to make scope consistent by either renaming the header/table to a more
general "Action coverage" (so keep all listed actions including RescueToReady)
or remove RescueToReady from the DLQ-specific list; adjust the sentence around
the table and the table caption accordingly to reference the chosen change and
verify the table rows (FailToDlq, TimeoutWaitingToDlq, PurgeDlq,
MoveFailedToDlq, RetryFromDlq) match the DLQ scope if you choose to keep the DLQ
wording.
---
Outside diff comments:
In `@awa-model/src/queue_storage.rs`:
- Around line 2596-2615: The reset() SQL TRUNCATE call omits the
ready_tombstones table, causing stale tombstones to block reuse of (ready_slot,
ready_generation, queue, priority, enqueue_shard, lane_seq) keys after reset;
update the TRUNCATE list in the SQL executed by reset() (the query built with
sqlx::query(&format!(...)) surrounding tables like ready_entries, done_entries,
etc.) to also include {schema}.ready_tombstones so tombstones are cleared
alongside the other tables.
- Around line 6825-6875: load_job() must ignore ready_entries rows that have
been tombstoned; update the SELECT-from-{schema}.ready_entries query used by
load_job() to exclude any row present in {schema}.ready_tombstones by adding the
same tombstone predicate used in the diff (match on queue, priority,
enqueue_shard, lane_seq, ready_slot, ready_generation), e.g. add a NOT
EXISTS(...) or LEFT JOIN ... WHERE tomb.ready_slot IS NULL filter so the
tombstoned snapshot is not returned. Make the identical change to the second
occurrence noted (the other block around lines 7854-7874) so both load_job()
query paths filter out tombstoned ready rows.
- Around line 10664-10717: The pending-count query (producing variable pending)
currently counts ready rows that have no matching done row but does not exclude
lanes that are tombstoned, so slots left with only tombstones block pruning;
modify the SQL used in the sqlx::query_scalar call referencing {ready_child} and
{done_child} to LEFT JOIN the corresponding ready_tombstones partition
(ready_tombstones_{slot}) and add a WHERE clause requiring the tombstone join to
be NULL (e.g. AND rt.ready_generation IS NULL or similar) so that rows that are
tombstoned are excluded from the anti-join count; update the query string used
by the pending calculation accordingly.
In `@awa-worker/src/client.rs`:
- Around line 1971-1999: The availability query currently swallows errors by
using unwrap_or_default() on the sqlx::query_as(...).fetch_all(&self.pool).await
call in the block that assigns available_rows (inside the
effective_storage.queue_storage_store() branch); change this to propagate or
handle the Result: on Err, log the error via the existing logger and set the
overall health flag to unhealthy (do not treat failure as an empty set), while
on Ok use the returned rows. Update the health-check logic that uses
available_rows to consider the query outcome (failure => healthy = false) rather
than treating a query error as "no available rows."
---
Nitpick comments:
In `@awa-model/src/admin.rs`:
- Around line 271-338: The ready_entries tombstone-aware filter repeated in
queue_storage_current_jobs_cte (and also present in cancel_by_unique_key,
state_counts, and awa-worker::client::health_check) should be factored into a
single shared SQL fragment builder function (e.g., ready_tombstone_predicate or
build_ready_available_cte) that returns the predicate/CTE string; update
queue_storage_current_jobs_cte to call that new function instead of inlining the
WHERE/NOT EXISTS block, and replace the duplicated filter in
cancel_by_unique_key, state_counts, and health_check to use the same builder so
the tombstone keying and enqueue_shard semantics are defined in one place.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: a9ef892a-46d2-4efd-8da0-88e18397fe37
📒 Files selected for processing (21)
README.mdawa-model/migrations/v023_install_queue_storage_substrate.sqlawa-model/migrations/v028_ready_tombstones.sqlawa-model/src/admin.rsawa-model/src/migrations.rsawa-model/src/queue_storage.rsawa-model/src/storage.rsawa-worker/src/client.rsawa/tests/migration_test.rsawa/tests/queue_storage_runtime_test.rscorrectness/README.mdcorrectness/storage/AwaDeadTupleContract.tlacorrectness/storage/AwaSegmentedStorage.tlacorrectness/storage/MAPPING.mdcorrectness/storage/README.mddocs/adr/019-queue-storage-redesign.mddocs/architecture.mddocs/configuration.mddocs/queue-storage-substrate.mddocs/troubleshooting.mddocs/upgrade-0.5-to-0.6.md
| Action coverage from a `-coverage 1` run of the base config confirms each DLQ | ||
| transition is reachable: | ||
|
|
||
| | Action | States | | ||
| |---|---:| | ||
| | `FailToDlq` | 41,472 | | ||
| | `TimeoutWaitingToDlq` | 9,216 | | ||
| | `RescueToReady` | 6,912 | | ||
| | `PurgeDlq` | 4,608 | | ||
| | `MoveFailedToDlq` | 3,072 | | ||
| | `RetryFromDlq` | 1,536 | | ||
| | Action | | ||
| |---| | ||
| | `FailToDlq` | | ||
| | `TimeoutWaitingToDlq` | | ||
| | `RescueToReady` | | ||
| | `PurgeDlq` | | ||
| | `MoveFailedToDlq` | | ||
| | `RetryFromDlq` | | ||
|
|
There was a problem hiding this comment.
Clarify the DLQ coverage claim to match the listed actions.
Line 107 says each DLQ transition is reachable, but Line 115 lists RescueToReady, which is not a DLQ transition. Please either rename the section/table scope or remove that action from the DLQ-specific list.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@correctness/storage/README.md` around lines 107 - 118, The README claims
"each DLQ transition is reachable" but the listed actions include RescueToReady
which is not a DLQ transition; update the text to make scope consistent by
either renaming the header/table to a more general "Action coverage" (so keep
all listed actions including RescueToReady) or remove RescueToReady from the
DLQ-specific list; adjust the sentence around the table and the table caption
accordingly to reference the chosen change and verify the table rows (FailToDlq,
TimeoutWaitingToDlq, PurgeDlq, MoveFailedToDlq, RetryFromDlq) match the DLQ
scope if you choose to keep the DLQ wording.
Summary
ready_tombstonesqueue-storage ledger and install it for each queue-storage substrate.ready_entrieson common completion, DLQ, discard, retry, cancellation, reprioritization, and SQL compatibility paths.Fixes #309.
Why
The cursor allocator from PR #321 moved hot sequence bounds away from mutable rows, but
ready_entriesstill generated dead tuples because completed jobs deleted their ready backing rows. Under overlapping readers that DELETE pressure hurts throughput and makes the ready ring less append-only than intended.This change makes ready segments immutable until partition maintenance truncates them. Rare out-of-band ready mutations append tiny tombstone rows instead of deleting ready rows, so the hot lifecycle is dominated by inserts, cursor advancement, and partition truncation.
Design Notes
ready_entriesremains the append-only source of ready job bodies for a segment.ready_tombstonesrecords rare ready-lane invalidations such as cancellation, reprioritization, and SQL compatibility DELETE of an available job.claim_ready_runtime()treats tombstones as committed spent evidence so the claim cursor can advance over a contiguous prefix without skipping earlier live jobs.done_entriesor matchingready_tombstonesas spent evidence.Validation
cargo fmt --checkgit diff --checkpython3 -m py_compile awa-python/tests/mixed_fleet_helper.pycargo test -p awa --test migration_test test_v027_rebuckets_existing_terminal_live_counts -- --nocapturecargo test -p awa --test queue_storage_runtime_test test_queue_storage_prune_treats_ready_tombstone_as_spent -- --nocapturecargo test -p awa --test queue_storage_runtime_test test_queue_storage_prune_pending_ready_match_is_scoped_by_enqueue_shard -- --nocapturecargo test --package awa --test chaos_suite_test test_mixed_rust_and_python_workers_share_same_queue -- --exact --ignored --nocapturerepeated 3 times locally./correctness/run-tlc.sh storage/AwaDeadTupleContract.tla./correctness/run-tlc.sh storage/AwaSegmentedStorage.tla./correctness/run-tlc.sh storage/AwaSegmentedStorage.tla storage/AwaSegmentedStorageInterleavings.cfg./correctness/run-tlc.sh storage/AwaSegmentedStorageTrace.tlareaches the expected witness invariant behavior documented by that trace config.sofiles