refactor(migrations): move default queue-storage substrate into v023 SQL helper (#308 PR 1)#310
Conversation
…SQL helper (#308 PR 1) Adds `awa.install_queue_storage_substrate(p_schema, ...)` in v023 and makes it the single source of DDL truth for per-schema queue-storage substrate objects (sequences, ring singletons, partitioned ready/done/lease tables, lane indexes, claim_ready_runtime). The helper takes the same per-schema advisory xact lock previously held by Rust (`awa.queue_storage.install:{schema}`), runs as SECURITY INVOKER, and rejects non-default configuration against the default `awa` schema with errcode 22023. `awa migrate` now installs the default `awa` substrate via `SELECT awa.install_queue_storage_substrate('awa')`, so a fresh install no longer waits for the first worker to materialize the queue-storage tables and `awa migrate --sql` output is now self-sufficient for external migration tooling. `QueueStorage::prepare_schema()` shrinks from ~1884 LOC to ~280 LOC and now: rejects non-empty `open_receipt_claims`, renames a legacy non-partitioned `lease_claims` / `lease_claim_closures` aside, calls the helper, copies any renamed-aside rows into the new partitioned parents, drops `queue_count_snapshots`, and still creates `awa.runtime_storage_backends` (the CREATE moves into a dedicated migration in PR 2). The helper is activation-neutral and does not touch `runtime_storage_backends` or storage-transition state. `lease_claim_receipts = false` remains permitted for non-`awa` schemas (test and legacy paths still exercise it); only the default `awa` install requires receipts=TRUE.
|
Warning Review limit reached
More reviews will be available in 33 minutes and 14 seconds. Learn how PR review limits work. Your organization has run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (9)
📝 WalkthroughWalkthroughAdds a canonical SQL installer (v023) awa.install_queue_storage_substrate(...), installs per-schema partitioned queue-storage objects and claim runtime, migrates legacy default-schema tables into partitioned parents, updates Rust prepare_schema to invoke the installer, and registers schema version 23. ChangesQueue Storage Installer & Integration
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 244ad26ca3
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| 'Installs the queue-storage substrate (sequences, ring-state singletons, partitioned ready/done/lease tables, lane indexes, claim_ready_runtime()) into the named schema. The default ''awa'' substrate is migration-owned and default-shaped: lease_claim_receipts=TRUE and (queue=16, lease=8, claim=8). Operators wanting non-default slot counts or lease_claim_receipts=FALSE must use a custom queue-storage schema. SECURITY INVOKER, takes a per-schema advisory xact lock. See #308.'; | ||
|
|
||
| -- Install the default `awa` substrate as part of migrate. | ||
| SELECT awa.install_queue_storage_substrate('awa'); |
There was a problem hiding this comment.
Run legacy lease-claim fixups before v23 install
When upgrading an existing database that still has the pre-partition regular awa.lease_claims or awa.lease_claim_closures tables, this migration calls the helper directly and skips the rename/copy fixups that now remain only in QueueStorage::prepare_schema(). In that scenario CREATE TABLE IF NOT EXISTS awa.lease_claims (...) PARTITION BY LIST is a no-op against the existing regular table, and the subsequent CREATE TABLE ... PARTITION OF awa.lease_claims in the helper fails because the parent is not partitioned, so awa migrate cannot complete before the application ever reaches prepare_schema().
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@awa-model/migrations/v023_install_queue_storage_substrate.sql`:
- Around line 996-999: The non-receipts CTE is referencing an invalid positional
parameter `$6` for the deadline fallback but the generated function
claim_ready_runtime only defines four parameters (p_queue, p_max_batch,
p_deadline_secs, p_aging_secs); replace the `$6` usage with the correct
parameter name `p_deadline_secs` (or its proper positional `$3`) in the
expression COALESCE(v_deadline_at, v_claimed_at + make_interval(secs => ...)),
updating the expression that uses v_deadline_at and v_claimed_at, and if the
original bug was intentionally preserved add a clear inline TODO comment
referencing claim_ready_runtime and p_lease_claim_receipts to mark the follow-up
fix instead of changing code.
In `@awa-model/src/queue_storage.rs`:
- Around line 2231-2242: The backfill INSERT in the lease_claims migration is
dropping legacy columns (e.g., enqueue_shard and deadline_at) causing runtime
failures; update the INSERT into {schema}.lease_claims and its accompanying
SELECT from {schema}.lease_claims_legacy so that the INSERT column list exactly
matches the full legacy row shape (add enqueue_shard, deadline_at and any other
missing columns) and map the SELECT values (using the same current_slot subquery
for claim_slot) to preserve all fields; ensure the conflict clause (ON CONFLICT
(claim_slot, job_id, run_lease) DO NOTHING) remains unchanged.
- Line 2237: The two legacy INSERTs read claim_ring_state.current_slot
separately causing mismatched partitions if rotate_claims() runs; change the SQL
so you snapshot current_slot once (e.g., via a WITH/CTE or a single SELECT INTO
a variable) and reuse that single value for both lease_claims_legacy and
lease_claim_closures_legacy inserts; update the statements that reference
claim_ring_state.current_slot to use the captured value so both inserts use the
same slot even if rotate_claims() runs concurrently.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 7ff5d631-6222-4786-8c6f-0a22ec3e52a9
📒 Files selected for processing (3)
awa-model/migrations/v023_install_queue_storage_substrate.sqlawa-model/src/migrations.rsawa-model/src/queue_storage.rs
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
awa-model/src/queue_storage.rs (1)
2215-2223:⚠️ Potential issue | 🟠 Major | ⚡ Quick winFail impossible default-
awaconfigs before this helper call.Line 2215 is now the first place that enforces the new “default
awamust be default-shaped” contract.QueueStorage::newstill acceptsschema == "awa"with custom slot counts orlease_claim_receipts = false, but the rest of this type branches on those config values even outsideprepare_schema(). That lets a misconfigured runtime target an already-migrated default substrate and take the wrong receipt/slot path before SQL ever rejects it.Suggested fix
pub fn new(config: QueueStorageConfig) -> Result<Self, AwaError> { if config.queue_slot_count < 4 { return Err(AwaError::Validation( "queue storage requires at least 4 queue slots".into(), )); @@ if config.queue_stripe_count == 0 { return Err(AwaError::Validation( "queue storage requires at least 1 queue stripe".into(), )); } + if config.schema == DEFAULT_SCHEMA + && (config.queue_slot_count != DEFAULT_QUEUE_SLOT_COUNT + || config.lease_slot_count != DEFAULT_LEASE_SLOT_COUNT + || config.claim_slot_count != DEFAULT_CLAIM_SLOT_COUNT + || !config.lease_claim_receipts) + { + return Err(AwaError::Validation( + "the default `awa` queue-storage schema must use the default slot counts and lease_claim_receipts=true".into(), + )); + } validate_ident(&config.schema)?; Ok(Self { config, next_stripe_probe: AtomicUsize::new(0),🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@awa-model/src/queue_storage.rs` around lines 2215 - 2223, Fail early when the config requests the canonical "awa" schema but the runtime still provides non-default settings: in QueueStorage::new (before calling prepare_schema() or executing the install helper that runs SELECT awa.install_queue_storage_substrate(...)), validate if schema == "awa" and then assert that queue_slot_count(), lease_slot_count(), claim_slot_count() equal the expected default constants and lease_claim_receipts() is true; if not, return an error immediately so misconfigured runtimes cannot proceed down code paths that assume non-default values.
♻️ Duplicate comments (2)
awa-model/src/queue_storage.rs (2)
2250-2261:⚠️ Potential issue | 🟠 Major | ⚡ Quick winPreserve the full legacy
lease_claimsrow shape during backfill.Line 2250 still drops
enqueue_shardanddeadline_at, but later receipt paths read both from{schema}.lease_claims(ensure_running_leases_from_receipts_tx,rescue_expired_receipt_deadlines_tx, andload_job). Upgraded claims lose the shard/deadline metadata needed to materialize leases, join back toready_entries, and rescue expired deadlines.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@awa-model/src/queue_storage.rs` around lines 2250 - 2261, The backfill INSERT from {schema}.lease_claims_legacy into {schema}.lease_claims currently omits enqueue_shard and deadline_at, losing metadata needed by ensure_running_leases_from_receipts_tx, rescue_expired_receipt_deadlines_tx, and load_job; modify the INSERT and SELECT in the upgrade SQL (the INSERT INTO ... SELECT from lease_claims_legacy block) to include enqueue_shard and deadline_at in both the target column list and the SELECT projection so upgraded rows retain the full legacy row shape, and keep the ON CONFLICT (claim_slot, job_id, run_lease) DO NOTHING behavior.
2256-2256:⚠️ Potential issue | 🟠 Major | ⚡ Quick winReuse one
claim_slotfor both legacy inserts.Lines 2256 and 2297 read
claim_ring_state.current_slotindependently. Ifrotate_claims()advances between those statements on an already-active schema, closures get copied into a different partition than their claims, so the migrated attempts remain permanently “open” to everylease_claims/lease_claim_closuresanti-join.Suggested fix
+ let legacy_claim_slot: i32 = sqlx::query_scalar(&format!( + "SELECT current_slot FROM {schema}.claim_ring_state WHERE singleton" + )) + .fetch_one(install_tx.as_mut()) + .await + .map_err(map_sqlx_error)?; + if lease_claims_legacy_exists { sqlx::query(&format!( r#" INSERT INTO {schema}.lease_claims ( @@ ) SELECT - (SELECT current_slot FROM {schema}.claim_ring_state WHERE singleton), + $1, job_id, run_lease, ready_slot, ready_generation, queue, priority, attempt, max_attempts, lane_seq, claimed_at, materialized_at FROM {schema}.lease_claims_legacy ON CONFLICT (claim_slot, job_id, run_lease) DO NOTHING "# )) + .bind(legacy_claim_slot) .execute(install_tx.as_mut()) .await .map_err(map_sqlx_error)?; @@ if closures_legacy_exists { sqlx::query(&format!( r#" INSERT INTO {schema}.lease_claim_closures ( @@ ) SELECT - (SELECT current_slot FROM {schema}.claim_ring_state WHERE singleton), + $1, job_id, run_lease, outcome, closed_at FROM {schema}.lease_claim_closures_legacy ON CONFLICT (claim_slot, job_id, run_lease) DO NOTHING "# )) + .bind(legacy_claim_slot) .execute(install_tx.as_mut()) .await .map_err(map_sqlx_error)?;Also applies to: 2297-2297
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@awa-model/src/queue_storage.rs` at line 2256, In rotate_claims(), avoid reading claim_ring_state.current_slot twice; read it once into a local variable (e.g., claim_slot) and reuse that single value for both legacy inserts (the INSERTs that reference claim_ring_state.current_slot and the second legacy insert around the lease_claim_closures/lease_claims anti-join) so both closures and claims are inserted with the same slot; update references in the INSERT statements and any subsequent logic to use claim_slot instead of querying claim_ring_state.current_slot again.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@awa/tests/queue_storage_runtime_test.rs`:
- Around line 2170-2185: Add an assertion to ensure the legacy receipts-disabled
path does not create any lease_claims: call lease_claim_count(&pool,
&store).await and assert_eq! it to 0 immediately after the existing lease
assertions (use the same pool, store, and job_id context used in this test with
variables like claimed, job_id, pool, store) so regressions that insert a
lease_claims row are detected.
---
Outside diff comments:
In `@awa-model/src/queue_storage.rs`:
- Around line 2215-2223: Fail early when the config requests the canonical "awa"
schema but the runtime still provides non-default settings: in QueueStorage::new
(before calling prepare_schema() or executing the install helper that runs
SELECT awa.install_queue_storage_substrate(...)), validate if schema == "awa"
and then assert that queue_slot_count(), lease_slot_count(), claim_slot_count()
equal the expected default constants and lease_claim_receipts() is true; if not,
return an error immediately so misconfigured runtimes cannot proceed down code
paths that assume non-default values.
---
Duplicate comments:
In `@awa-model/src/queue_storage.rs`:
- Around line 2250-2261: The backfill INSERT from {schema}.lease_claims_legacy
into {schema}.lease_claims currently omits enqueue_shard and deadline_at, losing
metadata needed by ensure_running_leases_from_receipts_tx,
rescue_expired_receipt_deadlines_tx, and load_job; modify the INSERT and SELECT
in the upgrade SQL (the INSERT INTO ... SELECT from lease_claims_legacy block)
to include enqueue_shard and deadline_at in both the target column list and the
SELECT projection so upgraded rows retain the full legacy row shape, and keep
the ON CONFLICT (claim_slot, job_id, run_lease) DO NOTHING behavior.
- Line 2256: In rotate_claims(), avoid reading claim_ring_state.current_slot
twice; read it once into a local variable (e.g., claim_slot) and reuse that
single value for both legacy inserts (the INSERTs that reference
claim_ring_state.current_slot and the second legacy insert around the
lease_claim_closures/lease_claims anti-join) so both closures and claims are
inserted with the same slot; update references in the INSERT statements and any
subsequent logic to use claim_slot instead of querying
claim_ring_state.current_slot again.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: d06561d6-6ff9-4cdf-952b-172bd37ff231
📒 Files selected for processing (5)
awa-model/migrations/v023_install_queue_storage_substrate.sqlawa-model/src/queue_storage.rsawa/tests/migration_test.rsawa/tests/queue_storage_runtime_test.rsdocs/security.md
a0bfff7 to
70c1120
Compare
Run the default awa v023 install through the same legacy cleanup shape that prepare_schema handles, so awa migrate can upgrade an already-prepared default substrate with regular lease_claims tables. Tighten helper input validation to the runtime's lowercase unquoted schema contract and revoke PUBLIC execute on the installer helper. Fix the non-receipts zero-deadline claim body by using v_deadline_at instead of the stale positional fallback, and add regressions for the migration-only legacy path, installer privileges, identifier validation, and zero-deadline claims. Fixes #311
70c1120 to
4ca2d2e
Compare
Summary
First PR of the #308 series: move the per-schema queue-storage DDL into a SQL helper function so the default
awasubstrate becomes migration-owned. The headline numbers:prepare_schema()shrinks from 1884 → 280 lines; v023 ships ~1300 lines of plpgsql that both the migration andprepare_schema()call into;awa migrate --sqlnow contains the complete substrate install.Follow-ons in this series:
awa.runtime_storage_backendsout ofprepare_schema()into its own migration;activate_backend()becomes INSERT/UPDATE only.awa storage prepare-queue-storage-schema --schema awa --reset; adddocs/queue-storage-substrate.mdand link from architecture / migrations / security.Design contract
The default
awa.*substrate is migration-owned and default-shaped. The helper signature:prepare_schema()callsSELECT awa.install_queue_storage_substrate($1, $2, $3, $4, $5)rather than holding parallel CREATE/ALTER strings.awa.queue_storage.install:<schema>), so CLI / Rust / Python / extracted SQL share race behavior.runtime_storage_backendsinsert/update, no storage-transition writes. The cross-schema seed ofawa.runtime_storage_backendsstays inprepare_schema()for PR 1; PR 2 hoists it to its own migration.SECURITY INVOKER, no broad grant. Runtime roles don't gain DDL through the helper.p_schema = 'awa':lease_claim_receipts != TRUEand non-default slot counts raise22023(invalid_parameter_value) with a HINT directing the operator to a custom queue-storage schema. Non-awaschemas accept any config (test/legacy paths usereceipts=false~20×).lease_claim_receipts=falseis NOT dropped from production in this PR — out of scope. It remains supported for non-awaschemas only.What
prepare_schema()keepsAfter the shrink,
prepare_schema()(now ~280 LOC) retains only the legacy upgrade edge cases that don't belong in a forward-only DDL helper:open_receipt_claimsnon-empty rejection + drop.lease_claims/lease_claim_closuresfrom non-partitioned shape (relkind'r').QueueStorageConfig).*_legacytables.queue_count_snapshotslegacy drop.awa.runtime_storage_backendsCREATE + cross-schema seed (moves out in PR 2).The outer
pg_advisory_xact_lockstays — it's needed to serialize the legacy rename (before the helper) and the legacy copy + drop (after) with the helper's CREATE of the partitioned parents.pg_advisory_xact_lockis reentrant for the same session, so the helper's inner acquire is a no-op when called fromprepare_schema().Test plan
cargo fmt --all --checkcleancargo clippy --all-targets --all-features -- -D warningscleancargo build --workspacecleancargo test -p awa-model --lib— 64/64 pass (incl. 6 migration tests)cargo test -p awa --test queue_storage_runtime_testagainst clean Postgres — 77/77 pass in 108s, including alltest_queue_terminal_live_counts*andqueue_counts*tests. (Initial run on a stale DB showed 4 transient failures in receipts=false + custom-schema tests; verified clean DB shows 77/77.)awa migrate --sql | tail -20includes the full helper definition, theSELECT awa.install_queue_storage_substrate('awa');invocation, and the schema_version v23 INSERT.Known latent quirks preserved verbatim
claim_ready_runtimehas a$6parameter reference that doesn't correspond to any of the function's 4 named parameters. This is a pre-existing latent bug onmain(line ~2210 ofawa-model/src/queue_storage.rsbefore this PR) that's never reached in practice —COALESCEshort-circuits past it wheneverp_deadline_secs > 0, and every call site passes> 0. Preserved verbatim here to keep this PR a pure refactor; will be tracked as a follow-up issue.Refs
Closes none; refs #308. PRs 2 and 3 land separately.
Summary by CodeRabbit