perf(queue_storage): fillfactor=50 on leases + lease_claims partitions (#169)#315
Conversation
#169) Per #169 release decision, audit the receipt-plane mutable tables for HOT-update headroom. Two partitioned tables take UPDATEs on the hot path and were left at the default fillfactor=100: * leases — heartbeat, state-transition, callback-field UPDATEs * lease_claims — SET materialized_at = clock_timestamp() Other receipt-plane partitioned tables (ready_entries, done_entries, lease_claim_closures) have zero UPDATE call sites — verified by grep across queue_storage.rs and the migration SQL functions — so lowering their fillfactor would just waste pages. They stay at fillfactor=100. For leases the heartbeat/state-transition UPDATEs still hit indexed columns (state_hb, state_deadline, state_callback_timeout, callback) so they remain non-HOT until the companion B1 work removes the heartbeat_at write/index in receipts mode. fillfactor=50 still matters: it caps off-page tuple placement so the dead-tuple chain stays bounded under pinned-MVCC pressure. Mirrors the d21e5db / ab99a31 pattern on queue_claimer_leases / queue_claim_heads / queue_enqueue_heads. Two parts: 1. v023's install_queue_storage_substrate helper now applies the ALTERs inside the partition-creation FOR loops, so fresh installs (genuinely fresh DB, or fresh custom-schema prepare on a deploy that includes this migration in its initial migrate) land at fillfactor=50. 2. v024 covers existing deployments. It defines awa.apply_receipt_plane_fillfactor(schema TEXT) — iterates the given schema's leases/lease_claims partitions and applies the same ALTER as the helper — then calls it for the default `awa` schema and every schema registered in awa.runtime_storage_backends. ALTER TABLE SET (fillfactor=N) only changes the hint for future page allocations, so this is non-blocking even on populated tables. The helper is GRANT EXECUTE ... TO PUBLIC so operators can run it ad-hoc against a custom schema they prepare after upgrading (the rare case where the install function still cached in pg_proc has the pre-v024 body). Four regression tests in awa-model/tests/receipt_plane_fillfactor_test.rs cover: leases partitions carry the full reloptions block, lease_claims partitions carry fillfactor=50, the insert-only siblings deliberately keep the default (catches future bulk-apply mistakes), and the apply_receipt_plane_fillfactor helper covers custom schemas.
📝 WalkthroughWalkthroughThis PR adds v024 migration that defines a helper function to retroactively apply fillfactor=50 and autovacuum settings to receipt-plane leases and lease_claims partitions. The helper discovers registered schemas via the AWA sentinel function, is integrated into schema preparation, and is validated by comprehensive tests. ChangesReceipt-plane fillfactor v024 migration and application
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 115de3b5df
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| FOR v_schema IN | ||
| SELECT DISTINCT schema_name | ||
| FROM awa.runtime_storage_backends | ||
| WHERE schema_name IS NOT NULL | ||
| AND schema_name <> 'awa' | ||
| LOOP | ||
| PERFORM awa.apply_receipt_plane_fillfactor(v_schema); |
There was a problem hiding this comment.
Ensure post-upgrade prepared schemas get fillfactor
On databases that are already at v23, this migration only applies the new reloptions to awa and schemas already present in awa.runtime_storage_backends; it never replaces the existing awa.install_queue_storage_substrate function. A later awa storage prepare-queue-storage-schema --schema <new> still goes through QueueStorage::prepare_schema() and only calls awa.install_queue_storage_substrate(...) (awa-model/src/queue_storage.rs), so the old pg_proc body creates the new schema's leases_* and lease_claims_* partitions with default fillfactor=100 unless the operator remembers a separate manual helper call. That leaves the newly prepared custom schema on the same hot-update behavior this migration is meant to fix; v024 should update the installer function or have the prepare path invoke apply_receipt_plane_fillfactor after installation.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
🧹 Nitpick comments (2)
awa-model/migrations/v024_receipt_plane_fillfactor.sql (1)
42-78: ⚖️ Poor tradeoffConsider re-applying the updated
install_queue_storage_substratebody in v024 to close the post-upgrade gap.Because v023 only re-runs on fresh installs, a deployment already at v23 keeps the pre-edit function body in
pg_proc. v024 fixes existing partitions, but any custom schema prepared after the upgrade still getsfillfactor=100, leaving operators dependent on the manualapply_receipt_plane_fillfactorworkaround documented above. Re-runningCREATE OR REPLACE FUNCTION awa.install_queue_storage_substrate(...)(the updated body) inside v024 would refresh the cached definition so future prepares are correct automatically.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@awa-model/migrations/v024_receipt_plane_fillfactor.sql` around lines 42 - 78, The migration should re-apply the updated install_queue_storage_substrate function body so Postgres caches the new definition for future schema prepares: inside v024, add a CREATE OR REPLACE FUNCTION awa.install_queue_storage_substrate(...) with the same signature and the updated body (the exact implementation you updated earlier) and execute it during the migration so pg_proc is refreshed and subsequent calls to install_queue_storage_substrate produce partitions with the corrected fillfactor by default.awa-model/tests/receipt_plane_fillfactor_test.rs (1)
117-161: ⚡ Quick winThis test does not reliably exercise the helper on a fresh database.
On a fresh migrate, the v023
install_queue_storage_substratebody inpg_procalready appliesfillfactor=50inside the partition loop, so the partitions are at 50 beforeapply_receipt_plane_fillfactoris called. The assertion then passes even if the helper were a no-op — the comment on Lines 117-121 only holds for an upgraded DB carrying the stale function body. To make the test deterministic and actually validate the helper, reset the partitions to a non-50 state first, then assert the helper flips them.♻️ Reset partitions before invoking the helper
sqlx::query("SELECT awa.install_queue_storage_substrate($1)") .bind(&schema) .execute(&pool) .await .expect("install substrate via helper"); // Force partitions back to the default so the helper's effect is observable // regardless of whether the cached install body already applied fillfactor=50. sqlx::query(&format!( "DO $$ DECLARE r oid; BEGIN \ FOR r IN SELECT inhrelid FROM pg_inherits \ WHERE inhparent IN ('{schema}.leases'::regclass, '{schema}.lease_claims'::regclass) \ LOOP EXECUTE format('ALTER TABLE %s RESET (fillfactor)', r::regclass::text); END LOOP; END $$;" )) .execute(&pool) .await .expect("reset fillfactor"); sqlx::query("SELECT awa.apply_receipt_plane_fillfactor($1)") .bind(&schema) .execute(&pool) .await .expect("apply fillfactor helper");🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@awa-model/tests/receipt_plane_fillfactor_test.rs` around lines 117 - 161, The test currently can pass on a fresh DB because install_queue_storage_substrate may already set fillfactor=50, so make the test deterministic by resetting partition reloptions before calling apply_receipt_plane_fillfactor; specifically, after calling awa.install_queue_storage_substrate(schema) but before calling awa.apply_receipt_plane_fillfactor(schema), execute a SQL block that iterates pg_inherits for the schema's leases and lease_claims partitions and runs ALTER TABLE ... RESET (fillfactor) (or equivalent) to revert them to the default, then run apply_receipt_plane_fillfactor and assert that has_option sees fillfactor=50 on each partition (referencing the test's calls to install_queue_storage_substrate and apply_receipt_plane_fillfactor and the partitions / has_option check).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@awa-model/migrations/v024_receipt_plane_fillfactor.sql`:
- Around line 42-78: The migration should re-apply the updated
install_queue_storage_substrate function body so Postgres caches the new
definition for future schema prepares: inside v024, add a CREATE OR REPLACE
FUNCTION awa.install_queue_storage_substrate(...) with the same signature and
the updated body (the exact implementation you updated earlier) and execute it
during the migration so pg_proc is refreshed and subsequent calls to
install_queue_storage_substrate produce partitions with the corrected fillfactor
by default.
In `@awa-model/tests/receipt_plane_fillfactor_test.rs`:
- Around line 117-161: The test currently can pass on a fresh DB because
install_queue_storage_substrate may already set fillfactor=50, so make the test
deterministic by resetting partition reloptions before calling
apply_receipt_plane_fillfactor; specifically, after calling
awa.install_queue_storage_substrate(schema) but before calling
awa.apply_receipt_plane_fillfactor(schema), execute a SQL block that iterates
pg_inherits for the schema's leases and lease_claims partitions and runs ALTER
TABLE ... RESET (fillfactor) (or equivalent) to revert them to the default, then
run apply_receipt_plane_fillfactor and assert that has_option sees fillfactor=50
on each partition (referencing the test's calls to
install_queue_storage_substrate and apply_receipt_plane_fillfactor and the
partitions / has_option check).
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: b7a032bb-c9dd-4471-80d8-0246e60ce7e9
📒 Files selected for processing (4)
awa-model/migrations/v023_install_queue_storage_substrate.sqlawa-model/migrations/v024_receipt_plane_fillfactor.sqlawa-model/src/migrations.rsawa-model/tests/receipt_plane_fillfactor_test.rs
Three findings from the PR review: 1. QueueStorage::prepare_schema now calls apply_receipt_plane_fillfactor after install_queue_storage_substrate. This closes the "operator prepares a NEW custom schema after upgrading from an already-v23 DB" gap that the original v024 only patched for schemas already known at migration time. Tunings live in their own SQL function so future perf knobs land additively without touching the bigger install helper or duplicating it in a migration. 2. v024's DO block discovers schemas via pg_class (any partitioned `leases` + `lease_claims` pair in any namespace) instead of reading awa.runtime_storage_backends, which only carries the active backend marker. A custom schema materialized by prepare_schema but never entered into mixed_transition was previously skipped; pg_class discovery picks it up regardless of registration state. 3. apply_receipt_plane_fillfactor_helper_restores_reset_partitions (renamed from the prior helper-covers-custom-schemas test) now RESETs every partition's reloptions between install and the helper call, then asserts the helper restores them. The original test could have passed vacuously because v023's edited helper in this checkout already sets fillfactor=50. The RESET step proves the v024 helper is what actually puts them back on the upgrade path. Added prepare_schema_applies_receipt_plane_fillfactor as a separate test that exercises the Rust orchestrator path end-to-end, asserting a fresh `QueueStorage::prepare_schema()` call against a brand-new schema lands at fillfactor=50.
Two findings from the review: 1. The v024 schema sweep could alter unrelated app-owned schemas. The pg_class discovery matched any schema with partitioned `leases` and `lease_claims` parents, then apply_receipt_plane_fillfactor() altered those child partitions. Reproduced in a throwaway DB with app_owned.leases / app_owned.lease_claims: v024 changed their child reloptions from <none> to fillfactor=50,... Fix: gate on the AWA-specific `claim_ready_runtime(text, bigint, double precision, double precision)` function sentinel — the same shape `awa_model::storage::queue_storage_schema_ready` uses to identify an AWA substrate. The check now lives inside `apply_receipt_plane_fillfactor` itself (not just in the migration sweep), so calling the helper directly against a non-AWA schema is also a no-op. Switched from pg_get_function_identity_arguments (includes parameter names — brittle) to to_regprocedure with a type signature, matching the existing pattern in awa-model/src/storage.rs:147. Added apply_receipt_plane_fillfactor_skips_non_awa_schemas which constructs an app-owned lookalike schema and asserts the helper leaves it untouched. 2. The prepare_schema_applies_receipt_plane_fillfactor test could pass vacuously: on this checkout, install_queue_storage_substrate itself sets fillfactor=50 in its CREATE TABLE loops, so the test would pass even if the new prepare_schema → apply_receipt_plane_fillfactor call were removed. Renamed to prepare_schema_reapplies_receipt_plane_fillfactor_on_reprepare and restructured: prepare, RESET every partition's reloptions to simulate the pre-v024 state, sanity-check the reset, then re-prepare. install_queue_storage_substrate is idempotent — its CREATE TABLE IF NOT EXISTS branches don't re-fire on existing partitions, so only the orchestrator's explicit apply_receipt_plane_fillfactor call can restore the reloptions. Removing the hook from prepare_schema now fails the test.
…r ALTERs) Review caught that the reprepare test claimed only the orchestrator hook could restore reloptions, but `install_queue_storage_substrate` also had `ALTER TABLE ... SET (fillfactor=...)` statements inside its partition-creation loops — those run on every helper invocation, so the test would still pass if the orchestrator hook were removed. That's belt-and-suspenders with no single source of truth. The cleaner architecture: install helper does structure only; tunings live exclusively in `apply_receipt_plane_fillfactor`; the orchestrator (Rust `prepare_schema` for live code paths, the v024 sweep for fresh migrations) is what composes them. Drop the in-loop ALTER TABLE blocks from v023's install helper. v024's sweep still applies tunings to the default `awa` schema during a fresh `awa migrate`, and Rust `prepare_schema` applies them after every install for custom schemas. The reprepare test now genuinely isolates the orchestrator: install helper does nothing on re-invocation, only the explicit `apply_receipt_plane_fillfactor` call restores reloptions. Verified end-to-end with a fresh-DB migrate: v023 creates structure, v024 sweep finds `awa` via the claim_ready_runtime sentinel and applies tunings, all 16 partitions land at fillfactor=50.
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
awa-model/tests/receipt_plane_fillfactor_test.rs (1)
89-94:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winAssert autovacuum reloptions on
lease_claimspartitions too.
leases_partitions_carry_fillfactor_50checksfillfactor=50andautovacuum_vacuum_scale_factor=0.0/autovacuum_vacuum_threshold=200, butlease_claims_partitions_carry_fillfactor_50(lines 82-94) only checksfillfactor=50. The v024 helperawa.apply_receipt_plane_fillfactorapplies the same autovacuum reloptions (and also cost_limit/cost_delay) to bothleasesandlease_claimspartitions, so extending thelease_claimsassertions would prevent regressions.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@awa-model/tests/receipt_plane_fillfactor_test.rs` around lines 89 - 94, In the lease_claims_partitions_carry_fillfactor_50 test, extend the loop that currently asserts only has_option(opts, "fillfactor", "50") to also assert the autovacuum reloptions like in leases_partitions_carry_fillfactor_50: add assertions for has_option(opts, "autovacuum_vacuum_scale_factor", "0.0") and has_option(opts, "autovacuum_vacuum_threshold", "200") (and optionally the vacuum cost options if your other test checks them) so the lease_claims partitions are validated for the same autovacuum settings; use the existing has_option helper to perform these checks.
🧹 Nitpick comments (1)
awa-model/tests/receipt_plane_fillfactor_test.rs (1)
132-164: ⚡ Quick winExtract the duplicated partition-discovery / reset / count SQL into helpers.
The
leases/lease_claimspartition discovery query, the per-partitionRESET (...)loop, and thecount(... fillfactor=50 ...)sanity check are copy-pasted nearly verbatim acrossapply_receipt_plane_fillfactor_helper_restores_reset_partitions(Lines 132-189) andprepare_schema_reapplies_receipt_plane_fillfactor_on_reprepare(Lines 281-336). Consolidating them (e.g.async fn list_lease_partitions(pool, schema),async fn reset_partition_reloptions(pool, name),async fn count_partitions_with_fillfactor50(pool, schema)) would cut the duplication and keep both tests in sync as the schema evolves.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@awa-model/tests/receipt_plane_fillfactor_test.rs` around lines 132 - 164, The test duplicates partition-discovery, per-partition RESET, and a fillfactor=50 count; extract these into shared async helpers to remove copy/paste: implement async fn list_lease_partitions(pool: &Pool, schema: &str) -> Vec<String> that runs the SELECT (n.nspname || '.' || c.relname) query currently inline, async fn reset_partition_reloptions(pool: &Pool, name: &str) that executes the ALTER TABLE ... RESET (...) for a single partition, and async fn count_partitions_with_fillfactor50(pool: &Pool, schema: &str) that runs the existing COUNT(...) fillfactor=50 sanity query and returns the count; then replace the inline code in apply_receipt_plane_fillfactor_helper_restores_reset_partitions and prepare_schema_reapplies_receipt_plane_fillfactor_on_reprepare to call list_lease_partitions, loop calling reset_partition_reloptions, and use count_partitions_with_fillfactor50 for the assertion so both tests share the same logic.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@awa-model/tests/receipt_plane_fillfactor_test.rs`:
- Around line 89-94: In the lease_claims_partitions_carry_fillfactor_50 test,
extend the loop that currently asserts only has_option(opts, "fillfactor", "50")
to also assert the autovacuum reloptions like in
leases_partitions_carry_fillfactor_50: add assertions for has_option(opts,
"autovacuum_vacuum_scale_factor", "0.0") and has_option(opts,
"autovacuum_vacuum_threshold", "200") (and optionally the vacuum cost options if
your other test checks them) so the lease_claims partitions are validated for
the same autovacuum settings; use the existing has_option helper to perform
these checks.
---
Nitpick comments:
In `@awa-model/tests/receipt_plane_fillfactor_test.rs`:
- Around line 132-164: The test duplicates partition-discovery, per-partition
RESET, and a fillfactor=50 count; extract these into shared async helpers to
remove copy/paste: implement async fn list_lease_partitions(pool: &Pool, schema:
&str) -> Vec<String> that runs the SELECT (n.nspname || '.' || c.relname) query
currently inline, async fn reset_partition_reloptions(pool: &Pool, name: &str)
that executes the ALTER TABLE ... RESET (...) for a single partition, and async
fn count_partitions_with_fillfactor50(pool: &Pool, schema: &str) that runs the
existing COUNT(...) fillfactor=50 sanity query and returns the count; then
replace the inline code in
apply_receipt_plane_fillfactor_helper_restores_reset_partitions and
prepare_schema_reapplies_receipt_plane_fillfactor_on_reprepare to call
list_lease_partitions, loop calling reset_partition_reloptions, and use
count_partitions_with_fillfactor50 for the assertion so both tests share the
same logic.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 2f785618-57ac-4361-b46c-dc3a89b65ce8
📒 Files selected for processing (3)
awa-model/migrations/v024_receipt_plane_fillfactor.sqlawa-model/src/queue_storage.rsawa-model/tests/receipt_plane_fillfactor_test.rs
Summary
First HOT-update audit landing for #169 (v0.6 release gate). Two partitioned receipt-plane tables take UPDATEs on the hot path and were left at the default fillfactor=100:
leases— heartbeat, state-transition, callback-field UPDATEslease_claims—SET materialized_at = clock_timestamp()The other receipt-plane partitioned tables (
ready_entries,done_entries,lease_claim_closures) have zero UPDATE call sites — verified by grep acrossqueue_storage.rsand the migration SQL functions — so they stay at fillfactor=100 to avoid wasting pages.This PR is the orthogonal-hygiene piece. For
leases, the heartbeat / state-transition UPDATEs still hit indexed columns (state_hb,state_deadline,state_callback_timeout,callback) so they remain non-HOT until the companion B1 work removes the heartbeat write and index in receipts mode. fillfactor=50 still matters here: it caps off-page tuple placement so the dead-tuple chain stays bounded under pinned-MVCC pressure. Mirrors thed21e5db/ab99a31pattern onqueue_claimer_leases/queue_claim_heads/queue_enqueue_heads.Mechanics
awa.install_queue_storage_substratenow applies the ALTERs inside the partition-creation FOR loops, so genuinely fresh installs (new DB, or fresh custom-schema prepare on a deployment whose initial migrate already includes this PR) land at fillfactor=50.awa.apply_receipt_plane_fillfactor(schema TEXT)which iterates the given schema'sleases/lease_claimspartitions and applies the same ALTER. Migration calls it for the defaultawaschema and every schema registered inawa.runtime_storage_backends.ALTER TABLE SET (fillfactor=N)only changes the hint for future page allocations, so this is non-blocking even on populated tables.The helper is
GRANT EXECUTE ... TO PUBLICso operators can run it ad-hoc against a custom schema they prepare after upgrading (the rare case where the install function still cached inpg_prochas the pre-v024 body).Test plan
cargo fmt --allcargo clippy --workspace --all-targets --all-features -- -D warningscargo build --workspaceawa-model/tests/receipt_plane_fillfactor_test.rs:leases_partitions_carry_fillfactor_50— full reloptions block on everyleases_Nlease_claims_partitions_carry_fillfactor_50— same forlease_claims_Ninsert_only_partitions_keep_default_fillfactor—ready_entries,done_entries,lease_claim_closuresstay at the default (catches future bulk-apply mistakes)apply_receipt_plane_fillfactor_helper_covers_custom_schemas— fresh custom schema + helper call lands at fillfactor=50awa migrateagainst local DB, confirmed viapg_class.reloptionsthat all 16 partitions (8 leases + 8 lease_claims) carry the full reloptions blockFollow-ups (out of scope for this PR)
leases.heartbeat_atin receipts mode, dropidx_state_hb, COALESCE compat-view reads fromattempt_state.heartbeat_at. Removes heartbeat-driven non-HOT UPDATEs entirely.long_horizonagainst post-PR-Amainto quantify the fillfactor impact on the leases / lease_claims dead-tuple curve.Summary by CodeRabbit
Release Notes
New Features
Tests
Chores