Stabilize striped queue-storage runtime claims#215

Merged: hardbyte merged 1 commit into main from brian/queue-stripe-claim-transactions on May 3, 2026

hardbyte (Owner) commented May 3, 2026

Summary

  • Claim each physical queue stripe in its own transaction for striped queue-storage runtime claims.
  • Extend the TLA+ lock-order model with striped enqueue and old multi-stripe claim transaction coverage.
  • Add a concurrent striped enqueue/claim regression test.
  • Expose queue_storage_queue_stripe_count through the Python start API and docs.
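
The first bullet can be pictured with a small in-memory analogy. This is only a sketch under stated assumptions: `Stripe`, `claim`, and `claim_across_stripes` are hypothetical names, and a `Mutex` stands in for the row locks a database transaction would hold. The point it illustrates is that each stripe's lock is taken and released before the next stripe is touched, so a claimer never holds locks on two stripes at once.

```rust
use std::collections::VecDeque;
use std::sync::Mutex;

/// Hypothetical in-memory stand-in for one physical queue stripe.
pub struct Stripe {
    ready: Mutex<VecDeque<u64>>,
}

impl Stripe {
    pub fn new(jobs: Vec<u64>) -> Self {
        Stripe { ready: Mutex::new(jobs.into_iter().collect()) }
    }

    /// Claim up to `limit` jobs while holding only this stripe's lock.
    /// Dropping the guard on return plays the role of COMMIT.
    fn claim(&self, limit: usize) -> Vec<u64> {
        let mut ready = self.ready.lock().unwrap();
        let n = limit.min(ready.len());
        let taken: Vec<u64> = ready.drain(..n).collect();
        taken
    }
}

/// Visit stripes one at a time; no two stripe locks are ever held together.
pub fn claim_across_stripes(stripes: &[Stripe], max_jobs: usize) -> Vec<u64> {
    let mut claimed = Vec::with_capacity(max_jobs);
    for stripe in stripes {
        let remaining = max_jobs - claimed.len();
        if remaining == 0 {
            break;
        }
        claimed.extend(stripe.claim(remaining));
    }
    claimed
}
```

Because no lock is held across stripes, two claimers (or a claimer and an enqueuer) cannot form a wait-for cycle over stripe locks in this model. The trade-off is that the batch is no longer claimed atomically across stripes.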

Benchmark evidence

256 workers, depth-target producer, copy producer path, clean phase medians:

| Run | Throughput/s | Depth | p95 ms | p99 ms |
| --- | --- | --- | --- | --- |
| Old main, 1 stripe, DB depth 4000 | 20,588 | 5,532 | 529 | 627 |
| This branch, 1 stripe, DB depth 4000 | 18,734 | 6,960 | 557 | 647 |
| This branch, 2 stripes, DB depth 4000 | 18,790 | 5,748 | 497 | 594 |
| Main + harness local-started controller, 1 stripe, depth 1500 | 16,802 | 1,180 | 116 | 269 |
| This branch + harness local-started controller, 2 stripes, depth 1500 | 17,440 | 1,244 | 105 | 143 |

The first attempted 2-stripe run on main produced queue_storage deadlocks and then pool starvation; this branch completed the same 2-stripe shape cleanly.

The local-started controller is still a benchmark-harness experiment and is not part of this PR.

Tests

cargo check -p awa-model -p awa-worker

PYO3_PYTHON="/Users/brian/.local/share/uv/python/cpython-3.14.3-macos-aarch64-none/bin/python" \
  cargo check --manifest-path awa-python/Cargo.toml

cargo test -p awa --test queue_storage_runtime_test striped -- --nocapture

uv run pytest tests/test_start_config.py -k 'claim_ring_knobs_validate'

./correctness/run-tlc.sh \
  storage/AwaStorageLockOrder.tla \
  storage/AwaStorageLockOrder.cfg

./correctness/run-tlc.sh \
  storage/AwaStorageLockOrder.tla \
  storage/AwaStorageLockOrderDeadlockDemo.cfg
# expected: NoDeadlock violation

./correctness/run-tlc.sh \
  storage/AwaStorageLockOrder.tla \
  storage/AwaStorageLockOrderOldStripedClaimDeadlock.cfg
# expected: NoDeadlock violation

coderabbitai Bot commented May 3, 2026

📥 Commits

Reviewing files that changed from the base of the PR and between 624574e and 3aabb90.

📒 Files selected for processing (13)
  • awa-model/src/queue_storage.rs
  • awa-python/python/awa/_awa.pyi
  • awa-python/python/awa/client.py
  • awa-python/src/client.rs
  • awa-python/tests/test_start_config.py
  • awa/tests/queue_storage_runtime_test.rs
  • correctness/README.md
  • correctness/storage/AwaStorageLockOrder.cfg
  • correctness/storage/AwaStorageLockOrder.tla
  • correctness/storage/AwaStorageLockOrderDeadlockDemo.cfg
  • correctness/storage/AwaStorageLockOrderOldStripedClaimDeadlock.cfg
  • correctness/storage/MAPPING.md
  • docs/configuration.md
Walkthrough

This PR refactors queue-storage claiming and exposes striping configuration through the Python API. claim_runtime_batch_with_aging now fans out across physical stripe queues only when more than one stripe is configured, and the per-queue claim logic moves into a helper. The new queue_storage_queue_stripe_count parameter flows from the Python bindings through the Rust layers into the queue-storage configuration, with validation and a test covering concurrent striped claim and enqueue workloads.

Changes

Queue Striping Configuration & Implementation

| Layer / File(s) | Summary |
| --- | --- |
| Core Implementation: awa-model/src/queue_storage.rs | claim_runtime_batch_with_aging refactored to fan out across striped physical queues only when multiple stripes exist; per-physical-queue claim logic moved to new helper claim_runtime_batch_with_aging_physical. |
| Python API Surface: awa-python/python/awa/_awa.pyi, awa-python/python/awa/client.py | Added queue_storage_queue_stripe_count: int = 1 parameter to Client.start() with corresponding default constant DEFAULT_QUEUE_STORAGE_QUEUE_STRIPE_COUNT. |
| Rust Binding Layer: awa-python/src/client.rs | PyClient.start() wires queue_storage_queue_stripe_count into QueueStorageConfig, with validation rejecting values ≤ 0. |
| Integration Tests: awa/tests/queue_storage_runtime_test.rs, awa-python/tests/test_start_config.py | Added test_queue_storage_striped_runtime_claims_do_not_deadlock_with_enqueues to verify that concurrent striped enqueue/claim operations don't deadlock, and added a validation test case for stripe-count bounds checking. |
| Documentation: docs/configuration.md | Updated Python configuration example and "knobs mean" table to include and clarify queue_storage_queue_stripe_count alongside queue_stripe_count. |
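
The stripe-count validation described for the binding layer (rejecting values ≤ 0) might look roughly like the following. This is an illustrative sketch only: `validate_stripe_count` and its error strings are hypothetical and are not taken from awa-python/src/client.rs.

```rust
use std::convert::TryFrom;

/// Hypothetical validator: accept a signed integer as it might arrive from
/// Python and return a usable stripe count, rejecting zero and negatives.
pub fn validate_stripe_count(raw: i64) -> Result<usize, String> {
    if raw <= 0 {
        return Err(format!(
            "queue_storage_queue_stripe_count must be >= 1, got {}",
            raw
        ));
    }
    // Guard against values that do not fit in usize on this platform.
    usize::try_from(raw)
        .map_err(|_| format!("queue_storage_queue_stripe_count too large: {}", raw))
}
```

Validating at the binding boundary keeps an invalid count from ever reaching the queue-storage configuration, where a zero stripe count would leave no physical queue to claim from.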

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • Make queue claimer cap configurable #214 — Both PRs modify the runtime claim path; the current PR refactors claim_runtime_batch_with_aging into per-physical-queue helpers, while that PR changes callers to pass configurable parameters into the claim method, creating a direct code-level dependency.
  • Reduce claimer heartbeat churn #213 — Both PRs edit awa-model/src/queue_storage.rs's runtime claiming logic; this PR extracts per-physical-queue helpers for striped handling, while that PR modifies claimer lease and active-marker invocation, sharing the same control-flow region.

🚥 Pre-merge checks | ✅ 5 passed

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately reflects the primary change: stabilizing striped queue-storage runtime claims through transaction isolation per physical queue stripe. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 80.00%, which meets the required threshold of 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Description Check | ✅ Passed | Check skipped: CodeRabbit's high-level summary is enabled. |

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 624574e689

Comment thread awa-model/src/queue_storage.rs Outdated
Comment on lines +5123 to +5130
    self.claim_runtime_batch_with_aging_physical(
        pool,
        stripe_queue,
        remaining,
        deadline_duration,
        aging_interval,
    )
    .await?,
[P1] Preserve already-claimed jobs on later stripe failures

In the striped path (queue_stripe_count > 1), each call to claim_runtime_batch_with_aging_physical commits its own transaction before returning, but the loop still uses ? for later stripes. If stripe N succeeds and stripe N+1 hits a transient DB error (e.g., deadlock/timeout), this function returns Err after some jobs are already moved to running, and the caller never receives those claimed jobs to execute. That can strand work until rescue/deadline logic runs, which is a correctness and latency regression compared with the previous all-in-one transaction behavior.
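
One way to address this concern is to return whatever has already been claimed when a later stripe fails, and to surface the error only when nothing was claimed. The sketch below is a synchronous, simplified illustration of that idea, not the code from this PR: `claim_preserving_partial` and the `claim_stripe` callback are hypothetical, and the page does not show how the final revision handles this case.

```rust
/// Hypothetical helper: claim from each stripe in turn via `claim_stripe`
/// (arguments: stripe index, remaining capacity). Each successful call is
/// assumed to have committed already, so its jobs must not be dropped.
pub fn claim_preserving_partial<J, E>(
    stripe_count: usize,
    max_jobs: usize,
    mut claim_stripe: impl FnMut(usize, usize) -> Result<Vec<J>, E>,
) -> Result<Vec<J>, E> {
    let mut claimed: Vec<J> = Vec::new();
    for stripe in 0..stripe_count {
        let remaining = max_jobs.saturating_sub(claimed.len());
        if remaining == 0 {
            break;
        }
        match claim_stripe(stripe, remaining) {
            Ok(jobs) => claimed.extend(jobs),
            Err(err) => {
                // Nothing committed yet: safe to report the failure.
                if claimed.is_empty() {
                    return Err(err);
                }
                // Earlier stripes already committed: hand those jobs to the
                // caller and stop; a real implementation would log `err`.
                break;
            }
        }
    }
    Ok(claimed)
}
```

With this shape, a transient error on stripe N+1 costs only that stripe's share of the batch for this poll; the jobs from stripes 1..N still reach the caller rather than waiting for rescue or deadline logic.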

@hardbyte hardbyte force-pushed the brian/queue-stripe-claim-transactions branch from 624574e to c991838 Compare May 3, 2026 05:11
@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1


📥 Commits

Reviewing files that changed from the base of the PR and between 8ebcbc5 and 624574e.

📒 Files selected for processing (7)
  • awa-model/src/queue_storage.rs
  • awa-python/python/awa/_awa.pyi
  • awa-python/python/awa/client.py
  • awa-python/src/client.rs
  • awa-python/tests/test_start_config.py
  • awa/tests/queue_storage_runtime_test.rs
  • docs/configuration.md

Comment thread awa-model/src/queue_storage.rs
@hardbyte hardbyte force-pushed the brian/queue-stripe-claim-transactions branch from c991838 to 3aabb90 Compare May 3, 2026 05:37
@hardbyte hardbyte merged commit d930e72 into main May 3, 2026
24 of 25 checks passed
@hardbyte hardbyte deleted the brian/queue-stripe-claim-transactions branch May 3, 2026 05:56