Stabilize striped queue-storage runtime claims#215
Conversation
|
Warning Rate limit exceeded
To keep reviews running without waiting, you can enable usage-based add-on for your organization. This allows additional reviews beyond the hourly cap. Account admins can enable it under billing. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (13)
📝 WalkthroughWalkthroughA queue storage refactoring exposes striping configuration through the Python API: ChangesQueue Striping Configuration & Implementation
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Review rate limit: 0/1 reviews remaining, refill in 31 minutes and 9 seconds.Comment |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 624574e689
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| self.claim_runtime_batch_with_aging_physical( | ||
| pool, | ||
| stripe_queue, | ||
| remaining, | ||
| deadline_duration, | ||
| aging_interval, | ||
| ) | ||
| .await?, |
There was a problem hiding this comment.
Preserve already-claimed jobs on later stripe failures
In the striped path (queue_stripe_count > 1), each call to claim_runtime_batch_with_aging_physical commits its own transaction before returning, but the loop still uses ? for later stripes. If stripe N succeeds and stripe N+1 hits a transient DB error (e.g., deadlock/timeout), this function returns Err after some jobs are already moved to running, and the caller never receives those claimed jobs to execute. That can strand work until rescue/deadline logic runs, which is a correctness and latency regression compared with the previous all-in-one transaction behavior.
Useful? React with 👍 / 👎.
624574e to
c991838
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 643df26a-12e7-4ca1-8132-f79fe8c68c65
📒 Files selected for processing (7)
awa-model/src/queue_storage.rsawa-python/python/awa/_awa.pyiawa-python/python/awa/client.pyawa-python/src/client.rsawa-python/tests/test_start_config.pyawa/tests/queue_storage_runtime_test.rsdocs/configuration.md
c991838 to
3aabb90
Compare
Summary
queue_storage_queue_stripe_countthrough the Python start API and docs.Benchmark evidence
256 workers, depth-target producer, copy producer path, clean phase medians:
The first attempted 2-stripe run on main produced
queue_storagedeadlocks and then pool starvation; this branch completed the same 2-stripe shape cleanly.The local-started controller is still a benchmark-harness experiment and is not part of this PR.
Tests