Minimize awaiting in the primary qorb pool's tokio::select #118
Conversation
src/slot.rs
Outdated
| name = "Slots::claim", | ||
| fields(name = ?self.name), | ||
| )] | ||
| fn claim(&mut self, id: ClaimId) -> Result<claim::Handle<Conn>, Error> { |
This function has basically been moved from the SetWorker to this new struct, Slots
And, the whole chain of awaits has been unraveled because this used to be a message sent to the SetWorker actor, but is now done by just locking the Slots?
The SlotSet used to have an async fn claim method that used a oneshot to communicate with the SetWorker. The worker then called a synchronous claim function and returned the result.
This new structure avoids the message passing to the SetWorker by pulling out this Slots structure. Now there's no message passing -- as you said, we're just locking Slots and pulling the connection out of it.
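To make the shape change concrete, here's a rough before/after sketch with simplified stand-in types (these are not qorb's actual definitions):

```rust
use std::sync::{Arc, Mutex};
use tokio::sync::{mpsc, oneshot};

// Simplified stand-ins for qorb's real types, just to show the shape change.
struct ClaimId(u64);
struct Handle;
struct Error;
struct Slots;

impl Slots {
    fn claim(&mut self, _id: ClaimId) -> Result<Handle, Error> {
        // ... synchronously take a connected, unclaimed slot ...
        Ok(Handle)
    }
}

// Before: `claim` was async, sending a request to the worker task and
// awaiting its reply over a oneshot channel.
async fn claim_via_worker(
    tx: &mpsc::Sender<(ClaimId, oneshot::Sender<Result<Handle, Error>>)>,
    id: ClaimId,
) -> Result<Handle, Error> {
    let (reply_tx, reply_rx) = oneshot::channel();
    tx.send((id, reply_tx)).await.map_err(|_| Error)?;
    reply_rx.await.map_err(|_| Error)?
}

// After: the caller locks the shared Slots and claims directly -- no
// message passing, no await.
fn claim_directly(slots: &Arc<Mutex<Slots>>, id: ClaimId) -> Result<Handle, Error> {
    slots.lock().unwrap().claim(id)
}
```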
| name = "SetWorker::create_slot" | ||
| name = "Slots::take_connected_unclaimed_slot" | ||
| )] | ||
| fn take_connected_unclaimed_slot( |
This function has basically been moved from the SetWorker to this new struct, Slots
```rust
        self.conform_slot_count();
    }

    fn conform_slot_count(&mut self) {
```
This function has basically been moved from the SetWorker to this new struct, Slots
```rust
// Provides direct access to all underlying slots
//
// Shared by both a [`SetWorker`] and [`Set`]
struct Slots<Conn: Connection> {
```
This is the crux of this change: we move this data out of the SetWorker into a std::sync::Mutex-wrapped object, so callers can access it from the pool without using SetRequests and message-passing.
This means we can avoid await-ing from the pool.
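Roughly, the sharing looks like this -- a hypothetical, simplified layout (the field contents are placeholders) just to illustrate who holds the lock:

```rust
use std::sync::{Arc, Mutex};

// Simplified placeholder; the real Slots holds slot state, the wanted count, etc.
struct Slots {}

// Pool-facing handle: claims synchronously by locking the shared Slots.
struct Set {
    slots: Arc<Mutex<Slots>>,
}

// Background worker: resizes the set and recycles returned slots through the
// same shared Slots, instead of receiving SetRequest messages for claims.
struct SetWorker {
    slots: Arc<Mutex<Slots>>,
}

fn new_set() -> (Set, SetWorker) {
    let slots = Arc::new(Mutex::new(Slots {}));
    (Set { slots: Arc::clone(&slots) }, SetWorker { slots })
}
```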
```diff
             Some(Request::Claim { id, tx }) => {
-                self.claim_or_enqueue(id, tx).await
+                self.claim_or_enqueue(id, tx)
             }
```
This is "why" we're making this change -- so we don't need to await in the select arms, to reduce the risk of futurelock.
hawkw left a comment
Overall, I think making this synchronous is the right move. However, I did note that we may be able to get away with using a read-write lock around the Slots instead of a Mutex, so that multiple claimants can claim slots concurrently, and the entire thing only needs to be locked when changing the number of slots. I think that would make claiming slots less of a bottleneck. Does that make sense to you?
src/slot.rs
Outdated
| name = "Slots::take_connected_unclaimed_slot" | ||
| )] | ||
| fn take_connected_unclaimed_slot( | ||
| &mut self, |
Why does this borrow self mutably? It looks like each slot has its own mutex, and this function just iterates over them immutably and locks each slot. At a glance, I think that this should be able to take &self.
If we can make that change, I think that Slots::claim could be changed to take &self. Then, we could change the Mutex around the Slots to be an RwLock, allowing concurrent calls to claim, and making it so that only set_wanted_count needs a write lock. That way, we might be able to get away with locking the whole thing only when we are changing the number of slots, and not when we're claiming them.
I think this should work, but I may be overlooking a place where we mutate the whole Slots in the claim path...?
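A rough sketch of what that suggestion could look like -- take_connected_unclaimed_slot and set_wanted_count are names from the PR, but the field layout and return types here are simplified/hypothetical:

```rust
use std::sync::{Arc, Mutex, RwLock};

struct Slot {
    connected: bool,
    claimed: bool,
}

struct Slots {
    // Each slot has its own lock, so scanning them needs only `&Slots`.
    slots: Vec<Mutex<Slot>>,
    wanted_count: usize,
}

impl Slots {
    // Shared borrow: we only lock individual slots while scanning.
    fn take_connected_unclaimed_slot(&self) -> Option<usize> {
        self.slots.iter().position(|s| {
            let mut slot = s.lock().unwrap();
            if slot.connected && !slot.claimed {
                slot.claimed = true;
                true
            } else {
                false
            }
        })
    }
}

struct Set {
    slots: Arc<RwLock<Slots>>,
}

impl Set {
    fn claim(&self) -> Option<usize> {
        // Read lock: many claimants could do this concurrently.
        self.slots.read().unwrap().take_connected_unclaimed_slot()
    }

    fn set_wanted_count(&self, count: usize) {
        // Write lock: only resizing needs to exclude everyone else.
        self.slots.write().unwrap().wanted_count = count;
    }
}
```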
You're definitely right that we can change &mut self to &self -- and I'll make that change now. On the RwLock note -- I think it's technically true that we could make it an RwLock to allow multiple readers to claim concurrently.
However, the pool is definitely serializing these incoming claim requests, as it's checking each backend. I think this cost isn't terrible -- especially after this PR, it's basically doing quick synchronous work rather than blocking on any I/O -- but I'd want to do some benchmarking before moving to an RwLock, and have a compelling way to actually call claim concurrently. Since the pool is the only consumer, and it's not calling claim concurrently on any SlotSets, using an RwLock instead of a Mutex seems like it would just add some mild overhead without (yet) conferring an advantage.
Are you okay if I defer using the RwLock? I'm not fundamentally opposed to it; I just want to be cautious about introducing a change like that without a good higher-level justification, and without a more careful analysis of, e.g., whether writers could starve readers.
Oh, I misunderstood how this works; I thought that consumers were directly calling claim and that this might happen in parallel. If the only consumer of the Slots type is the pool itself, what is the mutex around it doing here?
The only external consumer in this case is the "pool", which is calling claim directly. It's true that this could happen in parallel given the API of slot.rs, but the structure of pool.rs means that it currently won't be called in parallel. That's what I meant by the comment above -- an RwLock has the potential to improve things, but we'd need to rewrite pool.rs to make that actually useful.
There is an internal consumer of "all slots" - the SetWorker also has an Arc<Mutex<Slots>>, so it can:
- Change the total number of slots
- Recycle slots that get dropped and returned to qorb (this basically starts the recycling process; the actual I/O is managed by the slot itself).
Additionally, there's a per-slot task, spawned in create_slot, which locks individual slots, and does the actual work of recycling/health checking.
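A hypothetical, simplified sketch of that per-slot task (names and fields here are placeholders): it briefly checks state under its own slot's lock and does the slow I/O outside of it, never touching the outer Slots lock.

```rust
use std::sync::{Arc, Mutex};
use std::time::Duration;

struct Slot {
    needs_recycle: bool,
}

fn spawn_slot_task(slot: Arc<Mutex<Slot>>) -> tokio::task::JoinHandle<()> {
    tokio::spawn(async move {
        let mut tick = tokio::time::interval(Duration::from_secs(5));
        loop {
            tick.tick().await;
            // Briefly check state under the per-slot lock...
            let should_recycle = slot.lock().unwrap().needs_recycle;
            if should_recycle {
                // ...then do the slow work (reconnect, health check) without
                // holding it, and record the result afterwards.
                slot.lock().unwrap().needs_recycle = false;
            }
        }
    })
}
```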
Yeah, that makes sense, thanks for the explanation. I think this change makes sense to me as-is, and I agree that it's better to keep it minimal. I do wonder about the potential opportunities for additional refactoring, but I think that's best done separately.
I filed #120 to keep track of this
This PR does not fully eliminate "await from tokio::select", nor does it fully eliminate the use of borrowed futures in tokio::select. However, for the main Pool worker, it avoids await-ing for accessing claims and rebalancing.