fix: reduce detach latency and stabilize detach path#324
fix: reduce detach latency and stabilize detach path#324ethanpailes merged 8 commits intoshell-pool:masterfrom
Conversation
757b1bd to
994ea81
Compare
|
Oh come on - formatting of a comment? Really? 😄 I'll lookup the correct formatting rules later. |
1013438 to
b689d04
Compare
b689d04 to
af4a3cd
Compare
ethanpailes
left a comment
There was a problem hiding this comment.
A good start. I left some feedback on things we can to do reuse code a bit.
libshpool/src/daemon/shell.rs
Outdated
| } | ||
|
|
||
| thread::sleep(consts::HEARTBEAT_DURATION); | ||
| let mut slept = time::Duration::ZERO; |
There was a problem hiding this comment.
Let's bundle this logic up into a helper function that takes a stop predicate function (in this case it would be parameterized with a closure that checks the stop atomic), a poll strategy enum, a sleep duration. It seems like a generically useful pattern. We could put it in libshpool/src/common.rs
The poll strategy enum can have one option for uniform polling with a given duration (which we would use here) and another for exponential backoff with the usual exponential backoff params (initial poll dur, backoff factor, max poll dur).
we could call it sleep_unless or something like that.
libshpool/src/protocol.rs
Outdated
| if sock_to_stdout_h.is_finished() { | ||
| info!("recheck: sock->stdout thread done"); | ||
| nfinished_threads += 1; | ||
| // Fast-path: when server->client already ended (detach/disconnect), |
There was a problem hiding this comment.
Lets keep the thread names consistant and call this sock->stdout
libshpool/src/protocol.rs
Outdated
| let mut stdin_done = stdin_to_sock_h.is_finished(); | ||
| let mut stdout_done = sock_to_stdout_h.is_finished(); | ||
|
|
||
| let stdin_is_tty = isatty(io::stdin()).unwrap_or(false); |
There was a problem hiding this comment.
I generally like to avoid isatty when possible since it makes it harder to predict how a tool will work when running under a script. Sometimes it is worthwhile, but in this case I don't think it is worth having divergant behavior. I don't see a reason we would need to wait around longer in a script context, so let's just always use the short timeout if the daemon hangs up on us. We should re-name the constant to avoid mentioning TTY when we do this.
libshpool/src/daemon/shell.rs
Outdated
| thread::sleep(consts::HEARTBEAT_DURATION); | ||
| let mut slept = time::Duration::ZERO; | ||
| let sleep_step = consts::JOIN_POLL_DURATION; | ||
| while slept < consts::HEARTBEAT_DURATION { |
There was a problem hiding this comment.
Rather than incrementing a counter of total duration slept, these sorts of loops should instead set a specific deadline in the future and always compare the current time against that. This avoid lock drift because thread::sleep(d) does not alwasy take exactly d. These little mismatches can add up over time. Not really a huge deal, but it doesn't hurt to be as precice as possible.
libshpool/src/daemon/shell.rs
Outdated
| } | ||
|
|
||
| let remaining = consts::HEARTBEAT_DURATION - slept; | ||
| let step = if remaining < sleep_step { remaining } else { sleep_step }; |
There was a problem hiding this comment.
lets use min the way you do below. Also, we probably don't need an explicit named variable for remaining, we can just inline it into the min params for brevity.
libshpool/src/protocol.rs
Outdated
| MAX_DETACH_WAIT_DUR | ||
| }; | ||
|
|
||
| loop { |
There was a problem hiding this comment.
This loop can switch to using the helper I suggested above, with its stop predicate computing nfinished_threads and checking if the count is >= 2.
|
Phew, that was way more work than I expected. Hopefully I haven't overlooked anything or made it more complicated than planned. And - of course - I missed one 😅 |
ethanpailes
left a comment
There was a problem hiding this comment.
Nearly there, just one last nit about the new sleep API
libshpool/src/common.rs
Outdated
| /// Poll at a fixed interval. | ||
| Uniform { interval: time::Duration }, | ||
| /// Poll with exponential backoff up to a maximum interval. | ||
| Backoff { initial_interval: time::Duration, factor: u32, max_interval: time::Duration }, |
There was a problem hiding this comment.
Let's use a float for the factor. The most common backoff factors are 2 and 1.5, and if we use an int we can't handle 1.5.
ethanpailes
left a comment
There was a problem hiding this comment.
I switched factor to a float.
Summary
Fixes #323, detach latency by making detach path responsive with faster polling and adaptive backoff in client protocol loop.
Changes
JOIN_POLL_DURATIONfrom 100ms to 10ms inlibshpool/src/consts.rsfor faster thread join pollingSHELL_TO_CLIENT_POLL_MSfrom 100 to 10 inlibshpool/src/daemon/shell.rsfor faster detach/reattach detectionlibshpool/src/daemon/shell.rslibshpool/src/protocol.rsto check stop flag during sleep intervalsBehavior
Before this fix
After this fix
Rationale
The previous polling intervals (100ms) were too conservative, causing noticeable latency in the detach operation. By reducing polling to 10ms and implementing adaptive backoff, detach becomes responsive while maintaining compatibility with existing shutdown flows and avoiding busy waits.
Tests
No tests have been added or modified.