Hi! We are using thingbuf in a performance-sensitive application for its speed and because it supports sync and async interaction on the same sender type. Recently I ran into an issue where thingbuf seems to hang our async application completely in busy loops: profiling in the degraded state showed almost nothing but calls into thingbuf.
I was able to reproduce the issue in a minimal example. You can run several variations of it yourself by checking out this repo:
https://github.com/sgasse/thingbuf_hangup/
The initial setup (binary `thingbuf_sendref`) to get into the hangup was this (a condensed sketch follows below the list):
- One `std::thread` sends with `try_send_ref` in a loop every 10ms.
- One `tokio::task` receives on the channel with `recv_ref().await` in a loop. After 10s, a delay of 10s is introduced between receive calls. This simulates badly handled backpressure from a downstream task.
- One `tokio::task` starts sending with `send(..).await` in a loop after 20s.
- One `tokio::task` logs an alive message every second.
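For reference, here is a condensed, hedged sketch of that setup. It is not the exact code from the linked repo; the channel capacity, the `String` payload, and how long the `RecvRef` is held are assumptions for illustration only.

```rust
use std::time::{Duration, Instant};

use thingbuf::mpsc::{Receiver, Sender};

#[tokio::main(flavor = "multi_thread", worker_threads = 1)]
async fn main() {
    // Capacity and payload type are assumptions for illustration.
    let (tx, rx): (Sender<String>, Receiver<String>) = thingbuf::mpsc::channel(64);

    // 1) std::thread sending with try_send_ref in a loop every 10ms.
    let sync_tx = tx.clone();
    std::thread::spawn(move || loop {
        if let Ok(mut slot) = sync_tx.try_send_ref() {
            slot.push_str("from sync thread");
        }
        std::thread::sleep(Duration::from_millis(10));
    });

    // 2) tokio task receiving with recv_ref().await; after 10s, an extra 10s
    //    delay between receive calls simulates bad downstream backpressure.
    tokio::spawn(async move {
        let start = Instant::now();
        while let Some(slot) = rx.recv_ref().await {
            let _len = slot.len();
            drop(slot); // whether the ref is released before the delay is an assumption
            if start.elapsed() > Duration::from_secs(10) {
                tokio::time::sleep(Duration::from_secs(10)).await;
            }
        }
    });

    // 3) tokio task that starts sending with send(..).await after 20s.
    let async_tx = tx.clone();
    tokio::spawn(async move {
        tokio::time::sleep(Duration::from_secs(20)).await;
        loop {
            let _ = async_tx.send(String::from("from async task")).await;
        }
    });

    // 4) Alive logger; once the hang occurs, this stops printing.
    loop {
        tokio::time::sleep(Duration::from_secs(1)).await;
        println!("alive");
    }
}
```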
Once the second sender becomes active, we no longer see any alive logs. Adding debug logs to thingbuf shows that two threads (one tokio worker and the self-spawned thread) are both stuck in the `push_ref` loop.
Here is some log output from the hang-up scenario (the line numbers refer to src/lib.rs of thingbuf); this pattern repeats indefinitely:
...
[2024-03-06T08:16:20Z DEBUG thingbuf] thread ThreadId(3) line 202: push_ref loop starts
[2024-03-06T08:16:20Z DEBUG thingbuf] thread ThreadId(2) line 294: if
[2024-03-06T08:16:20Z DEBUG thingbuf] thread ThreadId(3) line 227: idx: 77, state: 7246
[2024-03-06T08:16:20Z DEBUG thingbuf] thread ThreadId(2) line 301: tail at the end of the loop: 7757
[2024-03-06T08:16:20Z DEBUG thingbuf] thread ThreadId(3) line 286: head: 7246
[2024-03-06T08:16:20Z DEBUG thingbuf] thread ThreadId(2) line 202: push_ref loop starts
[2024-03-06T08:16:20Z DEBUG thingbuf] thread ThreadId(3) line 294: if
[2024-03-06T08:16:20Z DEBUG thingbuf] thread ThreadId(2) line 227: idx: 77, state: 7246
[2024-03-06T08:16:20Z DEBUG thingbuf] thread ThreadId(3) line 301: tail at the end of the loop: 7757
[2024-03-06T08:16:20Z DEBUG thingbuf] thread ThreadId(2) line 286: head: 7246
[2024-03-06T08:16:20Z DEBUG thingbuf] thread ThreadId(3) line 202: push_ref loop starts
[2024-03-06T08:16:20Z DEBUG thingbuf] thread ThreadId(2) line 294: if
[2024-03-06T08:16:20Z DEBUG thingbuf] thread ThreadId(3) line 227: idx: 77, state: 7246
[2024-03-06T08:16:20Z DEBUG thingbuf] thread ThreadId(2) line 301: tail at the end of the loop: 7757
[2024-03-06T08:16:20Z DEBUG thingbuf] thread ThreadId(3) line 286: head: 7246
[2024-03-06T08:16:20Z DEBUG thingbuf] thread ThreadId(2) line 202: push_ref loop starts
[2024-03-06T08:16:20Z DEBUG thingbuf] thread ThreadId(3) line 294: if
[2024-03-06T08:16:20Z DEBUG thingbuf] thread ThreadId(2) line 227: idx: 77, state: 7246
[2024-03-06T08:16:20Z DEBUG thingbuf] thread ThreadId(3) line 301: tail at the end of the loop: 7757
...
The initial setup mimics the behavior of a part of our real application. However, I also varied the setup in other examples; here are some findings:
- Whether we send with `try_send_ref()` (sync), `send_ref().await`, or `send(..).await` (async) does not seem to matter; see the examples `thingbuf_sendref`, `thingbuf_sendref_pure`, `thingbuf_send_recvref`, and `thingbuf_send_no_try_recvref`, which all hang up.
- When using `recv().await` instead of `recv_ref().await`, there is no hangup; see the example `thingbuf_send` (a sketch of this variant follows the list).
- Switching from 1 worker thread to e.g. 3 sometimes leads to a few more logs printed immediately after the second sender starts, but then the example also hangs up.
- My system is `x86_64-unknown-linux-gnu`, but I initially found the issue on `aarch64-linux-android`, so I guess it does not depend on the platform.
- The issue is reproducible with both `rustc` 1.76 and nightly, so it is probably not related to the compiler.
- To check that there is nothing obviously wrong with the example setup, I replaced `thingbuf::mpsc` with `tokio::mpsc` in one example, which works as expected: it still logs the alive messages and does not hang up.
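For clarity, this is roughly how the receiver task differs in the non-hanging variant: receiving owned values with `recv().await` instead of holding a `RecvRef`. This is a hedged sketch assuming the same channel and delay logic as in the setup sketch above (it replaces step 2 there); `rx` is the `thingbuf::mpsc::Receiver<String>` from that sketch.

```rust
// Receiver variant in the spirit of thingbuf_send: recv() yields an owned
// String, so no slot reference is held while the artificial delay runs.
tokio::spawn(async move {
    let start = std::time::Instant::now();
    while let Some(msg) = rx.recv().await {
        let _len = msg.len();
        if start.elapsed() > std::time::Duration::from_secs(10) {
            tokio::time::sleep(std::time::Duration::from_secs(10)).await;
        }
    }
});
```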
I would have expected the same behavior as I see with `tokio::mpsc`, so I guess this is a bug. But please let me know if there is a limitation I overlooked or if I can provide further info.