thingbuf::mpsc::Sender hanging up for parallel try_send_ref and send / send_ref from sync thread and async tokio::task #83

@sgasse

Hi! We are using thingbuf in a performance-sensitive application for its speed and because it supports sync and async interaction on the same sender type. Recently I ran into an issue where thingbuf seems to hang our async application completely in busy loops. Profiling in the degraded state showed almost exclusively calls into thingbuf.

I was able to reproduce the issue in a minimal example. You can run several variations of it yourself by checking out this repo:
https://github.com/sgasse/thingbuf_hangup/

The initial setup (binary thingbuf_sendref) to get into the hangup was this:

  • One std::thread sends with try_send_ref in a loop every 10ms.
  • One tokio::task receives on the channel with recv_ref().await in a loop. After 10s, a delay of 10s is introduced between receive calls. This simulates badly handled backpressure from a downstream task.
  • One tokio::task starts sending with send(..).await in a loop after 20s.
  • One tokio::task logs an alive message every second.

Once the second sender becomes active, we no longer see any alive logs. Introducing logs to thingbuf shows that two threads (one tokio worker and the self-spawned thread) are both stuck in this loop in push_ref.

Here is some log output from thingbuf for the hang-up scenario, with line numbers referring to src/lib.rs. This pattern repeats indefinitely:

...
[2024-03-06T08:16:20Z DEBUG thingbuf] thread ThreadId(3) line 202: push_ref loop starts
[2024-03-06T08:16:20Z DEBUG thingbuf] thread ThreadId(2) line 294: if
[2024-03-06T08:16:20Z DEBUG thingbuf] thread ThreadId(3) line 227: idx: 77, state: 7246
[2024-03-06T08:16:20Z DEBUG thingbuf] thread ThreadId(2) line 301: tail at the end of the loop: 7757
[2024-03-06T08:16:20Z DEBUG thingbuf] thread ThreadId(3) line 286: head: 7246
[2024-03-06T08:16:20Z DEBUG thingbuf] thread ThreadId(2) line 202: push_ref loop starts
[2024-03-06T08:16:20Z DEBUG thingbuf] thread ThreadId(3) line 294: if
[2024-03-06T08:16:20Z DEBUG thingbuf] thread ThreadId(2) line 227: idx: 77, state: 7246
[2024-03-06T08:16:20Z DEBUG thingbuf] thread ThreadId(3) line 301: tail at the end of the loop: 7757
[2024-03-06T08:16:20Z DEBUG thingbuf] thread ThreadId(2) line 286: head: 7246
[2024-03-06T08:16:20Z DEBUG thingbuf] thread ThreadId(3) line 202: push_ref loop starts
[2024-03-06T08:16:20Z DEBUG thingbuf] thread ThreadId(2) line 294: if
[2024-03-06T08:16:20Z DEBUG thingbuf] thread ThreadId(3) line 227: idx: 77, state: 7246
[2024-03-06T08:16:20Z DEBUG thingbuf] thread ThreadId(2) line 301: tail at the end of the loop: 7757
[2024-03-06T08:16:20Z DEBUG thingbuf] thread ThreadId(3) line 286: head: 7246
[2024-03-06T08:16:20Z DEBUG thingbuf] thread ThreadId(2) line 202: push_ref loop starts
[2024-03-06T08:16:20Z DEBUG thingbuf] thread ThreadId(3) line 294: if
[2024-03-06T08:16:20Z DEBUG thingbuf] thread ThreadId(2) line 227: idx: 77, state: 7246
[2024-03-06T08:16:20Z DEBUG thingbuf] thread ThreadId(3) line 301: tail at the end of the loop: 7757
...

The initial setup mimics the behavior of a part of our real application. However, I varied the setup in other examples. Here are some findings:

  • Whether we use try_send_ref() (sync), send_ref().await or send(..).await (async) does not seem to matter, see examples thingbuf_sendref, thingbuf_sendref_pure, thingbuf_send_recvref and thingbuf_send_no_try_recvref, which all hang up.
  • When using recv().await instead of recv_ref().await, there is no hangup, see example thingbuf_send.
  • Switching from 1 worker thread to e.g. 3 sometimes leads to a few more logs being printed immediately after the second sender starts, but then the example also hangs up.
  • My system is x86_64-unknown-linux-gnu, but I initially found the issue on aarch64-linux-android, so I guess it does not depend on the platform.
  • The issue is reproducible with both rustc 1.76 and nightly, so it is probably not related to the compiler.
  • To check that there is nothing obviously wrong with the example setup, I replaced thingbuf::mpsc with tokio::sync::mpsc in one example, which works as expected: it still logs the alive messages and does not hang up.

I would have expected the same behavior as I see with tokio::sync::mpsc, so I guess it is a bug. But please let me know if there is a limitation that I overlooked or if I can provide further info.
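
For readers who want the topology at a glance, the following is a minimal sketch of the expected behavior only. It is not the code from the linked repo and it does not reproduce the hang: it swaps in std::sync::mpsc::sync_channel and plain threads for thingbuf and tokio so that it runs without external crates. The capacity of 8, the scaled-down timings, and the names run_scenario and Outcome are assumptions made for illustration.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::mpsc::{sync_channel, RecvTimeoutError, TrySendError};
use std::sync::Arc;
use std::thread;
use std::time::{Duration, Instant};

/// Counters collected while the scaled-down scenario runs (hypothetical type).
#[derive(Debug)]
struct Outcome {
    received: usize,
    rejected_when_full: usize,
    alive_after_second_sender: usize,
}

fn run_scenario() -> Outcome {
    // Bounded queue; the capacity is an assumption, not taken from the report.
    let (tx, rx) = sync_channel::<u64>(8);
    let stop = Arc::new(AtomicBool::new(false));
    let second_started = Arc::new(AtomicBool::new(false));
    let received = Arc::new(AtomicUsize::new(0));
    let rejected = Arc::new(AtomicUsize::new(0));
    let alive = Arc::new(AtomicUsize::new(0));

    // Sender 1: non-blocking try_send in a loop (every 10ms in the report, 1ms here).
    let s1 = {
        let (tx, stop, rejected) = (tx.clone(), stop.clone(), rejected.clone());
        thread::spawn(move || {
            while !stop.load(Ordering::Relaxed) {
                match tx.try_send(1) {
                    Ok(()) => {}
                    Err(TrySendError::Full(_)) => {
                        rejected.fetch_add(1, Ordering::Relaxed);
                    }
                    Err(TrySendError::Disconnected(_)) => break,
                }
                thread::sleep(Duration::from_millis(1));
            }
        })
    };

    // Receiver: fast at first, then slow (simulated downstream backpressure).
    let r = {
        let (stop, received) = (stop.clone(), received.clone());
        thread::spawn(move || {
            let start = Instant::now();
            while !stop.load(Ordering::Relaxed) {
                match rx.recv_timeout(Duration::from_millis(10)) {
                    Ok(_) => {
                        received.fetch_add(1, Ordering::Relaxed);
                    }
                    Err(RecvTimeoutError::Timeout) => continue,
                    Err(RecvTimeoutError::Disconnected) => break,
                }
                if start.elapsed() > Duration::from_millis(100) {
                    thread::sleep(Duration::from_millis(50));
                }
            }
            // rx is dropped here, which unblocks a sender still waiting in send().
        })
    };

    // Sender 2: starts late and uses a blocking send that waits for free capacity.
    let s2 = {
        let (stop, second_started) = (stop.clone(), second_started.clone());
        thread::spawn(move || {
            thread::sleep(Duration::from_millis(200));
            second_started.store(true, Ordering::Relaxed);
            while !stop.load(Ordering::Relaxed) {
                if tx.send(2).is_err() {
                    break;
                }
            }
        })
    };

    // Heartbeat: counts "alive" ticks once the second sender is active.
    let hb = {
        let (stop, second_started, alive) = (stop.clone(), second_started.clone(), alive.clone());
        thread::spawn(move || {
            while !stop.load(Ordering::Relaxed) {
                if second_started.load(Ordering::Relaxed) {
                    alive.fetch_add(1, Ordering::Relaxed);
                }
                thread::sleep(Duration::from_millis(10));
            }
        })
    };

    thread::sleep(Duration::from_millis(400));
    stop.store(true, Ordering::Relaxed);
    for handle in vec![s1, r, s2, hb] {
        handle.join().unwrap();
    }

    Outcome {
        received: received.load(Ordering::Relaxed),
        rejected_when_full: rejected.load(Ordering::Relaxed),
        alive_after_second_sender: alive.load(Ordering::Relaxed),
    }
}

fn main() {
    println!("{:?}", run_scenario());
}
```

In this stand-in, a full queue makes try_send return an error immediately and makes the blocking sender wait, while the heartbeat keeps ticking. That matches the behavior reported above for the tokio comparison and is what the thingbuf variants fail to show once the second sender starts.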
