Skip to content

fix(connlib): split TUN send & recv into separate threads#8117

Merged
thomaseizinger merged 2 commits into
mainfrom
fix/perf-regression
Feb 14, 2025
Merged

fix(connlib): split TUN send & recv into separate threads#8117
thomaseizinger merged 2 commits into
mainfrom
fix/perf-regression

Conversation

@thomaseizinger

@thomaseizinger thomaseizinger commented Feb 13, 2025

Copy link
Copy Markdown
Member

We appear to have caused a pretty big performance regression (~40%) in 037a2e6 (identified through git-bisect). Specifically, the regression appears to have been caused by aef411a (#7605). Weirdly enough, undoing just that on top of main doesn't fix the regression.

My hypothesis is that using the same file descriptor for read AND write interests on the same runtime causes issues because those interests are occasionally cleared (i.e. on false-positive wake-ups).

In this PR, we spawn a dedicated thread each for the sending and receiving operations of the TUN device. On unix-based systems, a TUN device is just a file descriptor and can therefore simply be copied and read & written to from different threads. Most importantly, we only construct the AsyncFd within the newly spawned thread and runtime because constructing an AsyncFd implicitly registers with the runtime active on the current thread.

As a nice benefit, this allows us to get rid of a future::select. Those are always kind of nasty because they cancel the future that wasn't ready. My original intuition was that we drop packets due to cancelled futures there but that could not be confirmed in experiments.

@vercel

vercel Bot commented Feb 13, 2025

Copy link
Copy Markdown

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
firezone ✅ Ready (Inspect) Visit Preview 💬 Add feedback Feb 14, 2025 1:16am

@jamilbk jamilbk left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested on Apple. Replicated the 40% uplift, primarily on the upload throughput.

@jamilbk

jamilbk commented Feb 13, 2025

Copy link
Copy Markdown
Member

May want to add a changelog note (I noticed customers do read these)

@thomaseizinger thomaseizinger added this pull request to the merge queue Feb 14, 2025
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks Feb 14, 2025
@thomaseizinger thomaseizinger added this pull request to the merge queue Feb 14, 2025
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks Feb 14, 2025
@thomaseizinger thomaseizinger added this pull request to the merge queue Feb 14, 2025
Merged via the queue into main with commit 10ba02e Feb 14, 2025
@thomaseizinger thomaseizinger deleted the fix/perf-regression branch February 14, 2025 05:48
github-merge-queue Bot pushed a commit that referenced this pull request Feb 17, 2025
Same as done for unix-based operation systems in #8117, we introduce a
dedicated "TUN send" thread for Windows in this PR. Not only does this
move the syscalls and copying of sending packets away from `connlib`'s
main thread but it also establishes backpressure between those threads
properly.

WinTUN does not have any ability to signal that it has space in its send
buffer. If it fails to allocate a packet for sending, it will return
`ERROR_BUFFER_OVERFLOW` [0]. We now handle this case gracefully by
suspending the send thread for 10ms and then try again. This isn't a
great way of establishing back-pressure but at least we don't have any
packet loss.

To test this, I temporarily lowered the ring buffer size and ran a speed
test. In that, I could confirm that `ERROR_BUFFER_OVERFLOW` is indeed
emitted and handled as intended.

[0]: https://git.zx2c4.com/wintun/tree/api/session.c#n267
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants