Skip to content

refactor(connlib): don't re-implement waker for TUN thread#7944

Merged
thomaseizinger merged 2 commits into
mainfrom
refactor/no-reimplement-channel-waker
Jan 29, 2025
Merged

refactor(connlib): don't re-implement waker for TUN thread#7944
thomaseizinger merged 2 commits into
mainfrom
refactor/no-reimplement-channel-waker

Conversation

@thomaseizinger
Copy link
Copy Markdown
Member

Within connlib - on UNIX platforms - we have dedicated threads that read from and write to the TUN device. These threads are connected with connlib's main thread via bounded channels: one in each direction. When these channels are full, connlib's main thread will suspend and not read any network packets from the sockets in order to maintain back-pressure. Reading more packets from the socket would mean most likely sending more packets out the TUN device.

When debugging #7763, it became apparent that something must be wrong with these threads and that somehow, we either consider them as full or aren't emptying them and as a result, we don't read any network packets from our sockets.

To maintain back-pressure here, we currently use our own AtomicWaker construct that is shared with the TUN thread(s). This is unnecessary. We can also directly convert the flume::Sender into a flume::async::SendSink and therefore directly access a poll interface.

@vercel
Copy link
Copy Markdown

vercel Bot commented Jan 29, 2025

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
firezone ✅ Ready (Inspect) Visit Preview 💬 Add feedback Jan 29, 2025 1:10pm

Copy link
Copy Markdown
Member

@jamilbk jamilbk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anything to remove complexity LGTM. I think the hunch about not reading network packets because we can't write to the TUN device makes sense.

I guess the key question is why can't we write to the TUN device. If it's due to an OS error or something outside of our control, I guess we'd have to consider adding logic that detects a "stuck" channel and rolls over the file descriptor?

That wouldn't be an option on Apple though...

@thomaseizinger thomaseizinger added this pull request to the merge queue Jan 29, 2025
Merged via the queue into main with commit 8bd8098 Jan 29, 2025
@thomaseizinger thomaseizinger deleted the refactor/no-reimplement-channel-waker branch January 29, 2025 16:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants