Skip to content

fix(connlib): send all unwritten packets before reading new ones#7342

Merged
thomaseizinger merged 1 commit intomainfrom
chore/reserve-before-send
Nov 14, 2024
Merged

fix(connlib): send all unwritten packets before reading new ones#7342
thomaseizinger merged 1 commit intomainfrom
chore/reserve-before-send

Conversation

@thomaseizinger
Copy link
Member

@thomaseizinger thomaseizinger commented Nov 14, 2024

With the parallelisation of TUN and UDP operations, we lost backpressure: Packets can now be read quicker from the UDP sockets than they can be sent out the TUN device, causing packet loss in extremely high-throughput situations.

To avoid this, we don't directly send packets into the channel to the TUN device thread. This channel is bounded, meaning sending can fail if reading UDP packets is faster than writing packets to the TUN device.

Due to GRO, we may read multiple UDP packets in one go, requiring us to write multiple IP packets to the TUN device as part of a single iteration in the event-loop. Thus, we cannot know, how much space we need in the channel for outgoing IP packets.

By introducing a dedicated buffer, we can temporarily hold on to all of these packets and on the next call to poll, we flush them out into the channel. If the channel is full, we will suspend and only continue once there is space in the channel. This behaviour restores backpressue because we won't read UDP packets from the socket unless we have space to write the corresponding packet to the TUN device.

UDP itself actually doesn't have any backpressure, instead the packets will simply get dropped once the receive buffer overflows. The UDP packets however carry encrypted IP packets, meaning whatever protocol sits inside these packets will detect the packet loss and should throttle their sending-pace accordingly.

@vercel
Copy link

vercel bot commented Nov 14, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
firezone ✅ Ready (Inspect) Visit Preview 💬 Add feedback Nov 14, 2024 5:04am

@sentry
Copy link

sentry bot commented Nov 14, 2024

Sentry Issue: GATEWAY-1

jamilbk
jamilbk previously approved these changes Nov 14, 2024
Copy link
Member

@jamilbk jamilbk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a dangerous change? I.e. is it possible this conditional will never evaluate true for some edge case?

If that happens would all packet processing basically stop?

@thomaseizinger thomaseizinger added this pull request to the merge queue Nov 14, 2024
@thomaseizinger
Copy link
Member Author

Is this a dangerous change? I.e. is it possible this conditional will never evaluate true for some edge case?

If that happens would all packet processing basically stop?

If there is no capacity in the channel, that means the worker thread that sends packets out the TUN device stopped and can no longer send packets to the TUN device.

In other words, if this repeatedly returns false, packet processing has already stopped for some other reason.

@thomaseizinger thomaseizinger removed this pull request from the merge queue due to a manual request Nov 14, 2024
@thomaseizinger
Copy link
Member Author

I think we may have to add a waker here to avoid potentially stalling the event-loop.

@thomaseizinger thomaseizinger force-pushed the chore/reserve-before-send branch from d879fe8 to 8820143 Compare November 14, 2024 03:47
@thomaseizinger thomaseizinger changed the title fix(connlib): reserve capacity for outbound packets fix(connlib): send all unwritten packets before reading new ones Nov 14, 2024
@thomaseizinger thomaseizinger force-pushed the chore/reserve-before-send branch from 8820143 to cc88195 Compare November 14, 2024 03:54
@thomaseizinger
Copy link
Member Author

@jamilbk I came up with a much better strategy to fix this.

@thomaseizinger
Copy link
Member Author

Sentry Issue: GATEWAY-1

This will fix some of the Sentry spam that we are getting from our CI runs.

Copy link
Member

@jamilbk jamilbk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I might be getting better at recognizing risky changes 👍

@thomaseizinger thomaseizinger added this pull request to the merge queue Nov 14, 2024
Merged via the queue into main with commit 4e423dc Nov 14, 2024
@thomaseizinger thomaseizinger deleted the chore/reserve-before-send branch November 14, 2024 06:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants