Skip to content

Conversation

@thomaseizinger
Copy link
Member

WireGuard has several internal timers. For example, if we have previously sent a data packet on a session and the session becomes older than 120s, we automatically perform a re-key. This ensures session keys are constantly rolled over on sessions that are actively used. If a session is not used at all, it expires after 180s.

There is a race condition where a packet may be sent on a previously entirely idle session but just before it expires, i.e. at second 179 of the session. In that case, the sending party will still happily encrypt it but depending on the latency to the receiver, the session may be expired on their end by the time it arrives.

To fix this, we introduce a new timer called SHOULD_NOT_USE_AFTER_TIME which is defined as REJECT_AFTER_TIME - KEEPALIVE_TIMEOUT. If we attempt to send a packet on a session that is about to expire and we are the initiator of the current session, we instead buffer the packet and emit a new handshake.

It is important that only the current initiator performs this check, otherwise we would introduce another race condition where the responder would initiate the next session in case it wants to sends an application-level reply just after SHOULD_NOT_USE_AFTER_TIME.

@jamilbk
Copy link
Member

jamilbk commented Jan 23, 2025

Out of curiosity, is this an issue seen by the broader community?

How does the kernel module behave under this condition?

update_timer_results_in_handshake(&mut my_tun, &mut now);
}

/// If a tunnel is idle for close to 120s without sending a packet,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do we use for "close to", one second? This seems quite arbitrary, no?

I'm curious how the spec handles this edge case. Why not just let the session fail?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@thomaseizinger
Copy link
Member Author

We agreed to merge this here but wait to ship it in Firezone.

@thomaseizinger thomaseizinger added this pull request to the merge queue Jan 23, 2025
Merged via the queue into master with commit 935d9ec Jan 24, 2025
13 checks passed
github-merge-queue bot pushed a commit to firezone/firezone that referenced this pull request Jan 24, 2025
The committed regression seeds trigger a scenario where the WireGuard
sessions of the peers expire in a way where by the time the Client sends
the packet, it is still active (179.xx seconds old) and with the latency
to the Gateway, the 180s mark is reached and the Gateway clears the
session and discards the packet as a result.

In order to fix this, I opted to patch WireGuard by introducing a new
timer that does not allow the initiator to use a session that is almost
expired: firezone/boringtun#68.

Resolves: #7832.

---------

Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
@jamilbk
Copy link
Member

jamilbk commented Jan 24, 2025

One question I thought of as I'm investigating firezone/firezone#7747:

If we now buffer this packet instead of dropping it, does this introduce any chance for a memory leak?

I.e. is this buffer then dropped if the session expires or the peer becomes unresponsive?

@thomaseizinger
Copy link
Member Author

If we now buffer this packet instead of dropping it, does this introduce any chance for a memory leak?

Buffers in boringtun are bound to at most 256 packets IIRC.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants