@losman0s
Contributor

Problem

fd_quic currently allows 1-RTT "flag" packets (PING, MAX_DATA, etc.) to be sent while the handshake is still in progress.
Those packets are then discarded server-side (at least in quinn-rs) due to decryption failure.
From my quick testing, the current scheduling of e.g. PINGs vs handshake completion does not lead to many (any?) such occurrences in normal operation, but I did observe some when artificially delaying the handshake replies.

Opening this PR for consideration from eyes more familiar with fd_quic.

Solution

Require the connection to be in ACTIVE state before selecting fd_quic_enc_level_appdata_id as the encryption level for pending flag packets.

This PR also gates PINGs consistently between the first and second scheduling sites.

@losman0s losman0s changed the title fix: premature 1-RTT packets quic: premature 1-RTT packets Oct 17, 2025
Contributor

@akhinvasara-jumptrading akhinvasara-jumptrading left a comment

Great catches, thank you! 🔥 Left a couple suggestions to tighten it up and grab some adjacent wins.

/* only allow 1-RTT "flag" frames when connection is ACTIVE, to prevent e.g. early 1-RTT PINGs */
if( conn->flags
    && conn->upd_pkt_number >= app_pkt_number
    && conn->state == FD_QUIC_CONN_STATE_ACTIVE ) {
Contributor

Instead of state ACTIVE, let's check that we have 1-rtt keys available.
fd_uint_extract_bit( conn->keys_avail, fd_quic_enc_level_appdata_id )
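
A minimal sketch of what the suggested gate could look like, assuming a simplified connection struct. Only the field names `flags`, `upd_pkt_number`, and `keys_avail` come from the snippets above; `conn_sketch_t`, `extract_bit`, `may_send_appdata_flags`, and the numeric value of `ENC_LEVEL_APPDATA` are hypothetical stand-ins, not fd_quic's actual definitions.

```c
#include <stdint.h>

/* Hypothetical encryption-level index; the real value lives in fd_quic. */
enum { ENC_LEVEL_APPDATA = 3 };

/* Hypothetical, trimmed-down connection state for illustration only. */
typedef struct {
  uint32_t flags;          /* pending "flag" frames (PING, MAX_DATA, ...) */
  uint32_t keys_avail;     /* bit i set => keys for enc level i are available */
  uint64_t upd_pkt_number; /* packet number at which flags were last updated */
} conn_sketch_t;

/* Stand-in for a bit-extract helper: returns bit b of x as 0 or 1. */
static inline int
extract_bit( uint32_t x, int b ) {
  return (int)( ( x >> b ) & 1U );
}

/* Flag frames may go out at 1-RTT only once 1-RTT keys exist locally. */
static inline int
may_send_appdata_flags( conn_sketch_t const * conn, uint64_t app_pkt_number ) {
  return conn->flags != 0U
      && conn->upd_pkt_number >= app_pkt_number
      && extract_bit( conn->keys_avail, ENC_LEVEL_APPDATA );
}
```

With this shape, a connection that has pending flags but no 1-RTT keys yet falls through to whatever lower encryption level the caller would otherwise pick.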

Contributor Author

I actually used that check initially, but changed to ACTIVE state check since, iiuc:

  • client-side 1-RTT keys are available as soon as Handshake is received in response to Initial (at which point client is in HANDSHAKE_COMPLETE)
  • server still rejects short headers until its connection is in Established state, which only happens once it receives Handshake back from the client.
  • the earliest safe signal that we can send 1-RTT is if we have 1-RTT keys locally & peer_enc_level == appdata, which anyway only happens when we receive HANDSHAKE_DONE / transition to ACTIVE (unless there is some packet reordering, but idk how often that happens/significant it is)

Lmk if you still want me to amend.
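
The three candidate signals discussed above can be put side by side as predicates. This is only a sketch under the assumptions stated in the bullets; the struct, the enum values, and the function names are hypothetical and chosen for illustration, with `keys_avail` and `peer_enc_level` borrowed from the discussion.

```c
#include <stdint.h>

/* Hypothetical stand-ins; values are illustrative only. */
enum { SK_STATE_HANDSHAKE, SK_STATE_HANDSHAKE_COMPLETE, SK_STATE_ACTIVE };
enum { SK_ENC_HANDSHAKE = 2, SK_ENC_APPDATA = 3 };

typedef struct {
  int      state;          /* connection state */
  uint32_t keys_avail;     /* bit i set => keys for enc level i available */
  int      peer_enc_level; /* highest enc level seen from the peer */
} gate_conn_t;

/* Gate 1: local 1-RTT keys exist (earliest, may race the server). */
static inline int
gate_keys( gate_conn_t const * c ) {
  return (int)( ( c->keys_avail >> SK_ENC_APPDATA ) & 1U );
}

/* Gate 2: local keys exist and the peer has already used 1-RTT. */
static inline int
gate_keys_and_peer( gate_conn_t const * c ) {
  return gate_keys( c ) && c->peer_enc_level == SK_ENC_APPDATA;
}

/* Gate 3: connection is ACTIVE (latest, costs an extra round trip). */
static inline int
gate_active( gate_conn_t const * c ) {
  return c->state == SK_STATE_ACTIVE;
}
```

For a client that has just processed the server's Handshake packet (keys present, peer still at the handshake level, state HANDSHAKE_COMPLETE), only the first gate opens; the other two wait for 1-RTT traffic from the server.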

Contributor

tldr: yes pls, amending would be great :) thank you!

I think amending would still be good, because waiting for the handshake done turns the 1-rtt into 2-rtt.
It's def buggy to send 1-rtt before we have the keys for it - but the uncommon case of packet reordering isn't worth always waiting the extra round trip.
It's also OK for the server to fail to decrypt, it should just drop the packet, not kill the connection. And we will just retx the packet anyway.

Contributor

@akhinvasara-jumptrading akhinvasara-jumptrading left a comment

awesome, thanks v much for catching this and doing a couple iterations of the impl!

nbridge-jump previously approved these changes Oct 21, 2025
Contributor

@nbridge-jump nbridge-jump left a comment

lgtm
Nice work

@losman0s losman0s force-pushed the early-1rtt branch 2 times, most recently from 730db26 to 8b1d62d Compare October 22, 2025 06:30
@losman0s
Contributor Author

I pushed a minimal revision of the test_quic_bw test, which started to fail with this commit.

It looks like the test couldn't handle the new flow:

  1. client PING is not gated by ACTIVE, so PING gets sent as soon as 1-RTT keys are received from server
  2. server sends HANDSHAKE_DONE as soon as it receives PING (vs previously when it received Handshake back from client)
  3. client receives HANDSHAKE_DONE while still in HANDSHAKE (vs previously HANDSHAKE_COMPLETE), which it ignores
  4. client only receives and processes HANDSHAKE_DONE at retry from server
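
Steps 3 and 4 can be modeled with a tiny state handler. This is a sketch that assumes, as described above, that the client only acts on HANDSHAKE_DONE when it is already in HANDSHAKE_COMPLETE; the enum and `on_handshake_done` are hypothetical names, not fd_quic code.

```c
/* Hypothetical client states for illustration. */
enum { HS_STATE_HANDSHAKE, HS_STATE_HANDSHAKE_COMPLETE, HS_STATE_ACTIVE };

/* Returns the state after a HANDSHAKE_DONE frame arrives.
   The frame is ignored unless the client is in HANDSHAKE_COMPLETE,
   so an early arrival leaves the state unchanged and the client
   must wait for the server to retransmit the frame. */
static inline int
on_handshake_done( int state ) {
  if( state == HS_STATE_HANDSHAKE_COMPLETE ) return HS_STATE_ACTIVE;
  return state;
}
```

Under this model, a HANDSHAKE_DONE that arrives while the client is still in HANDSHAKE is a no-op, and only the retransmitted copy moves the connection to ACTIVE.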

The test was only cranking 20 iterations as fast as possible, even though it uses wall clock, so it was never reaching the retry.
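
One way to make such a test robust to timer-driven retries is to crank until a wall-clock deadline instead of a fixed iteration count. The following is a sketch of that idea, not the actual revision to test_quic_bw; `crank_until`, the callback types, and the fake clock are all hypothetical.

```c
#include <stdint.h>

typedef int64_t (* clock_fn_t)( void * ctx ); /* current time in ns */
typedef int     (* step_fn_t )( void * ctx ); /* nonzero once done */

/* Service the system until step() reports completion or timeout_ns
   elapses. Returns 1 on completion, 0 on timeout. */
static int
crank_until( step_fn_t step, clock_fn_t now, void * ctx, int64_t timeout_ns ) {
  int64_t deadline = now( ctx ) + timeout_ns;
  for(;;) {
    if( step( ctx ) )           return 1;
    if( now( ctx ) >= deadline ) return 0;
  }
}

/* Fake world for illustration: every clock read advances time by 1 ms,
   and the "handshake" completes once time reaches done_at_ns. */
typedef struct { int64_t t_ns; int64_t done_at_ns; } fake_world_t;

static int64_t
fake_now( void * ctx ) {
  fake_world_t * w = (fake_world_t *)ctx;
  w->t_ns += INT64_C(1000000);
  return w->t_ns;
}

static int
fake_step( void * ctx ) {
  fake_world_t * w = (fake_world_t *)ctx;
  return w->t_ns >= w->done_at_ns;
}
```

A real test would pass the actual service routine and wall clock in place of the fakes, with a timeout comfortably longer than the retransmit interval it needs to reach.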

One thing this shows is that by sending a 1-RTT PING as soon as we have the keys, we might be introducing a systematic 500ms delay to HS completion, which could be worse than the extra RTT from just gating the appdata enc_level on ACTIVE state?

Lmk what you want to do.

Contributor

@akhinvasara-jumptrading akhinvasara-jumptrading left a comment

awesome, thanks very much

@akhinvasara-jumptrading akhinvasara-jumptrading merged commit d2fd47d into firedancer-io:main Nov 2, 2025
11 of 12 checks passed