-
-
Notifications
You must be signed in to change notification settings - Fork 10.9k
Backport Fix Hello Retry Request failure in QUIC to 3.5 #28189
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Closed
Closed
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Signed-off-by: Norbert Pocs <norbertp@openssl.org>
When HRR happens a second client hello is sent and it consist of a transport params extension. This must be processed and not cause failure. Fixes: openssl/project#1296 Signed-off-by: Norbert Pocs <norbertp@openssl.org>
Recently, our overnight QUIC interop runs began failing in CI when an openssl server was tested against an ngtcp2 client: https://github.com/openssl/openssl/actions/runs/16739736813 The underlying cause bears some explination for historical purposes The problem began happening with a recent update to ngtcp2 in which ngtcp2 updated its wolfssl tls backend to support ML-KEM, which caused ngtcp to emit a client hello message that offered several groups (including X25519MLKEM768) but only provided a keyshare for x25519. This in turn triggered the openssl server to respond with a hello retry request (HRR), requesting an ML-KEM keyshare instead, which ngtcp2 obliged. However all subsequent frames from the client were discarded by the server, due to failing packet body decryption. The problem was tracked down to a mismatch in the initial vectors used by the client and server, leading to an AEAD tag mismatch. Packet protection keys generate their IV's in QUIC by xoring the packet number of the received frame with the base IV as derived via HKDF in the tls layer. The underlying problem was that openssl hit a very odd corner case with how we compute the packet number of the received frame. To save space, QUIC encodes packet numbers using a variable length integer, and only sends the changed bits in the packet number. This requires that the receiver (openssl) store the largest received pn of the connection, which we nominally do. However, in default_port_packet_handler (where QUIC frames are processed prior to having an established channel allocated) we use a temporary qrx to validate the packet protection of those frames. This temporary qrx may be incorporated into the channel in some cases, but is not in the case of a valid frame that generates an HRR at the TLS layer. In this case, the channel allocates its own qrx independently. When this occurs, the largest_pn value of the temporary qrx is lost, and subsequent frames are unable to be received, as the newly allocated qrx belives that the larges_pn for a given pn_space is 0, rather than the value received in the initial frame (which was a complete 32 bit value, rather than just the changed lower 8 bits). As a result the IV construction produced the wrong value, and the decrypt failed on those subsequent frames. Up to this point, that wasn't even a problem, as most quic implementations start their packet numbering at 0, so the next packet could still have its packet number computed properly. The combination of ngtcp using large random values for initial packet numbers, along with the HRR triggering a separate qrx creation on a channel led to the discovery of this discrepancy. The fix seems pretty straightforward. When we detect in port_default_packet_handler, that we have a separate qrx in the new channel, we migrate processed packets from the temporary qrx to the canonical channel qrx. In addition to doing that, we also need to migrate the largest_pn array from the temporary qrx to the channel_qrx so that subsequent frame reception is guaranteed to compute the received frame packet number properly, and as such, compute the proper IV for packet protection decryption. Fixes openssl/project#1296
2 tasks
Sashan
approved these changes
Aug 7, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good to me. thanks.
nhorman
approved these changes
Aug 7, 2025
t8m
approved these changes
Aug 8, 2025
This pull request is ready to merge |
merged to 3.5, thank you! |
openssl-machine
pushed a commit
that referenced
this pull request
Aug 8, 2025
Signed-off-by: Norbert Pocs <norbertp@openssl.org> Reviewed-by: Saša Nedvědický <sashan@openssl.org> Reviewed-by: Neil Horman <nhorman@openssl.org> Reviewed-by: Tomas Mraz <tomas@openssl.org> (Merged from #28189)
openssl-machine
pushed a commit
that referenced
this pull request
Aug 8, 2025
When HRR happens a second client hello is sent and it consist of a transport params extension. This must be processed and not cause failure. Fixes: openssl/project#1296 Signed-off-by: Norbert Pocs <norbertp@openssl.org> Reviewed-by: Saša Nedvědický <sashan@openssl.org> Reviewed-by: Neil Horman <nhorman@openssl.org> Reviewed-by: Tomas Mraz <tomas@openssl.org> (Merged from #28189)
openssl-machine
pushed a commit
that referenced
this pull request
Aug 8, 2025
Recently, our overnight QUIC interop runs began failing in CI when an openssl server was tested against an ngtcp2 client: https://github.com/openssl/openssl/actions/runs/16739736813 The underlying cause bears some explination for historical purposes The problem began happening with a recent update to ngtcp2 in which ngtcp2 updated its wolfssl tls backend to support ML-KEM, which caused ngtcp to emit a client hello message that offered several groups (including X25519MLKEM768) but only provided a keyshare for x25519. This in turn triggered the openssl server to respond with a hello retry request (HRR), requesting an ML-KEM keyshare instead, which ngtcp2 obliged. However all subsequent frames from the client were discarded by the server, due to failing packet body decryption. The problem was tracked down to a mismatch in the initial vectors used by the client and server, leading to an AEAD tag mismatch. Packet protection keys generate their IV's in QUIC by xoring the packet number of the received frame with the base IV as derived via HKDF in the tls layer. The underlying problem was that openssl hit a very odd corner case with how we compute the packet number of the received frame. To save space, QUIC encodes packet numbers using a variable length integer, and only sends the changed bits in the packet number. This requires that the receiver (openssl) store the largest received pn of the connection, which we nominally do. However, in default_port_packet_handler (where QUIC frames are processed prior to having an established channel allocated) we use a temporary qrx to validate the packet protection of those frames. This temporary qrx may be incorporated into the channel in some cases, but is not in the case of a valid frame that generates an HRR at the TLS layer. In this case, the channel allocates its own qrx independently. When this occurs, the largest_pn value of the temporary qrx is lost, and subsequent frames are unable to be received, as the newly allocated qrx belives that the larges_pn for a given pn_space is 0, rather than the value received in the initial frame (which was a complete 32 bit value, rather than just the changed lower 8 bits). As a result the IV construction produced the wrong value, and the decrypt failed on those subsequent frames. Up to this point, that wasn't even a problem, as most quic implementations start their packet numbering at 0, so the next packet could still have its packet number computed properly. The combination of ngtcp using large random values for initial packet numbers, along with the HRR triggering a separate qrx creation on a channel led to the discovery of this discrepancy. The fix seems pretty straightforward. When we detect in port_default_packet_handler, that we have a separate qrx in the new channel, we migrate processed packets from the temporary qrx to the canonical channel qrx. In addition to doing that, we also need to migrate the largest_pn array from the temporary qrx to the channel_qrx so that subsequent frame reception is guaranteed to compute the received frame packet number properly, and as such, compute the proper IV for packet protection decryption. Fixes openssl/project#1296 Reviewed-by: Saša Nedvědický <sashan@openssl.org> Reviewed-by: Tomas Mraz <tomas@openssl.org> (Merged from #28189)
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
approval: ready to merge
The 24 hour grace period has passed, ready to merge
branch: 3.5
Merge to openssl-3.5
tests: present
The PR has suitable tests present
triaged: bug
The issue/pr is/fixes a bug
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This is a backport PR for 3.5 from #28115
Checklist