Fix path challenge and removal logic #6020
Open
guhetier wants to merge 8 commits into
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #6020 +/- ##
==========================================
- Coverage 85.95% 85.55% -0.41%
==========================================
Files 60 60
Lines 18792 18835 +43
==========================================
- Hits 16153 16114 -39
- Misses 2639 2721 +82
anrossi
reviewed
May 21, 2026
b40dbbf to
8efd652
Compare
anrossi
previously approved these changes
May 22, 2026
guhetier
commented
May 22, 2026
anrossi
approved these changes
May 22, 2026
Move path-validation timeout detection from the ACK/loss-detection code
path onto a dedicated QUIC_CONN_TIMER_PATH_VALIDATION wall-clock timer.
Previously, validation timeout was only evaluated when a PATH_CHALLENGE
carrier packet was loss-detected (loss_detection.c). This had two bugs:
1. If the peer ACKed the PATH_CHALLENGE packet without sending a
PATH_RESPONSE, the path stayed stuck in !IsPeerValidated forever.
2. The timeout handler called QuicPathRemove inline during receive
processing, which could drop PathsCount to 0 and crash the
post-batch cleanup loop (uint8_t underflow -> Paths[255] deref).
The new timer fires independently of loss detection:
- QuicConnPathValidationTimeoutUs: max(3*PTO, 6*InitialRtt) per
RFC 9000 section 8.2.4, using named constants.
- QuicConnPathValidationTimerUpdate: scans all paths, arms to
earliest in-progress deadline or cancels.
- QuicConnProcessPathValidationTimerOperation: expires timed-out
paths. For the active path, attempts fallback to a previously-
validated path before silently closing.
Loss detection now only re-arms SendChallenge for retransmit; restores
NewDataQueued |= that was accidentally dropped in PR #283.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
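The timeout formula and the timer re-arm scan described in this commit message can be sketched as follows. This is a minimal illustration, not the MsQuic implementation: every `Sketch*` type, function, and constant is hypothetical, and using a start time of 0 to mean "no validation in progress" is an assumption made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the connection and path structures. */
#define SKETCH_MAX_PATHS 4
#define SKETCH_PTO_MULTIPLIER 3         /* the "3*PTO" term from the commit message */
#define SKETCH_INITIAL_RTT_MULTIPLIER 6 /* the "6*InitialRtt" term from the commit message */

typedef struct SketchPath {
    bool IsPeerValidated;
    uint64_t ValidationStartTimeUs; /* assumption: 0 means no validation in progress */
} SketchPath;

typedef struct SketchConn {
    SketchPath Paths[SKETCH_MAX_PATHS];
    uint8_t PathsCount;
} SketchConn;

/* max(3 * PTO, 6 * InitialRtt), all values in microseconds. */
static uint64_t SketchValidationTimeoutUs(uint64_t PtoUs, uint64_t InitialRttUs)
{
    uint64_t FromPto = SKETCH_PTO_MULTIPLIER * PtoUs;
    uint64_t FromRtt = SKETCH_INITIAL_RTT_MULTIPLIER * InitialRttUs;
    return FromPto > FromRtt ? FromPto : FromRtt;
}

/*
 * Scans every path and computes the delay until the earliest in-progress
 * validation deadline. Returns false when no validation is pending, in
 * which case a caller would cancel the timer instead of arming it.
 */
static bool SketchNextValidationDelayUs(
    const SketchConn* Conn, uint64_t NowUs, uint64_t TimeoutUs, uint64_t* DelayUs)
{
    assert(Conn->PathsCount <= SKETCH_MAX_PATHS);
    bool Found = false;
    uint64_t Earliest = 0;
    for (uint8_t i = 0; i < Conn->PathsCount; ++i) {
        const SketchPath* Path = &Conn->Paths[i];
        if (Path->IsPeerValidated || Path->ValidationStartTimeUs == 0) {
            continue; /* already validated, or nothing in flight */
        }
        uint64_t Deadline = Path->ValidationStartTimeUs + TimeoutUs;
        if (!Found || Deadline < Earliest) {
            Earliest = Deadline;
            Found = true;
        }
    }
    if (Found) {
        /* A deadline already in the past yields a delay of 0. */
        *DelayUs = Earliest > NowUs ? Earliest - NowUs : 0;
    }
    return Found;
}
```

Because the deadline depends only on the recorded start time and the clock, it is reached even when the PATH_CHALLENGE packet was ACKed and loss detection never revisits it.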
…Remove
Centralize path removal safety logic in QuicPathRemove instead of
spreading it across callers:
- When the last path is removed (PathsCount == 1), mark it inactive and
silently close the connection per RFC 9000 8.2.4 + 10.2. The path
stays in the array as a valid placeholder until shutdown completes.
Returns FALSE so callers know no path was removed.
- When the active path (index 0) is removed while other paths exist,
promote the best fallback (prefer peer-validated, otherwise any path)
before removing the old active.
- Returns TRUE when a path was actually removed (count decremented).
Simplify QuicConnProcessRouteCompletion and
QuicConnProcessPathValidationTimerOperation to delegate to
QuicPathRemove instead of duplicating fallback/close logic.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
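The removal invariants listed in this commit message can be sketched roughly as below. The `Sketch*` names are hypothetical, and promoting the fallback by swapping it into slot 0 is an assumption made for illustration; the actual QuicPathRemove may reorder the array differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_MAX_PATHS 4

/* Hypothetical stand-ins for the connection and path structures. */
typedef struct SketchPath {
    uint8_t Id;
    bool IsActive;
    bool IsPeerValidated;
} SketchPath;

typedef struct SketchConn {
    SketchPath Paths[SKETCH_MAX_PATHS];
    uint8_t PathsCount;
    bool ClosedSilently;
} SketchConn;

/* Returns true only when a path was actually removed (count decremented). */
static bool SketchPathRemove(SketchConn* Conn, uint8_t Index)
{
    assert(Conn->PathsCount >= 1 && Index < Conn->PathsCount);

    if (Conn->PathsCount == 1) {
        /* Last path: keep it as a placeholder and close the connection. */
        Conn->Paths[0].IsActive = false;
        Conn->ClosedSilently = true;
        return false;
    }

    if (Index == 0) {
        /* Removing the active path: prefer a peer-validated fallback,
           otherwise take the first other path. */
        uint8_t Fallback = 1;
        for (uint8_t i = 1; i < Conn->PathsCount; ++i) {
            if (Conn->Paths[i].IsPeerValidated) {
                Fallback = i;
                break;
            }
        }
        /* Promote the fallback into slot 0; the old active path now sits
           at the fallback's former index and is removed from there. */
        SketchPath Old = Conn->Paths[0];
        Conn->Paths[0] = Conn->Paths[Fallback];
        Conn->Paths[0].IsActive = true;
        Conn->Paths[Fallback] = Old;
        Index = Fallback;
    }

    /* Close the gap left by the removed entry. */
    memmove(
        &Conn->Paths[Index],
        &Conn->Paths[Index + 1],
        (size_t)(Conn->PathsCount - Index - 1) * sizeof(SketchPath));
    Conn->PathsCount--;
    return true;
}
```

With this shape, the count never drops below 1, so a repeated removal attempt on the last path stays a no-op that reports false.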
Change loop index from uint8_t to int in reverse path iteration loops.
When PathsCount is 0, the expression PathsCount - 1 now evaluates to -1
(via integer promotion) instead of wrapping to 255, so the loop
condition 'i > 0' is immediately false and the loop body is never
entered. Cast to (uint8_t) at the QuicPathRemove call sites to satisfy
warnings-as-errors for the int-to-uint8_t narrowing conversion.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
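The difference between the two index types can be reproduced in isolation. The sketch below uses hypothetical helper names and only counts loop iterations; it does not touch any path array.

```c
#include <assert.h>
#include <stdint.h>

/* Old shape: with PathsCount == 0, the uint8_t index starts at 255. */
static unsigned SketchReverseVisitsUint8(uint8_t PathsCount)
{
    unsigned Visits = 0;
    /* The cast makes explicit the wrap that the implicit conversion performed. */
    for (uint8_t i = (uint8_t)(PathsCount - 1); i > 0; i--) {
        Visits++;
    }
    return Visits;
}

/* New shape: PathsCount is promoted to int, so 0 - 1 is -1 and the
   condition 'i > 0' is false before the first iteration. */
static unsigned SketchReverseVisitsInt(uint8_t PathsCount)
{
    unsigned Visits = 0;
    for (int i = PathsCount - 1; i > 0; i--) {
        Visits++;
    }
    return Visits;
}
```

Both helpers visit indices 2 and 1 when PathsCount is 3; they differ only in the PathsCount == 0 case, where the uint8_t version iterates 255 times and the int version does not iterate at all.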
Replace five path lifecycle QuicTraceLogConnInfo traces with structured
QuicTraceEvent ETW events using typed fields:
- ConnPathInitialized: path created and added to connection
- ConnPathRemoved: path removed from connection
- ConnPathActive: path promoted to active (slot 0), with IsRebind field
- ConnPathValidated: peer address validated, with Reason enum field
(mapped via ETW valueMap for pretty-printing)
- ConnPathValidationTimeout: path validation timer expired
New ETW templates: tid_CONN_PATH (Connection + PathID),
tid_CONN_PATH_ACTIVE (Connection + PathID + IsRebind), and
tid_CONN_PATH_VALIDATED (Connection + PathID + Reason).
New ETW value map: map_QUIC_PATH_VALID_REASON with values Initial
Token, Handshake Packet, Path Response.
Also adds missing PATH_VALIDATION entry to map_QUIC_CONN_TIMER_TYPE.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
When QuicPathRemove returns FALSE (last path, connection closing),
clear PathValidationStartTime so QuicConnPathValidationTimerUpdate
won't re-arm the timer with delay=0, preventing redundant timer fires.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Adds QuicTestPathValidationLastPathClose, which verifies that when all
paths fail validation the connection closes with
QUIC_STATUS_UNREACHABLE rather than crashing on a PathsCount underflow.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
cc48c83 to
2717754
Compare
Description
Fixes a kernel-mode PAGE_FAULT crash (KeBugCheckEx) where Connection->PathsCount reaches 0 and a reverse loop for (uint8_t i = PathsCount - 1; ...) underflows to 255, causing an out-of-bounds read on Connection->Paths[255]. The crash results from a combination of issues, largely tied to the invalid assumption that the path at index 0 in the path array is always valid.
Root cause: a combination of multiple issues:
- QuicLossDetectionRetransmitFrames was used as a way to detect PATH_CHALLENGE failures and would remove a path, but would not check whether the removed path was the last one.
- QuicConnGetPathForPacket would always create a path for incoming packets at index 1 of the path array, even if the path at index 0 had been removed.
- QuicPathRemove would happily "remove" the same path multiple times.
This allows the following scenario:
1. QuicLossDetectionRetransmitFrames -> Path 0 is removed (nbPath = 0)
2. QuicLossDetectionRetransmitFrames -> Path 0 is "removed" again, nbPath = 0
Also fixes #5505: paths were not removed when the peer ACKed a PATH_CHALLENGE but never sent a PATH_RESPONSE, because path validation timeout was driven by loss detection retransmission rather than a wall-clock timer. If the challenge frame was ACKed, loss detection considered it delivered and never retransmitted, so the timeout never fired.
Commits
- Replace loss-detection path validation with wall-clock timer: Adds a dedicated QUIC_CONN_TIMER_PATH_VALIDATION timer instead of attempting to use loss detection retransmission as a proxy.
- Enforce close-if-last and active-path-fallback invariants in QuicPathRemove: QuicPathRemove now handles fallback to other paths. When an attempt is made to remove the last path, QuicPathRemove now shuts down the connection and keeps the last path as a placeholder for every operation that assumes a path is present.
- Use signed int for reverse path loops to prevent underflow: Changes uint8_t loop indices to int in all backward-iterating path loops. When PathsCount == 0, PathsCount - 1 evaluates to -1 via integer promotion, making the loop condition immediately false.
- Add structured ETW events for path lifecycle operations: Replaces five QuicTraceLogConnInfo string traces with typed QuicTraceEvent ETW events, so further path-related issues can be diagnosed with low-volume events only.
Testing
CI
Documentation
No documentation impact.