Skip to content

Fix path challenge and removal logic#6020

Open
guhetier wants to merge 8 commits into
mainfrom
guhetier/path-validation-timer_copilot
Open

Fix path challenge and removal logic#6020
guhetier wants to merge 8 commits into
mainfrom
guhetier/path-validation-timer_copilot

Conversation

@guhetier
Copy link
Copy Markdown
Collaborator

@guhetier guhetier commented May 20, 2026

Description

Fixes a kernel-mode PAGE_FAULT crash (KeBugCheckEx) where Connection->PathsCount reaches 0 and a reverse loop for (uint8_t i = PathsCount - 1; ...) underflows to 255, causing an out-of-bounds read on Connection->Paths[255].

The results of a combination of issues, largely tied to the invalid assumption that the path at index 0 in the path array is always valid.

Root cause: This is a combination of multiple issues:

  • QuicLossDetectionRetransmitFrames was used as a way to detect PATH_CHALLENGE failures and would remove a path, but would not validate any whether the removed path was the last one.
  • QuicConnGetPathForPacket would always create a path for incoming packets at index 1 of the path array, even if the path at index 0 has been removed.
  • QuicPathRemove would happily "remove" the same path multiple times

This allows the following scenario:

  • A path challenge is sent on path 0
  • The path challenge times out
  • A PTO causes the retransmission of the path challenge -> QuicLossDetectionRetransmitFrames -> Path 0 is removed (nbPath = 0)
  • ACKs are received over a new path 1
    • A Path 1 is created at index 1 in the array (nbPath = 1, first problem, nbPath does match the index actually used)
    • Loss detection triggers QuicLossDetectionRetransmitFrames -> Path 0 is "removed" again, nbPath = 0

Also fixes #5505: paths were not removed when the peer ACKed a PATH_CHALLENGE but never sent a PATH_RESPONSE, because path validation timeout was driven by loss detection retransmission rather than a wall-clock timer. If the challenge frame was ACKed, loss detection considered it delivered and never retransmitted, so the timeout never fired.

Commits

  1. Replace loss-detection path validation with wall-clock timer: Adds a dedicated QUIC_CONN_TIMER_PATH_VALIDATION timer instead of attempting to use loss detection retransmission as a proxy.

  2. Enforce close-if-last and active-path-fallback invariants in QuicPathRemove QuicPathRemove now handles fallbacks to others paths. When an attempt is made to remove the last path, QuicPathRemove now shutdowns the connection and keeps the last path as a placeholder for every operation assuming a path is present.

  3. Use signed int for reverse path loops to prevent underflow Changes uint8_t loop indices to int in all backward-iterating path loops. When PathsCount == 0, PathsCount - 1 evaluates to -1 via integer promotion, making the loop condition immediately false.

  4. Add structured ETW events for path lifecycle operations Replaces five QuicTraceLogConnInfo string traces with typed QuicTraceEvent ETW events, so further path-related issues can be diagnosed with low-volume events only.

Testing

CI

Documentation

No documentation impact.

@codecov
Copy link
Copy Markdown

codecov Bot commented May 20, 2026

Codecov Report

❌ Patch coverage is 89.28571% with 9 lines in your changes missing coverage. Please review.
✅ Project coverage is 85.55%. Comparing base (27e7f9f) to head (2717754).
⚠️ Report is 4 commits behind head on main.

Files with missing lines Patch % Lines
src/core/connection.c 90.74% 5 Missing ⚠️
src/core/path.c 85.71% 4 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #6020      +/-   ##
==========================================
- Coverage   85.95%   85.55%   -0.41%     
==========================================
  Files          60       60              
  Lines       18792    18835      +43     
==========================================
- Hits        16153    16114      -39     
- Misses       2639     2721      +82     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Comment thread src/core/cid.h Outdated
@guhetier guhetier changed the title Fix kernel crash from PathsCount underflow and improve path lifecycle management Fix path challenge and removal logic May 21, 2026
@guhetier guhetier force-pushed the guhetier/path-validation-timer_copilot branch from b40dbbf to 8efd652 Compare May 21, 2026 23:44
@guhetier guhetier marked this pull request as ready for review May 21, 2026 23:44
@guhetier guhetier requested a review from a team as a code owner May 21, 2026 23:44
anrossi
anrossi previously approved these changes May 22, 2026
Comment thread src/core/operation.h Outdated
guhetier and others added 8 commits May 27, 2026 14:03
Move path-validation timeout detection from the ACK/loss-detection code
path onto a dedicated QUIC_CONN_TIMER_PATH_VALIDATION wall-clock timer.

Previously, validation timeout was only evaluated when a PATH_CHALLENGE
carrier packet was loss-detected (loss_detection.c). This had two bugs:
  1. If the peer ACKed the PATH_CHALLENGE packet without sending a
     PATH_RESPONSE, the path stayed stuck in !IsPeerValidated forever.
  2. The timeout handler called QuicPathRemove inline during receive
     processing, which could drop PathsCount to 0 and crash the
     post-batch cleanup loop (uint8_t underflow -> Paths[255] deref).

The new timer fires independently of loss detection:
  - QuicConnPathValidationTimeoutUs: max(3*PTO, 6*InitialRtt) per
    RFC 9000 section 8.2.4, using named constants.
  - QuicConnPathValidationTimerUpdate: scans all paths, arms to
    earliest in-progress deadline or cancels.
  - QuicConnProcessPathValidationTimerOperation: expires timed-out
    paths. For the active path, attempts fallback to a previously-
    validated path before silently closing.

Loss detection now only re-arms SendChallenge for retransmit; restores
NewDataQueued |= that was accidentally dropped in PR #283.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…Remove

Centralize path removal safety logic in QuicPathRemove instead of
spreading it across callers:

- When the last path is removed (PathsCount == 1), mark it inactive
  and silently close the connection per RFC 9000 8.2.4 + 10.2.
  The path stays in the array as a valid placeholder until shutdown
  completes. Returns FALSE so callers know no path was removed.

- When the active path (index 0) is removed while other paths exist,
  promote the best fallback (prefer peer-validated, otherwise any
  path) before removing the old active.

- Returns TRUE when a path was actually removed (count decremented).

Simplify QuicConnProcessRouteCompletion and
QuicConnProcessPathValidationTimerOperation to delegate to
QuicPathRemove instead of duplicating fallback/close logic.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Change loop index from uint8_t to int in reverse path iteration loops.
When PathsCount is 0, the expression PathsCount - 1 now evaluates to -1
(via integer promotion) instead of wrapping to 255, so the loop condition
'i > 0' is immediately false and the loop body is never entered.

Cast to (uint8_t) at the QuicPathRemove call sites to satisfy warnings-
as-errors for the int-to-uint8_t narrowing conversion.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replace five path lifecycle QuicTraceLogConnInfo traces with structured
QuicTraceEvent ETW events using typed fields:

- ConnPathInitialized: path created and added to connection
- ConnPathRemoved: path removed from connection
- ConnPathActive: path promoted to active (slot 0), with IsRebind field
- ConnPathValidated: peer address validated, with Reason enum field
  (mapped via ETW valueMap for pretty-printing)
- ConnPathValidationTimeout: path validation timer expired

New ETW templates: tid_CONN_PATH (Connection + PathID),
tid_CONN_PATH_ACTIVE (Connection + PathID + IsRebind), and
tid_CONN_PATH_VALIDATED (Connection + PathID + Reason).

New ETW value map: map_QUIC_PATH_VALID_REASON with values
Initial Token, Handshake Packet, Path Response.

Also adds missing PATH_VALIDATION entry to map_QUIC_CONN_TIMER_TYPE.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
When QuicPathRemove returns FALSE (last path, connection closing),
clear PathValidationStartTime so QuicConnPathValidationTimerUpdate
won't re-arm the timer with delay=0, preventing redundant timer fires.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Adds QuicTestPathValidationLastPathClose, which verifies that when all paths fail validation the connection closes with QUIC_STATUS_UNREACHABLE rather than crashing on a PathsCount underflow.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@guhetier guhetier force-pushed the guhetier/path-validation-timer_copilot branch from cc48c83 to 2717754 Compare May 27, 2026 21:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Paths are not removed if the peer ACK a PATH_CHALLENGE but doesn't send a PATH_RESPONSE

2 participants