fix(gateway): bound traced channel startup handoff#82592
Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 696ec1296b
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if (startupTrace) { | ||
| await waitForChannelStartupHandoff(); | ||
| } | ||
| let startAccountTask: ReturnType<typeof startAccount> | undefined; | ||
| await measureStartup(`channels.${channelId}.start-account-handoff`, () => { |
There was a problem hiding this comment.
Re-check abort state before delayed startAccount handoff
When startupTrace is enabled (the default gateway path), this new setImmediate handoff delays startAccount by a full macrotask but never re-checks whether the account was aborted or manually stopped during that window. A quick stopChannel after startChannel can therefore still invoke startAccount after shutdown was requested, causing unwanted reconnect/login work on accounts that were explicitly stopped; this race is introduced by the new delay and should be guarded with an abort/manual-stop check immediately after the handoff wait and before calling startAccount.
Useful? React with 👍 / 👎.
|
Codex review: needs changes before merge. Summary Reproducibility: yes. for source-level reproduction: current main wraps long-lived Real behavior proof Next step before merge Security Review findings
Review detailsBest possible solution: Keep the handoff-span approach, but add an abort/manual-stop guard immediately after the handoff wait and cover that race with a focused gateway channel test. Do we have a high-confidence way to reproduce the issue? Yes for source-level reproduction: current main wraps long-lived Is this the best way to solve the issue? No, not yet: bounding the trace span is the right narrow direction, but the added macrotask handoff must re-check stop/abort state before starting provider work. Full review comments:
Overall correctness: patch is incorrect Acceptance criteria:
What I checked:
Likely related people:
Remaining risk / open question:
Codex review notes: model gpt-5.5, reasoning high; reviewed against b9921e21b93d. |
696ec12 to
ae54d81
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ae54d810bc
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if (startupTrace) { | ||
| await waitForChannelStartupHandoff(); | ||
| } | ||
| let startAccountTask: ReturnType<typeof startAccount> | undefined; | ||
| await measureStartup(`channels.${channelId}.start-account-handoff`, () => { |
There was a problem hiding this comment.
Re-check stop state after startup handoff delay
After startupTrace is enabled, this path now waits a full setImmediate before invoking startAccount, but it does not re-check abort.signal.aborted or manuallyStopped after that wait. If stopChannel() runs in that handoff window, startAccount is still called for an account that was explicitly stopped, which can trigger unwanted reconnect/login work and stop-timeout behavior even though shutdown was requested.
Useful? React with 👍 / 👎.
0c1b517 to
8a4da89
Compare
8a4da89 to
d4d41fa
Compare
Summary
Verification
node scripts/run-vitest.mjs src/gateway/server-channels.test.ts src/infra/net/fetch-guard.ssrf.test.ts extensions/github-copilot/index.test.ts extensions/github-copilot/embeddings.test.ts --runnode scripts/check-no-raw-channel-fetch.mjs/Users/steipete/Projects/agent-scripts/skills/codex-review/scripts/codex-review --mode branchgit diff --checkReal behavior proof
Behavior addressed: Startup liveness warnings can stay pinned to
channels.<id>.start-accountfor long-lived channel tasks, making Discord/QQ-style startup stalls look like an active startup phase even after the channel task has been handed off.Real environment tested: No live QQ Bot login is available; maintainer override is applied for code-level startup-trace proof.
Exact steps or command run after this patch:
node scripts/run-vitest.mjs src/gateway/server-channels.test.ts src/infra/net/fetch-guard.ssrf.test.ts extensions/github-copilot/index.test.ts extensions/github-copilot/embeddings.test.ts --run;node scripts/check-no-raw-channel-fetch.mjs;/Users/steipete/Projects/agent-scripts/skills/codex-review/scripts/codex-review --mode branch;git diff --check.Evidence after fix:
server-channels.test.tsverifiesstartAccountis delayed until after the traced handoff tick,channels.discord.start-account-handoffis no longer active while a long-lived channel account task is still running, and a stop that wins the handoff prevents traced account startup. The guarded-fetch/Copilot tests verify the SSRF guard path stays hermetic under mocked fetches and Copilot login still works through the guarded seam.Observed result after fix: Focused local shard passed: 4 test files, 117 tests. Raw channel fetch boundary check passed. Codex review reported no accepted/actionable findings on the rebased head.
What was not tested: Live QQ Gateway login, because no QQ Bot credentials are available.
Refs #82398.