Skip to content

fix(discord): bind delayed identify to socket generation#82225

Merged
giodl73-repo merged 6 commits into
openclaw:mainfrom
giodl73-repo:fix-discord-identify-socket-generation-79794
May 16, 2026
Merged

fix(discord): bind delayed identify to socket generation#82225
giodl73-repo merged 6 commits into
openclaw:mainfrom
giodl73-repo:fix-discord-identify-socket-generation-79794

Conversation

@giodl73-repo
Copy link
Copy Markdown
Contributor

@giodl73-repo giodl73-repo commented May 15, 2026

Summary

  • Thread the WebSocket that received HELLO through gateway payload handling.
  • After the shared identify limiter resolves, only send IDENTIFY when that same socket is still the current open gateway socket.
  • Add a regression test covering the stale-HELLO race so a replacement socket is not identified before receiving its own HELLO.

Real behavior proof

  • Behavior or issue addressed: Discord gateway delayed IDENTIFY work could use a replacement WebSocket after the original HELLO socket closed while waiting on the shared identify limiter.
  • Real environment tested: OpenClaw checkout in WSL Ubuntu 24.04 with the repo dependency graph installed, using the actual Discord gateway package test runner against a before worktree (origin/main) and the PR branch.
  • Exact steps or command run after this patch: generated a before worktree at origin/main, copied the PR regression spec into it, ran node scripts/test-projects.mjs extensions/discord/src/internal/gateway.test.ts, then ran the same command on PR branch 348310af0b.
  • Evidence after fix (screenshot, recording, terminal capture, console output, redacted runtime log, linked artifact, or copied live output): screenshot: discord-identify-before-after-proof.png; before log: https://artifacts.openclaw.ai/crabbox-artifacts/giodl73%40gmail.com/pr-82225/discord-identify-socket-generation-82225/20260515-165432-512753169/before-origin-main-with-regression-test.log; after log: https://artifacts.openclaw.ai/crabbox-artifacts/giodl73%40gmail.com/pr-82225/discord-identify-socket-generation-82225/20260515-165432-512753169/after-pr-82225.log
  • Observed result after fix: Before (origin/main plus the regression spec) failed because the replacement socket received an IDENTIFY payload from stale HELLO work; after (348310af0b) passed all 21 focused gateway tests and only IDENTIFYs the replacement socket after its own HELLO.
  • What was not tested: Live Discord affected-host reproduction/guild ingress was not included; the hosted live QA run for v2026.5.7-beta.1 is still queued, so this PR remains scoped to the source-level stale HELLO/IDENTIFY race.
  • Before evidence (optional but encouraged): The linked before log shows does not identify a replacement socket from a stale HELLO failing on origin/main with Received: "{\"op\":2...} sent by the replacement socket.

Validation

  • COREPACK_ENABLE_DOWNLOAD_PROMPT=0 pnpm test extensions/discord/src/internal/gateway.test.ts
  • COREPACK_ENABLE_DOWNLOAD_PROMPT=0 pnpm exec oxfmt --check --threads=1 extensions/discord/src/internal/gateway.ts extensions/discord/src/internal/gateway.test.ts
  • COREPACK_ENABLE_DOWNLOAD_PROMPT=0 pnpm check:changed

Related to #79794, #78910, and duplicate/satellite #80873. This addresses a source-level stale HELLO/IDENTIFY race without claiming the broader field reports are fully closed.

@openclaw-barnacle openclaw-barnacle Bot added channel: discord Channel integration: discord size: S maintainer Maintainer-authored PR labels May 15, 2026
@clawsweeper
Copy link
Copy Markdown
Contributor

clawsweeper Bot commented May 15, 2026

Codex review: needs real behavior proof before merge.

Summary
The PR threads the Discord WebSocket that received HELLO through payload handling, skips delayed IDENTIFY when that socket is no longer current, and adds stale-HELLO regression coverage.

Reproducibility: yes. for the source-level race: current main can wait in sharedGatewayIdentifyLimiter after HELLO and then identify whatever this.ws is, and the submitted before log shows the regression spec failing. No live Discord affected-host reproduction is available here.

Real behavior proof
Needs real behavior proof before merge: The PR body and comment include before/after fake-socket gateway test logs, not live Discord gateway behavior after the fix; add redacted live proof or get an explicit maintainer proof override before merge. After adding proof, update the PR body; ClawSweeper should re-review automatically. If it does not, ask a maintainer to comment @clawsweeper re-review.

Next step before merge
Human handling is required because the PR has a protected maintainer label and the remaining blocker is proof/review judgment, not a narrow automated repair.

Security
Cleared: The diff only changes Discord gateway runtime logic and focused tests, with no dependency, CI, secret, packaging, or supply-chain changes.

Review details

Best possible solution:

Land the socket-source guard after maintainer review and either live Discord proof or an explicit proof override, while leaving broader READY/1006 tracking on the linked reports.

Do we have a high-confidence way to reproduce the issue?

Yes for the source-level race: current main can wait in sharedGatewayIdentifyLimiter after HELLO and then identify whatever this.ws is, and the submitted before log shows the regression spec failing. No live Discord affected-host reproduction is available here.

Is this the best way to solve the issue?

Yes for the narrow race: carrying the HELLO source socket into delayed IDENTIFY is the smallest maintainable fix I found. It should not be treated as a complete fix for the broader Discord READY/1006 reports without live proof.

Acceptance criteria:

  • node scripts/test-projects.mjs extensions/discord/src/internal/gateway.test.ts
  • Crabbox/Testbox pnpm check:changed equivalent for the touched Discord surface
  • Redacted live Discord gateway proof or explicit maintainer proof override

What I checked:

  • Current-main race surface: Current main handles a socket message by calling payload handling without preserving the source socket, then delayed identify work reads this.ws only after the shared limiter resolves, so stale HELLO work can act on a replacement socket. (extensions/discord/src/internal/gateway.ts:198, 16e5d6692dcc)
  • Identify limiter delay: The shared identify limiter can await a timeout before IDENTIFY, creating the async window between receiving HELLO and reading the current gateway socket. (extensions/discord/src/internal/gateway-identify-limiter.ts:6, 16e5d6692dcc)
  • PR runtime change: The patch passes the source socket into handlePayload and identifyWithConcurrency, then returns before IDENTIFY when that socket is no longer the current this.ws. (extensions/discord/src/internal/gateway.ts:198, e283c8af7d44)
  • Regression proof: The submitted artifacts show the added fake-socket regression failing on origin/main because the replacement socket receives IDENTIFY, then passing 21 focused gateway tests on the PR branch. (extensions/discord/src/internal/gateway.test.ts:180, e283c8af7d44)
  • Protected review state: The live PR API response shows the PR is open, unmerged, externally authored, and labeled maintainer; current checks on the latest head are green, but the protected label keeps this in explicit maintainer handling. (e283c8af7d44)
  • Feature history: Local blame and GitHub commit history tie this gateway lifecycle area to recent Discord gateway work, including missed-identify reconnect handling and adjacent gateway test assertion work. (extensions/discord/src/internal/gateway.ts:335, b5ba210fd5)

Likely related people:

  • steipete: Recent merged history includes Discord gateway reconnect and broader gateway hardening work in the same runtime/test area. (role: recent Discord gateway lifecycle contributor; confidence: high; commits: 93e2d90af174, 3980eaa1c22e, 6346e792c46f; files: extensions/discord/src/internal/gateway.ts, extensions/discord/src/internal/gateway.test.ts)
  • shakkernerd: Recent file history includes focused Discord gateway opcode assertion work in the same gateway test file. (role: recent adjacent test contributor; confidence: medium; commits: 8eb14802fd0f; files: extensions/discord/src/internal/gateway.test.ts)

Remaining risk / open question:

  • Submitted proof does not exercise a live Discord token, HELLO/IDENTIFY/READY exchange, or guild-message ingress path.
  • The linked Discord READY/1006 field reports may still have additional causes outside this stale-HELLO race.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 16e5d6692dcc.

@giodl73-repo
Copy link
Copy Markdown
Contributor Author

OpenClaw QA Artifacts

Summary

Discord gateway IDENTIFY socket-generation proof for PR #82225.

  • Before: origin/main 4e6c85d with the PR regression spec fails because stale HELLO work can identify the replacement socket.
  • After: PR branch 348310a passes the same focused gateway scenario.
  • Artifact image: discord-identify-before-after-proof.png
  • Logs: before-origin-main-with-regression-test.log, after-pr-82225.log

Evidence

@giodl73-repo giodl73-repo force-pushed the fix-discord-identify-socket-generation-79794 branch 6 times, most recently from 1e5ed86 to 3aa359d Compare May 16, 2026 14:09
giodl73-repo and others added 2 commits May 16, 2026 07:14
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@giodl73-repo giodl73-repo force-pushed the fix-discord-identify-socket-generation-79794 branch from 3aa359d to e283c8a Compare May 16, 2026 14:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

channel: discord Channel integration: discord maintainer Maintainer-authored PR size: S

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant