Skip to content

fix(state): dispatch connect event after capabilities are populated#1405

Merged
cliffhall merged 2 commits into
v2/mainfrom
v2/fix-connect-capability-race
Jun 1, 2026
Merged

fix(state): dispatch connect event after capabilities are populated#1405
cliffhall merged 2 commits into
v2/mainfrom
v2/fix-connect-capability-race

Conversation

@cliffhall
Copy link
Copy Markdown
Member

@cliffhall cliffhall commented Jun 1, 2026

Closes #1406

Problem

Tools, prompts, resources, and resource templates were disappearing for some servers on the v2 web Inspector — nothing showed under any primitive.

Root cause

Regression from #1395 ("gate list fetches on server capability"). That PR added a capability gate to each list-state manager's refresh():

if (!client.getCapabilities()?.tools) {
  this.tools = [];
  this.dispatchTypedEvent("toolsChange", this.tools);
  return this.getTools();
}

The gate is correct, but it exposed an ordering race in InspectorClient.connect():

this.dispatchTypedEvent("connect");   // managers refresh() here
await this.fetchServerInfo();         // capabilities set HERE, after

The managed states refresh on the "connect" event. refresh() runs synchronously up to its first await (the real listTools call), so the capability gate executes during the synchronous dispatch("connect") — before fetchServerInfo() populates this.capabilities (which is reset to undefined on every disconnect). The gate reads undefined, trips, and wipes the list to empty. All four primitives are gated identically, so all of them vanish.

Before #1395 refresh() didn't read capabilities, so the ordering never mattered.

Fix

Dispatch "connect" after the awaited fetchServerInfo(), so capabilities are populated before any manager refreshes.

Tests

The unit tests missed this because their FakeInspectorClient has capabilities populated from construction. Added an integration regression test that wires a real ManagedToolsState on connect and asserts the tool list is non-empty — it fails without the fix, passes with it.

  • Integration suite: 176 passed
  • npm run validate: pass

🤖 Generated with Claude Code

The list-state managers (tools/prompts/resources/templates) refresh on the
"connect" event and, since #1395, gate their list RPC on getCapabilities().
But connect() dispatched "connect" before fetchServerInfo() populated
this.capabilities, and refresh() runs synchronously up to its first await —
so the capability gate executed during the synchronous dispatch, read
undefined capabilities (reset to undefined on every disconnect), tripped,
and wiped every list to empty on connect. Tools, prompts, resources, and
resource templates all vanished against any server.

Move the "connect" dispatch to after the awaited fetchServerInfo() so
capabilities are in place before any manager refreshes. The unit tests
missed this because their FakeInspectorClient has capabilities populated
from construction; added an integration regression test wiring a real
ManagedToolsState on connect and asserting the tool list is non-empty
(fails without the fix).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@cliffhall
Copy link
Copy Markdown
Member Author

@claude review

@claude
Copy link
Copy Markdown
Contributor

claude Bot commented Jun 1, 2026

Claude finished @cliffhall's task in 1m 55s —— View job


Claude PR Review

  • Read changed files
  • Verify the ordering fix matches the described race
  • Review the new integration test
  • Check for related call sites or side effects
  • Post review

Summary

The fix is correct, minimal, and well-targeted. Diagnosis lines up with the code: the gate added in #1395 reads this.capabilities synchronously inside the "connect" handler chain, but the old ordering only populated this.capabilities after the dispatch. Swapping the two lines closes the race. ✅ Ship it.

Correctness of the fix

core/mcp/inspectorClient.ts:660-667 — moving dispatchTypedEvent("connect") to after await this.fetchServerInfo() is the right call:

  • fetchServerInfo() is async only by signature — its body has no real await, it just reads cached SDK getters and assigns this.capabilities. So after await this.fetchServerInfo(), this.capabilities is guaranteed to be set (or stay undefined if the server didn't advertise any), before any "connect" subscriber sees the event. This matches the synchronous capability gate at core/mcp/state/managedToolsState.ts:75.
  • There's only one dispatch site for "connect" (verified via grep — inspectorClient.ts:667 is the only one), so reordering doesn't risk surprising any duplicate path.
  • All five addEventListener("connect", ...) sites are managed-state classes that benefit from capabilities being populated. None of them rely on the prior ordering.
  • this.capabilities = undefined is set on disconnect (inspectorClient.ts:1016), which is what made the race manifest on every reconnect, not just the first one — consistent with the PR description.

Test

clients/web/src/test/integration/mcp/inspectorClient.test.ts:692-716 — good regression coverage:

  • Uses a real stdio transport and a real ManagedToolsState, so the exact code path that broke in production is exercised. The PR description correctly notes that unit-level FakeInspectorClient couldn't catch this because it populates capabilities at construction.
  • Inline comment explains why this test is sensitive to ordering — future readers will know not to "simplify" it back into something that doesn't exercise the race.
  • toolsState.destroy() properly unsubscribes.

Minor observations (non-blocking)

  1. Settle pattern. await new Promise((r) => setTimeout(r, 100)) (line 710) matches an existing pattern in this file (line 472), but the file already imports waitForEvent (line 22). Listening for toolsChange would be more deterministic against slow CI:

    const toolsChanged = waitForEvent(toolsState, "toolsChange");
    await client.connect();
    await toolsChanged;

    Not a blocker — matches surrounding style. Fix this →

  2. One primitive tested, four affected. The test asserts the tools path, but the same gate-vs-ordering bug affects ManagedPromptsState, ManagedResourcesState, ManagedResourceTemplatesState, and ManagedRequestorTasksState (all five state files use getCapabilities()). Since the fix is a single line shared by all of them and the bug-vs-fix mechanism is identical, one regression test is defensible — but a quick assert on ManagedPromptsState or ManagedResourcesState in the same it block would future-proof against someone gating only one primitive on capabilities differently. Optional.

  3. Comment quality. The block at inspectorClient.ts:660-664 is exactly the right kind of "why" comment — it ties the ordering constraint to issue fix(state): gate list fetches on server capability (#1350) #1395 and to the synchronous gate, so a future refactor won't silently reintroduce the race. Good.

Verdict

Approve in spirit (the action cannot literally approve). The fix is correct, the regression test catches the bug, and the inline comment will prevent regressions.
• Branch: v2/fix-connect-capability-race

Address the two non-blocking suggestions from the PR review:
- Replace the 100ms settle sleep with waitForEvent() on the managers'
  change events, so the test is deterministic against slow CI instead of
  racing a fixed timeout.
- Assert a second affected primitive (ManagedResourcesState alongside
  ManagedToolsState). The fix is a single shared line, but covering two
  distinct capabilities guards against a future change gating one
  primitive differently.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@cliffhall
Copy link
Copy Markdown
Member Author

Thanks for the review! Both non-blocking test suggestions were good — addressed in 1ba24b5:

  1. Settle pattern → waitForEvent. Replaced the 100ms setTimeout with waitForEvent(toolsState, "toolsChange") / waitForEvent(resourcesState, "resourcesChange") set up before client.connect() and awaited via Promise.all. Deterministic against slow CI rather than racing a fixed timeout.

  2. One primitive → two. Added a ManagedResourcesState assertion alongside tools. The fix is a single shared line, but tools and resources are distinct capabilities, so covering both guards against a future change gating one primitive differently. (Renamed the test to "managed list states populate on connect" to match.)

Verified: integration suite still 176 passed, npm run validate green.

I left observation #3 (the "why" comment tying the ordering to #1395 and the synchronous gate) as-is since it was called out as correct.

@cliffhall
Copy link
Copy Markdown
Member Author

@claude review

@claude
Copy link
Copy Markdown
Contributor

claude Bot commented Jun 1, 2026

Claude finished @cliffhall's task in 1m 41s —— View job


Claude PR Review (round 2)

  • Read updated test file
  • Verify waitForEvent replacement is correctly wired
  • Verify second primitive (ManagedResourcesState) assertion
  • Re-check the fix is unchanged and still correct
  • Post review

Summary

Both follow-ups landed cleanly. The test is now deterministic (no fixed sleep) and covers two distinct capabilities through the same shared fix. The production fix in core/mcp/inspectorClient.ts is unchanged from round 1; only the inline comment expanded slightly to spell out the ordering invariant. Good to go. ✅

Test changes (clients/web/src/test/integration/mcp/inspectorClient.test.ts:692-726)

1. waitForEvent settle — correct and bug-distinguishing. The listeners are registered before client.connect(), which matters because the gate-trip path is synchronous: in the pre-fix code, the empty toolsChange/resourcesChange would be dispatched synchronously inside the "connect" dispatch (before client.connect() even returns). Either way the events fire and Promise.all resolves — but the toBeGreaterThan(0) assertions still correctly distinguish bug from fix:

  • Pre-fix: gate reads undefined capabilities → synchronous empty dispatch → events resolve → getTools().length === 0 → ❌ fail
  • Post-fix: capabilities populated → gate passes → async listTools() → events resolve with populated list → ✅ pass

So the move from setTimeout to waitForEvent doesn't weaken the regression coverage — it just removes a flaky timing dependency.

2. Two-primitive coverage. ManagedResourcesState covers a distinct capability (capabilities.resources vs capabilities.tools) using the same code path. The default stdio test server config (test-server-fixtures.ts:1849-1855) exposes both 6 tools and 4 resources, so the .length > 0 assertions are well-grounded.

3. Cleanup. toolsState.destroy() / resourcesState.destroy() removes the event listeners — matches the existing pattern, no leaks.

Production fix (core/mcp/inspectorClient.ts:660-667)

Unchanged from round 1. The expanded comment now explicitly ties the ordering to the #1395 capability gate and warns that flipping the order wipes lists on every connect — exactly the kind of "why" that survives a future refactor. Verified there is still only one dispatchTypedEvent("connect") site.

Verdict

LGTM. The fix is correct, the regression test is deterministic and exercises two distinct capabilities, and the inline comment will prevent reintroduction. Approve in spirit (I can't formally approve PRs).
• Branch: v2/fix-connect-capability-race

@cliffhall cliffhall merged commit 440b194 into v2/main Jun 1, 2026
1 check passed
@cliffhall cliffhall deleted the v2/fix-connect-capability-race branch June 1, 2026 15:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

v2 Issues and PRs for v2

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Tools/prompts/resources vanish on connect: capability gate races the connect event

1 participant