Skip to content

Fix flaky Should_Accept_Both_MCP_Servers_And_Custom_Agents test#1346

Merged
stephentoub merged 6 commits into
mainfrom
stephentoub/bookish-funicular
May 20, 2026
Merged

Fix flaky Should_Accept_Both_MCP_Servers_And_Custom_Agents test#1346
stephentoub merged 6 commits into
mainfrom
stephentoub/bookish-funicular

Conversation

@stephentoub
Copy link
Copy Markdown
Collaborator

The Should_Accept_Both_MCP_Servers_And_Custom_Agents E2E test was flaking in the copilot-agent-runtime CI (e.g., this run). It timed out after 120s waiting for an assistant message that never arrived.

Root cause: When both MCP servers (using a non-functional echo command) and custom agents are configured together, the runtime sometimes blocks before making the /chat/completions request -- likely a race condition in MCP server initialization combined with agent routing. The CAPI proxy never receives a chat completion request during the test window, so the test hangs until the timeout fires.

Fix: Remove the message-sending and response assertion. The test's stated purpose is verifying that both configurations are accepted together (not testing message round-trip). Message handling with MCP servers and custom agents individually is already covered by Should_Accept_MCP_Server_Configuration_On_Session_Create and Should_Accept_Custom_Agent_Configuration_On_Session_Create. The test now follows the same pattern as Should_Handle_Multiple_MCP_Servers and Should_Handle_Multiple_Custom_Agents, which pass reliably.

Remove message-sending from the test that combines MCP servers and custom
agents. The test was timing out because the runtime sometimes blocks before
making the LLM call when both configs are present with a non-functional echo
MCP server. Since the test's purpose is verifying config acceptance (not
message round-trip), simplify it to match the pattern of other passing tests
like Should_Handle_Multiple_MCP_Servers.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@stephentoub stephentoub requested a review from a team as a code owner May 20, 2026 16:09
Copilot AI review requested due to automatic review settings May 20, 2026 16:09
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR aims to de-flake the .NET E2E test Should_Accept_Both_MCP_Servers_And_Custom_Agents by removing the message send + assistant-response assertion that can hang when MCP servers and custom agents are configured together.

Changes:

  • Updated the .NET E2E test to only validate that session creation succeeds (and a session id is returned), without sending a message.
  • Simplified the corresponding YAML snapshot(s) for this scenario to have no recorded conversations.
Show a summary per file
File Description
test/snapshots/mcp-and-agents/should_accept_both_mcp_servers_and_custom_agents.yaml Removes the recorded conversation from the snapshot (now conversations: []).
test/snapshots/mcp_and_agents/should_accept_both_mcp_servers_and_custom_agents.yaml Removes the recorded conversation from the snapshot (now conversations: []); this snapshot is used by other SDK E2E suites.
dotnet/test/E2E/SessionMcpAndAgentConfigE2ETests.cs Removes sending a message and asserting on the assistant response to avoid timeouts/flakiness.

Copilot's findings

  • Files reviewed: 3/3 changed files
  • Comments generated: 1

@github-actions

This comment has been minimized.

The snapshot was updated to have empty conversations, but the Node.js
and Python tests still tried to send a message, causing a 500 proxy
error from the replay proxy.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions

This comment has been minimized.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions

This comment has been minimized.

Matches the fix applied to .NET, Node.js, and Python tests.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions github-actions Bot mentioned this pull request May 20, 2026
The expect.poll() calls used the default 1s timeout, which is too
short for compaction on Windows. Use 30s to match the pattern used
by the dedicated compaction E2E test.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions

This comment has been minimized.

GetFinalAssistantMessageAsync was called only after awaiting the
pending_messages.modified event, by which time the assistant message
and idle events may have already been emitted. Await the assistant
message first (it subscribes immediately after SendAsync) so we
don't miss those events.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions
Copy link
Copy Markdown
Contributor

Cross-SDK Consistency Review ✅

This PR maintains full cross-SDK consistency. The primary fix — removing the message-send + response assertion from the Should_Accept_Both_MCP_Servers_And_Custom_Agents test — was applied uniformly across all four SDK implementations:

SDK File Change
Go go/internal/e2e/mcp_and_agents_e2e_test.go Removed send + assertion block
Node.js nodejs/test/e2e/mcp_and_agents.e2e.test.ts Removed send + assertion block
Python python/e2e/test_mcp_and_agents_e2e.py Removed send + assertion block
.NET dotnet/test/E2E/SessionMcpAndAgentConfigE2ETests.cs Removed send + assertion block

The shared replay snapshot files were also updated consistently for both SDK naming conventions (mcp-and-agents/ and mcp_and_agents/).

The two additional fixes (Node.js expect.poll() timeouts in session_fs.e2e.test.ts and the .NET await-ordering fix in EventFidelityE2ETests.cs) are test-only improvements with no cross-SDK API surface impact.

No consistency concerns identified.

Generated by SDK Consistency Review Agent for issue #1346 · ● 144.7K ·

@stephentoub stephentoub merged commit 43d6aef into main May 20, 2026
43 checks passed
@stephentoub stephentoub deleted the stephentoub/bookish-funicular branch May 20, 2026 17:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants