Copilot SDK Test by anannya03 · Pull Request #1830 · microsoft/mcp

anannya03 · 2026-02-27T10:33:08Z

What does this PR do?

Introduces an automated E2E testing tool that validates Azure MCP tool invocations using the GitHub Copilot SDK. The tool parses test prompts from e2eTestPrompts.md, executes them through a Copilot agent session with Azure MCP tools configured, and verifies that the expected tools are correctly invoked.

Namespace/tool filtering - Test specific Azure service areas (e.g., --namespace storage)
Retry logic - Handles LLM non-determinism with configurable retries (default: 3)
Concurrent execution - Running the >600 prompts serially took about 10-12 hours; running them in parallel cuts this to 1-3 hours, depending on the number of parallel workers. Prompt-level parallelization proved more efficient than namespace-level parallelization, so we went with that. The default is 4 parallel workers, with a maximum of 8 so that parallelism can't be set high enough to risk "429 Too Many Requests" errors.
Early termination - Ends the session as soon as the expected tool is invoked, reducing execution time and unnecessary token usage (full test completion is not currently implemented).
Per-attempt timeout - A cancellation token enforces a timeout for each retry attempt. Each attempt is allowed up to 5 minutes to ensure that no test runs indefinitely.
Secrets redaction - Sensitive values are redacted from logs and reports to prevent accidental exposure.
Model selection - Choose the LLM used for the test run via the --model option.
Reports and logs - Generates a Markdown report with pass/fail status and detailed Copilot session information to help debug failures and identify false positives or negatives.
Test context injection - A predefined test context is injected into every session to guide the LLM’s behavior. This currently includes some of the required parameters and will be extended with additional optional parameters to prevent tool execution failures caused by nonexistent or placeholder values.
One-per-tool mode - Supports quick validation by running a single prompt per tool.
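The retry, per-attempt timeout, bounded concurrency, and early-termination behavior described above can be sketched in Python as follows. This is an illustrative sketch, not the tool's C# implementation: `run_all`, `run_prompt`, `Outcome`, and the `attempt` callable (assumed to return `True` as soon as the expected tool is invoked) are hypothetical names. A semaphore caps the number of prompts in flight, and `asyncio.wait_for` enforces the per-attempt budget, playing the role of the cancellation token.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, List, Optional

DEFAULT_PARALLEL = 4             # default worker count described in the PR
MAX_PARALLEL = 8                 # hard cap to reduce the risk of HTTP 429 throttling
DEFAULT_RETRIES = 3              # attempts per prompt to absorb LLM non-determinism
ATTEMPT_TIMEOUT_SECONDS = 300.0  # 5-minute budget for each individual attempt

# Hypothetical attempt signature: resolves to True once the expected tool was invoked.
Attempt = Callable[[str], Awaitable[bool]]


@dataclass
class Outcome:
    prompt: str
    passed: bool
    attempts: int
    error: Optional[str] = None


async def run_prompt(prompt: str, attempt: Attempt, gate: asyncio.Semaphore,
                     retries: int, timeout: float) -> Outcome:
    async with gate:  # at most `parallel` prompts are in flight at once
        last_error: Optional[str] = None
        for number in range(1, retries + 1):
            try:
                # wait_for cancels the attempt if it exceeds its own time budget
                if await asyncio.wait_for(attempt(prompt), timeout):
                    return Outcome(prompt, True, number)
                last_error = "expected tool was not invoked"
            except asyncio.TimeoutError:
                last_error = f"attempt {number} timed out after {timeout}s"
            except Exception as exc:  # treat any other failure as retryable
                last_error = repr(exc)
        return Outcome(prompt, False, retries, last_error)


async def run_all(prompts: List[str], attempt: Attempt,
                  parallel: int = DEFAULT_PARALLEL,
                  retries: int = DEFAULT_RETRIES,
                  timeout: float = ATTEMPT_TIMEOUT_SECONDS) -> List[Outcome]:
    if not 1 <= parallel <= MAX_PARALLEL:
        raise ValueError(f"parallel must be between 1 and {MAX_PARALLEL}")
    if retries < 1:
        raise ValueError("retries must be at least 1")
    gate = asyncio.Semaphore(parallel)
    # Prompt-level parallelism: every prompt is its own task, gated by one semaphore.
    return list(await asyncio.gather(
        *(run_prompt(p, attempt, gate, retries, timeout) for p in prompts)))
```

Gating individual prompts with a single semaphore, rather than assigning whole namespaces to workers, keeps all workers busy even when one namespace has far more prompts than the others.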

Namespace mode

Tool mode (mode --all) performs better because all the tools are exposed to the model, reaching ~93% accuracy.

NEXT: I’m analyzing the failures and will update the tool descriptions where the correct tool isn’t being invoked. For cases where tests are failing due to insufficient parameters (rather than incorrect tool selection), I’ll update test-context.md to include additional optional parameters as needed.

GitHub issue number?

https://github.com/microsoft/mcp-pr/issues/249

Copilot

Pull request overview

This PR introduces a new E2E test runner tool called "Copilot CLI Tester" that uses the GitHub Copilot SDK to test Azure MCP tools. The tool parses test prompts from markdown files, executes them through Copilot SDK agent sessions, and validates that the expected MCP tools are invoked correctly.

Changes:

Added CopilotCliTester project with CLI for running E2E tests against Azure MCP tools via GitHub Copilot SDK
Implemented prompt parsing, agent session management, tool invocation detection, and markdown report generation
Added GitHub.Copilot.SDK package dependency (version 0.1.26) to Directory.Packages.props
Configured .gitignore to exclude generated test reports

Reviewed changes

Copilot reviewed 13 out of 14 changed files in this pull request and generated 9 comments.

File	Description
eng/tools/CopilotCliTester/src/Program.cs	Main CLI entry point with argument parsing, test orchestration, and report generation
eng/tools/CopilotCliTester/src/PromptParser.cs	Parses e2eTestPrompts.md markdown tables into TestPrompt records
eng/tools/CopilotCliTester/src/AgentRunner.cs	Manages Copilot SDK client/session lifecycle with MCP server integration and event collection
eng/tools/CopilotCliTester/src/AgentRunnerUtils.cs	Utility methods for detecting tool invocations from agent session events
eng/tools/CopilotCliTester/src/Models/TestPrompt.cs	Record type for parsed test prompts with namespace, tool, and prompt text
eng/tools/CopilotCliTester/src/Models/TestResult.cs	Record type for test execution results with status, duration, and attempts
eng/tools/CopilotCliTester/src/Models/AgentModels.cs	Configuration and metadata models for agent sessions
eng/tools/CopilotCliTester/src/Models/JsonContext.cs	AOT-compatible JSON serialization context
eng/tools/CopilotCliTester/src/CopilotCliTester.csproj	.NET 10 project file with AOT compatibility and GitHub.Copilot.SDK package reference
eng/tools/CopilotCliTester/src/test-context.md	Test context with default Azure parameters and placeholder substitution rules
eng/tools/CopilotCliTester/config/test-context.md	Shorter version of test context configuration (potential duplicate)
eng/tools/CopilotCliTester/README.md	Comprehensive documentation with usage examples and architecture overview
Directory.Packages.props	Added GitHub.Copilot.SDK v0.1.26 package reference
.gitignore	Added reports directory to ignore list
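As a rough illustration of what the prompt parser, namespace filter, and one-per-tool mode involve, here is a Python sketch. It assumes the prompt file contains Markdown table rows of the form `| tool_name | prompt |`, that literal pipes inside a prompt are escaped as `\|`, and that a tool's namespace is the segment after a hypothetical `azmcp_` prefix; none of these details are confirmed by the PR text, and `parse_prompts` is not the actual `PromptParser` API.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

TOOL_PREFIX = "azmcp_"  # assumed prefix; the namespace is the segment after it

# Split on "|" unless it is escaped as "\|" inside a cell.
_UNESCAPED_PIPE = re.compile(r"(?<!\\)\|")


@dataclass(frozen=True)
class TestPrompt:
    namespace: str
    tool: str
    prompt: str


def split_row(line: str) -> List[str]:
    cells = _UNESCAPED_PIPE.split(line.strip())
    # Leading and trailing pipes produce empty edge cells; drop them.
    if cells and not cells[0].strip():
        cells = cells[1:]
    if cells and not cells[-1].strip():
        cells = cells[:-1]
    # Unescape "\|" back to a literal pipe inside each cell.
    return [cell.strip().replace("\\|", "|") for cell in cells]


def namespace_of(tool: str) -> str:
    name = tool[len(TOOL_PREFIX):] if tool.startswith(TOOL_PREFIX) else tool
    return name.split("_", 1)[0]


def parse_prompts(markdown: str, namespace: Optional[str] = None,
                  one_per_tool: bool = False) -> List[TestPrompt]:
    prompts: List[TestPrompt] = []
    seen_tools = set()
    for line in markdown.splitlines():
        if not line.lstrip().startswith("|"):
            continue  # only table rows carry prompts
        cells = split_row(line)
        if len(cells) < 2:
            continue
        tool, prompt = cells[0].strip("`"), cells[1]
        if tool.lower() == "tool name" or set(tool) <= set("-: "):
            continue  # header row or |---|---| separator row
        ns = namespace_of(tool)
        if namespace and ns.lower() != namespace.lower():
            continue  # --namespace filter
        if one_per_tool and tool in seen_tools:
            continue  # one-per-tool mode keeps only the first prompt per tool
        seen_tools.add(tool)
        prompts.append(TestPrompt(ns, tool, prompt))
    return prompts
```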

jongio

Second pass - found a few issues my earlier review didn't catch.

jongio

Latest commit addresses the items from my previous review: pipe escape, JsonStringEnumConverter, slnx entries, doc fix for check-name-availability, and namespace filter docs. One follow-up on the pipe escape coverage (inline).

CI failures look unrelated. The NU1903 warnings on System.Security.Cryptography.Xml 10.0.0 show up across many projects and pre-date this PR.

jongio

Latest commit picks up the pipe-escape in the summary tables and tightens the prefix-strip logic - those threads are resolved. A few items are still open:

Console summary, JSON metric name, and markdown report still label TestStatus.Error as Skipped (Program.cs:345/354, 519/531, 552). My earlier reply on the original thread covered the three sites.
--threshold parsing is culture-sensitive and will reject 95.5 on de-DE runners.
The builtPath interpolation prints an empty string when FindBuildExecutable returns null.

Once those land I think this is ready.

jongio

Latest 3 commits address the items from the previous round:

--max/--retries/--parallel/--threshold now parse with InvariantCulture
Console summary, JSON metric, and markdown section all use Errored instead of Skipped
ResolveServerExecutable throws an explicit error when FindBuildExecutable returns null, instead of producing the empty-string interpolation
azmcp-commands.md got the missing check-name-availability metadata line

One small follow-up below on the README example. CI failure on Build linux_x64 looks like the same pre-existing NU1903 (System.Security.Cryptography.Xml) noise from prior runs, not introduced here.
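The status and threshold fixes discussed in these reviews can be illustrated with a small Python sketch. Python's `float()` always uses `.` as the decimal separator regardless of locale, which is the behavior `InvariantCulture` gives the C# tool. The threshold semantics shown here (a minimum pass percentage between 0 and 100) and the names `parse_threshold` and `summarize` are assumptions for illustration. Errors are counted under their own `Errored` status rather than reported as skipped, and statuses serialize as strings, much as `JsonStringEnumConverter` does for `TestStatus`.

```python
import json
from collections import Counter
from enum import Enum
from typing import Dict, List


class TestStatus(Enum):
    PASSED = "Passed"
    FAILED = "Failed"
    ERRORED = "Errored"  # runtime/infrastructure errors, never labeled "Skipped"


def parse_threshold(text: str) -> float:
    # float() ignores the process locale, so "95.5" parses the same on every runner.
    value = float(text)
    if not 0.0 <= value <= 100.0:  # also rejects nan and inf
        raise ValueError(f"threshold must be between 0 and 100, got {value}")
    return value


def summarize(statuses: List[TestStatus], threshold: float) -> Dict[str, object]:
    counts = Counter(statuses)
    total = len(statuses)
    pass_rate = 100.0 * counts[TestStatus.PASSED] / total if total else 0.0
    return {
        "total": total,
        "passed": counts[TestStatus.PASSED],
        "failed": counts[TestStatus.FAILED],
        "errored": counts[TestStatus.ERRORED],
        "passRate": round(pass_rate, 2),
        "meetsThreshold": pass_rate >= threshold,
        # Emit statuses as strings, analogous to a string enum converter.
        "statuses": [status.value for status in statuses],
    }
```

A caller could print `json.dumps(summarize(statuses, parse_threshold(arg)))` and exit non-zero when `meetsThreshold` is false.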

jongio

All the items from my previous review look addressed: culture-invariant parsing, JsonStringEnumConverter on TestStatus, Skipped enum removal, pipe-escaping in summary tables, csproj registration in Microsoft.Mcp.slnx, fileshare tool name corrections, and the FindBuildExecutable null check ordering.

One new issue and a follow-up on an existing thread.

spboyer · 2026-04-20T23:44:16Z

All review feedback has been addressed and threads resolved. @anannya03 — ready for another look. Thanks!

jongio

Latest commit cleans up the namespace handling and extends RedactSecrets coverage to the spots I flagged last round - the JSON aggregate file is now safe because TestResult.Error is redacted at construction. Two small follow-ups on the new namespace code.
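Redacting at construction means every consumer of a result (console summary, Markdown report, JSON aggregate) sees only the sanitized text. A minimal Python sketch of that idea follows; the `TestResult` fields are simplified, and the three regular expressions (storage account keys, bearer tokens, SAS `sig` parameters) are hypothetical examples, not the patterns `RedactSecrets` actually uses.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical patterns; group 1 keeps the label, the secret value is replaced.
_SECRET_PATTERNS = [
    re.compile(r"(?i)(AccountKey=)[^;\s]+"),
    re.compile(r"(?i)(Bearer\s+)[A-Za-z0-9\-._~+/]+=*"),
    re.compile(r"(?i)([?&]sig=)[^&\s]+"),
]


def redact_secrets(text: Optional[str]) -> Optional[str]:
    if text is None:
        return None
    for pattern in _SECRET_PATTERNS:
        text = pattern.sub(r"\1[REDACTED]", text)
    return text


@dataclass(frozen=True)
class TestResult:
    tool: str
    passed: bool
    error: Optional[str] = None

    def __post_init__(self) -> None:
        # Frozen dataclass: bypass immutability once to store the redacted error.
        object.__setattr__(self, "error", redact_secrets(self.error))
```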

jongio

Approving to clear my prior CHANGES_REQUESTED. The latest commit addressed both items from my last round - tool-prefix namespace derivation and full RedactSecrets coverage on the JSON aggregate file. The two minor follow-ups in my comment review (missing GetNamespace test, test name drift) are non-blocking.

anannya03 · 2026-04-21T15:18:30Z

Hi @jongio, I've updated the code to address the follow-ups regarding the tests.

spboyer · 2026-04-22T12:46:04Z

All review threads have been resolved (50/50). @jongio — ready for another look when you get a chance. CI is passing. Thanks!

jongio

Addresses all my previous feedback. The improvements are solid: proper lock types, prefix-stripping for tool matching, async stderr reads, InvariantCulture parsing, dynamic TFM search, and 800+ lines of unit tests covering the key utilities.

Two stale references in the README (see inline comments) - not blocking.
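Prefix-stripping for tool matching matters because a session may report a tool under a qualified name, for example one carrying the MCP server name. The Python sketch below assumes a hypothetical event shape (dictionaries with `type` and `tool` keys) and a hypothetical `azure-mcp-` prefix; neither reflects the actual Copilot SDK event model. It strips only a leading prefix and then requires an exact, case-insensitive match, so a shorter tool name never matches a longer one by substring. The returned index is where a runner could end the session early.

```python
from typing import Iterable, Mapping, Optional, Sequence


def normalize_tool_name(name: str, prefixes: Sequence[str] = ()) -> str:
    lowered = name.strip().lower()
    for prefix in prefixes:
        p = prefix.lower()
        if lowered.startswith(p):
            return lowered[len(p):]  # strip only a leading prefix, never a substring
    return lowered


def find_expected_invocation(events: Iterable[Mapping[str, str]], expected_tool: str,
                             prefixes: Sequence[str] = ()) -> Optional[int]:
    target = normalize_tool_name(expected_tool, prefixes)
    for index, event in enumerate(events):
        if event.get("type") != "tool_call":
            continue  # ignore assistant messages and other event kinds
        if normalize_tool_name(event.get("tool", ""), prefixes) == target:
            return index  # caller can end the session here (early termination)
    return None
```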

anannya03 · 2026-04-24T04:10:04Z

Thanks @jongio and @spboyer for your help in reviewing the PR!
I'm merging this PR and will open a follow-up PR shortly to update the README.

* rebase-1
* Parallelie runs and fix tool name
* guard against parallel
* proper disposal - 1
* reports for anallysis- commit will be reverted later
* Bunch of more changes(tool desc, build server change, model and test context)
* Clean up
* review comments (concurrency, logic and security fix)
* resolve review comments 2
* resolved review comments
* fixed retry attempts, --parallel error and test prompt
* resolved review comments
* resolved review comments
* review comments
* unit testcases
* added cli options for threshold, fail, added newline and comments
* added missing newline
* review comments
* review comments
* update document, escape prompts, add csproj to sln
* Fixes
* review comments
* metadata was not updated for fileshare checkname
* dotnet format
* Missing readme example fix
* namespace and redact secrets
* updated promtparser tests

Copilot AI review requested due to automatic review settings February 27, 2026 10:33

anannya03 requested review from a team as code owners February 27, 2026 10:33

anannya03 requested review from chidozieononiwu, hallipr, saikoumudi, tmeschter, vukelich and weshaggard February 27, 2026 10:33

github-project-automation Bot added this to Azure MCP Server Feb 27, 2026

anannya03 requested a review from jairmyree February 27, 2026 10:33

github-project-automation Bot moved this to Untriaged in Azure MCP Server Feb 27, 2026

Copilot started reviewing on behalf of anannya03 February 27, 2026 10:33

Copilot AI reviewed Feb 27, 2026

anannya03 requested review from JonathanCrd, KarishmaGhiya, conniey and xiangyan99 as code owners March 2, 2026 05:58

anannya03 force-pushed the copilot-cli-test branch from 30c867f to 5314e42 March 2, 2026 06:03

anannya03 marked this pull request as draft March 4, 2026 18:22

srnagar added the Do Not Merge label (Do Not Merge / WIP PRs) Mar 17, 2026

anannya03 force-pushed the copilot-cli-test branch 2 times, most recently from 022b11f to 847d681 March 25, 2026 22:36

anannya03 marked this pull request as ready for review March 31, 2026 21:25

anannya03 requested a review from vcolin7 as a code owner March 31, 2026 21:25

anannya03 marked this pull request as draft March 31, 2026 21:25

anannya03 added 3 commits March 31, 2026 21:47

rebase-1 · b9890c1

Parallelie runs and fix tool name · 2970de4

guard against parallel · d8c66d0

review comments · ae3eccd

anannya03 requested a review from jongio April 16, 2026 01:07

jongio requested changes Apr 16, 2026

update document, escape prompts, add csproj to sln · 40577a9

jongio reviewed Apr 17, 2026

Comment thread eng/tools/CopilotCliTester/src/Program.cs

Fixes · 58e00e7

jongio reviewed Apr 17, 2026

Comment thread eng/tools/CopilotCliTester/src/Program.cs Outdated

Comment thread eng/tools/CopilotCliTester/src/Program.cs Outdated

Comment thread eng/tools/CopilotCliTester/src/Program.cs

anannya03 added 3 commits April 18, 2026 19:19

review comments · 600e842

metadata was not updated for fileshare checkname · 4e79b1b

dotnet format · f80ba80

jongio reviewed Apr 20, 2026

Comment thread eng/tools/CopilotCliTester/README.md

Missing readme example fix · 2d34e19

anannya03 requested a review from jongio April 20, 2026 17:44

jongio reviewed Apr 20, 2026

Comment thread eng/tools/CopilotCliTester/src/Program.cs

namespace and redact secrets · 6c17235

jongio reviewed Apr 21, 2026

Comment thread eng/tools/CopilotCliTester/src/PromptParser.cs

Comment thread eng/tools/CopilotCliTester/tests/PromptParserTests.cs

updated promtparser tests · 8373852

jongio approved these changes Apr 21, 2026

spboyer requested a review from jongio April 22, 2026 12:45

spboyer approved these changes Apr 22, 2026

anannya03 self-assigned this Apr 23, 2026

jongio approved these changes Apr 24, 2026

Comment thread eng/tools/CopilotCliTester/README.md

Comment thread eng/tools/CopilotCliTester/README.md

anannya03 merged commit ec228f2 into microsoft:main Apr 24, 2026
16 checks passed

github-project-automation Bot moved this from In Progress to Done in Azure MCP Server Apr 24, 2026

5 participants