Conversation
There was a problem hiding this comment.
Pull request overview
This PR introduces a new E2E test runner tool called "Copilot CLI Tester" that uses the GitHub Copilot SDK to test Azure MCP tools. The tool parses test prompts from markdown files, executes them through Copilot SDK agent sessions, and validates that the expected MCP tools are invoked correctly.
Changes:
- Added CopilotCliTester project with CLI for running E2E tests against Azure MCP tools via GitHub Copilot SDK
- Implemented prompt parsing, agent session management, tool invocation detection, and markdown report generation
- Added GitHub.Copilot.SDK package dependency (version 0.1.26) to Directory.Packages.props
- Configured .gitignore to exclude generated test reports
Reviewed changes
Copilot reviewed 13 out of 14 changed files in this pull request and generated 9 comments.
Show a summary per file
| File | Description |
|---|---|
| eng/tools/CopilotCliTester/src/Program.cs | Main CLI entry point with argument parsing, test orchestration, and report generation |
| eng/tools/CopilotCliTester/src/PromptParser.cs | Parses e2eTestPrompts.md markdown tables into TestPrompt records |
| eng/tools/CopilotCliTester/src/AgentRunner.cs | Manages Copilot SDK client/session lifecycle with MCP server integration and event collection |
| eng/tools/CopilotCliTester/src/AgentRunnerUtils.cs | Utility methods for detecting tool invocations from agent session events |
| eng/tools/CopilotCliTester/src/Models/TestPrompt.cs | Record type for parsed test prompts with namespace, tool, and prompt text |
| eng/tools/CopilotCliTester/src/Models/TestResult.cs | Record type for test execution results with status, duration, and attempts |
| eng/tools/CopilotCliTester/src/Models/AgentModels.cs | Configuration and metadata models for agent sessions |
| eng/tools/CopilotCliTester/src/Models/JsonContext.cs | AOT-compatible JSON serialization context |
| eng/tools/CopilotCliTester/src/CopilotCliTester.csproj | .NET 10 project file with AOT compatibility and GitHub.Copilot.SDK package reference |
| eng/tools/CopilotCliTester/src/test-context.md | Test context with default Azure parameters and placeholder substitution rules |
| eng/tools/CopilotCliTester/config/test-context.md | Shorter version of test context configuration (potential duplicate) |
| eng/tools/CopilotCliTester/README.md | Comprehensive documentation with usage examples and architecture overview |
| Directory.Packages.props | Added GitHub.Copilot.SDK v0.1.26 package reference |
| .gitignore | Added reports directory to ignore list |
30c867f to
5314e42
Compare
022b11f to
847d681
Compare
jongio
left a comment
There was a problem hiding this comment.
Second pass - found a few issues my earlier review didn't catch.
jongio
left a comment
There was a problem hiding this comment.
Latest commit addresses the items from my previous review: pipe escape, JsonStringEnumConverter, slnx entries, doc fix for check-name-availability, and namespace filter docs. One follow-up on the pipe escape coverage (inline).
CI failures look unrelated. The NU1903 warnings on System.Security.Cryptography.Xml 10.0.0 show up across many projects and pre-date this PR.
jongio
left a comment
There was a problem hiding this comment.
Latest commit picks up the pipe-escape in the summary tables and tightens the prefix-strip logic - those threads are resolved. A few items are still open:
- Console summary, JSON metric name, and markdown report still label
TestStatus.ErrorasSkipped(Program.cs:345/354, 519/531, 552). My earlier reply on the original thread covered the three sites. --thresholdparsing is culture-sensitive and will reject95.5onde-DErunners.- The
builtPathinterpolation prints an empty string whenFindBuildExecutablereturns null.
Once those land I think this is ready.
jongio
left a comment
There was a problem hiding this comment.
Latest 3 commits address the items from the previous round:
--max/--retries/--parallel/--thresholdnow parse withInvariantCulture- Console summary, JSON metric, and markdown section all use
Erroredinstead ofSkipped ResolveServerExecutablethrows an explicit error whenFindBuildExecutablereturnsnull, instead of producing the empty-string interpolationazmcp-commands.mdgot the missingcheck-name-availabilitymetadata line
One small follow-up below on the README example. CI failure on Build linux_x64 looks like the same pre-existing NU1903 (System.Security.Cryptography.Xml) noise from prior runs, not introduced here.
jongio
left a comment
There was a problem hiding this comment.
All the items from my previous review look addressed: culture-invariant parsing, JsonStringEnumConverter on TestStatus, Skipped enum removal, pipe-escaping in summary tables, csproj registration in Microsoft.Mcp.slnx, fileshare tool name corrections, and the FindBuildExecutable null check ordering.
One new issue and a follow-up on an existing thread.
|
All review feedback has been addressed and threads resolved. @anannya03 — ready for another look. Thanks! |
1 similar comment
|
All review feedback has been addressed and threads resolved. @anannya03 — ready for another look. Thanks! |
jongio
left a comment
There was a problem hiding this comment.
Latest commit cleans up the namespace handling and extends RedactSecrets coverage to the spots I flagged last round - the JSON aggregate file is now safe because TestResult.Error is redacted at construction. Two small follow-ups on the new namespace code.
jongio
left a comment
There was a problem hiding this comment.
Approving to clear my prior CHANGES_REQUESTED. The latest commit addressed both items from my last round - tool-prefix namespace derivation and full RedactSecrets coverage on the JSON aggregate file. The two minor follow-ups in my comment review (missing GetNamespace test, test name drift) are non-blocking.
|
Hi @jongio , I've updated the code to include the follow ups regarding the tests |
|
All review threads have been resolved (50/50). @jongio — ready for another look when you get a chance. CI is passing. Thanks! |
jongio
left a comment
There was a problem hiding this comment.
Addresses all my previous feedback. The improvements are solid: proper lock types, prefix-stripping for tool matching, async stderr reads, InvariantCulture parsing, dynamic TFM search, and 800+ lines of unit tests covering the key utilities.
Two stale references in the README (see inline comments) - not blocking.
* rebase-1 * Parallelie runs and fix tool name * guard against parallel * proper disposal - 1 * reports for anallysis- commit will be reverted later * Bunch of more changes(tool desc, build server change, model and test context) * Clean up * review comments (concurrency, logic and security fix) * resolve review comments 2 * resolved review comments * fixed retry attempts, --parallel error and test prompt * resolved review comments * resolved review comments * review comments * unit testcases * added cli options for threshold, fail, added newline and comments * added missing newline * review comments * review comments * update document, escape prompts, add csproj to sln * Fixes * review comments * metadata was not updated for fileshare checkname * dotnet format * Missing readme example fix * namespace and redact secrets * updated promtparser tests
What does this PR do?
Introduces an automated E2E testing tool that validates Azure MCP tool invocations using the GitHub Copilot SDK. The tool parses test prompts from
e2eTestPrompts.md, executes them through a Copilot agent session with Azure MCP tools configured, and verifies that the expected tools are correctly invoked.Namespace mode

Tool mode (mode --all) performs better as all the tools are exposed. ~93% accuracy.
NEXT: I’m analyzing the failures and will update the tool descriptions where the correct tool isn’t being invoked. For cases where tests are failing due to insufficient parameters (rather than incorrect tool selection), I’ll update test-context.md to include additional optional parameters as needed.
GitHub issue number?
https://github.com/microsoft/mcp-pr/issues/249