Copilot SDK Test #1830

Merged
anannya03 merged 27 commits into microsoft:main from anannya03:copilot-cli-test on Apr 24, 2026

anannya03 commented Feb 27, 2026

What does this PR do?

Introduces an automated E2E testing tool that validates Azure MCP tool invocations using the GitHub Copilot SDK. The tool parses test prompts from e2eTestPrompts.md, executes them through a Copilot agent session with Azure MCP tools configured, and verifies that the expected tools are correctly invoked.

  • Namespace/tool filtering - Test specific Azure service areas (--namespace storage)
  • Retry logic - Handles LLM non-determinism with configurable retries (default: 3)
  • Concurrent execution - Running >600 prompts serially took about 10-12 hours. Parallel execution reduces this to 1-3 hours, depending on the number of parallel workers. Prompt-level parallelization proved more efficient than namespace-level parallelization, so the tool uses the former. The default is 4 parallel workers, capped at 8 so that excessive parallelism does not trigger "429 Too Many Requests" errors.
  • Early termination - Ends the session once the expected tool is invoked, reducing execution time and unnecessary token usage (full test completion is not currently implemented).
  • Per-attempt timeout - A cancellation token enforces a timeout on each retry attempt. Each attempt may run for up to 5 minutes, so no test runs indefinitely.
  • Secrets redaction - Sensitive values are redacted from logs and reports to prevent accidental exposure.
  • Model selection - Selects the LLM model used for the tests via the --model option.
  • Report and logs - Generates a Markdown report with pass/fail status and detailed Copilot session information to help debug failures and identify false negatives or positives.
  • Test context injection - A predefined test context is injected into every session to guide the LLM’s behavior. It currently includes some of the required parameters and will be extended with additional optional parameters to prevent tool execution failures caused by nonexistent or placeholder values.
  • One-per-tool mode - Supports quick validation by running a single prompt per tool.
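
The retry, per-attempt timeout, and bounded-parallelism behavior listed above can be sketched as follows. This is a minimal illustration in Python, not the PR's C# implementation: `attempt` is a hypothetical coroutine that runs one prompt and returns the names of the tools that were invoked, and the sketch assumes "retries" means the total number of attempts.

```python
import asyncio

DEFAULT_RETRIES = 3        # default from the description
DEFAULT_PARALLEL = 4       # default worker count from the description
MAX_PARALLEL = 8           # cap from the description
ATTEMPT_TIMEOUT_S = 300    # 5 minutes per attempt


async def run_with_retries(attempt, expected_tool,
                           retries=DEFAULT_RETRIES,
                           timeout_s=ATTEMPT_TIMEOUT_S):
    """Return (passed, attempts_used) for one expected tool.

    `attempt(expected_tool)` is a hypothetical coroutine function that
    returns the collection of tool names the agent invoked.
    """
    for n in range(1, retries + 1):
        try:
            # Each attempt gets its own timeout, so one hung session
            # cannot stall the whole run.
            invoked = await asyncio.wait_for(attempt(expected_tool),
                                             timeout=timeout_s)
        except asyncio.TimeoutError:
            continue  # a timed-out attempt counts as a failed attempt
        if expected_tool in invoked:
            return True, n
    return False, retries


async def run_all(expected_tools, attempt, parallel=DEFAULT_PARALLEL, **kwargs):
    """Run every prompt with at most `parallel` (clamped to 1..8) in flight."""
    workers = max(1, min(parallel, MAX_PARALLEL))
    gate = asyncio.Semaphore(workers)

    async def one(expected_tool):
        async with gate:
            return await run_with_retries(attempt, expected_tool, **kwargs)

    # gather preserves input order, so results line up with expected_tools.
    return await asyncio.gather(*(one(t) for t in expected_tools))
```

Clamping the worker count inside `run_all` rather than at argument parsing keeps the cap in force however the function is called.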

Namespace mode
[Image: Screenshot 2026-03-01 194506]

Tool mode (mode --all) performs better because all the tools are exposed, reaching ~93% accuracy.

NEXT: I’m analyzing the failures and will update the tool descriptions where the correct tool isn’t being invoked. For cases where tests are failing due to insufficient parameters (rather than incorrect tool selection), I’ll update test-context.md to include additional optional parameters as needed.

GitHub issue number?

https://github.com/microsoft/mcp-pr/issues/249

Copilot AI left a comment

Pull request overview

This PR introduces a new E2E test runner tool called "Copilot CLI Tester" that uses the GitHub Copilot SDK to test Azure MCP tools. The tool parses test prompts from markdown files, executes them through Copilot SDK agent sessions, and validates that the expected MCP tools are invoked correctly.

Changes:

  • Added CopilotCliTester project with CLI for running E2E tests against Azure MCP tools via GitHub Copilot SDK
  • Implemented prompt parsing, agent session management, tool invocation detection, and markdown report generation
  • Added GitHub.Copilot.SDK package dependency (version 0.1.26) to Directory.Packages.props
  • Configured .gitignore to exclude generated test reports

Reviewed changes

Copilot reviewed 13 out of 14 changed files in this pull request and generated 9 comments.

| File | Description |
| --- | --- |
| eng/tools/CopilotCliTester/src/Program.cs | Main CLI entry point with argument parsing, test orchestration, and report generation |
| eng/tools/CopilotCliTester/src/PromptParser.cs | Parses e2eTestPrompts.md markdown tables into TestPrompt records |
| eng/tools/CopilotCliTester/src/AgentRunner.cs | Manages Copilot SDK client/session lifecycle with MCP server integration and event collection |
| eng/tools/CopilotCliTester/src/AgentRunnerUtils.cs | Utility methods for detecting tool invocations from agent session events |
| eng/tools/CopilotCliTester/src/Models/TestPrompt.cs | Record type for parsed test prompts with namespace, tool, and prompt text |
| eng/tools/CopilotCliTester/src/Models/TestResult.cs | Record type for test execution results with status, duration, and attempts |
| eng/tools/CopilotCliTester/src/Models/AgentModels.cs | Configuration and metadata models for agent sessions |
| eng/tools/CopilotCliTester/src/Models/JsonContext.cs | AOT-compatible JSON serialization context |
| eng/tools/CopilotCliTester/src/CopilotCliTester.csproj | .NET 10 project file with AOT compatibility and GitHub.Copilot.SDK package reference |
| eng/tools/CopilotCliTester/src/test-context.md | Test context with default Azure parameters and placeholder substitution rules |
| eng/tools/CopilotCliTester/config/test-context.md | Shorter version of test context configuration (potential duplicate) |
| eng/tools/CopilotCliTester/README.md | Comprehensive documentation with usage examples and architecture overview |
| Directory.Packages.props | Added GitHub.Copilot.SDK v0.1.26 package reference |
| .gitignore | Added reports directory to ignore list |
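
The prompt-parsing step summarized for PromptParser.cs might look roughly like the sketch below. It is written in Python for illustration only, and the input layout is an assumption: a `## <namespace>` heading followed by a two-column table with a "Tool Name" header. The real e2eTestPrompts.md layout is not shown in this conversation. `PromptCase` is a hypothetical stand-in for the TestPrompt record, and the tool names in the usage note are made up.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class PromptCase:
    """Hypothetical stand-in for the TestPrompt record."""
    namespace: str
    tool: str
    prompt: str


# Split on pipes that are not escaped with a backslash.
_CELL_SPLIT = re.compile(r"(?<!\\)\|")


def parse_prompts(markdown: str) -> list[PromptCase]:
    prompts = []
    namespace = ""
    for raw in markdown.splitlines():
        line = raw.strip()
        if line.startswith("## "):
            # Assumed convention: a level-2 heading names the namespace.
            namespace = line[3:].strip()
            continue
        if not line.startswith("|"):
            continue
        # Drop the empty strings before the first and after the last pipe,
        # then restore escaped pipes inside each cell.
        cells = [c.strip().replace("\\|", "|")
                 for c in _CELL_SPLIT.split(line)[1:-1]]
        if len(cells) != 2:
            continue
        tool, prompt = cells
        # Skip the header row, separator rows such as |:---|:---|, and
        # rows with an empty tool cell.
        if tool.lower() == "tool name" or set(tool) <= set("-: "):
            continue
        prompts.append(PromptCase(namespace, tool, prompt))
    return prompts
```

For example, a table row `| storage_account_list | List my storage accounts |` under a `## storage` heading would yield `PromptCase("storage", "storage_account_list", "List my storage accounts")`.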

Comment thread eng/tools/CopilotCliTester/src/AgentRunner.cs Outdated
Comment thread eng/tools/CopilotCliTester/README.md Outdated
Comment thread eng/tools/CopilotCliTester/src/CopilotCliTester.csproj Outdated
Comment thread eng/tools/CopilotCliTester/config/test-context.md Outdated
Comment thread eng/tools/CopilotCliTester/src/test-context.md Outdated
Comment thread eng/tools/CopilotCliTester/config/test-context.md Outdated
Comment thread eng/tools/CopilotCliTester/src/Models/TestResult.cs Outdated
Comment thread eng/tools/CopilotCliTester/README.md Outdated
Comment thread eng/tools/CopilotCliTester/src/Program.cs Outdated
anannya03 marked this pull request as draft on March 4, 2026 18:22
srnagar added the "Do Not Merge" label (Do Not Merge / WIP PRs) on Mar 17, 2026
anannya03 force-pushed the copilot-cli-test branch 2 times, most recently from 022b11f to 847d681, on March 25, 2026 22:36
anannya03 marked this pull request as ready for review on March 31, 2026 21:25
anannya03 requested a review from vcolin7 as a code owner on March 31, 2026 21:25
anannya03 marked this pull request as draft on March 31, 2026 21:25
anannya03 requested a review from jongio on April 16, 2026 01:07
jongio left a comment

Second pass - found a few issues my earlier review didn't catch.

Comment thread eng/tools/CopilotCliTester/src/PromptParser.cs Outdated
Comment thread servers/Azure.Mcp.Server/docs/e2eTestPrompts.md
Comment thread eng/tools/CopilotCliTester/src/CopilotCliTester.csproj
Comment thread eng/tools/CopilotCliTester/src/Models/TestStatus.cs Outdated
Comment thread eng/tools/CopilotCliTester/src/Program.cs
jongio left a comment

Latest commit addresses the items from my previous review: pipe escape, JsonStringEnumConverter, slnx entries, doc fix for check-name-availability, and namespace filter docs. One follow-up on the pipe escape coverage (inline).

CI failures look unrelated. The NU1903 warnings on System.Security.Cryptography.Xml 10.0.0 show up across many projects and pre-date this PR.
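
The pipe-escape item mentioned in this review can be illustrated with a short sketch. It is a Python illustration rather than the PR's code, and `escape_cell` and `summary_row` are hypothetical helper names. It assumes the input is raw text that has not already been escaped.

```python
def escape_cell(text: str) -> str:
    """Make raw text safe to place inside one Markdown table cell."""
    # An unescaped pipe would end the cell early and shift every later
    # column, and a newline would end the row.
    return (text.replace("|", "\\|")
                .replace("\r\n", " ")
                .replace("\n", " "))


def summary_row(*cells: str) -> str:
    """Build one Markdown table row from raw cell values."""
    return "| " + " | ".join(escape_cell(c) for c in cells) + " |"
```

Escaping at the point where the row is built means every table that goes through `summary_row` is covered, which is the kind of coverage gap the follow-up is about.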

Comment thread eng/tools/CopilotCliTester/src/Program.cs
jongio left a comment

Latest commit picks up the pipe-escape in the summary tables and tightens the prefix-strip logic - those threads are resolved. A few items are still open:

  • Console summary, JSON metric name, and markdown report still label TestStatus.Error as Skipped (Program.cs:345/354, 519/531, 552). My earlier reply on the original thread covered the three sites.
  • --threshold parsing is culture-sensitive and will reject 95.5 on de-DE runners.
  • The builtPath interpolation prints an empty string when FindBuildExecutable returns null.

Once those land I think this is ready.

Comment thread eng/tools/CopilotCliTester/src/Program.cs Outdated
Comment thread eng/tools/CopilotCliTester/src/Program.cs Outdated
Comment thread eng/tools/CopilotCliTester/src/Program.cs
jongio left a comment

Latest 3 commits address the items from the previous round:

  • --max/--retries/--parallel/--threshold now parse with InvariantCulture
  • Console summary, JSON metric, and markdown section all use Errored instead of Skipped
  • ResolveServerExecutable throws an explicit error when FindBuildExecutable returns null, instead of producing the empty-string interpolation
  • azmcp-commands.md got the missing check-name-availability metadata line

One small follow-up below on the README example. CI failure on Build linux_x64 looks like the same pre-existing NU1903 (System.Security.Cryptography.Xml) noise from prior runs, not introduced here.

Comment thread eng/tools/CopilotCliTester/README.md
anannya03 requested a review from jongio on April 20, 2026 17:44
jongio left a comment

All the items from my previous review look addressed: culture-invariant parsing, JsonStringEnumConverter on TestStatus, Skipped enum removal, pipe-escaping in summary tables, csproj registration in Microsoft.Mcp.slnx, fileshare tool name corrections, and the FindBuildExecutable null check ordering.

One new issue and a follow-up on an existing thread.

Comment thread eng/tools/CopilotCliTester/src/Program.cs

spboyer commented Apr 20, 2026

All review feedback has been addressed and threads resolved. @anannya03 — ready for another look. Thanks!

jongio left a comment

Latest commit cleans up the namespace handling and extends RedactSecrets coverage to the spots I flagged last round - the JSON aggregate file is now safe because TestResult.Error is redacted at construction. Two small follow-ups on the new namespace code.
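
Redacting the error text at construction, as described above, means every later consumer (logs, the Markdown report, the JSON aggregate file) only ever sees the redacted value. A minimal Python sketch of that idea follows. The patterns and the `Result` class are illustrative assumptions, not the PR's RedactSecrets rules or its TestResult type.

```python
import json
import re
from dataclasses import asdict, dataclass

# Illustrative patterns only; the real RedactSecrets rules are not shown
# in this conversation.
_PATTERNS = [
    (re.compile(r"(?i)(bearer\s+)[A-Za-z0-9._~+/=-]+"), r"\1[REDACTED]"),
    (re.compile(r"(?i)((?:accountkey|password|secret)=)[^;\s]+"), r"\1[REDACTED]"),
]


def redact_secrets(text: str) -> str:
    """Replace secret-looking values while keeping the surrounding text."""
    for pattern, replacement in _PATTERNS:
        text = pattern.sub(replacement, text)
    return text


@dataclass
class Result:
    """Hypothetical stand-in for TestResult."""
    tool: str
    error: str = ""

    def __post_init__(self):
        # Redact once, at construction, so no unredacted copy is stored.
        self.error = redact_secrets(self.error)


result = Result("storage_account_list",
                "auth failed: Bearer abc.DEF-123 AccountKey=xyz==;EndpointSuffix=core")
aggregate = json.dumps(asdict(result))  # safe to write to disk
```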

Comment thread eng/tools/CopilotCliTester/src/PromptParser.cs
Comment thread eng/tools/CopilotCliTester/tests/PromptParserTests.cs
jongio left a comment

Approving to clear my prior CHANGES_REQUESTED. The latest commit addressed both items from my last round - tool-prefix namespace derivation and full RedactSecrets coverage on the JSON aggregate file. The two minor follow-ups in my comment review (missing GetNamespace test, test name drift) are non-blocking.

anannya03 commented

Hi @jongio, I've updated the code to include the follow-ups regarding the tests.

spboyer requested a review from jongio on April 22, 2026 12:45

spboyer commented Apr 22, 2026

All review threads have been resolved (50/50). @jongio — ready for another look when you get a chance. CI is passing. Thanks!

anannya03 self-assigned this on Apr 23, 2026
jongio left a comment

Addresses all my previous feedback. The improvements are solid: proper lock types, prefix-stripping for tool matching, async stderr reads, InvariantCulture parsing, dynamic TFM search, and 800+ lines of unit tests covering the key utilities.

Two stale references in the README (see inline comments) - not blocking.
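
The prefix-stripping and namespace-derivation ideas mentioned in these reviews could be sketched as below. Everything specific here is an assumption made for illustration: the prefixes `azure-mcp-` and `azmcp_`, the underscore-separated tool names, and the helper names are hypothetical, and the PR's actual matching rules are not shown in this conversation.

```python
# Hypothetical prefixes that a host might prepend to an MCP tool name.
ASSUMED_PREFIXES = ("azure-mcp-", "azmcp_")


def strip_prefix(name: str, prefixes=ASSUMED_PREFIXES) -> str:
    """Remove the first matching prefix, if any."""
    for prefix in prefixes:
        if name.startswith(prefix):
            return name[len(prefix):]
    return name


def tool_matches(expected: str, invoked: str) -> bool:
    """Compare tool names after stripping prefixes, ignoring case."""
    return strip_prefix(expected).lower() == strip_prefix(invoked).lower()


def namespace_of(tool: str) -> str:
    """Derive the namespace as the first underscore-separated segment."""
    return strip_prefix(tool).split("_", 1)[0]
```

Under these assumptions, `tool_matches("storage_account_list", "azmcp_storage_account_list")` is true and `namespace_of("azmcp_storage_account_list")` gives `"storage"`.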

Comment thread eng/tools/CopilotCliTester/README.md
Comment thread eng/tools/CopilotCliTester/README.md
anannya03 commented

Thanks @jongio and @spboyer for your help in reviewing the PR!
I'm merging this PR and will create a follow-up PR shortly to update the README.

anannya03 merged commit ec228f2 into microsoft:main on Apr 24, 2026
16 checks passed
github-project-automation (bot) moved this from In Progress to Done in Azure MCP Server on Apr 24, 2026
colbytimm pushed a commit to colbytimm/microsoft-mcp that referenced this pull request Apr 25, 2026
* rebase-1

* Parallelie runs and fix tool name

* guard against parallel

* proper disposal - 1

* reports for anallysis- commit will be reverted later

* Bunch of more changes(tool desc, build server change, model and test context)

* Clean up

* review comments (concurrency, logic and security fix)

* resolve review comments 2

* resolved review comments

* fixed retry attempts, --parallel error and test prompt

* resolved review comments

* resolved review comments

* review comments

* unit testcases

* added cli options for threshold, fail, added newline and comments

* added missing newline

* review comments

* review comments

* update document, escape prompts, add csproj to sln

* Fixes

* review comments

* metadata was not updated for fileshare checkname

* dotnet format

* Missing readme example fix

* namespace and redact secrets

* updated promtparser tests