Improve CallService_McpTool_TelemetryRecorded test reliability #13374
- Remove QuarantinedTest attribute
- Remove initialization telemetry drain code
- Use loop pattern to read telemetry until McpToolCall is found
- Continue reading until EndOperation is found
- Remove sender disposal and verification code

Co-authored-by: JamesNK <303201+JamesNK@users.noreply.github.com>
    await testTelemetrySender.DisposeAsync();
    // Read telemetry items until we find the McpToolCall event
    bool foundMcpToolCall = false;
    while (!foundMcpToolCall && await testTelemetrySender.ContextChannel.Reader.WaitToReadAsync().DefaultTimeout())
@copilot Remove foundMcpToolCall from while test. Instead, break after foundMcpToolCall is set to true.
Done in e527a40. Removed foundMcpToolCall from the while condition and added break after setting the flag to true.
    Assert.False(hasMore);
    // Then read until we find the EndOperation event
    bool foundEndOperation = false;
    while (!foundEndOperation && await testTelemetrySender.ContextChannel.Reader.WaitToReadAsync().DefaultTimeout())
@copilot Remove foundEndOperation from while test. Instead, break after foundEndOperation is set to true.
Done in e527a40. Removed foundEndOperation from the while condition and added break after setting the flag to true.
- Remove foundMcpToolCall from while condition, use break after setting flag
- Remove foundEndOperation from while condition, use break after setting flag

Co-authored-by: JamesNK <303201+JamesNK@users.noreply.github.com>
@copilot Run CallService_McpTool_TelemetryRecorded test 50 times. It should pass consistently.
✅ Test passed all 50 runs consistently. Duration ranged from 1.066s to 1.149s per run. No failures observed.
🚀 Dogfood this PR with:
curl -fsSL https://raw.githubusercontent.com/dotnet/aspire/main/eng/scripts/get-aspire-cli-pr.sh | bash -s -- 13374
Or
iex "& { $(irm https://raw.githubusercontent.com/dotnet/aspire/main/eng/scripts/get-aspire-cli-pr.ps1) } 13374"
Pull request overview
This PR successfully removes the CallService_McpTool_TelemetryRecorded test from quarantine by refactoring it to use a more robust loop-based telemetry reading pattern. The changes eliminate flakiness caused by strict assumptions about telemetry ordering and initialization state.
Key Changes:
- Removed QuarantinedTest attribute to return test to regular CI pipeline
- Replaced direct channel reads with loop-based pattern that searches for specific telemetry events
- Simplified assertions to validate presence of key events rather than exact ordering and counts
Description
Refactored the CallService_McpTool_TelemetryRecorded test to be more robust and removed it from quarantine. The test was flaky due to strict assumptions about telemetry ordering and initialization state.
Changes
- McpToolCall and EndOperation events, ignoring telemetry count and ordering
The new pattern uses break for cleaner loop control.
Validation
The test was run 50 consecutive times with a 100% pass rate, confirming that the changes eliminated the flakiness.
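The break-based loop described in the review comments and in the description can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: TelemetryItem and TelemetryReader.ReadUntilAsync are hypothetical names introduced here, and a CancellationToken stands in for the DefaultTimeout() test helper, whose implementation is not shown in this conversation. Only System.Threading.Channels APIs are assumed.

```csharp
using System;
using System.Threading;
using System.Threading.Channels;
using System.Threading.Tasks;

// Hypothetical stand-in for the telemetry items the test reads; the real type is not shown in this PR.
public sealed record TelemetryItem(string Name);

public static class TelemetryReader
{
    // Reads items from the channel until one named eventName is seen.
    // Returns false if the channel completes before the event appears.
    // Items that do not match are skipped, so count and ordering of other telemetry are ignored.
    public static async Task<bool> ReadUntilAsync(
        ChannelReader<TelemetryItem> reader,
        string eventName,
        CancellationToken cancellationToken = default)
    {
        var found = false;

        // The flag is not part of the loop condition; the loop exits via break instead.
        while (await reader.WaitToReadAsync(cancellationToken))
        {
            // Assumes a single consumer, so an item is available after WaitToReadAsync returns true.
            var item = await reader.ReadAsync(cancellationToken);
            if (item.Name == eventName)
            {
                found = true;
                break;
            }
        }

        return found;
    }
}
```

In a test, the helper would be called once for McpToolCall and then again on the same reader for EndOperation, asserting that each call returns true. Passing a token from a CancellationTokenSource created with a timeout keeps the test from hanging if an event never arrives.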