Improve live test playback running#2333
Conversation
There was a problem hiding this comment.
Pull request overview
This PR focuses on speeding up recorded test playback by avoiding unnecessary waits/retries and by standardizing ARM long-running-operation (LRO) handling across multiple Azure tool services.
Changes:
- Switch many ARM LRO calls from
WaitUntil.CompletedtoWaitUntil.Started+ a sharedWaitForLroCompletionAsynchelper (with a shorter polling interval in playback). - In playback, strip/override
Retry-After-style response headers coming back throughRecordingRedirectHandlerto prevent “real-world” backoff delays. - Improve playback-mode initialization by guarding principal setup and propagating playback signaling to the server process.
Reviewed changes
Copilot reviewed 27 out of 27 changed files in this pull request and generated 10 comments.
Show a summary per file
| File | Description |
|---|---|
| tools/Azure.Mcp.Tools.Workbooks/src/Services/WorkbooksService.cs | Uses Start+wait helper for workbook create/delete LROs. |
| tools/Azure.Mcp.Tools.StorageSync/src/Services/StorageSyncService.cs | Uses Start+wait helper for multiple Storage Sync LROs; minor modern C# cleanups. |
| tools/Azure.Mcp.Tools.Sql/src/Services/SqlService.cs | Uses Start+wait helper for SQL ARM LROs; simplifies null-check throw. |
| tools/Azure.Mcp.Tools.Postgres/src/Services/PostgresService.cs | Uses Start+wait helper for server parameter update LRO. |
| tools/Azure.Mcp.Tools.MySql/src/Services/MySqlService.cs | Uses Start+wait helper for configuration LRO; minor token request cleanup. |
| tools/Azure.Mcp.Tools.Monitor/src/Services/MonitorWebTestService.cs | Uses Start+wait helper for web test create/update LROs; minor Uri construction cleanup. |
| tools/Azure.Mcp.Tools.ManagedLustre/src/Services/ManagedLustreService.cs | Uses Start+wait helper for multiple Lustre ARM LROs; minor object init refactors. |
| tools/Azure.Mcp.Tools.LoadTesting/src/Services/LoadTestingService.cs | Refactors some null checks and uses Start+wait helper for resource create; adds several ?? throw guards. |
| tools/Azure.Mcp.Tools.FileShares/src/Services/FileSharesService.cs | Uses Start+wait helper across file share/snapshot/PE connection LROs; minor object init cleanups. |
| tools/Azure.Mcp.Tools.EventHubs/src/Services/EventHubsService.cs | Uses Start+wait helper for namespace/event hub/consumer group LROs. |
| tools/Azure.Mcp.Tools.ConfidentialLedger/src/Services/ConfidentialLedgerService.cs | Uses Start+wait helper for ledger append-entry operation. |
| tools/Azure.Mcp.Tools.Compute/src/Services/ComputeService.cs | Uses Start+wait helper for VM/VMSS/network/disk LROs; minor modern C# updates. |
| tools/Azure.Mcp.Tools.Communication/src/Services/CommunicationService.cs | Uses Start+wait helper for email send LRO; minor object init cleanups. |
| tools/Azure.Mcp.Tools.AppService/src/Services/AppServiceService.cs | Uses Start+wait helper for config update LRO; minor modern C# header/value construction changes. |
| core/Microsoft.Mcp.Core/tests/Microsoft.Mcp.Tests/Helpers/TestEnvironment.cs | Namespace/style cleanup (file-scoped namespace). |
| core/Microsoft.Mcp.Core/tests/Microsoft.Mcp.Tests/Client/RecordedCommandTestsBase.cs | Uses collection expression for sanitizer list; formatting adjustments. |
| core/Microsoft.Mcp.Core/tests/Microsoft.Mcp.Tests/Client/Helpers/TestProxyFixture.cs | File-scoped namespace + cleanup; adds path resolver configurability. |
| core/Microsoft.Mcp.Core/tests/Microsoft.Mcp.Tests/Client/Helpers/LiveTestSettingsFixture.cs | Guards principal setup for playback; refactor/formatting. |
| core/Microsoft.Mcp.Core/tests/Microsoft.Mcp.Tests/Client/Helpers/LiveTestSettings.cs | Reuses shared JsonSerializerOptions for settings deserialization. |
| core/Microsoft.Mcp.Core/tests/Microsoft.Mcp.Tests/Client/CommandTestsBase.cs | Propagates playback env vars (including TEST_MODE) to server process; minor cleanup. |
| core/Microsoft.Mcp.Core/src/Services/Http/RecordingRedirectHandler.cs | In playback, overrides retry-after headers to avoid backoff delays. |
| core/Microsoft.Mcp.Core/src/Services/Azure/Authentication/CustomChainedCredential.cs | Playback detection now uses environment-based IsPlaybackTesting() (debug-only). |
| core/Microsoft.Mcp.Core/src/Helpers/EnvironmentVariableHelpers.cs | Adds IsPlaybackTesting() helper (debug-only, based on TEST_MODE). |
| core/Azure.Mcp.Core/tests/Azure.Mcp.Core.LiveTests/RecordingFramework/RecordedCommandTestsBaseTests.cs | Removes unused usings. |
| core/Azure.Mcp.Core/tests/Azure.Mcp.Core.LiveTests/RecordingFramework/RecordedCommandTestHarness.cs | Minor expression-bodied method cleanups and formatting. |
| core/Azure.Mcp.Core/src/Services/Azure/BaseAzureService.cs | Adds LRO completion helper and playback-specific poll interval shortening. |
| core/Azure.Mcp.Core/src/Services/Azure/BaseAzureResourceService.cs | Makes generic ARM helpers static; uses Start+wait helper for generic resource LROs. |
jongio
left a comment
There was a problem hiding this comment.
Clean approach to the playback perf problem. The three-pronged fix (strip retry-after headers + 1ms LRO poll + credential guard) hits the right targets. The #if DEBUG guard on poll interval is a nice safety measure.
Three items inline - two minor nits and one missed LRO conversion [Codex].
jongio
left a comment
There was a problem hiding this comment.
One gap not covered by existing comments: TEST_MODE isn't in ClearEnvironmentVariablesBeforeTestAttribute._variablesToClear. Since this env var now drives playback detection (credential short-circuit, retry stripping, poll interval), it can leak between test runs and cause flaky results. Worth adding alongside the existing AZURE_TOKEN_CREDENTIALS entry.
jongio
left a comment
There was a problem hiding this comment.
Three-pronged approach to playback perf (strip retry-after headers + 1ms LRO poll + credential guard) is clean. LRO conversions across 14 services are consistent.
Issues to address:
- RecordingRedirectHandler.cs:64 - StripRetryAfter modifies response headers on every request path but has no tests
- IKeyVaultService.cs:23 - return type change makes CertificateOperation mockable, resolving the TODO at CertificateCreateCommandTests.cs:51
jongio
left a comment
There was a problem hiding this comment.
Three-pronged playback optimization is clean and consistent:
- Retry-after header stripping in RecordingRedirectHandler - handles Retry-After, x-ms-retry-after-ms, and retry-after-ms
- 1ms LRO poll interval via WaitForLroCompletionAsync helper - applies to all 14 services through BaseAzureService
- Credential guard in LiveTestSettingsFixture - prevents AzurePowerShellCredential errors in playback
LRO conversion covers every service in the codebase that uses WaitUntil.Completed - verified no calls remain outside the changed files. Pattern is consistent across all 14 services.
The ConfidentialLedger change also threads cancellationToken through RequestContext where it wasn't before - nice improvement beyond the stated scope.
Follow-up items from previous reviews (StripRetryAfter tests, KeyVault test coverage) are fine as separate work.
What does this PR do?
Improves running live tests in playback with the following changes:
RecordingRedirectHandlerwhen the test mode is known to bePlayback. Reducing playback time spent just waiting on retries that should be instantaneous as we aren't waiting for service throttling / metering.AzurePowerShellCredential is not availablecould be seen in playback testing due to howLiveTestSettingsinitialized. It has a default ofLivetest mode, but if we have.testsettings.jsonconfigured to playback it would still try to set up the principal settings, that is now guarded on test mode changing to playback.For tools using long-running operations, this PR introduces a new design paradigm to improve testing. LRO calls should no longer use the design:
Instead they should call into the newly added
BaseAzureService.WaitForLroCompletionAsyncand use this calling pattern:The new
WaitForLroCompletionAsyncmethods hook into test running context to automatically reduce the LRO polling interval to 1 millisecond when running playback testing as playback testing uses recorded sessions which don't need to wait on a server side operation to complete and can run as fast as possible.GitHub issue number?
[Link to the GitHub issue this PR addresses]Pre-merge Checklist
servers/Azure.Mcp.Server/README.mdand/orservers/Fabric.Mcp.Server/README.mddocumentationREADME.mdchanges running the script./eng/scripts/Process-PackageReadMe.ps1. See Package READMEToolDescriptionEvaluatorand obtained a score of0.4or more and a top 3 ranking for all related test promptsconsolidated-tools.jsonbreaking-changelabelservers/Azure.Mcp.Server/docs/azmcp-commands.md./eng/scripts/Update-AzCommandsMetadata.ps1to update tool metadata inazmcp-commands.md(required for CI)servers/Azure.Mcp.Server/docs/e2eTestPrompts.mdcrypto mining, spam, data exfiltration, etc.)/azp run mcp - pullrequest - liveto run Live Test Pipeline