Skip to content

Improve live test playback running#2333

Merged
alzimmermsft merged 6 commits intomicrosoft:mainfrom
alzimmermsft:ImproveLiveTestPlayback
Apr 8, 2026
Merged

Improve live test playback running#2333
alzimmermsft merged 6 commits intomicrosoft:mainfrom
alzimmermsft:ImproveLiveTestPlayback

Conversation

@alzimmermsft
Copy link
Copy Markdown
Contributor

@alzimmermsft alzimmermsft commented Apr 3, 2026

What does this PR do?

Improves running live tests in playback with the following changes:

  • Retry after headers are now stripped by RecordingRedirectHandler when the test mode is known to be Playback. Reducing playback time spent just waiting on retries that should be instantaneous as we aren't waiting for service throttling / metering.
  • Fixed a pain point where AzurePowerShellCredential is not available could be seen in playback testing due to how LiveTestSettings initialized. It has a default of Live test mode, but if we have .testsettings.json configured to playback it would still try to set up the principal settings, that is now guarded on test mode changing to playback.
  • Added a helper method to override default polling delay to 1 millisecond when running in playback as playback tests don't wait for a server-side operation to complete.

For tools using long-running operations, this PR introduces a new design paradigm to improve testing. LRO calls should no longer use the design:

var response = await client.lroAsync(
    WaitUntil.Completed,
    request,
    cancellationToken);

Instead they should call into the newly added BaseAzureService.WaitForLroCompletionAsync and use this calling pattern:

var operation = await client.lroAsync(
    WaitUntil.Started,
    request,
    cancellationToken);
await WaitForLroCompletionAsync(operation, cancellationToken);
var response = operation.Value;

The new WaitForLroCompletionAsync methods hook into test running context to automatically reduce the LRO polling interval to 1 millisecond when running playback testing as playback testing uses recorded sessions which don't need to wait on a server side operation to complete and can run as fast as possible.

GitHub issue number?

[Link to the GitHub issue this PR addresses]

Pre-merge Checklist

  • Required for All PRs
    • Read contribution guidelines
    • PR title clearly describes the change
    • Commit history is clean with descriptive messages (cleanup guide)
    • Added comprehensive tests for new/modified functionality
    • Created a changelog entry if the change falls among the following: new feature, bug fix, UI/UX update, breaking change, or updated dependencies. Follow the changelog entry guide
  • For MCP tool changes:
    • One tool per PR: This PR adds or modifies only one MCP tool for faster review cycles
    • Updated servers/Azure.Mcp.Server/README.md and/or servers/Fabric.Mcp.Server/README.md documentation
    • Validate README.md changes running the script ./eng/scripts/Process-PackageReadMe.ps1. See Package README
    • For new or modified tool descriptions, ran ToolDescriptionEvaluator and obtained a score of 0.4 or more and a top 3 ranking for all related test prompts
    • For tools with new names, including new tools or renamed tools, update consolidated-tools.json
    • For renamed tools, follow the Tool Rename Checklist and tag the PR with the breaking-change label
    • For new tools associated with Azure services or publicly available tools/APIs/products, add URL to documentation in the PR description
  • Extra steps for Azure MCP Server tool changes:
    • Updated command list in servers/Azure.Mcp.Server/docs/azmcp-commands.md
    • Ran ./eng/scripts/Update-AzCommandsMetadata.ps1 to update tool metadata in azmcp-commands.md (required for CI)
    • Updated test prompts in servers/Azure.Mcp.Server/docs/e2eTestPrompts.md
    • 👉 For Community (non-Microsoft team member) PRs:
      • Security review: Reviewed code for security vulnerabilities, malicious code, or suspicious activities before running tests (crypto mining, spam, data exfiltration, etc.)
      • Manual tests run: added comment /azp run mcp - pullrequest - live to run Live Test Pipeline

Comment thread core/Microsoft.Mcp.Core/src/Services/Http/HttpClientFactoryConfigurator.cs Outdated
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR focuses on speeding up recorded test playback by avoiding unnecessary waits/retries and by standardizing ARM long-running-operation (LRO) handling across multiple Azure tool services.

Changes:

  • Switch many ARM LRO calls from WaitUntil.Completed to WaitUntil.Started + a shared WaitForLroCompletionAsync helper (with a shorter polling interval in playback).
  • In playback, strip/override Retry-After-style response headers coming back through RecordingRedirectHandler to prevent “real-world” backoff delays.
  • Improve playback-mode initialization by guarding principal setup and propagating playback signaling to the server process.

Reviewed changes

Copilot reviewed 27 out of 27 changed files in this pull request and generated 10 comments.

Show a summary per file
File Description
tools/Azure.Mcp.Tools.Workbooks/src/Services/WorkbooksService.cs Uses Start+wait helper for workbook create/delete LROs.
tools/Azure.Mcp.Tools.StorageSync/src/Services/StorageSyncService.cs Uses Start+wait helper for multiple Storage Sync LROs; minor modern C# cleanups.
tools/Azure.Mcp.Tools.Sql/src/Services/SqlService.cs Uses Start+wait helper for SQL ARM LROs; simplifies null-check throw.
tools/Azure.Mcp.Tools.Postgres/src/Services/PostgresService.cs Uses Start+wait helper for server parameter update LRO.
tools/Azure.Mcp.Tools.MySql/src/Services/MySqlService.cs Uses Start+wait helper for configuration LRO; minor token request cleanup.
tools/Azure.Mcp.Tools.Monitor/src/Services/MonitorWebTestService.cs Uses Start+wait helper for web test create/update LROs; minor Uri construction cleanup.
tools/Azure.Mcp.Tools.ManagedLustre/src/Services/ManagedLustreService.cs Uses Start+wait helper for multiple Lustre ARM LROs; minor object init refactors.
tools/Azure.Mcp.Tools.LoadTesting/src/Services/LoadTestingService.cs Refactors some null checks and uses Start+wait helper for resource create; adds several ?? throw guards.
tools/Azure.Mcp.Tools.FileShares/src/Services/FileSharesService.cs Uses Start+wait helper across file share/snapshot/PE connection LROs; minor object init cleanups.
tools/Azure.Mcp.Tools.EventHubs/src/Services/EventHubsService.cs Uses Start+wait helper for namespace/event hub/consumer group LROs.
tools/Azure.Mcp.Tools.ConfidentialLedger/src/Services/ConfidentialLedgerService.cs Uses Start+wait helper for ledger append-entry operation.
tools/Azure.Mcp.Tools.Compute/src/Services/ComputeService.cs Uses Start+wait helper for VM/VMSS/network/disk LROs; minor modern C# updates.
tools/Azure.Mcp.Tools.Communication/src/Services/CommunicationService.cs Uses Start+wait helper for email send LRO; minor object init cleanups.
tools/Azure.Mcp.Tools.AppService/src/Services/AppServiceService.cs Uses Start+wait helper for config update LRO; minor modern C# header/value construction changes.
core/Microsoft.Mcp.Core/tests/Microsoft.Mcp.Tests/Helpers/TestEnvironment.cs Namespace/style cleanup (file-scoped namespace).
core/Microsoft.Mcp.Core/tests/Microsoft.Mcp.Tests/Client/RecordedCommandTestsBase.cs Uses collection expression for sanitizer list; formatting adjustments.
core/Microsoft.Mcp.Core/tests/Microsoft.Mcp.Tests/Client/Helpers/TestProxyFixture.cs File-scoped namespace + cleanup; adds path resolver configurability.
core/Microsoft.Mcp.Core/tests/Microsoft.Mcp.Tests/Client/Helpers/LiveTestSettingsFixture.cs Guards principal setup for playback; refactor/formatting.
core/Microsoft.Mcp.Core/tests/Microsoft.Mcp.Tests/Client/Helpers/LiveTestSettings.cs Reuses shared JsonSerializerOptions for settings deserialization.
core/Microsoft.Mcp.Core/tests/Microsoft.Mcp.Tests/Client/CommandTestsBase.cs Propagates playback env vars (including TEST_MODE) to server process; minor cleanup.
core/Microsoft.Mcp.Core/src/Services/Http/RecordingRedirectHandler.cs In playback, overrides retry-after headers to avoid backoff delays.
core/Microsoft.Mcp.Core/src/Services/Azure/Authentication/CustomChainedCredential.cs Playback detection now uses environment-based IsPlaybackTesting() (debug-only).
core/Microsoft.Mcp.Core/src/Helpers/EnvironmentVariableHelpers.cs Adds IsPlaybackTesting() helper (debug-only, based on TEST_MODE).
core/Azure.Mcp.Core/tests/Azure.Mcp.Core.LiveTests/RecordingFramework/RecordedCommandTestsBaseTests.cs Removes unused usings.
core/Azure.Mcp.Core/tests/Azure.Mcp.Core.LiveTests/RecordingFramework/RecordedCommandTestHarness.cs Minor expression-bodied method cleanups and formatting.
core/Azure.Mcp.Core/src/Services/Azure/BaseAzureService.cs Adds LRO completion helper and playback-specific poll interval shortening.
core/Azure.Mcp.Core/src/Services/Azure/BaseAzureResourceService.cs Makes generic ARM helpers static; uses Start+wait helper for generic resource LROs.

Comment thread core/Azure.Mcp.Core/src/Services/Azure/BaseAzureService.cs
Copy link
Copy Markdown
Contributor

@jongio jongio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clean approach to the playback perf problem. The three-pronged fix (strip retry-after headers + 1ms LRO poll + credential guard) hits the right targets. The #if DEBUG guard on poll interval is a nice safety measure.

Three items inline - two minor nits and one missed LRO conversion [Codex].

Comment thread core/Microsoft.Mcp.Core/src/Helpers/EnvironmentVariableHelpers.cs Outdated
Comment thread core/Microsoft.Mcp.Core/src/Services/Http/RecordingRedirectHandler.cs Outdated
Comment thread tools/Azure.Mcp.Tools.LoadTesting/src/Services/LoadTestingService.cs Outdated
@alzimmermsft alzimmermsft requested a review from vcolin7 as a code owner April 7, 2026 14:40
@alzimmermsft alzimmermsft requested a review from jongio April 7, 2026 16:34
Copy link
Copy Markdown
Contributor

@jongio jongio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One gap not covered by existing comments: TEST_MODE isn't in ClearEnvironmentVariablesBeforeTestAttribute._variablesToClear. Since this env var now drives playback detection (credential short-circuit, retry stripping, poll interval), it can leak between test runs and cause flaky results. Worth adding alongside the existing AZURE_TOKEN_CREDENTIALS entry.

Copy link
Copy Markdown
Contributor

@jongio jongio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Three-pronged approach to playback perf (strip retry-after headers + 1ms LRO poll + credential guard) is clean. LRO conversions across 14 services are consistent.

Issues to address:

  • RecordingRedirectHandler.cs:64 - StripRetryAfter modifies response headers on every request path but has no tests
  • IKeyVaultService.cs:23 - return type change makes CertificateOperation mockable, resolving the TODO at CertificateCreateCommandTests.cs:51

Comment thread tools/Azure.Mcp.Tools.KeyVault/src/Services/IKeyVaultService.cs
@alzimmermsft alzimmermsft requested a review from jongio April 8, 2026 17:39
Copy link
Copy Markdown
Contributor

@jongio jongio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Three-pronged playback optimization is clean and consistent:

  1. Retry-after header stripping in RecordingRedirectHandler - handles Retry-After, x-ms-retry-after-ms, and retry-after-ms
  2. 1ms LRO poll interval via WaitForLroCompletionAsync helper - applies to all 14 services through BaseAzureService
  3. Credential guard in LiveTestSettingsFixture - prevents AzurePowerShellCredential errors in playback

LRO conversion covers every service in the codebase that uses WaitUntil.Completed - verified no calls remain outside the changed files. Pattern is consistent across all 14 services.

The ConfidentialLedger change also threads cancellationToken through RequestContext where it wasn't before - nice improvement beyond the stated scope.

Follow-up items from previous reviews (StripRetryAfter tests, KeyVault test coverage) are fine as separate work.

@github-project-automation github-project-automation Bot moved this from Untriaged to In Progress in Azure MCP Server Apr 8, 2026
@alzimmermsft alzimmermsft merged commit d5aa9a6 into microsoft:main Apr 8, 2026
16 checks passed
@github-project-automation github-project-automation Bot moved this from In Progress to Done in Azure MCP Server Apr 8, 2026
@alzimmermsft alzimmermsft deleted the ImproveLiveTestPlayback branch April 8, 2026 19:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

3 participants