.NET: chore: support retries on Cosmos storage creation #402
Pull Request Overview
This PR adds configurable retry functionality to Cosmos DB container creation operations. The implementation addresses transient failures that can occur during container initialization by introducing exponential backoff retry logic with customizable parameters.
Key changes:
- Introduces a new options class for configuring retry behavior with exponential backoff
- Modifies the LazyCosmosContainer to support retry logic during container initialization
- Adds comprehensive test coverage for the new retry functionality
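
The exponential backoff described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the class name `RetryOptionsSketch`, its property names, and the default values are all assumptions chosen for the example.

```csharp
using System;

// Hypothetical options type; names and defaults are illustrative only,
// not taken from CosmosActorStateStorageOptions.
public sealed class RetryOptionsSketch
{
    public int MaxRetryAttempts { get; set; } = 5;
    public TimeSpan InitialDelay { get; set; } = TimeSpan.FromSeconds(1);
    public TimeSpan MaxDelay { get; set; } = TimeSpan.FromSeconds(30);
    public double BackoffMultiplier { get; set; } = 2.0;

    // Delay before retry number `attempt` (0-based):
    // InitialDelay * BackoffMultiplier^attempt, capped at MaxDelay.
    public TimeSpan GetDelay(int attempt)
    {
        if (attempt < 0) throw new ArgumentOutOfRangeException(nameof(attempt));
        double ms = InitialDelay.TotalMilliseconds * Math.Pow(BackoffMultiplier, attempt);
        return TimeSpan.FromMilliseconds(Math.Min(ms, MaxDelay.TotalMilliseconds));
    }
}
```

With these assumed defaults the delays run 1 s, 2 s, 4 s, 8 s, 16 s and then stay at the 30 s cap.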
 
Reviewed Changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description | 
|---|---|
| CosmosActorStateStorageOptions.cs | Defines configuration options for retry behavior including max attempts, delays, and backoff multiplier | 
| LazyCosmosContainer.cs | Implements retry logic with exponential backoff for container initialization operations | 
| ServiceCollectionExtensions.cs | Updates dependency injection to pass retry options to LazyCosmosContainer | 
| LazyCosmosContainerTests.cs | Adds integration test to verify retry configuration is properly applied | 
| Microsoft.Extensions.AI.Agents.Runtime.Storage.CosmosDB.csproj | Adds Microsoft.Extensions.Options package reference | 
| Directory.Packages.props | Defines version for Microsoft.Extensions.Options package | 
      …ft/agent-framework into dmkorolev/cosmos-retries
….Agents.Runtime.Storage.CosmosDB.Tests/CosmosTestFixture.cs
This is an improvement, but it seems like it can still leave the Lazy in a permanently busted state if all of the bounded retries fail. One way to avoid that is to move to an IAsyncDisposable pattern with an internal retry loop and a CTS. That way the initialization will keep retrying until it succeeds or is canceled. Example:

```csharp
internal sealed class LazyCosmosContainer : IAsyncDisposable
{
    private readonly CancellationTokenSource _cts = new();
    private Task<Container>? _initTask;

    // Starts initialization on first use and reuses the same task afterwards.
    public Task<Container> GetContainerAsync()
        => _initTask ??= InitializeWithRetryAsync(_cts.Token);

    private async Task<Container> InitializeWithRetryAsync(CancellationToken ct)
    {
        var delay = TimeSpan.FromSeconds(1);
        while (true)
        {
            ct.ThrowIfCancellationRequested();
            try { return await InitializeContainerAsync(); }
            catch (CosmosException ex) when (IsTransient(ex))
            {
                // Back off, doubling the delay up to a 30-second cap.
                await Task.Delay(delay, ct);
                delay = TimeSpan.FromSeconds(Math.Min(delay.TotalSeconds * 2, 30));
            }
        }
    }

    // Cancels any in-flight retry loop.
    public ValueTask DisposeAsync()
    {
        _cts.Cancel();
        _cts.Dispose();
        return default;
    }
}
```
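
The example above calls `IsTransient` without defining it. One plausible shape for such a check is sketched below, keyed on the HTTP status code that `CosmosException.StatusCode` exposes. The helper name `TransientStatus` and the particular set of codes treated as transient (408, 429, 503) are assumptions for illustration, not something the PR or the reviewer specifies.

```csharp
using System.Net;

// Hypothetical helper; which codes count as transient is an assumption.
public static class TransientStatus
{
    public static bool IsTransient(HttpStatusCode status) =>
        status == HttpStatusCode.RequestTimeout          // 408
        || status == (HttpStatusCode)429                 // Too Many Requests (throttling)
        || status == HttpStatusCode.ServiceUnavailable;  // 503
}
```

A caller in the retry loop could then write `when (TransientStatus.IsTransient(ex.StatusCode))`, so that non-transient failures such as 400 or 403 surface immediately instead of being retried forever.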
    
BTW, it might be good to add some jitter to the backoff too, so that if multiple instances start at the same time they don't all hammer Cosmos in sync.
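
As a rough sketch of what jitter could look like here, the helper below uses the "full jitter" approach: each delay is drawn uniformly at random between zero and the capped exponential ceiling. The class and parameter names are hypothetical, and full jitter is only one of several possible strategies; nothing in the PR confirms which one was used.

```csharp
using System;

// Hypothetical helper illustrating "full jitter" backoff.
public static class JitteredBackoff
{
    // Returns a random delay in [0, min(cap, baseDelay * 2^attempt)].
    public static TimeSpan NextDelay(int attempt, TimeSpan baseDelay, TimeSpan cap, Random rng)
    {
        double ceilingMs = Math.Min(
            cap.TotalMilliseconds,
            baseDelay.TotalMilliseconds * Math.Pow(2, attempt));
        return TimeSpan.FromMilliseconds(rng.NextDouble() * ceilingMs);
    }
}
```

Because each instance draws its own random delay, instances that fail at the same moment spread their retries across the window rather than retrying in lockstep.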
    
Looks good. We'll need to keep an eye on the CI for a while to make sure it's stable.
        
          
        
* support retries
* tests + registration options
* fix ordering ..
* HK + update packages
* fix paths
* Update dotnet/tests/CosmosDB.IntegrationTests/Microsoft.Extensions.AI.Agents.Runtime.Storage.CosmosDB.Tests/CosmosTestFixture.cs
* re create project and fix some pk usage
* fix all tests
* try workflow?
* wip 1
* fix definition
* try with cosmos_use_emulator env?
* try ignore SSL errors?
* other cert verifications
* hardcode to 8081?
* proper valuation of ENV
* logging
* ensure db exsists for CI
* bump
* cleanup
* fix usage
* nit comment
* try only release for stability?
* try skip some flaky tests
* merge fixes + rollback container
* reimplement with iasyncdisposable pattern
* remove example doc struct
Add configurable retry on Cosmos container creation.
Also includes #425.
Fixes #307
Fixes #305
Contribution Checklist