-
Notifications
You must be signed in to change notification settings - Fork 53
Fix GrpcChannel handle leak in AzureManaged backend #629
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This pull request fixes a GrpcChannel handle leak in the AzureManaged backend by implementing channel caching and proper disposal.
Changes:
- Modified
ConfigureGrpcChannelclass to cache channels per configuration usingConcurrentDictionary<string, Lazy<GrpcChannel>> - Implemented
IAsyncDisposableinConfigureGrpcChannelto properly dispose channels when the ServiceProvider is disposed - Added comprehensive tests to verify channel reuse, isolation for different configurations, and proper disposal
- Changed existing test methods to
async Taskand useawait usingfor proper ServiceProvider disposal
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| src/Worker/AzureManaged/DurableTaskSchedulerWorkerExtensions.cs | Added channel caching with ConcurrentDictionary and IAsyncDisposable implementation for proper cleanup |
| src/Client/AzureManaged/DurableTaskSchedulerClientExtensions.cs | Added channel caching with ConcurrentDictionary and IAsyncDisposable implementation for proper cleanup |
| test/Worker/AzureManaged.Tests/DurableTaskSchedulerWorkerExtensionsTests.cs | Added tests for channel reuse, isolation, and disposal; updated existing tests to use async/await |
| test/Client/AzureManaged.Tests/DurableTaskSchedulerClientExtensionsTests.cs | Added tests for channel reuse, isolation, and disposal; updated existing tests to use async/await |
test/Client/AzureManaged.Tests/DurableTaskSchedulerClientExtensionsTests.cs
Outdated
Show resolved
Hide resolved
test/Client/AzureManaged.Tests/DurableTaskSchedulerClientExtensionsTests.cs
Show resolved
Hide resolved
|
Passing over one more concern from an agent:
I don't think we have realistic use cases where we would want to re-establish a connection because of changing options in runtime. But let's mention this as a known limitation in the PR description, so that it is discoverable if this ever becomes a problem. |
AnatoliB
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm ready to approve as soon as the catch block concerns are addressed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
test/Worker/AzureManaged.Tests/DurableTaskSchedulerWorkerExtensionsTests.cs
Show resolved
Hide resolved
yeah that's a good concern. Also there is a review ( in old pr ) that if the identity changed but in the same type, the cache might still return the old one as we only cached based on type, and hard to get a unique identifier here for identity. But I feel this case is rare too? I added in the PR description ,let me now if that looks good to you. |
You are talking about the identity used to auth to the DTS service, right? I haven't thought about that. In Functions, it would be quite natural to require an app restart to change to a different identity. But I would like @torosent to confirm this is ok for non-Functions cases. |
AnatoliB
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nytian Please confirm the identity question with Tomer, the rest LGTM.
In DTS, it's not required to restart the workers or DTS itself. So there is a chance that the cache would be stale when the permissions changed. I think it's an edge case that we can address in a different PR |
Same as PR #625
Known limitation: No cache invalidation on options change
gRPC channels are cached per configuration (endpoint, task hub, credential type, etc.). If options are changed after the first channel is created (e.g. via IOptionsMonitor.OnChange or options reload), the cached channel is not updated and will keep using the original configuration until the process restarts or the service provider is disposed.
This is considered acceptable because:
Options are usually fixed at startup and not changed at runtime.
We don’t have a concrete scenario where we need to tear down and recreate connections when options change at runtime.
We’re documenting this so it’s discoverable if runtime option changes and channel refresh ever become a requirement. Implementing cache invalidation on options change would require subscribing to IOptionsMonitor.OnChange and invalidating/refreshing the affected channel(s); that is out of scope for this PR.