Refactor: Improve MCPServer retry logic and transient error detection#329
Merged
Refactor: Improve MCPServer retry logic and transient error detection#329
Conversation
…ging This upgrade fixes an issue where cross-client audience scopes configured in the Dex provider were being ignored when clients requested specific OAuth scopes. The fix in mcp-oauth v0.2.56 ensures that mandatory scopes like `audience:server:client_id:dex-k8s-authenticator` are always merged into client-requested scopes, enabling proper SSO token forwarding for Kubernetes OIDC authentication. Related: giantswarm/mcp-oauth#203
- Add sync.WaitGroup to track in-flight restart goroutines for clean shutdown - Extract shouldAttemptRetry helper method for better readability - Move RetryInterval constant to file-level with other constants - Consolidate HTTP 5xx patterns and add coverage for 501, 507-509 status codes - Add context cancellation check before restart attempts - Expand test coverage for HTTP error detection including mixed case and wrapped errors
…-cross-client-scope-fix * origin/main: Fix: Upgrade mcp-oauth to v0.2.56 for cross-client audience scope merging (#328)
- Add unit tests for orchestrator retry methods (shouldAttemptRetry, attemptReconnectFailedServers, retryFailedMCPServers shutdown) - Add MaxConcurrentRetries (5) to prevent thundering herd on retry - Add HTTP 505/506 status codes to transient error detection - Document the grace period sleep in Restart method
…r handling - Replace time.Sleep in test with proper channel synchronization - Add compile-time interface verification for mock types - Simplify HTTP 5xx pattern matching using loop (DRY) - Add test for restart error handling gracefully - Extract RestartGracePeriod as named constant with documentation - Update comment to reflect HTTP 5xx range 500-511
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Changes
Orchestrator Retry Logic
retryFailedMCPServers()background task that periodically checks for failed serversshouldAttemptRetry()checks service state and backoff expirationMaxConcurrentRetries = 5prevents overwhelming upstream servicesTransient Error Detection
Code Quality Improvements (Code Review)
time.Sleepin test with proper channel synchronization (no race conditions)var _ Interface = (*Mock)(nil))RestartGracePeriodas named constant with documentationDocumentation
Restart()method explaining why it's neededRestartGracePeriodconstant with detailed rationaleTest plan
make test)muster test --parallel 50)shouldAttemptRetrywith various service states and backoff conditionsattemptReconnectFailedServerswith eligible/ineligible servicesMaxConcurrentRetriesenforcement