Complete vMCP circuit breaker configuration and testing#3615
Complete vMCP circuit breaker configuration and testing#3615
Conversation
Wire circuit breaker configuration from VirtualMCPServer CRD to health
monitoring system and add comprehensive E2E test coverage.
Circuit breaker implementation:
- Wire CircuitBreakerConfig from VirtualMCPServer spec to health monitor
- Add HealthCheckTimeout field to FailureHandlingConfig (default: 10s)
- Configure failure threshold, timeout, and enabled flag from CRD
- Add info logging when circuit breaker is enabled
Configuration changes:
- pkg/vmcp/config/config.go: Add HealthCheckTimeout field
- cmd/vmcp/app/commands.go: Wire both circuit breaker config and timeout
to health monitor, converting from config types to health types
Related-to: #3036
There was a problem hiding this comment.
Pull request overview
This PR completes the circuit breaker implementation for the Virtual MCP Server by wiring configuration from the VirtualMCPServer CRD to the health monitoring system and adding comprehensive end-to-end tests.
Changes:
- Added HealthCheckTimeout field to FailureHandlingConfig for configuring health check timeouts
- Wired circuit breaker configuration (enabled flag, failure threshold, timeout) from CRD to health monitor
- Added comprehensive E2E test suite for circuit breaker lifecycle (failure detection, circuit opening, recovery)
- Fixed condition type references to use constants instead of string literals in helper functions
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| pkg/vmcp/config/config.go | Added HealthCheckTimeout field to FailureHandlingConfig with proper documentation and kubebuilder annotations |
| cmd/vmcp/app/commands.go | Wired circuit breaker configuration from config types to health monitor types with validation and logging |
| deploy/charts/operator-crds/templates/toolhive.stacklok.dev_virtualmcpservers.yaml | Updated CRD schema to include healthCheckTimeout field with default value and validation pattern |
| deploy/charts/operator-crds/files/crds/toolhive.stacklok.dev_virtualmcpservers.yaml | Updated CRD file with healthCheckTimeout field (mirrors template changes) |
| test/e2e/thv-operator/virtualmcp/virtualmcp_circuit_breaker_test.go | Added comprehensive E2E test for circuit breaker lifecycle including failure detection, circuit opening, and recovery scenarios |
| test/e2e/thv-operator/virtualmcp/helpers.go | Fixed to use ConditionTypeVirtualMCPServerReady constant instead of string literal |
| test/e2e/thv-operator/virtualmcp/virtualmcp_status_reporting_test.go | Fixed to use ConditionTypeVirtualMCPServerReady constant instead of string literal |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
test/e2e/thv-operator/virtualmcp/virtualmcp_circuit_breaker_test.go
Outdated
Show resolved
Hide resolved
test/e2e/thv-operator/virtualmcp/virtualmcp_circuit_breaker_test.go
Outdated
Show resolved
Hide resolved
test/e2e/thv-operator/virtualmcp/virtualmcp_circuit_breaker_test.go
Outdated
Show resolved
Hide resolved
test/e2e/thv-operator/virtualmcp/virtualmcp_circuit_breaker_test.go
Outdated
Show resolved
Hide resolved
test/e2e/thv-operator/virtualmcp/virtualmcp_circuit_breaker_test.go
Outdated
Show resolved
Hide resolved
test/e2e/thv-operator/virtualmcp/virtualmcp_circuit_breaker_test.go
Outdated
Show resolved
Hide resolved
test/e2e/thv-operator/virtualmcp/virtualmcp_circuit_breaker_test.go
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated no new comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #3615 +/- ##
==========================================
- Coverage 65.68% 65.61% -0.08%
==========================================
Files 408 410 +2
Lines 40416 40655 +239
==========================================
+ Hits 26549 26677 +128
- Misses 11805 11899 +94
- Partials 2062 2079 +17 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
test/e2e/thv-operator/virtualmcp/virtualmcp_circuit_breaker_test.go
Outdated
Show resolved
Hide resolved
437603a to
6955d9c
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 8 out of 8 changed files in this pull request and generated no new comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
6955d9c to
925b6b6
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 10 out of 10 changed files in this pull request and generated 5 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
test/e2e/thv-operator/virtualmcp/virtualmcp_circuit_breaker_test.go
Outdated
Show resolved
Hide resolved
deploy/charts/operator-crds/templates/toolhive.stacklok.dev_virtualmcpservers.yaml
Show resolved
Hide resolved
deploy/charts/operator-crds/files/crds/toolhive.stacklok.dev_virtualmcpservers.yaml
Show resolved
Hide resolved
925b6b6 to
9c01491
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 11 out of 11 changed files in this pull request and generated no new comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
9c01491 to
0a0b2a8
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 11 out of 11 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
deploy/charts/operator-crds/templates/toolhive.stacklok.dev_virtualmcpservers.yaml
Show resolved
Hide resolved
deploy/charts/operator-crds/files/crds/toolhive.stacklok.dev_virtualmcpservers.yaml
Show resolved
Hide resolved
0a0b2a8 to
160dd4d
Compare
Fix circuit breaker E2E test recovery and improve test reliability Improve the circuit breaker recovery test to handle stuck pods in ImagePullBackOff state. When an MCPServer image is changed from a non-existent image back to a valid image, Kubernetes doesn't automatically recreate the StatefulSet pod - it stays in ImagePullBackOff. The test now deletes stuck pending pods to force recreation with the correct image. Also simplify the failure simulation approach: - Before: Scaled deployments/statefulsets to 0 replicas - After: Change image to non-existent (matches status reporting test pattern) Changes: - Add pod deletion step in recovery test to handle ImagePullBackOff - Switch from scaling approach to image change approach (cleaner) - Remove unused imports (appsv1, labels, ptr) - Regenerate CRD docs with healthCheckTimeout field All 5 circuit breaker E2E tests now pass reliably. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> Add validation for health check timeout vs interval constraint Validate that HealthCheckTimeout is less than HealthCheckInterval to prevent health checks from queuing up. This enforces the constraint documented in config.go but was not previously validated at runtime. The validation returns a clear error message when the constraint is violated: health check timeout (Xs) must be less than check interval (Ys) to prevent checks from queuing up Default values are safe (timeout: 10s < interval: 30s), but this prevents invalid user configurations. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> Use cbFailureThreshold constant in circuit breaker error message Replace hardcoded threshold value "3" with the cbFailureThreshold constant for consistency and maintainability. This ensures the error message stays accurate if the test configuration changes. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> Fix circuit breaker state enum mismatch between Go and CRD Change CircuitHalfOpen constant from "half_open" (underscore) to "half-open" (hyphen) to match the CRD enum validation in VirtualMCPServer status. Without this fix, when the circuit breaker transitions to half-open state, the status update would be rejected by the Kubernetes API server because "half_open" doesn't match the CRD enum which requires "half-open". The CRD enum is: - closed - open - half-open All code uses the CircuitHalfOpen constant, so this change is safe and only affects the string value used in status reporting. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> Add circuit breaker configuration validation Add runtime and CRD validation for circuit breaker configuration to prevent invalid values that could cause unexpected behavior. Runtime validation (cmd/vmcp/app/commands.go): - FailureThreshold must be >= 1 (can't open circuit with 0 failures) - Timeout must be >= 1s when set (prevents thrashing with rapid recovery attempts) - Validation only runs when circuit breaker is enabled CRD validation (pkg/vmcp/config/config.go): - Add +kubebuilder:validation:Minimum=1 for FailureThreshold - Update documentation to clarify constraints - Kubernetes API will reject invalid configurations at creation time Benefits: - Prevents silent failures from misconfiguration - Clear error messages guide users to correct values - Defense-in-depth: validation at both CRD and runtime levels Example error messages: - "circuit breaker failure threshold must be >= 1, got 0" - "circuit breaker timeout must be >= 1s to prevent thrashing, got 500ms" Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
160dd4d to
a0f1282
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 11 out of 11 changed files in this pull request and generated no new comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Wire circuit breaker configuration from VirtualMCPServer CRD to health monitoring system and add comprehensive E2E test coverage.
Circuit breaker implementation:
Configuration changes:
Related-to: #3036