Closed
Epic: Implement Proper Reconciliation for Kubernetes and Filesystem Modes
Problem Statement
Currently, muster uses an event-driven architecture without proper reconciliation loops. This means:
- No automatic change detection: Changes to CRDs/YAML files are not automatically detected and applied
- No drift correction: System state can diverge from desired state without detection
- Manual synchronization: Users must manually trigger operations to apply configuration changes
- Inconsistent behavior: Kubernetes and filesystem modes behave differently regarding state synchronization
Current State Analysis
Based on code analysis, muster currently:
- ✅ Has unified client interface (MusterClient) supporting both K8s and filesystem modes
- ✅ Uses event-driven updates for service state changes
- ❌ Missing: Kubernetes controllers/watchers for CRD changes
- ❌ Missing: Filesystem watchers for YAML file changes
- ❌ Missing: Reconciliation loops to ensure desired state
Desired End State
Implement a unified reconciliation system that:
- Automatically detects changes in both Kubernetes and filesystem modes
- Reconciles desired vs actual state for all resource types
- Maintains consistency across different deployment modes
- Provides feedback on reconciliation status and errors
- Integrates seamlessly with existing event-driven architecture
Success Criteria
- CRD changes in Kubernetes are automatically detected and reconciled
- YAML file changes in filesystem mode are automatically detected and reconciled
- System automatically corrects drift between desired and actual state
- Reconciliation status is visible through API and CLI
- Performance impact is minimal (efficient watching/polling)
- Backward compatibility is maintained
Architecture Overview
Kubernetes Mode
K8s API Server → Controller/Informer → Reconciler → Internal State → Services
↑ ↓
└─────────────── Status Updates ←──────────────── Events ←─────────────┘
Filesystem Mode
File Watcher → Change Detector → Reconciler → Internal State → Services
↑ ↓
└─────────── File Updates ←───────────── Events ←─────────────┘
Epic Breakdown
Phase 1: Foundation (4-6 weeks)
- Reconciliation Framework: Core interfaces and patterns
- Change Detection: File watchers and Kubernetes informers
- Reconciler Registry: Plugin system for different resource types
Phase 2: Resource Reconcilers (6-8 weeks)
- MCPServer Reconciler: Handle MCPServer lifecycle
- ServiceClass Reconciler: Manage ServiceClass definitions
- Workflow Reconciler: Process Workflow definitions
- Service Instance Reconciler: Coordinate running services
Phase 3: Integration & Polish (3-4 weeks)
- Event Integration: Connect with existing event system
- Performance Optimization: Efficient batching and caching
- Monitoring & Observability: Metrics and logging
- Documentation: Architecture and operational guides
Key Components to Implement
1. Reconciliation Framework
- ReconcileManager - Central coordination
- Reconciler interface - Standard reconciliation pattern
- ChangeDetector interface - Unified change detection
- ReconcileLoop - Generic reconciliation loop implementation
2. Kubernetes Controllers
- MCPServerController - Watch and reconcile MCPServer CRDs
- ServiceClassController - Watch and reconcile ServiceClass CRDs
- WorkflowController - Watch and reconcile Workflow CRDs
- Integration with controller-runtime framework
3. Filesystem Watchers
- FileSystemWatcher - Monitor YAML file changes
- DirectoryScanner - Periodic validation scans
- FileChangeProcessor - Handle file system events
4. State Synchronization
- StateComparator - Compare desired vs actual state
- DriftDetector - Identify configuration drift
- ActionPlanner - Determine reconciliation actions
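The comparison and planning steps can be sketched as a pure function over two maps. This is a minimal illustration that assumes each resource is identified by name and summarised by a spec hash. The Action type and the PlanActions function are hypothetical names, not part of the existing codebase.

```go
package main

import "sort"

// ActionKind is the kind of step needed to converge on the desired state.
type ActionKind string

const (
	ActionCreate ActionKind = "create"
	ActionUpdate ActionKind = "update"
	ActionDelete ActionKind = "delete"
)

// Action is one planned reconciliation step for a named resource.
type Action struct {
	Kind ActionKind
	Name string
}

// PlanActions compares desired and actual state (resource name -> spec hash)
// and returns the actions needed to remove the drift, sorted by name.
func PlanActions(desired, actual map[string]string) []Action {
	var actions []Action
	for name, want := range desired {
		have, exists := actual[name]
		switch {
		case !exists:
			actions = append(actions, Action{Kind: ActionCreate, Name: name})
		case have != want:
			actions = append(actions, Action{Kind: ActionUpdate, Name: name})
		}
	}
	for name := range actual {
		if _, wanted := desired[name]; !wanted {
			actions = append(actions, Action{Kind: ActionDelete, Name: name})
		}
	}
	sort.Slice(actions, func(i, j int) bool { return actions[i].Name < actions[j].Name })
	return actions
}
```

Because the function has no side effects, calling it twice on the same inputs yields the same plan, which helps keep reconciliation idempotent.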
Technical Considerations
Performance
- Use Kubernetes informers with proper caching
- Implement file watching with debouncing
- Batch operations where possible
- Configurable reconciliation intervals
Error Handling
- Exponential backoff for failed reconciliations
- Dead letter queue for persistently failing items
- Detailed error reporting and logging
- Circuit breaker patterns for external dependencies
Backward Compatibility
- Maintain existing API interfaces
- Support both reconciled and manual operation modes
- Graceful degradation when reconciliation is disabled
- Migration path for existing deployments
Dependencies
- External:
  - controller-runtime (Kubernetes controllers)
  - fsnotify (filesystem watching)
  - golang.org/x/sync (coordination primitives)
- Internal:
  - Existing MusterClient interface
  - Current event system integration
  - Service orchestrator patterns
Risks & Mitigation
| Risk | Impact | Mitigation |
|---|---|---|
| Performance degradation | High | Benchmarking, caching, configurable intervals |
| Race conditions | High | Proper locking, event ordering, idempotent operations |
| Complex state management | Medium | Clear interfaces, extensive testing, state machines |
| Backward compatibility | Medium | Feature flags, graceful degradation, migration guides |
Acceptance Criteria
Functional
- Changes to CRDs are automatically applied to running services
- Changes to YAML files are automatically detected and processed
- System recovers automatically from configuration drift
- Reconciliation can be disabled/enabled per resource type
- Reconciliation status is exposed via API
Non-Functional
- Reconciliation adds <100ms latency to normal operations
- Memory usage increases by <50MB for typical workloads
- CPU usage increases by <5% for typical workloads
- 99.9% uptime maintained during reconciliation operations
- Zero data loss during reconciliation failures
Testing
- Unit tests for all reconciliation components
Epic Owner: @teemow
Estimated Effort: 13-18 weeks
Priority: High
Labels: epic, enhancement, kubernetes, reconciliation, architecture