Platform-agnostic model deployment orchestration library for the Crucible
ecosystem. Provides progressive rollout strategies, format conversion, and
health monitoring for ML model serving infrastructure.
Core Features:
Target Adapters
- vLLM adapter for OpenAI-compatible inference servers
- Ollama adapter for local model serving
- TGI adapter for Hugging Face Text Generation Inference
- Hugging Face Inference Endpoints adapter
- Generic Kubernetes deployment adapter
- Noop adapter for testing and development
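
As a rough illustration of how adapters like these could share one contract, here is a minimal sketch of a behaviour with a no-op implementation. The module names (`TargetSketch`, `TargetSketch.Noop`) and the callbacks `deploy/1`, `health/1`, and `terminate/1` are hypothetical and are not taken from the library's actual API.

```elixir
defmodule TargetSketch do
  @moduledoc "Hypothetical adapter contract; callback names are assumptions."

  @callback deploy(config :: map()) :: {:ok, String.t()} | {:error, term()}
  @callback health(deployment_id :: String.t()) :: :healthy | :unhealthy
  @callback terminate(deployment_id :: String.t()) :: :ok | {:error, term()}
end

defmodule TargetSketch.Noop do
  @moduledoc "Does nothing and always succeeds; handy for tests and local development."
  @behaviour TargetSketch

  # Returns a fake deployment id derived from the (assumed) :name key.
  @impl true
  def deploy(config), do: {:ok, "noop-" <> Map.get(config, :name, "model")}

  # A no-op target is always reported as healthy.
  @impl true
  def health(_deployment_id), do: :healthy

  @impl true
  def terminate(_deployment_id), do: :ok
end
```

Defining the contract as a behaviour is also what makes Mox-style mocking possible, since Mox generates mocks from behaviours.
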
Rollout Strategies
- Replace: Direct replacement that terminates the old deployment
- Blue/Green: Zero-downtime traffic switching between environments
- Canary: Gradual traffic shift with configurable percentage steps
- A/B Test: 50/50 split with evaluation period and metric comparison
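
To make the canary strategy concrete, the following sketch shows one way a gradual traffic shift over percentage steps could be computed. The module `CanarySketch`, its functions, and the default steps `[5, 25, 50, 100]` are assumptions for illustration only; the library's real step configuration may differ.

```elixir
defmodule CanarySketch do
  @moduledoc "Hypothetical illustration of canary step progression; not the library's API."

  # Assumed default steps: share of traffic (in percent) sent to the new version.
  @default_steps [5, 25, 50, 100]

  @doc "Returns {:ok, next_percent} for the new version, or :done when no step remains."
  def next_step(current, steps \\ @default_steps) do
    case Enum.find(steps, fn step -> step > current end) do
      nil -> :done
      step -> {:ok, step}
    end
  end

  @doc "Splits traffic between the new and old deployments for a given percentage."
  def split(percent) when percent in 0..100 do
    %{new: percent, old: 100 - percent}
  end
end
```

Under these assumptions, `CanarySketch.next_step(5)` yields `{:ok, 25}`, and `CanarySketch.split(25)` yields `%{new: 25, old: 75}`. A rollback at any step would amount to returning to `split(0)`.
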
Format Converters
- GGUF converter for llama.cpp and Ollama compatibility
- ONNX converter for cross-platform runtime support
- TensorRT converter for NVIDIA-optimized inference
Infrastructure
- GenServer-based state machine for deployment lifecycle management
- Registry for in-memory tracking of active deployments
- DynamicSupervisor for supervised deployment processes
- Task.Supervisor for background rollout step execution
- Health monitor with configurable thresholds and auto-rollback signals
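
A GenServer-based lifecycle state machine could look roughly like the sketch below. The state names (`:pending`, `:deployed`, `:promoted`, `:rolled_back`) and the module `DeploymentSketch` are assumptions; only the deploy, promote, and rollback events are named in this description.

```elixir
defmodule DeploymentSketch do
  @moduledoc "Hypothetical lifecycle state machine; states and transitions are assumptions."
  use GenServer

  # Allowed transitions: {current_state, event} => next_state
  @transitions %{
    {:pending, :deploy} => :deployed,
    {:deployed, :promote} => :promoted,
    {:deployed, :rollback} => :rolled_back,
    {:promoted, :rollback} => :rolled_back
  }

  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, :pending, opts)
  def state(pid), do: GenServer.call(pid, :state)
  def event(pid, event), do: GenServer.call(pid, {:event, event})

  @impl true
  def init(initial), do: {:ok, initial}

  @impl true
  def handle_call(:state, _from, state), do: {:reply, state, state}

  def handle_call({:event, event}, _from, state) do
    case Map.fetch(@transitions, {state, event}) do
      # Valid transition: reply with the new state and store it.
      {:ok, next} -> {:reply, {:ok, next}, next}
      # Invalid transition: reply with an error and keep the current state.
      :error -> {:reply, {:error, {:invalid_transition, state, event}}, state}
    end
  end
end
```

Keeping the transition table as data makes invalid moves, such as promoting a deployment that was never deployed, fail explicitly without crashing the process. In a supervised setup, each such process would typically be started under a DynamicSupervisor and looked up through a Registry.
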
Observability
- Telemetry events for deploy, promote, rollback, and health checks
- Configurable health check intervals and rate limiting
- Metric collection from target backends
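
The link between collected metrics, configurable thresholds, and an auto-rollback signal can be sketched as a pure function. The module `HealthSketch`, the metric names `:error_rate` and `:p99_latency_ms`, and the limits shown are hypothetical; the library's actual metrics and defaults are not specified here.

```elixir
defmodule HealthSketch do
  @moduledoc "Hypothetical threshold check; metric names and limits are assumptions."

  # Assumed upper limits per metric.
  @default_thresholds %{error_rate: 0.05, p99_latency_ms: 2_000}

  @doc "Returns :ok, or {:rollback, violations} when any metric exceeds its limit."
  def evaluate(metrics, thresholds \\ @default_thresholds) do
    violations =
      for {name, limit} <- thresholds,
          # Metrics missing from the sample (nil) are skipped, not treated as failures.
          value = Map.get(metrics, name),
          is_number(value) and value > limit do
        {name, value, limit}
      end

    case violations do
      [] -> :ok
      list -> {:rollback, Enum.sort(list)}
    end
  end
end
```

Each violation is reported as `{metric, observed_value, limit}`, so a caller can log the reason before acting on the rollback signal.
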
Integration
- Optional Crucible Framework stage integration for pipeline workflows
- Deploy, Promote, and Rollback stages when crucible_framework is present
- Graceful degradation when the framework is unavailable
Testing
- Comprehensive test suite with Mox-based target mocking
- Strategy tests covering progression, rollback, and edge cases
- Target adapter validation for all backends
- Converter tests for format transformations
Dependencies: crucible_framework 0.4.0, crucible_ir 0.2.0,
crucible_model_registry 0.1.0, req 0.5, finch 0.18, telemetry 1.2