
v0.6.0 - SRE v2 Fleet Health


Knapp-Kevin released this on 17 Mar at 23:43

What's New

SRE v2 Fleet Health Dashboard

Full fleet health visibility with per-agent metrics, circuit breaker states, and trust stage progression.

New Endpoints

  • GET /sre/snapshot - Expanded with v2 fields: slis, auditEvents, fleet
  • GET /sre/events - Recent governance audit events; accepts a limit parameter
  • GET /sre/fleet - Per-agent health status with circuit breaker state
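The notes say only that GET /sre/events takes a limit parameter. The sketch below shows one way a handler might honor it. The default of 50, the cap of 500, the newest-first ordering, and the helper name select_recent_events are all assumptions, not the release's actual behavior.

```python
from typing import Any, Dict, List
from urllib.parse import parse_qs

DEFAULT_LIMIT = 50  # hypothetical default when ?limit is absent or invalid
MAX_LIMIT = 500     # hypothetical upper bound to keep responses small


def select_recent_events(query_string: str, events: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Hypothetical helper: return the newest `limit` events, newest first."""
    params = parse_qs(query_string)
    raw = params.get("limit", [str(DEFAULT_LIMIT)])[0]
    try:
        limit = int(raw)
    except ValueError:
        limit = DEFAULT_LIMIT
    limit = max(0, min(limit, MAX_LIMIT))  # clamp into [0, MAX_LIMIT]
    if limit == 0:
        return []  # guard: events[-0:] would return the whole list
    # Events are assumed to be stored oldest-first, so take from the end.
    return list(reversed(events[-limit:]))
```

For example, `select_recent_events("limit=3", events)` returns the three most recent events, and a non-numeric limit falls back to the default rather than raising an error.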

New Module

  • agent_metrics.py - AgentMetricsRegistry for per-agent operational metrics
    • Circuit breaker states: closed, half-open, open
    • Trust stage derivation: CBT → KBT → IBT based on success rate
    • Thread-safe with configurable thresholds
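To make the module description concrete, here is a minimal sketch of a thread-safe per-agent registry with a circuit breaker and a success-rate-based trust stage. It is not the code in agent_metrics.py. It uses the conventional breaker cycle (open after N consecutive failures, half-open after a cooldown, closed after a successful call), and the threshold values, the 10-call minimum, and the 0.80/0.95 trust cut-offs are hypothetical.

```python
import threading
import time
from dataclasses import dataclass


@dataclass
class CircuitBreakerConfig:
    # Hypothetical defaults; the release only says thresholds are configurable.
    failure_threshold: int = 5  # consecutive failures before the breaker opens
    cooldown_s: float = 30.0    # seconds open before moving to half-open


@dataclass
class _AgentStats:
    successes: int = 0
    failures: int = 0
    consecutive_failures: int = 0
    state: str = "closed"
    opened_at: float = 0.0


class AgentMetricsRegistry:
    """Sketch only: per-agent counters, a circuit breaker, and a trust stage."""

    def __init__(self, config=None, clock=time.monotonic):
        self._config = config or CircuitBreakerConfig()
        self._clock = clock  # injectable so tests can control time
        self._lock = threading.Lock()  # guards all per-agent state
        self._agents = {}

    def record(self, agent_id, success):
        with self._lock:
            s = self._agents.setdefault(agent_id, _AgentStats())
            if success:
                s.successes += 1
                s.consecutive_failures = 0
                s.state = "closed"  # any success, including a half-open probe, closes it
            else:
                s.failures += 1
                s.consecutive_failures += 1
                # A failed half-open probe, or too many failures in a row, opens it.
                if s.state == "half-open" or s.consecutive_failures >= self._config.failure_threshold:
                    s.state = "open"
                    s.opened_at = self._clock()

    def state(self, agent_id):
        with self._lock:
            s = self._agents.get(agent_id)
            if s is None:
                return "closed"
            if s.state == "open" and self._clock() - s.opened_at >= self._config.cooldown_s:
                s.state = "half-open"  # cooldown elapsed: allow a trial call
            return s.state

    def trust_stage(self, agent_id, min_calls=10):
        # Hypothetical rule: too few calls stays at CBT; otherwise use success rate.
        with self._lock:
            s = self._agents.get(agent_id, _AgentStats())
            total = s.successes + s.failures
            if total < min_calls:
                return "CBT"
            rate = s.successes / total
            if rate >= 0.95:
                return "IBT"
            if rate >= 0.80:
                return "KBT"
            return "CBT"
```

Injecting the clock keeps the cooldown logic testable without sleeping, and a single lock keeps the sketch simple at the cost of serializing updates across agents.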

New Types

  • CircuitBreakerConfig - Configurable circuit breaker thresholds
  • TrustDimension / TrustScoreV2 - Multi-dimensional trust scoring
  • AuditEvent - Governance event for SRE panel
  • FleetAgent - Agent health snapshot
  • SliMetric - Metric record for the standard 7-SLI dashboard
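The release names these types but not their fields. The dataclasses below are a hypothetical sketch of how they could be shaped for JSON output from /sre/fleet and /sre/snapshot. Every field name, the weighted-mean aggregation in TrustScoreV2, and the higher-is-better SLI comparison are assumptions.

```python
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass
class TrustDimension:
    name: str          # hypothetical, e.g. "reliability"
    score: float       # assumed range 0.0 to 1.0
    weight: float = 1.0


@dataclass
class TrustScoreV2:
    dimensions: List[TrustDimension] = field(default_factory=list)

    @property
    def overall(self) -> float:
        # Assumed aggregation: weighted mean of the dimension scores.
        total_weight = sum(d.weight for d in self.dimensions)
        if total_weight == 0:
            return 0.0
        return sum(d.score * d.weight for d in self.dimensions) / total_weight


@dataclass
class AuditEvent:
    timestamp: float
    agent_id: str
    decision: str      # hypothetical values such as "allow" or "deny"
    reason: str = ""


@dataclass
class FleetAgent:
    agent_id: str
    circuit_state: str  # "closed", "half-open" or "open"
    trust_stage: str    # "CBT", "KBT" or "IBT"
    success_rate: float


@dataclass
class SliMetric:
    name: str
    value: float
    target: float

    @property
    def meeting_target(self) -> bool:
        # Assumes higher-is-better SLIs; a latency SLI would invert this check.
        return self.value >= self.target
```

`dataclasses.asdict` turns any of these into a plain dict ready for `json.dumps`, which keeps the API layer free of hand-written serializers.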

Enhanced

  • FailSafeComplianceSLI.get_slis() - Returns 7 SliMetric objects
  • FailSafeAuditSink.get_recent_events() - Query recent audit events
  • DecisionCallback now includes a latency_ms parameter
  • integration.py wires AgentMetricsRegistry into the _on_decision callback
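A rough sketch of the wiring described above: the decision path times itself and passes latency_ms to the callback, which forwards it to a metrics registry. The callback signature `(agent_id, allowed, latency_ms)`, the `evaluate` and `make_on_decision` helpers, and the LatencyRecorder stand-in for AgentMetricsRegistry are hypothetical; the release only states that latency_ms was added and that _on_decision feeds the registry.

```python
import time
from typing import Callable, Dict, List

# Hypothetical signature: (agent_id, allowed, latency_ms).
DecisionCallback = Callable[[str, bool, float], None]


class LatencyRecorder:
    """Hypothetical stand-in for AgentMetricsRegistry: stores latencies per agent."""

    def __init__(self) -> None:
        self.latencies: Dict[str, List[float]] = {}
        self.allowed_counts: Dict[str, int] = {}

    def record(self, agent_id: str, allowed: bool, latency_ms: float) -> None:
        self.latencies.setdefault(agent_id, []).append(latency_ms)
        self.allowed_counts[agent_id] = self.allowed_counts.get(agent_id, 0) + int(allowed)


def make_on_decision(registry: LatencyRecorder) -> DecisionCallback:
    # Closure mirrors the idea of an _on_decision hook bound to a registry.
    def _on_decision(agent_id: str, allowed: bool, latency_ms: float) -> None:
        registry.record(agent_id, allowed, latency_ms)
    return _on_decision


def evaluate(agent_id: str, action: str, policy: Callable[[str], bool],
             callback: DecisionCallback) -> bool:
    start = time.perf_counter()
    allowed = policy(action)
    latency_ms = (time.perf_counter() - start) * 1000.0  # seconds -> milliseconds
    callback(agent_id, allowed, latency_ms)
    return allowed
```

Measuring with `time.perf_counter` rather than `time.time` avoids skew from wall-clock adjustments during short policy evaluations.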

Test Coverage

  • 357 tests passing
  • 46 new tests for SRE v2 components

🤖 Generated with Claude Code