
Add service maturity model with CI-enforced tier requirements #122

@haasonsaas


Context

The Constellation has an implicit tiering (Tier 1 production, Tier 2 early-stage, Tier 3 skeleton) but no formal maturity model. There's no way to enforce that a Tier 1 service has tracing, alerts, a runbook, and load tests, or to prevent a Tier 3 service from being promoted to Tier 1 before it meets those requirements.

Adapted from Google's Production Readiness Review, Uber's production standards, and Mercari's open-source checklist.

Requirements

service.yaml specification

  • Define a service.yaml schema that lives in every repo root:

```yaml
name: identity
tier: 1  # 1=critical-path, 2=important, 3=internal/experimental
owner: platform-team
dependencies:
  - postgres
  - redis
  - nats-jetstream
slo:
  availability: 99.9
  latency_p99_ms: 200
readiness:
  has_tracing: true
  has_alerts: true
  has_runbook: true
  has_load_test: false
  has_integration_tests: true
  has_openapi_spec: false
```
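A minimal Go sketch of how the validator could model this schema, assuming the action is written in Go (matching template-go-service) and that a decoder honouring `yaml` struct tags, such as gopkg.in/yaml.v3, fills the structs. File loading is left out so the example stays standard-library only. `ServiceSpec` and `ValidateSchema` are hypothetical names, and the checks cover only tier-independent structure.

```go
package main

import (
	"fmt"
	"strings"
)

// SLO mirrors the `slo` section of service.yaml. Zero values mean "not set".
type SLO struct {
	Availability float64 `yaml:"availability"` // percent, e.g. 99.9
	LatencyP99Ms int     `yaml:"latency_p99_ms"`
}

// Readiness mirrors the `readiness` section of service.yaml.
type Readiness struct {
	HasTracing          bool `yaml:"has_tracing"`
	HasAlerts           bool `yaml:"has_alerts"`
	HasRunbook          bool `yaml:"has_runbook"`
	HasLoadTest         bool `yaml:"has_load_test"`
	HasIntegrationTests bool `yaml:"has_integration_tests"`
	HasOpenAPISpec      bool `yaml:"has_openapi_spec"`
}

// ServiceSpec is the whole service.yaml document (hypothetical type name).
type ServiceSpec struct {
	Name         string    `yaml:"name"`
	Tier         int       `yaml:"tier"`
	Owner        string    `yaml:"owner"`
	Dependencies []string  `yaml:"dependencies"`
	SLO          SLO       `yaml:"slo"`
	Readiness    Readiness `yaml:"readiness"`
}

// ValidateSchema checks structural rules that apply regardless of tier.
func ValidateSchema(s ServiceSpec) []string {
	var errs []string
	if strings.TrimSpace(s.Name) == "" {
		errs = append(errs, "name is required")
	}
	if s.Tier < 1 || s.Tier > 3 {
		errs = append(errs, fmt.Sprintf("tier must be 1, 2, or 3 (got %d)", s.Tier))
	}
	if strings.TrimSpace(s.Owner) == "" {
		errs = append(errs, "owner is required")
	}
	if s.SLO.Availability < 0 || s.SLO.Availability > 100 {
		errs = append(errs, "slo.availability must be a percentage between 0 and 100")
	}
	return errs
}

func main() {
	// The example service.yaml above, expressed as a struct literal.
	spec := ServiceSpec{
		Name: "identity", Tier: 1, Owner: "platform-team",
		Dependencies: []string{"postgres", "redis", "nats-jetstream"},
		SLO:          SLO{Availability: 99.9, LatencyP99Ms: 200},
		Readiness:    Readiness{HasTracing: true, HasAlerts: true, HasRunbook: true, HasIntegrationTests: true},
	}
	fmt.Println(len(ValidateSchema(spec))) // 0
}
```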

Tier requirements

  • Tier 1 (critical path): All readiness fields must be true. SLO must be defined. Breaking change detection must run in CI.
  • Tier 2 (important): has_tracing, has_alerts, and has_integration_tests must be true. SLO must be defined.
  • Tier 3 (internal): Minimal requirements: a health check endpoint, CI passing with unit tests, and a documented owner.
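The tier table could be encoded as data rather than as scattered conditionals. The sketch below is a hypothetical `MissingRequirements` helper; it assumes an SLO counts as "defined" when both targets are non-zero, and it leaves out breaking change detection, health checks, and CI status because service.yaml has no fields for them.

```go
package main

import (
	"fmt"
	"strings"
)

type SLO struct {
	Availability float64
	LatencyP99Ms int
}

// Defined is an assumption: an SLO counts as defined when both targets are set.
func (s SLO) Defined() bool { return s.Availability > 0 && s.LatencyP99Ms > 0 }

type Readiness struct {
	HasTracing, HasAlerts, HasRunbook, HasLoadTest, HasIntegrationTests, HasOpenAPISpec bool
}

// check pairs a service.yaml key with whether the service satisfies it.
type check struct {
	key string
	ok  bool
}

// MissingRequirements returns the service.yaml keys that block the given tier.
func MissingRequirements(tier int, owner string, slo SLO, r Readiness) []string {
	var required []check
	switch tier {
	case 1: // every readiness field must be true
		required = []check{
			{"readiness.has_tracing", r.HasTracing},
			{"readiness.has_alerts", r.HasAlerts},
			{"readiness.has_runbook", r.HasRunbook},
			{"readiness.has_load_test", r.HasLoadTest},
			{"readiness.has_integration_tests", r.HasIntegrationTests},
			{"readiness.has_openapi_spec", r.HasOpenAPISpec},
		}
	case 2: // only the core observability and testing fields
		required = []check{
			{"readiness.has_tracing", r.HasTracing},
			{"readiness.has_alerts", r.HasAlerts},
			{"readiness.has_integration_tests", r.HasIntegrationTests},
		}
	}
	if tier == 1 || tier == 2 {
		required = append(required, check{"slo", slo.Defined()})
	}
	// Every tier, including Tier 3, needs a documented owner.
	required = append(required, check{"owner", strings.TrimSpace(owner) != ""})

	var missing []string
	for _, c := range required {
		if !c.ok {
			missing = append(missing, c.key)
		}
	}
	return missing
}

func main() {
	r := Readiness{HasTracing: true, HasAlerts: true, HasRunbook: true, HasIntegrationTests: true}
	slo := SLO{Availability: 99.9, LatencyP99Ms: 200}
	fmt.Println(MissingRequirements(1, "platform-team", slo, r)) // [readiness.has_load_test readiness.has_openapi_spec]
	fmt.Println(MissingRequirements(2, "platform-team", slo, r)) // []
}
```

Run against the example service.yaml, the same readiness block passes Tier 2 but would block Tier 1 on the load test and OpenAPI spec.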

CI enforcement

  • Add a step to the shared composite GitHub Action that reads service.yaml and validates the requirements for the service's declared tier
  • Fail the build if a Tier 1 service is missing a required capability
  • Warn (don't fail) for Tier 2 services missing recommended capabilities
  • No enforcement for Tier 3

Rollout

  • Add service.yaml to template-go-service
  • Create initial service.yaml for all existing repos (start with accurate current state, not aspirational)
  • Quarterly review: walk through all services, confirm tiers, identify Tier 3 services that have become load-bearing
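Part of the quarterly review could be automated from the `dependencies` lists. A rough sketch, assuming every repo's service.yaml has been collected into one slice and that services reference each other by `name`: it flags Tier 3 services that some Tier 1 or Tier 2 service depends on. `LoadBearingTier3` and the service names in `main` are hypothetical, and undeclared dependencies will not be caught.

```go
package main

import (
	"fmt"
	"sort"
)

// Service holds the service.yaml fields the review needs.
type Service struct {
	Name         string
	Tier         int
	Dependencies []string
}

// LoadBearingTier3 maps each Tier 3 service that a Tier 1 or Tier 2 service
// depends on to the sorted names of those dependents.
func LoadBearingTier3(services []Service) map[string][]string {
	tier := make(map[string]int, len(services))
	for _, s := range services {
		tier[s.Name] = s.Tier
	}
	flagged := make(map[string][]string)
	for _, s := range services {
		if s.Tier > 2 {
			continue // Tier 3 dependents do not make a service load-bearing
		}
		for _, dep := range s.Dependencies {
			// Infrastructure such as postgres has no service.yaml, so its tier is 0.
			if tier[dep] == 3 {
				flagged[dep] = append(flagged[dep], s.Name)
			}
		}
	}
	for _, dependents := range flagged {
		sort.Strings(dependents) // sorts in place; the map holds the same backing array
	}
	return flagged
}

func main() {
	services := []Service{
		{Name: "identity", Tier: 1, Dependencies: []string{"postgres", "redis", "feature-flags"}},
		{Name: "billing", Tier: 2, Dependencies: []string{"identity", "feature-flags"}},
		{Name: "feature-flags", Tier: 3},
		{Name: "scratchpad", Tier: 3},
	}
	fmt.Println(LoadBearingTier3(services)) // map[feature-flags:[billing identity]]
}
```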

Why this matters

Without a formal model, Tier 3 skeleton services quietly become production dependencies without acquiring the operational maturity (tracing, alerts, runbooks) that production demands. This is how incidents happen.
