Agent auto-registration for Kubernetes auto-scaling #234

@JoshuaAFerguson

Problem

The current agent approval workflow requires manual administrator approval for each new agent instance. This creates operational friction and blocks automatic scaling scenarios:

Current Flow:

  1. New agent pod starts (e.g., during K8s HPA scale-up)
  2. Agent sends registration request to Control Plane
  3. Agent status: pending - waits for admin approval
  4. Administrator must manually approve via UI/API
  5. Agent connects and becomes active

Impact on Auto-Scaling:

  • K8s HorizontalPodAutoscaler (HPA) can scale agent pods based on metrics
  • But new pods sit idle waiting for approval
  • No capacity is added until an admin approves, which defeats the purpose of auto-scaling
  • During scale-down, pods may be terminated before they were ever approved
  • This leaves "ghost" agents in the database with pending status

Proposed Solutions

Option 1: Namespace-Based Auto-Approval

  • Agents from trusted namespaces (e.g., streamspace) auto-approve
  • Configuration: AGENT_AUTO_APPROVE_NAMESPACES=streamspace,prod
  • Still require manual approval for agents from other namespaces
  • Pros: Simple, preserves security for external agents
  • Cons: Namespace can be spoofed if K8s RBAC is misconfigured
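
A minimal sketch of the namespace allow-list check, assuming the Control Plane reads AGENT_AUTO_APPROVE_NAMESPACES as a comma-separated string. The function names parseTrustedNamespaces and isTrustedNamespace are hypothetical and do not come from the existing codebase. How the agent's namespace is established (self-reported or verified) is outside this sketch.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// parseTrustedNamespaces splits a comma-separated list such as
// "streamspace,prod" into a set, trimming spaces and skipping blanks.
// (Hypothetical helper, not part of the existing code.)
func parseTrustedNamespaces(raw string) map[string]struct{} {
	trusted := make(map[string]struct{})
	for _, ns := range strings.Split(raw, ",") {
		ns = strings.TrimSpace(ns)
		if ns != "" {
			trusted[ns] = struct{}{}
		}
	}
	return trusted
}

// isTrustedNamespace reports whether ns is on the allow-list.
// An empty allow-list trusts nothing, so manual approval stays the default.
func isTrustedNamespace(trusted map[string]struct{}, ns string) bool {
	_, ok := trusted[ns]
	return ok
}

func main() {
	trusted := parseTrustedNamespaces(os.Getenv("AGENT_AUTO_APPROVE_NAMESPACES"))
	fmt.Println("streamspace trusted:", isTrustedNamespace(trusted, "streamspace"))
}
```

Leaving the variable unset yields an empty set, so no agent is auto-approved unless an operator opts in.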

Option 2: Shared Secret / API Key

  • Agents authenticate with pre-shared secret or API key
  • Secret stored in Kubernetes Secret, mounted to agent pods
  • Control Plane validates secret on registration
  • Pros: Stronger authentication, industry standard
  • Cons: Secret rotation required, more complex setup
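
A rough sketch of how the Control Plane might compare a presented API key with the configured one, assuming the expected key is loaded from the mounted Kubernetes Secret elsewhere. validAPIKey is a hypothetical name. The comparison uses crypto/subtle so that timing does not reveal how many leading bytes matched.

```go
package main

import (
	"crypto/sha256"
	"crypto/subtle"
	"fmt"
)

// validAPIKey compares the presented key with the expected key in constant
// time. Both sides are hashed first so the compared inputs always have the
// same length. (Hypothetical helper, not part of the existing code.)
func validAPIKey(presented, expected string) bool {
	if expected == "" {
		return false // no key configured: never auto-approve
	}
	p := sha256.Sum256([]byte(presented))
	e := sha256.Sum256([]byte(expected))
	return subtle.ConstantTimeCompare(p[:], e[:]) == 1
}

func main() {
	fmt.Println(validAPIKey("example-key", "example-key"))
	fmt.Println(validAPIKey("wrong-key", "example-key"))
}
```

Rejecting an empty expected key means a missing or unmounted Secret falls back to manual approval rather than accepting any agent that sends an empty key.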

Option 3: Kubernetes Service Account Token

  • Agents use K8s ServiceAccount tokens for authentication
  • Control Plane validates token against K8s API server
  • Leverages existing K8s RBAC and identity
  • Pros: No additional secrets, native K8s integration, auto-rotation
  • Cons: Requires Control Plane → K8s API access (couples to platform)

Option 4: Certificate-Based Mutual TLS (mTLS)

  • Agents use client certificates signed by trusted CA
  • Control Plane validates certificate on WebSocket upgrade
  • Pros: Strongest security, certificate expiry/revocation
  • Cons: Most complex, requires PKI infrastructure
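
A sketch of the server-side TLS settings this option implies, assuming the Control Plane terminates TLS itself and is given the trusted CA bundle as PEM bytes. mtlsConfig is a hypothetical name. With this configuration the client certificate is verified during the TLS handshake, so an agent without a valid certificate never reaches the WebSocket upgrade handler.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"errors"
	"fmt"
)

// mtlsConfig builds a server TLS configuration that requires every client
// to present a certificate signed by one of the CAs in caPEM.
// (Hypothetical helper, not part of the existing code.)
func mtlsConfig(caPEM []byte) (*tls.Config, error) {
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(caPEM) {
		return nil, errors.New("no CA certificates found in PEM input")
	}
	return &tls.Config{
		ClientCAs:  pool,
		ClientAuth: tls.RequireAndVerifyClientCert,
		MinVersion: tls.VersionTLS12,
	}, nil
}

func main() {
	// Invalid input is rejected rather than producing an empty trust pool.
	_, err := mtlsConfig([]byte("not a certificate"))
	fmt.Println(err)
}
```

The returned configuration would be assigned to the TLSConfig field of the http.Server that serves agent connections. Revocation checking is not covered by this sketch.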

Recommendation

Phase 1 (v2.1.0): Implement Option 1 (Namespace-Based) + Option 2 (API Key)

  • Quick win for K8s auto-scaling with namespace auto-approval
  • API key support for Docker agents (v2.1) and external platforms
  • Configuration:
    agents:
      autoApprove:
        enabled: true
        namespaces: ["streamspace"]  # Auto-approve from these namespaces
        requireApiKey: true           # Also allow API key auth

Phase 2 (v2.2.0): Add Option 3 (ServiceAccount Token) for enhanced K8s security

  • Validate agent ServiceAccount tokens via K8s TokenReview API
  • Eliminates manual secret management

Acceptance Criteria

  • Agents in configured namespaces auto-approve on registration
  • Agents with valid API key auto-approve
  • Manual approval still available for untrusted agents
  • Configuration via Helm chart values
  • Documentation updated with auto-scaling guide
  • Backward compatible: defaults to manual approval (existing behavior)
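
The criteria above can be sketched as a single decision function. This is an illustration under assumptions: the type and function names are hypothetical, the API key is assumed to have been validated before the call, and checking the API key before the namespace is an arbitrary ordering.

```go
package main

import "fmt"

// autoApproveConfig mirrors the proposed agents.autoApprove settings.
// (Hypothetical type; field names are illustrative.)
type autoApproveConfig struct {
	Enabled    bool
	Namespaces []string
}

// registration carries the facts known about a registering agent.
type registration struct {
	Namespace   string
	APIKeyValid bool // result of an earlier API key check
}

// decideApproval returns whether the agent is auto-approved and by which
// method. A false result means the agent stays pending for manual approval.
func decideApproval(cfg autoApproveConfig, reg registration) (bool, string) {
	if !cfg.Enabled {
		return false, "" // default: existing manual workflow
	}
	if reg.APIKeyValid {
		return true, "api_key"
	}
	for _, ns := range cfg.Namespaces {
		if ns != "" && ns == reg.Namespace {
			return true, "namespace"
		}
	}
	return false, ""
}

func main() {
	cfg := autoApproveConfig{Enabled: true, Namespaces: []string{"streamspace"}}
	fmt.Println(decideApproval(cfg, registration{Namespace: "streamspace"}))
	fmt.Println(decideApproval(cfg, registration{Namespace: "default"}))
}
```

The zero value of autoApproveConfig has Enabled set to false, so an installation that never sets the new values keeps the existing manual approval behavior.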

Related Issues

Technical Notes

Database Schema:
The current agents table has a status enum: pending, active, inactive

An approval_method field may be needed to track how each agent was approved:

  • manual - Admin approval via UI/API
  • namespace - Auto-approved via namespace trust
  • api_key - Auto-approved via API key
  • service_account - Auto-approved via K8s SA token (future)
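
On the Go side, the values listed above could be modeled as a small string type with validation. This is a sketch only: ApprovalMethod and ParseApprovalMethod are hypothetical names, and the column type used in the database is left open.

```go
package main

import "fmt"

// ApprovalMethod records how an agent was approved.
// (Hypothetical type mirroring the proposed approval_method values.)
type ApprovalMethod string

const (
	ApprovalManual         ApprovalMethod = "manual"
	ApprovalNamespace      ApprovalMethod = "namespace"
	ApprovalAPIKey         ApprovalMethod = "api_key"
	ApprovalServiceAccount ApprovalMethod = "service_account"
)

// ParseApprovalMethod converts a stored string into an ApprovalMethod and
// rejects values outside the known set.
func ParseApprovalMethod(s string) (ApprovalMethod, error) {
	switch m := ApprovalMethod(s); m {
	case ApprovalManual, ApprovalNamespace, ApprovalAPIKey, ApprovalServiceAccount:
		return m, nil
	default:
		return "", fmt.Errorf("unknown approval method %q", s)
	}
}

func main() {
	m, err := ParseApprovalMethod("api_key")
	fmt.Println(m, err)
}
```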

Implementation Files:

  • api/internal/handlers/agents.go - Registration handler
  • api/internal/middleware/agent_auth.go - New auth middleware
  • agents/k8s-agent/main.go - Add API key to registration request
  • chart/values.yaml - Add agents.autoApprove config
