api-proxy: fail-fast validation of API keys at startup #2198

@lpcox

Problem

When an API key is expired, invalid, or misconfigured, the agent only discovers this minutes into a workflow run — after container startup, healthchecks, and initial reasoning turns. The error surfaces as an opaque upstream 401/403 deep in the agent's execution, making it hard to diagnose.

Proposal

Add startup key validation to the api-proxy sidecar (containers/api-proxy/server.js). After all proxy servers are listening (ports 10000–10004), fire lightweight probe requests against each configured provider's API to verify credentials are accepted. Log clear, actionable messages for each result.

Validation endpoints

| Provider | Probe | Auth header | Valid response | Invalid response |
|---|---|---|---|---|
| OpenAI | GET /v1/models | Authorization: Bearer {key} | 200 | 401 |
| Anthropic | POST /v1/messages (minimal body) | x-api-key + anthropic-version | 400 (missing fields = key valid) | 401/403 |
| Copilot (COPILOT_GITHUB_TOKEN, non-classic) | GET /models | Authorization: Bearer {token} | 200 | 401 |
| Copilot (COPILOT_API_KEY / classic PAT) | Skip validation; log "validation not supported for this auth mode" | n/a | n/a | n/a |
| Gemini | GET /v1beta/models | x-goog-api-key | 200 | 400/403 |
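
The table above could be carried into code as a per-provider probe map plus a small status classifier. This is only a sketch: the `PROBES` name, its field names, the `classifyStatus` helper, the `anthropic-version` value, and the `inconclusive` bucket for status codes the table does not list are all assumptions, not part of the existing proxy.

```javascript
// Hypothetical probe map mirroring the validation endpoints table.
// Field names and the 'inconclusive' outcome are illustrative assumptions.
const PROBES = {
  openai: {
    method: 'GET',
    path: '/v1/models',
    headers: (key) => ({ Authorization: `Bearer ${key}` }),
    valid: [200],
    invalid: [401],
  },
  anthropic: {
    method: 'POST',
    path: '/v1/messages',
    body: '{}', // minimal body: required fields are deliberately missing
    headers: (key) => ({
      'x-api-key': key,
      'anthropic-version': '2023-06-01', // assumed version string
      'Content-Type': 'application/json',
    }),
    valid: [400], // 400 means the key was accepted and the request was malformed
    invalid: [401, 403],
  },
  copilot: {
    method: 'GET',
    path: '/models',
    headers: (token) => ({ Authorization: `Bearer ${token}` }),
    valid: [200],
    invalid: [401],
  },
  gemini: {
    method: 'GET',
    path: '/v1beta/models',
    headers: (key) => ({ 'x-goog-api-key': key }),
    valid: [200],
    invalid: [400, 403],
  },
};

// Map a probe's HTTP status to 'valid', 'invalid', or 'inconclusive'.
function classifyStatus(provider, status) {
  const probe = PROBES[provider];
  if (!probe) throw new Error(`unknown provider: ${provider}`);
  if (probe.valid.includes(status)) return 'valid';
  if (probe.invalid.includes(status)) return 'invalid';
  return 'inconclusive'; // e.g. 429 or 5xx says nothing about the key
}
```

Keeping the valid and invalid codes as lists lets the Anthropic row (400 counts as accepted) and the Gemini row (400 counts as rejected) share one code path.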

Design constraints

  1. Only validate default API targets. Custom targets (--openai-api-target, --gemini-api-target, etc.) and non-empty base paths may not support the probe endpoints. Skip validation and log "skipped — custom API target" for these.

  2. Use proxyAgent for all requests. Validation requests must route through Squid, same as normal traffic. This also validates that the proxy chain is working.

  3. Respect startup sequencing. Wrap each server.listen() in a Promise, await Promise.all(...), then fire validation. This ensures the port the Docker healthcheck targets (10000) is already listening before validation begins.

  4. 10-second timeout per provider. If a probe takes longer, log a warning and move on — the network may not be ready yet.

  5. Never exit the process. Log errors clearly but don't crash. The key might become valid later, or the agent might not use that provider.

  6. Shared helper function. A single validateKey(provider, target, path, headers, expectedStatus) function avoids duplicating URL construction, proxy routing, and timeout logic across providers.

  7. Anthropic probe is inconclusive. Since there's no lightweight GET endpoint, the POST probe should include proper headers (anthropic-version, Content-Type) and classify 400 as "key accepted, request malformed" vs 401/403 as "key rejected." Log the nuance.

  8. Copilot auth mode matters. COPILOT_GITHUB_TOKEN (non-classic) can be validated via /models. Classic ghp_* PATs and COPILOT_API_KEY (BYOK) cannot — log that validation is skipped with the reason.
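
The constraints above might fit together roughly as follows. This is a minimal sketch under stated assumptions, not the proposed implementation: `listenAsync`, `skipReason`, the trailing `opts` argument on `validateKey` (needed here for the POST probe, the proxy agent, and test injection), and the shape of the provider objects are all hypothetical. It also treats any status other than `expectedStatus` as a failure, which is a simplification.

```javascript
const https = require('https');

const PROBE_TIMEOUT_MS = 10_000; // constraint 4: 10 seconds per provider

// Constraint 3: resolves once server.listen() has bound the port.
function listenAsync(server, port, host = '0.0.0.0') {
  return new Promise((resolve, reject) => {
    server.once('error', reject);
    server.listen(port, host, () => {
      server.off('error', reject);
      resolve(server);
    });
  });
}

// Constraints 1 and 8: return a skip reason, or null when the probe may run.
// Assumption: a Copilot provider without COPILOT_GITHUB_TOKEN is in COPILOT_API_KEY mode.
function skipReason({ provider, target, defaultTarget, basePath = '', env = {} }) {
  if (target !== defaultTarget || basePath !== '') {
    return `custom API target (${target})`;
  }
  if (provider === 'copilot') {
    const token = env.COPILOT_GITHUB_TOKEN;
    if (!token) return 'COPILOT_API_KEY auth mode does not support probe endpoint';
    if (token.startsWith('ghp_')) return 'classic PAT does not support probe endpoint';
  }
  return null;
}

// Constraints 2, 4 and 6: one helper for request construction, proxy routing
// (via opts.agent) and the timeout. Network errors resolve as failures.
function validateKey(provider, target, path, headers, expectedStatus, opts = {}) {
  const { agent, method = 'GET', body, request = https.request,
    timeoutMs = PROBE_TIMEOUT_MS } = opts;
  const started = Date.now();
  return new Promise((resolve) => {
    let req;
    let settled = false;
    const timer = setTimeout(() => {
      if (req) req.destroy();
      finish({ event: 'key_validation_timeout' });
    }, timeoutMs);
    function finish(result) {
      if (settled) return;
      settled = true;
      clearTimeout(timer);
      resolve({ provider, ...result, duration_ms: Date.now() - started });
    }
    req = request({ hostname: target, path, method, headers, agent }, (res) => {
      res.resume(); // discard the body; only the status code matters
      const ok = res.statusCode === expectedStatus;
      finish({
        event: ok ? 'key_validation_success' : 'key_validation_failed',
        status: res.statusCode,
      });
    });
    req.on('error', (err) => finish({ event: 'key_validation_failed', error: err.message }));
    req.end(body);
  });
}

// Constraint 5: report every outcome through `log`; never throw, never exit.
async function validateApiKeys(providers, log = console.log, opts = {}) {
  await Promise.all(providers.map(async (p) => {
    try {
      const reason = skipReason(p);
      if (reason) {
        log({ event: 'key_validation_skipped', provider: p.provider, reason });
        return;
      }
      log(await validateKey(p.provider, p.target, p.path, p.headers, p.expectedStatus, opts));
    } catch (err) {
      log({ event: 'key_validation_failed', provider: p.provider, error: String(err) });
    }
  }));
}
```

In `server.js` this could be wired as `await Promise.all(servers.map(({ server, port }) => listenAsync(server, port)))` followed by an un-awaited `validateApiKeys(providers, console.log, { agent: proxyAgent })`, where `servers` and `providers` are placeholder names. Leaving the second call un-awaited keeps validation from blocking the agent.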

Log format

[INFO]  key_validation_success  { provider: "openai", message: "OpenAI API key validated successfully", duration_ms: 342 }
[ERROR] key_validation_failed   { provider: "anthropic", status: 401, message: "Anthropic API key is invalid or expired. Requests to this provider will fail." }
[WARN]  key_validation_skipped  { provider: "copilot", message: "Validation skipped — COPILOT_API_KEY auth mode does not support probe endpoint" }
[WARN]  key_validation_skipped  { provider: "openai", message: "Validation skipped — custom API target (my-llm-router.internal)" }
[WARN]  key_validation_timeout  { provider: "gemini", message: "Key validation timed out after 10s — network may not be ready" }
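
For illustration, the layout of the sample lines above can be reproduced with a small formatter. `formatLogLine` is a hypothetical helper that only mirrors the column widths and field style shown here; the proxy's existing logger may already handle this differently.

```javascript
// Hypothetical formatter that reproduces the sample log layout:
// "[LEVEL]" padded to 8 columns, the event name padded to 24, then the fields.
function formatLogLine(level, event, fields) {
  const body = Object.entries(fields)
    .map(([key, value]) => `${key}: ${JSON.stringify(value)}`)
    .join(', ');
  return `${`[${level}]`.padEnd(8)}${event.padEnd(24)}{ ${body} }`;
}

console.log(formatLogLine('INFO', 'key_validation_success', {
  provider: 'openai',
  message: 'OpenAI API key validated successfully',
  duration_ms: 342,
}));
```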

Testing

  • Unit tests: mock https.request to return various status codes; verify correct log events
  • Integration: can be tested manually with --enable-api-proxy and deliberately expired keys
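
A unit test along these lines could replace `https.request` with a fake that answers with a fixed status code. The sketch below is self-contained and therefore uses a stand-in `probe` function instead of the real `validateKey`; `fakeRequest`, `probe`, and the `api.example.test` hostname are hypothetical.

```javascript
const { deepStrictEqual } = require('assert');
const { EventEmitter } = require('events');

// Build a stand-in for https.request that answers with a fixed status code.
function fakeRequest(statusCode) {
  return (options, onResponse) => {
    const req = new EventEmitter();
    req.destroy = () => {};
    req.end = () => {
      const res = new EventEmitter();
      res.statusCode = statusCode;
      res.resume = () => {};
      onResponse(res);
    };
    return req;
  };
}

// Hypothetical stand-in for the code under test: sends one probe through the
// injected request function and logs an event based on the status code.
function probe(request, expectedStatus, log) {
  const options = { hostname: 'api.example.test', path: '/v1/models', method: 'GET' };
  const req = request(options, (res) => {
    res.resume();
    const ok = res.statusCode === expectedStatus;
    log(ok ? 'key_validation_success' : 'key_validation_failed', { status: res.statusCode });
  });
  req.end();
}

// Table-driven cases: status returned by the fake, and the expected log event.
const cases = [
  [200, 'key_validation_success'],
  [401, 'key_validation_failed'],
  [403, 'key_validation_failed'],
];
for (const [status, expectedEvent] of cases) {
  const events = [];
  probe(fakeRequest(status), 200, (event) => events.push(event));
  deepStrictEqual(events, [expectedEvent]);
}
```

Passing the request function in as a parameter keeps the test free of real network traffic and of module-level mocking.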

Files to modify

  • containers/api-proxy/server.js: add validateKey() helper and validateApiKeys() orchestrator; export validateKey for testability
  • containers/api-proxy/server.test.js: unit tests for the validation logic

Out of scope

  • Periodic re-validation (only at startup)
  • Key rotation / refresh
  • Blocking the agent until validation completes (fire-and-forget after servers listen)

Metadata

Labels

enhancement: New feature or request
