
design: unified OAuth2 auth for internal platform tools (Prometheus, AlertManager) #75

Merged
bdchatham merged 2 commits into main from design/platform-oauth2-ext-authz on Apr 10, 2026


@bdchatham
Collaborator

Summary

  • Design doc for protecting Prometheus and AlertManager behind Google OAuth using Istio ext_authz + OAuth2 Proxy
  • Fixes broken PagerDuty generatorURL links by making Prometheus externally accessible, behind authentication, at prometheus.prod.platform.sei.io
  • SSO via a shared cookie domain (.prod.platform.sei.io): authenticate once, then access all protected tools

Motivation

PagerDuty alert links point to http://sei-prod-prometheus.monitoring:9090/graph?g0.expr=..., which is the cluster-internal Prometheus address and does not resolve outside the cluster. As a result, on-call engineers can't click through to see the metric that triggered an alert.
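An alert link is roughly Prometheus's external URL, followed by `/graph?g0.expr=` and the URL-encoded expression. This is why changing Prometheus's externalUrl changes where PagerDuty links point. The sketch below is a simplified model, not Prometheus's own code. The helper name `generator_url` is hypothetical, real links may carry extra query parameters, and the `https://` scheme for the new hostname is an assumption.

```python
from urllib.parse import quote_plus


def generator_url(external_url: str, expr: str) -> str:
    """Hypothetical helper: external URL + /graph?g0.expr=<URL-encoded expr>.

    Simplified: real Prometheus links may append further query parameters.
    """
    return f"{external_url.rstrip('/')}/graph?g0.expr={quote_plus(expr)}"


# Current link host, as seen in the PagerDuty URLs above.
internal = generator_url("http://sei-prod-prometheus.monitoring:9090", "up == 0")
# Proposed externally reachable host (https scheme assumed).
external = generator_url("https://prometheus.prod.platform.sei.io", "up == 0")
print(internal)
print(external)
```

The expression and path are identical in both links. Only the host changes, so the fix is a matter of configuration rather than of the alert rules.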

Approach

  • OAuth2 Proxy deployed as a shared ext_authz provider in the auth namespace
  • Istio AuthorizationPolicy with action: CUSTOM scoped to specific hostnames (opt-in model)
  • Google OAuth with same allowed domains as Grafana (seinetwork.io, sei.io, seifdn.org)
  • Fail-closed: when auth is down, tools return 503; they are never served unprotected
  • Grafana keeps its own OAuth (needs user identity for role mapping)
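Below is a minimal Python model of the per-request decision described above. It assumes the mesh's ext_authz provider is configured to return 503 on connection errors; Istio's `statusOnError` defaults to 403, so the 503 would need to be set explicitly. `PROTECTED_HOSTS`, `Decision`, `gateway_decision`, and the AlertManager hostname are all hypothetical. `check_auth` stands in for Envoy's call to OAuth2 Proxy.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hosts opted in via an AuthorizationPolicy with action: CUSTOM (hypothetical set;
# the AlertManager hostname is an assumption).
PROTECTED_HOSTS = {
    "prometheus.prod.platform.sei.io",
    "alertmanager.prod.platform.sei.io",
}


@dataclass
class Decision:
    status: int  # HTTP status for the client; 200 means "forward upstream"
    reason: str


def gateway_decision(host: str, check_auth: Callable[[], Optional[int]]) -> Decision:
    """check_auth returns OAuth2 Proxy's HTTP status, or None if it is unreachable."""
    if host not in PROTECTED_HOSTS:
        # Opt-in model: hosts without a CUSTOM policy never call ext_authz.
        return Decision(200, "not protected")
    status = check_auth()
    if status is None:
        # Fail-closed: an unreachable auth service blocks the request.
        return Decision(503, "auth unavailable")
    if 200 <= status < 300:
        return Decision(200, "authenticated")
    # Non-2xx (e.g. 401, or a 302 to Google sign-in) is relayed to the client.
    return Decision(status, "not authenticated")
```

For example, `gateway_decision("prometheus.prod.platform.sei.io", lambda: None)` yields a 503 decision. A host that has no policy is forwarded without any auth check.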

Key Design Decisions

| Decision | Choice | Rationale |
|---|---|---|
| Auth pattern | Istio ext_authz (not per-service oauth2-proxy sidecars) | Single deployment and uniform auth; any service opts in via an AuthorizationPolicy |
| OAuth client | Separate from Grafana | Redirect URIs are per-client in Google Cloud Console |
| Session storage | Cookie-based (no Redis) | Stateless, HA, appropriate for team scale |
| Cookie domain | .prod.platform.sei.io | SSO across all protected tools |
| Failure mode | Fail-closed | Inaccessible tools are preferable to unprotected ones |
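The cookie-domain row relies on standard browser cookie scoping. A cookie set with `Domain=.prod.platform.sei.io` is sent to every host under that domain; for OAuth2 Proxy this is set via its `--cookie-domain` option. The sketch below simplifies the RFC 6265 domain-match rule by omitting the IP-address and public-suffix checks that real browsers apply. The function name and the AlertManager hostname are hypothetical.

```python
def cookie_domain_matches(host: str, cookie_domain: str) -> bool:
    """Simplified RFC 6265 domain-match (no IP-address or public-suffix checks)."""
    # RFC 6265 section 5.2.3: a single leading dot in the Domain attribute is ignored.
    domain = cookie_domain[1:] if cookie_domain.startswith(".") else cookie_domain
    domain, host = domain.lower(), host.lower()
    # Section 5.1.3: exact match, or the host ends with "." followed by the domain.
    return host == domain or host.endswith("." + domain)


SSO_DOMAIN = ".prod.platform.sei.io"
for host in (
    "prometheus.prod.platform.sei.io",
    "alertmanager.prod.platform.sei.io",  # assumed hostname
    "grafana.example.com",                # unrelated host, for contrast
):
    print(host, cookie_domain_matches(host, SSO_DOMAIN))
```

Because the match requires a dot boundary, a look-alike host such as `evilprod.platform.sei.io` does not receive the session cookie.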

Implementation will touch

  • platform repo: OAuth2 Proxy Helm, HTTPRoutes, AuthorizationPolicy, Istio mesh config, Prometheus externalUrl
  • Google Cloud Console: new OAuth2 web client

Requesting feedback on

  • Overall approach (ext_authz vs alternatives)
  • Namespace choice (auth vs monitoring)
  • Cookie TTL (12h session, 1h refresh)
  • Any tools beyond Prometheus/AlertManager to include in v1?
  • Istio mesh config change mechanism (Helm values vs ConfigMap patch)
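For the cookie TTL question, here is one way to read the proposed values, sketched in Python for discussion. A session older than 12h forces a new Google sign-in. Within that window, the session is refreshed once its last refresh is more than 1h old. OAuth2 Proxy exposes these settings as `--cookie-expire` and `--cookie-refresh`, but its exact refresh semantics may differ from this model (for example, whether a refresh extends the session). `session_action` and its return values are hypothetical.

```python
from datetime import datetime, timedelta

# Proposed values from this design (constant names are hypothetical).
SESSION_TTL = timedelta(hours=12)
REFRESH_INTERVAL = timedelta(hours=1)


def session_action(issued_at: datetime, last_refreshed: datetime, now: datetime) -> str:
    """Return "reauthenticate", "refresh", or "allow" for a request at time `now`."""
    if now - issued_at >= SESSION_TTL:
        # Session reached its 12h lifetime: send the user back through Google sign-in.
        return "reauthenticate"
    if now - last_refreshed >= REFRESH_INTERVAL:
        # More than 1h since the last check: re-validate with the identity provider.
        return "refresh"
    return "allow"


t0 = datetime(2026, 4, 10, 9, 0)
print(session_action(t0, t0, t0 + timedelta(minutes=30)))
print(session_action(t0, t0, t0 + timedelta(hours=2)))
```

Under this reading, a shorter refresh interval means a revoked Google account loses access sooner, at the cost of more round-trips to Google.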

🤖 Generated with Claude Code

bdchatham and others added 2 commits April 10, 2026 14:43
Adds design doc for protecting Prometheus and AlertManager behind Google
OAuth using Istio ext_authz + OAuth2 Proxy. Fixes broken PagerDuty
generatorURL links by enabling external Prometheus access with auth.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Share the Grafana Google OAuth client rather than creating a separate
one. Just add the OAuth2 Proxy callback URI to the existing client.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
bdchatham merged commit 8c760be into main on Apr 10, 2026
2 checks passed