Feat: Implement OAuth Proxy for Remote MCP Servers #149

Merged
teemow merged 17 commits into main from feat/issue-144-oauth-proxy
Dec 21, 2025

@teemow teemow commented Dec 20, 2025

Summary

This PR implements the OAuth Proxy feature for remote MCP server authentication as outlined in Epic #144.

Changes

Phase 1: Foundation & Configuration

  • Config types: Added OAuthConfig to MusterConfig.Aggregator with fields for publicUrl, clientId, callbackPath, and enabled
  • CLI flags: Added --oauth, --oauth-public-url, and --oauth-client-id flags to muster serve
  • Default values: Added sensible defaults including the CIMD URL and callback path

Phase 2: Server-Side OAuth Logic

  • New internal/oauth package containing:

    • types.go: OAuth types including Token, AuthChallenge, OAuthState, OAuthMetadata, ClientMetadata, WWWAuthenticateParams
    • token_store.go: Thread-safe in-memory token storage with automatic cleanup
    • state_store.go: Thread-safe state parameter storage for CSRF protection (stores issuer and code verifier with state)
    • client.go: OAuth 2.1 client with PKCE support, metadata discovery (with thread-safe caching and TTL), token exchange, and refresh
    • www_authenticate.go: WWW-Authenticate header parser for extracting issuer and scope
    • handler.go: HTTP handler for OAuth callbacks with success/error pages
    • manager.go: Coordinates OAuth flows and integrates with the aggregator
    • api_adapter.go: Implements api.OAuthHandler following the service locator pattern
  • API layer integration: Added OAuthHandler interface in internal/api/oauth.go with registration functions

  • Aggregator integration: Updated aggregator to mount OAuth callback handler on the HTTP mux
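
To illustrate the state handling described for state_store.go, here is a minimal sketch of a single-use state store that keeps the issuer and code verifier server-side. It is not the code from this PR: the type layout, the `Generate`/`Consume` method names, and the `defaultStateTTL` constant are assumptions made for the example, and the real `GenerateState` returns an encoded state with the nonce embedded rather than a bare random string.

```go
package main

import (
	"crypto/rand"
	"encoding/base64"
	"fmt"
	"sync"
	"time"
)

// defaultStateTTL mirrors the 10-minute state expiry mentioned in this PR.
// The constant name is hypothetical.
const defaultStateTTL = 10 * time.Minute

// stateEntry is the data kept server-side for one pending OAuth flow.
type stateEntry struct {
	Issuer       string
	CodeVerifier string
	ExpiresAt    time.Time
}

// StateStore is a simplified, hypothetical stand-in for the PR's state store.
type StateStore struct {
	mu      sync.Mutex
	ttl     time.Duration
	entries map[string]stateEntry
}

func NewStateStore(ttl time.Duration) *StateStore {
	return &StateStore{ttl: ttl, entries: make(map[string]stateEntry)}
}

// Generate creates a random state value and records the issuer and PKCE
// code verifier that belong to it. The verifier never leaves the server.
func (s *StateStore) Generate(issuer, codeVerifier string) (string, error) {
	nonce := make([]byte, 32)
	if _, err := rand.Read(nonce); err != nil {
		return "", err
	}
	state := base64.RawURLEncoding.EncodeToString(nonce)

	s.mu.Lock()
	defer s.mu.Unlock()
	s.entries[state] = stateEntry{
		Issuer:       issuer,
		CodeVerifier: codeVerifier,
		ExpiresAt:    time.Now().Add(s.ttl),
	}
	return state, nil
}

// Consume looks up a state value and deletes it, so each state is single-use.
// Unknown or expired states are rejected.
func (s *StateStore) Consume(state string) (stateEntry, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()

	entry, ok := s.entries[state]
	if !ok {
		return stateEntry{}, false
	}
	delete(s.entries, state)
	if time.Now().After(entry.ExpiresAt) {
		return stateEntry{}, false
	}
	return entry, true
}

func main() {
	store := NewStateStore(defaultStateTTL)
	state, err := store.Generate("https://idp.example.com", "example-verifier")
	if err != nil {
		panic(err)
	}

	entry, ok := store.Consume(state)
	fmt.Println("first callback accepted:", ok, "issuer:", entry.Issuer)

	_, ok = store.Consume(state)
	fmt.Println("replayed callback accepted:", ok)
}
```

Keeping the issuer and verifier next to the state means the callback handler needs only the `state` query parameter to resume the flow, which matches the "empty issuer" fix described later in this PR.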

Phase 3: Agent Integration

  • Auth challenge detection: Agent detects auth_required responses and displays formatted messages using api.AuthChallenge
  • User-friendly messages: Agent formats auth challenges with the auth URL for user authentication

Phase 4: Documentation

  • CIMD file: Created docs/oauth-client.json for GitHub Pages hosting (Client ID Metadata Document)

Phase 5: Synthetic Tool Placeholder Pattern

Implemented the "Synthetic Tool Placeholder" pattern to solve the chicken-and-egg problem where OAuth authentication is required before the MCP protocol handshake can complete:

  • 401 Detection: StreamableHTTPClient and SSEClient now detect 401 errors during initialization and return AuthRequiredError with parsed OAuth information from the WWW-Authenticate header
  • AuthRequiredError: New error type in mcpserver/types.go containing URL and OAuth info (issuer, scope, resource metadata URL)
  • ServerStatus: New enum in aggregator/types.go with StatusConnected, StatusDisconnected, and StatusAuthRequired
  • Synthetic Auth Tools: When a server returns 401 during init, it's registered in StatusAuthRequired state with a synthetic authenticate_<server> tool
  • Auth Tool Handler: Calling the synthetic tool creates an OAuth challenge with a sign-in link, or attempts to upgrade the server if a token already exists
  • Server Upgrade: UpgradeToConnected() method in registry upgrades a pending auth server to connected status after successful OAuth
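
As a rough sketch of the 401 detection step, the following shows how a `WWW-Authenticate` header could be turned into an `AuthRequiredError`. The field names, the `parseBearerChallenge` helper, and the choice of `realm` as the carrier of the issuer are assumptions for illustration; the actual parser in www_authenticate.go may read different parameters.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// AuthInfo holds the OAuth details named in this PR: issuer, scope, and
// resource metadata URL. The field names are hypothetical.
type AuthInfo struct {
	Issuer              string
	Scope               string
	ResourceMetadataURL string
}

// AuthRequiredError signals that a server answered 401 during initialization.
type AuthRequiredError struct {
	URL      string
	AuthInfo AuthInfo
}

func (e *AuthRequiredError) Error() string {
	return fmt.Sprintf("authentication required for %s", e.URL)
}

// paramPattern matches key="value" pairs in a challenge. Only quoted values
// are handled in this sketch.
var paramPattern = regexp.MustCompile(`([A-Za-z_]+)="([^"]*)"`)

// parseBearerChallenge extracts OAuth parameters from a Bearer challenge.
// Assumption: the issuer is carried in "realm".
func parseBearerChallenge(header string) (AuthInfo, bool) {
	scheme, rest, _ := strings.Cut(strings.TrimSpace(header), " ")
	if !strings.EqualFold(scheme, "Bearer") {
		return AuthInfo{}, false
	}
	var info AuthInfo
	for _, m := range paramPattern.FindAllStringSubmatch(rest, -1) {
		switch strings.ToLower(m[1]) {
		case "realm":
			info.Issuer = m[2]
		case "scope":
			info.Scope = m[2]
		case "resource_metadata":
			info.ResourceMetadataURL = m[2]
		}
	}
	return info, true
}

func main() {
	header := `Bearer realm="https://idp.example.com", scope="openid profile"`
	info, ok := parseBearerChallenge(header)
	if !ok {
		fmt.Println("not an OAuth challenge")
		return
	}
	var err error = &AuthRequiredError{URL: "https://mcp.example.com/mcp", AuthInfo: info}
	fmt.Println(err, "| issuer:", info.Issuer, "| scope:", info.Scope)
}
```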

Synthetic Tool Flow

  1. Server returns 401 during Initialize() → AuthRequiredError
  2. Server registered in StatusAuthRequired with synthetic authenticate_<server> tool
  3. User/Agent sees the auth tool in the tool list
  4. User calls authenticate_<server>
  5. If no token: returns OAuth challenge with sign-in URL
  6. User authenticates in browser → callback → token stored
  7. User retries tool or calls it again → server upgraded to connected
  8. Real tools from the server become available
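
The flow above can be sketched as a small registry that exposes a placeholder tool until the server is upgraded. The names `RegisterPendingAuth`, `UpgradeToConnected`, and the `Status*` values come from this PR, but the signatures, the string-based status type, and the example tool name `list_pods` are assumptions.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// ServerStatus is modelled as a string here; the real enum may differ.
type ServerStatus string

const (
	StatusConnected    ServerStatus = "connected"
	StatusDisconnected ServerStatus = "disconnected"
	StatusAuthRequired ServerStatus = "auth_required"
)

type serverEntry struct {
	status ServerStatus
	tools  []string
}

// Registry is a hypothetical, reduced version of the aggregator registry.
type Registry struct {
	mu      sync.RWMutex
	servers map[string]*serverEntry
}

func NewRegistry() *Registry {
	return &Registry{servers: make(map[string]*serverEntry)}
}

// RegisterPendingAuth records a server that answered 401 during Initialize().
func (r *Registry) RegisterPendingAuth(name string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.servers[name] = &serverEntry{status: StatusAuthRequired}
}

// UpgradeToConnected replaces the placeholder with the server's real tools.
func (r *Registry) UpgradeToConnected(name string, tools []string) error {
	r.mu.Lock()
	defer r.mu.Unlock()
	srv, ok := r.servers[name]
	if !ok || srv.status != StatusAuthRequired {
		return fmt.Errorf("server %q is not pending authentication", name)
	}
	srv.status = StatusConnected
	srv.tools = tools
	return nil
}

// Tools lists real tools for connected servers and one synthetic
// authenticate_<server> tool for each server that still needs auth.
func (r *Registry) Tools() []string {
	r.mu.RLock()
	defer r.mu.RUnlock()
	var tools []string
	for name, srv := range r.servers {
		switch srv.status {
		case StatusAuthRequired:
			tools = append(tools, "authenticate_"+name)
		case StatusConnected:
			tools = append(tools, srv.tools...)
		}
	}
	sort.Strings(tools)
	return tools
}

func main() {
	reg := NewRegistry()
	reg.RegisterPendingAuth("kubernetes-gazelle")
	fmt.Println("before auth:", reg.Tools())

	// After the OAuth callback stored a token, the server is upgraded.
	if err := reg.UpgradeToConnected("kubernetes-gazelle", []string{"list_pods"}); err != nil {
		panic(err)
	}
	fmt.Println("after auth:", reg.Tools())
}
```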

Phase 6: Helm Chart Configuration

Added OAuth proxy configuration to the Helm chart:

  • values.yaml: Added muster.oauth section with:
    • enabled: Enable/disable OAuth proxy (default: false)
    • publicUrl: Publicly accessible URL for OAuth callbacks
    • clientId: OAuth client identifier (CIMD URL)
    • callbackPath: OAuth callback endpoint path (default: /oauth/callback)
  • values.schema.json: Added schema validation for OAuth configuration
  • deployment.yaml: Passes OAuth flags to the container when enabled
  • README.md: Documented OAuth configuration with examples

Example Helm values:

muster:
  oauth:
    enabled: true
    publicUrl: "https://muster.example.com"
    clientId: "https://giantswarm.github.io/muster/oauth-client.json"
    callbackPath: "/oauth/callback"
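
For orientation, here is a hedged sketch of how these values might combine on the Go side. The method names `GetRedirectURI` and `GetEffectiveClientID` appear in a later commit of this PR, as does the self-hosted CIMD path `/.well-known/oauth-client.json`; the struct layout and the exact derivation logic below are assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

const (
	defaultCallbackPath = "/oauth/callback"                // default from values.yaml
	defaultCIMDPath     = "/.well-known/oauth-client.json" // self-hosted CIMD path
)

// OAuthConfig mirrors the Helm values above. Field names are illustrative.
type OAuthConfig struct {
	Enabled      bool
	PublicURL    string
	ClientID     string
	CallbackPath string
}

// GetRedirectURI joins the public URL and callback path, tolerating a
// trailing slash on the public URL.
func (c OAuthConfig) GetRedirectURI() string {
	path := c.CallbackPath
	if path == "" {
		path = defaultCallbackPath
	}
	return strings.TrimSuffix(c.PublicURL, "/") + path
}

// GetEffectiveClientID returns the explicit client ID, or derives one from
// the public URL when none is set (assumed behaviour for self-hosted CIMD).
func (c OAuthConfig) GetEffectiveClientID() string {
	if c.ClientID != "" {
		return c.ClientID
	}
	return strings.TrimSuffix(c.PublicURL, "/") + defaultCIMDPath
}

func main() {
	cfg := OAuthConfig{Enabled: true, PublicURL: "https://muster.example.com/"}
	fmt.Println("redirect URI:", cfg.GetRedirectURI())
	fmt.Println("client ID:   ", cfg.GetEffectiveClientID())
}
```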

Phase 7: Integration Testing & Bug Fixes

Tested with a real remote MCP server (mcp-kubernetes on the Gazelle cluster):

  • Bug Fix - Auth Required Server Registration: The orchestrator now properly catches AuthRequiredError and registers servers in auth_required state with the aggregator via RegisterServerPendingAuth
  • Bug Fix - Event Handler Deregistration: Event handler now skips deregistration for servers in auth_required state (they should remain registered with their synthetic auth tool)
  • Bug Fix - Nil Pointer in Deregister: Fixed nil pointer dereference when deregistering servers without a client (auth_required servers have nil clients until authenticated)
  • New API Method: Added RegisterServerPendingAuth to AggregatorHandler interface and implemented in api_adapter.go
  • New Callback: Added isServerAuthRequired callback to event handler for checking server status

Verified working flow:

MCPServer kubernetes-gazelle requires authentication, registering pending auth
Registered pending auth server: kubernetes-gazelle (requires authentication)
Skipping deregistration of kubernetes-gazelle - server is in auth_required state
Server kubernetes-gazelle requires auth, exposing synthetic tool
GetAllTools: returning 96 tools from 6 connected + 1 auth_required servers

Security Review & Improvements

A security review of the implementation identified the strengths and findings listed below:

Security Strengths (Already Present)

  • PKCE Implementation: Uses S256 code challenge method with 32-byte crypto/rand verifiers
  • State Parameter Security: Cryptographically random nonces, single-use states, 10-minute expiry
  • XSS Prevention: html.EscapeString() for all user-controlled HTML output
  • Information Disclosure Prevention: Error responses don't expose token response bodies
  • Token Security: Tokens stored server-side only, never sent to agent
  • Thread Safety: All stores protected with RWMutex, no race conditions
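
The PKCE strength above can be illustrated with a short sketch: a 32-byte verifier from crypto/rand and its S256 challenge as defined in RFC 7636. The function names are hypothetical and not taken from client.go.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"encoding/base64"
	"fmt"
)

// generateVerifier returns a PKCE code verifier built from 32 random bytes,
// base64url-encoded without padding (43 characters).
func generateVerifier() (string, error) {
	buf := make([]byte, 32)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	return base64.RawURLEncoding.EncodeToString(buf), nil
}

// challengeS256 derives the code challenge for the S256 method:
// BASE64URL(SHA256(ASCII(verifier))), without padding.
func challengeS256(verifier string) string {
	sum := sha256.Sum256([]byte(verifier))
	return base64.RawURLEncoding.EncodeToString(sum[:])
}

func main() {
	verifier, err := generateVerifier()
	if err != nil {
		panic(err)
	}
	fmt.Println("code_verifier: ", verifier)
	fmt.Println("code_challenge:", challengeS256(verifier))
}
```

The challenge goes into the authorization request; the verifier stays on the server and is sent only with the token exchange.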

Security Review Findings (All Addressed)

| Finding | Severity | Resolution |
|---|---|---|
| Session ID logging | Low | Session IDs now truncated to first 8 chars in logs |
| Token refresh monitoring | Info | Added INFO-level logging with duration metrics |
| TLS requirements | Medium | Comprehensive TLS documentation in doc.go |
| Rate limiting | Medium | Rate limiting recommendations in Helm values/README |
| CIMD redirect URI limitation | Medium | Documented in Helm README with instructions for custom deployments |
| stdio session fallback | Low | Added warning log when default session is used |
| Metadata cache without signature validation | Low | Added TLS assumption comment; TLS provides integrity |
| No token encryption at rest | Info | Documented in-memory-only storage in doc.go |

Critical: Session Isolation (Multi-User Security)

How Session IDs Work:

The mcp-go library provides unique session IDs for each MCP connection:

  • SSE Transport: Each connection gets a UUID session ID via uuid.New().String()
  • Streamable HTTP Transport: Each connection gets a unique session ID from sessionIdManager.Generate()

The session ID is automatically injected into the context by the mcp-go library and retrieved using server.ClientSessionFromContext(ctx).

Token Isolation:

Tokens are stored with a composite key: (SessionID, Issuer, Scope). This ensures:

  • User A's tokens are never accessible by User B
  • Each user must authenticate independently to remote MCP servers
  • SSO works within a single user's session (same session, different servers using same IdP)
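
A minimal sketch of the composite-key idea follows. The key fields match the (SessionID, Issuer, Scope) tuple described above and `GetByIssuer` is named in this PR, but the signatures and the reduced `Token` type are assumptions; expiry and cleanup are left out.

```go
package main

import (
	"fmt"
	"sync"
)

// TokenKey is the composite key: tokens are only visible to the session
// that stored them.
type TokenKey struct {
	SessionID string
	Issuer    string
	Scope     string
}

// Token is reduced to a single field for this sketch.
type Token struct {
	AccessToken string
}

// TokenStore is a hypothetical in-memory store guarded by an RWMutex.
type TokenStore struct {
	mu     sync.RWMutex
	tokens map[TokenKey]Token
}

func NewTokenStore() *TokenStore {
	return &TokenStore{tokens: make(map[TokenKey]Token)}
}

func (s *TokenStore) Store(key TokenKey, tok Token) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.tokens[key] = tok
}

func (s *TokenStore) Get(key TokenKey) (Token, bool) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	tok, ok := s.tokens[key]
	return tok, ok
}

// GetByIssuer supports SSO within one session: any token the same session
// holds for the issuer is returned, regardless of scope.
func (s *TokenStore) GetByIssuer(sessionID, issuer string) (Token, bool) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	for key, tok := range s.tokens {
		if key.SessionID == sessionID && key.Issuer == issuer {
			return tok, true
		}
	}
	return Token{}, false
}

func main() {
	store := NewTokenStore()
	issuer := "https://idp.example.com"
	store.Store(TokenKey{SessionID: "session-a", Issuer: issuer, Scope: "openid"}, Token{AccessToken: "token-a"})

	_, okA := store.GetByIssuer("session-a", issuer)
	_, okB := store.GetByIssuer("session-b", issuer)
	fmt.Println("session-a sees token:", okA)
	fmt.Println("session-b sees token:", okB)
}
```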

Stdio Fallback:

For stdio transport (single-user CLI), the session ID falls back to "default-session". This is acceptable because stdio is inherently single-user (one process = one user). A warning is now logged when this fallback is used.

Security Test Coverage:

Added TestTokenStore_SessionIsolation which explicitly verifies:

  • User 1's session cannot access User 2's tokens
  • User 2's session cannot access User 1's tokens
  • GetByIssuer() respects session boundaries
  • Exact token count validation (one per user)

Security Fixes Applied

  1. Session ID Extraction: Uses server.ClientSessionFromContext(ctx) to extract the mcp-go library's unique session ID
  2. Security Headers on HTML Responses: Added comprehensive security headers to OAuth callback pages:
    • X-Content-Type-Options: nosniff
    • X-Frame-Options: DENY
    • Content-Security-Policy: default-src 'none'; style-src 'unsafe-inline'
    • Referrer-Policy: no-referrer
    • Cache-Control: no-store, no-cache, must-revalidate
  3. Metadata Cache TTL: Reduced from 1 hour to 30 minutes for faster key rotation updates
  4. TLS Documentation: Added security comments about TLS requirements for production
  5. In-Memory Storage Documentation: Enhanced doc.go with security notes about token storage
  6. Session ID Truncation: Logs now show truncated session IDs (first 8 chars) to prevent exposure
  7. Token Refresh Monitoring: INFO-level logging for token refresh with duration metrics
  8. Rate Limiting Documentation: Added rate limiting recommendations to Helm values.yaml and README
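
The security headers from fix 2 could be applied with a small middleware like the one below. The header values are the ones listed above; the middleware shape and the helper names (`withSecurityHeaders`, `callbackPage`, `probe`) are assumptions and not the handler code from this PR.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// securityHeaders returns the response headers listed in this PR for the
// OAuth callback pages.
func securityHeaders() map[string]string {
	return map[string]string{
		"X-Content-Type-Options":  "nosniff",
		"X-Frame-Options":         "DENY",
		"Content-Security-Policy": "default-src 'none'; style-src 'unsafe-inline'",
		"Referrer-Policy":         "no-referrer",
		"Cache-Control":           "no-store, no-cache, must-revalidate",
	}
}

// withSecurityHeaders sets the headers before the wrapped handler writes.
func withSecurityHeaders(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		for name, value := range securityHeaders() {
			w.Header().Set(name, value)
		}
		next.ServeHTTP(w, r)
	})
}

// callbackPage is a stand-in for the success page served after OAuth.
func callbackPage() http.Handler {
	page := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "text/html; charset=utf-8")
		fmt.Fprint(w, "<p>Authentication complete. You can close this window.</p>")
	})
	return withSecurityHeaders(page)
}

// probe sends a GET request to the handler and returns the recorded response.
func probe(h http.Handler, path string) *httptest.ResponseRecorder {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec
}

func main() {
	rec := probe(callbackPage(), "/oauth/callback")
	for name := range securityHeaders() {
		fmt.Printf("%s: %s\n", name, rec.Header().Get(name))
	}
}
```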

Code Quality Improvements (Post-Review)

Applied the following improvements based on Go code review:

  1. DRY: Removed duplicate AuthChallenge type from agent package - now uses api.AuthChallenge
  2. Thread Safety: Added mutex protection to metadataCache in OAuth client to prevent race conditions
  3. KISS: Removed redundant pendingFlows from Handler and pendingVerifiers from Manager
  4. Bug Fix: Store issuer and code verifier with OAuth state to fix empty issuer bug in callback handler
  5. Consolidation: Simplified callback handling by keeping all state data in the StateStore
  6. DRY (Latest): Extracted duplicated checkForAuthRequiredError and parseAuthInfoFromError from SSEClient and StreamableHTTPClient to shared helper functions in mcpserver/types.go
  7. DRY (Latest): Replaced duplicate AuthInfo and AuthRequiredError types in aggregator/types.go with type alias to mcpserver.AuthInfo, reducing code duplication by ~90 lines

Latest Code Quality Improvements (Post-Security-Review)

  1. DRY: Extracted tokenToAPIToken helper function to remove duplicate Token → OAuthToken conversion in api_adapter.go
  2. Constants: Defined tokenExpiryMargin constant (30s) to eliminate magic numbers in token expiration checks
  3. Embedded Templates: Moved HTML templates to internal/oauth/templates/ using Go's embed package for better maintainability and syntax highlighting
  4. Singleflight: Added singleflight.Group to fetchMetadata to prevent concurrent duplicate fetches for the same issuer (fixes TOCTOU race condition)
  5. Security: OAuth error responses now use generic messages instead of forwarding error_description from OAuth providers (prevents information disclosure)
  6. API Simplification: Simplified StateStore.GenerateState to return only encodedState (the nonce is embedded within and callers don't need it separately)
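
To show what the `tokenExpiryMargin` constant from item 2 might look like in use, here is a small sketch. The 30-second value comes from this PR; the `Token` fields and the `needsRefresh` method are assumptions.

```go
package main

import (
	"fmt"
	"time"
)

// tokenExpiryMargin treats a token as expired slightly before its real
// expiry, so it is not sent to a remote server moments before it lapses.
const tokenExpiryMargin = 30 * time.Second

// Token is reduced to the fields this sketch needs.
type Token struct {
	AccessToken string
	ExpiresAt   time.Time
}

// needsRefresh reports whether the token is expired, or will expire within
// tokenExpiryMargin, as seen from the given point in time.
func (t Token) needsRefresh(now time.Time) bool {
	return !now.Before(t.ExpiresAt.Add(-tokenExpiryMargin))
}

func main() {
	now := time.Now()
	fresh := Token{AccessToken: "a", ExpiresAt: now.Add(5 * time.Minute)}
	closing := Token{AccessToken: "b", ExpiresAt: now.Add(10 * time.Second)}

	fmt.Println("fresh token needs refresh:  ", fresh.needsRefresh(now))
	fmt.Println("closing token needs refresh:", closing.needsRefresh(now))
}
```

Passing `now` explicitly keeps the check deterministic in tests; a production helper could equally call `time.Now()` internally.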

Code Review Improvements (Latest Commit)

  1. Test Coverage: Increased OAuth package test coverage from 53% to 82%+ (above 80% threshold)
    • Added comprehensive tests for api_adapter.go (was 0%)
    • Added tests for token/state store cleanup functions
    • Added manager method tests for nil-safety and edge cases
  2. DRY: Use strings.TrimSuffix consistently in config/types.go (removed custom trimTrailingSlash function)
  3. Constants: Extracted httpClientTimeout (30s) and softwareVersion ("1.0.0") constants in OAuth client
  4. Code Cleanup: Fixed unused variable assignment in server.go GetToolsWithStatus
  5. Refactor: Removed time.Sleep from OnToolsUpdated - goroutine scheduling provides sufficient separation
  6. Documentation: Added goroutine lifecycle requirements to TokenStore and StateStore godocs (callers MUST call Stop())
  7. Session ID Logging: Added truncateSessionID helper to truncate session IDs to first 8 chars in all log output
  8. Token Refresh Logging: Upgraded to INFO level with duration metrics for operations monitoring
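
The session ID truncation in item 7 is simple enough to sketch directly. The helper name `truncateSessionID` and the "first 8 chars + ..." format come from this PR; the body below is an assumed implementation, and it slices bytes on the assumption that session IDs are ASCII (for example UUIDs).

```go
package main

import "fmt"

// truncateSessionID shortens a session ID for log output so that the full
// value never appears in logs. Short IDs are returned unchanged.
func truncateSessionID(id string) string {
	const visible = 8
	if len(id) <= visible {
		return id
	}
	return id[:visible] + "..."
}

func main() {
	sessionID := "550e8400-e29b-41d4-a716-446655440000"
	fmt.Printf("stored token for session %s\n", truncateSessionID(sessionID))
}
```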

Unit Tests

Comprehensive unit tests added for the internal/oauth package (82%+ coverage):

  • token_store_test.go: Token storage, retrieval, expiration, SSO lookup by issuer, deletion, cleanup, session isolation security
  • state_store_test.go: State generation, validation, CSRF protection, code verifier security, state expiration, cleanup
  • www_authenticate_test.go: WWW-Authenticate header parsing, OAuth challenge detection, issuer extraction
  • client_test.go: Redirect URI generation, PKCE generation, token storage/retrieval, metadata fetching with caching, code exchange, token refresh
  • handler_test.go: OAuth callback handling, error cases, success/error page rendering, parameter validation, security headers verification, CIMD serving
  • manager_test.go: Manager lifecycle, server registration, nil-safety, configuration handling, GetToken flows
  • api_adapter_test.go: Adapter methods, token conversion, registration delegation

Architecture

The OAuth proxy follows the architecture outlined in docs/explanation/decisions/004-oauth-proxy.md:

  1. Muster Server acts as the OAuth client and proxy
  2. Tokens are stored server-side - never sent to the Agent
  3. SSO is supported through token reuse by (SessionID, Issuer, Scope)
  4. PKCE is used for enhanced security
  5. User authenticates via their browser
  6. Synthetic Tool Placeholder pattern handles OAuth required before MCP handshake
  7. Session isolation ensures multi-user security

Testing

  • All 135 existing BDD scenarios pass
  • Unit tests pass with race detection enabled (82%+ coverage for OAuth package)
  • Helm chart lints successfully
  • Code formatted with goimports and go fmt
  • Integration tested with mcp-kubernetes on Gazelle cluster (401 detection and synthetic tool registration verified)

Related

Implements #144

Security Review Checklist (All Passed)

| Check | Status |
|---|---|
| PKCE implementation | PASS |
| State parameter CSRF protection | PASS |
| State single-use (deleted after validation) | PASS |
| State expiration (10 min) | PASS |
| Code verifier not exposed in state | PASS |
| XSS prevention (HTML escaping) | PASS |
| Security headers on responses | PASS |
| Session isolation | PASS |
| No token logging | PASS |
| Error message sanitization | PASS |
| HTTPS requirement documented | PASS |
| Metadata cache TTL | PASS (30 min) |
| HTTP client timeout | PASS (30 sec) |
| Thread-safe stores | PASS |
| Background cleanup goroutines | PASS |
| Proper Stop() cleanup | PASS |
| Session ID truncation in logs | PASS |
| Token refresh monitoring | PASS |
| Rate limiting documented | PASS |

@teemow teemow requested a review from a team as a code owner December 20, 2025 14:55
…ents

- Remove duplicate AuthChallenge type from agent, use api.AuthChallenge instead
- Add mutex protection to metadataCache in oauth/client.go for thread safety
- Remove redundant pendingFlows from Handler and pendingVerifiers from Manager
- Store issuer and code verifier with OAuth state to fix empty issuer bug
- Simplify callback handling by consolidating state management

This commit adds unit tests for the internal/oauth package, covering:

- token_store_test.go: Tests for token storage, retrieval, expiration,
  SSO-based lookup by issuer, deletion, and cleanup
- state_store_test.go: Tests for OAuth state generation, validation,
  CSRF protection, code verifier security, and state expiration
- www_authenticate_test.go: Tests for parsing WWW-Authenticate headers,
  identifying OAuth challenges, and extracting issuer information
- client_test.go: Tests for redirect URI generation, PKCE generation,
  token storage/retrieval, and OAuth metadata fetching with caching
- handler_test.go: Tests for OAuth callback handling, error cases,
  success/error page rendering, and missing parameter validation
- manager_test.go: Tests for manager lifecycle, server registration,
  nil-safety, and configuration handling

All tests pass with race detection enabled.

Security improvements to the OAuth proxy implementation:

1. XSS Prevention: Added html.EscapeString() to sanitize serverName and
   message parameters before embedding them in HTML pages (success/error).

2. Information Disclosure: Sanitized error messages from token exchange
   and refresh operations to avoid exposing sensitive response bodies.
   Full errors are logged at debug level for troubleshooting.

3. Metadata Cache TTL: Added 1-hour TTL to OAuth metadata cache to ensure
   stale metadata is periodically refreshed from issuers.

…shake auth

This implements the Synthetic Tool Placeholder pattern to solve the
chicken-and-egg problem where OAuth authentication is required before
the MCP protocol handshake can complete.

Changes:
- Add AuthRequiredError type in aggregator/types.go and mcpserver/types.go
- Add ServerStatus enum with StatusAuthRequired for tracking auth state
- Detect 401 errors in StreamableHTTPClient and SSEClient Initialize()
- Parse WWW-Authenticate headers to extract OAuth information
- Add RegisterPendingAuth to registry for servers requiring auth
- Create synthetic authenticate_<server> tools for auth-required servers
- Handle synthetic auth tool calls by creating OAuth challenges
- Add UpgradeToConnected to registry for post-auth upgrade
- Update addNewItems and collectItemsFromServers to include auth tools
- Add MCPServerAuthRequired event reason and template

The flow:
1. Server returns 401 during Initialize() -> AuthRequiredError
2. Server registered in StatusAuthRequired with synthetic auth tool
3. User calls authenticate_<server> tool
4. Tool creates OAuth challenge with sign-in link
5. User authenticates in browser -> token stored
6. User retries -> server upgraded to connected status

Addresses issue #144 comment about OAuth before MCP handshake

- Extracted checkForAuthRequiredError and parseAuthInfoFromError from
  SSEClient and StreamableHTTPClient to shared functions in mcpserver/types.go
- Replaced duplicate AuthInfo/AuthRequiredError types in aggregator/types.go
  with type alias to mcpserver.AuthInfo
- Reduces code duplication by ~90 lines while maintaining functionality

This follows DRY/KISS principles by centralizing the auth error detection
logic that was previously duplicated in both MCP client implementations.

- Extract session ID from MCP ClientSession context (fixes shared session issue)
- Add security headers to OAuth HTML responses (X-Frame-Options, CSP, etc.)
- Reduce metadata cache TTL from 1h to 30min for faster key rotation updates
- Add comprehensive tests for security headers

…tion

- Add comprehensive TestTokenStore_SessionIsolation test to verify users cannot access each other's tokens
- Improve getSessionIDFromContext documentation with security implications
- Add debug logging when falling back to default session (stdio mode only)
- The mcp-go library provides unique UUID session IDs for SSE and Streamable HTTP connections

- Add muster.oauth section to values.yaml with enabled, publicUrl, clientId, callbackPath options
- Update values.schema.json with OAuth schema validation
- Update deployment.yaml to pass OAuth flags when enabled
- Update README.md with OAuth configuration documentation

- Add RegisterServerPendingAuth to AggregatorHandler interface
- Orchestrator now catches AuthRequiredError and registers servers in pending auth state
- Event handler skips deregistration for servers in auth_required state
- Fix nil pointer in registry Deregister when client is nil (auth_required servers)
- Add isServerAuthRequired callback to event handler
- Update api_adapter to implement RegisterServerPendingAuth
- Add discoverAuthorizationServer() to fetch issuer from /.well-known/oauth-protected-resource
- Fix URL display bug in agent logger when text contains % (URL-encoded chars)
- Print strings literally when no format args are provided
- Extract tokenToAPIToken helper to remove duplicate Token -> OAuthToken conversion
- Define tokenExpiryMargin constant to eliminate magic numbers
- Use embed package for HTML templates (success.html, error.html) for better maintainability
- Use singleflight.Group to prevent concurrent metadata fetches for same issuer
- Sanitize OAuth error messages to prevent leaking sensitive information
- Simplify StateStore.GenerateState to return only encodedState (nonce is embedded)
- Update tests for new API signatures and security improvements
- Document CIMD redirect URI limitation in Helm chart README
- Add security notes to values.yaml about TLS requirements
- Add warning log for default session (stdio) token storage
- Add TLS assumption comment to metadata cache
- Enhance doc.go with comprehensive security documentation
  - Token storage (in-memory only, no persistence)
  - Session isolation details
  - TLS requirements for production

Muster can now serve its own Client ID Metadata Document (CIMD) at
/.well-known/oauth-client.json, eliminating the need for external
static file hosting.

Changes:
- Add CIMD serving endpoint in OAuth handler
- Auto-derive clientId from publicUrl when not explicitly set
- Mount CIMD endpoint in aggregator HTTP mux when self-hosting
- Add OAuthConfig methods: GetEffectiveClientID, ShouldServeCIMD,
  GetCIMDPath, GetRedirectURI
- Update helm values with new oauth.cimdPath option and docs
- Add comprehensive tests for new functionality

Users can now simply set oauth.publicUrl and muster will auto-generate
and serve the CIMD with correct redirect_uris for their deployment.

- Increase test coverage for OAuth package from 53% to 82%+
- Add tests for api_adapter, manager, client, stores
- Fix DRY violation: use strings.TrimSuffix consistently in config/types.go
- Extract HTTP timeout to named constant (httpClientTimeout)
- Extract software version to constant for CIMD handler
- Fix unused variable assignment in server.go GetToolsWithStatus
- Remove sleep in OnToolsUpdated, use goroutine scheduling instead
- Document goroutine lifecycle requirements in TokenStore/StateStore

Addresses code review recommendations.

- Add session ID truncation in logs to prevent full session IDs from
  appearing in debug logs (first 8 chars + ...)
- Upgrade token refresh logging to INFO level for operations monitoring
  with duration metrics for performance tracking
- Add comprehensive TLS/HTTPS requirements documentation in doc.go
- Add rate limiting recommendations for OAuth callback endpoint in
  doc.go and Helm values/README
- Document logging security practices (token truncation, no access
  tokens logged)
- Add self-hosted CIMD documentation in Helm README
@teemow teemow merged commit 1651341 into main Dec 21, 2025
5 checks passed
@teemow teemow deleted the feat/issue-144-oauth-proxy branch December 21, 2025 15:16