Skip to content

Diagnostics: Add INFO-level logging to SSO session init paths#522

Merged
teemow merged 3 commits intomainfrom
diagnostics/sso-session-init-logging
Mar 18, 2026
Merged

Diagnostics: Add INFO-level logging to SSO session init paths#522
teemow merged 3 commits intomainfrom
diagnostics/sso-session-init-logging

Conversation

@teemow
Copy link
Member

@teemow teemow commented Mar 18, 2026

Summary

Adds diagnostic logging and fixes for log noise from stale Valkey sessions after pod restarts.

Diagnostic Logging (original)

  • Adds INFO-level diagnostic logging to the three SSO session init paths (initSSOForSession, SessionCreationHandler, onAuthenticated) to help debug why certain clients (e.g. Claude) fail to trigger downstream SSO connections after OAuth login completes.
  • initSSOForSession: logs all inputs on entry and a filter breakdown (total servers, pending count, skip reasons).
  • SessionCreationHandler: logs when fired with userID, familyID, and ID token presence.
  • onAuthenticated: logs callback invocation with sessionID, userID, authAlive state, and ID token presence.

Fix A: Gate SSO init on valid ID token (closes #520)

  • In onAuthenticated, after authStore.Touch() returns false, checks whether the ID token is empty before calling initSSOForSession().
  • After a pod restart, Valkey may still have valid access tokens but no ID token in the OAuth store. Without this gate, downstream connections are established that immediately start spamming 403 errors.
  • Prevents both the local server 403 spam and the remote server 403 spam that never self-heals.

Fix B: Rate-limit headerFunc warning (closes #520)

  • Extracts the inline headerFunc closure from EstablishConnectionWithTokenForwarding into makeTokenForwardingHeaderFunc for testability.
  • Rate-limits the WARN log when the token lookup fails to at most once per minute per session+server pair. Subsequent failures within the window are logged at DEBUG.
  • When the token recovers after a failure period, logs an INFO message.

Refactor: Code review cleanups

  • Consistent defer in registry: Use defer r.nameMu.Unlock() uniformly in ExposedToolName/ExposedPromptName (was inconsistent with ExposedResourceURI).
  • Propagate AsBool errors: Valkey Touch/Exists/IsAuthenticated now return the error from AsBool() instead of silently returning (false, nil).
  • DRY up StoreToken logic: Extracted storeIDTokenForSSO helper to eliminate duplicate token-storing code between SetSessionCreationHandler and SetTokenRefreshHandler.
  • Single-responsibility createStores: Moved encryptor creation into createStores/storeBundle, removing the double type-assertion from NewAggregatorServer.
  • Remove dead code: Removed unused purgeServerNames method (defined but never called).
  • Idiomatic test assertions: Replaced assert.True(t, strings.Contains(...)) with assert.Contains/assert.NotContains for better failure messages.

Test plan

  • All unit tests pass (make test)
  • All 166 BDD scenarios pass (muster test --parallel 50)
  • TestOnAuthenticated_SkipsSSO_WhenNoIDToken verifies Fix A guard logic
  • TestHeaderFunc_RateLimitsWarning verifies Fix B rate-limiting and recovery logging
  • Deploy and verify new logs appear in muster pod output during login flows
  • Confirm filter breakdown log reveals which servers are skipped and why

Add diagnostic logging to three critical SSO code paths to help debug
why certain clients (e.g. Claude) fail to trigger SSO connections after
OAuth login completes. Logs inputs, filter breakdown, and callback state.

Made-with: Cursor
@teemow teemow requested a review from a team as a code owner March 18, 2026 08:32
teemow added 2 commits March 18, 2026 09:57
…gs (#520)

After a pod restart, stale Valkey sessions trigger massive log spam because
onAuthenticated fires for sessions without a usable ID token. This commit:

- Skips initSSOForSession when the ID token is empty (Fix A), preventing
  doomed downstream connections that spam 403 errors.
- Extracts headerFunc into makeTokenForwardingHeaderFunc and rate-limits
  the WARN log to once per minute per session+server pair (Fix B), logging
  subsequent failures at DEBUG instead.
- Adds unit tests for both fixes.

Made-with: Cursor
- Use defer consistently in ExposedToolName/ExposedPromptName
- Return errors from AsBool() in Valkey Touch/Exists instead of swallowing
- Extract storeIDTokenForSSO helper to DRY up session creation/refresh handlers
- Move encryptor creation into createStores for single-responsibility
- Remove unused purgeServerNames method
- Use assert.Contains/NotContains in headerFunc rate-limit tests

Made-with: Cursor
@teemow teemow merged commit ad21aeb into main Mar 18, 2026
5 of 6 checks passed
@teemow teemow deleted the diagnostics/sso-session-init-logging branch March 18, 2026 09:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Reduce log noise from stale Valkey sessions after pod restart

1 participant