RFC: Graceful Degradation for Multi-Upstream OAuth in vMCP #72
Merged
jhrozek
reviewed
May 14, 2026
Fold the partial-auth toggle into a single per-upstream `optional` flag (default false, preserves today's all-or-nothing default), reuse the existing `UpstreamTokens` row with `Skipped`/`SkippedReason` fields instead of a parallel storage interface, drop the invented restart-all branch (every `/authorize` already mints a fresh SessionID), and move backend filtering from list-method time to session-creation time — vMCP freezes the capability snapshot at MCP initialize, so re-auth recovery requires a new MCP session via reconnect with the new JWT. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ions Move the per-session filter into the pre-init backend filter loop of makeBaseSession so unauthorized backends never run initOneBackend at all (no connection, handshake, or capability listing), rather than filtering post-aggregation. Drop the invented dispatcher gate — the existing strategy ErrUpstreamTokenNotFound path already covers out-of-band revocation. Remove the standalone refresh-token-expiry and fresh-MCP-session subsections (covered concisely elsewhere or pre-existing behavior). Fold Phase 2's re-authorization pinning tests into Phase 1 and renumber subsequent phases. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds an RFC proposing opt-in partial-completion semantics for the
embedded authorization server's multi-upstream OAuth chain in
VirtualMCPServer.
Related issue: stacklok/toolhive#5162
Why
Today the embedded auth server's multi-upstream chain is all-or-nothing:
one upstream IdP outage or one declined consent screen invalidates every
collected token and locks the user out of every backend on the vMCP —
including backends that have nothing to do with the failed upstream.
Operators aggregating backends across multiple SaaS IdPs (e.g. github +
slack + google) see the whole vMCP appear down for what should be a
single-provider problem.
Key design decisions
- `optional` flag (default `false`) on `UpstreamProviderConfig`. Marking any provider `optional: true` opts the deployment into partial completion for that provider; leaving every provider at the default reproduces today's all-or-nothing behavior exactly. No top-level mode toggle.
- The primary upstream, named by `authzConfig.inline.primaryUpstreamProvider`, is always required; the admission webhook rejects configs that set `optional: true` on it.
- The chain completes once every required upstream has a token; optional upstreams that error or are declined are recorded as session-scoped skip rows.
- Skip state reuses the existing `UpstreamTokens` storage row. Two new scalar fields (`Skipped bool`, `SkippedReason string`) are added to the existing row; the `UpstreamTokenStorage` interface is unchanged. The existing `InProcessService.GetAllValidTokens` and `GetValidToken` filter skip rows out of the live-token map, so `identity.UpstreamTokens` consumers behave exactly as today when no rows are skipped.
- No `/authorize` restart-all branch. The existing handler already mints a fresh `SessionID` per call, so re-running `/authorize` is structurally a new session; prior session rows age out under their existing TTL. Per-upstream retry is explicitly rejected: identity-binding hazards outweigh the round-trip savings.
- vMCP freezes the per-session capability set at MCP `initialize` time, so the filter hooks into the pre-init backend filter loop of `makeBaseSession`: backends whose required upstream is missing from `identity.UpstreamTokens` never run `initOneBackend` at all (no HTTP connection, no handshake, no capability listing, no entry into the routing table or per-session SDK tool store).
- No dispatcher gate. Out-of-band revocation between `initialize` and call is already caught by the existing per-request `identity.UpstreamTokens` re-hydration plus each backend auth strategy's `ErrUpstreamTokenNotFound` path.
- Re-authorization requires a new MCP session. Re-running `/authorize` produces a new JWT with a new `tsid`; reconnecting to vMCP with the new bearer triggers a fresh `initialize` and re-aggregates capabilities.

Out of scope
Per-upstream retry, MCP-protocol-level signaling of filtered state,
silent backend dropping on RT expiry, proactive token introspection,
dynamic upstream addition.
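
The key design decisions can be illustrated with a minimal Go sketch. It is not the RFC's implementation: `ProviderConfig`, `Backend`, `validateProviders`, `liveTokens`, and `filterBackends` are hypothetical names, the `UpstreamTokens` struct is a trimmed stand-in that keeps only the two new fields named above, and the `consent_declined` reason string is an invented example value. The sketch assumes each backend depends on at most one upstream.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// UpstreamTokens is a simplified stand-in for the storage row. Only the
// Skipped and SkippedReason fields come from the PR description.
type UpstreamTokens struct {
	Provider      string
	AccessToken   string
	Skipped       bool
	SkippedReason string
}

// ProviderConfig is a hypothetical, trimmed view of UpstreamProviderConfig.
type ProviderConfig struct {
	Name     string
	Optional bool // zero value false: the provider is required
}

// validateProviders mirrors the admission rule described above: the primary
// upstream may not be marked optional.
func validateProviders(primary string, providers []ProviderConfig) error {
	for _, p := range providers {
		if p.Name == primary && p.Optional {
			return errors.New("primary upstream provider cannot be optional")
		}
	}
	return nil
}

// liveTokens drops skip rows so consumers only ever see usable tokens.
func liveTokens(rows []UpstreamTokens) map[string]string {
	live := make(map[string]string)
	for _, r := range rows {
		if r.Skipped {
			continue
		}
		live[r.Provider] = r.AccessToken
	}
	return live
}

// Backend is a hypothetical descriptor. Upstream is empty when the backend
// needs no upstream token.
type Backend struct {
	Name     string
	Upstream string
}

// filterBackends keeps only backends whose upstream has a live token, which
// is the point where a real implementation would skip initialization.
func filterBackends(backends []Backend, live map[string]string) []string {
	var kept []string
	for _, b := range backends {
		if b.Upstream != "" {
			if _, ok := live[b.Upstream]; !ok {
				continue
			}
		}
		kept = append(kept, b.Name)
	}
	sort.Strings(kept)
	return kept
}

func main() {
	rows := []UpstreamTokens{
		{Provider: "github", AccessToken: "gh-token"},
		{Provider: "slack", Skipped: true, SkippedReason: "consent_declined"},
	}
	backends := []Backend{
		{Name: "github-mcp", Upstream: "github"},
		{Name: "slack-mcp", Upstream: "slack"},
		{Name: "local-mcp"},
	}
	fmt.Println(filterBackends(backends, liveTokens(rows)))
}
```

With a skipped `slack` row, only the backends that depend on `github` or on no upstream survive the filter; the Slack-backed backend is excluded before any connection would be attempted.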
Notes for reviewers
- … and filtering can land and be tested before the CRD surface solidifies.
- … metric cardinality, status conditions) were resolved during drafting; see the body for the decisions.
- … source line numbers); file-path references are plain text so they remain readable when the RFC is moved to its destination repo.
🤖 Generated with Claude Code