Allow operators to inject baseline scopes into DCR registrations#5233
Conversation
d79a2dd to
81deab2
Compare
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #5233 +/- ##
==========================================
+ Coverage 68.02% 68.06% +0.04%
==========================================
Files 616 617 +1
Lines 63005 63067 +62
==========================================
+ Hits 42857 42926 +69
+ Misses 16945 16938 -7
Partials 3203 3203 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
tgrunnagle
left a comment
There was a problem hiding this comment.
Multi-agent review summary
Five specialist reviewers (OAuth/security, Kubernetes operator/CRD, Go correctness, test coverage, general code quality) plus a consensus-scoring pass. Codex cross-review was skipped because the codex CLI isn't installed locally.
Overall: COMMENT — no blockers, nice piece of work. The defense-in-depth design, the three-layer validation, and the explicit regression test for anthropics/claude-code#4540 are all strong. The notes below cluster into trust-boundary documentation, observability, defense-in-depth completeness, CRD validation hardening, and operator UX — each addressable without rework.
Consensus summary
| # | Finding | Severity | Score |
|---|---|---|---|
| F1 | Operator-configured baseline silently widens every DCR client's registered scope set — trust boundary only weakly documented | MEDIUM | 8/10 |
| F2 | WARN log includes attacker-controlled client_name (log-injection risk) |
LOW | 7/10 |
| F3 | WARN log fires per registration when expansion is the documented happy path | LOW | 7/10 |
| F4 | BaselineClientScopes slice flows by reference through all layers — copy-before-mutate rule |
LOW | 7/10 |
| F5 | No direct unit test for the third defense-in-depth gate (validateParams) |
MEDIUM | 8/10 |
| F6 | Negative integration subtest's 400 branch doesn't assert invalid_scope |
MEDIUM | 7/10 |
| F7 | No operator status condition for baseline misconfig — surfaces only as CrashLoopBackOff | MEDIUM | 7/10 |
| F8 | CRD field has no items:MinLength=1 — empty/whitespace entries pass admission |
LOW | 8/10 |
| F9 | No user-facing docs / Helm examples for the new operator knob | MEDIUM | 7/10 |
| F10 | ValidateScopeSubset doc enumerates 2 of 3 defense-in-depth gates |
LOW | 7/10 |
| F11 | RegisterClientHandler response comment can be misread as implying re-validation of unioned set |
LOW | 7/10 |
| F12 | Config.Validate() (sibling helper at line 472) doesn't run the subset check — three-layer claim slightly overstated |
LOW | 7/10 |
Detailed findings are inline. The dropped findings (consensus < 7, mostly stylistic / single-agent informational) are listed in the run log and not surfaced here to keep the review focused.
Holistic assessment
- Backwards compatibility: safe. Empty/nil
BaselineClientScopesis bit-exact no-op vialen(...) > 0guard, and the field is+optionalwithomitempty. - Plumbing symmetry: the CRD →
controllerutil→RunConfig→Config→AuthorizationServerParams→AuthorizationServerConfigflow exactly mirrorsScopesSupportedandAllowedAudiences. Easy to follow. - CRD generation: all four chart copies (
files/×2,templates/×2) anddocs/operator/crd-api.mdare consistent; deepcopy is regenerated. - Integration regression test: excellent — explicit reference to
anthropics/claude-code#4540and end-to-end coverage of the bug-trigger pattern. The negative subtest catches the obvious privilege-escalation regression; tightening per F6 closes the last gap. - Test coverage: thorough for
unionScopes,ValidateScopeSubset,RunConfig.Validate,Config.applyDefaults, and the DCR handler path. One gate (validateParams) lacks a direct test — see F5. - Open design question (non-blocking): this baseline mechanism applies globally to every DCR registration on the AS instance. CRD docs warn operators to keep it narrow but the implementation has no per-client filter (e.g. by
software_id). F1 flags this as documentation, not code.
Review consulted: OAuth/DCR/Security, K8s operator/CRD, Go correctness, test coverage, and general code quality reviewers.
🤖 Generated with Claude Code
There was a problem hiding this comment.
Large PR Detected
This PR exceeds 1000 lines of changes and requires justification before it can be reviewed.
How to unblock this PR:
Add a section to your PR description with the following format:
## Large PR Justification
[Explain why this PR must be large, such as:]
- Generated code that cannot be split
- Large refactoring that must be atomic
- Multiple related changes that would break if separated
- Migration or data transformationAlternative:
Consider splitting this PR into smaller, focused changes (< 1000 lines each) for easier review and reduced risk.
See our Contributing Guidelines for more details.
This review will be automatically dismissed once you add the justification section.
Large PR justification has been provided. Thank you!
|
✅ Large PR justification has been provided. The size review has been dismissed and this PR can now proceed with normal review. |
|
I think this needs a small fix before merging if we plan to release The new baselineClientScopes:
- offline_accessfail startup with Suggested fix: validate against the effective/defaulted scope set, or have the operator populate default |
Some DCR clients narrow the scope field at /oauth/register but later request additional scopes at /oauth/authorize, getting rejected with invalid_scope. RFC 7591 §3.1.1 explicitly permits the AS to override the registered scope, so let operators declare a baseline set that the embedded auth server unions into every DCR registration. This commit only adds the CRD field. The plumbing through RunConfig, the runner, the server provider, and the DCR handler comes in subsequent commits. Refs #5224
Add the BaselineClientScopes field on the on-disk RunConfig and copy it from the CRD's EmbeddedAuthServerConfig in the operator-side builder. The runtime Config and the DCR handler are wired in subsequent commits; startup validation that the baseline is a subset of ScopesSupported lands with the next commit. Refs #5224
If an operator configures baseline_client_scopes with a value missing from scopes_supported, the embedded DCR handler would later register clients with a scope the server does not advertise, and fosite would reject those clients at /oauth/authorize with invalid_scope. Catching the misconfiguration at startup gives operators a clear error instead of debugging silent rejections in production. Add RunConfig.Validate() with a subset check, and call it from the runner entry point before any secret resolution or HTTP wiring. errors.Join wraps the (currently single) sub-check so future RunConfig invariants compose without dropping existing checks. Refs #5224
Add BaselineClientScopes to the runtime Config struct and copy it from RunConfig in the runner's resolvedCfg block. The DCR handler needs the baseline at request time, so it must travel through the runtime Config the same way ScopesSupported does. The field is populated but not yet consumed by any downstream code; the next commit plumbs it from Config into the AuthorizationServerConfig that the DCR handler reads. Refs #5224
Add BaselineClientScopes to AuthorizationServerParams and AuthorizationServerConfig and copy it through Config -> Params -> ASConfig so the DCR handler can read it via h.config in the next commit. Defense-in-depth: validateParams re-checks the baseline-subset-of- supported invariant. RunConfig.Validate already enforces it for the runner-driven path, but a direct caller of authserver.New(Config) bypasses that check. Validating again at NewAuthorizationServerConfig catches the misconfiguration at the deepest layer where both slices are simultaneously available. Refs #5224
The DCR registration handler will use this in the next commit to union an operator-configured scope baseline into every newly registered client's scope set. Pulling the union into a separate helper keeps the handler readable and lets the order/dedup contract be unit-tested in isolation. Order matters: requested scopes appear first (in input order), then baseline scopes not already present. The handler uses slices.Equal(union, requested) to decide whether to log a WARN, so stable order is part of the contract. Refs #5224
This is the user-visible behavior change: between scope validation and client construction, the DCR handler now unions the operator-configured BaselineClientScopes into the registered client's scope set. RFC 7591 §3.2.1 permits the AS to replace requested client metadata, which is what fixes the Claude Code bug (anthropics/claude-code#4540): a client whose DCR request narrows the scope field can still request the baseline at /oauth/authorize without invalid_scope rejection. WARN logs only when the union actually expands the requested set (slices.Equal short-circuits the no-op path). The DCR response at the bottom of the handler already echoes the effective scope set back to the client, so clients see exactly what they got. Refs #5224
Cover the four contract paths in TestRegisterClientHandler_- BaselineClientScopes: empty client scope unions with baseline, baseline-as-subset of requested produces no expansion, partial overlap appends only non-overlapping scopes, disjoint baseline expands the registered set, and nil baseline preserves existing behavior. DoAndReturn captures the fosite client passed to RegisterClient so the test asserts not only that the DCR response echoes the expected scope set but also that the same set actually reaches storage. The disjoint case is the canonical regression test for anthropics/claude-code#4540. Refs #5224
TestRunConfigValidate covers seven contract paths: nil and empty baselines pass with non-empty scopes_supported, single and multi-element baseline subsets pass, a baseline scope missing from scopes_supported is reported with both the offending scope name and the "not in scopes_supported" phrase, a non-nil baseline with nil scopes_supported fails closed, and the helper reports a missing scope when multiple are missing. Refs #5224
This is the canonical regression test for anthropics/claude-code#4540 (Claude Code DCR-registers with a narrowed scope but later requests the full set at /oauth/authorize). Three sequential subtests on a shared client_id: 1. DCR-register with scope="openid" against a server configured with BaselineClientScopes=["offline_access"]; assert the response's scope field echoes "openid offline_access". 2. /authorize with scope="openid offline_access" against the same client; assert HTTP 302 redirect to upstream (not 400 invalid_scope, the pre-fix behavior). 3. /authorize with a scope outside scopes_supported; assert fosite rejects with invalid_scope. Guards against silent privilege escalation if BaselineClientScopes ever drifts to bypass the ScopesSupported boundary. WithBaselineClientScopes is added to the test helpers alongside the existing WithScopesSupported option. Refs #5224
The new BaselineClientScopes field on authserver.RunConfig is referenced (transitively) by the API's swagger surface, so the CI docs check requires the regenerated swagger.json/yaml/docs.go. Mechanical regen via task docs; no behavioral or hand-written change. Refs #5224
Trey's review (F1) noted the trust boundary is only weakly defended in docs. Two changes address this: The CRD field doc gains an explicit privilege-escalation example so operators writing YAML see the warning, not just someone reading the handler source. A baseline of "admin:read" is now called out by name as something they should never configure. NewAuthorizationServerConfig emits a one-time INFO log when the baseline is non-empty, so operators can verify in their logs what their DCR registrations will be auto-granted. This matches the "INFO sparingly — only for long-running operations" rule for a process-lifetime event. Refs #5224
Per .claude/rules/go-style.md, Warn is for non-fatal issues (deprecations, fallback, cleanup failures), and Info is for long-running events. Baseline-driven expansion of a DCR scope set is the intended behavior whenever baseline_client_scopes is configured, not a problem to flag — so per-registration audit belongs at Debug. Operator-visible signal that the baseline is in effect already comes from the one-time Info log at NewAuthorizationServerConfig startup. Refs #5224
The DCR handler reads BaselineClientScopes, ScopesSupported, and AllowedAudiences via h.config on every request. They all flow by reference from cfg through RunConfig → Config → AuthorizationServerParams → AuthorizationServerConfig. Nothing mutates them today, but the runner is the natural boundary at which operator-supplied input becomes runtime state, and slices.Clone costs nothing for the bounded slice sizes involved. Cloning once here lets every downstream stage share by reference safely against a future hot-reload path or test code that retains the original. Refs #5224
NewAuthorizationServerConfig has its own subset check (validateParams → ValidateScopeSubset) so a direct caller that bypasses RunConfig.Validate is still rejected when BaselineClientScopes contains a scope absent from ScopesSupported. The case lived in design only — adding it to the existing TestNewAuthorizationServerConfig_InvalidConfig table protects the gate against silent regression in a future refactor. Refs #5224
Symmetric to the redirect-with-error branch, which already asserts on \`invalid_scope\` in the query string: if fosite ever returns 400 directly (rather than redirecting with error), the test must still assert the error code is invalid_scope and not something else (server_error, unauthorized_client, etc.). Otherwise a future fosite upgrade could swap in a different code and the privilege-escalation regression would silently slip past this test. Reads the body up front so the existing structure works without teeing around a deferred drain. Refs #5224
A baselineClientScopes value of [""] or ["foo bar"] would slip past admission and only fail at the runtime subset check inside NewEmbeddedAuthServer — confusing error, late feedback. Two kubebuilder item-level validations move the rejection forward to the apiserver: - items:MinLength=1 — no empty strings - items:Pattern=^[\x21\x23-\x5B\x5D-\x7E]+$ — the RFC 6749 §3.3 scope-token charset (printable ASCII excluding space, ", and \) Operators now get a clean "spec is invalid" from the apiserver instead of a CrashLoopBackOff with a subset-check log line. Refs #5224
The previous doc enumerated two call sites and missed a third (Config.applyDefaults's empty-supported-with-non-empty-baseline check, which is a related invariant even though it doesn't go through this helper). Per .claude/rules/go-style.md "Keep Comments Synchronized With Code", call-site lists rot when call sites move or new ones appear. Describing what the function does and why callers share it is enough for the reader. Refs #5224
The previous wording "validated against ScopesSupported and, if configured, unioned with BaselineClientScopes" could be read as "the unioned set is validated against ScopesSupported." It isn't — ValidateScopes runs against the client-supplied set BEFORE the union, and the union is trusted by construction because BaselineClientScopes is itself a subset of ScopesSupported per startup-time validation. Rephrased to make the ordering and the "not re-validated here" intent explicit. Refs #5224
A caller that constructs the runtime Config directly (bypassing YAML loading and therefore bypassing RunConfig.Validate) was only caught by the inner validateParams in the provider layer. That works but gives a deeper call stack at the failure site. Adding the check to Config.Validate closes the gap and makes the three-gate defense in depth (RunConfig.Validate → Config.Validate → validateParams) fully symmetric on the subset invariant. New TestConfigValidate case asserts the wired-up check fires with the same error message that the other gates produce. Refs #5224
The natural first-use config
baselineClientScopes:
- offline_access
was failing at startup because RunConfig.Validate (and Config.Validate
after F12) checked baselineClientScopes against the raw scopesSupported
slice, which the operator passes as nil to mean "use server defaults"
(documented at controllerutil/authserver.go:461 and
vmcpconfig/converter.go:404-415 deriveScopesSupported). applyDefaults
only substitutes registration.DefaultScopes later, after validation
has already rejected the config.
Both validation gates now treat empty scopesSupported as equivalent to
registration.DefaultScopes for the subset check — without mutating the
struct, so wire-format stays faithful to operator input. The
applyDefaults inner guard that rejected this case is now dead and
removed; defaulting symmetry with HMACSecrets and KeyProvider is
restored. Doc comments on both RunConfig.BaselineClientScopes and
Config.BaselineClientScopes describe the new contract.
Tests flip the previously-rejecting fixtures to the documented happy
path and add coverage for the still-rejecting case (nil supported with
a baseline scope outside DefaultScopes).
Refs #5224
790f9e7 to
84a4eec
Compare
|
Good catch, fixed in b1d7ca33e. Both validate gates now treat an empty |
The previous commit reworded RunConfig.BaselineClientScopes' Go doc
comment to describe the new "nil scopesSupported uses DefaultScopes
for the subset check" contract. swag init renders that doc comment
into docs/server/swagger.{json,yaml,docs.go}, so the verify-swagger
CI check fails until the artifacts are regenerated to match.
Mechanical regen via swag init; no behavioral or hand-written change.
Refs #5224
Summary
Some MCP clients (notably Claude Code) DCR-register with a narrowed
scopefield at/oauth/registerbut later request a wider set at/oauth/authorize. fosite rejects those requests withinvalid_scopebecause the registered client's scope set does not include the requested scopes. RFC 7591 §3.2.1 explicitly permits the AS to replace requested client metadata; the embedded auth server now uses that to expand the registered scope set.EmbeddedAuthServerConfig.baselineClientScopesto the CRD — operators declare scopes that must be in every registered client's scope set.RunConfig→ runtimeConfig→AuthorizationServerConfigso the DCR handler can read it at request time.slices.Equal).baseline ⊆ scopesSupportedat three layers (CRD-loadedRunConfig,applyDefaults,validateParams) so misconfiguration fails loudly at startup.scopeParameter in Dynamic Client Registration and Authorization Requests anthropics/claude-code#4540) end-to-end.Refs #5224
Type of change
Test plan
task test)task test-e2e) — covered by the new integration test intest/integration/authserver/task lint-fix)Manual: end-to-end deploy still pending — opening as draft for that.
API Compatibility
v1beta1API, OR theapi-break-allowedlabel is applied and the migration guidance is described above.baselineClientScopesis a new optional field withomitemptyand a nil-safe deepcopy. Existing CRs without the field decode asnil; the validator early-returns and behavior is unchanged.Does this introduce a user-facing change?
Yes — operators of `MCPExternalAuthConfig` and `VirtualMCPServer` resources gain a new optional `spec.embedded.baselineClientScopes` field. When set, every DCR-registered client's scope set is expanded to include these scopes regardless of what the client requested at `/oauth/register`.
Special notes for reviewers
Large PR Justification
🤖 Generated with Claude Code