Skip to content

Allow operators to inject baseline scopes into DCR registrations#5233

Merged
jhrozek merged 22 commits into
mainfrom
worktree-as-dcr-scopes
May 12, 2026
Merged

Allow operators to inject baseline scopes into DCR registrations#5233
jhrozek merged 22 commits into
mainfrom
worktree-as-dcr-scopes

Conversation

@jhrozek
Copy link
Copy Markdown
Contributor

@jhrozek jhrozek commented May 8, 2026

Summary

Some MCP clients (notably Claude Code) DCR-register with a narrowed scope field at /oauth/register but later request a wider set at /oauth/authorize. fosite rejects those requests with invalid_scope because the registered client's scope set does not include the requested scopes. RFC 7591 §3.2.1 explicitly permits the AS to replace requested client metadata; the embedded auth server now uses that to expand the registered scope set.

  • Add EmbeddedAuthServerConfig.baselineClientScopes to the CRD — operators declare scopes that must be in every registered client's scope set.
  • Plumb the field through RunConfig → runtime ConfigAuthorizationServerConfig so the DCR handler can read it at request time.
  • DCR handler unions the baseline into each new registration; logs a WARN when expansion actually happens (no-op short-circuited via slices.Equal).
  • Validate baseline ⊆ scopesSupported at three layers (CRD-loaded RunConfig, applyDefaults, validateParams) so misconfiguration fails loudly at startup.
  • Integration regression test reproduces the Claude Code repro pattern ([BUG] Missing scope Parameter in Dynamic Client Registration and Authorization Requests anthropics/claude-code#4540) end-to-end.

Refs #5224

Type of change

  • New feature

Test plan

  • Unit tests (task test)
  • E2E tests (task test-e2e) — covered by the new integration test in test/integration/authserver/
  • Linting (task lint-fix)
  • Manual testing (describe below)

Manual: end-to-end deploy still pending — opening as draft for that.

API Compatibility

  • This PR does not break the v1beta1 API, OR the api-break-allowed label is applied and the migration guidance is described above.

baselineClientScopes is a new optional field with omitempty and a nil-safe deepcopy. Existing CRs without the field decode as nil; the validator early-returns and behavior is unchanged.

Does this introduce a user-facing change?

Yes — operators of `MCPExternalAuthConfig` and `VirtualMCPServer` resources gain a new optional `spec.embedded.baselineClientScopes` field. When set, every DCR-registered client's scope set is expanded to include these scopes regardless of what the client requested at `/oauth/register`.

Special notes for reviewers

  • Defense-in-depth validation runs at three layers; the wire-format-boundary check (`RunConfig.Validate`) is the earliest, the `Config.applyDefaults` guard catches direct callers, and `validateParams` is a final safety net.
  • The `unionScopes` helper preserves requested-scope order; the `slices.Equal` no-op detection in the handler depends on that invariant.
  • `MaxItems=10` on the CRD field bounds operator misuse; we considered an item-level `Pattern` marker and decided runtime validation suffices for v1.

Large PR Justification

  • the CRDs got regenerated

🤖 Generated with Claude Code

@jhrozek jhrozek marked this pull request as ready for review May 11, 2026 15:58
@jhrozek jhrozek force-pushed the worktree-as-dcr-scopes branch from d79a2dd to 81deab2 Compare May 11, 2026 16:19
@jhrozek jhrozek requested a review from tgrunnagle May 11, 2026 16:20
@github-actions github-actions Bot added the size/L Large PR: 600-999 lines changed label May 11, 2026
@codecov
Copy link
Copy Markdown

codecov Bot commented May 11, 2026

Codecov Report

❌ Patch coverage is 94.52055% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 68.06%. Comparing base (030594a) to head (29a3621).

Files with missing lines Patch % Lines
pkg/authserver/runner/embeddedauthserver.go 60.00% 1 Missing and 1 partial ⚠️
pkg/authserver/server/provider.go 75.00% 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5233      +/-   ##
==========================================
+ Coverage   68.02%   68.06%   +0.04%     
==========================================
  Files         616      617       +1     
  Lines       63005    63067      +62     
==========================================
+ Hits        42857    42926      +69     
+ Misses      16945    16938       -7     
  Partials     3203     3203              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Copy Markdown
Contributor

@tgrunnagle tgrunnagle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Multi-agent review summary

Five specialist reviewers (OAuth/security, Kubernetes operator/CRD, Go correctness, test coverage, general code quality) plus a consensus-scoring pass. Codex cross-review was skipped because the codex CLI isn't installed locally.

Overall: COMMENT — no blockers, nice piece of work. The defense-in-depth design, the three-layer validation, and the explicit regression test for anthropics/claude-code#4540 are all strong. The notes below cluster into trust-boundary documentation, observability, defense-in-depth completeness, CRD validation hardening, and operator UX — each addressable without rework.

Consensus summary

# Finding Severity Score
F1 Operator-configured baseline silently widens every DCR client's registered scope set — trust boundary only weakly documented MEDIUM 8/10
F2 WARN log includes attacker-controlled client_name (log-injection risk) LOW 7/10
F3 WARN log fires per registration when expansion is the documented happy path LOW 7/10
F4 BaselineClientScopes slice flows by reference through all layers — copy-before-mutate rule LOW 7/10
F5 No direct unit test for the third defense-in-depth gate (validateParams) MEDIUM 8/10
F6 Negative integration subtest's 400 branch doesn't assert invalid_scope MEDIUM 7/10
F7 No operator status condition for baseline misconfig — surfaces only as CrashLoopBackOff MEDIUM 7/10
F8 CRD field has no items:MinLength=1 — empty/whitespace entries pass admission LOW 8/10
F9 No user-facing docs / Helm examples for the new operator knob MEDIUM 7/10
F10 ValidateScopeSubset doc enumerates 2 of 3 defense-in-depth gates LOW 7/10
F11 RegisterClientHandler response comment can be misread as implying re-validation of unioned set LOW 7/10
F12 Config.Validate() (sibling helper at line 472) doesn't run the subset check — three-layer claim slightly overstated LOW 7/10

Detailed findings are inline. The dropped findings (consensus < 7, mostly stylistic / single-agent informational) are listed in the run log and not surfaced here to keep the review focused.

Holistic assessment

  • Backwards compatibility: safe. Empty/nil BaselineClientScopes is bit-exact no-op via len(...) > 0 guard, and the field is +optional with omitempty.
  • Plumbing symmetry: the CRD → controllerutilRunConfigConfigAuthorizationServerParamsAuthorizationServerConfig flow exactly mirrors ScopesSupported and AllowedAudiences. Easy to follow.
  • CRD generation: all four chart copies (files/×2, templates/×2) and docs/operator/crd-api.md are consistent; deepcopy is regenerated.
  • Integration regression test: excellent — explicit reference to anthropics/claude-code#4540 and end-to-end coverage of the bug-trigger pattern. The negative subtest catches the obvious privilege-escalation regression; tightening per F6 closes the last gap.
  • Test coverage: thorough for unionScopes, ValidateScopeSubset, RunConfig.Validate, Config.applyDefaults, and the DCR handler path. One gate (validateParams) lacks a direct test — see F5.
  • Open design question (non-blocking): this baseline mechanism applies globally to every DCR registration on the AS instance. CRD docs warn operators to keep it narrow but the implementation has no per-client filter (e.g. by software_id). F1 flags this as documentation, not code.

Review consulted: OAuth/DCR/Security, K8s operator/CRD, Go correctness, test coverage, and general code quality reviewers.

🤖 Generated with Claude Code

Comment thread pkg/authserver/server/handlers/dcr.go
Comment thread pkg/authserver/server/handlers/dcr.go
Comment thread pkg/authserver/server/handlers/dcr.go Outdated
Comment thread pkg/authserver/runner/embeddedauthserver.go Outdated
Comment thread pkg/authserver/server/provider.go
Comment thread cmd/thv-operator/api/v1beta1/mcpexternalauthconfig_types.go
Comment thread cmd/thv-operator/api/v1beta1/mcpexternalauthconfig_types.go
Comment thread pkg/authserver/server/registration/dcr.go Outdated
Comment thread pkg/authserver/server/handlers/dcr.go Outdated
Comment thread pkg/authserver/config.go
Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Large PR Detected

This PR exceeds 1000 lines of changes and requires justification before it can be reviewed.

How to unblock this PR:

Add a section to your PR description with the following format:

## Large PR Justification

[Explain why this PR must be large, such as:]
- Generated code that cannot be split
- Large refactoring that must be atomic
- Multiple related changes that would break if separated
- Migration or data transformation

Alternative:

Consider splitting this PR into smaller, focused changes (< 1000 lines each) for easier review and reduced risk.

See our Contributing Guidelines for more details.


This review will be automatically dismissed once you add the justification section.

@github-actions github-actions Bot added size/XL Extra large PR: 1000+ lines changed and removed size/XL Extra large PR: 1000+ lines changed labels May 11, 2026
@github-actions github-actions Bot dismissed their stale review May 11, 2026 22:10

Large PR justification has been provided. Thank you!

@github-actions
Copy link
Copy Markdown
Contributor

✅ Large PR justification has been provided. The size review has been dismissed and this PR can now proceed with normal review.

@rdimitrov
Copy link
Copy Markdown
Member

I think this needs a small fix before merging if we plan to release baselineClientScopes as supported behavior.

The new RunConfig.Validate() checks baseline_client_scopes against the raw ScopesSupported slice before defaults are applied. However, the operator paths still intentionally pass nil/empty scopes to mean “use the auth server defaults” (for example from deriveScopesSupported() / oidcConfig.Scopes). That makes a natural config like:

baselineClientScopes:
  - offline_access

fail startup with baseline_client_scopes contains "offline_access" which is not in scopes_supported when OIDC scopes are omitted, even though offline_access is included in registration.DefaultScopes.

Suggested fix: validate against the effective/defaulted scope set, or have the operator populate default ScopesSupported before building the RunConfig when baselineClientScopes is configured. Otherwise the first-use path for the new field is likely to fail unless users also explicitly configure all default scopes.

jhrozek added 9 commits May 12, 2026 11:37
Some DCR clients narrow the scope field at /oauth/register but later
request additional scopes at /oauth/authorize, getting rejected with
invalid_scope. RFC 7591 §3.1.1 explicitly permits the AS to override
the registered scope, so let operators declare a baseline set that the
embedded auth server unions into every DCR registration.

This commit only adds the CRD field. The plumbing through RunConfig,
the runner, the server provider, and the DCR handler comes in
subsequent commits.

Refs #5224
Add the BaselineClientScopes field on the on-disk RunConfig and copy
it from the CRD's EmbeddedAuthServerConfig in the operator-side
builder. The runtime Config and the DCR handler are wired in
subsequent commits; startup validation that the baseline is a subset
of ScopesSupported lands with the next commit.

Refs #5224
If an operator configures baseline_client_scopes with a value missing
from scopes_supported, the embedded DCR handler would later register
clients with a scope the server does not advertise, and fosite would
reject those clients at /oauth/authorize with invalid_scope. Catching
the misconfiguration at startup gives operators a clear error instead
of debugging silent rejections in production.

Add RunConfig.Validate() with a subset check, and call it from the
runner entry point before any secret resolution or HTTP wiring.
errors.Join wraps the (currently single) sub-check so future
RunConfig invariants compose without dropping existing checks.

Refs #5224
Add BaselineClientScopes to the runtime Config struct and copy it
from RunConfig in the runner's resolvedCfg block. The DCR handler
needs the baseline at request time, so it must travel through the
runtime Config the same way ScopesSupported does. The field is
populated but not yet consumed by any downstream code; the next
commit plumbs it from Config into the AuthorizationServerConfig that
the DCR handler reads.

Refs #5224
Add BaselineClientScopes to AuthorizationServerParams and
AuthorizationServerConfig and copy it through Config -> Params ->
ASConfig so the DCR handler can read it via h.config in the next
commit.

Defense-in-depth: validateParams re-checks the baseline-subset-of-
supported invariant. RunConfig.Validate already enforces it for
the runner-driven path, but a direct caller of authserver.New(Config)
bypasses that check. Validating again at NewAuthorizationServerConfig
catches the misconfiguration at the deepest layer where both
slices are simultaneously available.

Refs #5224
The DCR registration handler will use this in the next commit to
union an operator-configured scope baseline into every newly
registered client's scope set. Pulling the union into a separate
helper keeps the handler readable and lets the order/dedup contract
be unit-tested in isolation.

Order matters: requested scopes appear first (in input order), then
baseline scopes not already present. The handler uses
slices.Equal(union, requested) to decide whether to log a WARN, so
stable order is part of the contract.

Refs #5224
This is the user-visible behavior change: between scope validation
and client construction, the DCR handler now unions the
operator-configured BaselineClientScopes into the registered
client's scope set. RFC 7591 §3.2.1 permits the AS to replace
requested client metadata, which is what fixes the Claude Code
bug (anthropics/claude-code#4540): a client whose DCR request
narrows the scope field can still request the baseline at
/oauth/authorize without invalid_scope rejection.

WARN logs only when the union actually expands the requested set
(slices.Equal short-circuits the no-op path). The DCR response at
the bottom of the handler already echoes the effective scope set
back to the client, so clients see exactly what they got.

Refs #5224
Cover the four contract paths in TestRegisterClientHandler_-
BaselineClientScopes: empty client scope unions with baseline,
baseline-as-subset of requested produces no expansion, partial
overlap appends only non-overlapping scopes, disjoint baseline
expands the registered set, and nil baseline preserves existing
behavior.

DoAndReturn captures the fosite client passed to RegisterClient
so the test asserts not only that the DCR response echoes the
expected scope set but also that the same set actually reaches
storage. The disjoint case is the canonical regression test for
anthropics/claude-code#4540.

Refs #5224
TestRunConfigValidate covers seven contract paths: nil and empty
baselines pass with non-empty scopes_supported, single and
multi-element baseline subsets pass, a baseline scope missing from
scopes_supported is reported with both the offending scope name and
the "not in scopes_supported" phrase, a non-nil baseline with nil
scopes_supported fails closed, and the helper reports a missing
scope when multiple are missing.

Refs #5224
jhrozek added 12 commits May 12, 2026 11:37
This is the canonical regression test for anthropics/claude-code#4540
(Claude Code DCR-registers with a narrowed scope but later requests
the full set at /oauth/authorize). Three sequential subtests on a
shared client_id:

1. DCR-register with scope="openid" against a server configured with
   BaselineClientScopes=["offline_access"]; assert the response's
   scope field echoes "openid offline_access".

2. /authorize with scope="openid offline_access" against the same
   client; assert HTTP 302 redirect to upstream (not 400
   invalid_scope, the pre-fix behavior).

3. /authorize with a scope outside scopes_supported; assert fosite
   rejects with invalid_scope. Guards against silent privilege
   escalation if BaselineClientScopes ever drifts to bypass the
   ScopesSupported boundary.

WithBaselineClientScopes is added to the test helpers alongside the
existing WithScopesSupported option.

Refs #5224
The new BaselineClientScopes field on authserver.RunConfig is
referenced (transitively) by the API's swagger surface, so the CI
docs check requires the regenerated swagger.json/yaml/docs.go.
Mechanical regen via task docs; no behavioral or hand-written
change.

Refs #5224
Trey's review (F1) noted the trust boundary is only weakly defended
in docs. Two changes address this:

The CRD field doc gains an explicit privilege-escalation example so
operators writing YAML see the warning, not just someone reading
the handler source. A baseline of "admin:read" is now called out by
name as something they should never configure.

NewAuthorizationServerConfig emits a one-time INFO log when the
baseline is non-empty, so operators can verify in their logs what
their DCR registrations will be auto-granted. This matches the
"INFO sparingly — only for long-running operations" rule for a
process-lifetime event.

Refs #5224
Per .claude/rules/go-style.md, Warn is for non-fatal issues
(deprecations, fallback, cleanup failures), and Info is for
long-running events. Baseline-driven expansion of a DCR scope set
is the intended behavior whenever baseline_client_scopes is
configured, not a problem to flag — so per-registration audit
belongs at Debug. Operator-visible signal that the baseline is in
effect already comes from the one-time Info log at
NewAuthorizationServerConfig startup.

Refs #5224
The DCR handler reads BaselineClientScopes, ScopesSupported, and
AllowedAudiences via h.config on every request. They all flow by
reference from cfg through RunConfig → Config →
AuthorizationServerParams → AuthorizationServerConfig. Nothing
mutates them today, but the runner is the natural boundary at which
operator-supplied input becomes runtime state, and slices.Clone
costs nothing for the bounded slice sizes involved. Cloning once
here lets every downstream stage share by reference safely against
a future hot-reload path or test code that retains the original.

Refs #5224
NewAuthorizationServerConfig has its own subset check
(validateParams → ValidateScopeSubset) so a direct caller that
bypasses RunConfig.Validate is still rejected when
BaselineClientScopes contains a scope absent from ScopesSupported.
The case lived in design only — adding it to the existing
TestNewAuthorizationServerConfig_InvalidConfig table protects the
gate against silent regression in a future refactor.

Refs #5224
Symmetric to the redirect-with-error branch, which already asserts
on \`invalid_scope\` in the query string: if fosite ever returns 400
directly (rather than redirecting with error), the test must still
assert the error code is invalid_scope and not something else
(server_error, unauthorized_client, etc.). Otherwise a future fosite
upgrade could swap in a different code and the
privilege-escalation regression would silently slip past this test.

Reads the body up front so the existing structure works without
teeing around a deferred drain.

Refs #5224
A baselineClientScopes value of [""] or ["foo bar"] would slip past
admission and only fail at the runtime subset check inside
NewEmbeddedAuthServer — confusing error, late feedback. Two
kubebuilder item-level validations move the rejection forward to
the apiserver:

- items:MinLength=1 — no empty strings
- items:Pattern=^[\x21\x23-\x5B\x5D-\x7E]+$ — the RFC 6749 §3.3
  scope-token charset (printable ASCII excluding space, ", and \)

Operators now get a clean "spec is invalid" from the apiserver
instead of a CrashLoopBackOff with a subset-check log line.

Refs #5224
The previous doc enumerated two call sites and missed a third
(Config.applyDefaults's empty-supported-with-non-empty-baseline
check, which is a related invariant even though it doesn't go
through this helper). Per .claude/rules/go-style.md "Keep Comments
Synchronized With Code", call-site lists rot when call sites move
or new ones appear. Describing what the function does and why
callers share it is enough for the reader.

Refs #5224
The previous wording "validated against ScopesSupported and, if
configured, unioned with BaselineClientScopes" could be read as
"the unioned set is validated against ScopesSupported." It isn't —
ValidateScopes runs against the client-supplied set BEFORE the
union, and the union is trusted by construction because
BaselineClientScopes is itself a subset of ScopesSupported per
startup-time validation. Rephrased to make the ordering and the
"not re-validated here" intent explicit.

Refs #5224
A caller that constructs the runtime Config directly (bypassing YAML
loading and therefore bypassing RunConfig.Validate) was only caught
by the inner validateParams in the provider layer. That works but
gives a deeper call stack at the failure site. Adding the check to
Config.Validate closes the gap and makes the three-gate defense in
depth (RunConfig.Validate → Config.Validate → validateParams) fully
symmetric on the subset invariant.

New TestConfigValidate case asserts the wired-up check fires with
the same error message that the other gates produce.

Refs #5224
The natural first-use config

  baselineClientScopes:
    - offline_access

was failing at startup because RunConfig.Validate (and Config.Validate
after F12) checked baselineClientScopes against the raw scopesSupported
slice, which the operator passes as nil to mean "use server defaults"
(documented at controllerutil/authserver.go:461 and
vmcpconfig/converter.go:404-415 deriveScopesSupported). applyDefaults
only substitutes registration.DefaultScopes later, after validation
has already rejected the config.

Both validation gates now treat empty scopesSupported as equivalent to
registration.DefaultScopes for the subset check — without mutating the
struct, so wire-format stays faithful to operator input. The
applyDefaults inner guard that rejected this case is now dead and
removed; defaulting symmetry with HMACSecrets and KeyProvider is
restored. Doc comments on both RunConfig.BaselineClientScopes and
Config.BaselineClientScopes describe the new contract.

Tests flip the previously-rejecting fixtures to the documented happy
path and add coverage for the still-rejecting case (nil supported with
a baseline scope outside DefaultScopes).

Refs #5224
@jhrozek jhrozek force-pushed the worktree-as-dcr-scopes branch from 790f9e7 to 84a4eec Compare May 12, 2026 10:37
@jhrozek
Copy link
Copy Markdown
Contributor Author

jhrozek commented May 12, 2026

Good catch, fixed in b1d7ca33e. Both validate gates now treat an empty scopesSupported as the default scope set for the subset check (the same set applyDefaults would substitute), so the natural config you described just works. The redundant guard in applyDefaults is gone too. Updated the doc comments and tests to match.

@github-actions github-actions Bot added size/XL Extra large PR: 1000+ lines changed and removed size/XL Extra large PR: 1000+ lines changed labels May 12, 2026
The previous commit reworded RunConfig.BaselineClientScopes' Go doc
comment to describe the new "nil scopesSupported uses DefaultScopes
for the subset check" contract. swag init renders that doc comment
into docs/server/swagger.{json,yaml,docs.go}, so the verify-swagger
CI check fails until the artifacts are regenerated to match.

Mechanical regen via swag init; no behavioral or hand-written change.

Refs #5224
@github-actions github-actions Bot added size/XL Extra large PR: 1000+ lines changed and removed size/XL Extra large PR: 1000+ lines changed labels May 12, 2026
@jhrozek jhrozek merged commit 952c9ef into main May 12, 2026
133 of 138 checks passed
@jhrozek jhrozek deleted the worktree-as-dcr-scopes branch May 12, 2026 11:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size/XL Extra large PR: 1000+ lines changed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants