Skip to content

Add authserver DCR credential store and resolver#5042

Merged
tgrunnagle merged 12 commits intomainfrom
issue_5038_dcr-2b
May 1, 2026
Merged

Add authserver DCR credential store and resolver#5042
tgrunnagle merged 12 commits intomainfrom
issue_5038_dcr-2b

Conversation

@tgrunnagle
Copy link
Copy Markdown
Contributor

@tgrunnagle tgrunnagle commented Apr 23, 2026

Summary

Upstream Identity Providers that support RFC 7591 Dynamic Client Registration can now be configured without a pre-provisioned client_id. At startup the authserver discovers the registration endpoint, registers itself once per (issuer, redirect_uri, scopes) tuple, and caches the resulting credentials for the process lifetime. This unblocks the remaining wiring + observability work in sub-issue C and removes the last manual provisioning step for DCR-capable upstreams (e.g., nanobot, Hydra).

  • WHY: Operators running the embedded authserver against a DCR-capable upstream were forced to pre-register a client manually and hand-roll the client_id/client_secret into the run config. Phase 2 of the DCR story (Authserver DCR integration (Phase 2, Steps 2a-2g) #4978) lets the authserver do this at startup, keyed canonically so a future Redis backend (Phase 3) can share the cache across replicas without redefining the key shape.
  • WHAT:
    • DCRCredentialStore interface + in-memory implementation keyed by (Issuer, RedirectURI, ScopesHash). ScopesHash is SHA-256 of the deduplicated + sorted scope list, so permuted or duplicated scope inputs share a digest. No TTL, no eviction — RFC 7591 registrations are long-lived and are only revoked by explicit RFC 7592 deregistration.
    • resolveDCRCredentials performs (1) cache lookup (short-circuits before any network I/O on hit), (2) authorization-server metadata discovery via FetchAuthorizationServerMetadata when DiscoveryURL is set, or the operator-supplied RegistrationEndpoint directly, (3) explicit rc.AuthorizationEndpoint / rc.TokenEndpoint override of discovered values, (4) redirect-URI derivation from the issuer origin + /oauth/callback with HTTPS enforcement outside loopback, (5) auth-method intersection against the preference order private_key_jwt > client_secret_basic > client_secret_post > none, (6) synthesis of {origin}/register when metadata omits registration_endpoint (nanobot/Hydra convention), (7) a single call to oauthproto.RegisterClientDynamically with the initial access token injected as Authorization: Bearer via a wrapping RoundTripper, and (8) DCRCredentialStore.Put before returning — per the write-durable-first rule in .claude/rules/go-style.md.
    • Full RFC 7591 + RFC 7592 response is captured on the DCRResolution (RegistrationAccessToken, RegistrationClientURI, TokenEndpointAuthMethod, CreatedAt) so Phase 3 has everything it needs for management operations.
    • DCRUpstreamConfig.DiscoveryURL is honoured exactly (no well-known-path fallback) via the new oauthproto.FetchAuthorizationServerMetadataFromURL, with the RFC 8414 §3.3 issuer-equality check preserved.

Closes #5038

Type of change

  • New feature

Test plan

  • Unit tests (task test)
  • Linting (task lint-fix)

New test coverage (pkg/authserver/runner/dcr_store_test.go, pkg/authserver/runner/dcr_test.go):

  • Put/Get round-trip; distinct DCRKey values don't collide; ScopesHash is stable across permuted and duplicated scope order.
  • Resolver end-to-end flow against an httptest.NewServer mounting AS metadata + DCR endpoint.
  • Cache-hit short-circuit: store pre-populated, then assert the counting server receives zero requests. The server is wired as the issuer so a regression that moved discovery before the cache lookup would fail.
  • Explicit AuthorizationEndpoint overrides discovered value.
  • Initial access token forwarded as Authorization: Bearer ….
  • client_secret_basic chosen over none when both are advertised.
  • Metadata missing registration_endpoint triggers the synthesized /register path.
  • Discovered-scopes fallback and empty-scope omitted-field branches.
  • grep -nE '(client_secret|registration_access_token|initial_access_token|refresh_token)' pkg/authserver/runner/dcr.go returns only struct-field / JSON-tag hits — no slog.* leakage.

API Compatibility

  • This PR does not break the v1beta1 API, OR the api-break-allowed label is applied and the migration guidance is described above.

No CRD changes. The new DCRConfig field on OAuth2UpstreamRunConfig is additive and gated by an exactly-one-of validator against ClientID.

Changes

File Change
pkg/authserver/runner/dcr_store.go New DCRCredentialStore interface, DCRKey, in-memory implementation, dcrStaleAgeThreshold constant.
pkg/authserver/runner/dcr_store_test.go Round-trip, collision, and scope-hash canonicalisation tests.
pkg/authserver/runner/dcr.go DCRResolution, needsDCR, applyResolution, scopesHash, auth-method intersection, resolveDCRCredentials.
pkg/authserver/runner/dcr_test.go Resolver flow, cache short-circuit, endpoint overrides, bearer token, auth-method selection, synthesized /register.
pkg/authserver/config.go DCRUpstreamConfig + DCRConfig field on OAuth2UpstreamRunConfig; Validate enforces exactly-one-of for ClientID vs DCRConfig and DiscoveryURL vs RegistrationEndpoint.
pkg/authserver/config_test.go Validation table tests.
pkg/oauthproto/discovery.go FetchAuthorizationServerMetadata (three-path RFC 8414 / OIDC Discovery fallback, bounded per-call timeout, issuer-equality check) and FetchAuthorizationServerMetadataFromURL (exact fetch, no fallback). ErrRegistrationEndpointMissing sentinel.
pkg/oauthproto/discovery_test.go Three-path fallback, issuer-equality, loopback HTTP carve-out, timeout-overrides-caller-context, invalid issuer rejection, body-cap verification.
pkg/oauthproto/dcr.go Moved from pkg/auth/oauth/dynamic_registration.go; RegisterClientDynamically now takes *http.Client directly (nil builds default); NewDefaultDCRClient exposed so callers inherit the canonical timeout policy.
pkg/oauthproto/{constants,errors,redirect,locality}.go Moved from pkg/oauth; IsLoopbackHost relocated here with pkg/networking.IsLocalhost kept as a thin wrapper.
pkg/oauth/* Deleted — orphaned after rename.
17 import sites Updated to pkg/oauthproto; 8 redundant aliases dropped.

Does this introduce a user-facing change?

Yes. Operators can now omit client_id on an OAuth2UpstreamRunConfig and instead set dcr_config with either a discovery_url or a direct registration_endpoint. The authserver will register itself with the upstream at startup and cache the result in memory for the process lifetime. The existing pre-provisioned-client_id path is unchanged.

Implementation plan

Approved implementation plan

Phase 2 of the DCR story (#4978), delivered across four commits on this branch:

  1. Phase 1 (3cb8de58) — Rename pkg/oauthpkg/oauthproto and consolidate DCR primitives into pkg/oauthproto/dcr.go. Relocate IsLocalhost logic into pkg/oauthproto/IsLoopbackHost, keep pkg/networking.IsLocalhost as a thin wrapper. RegisterClientDynamically takes *http.Client directly. Host NewDynamicClientRegistrationRequest in pkg/auth/discovery/dcr_request.go (sole caller; loopback redirect-URI assumption stays out of the protocol package).

  2. Step 2a/2b/2e (d578dfee) — Add FetchAuthorizationServerMetadata to pkg/oauthproto (three-path RFC 8414 / OIDC Discovery fallback, bounded per-call timeout, RFC 8414 §3.3 issuer equality check, ErrRegistrationEndpointMissing sentinel). Add DCRUpstreamConfig struct and DCRConfig field on OAuth2UpstreamRunConfig. Exactly-one-of validation for ClientID vs DCRConfig and DiscoveryURL vs RegistrationEndpoint. Purely additive: no behaviour change, no CRD changes.

  3. Step 2f/2c (e965b86d) — Add DCRCredentialStore interface + in-memory implementation keyed by (Issuer, RedirectURI, ScopesHash). Add resolveDCRCredentials: cache lookup, metadata discovery or direct registration endpoint, explicit endpoint override, auth-method intersection, synthesis of {origin}/register when metadata omits it, single call to oauthproto.RegisterClientDynamically with initial access token injected via a wrapping RoundTripper, Put before returning.

  4. Review iterations (abbc15ab + amendments in earlier commits) — Address HIGH / MEDIUM findings:

    • Honour DCRUpstreamConfig.DiscoveryURL exactly via FetchAuthorizationServerMetadataFromURL (no well-known-path fallback on an operator-supplied URL).
    • Rewrite cache-hit test to wire the counting server as the issuer and assert zero requests.
    • Expose NewDefaultDCRClient so newDCRHTTPClient inherits the canonical timeout.
    • scopesHash deduplicates via slices.Compact.
    • applyResolution doc comment clarifies it overwrites ClientID.
    • Direct-RegistrationEndpoint path documented as disabling server-capability negotiation.
    • io.ReadAll replaces hand-rolled body readers; response bodies drained before close in error paths.

Final review state: 0 CRITICAL / 0 HIGH / 0 MEDIUM.

Large PR Justification

  • The resolver core (~700 lines) is irreducible: RFC 7591's validate → cache → singleflight → discover → register → cache-write sequence is one logical change with no shippable subunit, so a minimal viable DCR resolver lands well over the 400-line guidance no matter how it's structured.
  • The original issue scoped this poorly: it framed "DCR credential store + resolver" as a single deliverable, where in hindsight three independent units (pkg/oauthproto FetchFromURL ~94 lines, DCRCredentialStore ~150 lines, run-config validator additions ~50 lines) had clean dependency seams and could have stacked ahead of the resolver core.
  • Review-driven additions inflated the bundle by ~250 lines (panic recovery, RFC 7591 expiry refetch, S256 PKCE gate, redirect refusal, URL validators, path-preservation fixes, endpoint-required-on-bypass) that each could have been small follow-up PRs against a merged resolver, but folded in here because they surfaced during this PR's review — splitting them off now would re-pay merge cost across the stack and lose the review trail, so the bundle stays as-is.

@tgrunnagle tgrunnagle changed the base branch from main to issue_5037_dcr-2a April 23, 2026 19:31
@github-actions github-actions Bot added the size/XL Extra large PR: 1000+ lines changed label Apr 23, 2026
Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Large PR Detected

This PR exceeds 1000 lines of changes and requires justification before it can be reviewed.

How to unblock this PR:

Add a section to your PR description with the following format:

## Large PR Justification

[Explain why this PR must be large, such as:]
- Generated code that cannot be split
- Large refactoring that must be atomic
- Multiple related changes that would break if separated
- Migration or data transformation

Alternative:

Consider splitting this PR into smaller, focused changes (< 1000 lines each) for easier review and reduced risk.

See our Contributing Guidelines for more details.


This review will be automatically dismissed once you add the justification section.

@github-actions github-actions Bot added size/XL Extra large PR: 1000+ lines changed and removed size/XL Extra large PR: 1000+ lines changed labels Apr 23, 2026
@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 23, 2026

Codecov Report

❌ Patch coverage is 89.57746% with 37 lines in your changes missing coverage. Please review.
✅ Project coverage is 67.51%. Comparing base (ffefa61) to head (5c1e606).
⚠️ Report is 14 commits behind head on main.

Files with missing lines Patch % Lines
pkg/authserver/runner/dcr.go 88.44% 18 Missing and 17 partials ⚠️
pkg/oauthproto/discovery.go 90.90% 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5042      +/-   ##
==========================================
+ Coverage   67.29%   67.51%   +0.21%     
==========================================
  Files         598      601       +3     
  Lines       60348    60969     +621     
==========================================
+ Hits        40612    41163     +551     
- Misses      16659    16693      +34     
- Partials     3077     3113      +36     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@github-actions github-actions Bot added size/XL Extra large PR: 1000+ lines changed and removed size/XL Extra large PR: 1000+ lines changed labels Apr 27, 2026
@github-actions github-actions Bot added size/XL Extra large PR: 1000+ lines changed and removed size/XL Extra large PR: 1000+ lines changed labels Apr 28, 2026
@github-actions github-actions Bot added size/XL Extra large PR: 1000+ lines changed and removed size/XL Extra large PR: 1000+ lines changed labels Apr 29, 2026
@tgrunnagle tgrunnagle force-pushed the issue_5037_dcr-2a branch 2 times, most recently from e6588ea to 7dc73f8 Compare April 29, 2026 16:13
@github-actions github-actions Bot added size/XL Extra large PR: 1000+ lines changed and removed size/XL Extra large PR: 1000+ lines changed labels Apr 29, 2026
Base automatically changed from issue_5037_dcr-2a to main April 29, 2026 18:33
Implements Phase 2 steps 2f/2c of the DCR story (#5038):

- DCRCredentialStore interface with in-memory implementation keyed by
  (Issuer, RedirectURI, ScopesHash). No TTL — RFC 7591 registrations
  are long-lived and only purged by explicit deregistration. The
  ScopesHash tuple is designed to compose a Redis key segment in
  Phase 3 without redefining the canonical form.
- resolveDCRCredentials performs cache lookup, authorization server
  metadata discovery (or direct registration endpoint), explicit
  endpoint override, auth-method intersection against the preference
  order private_key_jwt > client_secret_basic > client_secret_post >
  none, synthesis of {origin}/register when metadata omits
  registration_endpoint, and a single call to
  oauthproto.RegisterClientDynamically with the initial access token
  injected as Authorization: Bearer via a wrapping RoundTripper.
- Resolution stored before returning, per the write-durable-first
  rule in .claude/rules/go-style.md.

Address code review feedback on DCR resolver

Fixed issues from code review:

- HIGH: DCRUpstreamConfig.DiscoveryURL now honoured. Added
  oauthproto.FetchAuthorizationServerMetadataFromURL which fetches
  the operator-configured URL exactly (no well-known-path fallback)
  and enforces RFC 8414 §3.3 issuer equality. Updated field doc
  comments to match behaviour.
- HIGH: TestResolveDCRCredentials_CacheHitShortCircuits rewritten
  to wire the count-everything server as the issuer and assert
  totalRequests == 0, so a regression that moved discovery before
  the cache lookup would actually fail the test.
- MEDIUM: Added tests for discovered-scopes fallback and empty-scope
  omitted-field branches.
- MEDIUM: Exposed oauthproto.NewDefaultDCRClient so newDCRHTTPClient
  inherits the canonical DCR timeout policy rather than duplicating
  it — future timeout changes propagate automatically.
- MEDIUM: Replaced hand-rolled readAll / readAllResp helpers with
  io.ReadAll.
- MEDIUM: Documented that direct RegistrationEndpoint disables
  server-capability negotiation.
- MEDIUM: Documented that applyResolution always overwrites ClientID
  and explained the precondition enforcement.
- MEDIUM: scopesHash now deduplicates (slices.Compact) so scope sets
  that differ only by duplicated values share a canonical digest.
  Test subtests updated accordingly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions github-actions Bot added size/XL Extra large PR: 1000+ lines changed and removed size/XL Extra large PR: 1000+ lines changed labels Apr 29, 2026
@github-actions github-actions Bot added size/XL Extra large PR: 1000+ lines changed and removed size/XL Extra large PR: 1000+ lines changed labels Apr 29, 2026
Closes the 14 review threads from review #4199898531 that are scoped to
this PR; #3 and #5 are intentionally deferred to PR #5044 which is the
PR that consumes the ClientSecret plumbing and RedirectURI propagation.

Behaviour fixes:

- #2 synthesiseRegistrationEndpoint and resolveUpstreamRedirectURI now
  preserve the issuer's path component, so multi-tenant upstreams that
  ship DCR without advertising it (or this auth server when it itself
  has a tenant prefix in its issuer) keep their tenant prefix in the
  synthesised URL and the defaulted redirect URI.
- #7 resolveDCRCredentials wraps registration in a package-level
  singleflight.Group keyed by DCRKey so concurrent callers for the same
  upstream coalesce into a single RFC 7591 registration. The leader's
  closure rechecks the cache before any network I/O so followers that
  arrive just after a Put returns observe the fresh entry.
- #9 applyResolution writes ClientID only when the run-config copy
  leaves it empty, treating it symmetrically with the endpoint writes.
  This is defence-in-depth for callers that bypass the resolver's
  validateResolveInputs precondition check.
- #10 the cfg.RegistrationEndpoint short-circuit branch in
  resolveDCREndpoints validates URL well-formedness and HTTPS-or-
  loopback locally, so a non-HTTPS endpoint fails before
  performRegistration constructs a bearer-token transport for it.
- #13 endpointsFromMetadata validates discovered authorization_endpoint
  and token_endpoint before populating dcrEndpoints, so a self-
  consistent metadata document with http:// URLs cannot flow through
  to the auth-code or token-exchange path.

Doc / scope:

- #6 trust-assumption note added on DCRUpstreamConfig with a link to
  the follow-up SSRF-hardening issue (#5135) per the
  "Document Architectural Constraints" rule.
- #8 NewInMemoryDCRCredentialStore doc now explicitly calls out what
  the in-memory store does and does NOT solve (cross-replica sharing,
  durability, cross-process write coordination).

Test hygiene:

- #4 deleted the literal-equals-literal dcrStaleAgeThreshold test that
  added no meaningful coverage.
- #11 atomic.AddInt32 / LoadInt32 for wellKnownHits / customHits in
  TestFetchAuthorizationServerMetadataFromURL, matching the sibling
  pattern in dcr_test.go.
- #12 new failingDCRStore test double + two cases asserting the
  cache.Get / cache.Put wrap messages ("dcr: cache lookup", "dcr: cache
  put") that operators see when a future Redis backend errors.
- #14 renamed the TestNeedsDCR case from
  "client_id with dcr (invalid but returns false)" to
  "client_id wins over dcr_config (defensive AND semantic)" so the
  case label describes the intentional defensive behaviour rather
  than reading like a bug report.
- #15 removed a //nolint:lll directive that was suppressing a non-
  existent violation (the line is 108 chars; the limit is 130).
- #16 new TestInMemoryDCRCredentialStore_ConcurrentAccess fans out
  16 goroutines doing alternating Put/Get against overlapping and
  disjoint keys, with a fail-fast deadline, so go test -race actually
  exercises the sync.RWMutex guard.

New tests cover each behaviour change:

- TestResolveDCRCredentials_SingleflightCoalescesConcurrentCallers
  pins exactly-one-registration under N concurrent callers.
- TestSynthesiseRegistrationEndpoint_PreservesIssuerPath /
  TestResolveUpstreamRedirectURI_PreservesIssuerPath cover the path-
  preservation fix.
- TestApplyResolution_DoesNotOverwritePreProvisionedClientID covers
  the symmetric ClientID write.
- TestResolveDCREndpoints_DirectRegistrationEndpointValidated covers
  the local registration-endpoint guard.
- TestEndpointsFromMetadata_RejectsInsecureDiscoveredEndpoints covers
  the metadata authz/token endpoint guard.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@tgrunnagle tgrunnagle requested a review from amirejaz as a code owner April 29, 2026 21:00
@github-actions github-actions Bot added size/XL Extra large PR: 1000+ lines changed and removed size/XL Extra large PR: 1000+ lines changed labels Apr 29, 2026
@jhrozek
Copy link
Copy Markdown
Contributor

jhrozek commented Apr 30, 2026

While reviewing I noticed we're about to ship two DCR-client implementations: this resolver and the CLI flow in pkg/auth/discovery/discovery.go::PerformOAuthFlow (which already calls oauthproto.RegisterClientDynamically and persists creds via Config.CachedClientID/CachedClientSecretRef/CachedRegTokenRef). Same RFC 7591 role, different client profile (public-PKCE vs confidential) and concurrency model.

A shared pkg/auth/dcr package — stateful resolver, cache, singleflight, scope canonicalisation — consumed by both, with stateless RFC primitives lifted into pkg/oauthproto, would prevent the duplication calcifying. Worth filing a follow-up issue to track? Easier to migrate now than after another consumer lands.

Copy link
Copy Markdown
Contributor

@jhrozek jhrozek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inline review of the DCR resolver scaffolding — seven questions / suggestions across dcr.go, dcr_test.go, and dcr_store.go. Top-level architectural note posted separately.

Comment thread pkg/authserver/runner/dcr.go Outdated
Comment thread pkg/authserver/runner/dcr.go
Comment thread pkg/authserver/runner/dcr.go Outdated
Comment thread pkg/authserver/runner/dcr_test.go Outdated
Comment thread pkg/authserver/runner/dcr_test.go Outdated
Comment thread pkg/authserver/runner/dcr_store.go
Comment thread pkg/authserver/runner/dcr_store.go
RFC 7636 §4.2 / OAuth 2.1 require S256 PKCE for public clients. The
resolver previously accepted the "none" token-endpoint auth method
whenever an upstream advertised it, with no check against
code_challenge_methods_supported. Registering as "none" against an
upstream that only advertises "plain" — or omits the field entirely —
would either break at runtime (our auth-code path always sends
code_challenge_method=S256) or silently downgrade to a non-compliant
plain-PKCE flow.

- dcrEndpoints now carries codeChallengeMethodsSupported, populated
  from metadata.CodeChallengeMethodsSupported in endpointsFromMetadata.
- selectTokenEndpointAuthMethod takes the slice as a second argument
  and skips "none" when S256 is not advertised, falling through to
  the next less-preferred method. When "none" is the only mutually
  supported method and S256 is missing, the function returns an
  explicit error citing the spec so operators see a clear failure
  at boot rather than a silent runtime downgrade.
- TestResolveDCRCredentials_AuthMethodPreference's "falls back to
  none" case now also advertises S256 to reflect the compliant flow.
- New TestResolveDCRCredentials_RefusesNoneWithoutS256 pins the
  rejection in both shapes the upstream can take it (field omitted,
  field lists only plain).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor Author

@tgrunnagle tgrunnagle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Multi-Agent Consensus Review

Agents consulted: security, concurrency, test-coverage, error-handling/api-design, architecture, oauth-protocol, code-quality (general). Codex cross-review: skipped (codex CLI not installed).

Consensus Summary

# Finding Consensus Severity Action
F1 resolveDCRCredentials doc misidentifies the issuer parameter as the upstream's 9/10 HIGH Fix doc / consider rename
F2 Direct RegistrationEndpoint path: nothing forces caller to set AuthorizationEndpoint/TokenEndpoint 8/10 MEDIUM Tighten OAuth2UpstreamRunConfig.Validate
F3 endpointsFromMetadata derefs metadata without nil guard 7/10 LOW Add nil guard
F4 singleflight returns shared *DCRResolution to followers 7/10 LOW Document or copy
F5 Synthesised registration endpoint not re-validated for HTTPS-or-loopback 7/10 LOW Add validateUpstreamEndpointURL call
F6 chooseRegistrationScopes warning logs the wrong issuer (auth-server vs. upstream) 7/10 LOW Pass upstream issuer (related to F1)
F7 validateResolveInputs nil-rc / empty-issuer / nil-cache branches uncovered 7/10 LOW Add table rows
F8 synthesiseRegistrationEndpoint error paths uncovered 7/10 LOW Add error rows
F9 deriveExpectedIssuerFromDiscoveryURL missing-host branch uncovered 7/10 LOW Add 1 row
F10 buildResolution fallback to sent auth method uncovered 7/10 LOW Add 1 test
F11 dcrStaleAgeThreshold is dead code in this PR (nolint:unused) 9/10 INFO Acknowledge / track for #5039

Overall

The PR delivers Phase 2 of the DCR story with strong defensive engineering: HTTPS-or-loopback enforced at every URL boundary, RFC 8414 §3.3 issuer-equality, a registration redirect-block that prevents bearer-token leakage to foreign origins (pinned by _DoesNotForwardBearerOnRedirect with a foreign-origin tripwire), defensive copies on the cache, and singleflight-coalesced registration to prevent orphaned clients at the upstream. Test coverage is dense (~30 functions, 1216 lines on the resolver alone) with explicit security-property tests and a table-driven concurrent-store stress test bounded by a fail-fast deadline. Module boundaries are clean — protocol primitives stay in pkg/oauthproto, runtime resolution and the bearer-token transport stay in pkg/authserver/runner. The PR description and doc-comment density are exemplary.

The blocking concern is F1: the public doc on resolveDCRCredentials says issuer is the upstream's identifier used both for discovery and for keying the cache, but the implementation never uses issuer for discovery — deriveExpectedIssuerFromDiscoveryURL(cfg.DiscoveryURL) recovers the upstream issuer separately. The inline comment in registerAndCache and the test _UpstreamIssuerDerivedFromDiscoveryURL both confirm that the parameter actually names this auth server (used for redirect-URI defaulting and cache keying). The next caller (#5039) reading the public doc will pass the wrong identifier, producing a wrong-origin default redirect URI under the upstream and a wrong cache key. Either fix the doc or rename the parameter (localIssuer / authserverIssuer); the same ambiguity is on DCRKey.Issuer in dcr_store.go.

F2 is the next concern: when an operator chooses dcr_config.registration_endpoint (bypass discovery), the doc on RegistrationEndpoint warns them to also supply AuthorizationEndpoint/TokenEndpoint/Scopes, but OAuth2UpstreamRunConfig.Validate does not enforce this. A misconfigured run-config registers a client successfully and silently fails at first authorize/token-exchange call. Per .claude/rules/go-style.md Constructor Validation: Fail Loudly on Invalid Input, this should fail at startup.

The remaining findings (F3–F11) are individually small but mostly mechanical: a defensive nil-guard, a singleflight pointer-aliasing note, a missed re-validation of one synthesised URL, a logging-issuer mix-up that follows naturally from F1, and four test-coverage gaps each a row or two of additions. F11 is a tracked forward-prep nolint:unused constant that is fine as-is provided #5039 lands on schedule.

No security-property regression and no breakage of the pre-provisioned-client_id path. Recommendation: COMMENT — F1/F2 are worth addressing before merge but neither blocks correctness today (the wiring caller does not yet exist).

Documentation

  • F1's recommendation should propagate to the doc on DCRKey.Issuer in pkg/authserver/runner/dcr_store.go, which currently reads the authorization server's issuer identifier — same auth-server-vs-upstream ambiguity.

Generated with Claude Code

Comment thread pkg/authserver/runner/dcr.go Outdated
Comment thread pkg/authserver/config.go
Comment thread pkg/authserver/runner/dcr.go
Comment thread pkg/authserver/runner/dcr.go Outdated
Comment thread pkg/authserver/runner/dcr.go
Comment thread pkg/authserver/runner/dcr_test.go Outdated
Comment thread pkg/authserver/runner/dcr_test.go
Comment thread pkg/authserver/runner/dcr_test.go
Comment thread pkg/authserver/runner/dcr.go
Comment thread pkg/authserver/runner/dcr_store.go
tgrunnagle and others added 8 commits April 30, 2026 08:00
oauthproto decodes RFC 7591 §3.2.1 client_id_issued_at and
client_secret_expires_at, but buildResolution dropped both. The
authoritative consequence was for client_secret_expires_at: a non-zero
upstream-asserted expiry was invisible to lookupCachedResolution, so
the cache served an expired secret indefinitely, every token-endpoint
call after the upstream's expiry would 401, and the resolver had no
signal to refetch.

The 90-day dcrStaleAgeThreshold added for Step 2g observability is a
separate, heuristic signal ("consider rotating") and does not address
this — the upstream-asserted expiry is authoritative.

- DCRResolution now carries ClientIDIssuedAt and ClientSecretExpiresAt
  as time.Time. The wire convention "0 means absent / does not expire"
  maps to the zero time.Time so callers can use IsZero() uniformly,
  via a small epochSecondsToTime helper.
- buildResolution populates both fields by converting the int64 epoch
  values from the registration response.
- lookupCachedResolution treats an entry with a non-zero
  ClientSecretExpiresAt that has passed as a cache miss, so the
  singleflight body re-runs registerAndCache and overwrites the stale
  entry via the existing cache.Put. A Debug log attributes the
  refetch to client_secret_expires_at so operators can correlate the
  re-registration with the upstream's expiry rather than guessing.

Tests:

- TestBuildResolution_PopulatesRFC7591ExpiryFields covers all three
  shapes (both populated, expires_at omitted, both omitted) and that
  zero on the wire becomes the zero time.Time.
- TestResolveDCRCredentials_RefetchesOnExpiredCachedSecret pins the
  cache-miss-on-expiry behaviour: an upstream that returns an already-
  expired secret causes back-to-back resolver calls to register twice.
- TestResolveDCRCredentials_HonoursFutureExpiryAndZero pins the cache-
  hit path: future expiry and the zero-means-does-not-expire wire
  convention both serve from the cache without re-registering.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
singleflight.Group re-panics the leader's panic in every follower, so
without recovery a panic in registerAndCache (or anything it calls —
the registration HTTP exchange, the cache.Put, etc.) would crash the
leader goroutine and every concurrent follower for the same DCRKey
with the same value. Net effect: a single bad upstream response could
take out N goroutines instead of one error path.

- The dcrFlight.Do closure now installs a defer/recover that converts
  a panic into a normal error. The recovered value is logged at
  slog.Error with a full debug.Stack() so the panic is not silently
  swallowed; the returned error wraps the recovered value so callers
  can react to it as a normal failure mode.
- Doc comment on the singleflight call expanded to explain why the
  recover is necessary (the fan-out semantics of singleflight).

Tests:

- panickingPutDCRStore: a new test double whose Put panics with a
  configured value while Get returns a normal cache miss, so callers
  reach the singleflight closure and trigger the panic via cache.Put
  inside registerAndCache.
- TestResolveDCRCredentials_RecoversPanicInsideSingleflight: spawns
  N=6 concurrent callers for the same DCRKey, each with its own
  defer/recover so an un-recovered panic is captured as a test
  failure rather than crashing the suite. Asserts every caller
  received an error containing the panic-marker string and the
  recovered value, and that none of them observed an actual panic.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The existing assertion "registrationCalls == 1" is necessary but not
sufficient: a goroutine that reached resolveDCRCredentials after the
leader's cache.Put returned would short-circuit through
lookupCachedResolution, take the cache hit, and still leave the count
at 1. Under CI load the 50ms gate window was easy to miss, which would
silently weaken what the test claims to cover.

- Wrap the in-memory store in a countingStore decorator that records
  every Get that returned a hit.
- Assert cache.hits.Load() == 0 after wg.Wait — any non-zero value
  means a late-arriver took the lookupCachedResolution short-circuit
  instead of joining the singleflight, and the coalescing path was
  not actually exercised.
- Bump the gate window from 50ms to 250ms for headroom under CI load.
  The countingStore assertion turns any remaining timing slip into a
  loud failure rather than a silent regression.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous TestResolveDCRCredentials_RequiresClientIDEmpty and
TestResolveDCRCredentials_RequiresDCRConfig each exercised one branch
of validateResolveInputs, leaving three uncovered: nil run-config,
empty issuer, and nil credential store. Per
.claude/rules/testing.md "Consolidation", folding into one table-
driven test covers every branch in one place and shrinks duplicated
setup.

- Replace the two single-branch tests with
  TestResolveDCRCredentials_RejectsInvalidInputs, which iterates over
  all five branches and asserts the wrapped error message contains
  the expected substring drawn from validateResolveInputs's literals.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reviewer asked us to confirm the framing and document it on the
constructor. The unbounded-cache concern in .claude/rules/go-style.md
"Resource Leaks" does not bite for this implementation because the
key space is bounded by the operator-configured upstream count and
because Phase 3 Redis is the production answer; the doc now states
that explicitly so a future reader does not assume this is a
production-grade store.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
A nil resolution previously fell through as a silent no-op: the caller
got a successful return, the next Get returned a miss, and there was
no error trail to debug from. Per the constructor-validation rule in
.claude/rules/go-style.md, fail loudly at the boundary.

- inMemoryDCRCredentialStore.Put returns an explicit error when the
  resolution is nil, before taking the lock.
- DCRCredentialStore.Put interface doc records the contract so future
  backends (e.g. the Phase 3 Redis store) behave consistently.
- New TestInMemoryDCRCredentialStore_Put_RejectsNilResolution pins the
  rejection and asserts the rejected Put did not leave any partial
  entry behind.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The resolveDCRCredentials doc said "issuer is the upstream authorization
server's issuer identifier used both for discovery and for keying the
cache." That was wrong on both counts: the upstream's issuer is
recovered separately from rc.DCRConfig.DiscoveryURL via
deriveExpectedIssuerFromDiscoveryURL, and the parameter actually names
*this* auth server. An inline comment in registerAndCache already
contradicted the public doc, and a future caller reading only the
public doc would have produced (a) a wrong-origin default redirect URI
under the upstream IdP and (b) a cache key that does not identify the
auth-server context.

Renaming the parameter and documenting the contract closes the trap.

- resolveDCRCredentials: parameter renamed issuer → localIssuer; doc
  rewritten to spell out that it is *this* auth server's issuer, used
  to key the cache and default the redirect URI, and that the
  upstream's issuer is recovered separately. Same rename applied to
  validateResolveInputs, registerAndCache, lookupCachedResolution,
  chooseRegistrationScopes, and resolveUpstreamRedirectURI for
  internal consistency.
- endpointsFromMetadata, synthesiseRegistrationEndpoint: parameter
  renamed issuer → upstreamIssuer (it is in fact the upstream's,
  recovered from the DiscoveryURL) so the local/upstream split is
  symmetric.
- DCRKey.Issuer: doc clarifies the field carries *this* auth server's
  local issuer; two different local issuers registering against the
  same upstream are distinct OAuth clients and must not share cache
  entries.
- Structured-log keys "issuer" → "local_issuer" everywhere the value
  is the local issuer, so operators can tell the two apart in logs
  too.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When dcr_config.registration_endpoint is set, the resolver bypasses
RFC 8414 / OIDC Discovery and therefore cannot populate
authorization_endpoint or token_endpoint from server metadata. The
operator-facing prose on DCRUpstreamConfig already says the run-config
must supply both, but neither validator enforced it. The downstream
effect was silent: dcrEndpoints came back with empty authorize/token
URLs, applyExplicitEndpointOverrides only filled them from rc, and
applyResolution only wrote when rc fields were empty — so a
misconfigured run-config completed registration successfully and only
broke at the first authorize or token-exchange call.

- OAuth2UpstreamRunConfig.Validate now rejects a config that sets
  DCRConfig.RegistrationEndpoint without both AuthorizationEndpoint
  and TokenEndpoint, with an error message that names the field and
  explains the discovery-bypass rationale.
- The discovery-flow path (DCRConfig.DiscoveryURL) is unaffected:
  metadata populates the endpoints there, so leaving them empty in
  the run-config remains valid.
- Existing test case "DCRConfig with only registration_endpoint is
  valid" updated to also set authorize/token endpoints (the path it
  was claiming to cover); two new cases pin the rejection when each
  endpoint is missing; one new case pins that DiscoveryURL is still
  valid without explicit endpoints.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions github-actions Bot added size/XL Extra large PR: 1000+ lines changed and removed size/XL Extra large PR: 1000+ lines changed labels Apr 30, 2026
@github-actions github-actions Bot dismissed their stale review April 30, 2026 15:40

Large PR justification has been provided. Thank you!

@github-actions
Copy link
Copy Markdown
Contributor

✅ Large PR justification has been provided. The size review has been dismissed and this PR can now proceed with normal review.

Copy link
Copy Markdown
Contributor

@jhrozek jhrozek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seven inline comments addressed across dedicated commits — each fix matches or exceeds the suggestion. Two extra commits close adjacent footguns (local/upstream issuer disambiguation, explicit-endpoints validation when discovery is bypassed). LGTM.

@tgrunnagle tgrunnagle merged commit b532156 into main May 1, 2026
74 of 75 checks passed
@tgrunnagle tgrunnagle deleted the issue_5038_dcr-2b branch May 1, 2026 13:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size/XL Extra large PR: 1000+ lines changed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Authserver DCR: credential store and resolver (Phase 2, Steps 2f/2c)

2 participants