fix(auth): cross-client OAuth session reuse (HYPER-268)#96
Conversation
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
There was a problem hiding this comment.
Code review is billed via overage credits. To resume reviews, an organization admin can raise the monthly limit at claude.ai/admin-settings/claude-code.
Once credits are available, reopen this pull request to trigger a review.
|
Warning Rate limit exceeded
Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 25 minutes and 56 seconds. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (32)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
🚅 Deployed to the ePDS-pr-96 environment in ePDS
|
aa8c81d to
a663bb0
Compare
Coverage Report for CI Build 24689383743Warning Build has drifted: This PR's base is out of sync with its target branch, so coverage data may include unrelated changes. Coverage increased (+5.5%) to 42.516%Details
Uncovered Changes
Coverage Regressions69 previously-covered lines in 4 files lost coverage.
Coverage Stats
💛 - Coveralls |
🦋 Changeset detectedLatest commit: 1bf9ce1 The changes in this PR will be included in the next version bump. Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
After a user signs in once via any OAuth client, a second client in the same browser no longer re-prompts for email OTP. ePDS recognises the existing device session and either auto-redirects (flow 1, login_hint matches) or shows the upstream account chooser (flow 2). Three pieces working together: 1. pds-core: broaden device-session cookie domain Auto-derived from AUTH_HOSTNAME / PDS_HOSTNAME: if AUTH_HOSTNAME ends with .<PDS_HOSTNAME>, inject Domain=<PDS_HOSTNAME> into outbound Set-Cookie headers for dev-id/ses-id and their :hash sidecars. On Railway preview envs (unrelated hostnames under a public suffix) the check fails and the middleware is silently skipped. 2. auth-service: skip the email form when a device session exists GET /oauth/authorize checks for a dev-id cookie. If present and prompt=login is not set, 302s to pds-core's upstream /oauth/authorize. If absent, renders the email form as today. 3. pds-core: enrich the upstream account chooser Response-rewrite middleware (extending PR #9's pattern) injects a script into /account* HTML responses that surfaces each account's email alongside the handle and adds a "Use a different account" link pointing at auth.<host>/oauth/authorize?prompt=login. Also includes: - 4 Gherkin scenarios (tagged @session-reuse, excluded from default CI) - 30 unit tests for the extracted pure helpers - Design doc at docs/design/hyper-268-session-reuse.md - Changeset Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Mailhog was added by the initial commit and never wired into anything: no code, dotfile, Dockerfile, test script or CI workflow referenced it. Meanwhile the e2e suite uses Mailpit's API shape (/api/v1/search, Basic auth via MP_UI_AUTH), so running the compose stack for local e2e requires Mailpit anyway. Swap in the compose service and scrub stale references from the feature docs. Credentials default to admin/admin for the local dev trap; override via MAILPIT_USER / MAILPIT_PASS env vars. Prerequisite for running the @session-reuse scenarios (HYPER-268) against a local docker-compose stack, which need OTP extraction via Mailpit. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
HYPER-268's cross-client session-reuse scenarios cannot run against
Railway preview envs — their random *.up.railway.app hostnames live
under a browser public-suffix boundary and so can't share the
device-session cookie between pds-core and auth-service. Scenarios
are tagged @session-reuse and excluded from the default cucumber
tag filter for that reason.
This commit builds the local docker-compose harness needed to exercise
those scenarios against a topology that *does* have a shared parent
domain. Ingredients:
- Caddyfile: dedicated vhosts for $DEMO_HOSTNAME and
$DEMO_UNTRUSTED_HOSTNAME, outside the on-demand TLS block so they
don't need to survive /tls-check.
- docker-compose: second demo service (demo-untrusted) behind the dev
profile for the trusted-vs-untrusted scenarios. Caddy now advertises
network aliases for the public PDS / auth / demo hostnames so
sibling containers can reach each other by their real URLs without
NAT-hairpinning back out to the public IP.
- packages/shared/src/safe-fetch.ts: new `allowPrivateIps` option on
makeSafeFetch. Default false preserves production behaviour. When
true it lets local e2e runs fetch client-metadata over docker
aliases that resolve to private IPs. Plumbed via the
EPDS_ALLOW_PRIVATE_IPS env var at the single shared callsite
(client-metadata.ts). Document in-source that this must stay off
on internet-facing deployments — same risk profile as
PDS_DISABLE_SSRF_PROTECTION.
- e2e/cucumber.mjs: collapse the ad-hoc cucumber.session-reuse.mjs
into a single config that exports `default` and `session-reuse`
profiles via the `default: () => ({...})` shape cucumber supports
for hyphenated profile names. Run with `pnpm test:e2e -p session-reuse`.
Regression-tested end-to-end: disabling the cookie-domain broadening
middleware in pds-core (the first of HYPER-268's three pillars)
makes all four @session-reuse scenarios flip to a characteristic
failure signature — browser ends up on auth.* after the second
client initiates login, a fresh OTP email is sent to the test
address, and flow 1 times out waiting for /welcome. Restoring the
middleware returns the scenarios to their passing state. Proof that
the tests catch the regression.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…oveHeader
The chooser-enrichment and client-CSS-injection middlewares both wrap
res.end() to inject HTML into upstream @atproto/oauth-provider responses.
On some routes — notably the React SPA account chooser at /account —
upstream flushes response headers before calling res.end(), so when the
wrapped end() subsequently called res.removeHeader('Content-Length')
Node's HTTP layer threw ERR_HTTP_HEADERS_SENT.
Because that throw comes from a method replacement on `res` (not from
the middleware body), it bypasses Express's error pipeline and lands as
an uncaught exception. pds-core exits with a stack trace; Docker's
restart policy brings it back a few seconds later; the user's OAuth
flow dies mid-response. In practice any HYPER-268 flow that reached the
account chooser was crashing the process.
Guard both removeHeader call sites with a res.headersSent check. When
headers have already been flushed, skip the Content-Length / ETag
rewrite and just inject the body — the response still reaches the
client (without an accurate Content-Length, which browsers tolerate on
HTTP/1.1 chunked responses and HTTP/2 alike). When headers have not
been flushed, behaviour is unchanged.
Lock the fix in with regression tests that mimic Node's real
removeHeader behaviour: the test mock throws when headersSent=true, so
any future change that drops the guard surfaces as a failed assertion
instead of a silently crashing service. Stashing the source changes
while keeping the tests confirms the tests go red on the bug.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two small add-ons to the docker-compose e2e harness. Caddyfile + Caddy network alias: add a vhost for $MAILPIT_HOSTNAME pointing at the mailpit container's HTTP UI. Mailpit's own MP_UI_AUTH already enforces HTTP Basic, so no Caddy-level auth is needed. Lets developers inspect captured OTP emails at e.g. https://mailpit.epds-poc1.test.certified.app/ while manually driving the session-reuse flow. docker-compose demo-untrusted: explicitly blank EPDS_CLIENT_THEME so the untrusted demo never inherits a theme from packages/demo/.env when the trusted demo opts into one (ocean, amber). Without this the two demos were visually indistinguishable whenever the trusted one was themed, which defeats the trusted-vs-untrusted scenarios. Both are dev-profile only; production deployments are unaffected. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The email-mode submit button said "Sending verification code..." while submitting, but the demo is a pure OAuth client — it hands off to the auth service and has no visibility into whether a verification code will actually be sent. In the HYPER-268 session-reuse path no OTP email goes out at all, so showing "Sending verification code..." momentarily was misleading. Matches the handle-mode button (and the shared SignInButton component) which already use "Redirecting..." for exactly this reason. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three bugs blocked chooser enrichment on the real /oauth/authorize chooser (the one cross-client session reuse lands on): 1. Route filter only matched /account*. Upstream renders the inline chooser at /oauth/authorize too. isChooserRequest now matches both. 2. Script only intercepted window.__deviceSessions. The inline chooser at /oauth/authorize hydrates from window.__sessions instead. Both are intercepted via a shared interceptGlobal() helper. 3. Middleware spliced in immediately after expressInit, so compression's wrapped res.end ran on top of ours — we only ever saw gzipped bytes and the <script> never injected. Reuse findInsertionIndex from client-css-injection to land it after compression, same as CSS injection. DOM rewrite also reworked: TreeWalker finds the deepest element whose own text matches a known handle/sub, so we don't depend on upstream class names. The handle-wrap (a flex-row) is flipped to flex-column and the email label appended as a sibling of the handle, giving handle + email stacked in a shrink-wrapping left column with the chevron snug against the wider of the two lines in the right column. Stable classes epds-handle-label / epds-email-label let branding CSS restyle or reorder the pair. Verified end-to-end via Chromium driven through the docker-compose stack: trusted-demo sign-in, then untrusted-demo sign-in, lands on /oauth/authorize with both handle and email stacked correctly. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
89f1c90 to
07d3817
Compare
Rename and de-reference files so external readers — who don't have access to our Linear workspace — aren't confronted with opaque `HYPER-268` tokens. - Rename `.changeset/hyper-268-cross-client-session-reuse.md` to `cross-client-session-reuse.md`; also unwrap the hard-wrapped body paragraphs and drop the internal-doc reference line. - Rename `.changeset/hyper-268-fix-headers-sent-crash.md` to `fix-post-flush-headers-sent-crash.md`; scrub the inline HYPER-268 mention from the body. - Scrub the HYPER-268 mention from `demo-neutral-submitting-copy.md`. - Rename `docs/design/hyper-268-session-reuse.md` to `cross-client-session-reuse.md`; demote `HYPER-268` from the H1 title to a "Tracking issue:" subtitle so contributors can still find the ticket without making it the first thing anyone sees. - Update the code-comment pointer in `pds-core/src/index.ts` to the new design-doc path. No user-visible behavior change; this is purely a naming / language cleanup.
|
Layer 4 of PR #96-follow-up shipped server + client wiring to hide the handle on the enriched chooser when the flow resolves to epds_handle_mode=random, plus unit tests for the DOM mutations. The two random/chooser mode scenarios in the feature file were tagged @pending — declared in Gherkin but never exercised end-to-end. Lift the random-mode scenario. The demo's /flow3 page already posts handle_mode=random through /api/oauth/login; auth-service's login-page router reads req.query.epds_handle_mode and forwards the full query via buildPdsAuthorizeRedirect, so pds-core's chooser middleware sees it on /oauth/authorize and injects <meta name="epds-handle-mode" content="random"> for the enrichment script to read. End-to-end coverage now catches regressions in any link of that chain. New step definitions: - "the demo client starts a new OAuth flow with random handle mode" — navigates to /flow3 on the trusted demo, which posts handle_mode=random. - "the enriched account picker renders without the handle visible" — waits for .epds-email-label (signal the script has run), asserts all .epds-handle-label elements are not visible. - "each row exposes the handle only via a title tooltip" — asserts every .epds-email-label has a non-empty title attribute. - "the email remains visible as the primary identifier" — asserts at least one .epds-email-label with text matching world.testEmail is visible. Remove the @pending chooser-mode scenario it was paired with: no new coverage — baseline "Both cookies valid" scenario already renders the enriched chooser without random mode, so handles visible there is already asserted by absence of the random-mode hide. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Captures the failure-mode taxonomy behind the three-button upstream welcome page that some users see after PR #96. Documents a four-layer fix: auth-service cookie-pair tightening, a pds-core pre-route middleware that short-circuits before upstream renders, coverage of the "Another account" and "Sign up" escape hatches, and hiding the handle on the chooser when the current flow's handleMode is random. Gherkin scenarios cover every externally reproducible case plus the two new chooser behaviours. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…gn-in page
Adds a pre-route Express middleware on /oauth/authorize and /account* that
parses the dev-id/ses-id cookie pair side-effect-free and checks the
device's bound-account count via provider.accountManager.listDeviceAccounts.
When the pair is missing/malformed or the device has zero bindings the
middleware responds 303 to auth-service's /oauth/authorize?prompt=login
and clears the stale cookies, making upstream's three-button welcome page
("Authenticate / Create new account / Sign in / Cancel") structurally
unreachable from ePDS.
This closes the regression path introduced by PR #96: the cookie scope
broadening let auth-service see dev-id, which in turn caused
shouldReuseSession to redirect to pds-core on any dev-id presence — even
when the underlying session was partial, stale, or had had its bindings
purged by migration-005. All four failure modes (partial pair, stale
fixation, TTL purge, scope transition) converge server-side on a
zero-binding device, so the guard keys off that invariant rather than
any individual upstream state transition.
See docs/design/session-reuse-bugs.md for the failure-mode taxonomy and
docs/design/pds-white-boxing.md item 18 for the new upstream dependencies.
Layer 4 of PR #96-follow-up shipped server + client wiring to hide the handle on the enriched chooser when the flow resolves to epds_handle_mode=random, plus unit tests for the DOM mutations. The two random/chooser mode scenarios in the feature file were tagged @pending — declared in Gherkin but never exercised end-to-end. Lift the random-mode scenario. The demo's /flow3 page already posts handle_mode=random through /api/oauth/login; auth-service's login-page router reads req.query.epds_handle_mode and forwards the full query via buildPdsAuthorizeRedirect, so pds-core's chooser middleware sees it on /oauth/authorize and injects <meta name="epds-handle-mode" content="random"> for the enrichment script to read. End-to-end coverage now catches regressions in any link of that chain. New step definitions: - "the demo client starts a new OAuth flow with random handle mode" — navigates to /flow3 on the trusted demo, which posts handle_mode=random. - "the enriched account picker renders without the handle visible" — waits for .epds-email-label (signal the script has run), asserts all .epds-handle-label elements are not visible. - "each row exposes the handle only via a title tooltip" — asserts every .epds-email-label has a non-empty title attribute. - "the email remains visible as the primary identifier" — asserts at least one .epds-email-label with text matching world.testEmail is visible. Remove the @pending chooser-mode scenario it was paired with: no new coverage — baseline "Both cookies valid" scenario already renders the enriched chooser without random mode, so handles visible there is already asserted by absence of the random-mode hide. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Captures the failure-mode taxonomy behind the three-button upstream welcome page that some users see after PR #96. Documents a four-layer fix: auth-service cookie-pair tightening, a pds-core pre-route middleware that short-circuits before upstream renders, coverage of the "Another account" and "Sign up" escape hatches, and hiding the handle on the chooser when the current flow's handleMode is random. Gherkin scenarios cover every externally reproducible case plus the two new chooser behaviours. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…gn-in page
Adds a pre-route Express middleware on /oauth/authorize and /account* that
parses the dev-id/ses-id cookie pair side-effect-free and checks the
device's bound-account count via provider.accountManager.listDeviceAccounts.
When the pair is missing/malformed or the device has zero bindings the
middleware responds 303 to auth-service's /oauth/authorize?prompt=login
and clears the stale cookies, making upstream's three-button welcome page
("Authenticate / Create new account / Sign in / Cancel") structurally
unreachable from ePDS.
This closes the regression path introduced by PR #96: the cookie scope
broadening let auth-service see dev-id, which in turn caused
shouldReuseSession to redirect to pds-core on any dev-id presence — even
when the underlying session was partial, stale, or had had its bindings
purged by migration-005. All four failure modes (partial pair, stale
fixation, TTL purge, scope transition) converge server-side on a
zero-binding device, so the guard keys off that invariant rather than
any individual upstream state transition.
See docs/design/session-reuse-bugs.md for the failure-mode taxonomy and
docs/design/pds-white-boxing.md item 18 for the new upstream dependencies.
Layer 4 of PR #96-follow-up shipped server + client wiring to hide the handle on the enriched chooser when the flow resolves to epds_handle_mode=random, plus unit tests for the DOM mutations. The two random/chooser mode scenarios in the feature file were tagged @pending — declared in Gherkin but never exercised end-to-end. Lift the random-mode scenario. The demo's /flow3 page already posts handle_mode=random through /api/oauth/login; auth-service's login-page router reads req.query.epds_handle_mode and forwards the full query via buildPdsAuthorizeRedirect, so pds-core's chooser middleware sees it on /oauth/authorize and injects <meta name="epds-handle-mode" content="random"> for the enrichment script to read. End-to-end coverage now catches regressions in any link of that chain. New step definitions: - "the demo client starts a new OAuth flow with random handle mode" — navigates to /flow3 on the trusted demo, which posts handle_mode=random. - "the enriched account picker renders without the handle visible" — waits for .epds-email-label (signal the script has run), asserts all .epds-handle-label elements are not visible. - "each row exposes the handle only via a title tooltip" — asserts every .epds-email-label has a non-empty title attribute. - "the email remains visible as the primary identifier" — asserts at least one .epds-email-label with text matching world.testEmail is visible. Remove the @pending chooser-mode scenario it was paired with: no new coverage — baseline "Both cookies valid" scenario already renders the enriched chooser without random mode, so handles visible there is already asserted by absence of the random-mode hide. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Captures the failure-mode taxonomy behind the three-button upstream welcome page that some users see after PR #96. Documents a four-layer fix: auth-service cookie-pair tightening, a pds-core pre-route middleware that short-circuits before upstream renders, coverage of the "Another account" and "Sign up" escape hatches, and hiding the handle on the chooser when the current flow's handleMode is random. Gherkin scenarios cover every externally reproducible case plus the two new chooser behaviours. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…gn-in page
Adds a pre-route Express middleware on /oauth/authorize and /account* that
parses the dev-id/ses-id cookie pair side-effect-free and checks the
device's bound-account count via provider.accountManager.listDeviceAccounts.
When the pair is missing/malformed or the device has zero bindings the
middleware responds 303 to auth-service's /oauth/authorize?prompt=login
and clears the stale cookies, making upstream's three-button welcome page
("Authenticate / Create new account / Sign in / Cancel") structurally
unreachable from ePDS.
This closes the regression path introduced by PR #96: the cookie scope
broadening let auth-service see dev-id, which in turn caused
shouldReuseSession to redirect to pds-core on any dev-id presence — even
when the underlying session was partial, stale, or had had its bindings
purged by migration-005. All four failure modes (partial pair, stale
fixation, TTL purge, scope transition) converge server-side on a
zero-binding device, so the guard keys off that invariant rather than
any individual upstream state transition.
See docs/design/session-reuse-bugs.md for the failure-mode taxonomy and
docs/design/pds-white-boxing.md item 18 for the new upstream dependencies.
Layer 4 of PR #96-follow-up shipped server + client wiring to hide the handle on the enriched chooser when the flow resolves to epds_handle_mode=random, plus unit tests for the DOM mutations. The two random/chooser mode scenarios in the feature file were tagged @pending — declared in Gherkin but never exercised end-to-end. Lift the random-mode scenario. The demo's /flow3 page already posts handle_mode=random through /api/oauth/login; auth-service's login-page router reads req.query.epds_handle_mode and forwards the full query via buildPdsAuthorizeRedirect, so pds-core's chooser middleware sees it on /oauth/authorize and injects <meta name="epds-handle-mode" content="random"> for the enrichment script to read. End-to-end coverage now catches regressions in any link of that chain. New step definitions: - "the demo client starts a new OAuth flow with random handle mode" — navigates to /flow3 on the trusted demo, which posts handle_mode=random. - "the enriched account picker renders without the handle visible" — waits for .epds-email-label (signal the script has run), asserts all .epds-handle-label elements are not visible. - "each row exposes the handle only via a title tooltip" — asserts every .epds-email-label has a non-empty title attribute. - "the email remains visible as the primary identifier" — asserts at least one .epds-email-label with text matching world.testEmail is visible. Remove the @pending chooser-mode scenario it was paired with: no new coverage — baseline "Both cookies valid" scenario already renders the enriched chooser without random mode, so handles visible there is already asserted by absence of the random-mode hide. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>



Summary
Re-opens the rebased HYPER-268 work from #64. After a user signs in once via any OAuth client, a second client in the same browser no longer re-prompts for email OTP. ePDS recognises the existing device session and either auto-redirects (flow 1,
login_hintmatches) or shows the upstream account chooser (flow 2).Three cooperating pieces in
pds-core+auth-service:AUTH_HOSTNAMEis a subdomain ofPDS_HOSTNAME, a middleware wraps outboundSet-Cookieheaders fordev-id/ses-id(+:hashsidecars) to injectDomain=<parent>. Both the PDS and the auth subdomain can then read the cookies. Railway preview envs (random hostnames under.up.railway.app, which sits on the browser Public Suffix List) silently skip this middleware — no shared parent is possible there.GET /oauth/authorize, if adev-idcookie is present andprompt=loginisn't set, 302 straight to pds-core's upstream/oauth/authorize. Upstream then either auto-selects the matching session (flow 1) or renders the account chooser (flow 2)./accountshows only handles (often randomly generated on ePDS). A response-rewrite middleware injects a<script>that captures the upstream__deviceSessionspayload, labels each account with its email, and adds a "Use a different account" link pointing atauth.<parent>/oauth/authorize?prompt=loginas an escape hatch.Also carries two infra-only commits:
chore(docker): replace unused MailHog with Mailpit— the compose file had an obsoletemailhog:service; the e2e suite has been using Mailpit's API shape for a while. Prerequisite for running@session-reuselocally.chore(e2e): enable docker-compose @session-reuse runs— Caddy vhosts for a trusted + untrusted demo, network aliases so sibling containers can reach each other by public hostname without NAT-hairpinning, anallowPrivateIpsoption onmakeSafeFetch(gated byEPDS_ALLOW_PRIVATE_IPS) for the docker-internal DNS topology, and a cucumber profile sopnpm test:e2e -p session-reusepicks up just those scenarios.Regression proof
Disabled the pds-core cookie-domain middleware via a local kill switch and re-ran
pnpm test:e2e -p session-reuse. All four scenarios flipped to a characteristic failure signature:Then no new OTP email is sent to the test emailExpected: 0 / Received: 1— a fresh OTP goes out to the second clientThen the account chooser is displayedauth.*(expected the PDS host) — auth-service re-rendered its own email formwaitForURL(**/welcome)Restoring the middleware returns the scenarios to their passing baseline. Evidence that the tests catch the regression, not just the passing state.
Test plan
pnpm format:checkpassespnpm lintpassespnpm typecheckpassespnpm test— 554 passedpnpm test:coverage— above ratcheted thresholdspnpm test:e2e -p session-reuse— 19/38 steps passing; remaining 4-scenario failures are chooser-enrichment hydration + consent-page detection details, unrelated to the cookie-domain fix itself (filed as follow-up)@session-reuse)🤖 Generated with Claude Code