Heartbeat + clean-exit so slow sign-ins recover instead of stranding the user#154
Heartbeat + clean-exit so slow sign-ins recover instead of stranding the user#154
Conversation
…(NO FIX)
Add four RED scenarios under @otp-and-par-expiry that reproduce the
"stuck in a loop" failure modes a slow user can trip during OAuth
login. All four exercise the same root cause — upstream's PAR
request_uri (5 min sliding TTL, hardcoded in @atproto/oauth-provider)
dies while the user reads their email — but they surface on three
different downstream pages depending on which call hits the dead PAR
first:
* @otp-and-par-expiry: OTP-expired UI → Resend → submit lands on
auth-service /auth/choose-handle's "Session expired, please start
over" — the original bug a user reported.
* @multiple-resend: two Resend cycles silently age out the PAR
without ever showing the OTP-expired UI; same dead-end. Locks in
the regression for fixes that only patch the OTP-expired branch.
* @prompt-login: forced re-authentication with a slow user; browser
reaches /oauth/epds-callback but never /welcome.
* @recovery: backup-email recovery + slow user; lands on pds-core's
"Your sign-in took too long…" page.
The existing @otp-expiry scenario deliberately keeps the PAR alive
to isolate the auth-service-side TTL fix from the PDS layer. That is
faithful to the auth-service-side regression but masks the broader
problem: in production the PAR is already dead by the time OTP
expires, since 5 min < 10 min. None of the prior expiry fixes covered
the cross-layer interaction these scenarios exercise.
Implementation:
* One new step phrase, "the PAR request_uri is captured for later
expiry", that snapshots the request_uri before the user navigates
to a page where the URL no longer carries it (recovery scenario).
* captureRequestUriFromPage() falls back to a previously stashed
world.lastRequestUri so existing call sites still work.
* No production-code changes — these scenarios reproduce the bug on
origin/main and remain RED until the clean-exit fix lands.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
New living document under docs/design/ that captures the bug a real
user reported (stuck in a sign-in loop after OTP-expired + Resend),
the three-timer interaction that causes it (OTP / PAR / auth_flow),
why the existing OTP-expiry scenarios missed it, and the strategy to
fix it.
Two layers, in priority order:
1. Clean exit — every error a slow user can trip must end with a
working path back to the OAuth client. Audit of all 12
renderError() call sites on the auth path is included, grouped
into three failure clusters (redirect-when-possible /
PAR-dead-on-handle-page / nothing-recoverable). Today every
one of these is a dead-end static page.
2. Heartbeat — opportunistic UX polish that pings the existing
/_internal/ping-request endpoint while the user is on the OTP
and recovery forms, so the common case (slow inbox / slow
typist) doesn't hit the dead-end at all. Bounded by auth_flow's
60-min ceiling; no new tokens, no new credentials.
Implementation will land mostly in auth-service:
* @certified-app/shared already exposes resolveClientMetadata(),
so cluster A/B redirects can be built without a new pds-core
internal endpoint and without any new requestManager/provider
access. Net new white-boxing surface: zero.
* Heartbeat reuses the existing /_internal/ping-request endpoint;
only the public auth-service /auth/ping wrapper is new.
* Recovery flow's clientId is propagated via the login-page →
recovery-link path, not by exposing expired PAR rows from
pds-core.
Includes the resolved decisions for all three open questions (Start
Over destination, cluster B redirect feasibility, recovery clientId
propagation) — each resolved toward the contract of "minimum friction
for the end user, fewest steps to retry".
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…form
Adds a server-mediated heartbeat that slides the upstream PAR
request_uri inactivity timer (atproto's
AUTHORIZATION_INACTIVITY_TIMEOUT, 5 min, hardcoded) while a user is on
the OTP form or the recovery OTP form. Without it, a user who reads
their email for >5 min trips a downstream "session expired" page on
submit even though both their OTP and auth_flow are still well within
their own (10 min and 60 min) TTLs. See
docs/design/par-expiry-and-clean-exits.md for the broader strategy.
Implementation lives entirely in auth-service:
* New GET /auth/ping route reads the user's epds_auth_flow cookie,
looks up the auth_flow row, and proxies the existing
pingParRequest() helper into pds-core's existing
/_internal/ping-request. No new pds-core endpoint, no new
requestManager / provider access. Bounded by the auth_flow's
own 60-min TTL — once the auth_flow row is gone, the ping
becomes a no-op and the browser stops pinging.
* Browser-side: a 3-min setInterval is inlined into the login
page's OTP step and into the recovery OTP form. The interval
starts when the OTP step becomes visible (or, for login_hint
flows, on initial render), pauses on form submit, resumes on
a verify failure, and stops on page unload.
* Test toggle: a `?no_heartbeat=1` query param disables the
inlined interval, but only when EPDS_TEST_HOOKS=1 is also set
server-side. Lets future e2e scenarios prove the heartbeat is
what saves a slow user without removing it in production.
Heartbeat is opportunistic UX polish, not a security boundary. A
truly absent user (closed tab, walked away long enough) still ends
up on whatever expiry page exists today; the clean-exit work
tracked separately in the same design doc gives those pages a
working path back to the OAuth client.
Tests:
* Unit: 6 cases on the route covering no-cookie / dead-flow /
happy-path / dead-PAR / operational-error / Cache-Control.
* Toggle: 9 cases covering heartbeatEnabledFor() precedence
(production default, EPDS_TEST_HOOKS gating, no_heartbeat=1
interaction) plus rendered-HTML smoke tests for both the
login page and the recovery OTP form.
* No e2e proof yet — that lands when the docker-compose stack
is rebuilt against this branch (the running stack belongs to
another worktree on origin/main).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add a new @par-heartbeat scenario that proves the in-page heartbeat reaches /auth/ping and that /auth/ping forwards to pds-core's /_internal/ping-request without manual intervention. The scenario loads the OTP form, then synchronously invokes the same fetch the 3-min setInterval would fire — same origin, same cookies, same browser security boundary — and asserts ok:true on the response. This is wiring liveness; routing logic is covered by the unit tests in heartbeat.test.ts. The existing four @otp-and-par-expiry scenarios remain RED — they hard-delete the PAR row via /_internal/test/delete-par, which the heartbeat cannot revive. They turn GREEN when the clean-exit work (layer 2 in docs/design/par-expiry-and-clean-exits.md) gives the dead-PAR pages a working path back to the OAuth client. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ure instead of stranding the user
Every dead-end auth-service / pds-core page that previously rendered a
static "Session expired" or "Authentication failed" page with no
recoverable action now bounces the user back to the OAuth client's
registered redirect_uri with the RFC 6749 §4.1.2.1 error parameters
(`error`, `error_description`, `iss`, and `state` when available).
The client's own UI then handles retry. See
docs/design/par-expiry-and-clean-exits.md for the audit of the 12
call sites this covers and the three failure clusters they
collapse into.
White-boxing budget: zero net new pds-core internal endpoints, zero
new requestManager / provider access. Auth-service resolves OAuth
client metadata via the existing @certified-app/shared
resolveClientMetadata() helper to recover redirect_uris[0] when the
PAR row has died. Pds-core's existing handleCallbackError() is
extended to do the same thing on the /oauth/epds-callback path
when Step 2 itself throws (the row is gone before its
.parameters.redirect_uri can be captured); the only new piece of
trust-boundary plumbing is an additional `client_id` field on the
HMAC-signed callback URL so pds-core has a clientId in scope. The
field is signed so an attacker cannot tamper to redirect a victim's
flow at a different OAuth client.
Implementation surface:
* New auth-service helpers: `lib/clean-exit.ts` (response emitter
that prefers redirect-to-client and falls back to a styled
Start Over page) and `lib/redirect-to-client-error.ts` (URL
builder for the redirect path).
* `complete.ts`, `choose-handle.ts`: replaced six renderError
dead-ends with cleanExit() calls. flow.clientId is included in
the signed callback so pds-core's catch block can recover it.
* pds-core `lib/epds-callback-error.ts`: now async; gains a new
`signedClientId` opt that triggers a metadata-resolved redirect
fallback when no capturedRedirectUri was stashed.
* `@certified-app/shared`:
- `crypto.ts` `CallbackParams` adds an optional `client_id`
field (signed via the same '' sentinel pattern as `handle`).
- `client-metadata.ts` adds `redirect_uris` to the
`ClientMetadata` interface.
- `render-error.ts` accepts an optional `startOverHref` so the
Start Over fallback page can show a working button when a
client URL is in scope.
Tests:
* 7 new pds-core tests for the signedClientId fallback path
(happy redirect, missing redirect_uris, lookup throws,
malformed URL, captured-vs-signed precedence, no signedClientId
no-op). resolveClientMetadata is mocked at the module
boundary; existing handleCallbackError tests were converted
from sync to async.
* 4 new shared crypto tests pinning the signed-client_id
contract (round-trip, tamper rejection, undefined sentinel,
omit-vs-present asymmetry).
* The four @otp-and-par-expiry e2e scenarios go GREEN: they now
assert "browser lands back at the demo client with an auth
error" (the demo translates `error=access_denied` to
`?error=auth_failed`) instead of asserting full happy-path
completion. The hard-deleted PAR cannot be revived; clean-exit
is the contract those scenarios exercise.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Update the Status section to reflect that all four working layers (reproduction scenarios, audit, heartbeat, clean-exit) have landed. The four originally-RED @otp-and-par-expiry scenarios are now GREEN under the clean-exit contract — they assert that the user lands back at the OAuth client with an auth error rather than being stranded on a static page. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…EME takes effect The demo's OAuth client metadata route was statically prerendered at `next build` time, so its `brand_color`, `background_color`, and `branding.css` fields were frozen at the defaults regardless of the container's runtime EPDS_CLIENT_THEME. That broke the @client-branding e2e scenarios on local docker-compose runs and made the two-demo-clients split in docker-compose.yml (trusted vs untrusted, distinguished by EPDS_CLIENT_THEME) a no-op. Add `export const dynamic = 'force-dynamic'` to the route, matching the sibling `page.tsx` which already carries the same flag for the same reason. Each request now re-reads `EPDS_CLIENT_THEME` so the trusted demo serves clay's palette and the untrusted demo serves the defaults — the visible distinction the compose file was always trying to produce. No code changes anywhere else — the theme primitives in packages/demo/src/lib/theme.ts already supported this; the route just needed the dynamic flag to consult them at request time. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Claude Code Review
This repository is configured for manual code reviews. Comment @claude review to trigger a review and subscribe this PR to future pushes, or @claude review once for a one-time review.
Tip: disable this comment in your organization's Code Review settings.
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
🦋 Changeset detectedLatest commit: 160f20d The changes in this PR will be included in the next version bump. Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
📝 WalkthroughWalkthroughAdds an opportunistic PAR heartbeat (server /auth/ping + client polling) to reduce slow-user expiries, threads ChangesPAR Heartbeat and Clean-Exit Recovery
Sequence Diagram(s)sequenceDiagram
participant User as User/Browser
participant Auth as Auth Service
participant Ping as Auth Service /auth/ping
participant PDS as PDS (PAR)
participant Client as OAuth Client
User->>Auth: GET /oauth/authorize → render OTP (heartbeatEnabled=true)
loop every 3min while OTP active
User->>Ping: fetch /auth/ping (with cookie)
Ping->>PDS: pingParRequest(requestUri)
PDS-->>Ping: success / failure (status)
Ping-->>User: {ok:true} or {ok:false, reason:...}
end
User->>Auth: submit OTP → /auth/complete
Auth->>PDS: bridge to PAR /oauth/epds-callback
alt PAR expired or callback error
Auth->>Auth: handleCallbackError() (try captured redirect → try metadata redirect)
Auth->>Client: 303 redirect ?error=...
else redirect construction fails
Auth-->>User: Styled HTML error with optional "Return to sign in"
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related issues
Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
🚅 Deployed to the ePDS-pr-154 environment in ePDS
|
Coverage Report for CI Build 25334769474Coverage increased (+2.4%) to 55.53%Details
Uncovered Changes
Coverage Regressions17 previously-covered lines in 4 files lost coverage.
Coverage Stats
💛 - Coveralls |
There was a problem hiding this comment.
Pull request overview
This PR hardens the ePDS OAuth sign-in flow against PAR expiry by adding a browser-driven heartbeat during OTP entry and by redirecting users back to the OAuth client with standard error parameters when recovery is no longer possible. It also includes a small demo fix so runtime theme settings affect the published client metadata.
Changes:
- Added
/auth/pingplus OTP/recovery-page heartbeat wiring to keep PARs alive while users wait for email codes. - Reworked dead-end expiry/error paths to use clean exits back to OAuth clients, including signed
client_idfallback handling in/oauth/epds-callback. - Added tests, e2e coverage, design docs, and a demo metadata route fix for runtime theming.
Reviewed changes
Copilot reviewed 27 out of 27 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| packages/shared/src/render-error.ts | Adds optional “Start over” CTA rendering to shared HTML error pages. |
| packages/shared/src/crypto.ts | Extends signed callback payload with optional client_id. |
| packages/shared/src/client-metadata.ts | Documents/supports redirect_uris in shared client metadata typing. |
| packages/shared/src/tests/crypto.test.ts | Adds callback-signing tests for client_id sentinel behavior. |
| packages/pds-core/src/lib/epds-callback-error.ts | Adds async fallback redirect logic using signed client_id metadata lookup. |
| packages/pds-core/src/index.ts | Threads client_id through callback verification and error handling. |
| packages/pds-core/src/tests/epds-callback-error.test.ts | Expands callback error-path coverage for signed-client fallback. |
| packages/demo/src/app/client-metadata.json/route.ts | Forces runtime rendering for demo client metadata/theme fields. |
| packages/auth-service/src/routes/recovery.ts | Wires heartbeat flag into recovery OTP rendering. |
| packages/auth-service/src/routes/preview.ts | Disables heartbeat in preview-only surfaces. |
| packages/auth-service/src/routes/login-page.ts | Adds heartbeat toggle logic and OTP-page keepalive script. |
| packages/auth-service/src/routes/heartbeat.ts | Introduces /auth/ping route to proxy PAR keepalive server-side. |
| packages/auth-service/src/routes/complete.ts | Replaces dead-end errors with clean-exit behavior and signs client_id. |
| packages/auth-service/src/routes/choose-handle.ts | Replaces handle-flow dead ends with clean-exit behavior and signs client_id. |
| packages/auth-service/src/lib/render-error.ts | Threads new start-over options through auth-service error renderer. |
| packages/auth-service/src/lib/redirect-to-client-error.ts | Adds helper to build OAuth error redirects from client metadata. |
| packages/auth-service/src/lib/clean-exit.ts | Centralizes redirect-or-fallback clean-exit response handling. |
| packages/auth-service/src/index.ts | Mounts the new heartbeat router. |
| packages/auth-service/src/tests/login-page.test.ts | Updates login page test fixtures for heartbeat option. |
| packages/auth-service/src/tests/heartbeat.test.ts | Adds unit tests for /auth/ping. |
| packages/auth-service/src/tests/heartbeat-toggle.test.ts | Adds tests for heartbeat toggle/rendering behavior. |
| features/passwordless-authentication.feature | Adds e2e scenarios for PAR expiry clean exits and heartbeat wiring. |
| e2e/step-definitions/auth.steps.ts | Adds steps for PAR capture/expiry and clean-exit assertions. |
| docs/design/par-expiry-and-clean-exits.md | Documents timer interactions, failure clusters, and implementation choices. |
| .changeset/par-heartbeat-keeps-slow-signins-alive.md | User-facing changeset for heartbeat behavior. |
| .changeset/demo-client-metadata-runtime-theme.md | Operator-facing changeset for demo runtime theming fix. |
| .changeset/clean-exit-on-expired-signin.md | User/client-facing changeset for clean-exit behavior. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Actionable comments posted: 5
🧹 Nitpick comments (3)
packages/pds-core/src/lib/epds-callback-error.ts (1)
104-208: 🏗️ Heavy liftRefactor
handleCallbackErrorto reduce branching complexity.Line 104 currently packs classification, logging, two redirect tiers, and HTML fallback into one path-heavy function. Splitting tier logic into small helpers will improve testability and address the complexity gate.
Refactor sketch
-export async function handleCallbackError(opts: HandleCallbackErrorOpts): Promise<void> { - // classify + log + tier1a + tier1b + html fallback -} +function buildErrorRedirect( + redirectUri: string, + code: 'access_denied' | 'server_error', + description: string, + pdsUrl: string, + state?: string, +): URL | null { /* parse + set params + return null on invalid */ } + +async function tryRedirectFromSignedClient( + signedClientId: string, + code: 'access_denied' | 'server_error', + description: string, + pdsUrl: string, + logger: Pick<Logger, 'warn' | 'error'>, +): Promise<string | null> { /* resolve metadata + return redirect uri */ } + +export async function handleCallbackError(opts: HandleCallbackErrorOpts): Promise<void> { + // 1) classify+log + // 2) tier1a via helper + // 3) tier1b via helper + // 4) html fallback +}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/pds-core/src/lib/epds-callback-error.ts` around lines 104 - 208, handleCallbackError is too branchy—split the Tier-1a and Tier-1b redirect logic and the logging into small helpers to reduce complexity and improve testability: extract the captured-redirect flow into a helper (e.g., handleCapturedRedirect(capturedRedirectUri, capturedState, pdsUrl, res, logger, code, description)) that validates the URL, sets Cache-Control, appends query params and performs the 303 redirect; extract the signed-client fallback into another helper (e.g., handleSignedClientFallback(signedClientId, pdsUrl, res, logger, code, description, resolveClientMetadata)) that resolves metadata, validates the fallback redirect URI and redirects or returns a signal to fall back to HTML; keep classifyCallbackError usage and the expiry/error logging in the main function but call these helpers and short-circuit on successful redirect so the remaining HTML fallback is reached only when helpers return false or throw handled errors.docs/design/par-expiry-and-clean-exits.md (2)
201-210: ⚡ Quick winAlign the dead-end target table with the final contract.
Line 203 and Line 209 still describe fallback paths (
/oauth/authorizerestart and/_internal/par...-style lookup/defer) that conflict with the resolved decisions later in this doc (Line 257-260, Line 293-313). Please update this table to the final agreed behavior so future work doesn’t reintroduce rejected paths.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/design/par-expiry-and-clean-exits.md` around lines 201 - 210, Update the dead-end target table rows to match the final contract decisions described later in the doc: replace any mentions of falling back to restarting via `/oauth/authorize` and any `/_internal/par…`-style lookup/defer behavior with the resolved behavior (redirect to client with appropriate error when redirect_uri is recoverable; otherwise show Start Over) for the affected entries—specifically update the `/auth/complete` (no cookie / no flow / better-auth session error) rows and the `/auth/choose-handle` rows (no cookie/no flow, session errors, no session user, PAR ping fails on render/POST) so the table’s Target column exactly mirrors the decisions in the later contract text.
340-354: ⚡ Quick winMake the status block explicitly time-scoped.
This section uses “now GREEN” plus hard counts and commit claims, which will stale quickly in a living doc. Add an “as of ” stamp or CI run reference to preserve trustworthiness.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/design/par-expiry-and-clean-exits.md` around lines 340 - 354, The status block currently uses ephemeral phrases like "now GREEN" and hard counts/commit mentions (e.g., commits `b1fc940`, `2e4d327`, test tags `@par-heartbeat`, `@otp-and-par-expiry`) which will age; update the paragraph to be explicitly time-scoped by appending an "as of <date>" stamp or a CI run identifier (e.g., a specific workflow run ID or timestamp) after the status statement, and where you reference counts of scenarios/tests/commits keep the same details but add the same date/CI reference to each claim so readers can verify the state against that snapshot.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/design/par-expiry-and-clean-exits.md`:
- Around line 265-268: Update the wording around `state` in the PAR expiry
paragraph to correct the RFC claim: explicitly state that RFC 6749 §4.1.2.1
requires `state` in the error response if it was present in the authorization
request, but because the PAR expiry loses the original `state` (it lived in the
PAR) the implementation pragmatically degrades and issues an error redirect
without `state`; clarify this is a limitation of our system (PAR expiry) and not
permitted by the spec, and note that clients will treat such redirects as
anonymous errors and restart.
In `@packages/auth-service/src/lib/clean-exit.ts`:
- Around line 115-124: resolveClientHome currently returns metadata.client_uri
verbatim which can lead to unsafe schemes being rendered as startOverHref;
update resolveClientHome to validate and sanitize metadata.client_uri by
attempting to parse it with the URL constructor, ensure the parsed URL.protocol
is only "http:" or "https:" and that it has a hostname (or other required
components), and only return metadata.client_uri when it passes these checks;
otherwise fall back to the existing URL(clientId).origin logic or return null.
Apply this validation inside resolveClientHome before returning
metadata.client_uri so startOverHref only ever receives safe http/https URLs.
In `@packages/auth-service/src/lib/redirect-to-client-error.ts`:
- Around line 89-109: The helper currently accepts any URL scheme because it
only checks new URL(redirectUri); after constructing the URL in the try block
(or immediately after parsing), validate that url.protocol is either 'http:' or
'https:' and if not, log a warning (same shape as the existing logger.warn
including err/clientId/redirectUri context) and return null; update the logic
around the try/catch in redirect-to-client-error.ts so non-web schemes (e.g.
'javascript:') are rejected before setting searchParams and returning
url.toString().
In `@packages/auth-service/src/routes/heartbeat.ts`:
- Around line 60-68: The current heartbeat handler treats every pingParRequest()
failure as 'par_expired', which makes transient failures terminal; update the
error handling in the block that calls pingParRequest(flow.requestUri, pdsUrl,
internalSecret) so that you only set HeartbeatResponse.reason = 'par_expired'
when ping.status === 410 (or the explicit indicator of an expired PAR), and for
other failures (network timeouts, 5xx, non-410 status, or ping.err present)
return a non-terminal reason such as 'par_unreachable' or 'transient_error' (and
keep logging the details with logger.debug({ status: ping.status, err: ping.err,
flowId }, ...)). Ensure the response shape (HeartbeatResponse) is updated if
necessary to accept the new reason values and do not change the flow state to
terminal in the non-410 cases.
In `@packages/shared/src/render-error.ts`:
- Around line 66-68: The current startOverHtml construction renders any escaped
startOverHref (via escapeHtml) but doesn't validate the URL scheme, allowing
dangerous schemes like javascript:; update the logic that builds startOverHtml
so it only renders a link when startOverHref is a valid absolute URL whose
protocol is exactly "http:" or "https:" (use the URL constructor in a try/catch
to parse and verify url.protocol === 'http:' || url.protocol === 'https:'),
otherwise set startOverHtml to an empty string; reference the
startOverHtml/startOverHref/startOverLabel variables and the existing escapeHtml
call when applying this validation.
---
Nitpick comments:
In `@docs/design/par-expiry-and-clean-exits.md`:
- Around line 201-210: Update the dead-end target table rows to match the final
contract decisions described later in the doc: replace any mentions of falling
back to restarting via `/oauth/authorize` and any `/_internal/par…`-style
lookup/defer behavior with the resolved behavior (redirect to client with
appropriate error when redirect_uri is recoverable; otherwise show Start Over)
for the affected entries—specifically update the `/auth/complete` (no cookie /
no flow / better-auth session error) rows and the `/auth/choose-handle` rows (no
cookie/no flow, session errors, no session user, PAR ping fails on render/POST)
so the table’s Target column exactly mirrors the decisions in the later contract
text.
- Around line 340-354: The status block currently uses ephemeral phrases like
"now GREEN" and hard counts/commit mentions (e.g., commits `b1fc940`, `2e4d327`,
test tags `@par-heartbeat`, `@otp-and-par-expiry`) which will age; update the
paragraph to be explicitly time-scoped by appending an "as of <date>" stamp or a
CI run identifier (e.g., a specific workflow run ID or timestamp) after the
status statement, and where you reference counts of scenarios/tests/commits keep
the same details but add the same date/CI reference to each claim so readers can
verify the state against that snapshot.
In `@packages/pds-core/src/lib/epds-callback-error.ts`:
- Around line 104-208: handleCallbackError is too branchy—split the Tier-1a and
Tier-1b redirect logic and the logging into small helpers to reduce complexity
and improve testability: extract the captured-redirect flow into a helper (e.g.,
handleCapturedRedirect(capturedRedirectUri, capturedState, pdsUrl, res, logger,
code, description)) that validates the URL, sets Cache-Control, appends query
params and performs the 303 redirect; extract the signed-client fallback into
another helper (e.g., handleSignedClientFallback(signedClientId, pdsUrl, res,
logger, code, description, resolveClientMetadata)) that resolves metadata,
validates the fallback redirect URI and redirects or returns a signal to fall
back to HTML; keep classifyCallbackError usage and the expiry/error logging in
the main function but call these helpers and short-circuit on successful
redirect so the remaining HTML fallback is reached only when helpers return
false or throw handled errors.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: f5e441cc-a167-49f6-8eb6-7ed93ec7bd77
📒 Files selected for processing (27)
.changeset/clean-exit-on-expired-signin.md.changeset/demo-client-metadata-runtime-theme.md.changeset/par-heartbeat-keeps-slow-signins-alive.mddocs/design/par-expiry-and-clean-exits.mde2e/step-definitions/auth.steps.tsfeatures/passwordless-authentication.featurepackages/auth-service/src/__tests__/heartbeat-toggle.test.tspackages/auth-service/src/__tests__/heartbeat.test.tspackages/auth-service/src/__tests__/login-page.test.tspackages/auth-service/src/index.tspackages/auth-service/src/lib/clean-exit.tspackages/auth-service/src/lib/redirect-to-client-error.tspackages/auth-service/src/lib/render-error.tspackages/auth-service/src/routes/choose-handle.tspackages/auth-service/src/routes/complete.tspackages/auth-service/src/routes/heartbeat.tspackages/auth-service/src/routes/login-page.tspackages/auth-service/src/routes/preview.tspackages/auth-service/src/routes/recovery.tspackages/demo/src/app/client-metadata.json/route.tspackages/pds-core/src/__tests__/epds-callback-error.test.tspackages/pds-core/src/index.tspackages/pds-core/src/lib/epds-callback-error.tspackages/shared/src/__tests__/crypto.test.tspackages/shared/src/client-metadata.tspackages/shared/src/crypto.tspackages/shared/src/render-error.ts
This comment has been minimized.
This comment has been minimized.
…e + Blacksmith) A bundle of small, mechanically-independent fixes raised on PR #154: * heartbeat: only treat a 404 from pds-core's /_internal/ping-request as terminal `par_expired`. Anything else (5xx, network blip, thrown error with no status) is reported as `transient` so the browser keeps polling and the next tick can recover. Pre-fix, a single dropped packet during otherwise-healthy operation would terminate keepalive permanently and re-introduce the very dead-end the heartbeat exists to prevent. (CodeRabbit, Copilot) * URL scheme validation on every href that ends up rendered into a page (`renderError` startOverHref, `clean-exit.ts`'s `resolveClientHome`, `redirect-to-client-error.ts`'s redirect target). escapeHtml does not neutralise `javascript:` URLs; defence-in-depth rejects everything but http:/https: at parse time. atproto's @atproto/oauth-provider already validates redirect_uris at PAR creation, so production hits should be unreachable — but the catch path exists precisely to spare the user a 500, and an unhandled exotic scheme would defeat that. (CodeRabbit x3) * `heartbeatEnabledFor` now also reads `req.body.no_heartbeat`, making the test toggle work symmetrically across the recovery POST handlers (which strip query params from re-renders). `renderRecoveryForm` and `renderRecoveryOtpForm` propagate the flag as a hidden field when explicitly disabled, so a future @no-heartbeat-recovery scenario can prove the heartbeat is what saves a slow recovery user. The toggle has no effect outside EPDS_TEST_HOOKS=1 + the literal value '1', so this stays a no-op in production. (Copilot) * Updated `@par-callback-error` scenario to assert the new clean-exit contract (browser lands back at the demo client with an auth error) instead of the static-HTML fallback. With `client_id` now flowing through the signed callback URL, pds-core's catch block resolves the client's metadata and redirects, so the static HTML fallback is only reached in the truly-no-clientId case which this scenario does not exercise. Removed the two now-orphan step defs (`the response body is not raw JSON`, `the response body explains that sign-in timed out`). (Blacksmith CI) * SonarQube S5332: changed `http://core:3000` to `https://` in the heartbeat unit test fixture. The literal flows from process.env to the mocked `pingParRequest` and back as a positional argument we assert on — never issued as a real request — but the lint rule doesn't introspect, and using https everywhere costs us nothing. * Trimmed the clean-exit changeset to End-users-only. The change has no client-developer action item; spec-compliant OAuth libraries already handle the error response shape we now emit consistently. Cross-references #150, #151 inline. * Design doc fixes: corrected the RFC 6749 §4.1.2.1 wording on `state` (the spec REQUIRES it on error redirects when present on the request; missing-state on the dead-PAR path is a pragmatic degradation, not spec-permitted), and added a note above the dead-end-page table flagging that some Target cells speculated about approaches that the resolved decisions replaced. (CodeRabbit x2) 927 unit tests pass (+ 4 new toggle tests + 1 new transient-on-thrown test). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
(comment generated by Claude Code via the Review feedback resolved in 9fb0e69Inline threads addressed (see per-thread replies for details):
Other CI surfaces:
CodeRabbit review-summary nitpicks NOT applied:
927 unit tests pass; |
…passes out of the box The @client-branding e2e scenarios assert that the trusted demo's published OAuth metadata advertises a non-default brand_color and branding.css, and pin the assertion to the amber palette's #1a1208 background. That only works if `EPDS_CLIENT_THEME` is set to a name that exists in `packages/demo/src/lib/theme.ts`'s `presets` map — currently `ocean` or `amber`. Without it, getTheme() returns null and the metadata falls back to the unbranded defaults. The previous .env.example had `EPDS_CLIENT_THEME=amber` commented out under a "Required (uncomment) to run the client-branding e2e feature locally" note. That worked for fresh contributors who read the comment, but it left the docker-compose stack failing six @client-branding scenarios out of the box for anyone who didn't. Uncomment the default so `cp packages/demo/.env.example packages/demo/.env` (which `scripts/setup.sh` does for fresh checkouts) leaves the trusted demo themed, and the trusted-vs- untrusted split that docker-compose.yml has always tried to produce finally ends up visible. The companion change in 8bcb344 (force-dynamic on the metadata route) is a prerequisite — without it `next build` baked the unset-env state into the static prerender and `EPDS_CLIENT_THEME` at runtime was a no-op. Both changes ship together; no separate changeset for either one (operator-noticeable behaviour combined under the existing feature changesets). Drops the demo-client-metadata-runtime-theme changeset from 8bcb344 — operator-facing detail of the dev-loop fix is captured in CLAUDE / git log; release notes don't need it. Verified locally: curl -sS https://demo.epds-poc1.test.certified.app/client-metadata.json \ | jq '.brand_color, (.branding != null)' # "#f59e0b" (amber primary, was "#2563eb" default) # true (branding object now present, was absent) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 27 out of 27 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
When the user's OTP has expired (waited >10 min, or made the same
mistake the original PR-154 reporter did) the verify form'd surface
the message "Invalid or expired code. Please try again." inside the
red error banner — but the actual Resend button lives further down
the form, below a separate "Back to email" link. Easy to miss.
Inline a "Send a new code" action right inside the error banner so
the recovery is one click from where the user's eyes already are.
Implementation:
* `showErrorWithAction(msg, label, onClick)` builds the banner
DOM imperatively. The label uses `.textContent` (not
`.innerHTML`), so a reflected error string can never inject
HTML. The inline button has no styling beyond underline + cursor
so it reads as part of the error sentence.
* The verify-form submit handler runs the better-auth error
string through `/expir|too long/i` to decide whether to surface
the inline action. The substring catches better-auth's "Invalid
or expired code", auth-service's "OTP expired", and any future
wording with "expir" or "too long" in it. Non-expired errors
(typos, brute-force lockout) keep the plain showError path so
a plain "Invalid code" doesn't carry an inappropriate "Send a
new code" CTA.
* Click target is the existing `#btn-resend` element, so all the
rate-limiting, mailpit-clear, and OTP-state-bookkeeping the
Resend button already does is reused — the inline action is
purely an alternate entry point.
4 new tests pin the contract: textContent-only label sink (XSS
guard), the substring regex shape, the action label and click
target, and the non-expired fallback to plain showError.
Triggered by a user report against PR #154's Railway preview: the
basic clean-exit path was working (browser landed on demo with
`?error=auth_failed`) but the user reasonably read the demo's
generic "Authentication failed" banner as a dead end. Surfacing
the Resend earlier prevents the user from getting that far in the
first place when their OTP is the only thing that died.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
🧹 Nitpick comments (1)
packages/auth-service/src/routes/recovery.ts (1)
470-494: 💤 Low valueHeartbeat scripts are duplicated between render functions.
The PAR heartbeat script in
renderRecoveryOtpForm(lines 470-494) is nearly identical to the one inrenderRecoveryForm(lines 359-385). Consider extracting a shared helper function that generates the script string, accepting the heartbeat flag as a parameter.That said, given these are self-contained inline scripts and the duplication is local to this file, this is a minor observation that can be deferred.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/auth-service/src/routes/recovery.ts` around lines 470 - 494, Extract the duplicated PAR heartbeat inline script into a single helper function (e.g., getParHeartbeatScript or createParHeartbeatScript) that accepts the heartbeat flag (previously using JSON.stringify(opts.heartbeatEnabled)) and returns the same script string; update renderRecoveryOtpForm and renderRecoveryForm to call this helper instead of inlining the script so the ping/stop logic, setInterval, beforeunload handler, and form submit listeners remain identical and behavior is unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@packages/auth-service/src/routes/recovery.ts`:
- Around line 470-494: Extract the duplicated PAR heartbeat inline script into a
single helper function (e.g., getParHeartbeatScript or createParHeartbeatScript)
that accepts the heartbeat flag (previously using
JSON.stringify(opts.heartbeatEnabled)) and returns the same script string;
update renderRecoveryOtpForm and renderRecoveryForm to call this helper instead
of inlining the script so the ping/stop logic, setInterval, beforeunload
handler, and form submit listeners remain identical and behavior is unchanged.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: bd303926-3b7f-4bff-8374-d1ecb0c99fd2
📒 Files selected for processing (18)
.changeset/clean-exit-on-expired-signin.mdpackages/auth-service/src/__tests__/build-epds-callback-url.test.tspackages/auth-service/src/__tests__/clean-exit.test.tspackages/auth-service/src/__tests__/heartbeat-toggle.test.tspackages/auth-service/src/__tests__/login-page.test.tspackages/auth-service/src/__tests__/redirect-to-client-error.test.tspackages/auth-service/src/__tests__/render-error.test.tspackages/auth-service/src/lib/clean-exit.tspackages/auth-service/src/routes/complete.tspackages/auth-service/src/routes/login-page.tspackages/auth-service/src/routes/recovery.tspackages/pds-core/src/__tests__/epds-callback-error.test.tspackages/pds-core/src/lib/epds-callback-error.tspackages/shared/src/__tests__/render-error.test.tspackages/shared/src/__tests__/start-over-href.test.tspackages/shared/src/index.tspackages/shared/src/start-over-href.tsvitest.config.ts
✅ Files skipped from review due to trivial changes (2)
- .changeset/clean-exit-on-expired-signin.md
- vitest.config.ts
🚧 Files skipped from review as they are similar to previous changes (3)
- packages/auth-service/src/tests/heartbeat-toggle.test.ts
- packages/auth-service/src/routes/login-page.ts
- packages/pds-core/src/tests/epds-callback-error.test.ts
This comment has been minimized.
This comment has been minimized.
CI's e2e suite failed on `Too many failed OTP attempts locks out the token` and the related lockout scenario after 3d31876 added the inline "Send a new code" action to the OTP-expired error banner. The `'the verification form shows an "OTP expired" error'` step asserted `toHaveText(expected)` (exact equality), so the appended action label broke the match. Switch to `toContainText`. The expected string ("OTP expired", "locked out", etc.) still has to be present in the banner — we just no longer fail when extra inline UI sits next to it. Local 4 @otp-and-par-expiry scenarios still pass; the brute-force lockout scenarios that broke on CI use the same step. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…n-in A user reported on PR #154's Railway preview: starts flow3 with prompt=login, waits 11 min on the OTP form, clicks Resend, gets a fresh OTP, types it in, lands on the demo with "Authentication failed". Technically the clean-exit was working (lands at the demo with `?error=auth_failed`, can retry), but the experience is a lie: we offered Resend as if it would complete the flow, accepted their fresh code, then told them sign-in failed. Fix: don't pretend Resend will work when the upstream PAR is already dead. * **`GET /auth/abort` route** (auth-service). Browser-driven clean exit. Reads the auth_flow cookie, runs the same `cleanExit()` helper as the unrecoverable-error paths in /auth/complete and /auth/choose-handle: redirect to the OAuth client's redirect_uri with `error=access_denied`, or fall back to the styled "Return to sign in" page when the client is unknown. Cookie is cleared because the flow is being abandoned. * **Proactive notice on the OTP form.** When the heartbeat's /auth/ping returns par_expired / flow_expired / no_cookie, the form replaces its current state with: "Your sign-in has timed out. The code we sent will no longer work. Start sign-in again from the app you came from." OTP boxes, Resend, Back, and Verify are all disabled. A single Start over button navigates to /auth/abort. No automatic redirect — the user makes the choice. Clicking elsewhere on the page (e.g. the Powered by Certified link) still works. * **Reactive abort gates** on the Resend click and Verify submit, defence in depth on top of the proactive notice. Each pings /auth/ping just-in-time and bails to /auth/abort if the flow is dead. Catches the race where the user beats the proactive notice (heartbeat hasn't fired yet, network glitch delayed the response, etc.). Resend in particular MUST gate here — otherwise better-auth would happily issue an OTP that cannot complete the flow. * Existing pre-PR-154 scenarios `@otp-and-par-expiry` and `@multiple-resend` updated to match the new contract: the user no longer sees the misleading "OTP expired → Resend → fresh OTP → fails" cycle. Both scenarios still assert "lands at demo with auth_failed" — same end state, fewer intermediate UI steps because the abort gate fires earlier. Tests: * 4 new tests on /auth/abort: cleanExit invocation shape, cookie clearing, null-clientId paths. * 7 new login-page render-shape tests covering showFlowAbortedNotice (idempotence, every form control gets disabled, textContent-only label sink, /auth/abort target), the proactive heartbeat trigger, and the reactive Resend / Verify gates running BEFORE the action they protect. * 1 new e2e scenario `@resend-after-par-dead` proving the contract: PAR dead, click Resend → land at demo with `auth_failed`, no fresh OTP email is sent. 1017 unit tests pass. All 7 OTP/PAR e2e scenarios green locally. Updated docs/design/par-expiry-and-clean-exits.md with the new "Bug found post-PR-154" section explaining what broke, why heartbeat alone wasn't enough, and the proactive-notice + reactive-gate design. Updated the clean-exit changeset to mention the new behaviour (rather than adding a separate one — same user-facing story). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…eDoS-prone regex SonarCloud's quality gate failed on PR #154 with two issues introduced by 86cd60f: * Duplicated lines (3.8% > 3% threshold). The new auth-abort.test.ts shared 24 lines of buildApp + listen-on-ephemeral-port boilerplate with the existing heartbeat.test.ts — both test the same router with the same ctx shape. * Security hotspot S5852 on login-page.test.ts:688. The `[\s\S]*?` between two literal anchors creates a super-linear-runtime regex (ReDoS) — fine for a one-off runtime check on a small string but flagged because the same shape would be unsafe on user input. Fixes: * Extract `buildHeartbeatApp` + `harnessGet` helpers into `__tests__/__helpers__/heartbeat-router-harness.ts`. Both test files now import them. Vitest's test pattern is `*.test.ts`, so the helper isn't picked up as a test suite. Coverage excludes already cover `__tests__/**`. * Replace the ReDoS-prone regex with a substring-based slice + two `toContain` checks. Same assertion shape (guard runs, stopHeartbeat + showFlowAbortedNotice both called), no backtracking risk. 1017 unit tests still pass; 11 of those run against the new shared harness (5 auth-abort + 6 heartbeat). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@packages/auth-service/src/routes/login-page.ts`:
- Around line 790-820: The showFlowAbortedNotice function currently disables OTP
inputs and buttons but leaves the `#recovery-link` anchor interactive; hide the
recovery link instead of trying to disable it so nothing remains clickable after
abort. In showFlowAbortedNotice (the function that sets flowAborted and disables
otpBoxes, btn-resend, btn-back, and the form submit), locate
document.getElementById('recovery-link') and set its style.display = 'none'
(consistent with other visibility changes in the IIFE) so the anchor is removed
from view/clickability when the abort notice is shown. Ensure the change is only
applied inside the flowAborted guard so the link remains available otherwise.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 4b21344f-5df5-48a0-a8cf-174da5f43c08
📒 Files selected for processing (8)
.changeset/clean-exit-on-expired-signin.mddocs/design/par-expiry-and-clean-exits.mde2e/step-definitions/auth.steps.tsfeatures/passwordless-authentication.featurepackages/auth-service/src/__tests__/auth-abort.test.tspackages/auth-service/src/__tests__/login-page.test.tspackages/auth-service/src/routes/heartbeat.tspackages/auth-service/src/routes/login-page.ts
✅ Files skipped from review due to trivial changes (1)
- .changeset/clean-exit-on-expired-signin.md
🚧 Files skipped from review as they are similar to previous changes (2)
- packages/auth-service/src/routes/heartbeat.ts
- e2e/step-definitions/auth.steps.ts
|
@Kzoeps commented on May 5, 2026, 10:29 AM GMT+1:
What do you mean by "tamper with", exactly? I think deleting or somehow expiring the cookie would be a reasonable thing to test, but just editing to an invalid value it is not something we need to spend time worrying about. Obviously if |
yep i basically edited the cookie. Makes sense to ignore this. @aspiers
so that was kinda a work around to see if it renders |
|
Tested by Karma and myself. Unfortunately there are a million different code paths so we can't possibly cover them all, but we do already have a huge body of automated tests too. Feels like this should be several steps forwards even though uncomfortable making sizable changes at this late stage. The status quo is some bad expiries and users getting trapped in a loop, so we need to move forwards with a fix. |
Adds a second @otp-expiry scenario that backdates the auth_flow row via the existing /_internal/test/expire-auth-flow hook, then submits a still-valid OTP. After PR #154's reactive abort gate the OTP form pings /auth/ping before submitting; with the auth_flow row dead the ping reports `flow_expired`, the gate navigates to /auth/abort, and cleanExit serves its Tier-2 styled "Sign-in session expired" fallback page (the OAuth client redirect path needs the dead row's clientId, which is exactly what's missing here). The scenario asserts both signals — the ping reason (proving auth_flow specifically tripped, not PAR) and the abort fallback page — so a regression that, say, swaps which timer the gate honours would still be caught. Without this guardrail nothing in CI would notice if AUTH_FLOW_TTL_MS were quietly shortened back to the OTP TTL; the existing scenario only proves the 10-min-and-resend path works, not that the 60-min boundary is enforced. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a second @otp-expiry scenario that backdates the auth_flow row via the existing /_internal/test/expire-auth-flow hook, then submits a still-valid OTP. After PR #154's reactive abort gate the OTP form pings /auth/ping before submitting; with the auth_flow row dead the ping reports `flow_expired`, the gate navigates to /auth/abort, and cleanExit serves its Tier-2 styled "Sign-in session expired" fallback page (the OAuth client redirect path needs the dead row's clientId, which is exactly what's missing here). The scenario asserts both signals — the ping reason (proving auth_flow specifically tripped, not PAR) and the abort fallback page — so a regression that, say, swaps which timer the gate honours would still be caught. Without this guardrail nothing in CI would notice if AUTH_FLOW_TTL_MS were quietly shortened back to the OTP TTL; the existing scenario only proves the 10-min-and-resend path works, not that the 60-min boundary is enforced. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a second @otp-expiry scenario that backdates the auth_flow row via the existing /_internal/test/expire-auth-flow hook, then submits a still-valid OTP. After PR #154's reactive abort gate the OTP form pings /auth/ping before submitting; with the auth_flow row dead the ping reports `flow_expired`, the gate navigates to /auth/abort, and cleanExit serves its Tier-2 styled "Sign-in session expired" fallback page (the OAuth client redirect path needs the dead row's clientId, which is exactly what's missing here). The scenario asserts both signals — the ping reason (proving auth_flow specifically tripped, not PAR) and the abort fallback page — so a regression that, say, swaps which timer the gate honours would still be caught. Without this guardrail nothing in CI would notice if AUTH_FLOW_TTL_MS were quietly shortened back to the OTP TTL; the existing scenario only proves the 10-min-and-resend path works, not that the 60-min boundary is enforced. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>




Why this PR exists
A user reported being stuck in a sign-in loop after waiting >10 minutes for their OTP, clicking Resend, and submitting the new code. Instead of completing, they hit "Your sign-in took too long…" with no recovery path.
The three timers, in plain English
If you've ever wondered why a slow sign-in can fail in surprising ways, ePDS has three independent clocks running during a single OAuth flow. They live in different layers and were built independently, which is why the original failure was confusing.
The bug was the middle clock dying silently while the top clock was still alive — so the user had a fresh email code in hand but no working OAuth ticket on the server side, and there was no path forward when they submitted.
What this PR changes
Heartbeat — while you're sitting on the OTP form (or the recovery code form), the page now keeps the OAuth ticket alive in the background. Slow inboxes and slow typists no longer blow it up. Bounded by the 60-minute "still in flight" cookie — the heartbeat can't extend anything past that.
Clean exit — if the OAuth ticket does die (closed tab, walked away too long), the user is now redirected back to the app they were signing into with a standard OAuth error, instead of being stranded on a "session expired" page with no escape. The app's own retry UI takes over from there. When the originating app can be identified but its registered redirect URI fails to resolve, the page renders with a "Return to sign in" button targeted at the app's home URL. There is one residual case — a user hitting an expired
/auth/completeURL with no cookie, no flow row, and no signed callback parameters (typically a bookmarked URL hit after the 60-min auth_flow TTL) — where ePDS has no way to know which app the user came from; that page remains text-only because there is no useful URL to construct. Could be addressed in a follow-up by an operator-supplied fallback home URL.Closes #151. Substantially addresses #150 (the dead-end at
/auth/completeis replaced with a clean redirect). Does not address #152 (email-template branding lost on browser-back) — orthogonal email-side fix.White-boxing budget
Zero new pds-core internal endpoints, zero new
requestManager/provideraccess, zero new_router.stackmanipulation. Auth-service resolves OAuth client metadata via the existing@certified-app/sharedresolveClientMetadata()helper to recoverredirect_uris[0]when the PAR row has died. Pds-core's existinghandleCallbackError()is extended in place to do the same on/oauth/epds-callback. The only new trust-boundary plumbing is an additionalclient_idfield on the HMAC-signed callback URL, signed via the same''sentinel pattern as the existing optionalhandlefield, so an attacker cannot tamper to redirect a victim's flow at a different OAuth client.See
docs/design/par-expiry-and-clean-exits.mdfor the full audit.Drive-by
fix(demo)(8bcb344+3b0ec50) — the demo's/client-metadata.jsonroute was being statically prerendered atnext buildtime, so itsbrand_color,background_color, andbranding.cssfields stayed frozen at the defaults regardless of the container's runtimeEPDS_CLIENT_THEME. That broke the@client-brandinge2e scenarios on local docker-compose runs and made the trusted-vs-untrusted demo split indocker-compose.yml(which always intended to distinguish them viaEPDS_CLIENT_THEME) a no-op. Two-line fix: addexport const dynamic = 'force-dynamic'to the route (8bcb344), and uncommentEPDS_CLIENT_THEME=amberinpackages/demo/.env.exampleso a freshsetup.sh-prepared local stack passes the branding scenarios out of the box (3b0ec50).Tests
client_idsentinel contract).@otp-and-par-expiry(basic / multiple-Resend /prompt=login/ recovery),@par-heartbeat, and the updated@par-callback-errorscenario (asserts the new clean-exit contract instead of the old "no JSON leak" assertions).Test plan
pnpm format:check— cleanpnpm lint— cleanpnpm test— 927/927 passpnpm test:e2e --tags '@otp-and-par-expiry or @par-heartbeat'— 5/5 pass against a local docker-compose stackE2E_INTERNAL_SECRET=… pnpm test:e2e --tags '@par-callback-error'— passes (was the Blacksmith failure on first push)EPDS_CLIENT_THEME=amber(now the default) — all 56 scenarios pass; the previously-failing 6@client-brandingscenarios fixed by the demo theme drive-byerror=access_deniedinstead of on a "session expired" page🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Bug Fixes
Documentation
Tests