Skip to content

Heartbeat + clean-exit so slow sign-ins recover instead of stranding the user#154

Merged
aspiers merged 21 commits intomainfrom
fix/otp-resend-after-par-expiry
May 5, 2026
Merged

Heartbeat + clean-exit so slow sign-ins recover instead of stranding the user#154
aspiers merged 21 commits intomainfrom
fix/otp-resend-after-par-expiry

Conversation

@aspiers
Copy link
Copy Markdown
Contributor

@aspiers aspiers commented May 4, 2026

Why this PR exists

A user reported being stuck in a sign-in loop after waiting >10 minutes for their OTP, clicking Resend, and submitting the new code. Instead of completing, they hit "Your sign-in took too long…" with no recovery path.

The three timers, in plain English

If you've ever wondered why a slow sign-in can fail in surprising ways, ePDS has three independent clocks running during a single OAuth flow. They live in different layers and were built independently, which is why the original failure was confusing.

Clock How long What dies if it expires
The email code 10 min The code in your email stops working. You can ask for a new one ("Resend").
The hidden OAuth ticket 5 min, refreshed each time the page is touched server-side The link between you and the app you're signing into. If this dies before you finish, the sign-in cannot complete even if your email code is still valid.
The "still in flight" cookie 60 min The browser forgets which app you were signing into.

The bug was the middle clock dying silently while the top clock was still alive — so the user had a fresh email code in hand but no working OAuth ticket on the server side, and there was no path forward when they submitted.

What this PR changes

Heartbeat — while you're sitting on the OTP form (or the recovery code form), the page now keeps the OAuth ticket alive in the background. Slow inboxes and slow typists no longer blow it up. Bounded by the 60-minute "still in flight" cookie — the heartbeat can't extend anything past that.

Clean exit — if the OAuth ticket does die (closed tab, walked away too long), the user is now redirected back to the app they were signing into with a standard OAuth error, instead of being stranded on a "session expired" page with no escape. The app's own retry UI takes over from there. When the originating app can be identified but its registered redirect URI fails to resolve, the page renders with a "Return to sign in" button targeted at the app's home URL. There is one residual case — a user hitting an expired /auth/complete URL with no cookie, no flow row, and no signed callback parameters (typically a bookmarked URL hit after the 60-min auth_flow TTL) — where ePDS has no way to know which app the user came from; that page remains text-only because there is no useful URL to construct. Could be addressed in a follow-up by an operator-supplied fallback home URL.

Closes #151. Substantially addresses #150 (the dead-end at /auth/complete is replaced with a clean redirect). Does not address #152 (email-template branding lost on browser-back) — orthogonal email-side fix.

White-boxing budget

Zero new pds-core internal endpoints, zero new requestManager / provider access, zero new _router.stack manipulation. Auth-service resolves OAuth client metadata via the existing @certified-app/shared resolveClientMetadata() helper to recover redirect_uris[0] when the PAR row has died. Pds-core's existing handleCallbackError() is extended in place to do the same on /oauth/epds-callback. The only new trust-boundary plumbing is an additional client_id field on the HMAC-signed callback URL, signed via the same '' sentinel pattern as the existing optional handle field, so an attacker cannot tamper to redirect a victim's flow at a different OAuth client.

See docs/design/par-expiry-and-clean-exits.md for the full audit.

Drive-by

  • fix(demo) (8bcb344 + 3b0ec50) — the demo's /client-metadata.json route was being statically prerendered at next build time, so its brand_color, background_color, and branding.css fields stayed frozen at the defaults regardless of the container's runtime EPDS_CLIENT_THEME. That broke the @client-branding e2e scenarios on local docker-compose runs and made the trusted-vs-untrusted demo split in docker-compose.yml (which always intended to distinguish them via EPDS_CLIENT_THEME) a no-op. Two-line fix: add export const dynamic = 'force-dynamic' to the route (8bcb344), and uncomment EPDS_CLIENT_THEME=amber in packages/demo/.env.example so a fresh setup.sh-prepared local stack passes the branding scenarios out of the box (3b0ec50).

Tests

  • 927 unit tests pass (912 baseline + 15 new across heartbeat, clean-exit, scheme validation, transient-handling, and shared crypto's new client_id sentinel contract).
  • 6 new / updated e2e scenarios all green: four @otp-and-par-expiry (basic / multiple-Resend / prompt=login / recovery), @par-heartbeat, and the updated @par-callback-error scenario (asserts the new clean-exit contract instead of the old "no JSON leak" assertions).
  • Lint, format, full unit suite all clean.

Test plan

  • pnpm format:check — clean
  • pnpm lint — clean
  • pnpm test — 927/927 pass
  • pnpm test:e2e --tags '@otp-and-par-expiry or @par-heartbeat' — 5/5 pass against a local docker-compose stack
  • E2E_INTERNAL_SECRET=… pnpm test:e2e --tags '@par-callback-error' — passes (was the Blacksmith failure on first push)
  • Full e2e on local docker-compose with EPDS_CLIENT_THEME=amber (now the default) — all 56 scenarios pass; the previously-failing 6 @client-branding scenarios fixed by the demo theme drive-by
  • CI green
  • Manual: sign in via the demo on a preview deployment, deliberately wait ~6 min on the OTP form before submitting, confirm the user is no longer stranded
  • Manual: reproduce Re-submitting email after PAR expiry produces a "Session expired" dead-end #150's exact path (timeout → browser back → re-submit email → enter OTP) and verify the user lands back at the client with error=access_denied instead of on a "session expired" page

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Sign-in timeouts now redirect back to the originating app when possible; otherwise a “Return to sign in” button (Start over) is shown.
    • In-page heartbeat keeps slow OTP/recovery sign-ins alive.
    • Inline “Send a new code” action keeps OTP inputs visible and enabled after expiry.
  • Bug Fixes

    • Resend flow no longer offers unusable sign-in codes; reduced dead-end “session expired” experiences.
  • Documentation

    • Added design doc on auth timers and clean-exit recovery.
  • Tests

    • Expanded e2e/unit coverage for heartbeat, resend, and clean-exit flows.

aspiers and others added 7 commits May 4, 2026 11:51
…(NO FIX)

Add four RED scenarios under @otp-and-par-expiry that reproduce the
"stuck in a loop" failure modes a slow user can trip during OAuth
login. All four exercise the same root cause — upstream's PAR
request_uri (5 min sliding TTL, hardcoded in @atproto/oauth-provider)
dies while the user reads their email — but they surface on three
different downstream pages depending on which call hits the dead PAR
first:

  * @otp-and-par-expiry: OTP-expired UI → Resend → submit lands on
    auth-service /auth/choose-handle's "Session expired, please start
    over" — the original bug a user reported.
  * @multiple-resend: two Resend cycles silently age out the PAR
    without ever showing the OTP-expired UI; same dead-end. Locks in
    the regression for fixes that only patch the OTP-expired branch.
  * @prompt-login: forced re-authentication with a slow user; browser
    reaches /oauth/epds-callback but never /welcome.
  * @recovery: backup-email recovery + slow user; lands on pds-core's
    "Your sign-in took too long…" page.

The existing @otp-expiry scenario deliberately keeps the PAR alive
to isolate the auth-service-side TTL fix from the PDS layer. That is
faithful to the auth-service-side regression but masks the broader
problem: in production the PAR is already dead by the time OTP
expires, since 5 min < 10 min. None of the prior expiry fixes covered
the cross-layer interaction these scenarios exercise.

Implementation:

  * One new step phrase, "the PAR request_uri is captured for later
    expiry", that snapshots the request_uri before the user navigates
    to a page where the URL no longer carries it (recovery scenario).
  * captureRequestUriFromPage() falls back to a previously stashed
    world.lastRequestUri so existing call sites still work.
  * No production-code changes — these scenarios reproduce the bug on
    origin/main and remain RED until the clean-exit fix lands.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
New living document under docs/design/ that captures the bug a real
user reported (stuck in a sign-in loop after OTP-expired + Resend),
the three-timer interaction that causes it (OTP / PAR / auth_flow),
why the existing OTP-expiry scenarios missed it, and the strategy to
fix it.

Two layers, in priority order:

  1. Clean exit — every error a slow user can trip must end with a
     working path back to the OAuth client. Audit of all 12
     renderError() call sites on the auth path is included, grouped
     into three failure clusters (redirect-when-possible /
     PAR-dead-on-handle-page / nothing-recoverable). Today every
     one of these is a dead-end static page.
  2. Heartbeat — opportunistic UX polish that pings the existing
     /_internal/ping-request endpoint while the user is on the OTP
     and recovery forms, so the common case (slow inbox / slow
     typist) doesn't hit the dead-end at all. Bounded by auth_flow's
     60-min ceiling; no new tokens, no new credentials.

Implementation will land mostly in auth-service:

  * @certified-app/shared already exposes resolveClientMetadata(),
    so cluster A/B redirects can be built without a new pds-core
    internal endpoint and without any new requestManager/provider
    access. Net new white-boxing surface: zero.
  * Heartbeat reuses the existing /_internal/ping-request endpoint;
    only the public auth-service /auth/ping wrapper is new.
  * Recovery flow's clientId is propagated via the login-page →
    recovery-link path, not by exposing expired PAR rows from
    pds-core.

Includes the resolved decisions for all three open questions (Start
Over destination, cluster B redirect feasibility, recovery clientId
propagation) — each resolved toward the contract of "minimum friction
for the end user, fewest steps to retry".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…form

Adds a server-mediated heartbeat that slides the upstream PAR
request_uri inactivity timer (atproto's
AUTHORIZATION_INACTIVITY_TIMEOUT, 5 min, hardcoded) while a user is on
the OTP form or the recovery OTP form. Without it, a user who reads
their email for >5 min trips a downstream "session expired" page on
submit even though both their OTP and auth_flow are still well within
their own (10 min and 60 min) TTLs. See
docs/design/par-expiry-and-clean-exits.md for the broader strategy.

Implementation lives entirely in auth-service:

  * New GET /auth/ping route reads the user's epds_auth_flow cookie,
    looks up the auth_flow row, and proxies the existing
    pingParRequest() helper into pds-core's existing
    /_internal/ping-request. No new pds-core endpoint, no new
    requestManager / provider access. Bounded by the auth_flow's
    own 60-min TTL — once the auth_flow row is gone, the ping
    becomes a no-op and the browser stops pinging.

  * Browser-side: a 3-min setInterval is inlined into the login
    page's OTP step and into the recovery OTP form. The interval
    starts when the OTP step becomes visible (or, for login_hint
    flows, on initial render), pauses on form submit, resumes on
    a verify failure, and stops on page unload.

  * Test toggle: a `?no_heartbeat=1` query param disables the
    inlined interval, but only when EPDS_TEST_HOOKS=1 is also set
    server-side. Lets future e2e scenarios prove the heartbeat is
    what saves a slow user without removing it in production.

Heartbeat is opportunistic UX polish, not a security boundary. A
truly absent user (closed tab, walked away long enough) still ends
up on whatever expiry page exists today; the clean-exit work
tracked separately in the same design doc gives those pages a
working path back to the OAuth client.

Tests:

  * Unit: 6 cases on the route covering no-cookie / dead-flow /
    happy-path / dead-PAR / operational-error / Cache-Control.
  * Toggle: 9 cases covering heartbeatEnabledFor() precedence
    (production default, EPDS_TEST_HOOKS gating, no_heartbeat=1
    interaction) plus rendered-HTML smoke tests for both the
    login page and the recovery OTP form.
  * No e2e proof yet — that lands when the docker-compose stack
    is rebuilt against this branch (the running stack belongs to
    another worktree on origin/main).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add a new @par-heartbeat scenario that proves the in-page heartbeat
reaches /auth/ping and that /auth/ping forwards to pds-core's
/_internal/ping-request without manual intervention. The scenario
loads the OTP form, then synchronously invokes the same fetch the
3-min setInterval would fire — same origin, same cookies, same
browser security boundary — and asserts ok:true on the response.
This is wiring liveness; routing logic is covered by the unit tests
in heartbeat.test.ts.

The existing four @otp-and-par-expiry scenarios remain RED — they
hard-delete the PAR row via /_internal/test/delete-par, which the
heartbeat cannot revive. They turn GREEN when the clean-exit work
(layer 2 in docs/design/par-expiry-and-clean-exits.md) gives the
dead-PAR pages a working path back to the OAuth client.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ure instead of stranding the user

Every dead-end auth-service / pds-core page that previously rendered a
static "Session expired" or "Authentication failed" page with no
recoverable action now bounces the user back to the OAuth client's
registered redirect_uri with the RFC 6749 §4.1.2.1 error parameters
(`error`, `error_description`, `iss`, and `state` when available).
The client's own UI then handles retry. See
docs/design/par-expiry-and-clean-exits.md for the audit of the 12
call sites this covers and the three failure clusters they
collapse into.

White-boxing budget: zero net new pds-core internal endpoints, zero
new requestManager / provider access. Auth-service resolves OAuth
client metadata via the existing @certified-app/shared
resolveClientMetadata() helper to recover redirect_uris[0] when the
PAR row has died. Pds-core's existing handleCallbackError() is
extended to do the same thing on the /oauth/epds-callback path
when Step 2 itself throws (the row is gone before its
.parameters.redirect_uri can be captured); the only new piece of
trust-boundary plumbing is an additional `client_id` field on the
HMAC-signed callback URL so pds-core has a clientId in scope. The
field is signed so an attacker cannot tamper to redirect a victim's
flow at a different OAuth client.

Implementation surface:

  * New auth-service helpers: `lib/clean-exit.ts` (response emitter
    that prefers redirect-to-client and falls back to a styled
    Start Over page) and `lib/redirect-to-client-error.ts` (URL
    builder for the redirect path).
  * `complete.ts`, `choose-handle.ts`: replaced six renderError
    dead-ends with cleanExit() calls. flow.clientId is included in
    the signed callback so pds-core's catch block can recover it.
  * pds-core `lib/epds-callback-error.ts`: now async; gains a new
    `signedClientId` opt that triggers a metadata-resolved redirect
    fallback when no capturedRedirectUri was stashed.
  * `@certified-app/shared`:
    - `crypto.ts` `CallbackParams` adds an optional `client_id`
      field (signed via the same '' sentinel pattern as `handle`).
    - `client-metadata.ts` adds `redirect_uris` to the
      `ClientMetadata` interface.
    - `render-error.ts` accepts an optional `startOverHref` so the
      Start Over fallback page can show a working button when a
      client URL is in scope.

Tests:

  * 7 new pds-core tests for the signedClientId fallback path
    (happy redirect, missing redirect_uris, lookup throws,
    malformed URL, captured-vs-signed precedence, no signedClientId
    no-op). resolveClientMetadata is mocked at the module
    boundary; existing handleCallbackError tests were converted
    from sync to async.
  * 4 new shared crypto tests pinning the signed-client_id
    contract (round-trip, tamper rejection, undefined sentinel,
    omit-vs-present asymmetry).
  * The four @otp-and-par-expiry e2e scenarios go GREEN: they now
    assert "browser lands back at the demo client with an auth
    error" (the demo translates `error=access_denied` to
    `?error=auth_failed`) instead of asserting full happy-path
    completion. The hard-deleted PAR cannot be revived; clean-exit
    is the contract those scenarios exercise.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Update the Status section to reflect that all four working layers
(reproduction scenarios, audit, heartbeat, clean-exit) have landed.
The four originally-RED @otp-and-par-expiry scenarios are now GREEN
under the clean-exit contract — they assert that the user lands
back at the OAuth client with an auth error rather than being
stranded on a static page.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…EME takes effect

The demo's OAuth client metadata route was statically prerendered at
`next build` time, so its `brand_color`, `background_color`, and
`branding.css` fields were frozen at the defaults regardless of the
container's runtime EPDS_CLIENT_THEME. That broke the
@client-branding e2e scenarios on local docker-compose runs and made
the two-demo-clients split in docker-compose.yml (trusted vs
untrusted, distinguished by EPDS_CLIENT_THEME) a no-op.

Add `export const dynamic = 'force-dynamic'` to the route, matching
the sibling `page.tsx` which already carries the same flag for the
same reason. Each request now re-reads `EPDS_CLIENT_THEME` so the
trusted demo serves clay's palette and the untrusted demo serves
the defaults — the visible distinction the compose file was always
trying to produce.

No code changes anywhere else — the theme primitives in
packages/demo/src/lib/theme.ts already supported this; the route
just needed the dynamic flag to consult them at request time.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 4, 2026 13:58
Copy link
Copy Markdown

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude Code Review

This repository is configured for manual code reviews. Comment @claude review to trigger a review and subscribe this PR to future pushes, or @claude review once for a one-time review.

Tip: disable this comment in your organization's Code Review settings.

@vercel
Copy link
Copy Markdown

vercel Bot commented May 4, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
epds-demo Ready Ready Preview, Comment May 4, 2026 6:01pm

Request Review

@changeset-bot
Copy link
Copy Markdown

changeset-bot Bot commented May 4, 2026

🦋 Changeset detected

Latest commit: 160f20d

The changes in this PR will be included in the next version bump.

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 4, 2026

📝 Walkthrough

Walkthrough

Adds an opportunistic PAR heartbeat (server /auth/ping + client polling) to reduce slow-user expiries, threads client_id through callback signing/verification, and implements a two-tier clean-exit (OAuth-spec redirect to client or styled “Return to sign in” HTML fallback). Tests, E2E scenarios, docs, and changesets were added.

Changes

PAR Heartbeat and Clean-Exit Recovery

Layer / File(s) Summary
Data Shape
packages/shared/src/crypto.ts, packages/shared/src/client-metadata.ts
CallbackParams gains optional client_id; HMAC payload/signature covers client_id. ClientMetadata adds optional redirect_uris?: string[].
Core Helpers
packages/auth-service/src/lib/clean-exit.ts, packages/auth-service/src/lib/redirect-to-client-error.ts, packages/shared/src/start-over-href.ts
Adds cleanExit() (two-tier exit: client redirect via buildClientErrorRedirect else HTML), buildClientErrorRedirect, and resolveStartOverHref/sanitiseHttpUrl.
Error Rendering
packages/auth-service/src/lib/render-error.ts, packages/shared/src/render-error.ts
renderError accepts options object (title, startOverHref, startOverLabel) and injects a validated Start Over link; CSS extended.
Heartbeat Route
packages/auth-service/src/routes/heartbeat.ts, packages/auth-service/src/index.ts
New createHeartbeatRouter(ctx) exposing GET /auth/ping returning {ok:true} or failure reasons (no_cookie/flow_expired/par_expired/transient) with Cache-Control: no-store; router mounted in server.
Route Integration
packages/auth-service/src/routes/choose-handle.ts, packages/auth-service/src/routes/complete.ts
Replaces many error-page returns with cleanExit() calls (passes clientId, code, optional fallbackStatus); centralizes signed callback URL builder buildEpdsCallbackUrl and conditionally includes client_id.
Client-Side Heartbeat & UI
packages/auth-service/src/routes/login-page.ts, packages/auth-service/src/routes/recovery.ts
Adds heartbeatEnabledFor(req) and heartbeatEnabled renderer prop; OTP/recovery pages poll /auth/ping while OTP UI active (3min); adds showErrorWithAction inline action ("Send a new code"); guards resend/verify with abortIfFlowDead() to route to /auth/abort.
Preview / Demo
packages/auth-service/src/routes/preview.ts, packages/demo/src/app/client-metadata.json/route.ts
Preview routes explicitly set heartbeatEnabled toggles; demo client-metadata route forced dynamic runtime rendering.
pds-core callback error flow
packages/pds-core/src/lib/epds-callback-error.ts, packages/pds-core/src/index.ts
handleCallbackError becomes async, accepts signedClientId, first tries captured redirect then metadata-based redirect to redirect_uris[0], else HTML fallback that may include Start Over resolved from signedClientId. /oauth/epds-callback now forwards client_id into verification/error handling.
Tests & E2E
packages/auth-service/**/__tests__/*, packages/shared/**/__tests__/*, packages/pds-core/**/__tests__/*, e2e/step-definitions/*, features/passwordless-authentication.feature
New/updated unit, integration, and E2E tests for heartbeat toggle/route, clean-exit behavior, redirect builder, Start Over helpers, callback signing with client_id, PAR-expiry clean-exit scenarios, and OTP UI assertions.
Docs / Changesets
docs/design/par-expiry-and-clean-exits.md, .changeset/*.md
Adds living design doc explaining timers, failure modes, clean-exit contract, heartbeat decisions; changesets document end-user behavior changes.
Misc / Config
vitest.config.ts, packages/demo/.env.example
Raises Vitest coverage thresholds; enables EPDS_CLIENT_THEME=amber in demo example.

Sequence Diagram(s)

sequenceDiagram
    participant User as User/Browser
    participant Auth as Auth Service
    participant Ping as Auth Service /auth/ping
    participant PDS as PDS (PAR)
    participant Client as OAuth Client

    User->>Auth: GET /oauth/authorize → render OTP (heartbeatEnabled=true)
    loop every 3min while OTP active
        User->>Ping: fetch /auth/ping (with cookie)
        Ping->>PDS: pingParRequest(requestUri)
        PDS-->>Ping: success / failure (status)
        Ping-->>User: {ok:true} or {ok:false, reason:...}
    end

    User->>Auth: submit OTP → /auth/complete
    Auth->>PDS: bridge to PAR /oauth/epds-callback
    alt PAR expired or callback error
        Auth->>Auth: handleCallbackError() (try captured redirect → try metadata redirect)
        Auth->>Client: 303 redirect ?error=...
    else redirect construction fails
        Auth-->>User: Styled HTML error with optional "Return to sign in"
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related issues

Possibly related PRs

Suggested reviewers

  • s-adamantine
  • Kzoeps

"🐰 I thumped my foot and sent a ping,
so PAR stayed warm while users cling —
to inbox hunts and slow-typed codes,
we redirect clean when deadline bodes.
Hop, hop — a gentler sign-in road."

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 45.65% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The PR title directly summarizes the main change: adding heartbeat and clean-exit mechanisms to prevent slow sign-ins from stranding users on error pages.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/otp-resend-after-par-expiry

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@railway-app
Copy link
Copy Markdown

railway-app Bot commented May 4, 2026

🚅 Deployed to the ePDS-pr-154 environment in ePDS

Service Status Web Updated (UTC)
@certified-app/auth-service ✅ Success (View Logs) Web May 4, 2026 at 6:05 pm
@certified-app/pds-core ✅ Success (View Logs) Web May 4, 2026 at 4:00 pm
@certified-app/demo untrusted ✅ Success (View Logs) Web May 4, 2026 at 2:51 pm
@certified-app/demo ✅ Success (View Logs) Web May 4, 2026 at 2:51 pm

@coveralls-official
Copy link
Copy Markdown

coveralls-official Bot commented May 4, 2026

Coverage Report for CI Build 25334769474

Coverage increased (+2.4%) to 55.53%

Details

  • Coverage increased (+2.4%) from the base build.
  • Patch coverage: 31 uncovered changes across 6 files (137 of 168 lines covered, 81.55%).
  • 17 coverage regressions across 4 files.

Uncovered Changes

File Changed Covered %
packages/auth-service/src/routes/complete.ts 22 5 22.73%
packages/auth-service/src/routes/choose-handle.ts 8 0 0.0%
packages/auth-service/src/routes/recovery.ts 3 1 33.33%
packages/pds-core/src/index.ts 2 0 0.0%
packages/auth-service/src/index.ts 1 0 0.0%
packages/demo/src/app/client-metadata.json/route.ts 1 0 0.0%

Coverage Regressions

17 previously-covered lines in 4 files lost coverage.

File Lines Losing Coverage Coverage
packages/auth-service/src/routes/choose-handle.ts 7 0.0%
packages/auth-service/src/routes/complete.ts 7 12.79%
packages/auth-service/src/routes/recovery.ts 2 10.27%
packages/pds-core/src/index.ts 1 0.0%

Coverage Stats

Coverage Status
Relevant Lines: 2934
Covered Lines: 1624
Line Coverage: 55.35%
Relevant Branches: 1804
Covered Branches: 1007
Branch Coverage: 55.82%
Branches in Coverage %: Yes
Coverage Strength: 5.73 hits per line

💛 - Coveralls

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR hardens the ePDS OAuth sign-in flow against PAR expiry by adding a browser-driven heartbeat during OTP entry and by redirecting users back to the OAuth client with standard error parameters when recovery is no longer possible. It also includes a small demo fix so runtime theme settings affect the published client metadata.

Changes:

  • Added /auth/ping plus OTP/recovery-page heartbeat wiring to keep PARs alive while users wait for email codes.
  • Reworked dead-end expiry/error paths to use clean exits back to OAuth clients, including signed client_id fallback handling in /oauth/epds-callback.
  • Added tests, e2e coverage, design docs, and a demo metadata route fix for runtime theming.

Reviewed changes

Copilot reviewed 27 out of 27 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
packages/shared/src/render-error.ts Adds optional “Start over” CTA rendering to shared HTML error pages.
packages/shared/src/crypto.ts Extends signed callback payload with optional client_id.
packages/shared/src/client-metadata.ts Documents/supports redirect_uris in shared client metadata typing.
packages/shared/src/tests/crypto.test.ts Adds callback-signing tests for client_id sentinel behavior.
packages/pds-core/src/lib/epds-callback-error.ts Adds async fallback redirect logic using signed client_id metadata lookup.
packages/pds-core/src/index.ts Threads client_id through callback verification and error handling.
packages/pds-core/src/tests/epds-callback-error.test.ts Expands callback error-path coverage for signed-client fallback.
packages/demo/src/app/client-metadata.json/route.ts Forces runtime rendering for demo client metadata/theme fields.
packages/auth-service/src/routes/recovery.ts Wires heartbeat flag into recovery OTP rendering.
packages/auth-service/src/routes/preview.ts Disables heartbeat in preview-only surfaces.
packages/auth-service/src/routes/login-page.ts Adds heartbeat toggle logic and OTP-page keepalive script.
packages/auth-service/src/routes/heartbeat.ts Introduces /auth/ping route to proxy PAR keepalive server-side.
packages/auth-service/src/routes/complete.ts Replaces dead-end errors with clean-exit behavior and signs client_id.
packages/auth-service/src/routes/choose-handle.ts Replaces handle-flow dead ends with clean-exit behavior and signs client_id.
packages/auth-service/src/lib/render-error.ts Threads new start-over options through auth-service error renderer.
packages/auth-service/src/lib/redirect-to-client-error.ts Adds helper to build OAuth error redirects from client metadata.
packages/auth-service/src/lib/clean-exit.ts Centralizes redirect-or-fallback clean-exit response handling.
packages/auth-service/src/index.ts Mounts the new heartbeat router.
packages/auth-service/src/tests/login-page.test.ts Updates login page test fixtures for heartbeat option.
packages/auth-service/src/tests/heartbeat.test.ts Adds unit tests for /auth/ping.
packages/auth-service/src/tests/heartbeat-toggle.test.ts Adds tests for heartbeat toggle/rendering behavior.
features/passwordless-authentication.feature Adds e2e scenarios for PAR expiry clean exits and heartbeat wiring.
e2e/step-definitions/auth.steps.ts Adds steps for PAR capture/expiry and clean-exit assertions.
docs/design/par-expiry-and-clean-exits.md Documents timer interactions, failure clusters, and implementation choices.
.changeset/par-heartbeat-keeps-slow-signins-alive.md User-facing changeset for heartbeat behavior.
.changeset/demo-client-metadata-runtime-theme.md Operator-facing changeset for demo runtime theming fix.
.changeset/clean-exit-on-expired-signin.md User/client-facing changeset for clean-exit behavior.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread packages/auth-service/src/lib/redirect-to-client-error.ts
Comment thread packages/pds-core/src/lib/epds-callback-error.ts Outdated
Comment thread packages/auth-service/src/routes/recovery.ts
Comment thread packages/auth-service/src/routes/heartbeat.ts
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🧹 Nitpick comments (3)
packages/pds-core/src/lib/epds-callback-error.ts (1)

104-208: 🏗️ Heavy lift

Refactor handleCallbackError to reduce branching complexity.

Line 104 currently packs classification, logging, two redirect tiers, and HTML fallback into one path-heavy function. Splitting tier logic into small helpers will improve testability and address the complexity gate.

Refactor sketch
-export async function handleCallbackError(opts: HandleCallbackErrorOpts): Promise<void> {
-  // classify + log + tier1a + tier1b + html fallback
-}
+function buildErrorRedirect(
+  redirectUri: string,
+  code: 'access_denied' | 'server_error',
+  description: string,
+  pdsUrl: string,
+  state?: string,
+): URL | null { /* parse + set params + return null on invalid */ }
+
+async function tryRedirectFromSignedClient(
+  signedClientId: string,
+  code: 'access_denied' | 'server_error',
+  description: string,
+  pdsUrl: string,
+  logger: Pick<Logger, 'warn' | 'error'>,
+): Promise<string | null> { /* resolve metadata + return redirect uri */ }
+
+export async function handleCallbackError(opts: HandleCallbackErrorOpts): Promise<void> {
+  // 1) classify+log
+  // 2) tier1a via helper
+  // 3) tier1b via helper
+  // 4) html fallback
+}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/pds-core/src/lib/epds-callback-error.ts` around lines 104 - 208,
handleCallbackError is too branchy—split the Tier-1a and Tier-1b redirect logic
and the logging into small helpers to reduce complexity and improve testability:
extract the captured-redirect flow into a helper (e.g.,
handleCapturedRedirect(capturedRedirectUri, capturedState, pdsUrl, res, logger,
code, description)) that validates the URL, sets Cache-Control, appends query
params and performs the 303 redirect; extract the signed-client fallback into
another helper (e.g., handleSignedClientFallback(signedClientId, pdsUrl, res,
logger, code, description, resolveClientMetadata)) that resolves metadata,
validates the fallback redirect URI and redirects or returns a signal to fall
back to HTML; keep classifyCallbackError usage and the expiry/error logging in
the main function but call these helpers and short-circuit on successful
redirect so the remaining HTML fallback is reached only when helpers return
false or throw handled errors.
docs/design/par-expiry-and-clean-exits.md (2)

201-210: ⚡ Quick win

Align the dead-end target table with the final contract.

Line 203 and Line 209 still describe fallback paths (/oauth/authorize restart and /_internal/par...-style lookup/defer) that conflict with the resolved decisions later in this doc (Line 257-260, Line 293-313). Please update this table to the final agreed behavior so future work doesn’t reintroduce rejected paths.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/design/par-expiry-and-clean-exits.md` around lines 201 - 210, Update the
dead-end target table rows to match the final contract decisions described later
in the doc: replace any mentions of falling back to restarting via
`/oauth/authorize` and any `/_internal/par…`-style lookup/defer behavior with
the resolved behavior (redirect to client with appropriate error when
redirect_uri is recoverable; otherwise show Start Over) for the affected
entries—specifically update the `/auth/complete` (no cookie / no flow /
better-auth session error) rows and the `/auth/choose-handle` rows (no cookie/no
flow, session errors, no session user, PAR ping fails on render/POST) so the
table’s Target column exactly mirrors the decisions in the later contract text.

340-354: ⚡ Quick win

Make the status block explicitly time-scoped.

This section uses “now GREEN” plus hard counts and commit claims, which will stale quickly in a living doc. Add an “as of ” stamp or CI run reference to preserve trustworthiness.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/design/par-expiry-and-clean-exits.md` around lines 340 - 354, The status
block currently uses ephemeral phrases like "now GREEN" and hard counts/commit
mentions (e.g., commits `b1fc940`, `2e4d327`, test tags `@par-heartbeat`,
`@otp-and-par-expiry`) which will age; update the paragraph to be explicitly
time-scoped by appending an "as of <date>" stamp or a CI run identifier (e.g., a
specific workflow run ID or timestamp) after the status statement, and where you
reference counts of scenarios/tests/commits keep the same details but add the
same date/CI reference to each claim so readers can verify the state against
that snapshot.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/design/par-expiry-and-clean-exits.md`:
- Around line 265-268: Update the wording around `state` in the PAR expiry
paragraph to correct the RFC claim: explicitly state that RFC 6749 §4.1.2.1
requires `state` in the error response if it was present in the authorization
request, but because the PAR expiry loses the original `state` (it lived in the
PAR) the implementation pragmatically degrades and issues an error redirect
without `state`; clarify this is a limitation of our system (PAR expiry) and not
permitted by the spec, and note that clients will treat such redirects as
anonymous errors and restart.

In `@packages/auth-service/src/lib/clean-exit.ts`:
- Around line 115-124: resolveClientHome currently returns metadata.client_uri
verbatim which can lead to unsafe schemes being rendered as startOverHref;
update resolveClientHome to validate and sanitize metadata.client_uri by
attempting to parse it with the URL constructor, ensure the parsed URL.protocol
is only "http:" or "https:" and that it has a hostname (or other required
components), and only return metadata.client_uri when it passes these checks;
otherwise fall back to the existing URL(clientId).origin logic or return null.
Apply this validation inside resolveClientHome before returning
metadata.client_uri so startOverHref only ever receives safe http/https URLs.

In `@packages/auth-service/src/lib/redirect-to-client-error.ts`:
- Around line 89-109: The helper currently accepts any URL scheme because it
only checks new URL(redirectUri); after constructing the URL in the try block
(or immediately after parsing), validate that url.protocol is either 'http:' or
'https:' and if not, log a warning (same shape as the existing logger.warn
including err/clientId/redirectUri context) and return null; update the logic
around the try/catch in redirect-to-client-error.ts so non-web schemes (e.g.
'javascript:') are rejected before setting searchParams and returning
url.toString().

In `@packages/auth-service/src/routes/heartbeat.ts`:
- Around line 60-68: The current heartbeat handler treats every pingParRequest()
failure as 'par_expired', which makes transient failures terminal; update the
error handling in the block that calls pingParRequest(flow.requestUri, pdsUrl,
internalSecret) so that you only set HeartbeatResponse.reason = 'par_expired'
when ping.status === 410 (or the explicit indicator of an expired PAR), and for
other failures (network timeouts, 5xx, non-410 status, or ping.err present)
return a non-terminal reason such as 'par_unreachable' or 'transient_error' (and
keep logging the details with logger.debug({ status: ping.status, err: ping.err,
flowId }, ...)). Ensure the response shape (HeartbeatResponse) is updated if
necessary to accept the new reason values and do not change the flow state to
terminal in the non-410 cases.

In `@packages/shared/src/render-error.ts`:
- Around line 66-68: The current startOverHtml construction renders any escaped
startOverHref (via escapeHtml) but doesn't validate the URL scheme, allowing
dangerous schemes like javascript:; update the logic that builds startOverHtml
so it only renders a link when startOverHref is a valid absolute URL whose
protocol is exactly "http:" or "https:" (use the URL constructor in a try/catch
to parse and verify url.protocol === 'http:' || url.protocol === 'https:'),
otherwise set startOverHtml to an empty string; reference the
startOverHtml/startOverHref/startOverLabel variables and the existing escapeHtml
call when applying this validation.

---

Nitpick comments:
In `@docs/design/par-expiry-and-clean-exits.md`:
- Around line 201-210: Update the dead-end target table rows to match the final
contract decisions described later in the doc: replace any mentions of falling
back to restarting via `/oauth/authorize` and any `/_internal/par…`-style
lookup/defer behavior with the resolved behavior (redirect to client with
appropriate error when redirect_uri is recoverable; otherwise show Start Over)
for the affected entries—specifically update the `/auth/complete` (no cookie /
no flow / better-auth session error) rows and the `/auth/choose-handle` rows (no
cookie/no flow, session errors, no session user, PAR ping fails on render/POST)
so the table’s Target column exactly mirrors the decisions in the later contract
text.
- Around line 340-354: The status block currently uses ephemeral phrases like
"now GREEN" and hard counts/commit mentions (e.g., commits `b1fc940`, `2e4d327`,
test tags `@par-heartbeat`, `@otp-and-par-expiry`) which will age; update the
paragraph to be explicitly time-scoped by appending an "as of <date>" stamp or a
CI run identifier (e.g., a specific workflow run ID or timestamp) after the
status statement, and where you reference counts of scenarios/tests/commits keep
the same details but add the same date/CI reference to each claim so readers can
verify the state against that snapshot.

In `@packages/pds-core/src/lib/epds-callback-error.ts`:
- Around line 104-208: handleCallbackError is too branchy—split the Tier-1a and
Tier-1b redirect logic and the logging into small helpers to reduce complexity
and improve testability: extract the captured-redirect flow into a helper (e.g.,
handleCapturedRedirect(capturedRedirectUri, capturedState, pdsUrl, res, logger,
code, description)) that validates the URL, sets Cache-Control, appends query
params and performs the 303 redirect; extract the signed-client fallback into
another helper (e.g., handleSignedClientFallback(signedClientId, pdsUrl, res,
logger, code, description, resolveClientMetadata)) that resolves metadata,
validates the fallback redirect URI and redirects or returns a signal to fall
back to HTML; keep classifyCallbackError usage and the expiry/error logging in
the main function but call these helpers and short-circuit on successful
redirect so the remaining HTML fallback is reached only when helpers return
false or throw handled errors.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: f5e441cc-a167-49f6-8eb6-7ed93ec7bd77

📥 Commits

Reviewing files that changed from the base of the PR and between ffc17bd and 8bcb344.

📒 Files selected for processing (27)
  • .changeset/clean-exit-on-expired-signin.md
  • .changeset/demo-client-metadata-runtime-theme.md
  • .changeset/par-heartbeat-keeps-slow-signins-alive.md
  • docs/design/par-expiry-and-clean-exits.md
  • e2e/step-definitions/auth.steps.ts
  • features/passwordless-authentication.feature
  • packages/auth-service/src/__tests__/heartbeat-toggle.test.ts
  • packages/auth-service/src/__tests__/heartbeat.test.ts
  • packages/auth-service/src/__tests__/login-page.test.ts
  • packages/auth-service/src/index.ts
  • packages/auth-service/src/lib/clean-exit.ts
  • packages/auth-service/src/lib/redirect-to-client-error.ts
  • packages/auth-service/src/lib/render-error.ts
  • packages/auth-service/src/routes/choose-handle.ts
  • packages/auth-service/src/routes/complete.ts
  • packages/auth-service/src/routes/heartbeat.ts
  • packages/auth-service/src/routes/login-page.ts
  • packages/auth-service/src/routes/preview.ts
  • packages/auth-service/src/routes/recovery.ts
  • packages/demo/src/app/client-metadata.json/route.ts
  • packages/pds-core/src/__tests__/epds-callback-error.test.ts
  • packages/pds-core/src/index.ts
  • packages/pds-core/src/lib/epds-callback-error.ts
  • packages/shared/src/__tests__/crypto.test.ts
  • packages/shared/src/client-metadata.ts
  • packages/shared/src/crypto.ts
  • packages/shared/src/render-error.ts

Comment thread docs/design/par-expiry-and-clean-exits.md Outdated
Comment thread packages/auth-service/src/lib/clean-exit.ts Outdated
Comment thread packages/auth-service/src/lib/redirect-to-client-error.ts
Comment thread packages/auth-service/src/routes/heartbeat.ts
Comment thread packages/shared/src/render-error.ts Outdated
@blacksmith-sh

This comment has been minimized.

…e + Blacksmith)

A bundle of small, mechanically-independent fixes raised on PR #154:

  * heartbeat: only treat a 404 from pds-core's
    /_internal/ping-request as terminal `par_expired`. Anything else
    (5xx, network blip, thrown error with no status) is reported as
    `transient` so the browser keeps polling and the next tick can
    recover. Pre-fix, a single dropped packet during otherwise-healthy
    operation would terminate keepalive permanently and re-introduce
    the very dead-end the heartbeat exists to prevent.
    (CodeRabbit, Copilot)

  * URL scheme validation on every href that ends up rendered into a
    page (`renderError` startOverHref, `clean-exit.ts`'s
    `resolveClientHome`, `redirect-to-client-error.ts`'s redirect
    target). escapeHtml does not neutralise `javascript:` URLs;
    defence-in-depth rejects everything but http:/https: at parse
    time. atproto's @atproto/oauth-provider already validates
    redirect_uris at PAR creation, so production hits should be
    unreachable — but the catch path exists precisely to spare the
    user a 500, and an unhandled exotic scheme would defeat that.
    (CodeRabbit x3)

  * `heartbeatEnabledFor` now also reads `req.body.no_heartbeat`,
    making the test toggle work symmetrically across the recovery
    POST handlers (which strip query params from re-renders).
    `renderRecoveryForm` and `renderRecoveryOtpForm` propagate the
    flag as a hidden field when explicitly disabled, so a future
    @no-heartbeat-recovery scenario can prove the heartbeat is what
    saves a slow recovery user. The toggle has no effect outside
    EPDS_TEST_HOOKS=1 + the literal value '1', so this stays a
    no-op in production. (Copilot)

  * Updated `@par-callback-error` scenario to assert the new
    clean-exit contract (browser lands back at the demo client with
    an auth error) instead of the static-HTML fallback. With
    `client_id` now flowing through the signed callback URL,
    pds-core's catch block resolves the client's metadata and
    redirects, so the static HTML fallback is only reached in the
    truly-no-clientId case which this scenario does not exercise.
    Removed the two now-orphan step defs (`the response body is not
    raw JSON`, `the response body explains that sign-in timed out`).
    (Blacksmith CI)

  * SonarQube S5332: changed `http://core:3000` to `https://` in the
    heartbeat unit test fixture. The literal flows from process.env
    to the mocked `pingParRequest` and back as a positional argument
    we assert on — never issued as a real request — but the lint
    rule doesn't introspect, and using https everywhere costs us
    nothing.

  * Trimmed the clean-exit changeset to End-users-only. The change
    has no client-developer action item; spec-compliant OAuth
    libraries already handle the error response shape we now emit
    consistently. Cross-references #150, #151 inline.

  * Design doc fixes: corrected the RFC 6749 §4.1.2.1 wording on
    `state` (the spec REQUIRES it on error redirects when present
    on the request; missing-state on the dead-PAR path is a
    pragmatic degradation, not spec-permitted), and added a note
    above the dead-end-page table flagging that some Target cells
    speculated about approaches that the resolved decisions
    replaced. (CodeRabbit x2)

927 unit tests pass (+ 4 new toggle tests + 1 new transient-on-thrown test).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@aspiers
Copy link
Copy Markdown
Contributor Author

aspiers commented May 4, 2026

(comment generated by Claude Code via the pr-comment-resolving skill)

Review feedback resolved in 9fb0e69

Inline threads addressed (see per-thread replies for details):

  • 3 of 4 Copilot threads fixed (heartbeat transient handling, recovery flow heartbeat toggle, scheme validation on /oauth/epds-callback's redirect target via the same helper that fixed the auth-service one).
  • All 5 CodeRabbit threads fixed (RFC state wording in the design doc, scheme validation across render-error.ts / clean-exit.ts / redirect-to-client-error.ts, heartbeat transient handling).
  • 1 Copilot thread deferred (multi-redirect-uri clients receive the OAuth error on redirect_uris[0] instead of the in-flight choice). The PAR is dead by the time this branch runs and the in-flight URI choice is unrecoverable; threading redirect_uri through the signed callback URL alongside client_id would give us the perfect answer but doubles the trust-boundary surface, so deferring until/unless a real multi-redirect-uri client trips it.

Other CI surfaces:

  • SonarQube S5332 — heartbeat unit test fixture changed from http://core:3000 to https://. The literal flows from process.env to the mocked pingParRequest and back as a positional argument we assert on (no real request issued), but the hotspot is gone.
  • Blacksmith — the @par-callback-error scenario was failing because clean-exit now redirects back to the demo client (the new contract) instead of rendering the static HTML page the old assertions checked for. Updated the scenario to assert the new contract; same shape as the four @otp-and-par-expiry scenarios. Old step defs (the response body is not raw JSON, the response body explains that sign-in timed out) removed.

CodeRabbit review-summary nitpicks NOT applied:

  • handleCallbackError complexity refactor — declined. The function is a simple linear cascade of three response strategies (captured-redirect → metadata-resolved-redirect → static HTML). Splitting it into helpers introduces three more named symbols and an extra layer of indirection without making any branch easier to test (each branch is already covered by 7+ unit tests). Refactor for refactor's sake.
  • Design doc status section dating — declined. The doc explicitly notes it is a living document at the top, and every claim in the status section is a concrete commit SHA + test count + tag name that's verifiable against a single git log -p. Adding an "as of " stamp on top of that gives no information git log doesn't already.

927 unit tests pass; @otp-and-par-expiry (4) + @par-heartbeat (1) + @par-callback-error (1) all green against a local docker-compose stack.

…passes out of the box

The @client-branding e2e scenarios assert that the trusted demo's
published OAuth metadata advertises a non-default brand_color and
branding.css, and pin the assertion to the amber palette's #1a1208
background. That only works if `EPDS_CLIENT_THEME` is set to a name
that exists in `packages/demo/src/lib/theme.ts`'s `presets` map —
currently `ocean` or `amber`. Without it, getTheme() returns null
and the metadata falls back to the unbranded defaults.

The previous .env.example had `EPDS_CLIENT_THEME=amber` commented
out under a "Required (uncomment) to run the client-branding e2e
feature locally" note. That worked for fresh contributors who read
the comment, but it left the docker-compose stack failing six
@client-branding scenarios out of the box for anyone who didn't.

Uncomment the default so `cp packages/demo/.env.example
packages/demo/.env` (which `scripts/setup.sh` does for fresh
checkouts) leaves the trusted demo themed, and the trusted-vs-
untrusted split that docker-compose.yml has always tried to produce
finally ends up visible.

The companion change in 8bcb344 (force-dynamic on the metadata
route) is a prerequisite — without it `next build` baked the
unset-env state into the static prerender and `EPDS_CLIENT_THEME`
at runtime was a no-op. Both changes ship together; no separate
changeset for either one (operator-noticeable behaviour combined
under the existing feature changesets).

Drops the demo-client-metadata-runtime-theme changeset from
8bcb344 — operator-facing detail of the dev-loop fix is captured
in CLAUDE / git log; release notes don't need it.

Verified locally:

  curl -sS https://demo.epds-poc1.test.certified.app/client-metadata.json \
    | jq '.brand_color, (.branding != null)'
  # "#f59e0b"  (amber primary, was "#2563eb" default)
  # true       (branding object now present, was absent)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 4, 2026 14:48
@railway-app railway-app Bot temporarily deployed to ePDS / ePDS-pr-154 May 4, 2026 14:48 Destroyed
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 27 out of 27 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread packages/auth-service/src/routes/recovery.ts
Comment thread packages/auth-service/src/lib/clean-exit.ts
When the user's OTP has expired (waited >10 min, or made the same
mistake the original PR-154 reporter did) the verify form'd surface
the message "Invalid or expired code. Please try again." inside the
red error banner — but the actual Resend button lives further down
the form, below a separate "Back to email" link. Easy to miss.

Inline a "Send a new code" action right inside the error banner so
the recovery is one click from where the user's eyes already are.

Implementation:

  * `showErrorWithAction(msg, label, onClick)` builds the banner
    DOM imperatively. The label uses `.textContent` (not
    `.innerHTML`), so a reflected error string can never inject
    HTML. The inline button has no styling beyond underline + cursor
    so it reads as part of the error sentence.

  * The verify-form submit handler runs the better-auth error
    string through `/expir|too long/i` to decide whether to surface
    the inline action. The substring catches better-auth's "Invalid
    or expired code", auth-service's "OTP expired", and any future
    wording with "expir" or "too long" in it. Non-expired errors
    (typos, brute-force lockout) keep the plain showError path so
    a plain "Invalid code" doesn't carry an inappropriate "Send a
    new code" CTA.

  * Click target is the existing `#btn-resend` element, so all the
    rate-limiting, mailpit-clear, and OTP-state-bookkeeping the
    Resend button already does is reused — the inline action is
    purely an alternate entry point.

4 new tests pin the contract: textContent-only label sink (XSS
guard), the substring regex shape, the action label and click
target, and the non-expired fallback to plain showError.

Triggered by a user report against PR #154's Railway preview: the
basic clean-exit path was working (browser landed on demo with
`?error=auth_failed`) but the user reasonably read the demo's
generic "Authentication failed" banner as a dead end. Surfacing
the Resend earlier prevents the user from getting that far in the
first place when their OTP is the only thing that died.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
packages/auth-service/src/routes/recovery.ts (1)

470-494: 💤 Low value

Heartbeat scripts are duplicated between render functions.

The PAR heartbeat script in renderRecoveryOtpForm (lines 470-494) is nearly identical to the one in renderRecoveryForm (lines 359-385). Consider extracting a shared helper function that generates the script string, accepting the heartbeat flag as a parameter.

That said, given these are self-contained inline scripts and the duplication is local to this file, this is a minor observation that can be deferred.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/auth-service/src/routes/recovery.ts` around lines 470 - 494, Extract
the duplicated PAR heartbeat inline script into a single helper function (e.g.,
getParHeartbeatScript or createParHeartbeatScript) that accepts the heartbeat
flag (previously using JSON.stringify(opts.heartbeatEnabled)) and returns the
same script string; update renderRecoveryOtpForm and renderRecoveryForm to call
this helper instead of inlining the script so the ping/stop logic, setInterval,
beforeunload handler, and form submit listeners remain identical and behavior is
unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@packages/auth-service/src/routes/recovery.ts`:
- Around line 470-494: Extract the duplicated PAR heartbeat inline script into a
single helper function (e.g., getParHeartbeatScript or createParHeartbeatScript)
that accepts the heartbeat flag (previously using
JSON.stringify(opts.heartbeatEnabled)) and returns the same script string;
update renderRecoveryOtpForm and renderRecoveryForm to call this helper instead
of inlining the script so the ping/stop logic, setInterval, beforeunload
handler, and form submit listeners remain identical and behavior is unchanged.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: bd303926-3b7f-4bff-8374-d1ecb0c99fd2

📥 Commits

Reviewing files that changed from the base of the PR and between fcab1c7 and 3d31876.

📒 Files selected for processing (18)
  • .changeset/clean-exit-on-expired-signin.md
  • packages/auth-service/src/__tests__/build-epds-callback-url.test.ts
  • packages/auth-service/src/__tests__/clean-exit.test.ts
  • packages/auth-service/src/__tests__/heartbeat-toggle.test.ts
  • packages/auth-service/src/__tests__/login-page.test.ts
  • packages/auth-service/src/__tests__/redirect-to-client-error.test.ts
  • packages/auth-service/src/__tests__/render-error.test.ts
  • packages/auth-service/src/lib/clean-exit.ts
  • packages/auth-service/src/routes/complete.ts
  • packages/auth-service/src/routes/login-page.ts
  • packages/auth-service/src/routes/recovery.ts
  • packages/pds-core/src/__tests__/epds-callback-error.test.ts
  • packages/pds-core/src/lib/epds-callback-error.ts
  • packages/shared/src/__tests__/render-error.test.ts
  • packages/shared/src/__tests__/start-over-href.test.ts
  • packages/shared/src/index.ts
  • packages/shared/src/start-over-href.ts
  • vitest.config.ts
✅ Files skipped from review due to trivial changes (2)
  • .changeset/clean-exit-on-expired-signin.md
  • vitest.config.ts
🚧 Files skipped from review as they are similar to previous changes (3)
  • packages/auth-service/src/tests/heartbeat-toggle.test.ts
  • packages/auth-service/src/routes/login-page.ts
  • packages/pds-core/src/tests/epds-callback-error.test.ts

@blacksmith-sh

This comment has been minimized.

CI's e2e suite failed on `Too many failed OTP attempts locks out
the token` and the related lockout scenario after 3d31876 added the
inline "Send a new code" action to the OTP-expired error banner.
The `'the verification form shows an "OTP expired" error'` step
asserted `toHaveText(expected)` (exact equality), so the appended
action label broke the match.

Switch to `toContainText`. The expected string ("OTP expired",
"locked out", etc.) still has to be present in the banner — we
just no longer fail when extra inline UI sits next to it.

Local 4 @otp-and-par-expiry scenarios still pass; the brute-force
lockout scenarios that broke on CI use the same step.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…n-in

A user reported on PR #154's Railway preview: starts flow3 with
prompt=login, waits 11 min on the OTP form, clicks Resend, gets a
fresh OTP, types it in, lands on the demo with "Authentication
failed". Technically the clean-exit was working (lands at the demo
with `?error=auth_failed`, can retry), but the experience is a lie:
we offered Resend as if it would complete the flow, accepted their
fresh code, then told them sign-in failed.

Fix: don't pretend Resend will work when the upstream PAR is
already dead.

  * **`GET /auth/abort` route** (auth-service). Browser-driven
    clean exit. Reads the auth_flow cookie, runs the same
    `cleanExit()` helper as the unrecoverable-error paths in
    /auth/complete and /auth/choose-handle: redirect to the OAuth
    client's redirect_uri with `error=access_denied`, or fall back
    to the styled "Return to sign in" page when the client is
    unknown. Cookie is cleared because the flow is being
    abandoned.

  * **Proactive notice on the OTP form.** When the heartbeat's
    /auth/ping returns par_expired / flow_expired / no_cookie, the
    form replaces its current state with: "Your sign-in has timed
    out. The code we sent will no longer work. Start sign-in
    again from the app you came from." OTP boxes, Resend, Back,
    and Verify are all disabled. A single Start over button
    navigates to /auth/abort. No automatic redirect — the user
    makes the choice. Clicking elsewhere on the page (e.g. the
    Powered by Certified link) still works.

  * **Reactive abort gates** on the Resend click and Verify
    submit, defence in depth on top of the proactive notice. Each
    pings /auth/ping just-in-time and bails to /auth/abort if the
    flow is dead. Catches the race where the user beats the
    proactive notice (heartbeat hasn't fired yet, network glitch
    delayed the response, etc.). Resend in particular MUST gate
    here — otherwise better-auth would happily issue an OTP that
    cannot complete the flow.

  * Existing pre-PR-154 scenarios `@otp-and-par-expiry` and
    `@multiple-resend` updated to match the new contract: the
    user no longer sees the misleading "OTP expired → Resend →
    fresh OTP → fails" cycle. Both scenarios still assert "lands
    at demo with auth_failed" — same end state, fewer
    intermediate UI steps because the abort gate fires earlier.

Tests:

  * 4 new tests on /auth/abort: cleanExit invocation shape,
    cookie clearing, null-clientId paths.
  * 7 new login-page render-shape tests covering
    showFlowAbortedNotice (idempotence, every form control gets
    disabled, textContent-only label sink, /auth/abort target),
    the proactive heartbeat trigger, and the reactive Resend /
    Verify gates running BEFORE the action they protect.
  * 1 new e2e scenario `@resend-after-par-dead` proving the
    contract: PAR dead, click Resend → land at demo with
    `auth_failed`, no fresh OTP email is sent.

1017 unit tests pass. All 7 OTP/PAR e2e scenarios green locally.

Updated docs/design/par-expiry-and-clean-exits.md with the new
"Bug found post-PR-154" section explaining what broke, why
heartbeat alone wasn't enough, and the proactive-notice +
reactive-gate design.

Updated the clean-exit changeset to mention the new behaviour
(rather than adding a separate one — same user-facing story).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…eDoS-prone regex

SonarCloud's quality gate failed on PR #154 with two issues
introduced by 86cd60f:

  * Duplicated lines (3.8% > 3% threshold). The new
    auth-abort.test.ts shared 24 lines of buildApp +
    listen-on-ephemeral-port boilerplate with the existing
    heartbeat.test.ts — both test the same router with the same
    ctx shape.

  * Security hotspot S5852 on login-page.test.ts:688. The
    `[\s\S]*?` between two literal anchors creates a
    super-linear-runtime regex (ReDoS) — fine for a one-off
    runtime check on a small string but flagged because the
    same shape would be unsafe on user input.

Fixes:

  * Extract `buildHeartbeatApp` + `harnessGet` helpers into
    `__tests__/__helpers__/heartbeat-router-harness.ts`. Both
    test files now import them. Vitest's test pattern is
    `*.test.ts`, so the helper isn't picked up as a test
    suite. Coverage excludes already cover `__tests__/**`.

  * Replace the ReDoS-prone regex with a substring-based slice
    + two `toContain` checks. Same assertion shape (guard runs,
    stopHeartbeat + showFlowAbortedNotice both called), no
    backtracking risk.

1017 unit tests still pass; 11 of those run against the new
shared harness (5 auth-abort + 6 heartbeat).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@railway-app railway-app Bot temporarily deployed to ePDS / ePDS-pr-154 May 4, 2026 18:00 Destroyed
@sonarqubecloud
Copy link
Copy Markdown

sonarqubecloud Bot commented May 4, 2026

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/auth-service/src/routes/login-page.ts`:
- Around line 790-820: The showFlowAbortedNotice function currently disables OTP
inputs and buttons but leaves the `#recovery-link` anchor interactive; hide the
recovery link instead of trying to disable it so nothing remains clickable after
abort. In showFlowAbortedNotice (the function that sets flowAborted and disables
otpBoxes, btn-resend, btn-back, and the form submit), locate
document.getElementById('recovery-link') and set its style.display = 'none'
(consistent with other visibility changes in the IIFE) so the anchor is removed
from view/clickability when the abort notice is shown. Ensure the change is only
applied inside the flowAborted guard so the link remains available otherwise.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 4b21344f-5df5-48a0-a8cf-174da5f43c08

📥 Commits

Reviewing files that changed from the base of the PR and between 3d31876 and 86cd60f.

📒 Files selected for processing (8)
  • .changeset/clean-exit-on-expired-signin.md
  • docs/design/par-expiry-and-clean-exits.md
  • e2e/step-definitions/auth.steps.ts
  • features/passwordless-authentication.feature
  • packages/auth-service/src/__tests__/auth-abort.test.ts
  • packages/auth-service/src/__tests__/login-page.test.ts
  • packages/auth-service/src/routes/heartbeat.ts
  • packages/auth-service/src/routes/login-page.ts
✅ Files skipped from review due to trivial changes (1)
  • .changeset/clean-exit-on-expired-signin.md
🚧 Files skipped from review as they are similar to previous changes (2)
  • packages/auth-service/src/routes/heartbeat.ts
  • e2e/step-definitions/auth.steps.ts

Comment thread packages/auth-service/src/routes/login-page.ts
@Kzoeps
Copy link
Copy Markdown
Contributor

Kzoeps commented May 5, 2026

Missing buttons to restart sign in again here:

image

Steps to reproduce:

  1. Login using untrusted.
  2. Get to sign in using code
  3. tamper with auth-flow cookie
  4. End up at the above page without a way to go back to sign in

Edge case and could be ignore but just wanted to highlight the missing sign in button since this was the only way to emulate client_id missing.

@aspiers
Copy link
Copy Markdown
Contributor Author

aspiers commented May 5, 2026

@Kzoeps commented on May 5, 2026, 10:29 AM GMT+1:

  1. tamper with auth-flow cookie
  2. End up at the above page without a way to go back to sign in

Edge case and could be ignore but just wanted to highlight the missing sign in button since this was the only way to emulate client_id missing.

What do you mean by "tamper with", exactly? I think deleting or somehow expiring the cookie would be a reasonable thing to test, but just editing to an invalid value it is not something we need to spend time worrying about.

Obviously if client_id is missing then it's either difficult or impossible to know where to redirect the user.

@Kzoeps
Copy link
Copy Markdown
Contributor

Kzoeps commented May 5, 2026

@Kzoeps commented on May 5, 2026, 10:29 AM GMT+1:

  1. tamper with auth-flow cookie
  2. End up at the above page without a way to go back to sign in

Edge case and could be ignore but just wanted to highlight the missing sign in button since this was the only way to emulate client_id missing.

What do you mean by "tamper with", exactly? I think deleting or somehow expiring the cookie would be a reasonable thing to test, but just editing to an invalid value it is not something we need to spend time worrying about.

Obviously if client_id is missing then it's either difficult or impossible to know where to redirect the user.

yep i basically edited the cookie. Makes sense to ignore this.

@aspiers
I just didnt have a way of testing this:

When the originating app can be identified but its registered redirect URI fails to resolve, the page renders with a "Return to sign in" button targeted at the app's home URL

so that was kinda a work around to see if it renders

@aspiers
Copy link
Copy Markdown
Contributor Author

aspiers commented May 5, 2026

Tested by Karma and myself. Unfortunately there are a million different code paths so we can't possibly cover them all, but we do already have a huge body of automated tests too. Feels like this should be several steps forwards even though uncomfortable making sizable changes at this late stage. The status quo is some bad expiries and users getting trapped in a loop, so we need to move forwards with a fix.

@aspiers aspiers merged commit 72d9113 into main May 5, 2026
15 checks passed
@aspiers aspiers deleted the fix/otp-resend-after-par-expiry branch May 5, 2026 15:20
aspiers added a commit that referenced this pull request May 5, 2026
Adds a second @otp-expiry scenario that backdates the auth_flow row
via the existing /_internal/test/expire-auth-flow hook, then submits
a still-valid OTP. After PR #154's reactive abort gate the OTP form
pings /auth/ping before submitting; with the auth_flow row dead the
ping reports `flow_expired`, the gate navigates to /auth/abort, and
cleanExit serves its Tier-2 styled "Sign-in session expired" fallback
page (the OAuth client redirect path needs the dead row's clientId,
which is exactly what's missing here).

The scenario asserts both signals — the ping reason (proving
auth_flow specifically tripped, not PAR) and the abort fallback
page — so a regression that, say, swaps which timer the gate honours
would still be caught.

Without this guardrail nothing in CI would notice if AUTH_FLOW_TTL_MS
were quietly shortened back to the OTP TTL; the existing scenario
only proves the 10-min-and-resend path works, not that the 60-min
boundary is enforced.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
aspiers added a commit that referenced this pull request May 5, 2026
Adds a second @otp-expiry scenario that backdates the auth_flow row
via the existing /_internal/test/expire-auth-flow hook, then submits
a still-valid OTP. After PR #154's reactive abort gate the OTP form
pings /auth/ping before submitting; with the auth_flow row dead the
ping reports `flow_expired`, the gate navigates to /auth/abort, and
cleanExit serves its Tier-2 styled "Sign-in session expired" fallback
page (the OAuth client redirect path needs the dead row's clientId,
which is exactly what's missing here).

The scenario asserts both signals — the ping reason (proving
auth_flow specifically tripped, not PAR) and the abort fallback
page — so a regression that, say, swaps which timer the gate honours
would still be caught.

Without this guardrail nothing in CI would notice if AUTH_FLOW_TTL_MS
were quietly shortened back to the OTP TTL; the existing scenario
only proves the 10-min-and-resend path works, not that the 60-min
boundary is enforced.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
aspiers added a commit that referenced this pull request May 5, 2026
Adds a second @otp-expiry scenario that backdates the auth_flow row
via the existing /_internal/test/expire-auth-flow hook, then submits
a still-valid OTP. After PR #154's reactive abort gate the OTP form
pings /auth/ping before submitting; with the auth_flow row dead the
ping reports `flow_expired`, the gate navigates to /auth/abort, and
cleanExit serves its Tier-2 styled "Sign-in session expired" fallback
page (the OAuth client redirect path needs the dead row's clientId,
which is exactly what's missing here).

The scenario asserts both signals — the ping reason (proving
auth_flow specifically tripped, not PAR) and the abort fallback
page — so a regression that, say, swaps which timer the gate honours
would still be caught.

Without this guardrail nothing in CI would notice if AUTH_FLOW_TTL_MS
were quietly shortened back to the OTP TTL; the existing scenario
only proves the 10-min-and-resend path works, not that the 60-min
boundary is enforced.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

renderError pages have no recovery CTA

3 participants