
H-6219: Restore TOTP MFA and fix AAL2 login redirect#8622

Merged
TimDiekmann merged 8 commits into main from t/h-6219-fix-and-restore-totp-mfa-settings
Apr 15, 2026

Conversation

@TimDiekmann

@TimDiekmann TimDiekmann commented Apr 13, 2026

🌟 What is the purpose of this PR?

Re-enables the TOTP MFA settings UI that was shipped disabled in H-2421 (hidden behind {false && …} with a @todo H-6219 restore this note) and fixes the underlying issue that had broken it on staging.

The blocker was Kratos emitting redirect_browser_to: app.hash.ai/self-service/login/browser?aal=aal2 after a password-only login from a TOTP-enabled user. Kratos builds that URL from SERVE_PUBLIC_BASE_URL (set to the frontend origin) but no route on the frontend actually serves Kratos-native paths, so following it verbatim dead-ended on 404. The AAL2 form was never shown.

🔗 Related links

🚫 Blocked by

None.

🔍 What does this change?

Frontend

  • apps/hash-frontend/src/pages/settings/security.page.tsx
    • Reactivates the TOTP enrolment / disable / backup-code UI block.
    • TOTP disable now also submits lookup_secret_disable: true. With required_aal: highest_available a leftover backup-code credential kept forcing AAL2 at next login with no TOTP to answer it.
    • Removes the pre-existing placebo "enter a current TOTP code to disable" step — Kratos v26 ignores totp_code for an already-enrolled identity, so it gated nothing. The disable form now shows a plain "yes, disable 2FA" confirmation and a warning that backup codes will also be removed.
    • Surfaces previously-silent failures on the clipboard-copy, backup-code regeneration, and "I've saved my codes" paths so the user sees an error instead of getting a half-broken account.
    • Backup-code extraction regex now accepts Kratos v26's no-dash 8-char codes (XXXXXXXX) in addition to the old XXXX-XXXX layout. Without this the UI rendered all 12 codes as one block and the AAL2-login test submitted the whole comma-separated string.
  • apps/hash-frontend/src/pages/shared/ory-kratos.ts
    • New flowMetadata record mapping each Kratos self-service flow to uiPath + kratosBrowserPath.
    • New uiPathForKratosBrowserRedirect helper that rewrites Kratos-native /self-service/*/browser URLs to the matching frontend route.
  • apps/hash-frontend/src/pages/shared/use-kratos-flow-error-handler.ts
    • session_aal2_required, session_refresh_required, browser_location_change_required now route through the rewriter before calling router.replace.
    • The old ad-hoc router.push(`/${flowType}`) fallbacks now use flowMetadata[flowType].uiPath. Fixes a latent bug where flowType: "registration" would have pushed to a non-existent /registration route.
  • apps/hash-frontend/src/pages/signin.page.tsx
    • Flow-reuse guard compares flow.requested_aal against the URL ?aal param. Without this, the AAL1 flow loaded at /signin persisted through the client-side navigation to /signin?aal=aal2 and the AAL2 form never rendered.
    • The two remaining raw redirect_browser_to follows (login-success continue_with branch, toSession 403 fallback) now go through uiPathForKratosBrowserRedirect too.
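
A minimal sketch of how the redirect rewriter described above could work. The shape of flowMetadata and every uiPath other than /signin are assumptions for illustration; the real record in ory-kratos.ts may cover more flows and different routes.

```typescript
// Hypothetical flow list and UI routes; only "/signin" is confirmed by this PR.
type FlowType = "login" | "registration" | "settings";

const flowMetadata: Record<
  FlowType,
  { uiPath: string; kratosBrowserPath: string }
> = {
  login: { uiPath: "/signin", kratosBrowserPath: "/self-service/login/browser" },
  registration: {
    uiPath: "/signup", // assumed route
    kratosBrowserPath: "/self-service/registration/browser",
  },
  settings: {
    uiPath: "/settings/security", // assumed route
    kratosBrowserPath: "/self-service/settings/browser",
  },
};

// Rewrites a Kratos-native browser URL to the matching frontend route,
// keeping the query string (e.g. ?aal=aal2) and dropping any URL fragment.
// URLs that do not match a known Kratos browser path are returned unchanged.
const uiPathForKratosBrowserRedirect = (redirectTo: string): string => {
  // The placeholder base only exists so that relative URLs parse.
  const url = new URL(redirectTo, "http://placeholder.invalid");

  for (const flowType of Object.keys(flowMetadata) as FlowType[]) {
    const { uiPath, kratosBrowserPath } = flowMetadata[flowType];
    if (url.pathname === kratosBrowserPath) {
      return `${uiPath}${url.search}`;
    }
  }

  return redirectTo;
};
```

Returning unmatched URLs unchanged keeps the helper safe to apply to every redirect_browser_to value, including ones that already point at a frontend route.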

Backend (API proxy)

  • apps/hash-api/src/index.ts
    • Replaces the single authRouteRateLimiter on the Kratos /auth/* proxy with tiered limiters: non-GET credentials (30/10s), GETs other than whoami (60/10s), whoami exempted. The redundant whoamis the frontend emits during MFA flows were consuming the old 12/10s IP budget and surfacing as 429s.
    • userIdentifierRateLimiter only applies when body.identifier is actually present (its documented purpose), instead of falling back to IP.
    • Safe ipKey keyGenerator replaces the req.ip! assertion used by the previous default generator.
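
The tiering above can be pictured as a small classification step in front of the limiters. This is a sketch only: kratosProxyTier and budgetPer10s are hypothetical names, and the actual change wires the tiers into rate-limiter middleware in index.ts rather than a standalone function.

```typescript
type Tier = "exempt" | "read" | "mutation";

// Hypothetical helper: decides which limiter tier a proxied Kratos request falls into.
const kratosProxyTier = (method: string, path: string): Tier => {
  const queryStart = path.indexOf("?");
  const pathname = queryStart === -1 ? path : path.slice(0, queryStart);
  const isGet = method.toUpperCase() === "GET";

  // whoami is exempt so redundant session checks do not eat the budget.
  if (isGet && pathname.endsWith("/sessions/whoami")) {
    return "exempt";
  }

  // Everything that is not a GET (POST, PUT, PATCH, DELETE) counts as a mutation.
  return isGet ? "read" : "mutation";
};

// Budgets from the description: requests allowed per 10-second window.
const budgetPer10s: Record<Exclude<Tier, "exempt">, number> = {
  read: 60,
  mutation: 30,
};

// Falls back to a fixed literal instead of asserting that req.ip is defined.
const ipKey = (req: { ip?: string }): string => req.ip ?? "ip-unavailable";
```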

Test harness

  • tests/hash-playwright/tests/shared/totp-utils.ts: rewrites decodeBase32 so the accumulator is truncated after every emitted byte. The old implementation overflowed JavaScript number precision past 53 bits and silently produced a half-zero key, so generated TOTP codes were rejected by Kratos.
  • tests/hash-playwright/tests/shared/get-kratos-verification-code.ts: parses Mailslurper's timezone-less dateSent strings as UTC (they were being interpreted as local time, causing the 5-second window filter to drop every legitimate email on CEST machines).
  • tests/hash-playwright/tests/shared/signup-utils.ts: adds a per-process suffix to test emails and shortnames. resetDb() is still a no-op and POST /users/delete intentionally preserves the user's web principal (so shortnames survive), which broke re-runs with the static names.
  • tests/hash-playwright/tests/shared/runtime.ts: replaces the blanket status-code regex for the 401 tolerance with a URL-aware credit-counter. Each tolerated 4xx response on a specific Kratos path grants exactly one Failed to load resource console suppression, so real failures from other endpoints still flag.
  • tests/hash-playwright/tests/mfa.spec.ts: unskips all five TOTP tests; the disable test additionally asserts the regenerate-backup-codes-button is gone so a regression in the lookup_secret_disable step would fail the assertion.
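
To illustrate the Mailslurper timestamp fix, here is a sketch of parsing a timezone-less string as UTC. The "YYYY-MM-DD HH:mm:ss" layout is an assumption about the dateSent format, and parseTimezoneLessAsUtc is a hypothetical name, not the helper in get-kratos-verification-code.ts.

```typescript
// Parses a timezone-less timestamp as UTC. Passing such a string straight to
// `new Date(...)` is typically interpreted as local time, which shifts the
// result by the machine's UTC offset.
const parseTimezoneLessAsUtc = (dateSent: string): Date => {
  const match = /^(\d{4})-(\d{2})-(\d{2})[ T](\d{2}):(\d{2}):(\d{2})/.exec(
    dateSent,
  );
  if (!match) {
    throw new Error(`Unrecognised dateSent format: ${dateSent}`);
  }

  return new Date(
    Date.UTC(
      Number(match[1]),
      Number(match[2]) - 1, // Date.UTC months are zero-based
      Number(match[3]),
      Number(match[4]),
      Number(match[5]),
      Number(match[6]),
    ),
  );
};
```

Because the components go through Date.UTC, the result is the same on a CEST machine as on a UTC CI runner.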

Pre-Merge Checklist 🚀

🚢 Has this modified a publishable library?

This PR:

  • does not modify any publishable blocks or libraries, or modifications do not need publishing

📜 Does this require a change to the docs?

The changes in this PR:

  • are internal and do not require a docs change

🕸️ Does this require a change to the Turbo Graph?

The changes in this PR:

  • do not affect the execution graph

⚠️ Known issues

  • The SERVE_PUBLIC_BASE_URL misconfiguration that caused the redirect mismatch is worked around in the frontend rather than corrected at the source. Proper config fix tracked in H-6419.
  • Disabling 2FA currently relies only on the logged-in privileged session — no explicit re-authentication. A proper SSO-compatible session-refresh step is tracked in H-6417 (same note applies to the password-change flow).
  • SSO-only users cannot change their password today (pre-existing limitation of handlePasswordSubmit). Documented and tracked in H-6418.
  • The test runtime's URL-aware 4xx tolerance consumes suppression credits by status code only, so a rare interleaving of a real and an expected failure at the same status can mask the real one. Acceptable trade-off against blanket global tolerance.
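
The credit-counter trade-off in the last bullet is easier to see in code. The sketch below is a simplified model, not the implementation in runtime.ts: createConsoleErrorTolerance and its rule shape are hypothetical, and the console message format is assumed to be Chromium's "Failed to load resource: the server responded with a status of NNN" wording.

```typescript
type ToleratedRule = { status: number; pathPrefix: string };

// Hypothetical model of the URL-aware tolerance: a tolerated response on a
// matching path grants one credit, and each credit suppresses one console error.
const createConsoleErrorTolerance = (rules: ToleratedRule[]) => {
  // Credits are keyed by status code only, which is the source of the
  // masking risk described above.
  const credits = new Map<number, number>();

  return {
    recordResponse(status: number, url: string): void {
      const { pathname } = new URL(url);
      const tolerated = rules.some(
        (rule) =>
          rule.status === status && pathname.startsWith(rule.pathPrefix),
      );
      if (tolerated) {
        credits.set(status, (credits.get(status) ?? 0) + 1);
      }
    },

    shouldSuppress(consoleText: string): boolean {
      const match = /Failed to load resource.*status of (\d{3})/.exec(
        consoleText,
      );
      if (!match) {
        return false;
      }
      const status = Number(match[1]);
      const remaining = credits.get(status) ?? 0;
      if (remaining === 0) {
        return false;
      }
      credits.set(status, remaining - 1);
      return true;
    },
  };
};
```

Since the console message carries no URL, a credit earned by an expected 401 can be spent on an unexpected 401 from another endpoint if the two interleave.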

🐾 Next steps

  • H-6417 — Forcible session refresh for sensitive settings changes (SSO-compatible)
  • H-6418 — Password setup / reset flow for SSO-only users
  • H-6419 — Fix Kratos `SERVE_PUBLIC_BASE_URL` / frontend-proxy mismatch

🛡 What tests cover this?

  • tests/hash-playwright/tests/mfa.spec.ts — five E2E tests covering: enable TOTP, AAL2 login with TOTP, AAL2 login with backup code, disable TOTP, rejection of wrong code. All green locally.

❓ How to test this?

  1. Check out the branch, deploy FE + BE together.
  2. Sign in with an existing user, go to /settings/security, click "Enable TOTP".
  3. Scan the QR code, enter the 6-digit code, save the generated backup codes.
  4. Log out, log back in with password alone — confirm the AAL2 form appears, enter a TOTP code or toggle to backup-code mode.
  5. From /settings/security, click "Disable TOTP" and confirm — log out, log back in with password; no AAL2 prompt should appear.

📹 Demo

To add once manual staging verification is done.

Re-enables the TOTP settings UI that shipped disabled in H-2421
(wrapped in `{false && …}` with a `@todo H-6219 restore this` note)
and fixes the underlying cause that had broken it on staging.

The blocker was Kratos's `browser_location_change_required` response
pointing at `app.hash.ai/self-service/login/browser?aal=aal2` — a URL
Kratos builds from `SERVE_PUBLIC_BASE_URL` but which no route on the
frontend actually serves. Following the redirect verbatim dead-ends
on 404. The frontend Kratos error handler now rewrites these native
browser paths to the matching UI route (`/signin?aal=aal2`, etc.) and
the signin page re-creates the flow at the right AAL level. Proper
fix tracked in H-6419 (config change) so this workaround can be
rolled back later.

Along the way: loosened the Kratos proxy rate-limiting so normal
page loads don't hit 12/10s, fixed a handful of test-harness bugs
(broken base32 decoder, mailslurper timestamp parsing, no-op
`resetDb`) that were hiding the real MFA defects.
@github-actions github-actions Bot added area/apps > hash* Affects HASH (a `hash-*` app) area/apps > hash-api Affects the HASH API (app) type/eng > frontend Owned by the @frontend team type/eng > backend Owned by the @backend team area/tests New or updated tests area/tests > playwright New or updated Playwright tests area/apps labels Apr 13, 2026
@vercel

vercel Bot commented Apr 13, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
hash Ready Ready Preview, Comment Apr 14, 2026 8:57am
3 Skipped Deployments
Project Deployment Actions Updated (UTC)
hashdotdesign Ignored Ignored Preview Apr 14, 2026 8:57am
hashdotdesign-tokens Ignored Ignored Preview Apr 14, 2026 8:57am
petrinaut Skipped Skipped Apr 14, 2026 8:57am

@codecov

codecov Bot commented Apr 13, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 62.50%. Comparing base (a70891f) to head (de205e1).
⚠️ Report is 11 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #8622      +/-   ##
==========================================
- Coverage   62.50%   62.50%   -0.01%     
==========================================
  Files        1318     1318              
  Lines      134219   134229      +10     
  Branches     5518     5520       +2     
==========================================
  Hits        83895    83895              
- Misses      49409    49419      +10     
  Partials      915      915              
Flag Coverage Δ
apps.hash-api 0.00% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.


Comment thread apps/hash-frontend/src/pages/settings/security.page.tsx
Review-agent follow-ups on the MFA restoration:

Silent-failure fixes
- security.page.tsx: the backup-code copy-button no longer swallows
  clipboard write failures; the TOTP-enable, TOTP-disable step-3 and
  "I've saved my codes" paths all surface errors instead of letting
  an undefined flow drop out unnoticed.
- signin.page.tsx: the two remaining raw `redirect_browser_to`
  follows (the login-success `continue_with` path and the
  toSession AAL2 403 fallback) now go through
  `uiPathForKratosBrowserRedirect`, closing the same 404 class the
  error handler already covered.

Rate-limit tightening
- `kratosProxyMutationRateLimiter` now skips only GETs, so PUT /
  PATCH / DELETE (should Kratos ever use them) count against the
  limit rather than bypassing both tiers.

Docs / cosmetics
- Fix `ipKey` JSDoc string mismatch (now matches the
  "ip-unavailable" literal).
- Flesh out the `@todo H-6417` note so it survives the ticket.
- Document that `flowMetadata.settingsWithPassword` intentionally
  shares paths with `settings`; document the URL-fragment drop in
  `uiPathForKratosBrowserRedirect`; drop the concrete date example
  from the Mailslurper parser comment.
- Tighten the 422 tolerance comment in `runtime.ts`.

Lint fixes flagged by CI
- `runtime.ts`: rename the destructured `s` to match id-length.
- `totp-utils.ts`: switch `decodeBase32` from bitwise operators to
  arithmetic so the `no-bitwise` lint passes (matches the style of
  the sibling `generateTotpCode`). Behaviour verified against the
  observed Kratos secret.
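
For reference, an arithmetic-only base32 decoder along the lines described above might look like the following. This is a sketch under the stated constraints (no bitwise operators, accumulator truncated after every emitted byte), not the code in totp-utils.ts.

```typescript
const BASE32_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567";

const decodeBase32 = (input: string): Uint8Array => {
  // Normalise case, then strip whitespace and trailing padding.
  const cleaned = input.toUpperCase().replace(/\s/g, "").replace(/=+$/, "");
  const bytes: number[] = [];
  let accumulator = 0;
  let bitCount = 0;

  for (let index = 0; index < cleaned.length; index++) {
    const value = BASE32_ALPHABET.indexOf(cleaned.charAt(index));
    if (value === -1) {
      throw new Error(`Invalid base32 character: ${cleaned.charAt(index)}`);
    }

    // Shift in five bits using multiplication instead of `<<`.
    accumulator = accumulator * 32 + value;
    bitCount += 5;

    if (bitCount >= 8) {
      bitCount -= 8;
      const divisor = Math.pow(2, bitCount);
      bytes.push(Math.floor(accumulator / divisor));
      // Keep only the leftover low bits so the accumulator never grows
      // beyond 12 bits, far below the 53-bit precision limit.
      accumulator %= divisor;
    }
  }

  return new Uint8Array(bytes);
};
```

With the truncation in place the accumulator stays below 4096 regardless of secret length, so the precision loss that produced a half-zero key cannot occur.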

Test strengthening
- `mfa.spec.ts`: the disable test now asserts the
  "Regenerate backup codes" button is gone after disabling, which
  catches regressions in the new `lookup_secret_disable` step.
Kratos v26 (confirmed: `oryd/kratos:v26.2` base image in
`apps/hash-external-services/kratos/Dockerfile`) ignores the
`totp_code` field when submitted for an already-enrolled identity
— the settings flow only exposes `totp_unlink` for enrolled users.
The code input in the disable form gated nothing.

Replace it with a plain confirmation ("yes, disable two-factor
authentication") plus a warning that backup codes will be cleared
too. Drops the `disableTotpCode` state, its input field, and the
no-op submit step. The disable handler is now a straight
`totp_unlink` → `lookup_secret_disable` sequence.

Playwright test updated to skip the (now non-existent) code entry.
The password-change flow runs a Kratos refresh-login with the
current password for its privileged-session guarantee. That has
the same two open problems the TOTP-disable flow flags:

- SSO-only users can't produce a current password, so the feature
  is unavailable to them (tracked separately in H-6418).
- Even for password users, the refresh is implicit on the current
  session's privilege window rather than an explicit reauth — the
  SSO-compatible forced refresh is H-6417.

Add a pointer to both tickets so the next person touching this
handler sees the same constraints the disable handler already
documents.
`.env.test` pins the exact email strings in
`USER_EMAIL_ALLOW_LIST` — any suffix applied to the test emails
makes them fall outside the allowlist, which sends the user to the
waitlist page instead of the signup-completion page, and every MFA
test then times out waiting for "Thanks for confirming your
account" in CI.

The suffix was only needed to cope with locally-persistent webs
after `POST /users/delete` (which soft-deletes and preserves the
web principal). CI always starts with a fresh database, so the
collision doesn't happen there.

Revert the test helper to the plain static identifiers; a comment
records the local re-run caveat and points at the standard
workaround (re-seed the stack between runs).
@vercel vercel Bot temporarily deployed to Preview – petrinaut April 13, 2026 18:07 Inactive
Two complementary changes to make the MFA tests locally repeatable
without a full stack re-seed, while still matching the email
allowlist in .env.test for CI:

1. Before each createUserAndCompleteSignup call, delete any
   leftover Kratos identity via the admin /users/delete endpoint
   (resolves silently on 404). This prevents "identifier already
   exists" on the second local run.

2. Randomise the shortname per test invocation (same Date.now +
   random pattern as signup.spec.ts). The web principal from the
   previous run stays behind by design, so using the same static
   shortname would fail with "already taken". A fresh random suffix
   avoids the collision; orphan webs accumulate but don't block.

The email itself stays static (e.g. mfa-enable-totp@example.com)
to satisfy USER_EMAIL_ALLOW_LIST in .env.test.
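
A rough sketch of the two steps above. Both helper names are hypothetical, the delete call is injected rather than tied to a real client, and the exact Date.now + random pattern used in signup.spec.ts may differ from the one shown.

```typescript
// Hypothetical signature for whatever issues the admin delete request.
type DeleteUser = (email: string) => Promise<{ status: number }>;

// Step 1: remove any leftover identity, treating 404 as "nothing to clean up".
const deleteLeftoverUser = async (
  deleteUser: DeleteUser,
  email: string,
): Promise<void> => {
  const { status } = await deleteUser(email);
  if (status !== 404 && (status < 200 || status >= 300)) {
    throw new Error(`Unexpected status ${status} while deleting ${email}`);
  }
};

// Step 2: a fresh shortname per invocation, so an orphaned web from an
// earlier run cannot cause an "already taken" failure.
const randomisedShortname = (prefix: string): string => {
  const timePart = Date.now().toString(36);
  const randomPart = Math.floor(Math.random() * 36 ** 4).toString(36);
  return `${prefix}${timePart}${randomPart}`;
};
```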
@vercel vercel Bot temporarily deployed to Preview – petrinaut April 14, 2026 08:28 Inactive
Comment thread tests/hash-playwright/tests/shared/signup-utils.ts
@vercel vercel Bot temporarily deployed to Preview – petrinaut April 14, 2026 08:43 Inactive
@TimDiekmann TimDiekmann marked this pull request as ready for review April 14, 2026 08:44
@cursor

cursor Bot commented Apr 14, 2026

PR Summary

Medium Risk
Touches authentication/MFA flows and backend rate limiting; mistakes could cause login failures, unintended redirects, or weaken/over-tighten brute-force protections.

Overview
Restores and hardens TOTP-based MFA end-to-end. The security settings page re-enables TOTP enrolment/disable and backup-code management, updates backup-code parsing for newer Kratos formats, ensures disabling TOTP also removes lookup_secret backup codes, and surfaces previously-silent failure cases (backup code generation/confirmation and clipboard copy).

Fixes Kratos AAL2 login redirects and flow routing. Adds flowMetadata + uiPathForKratosBrowserRedirect to rewrite Kratos redirect_browser_to URLs away from /self-service/*/browser paths, uses this rewrite in the shared Kratos error handler and signin flow, and adjusts signin flow reuse so navigating to /signin?aal=aal2 reliably renders the AAL2 form.

Adjusts auth proxy protections and test reliability. The /auth proxy rate limiting is split into separate read/mutation limits (with /sessions/whoami exempt) and the per-identifier limiter now only applies when an identifier is present. Playwright MFA tests are unskipped and the test harness is updated (base32 decode for TOTP correctness, mailslurper timestamp parsing, deterministic user cleanup/unique shortnames, and URL-scoped tolerance for expected 4xx console noise).

Reviewed by Cursor Bugbot for commit c5ca691. Bugbot is set up for automated code reviews on this repo. Configure here.

@graphite-app graphite-app Bot requested review from a team April 14, 2026 08:44
@augmentcode

augmentcode Bot commented Apr 14, 2026

🤖 Augment PR Summary

Summary: Restores the TOTP MFA settings UI and fixes AAL2 login redirects so Kratos AAL-upgrade flows reliably land on the correct frontend routes.

Changes:

  • Re-enables the Security page TOTP enable/disable and backup-code UX, including clearing leftover backup-code credentials on disable and surfacing previously-silent failures.
  • Adds a centralized flowMetadata map and a uiPathForKratosBrowserRedirect helper to rewrite Kratos /self-service/*/browser redirects to actual frontend routes.
  • Updates the Kratos flow error handler and sign-in page to use the redirect rewriter and to ensure AAL2 flows render when navigating to /signin?aal=aal2 without a remount.
  • Adjusts API-proxy rate limiting for Kratos endpoints (tiered GET vs mutation limits; exempts /sessions/whoami; safer IP keying).
  • Fixes Playwright MFA harness issues (base32 decoding precision, Mailslurper timestamp parsing, unique test identities) and re-enables the TOTP E2E tests.


@augmentcode augmentcode Bot left a comment


Review completed. 2 suggestions posted.


Comment thread apps/hash-frontend/src/pages/signin.page.tsx Outdated
Comment thread apps/hash-frontend/src/pages/shared/ory-kratos.ts
If Kratos omits requested_aal on AAL1 flows (the field is optional
in the Ory client types), comparing undefined === "aal1" would
always be false, causing the effect to recreate the flow on every
render.

Switch to a boolean comparison: both sides are true only when AAL2
is explicitly requested/present. undefined, "aal1", or any other
value all collapse to "not AAL2" and match the default case.
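
The boolean comparison can be sketched as a small predicate. flowMatchesRequestedAal is a hypothetical name; in signin.page.tsx the check sits inside the flow-reuse guard.

```typescript
// True when the existing flow and the URL agree on whether AAL2 is requested.
// undefined, "aal1", or any other value all count as "not AAL2".
const flowMatchesRequestedAal = (
  flowRequestedAal: string | undefined,
  urlAalParam: string | undefined,
): boolean => (flowRequestedAal === "aal2") === (urlAalParam === "aal2");
```

A flow with requested_aal omitted loaded at /signin therefore matches and is reused, while the same flow at /signin?aal=aal2 does not match and is recreated.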
@TimDiekmann TimDiekmann enabled auto-merge April 15, 2026 08:11
@TimDiekmann TimDiekmann added this pull request to the merge queue Apr 15, 2026
Merged via the queue into main with commit ae9a7c0 Apr 15, 2026
49 checks passed
@TimDiekmann TimDiekmann deleted the t/h-6219-fix-and-restore-totp-mfa-settings branch April 15, 2026 08:35

Labels

area/apps > hash* Affects HASH (a `hash-*` app) area/apps > hash-api Affects the HASH API (app) area/apps area/tests > playwright New or updated Playwright tests area/tests New or updated tests type/eng > backend Owned by the @backend team type/eng > frontend Owned by the @frontend team


2 participants