H-6219: Restore TOTP MFA and fix AAL2 login redirect#8622
Conversation
Re-enables the TOTP settings UI that shipped disabled in H-2421 (wrapped in `{false && …}` with a `@todo H-6219 restore this` note) and fixes the underlying cause that had broken it on staging. The blocker was Kratos's `browser_location_change_required` response pointing at `app.hash.ai/self-service/login/browser?aal=aal2` — a URL Kratos builds from `SERVE_PUBLIC_BASE_URL` but which no route on the frontend actually serves. Following the redirect verbatim dead-ends on 404. The frontend Kratos error handler now rewrites these native browser paths to the matching UI route (`/signin?aal=aal2`, etc.) and the signin page re-creates the flow at the right AAL level. Proper fix tracked in H-6419 (config change) so this workaround can be rolled back later. Along the way: loosened the Kratos proxy rate-limiting so normal page loads don't hit 12/10s, fixed a handful of test-harness bugs (broken base32 decoder, mailslurper timestamp parsing, no-op `resetDb`) that were hiding the real MFA defects.
|
The latest updates on your projects. Learn more about Vercel for GitHub.
3 Skipped Deployments
|
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #8622 +/- ##
==========================================
- Coverage 62.50% 62.50% -0.01%
==========================================
Files 1318 1318
Lines 134219 134229 +10
Branches 5518 5520 +2
==========================================
Hits 83895 83895
- Misses 49409 49419 +10
Partials 915 915
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Review-agent follow-ups on the MFA restoration: Silent-failure fixes - security.page.tsx: the backup-code copy-button no longer swallows clipboard write failures; the TOTP-enable, TOTP-disable step-3 and "I've saved my codes" paths all surface errors instead of letting an undefined flow drop out unnoticed. - signin.page.tsx: the two remaining raw `redirect_browser_to` follows (the login-success `continue_with` path and the toSession AAL2 403 fallback) now go through `uiPathForKratosBrowserRedirect`, closing the same 404 class the error handler already covered. Rate-limit tightening - `kratosProxyMutationRateLimiter` now skips only GETs, so PUT / PATCH / DELETE (should Kratos ever use them) count against the limit rather than bypassing both tiers. Docs / cosmetics - Fix `ipKey` JSDoc string mismatch (now matches the "ip-unavailable" literal). - Flesh out the `@todo H-6417` note so it survives the ticket. - Document that `flowMetadata.settingsWithPassword` intentionally shares paths with `settings`; document the URL-fragment drop in `uiPathForKratosBrowserRedirect`; drop the concrete date example from the Mailslurper parser comment. - Tighten the 422 tolerance comment in `runtime.ts`. Lint fixes flagged by CI - `runtime.ts`: rename the destructured `s` to match id-length. - `totp-utils.ts`: switch `decodeBase32` from bitwise operators to arithmetic so the `no-bitwise` lint passes (matches the style of the sibling `generateTotpCode`). Behaviour verified against the observed Kratos secret. Test strengthening - `mfa.spec.ts`: the disable test now asserts the "Regenerate backup codes" button is gone after disabling, which catches regressions in the new `lookup_secret_disable` step.
Kratos v26 (confirmed: `oryd/kratos:v26.2` base image in
`apps/hash-external-services/kratos/Dockerfile`) ignores the
`totp_code` field when submitted for an already-enrolled identity
— the settings flow only exposes `totp_unlink` for enrolled users.
The code input in the disable form gated nothing.
Replace it with a plain confirmation ("yes, disable two-factor
authentication") plus a warning that backup codes will be cleared
too. Drops the `disableTotpCode` state, its input field, and the
no-op submit step. The disable handler is now a straight
`totp_unlink` → `lookup_secret_disable` sequence.
Playwright test updated to skip the (now non-existent) code entry.
The password-change flow runs a Kratos refresh-login with the current password for its privileged-session guarantee. That has the same two open problems the TOTP-disable flow flags: - SSO-only users can't produce a current password, so the feature is unavailable to them (tracked separately in H-6418). - Even for password users, the refresh is implicit on the current session's privilege window rather than an explicit reauth — the SSO-compatible forced refresh is H-6417. Add a pointer to both tickets so the next person touching this handler sees the same constraints the disable handler already documents.
`.env.test` pins the exact email strings in `USER_EMAIL_ALLOW_LIST` — any suffix applied to the test emails makes them fall outside the allowlist, which sends the user to the waitlist page instead of the signup-completion page, and every MFA test then times out waiting for "Thanks for confirming your account" in CI. The suffix was only needed to cope with locally-persistent webs after `POST /users/delete` (which soft-deletes and preserves the web principal). CI always starts with a fresh database, so the collision doesn't happen there. Revert the test helper to the plain static identifiers; a comment records the local re-run caveat and points at the standard workaround (re-seed the stack between runs).
Two complementary changes to make the MFA tests locally repeatable without a full stack re-seed, while still matching the email allowlist in .env.test for CI: 1. Before each createUserAndCompleteSignup call, delete any leftover Kratos identity via the admin /users/delete endpoint (resolves silently on 404). This prevents "identifier already exists" on the second local run. 2. Randomise the shortname per test invocation (same Date.now + random pattern as signup.spec.ts). The web principal from the previous run stays behind by design, so using the same static shortname would fail with "already taken". A fresh random suffix avoids the collision; orphan webs accumulate but don't block. The email itself stays static (e.g. mfa-enable-totp@example.com) to satisfy USER_EMAIL_ALLOW_LIST in .env.test.
PR SummaryMedium Risk Overview Fixes Kratos AAL2 login redirects and flow routing. Adds Adjusts auth proxy protections and test reliability. The Reviewed by Cursor Bugbot for commit c5ca691. Bugbot is set up for automated code reviews on this repo. Configure here. |
🤖 Augment PR SummarySummary: Restores the TOTP MFA settings UI and fixes AAL2 login redirects so Kratos AAL-upgrade flows reliably land on the correct frontend routes. Changes:
🤖 Was this summary useful? React with 👍 or 👎 |
If Kratos omits requested_aal on AAL1 flows (the field is optional in the Ory client types), comparing undefined === "aal1" would always be false, causing the effect to recreate the flow on every render. Switch to a boolean comparison: both sides are true only when AAL2 is explicitly requested/present. undefined, "aal1", or any other value all collapse to "not AAL2" and match the default case.
🌟 What is the purpose of this PR?
Re-enables the TOTP MFA settings UI that was shipped disabled in H-2421 (hidden behind
{false && …}with a@todo H-6219 restore thisnote) and fixes the underlying issue that had broken it on staging.The blocker was Kratos emitting
redirect_browser_to: app.hash.ai/self-service/login/browser?aal=aal2after a password-only login from a TOTP-enabled user. Kratos builds that URL fromSERVE_PUBLIC_BASE_URL(set to the frontend origin) but no route on the frontend actually serves Kratos-native paths, so following it verbatim dead-ended on 404. The AAL2 form was never shown.🔗 Related links
🚫 Blocked by
None.
🔍 What does this change?
Frontend
apps/hash-frontend/src/pages/settings/security.page.tsxlookup_secret_disable: true. Withrequired_aal: highest_availablea leftover backup-code credential kept forcing AAL2 at next login with no TOTP to answer it.totp_codefor an already-enrolled identity, so it gated nothing. The disable form now shows a plain "yes, disable 2FA" confirmation and a warning that backup codes will also be removed.XXXXXXXX) in addition to the oldXXXX-XXXXlayout. Without this the UI rendered all 12 codes as one block and the AAL2-login test submitted the whole comma-separated string.apps/hash-frontend/src/pages/shared/ory-kratos.tsflowMetadatarecord mapping each Kratos self-service flow touiPath+kratosBrowserPath.uiPathForKratosBrowserRedirecthelper that rewrites Kratos-native/self-service/*/browserURLs to the matching frontend route.apps/hash-frontend/src/pages/shared/use-kratos-flow-error-handler.tssession_aal2_required,session_refresh_required,browser_location_change_requirednow route through the rewriter before callingrouter.replace.router.push(\/${flowType}`)fallbacks useflowMetadata[flowType].uiPath. Fixes a latent bug whereflowType: "registration"would have pushed to a non-existent/registration` route.apps/hash-frontend/src/pages/signin.page.tsxflow.requested_aalagainst the URL?aalparam. Without this, the AAL1 flow loaded at/signinpersisted through the client-side navigation to/signin?aal=aal2and the AAL2 form never rendered.redirect_browser_tofollows (login-successcontinue_withbranch, toSession 403 fallback) now go throughuiPathForKratosBrowserRedirecttoo.Backend (API proxy)
apps/hash-api/src/index.tsauthRouteRateLimiteron the Kratos/auth/*proxy with tiered limiters: non-GET credentials (30/10s), GETs other than whoami (60/10s), whoami exempted. The redundant whoamis the frontend emits during MFA flows were consuming the old 12/10s IP budget and surfacing as 429s.userIdentifierRateLimiteronly applies whenbody.identifieris actually present (its documented purpose), instead of falling back to IP.ipKeykeyGenerator replaces thereq.ip!assertion used by the previous default generator.Test harness
tests/hash-playwright/tests/shared/totp-utils.ts: rewritesdecodeBase32so the accumulator is truncated after every emitted byte. The old implementation overflowed JavaScript number precision past 53 bits and silently produced a half-zero key, so generated TOTP codes were rejected by Kratos.tests/hash-playwright/tests/shared/get-kratos-verification-code.ts: parses Mailslurper's timezone-lessdateSentstrings as UTC (they were being interpreted as local time, causing the 5-second window filter to drop every legitimate email on CEST machines).tests/hash-playwright/tests/shared/signup-utils.ts: adds a per-process suffix to test emails and shortnames.resetDb()is still a no-op andPOST /users/deleteintentionally preserves the user's web principal (so shortnames survive), which broke re-runs with the static names.tests/hash-playwright/tests/shared/runtime.ts: replaces the blanket status-code regex for the 401 tolerance with a URL-aware credit-counter. Each tolerated 4xx response on a specific Kratos path grants exactly oneFailed to load resourceconsole suppression, so real failures from other endpoints still flag.tests/hash-playwright/tests/mfa.spec.ts: unskips all five TOTP tests; the disable test additionally asserts theregenerate-backup-codes-buttonis gone so a regression in thelookup_secret_disablestep would fail the assertion.Pre-Merge Checklist 🚀
🚢 Has this modified a publishable library?
This PR:
📜 Does this require a change to the docs?
The changes in this PR:
🕸️ Does this require a change to the Turbo Graph?
The changes in this PR:
SERVE_PUBLIC_BASE_URLmisconfiguration that caused the redirect mismatch is worked around in the frontend rather than corrected at the source. Proper config fix tracked in H-6419.handlePasswordSubmit). Documented and tracked in H-6418.🐾 Next steps
🛡 What tests cover this?
tests/hash-playwright/tests/mfa.spec.ts— five E2E tests covering: enable TOTP, AAL2 login with TOTP, AAL2 login with backup code, disable TOTP, rejection of wrong code. All green locally.❓ How to test this?
/settings/security, click "Enable TOTP"./settings/security, click "Disable TOTP" and confirm — log out, log back in with password; no AAL2 prompt should appear.📹 Demo
To add once manual staging verification is done.