fix(auth): harden email verification pipeline so silent failures fail loud by seanhanca · Pull Request #329 · livepeer/naap

seanhanca · 2026-05-21T18:32:27Z

Why

We just shipped operational fixes for a missing-Resend-key incident on operator.livepeer.org: user signups completed in DB but verification emails were never sent in production. Diagnosing it took a while because every failure mode along the email path was silently swallowed into console.error. This PR makes those same failure modes loud, and replaces a Vercel Fluid Compute foot-gun (per-instance cooldown Map) with a Redis-backed implementation.

Summary

/api/health fails closed on broken email config. Returns 503 in production-like environments (VERCEL_ENV=production or DEPLOY_ENV=production) when RESEND_API_KEY is missing or EMAIL_FROM is still on the resend.dev sandbox sender. Catches the exact regression we just lived through, on the next deploy. (apps/web-next/src/app/api/health/route.ts)
Structured error reporter ready for Sentry. New lib/monitoring.ts emits a tagged [ALERT] {...json...} log line that Vercel Log Drains can pick up today (Datadog/Logflare pattern-match), and additionally forwards to Sentry.captureException when @sentry/nextjs is installed and SENTRY_DSN is set. The Sentry SDK stays optional via dynamic import; no-op when absent. (apps/web-next/src/lib/monitoring.ts)
Email failures now go through the reporter, not console.error. Both verification and password-reset paths in lib/email.ts, plus the silent .catch in register(), route through reportError with tagged area + kind so they're trivially alertable. (apps/web-next/src/lib/email.ts, apps/web-next/src/lib/api/auth.ts)
validateEmailConfig is now pure so /api/health can call it per-request without spamming logs. Boot-time warnings moved to a new logEmailConfigWarnings() hook that runs once on module load and reports the production-critical case via reportError. (apps/web-next/src/lib/email.ts)
Resend cooldown moved off the in-process Map. New lib/auth/resend-cooldown.ts uses @naap/cache (Redis) with hashed-email keys + per-purpose namespacing, so the cooldown holds across serverless instances on Fluid Compute. Falls back to bounded in-memory when Redis is unavailable (local dev parity). (apps/web-next/src/lib/auth/resend-cooldown.ts, apps/web-next/src/lib/api/auth.ts)
Tests:
- 14 new unit tests covering monitoring (structured payload, log-injection guard, Sentry no-op), email config validation (pure, sandbox detection), and cooldown (memory fallback, Redis path, case/whitespace normalization, purpose isolation).
- New Playwright @pre-release smoke (tests/auth-email-smoke.spec.ts) that asserts /api/health reports email configured and register / resend-verification do not 5xx. Picked up by the nightly e2e-ga workflow against production / preview.
Docs. .env.local.example now spells out that RESEND_API_KEY + a verified-domain EMAIL_FROM are required in production, and that the sandbox sender only delivers to the Resend account owner.
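
To make the fail-closed health check concrete, here is a minimal sketch of how a pure config check could drive the status code. The shapes and helper names (`Env`, `EmailConfigStatus`, `isProductionLike`, `healthStatusCode`) are assumptions for illustration and may not match route.ts or lib/email.ts; how an unset EMAIL_FROM is treated is not covered here.

```typescript
// Hypothetical shapes; the real route.ts and lib/email.ts may differ.
type Env = Record<string, string | undefined>;

interface EmailConfigStatus {
  configured: boolean;
  warnings: string[];
}

// Pure: reads only its argument and never logs, so it is safe to call per request.
function validateEmailConfig(env: Env): EmailConfigStatus {
  const warnings: string[] = [];
  if (!env.RESEND_API_KEY) {
    warnings.push("RESEND_API_KEY is not set");
  }
  // Matches both "a@resend.dev" and "Name <a@resend.dev>".
  if (/@resend\.dev\b/i.test(env.EMAIL_FROM ?? "")) {
    warnings.push("EMAIL_FROM is still on the resend.dev sandbox sender");
  }
  return { configured: warnings.length === 0, warnings };
}

function isProductionLike(env: Env): boolean {
  return env.VERCEL_ENV === "production" || env.DEPLOY_ENV === "production";
}

// Fail closed: 503 only when email is broken AND the environment is production-like.
function healthStatusCode(env: Env): 200 | 503 {
  const email = validateEmailConfig(env);
  return !email.configured && isProductionLike(env) ? 503 : 200;
}
```

Keeping the check free of logging is what lets the health route call it on every request; the boot-time warning lives in a separate hook.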

Why these four, why now

Background: the missing-email incident was root-caused to a missing RESEND_API_KEY in the Vercel project. The code was wired correctly; the failure modes were just invisible. Each item below removes one specific way that the failure could hide again:

| Hardening | What it would have done last time |
| --- | --- |
| `/api/health` returns 503 | Deploy gate / uptime monitor would have flagged the broken state on the deploy that first lost the key |
| Structured `[ALERT]` log + Sentry hook | First failed send would have paged on-call instead of being lost in `console.error` |
| Redis cooldown | Existing 15-min throttle on duplicate signups now actually works on Vercel; today's `Map` reset on every cold start |
| Playwright smoke | Nightly e2e-ga catches a misconfigured deployment within 24h regardless of monitoring coverage |
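
As an illustration of the structured `[ALERT]` line, the sketch below builds a single-line JSON payload and strips control characters so a hostile message cannot forge extra log lines. The `ErrorContext` fields (`area`, `kind`), the `sanitize` and `formatAlert` helpers, and the 500-character cap are assumptions; the real lib/monitoring.ts may carry more fields and use a different signature.

```typescript
// Hypothetical context shape, based on the "area + kind" tags described above.
interface ErrorContext {
  area: string;
  kind: string;
}

// Replace CR/LF and other control characters, and cap the length (assumed limit).
function sanitize(value: string): string {
  return value.replace(/[\u0000-\u001f\u007f]+/g, " ").slice(0, 500);
}

// Build one greppable line: a fixed prefix followed by a JSON object.
function formatAlert(error: unknown, ctx: ErrorContext): string {
  const message = error instanceof Error ? error.message : String(error);
  const payload = {
    area: sanitize(ctx.area),
    kind: sanitize(ctx.kind),
    message: sanitize(message),
  };
  return `[ALERT] ${JSON.stringify(payload)}`;
}

// The logger is injectable so tests can capture output instead of spying on console.
function reportError(
  error: unknown,
  ctx: ErrorContext,
  log: (line: string) => void = (line) => console.error(line),
): void {
  log(formatAlert(error, ctx));
}
```

A log drain rule can then match on the literal `[ALERT] ` prefix and parse the rest of the line as JSON.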

Test plan

Already run locally:

npx tsc --noEmit — no new errors from these files (pre-existing errors on main for @naap/crypto, @naap/plugin-sdk, etc. are unchanged)
npx vitest run: 662/663 pass; the single failure (integration.test.ts from PR #325, "feat(leaderboard): union membership strategy + improved data sources UI") is a pre-existing flaky external-network test, not touched here
npx next lint over changed files — clean
New unit tests pass: 14/14
Playwright @pre-release smoke against a preview deploy — will run in CI; locally skipped because it requires a deployed env

Manual production smoke we just ran end-to-end as part of the incident response (independent of this PR):

Resend domain operator.livepeer.org verified for sending
vercel env add RESEND_API_KEY + EMAIL_FROM set on production + preview
New deployment naap-platform-741o2zvcz live on operator.livepeer.org
Resend API smoke send delivered ("last_event": "delivered")

Risk / rollback

No new runtime dependencies. @sentry/nextjs is optional via dynamic import — adding it is a separate PR.
No behavior change on the happy path. When configured correctly, all changed paths return the same values they did before.
Redis is only consulted when tryAcquireCooldown is called (existing-email registration flow). Failure to reach Redis falls back to in-memory, which is strictly no worse than today.
Rollback: revert this single commit.
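
The "optional via dynamic import" point can be sketched as follows. This is an assumption-laden illustration, not the code in lib/monitoring.ts: the function name `forwardToSentry`, its parameters, and the boolean return value are hypothetical; only `captureException` is a real export of @sentry/nextjs.

```typescript
// Resolves to true only when the SDK was found and the error was handed to it.
async function forwardToSentry(
  error: unknown,
  dsn: string | undefined,
  moduleName: string = "@sentry/nextjs", // parameterized here only to make the sketch testable
): Promise<boolean> {
  if (!dsn) return false; // Not configured: stay a no-op.
  try {
    // Loaded lazily at call time, so the package does not have to be installed.
    const sdk: any = await import(moduleName);
    if (typeof sdk.captureException !== "function") return false;
    sdk.captureException(error);
    return true;
  } catch {
    // SDK not installed or failed to load: error reporting must never throw.
    return false;
  }
}
```

With this shape, a missing SDK and a missing DSN both degrade to the `[ALERT]` log line alone, which matches the "no new runtime dependencies" claim above.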

Follow-ups (not in this PR)

Actually install @sentry/nextjs and add instrumentation.ts so the dynamic-import path in lib/monitoring.ts lights up. Tracking separately so this PR stays scoped.
Consider extending /api/health/services similarly so it surfaces Redis + Resend reachability, not just upstream service /healthz.

Made with Cursor

Summary by CodeRabbit

New Features
- Health check now reports email configuration status.
- Lightweight structured error reporting and monitoring for failures.
- Cross-instance rate limiting for email resend requests.
Documentation
- Expanded environment configuration guidance for email (production behavior and sandbox limits).
Tests
- Added unit and integration tests for email validation, monitoring, resend-cooldown, and an auth-email smoke test.

… loud

The verification email pipeline previously swallowed both boot-time misconfiguration (missing RESEND_API_KEY / sandbox EMAIL_FROM) and runtime send failures into console.error. This let a regression go unnoticed for weeks: signups completed in DB but no verification email was ever sent in production. This change keeps the existing behavior on the happy path and makes failure observable + recoverable across instances.

Changes

- /api/health: returns 503 in production-like environments when Resend is unconfigured or still on the resend.dev sandbox, so platform monitors and deploy gates fail closed.
- lib/monitoring.ts: structured error reporter that emits a tagged `[ALERT]` JSON line and forwards to Sentry when `@sentry/nextjs` is installed and `SENTRY_DSN` is set. Dependency stays optional via dynamic import; no-op when absent.
- lib/email.ts: send failures and boot-time misconfig now flow through reportError (verification + password-reset paths). `validateEmailConfig` is now pure (safe for /api/health to call repeatedly); a new `logEmailConfigWarnings` hook runs once on module load.
- lib/auth/resend-cooldown.ts: replaces the in-process `Map` used to throttle resend with a Redis-backed (@naap/cache) cooldown keyed by hashed email + purpose, so the cooldown holds across Vercel Fluid Compute instances. Falls back to in-memory when Redis is unavailable.
- Tests: 14 new unit tests (monitoring, email config, cooldown memory + Redis paths) + Playwright @pre-release smoke that asserts /api/health reports email configured and register/resend-verification do not 5xx.
- Docs: env.local.example clarifies that RESEND_API_KEY and EMAIL_FROM are required in production and that the sandbox sender only delivers to the Resend account owner.

Co-authored-by: Cursor <cursoragent@cursor.com>

vercel · 2026-05-21T18:32:33Z

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| naap-platform | Ready | Preview, Comment | May 21, 2026 6:44pm |

github-actions · 2026-05-21T18:32:38Z

⚠️ This PR is very large (783 lines changed). Please split it into smaller, focused PRs if possible.

coderabbitai · 2026-05-21T18:32:43Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 6176daaf-cd84-4794-b3ba-d4cc8472b6c4

📥 Commits

Reviewing files that changed from the base of the PR and between 45191b5 and c2f2b12.

📒 Files selected for processing (4)

apps/web-next/src/lib/__tests__/monitoring.test.ts
apps/web-next/src/lib/auth/__tests__/resend-cooldown.test.ts
apps/web-next/src/lib/auth/resend-cooldown.ts
apps/web-next/tests/auth-email-smoke.spec.ts

📝 Walkthrough

This PR introduces structured error monitoring, cross-instance email resend throttling, and email configuration health checks. It adds a monitoring module with reportError() for JSON logging and optional Sentry forwarding, a cooldown utility using Redis-backed throttling with in-memory fallback, refactors email configuration to separate validation and logging concerns, integrates both into auth registration, enhances the health endpoint with email status, and validates the integration with smoke tests.

Changes

Email monitoring, cooldown, and health integration

| Layer / File(s) | Summary |
| --- | --- |
| Monitoring infrastructure `apps/web-next/src/lib/monitoring.ts`, `apps/web-next/src/lib/__tests__/monitoring.test.ts` | New module with `ErrorContext` type, `reportError()` for structured JSON logging with `[ALERT]` prefix and optional Sentry forwarding, input sanitization to prevent log injection, and test helpers for state reset. |
| Resend cooldown utility `apps/web-next/src/lib/auth/resend-cooldown.ts`, `apps/web-next/src/lib/auth/__tests__/resend-cooldown.test.ts` | New module introducing `tryAcquireCooldown()` for cross-instance email throttling via Redis cache with per-purpose tracking and in-memory fallback when cache unavailable, including TTL enforcement and soft entry cap for memory. |
| Email configuration and error reporting `apps/web-next/src/lib/email.ts`, `apps/web-next/src/lib/__tests__/email.test.ts` | Refactored email module to split concerns: pure `validateEmailConfig()` checker, new `logEmailConfigWarnings()` for cold-start warnings, and both `sendVerificationEmail()` and `sendPasswordResetEmail()` updated to use `reportError()` instead of console logging for missing config and send failures. |
| Auth registration with cooldown and monitoring `apps/web-next/src/lib/api/auth.ts` | Updated `register` function to replace in-memory resend throttling with `tryAcquireCooldown()` and swap console logging for `reportError()` with structured tags in resend and post-registration verification-email error paths. |
| Health endpoint with email config status `apps/web-next/src/app/api/health/route.ts` | Extended health check to include email configuration status (`configured`, `warnings`, `criticalInThisEnv`), production-like environment detection, and HTTP 503 response when email is critical but unconfigured. |
| Integration tests and documentation `apps/web-next/tests/auth-email-smoke.spec.ts`, `apps/web-next/.env.local.example` | Added Playwright smoke tests verifying `/api/health` email status, endpoint availability, and signup success, plus updated `.env.local.example` with production guidance for `RESEND_API_KEY` configuration. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

livepeer/naap#149: Refactors email configuration telemetry and Resend verification flow in apps/web-next/src/lib/email.ts and apps/web-next/src/lib/api/auth.ts, directly overlapping with the cooldown and monitoring wiring in this PR.
livepeer/naap#195: Also adjusts auth resend/verification throttling behavior and may overlap with the tryAcquireCooldown design and integration.
livepeer/naap#181: Modifies email configuration validation and startup logging for RESEND_API_KEY and sandbox sender handling, overlapping with the refactored email startup checks here.

Suggested labels

scope/backend, size/M

Suggested reviewers

eliteprox

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 57.14% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title directly captures the main intent: hardening the email verification pipeline to make silent failures observable through structured error reporting and health checks. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

ESLint skipped: no ESLint configuration detected in root package.json. To enable, add eslint to devDependencies.

coderabbitai

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@apps/web-next/src/lib/__tests__/monitoring.test.ts`:
- Around line 11-19: The tests mutate process.env.SENTRY_DSN and don't restore
it, risking cross-test contamination; modify the setup/teardown around
__resetMonitoringForTests() so beforeEach captures the current
process.env.SENTRY_DSN (e.g., save to a local variable) and afterEach restores
it (reassign or delete if originally undefined), and ensure consoleErrorSpy is
still restored in afterEach; update the beforeEach/afterEach surrounding the
existing __resetMonitoringForTests, process.env.SENTRY_DSN mutation, and
consoleErrorSpy usage accordingly.

In `@apps/web-next/src/lib/auth/resend-cooldown.ts`:
- Around line 94-103: tryAcquireCooldown currently does a non-atomic
cache.cacheGet followed by cache.cacheSet which allows race conditions; change
it to perform an atomic acquire (SET NX with TTL) instead of GET+SETEX: use the
cache client's atomic "set if not exists" with expiration (Redis SET ... NX EX)
or reuse the project's distributed lock/SWR lock utility to attempt to create
the key once and only set TTL when acquired (referencing tryAcquireCooldown,
cache.cacheGet, cache.cacheSet, PREFIX, ttlSeconds, ttlMs). If the atomic set
succeeds return true, otherwise return false; preserve existing TTL calculation
and error handling while removing the non-atomic check-then-set path.

In `@apps/web-next/tests/auth-email-smoke.spec.ts`:
- Around line 23-26: The skip guard currently uses baseURL.includes('localhost')
and misses other local addresses; update the test.skip checks (the three
occurrences using test.skip(!!baseURL && baseURL.includes('localhost'), ...)) to
detect local environments by parsing baseURL (new URL(baseURL).hostname) and
skipping if the hostname is 'localhost', '127.0.0.1' or '::1' (or otherwise
matches a local host check), so all three occurrences consistently skip for
those hostnames.
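
The resend-cooldown finding above asks for a single atomic "set if absent with expiry" instead of a read followed by a write. The sketch below shows that shape with a fallback to a bounded in-memory map when the shared store throws. The `AtomicStore` interface, the `acquireCooldown` name, and the cap of 1000 entries are hypothetical; with Redis, `setIfAbsent` would correspond to one `SET key value NX PX ttlMs` command.

```typescript
// Hypothetical minimal store interface standing in for a Redis client.
interface AtomicStore {
  // Must create the key only if it does not exist, with a TTL, in one round-trip.
  setIfAbsent(key: string, value: string, ttlMs: number): Promise<boolean>;
}

// Per-instance fallback, used only when no shared store is available.
const memory = new Map<string, number>(); // key -> expiry timestamp in ms
const MAX_MEMORY_ENTRIES = 1000; // soft cap (assumed value)

function acquireInMemory(key: string, ttlMs: number, now: number): boolean {
  const expiresAt = memory.get(key);
  if (expiresAt !== undefined && expiresAt > now) return false; // still cooling down
  if (memory.size >= MAX_MEMORY_ENTRIES) memory.clear(); // crude bound on memory use
  memory.set(key, now + ttlMs);
  return true;
}

// Returns true when the caller acquired the slot and may send the email.
async function acquireCooldown(
  store: AtomicStore | null,
  key: string,
  ttlMs: number,
  now: number = Date.now(),
): Promise<boolean> {
  if (store) {
    try {
      // No separate read: the store decides atomically who wins the slot.
      return await store.setIfAbsent(key, "1", ttlMs);
    } catch {
      // Store unreachable: fall through to the per-instance fallback.
    }
  }
  return acquireInMemory(key, ttlMs, now);
}
```

Because the existence check and the write happen in one command, two instances racing on the same key cannot both see the slot as free.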

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: f6f1a1cf-d453-4fe1-9785-20d0ce7e4205

📥 Commits

Reviewing files that changed from the base of the PR and between 3d95a5d and 45191b5.

📒 Files selected for processing (10)

apps/web-next/.env.local.example
apps/web-next/src/app/api/health/route.ts
apps/web-next/src/lib/__tests__/email.test.ts
apps/web-next/src/lib/__tests__/monitoring.test.ts
apps/web-next/src/lib/api/auth.ts
apps/web-next/src/lib/auth/__tests__/resend-cooldown.test.ts
apps/web-next/src/lib/auth/resend-cooldown.ts
apps/web-next/src/lib/email.ts
apps/web-next/src/lib/monitoring.ts
apps/web-next/tests/auth-email-smoke.spec.ts

Three valid findings:

1. resend-cooldown: switch from non-atomic cacheGet+cacheSet to an atomic Redis SET NX PX round-trip via @naap/cache's getRedis(). Removes the tiny TOCTOU window where two concurrent acquirers on different instances could both think the slot was free.
2. monitoring.test.ts: save and restore process.env.SENTRY_DSN in beforeEach/afterEach so the SENTRY_DSN mutation doesn't leak into sibling tests in the same vitest worker.
3. auth-email-smoke: broaden the local-environment skip guard to cover 127.0.0.1, 0.0.0.0, and [::1] in addition to "localhost", parsing baseURL via URL.hostname instead of substring match.

Also updates the cooldown unit tests to exercise the new SET NX PX path plus the throw-then-fallback-to-memory path.

Co-authored-by: Cursor <cursoragent@cursor.com>
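
For the skip-guard change described in this commit, a hostname-based check could look like the sketch below. The helper name `isLocalBaseURL` and the exact host list are assumptions; the spec file may inline the check differently.

```typescript
// Hostnames treated as "local". URL.hostname keeps the brackets on IPv6 literals.
const LOCAL_HOSTNAMES = new Set(["localhost", "127.0.0.1", "0.0.0.0", "[::1]"]);

function isLocalBaseURL(baseURL: string | undefined): boolean {
  if (!baseURL) return false; // No base URL configured: do not skip.
  try {
    // Compare the parsed hostname exactly instead of substring-matching the URL.
    return LOCAL_HOSTNAMES.has(new URL(baseURL).hostname);
  } catch {
    return false; // Not a parseable absolute URL.
  }
}
```

Exact hostname comparison also avoids a false positive that a substring match would produce for a deployed host such as `localhost.example.com`. In a Playwright spec the result could be passed as the condition to `test.skip(condition, description)`.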

qianghan · 2026-05-21T18:50:43Z

Review cycle status

Code review (CodeRabbit):

All 3 inline issues addressed in commit c2f2b120:
1. monitoring.test.ts: env vars now saved + restored across tests
2. resend-cooldown.ts: atomic Redis SET NX PX round-trip (replaces non-atomic GET+SETEX); falls back to in-memory when Redis absent
3. auth-email-smoke.spec.ts: skip guard now parses URL.hostname and matches localhost / 127.0.0.1 / 0.0.0.0 / [::1]
All 3 review threads auto-resolved by CodeRabbit (isResolved: true).
CodeRabbit is rate-limited from posting a follow-up review for ~51 min so the stale CHANGES_REQUESTED state is still attached — it will clear automatically, or an admin can dismiss it.
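
The env save/restore fix in item 1 can be sketched as a small helper that a test suite calls in beforeEach and undoes in afterEach. The name `snapshotEnv` and the `Env` type are hypothetical; the actual test file may simply use a local variable.

```typescript
type Env = Record<string, string | undefined>;

// Capture the current state of one variable and return a function that restores it.
function snapshotEnv(env: Env, name: string): () => void {
  const original = env[name];
  return () => {
    if (original === undefined) {
      // Delete rather than assign undefined: on process.env an assigned
      // undefined is coerced to the string "undefined".
      delete env[name];
    } else {
      env[name] = original;
    }
  };
}
```

Usage under these assumptions: call `const restore = snapshotEnv(process.env, "SENTRY_DSN")` in beforeEach, mutate the variable freely inside the test, and call `restore()` in afterEach.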

Copilot:

Tried gh pr edit --add-reviewer copilot-pull-request-reviewer and the REST/GraphQL equivalents: GitHub returns 422 "not a collaborator" — Copilot can only be triggered from the GitHub UI in this repo. Worth a UI click if you want a second AI pass before merge.

CI: 18 success / 5 skipped / 0 failures (incl. Lint & TypeCheck, Build, Quality Gates, CodeQL, Vercel preview deploy).

Tests added: 15 vitest unit tests + 3 Playwright @pre-release smoke tests.

Local verification (run on c2f2b120):

npx tsc --noEmit — clean for changed files
npx vitest run — 663/664 (1 pre-existing flaky network test unrelated)
New unit suites — 15/15

Merge gate: needs @livepeer/core CODEOWNERS approval.

seanhanca · 2026-05-21T19:43:42Z

@coderabbitai review

coderabbitai · 2026-05-21T19:43:47Z

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

github-actions Bot added the size/XL Extra large PR (500+ lines) label May 21, 2026

github-actions Bot added the scope/shell Shell app changes label May 21, 2026

vercel Bot deployed to Preview May 21, 2026 18:34 View deployment

coderabbitai Bot requested changes May 21, 2026

Comment thread apps/web-next/src/lib/__tests__/monitoring.test.ts

Comment thread apps/web-next/src/lib/auth/resend-cooldown.ts

Comment thread apps/web-next/tests/auth-email-smoke.spec.ts

vercel Bot deployed to Preview May 21, 2026 18:44 View deployment

seanhanca enabled auto-merge (squash) May 21, 2026 18:51

coderabbitai Bot approved these changes May 21, 2026

seanhanca merged commit 326a04d into main May 21, 2026
24 checks passed

seanhanca deleted the fix/auth-email-hardening branch May 21, 2026 19:46

seanhanca merged 2 commits into main from fix/auth-email-hardening
