Skip to content

Harden for production: auth, error handling, UI cleanup#2

Merged
michaelzwang13 merged 4 commits into
mainfrom
harden/production-prep
May 22, 2026
Merged

Harden for production: auth, error handling, UI cleanup#2
michaelzwang13 merged 4 commits into
mainfrom
harden/production-prep

Conversation

@michaelzwang13
Copy link
Copy Markdown
Owner

Summary

Production-hardening pass over the hackathon build. No new product features — tightens security, error handling, and the frontend surface.

Frontend — UI consolidation & guards (3902fab)

  • Deleted the design-exploration mockups (Departure Board, Terminal, Schematic) and the design-picker Home page — all carried mock data.
  • Page1 → Directory: talent directory backed by real GET /roles; mock task counts and activity log removed.
  • Page4 → Status: hired-employee health backed by real GET /agents; mock decoration removed.
  • Added ProtectedRoute guards, RootRedirect, and a root ErrorBoundary.
  • Trimmed DockNav to real routes; raw error text replaced with friendly messages.
  • Moved the hardcoded ngrok host out of vite.config.ts into an ALLOWED_HOSTS env var.

Backend — auth & API hardening (0eaea02)

  • Password login: bcrypt-hashed passwords (migration 003), 8–72 char policy. Login returns one generic error for unknown-email vs. bad-password (no account enumeration).
  • /users locked down: GET/DELETE /users/{id} require auth + self-ownership; the unauthenticated user-list endpoint is removed.
  • Signed OAuth state: 10-min-expiry HMAC tokens — the API key no longer travels to Slack/Google/GitHub or their logs.
  • Error handling: global exception handlers log internally but return generic messages to clients; upstream httpx failures map to 502. Structured logging added.
  • Rate limiting (slowapi): signup 10/hour, login 10/minute.
  • CORS tightened — env-driven origins, narrowed methods/headers.
  • gateway.send_email no longer reports success on a 4xx/5xx response.
  • migrations/schema.sql: consolidated idempotent snapshot for fresh databases.

Testing

  • Backend: 97 tests pass (was 78; +19 new for password auth, /users lockdown, and OAuth state).
  • Frontend: tsc build and eslint clean.

New dependencies

bcrypt, slowapi (backend).

Migration required

Run backend/migrations/schema.sql (or 003_add_password.sql on an existing DB) in the Supabase SQL Editor — adds users.password_hash.

Still to come (not in this PR)

  • Frontend Vitest suite.
  • AWS CDK deployment (EC2 + Docker, ECR, Secrets Manager, ALB/ACM/Route53).

🤖 Generated with Claude Code

michaelzwang13 and others added 3 commits May 21, 2026 20:04
- Delete design-exploration mockups (Departure Board, Terminal, Schematic)
  and the design-picker Home page — all carried mock data.
- Page1 → Directory: talent directory backed by GET /roles, mock task
  counts and activity log removed.
- Page4 → Status: hired-employee health backed by GET /agents, mock
  channel decoration removed.
- Add ProtectedRoute guard, RootRedirect, and a root ErrorBoundary.
- Trim DockNav to real routes; / redirects by auth state.
- Move the hardcoded ngrok host out of vite.config into ALLOWED_HOSTS env.
- Replace raw error text in the Signal Feed with friendly messages.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Root .gitignore: ignore .vscode/, add trailing newline.
- app/.gitignore: normalize node_modules/ entry.
- CLAUDE.md: rewritten guidance; backend test count now 97.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Authentication
- Real password login: bcrypt-hashed passwords (migration 003), 8-72
  char policy. Signup and login both require a password.
- Login returns a single generic error for unknown email vs. bad
  password, so the endpoint can't enumerate accounts.
- /users locked down: GET/DELETE /users/{id} now require auth and only
  expose the caller's own record; the unauthenticated user-list
  endpoint is removed.
- Missing API key returns 401 instead of 422.

OAuth
- Signed, 10-minute-expiry state tokens (app/utils/oauth_state.py).
  The user API key no longer travels to Slack/Google/GitHub or their
  logs; state doubles as CSRF protection.

Error handling & hardening
- Global exception handlers: internal errors are logged server-side
  but clients only see a generic message; upstream httpx failures map
  to 502. Structured logging configured.
- Rate limiting via slowapi: signup 10/hour, login 10/minute.
- CORS tightened — env-driven origins, narrowed methods/headers,
  dropped the stale localhost:3000 default.
- gateway.send_email no longer reports success on a 4xx/5xx response.

Schema
- migrations/schema.sql: consolidated idempotent snapshot for fresh
  Supabase databases (001 + 002 + 003).

Frontend
- Login page gains a password field; api.ts signup/login updated.

Tests: 97 passing (was 78). New deps: bcrypt, slowapi.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@michaelzwang13
Copy link
Copy Markdown
Owner Author

@claude do a review

@claude
Copy link
Copy Markdown

claude Bot commented May 22, 2026

Claude finished @michaelzwang13's task in 3m 43s —— View job


Review — harden/production-prep

Solid pass overall. The security posture is meaningfully better than main: bcrypt + uniform-error login, /users self-ownership, signed/expiring OAuth state, generic exception responses, rate limits, env-driven CORS. Below are the bugs/gaps I'd want to address before treating this as truly production-ready, plus smaller nits.


🔴 Bugs worth fixing before merge

1. Password length is validated as characters, not bytes — bcrypt can still truncate.
backend/app/schemas/user.py:6

PasswordField = Field(min_length=8, max_length=72)

Pydantic's max_length counts characters. bcrypt truncates at 72 bytes. A 72-char password with multi-byte UTF-8 (emoji, accents) can exceed 72 bytes — exactly the silent-truncation case the docstring in backend/app/utils/passwords.py:6 claims is prevented.

A user could sign up with "a"*40 + "🔒"*16 (72 chars / 104 bytes); the chars past byte 72 get silently dropped, so multiple distinct passwords hash identically. Suggestion: add a field_validator that rejects when len(password.encode("utf-8")) > 72 (or hash sha256(password).hexdigest() then bcrypt, which sidesteps the 72-byte limit entirely). Fix this →

2. OAuth start URL still leaks the API key into backend/proxy logs.
backend/app/routers/auth.py:42,118,198 accept api_key via Query(...); app/src/lib/api.ts:178 builds …/auth/<service>?api_key=….

The PR correctly stopped sending the api_key to Slack/Google/GitHub (signed state — great), but the key still travels:

  • in the backend's HTTP access logs,
  • through any reverse proxy that logs query strings (ngrok, ALB),
  • into the user's browser history.

Putting credentials in a URL is an anti-pattern even when the destination is trusted. Cheap fix: have the frontend POST /auth/<service>/start with X-Api-Key, get back the redirect URL with a short-lived signed token in the query (same oauth_state trick), then window.location.href = .... Fix this →

3. Rate limiter buckets every client into the proxy's IP behind ngrok/ALB.
backend/app/ratelimit.py:14 uses slowapi.util.get_remote_address, which reads the TCP peer. Once anything sits in front of the app (ngrok today; an ALB tomorrow per the roadmap), every client looks like the same IP, so the limit becomes a global throttle and credential-stuffing from a single IP at the proxy is more effective. Either configure slowapi with an X-Forwarded-For-aware key_func (with a trusted-proxy list) or add a code comment so production deploys swap it before exposing the API. Bonus: login is rate-limited by IP, not by email — a low-and-slow attacker with multiple IPs sidesteps the limit on a single account.

4. ProtectedRoute preserves location.state.from, but Login ignores it.
app/src/components/ProtectedRoute.tsx:11 carefully sets state={{ from: location.pathname }}. app/src/pages/Login.tsx:73 unconditionally navigate('/agents'), so deep-linked users always land on /agents after sign-in instead of where they originally clicked. Read useLocation().state.from and use it (with /agents as the default).


🟡 Smaller observations

5. Exception handler vs. FastAPI's HTTPException default. backend/app/main.py:58 registers a catch-all Exception handler. FastAPI normally lets its HTTPExceptionHandler handle HTTPExceptions first, so 401/404/409 should still carry their detail. The 97 passing tests imply this works — just worth a one-liner test that asserts client.get("/users/unknown") (no auth) returns "Missing API key", not "Internal server error", to lock the behavior in.

6. OAuth callback error paths reflect the upstream error query param straight into a redirect URL (routers/auth.py:73,98,...). Not exploitable today — the value lands in a query string the frontend reads with URLSearchParams — but URL-encoding via urllib.parse.quote(error) is cheap insurance against future "just embed it in the DOM" frontend code.

7. tasks.py:43-50 catches bare Exception and re-raises 502. If anything inside dispatch_task ever raises HTTPException (e.g. for a 404 from the agent), it'll be re-wrapped to a 502 with the generic message. Narrow to (httpx.RequestError, httpx.HTTPStatusError), or re-raise HTTPException before logging.

8. Mock data still ships in the Signal Feed. app/src/pages/Agents.tsx:33-52 keeps MOCK_SLACK/MOCK_GMAIL/MOCK_GITHUB as initial state, shown labelled "DEMO" until OAuth completes. The PR description says the cleanup removed "mock task counts and activity log" — clarify in CLAUDE.md / ROADMAP that demo fallbacks on /agents are intentional, so future contributors don't think they were missed.

9. ErrorBoundary only offers "Reload". app/src/components/ErrorBoundary.tsx:60. If the crash is deterministic and tied to a specific route, reloading reproduces it. Adding a second "GO HOME" button that does window.location.href = '/' gives the user an escape hatch.

10. Default CORS_ALLOWED_ORIGINS silently falls back to localhost. backend/app/main.py:33 — the comment says "production MUST set this," but nothing enforces it. Either log a WARNING at startup when the env var is unset, or refuse to boot when ENV=production and the default is in use.


🟢 Things I liked

  • oauth_state.verify_state uses hmac.compare_digest (timing-safe) and gracefully handles malformed input.
  • The migrations/schema.sql snapshot is idempotent and works alongside the per-step 00x_*.sql files — good migration ergonomics.
  • The RATE_LIMIT_ENABLED=false env-toggle in conftest.py:25 is exactly the right escape hatch for the test suite.
  • verify_password catches malformed-hash rows and returns False instead of crashing — defensive in the right place.
  • Generic login error + uniform "user not found / wrong password" path is correctly implemented (routers/users.py:28-30).

Verdict

Approving direction; would like (1)–(4) addressed before this gets called "production-hardened" rather than "production-leaning." Everything else is fair follow-up.

Tasks

  • Read PR diff and changed files
  • Review backend auth & security hardening
  • Review frontend UI consolidation & guards
  • Review tests & migrations
  • Post review feedback
    · branch: harden/production-prep

1. Password length was validated in characters, not bytes, so a long
   multi-byte password could exceed bcrypt's 72-byte limit and be
   silently truncated into a collision. Passwords are now SHA-256'd and
   base64-encoded before bcrypt — a fixed 44-byte, null-free input —
   so length no longer matters. test_passwords.py covers the case.

2. The OAuth start endpoints took the API key as a URL query param,
   leaking it into backend/proxy access logs and browser history. They
   now authenticate via the X-Api-Key header and return the consent URL
   as JSON; the frontend's startOAuth() fetches it and redirects. The
   key never appears in a URL.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@michaelzwang13 michaelzwang13 merged commit 732cc7a into main May 22, 2026
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant