
feat: allow API keys to log into admin panel with limited scope #160

Merged
mcowger merged 10 commits into mcowger:main from darkspadez:main
Apr 15, 2026

@darkspadez
Contributor

This PR lets users log in to the admin panel with their API key to view their own logs and statistics, enable or disable trace capture, and rotate their key.

claude and others added 7 commits April 14, 2026 11:14
Lets an api_keys holder log in to the admin panel using their secret and
see a scoped view of their own activity, with self-service for comment
edits, secret rotation, and per-key trace capture. The ADMIN_KEY continues
to grant full access.

- New `authenticate` / `requireAdmin` preHandlers attach a `Principal`
  (admin | limited) to every management request. Limited users are
  resolved from the in-memory config.keys map via secret match.
- Scoped routes (usage, logs, summary, concurrency, errors, debug logs)
  force-inject `apiKey = principal.keyName` with exact-match semantics
  for limited users. Admin behavior is unchanged.
- Admin-only routes (providers, models, keys, quotas, mcp, config,
  system-logs, metrics, performance, restart, oauth, logging, test) are
  gated by `requireAdmin`.
- DebugManager refactored to `enabledGlobal` + `enabledKeys: Set<string>`;
  capture decision is per-key. An `AsyncLocalStorage` request context
  carries the active keyName through the inference pipeline so
  `DebugManager` resolves it automatically — no per-call-site plumbing.
- Cooldowns: per-provider clear now rejects limited users whose
  `allowedProviders` list excludes that provider; clear-all is admin-only.
- Self-service endpoints: `GET /self/me`, `POST /self/rotate`,
  `PATCH /self/comment`, `POST /self/debug/toggle`. Rotating the secret
  preserves all historical data because scoping is keyed on the stable
  key name, not the secret.
- Verify endpoint now returns `{ role, keyName?, allowedProviders?,
  allowedModels?, quotaName?, comment? }` so the frontend can render a
  role-appropriate UI without follow-up calls.
- Audit log lines for cooldown clears, rotations, and trace toggles.
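
To illustrate the `AsyncLocalStorage` bullet above, here is a minimal sketch of how a request context could carry the active key name so that a debug manager can make a per-key capture decision without per-call-site plumbing. The names `RequestContext`, `DebugManagerSketch`, `withKey`, and `demo` are hypothetical and are not taken from the PR; only `enabledGlobal`, `enabledKeys`, and the use of `AsyncLocalStorage` come from the commit message.

```typescript
import { AsyncLocalStorage } from "node:async_hooks";

// Hypothetical context shape; the commit message only says it carries the keyName.
interface RequestContext {
  keyName: string | null;
}

const requestContext = new AsyncLocalStorage<RequestContext>();

// Illustrative stand-in for the refactored DebugManager (not the PR's code).
class DebugManagerSketch {
  private enabledGlobal = false;
  private enabledKeys = new Set<string>();

  setGlobal(enabled: boolean): void {
    this.enabledGlobal = enabled;
  }

  setForKey(keyName: string, enabled: boolean): void {
    if (enabled) this.enabledKeys.add(keyName);
    else this.enabledKeys.delete(keyName);
  }

  // Capture when tracing is on globally or for the key bound to this request.
  shouldCapture(): boolean {
    if (this.enabledGlobal) return true;
    const keyName = requestContext.getStore()?.keyName;
    return keyName != null && this.enabledKeys.has(keyName);
  }
}

// Bind a key name for the whole async call chain started by fn.
function withKey<T>(keyName: string, fn: () => Promise<T>): Promise<T> {
  return requestContext.run({ keyName }, fn);
}

const debug = new DebugManagerSketch();
debug.setForKey("team-a", true);

async function demo(): Promise<boolean[]> {
  const a = await withKey("team-a", async () => {
    await Promise.resolve(); // the context survives awaits
    return debug.shouldCapture();
  });
  const b = await withKey("team-b", async () => debug.shouldCapture());
  // Outside any request context there is no key, so only the global flag counts.
  return [a, b, debug.shouldCapture()];
}
```

In this sketch only the request running under `team-a` is captured; the global flag would turn capture on for every request.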

- Migration 0027/0030 adds `api_key` column to `inference_errors` and
  `debug_logs` with requestId-joined backfill, plus indexes on
  `request_usage(api_key, start_time)`, `inference_errors(api_key)`,
  `debug_logs(api_key)`.

- AuthContext exposes `principal`, `isAdmin`, `isLimited`. Login page
  relabeled to accept either credential form.
- `ProtectedRoute` learns `requireAdmin` and redirects limited users off
  admin-only pages. Admin-only routes (providers, models, keys, config,
  mcp, quotas, system-logs) gated.
- Sidebar renders conditional sections: limited users see Dashboard,
  Logs, Traces, Errors, My Key. Identity chip at the bottom shows key
  name and role.
- New `MyKey` page: view key metadata, edit comment, toggle per-key trace
  capture, rotate secret (shows new secret once).
- Cooldown-clear buttons on the dashboard show a blast-radius warning
  modal; "Clear All" hidden from limited users.
- Debug/Errors pages hide bulk-delete and global filter from limited
  users and show a scope indicator in the header.
- Logs page hides the apiKey filter for limited users (server
  force-scopes regardless).

- Admin-auth test updated to assert the expanded verify response shape
  and adds a case for api-key-secret login producing a limited principal.

- The `api_keys.secret` column is still present (encrypted at rest when
  ENCRYPTION_KEY is set). Dropping it requires a code-level backfill to
  guarantee `secretHash` for all rows regardless of encryption config;
  deferred to a separate PR.
- _principal: replace plain secret `===` with constant-time hash compare
  (SHA-256 + timingSafeEqual), and walk the full keys list so the
  rejection path doesn't leak a count-of-keys-before-match timing signal.
- debug: restore the `enabled` field in the limited-user GET /debug
  response as OR of global + per-key, so older frontend callers keep
  working.
- usage-storage: saveError now prefers request-context keyName (seeded
  by v1 auth middleware via AsyncLocalStorage) over the last-resort DB
  lookup, reducing null api_key rows when request_usage insertion lags
  behind the error path.
- error responses: make every 403 from a scoped route return the same
  { error: { message, type, code } } shape instead of a bare string.
- usage: compute scopedKeyName once for the concurrency timeline query
  instead of calling it twice on the same request.
- MyKey rotate: after the server returns the new secret, call login()
  to refresh localStorage and the AuthContext principal so subsequent
  requests use the new secret (otherwise the old secret is evicted on
  the first 401, locking the user out of the modal showing the secret).
- Logs: gate the "Delete All" button behind isAdmin (previously visible
  to limited users, backend rejected the call).
- AuthContext: add a monotonic verify sequence so a slow mount-time
  verify promise can't clobber a newer login()/rotate() principal if
  they resolve out of order.
- minor: drop redundant `isLimited && principal?.role === 'limited'`
  checks in Debug/Errors/Logs in favor of `isLimited && principal?.keyName`.
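
The constant-time comparison described in the `_principal` bullet above could look roughly like the following. This is a sketch under assumptions: `KeyEntry` and `findKeyName` are hypothetical names, and the stored secret is hashed at compare time for simplicity, whereas the real code may compare against a stored `secretHash`.

```typescript
import { createHash, timingSafeEqual } from "node:crypto";

// Hypothetical shape of one entry in the in-memory keys list.
interface KeyEntry {
  name: string;
  secret: string;
}

function sha256(value: string): Buffer {
  return createHash("sha256").update(value, "utf8").digest();
}

// Returns the name of the key whose secret matches, or null if none does.
function findKeyName(presented: string, keys: KeyEntry[]): string | null {
  const presentedHash = sha256(presented);
  let match: string | null = null;
  for (const key of keys) {
    // Hashing first yields equal-length buffers, which timingSafeEqual requires.
    const equal = timingSafeEqual(presentedHash, sha256(key.secret));
    // No early return: every key is visited whether or not a match was found,
    // so the position of the match does not shorten the loop.
    if (equal && match === null) match = key.name;
  }
  return match;
}

const sampleKeys: KeyEntry[] = [
  { name: "team-a", secret: "sk-aaa" },
  { name: "team-b", secret: "sk-bbb" },
];
```

With the sample data, `findKeyName("sk-bbb", sampleKeys)` resolves to `"team-b"` and an unknown secret resolves to `null`.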

Intentional non-fixes:

- Migration journal timestamps (0027/0030): hand-set round-number values
  follow existing repo precedent (0021_add_secret_hash uses
  1774000000000; 0023_add_secret_hash + 0024 in migrations_pg similar).
  Values are monotonically after all existing entries so migration
  ordering is correct. Regenerating via `bunx drizzle-kit generate`
  requires up-to-date schema snapshots and a running environment we
  don't have here.
- Migration 0027 correlated subquery backfill: Plexus instances are
  typically small; one-shot UPDATEs run during startup migration are
  fine for expected table sizes. Batching adds significant migration
  complexity for unclear benefit; deferred.
Introduces an "Overall" tab (prepended to the Dashboard for limited
principals only) that rolls identity, allowed providers/models, quota
status, and usage totals onto a single page. Backed by a new
`/v0/management/self/quota` endpoint that mirrors the admin quota-status
shape and handles the no-quota-assigned case gracefully.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Clear summary, provider, and model data when the time range changes so
switching ranges shows "Loading…" instead of rendering the prior range's
numbers under the new label. Distinguish quota fetch failures from
"no quota assigned" with a dedicated error state, so quota-gated users
aren't told their key is unrestricted when the request actually failed.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
SQLite: 0027_green_emma_frost (api_key cols on debug_logs, inference_errors)
PG: 0031_last_wallow (same, sequenced after 0030_acoustic_katie_power)
@mcowger
Owner

mcowger commented Apr 14, 2026

Thanks for this PR — the overall approach is solid and the self-service UX for limited users is well thought out. I've rebased the branch onto main and regenerated the migrations properly (SQLite 0027_green_emma_frost, PG 0031_last_wallow) to resolve the numbering collision with 0030_acoustic_katie_power. No logic changes — just clean drizzle-kit output.

Two things need to be addressed before this can merge:


1. Events SSE endpoint leaks all keys' activity to limited users

File: packages/backend/src/routes/management/usage.ts — the GET /v0/management/events endpoint

The endpoint sits inside the authenticate-scoped plugin, so it correctly rejects unauthenticated requests. However, it attaches global listeners on usageStorage (started, completed, error) and forwards every event to the SSE client without filtering. A limited user who subscribes to this stream sees real-time request activity — model, provider, token counts, latency — for all API keys in the system, not just their own.

This breaks the isolation guarantee that is the whole point of limited-principal scoping. An adversarial key holder could passively monitor the usage patterns, provider availability, and error rates of every other key.

Fix: Call scopedKeyName(request) (already exported from _principal.ts) when the handler sets up its listeners, and skip emitting any event whose apiKey field doesn't match the scoped key. Admin principals (where scopedKeyName returns null) should continue to receive all events unchanged.
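
A rough sketch of that filter, assuming the storage object behaves like a Node `EventEmitter` and each event carries an `apiKey` field. `UsageEvent`, `Send`, and `subscribeScoped` are hypothetical names; `scopedKey` stands in for the value returned by `scopedKeyName(request)`, with `null` meaning admin.

```typescript
import { EventEmitter } from "node:events";

// Hypothetical event payload; only the apiKey field matters for scoping.
interface UsageEvent {
  apiKey: string | null;
  model: string;
}

type Send = (eventName: string, payload: UsageEvent) => void;

// Event names as listed in this review comment.
const EVENT_NAMES = ["started", "completed", "error"] as const;

// Attach one listener per event type and return a cleanup function
// to call when the SSE client disconnects.
function subscribeScoped(
  source: EventEmitter,
  scopedKey: string | null, // null means admin: forward everything
  send: Send,
): () => void {
  const listeners = EVENT_NAMES.map((name) => {
    const listener = (event: UsageEvent) => {
      // Limited principals only see events for their own key.
      if (scopedKey !== null && event.apiKey !== scopedKey) return;
      send(name, event);
    };
    source.on(name, listener);
    return [name, listener] as const;
  });
  return () => {
    for (const [name, listener] of listeners) source.off(name, listener);
  };
}
```

Applying the same listener factory to every event type keeps the filter from being added to one event and forgotten on another.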


2. Cooldown clear is accessible to all limited users regardless of allowedProviders

File: packages/backend/src/routes/management/cooldowns.ts — the DELETE /v0/management/cooldowns/:provider endpoint

The current guard is:

if (allowed.length > 0 && !allowed.includes(provider)) {
  return reply.code(403)...
}

When allowedProviders is an empty array (meaning "no provider restriction"), allowed.length > 0 is false and the check is skipped entirely — so the limited user can clear the cooldown for any provider. But clearing a provider cooldown is a system-level action with real blast radius: it forces the router to retry a provider that may be cooling down for good reason (rate limit, outage, quota exhaustion). This is not a safe self-service operation for key holders.

Fix: Make both cooldown-clear endpoints (DELETE /cooldowns/:provider and DELETE /cooldowns) admin-only by adding requireAdmin to those specific routes, or by moving them into the adminOnly plugin in management.ts. The limited-user scoping logic in the handler body can then be removed entirely.
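
To make the fall-through concrete, here is a small sketch that models the guard as a pure function and contrasts it with a role-only check. `Principal`, `mayClearOriginal`, and `mayClearAdminOnly` are hypothetical names used for illustration; the actual handler works on Fastify request and reply objects.

```typescript
type Role = "admin" | "limited";

// Hypothetical principal shape for this illustration.
interface Principal {
  role: Role;
  allowedProviders: string[]; // empty array means "no provider restriction"
}

// Mirrors the guard quoted above: an empty list skips the check entirely.
function mayClearOriginal(principal: Principal, provider: string): boolean {
  if (principal.role === "admin") return true;
  const allowed = principal.allowedProviders;
  if (allowed.length > 0 && !allowed.includes(provider)) return false;
  return true; // reached by every limited user with an empty list
}

// Proposed behavior: the role alone decides; allowedProviders is irrelevant.
function mayClearAdminOnly(principal: Principal): boolean {
  return principal.role === "admin";
}

const unrestricted: Principal = { role: "limited", allowedProviders: [] };
const restricted: Principal = { role: "limited", allowedProviders: ["openai"] };
const admin: Principal = { role: "admin", allowedProviders: [] };
```

Here `mayClearOriginal(unrestricted, "anthropic")` is `true`, which is the hole described above, while `mayClearAdminOnly` returns `false` for both limited principals.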


Everything else looks good. Happy to re-review once these two are addressed.

…aring cooldowns

The management events SSE stream broadcast every key's started/updated/completed
records to any authenticated principal, leaking cross-key request activity to
limited users. Scope the stream by apiKey for limited principals; admins are
unaffected.

Cooldown clear endpoints were gated by an allowedProviders check that fell
through to "allow" when the list was empty, letting any limited user force a
provider retry. Move both clear endpoints behind requireAdmin — limited users
retain read-only access to GET /cooldowns.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@darkspadez
Contributor Author

@mcowger all ready for ya!

@mcowger
Owner

mcowger commented Apr 15, 2026

@claude review this please, with a valley girl style.

@claude

claude bot commented Apr 15, 2026

Claude finished @mcowger's task in 2m 29s


Valley Girl Code Review™ ✨

Like, omigod, I am SO done reading all 36 files. Here's my full take bestie:

  • Read the changed files and gather context
  • Review backend security changes (the SSE endpoint + cooldowns fix)
  • Review new files (_principal.ts, self.ts, MyKey.tsx)
  • Review frontend changes
  • Post the most fetch review you've ever seen

Overall Vibe ✨

Okay so like, this PR is giving MAJOR energy and the concept is totally cute? Limited users being able to see their own stuff is SO self-service-core and I am here for it. But bestie, 7,356 additions and 355 deletions across 36 files is literally giving me anxiety. That is a LOT of glow-up for one PR. Like, we could have totally split this into smaller looks and it would have been SO much more reviewable. Just saying.


The Security Tea ☕ (This is, like, the most important part???)

Issue 1: The SSE Gossip Leak (packages/backend/src/routes/management/usage.ts:385)

Oh. My. God. The original GET /v0/management/events endpoint was literally broadcasting EVERYONE's tea to every limited user who connected. Like imagine walking into a room and hearing all of someone else's receipts — model names, token counts, provider errors — for ALL the keys?? That is SO not okay and literally breaks the whole point of scoping. It's giving "accidentally replied-all to the company listserv" energy.

mcowger called this out and darkspadez said it's fixed. I cannot verify the fix is perfect from here since the actual PR diff isn't in this checkout, but the fix pattern mcowger described (filtering by scopedKeyName(request) and passing all events through when the result is null, meaning admin) is like, the obvious correct answer here, for sure. Make sure the filter is applied to ALL the event types (started, updated, completed, created) and not just, like, one of them. That would be SO embarrassing.

Issue 2: The Cooldown Power Trip (packages/backend/src/routes/management/cooldowns.ts)

Okay so the base code here is literally 23 lines and it's giving "open bar to everyone" vibes — no restriction whatsoever on the DELETE endpoints. mcowger's review said the PR added an allowedProviders check but with a bug where an empty array meant "allow everything." Like, as IF! The empty-array edge case is such a classic gotcha and I'm glad it got flagged.

The correct fix (making DELETE /cooldowns and DELETE /cooldowns/:provider admin-only) is the right call. Clearing cooldowns is a system-level power move: it's not a self-service snack, it's a whole meal that only admins should be ordering.


Architecture Glow-Up 💅

The _principal.ts pattern (155 lines, new file) is actually giving really solid separation-of-concerns energy. Like, having one place that answers "who is this user and what are they allowed to see" is SO much cleaner than scattering auth checks everywhere. That's the kind of abstraction that doesn't feel forced — it's literally being used in multiple places right away. Iconic.

The new self.ts endpoint (264 lines) for self-service key management (view logs, rotate key, toggle trace) is a cute feature and makes total sense. One note though: key rotation is a MAJOR power move — make sure it's not just authenticated but also rate-limited or requires the current key as confirmation. Like, you wouldn't want someone who briefly had access to a tab to rotate someone's key. That would be, like, a whole incident.


Frontend Notes 💄

MyKey.tsx at 291 lines is serving the feature but that's getting kinda chunky? If it's doing both stats display AND key management (rotate + trace toggle), those might want to be split into components eventually. Not a blocker, just giving future-you a heads up.

OverallTab.tsx at 518 lines is... a lot, bestie. That component is doing heavy lifting and could probably use some decomposition into smaller chart/card components. But again — not blocking, just a vibe check.

AuthContext.tsx getting modified to support both admin-key and API-key login is the right call, but make sure the login flow clearly signals to the user which mode they're in. Like, "you're logged in as a LIMITED user" should be super obvious in the UI so they don't wonder why they can't see the Config tab. The Sidebar.tsx changes (+75/-40) hopefully handle this — make sure routes that limited users can't access are hidden, not just 403'd.


Minor Notes That Are Still Giving Me Concern 🤔

  • request.query as any appears multiple times in usage.ts — that's fine for now but you're throwing away all the type safety Fastify gives you for free. Like, why have TypeScript if you're just gonna as any it? Consider adding Fastify schema validation with schema: { querystring: { ... } } for at least the main query params.

  • The migrations (0027_green_emma_frost, 0031_last_wallow) look clean since mcowger already rebased them. No notes there, those are totally serving correct.

  • packages/backend/src/utils/auth.ts got 4 lines added. Would love to know what those 4 lines are doing — hope it's not just a comment, lol.


The Verdict 💋

This PR is mostly giving what it's supposed to give. The concept is solid, the scoping design is thoughtful, and the self-service UX is a real improvement. The two security issues mcowger flagged were the real tea here, and they appear to have been addressed.

Before merging I'd just wanna triple-check:

  1. SSE filter covers ALL event types — not just completed
  2. DELETE cooldown endpoints are truly admin-gated — not just "filtered" for limited users
  3. Key rotation has some rate limiting or confirmation — to prevent foot-guns

Like, once those three things are verified? This is SO ready to ship. For sure. ✨

mcowger added 2 commits April 15, 2026 13:09
Dropped the stale 0027_green_emma_frost (SQLite) and 0031_last_wallow (PG)
artifacts and let drizzle-kit regenerate clean outputs:
  SQLite → 0027_clear_switch
  PG     → 0031_colossal_sauron
main had already claimed SQLite 0027 (thin_norrin_radd) and PG 0031
(eminent_alex_power), so the PR's schema additions now land at:
  SQLite → 0028_dapper_pestilence
  PG     → 0032_fast_lethal_legion

Also includes main's 0027/0031 files so the migration chain is complete.
@mcowger mcowger merged commit 6185741 into mcowger:main Apr 15, 2026
mcowger added a commit to darkspadez/plexus that referenced this pull request Apr 15, 2026
Replace PR-branch migration numbers (SQLite 0027/0028, PG 0030/0031) with
the correct next-available indices (SQLite 0029, PG 0033) now that main has
advanced past those slots via PR mcowger#160. Also sync the postgres
user-quota-definitions enum values (monthly, cost) that were missing from
the fork's schema files.
github-actions bot pushed a commit that referenced this pull request Apr 16, 2026
… with limited scope

# Conflicts:
#	packages/backend/drizzle/migrations/meta/_journal.json
#	packages/backend/drizzle/migrations_pg/meta/_journal.json
3 participants