fix: dashboard stability, infrastructure improvements, auth hardening #399
- Add auth grace period (5s) to prevent redirect loops during Privy token refresh
- Replace router.push with plain <a> tags for login/navigation stability
- Add returnTo parameter to all login redirects consistently
- Add dashboard error.tsx and loading.tsx boundaries for RSC failures
- Add TabErrorBoundary for infrastructure dashboard containers tab
- Preserve user menu state during transient auth loss
- Proxy: detect malformed JWTs early, redirect to /login with returnTo
- Proxy: add /dashboard/build and /api/auth/create-anonymous-session to public paths
- Replace Next Link with <a> in sandboxes table (fixes /dashboard/milady routes)
- Use soft refresh event instead of hard reload after anon migration
- Add proxy auth unit tests
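The two proxy items above (reject malformed JWTs early, redirect to /login with returnTo) can be sketched as plain functions. This is a minimal sketch, not the PR's code: it assumes a compact JWS token (three non-empty base64url segments), and `isStructurallyValidJwt`, `safeReturnTo`, `buildLoginRedirect`, and the `/dashboard` fallback are hypothetical. The relative-path check is an added assumption to avoid turning returnTo into an open redirect.

```typescript
// Hypothetical helpers illustrating the proxy behaviour; not the PR's implementation.

// A compact JWS is header.payload.signature, each a non-empty base64url segment.
const BASE64URL_SEGMENT = /^[A-Za-z0-9_-]+$/;

function isStructurallyValidJwt(token: string | undefined): boolean {
  if (!token) return false;
  const parts = token.split(".");
  if (parts.length !== 3) return false;
  return parts.every((part) => BASE64URL_SEGMENT.test(part));
}

// Accept only same-origin relative paths; anything else falls back to a
// hypothetical default so /login cannot be used as an open redirect.
function safeReturnTo(path: string): string {
  if (!path.startsWith("/") || path.startsWith("//") || path.startsWith("/\\")) {
    return "/dashboard";
  }
  return path;
}

// Build the login URL with the original destination encoded as returnTo.
function buildLoginRedirect(pathname: string, search = ""): string {
  const returnTo = safeReturnTo(pathname + search);
  return `/login?returnTo=${encodeURIComponent(returnTo)}`;
}
```

A structural check like this is cheap enough to run on every request before any signature verification, which is what lets the proxy short-circuit obviously broken cookies.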
…ntainer actions API
- Fix allocation drift: use actual sandbox record count instead of stale allocated_count column
- Fix health classification: downgrade heartbeat-stale severity when Docker reports healthy+running
- Add collapsible incidents panel with per-container actions (logs, restart, stop, inspect)
- Add container actions API endpoint (POST /api/v1/admin/infrastructure/containers/actions)
- Fix page header infinite re-render: stabilize context value references with primitive comparison
- Fix useSetPageHeader to extract primitives so effect deps are stable across re-renders
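The page-header fix depends on keeping the context value's reference stable when its primitive fields have not changed. A framework-free sketch of that comparison follows; the `PageHeader` fields and the `sameHeader` / `stabilize` helpers are hypothetical, since the real provider's shape isn't shown here. In React, the result would feed a state setter or memoized context value, so consumers re-render only on real changes.

```typescript
// Hypothetical header shape; the real PageHeaderProvider fields may differ.
interface PageHeader {
  title: string;
  description?: string;
  actionLabel?: string;
}

// Compare primitive fields only, so a freshly allocated object with the same
// contents counts as "unchanged".
function sameHeader(a: PageHeader | null, b: PageHeader | null): boolean {
  if (a === b) return true;
  if (!a || !b) return false;
  return (
    a.title === b.title &&
    a.description === b.description &&
    a.actionLabel === b.actionLabel
  );
}

// Return the previous reference when nothing meaningful changed. This breaks
// the set -> re-render -> new object -> set loop behind infinite re-renders.
function stabilize(prev: PageHeader | null, next: PageHeader | null): PageHeader | null {
  return sameHeader(prev, next) ? prev : next;
}
```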
PR Review: dashboard stability, infrastructure improvements, auth hardening
Overall this is a solid PR with well-thought-out solutions to the auth refresh flickering and allocation drift problems. A few items worth addressing:
Security
The since parameter in the container logs endpoint is not validated before use. Suggested check:
if (since && !/^(\d+[smhd]|[\d]{4}-\d{2}-\d{2}T[\d:.Z+-]+)$/.test(since)) {
  return NextResponse.json({ success: false, error: "Invalid since format" }, { status: 400 });
}
Internal error messages are returned to the client (line 315):
return NextResponse.json({ success: false, error: `Action failed: ${message}` }, { status: 500 });
SSH error messages can include internal hostnames, IPs, key fingerprints, and connection details. Even for admin-only endpoints, it's worth sanitising them: log the full error server-side and return a generic message (or only the docker exit reason) to the client.
Correctness
The action only pulls the new image; it never recreates the container. The response note acknowledges this, but the action name will cause operators to think recreation happened. Consider renaming it to …
Validate the image name read from docker inspect before passing it to docker pull:
const imageOutput = await ssh.exec(`docker inspect --format '{{.Config.Image}}' ...`);
const image = imageOutput.trim();
// Reject empty or unexpected image references before they reach the shell.
if (!image || !/^[a-zA-Z0-9_./:@-]+$/.test(image)) {
  return NextResponse.json({ success: false, error: "Could not determine a valid container image" }, { status: 400 });
}
await ssh.exec(`docker pull ${shellQuote(image)} 2>&1`);
Auth Grace Period
if (authenticated) {
  hasBeenAuthenticated.current = true;
}
Mutating a ref during render is technically safe but can be surprising in Strict Mode (double-render). It works here because setting it …
isAnonymous={!shouldAllowProtectedContent}
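The grace-period behaviour can be expressed as a pure decision function, which also makes it straightforward to unit-test outside React. This sketch assumes the layout records the timestamp of the last authenticated render. `AUTH_LOSS_GRACE_MS` matches the 5000 ms value quoted in this PR, while `AuthSnapshot` and `allowProtectedContent` are hypothetical names.

```typescript
const AUTH_LOSS_GRACE_MS = 5000; // value quoted in this PR

interface AuthSnapshot {
  authenticated: boolean;
  // Epoch ms of the last render where authenticated was true, or null if the
  // user was never authenticated in this session (hypothetical field).
  lastAuthenticatedAt: number | null;
}

// True while authenticated, or within the grace window after a transient
// auth loss such as a token refresh. Users who were never authenticated
// get no grace period.
function allowProtectedContent(
  auth: AuthSnapshot,
  now: number,
  graceMs: number = AUTH_LOSS_GRACE_MS,
): boolean {
  if (auth.authenticated) return true;
  if (auth.lastAuthenticatedAt === null) return false;
  return now - auth.lastAuthenticatedAt < graceMs;
}
```

Passing `now` explicitly keeps the function deterministic. A shorter `graceMs` for sensitive admin pages, as a later review suggests, then becomes a per-call argument.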
Tests
No tests for the container actions API.
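One way to make the container actions API testable without SSH is to move request parsing into a pure function. This sketch assumes a JSON body with `action`, `nodeId`, and `containerName` fields. Those field names, the node-ID pattern, and the length caps are hypothetical. The action list follows the PR description, using the later pull-image name. The container-name pattern approximates Docker's allowed characters.

```typescript
const CONTAINER_ACTIONS = ["logs", "restart", "stop", "start", "inspect", "pull-image"] as const;
type ContainerAction = (typeof CONTAINER_ACTIONS)[number];

// Hypothetical formats: conservative character sets that are safe to
// interpolate into shell commands after quoting.
const NODE_ID_PATTERN = /^[a-zA-Z0-9_-]{1,64}$/;
const CONTAINER_NAME_PATTERN = /^[a-zA-Z0-9][a-zA-Z0-9_.-]{0,127}$/;

interface ContainerActionRequest {
  action: ContainerAction;
  nodeId: string;
  containerName: string;
}

type ParseResult =
  | { ok: true; value: ContainerActionRequest }
  | { ok: false; error: string };

function isContainerAction(value: unknown): value is ContainerAction {
  return typeof value === "string" && (CONTAINER_ACTIONS as readonly string[]).includes(value);
}

// Validate an untrusted request body; the route handler maps ok: false to a 400.
function parseContainerActionRequest(body: unknown): ParseResult {
  if (typeof body !== "object" || body === null) {
    return { ok: false, error: "Body must be a JSON object" };
  }
  const { action, nodeId, containerName } = body as Record<string, unknown>;
  if (!isContainerAction(action)) return { ok: false, error: "Unknown action" };
  if (typeof nodeId !== "string" || !NODE_ID_PATTERN.test(nodeId)) {
    return { ok: false, error: "Invalid nodeId" };
  }
  if (typeof containerName !== "string" || !CONTAINER_NAME_PATTERN.test(containerName)) {
    return { ok: false, error: "Invalid containerName" };
  }
  return { ok: true, value: { action, nodeId, containerName } };
}
```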
Minor
What's good
- Fix biome lint: remove unused imports (Pause, Terminal), fix import ordering, fix formatting
- Address review: validate 'since' param format in container logs endpoint
- Address review: validate docker image name format before pull
- Address review: sanitize error messages returned to client (don't leak SSH internals)
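The image-name and error-message fixes in this commit can be sketched as small helpers. This assumes a POSIX shell on the remote host. `shellQuote`, which the reviewed code calls but does not show, is written here as the standard single-quote escaper. `isValidImageRef`, `toClientError`, and the injected `Logger` are hypothetical.

```typescript
// Same character allowlist as the reviewed code: registry/repo/name, tags, digests.
const IMAGE_PATTERN = /^[a-zA-Z0-9_./:@-]+$/;

function isValidImageRef(image: string): boolean {
  return image.length > 0 && IMAGE_PATTERN.test(image);
}

// POSIX single-quote escaping: wrap in '...' and turn each embedded ' into '\''.
function shellQuote(value: string): string {
  return `'${value.replace(/'/g, `'\\''`)}'`;
}

type Logger = (message: string, error: unknown) => void;

// Log the full error server-side, but return only a generic message so SSH
// hostnames, IPs, and key details never reach the client.
function toClientError(action: string, error: unknown, log: Logger): { success: false; error: string } {
  log(`Container action "${action}" failed`, error);
  return { success: false, error: `Action "${action}" failed. See server logs for details.` };
}
```

Validation and quoting are complementary. The allowlist rejects unexpected input outright, and quoting still protects the command if the allowlist is later loosened.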
PR Review #399: Dashboard stability, infrastructure improvements, auth hardening
Overall this is a solid, well-structured PR. The auth hardening, context stabilization, and infrastructure dashboard improvements are all meaningful. A few issues worth addressing before merge:
🔴 Bugs / Correctness Issues
1. The action is named pull-recreate …
This is misleading: an operator clicking "pull-recreate" will expect the container to be restarted with the new image. Either rename the action to …
2.
function SortableHeader({ field, label }: { field: SortField; label: string }) { ... }
This function is defined inside the render body of …
3.
setTimeout(() => loadInfraSnapshot(), 2000)
A hardcoded 2-second delay before refreshing the snapshot is fragile: if SSH is slow, the data will still be stale. Either accept the stale read and let the user refresh manually, or implement a proper polling mechanism with a loading state. At minimum use …
🟡 Security Considerations
4. …
5. Auth grace period allows protected content for 5 seconds after auth loss
const AUTH_LOSS_GRACE_MS = 5000;
This is intentional for Privy token refresh jitter, but 5 seconds of protected content remaining visible to a logged-out user is worth documenting. Consider whether the grace period should differ between read-only views and sensitive admin pages, or whether 2–3 seconds would suffice.
6.
if (!/^(\d+[smhd]|[\d]{4}-\d{2}-\d{2}T[\d:.Z+-]+)$/.test(since)) {
🟡 Code Quality
7. Redundant timeout constants
const SSH_ACTION_TIMEOUT_MS = 30_000;
const SSH_LOGS_TIMEOUT_MS = 30_000;
Both are identical. Consolidate them or explain why they differ.
8.
interface InfraSummary {
  // ... typed fields ...
  [key: string]: unknown;
}
The index signature defeats the purpose of having typed fields, because TypeScript won't catch typos. Remove it and add any additional fields explicitly.
9. A token like …
✅ What's Well Done
Summary
Items that should be addressed before merge:
Items that are lower priority but worth tracking:
PR #399 Review
Overall this is a solid, well-structured PR addressing real problems (auth flicker, stale allocation counts, navigation instability). A few issues worth addressing before merge.
Issues
[Medium] Security: Raw docker inspect leaks env vars through unmasked JSON view
File: …
The container details dialog renders a "Raw Docker Inspect JSON" collapsible at the bottom using the raw inspect data:
// masking only covers the env var section above, not this:
JSON.stringify(detailsData, null, 2)
Recommendation: Strip …
[Medium] Bug: "+N more" incidents badge is dead code (never renders)
File: …
The "+N more" message is inside the incidentsExpanded guard, so its own !incidentsExpanded check is always false:
{incidentsExpanded && (
  // ...
  {hasMore && !incidentsExpanded && ( // ← always false
    <p>+{infraSnapshot.incidents.length - COLLAPSED_LIMIT} more</p>
  )}
)}
Users with more than 3 incidents and the panel collapsed get no indication that incidents are hidden.
Recommendation: Move the "+N more" text outside the …
[Low] Bug: …
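The first recommendation in this review is to mask the raw inspect JSON, not only the env var section. A later review also notes that the key pattern misses common names. This sketch masks `Config.Env` before serialising, relying on Docker reporting env vars as `KEY=value` strings. `DockerInspectLike` is a hypothetical subset of the inspect output, and the broader `SENSITIVE_KEY` pattern is illustrative, not a vetted list.

```typescript
// Hypothetical subset of `docker inspect` output; other fields pass through untouched.
interface DockerInspectLike {
  Config?: { Env?: string[]; [key: string]: unknown };
  [key: string]: unknown;
}

// Illustrative, broader than /key|secret|password|token|api/i; tune to your env names.
const SENSITIVE_KEY = /key|secret|pass|token|api|auth|credential|private|dsn|jwt|session|cookie/i;

function maskEnvEntry(entry: string): string {
  const eq = entry.indexOf("=");
  if (eq === -1) return entry; // no value to mask
  const key = entry.slice(0, eq);
  return SENSITIVE_KEY.test(key) ? `${key}=********` : entry;
}

// Return a masked copy (the input is not mutated), so the raw JSON viewer can
// render JSON.stringify(masked, null, 2) without leaking secrets.
function maskInspectData(data: DockerInspectLike): DockerInspectLike {
  const config = data.Config;
  const env = config?.Env;
  if (!config || !env) return data;
  return { ...data, Config: { ...config, Env: env.map(maskEnvEntry) } };
}
```

Masking the data once, before it reaches either view, means the env table and the raw JSON cannot drift apart.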
…eak, naming, types
- Extract SortableHeader to module scope (prevents remount on every render)
- Fix '+N more' incidents badge dead code (was inside incidentsExpanded guard)
- Mask env vars in raw Docker inspect JSON view (was leaking unmasked)
- Add DockerInspectData interface, remove all 'as any' casts
- Rename pull-recreate → pull-image (action only pulls, doesn't recreate)
- Add nodeId validation regex to container actions API
- Consolidate SSH_ACTION_TIMEOUT_MS and SSH_LOGS_TIMEOUT_MS into single constant
- Remove index signature from InfraSummary interface
- Track post-action setTimeout in ref with cleanup on unmount
- Tighten 'since' parameter regex validation
- Add auth grace period documentation comments (layout.tsx)
- Add ref mutation comment for hasBeenAuthenticated
- Document isAnonymous/authGraceActive intentional behavior
- Add TODO to sidebar hardNavigate escape hatch
- Narrow catch in milady agent page (rethrow unexpected errors)
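"Track post-action setTimeout in ref with cleanup on unmount" follows a common pattern: keep the pending handle, clear it before scheduling a new one, and clear it on teardown. This framework-free sketch injects the timer functions so it can be tested synchronously. `createTrackedTimeout` and its interfaces are hypothetical. In a component, the returned object would live in a `useRef`, with `dispose` called from an effect cleanup.

```typescript
type TimerHandle = unknown;

// Injected timer functions (real code would wrap setTimeout/clearTimeout).
interface TimerApi {
  set: (fn: () => void, ms: number) => TimerHandle;
  clear: (handle: TimerHandle) => void;
}

interface TrackedTimeout {
  schedule: (fn: () => void, ms: number) => void;
  dispose: () => void;
  isPending: () => boolean;
}

function createTrackedTimeout(timers: TimerApi): TrackedTimeout {
  let handle: TimerHandle | null = null;

  // Cancel the pending timer, if any.
  const dispose = () => {
    if (handle !== null) {
      timers.clear(handle);
      handle = null;
    }
  };

  return {
    schedule(fn, ms) {
      dispose(); // at most one pending refresh at a time
      handle = timers.set(() => {
        handle = null;
        fn();
      }, ms);
    },
    dispose,
    isPending: () => handle !== null,
  };
}
```

In the browser this might be wired as `createTrackedTimeout({ set: (fn, ms) => setTimeout(fn, ms), clear: (h) => clearTimeout(h as ReturnType<typeof setTimeout>) })`.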
PR Review: fix: dashboard stability, infrastructure improvements, auth hardening
Overall this is well-structured work with clear intent behind each change. A few issues worth addressing before merge:
Security
The ISO timestamp regex allows invalid dates like …
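One way to close this gap is to keep a shape regex but also require the timestamp to be a real calendar date. This is a sketch under that assumption: `isValidSince` is hypothetical, the relative-duration branch mirrors the `\d+[smhd]` form used in this PR, and the accepted ISO layout (optional seconds and fraction, `Z` or `±HH:MM`) is an assumed format. Round-tripping through `Date.UTC` catches impossible days such as February 30, which JavaScript date arithmetic would otherwise roll into the next month.

```typescript
const RELATIVE_SINCE = /^\d+[smhd]$/;
// YYYY-MM-DDTHH:MM[:SS[.fraction]] followed by Z or a ±HH:MM offset (assumed format).
const ISO_SINCE = /^(\d{4})-(\d{2})-(\d{2})T(\d{2}):(\d{2})(?::(\d{2})(?:\.\d{1,9})?)?(Z|[+-]\d{2}:\d{2})$/;

function isValidSince(since: string): boolean {
  if (RELATIVE_SINCE.test(since)) return true;
  const m = ISO_SINCE.exec(since);
  if (!m) return false;

  const [year, month, day, hour, minute] = [m[1], m[2], m[3], m[4], m[5]].map(Number);
  const second = m[6] === undefined ? 0 : Number(m[6]);
  if (hour > 23 || minute > 59 || second > 59) return false;

  // Offsets beyond ±14:00 do not exist.
  const offset = m[7];
  if (offset !== "Z") {
    const offsetHours = Number(offset.slice(1, 3));
    const offsetMinutes = Number(offset.slice(4, 6));
    if (offsetHours > 14 || offsetMinutes > 59) return false;
  }

  // An impossible day or month rolls over in Date.UTC, so the parts no longer match.
  const d = new Date(Date.UTC(year, month - 1, day));
  return d.getUTCFullYear() === year && d.getUTCMonth() === month - 1 && d.getUTCDate() === day;
}
```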
const isSensitive = /key|secret|password|token|api/i.test(key ?? "");
This misses common patterns like …
Also, …
Invalid auth token cache: verify it's bounded
The test at L1132 confirms caching of invalid auth results to short-circuit repeated bad-token requests. This is a good optimization, but please confirm the cache has a max-size bound. An unbounded in-memory cache keyed by token strings could become a slow memory leak under a flood of unique malformed tokens.
Logic / Correctness
Heartbeat severity downgrade may hide real failures (…)
const runtimeHealthy = runtime?.state === "running" && runtime?.health === "healthy";
severity: runtimeHealthy ? "warning" : "critical",
A container can pass Docker health checks while being functionally dead (deadlock, crashed event loop, hung process). A stale heartbeat means the application stopped reporting, which is independently alarming regardless of what Docker thinks. I'd suggest keeping …
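The suggestion above is cut off. One interpretation is to let Docker's healthy status soften a stale heartbeat only for a bounded time, then escalate regardless. This sketch rests on that assumption. `classifyHeartbeat` and both thresholds are hypothetical, and `RuntimeStatus` only mirrors the `runtime?.state` / `runtime?.health` fields quoted above.

```typescript
interface RuntimeStatus {
  state?: string;  // e.g. "running"
  health?: string; // e.g. "healthy"
}

type Severity = "ok" | "warning" | "critical";

// Hypothetical thresholds.
const HEARTBEAT_STALE_MS = 2 * 60_000;     // heartbeat considered stale
const HEARTBEAT_CRITICAL_MS = 10 * 60_000; // stale this long is critical regardless of Docker

function classifyHeartbeat(heartbeatAgeMs: number, runtime: RuntimeStatus | undefined): Severity {
  if (heartbeatAgeMs < HEARTBEAT_STALE_MS) return "ok";
  // A healthy container can still host a hung application, so Docker's view
  // only downgrades severity within a bounded window.
  const runtimeHealthy = runtime?.state === "running" && runtime?.health === "healthy";
  if (runtimeHealthy && heartbeatAgeMs < HEARTBEAT_CRITICAL_MS) return "warning";
  return "critical";
}
```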
The PR description says "pull-recreate via SSH" but the implemented action is …
note: "Image pulled successfully. Restart the container to use the new image."
Either rename the action to make the two-step workflow explicit in the UI, or implement the full …
Allocation drift may count stale DB records (…)
const actualAllocatedCount = dbContainers.length;
If …
React / Code Quality
Ref mutation during render (…)
if (authenticated) {
  hasBeenAuthenticated.current = true;
}
The comment acknowledges this is intentional and idempotent, but it's still a React anti-pattern. Consider moving it into a useEffect:
useEffect(() => {
  if (authenticated) hasBeenAuthenticated.current = true;
}, [authenticated]);
Auth grace timer may clear prematurely (…)
The …
Tests
No test for …
The new container actions API has tests for auth and routing, but not for the actual SSH command construction (logs, restart, pull-image). At minimum a unit test confirming that …
Missing coverage: valid → invalid → valid token transitions
The proxy auth tests cover static invalid tokens and caching, but don't test the dynamic case where a token is valid, expires mid-session, and then refreshes. This is exactly the scenario the …
Minor
Good additions overall: the allocation drift fix, collapsible incidents panel, and auth hardening are solid improvements. Addressing the heartbeat severity logic and the …
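This review asks whether the invalid-token cache is bounded. A JavaScript `Map` iterates in insertion order, which makes a size-capped cache with a TTL straightforward. This sketch assumes entries are keyed by the raw token string; hashing the token first would avoid holding tokens in memory. `BoundedTtlCache`, its limits, and the injected clock are hypothetical, not the proxy's actual implementation. Eviction is oldest-insertion-first, not true LRU.

```typescript
class BoundedTtlCache<V> {
  private readonly entries = new Map<string, { value: V; expiresAt: number }>();
  private readonly maxEntries: number;
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(maxEntries: number, ttlMs: number, now: () => number = () => Date.now()) {
    this.maxEntries = maxEntries;
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key); // lazily drop expired entries
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V): void {
    // Re-inserting moves the key to the end of the Map's iteration order.
    this.entries.delete(key);
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
    // Evict the oldest insertions once over capacity, so a flood of unique
    // malformed tokens cannot grow memory without bound.
    while (this.entries.size > this.maxEntries) {
      const oldest = this.entries.keys().next().value;
      if (oldest === undefined) break;
      this.entries.delete(oldest);
    }
  }

  get size(): number {
    return this.entries.size;
  }
}
```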
Summary
Comprehensive dashboard stability and infrastructure monitoring improvements.
Auth & Proxy Hardening
Navigation & Layout Fixes
- … PageHeaderProvider context value with primitive field comparison
- … useSetPageHeader to extract primitives so effect deps are stable
Infrastructure Dashboard
- … allocated_count column from docker_nodes
- … (POST /api/v1/admin/infrastructure/containers/actions): logs, restart, stop, start, inspect, pull-recreate via SSH
Containers Table
Tests
Files Changed (20 files, +2383/-655)