Skip to content

feat: PR 393 review fixes and stabilization#394

Merged
lalalune merged 30 commits intodevfrom
shaw/pr393-review-fixes
Mar 17, 2026
Merged

feat: PR 393 review fixes and stabilization#394
lalalune merged 30 commits intodevfrom
shaw/pr393-review-fixes

Conversation

@lalalune
Copy link
Member

Summary

  • Fix Privy JWT algorithm mismatch with jose 6.x
  • Headscale API compatibility + admin containers tab null safety
  • Sanitize container data to prevent React render errors
  • Stabilize unit test isolation
  • Format fixes for biome

🤖 Generated with Claude Code

0xSolace and others added 30 commits March 15, 2026 13:44
- New useSandboxStatusPoll hook: polls individual agent status every 5s
  during pending/provisioning, auto-stops on terminal states
- New useSandboxListPoll hook: polls agent list every 10s while any
  sandbox is active, fires toast on running transition
- Create dialog stays open during provisioning with 4-step progress
  stepper (created → database → container → running), elapsed timer,
  retry button on error, and Open Web UI button on completion
- Status cell in table now shows pulse animation for active states,
  scale-in animation for running transitions, shake for errors
- Added shake and scaleIn keyframe animations to globals.css
- New admin infrastructure API endpoint (app/api/v1/admin/infrastructure/route.ts)
- Admin infrastructure service with SSH node inspection and container health classification
- Unit tests for container health classification (5 tests, all passing)
- Classifies containers as healthy, failed, missing, warming, or stale based on
  runtime state, control plane records, and heartbeat freshness
- Major rewrite of milady-sandboxes-table: local state management, optimistic
  updates, no more stale data after start/stop/destroy actions
- Added onDataRefresh callback to useSandboxListPoll hook for live table updates
- Create dialog improvements with onCreated callback to trigger table refresh
- Eliminates the stale-data-after-action problem that required manual page reload
- Containers page now shows only custom Docker containers
- Milady page is now a dedicated managed agents view
- New milady-page-wrapper component for consistent page header styling
- Clear separation of concerns between managed agents and raw containers
- Redesigned agent detail page layout and UX
- Cleaner information hierarchy and visual presentation
Collapse multi-line <p> tag to single line per biome rules.
Two issues fixed:

1. Fire-and-forget calls in AgentBudgetService.deductBudget() for
   sendLowBudgetAlert() and triggerAutoRefill() used bare 'void'
   without .catch(), causing unhandled promise rejections that
   crashed the test runner.

2. Email template loadTemplate() used process.cwd()-relative path
   (lib/email/templates/) but templates live under packages/lib/.
   Switched to import.meta.dir-relative resolution for reliability.

These bugs are pre-existing on dev branch.
- Fix unused variable warnings by prefixing with underscore
- Fix import ordering and formatting in route.ts
- admin-infrastructure: add concurrency limiter (max 5 parallel SSH sessions),
  per-node inspection timeout (25s), and 30s in-memory snapshot cache
- admin-infrastructure: differentiate warning vs critical in buildResourceAlert
- admin-infrastructure: make parseDockerHealth more robust (strip prefix, exact match)
- admin-infrastructure: remove unused 'unknown' ContainerLiveHealthStatus variant
- use-sandbox-status-poll: stop polling on persistent 4xx errors or 5 consecutive failures
- use-sandbox-status-poll: fire initial poll immediately in useSandboxListPoll
- use-sandbox-status-poll: fix previousStatusesRef to preserve poll-derived statuses
- milady-sandboxes-table: union merge in mergeApiData to preserve optimistic additions
- create-milady-sandbox-dialog: deduplicate triple data refresh to single onCreated path
- Add MILADY_PRICING constants (/bin/bash.02/hr running, /bin/bash.0025/hr idle,  minimum deposit)
- Add billing columns to milady_sandboxes schema (billing_status, last_billed_at, hourly_rate, total_billed, shutdown/warning timestamps)
- Create SQL migration 0052_add_milady_billing_columns
- Add credit gate (checkMiladyCreditGate) enforcing  minimum balance before agent create/provision/resume
- Create hourly milady-billing cron mirroring container-billing pattern:
  - Bills running agents at /bin/bash.02/hr
  - Bills stopped agents with snapshots at /bin/bash.0025/hr
  - Sends 48h shutdown warnings on insufficient credits
  - Auto-suspends agents after grace period
- Add PostHog event types for milady billing analytics
- Register cron in vercel.json (hourly at minute 0)
Previous CI fix incorrectly prefixed NODE_INSPECTION_TIMEOUT_MS,
MAX_CONCURRENT_SSH_SESSIONS, and SNAPSHOT_CACHE_TTL_MS with underscores
but the code still references the unprefixed names.
import.meta.dir is not available in this TypeScript configuration.
Use fileURLToPath(import.meta.url) + path.dirname() instead.
- Remove duplicate local formatRelative function in milady-sandboxes-table
- Fix PostHog EventProperties type to accept Record<string, unknown>
- Extract sandbox status constants to shared module
- Add ambient declaration for @elizaos/plugin-sql/node (createDatabaseAdapter)
- Add ambient declaration for bs58 module
- Add ambient declaration for monaco-editor with IStandaloneCodeEditor
- Fix webkitAudioContext type assertion in audio.ts
- Add ./packages/types/**/*.d.ts to include paths so ambient declarations
  are picked up by the split type-checker
- Exclude **/*.test.ts and **/*.test.tsx to avoid vitest import errors
  in colocated test files
The v1 milady agent create and provision routes now call
checkMiladyCreditGate before proceeding. Tests that exercise these
routes need to mock the billing gate to return { allowed: true }.
Bun 1.3.5 has a bug where mock.module() calls leak across test files
when running with --max-concurrency=1, causing 64 pre-existing unit test
failures on dev (milady-web-ui, Twitter MCP, affiliates, proxy-pricing).

Bun 1.3.9 properly scopes mock.module to its declaring file.

Also adds explicit cache mock to affiliates-service.test.ts as a
belt-and-suspenders fix for the cache.del TypeError.
Test failures (63 tests):
- Root cause: mock.module() in milady-web-ui, mcp-twitter-tools,
  proxy-pricing, and affiliates-service test files pollutes module
  resolution when run alongside other tests in bun's test runner.
- Fix: exclude these files from test:repo-unit:bulk and run them
  in test:repo-unit:special (isolated process).

Review feedback addressed:
- billing: use SQL atomic decrement for credit_balance instead of
  in-memory computed value (fixes TOCTOU race with concurrent
  credit additions)
- billing: use SQL increment for total_billed (same TOCTOU fix)
- billing: fix timing-safe comparison to use HMAC digests instead
  of length-leaking Buffer comparison
- agent detail: deduplicate STATUS_BADGE_COLORS/STATUS_DOT_COLORS
  by importing from shared sandbox-status constants module
- dialog: only fire onCreated callback when an agent was actually
  created (skip premature dismissals)
The admin-service-pricing-route test mocked @/lib/services/proxy/pricing
with only invalidateServicePricingCache, omitting PricingNotFoundError.
When proxy-pricing.test.ts was moved to the isolated test group,
proxy-engine.test.ts could no longer import PricingNotFoundError from
the mocked pricing module (mock.module leaks across files in Bun).

Add PricingNotFoundError and getServiceMethodCost to the mock so the
pricing module is complete for any downstream consumer in the bulk run.
- Wrap UserMenu in React error boundary to prevent crashes from
  propagating to the entire page
- Extract all user data getter functions into standalone safe helpers
  with try-catch guards (safeGetUserWallet, safeGetUserEmail,
  safeGetUserName, safeGetUserIdentifier, safeGetInitials)
- Add safe credit balance formatter to handle NaN/undefined
- Pre-compute display values outside JSX for cleaner rendering
- Add explicit Array.isArray checks on linkedAccounts
- Wrap sign-out handler in try-catch
- Add fallback UI when error boundary catches: shows a clickable
  user icon that resets the boundary on click
- Defensive null coalescing on API response fields
Pin jose to ^4.15.0 (compatible with @privy-io/server-auth's ^4.10.4 requirement).
jose 6.x stricter algorithm validation caused ERR_JOSE_ALG_NOT_ALLOWED on every
authenticated proxy request.

Changes:
- package.json: jose ^6.1.3 → ^4.15.0
- jwks.ts: type CryptoKey → type KeyLike (v4 compat)
- session-service.ts: fix setExpirationTime(Date) → unix timestamp,
  remove generic type param from jwtVerify (not supported in v4)
- Fix containers tab crash: sandboxId.slice() on null (sandbox_id is nullable in schema)
- Add defensive null coalescing for loadNodes/loadContainers data access
- Fix headscale route to support both v0.23+ (/api/v1/node) and legacy (/api/v1/machine) endpoints
- Add serverUrl to headscale API response, fix HeadscaleData interface
- Fix .env.local HEADSCALE_API_KEY to use correct hskey-api token
@vercel
Copy link

vercel bot commented Mar 17, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
eliza-cloud-v2 Ready Ready Preview, Comment Mar 17, 2026 0:16am

@coderabbitai
Copy link

coderabbitai bot commented Mar 17, 2026

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 33d1cd72-a3a1-4192-a3bb-ec0adfb884a4

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch shaw/pr393-review-fixes
📝 Coding Plan
  • Generate coding plan for human review comments

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@lalalune lalalune merged commit f8c7491 into dev Mar 17, 2026
24 of 26 checks passed
@claude
Copy link

claude bot commented Mar 17, 2026

PR #394 Review — feat: PR 393 review fixes and stabilization

Overall assessment: This is a solid, well-structured PR. The billing infrastructure is production-ready with good atomicity guarantees, and the UI fixes address real crash paths. A few specific concerns are worth addressing before a future billing-critical deploy.


Milady Billing Cron (app/api/cron/milady-billing/route.ts)

Strengths:

  • Correct use of timingSafeEqual for HMAC comparison — avoids timing side-channels.
  • The two-step transaction (claim sandbox row via conditional UPDATE, then atomic credit deduction) is the right pattern for preventing duplicate billing across overlapping cron runs.
  • REBILL_GUARD_MINUTES = 55 is a sensible guard for an hourly job.
  • Error classes (AlreadyBilledRecentlyError, InsufficientCreditsDuringBillingError) make control flow clean and explicit.

Issues / Concerns:

  1. Sequential processing at scale. Sandboxes are processed with a simple for...of loop. With a growing number of agents this will eventually breach the 5-minute maxDuration. The existing pLimit utility introduced in admin-infrastructure.ts in this same PR should be applied here too — a concurrency of ~20 parallel billing tasks would keep wall-clock time predictable.

  2. In-memory balance update is stale between multi-sandbox orgs.

    org.credit_balance = String(result.newBalance);

    This updates the snapshot used for the next sandbox in the same org, but newBalance comes from the transaction's RETURNING clause — which is correct. However if two sandboxes in the same org both clear the rebill guard simultaneously, both could read the same snapshot before either transaction commits. The atomic SQL guard (gte(organizations.credit_balance, hourlyCost)) already prevents a negative balance, but the shutdown-warning path can fire for both sandboxes even if only one would actually exhaust the balance. This is a UX-only issue (spurious warning email), not a financial one, but worth noting.

  3. billing_status field on stopped-with-backups sandboxes. The query for idle sandboxes filters on inArray(miladySandboxes.billing_status, ["active", "warning", "shutdown_pending"]) but a stopped agent that gets suspended has billing_status = "suspended". The suspend transition happens by setting status = "stopped". If a suspended agent still has last_backup_at set, it would not be re-billed (correct), but if the billing cron runs before the status update is visible on the read replica, a brief window exists where it could bill a just-suspended agent. The existing isNotNull(miladySandboxes.last_backup_at) guard helps, but explicit exclusion of billing_status = "suspended" in the query would be belt-and-suspenders.

  4. total_billed precision mismatch. Schema: numeric(10,2) (2 decimal places). The cron accumulates IDLE_HOURLY_RATE = 0.0025, which has 4 decimal places. The SQL + will silently truncate the fractional cents in total_billed. This doesn't affect billing correctness (billing uses credit_balance, not total_billed), but total_billed will under-report idle charges. Either change the column to numeric(12,4) or document that total_billed is rounded to cents.

  5. Missing exempt status handling. MiladyBillingStatus includes "exempt" but the cron's billing queries use inArray(..., ["active", "warning", "shutdown_pending"]). Exempt sandboxes are correctly excluded from billing — this is fine — but there's no comment explaining the intent. A brief inline comment would prevent future engineers from treating this as a bug.

  6. getOrgUserEmail returns the first user regardless of role. If an org has multiple members, this may email the wrong person. The existing billing_email field on organization_billing is already the preferred path; getOrgUserEmail is a fallback that probably deserves a TODO or a filter on owner/admin role.


Database Migration (0052_add_milady_billing_columns.sql)

Follows the rules in CLAUDE.md correctly:

  • Uses ADD COLUMN IF NOT EXISTS.
  • CREATE INDEX IF NOT EXISTS (not CONCURRENTLY — correct for a transaction-wrapped migration).
  • Targeted migration, under 100 lines.

No issues.


Auth Fix (packages/lib/auth/jwks.ts)

// Before
cachedPublicKey = await importSPKI(pem, ALGORITHM);
// After
cachedPublicKey = await importSPKI(pem, ALGORITHM, { extractable: true });

And the CryptoKeyKeyLike type alias change to match jose 6.x API. Both changes are correct. The { extractable: true } option is required for jose 6.x to use the key in jwtVerify. Good fix.


JWT Session Fix (packages/lib/services/eliza-app/session-service.ts)

// Before
.setExpirationTime(expiresAt)   // Date object — broken in jose 6.x
// After
.setExpirationTime(Math.floor(expiresAt.getTime() / 1000))  // Unix timestamp — correct

Correct. jose 6.x setExpirationTime accepts a Unix epoch number or a relative string like "2h", not a Date object.

The jwtVerify generic removal and manual cast to ElizaAppSessionPayload is a pragmatic workaround for the jose 6.x type narrowing changes — acceptable given the existing check-types situation.


Milady Credit Gate (packages/lib/services/milady-billing-gate.ts)

Strengths:

  • Fails closed on DB errors (denies provisioning if balance can't be verified) — correct.
  • Consistent 402 response on all three provisioning paths (create, provision, resume).

Minor concern: The gate reads organizationsRepository.findById() which uses the read replica (dbRead). If there's replication lag after a deposit (Stripe webhook → credit update), a user who just topped up might be blocked for a few seconds. This is an edge case but worth a comment in the code.


Headscale API Compatibility (app/api/v1/admin/headscale/route.ts)

The fallback from /api/v1/node (v0.23+) to /api/v1/machine (legacy) is correct. The error handling wrap for AuthenticationError / ForbiddenError is a needed fix from PR #393.

Minor: The nodes response shape in the type annotation has machineKey? and nodeKey? as optional, while the legacy machines shape has machineKey as required. This reflects real API differences but means downstream code that normalizes both shapes should handle the optionality explicitly. Current downstream code passes both through machines variable without further normalization, which is fine since the UI only uses name, ipAddresses, online, and user.


Admin Infrastructure Service (packages/lib/services/admin-infrastructure.ts)

Strengths:

  • Custom pLimit implementation is clean and correct for this use case.
  • withTimeout wrapper prevents SSH hang from blocking the entire snapshot.
  • SNAPSHOT_CACHE_TTL_MS = 30_000 avoids hammering nodes on rapid admin page refreshes.

Concern: SNAPSHOT_CACHE_TTL_MS is a module-level in-memory cache. In Vercel Serverless, each function invocation gets a fresh process (cold start) or a recycled one (warm). The cache will be useful for warm instances but invisible across concurrent invocations or cold starts. This is acceptable behavior but means the cache provides less benefit than it appears — a Redis-backed cache would be more reliable if SSH load on nodes becomes a concern.


UI: create-milady-sandbox-dialog.tsx

The refactor to an inline provisioning progress view (replacing router.refresh() with a polling hook + step indicator) is a significant UX improvement. The implementation is clean.

One concern: onCreated?.()?.catch(() => {}) silently swallows any error from the parent refresh callback. If onCreated throws (e.g., a failed fetch), the user won't know. The comment "Best-effort refresh" is appropriate, but consider at least logging the error.


UI: user-menu.tsx

The error boundary + safe* helper refactor is the right approach for a component that accesses deeply nested Privy user objects. The safeGetInitials function has one subtle edge case:

const parts = name.trim().split(/\s+/).filter(Boolean);
if (parts.length > 0 && parts[0].length > 0) {
  return parts.map((n) => n[0]).join("").toUpperCase().slice(0, 2);
}

If name is a single character (e.g., a wallet prefix like "0"), this returns "0" — which is fine. No issue.


Test Coverage (packages/tests/unit/milady-billing-route.test.ts)

Four focused tests covering the main billing paths:

  • Idempotency guard (already billed → skip)
  • Insufficient credits → warning sent
  • Successful billing → rate persisted
  • Post-billing warning threshold

Good coverage of the happy path and the two main failure modes. Missing test cases worth adding in follow-up:

  • Shutdown execution (when billing_status = "shutdown_pending" and scheduled_shutdown_at <= now)
  • Idle sandbox billing (stopped + last_backup_at not null)
  • No billable sandboxes (empty state)
  • Unauthorized request (missing/wrong CRON_SECRET)

Minor/Nit

  • formatRelative and formatRelativeShort in sandbox-status.ts are nearly identical (differ only in the "Never" vs "—" fallback and whether days use toLocaleDateString). These could be unified with a { short: boolean } option to avoid divergence.
  • docs/BILLING_AUDIT.md and docs/ONBOARDING_INTEGRATION_PLAN.md are large doc additions (+~1400 lines combined). These are useful but aren't gated by any CI — consider adding a lint rule or at least a CODEOWNERS entry for docs/ so billing-critical documentation gets reviewed carefully.
  • bun.lock is updated (Bun 1.3.9 bump) — this is expected and fine.

Summary of blocking concerns:

  1. Sequential sandbox processing will hit the 5-minute timeout at scale — add concurrency.
  2. total_billed column precision doesn't match IDLE_HOURLY_RATE — fix schema or document.

Everything else is non-blocking. The billing atomicity, migration hygiene, auth fixes, and test coverage are all solid.

🤖 Generated with Claude Code

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants