Resource budget for background refreshes: adaptive scheduler, probe limits, and diagnostics #487

@zergzorg

Problem

This is not a request for shorter polling intervals, and not a duplicate of the existing point fixes around memory leaks, orphaned ccusage processes, or provider 429s.

OpenUsage is a menu bar app that is meant to run for days. That makes the background resource budget itself worth tracking: CPU wakeups, hidden WebView work, overlapping probes, external child processes, local HTTP API threads, and slow provider diagnostics.

Related but narrower issues/PRs:

This issue is about an app-wide resource budget and observability layer so OpenUsage can stay lightweight during long background sessions while still refreshing reliably.

Current observations

On my machine with OpenUsage 0.6.24, idle CPU is low in a short sample, so this is not a claim that the current release is always leaking:

OpenUsage main process sample:
- CPU: 0.0-0.2% during a short idle `top` sample
- Physical footprint from `sample`: 24.2 MB, peak 28.0 MB
- WebKit child processes are expected for Tauri/WebKit

But there are several code paths where background work is intentionally kept alive or currently unbounded:

  • src-tauri/src/app_nap.rs disables App Nap for the app lifetime.
  • src-tauri/src/webkit_config.rs sets WebKit inactiveSchedulingPolicy to None, so hidden WebView timers can continue running.
  • src/hooks/app/use-probe-auto-update.ts schedules auto refreshes in the frontend WebView.
  • src/hooks/use-now-ticker.ts, src/components/panel-footer.tsx, and src/components/provider-card.tsx can generate timer-driven React work for countdown/reset/cooldown display.
  • src/hooks/app/use-panel.ts uses ResizeObserver/MutationObserver and calls window resize logic when plugin-derived view state changes.
  • src/hooks/app/use-tray-icon.ts can rasterize tray icons via canvas after probe results.
  • src-tauri/src/lib.rs starts each selected plugin probe with spawn_blocking; there is no obvious app-wide cap preventing overlapping batches for the same provider.
  • src-tauri/src/plugin_engine/runtime.rs creates a fresh QuickJS runtime per probe, but there does not appear to be a hard per-probe deadline/memory budget for plugin JS itself.
  • src-tauri/src/local_http_api/server.rs starts a local HTTP server and spawns one OS thread per accepted connection.
  • src-tauri/src/plugin_engine/host_api.rs still has several external process calls without a shared timeout wrapper (security, sqlite3, /bin/ps, lsof, shell env lookup, runner discovery). ccusage has stronger timeout/process-group handling now, but runner discovery and other helper commands are less uniformly bounded.
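To make the last point concrete, here is a minimal sketch of what a shared bounded command helper could look like, using only the Rust standard library. The type `CommandOutcome`, the function signature, and the 10 ms poll interval are assumptions for illustration, not existing OpenUsage code. It kills only the direct child; helpers that spawn grandchildren would still need the process-group handling that ccusage already has.

```rust
use std::io::Read;
use std::process::{Command, Stdio};
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical result type for a bounded helper command.
#[derive(Debug)]
pub enum CommandOutcome {
    /// The process exited on its own; stdout is truncated to the byte cap.
    Finished { success: bool, stdout: Vec<u8> },
    /// The deadline passed; the direct child was killed and reaped.
    TimedOut,
}

/// Runs `cmd` with a wall-clock deadline and a cap on captured stdout.
pub fn run_command_with_timeout(
    mut cmd: Command,
    timeout: Duration,
    max_output_bytes: u64,
) -> std::io::Result<CommandOutcome> {
    let mut child = cmd
        .stdin(Stdio::null())
        .stdout(Stdio::piped())
        .stderr(Stdio::null())
        .spawn()?;

    // Read stdout on a helper thread so a chatty child cannot stall on a full pipe.
    let mut stdout = child.stdout.take().expect("stdout was piped");
    let reader = thread::spawn(move || {
        let mut buf = Vec::new();
        // Keep at most `max_output_bytes`, then drain and discard the rest.
        let _ = stdout.by_ref().take(max_output_bytes).read_to_end(&mut buf);
        let _ = std::io::copy(&mut stdout, &mut std::io::sink());
        buf
    });

    let deadline = Instant::now() + timeout;
    loop {
        if let Some(status) = child.try_wait()? {
            let stdout = reader.join().unwrap_or_default();
            return Ok(CommandOutcome::Finished { success: status.success(), stdout });
        }
        if Instant::now() >= deadline {
            let _ = child.kill(); // ignore the race where the child just exited
            child.wait()?; // reap the child so it does not linger as a zombie
            // The reader thread is detached: joining could block if a
            // grandchild still holds the pipe open.
            return Ok(CommandOutcome::TimedOut);
        }
        // Polling keeps the sketch simple; a real helper might wait on a signal instead.
        thread::sleep(Duration::from_millis(10));
    }
}
```

A caller would build the `Command` as usual (for example for `sqlite3` or `lsof`) and treat `TimedOut` as a reportable diagnostic rather than a silent failure.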

Proposed direction

Add a measurable background resource budget for long-running sessions. Possible pieces:

  1. Move auto-refresh scheduling out of the hidden WebView and into Rust/Tauri, or otherwise allow WebView/App Nap suspension except while a native refresh is actually running.
  2. Avoid UI-only timers while the panel is hidden; countdowns can update when the panel becomes visible.
  3. Add a per-provider in-flight guard so the same provider cannot have overlapping probes across batches.
  4. Add a global probe concurrency cap, e.g. 3-4 providers at a time instead of unbounded parallel spawn_blocking work.
  5. Add a real plugin probe deadline. A simple timeout around spawn_blocking is not enough if the blocking thread cannot be cancelled; this likely needs a QuickJS interrupt/deadline or an isolated runner.
  6. Cache ccusage runner discovery for the app session or a TTL, and add a timeout to --version checks.
  7. Introduce a shared run_command_with_timeout(max_output_bytes) helper for host API external commands.
  8. Make the local HTTP API bounded: connection limit or worker pool, write timeout, and ideally no unbounded thread-per-connection behavior.
  9. Debounce/batch writes of usage-api-cache.json instead of rewriting the full file after every successful plugin output.
  10. Add lightweight diagnostics visible to users/maintainers: enabled plugins, last probe duration per provider, failure/timeout count, current/last ccusage runner, child process count, local API enabled/bound status, and recent RSS/CPU snapshot if practical.
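Items 3 and 4 can share one small piece of state. The following is a sketch under stated assumptions, not a description of the current code: `ProbeLimiter` and `ProbePermit` are hypothetical names, provider IDs are assumed to be strings, and a duplicate probe for a provider that is already running is skipped instead of queued.

```rust
use std::collections::HashSet;
use std::sync::{Arc, Condvar, Mutex};

/// Hypothetical limiter: at most `max_concurrent` probes overall,
/// and never more than one in flight per provider.
pub struct ProbeLimiter {
    in_flight: Mutex<HashSet<String>>,
    slot_freed: Condvar,
    max_concurrent: usize,
}

/// RAII permit; dropping it frees the provider and the global slot.
pub struct ProbePermit {
    limiter: Arc<ProbeLimiter>,
    provider_id: String,
}

impl ProbeLimiter {
    pub fn new(max_concurrent: usize) -> Arc<Self> {
        Arc::new(Self {
            in_flight: Mutex::new(HashSet::new()),
            slot_freed: Condvar::new(),
            max_concurrent,
        })
    }

    /// Returns `None` right away if this provider is already being probed.
    /// Otherwise blocks until a global slot is free and returns a permit.
    pub fn acquire(self: &Arc<Self>, provider_id: &str) -> Option<ProbePermit> {
        let mut in_flight = self.in_flight.lock().unwrap();
        loop {
            if in_flight.contains(provider_id) {
                return None; // overlapping batch: skip the duplicate probe
            }
            if in_flight.len() < self.max_concurrent {
                in_flight.insert(provider_id.to_string());
                return Some(ProbePermit {
                    limiter: Arc::clone(self),
                    provider_id: provider_id.to_string(),
                });
            }
            // At capacity: sleep until some permit is dropped, then re-check.
            in_flight = self.slot_freed.wait(in_flight).unwrap();
        }
    }

    /// Number of probes currently holding a permit (useful for diagnostics).
    pub fn in_flight_count(&self) -> usize {
        self.in_flight.lock().unwrap().len()
    }
}

impl Drop for ProbePermit {
    fn drop(&mut self) {
        let mut in_flight = self.limiter.in_flight.lock().unwrap();
        in_flight.remove(&self.provider_id);
        self.limiter.slot_freed.notify_all();
    }
}
```

Because the set of in-flight provider IDs doubles as the concurrency counter, both guarantees come from one lock. Acquiring the permit before handing work to spawn_blocking would keep waiting probes from occupying blocking threads; an async semaphore would be the natural equivalent if the scheduler lives in async code.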

Acceptance criteria

A good first version does not need to implement everything above. It should make resource usage measurable and bounded:

  • After 24 hours with default settings, OpenUsage has no orphaned child processes and no zombie children owned by the main process.
  • The same provider cannot have multiple concurrent probes from overlapping batches.
  • Slow/hung plugin work is bounded by a documented timeout or is visibly reported as still running.
  • Idle hidden-panel behavior produces no UI-only one-second React timers.
  • Local HTTP API stress does not create unbounded OS threads.
  • Diagnostics make it possible to tell which provider or helper command caused a slow/background-heavy session.
  • Docs or a maintainer note describe expected idle CPU/RAM/wakeup ranges for a normal configuration.
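For the diagnostics criterion, even a very small per-provider record would be enough to attribute a slow session. This is a hypothetical sketch: `ProbeDiagnostics`, `ProviderStats`, `ProbeResult`, and the report format are illustrative names and choices, not existing OpenUsage types.

```rust
use std::collections::BTreeMap;
use std::time::Duration;

/// Hypothetical probe outcome used only for bookkeeping.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ProbeResult {
    Ok,
    Failed,
    TimedOut,
}

/// Counters kept per provider.
#[derive(Debug, Default, Clone)]
pub struct ProviderStats {
    pub runs: u64,
    pub failures: u64,
    pub timeouts: u64,
    pub last_duration: Option<Duration>,
    pub max_duration: Duration,
}

/// In-memory diagnostics table, keyed by provider ID (sorted for stable output).
#[derive(Debug, Default)]
pub struct ProbeDiagnostics {
    per_provider: BTreeMap<String, ProviderStats>,
}

impl ProbeDiagnostics {
    /// Records one finished (or timed-out) probe.
    pub fn record(&mut self, provider_id: &str, duration: Duration, result: ProbeResult) {
        let stats = self.per_provider.entry(provider_id.to_string()).or_default();
        stats.runs += 1;
        match result {
            ProbeResult::Ok => {}
            ProbeResult::Failed => stats.failures += 1,
            ProbeResult::TimedOut => stats.timeouts += 1,
        }
        stats.last_duration = Some(duration);
        stats.max_duration = stats.max_duration.max(duration);
    }

    /// Provider with the longest single probe seen so far.
    pub fn slowest(&self) -> Option<(&str, Duration)> {
        self.per_provider
            .iter()
            .max_by_key(|(_, stats)| stats.max_duration)
            .map(|(id, stats)| (id.as_str(), stats.max_duration))
    }

    /// One line per provider, suitable for a log or a debug panel.
    pub fn report(&self) -> String {
        self.per_provider
            .iter()
            .map(|(id, s)| {
                format!(
                    "{}: runs={} failures={} timeouts={} last={}ms max={}ms",
                    id,
                    s.runs,
                    s.failures,
                    s.timeouts,
                    s.last_duration.map_or(0, |d| d.as_millis()),
                    s.max_duration.as_millis()
                )
            })
            .collect::<Vec<_>>()
            .join("\n")
    }
}
```

The same table could later carry helper-command timings or child-process counts without changing how callers record a probe.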

Suggested measurement plan

Idle baseline:

PID=$(pgrep -fn "OpenUsage|openusage")
top -l 5 -s 1 -pid "$PID" -stats pid,cpu,mem,threads,ports,command
ps -M "$PID" | wc -l
lsof -nP -a -p "$PID" -iTCP

Child/zombie watch during refreshes:

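PID=$(pgrep -fn "OpenUsage|openusage")
while true; do
  # Direct children of OpenUsage plus known helper commands; a "Z" in STAT marks a zombie.
  # The leading !/awk/ keeps this awk process from matching its own pattern.
  ps -Ao pid,ppid,stat,comm,args | awk -v p="$PID" '!/awk/ && ($2==p || /ccusage|bunx|npx|npm|pnpm|sqlite3|security/)'
  sleep 0.2
done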

Local HTTP API stress:

for i in {1..500}; do curl -s http://127.0.0.1:6736/v1/usage >/dev/null & done; wait
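What that stress test should demonstrate is a fixed thread count regardless of connection count. As a rough illustration of the shape, assuming a plain std::net listener and not reflecting the actual code in local_http_api/server.rs, a bounded accept loop could use a fixed worker pool fed by a bounded queue. `serve_bounded`, `handle_connection`, the 2-second timeouts, and the placeholder response body are all assumptions.

```rust
use std::io::{Read, Write};
use std::net::{TcpListener, TcpStream};
use std::sync::mpsc::{sync_channel, TrySendError};
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Duration;

/// Placeholder handler: reads one request chunk and answers with an empty JSON body.
fn handle_connection(mut stream: TcpStream) {
    // Timeouts bound how long a slow or stalled client can hold a worker.
    let _ = stream.set_read_timeout(Some(Duration::from_secs(2)));
    let _ = stream.set_write_timeout(Some(Duration::from_secs(2)));
    let mut buf = [0u8; 1024];
    let _ = stream.read(&mut buf);
    let body = "{}";
    let response = format!(
        "HTTP/1.1 200 OK\r\nContent-Length: {}\r\nConnection: close\r\n\r\n{}",
        body.len(),
        body
    );
    let _ = stream.write_all(response.as_bytes());
}

/// Spawns `workers` handler threads plus one acceptor thread.
/// At most `queue_depth` accepted connections wait; extra ones get a 503.
pub fn serve_bounded(
    listener: TcpListener,
    workers: usize,
    queue_depth: usize,
) -> Vec<thread::JoinHandle<()>> {
    let (tx, rx) = sync_channel::<TcpStream>(queue_depth);
    let rx = Arc::new(Mutex::new(rx));
    let mut handles = Vec::new();

    for _ in 0..workers {
        let rx = Arc::clone(&rx);
        handles.push(thread::spawn(move || loop {
            // The lock is held only while waiting for the next connection.
            let next = rx.lock().unwrap().recv();
            match next {
                Ok(stream) => handle_connection(stream),
                Err(_) => break, // acceptor is gone, shut the worker down
            }
        }));
    }

    handles.push(thread::spawn(move || {
        for stream in listener.incoming().flatten() {
            match tx.try_send(stream) {
                Ok(()) => {}
                Err(TrySendError::Full(mut rejected)) => {
                    // Queue is full: shed load instead of spawning another thread.
                    let _ = rejected.write_all(
                        b"HTTP/1.1 503 Service Unavailable\r\nContent-Length: 0\r\n\r\n",
                    );
                }
                Err(TrySendError::Disconnected(_)) => break,
            }
        }
    }));

    handles
}
```

With this shape the thread count stays at `workers + 1` no matter how many of the 500 curl requests arrive at once; some of them may see a 503, which is the intended trade-off.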

macOS profiling:

  • Instruments → Time Profiler and Energy Log against OpenUsage plus the WebKit WebContent process.
  • Compare wakeups/CPU with the panel hidden for 15-20 minutes before and after scheduler/timer changes.
  • React Profiler or Web Inspector Performance recording while the panel is open to count commits from PanelFooter, ProviderCard, MetricLineRenderer, ResizeObserver, MutationObserver, and tray icon rasterization.

Why this matters

OpenUsage is used as a resident menu bar utility. Even small background wakeups, unbounded helper commands, or unbounded local API threads matter more here than they would in a foreground-only app. A resource budget would also make future provider/plugin additions safer because regressions would show up as measurable probe duration, child-process, wakeup, or memory changes rather than vague “high memory usage” reports.
