Resource budget for background refreshes: adaptive scheduler, probe limits, and diagnostics

### Problem

This is not a request for shorter polling intervals, and not a duplicate of the existing point fixes around memory leaks, orphaned `ccusage` processes, or provider 429s.

OpenUsage is a menu bar app that is meant to run for days. That makes the background resource budget itself worth tracking: CPU wakeups, hidden WebView work, overlapping probes, external child processes, local HTTP API threads, and slow provider diagnostics.

Related but narrower issues/PRs:

- #400: current open report for severe memory growth while running in the background.
- #399 and #375: closed high-memory reports, apparently hard to reproduce and possibly related to slow/stuck provider probes.
- #431 / #433: fixed orphaned `ccusage` process accumulation with process group kill/timeout handling.
- #377 / #378: fixed Claude 429 handling with `Retry-After` support and cached usage.
- #94: stale but still relevant idea to stop polling exhausted providers until reset time.
- #192 / #382 / #134: requests for more frequent polling were rejected/closed because OpenUsage should not hammer provider APIs.

This issue is about an app-wide resource budget and observability layer so OpenUsage can stay lightweight during long background sessions while still refreshing reliably.

### Current observations

On my machine with OpenUsage 0.6.24, idle CPU is low in a short sample, so this is not a claim that the current release is always leaking:

```text
OpenUsage main process sample:
- CPU: 0.0-0.2% during a short idle `top` sample
- Physical footprint from `sample`: 24.2 MB, peak 28.0 MB
- WebKit child processes are expected for Tauri/WebKit
```

But there are several code paths where background work is intentionally kept alive or currently unbounded:

- `src-tauri/src/app_nap.rs` disables App Nap for the app lifetime.
- `src-tauri/src/webkit_config.rs` sets WebKit `inactiveSchedulingPolicy` to `None`, so hidden WebView timers can continue running.
- `src/hooks/app/use-probe-auto-update.ts` schedules auto refreshes in the frontend WebView.
- `src/hooks/use-now-ticker.ts`, `src/components/panel-footer.tsx`, and `src/components/provider-card.tsx` can generate timer-driven React work for countdown/reset/cooldown display.
- `src/hooks/app/use-panel.ts` uses ResizeObserver/MutationObserver and calls window resize logic when plugin-derived view state changes.
- `src/hooks/app/use-tray-icon.ts` can rasterize tray icons via canvas after probe results.
- `src-tauri/src/lib.rs` starts each selected plugin probe with `spawn_blocking`; there is no obvious app-wide concurrency cap, and nothing obvious prevents overlapping batches from probing the same provider at once.
- `src-tauri/src/plugin_engine/runtime.rs` creates a fresh QuickJS runtime per probe, but there does not appear to be a hard per-probe deadline/memory budget for plugin JS itself.
- `src-tauri/src/local_http_api/server.rs` starts a local HTTP server and spawns one OS thread per accepted connection.
- `src-tauri/src/plugin_engine/host_api.rs` still has several external process calls without a shared timeout wrapper (`security`, `sqlite3`, `/bin/ps`, `lsof`, shell env lookup, runner discovery). `ccusage` has stronger timeout/process-group handling now, but runner discovery and other helper commands are less uniformly bounded.

### Proposed direction

Add a measurable background resource budget for long-running sessions. Possible pieces:

1. Move auto-refresh scheduling out of the hidden WebView and into Rust/Tauri, or otherwise allow WebView/App Nap suspension except while a native refresh is actually running.
2. Avoid UI-only timers while the panel is hidden; countdowns can update when the panel becomes visible.
3. Add a per-provider in-flight guard so the same provider cannot have overlapping probes across batches.
4. Add a global probe concurrency cap, e.g. 3-4 providers at a time instead of unbounded parallel `spawn_blocking` work.
5. Add a real plugin probe deadline. A simple timeout around `spawn_blocking` is not enough if the blocking thread cannot be cancelled; this likely needs a QuickJS interrupt/deadline or an isolated runner.
6. Cache `ccusage` runner discovery for the app session or a TTL, and add a timeout to `--version` checks.
7. Introduce a shared `run_command_with_timeout` helper for host API external commands, with an output cap (e.g. a `max_output_bytes` parameter) alongside the timeout.
8. Make the local HTTP API bounded: connection limit or worker pool, write timeout, and ideally no unbounded thread-per-connection behavior.
9. Debounce/batch writes of `usage-api-cache.json` instead of rewriting the full file after every successful plugin output.
10. Add lightweight diagnostics visible to users/maintainers: enabled plugins, last probe duration per provider, failure/timeout count, current/last `ccusage` runner, child process count, local API enabled/bound status, and recent RSS/CPU snapshot if practical.
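
Items 3 and 4 could share one small gate. The following is a minimal std-only sketch, not the existing OpenUsage code: `ProbeGate`, `ProbePermit`, `Busy`, and the provider names are hypothetical and chosen only for illustration. It rejects a second probe for a provider that is already running and refuses new probes once a global cap is reached; the caller would retry deferred providers on a later scheduler tick.

```rust
use std::collections::HashSet;
use std::sync::{Arc, Mutex};

/// Why a probe was not started (hypothetical type for this sketch).
#[derive(Debug, PartialEq, Eq)]
pub enum Busy {
    /// The same provider already has a probe in flight.
    AlreadyRunning,
    /// The global concurrency cap is reached; retry on a later tick.
    AtCapacity,
}

/// Tracks in-flight providers and enforces a global cap.
pub struct ProbeGate {
    in_flight: Mutex<HashSet<String>>,
    max_concurrent: usize,
}

/// RAII permit: dropping it frees the provider and its global slot.
pub struct ProbePermit {
    gate: Arc<ProbeGate>,
    provider: String,
}

impl ProbeGate {
    pub fn new(max_concurrent: usize) -> Arc<Self> {
        Arc::new(Self {
            in_flight: Mutex::new(HashSet::new()),
            max_concurrent,
        })
    }

    /// Non-blocking: either hands out a permit or says why not.
    pub fn try_acquire(self: &Arc<Self>, provider: &str) -> Result<ProbePermit, Busy> {
        let mut in_flight = self.in_flight.lock().unwrap();
        if in_flight.contains(provider) {
            return Err(Busy::AlreadyRunning);
        }
        if in_flight.len() >= self.max_concurrent {
            return Err(Busy::AtCapacity);
        }
        in_flight.insert(provider.to_string());
        Ok(ProbePermit {
            gate: Arc::clone(self),
            provider: provider.to_string(),
        })
    }
}

impl Drop for ProbePermit {
    fn drop(&mut self) {
        // Release even if another thread panicked while holding the lock.
        if let Ok(mut in_flight) = self.gate.in_flight.lock() {
            in_flight.remove(&self.provider);
        }
    }
}
```

If something like this were adopted, the permit would be moved into the blocking probe closure so that it is released exactly when the probe returns, including on panic. Note that this only bounds how many probes start; it does not cancel a probe that hangs, which is the separate problem in item 5.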

### Acceptance criteria

A good first version does not need to implement everything above. It should make resource usage measurable and bounded:

- After 24 hours with default settings, OpenUsage has no orphaned child processes and no zombie children owned by the main process.
- The same provider cannot have multiple concurrent probes from overlapping batches.
- Slow/hung plugin work is bounded by a documented timeout or is visibly reported as still running.
- Idle hidden-panel behavior produces no UI-only one-second React timers.
- Local HTTP API stress does not create unbounded OS threads.
- Diagnostics make it possible to tell which provider or helper command caused a slow/background-heavy session.
- Docs or a maintainer note describe expected idle CPU/RAM/wakeup ranges for a normal configuration.
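
For the helper-command side of the timeout criterion, a shared wrapper might look roughly like the sketch below. It uses only the Rust standard library; the `CommandOutcome` type, the 10 ms poll interval, and the exact signature are assumptions for illustration, not the existing `host_api.rs` code. It kills only the direct child, so the process-group handling that `ccusage` already has would still need to be layered on top.

```rust
use std::io::Read;
use std::process::{Command, Stdio};
use std::thread;
use std::time::{Duration, Instant};

/// Result of a bounded helper command (hypothetical type for this sketch).
#[derive(Debug)]
pub enum CommandOutcome {
    Finished { success: bool, stdout: Vec<u8>, elapsed: Duration },
    TimedOut { elapsed: Duration },
}

/// Runs `cmd`, keeps at most `max_output_bytes` of stdout, and kills the
/// direct child if it is still running after `timeout`.
pub fn run_command_with_timeout(
    mut cmd: Command,
    timeout: Duration,
    max_output_bytes: u64,
) -> std::io::Result<CommandOutcome> {
    let started = Instant::now();
    let mut child = cmd
        .stdin(Stdio::null())
        .stdout(Stdio::piped())
        .stderr(Stdio::null())
        .spawn()?;

    // Drain stdout on a helper thread so a chatty child cannot stall on a full pipe.
    let mut stdout = child.stdout.take().expect("stdout was piped");
    let reader = thread::spawn(move || {
        let mut kept = Vec::new();
        let _ = stdout.by_ref().take(max_output_bytes).read_to_end(&mut kept);
        // Discard anything past the cap so the child can still exit.
        let _ = std::io::copy(&mut stdout, &mut std::io::sink());
        kept
    });

    loop {
        if let Some(status) = child.try_wait()? {
            // Caveat: this join can wait if a grandchild still holds the pipe open.
            let stdout = reader.join().unwrap_or_default();
            return Ok(CommandOutcome::Finished {
                success: status.success(),
                stdout,
                elapsed: started.elapsed(),
            });
        }
        if started.elapsed() >= timeout {
            let _ = child.kill();
            let _ = child.wait(); // reap, so no zombie is left behind
            return Ok(CommandOutcome::TimedOut { elapsed: started.elapsed() });
        }
        thread::sleep(Duration::from_millis(10));
    }
}
```

The `elapsed` field in both variants is there so the same call site can feed the per-helper duration and timeout counters that the diagnostics criterion asks for.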

### Suggested measurement plan

Idle baseline:

```bash
PID=$(pgrep -fn "OpenUsage|openusage")
top -l 5 -s 1 -pid "$PID" -stats pid,cpu,mem,threads,ports,command
ps -M "$PID" | wc -l
lsof -nP -a -p "$PID" -iTCP
```

Child/zombie watch during refreshes:

```bash
PID=$(pgrep -fn "OpenUsage|openusage")
while true; do
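  # Skip the awk watcher itself, whose own arguments would otherwise match the pattern.
  ps -Ao pid,ppid,stat,comm,args | awk -v p="$PID" '/awk -v p=/ {next} $2==p || /ccusage|bunx|npx|npm|pnpm|sqlite3|security/'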
  sleep 0.2
done
```

Local HTTP API stress:

```bash
for i in {1..500}; do curl -s http://127.0.0.1:6736/v1/usage >/dev/null & done; wait
```
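
One way the server could stay bounded under a burst like the one above is an admission counter in front of the per-connection threads. This is a hedged std-only sketch: `ConnectionLimiter`, `ConnectionSlot`, `serve`, the 5-second write timeout, and the 503 response are illustrative assumptions, not the current `server.rs` behavior.

```rust
use std::io::Write;
use std::net::{TcpListener, TcpStream};
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

/// Caps the number of connections being handled at once (hypothetical type).
pub struct ConnectionLimiter {
    active: AtomicUsize,
    max: usize,
}

/// RAII slot: dropping it frees one place under the cap.
pub struct ConnectionSlot {
    limiter: Arc<ConnectionLimiter>,
}

impl ConnectionLimiter {
    pub fn new(max: usize) -> Arc<Self> {
        Arc::new(Self { active: AtomicUsize::new(0), max })
    }

    /// Returns a slot if fewer than `max` connections are active.
    pub fn try_admit(self: &Arc<Self>) -> Option<ConnectionSlot> {
        let mut current = self.active.load(Ordering::Acquire);
        loop {
            if current >= self.max {
                return None;
            }
            match self.active.compare_exchange_weak(
                current,
                current + 1,
                Ordering::AcqRel,
                Ordering::Acquire,
            ) {
                Ok(_) => return Some(ConnectionSlot { limiter: Arc::clone(self) }),
                Err(observed) => current = observed,
            }
        }
    }

    pub fn active(&self) -> usize {
        self.active.load(Ordering::Acquire)
    }
}

impl Drop for ConnectionSlot {
    fn drop(&mut self) {
        self.limiter.active.fetch_sub(1, Ordering::AcqRel);
    }
}

/// Accept loop: at most `max` handler threads exist; extra clients get a 503.
pub fn serve(listener: TcpListener, limiter: Arc<ConnectionLimiter>, handle: fn(TcpStream)) {
    for incoming in listener.incoming() {
        let mut stream = match incoming {
            Ok(stream) => stream,
            Err(_) => continue,
        };
        // Bound how long a slow reader can hold a thread.
        let _ = stream.set_write_timeout(Some(Duration::from_secs(5)));
        match limiter.try_admit() {
            Some(slot) => {
                thread::spawn(move || {
                    handle(stream);
                    drop(slot); // slot is released when the handler finishes
                });
            }
            None => {
                let _ = stream.write_all(
                    b"HTTP/1.1 503 Service Unavailable\r\nContent-Length: 0\r\nConnection: close\r\n\r\n",
                );
            }
        }
    }
}
```

With a cap in place, the 500-request loop should leave the thread count from `ps -M` roughly flat, with the excess requests answered by cheap 503s rather than new threads. A fixed worker pool would be an alternative with the same bound.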

macOS profiling:

- Instruments → Time Profiler and Energy Log against `OpenUsage` plus the WebKit WebContent process.
- Compare wakeups/CPU with the panel hidden for 15-20 minutes before and after scheduler/timer changes.
- React Profiler or Web Inspector Performance recording while the panel is open to count commits from `PanelFooter`, `ProviderCard`, `MetricLineRenderer`, ResizeObserver, MutationObserver, and tray icon rasterization.

### Why this matters

OpenUsage is used as a resident menu bar utility. Even small background wakeups, unbounded helper commands, or unbounded local API threads matter more here than they would in a foreground-only app. A resource budget would also make future provider/plugin additions safer because regressions would show up as measurable probe duration, child-process, wakeup, or memory changes rather than vague “high memory usage” reports.

