RFC: daemon auto-reload when CLI binaries change #2422
Replies: 3 comments
Thanks for the careful write-up. The problem framing and the prior-art links are spot-on, and the recommended Alternative A is the right default. A few pieces of feedback before this turns into a PR, with concrete answers to your open questions at the end.
Scope: agent CLIs only for v1, not the multica binary
The strongest piece of advice I'd give: drop the multica binary from the v1 watch set. Two reasons:
The actual user-facing pain you describe (Codex upgraded out-of-band, daemon keeps advertising the old version) is purely about agent CLIs. Scope v1 to those and you avoid the entire coordination problem with That answers your open questions 1 and 2: agent runtime binaries only, single flag (
Fingerprinting: prefer a content hash over the inode/mtime/size triple
The proposed triple is portable in theory but has known failure modes:
For a binary that's typically tens of MB, a SHA-256 of the file at the resolved path on a 30–60s tick costs on the order of tens of milliseconds once the file is in the page cache, requires no platform-specific code, and is robust against every replacement strategy I've seen. The fingerprint becomes:
The symlink-target check is still worth keeping as a cheap pre-filter: if the symlink target hasn't changed, you can skip the hash entirely. But the hash should be the authoritative signal.
Required interlocks
These are non-negotiable for the implementation, not optional polish:
"Graceful" needs a definition
Your Alternative A says "wait for tasks to finish, then restart." Today Also: while waiting to drain, the daemon should stop claiming new tasks, as you noted. That's already structurally possible:
Open questions: direct answers
Suggested additions to the test plan
In addition to your list:
Naming
Net
The direction matches the right mental model. Narrow v1 to agent CLIs, use a content hash with a symlink pre-filter, respect
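As a rough illustration of the content-hash fingerprint recommended in this comment, here is a minimal Go sketch. The `Fingerprint` type and the `hashFile`, `fingerprintBinary`, and `Changed` names are hypothetical and do not come from the Multica codebase; only `exec.LookPath`, `filepath.EvalSymlinks`, and `crypto/sha256` are real standard-library calls. The symlink pre-filter is left out for brevity, so the digest is the only signal compared.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"io"
	"os"
	"os/exec"
	"path/filepath"
)

// Fingerprint is a hypothetical record of what an agent CLI name resolved to.
type Fingerprint struct {
	ResolvedPath string // path after PATH lookup and symlink resolution
	SHA256       string // hex digest of the file contents
}

// hashFile streams the file through SHA-256 so a large binary is never
// loaded into memory all at once.
func hashFile(path string) (string, error) {
	f, err := os.Open(path)
	if err != nil {
		return "", err
	}
	defer f.Close()

	h := sha256.New()
	if _, err := io.Copy(h, f); err != nil {
		return "", err
	}
	return hex.EncodeToString(h.Sum(nil)), nil
}

// fingerprintBinary resolves name via PATH, follows symlinks, and hashes
// the file it finally points at.
func fingerprintBinary(name string) (Fingerprint, error) {
	p, err := exec.LookPath(name)
	if err != nil {
		return Fingerprint{}, err
	}
	resolved, err := filepath.EvalSymlinks(p)
	if err != nil {
		return Fingerprint{}, err
	}
	sum, err := hashFile(resolved)
	if err != nil {
		return Fingerprint{}, err
	}
	return Fingerprint{ResolvedPath: resolved, SHA256: sum}, nil
}

// Changed treats the content digest as the authoritative signal: a moved
// but byte-identical binary does not count as a change.
func (f Fingerprint) Changed(next Fingerprint) bool {
	return f.SHA256 != next.SHA256
}
```

A caller would take one fingerprint per configured agent at startup, recompute on each tick, and compare with `Changed`. Keeping `ResolvedPath` in the record leaves room to add the pre-filter later without changing the comparison.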
Quick decision update after thinking this through further. Direction approved, with the following agreed shape for v1:
Rough estimate: single file under
@tangyuanjc, your RFC body says you weren't proposing a PR until maintainers confirmed the direction. Direction's confirmed. If you'd like to drive the PR, please go ahead: you've done all the discovery work and your trade-off analysis is already pointing the right way. If you'd rather hand it off, just say so and we'll pick it up. Either way, drop a note here so we don't double up.
Hi @tangyuanjc — circling back since @Bohan-J handed off PR ownership to you on 2026-05-12 and I want to make sure we don't double up. If you've stepped back from driving the v1 implementation, I'm happy to pick it up and follow the locked shape verbatim:
If you'd rather still drive it, totally fine: just drop a quick note here. Otherwise I'll start implementation in ~3 days (around 2026-06-12) and post the branch link back here when the PR is ready for review.
Context
This is a discussion-first RFC; I am not proposing a code PR until maintainers confirm the desired behavior.
Our team hit a daemon sharp edge while dogfooding Multica with local agent CLIs: after upgrading Codex/Hermes/other agent CLIs, the long-lived multica daemon can keep advertising stale runtime capability/version information until the user manually restarts it. In one real case, upgrading Codex without restarting the daemon left a task stuck / new model capability invisible until a manual daemon restart.
There is nearby maintainer signal in #2410: Bohan clarified that CLI/daemon update behavior is a separate design space from Desktop, where relaunch semantics are unavoidable. This RFC is narrower than "silent updates": it is about detecting that relevant CLI binaries changed outside the daemon and safely reloading the local runtime.
Related prior work:
os.Executable() / deleted Cellar path issue.
LoadConfig once:
multica/server/internal/daemon/config.go, lines 98 to 179 in fb8ad8c
multica/server/internal/daemon/daemon.go, lines 598 to 620 in fb8ad8c
PendingUpdate:
multica/server/internal/daemon/daemon.go, lines 1158 to 1164 in fb8ad8c
multica/server/internal/daemon/daemon.go, lines 1362 to 1427 in fb8ad8c
multica/server/internal/daemon/daemon.go, lines 1485 to 1520 in fb8ad8c
multica/server/cmd/multica/cmd_daemon.go, lines 365 to 409 in fb8ad8c
Goals
multica daemon restart.
codex, claude, hermes, gemini, openclaw, etc.
Non-goals
Proposed model
Add an opt-in daemon flag/env, for example:
multica daemon start --auto-reload-binaries
# or
MULTICA_DAEMON_AUTO_RELOAD_BINARIES=1 multica daemon start
When enabled, the daemon records a lightweight fingerprint for:
Config.Agents after exec.LookPath / symlink resolution.
A fingerprint can be:
The daemon then checks fingerprints on a low-frequency interval, likely aligned with heartbeat or a dedicated 30-60s ticker. I would prefer polling over fsnotify for v1 because package managers often replace symlink targets atomically, and polling is simpler to make portable.
When a change is detected, v1 should schedule a graceful full daemon restart:
reload_pending in memory / health output.
activeTasks == 0, reuse the existing restart handoff immediately.
LoadConfig, re-detects CLI availability/versions, and re-registers runtimes.
This deliberately favors a full process restart for v1 rather than an in-process runtime registry reload. It is less elegant, but it reuses the existing deregister/register/restart machinery and reduces the risk of partial state bugs.
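The polling model described above could be sketched roughly as follows. This is an illustrative outline, not the daemon's actual code: `changedAgents`, `watch`, and the `snapshot` callback are hypothetical names, and fingerprints are treated as opaque strings so the comparison works with whichever fingerprint scheme is chosen. One assumption to flag: a binary that is temporarily missing from a snapshot is ignored instead of being reported as changed, so a half-finished package upgrade does not trigger a restart.

```go
package main

import (
	"context"
	"sort"
	"time"
)

// changedAgents returns, in sorted order, the agent names whose fingerprint
// differs between two snapshots. Names missing from either snapshot are
// skipped (assumption: a transiently absent binary is not a change).
func changedAgents(prev, cur map[string]string) []string {
	var changed []string
	for name, oldFP := range prev {
		newFP, ok := cur[name]
		if ok && newFP != oldFP {
			changed = append(changed, name)
		}
	}
	sort.Strings(changed)
	return changed
}

// watch takes a baseline snapshot, then re-checks on every tick. On the
// first detected change it calls onChange once and returns, because the
// restarted daemon is expected to take a fresh baseline.
func watch(ctx context.Context, interval time.Duration,
	snapshot func() map[string]string, onChange func(changed []string)) {

	baseline := snapshot()
	ticker := time.NewTicker(interval)
	defer ticker.Stop()

	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			if changed := changedAgents(baseline, snapshot()); len(changed) > 0 {
				onChange(changed)
				return
			}
		}
	}
}
```

In this shape, `onChange` is where the daemon would set `reload_pending` and begin draining. `watch` would run in its own goroutine with a 30-60s interval and be cancelled through `ctx` on normal shutdown.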
Alternatives
A. Graceful restart after drain (recommended)
Best default. It fixes stale runtime metadata, avoids task loss, and mostly reuses current daemon restart flow.
Trade-off: if a task runs for hours, users wait for the new runtime metadata until it completes.
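The drain-then-restart behavior of Alternative A can be illustrated with a small gate that stops new claims once a reload is pending and reports when it is safe to restart. This is a sketch under assumptions: the `drainGate` type and its method names are hypothetical, and only the `activeTasks` and `reload_pending` concepts come from the proposal.

```go
package main

import "sync"

// drainGate is a hypothetical guard around task claiming.
type drainGate struct {
	mu            sync.Mutex
	activeTasks   int
	reloadPending bool
}

// TryClaim registers a new task unless a reload is pending, in which case
// the daemon should leave the task for after the restart.
func (g *drainGate) TryClaim() bool {
	g.mu.Lock()
	defer g.mu.Unlock()
	if g.reloadPending {
		return false
	}
	g.activeTasks++
	return true
}

// TaskDone records that one claimed task has finished.
func (g *drainGate) TaskDone() {
	g.mu.Lock()
	defer g.mu.Unlock()
	if g.activeTasks > 0 {
		g.activeTasks--
	}
}

// MarkReloadPending is called when a binary change is detected.
func (g *drainGate) MarkReloadPending() {
	g.mu.Lock()
	defer g.mu.Unlock()
	g.reloadPending = true
}

// ReadyToRestart reports whether a reload is pending and every active
// task has drained.
func (g *drainGate) ReadyToRestart() bool {
	g.mu.Lock()
	defer g.mu.Unlock()
	return g.reloadPending && g.activeTasks == 0
}
```

The daemon would check `ReadyToRestart` after `MarkReloadPending` and after each `TaskDone`, then hand off to the existing restart path. The trade-off noted above is visible here: a task that runs for hours keeps `ReadyToRestart` false for hours.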
B. Immediate restart and retry active tasks
Fastest to freshness. On binary change, cancel active tasks, restart, and rely on task retry/requeue behavior.
Trade-off: risky for long-running coding tasks and surprising as an opt-in default.
C. Notify only, reload on next idle/manual restart
Safest operationally. Daemon detects the change and tells the user / UI that restart is recommended.
Trade-off: it only improves observability; it does not remove the manual restart sharp edge.
Migration / compatibility
daemon start, daemon restart, and server-triggered update behavior unchanged.
Tests
Suggested coverage:
codex binary on PATH: start daemon, replace the fake binary, verify restart is scheduled only after active task count drains.
Open questions
--auto-reload-binaries be one flag, or split into --auto-reload-daemon-binary and --auto-reload-agent-binaries?
reload_pending in /health so Desktop/UI can show a precise status?
@Bohan-J curious if this direction matches your mental model for the CLI/daemon half of #2410. If you already have a preferred design, happy to adjust or withdraw and follow that direction.
AI disclosure: drafted with help from a Multica Codex agent based on our company dogfooding notes and source-code review.