[daemon] Add app-server daemon lifecycle management#20718

Merged
euroelessar merged 25 commits into main from ruslan/app-server-daemon-bootstrap-update on May 8, 2026

Conversation

Collaborator

@euroelessar euroelessar commented May 2, 2026

Why

Desktop and mobile Codex clients need a machine-readable way to bootstrap and manage codex app-server on remote machines reached over SSH. The same flow is also useful for bringing up app-server with remote_control enabled on a fresh developer machine and keeping that managed install current without requiring a human session.

What changed

  • add the new experimental codex-app-server-daemon crate and wire it into codex app-server daemon lifecycle commands: start, restart, stop, version, and bootstrap
  • add explicit enable-remote-control and disable-remote-control commands that persist the launch setting and restart a running managed daemon so the change takes effect immediately
  • emit JSON success responses for daemon commands so remote callers can consume them directly
  • support a Unix-only pidfile-backed detached backend for lifecycle management
  • assume the standalone install.sh layout for daemon-managed binaries and always launch CODEX_HOME/packages/standalone/current/codex
  • add bootstrap support for the standalone managed install plus a detached hourly updater loop
  • harden lifecycle management around concurrent operations, pidfile ownership, stale state cleanup, updater ownership, managed-binary preflight, Unix-only rejection, forced shutdown after the graceful window, and updater process-group tracking/cleanup
  • document the experimental Unix-only support boundary plus the standalone bootstrap/update flow in codex-rs/app-server-daemon/README.md
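
To make the pidfile-backed backend and the stale state cleanup listed above more concrete, here is a minimal sketch of how a lifecycle command might inspect a pidfile before acting. It is not the code from this PR: `PidfileState`, `process_alive`, and `inspect_pidfile` are hypothetical names, and the liveness probe shells out to `kill -0` only to stay within the standard library. A real implementation would more likely call `kill(2)` directly and would also need the locking and ownership checks the PR describes.

```rust
use std::fs;
use std::io;
use std::path::Path;
use std::process::{Command, Stdio};

/// Result of inspecting a pidfile (hypothetical type, not taken from the PR).
#[derive(Debug, PartialEq, Eq)]
pub enum PidfileState {
    /// No pidfile exists at the given path.
    Missing,
    /// The pidfile names a process that is still alive.
    Running(u32),
    /// The pidfile was unparsable or named a dead process, and was removed.
    StaleRemoved,
}

/// Probe for a live process by running the shell builtin `kill -0 <pid>`.
/// Assumption: a failed probe means "not running". A process owned by
/// another user also fails the probe (EPERM), which this sketch ignores.
fn process_alive(pid: u32) -> bool {
    Command::new("sh")
        .arg("-c")
        .arg(format!("kill -0 {}", pid))
        .stdout(Stdio::null())
        .stderr(Stdio::null())
        .status()
        .map(|status| status.success())
        .unwrap_or(false)
}

/// Read the pidfile, report a live daemon, and remove the file when stale.
pub fn inspect_pidfile(path: &Path) -> io::Result<PidfileState> {
    let contents = match fs::read_to_string(path) {
        Ok(contents) => contents,
        Err(err) if err.kind() == io::ErrorKind::NotFound => {
            return Ok(PidfileState::Missing);
        }
        Err(err) => return Err(err),
    };
    if let Ok(pid) = contents.trim().parse::<u32>() {
        // PID 0 would address the caller's process group, so reject it.
        if pid != 0 && process_alive(pid) {
            return Ok(PidfileState::Running(pid));
        }
    }
    fs::remove_file(path)?;
    Ok(PidfileState::StaleRemoved)
}
```

A liveness probe alone cannot detect PID reuse, which is one reason a production backend would pair the pidfile with additional ownership validation.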

Verification

  • cargo test -p codex-app-server-daemon -p codex-cli
  • live pid validation on cb4: bootstrap --remote-control, restart, version, stop

Follow-up

  • Add updater self-refresh so the long-lived pid-update-loop can replace its own executable image after installing a newer managed Codex binary.

@euroelessar euroelessar force-pushed the ruslan/app-server-daemon-bootstrap-update branch from 55c587a to e9d25cb Compare May 2, 2026 01:38
@euroelessar euroelessar force-pushed the ruslan/app-server-daemon-bootstrap-update branch from 3ca4240 to 6ef5841 Compare May 5, 2026 00:25
@euroelessar euroelessar changed the title Add app-server daemon lifecycle management Add Unix-only app-server daemon lifecycle management May 6, 2026
euroelessar added a commit that referenced this pull request May 6, 2026
euroelessar added a commit that referenced this pull request May 6, 2026
euroelessar added a commit that referenced this pull request May 6, 2026
euroelessar added a commit that referenced this pull request May 6, 2026
@euroelessar euroelessar marked this pull request as ready for review May 8, 2026 16:53
@euroelessar euroelessar changed the title Add Unix-only app-server daemon lifecycle management [daemon] Add Unix-only app-server daemon lifecycle management May 8, 2026
@euroelessar euroelessar changed the title [daemon] Add Unix-only app-server daemon lifecycle management [daemon] Add app-server daemon lifecycle management May 8, 2026
euroelessar added a commit that referenced this pull request May 8, 2026
Collaborator

owenlin0 commented May 8, 2026

Do we want to be able to auto-update the logic of the daemon itself, for example the hourly updater loop? That seems important, since otherwise we get stuck with whatever is initially deployed onto a remote machine.

codex's take FWIW:

One updateability concern with the daemon bootstrap path: the updater loop itself is currently not updateable once it is running.

bootstrap starts a detached app-server daemon pid-update-loop, and that loop can update the managed Codex binary on disk. It can also restart the managed app-server after a version change. But the updater process keeps running the old executable image, so changes to updater behavior itself will not take effect until something external reruns bootstrap or restarts the updater.

That matters because the updater owns exactly the policy we may most need to fix after distribution: scheduling, jitter, rollout/backoff behavior, installer fetch logic, restart decisions, kill-switch handling, and future telemetry/status reporting. If we ship a bad polling policy to a large fleet, publishing a fixed binary is not sufficient unless the old updater already knows how to replace itself.

I think the right minimal fix is a small Unix-only self-reexec primitive rather than a broad daemon framework. After a successful install, compare the running updater's version/build identity with the managed binary's version/build identity. If the managed binary differs, exec the managed binary as:

codex app-server daemon pid-update-loop

Using exec is attractive here because it replaces the current process image while preserving the PID, so the existing updater pidfile remains valid and we avoid a handoff race between an old updater and a spawned replacement. I would run this check before making app-server restart decisions, so newer updater policy takes over as soon as possible.

The generalization I would make is just the primitive: "reexec the current daemon role from the managed binary if outdated." The only resident daemon role today that needs it is the updater loop; lifecycle CLI commands are short-lived and naturally pick up the new binary on their next invocation, while app-server is already restartable by the updater.
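
The self-reexec idea described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the follow-up implementation: the function names are hypothetical, and comparing opaque identity strings stands in for whatever version/build identity the daemon would actually use. The managed binary path follows the `CODEX_HOME/packages/standalone/current/codex` layout from the PR description, and `CommandExt::exec` is the standard library's Unix-only wrapper around `execvp`.

```rust
use std::io;
use std::os::unix::process::CommandExt;
use std::path::{Path, PathBuf};
use std::process::Command;

/// Path of the managed binary under the standalone layout named in the PR.
pub fn managed_binary(codex_home: &Path) -> PathBuf {
    codex_home
        .join("packages")
        .join("standalone")
        .join("current")
        .join("codex")
}

/// Decide whether the running updater should hand over to the managed binary.
/// Assumption: identities are opaque strings and any difference means "outdated".
pub fn should_reexec(running_identity: &str, managed_identity: &str) -> bool {
    running_identity.trim() != managed_identity.trim()
}

/// Build the command that reruns the updater role from the managed binary.
pub fn reexec_command(codex_home: &Path) -> Command {
    let mut cmd = Command::new(managed_binary(codex_home));
    cmd.args(["app-server", "daemon", "pid-update-loop"]);
    cmd
}

/// Replace the current process image when the managed binary differs.
/// On a successful exec this never returns; the PID is preserved, so an
/// existing updater pidfile stays valid. It returns Ok(()) when no reexec
/// is needed and Err when exec itself fails.
pub fn reexec_if_outdated(
    codex_home: &Path,
    running_identity: &str,
    managed_identity: &str,
) -> io::Result<()> {
    if !should_reexec(running_identity, managed_identity) {
        return Ok(());
    }
    // `exec` only returns if replacing the process image failed.
    Err(reexec_command(codex_home).exec())
}
```

Following the comment's suggestion, a caller would run `reexec_if_outdated` right after a successful install and before deciding whether to restart app-server, so that newer updater policy takes over first.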

Collaborator

@owenlin0 owenlin0 left a comment

(can be punted) should we have a way to enable remote control after the daemon and corresponding app-server process is already running? maybe something explicit and narrowly scoped like a codex app-server daemon enable/disable-remote-control?

Comment thread codex-rs/app-server/README.md Outdated
- Stream events: After `turn/start`, keep reading JSON-RPC notifications on stdout. You’ll see `item/started`, `item/completed`, deltas like `item/agentMessage/delta`, tool progress, etc. These represent streaming model output plus any side effects (commands, tool calls, reasoning notes).
- Finish the turn: When the model is done (or the turn is interrupted via making the `turn/interrupt` call), the server sends `turn/completed` with the final turn state and token usage.

## Daemon Lifecycle Commands
Collaborator

maybe let's remove this from the app-server README for now until we make this the "blessed way" to run app-server.

@@ -0,0 +1,95 @@
# codex-app-server-daemon
Collaborator

should we call this experimental?

Collaborator Author

do you mean rename crate or just make it explicit in the documentation?

Collaborator

just in the documentation

@euroelessar euroelessar force-pushed the ruslan/app-server-daemon-bootstrap-update branch from 25a5dc8 to 548265e Compare May 8, 2026 22:49
@euroelessar euroelessar merged commit 0c8d425 into main May 8, 2026
29 checks passed
@euroelessar euroelessar deleted the ruslan/app-server-daemon-bootstrap-update branch May 8, 2026 23:51
@github-actions github-actions Bot locked and limited conversation to collaborators May 8, 2026

2 participants