A gateway-first agent orchestration service inspired by OpenClaw patterns, adapted for remote clients (mobile/web/desktop).
This repo captures the implementation notes and architecture decisions from our research and discussion:
- How OpenClaw handles agent-to-agent communication
- How spawned CLI agents are executed and resumed
- How provider routing chooses CLI vs embedded runtime
- Where orchestration state should live
- What changes are needed for mobile/remote clients
- One Gateway process owns sessions, runs, auth, and state.
- Local-first runtime: single process + SQLite file DB (no required Docker setup).
- Clients are thin remotes (mobile/web/desktop).
- Agent execution uses an in-process async run queue in v1.
- Runs stream events over WebSocket/SSE; clients also support cursor sync.
- Per-session/provider thread IDs are persisted for resume semantics.
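
The in-process async run queue described above can be pictured with a small asyncio sketch. This is not the daemon's actual code: the function name `run_queue`, the job tuples, and the status strings `completed` and `failed` are assumptions chosen for illustration.

```python
import asyncio


async def run_queue(jobs, max_concurrency=4):
    """Run (job_id, coroutine_function) pairs with a bounded worker pool.

    Hypothetical sketch: names and status strings are illustrative only.
    """
    queue = asyncio.Queue()
    for job_id, job in jobs:
        queue.put_nowait((job_id, job))
    results = {}

    async def worker():
        # Each worker drains the shared queue until it is empty.
        while True:
            try:
                job_id, job = queue.get_nowait()
            except asyncio.QueueEmpty:
                return
            try:
                results[job_id] = ("completed", await job())
            except Exception as exc:
                # A failing job is recorded and does not stop the worker.
                results[job_id] = ("failed", repr(exc))

    # At most max_concurrency jobs are in flight at any time.
    await asyncio.gather(*(worker() for _ in range(max_concurrency)))
    return results
```

Because everything runs in one event loop, no external broker is needed, which fits the single-process, local-first constraint.
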
- POST /sessions
- POST /sessions/:id/messages (idempotency key required)
- POST /sessions/:id/spawn
- POST /runs/:id/cancel
- GET /runs/:id
- GET /sessions/:id/events?cursor=...
- WS /stream?sessionId=... (or SSE)

- GET /health
- POST /sessions
- GET /sessions
- GET /sessions/:id
- POST /sessions/:id/messages
- GET /sessions/:id/events?cursor=...&limit=...
- POST /runs
- GET /runs/:id
- POST /runs/:id/transition
- POST /runs/:id/spawn
- POST /sessions/:id/send
- GET /runs/:id/wait?timeout_s=...&interval_ms=...
- GET /runs/:id/diagnose
- GET /daemon/status
- POST /runs/:id/reply-parent (child approval/permission escalation to parent session)
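
A client that reconnects can catch up through the cursor-based events endpoint. The loop below is a minimal sketch of that pattern. The response keys `events` and `next_cursor` and the `fetch_page` callable are assumptions; the real response shape is defined in docs/api-contract.md.

```python
def sync_events(fetch_page, cursor=None, limit=100):
    """Page through session events until the server has nothing newer.

    fetch_page(cursor, limit) is a hypothetical callable that wraps
    GET /sessions/:id/events?cursor=...&limit=... and returns a dict with
    assumed keys "events" (list) and "next_cursor".
    """
    collected = []
    while True:
        page = fetch_page(cursor, limit)
        batch = page.get("events", [])
        collected.extend(batch)
        next_cursor = page.get("next_cursor")
        if next_cursor is None:
            next_cursor = cursor
        # Stop when the page is empty or the cursor did not advance.
        if not batch or next_cursor == cursor:
            return collected, next_cursor
        cursor = next_cursor
```

The returned cursor can be stored by the client and passed back in on the next sync so that only newer events are fetched.
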
When daemon auth tokens are configured, all endpoints except GET /health require Authorization: Bearer <token>.
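
The rule above can be expressed as a small check. This is a sketch of the stated behavior, not the daemon's implementation: the function name `is_authorized` is hypothetical, and headers are assumed to arrive as a plain dict keyed by `Authorization`.

```python
import hmac


def is_authorized(method, path, headers, tokens):
    """Return True if the request may proceed under the bearer-token rule."""
    # No tokens configured means auth is disabled.
    if not tokens:
        return True
    # GET /health stays open even when auth is enabled.
    if method == "GET" and path == "/health":
        return True
    scheme, _, presented = headers.get("Authorization", "").partition(" ")
    if scheme != "Bearer" or not presented:
        return False
    # compare_digest avoids leaking token contents through timing.
    return any(
        hmac.compare_digest(presented.encode(), token.encode())
        for token in tokens
    )
```
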
- docs/summary.md: condensed technical summary of OpenClaw behavior and implementation guidance.
- docs/v1-decisions.md: concrete v1 product and system defaults, including ownership and ACL model.
- docs/requirements-and-plan.md: locked requirements and phased implementation plan for the first dogfoodable slice.
- docs/api-contract.md: endpoint-level request/response/auth/error contract for agentd.
- docs/runbook.md: operator runbook for startup, preflight, troubleshooting, and stale-daemon recovery.
- skills/agent-orchestrator-cli/SKILL.md: operator skill for using agentd/agentctl workflows.
This repo now includes a runnable daemon + CLI with queued codex execution:
- agentd: ./bin/agentd (wrapper for python3 -m orchestrator.server)
- agentctl: ./bin/agentctl (wrapper for python3 -m orchestrator.cli)
- Start daemon:
./bin/agentd
- Optional auth token:
AGENTD_AUTH_TOKEN=dev-token ./bin/agentd
- Check daemon/worker-pool status:
export AGENTCTL_AUTH_TOKEN=dev-token (if daemon auth is enabled)
./bin/agentctl daemon status
- Create a session:
./bin/agentctl session create
- Send user message (auto-enqueues run):
./bin/agentctl message send --session-id <SESSION_ID> --text "Respond with exactly: hi"
- Retry-safe:
./bin/agentctl message send --session-id <SESSION_ID> --text "Respond with exactly: hi" --idempotency-key <REQUEST_KEY>
- Wait for run completion:
./bin/agentctl run wait --run-id <RUN_ID>
- Inspect events / tool calls:
./bin/agentctl events list --session-id <SESSION_ID>
./bin/agentctl events tail --session-id <SESSION_ID> --follow
./bin/agentctl run diagnose --run-id <RUN_ID>
- Spawn and coordinate child sessions:
./bin/agentctl tool session-spawn --parent-run-id <PARENT_RUN_ID> --task "..."
./bin/agentctl tool session-send --session-id <CHILD_SESSION_ID> --text "..."
./bin/agentctl tool session-reply-parent --run-id <CHILD_RUN_ID> --text "Need approval to run: <command>" --request-kind approval_request
- Retry-safe escalation:
./bin/agentctl tool session-reply-parent --run-id <CHILD_RUN_ID> --text "Need approval to run: <command>" --request-kind approval_request --idempotency-key <REQUEST_KEY>
./bin/agentctl tool session-wait --run-id <CHILD_RUN_ID>
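
The wait commands above block until a run finishes. A client without access to agentctl could approximate the same behavior by polling GET /runs/:id. The sketch below assumes a `status` field and the terminal values `completed`, `failed`, and `cancelled`; none of these are confirmed by this README, and `fetch_run` is a hypothetical callable.

```python
import time

# Assumed terminal states; check docs/api-contract.md for the real set.
TERMINAL_STATUSES = {"completed", "failed", "cancelled"}


def wait_for_run(fetch_run, run_id, timeout_s=30.0, interval_ms=250,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_run(run_id) until the run is terminal or the timeout hits."""
    deadline = clock() + timeout_s
    while True:
        run = fetch_run(run_id)
        if run.get("status") in TERMINAL_STATUSES:
            return run
        if clock() >= deadline:
            raise TimeoutError(f"run {run_id} not finished after {timeout_s}s")
        sleep(interval_ms / 1000)
```

The `sleep` and `clock` parameters exist only so the loop can be exercised in tests without real delays.
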
- Spawn child:
SPAWN=$(./bin/agentctl tool session-spawn --parent-run-id <PARENT_RUN_ID> --task "Ask for approval before risky actions")
- Capture child IDs:
CHILD_SESSION_ID=$(printf '%s' "$SPAWN" | python3 -c 'import json,sys; print(json.load(sys.stdin)["session"]["id"])')
CHILD_RUN_ID=$(printf '%s' "$SPAWN" | python3 -c 'import json,sys; print(json.load(sys.stdin)["run"]["id"])')
- Child escalates to parent session without passing parent session ID:
./bin/agentctl tool session-reply-parent --run-id "$CHILD_RUN_ID" --request-kind approval_request --text "Need approval to run: rm -rf ./tmp-cache"
- Parent observes escalation event/message in parent session stream and responds (for example by sending follow-up to child session):
./bin/agentctl events list --session-id <PARENT_SESSION_ID>
./bin/agentctl tool session-send --session-id "$CHILD_SESSION_ID" --text "Approved. Proceed with cleanup only in ./tmp-cache."
Default daemon URL is http://127.0.0.1:8765 and default DB path is .data/agent-orchestrator.db.
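
For clients that talk to the daemon directly rather than through agentctl, a request against the default URL might be built as follows. This is a sketch using only the Python standard library; the helper name `build_request` is hypothetical, and reading the token from AGENTCTL_AUTH_TOKEN simply mirrors what the CLI is documented to do.

```python
import urllib.request

DEFAULT_DAEMON_URL = "http://127.0.0.1:8765"


def build_request(path, env, base_url=DEFAULT_DAEMON_URL, method="GET"):
    """Build (but do not send) a request for the daemon API."""
    request = urllib.request.Request(base_url.rstrip("/") + path, method=method)
    token = env.get("AGENTCTL_AUTH_TOKEN")
    if token:
        # Only attach the header when a token is actually configured.
        request.add_header("Authorization", f"Bearer {token}")
    return request
```

Passing the result to `urllib.request.urlopen` would send it; `env` would typically be `os.environ`.
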
- Optional daemon-wide codex profile:
./bin/agentd --codex-profile mobile
- Env fallback for profile:
AGENTD_CODEX_PROFILE=mobile ./bin/agentd
- Optional repeatable codex args injected before exec:
./bin/agentd --codex-extra-arg=--verbose --codex-extra-arg=--color --codex-extra-arg never
- Optional worker pool size (concurrent runs):
./bin/agentd --max-concurrency 4
- Optional bearer auth tokens:
./bin/agentd --auth-token dev-token
./bin/agentd --auth-token token-a --auth-token token-b
AGENTD_AUTH_TOKEN=dev-token ./bin/agentd
AGENTD_AUTH_TOKENS=token-a,token-b ./bin/agentd
- If one or more auth tokens are configured, all non-/health endpoints require a matching bearer token.
- agentctl auth token options:
./bin/agentctl --auth-token dev-token daemon status
AGENTCTL_AUTH_TOKEN=dev-token ./bin/agentctl daemon status
- If no profile is configured, codex runs with your existing local codex config.
- Default concurrency is 4 workers.
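
The options above combine flags, environment fallbacks, and defaults. One way to picture the resolution order is the argparse sketch below. It is illustrative only: the function name `resolve_config` is hypothetical, and merging flag tokens with both environment variables into one de-duplicated list is an assumption about how the daemon combines them.

```python
import argparse


def resolve_config(argv, env):
    """Resolve daemon settings from CLI flags, then env, then defaults."""
    parser = argparse.ArgumentParser(prog="agentd")
    parser.add_argument("--codex-profile", default=None)
    parser.add_argument("--codex-extra-arg", action="append", default=[])
    parser.add_argument("--max-concurrency", type=int, default=4)
    parser.add_argument("--auth-token", action="append", default=[])
    args = parser.parse_args(argv)

    # Flag wins; AGENTD_CODEX_PROFILE is only a fallback.
    profile = args.codex_profile or env.get("AGENTD_CODEX_PROFILE")

    # Assumed merge: flags, then AGENTD_AUTH_TOKEN, then AGENTD_AUTH_TOKENS.
    candidates = list(args.auth_token)
    if env.get("AGENTD_AUTH_TOKEN"):
        candidates.append(env["AGENTD_AUTH_TOKEN"])
    candidates += env.get("AGENTD_AUTH_TOKENS", "").split(",")
    tokens = []
    for token in (c.strip() for c in candidates):
        if token and token not in tokens:
            tokens.append(token)

    return {
        "codex_profile": profile,
        "codex_extra_args": args.codex_extra_arg,
        "max_concurrency": args.max_concurrency,
        "auth_tokens": tokens,
    }
```

A profile of `None` in the result corresponds to running codex with the existing local codex config.
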
- Print compact JSON:
./bin/agentctl --format json session create
- Print compact daemon/pool status JSON:
./bin/agentctl --format json daemon status
- Extract a field directly:
./bin/agentctl --field session.id session create
./bin/agentctl --field run.id message send --session-id <SESSION_ID> --text "hi"
./bin/agentctl --field alive_workers daemon status
./bin/agentctl --field queue_counts.queued_count daemon status
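
The dotted paths passed to --field walk nested JSON objects. A minimal sketch of that lookup is shown below; the function name `extract_field` is hypothetical, and how agentctl itself treats lists or missing keys is not specified here, so this version handles nested objects only.

```python
def extract_field(payload, path):
    """Follow a dotted path such as "queue_counts.queued_count" into a dict."""
    current = payload
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            raise KeyError(f"field not found: {path!r} (stopped at {key!r})")
        current = current[key]
    return current
```

For example, `extract_field(status, "queue_counts.queued_count")` returns the nested counter from a parsed daemon status response.
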
- For retryable mutating requests, provide one idempotency_key and reuse it on retries.
- Supported API endpoints:
POST /sessions/:session_id/messages (role=user)
POST /sessions/:session_id/send
POST /runs/:parent_run_id/spawn
POST /runs/:child_run_id/reply-parent
- CLI flags:
message send --idempotency-key <REQUEST_KEY>
tool session-send --idempotency-key <REQUEST_KEY>
tool session-spawn --idempotency-key <REQUEST_KEY>
tool session-reply-parent --idempotency-key <REQUEST_KEY>
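
To show why reusing one key makes retries safe, here is a small in-memory sketch of server-side deduplication. It is an assumption-laden illustration: the class `IdempotencyStore`, its `scope` argument, and the in-memory dict are all hypothetical, and the real daemon presumably persists this state differently.

```python
class IdempotencyStore:
    """Replay the first response for a repeated (scope, key) pair."""

    def __init__(self):
        self._responses = {}

    def execute(self, scope, key, handler):
        # Requests without a key are never deduplicated.
        if key is None:
            return handler()
        cache_key = (scope, key)
        if cache_key not in self._responses:
            # First time this key is seen: perform the mutation once.
            self._responses[cache_key] = handler()
        return self._responses[cache_key]
```

Scoping the key (for example by endpoint and session) keeps a key reused on a different resource from replaying an unrelated response.
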
- Diagnose a run:
./bin/agentctl run diagnose --run-id <RUN_ID>
- Useful fields:
./bin/agentctl --field diagnostics.worker_id run diagnose --run-id <RUN_ID>
./bin/agentctl --field diagnostics.worker_alive run diagnose --run-id <RUN_ID>
./bin/agentctl --field diagnostics.alive_workers run diagnose --run-id <RUN_ID>
./bin/agentctl --field diagnostics.max_concurrency run diagnose --run-id <RUN_ID>
./bin/agentctl --field diagnostics.queue_counts.queued_count run diagnose --run-id <RUN_ID>
./bin/agentctl --field diagnostics.queue_position run diagnose --run-id <RUN_ID>
./bin/agentctl --field diagnostics.likely_issue run diagnose --run-id <RUN_ID>
./bin/agentctl --field diagnostics.active_pid run diagnose --run-id <RUN_ID>
./bin/agentctl --field diagnostics.codex_invocation.profile run diagnose --run-id <RUN_ID>
./bin/agentctl --field diagnostics.codex_invocation.extra_args run diagnose --run-id <RUN_ID>