RFC: daemon lifecycle events + subscription API #2494
tangyuanjc started this conversation in Ideas
Context
This is a discussion-first RFC. I am not proposing a code PR until maintainers confirm whether this belongs as a new native event/subscription layer, as an extension of the outbound webhook RFC, or as a narrower autopilot trigger feature.
We are dogfooding Multica as an internal multi-agent operating system with local daemons, employee agents, autopilots, and closing/reporting workflows. Three sharp edges keep recurring:
done or in_review, we want a closing agent to create an HTML report and notify the original human creator. Today this is a manual SOP plus a weekly audit script, because autopilot triggers do not expose issue.status_change.
multica issue list.
There is strong nearby prior art in the repo:
reload_pending.
multica daemon start causes the local daemon/agent to go offline #1414 and [Bug]: The Windows 11 multica.exe daemon automatically terminates the process #1394 reported Windows daemon offline/lifecycle failures; fix(daemon/windows): break out of parent shell Job Object so daemon survives #1679 fixed one class of daemon survival bug.
allowed_principals predicate for private-agent surfaces, which is directly relevant to event subscription RBAC.
This RFC is not meant to replace #1964. The narrower question is: should Multica expose first-class lifecycle events that native surfaces such as autopilots, agents, and external subscribers can consume without each team writing a polling loop?
Goals
Non-goals
Proposed event types
issue.status_change
Emitted when an issue status changes.
Example payload:
{
  "event_id": "01H...",
  "type": "issue.status_change",
  "workspace_id": "uuid",
  "actor": {"type": "agent", "id": "uuid"},
  "occurred_at": "2026-05-13T00:00:00Z",
  "issue": {
    "id": "uuid",
    "identifier": "WS-95",
    "previous_status": "in_progress",
    "status": "done",
    "assignee_type": "agent",
    "assignee_id": "uuid",
    "creator_type": "member",
    "creator_id": "uuid"
  }
}
Primary use case: when status in ["in_review", "done"], trigger a closing autopilot that creates one child issue assigned to the closing/reporting agent, with a duplicate guard on (parent_issue_id, assignee_id).
issue.no_ack_timeout
Emitted by a small scheduler/sweeper when an assigned issue has not been acknowledged within a subscription-defined duration.
The acknowledgement predicate should be explicit. A reasonable v1:
(issue_id, assignee_id, timeout_policy_id) until the issue changes materially.
Primary use case: route Tier-1 no-ack tasks to Tier-2 agent assistance after 1 hour, then optionally to CTO/oncall if the escalation also stalls.
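The sweeper and its dedupe key could be sketched roughly as follows. This is a minimal in-memory illustration, not Multica's implementation: the AssignedIssue shape, the acknowledged_at field used as the acknowledgement predicate, the sweep_no_ack function, and the event payload fields are all hypothetical names chosen for the example. Only the event type and the (issue_id, assignee_id, timeout_policy_id) key come from the text above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class AssignedIssue:
    # Hypothetical shape; field names are illustrative, not Multica's schema.
    issue_id: str
    assignee_id: str
    assigned_at: datetime
    acknowledged_at: Optional[datetime] = None  # assumed ack predicate


def sweep_no_ack(issues, timeout, timeout_policy_id, already_emitted, now):
    """Return issue.no_ack_timeout events for overdue, unacknowledged issues.

    already_emitted is a set of (issue_id, assignee_id, timeout_policy_id)
    keys. It is updated in place, so repeating the sweep emits nothing new.
    """
    events = []
    for issue in issues:
        if issue.acknowledged_at is not None:
            continue  # acknowledged under the assumed predicate
        if now - issue.assigned_at < timeout:
            continue  # still inside the subscription-defined window
        key = (issue.issue_id, issue.assignee_id, timeout_policy_id)
        if key in already_emitted:
            continue  # dedupe: one event per key
        already_emitted.add(key)
        events.append({
            "type": "issue.no_ack_timeout",
            "issue_id": issue.issue_id,
            "assignee_id": issue.assignee_id,
            "timeout_policy_id": timeout_policy_id,
        })
    return events


now = datetime(2026, 5, 13, 2, 0, tzinfo=timezone.utc)
issues = [
    AssignedIssue("i1", "a1", now - timedelta(hours=2)),        # overdue
    AssignedIssue("i2", "a1", now - timedelta(minutes=10)),     # too recent
    AssignedIssue("i3", "a2", now - timedelta(hours=3), now),   # acknowledged
]
emitted = set()
first = sweep_no_ack(issues, timedelta(hours=1), "tier1", emitted, now)
second = sweep_no_ack(issues, timedelta(hours=1), "tier1", emitted, now)
```

In this sketch a material change to the issue would be modelled by removing its key from already_emitted, which re-arms the timeout; what counts as a material change is left open here.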
runtime.offline
Emitted when a runtime/daemon heartbeat crosses an offline threshold.
Example payload:
{
  "event_id": "01H...",
  "type": "runtime.offline",
  "workspace_id": "uuid",
  "runtime": {
    "id": "uuid",
    "host": "WIN-FQK6M4FEF13",
    "last_seen_at": "2026-05-12T10:38:00Z",
    "offline_threshold_seconds": 900
  }
}
Primary use case: create an incident issue or send a notification before a local daemon stays offline for hours.
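One way to turn heartbeat timestamps into a single runtime.offline event per outage is edge detection: emit when the threshold is first crossed, stay silent while the runtime remains offline, and re-arm once a fresh heartbeat arrives. The sketch below assumes this behaviour; detect_offline, the runtime dict shape, and the known_offline set are hypothetical, and uuid4 stands in for whatever event ID scheme is eventually chosen.

```python
import uuid
from datetime import datetime, timedelta, timezone


def detect_offline(runtimes, offline_threshold_seconds, now, known_offline):
    """Emit runtime.offline once each time a runtime crosses the threshold.

    known_offline is a set of runtime IDs already reported as offline.
    It is updated in place so that repeated sweeps do not re-emit.
    """
    threshold = timedelta(seconds=offline_threshold_seconds)
    events = []
    for rt in runtimes:
        if now - rt["last_seen_at"] < threshold:
            known_offline.discard(rt["id"])  # heartbeat is fresh; re-arm
            continue
        if rt["id"] in known_offline:
            continue  # already reported for this outage
        known_offline.add(rt["id"])
        events.append({
            "event_id": str(uuid.uuid4()),  # stand-in ID scheme (assumption)
            "type": "runtime.offline",
            "workspace_id": rt["workspace_id"],
            "runtime": {
                "id": rt["id"],
                "host": rt["host"],
                "last_seen_at": rt["last_seen_at"].strftime("%Y-%m-%dT%H:%M:%SZ"),
                "offline_threshold_seconds": offline_threshold_seconds,
            },
        })
    return events


rt = {
    "id": "rt-1",
    "workspace_id": "ws-1",
    "host": "WIN-FQK6M4FEF13",
    "last_seen_at": datetime(2026, 5, 12, 10, 38, tzinfo=timezone.utc),
}
seen = set()
before = detect_offline([rt], 900, datetime(2026, 5, 12, 10, 52, tzinfo=timezone.utc), seen)
crossed = detect_offline([rt], 900, datetime(2026, 5, 12, 10, 53, tzinfo=timezone.utc), seen)
repeat = detect_offline([rt], 900, datetime(2026, 5, 12, 11, 30, tzinfo=timezone.utc), seen)
```

With a 900-second threshold and a last heartbeat at 10:38, the 10:52 sweep is silent, the 10:53 sweep emits one event, and later sweeps stay silent until the runtime reports in again.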
Subscription API
One v1 shape:
CLI mirror:
Potential targets:
autopilot:<id> - enqueue an autopilot run with the event payload as context.
agent:<id> - create/dispatch a task or issue according to a template.
webhook:<id> - reuse RFC: outbound webhooks on issue/run state transitions #1964 delivery semantics for HTTP sinks.
inbox - create a workspace inbox notification.
For autopilot targets, v1 could avoid a new template language by passing the event payload into the existing autopilot create-issue mode and letting the autopilot description/template define the output. If that is too implicit, make target_template explicit in the subscription row.
Delivery and persistence
Recommended default:
event_id at event creation time, not at sink delivery time.
event_id, workspace_id, type, object_type, object_id, occurred_at, actor fields, and payload JSON.
event_id, subscription_id, attempt, status, next_attempt_at, last_error.
(autopilot, agent, inbox) should be at-least-once with idempotency keys.
This gives replay/debuggability without promising an infinite event stream.
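A rough sketch of how at-least-once delivery with an idempotency key might look, using the delivery fields named above. The DeliveryLog class, its in-memory dict, the fixed retry_delay, and the status strings are assumptions made for illustration. A real implementation would persist these rows and would likely use backoff.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass
class Delivery:
    # Field names follow the delivery columns listed above.
    event_id: str
    subscription_id: str
    attempt: int = 0
    status: str = "pending"  # hypothetical values: pending/failed/delivered
    next_attempt_at: Optional[float] = None
    last_error: Optional[str] = None


class DeliveryLog:
    """In-memory stand-in for a delivery table keyed by (event_id, subscription_id)."""

    def __init__(self) -> None:
        self._rows: Dict[Tuple[str, str], Delivery] = {}

    def deliver(self, event: dict, subscription_id: str,
                sink: Callable[[dict], None], now: float,
                retry_delay: float = 60.0) -> Delivery:
        key = (event["event_id"], subscription_id)
        row = self._rows.setdefault(key, Delivery(*key))
        if row.status == "delivered":
            return row  # idempotent: a redelivered event is a no-op
        row.attempt += 1
        try:
            sink(event)
        except Exception as exc:
            row.status = "failed"
            row.last_error = str(exc)
            row.next_attempt_at = now + retry_delay  # fixed delay for brevity
        else:
            row.status = "delivered"
            row.last_error = None
            row.next_attempt_at = None
        return row


calls = []

def flaky_sink(event: dict) -> None:
    """Hypothetical sink that fails on its first call only."""
    calls.append(event["event_id"])
    if len(calls) == 1:
        raise RuntimeError("sink down")

log = DeliveryLog()
event = {"event_id": "e1", "type": "issue.status_change"}
r1 = log.deliver(event, "sub-1", flaky_sink, now=0.0)
snapshot1 = (r1.status, r1.attempt, r1.next_attempt_at, r1.last_error)
r2 = log.deliver(event, "sub-1", flaky_sink, now=60.0)
snapshot2 = (r2.status, r2.attempt)
r3 = log.deliver(event, "sub-1", flaky_sink, now=61.0)
```

Because event_id is fixed when the event is created, the same key survives retries and replays, so the third call above returns the existing row without invoking the sink again.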
RBAC
Subscription creation/update/delete should be workspace owner/admin only in v1, matching the security posture from #1964.
Delivery visibility should follow the same permission principle as #2359:
Relationship to existing work
#1964 outbound webhooks
#1964 is the right primitive for HTTP delivery to external systems. This RFC should not duplicate its HMAC, persistence, SSRF, retry, or delivery CLI design.
The open design question is whether issue.status_change, issue.no_ack_timeout, and runtime.offline should become new public bus event types that #1964 can deliver, or whether they should be native-only derived events consumed by autopilots first.
#2373 inbound webhooks
Inbound webhooks solve external-system-to-Multica. This RFC is Multica-state-to-subscriber and should share naming/security conventions where possible.
#2422 daemon auto-reload
#2422 covers one daemon lifecycle action: version-change detection and graceful reload.
runtime.offline is adjacent but distinct: it is an operational signal emitted when heartbeat/registration state crosses a threshold.
#1414 / #1394 / #1679 Windows daemon lifecycle
These show that runtime lifecycle is not just UI polish. Local daemon survival/offline state affects task reliability and should be observable as a first-class event.
Open questions
Suggested v1 slice
If this direction is acceptable, the smallest useful v1 could be:
issue.status_change as a derived event emitted from the issue update path.
event_subscription rows for target_type='autopilot'.
(event_id, subscription_id).
event_subscription_delivery for observability.
Then add runtime.offline and issue.no_ack_timeout as follow-ups once the event/subscription spine is proven.
Tests
Suggested coverage:
in_progress -> done emits exactly one issue.status_change.
issue.status_change.
status=done does not fire on in_review.
@Bohan-J curious whether this should be framed as a native event-subscription layer, an extension of #1964's event taxonomy, or a narrower "autopilot triggers on issue lifecycle" feature. If you already have a preferred direction, happy to revise the scope before anyone writes code.
AI disclosure: drafted with help from a Multica Codex agent based on our internal dogfooding notes, Multica CLI behavior, and the linked upstream issues/discussions.