
Observability & process gaps surfaced during a live multi-agent session (delivery, heartbeat, spawn, dependency validation, runtime metrics, notifications, tooling) #9

Description

@justinchuby

Summary

During a live multi-agent session (a Lead coordinating a Director + 8 parallel review workers + synthesis), we hit a series of observability, state-consistency, and process issues. The core orchestration flow (declare → plan-review → approve → spawn → notify) ultimately worked, but several silent-failure modes made agent liveness and message delivery effectively unobservable, and one tooling gap repeatedly blocked the Lead from acting directly.

All issues below are reported from the perspective of a Lead agent using the FlightDeck tools.

Issues

1. flightdeck_send reports success even for never-spawned / dead agents (silent dead-letter)

flightdeck_send returns {"status":"sent","messageId":"..."} regardless of whether the target is actually running. A Director that had been assigned an unavailable runtime registered but never spawned; messages sent to it still returned sent but were never consumed. The sender cannot distinguish "delivered & will be processed" from "queued into a dead inbox."
Fix: Check target liveness at send time; return a warning or delivered:false (or a distinct status) for never-spawned/dead targets.
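A minimal sketch of the suggested send-time check. It assumes the sender can look up the target's lifecycle state; TargetState, SendResult, send, and the "queued_undeliverable" status are hypothetical names chosen for illustration, not FlightDeck's actual API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class TargetState(Enum):
    # Hypothetical lifecycle states, not FlightDeck's real enum.
    NEVER_STARTED = "never_started"
    RUNNING = "running"
    DEAD = "dead"


@dataclass
class SendResult:
    status: str             # "sent" or "queued_undeliverable"
    messageId: str
    delivered: bool         # True only if a live consumer will read the inbox
    warning: Optional[str] = None


def send(target_state: TargetState, message_id: str) -> SendResult:
    """Report whether anyone will consume the message (enqueueing is elided)."""
    if target_state is TargetState.RUNNING:
        return SendResult("sent", message_id, delivered=True)
    return SendResult(
        "queued_undeliverable",
        message_id,
        delivered=False,
        warning=f"target is {target_state.value}; message will not be processed",
    )
```

The message can still be enqueued so nothing is lost if the target later starts; the point is that the sender gets an explicit signal instead of an unconditional "sent".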

2. lastHeartbeat is always null — even for busy agents

flightdeck_agent_list returns lastHeartbeat: null for every agent, including ones in status: busy (observed for the Lead itself and a busy Director). If this is a liveness signal, it is non-functional; the only liveness proxy is polling whether tokensIn/Out increase, which is indirect.
Fix: Populate lastHeartbeat on each tick so liveness can be checked directly.
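One way a populated lastHeartbeat could replace the token-delta proxy, sketched under assumptions: the agent record is a plain dict, record_heartbeat is a hypothetical hook called on every tick, and the 30-second staleness threshold is an arbitrary placeholder.

```python
import time
from typing import Optional

STALE_AFTER_S = 30.0  # hypothetical threshold; should exceed the tick interval


def record_heartbeat(agent: dict, now: Optional[float] = None) -> None:
    """Hypothetical per-tick hook that stamps the agent record."""
    agent["lastHeartbeat"] = time.time() if now is None else now


def liveness(agent: dict, now: Optional[float] = None) -> str:
    """Classify an agent from its heartbeat instead of token deltas."""
    now = time.time() if now is None else now
    hb = agent.get("lastHeartbeat")
    if hb is None:
        return "no_heartbeat"  # never ticked: never started, or heartbeat not wired
    if now - hb > STALE_AFTER_S:
        return "stale"
    return "alive"
```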

3. Spawn failures are not surfaced

When an agent is assigned a runtime that is unavailable on the host (e.g. codex not installed), it registers as idle but never starts, and no error is bubbled up to the Lead or user.
Fix: Validate runtime at spawn time and fail loudly. Distinguish "idle (ran before, waiting)" from "never started."
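A sketch of spawn-time validation, assuming each runtime is backed by a CLI executable that must be on PATH (an assumption; FlightDeck may resolve runtimes differently). SpawnError, AgentStatus, validate_runtime, and spawn are hypothetical names.

```python
import shutil
from enum import Enum


class SpawnError(RuntimeError):
    """Raised instead of silently leaving the agent idle."""


class AgentStatus(Enum):
    NEVER_STARTED = "never_started"  # registered, process never launched
    IDLE = "idle"                    # ran at least once, now waiting for work


def validate_runtime(runtime_cmd: str) -> str:
    """Return the executable path, or fail loudly if it is not on PATH."""
    path = shutil.which(runtime_cmd)
    if path is None:
        raise SpawnError(f"runtime {runtime_cmd!r} is not installed on this host")
    return path


def spawn(agent: dict, runtime_cmd: str) -> str:
    """Hypothetical spawn hook: validate first, then record an honest status."""
    agent["status"] = AgentStatus.NEVER_STARTED
    try:
        path = validate_runtime(runtime_cmd)
    except SpawnError as exc:
        agent["spawnError"] = str(exc)  # surfaced to the Lead/user, not swallowed
        raise
    # Launching the process is elided; status should only move to IDLE/BUSY
    # after the agent's first heartbeat proves it actually started.
    return path
```

Keeping NEVER_STARTED separate from IDLE means a registered-but-unlaunched agent can no longer look like one that finished earlier work and is waiting.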

4. Task dependsOn accepts dangling references → silent permanent stall

A synthesis task declared dependsOn: ["olive-review-arch", ...] (logical names) while the real task IDs were task-872342, etc. The dependency never resolved, so the task sat in pending forever. Because it was the only task with notifyLead:true, no one was ever notified — the whole chain silently stalled even though all 8 upstream tasks were done.
Fix: Validate dependsOn references at declare_tasks time; error/warn on dangling dependencies instead of allowing a permanent stall.
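A minimal sketch of declare-time validation. It assumes each declared task is a dict with an "id" and an optional "dependsOn" list, and that dependencies may point either into the same batch or at previously declared task IDs; validate_depends_on and DanglingDependencyError are hypothetical names.

```python
from typing import Iterable


class DanglingDependencyError(ValueError):
    pass


def validate_depends_on(tasks: Iterable[dict], existing_ids: Iterable[str] = ()) -> None:
    """Reject a declare_tasks batch whose dependsOn names an unknown task ID.

    A dependency may point at a task in the same batch or one declared earlier.
    """
    tasks = list(tasks)
    known = set(existing_ids) | {t["id"] for t in tasks}
    problems = [
        f"{t['id']} -> {dep}"
        for t in tasks
        for dep in t.get("dependsOn", [])
        if dep not in known
    ]
    if problems:
        raise DanglingDependencyError("dangling dependsOn: " + ", ".join(problems))
```

With this check, the synthesis task above would have been rejected at declare time (its "olive-review-arch" entry matches no task ID) rather than sitting in pending indefinitely.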

5. Task running but assigned worker idle with zero activity → state inconsistency

A synthesis task showed state: running while its assigned worker showed status: idle, tokensIn/Out: 0, cost: 0. From outside this was indistinguishable from a stalled or dead worker, yet the worker had in fact finished the task. There is no external way to tell a stalled worker from a finished one: both look like idle + 0 tokens + lastHeartbeat:null.
Fix: Reconcile task state with real worker activity; mark stalled when a worker shows no heartbeat/activity, and ensure completion transitions are observable.
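A sketch of the proposed reconciliation rule, assuming the worker record exposes lastHeartbeat and a hypothetical completedTaskId field set when it finishes; reconcile and the 120-second stall threshold are illustrative, not existing FlightDeck behavior.

```python
from typing import Optional

STALL_AFTER_S = 120.0  # hypothetical threshold


def reconcile(task: dict, worker: dict, now: float) -> str:
    """Return the task state implied by the worker's observable activity."""
    if task["state"] != "running":
        return task["state"]
    if worker.get("completedTaskId") == task["id"]:
        return "done"  # worker reported completion but the task never transitioned
    hb: Optional[float] = worker.get("lastHeartbeat")
    if hb is None or now - hb > STALL_AFTER_S:
        return "stalled"
    return "running"
```

Checking for a completion signal before the heartbeat makes the finished-worker case resolve to "done" rather than being misreported as "stalled".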

6. copilot-sdk runtime does not report token/cost metrics

A worker on copilot-sdk + claude-opus-4.7 completed real work but reported tokensIn/Out: 0 and cost: 0, compounding issue #5 (the zeroed metrics made a finished worker look dead).
Fix: Ensure token/cost accounting is wired for the copilot-sdk runtime.
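Independently of wiring the copilot-sdk accounting itself, one possible guard against the confusion in #5 is to represent "not reported" as absent rather than 0. The sketch below is an assumption about how metrics could be modeled; UsageMetrics and describe are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class UsageMetrics:
    # None means "this runtime did not report", which differs from a real 0.
    tokens_in: Optional[int] = None
    tokens_out: Optional[int] = None
    cost_usd: Optional[float] = None

    @property
    def reported(self) -> bool:
        return None not in (self.tokens_in, self.tokens_out, self.cost_usd)


def describe(m: UsageMetrics) -> str:
    if not m.reported:
        return "metrics unavailable for this runtime"
    return f"in={m.tokens_in} out={m.tokens_out} cost=${m.cost_usd:.4f}"
```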

7. tasks_declared_notify reported the wrong task count

The Director declared 9 tasks (8 investigation + 1 synthesis), but the system notification said "declared 1 task(s)" and named only the synthesis task.
Fix: Report the accurate count and ideally list (or summarize) all declared tasks.
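A sketch of a notification builder that counts the whole declared batch and lists up to a few titles. The function name, the "title" field, and the max_listed cutoff are hypothetical.

```python
from typing import Dict, List


def tasks_declared_message(declarer: str, tasks: List[Dict[str, str]], max_listed: int = 5) -> str:
    """Summarize the whole declared batch instead of a single task."""
    titles = [t["title"] for t in tasks]
    shown = ", ".join(titles[:max_listed])
    extra = len(titles) - max_listed
    suffix = f" (+{extra} more)" if extra > 0 else ""
    return f"{declarer} declared {len(titles)} task(s): {shown}{suffix}"
```

For the 9-task batch described above (8 reviews plus synthesis), this yields "Director declared 9 task(s): review-1, ..., review-5 (+4 more)".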

8. reviewer steer failed; reviewers had to be cleared to complete a task

On one task the plan event reported "reviewer steer failed"; the Director had to clear reviewers to complete it. Worth investigating the reviewer-steer path for robustness.

9. (Tooling) The Lead has no tool to execute shell/gh commands directly

The Lead's bash-related tools are limited to read_bash / stop_bash / list_bash (read/stop/list existing sessions) — there is no tool to start/write a new bash command. As a result the Lead cannot run gh itself and must delegate every shell action. Worse, generic task subagents repeatedly refused or got confused, claiming no bash access, which blocked GitHub-issue creation for several rounds until work was routed through Director → worker.
Suggested fixes:

  • Clarify in the Lead's system prompt that it cannot execute shell commands directly (only read/stop/list bash sessions), so it routes shell work to workers immediately instead of stalling.
  • Ensure task subagents reliably know whether they can execute shell commands, and don't refuse spuriously.
  • Consider a dedicated "report-bug-to-flightdeck" skill that standardizes "collect observations → format → gh issue create on flightdeck-dev/flightdeck-2", so this path doesn't depend on ad-hoc prompts or on the Lead having shell access.
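A rough sketch of what the proposed skill's core could look like: format observations into a body, then build the argv for gh issue create using its --repo, --title, and --body flags. The helper names and the body layout are hypothetical; only the target repo comes from this issue. Building an argv list rather than a shell string avoids quoting problems with multi-line bodies.

```python
from typing import List

REPO = "flightdeck-dev/flightdeck-2"  # target repo named in this issue


def format_report(title: str, observations: List[str], fixes: List[str]) -> str:
    """Hypothetical standard layout for a FlightDeck bug report body."""
    lines = ["Summary", "", title, "", "Observations", ""]
    lines += [f"- {o}" for o in observations]
    lines += ["", "Suggested fixes", ""]
    lines += [f"- {f}" for f in fixes]
    return "\n".join(lines)


def gh_issue_create_argv(title: str, body: str, repo: str = REPO) -> List[str]:
    """argv for `gh issue create`; a worker with shell access would run it,
    e.g. subprocess.run(argv, check=True)."""
    return ["gh", "issue", "create", "--repo", repo, "--title", title, "--body", body]
```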

What worked well (context)

  • Reliable message send/receive with returned message IDs.
  • Clean declare-tasks → plan-review → approve → spawn flow.
  • Async completion notifications (notifyLead) enabling fire-and-forget delegation (once the dependency/notify wiring was correct, the completion notifications fired reliably).
  • Parallel fan-out of 8 read-only workers + opus synthesis produced a high-quality consolidated report with zero repo modifications.

Priority

High: #1-#6 (liveness, delivery, and state observability; these caused real blocked/stalled chains with no surfaced error). Medium: #9 (tooling/UX; caused repeated stalls). Low: #7, #8 (notification accuracy, reviewer-steer robustness).
