Bring `conductor resume` to flag parity with `conductor run` by jrob5756 · Pull Request #158 · microsoft/conductor

jrob5756 · 2026-05-06T12:52:29Z

Problem

`conductor resume` was missing several flags that exist on `conductor run`, which made recovery confusing and, in important cases, broken. The most painful gap: a workflow started with `--web` or `--web-bg` could not be resumed with a dashboard, so users lost visibility exactly when something had just gone wrong.

What's added to `resume`

| Flag | Purpose |
| --- | --- |
| `--provider` / `-p` | Runtime provider override |
| `--metadata` / `-m` | CLI metadata merged on top of YAML metadata |
| `--web` | Start a real-time web dashboard |
| `--web-port` | Dashboard port (0 = auto-select) |
| `--web-bg` | Fork a detached process running resume + dashboard, print URL, exit |

`--web` and `--web-bg` are mutually exclusive (matching `run`).
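
The mutual-exclusion rule can be illustrated with a minimal sketch. The function name `validate_web_flags` and the error message are hypothetical and are not taken from the conductor codebase; the real CLI may report the error through its own mechanism.

```python
def validate_web_flags(web: bool, web_bg: bool) -> None:
    """Reject the combination of --web and --web-bg (hypothetical helper)."""
    if web and web_bg:
        # Assumption: the CLI surfaces this as a usage error; ValueError is a stand-in.
        raise ValueError("--web and --web-bg are mutually exclusive")
```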

Intentionally not mirrored

| Flag | Why |
| --- | --- |
| `--input` / `-i` | Inputs are restored from the checkpoint context |
| `--workspace-instructions`, `--instructions` | `instructions_preamble` is persisted in the checkpoint |
| `--dry-run` | Incompatible with executing from a saved point |

Implementation notes

- `resume_workflow_async()` now wires up the same `WorkflowEventEmitter`, `EventLogSubscriber`, `ConsoleEventSubscriber`, `WebDashboard` lifecycle, and `RunContext` as `run_workflow_async()`.
- Stop-signal handling refactored into a shared `_execute_with_stop_signal` helper used by both `_run_with_stop_signal` and the new `_resume_with_stop_signal`.
- New `launch_background_resume()` in `bg_runner.py` forks a detached `conductor resume` subprocess and writes a PID file so `conductor stop` can find it.
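
As a rough illustration of the command a background launcher might assemble, here is a minimal sketch. `build_resume_command` is a hypothetical helper, not the actual `launch_background_resume()` implementation. The argument order and the choice to pass `--web` to the child are assumptions. The flag names, the `ValueError` when neither path is given, and the always-appended `--no-interactive` come from this PR's description and follow-up commit.

```python
from __future__ import annotations

from pathlib import Path


def build_resume_command(
    workflow_path: Path | None = None,
    checkpoint_path: Path | None = None,
    web_port: int = 0,
    provider: str | None = None,
    metadata: dict[str, str] | None = None,
) -> list[str]:
    """Assemble argv for a detached `conductor resume` child (illustrative only)."""
    if workflow_path is None and checkpoint_path is None:
        raise ValueError("either workflow_path or checkpoint_path must be given")

    cmd = ["conductor", "resume"]
    if workflow_path is not None:
        cmd.append(str(workflow_path))
    if checkpoint_path is not None:
        cmd += ["--from", str(checkpoint_path)]
    # Assumption: the child serves the dashboard in the foreground via --web.
    cmd += ["--web", "--web-port", str(web_port)]
    if provider is not None:
        cmd += ["--provider", provider]
    for key, value in (metadata or {}).items():
        cmd += ["--metadata", f"{key}={value}"]
    # The follow-up commit notes that --no-interactive is always appended.
    cmd.append("--no-interactive")
    return cmd
```

Building the argument list as a pure function keeps it easy to unit test without spawning a process, which fits the "direct unit tests of command construction" mentioned under Tests.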

Dashboard behavior on resume

Documented in the docstring: the dashboard only shows events from the resumed agent forward. Events from agents that completed before the checkpoint were emitted in the original process and are not replayed.

Future-proofing parity

Added a new Run / Resume Parity subsection to AGENTS.md (mirroring the existing Provider Parity style) listing the parity rule, the flags that must stay aligned, and the flags intentionally skipped — so future contributors keep them in sync.

Tests

10 new cases in `tests/test_cli/test_resume_command.py`:

- `--provider` / `-m` flags pass through to `resume_workflow_async`
- Malformed `--metadata` is rejected
- `--web` + `--web-port` flags pass through
- `--web` + `--web-bg` mutex error
- `--web-bg` dispatches to `launch_background_resume` with workflow path or `--from` checkpoint
- Direct unit tests of `launch_background_resume` command construction (subcommand, `--from`, port, provider, metadata) and `ValueError` when neither `workflow_path` nor `checkpoint_path` is given
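
The metadata behavior these tests exercise can be sketched as follows. `parse_metadata` and `merge_metadata` are hypothetical names, and the `KEY=VALUE` format is an assumption inferred from the follow-up commit's note that a `--metadata` value containing `=` survives parsing. The actual parser in conductor may differ.

```python
from __future__ import annotations


def parse_metadata(pairs: list[str]) -> dict[str, str]:
    """Parse KEY=VALUE strings, splitting only on the first '='."""
    parsed: dict[str, str] = {}
    for pair in pairs:
        key, sep, value = pair.partition("=")
        if not sep or not key:
            # Malformed input: no '=' at all, or an empty key.
            raise ValueError(f"invalid metadata {pair!r}; expected KEY=VALUE")
        parsed[key] = value
    return parsed


def merge_metadata(yaml_metadata: dict[str, str], cli_metadata: dict[str, str]) -> dict[str, str]:
    """CLI values are merged on top of YAML values, so CLI wins on conflicts."""
    return {**yaml_metadata, **cli_metadata}
```

Using `str.partition` rather than `str.split("=")` means everything after the first `=` stays in the value, so an input like `query=a=b` keeps `a=b` intact.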

Verification

- `uv run ruff check src tests`: clean
- `uv run ruff format --check src tests`: clean
- Full test suite: 2382 passed, 9 skipped (no regressions)

Example

# Start a workflow with the dashboard
conductor run workflow.yaml --web-bg

# It crashes — now resume it WITH the dashboard
conductor resume workflow.yaml --web-bg

Adds the run-only flags that are meaningful during resumed execution to the resume command, fixing the broken UX where a workflow started with `--web` or `--web-bg` could not be resumed with a dashboard.

New flags on resume:
- --provider / -p  runtime provider override
- --metadata / -m  CLI metadata merged on top of YAML metadata
- --web  start a real-time web dashboard
- --web-port  dashboard port (0 = auto-select)
- --web-bg  fork a detached process running resume + dashboard

Intentionally not mirrored:
- --input  restored from checkpoint context
- --workspace-instructions / --instructions  instructions_preamble persisted in checkpoint
- --dry-run  incompatible with executing from a saved point

Implementation:
- resume_workflow_async() now wires up the same WorkflowEventEmitter, EventLogSubscriber, ConsoleEventSubscriber, WebDashboard lifecycle, and RunContext as run_workflow_async().
- Stop-signal handling refactored into shared _execute_with_stop_signal used by both _run_with_stop_signal and the new _resume_with_stop_signal.
- New launch_background_resume() in bg_runner.py forks a detached `conductor resume` subprocess with the dashboard and writes a PID file so `conductor stop` can find it.
- AGENTS.md gains a Run / Resume Parity subsection (mirroring the Provider Parity style) so future flag additions stay aligned.

Notes the dashboard caveat in the docstring: on resume, only events from the resumed agent forward are shown. Events from agents that completed before the checkpoint were emitted in the original process and are not replayed.

Tests: 10 new cases covering provider/metadata pass-through, --web flag handling, --web/--web-bg mutex, --web-bg dispatch to launch_background_resume, malformed metadata rejection, and direct unit tests of launch_background_resume command construction.

Verification: full suite (2382 passed / 9 skipped), lint clean, format clean.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…d wiring tests

bg_runner.py:
- Extract _terminate_child() and _finalize_background_launch() helpers shared by launch_background and launch_background_resume.
- On dashboard-startup timeout, terminate the still-running child before raising so it does not orphan holding the port with no PID file.
- Wrap write_pid_file in try/except, terminating the child on failure so we never leave a discoverable-only-by-pkill background process.
- Replace strippable assert with explicit guard.
- Update module docstring to mention both run and resume.
- Document that --no-interactive is always appended.

run.py:
- _execute_with_stop_signal: cancel pending tasks then drain via asyncio.gather(return_exceptions=True). The previous contextlib.suppress(CancelledError) only swallowed CancelledError, so a stored non-CancelledError on the losing task (e.g. dashboard.stop raised) aborted the cleanup loop and leaked the other pending task.

tests/test_cli/test_resume_command.py: 15 new cases covering
- launch_background_resume failure paths and detachment kwargs
- _execute_with_stop_signal direct semantics (no-dashboard, engine-wins, stop-wins, losing-task-with-exception regression)
- resume_workflow_async wiring without mocking the function itself: dashboard OSError non-fatal, provider_override mutates config, metadata merges into config, RunContext populated with bg_mode and run_id/log_file, --metadata value containing = survives parse

2397 passed, 9 skipped, no regressions.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
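
The cancel-then-drain pattern described for run.py can be sketched roughly as below. `race_and_drain` is a hypothetical stand-in and not the project's `_execute_with_stop_signal`. It assumes exactly two competing coroutines (the engine and a stop-signal waiter) and uses only standard `asyncio` calls.

```python
import asyncio


async def race_and_drain(engine_coro, stop_coro):
    """Run both coroutines, return the winner's result, and clean up the loser."""
    engine_task = asyncio.ensure_future(engine_coro)
    stop_task = asyncio.ensure_future(stop_coro)
    done, pending = await asyncio.wait(
        {engine_task, stop_task}, return_when=asyncio.FIRST_COMPLETED
    )
    for task in pending:
        task.cancel()
    # return_exceptions=True collects CancelledError and any other exception
    # raised during teardown as values, so one failing task cannot abort cleanup.
    await asyncio.gather(*pending, return_exceptions=True)
    winner = engine_task if engine_task in done else stop_task
    return winner.result()
```

Awaiting each cancelled task under `contextlib.suppress(CancelledError)` would re-raise any other exception a task produces while shutting down. Gathering with `return_exceptions=True` drains every pending task regardless of how it ends.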

- conductor resume flag parity with run (#158)
- reasoning effort displayed in dashboard (#160)
- iteration_limit_reached/resolved events for dashboard (#162)
- registry latest now means default branch HEAD, not newest tag (#157)
- forbid extra fields on Agent/Parallel/ForEach/Workflow schemas (#159)
- pretty-print tool args/results in dashboard events (#161)
- capture uv stdout+stderr on Windows install failure (#156)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

jrob5756 and others added 2 commits May 6, 2026 08:51

jrob5756 force-pushed the resume-run-parity branch from d4ed224 to 0181004 Compare May 6, 2026 16:32

jrob5756 merged commit 38c42b4 into main May 6, 2026
7 checks passed

jrob5756 deleted the resume-run-parity branch May 6, 2026 16:35

jrob5756 mentioned this pull request May 6, 2026

chore: release 0.1.13 #163

Merged
