bug(resume): --web-bg exits silently when max_iterations is reached instead of prompting for additional iterations #198

@jrob5756

Summary

When a workflow exceeds max_iterations during a resumed run, conductor resume --web-bg exits immediately as a failure with no prompt. In contrast, conductor resume (foreground, no dashboard) displays a Rich prompt asking "How many more iterations would you like to allow?" and lets the user continue. The two paths should provide the same recovery experience.

Reproduction

  1. Run any workflow that will hit max_iterations (e.g. a low limit + a loopy router):
    conductor run workflow.yaml --input topic=…
  2. Let it fail with MaxIterationsError. A checkpoint is written automatically.
  3. Foreground resume (works):
    conductor resume workflow.yaml
    → Rich panel "Max Iterations Reached" + prompt "How many more iterations would you like to allow?" (src/conductor/gates/human.py:373). Entering a number continues execution.
  4. Background resume (broken):
    conductor resume workflow.yaml --web-bg
    → Dashboard URL printed, child process starts, hits the limit, then exits silently. The dashboard shows the activity line Reached N/M iterations — awaiting console input but offers no way to provide that input. The PID file is removed and the workflow is gone.

Root cause

cli/bg_runner.py::launch_background_resume forks the child with:

  • --no-interactive (src/conductor/cli/bg_runner.py:303)
  • stdin=subprocess.DEVNULL (src/conductor/cli/bg_runner.py:325)

When the engine hits the limit, WorkflowEngine._check_iteration_with_prompt calls MaxIterationsHandler.handle_limit_reached (src/conductor/engine/workflow.py:2673), which calls _prompt_for_additional_iterations. With stdin pointed at /dev/null, rich.prompt.IntPrompt.ask raises EOFError, which is caught at src/conductor/gates/human.py:380-385 and silently coerced to 0 — i.e. stop. The engine then re-raises MaxIterationsError, the workflow ends, and the dashboard collapses with the child.
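
The EOF path can be reproduced in isolation. The sketch below is a stand-alone illustration, not conductor's code: it uses the built-in input() as a stand-in for rich.prompt.IntPrompt.ask, and the names CHILD_SOURCE, prompt_for_additional_iterations and run_child_without_stdin are hypothetical. It only shows that a child started with stdin=subprocess.DEVNULL sees EOF immediately, so a prompt that maps EOFError to 0 always answers "stop".

```python
import subprocess
import sys

# Source of the child process. input() stands in for IntPrompt.ask here;
# this is an assumption for illustration, not conductor's implementation.
CHILD_SOURCE = r"""
def prompt_for_additional_iterations() -> int:
    try:
        return int(input())
    except EOFError:
        # Mirrors the behavior described above: EOF is coerced to 0 (stop).
        return 0

print(prompt_for_additional_iterations())
"""


def run_child_without_stdin() -> int:
    """Run the child with stdin pointed at the null device and return its answer."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_SOURCE],
        stdin=subprocess.DEVNULL,  # same redirection the background launcher uses
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout.strip())


if __name__ == "__main__":
    print(run_child_without_stdin())
```

Because the child never blocks on the prompt, nothing visible happens between "limit reached" and "process exited", which matches the silent exit described in the reproduction steps.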

The dashboard frontend already knows about the gate: it consumes iteration_limit_reached (src/conductor/web/frontend/src/stores/workflow-store.ts:1622) and writes "awaiting console input" to the activity log. However, there is no UI control or HTTP endpoint to submit the user's choice back to the workflow process. So even in foreground --web mode the user is forced to switch back to the terminal; in --web-bg there is no terminal to switch to.

(Related historical context: issue #134 added the iteration_limit_reached / iteration_limit_resolved events so the gate is visible, but resolving it from the dashboard was out of scope. This bug picks up where that left off.)

Expected behavior

When a workflow running under --web-bg (or --web) hits max_iterations, the dashboard should:

  1. Surface the existing iteration_limit_reached event as an interactive modal/dialog with an input for additional iterations (and a Stop button).
  2. POST the user's response to a new endpoint (e.g. POST /api/iteration-limit/resolve with {additional_iterations: int} or {stop: true}).
  3. Have the web server forward the response to a MaxIterationsHandler variant that awaits the dashboard reply instead of calling IntPrompt.ask, with a sensible timeout fallback.
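
One possible shape for step 3 is sketched below, assuming the engine and the web server share an asyncio event loop. DashboardIterationGate, wait_for_reply and resolve are hypothetical names introduced for illustration; none of them exist in conductor today. The engine side awaits a future, the HTTP handler side resolves it, and a timeout falls back to 0 so the run stops the same way it does on EOF.

```python
import asyncio
from typing import Optional


class DashboardIterationGate:
    """Hypothetical bridge between the engine and a dashboard reply."""

    def __init__(self) -> None:
        # Set only while the engine is waiting for an answer.
        self._reply: Optional["asyncio.Future[int]"] = None

    async def wait_for_reply(self, timeout: float) -> int:
        """Engine side: wait for the dashboard, return 0 (stop) on timeout."""
        self._reply = asyncio.get_running_loop().create_future()
        try:
            return await asyncio.wait_for(self._reply, timeout)
        except asyncio.TimeoutError:
            return 0
        finally:
            self._reply = None

    def resolve(self, additional_iterations: int) -> bool:
        """Server side: deliver the user's choice; False if nothing is pending."""
        if self._reply is None or self._reply.done():
            return False
        self._reply.set_result(max(0, additional_iterations))
        return True


async def demo() -> int:
    gate = DashboardIterationGate()
    waiter = asyncio.create_task(gate.wait_for_reply(timeout=1.0))
    await asyncio.sleep(0)  # let the engine side start waiting
    gate.resolve(5)         # what the HTTP handler would do on POST
    return await waiter


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Returning False from resolve when no gate is pending gives the endpoint a natural way to report a stale or duplicate submission instead of silently accepting it.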

The same affordance should apply to the human gate (--skip-gates=false) and the interrupt prompt, both of which currently have the same TTY-only assumption.

Workarounds

  • Resume without --web-bg so the terminal prompt is reachable.
  • Raise the workflow's max_iterations value in YAML and resume.
  • Pass --skip-gates to make the auto-stop behavior explicit (still ends the run, but at least the intent matches what happens).

Acceptance criteria

  • conductor resume --web-bg (and --web) shows an interactive prompt in the dashboard when iteration_limit_reached fires.
  • User can continue the workflow with N additional iterations or stop, mirroring the terminal prompt.
  • Choice is delivered to the running workflow via a new web API; the existing iteration_limit_resolved event still fires.
  • If no client is connected (no dashboard open), behavior degrades to today's "stop on EOF" path so background runs don't hang forever — ideally with a configurable timeout.
  • Regression tests cover both the API endpoint and the engine path that waits on the dashboard reply.
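
As a starting point for the endpoint tests, the request body could be validated by a small pure function. This is a sketch under assumptions: parse_resolve_payload is a hypothetical helper, and the payload shapes are the ones proposed above ({additional_iterations: int} or {stop: true}), not an existing API. It returns the number of extra iterations, with 0 meaning stop.

```python
from typing import Any, Mapping


def parse_resolve_payload(body: Mapping[str, Any]) -> int:
    """Map a proposed resolve payload to a number of extra iterations (0 = stop)."""
    if body.get("stop") is True:
        return 0
    value = body.get("additional_iterations")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, int) or value < 1:
        raise ValueError(
            "expected {'stop': true} or a positive integer 'additional_iterations'"
        )
    return value


if __name__ == "__main__":
    print(parse_resolve_payload({"additional_iterations": 3}))
    print(parse_resolve_payload({"stop": True}))
```

Keeping validation separate from the HTTP layer lets the regression tests exercise bad input without starting the web server.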

Metadata

    Labels

      • area:bg: Background process (--web-bg) lifecycle
      • area:resume: Checkpoint save/resume
      • area:web: Web dashboard (server + frontend)
      • bug: Something isn't working
