Skip to content

feat(progress): surface long-running Bash/ScheduleWakeup waits in chat — silent 5–10 min holds look hung #481

@nathanschram

Description

Context

When a Claude session executes a Bash tool call that internally polls or
sleeps for several minutes (e.g. until gh run list … grep -q completed; do sleep 15; done waiting for a CloudFlare Pages deploy, or a sleep 600
waiting for a job, or any npm run build on a slow repo), the Untether
chat shows a single static ▸ Bash line for the entire duration.
From the user's phone there is no visual difference between "still
polling" and "frozen". The same pattern applies to ScheduleWakeup-driven
/loop-style waits and to long-running MCP tool calls.

Concrete reproduction (live, 2026-05-05 staging)

Session 9905b5fe-7e21-4042-8715-941c9206fa52 on @hetz_lba1_bot,
project lba-web, chat -5193338937. The user reported "session frozen
~40 min". Investigation found two real waits:

  1. 10 m 04 s Bash polling loop:

    until gh run list --branch main --limit 2 --json status,workflowName \
      --jq '.[] | select(.workflowName == "Deploy Production") | .status' \
      2>/dev/null | grep -q "completed"; do
      sleep 15
      echo "  $(date +%H:%M:%S) Deploy Production: …"
    done
    echo "Deploy Production done."
    

    The loop emitted a useful "still alive" line every 15 s on subprocess
    stdout. Untether saw zero of those — the tool result is buffered until
    the Bash call returns, so the ▸ Bash progress line was static for
    the full 10 minutes.

  2. 14 m 00 s waiting on a control_request for an inline approval
    keyboard the user didn't notice (separate UX gap, see follow-up
    comment in this issue thread). Outside scope of this issue.

Plus three CI-poll waits in the same run (3 m + 1.5 m + 2.5 m).

Net effect: the run's Telegram surface looked dead for ~17 minutes of
external-system polling that was actually proceeding normally.

Related issues

Proposed work — three coordinated changes

1. Long-tool elapsed-time progress timer

When a Bash/ScheduleWakeup/long-MCP tool call exceeds a threshold
(default 60 s), the progress message should update its label to show the
elapsed time and (where available) the tool input snippet:

▸ Bash · 7m 12s · still running
  $ until gh run list --branch main … grep -q "completed"; do sleep 15; …

Implementation:

  • New per-action timer in MarkdownFormatter.format_action_line() keyed
    on action_id.
  • Driven by an anyio task spawned from the Bash/Schedule action handler
    in runner_bridge.py that re-renders the action line every
    [progress] long_tool_render_interval seconds (default 30 s).
  • Cap at one timer per active action; cancel on tool_result.
  • Respect [progress] verbosity and /verbose toggle — the elapsed
    string is always shown in verbose=on, hidden in verbose=off unless
    elapsed exceeds a higher threshold (long_tool_visible_threshold,
    default 120 s).

Critical files:

  • src/untether/markdown.pyformat_action_line, action timer hook
  • src/untether/runner_bridge.py — spawn/cancel timer in
    the ActionEvent action_started/action_completed branches
  • src/untether/settings.py[progress] long_tool_render_interval,
    [progress] long_tool_visible_threshold
  • tests/test_meta_line.py — fixture that drives a 5-minute synthetic
    Bash tool, asserts ≥ 8 elapsed-string updates and that the timer is
    cancelled on the tool_result event.

2. Surface Bash subprocess stdout as a "last line" on the progress action

Claude Code already streams Bash stdout in its tool_use_id partial
events when run in --verbose --output-format stream-json (which
Untether uses by default). The line above —
$(date +%H:%M:%S) Deploy Production: … — would have been a strong
"still alive" signal. Untether currently discards these partials.

Implementation:

  • Extend claude_schema (src/untether/schemas/claude.py) and
    translate_claude_event (src/untether/runners/claude.py) to
    preserve the most recent stdout line per active Bash tool call.

  • Surface it on the action line as a sub-line:

    ▸ Bash · 7m 12s · still running
      $ until gh run list … (10 lines suppressed)
      └─ 11:34:12 Deploy Production: in_progress
    
  • Throttle: at most one update per 5 s per action (configurable via
    [progress] bash_stdout_render_interval, default 5).

  • Bound: trim the rendered line to ≤ 120 chars.

  • Strip ANSI colour codes via the existing strip_ansi helper used by
    _sanitise_stderr siblings.

Critical files:

  • src/untether/schemas/claude.py — preserve bash_output_chunk partials
  • src/untether/runners/claude.py:translate_claude_event — feed
    state.bash_action_stdout[action_id] = line
  • src/untether/markdown.py:format_action_line — render the sub-line
  • tests/test_claude_schema.py and tests/test_meta_line.py
    fixture & rendering coverage

3. Coordination with #470 — suppress stall warnings during long-tool waits

Today the stall watchdog
(runners/claude.py:_subprocess_liveness_stall) fires every ~3 min
based on idle_seconds >= stall_threshold AND
tree_active/cpu_active. Once changes 1+2 land, a long Bash poll
or ScheduleWakeup wait is no longer a "stall"
— the run is
actively progressing.

Both this issue AND #470 should suppress Telegram-side stall warnings
when the subprocess is in one of these expected-idle states:

State Suppression rationale
last_event_type == "result" (post-turn idle) #470's original case — turn is done, stdin is just held open
Active Bash tool with last stdout line within stall_threshold/2 the Bash command is producing output, run is alive
Active ScheduleWakeup with timer fire-time still in the future the agent explicitly asked to wait — not a stall
Active long-MCP-tool call (already partially handled by mcp_tool_timeout per #154) already covered, mention here for completeness

The structlog WARNING event should still fire on each stall threshold
crossing for observability (the untether-issue-watcher consumes
those). The change is purely the Telegram surfacing.

New events:

  • progress_edits.stall_long_bash_suppressed — when active Bash tool
    has fresh stdout
  • progress_edits.stall_schedule_wakeup_suppressed — when active
    ScheduleWakeup timer is still in the future

This overlaps directly with #470's progress_edits.stall_post_result_suppressed
proposal — the three suppression rules should land in the same
progress_edits.stall_detected decision branch in runner_bridge.py
so the logic stays in one place.

Action item for #470: I'll comment on that issue to widen its scope
to include long-tool-wait suppression (this issue) so the changes ship
together rather than as overlapping patches.

Acceptance criteria

  • A Bash tool call that runs for ≥ 60 s renders an elapsed-time
    string on its action line, refreshing at least every 30 s.
  • If the Bash call produces stdout, the most recent ≤120-char line
    is surfaced as a sub-action under the action line, refreshing
    at least every 5 s while changing.
  • A ScheduleWakeup action shows time-until-fire in the action
    line, refreshing at least every 30 s (synergy with feat: full /loop support — agent self-pacing via ScheduleWakeup interception #289 if that
    ships first).
  • During an active long-tool wait (Bash with fresh stdout, or
    ScheduleWakeup with future fire time, or post-result stdin idle
    per stall-message backoff once last_event_type=result (post-result idle is benign noise to the user) #470), Telegram stall warnings are suppressed.
  • structlog subprocess.liveness_stall warnings still fire on
    each threshold crossing (observability preserved).
  • New progress_edits.stall_*_suppressed info logs for each
    suppression branch.
  • No regression on existing fast tools — sub-second tool calls
    render identically (no flicker, no elapsed string).

Test plan

Unit — extend tests/test_exec_bridge.py and tests/test_meta_line.py:

  1. Synthetic 5-minute Bash with periodic stdout — assert elapsed
    string updates ≥ 10 times and stdout sub-line refreshes.
  2. Synthetic ScheduleWakeup with delaySeconds=180 — assert
    countdown renders.
  3. Synthetic long Bash + stall threshold cross — assert structlog
    WARNING fires but Telegram surfacing is suppressed.
  4. Synthetic post-result + stall threshold cross — assert stall-message backoff once last_event_type=result (post-result idle is benign noise to the user) #470's
    suppression still works (regression).
  5. Sub-1-s Bash — assert no elapsed string rendered.

Integration — against @untether_dev_bot, claude-test chat:

  • Send run a bash command: for i in 1 2 3 4 5 6 7 8 9 10; do echo "tick $i $(date +%H:%M:%S)"; sleep 12; done → observe action line
    count up over 2 min with rolling stdout sub-line; no stall warning.
  • Send /loop … (whatever the syntax is by the time this ships) and
    let it self-pace → observe countdown per iteration.

Out of scope

  • The 14-min approval-keyboard-wait surfaced in the same lba-web
    session — separate UX issue: chat-side reminder when a
    control_request is pending past the stall threshold. Will file
    separately if there's appetite.
  • The CloudFlare Pages production-deploy time itself — that's CDN-side,
    Untether can only render the wait better, not shorten it.

Source

Filed after the v0.35.3 staging dogfood session
9905b5fe-7e21-4042-8715-941c9206fa52 on 2026-05-05, where a 45-min
run felt frozen but was actually 17 min of healthy CI/CDN polling +
9 min of agent work + a separate 14-min approval-keyboard wait.

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions