
0.35.3rc19 — /monitor campaign issue sweep (10 commits) #555

Merged
Nathan Schram (nathanschram) merged 10 commits into dev from fix/rc19-monitor-campaign-bundle
May 17, 2026

@nathanschram
Member

Summary

Bundles 9 open issues from the recent /monitor campaign (lba-1 staging, runs from 2026-05-13 through 2026-05-16) into a single rc19 release staged for fleet rollout, plus #533, a daemon-filed duplicate.

User-confirmed scope decisions: all in-scope issues moved into v0.35.3 milestone; #527 (umbrella stall predicate refactor) explicitly deferred to v0.35.4.

Already fixed in master (closed out separately)

Bundled here

| # | Title | Commit |
|---|-------|--------|
| #528 | "↩️ Answered:" echo no longer truncated to 100 chars | d175be9 |
| #525 | dedup cancel.requested triple-fire (1s TTL on chat/msg id) | f558235 |
| #532 | consolidate per-engine setup.warning to one setup.summary INFO | a9dc234 |
| #547 (axis 1) | preamble warns agents against systemctl restart untether | 12cf4ca |
| #523 | recognise leading-dot slash-command typos (.new, .usage, …) | b02c20d |
| #524 | surface skipped outbox entries (directories etc.) in chat | fde1bd8 |
| #526 (+ #533) | demote stall WARN to INFO during approval-pending | 66ba78f |
| #547 (axes 2+3) + #548 | hot-reload Telegram confirmation message | 40dc6b7 |
| #546 | bypass outbox for answer_callback_query (latency fix) | ee03b32 |
| | version bump 0.35.3rc18 → 0.35.3rc19 + uv.lock | 12c2d71 |

Deferred to v0.35.4

Pre-PR validation

  • uv run pytest — 2737 passed / 2 skipped (1 skipped is pre-existing)
  • Coverage gate — 82.58% (above 80%)
  • uv run ruff check src/ tests/ — clean
  • uv run ruff format --check src/ tests/ — clean
  • uv lock --check — clean

Test plan

🤖 Generated with Claude Code

The `↩️ Answered:` confirmation after an AskUserQuestion text reply was
hard-sliced at [:100] with no ellipsis, so users couldn't see whether
their full message reached the agent (the agent path was unaffected and
always received the complete text). Replace the slice with a 300-char
soft cap + ellipsis via the new `_format_answered_echo` helper.

Regression tests in tests/test_loop_coverage.py.
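A minimal sketch of the soft-cap behaviour, assuming the echo is built by prefixing the answer with `↩️ Answered: `. `format_answered_echo` and `ANSWERED_ECHO_MAX_CHARS` are hypothetical stand-ins for the real `_format_answered_echo`, whose signature isn't shown here; only the 300-char cap and the trailing ellipsis come from the commit message.

```python
# Soft cap from the commit message; the constant name is hypothetical.
ANSWERED_ECHO_MAX_CHARS = 300


def format_answered_echo(answer: str, max_chars: int = ANSWERED_ECHO_MAX_CHARS) -> str:
    """Hypothetical stand-in for `_format_answered_echo`.

    Short answers are echoed verbatim; longer ones are cut at `max_chars`
    and marked with an ellipsis so the truncation is visible to the user.
    """
    if len(answer) <= max_chars:
        shown = answer
    else:
        # Drop trailing whitespace so the ellipsis sits directly after the text.
        shown = answer[:max_chars].rstrip() + "…"
    return f"↩️ Answered: {shown}"
```

Because the agent path always receives the full text, a cap like this only changes what the user sees in chat.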

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Rapid double/triple-tap of the inline Cancel button delivered three
`cancel.requested` events for one user intent (Telegram delivered
duplicate callbacks before the keyboard cleared). The repeated
`cancel_requested.set()` calls are benign today, but they add log noise,
and any future side-effectful cancel action would inherit the 3x fan-out.

Add a 1-second TTL dedup keyed on (chat_id, progress_message_id) in
all three cancel entry points (text-reply, text-fallback, callback).
Per-test autouse fixture clears the module-level dict so tests that
reuse (chat_id, msg_id) aren't surprised by silent drops.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ummary

Previously every `config.reload.applied` emitted one [warning]
setup.warning per engine not on PATH. On a single-engine host (e.g.
channelo runs only claude) that's 5 WARNs per reload, padding the warn
filters in untether-issue-watcher, /monitor, and Grafana with noise
about an intentional install state.

Replace with one INFO `setup.summary` line per reload that lists
found/missing_on_path/bad_config engines. Loud WARN now reserved for
engines the user actively configured (non-empty [engines.<id>] block)
but that aren't on PATH — those are noteworthy.
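The bucketing rule can be sketched as follows. The project's own logger isn't shown, so stdlib `logging` stands in for it; `EngineStatus`, `summarize_engines`, `log_setup_summary` and the logger name are hypothetical, and treating bad config as taking precedence over PATH status is an assumption.

```python
from __future__ import annotations

import logging
from dataclasses import dataclass

log = logging.getLogger("untether.setup")  # logger name is an assumption


@dataclass(frozen=True)
class EngineStatus:
    engine_id: str
    on_path: bool
    config_ok: bool = True
    user_configured: bool = False  # True when [engines.<id>] is a non-empty block


def summarize_engines(
    statuses: list[EngineStatus],
) -> tuple[dict[str, list[str]], list[str]]:
    """Bucket engines for one setup.summary line; also return engines worth a WARN."""
    summary: dict[str, list[str]] = {"found": [], "missing_on_path": [], "bad_config": []}
    loud: list[str] = []
    for s in statuses:
        if not s.config_ok:
            summary["bad_config"].append(s.engine_id)
        elif s.on_path:
            summary["found"].append(s.engine_id)
        else:
            summary["missing_on_path"].append(s.engine_id)
            if s.user_configured:
                loud.append(s.engine_id)  # configured but not installed: noteworthy
    return summary, loud


def log_setup_summary(statuses: list[EngineStatus]) -> None:
    summary, loud = summarize_engines(statuses)
    log.info("setup.summary %s", summary)  # exactly one INFO per reload
    for engine_id in loud:
        log.warning("setup.warning engine=%s configured but not on PATH", engine_id)
```

On a single-engine host this yields one INFO line per reload, and a WARN only appears when the user asked for an engine that isn't installed.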

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…tether

Agents routinely follow `Edit untether.toml` with `Bash systemctl --user
restart untether` because their training data is full of "restart the
service after config changes". Untether already hot-reloads the file;
the restart shuts down the very session issuing the command, drain hits
the 120s timeout, and the agent's final answer to the user is silently
dropped via outbox.fail_pending.

Add a dedicated "Configuration changes (`untether.toml`)" section to
the default preamble explicitly telling agents NOT to restart after
editing config, with the consequence spelled out and the restart-only
key list (`bot_token`, `chat_id`, `session_mode`, `topics`,
`message_overflow`) provided as the genuine exception.
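The restart-only rule reduces to a set intersection. This sketch assumes changed keys are compared by their leaf names; `classify_config_change` and `restart_needed` are hypothetical helpers, and only the five key names come from the text above.

```python
from __future__ import annotations

# Restart-only keys as listed in the preamble text; everything else hot-reloads.
RESTART_ONLY_KEYS = frozenset(
    {"bot_token", "chat_id", "session_mode", "topics", "message_overflow"}
)


def classify_config_change(changed_keys: set[str]) -> tuple[set[str], set[str]]:
    """Split changed key names into (hot_reloadable, restart_required)."""
    restart_required = changed_keys & RESTART_ONLY_KEYS
    hot_reloadable = changed_keys - RESTART_ONLY_KEYS
    return hot_reloadable, restart_required


def restart_needed(changed_keys: set[str]) -> bool:
    """The only case in which restarting the service after an edit is warranted."""
    _, restart_required = classify_config_change(changed_keys)
    return bool(restart_required)
```

For example, toggling `outbox_notify_skipped` classifies as hot-reloadable, while changing `topics` is the genuine restart case.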

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`.new`, `.cancel`, etc. previously dispatched as fresh agent prompts:
full Claude cold-start cost (OAuth handshake, MCP catalog probe,
preamble injection) paid before the user could cancel. `.` and `/` are
adjacent on iOS/Android punctuation rows, and several mobile keyboards
auto-replace a leading `/` with `.` on autocorrect.

Add `parse_dot_typo` helper that recognises `.<cmd>` and `.<cmd> args`
shapes where <cmd> matches a registered slash command (case-insensitive,
ellipsis/path-prefix safe). Wired into route_message in a follow-up
commit so detection happens before agent dispatch.
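A regex-based sketch of the detection rule. The real `parse_dot_typo` signature isn't shown, so this version takes the registered command names (assumed lower-case) as an explicit argument, and `DotTypo` is a hypothetical return type.

```python
from __future__ import annotations

import re
from typing import NamedTuple


class DotTypo(NamedTuple):
    command: str  # normalised to lower case
    args: str  # everything after the command, stripped


# A single leading dot followed immediately by a command-like word, then
# either end of text or whitespace plus arguments. "...", "./x" and ".5"
# fail the letter check, so ellipses and relative paths never match.
_DOT_TYPO_RE = re.compile(r"^\.([A-Za-z][A-Za-z0-9_]*)(?:\s+(.*))?$", re.DOTALL)


def parse_dot_typo(text: str, registered: set[str]) -> DotTypo | None:
    """Return the intended slash command if `text` looks like `.cmd` or `.cmd args`."""
    match = _DOT_TYPO_RE.match(text.strip())
    if match is None:
        return None
    command = match.group(1).lower()
    if command not in registered:
        return None  # unregistered words such as ".env" still reach the agent
    return DotTypo(command, (match.group(2) or "").strip())
```

A non-None result would let the router treat the message as the matching slash command instead of paying for an agent cold start.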

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When an agent writes a directory (e.g. `guides/`) into `.untether-outbox/`,
the scanner logs `outbox.skipped` and drops it without surfacing
anything to the user. The agent's "I've prepared the guides folder for
you" final message becomes a silent lie.

Wire the skipped tuples through to a new `_format_outbox_skipped_notice`
helper in runner_bridge.py (added in the #547 axis 1 commit alongside
the preamble update) that composes a brief 📎 Outbox skipped block and
sends it as a follow-up message in the same chat. Gated by a new
`[transports.telegram.files].outbox_notify_skipped` config flag (default
true so the surface fires automatically on upgrade).
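A sketch of the notice formatting, assuming each skipped tuple is `(entry name, reason)`. `format_outbox_skipped_notice` is a hypothetical stand-in for `_format_outbox_skipped_notice`, the message wording is illustrative, and the `notify_skipped` parameter mirrors the `outbox_notify_skipped` flag with its default of true.

```python
from __future__ import annotations

# (entry name, reason); the exact tuple shape is an assumption.
SkippedEntry = tuple[str, str]


def format_outbox_skipped_notice(
    skipped: list[SkippedEntry], notify_skipped: bool = True
) -> str | None:
    """Compose the follow-up chat message, or None when there is nothing to send."""
    if not notify_skipped or not skipped:
        return None
    lines = ["📎 Outbox skipped (not delivered):"]
    lines.extend(f"• {name} ({reason})" for name, reason in skipped)
    return "\n".join(lines)
```

Returning `None` for both the disabled flag and the empty list keeps the caller to a single "send if not None" check.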

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The runner_bridge.py change shipped in the #547 axis 1 commit (same
file edited for both fixes); this commit adds the regression tests.

When threshold_reason == "pending_approval", emit a paced
`subprocess.approval_pending` INFO (max once per 30 min) instead of the
`progress_edits.stall_detected` WARN. The chat-side "⏳ Awaiting your
approval (N min)" message (#494-C) is unchanged; only the log-side
WARN is suppressed, so warn-filter dashboards and the
untether-issue-watcher daemon stop spamming on legitimate approval
waits. Also closes #533 as a duplicate (daemon-filed
subprocess.liveness_stall on nsd, same root cause).

Tests in test_exec_bridge.py assert:
- progress_edits.stall_detected WARN is NOT emitted when
  approval_pending is true
- subprocess.approval_pending INFO IS emitted with approval_pending=True
- The INFO fires at most once per 30-min window even with rapid ticks
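The three assertions above can be captured in a small pacing sketch. `ApprovalPendingPacer` and `stall_log_event` are hypothetical, `"no_progress"` in the usage is an invented non-approval reason, and only the event names and the 30-minute window come from the commit message.

```python
from __future__ import annotations

APPROVAL_PENDING_REFIRE_S = 30 * 60  # "max once per 30 min"


class ApprovalPendingPacer:
    """Allows one approval_pending INFO per refire window."""

    def __init__(self, refire_s: float = APPROVAL_PENDING_REFIRE_S) -> None:
        self.refire_s = refire_s
        self._last_emit: float | None = None

    def should_emit(self, now: float) -> bool:
        if self._last_emit is None or now - self._last_emit >= self.refire_s:
            self._last_emit = now
            return True
        return False


def stall_log_event(
    threshold_reason: str, pacer: ApprovalPendingPacer, now: float
) -> tuple[str, str] | None:
    """Pick (level, event) for a stall tick, or None when the INFO is paced out."""
    if threshold_reason == "pending_approval":
        if pacer.should_emit(now):
            return ("info", "subprocess.approval_pending")
        return None  # never falls back to the WARN while approval is pending
    return ("warning", "progress_edits.stall_detected")
```

The chat-side reminder is a separate path and is not affected by this log-side pacing.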

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…g.reload

The loop.py wiring (`_notify_reload_applied` + handle_reload changes)
shipped in the #528 commit (same file edited for both fixes); this
commit adds the formatting module, tests, and FAQ entry.

New module `src/untether/config_reload_notification.py` exposes three
message shapes (hot-reload-only / restart-required / partial-reload)
with literal "**No restart needed.**" / "**Restart required**"
headlines. Agents read these messages in next-turn context, and the
framing counters the trained-in reflex to run `systemctl restart` after
editing config.

Broadcast follows the same project-chat + admin-DM fan-out pattern as
`_notify_restart_required` (#318) so the affirmation reaches whoever
edited the file even in project-routed deployments.
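A sketch of how the three shapes might be selected. Only the two headline strings are taken from the commit message; `format_reload_message`, the body wording, and the partial-reload layout are assumptions.

```python
from __future__ import annotations


def format_reload_message(applied: list[str], restart_required: list[str]) -> str | None:
    """Pick one of three message shapes for a config reload."""
    if not applied and not restart_required:
        return None  # nothing changed: stay quiet
    if not restart_required:
        # Hot-reload-only shape.
        return "**No restart needed.**\nApplied live: " + ", ".join(applied)
    if not applied:
        # Restart-required shape.
        return "**Restart required**\nChanged restart-only keys: " + ", ".join(restart_required)
    # Partial-reload shape (layout assumed): what applied live, then what still needs a restart.
    return (
        "Applied live: " + ", ".join(applied)
        + "\n**Restart required** for: " + ", ".join(restart_required)
    )
```

Leading with the bold headline matters for the agent-facing goal: the first thing an agent sees in next-turn context is whether a restart is warranted at all.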

FAQ entry "Do I need to restart Untether after editing untether.toml?"
documents the hot-reload behaviour, restart-only key list, and the
agent-don't-restart guidance from #547 axis 1.

Axis 3 (drain self-restart heuristic) deferred to v0.35.4: the obvious
"detect the active session ran systemctl" heuristic is fragile and
readily misfires as a false positive on legitimate sibling-unit restarts.
The robust path needs a bigger refactor. Axes 1+2 together break the
recurring pattern at its source.

Closes #548.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Rapid taps (e.g. approving plans in two chats inside ~2s) saw
callback-answer latency escalate 6-10x: 1st click ~220ms HTTP baseline,
2nd/3rd clicks 1.4-2.9s. Root cause was that `answer_callback_query`
went through `enqueue_op(chat_id=None)` and stacked on the shared
`_next_at[None]` per-chat pacing bucket (private_interval=1.0s) even
though Telegram doesn't rate-limit callback-answers per chat; they're
keyed off callback-query-id.

Route `answer_callback_query` directly through `self._client.answer_callback_query`,
bypassing the outbox semaphore + per-chat pacing. Retry-after handling
preserved (one retry on TelegramRetryAfter, then fail-fast, which is better
than silent retry loops since the spinner expires after 30s anyway). Add
`queue_wait_ms=0.0` field to `callback.answered` instrumentation so
monitoring dashboards can confirm the bypass survived future refactors.

Regression test asserts the outbox.enqueue path is never reached during
answer_callback_query.
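An asyncio sketch of the direct path with exactly one retry. `RetryAfterError` is a local stand-in for the project's `TelegramRetryAfter`, assumed to carry a `retry_after` value in seconds. `CallbackClient` and `answer_callback_direct` are hypothetical. In the returned dict, `queue_wait_ms` comes from the commit message, while `ok` and `attempts` are illustrative additions.

```python
from __future__ import annotations

import asyncio
from typing import Any, Protocol


class RetryAfterError(Exception):
    """Stand-in for TelegramRetryAfter; assumed to expose `retry_after` seconds."""

    def __init__(self, retry_after: float) -> None:
        super().__init__(f"retry after {retry_after}s")
        self.retry_after = retry_after


class CallbackClient(Protocol):
    async def answer_callback_query(
        self, callback_query_id: str, text: str | None = None
    ) -> Any: ...


async def answer_callback_direct(
    client: CallbackClient, callback_query_id: str, text: str | None = None
) -> dict[str, Any]:
    """Answer a callback query without the outbox: no semaphore, no per-chat pacing.

    Makes the first attempt plus exactly one retry after a RetryAfterError,
    then gives up instead of looping.
    """
    for attempt in (1, 2):
        try:
            await client.answer_callback_query(callback_query_id, text)
            return {"ok": True, "attempts": attempt, "queue_wait_ms": 0.0}
        except RetryAfterError as exc:
            if attempt == 2:
                break  # fail fast: the client-side spinner expires after ~30s anyway
            await asyncio.sleep(exc.retry_after)
    return {"ok": False, "attempts": 2, "queue_wait_ms": 0.0}
```

Because nothing here awaits a shared pacing bucket, `queue_wait_ms` is 0.0 by construction. A dashboard seeing a non-zero value would suggest the call had been routed back through the outbox.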

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Bundles 9 issues from the recent /monitor campaign (lba-1 staging,
2026-05-13 through 2026-05-16 runs) + the daemon-filed #533 dup:

- #528 — Answered: echo no longer truncated to 100 chars
- #525 — dedup cancel.requested triple-fire
- #532 — consolidate per-engine setup.warning to one summary
- #547 axis 1 — preamble warns agents against systemctl restart
- #523 — recognise leading-dot slash-command typos (`.new` etc.)
- #524 — surface skipped outbox entries (directories etc.) in chat
- #526 — demote stall WARN to INFO during approval-pending (+ #533)
- #547 axes 2+3 + #548 — hot-reload Telegram notification
- #546 — bypass outbox for answer_callback_query (latency fix)

Plus housekeeping: closed #544 / #497 (verified already fixed in
rc16 / rc14), closed #531 (label + monitor TOML config drift fixed
out-of-tree). Axis 3 of #547 (drain self-restart heuristic) deferred
to v0.35.4 — needs a bigger refactor than rc19 wants. #527 (umbrella
predicate refactor) deferred to v0.35.4 per user decision.

2737 tests pass / 82.58% coverage / ruff format + check clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented May 17, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.


⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b819456c-f039-4409-bb28-06901b5b4952


Nathan Schram (nathanschram) merged commit b23d1f8 into dev May 17, 2026
21 checks passed
Nathan Schram (nathanschram) deleted the fix/rc19-monitor-campaign-bundle branch May 17, 2026 08:35
Nathan Schram (nathanschram) added a commit that referenced this pull request May 19, 2026
)

Both issues shipped rc19 fixes (PR #555) but /monitor audits on
2026-05-18 showed each regression still firing in production because
the rc19 patch landed in only one of two code paths.

#524 — outbox silently drops directory entries

rc19 surfaced 📎 Outbox skipped notices on the normal-completion path
in handle_message but missed two adjacent paths: the pre-auto-continue
delivery (subprocess 1 stuck-after-tool-result recovery) and the
run_ok=False failed-run branch. Both still silently dropped the agent's
intended deliverable.

This commit extracts the surfacing logic into _surface_outbox_skipped
in runner_bridge.py and wires it into both gap paths. On a failed run
the code still skips the actual file send (preserving the original
gating) but does a cheap scan_outbox() to collect skipped items and
surface them, so the user always learns what the agent intended to
ship. Honours the existing outbox_notify_skipped config flag and
filters the "..." overflow pseudo-entry from the user-facing block.
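The gating described above can be sketched with injected callables so the logic is testable in isolation. `deliver_outbox` and its parameters are hypothetical; the `(name, reason)` tuple shape and the overflow pseudo-entry being named `"..."` are assumptions based on the text.

```python
from __future__ import annotations

from typing import Callable

OVERFLOW_MARKER = "..."  # assumed name of the scanner's overflow pseudo-entry

Skipped = tuple[str, str]  # (entry name, reason); shape assumed


def deliver_outbox(
    run_ok: bool,
    scan: Callable[[], tuple[list[str], list[Skipped]]],
    send_files: Callable[[list[str]], None],
    notify_skipped_entries: Callable[[list[Skipped]], None],
    notify_skipped: bool = True,
) -> list[Skipped]:
    """Send files only on success, but surface skipped entries on every path."""
    files, skipped = scan()
    if run_ok:
        send_files(files)  # failed runs keep the original gating: nothing is sent
    # The overflow pseudo-entry is bookkeeping, not something the agent produced.
    visible = [entry for entry in skipped if entry[0] != OVERFLOW_MARKER]
    if notify_skipped and visible:
        notify_skipped_entries(visible)
    return visible
```

Routing every completion path (normal, pre-auto-continue, failed) through one helper like this is what keeps the three paths from drifting apart again.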

#526 — approval-pending stalls misclassified

rc19 demoted the bridge-side WARN (progress_edits.stall_detected) to a
paced INFO (subprocess.approval_pending) when _has_pending_approval()
returned true. The watchdog-side detector in runner.py (which emits
subprocess.liveness_stall and is the actual signal untether-issue-watcher
auto-files on) was untouched, so the daemon kept filing GitHub issues
on routine approval-pending sessions and the nsd audit (2026-05-18)
showed a user cancelling a productive 15-minute investigation because
the chat-side reassurance came too late (1800s threshold).

This commit:
- Adds _recent_event_is_control_request helper in runner.py — uses the
  stream's recent_events ring buffer as the approval-pending signal,
  consistent with the bridge's inline-keyboard predicate but accessible
  to runner-scope code.
- Plumbs the predicate into _watchdog_loop: when the last JSONL event
  is control_request, emit subprocess.approval_pending INFO instead of
  liveness_stall WARN. Skip the auto-kill branch entirely. Pace INFO
  emission to once per 30 min via the shared _APPROVAL_PENDING_REFIRE_S
  constant (now defined once in runner.py and imported by the bridge).
- Splits _STALL_THRESHOLD_APPROVAL into _STALL_THRESHOLD_APPROVAL_FIRST
  (600s) and the existing 1800s refire so the user gets a reassuring
  "tap a button above" chat message at 10 min on first occurrence,
  matching the watchdog's liveness threshold and avoiding the nsd-style
  early cancellation.
- Rewords the chat-side approval reminder copy to make the "tap a
  button above to proceed (no action needed otherwise)" affordance
  explicit, directly quoting the audit's recommended text.
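A sketch of the predicate, the watchdog demotion, and the two-tier reminder schedule. Several details are assumptions: buffered JSONL events are taken to be dicts with a `"type"` key, and the 600s liveness threshold is inferred from "matching the watchdog's liveness threshold". `recent_event_is_control_request`, `watchdog_decision` and `approval_reminder_due` are hypothetical stand-ins for the real helpers.

```python
from __future__ import annotations

from collections import deque
from typing import Any

LIVENESS_THRESHOLD_S = 600  # watchdog's 10-min liveness threshold (inferred)
STALL_THRESHOLD_APPROVAL_FIRST_S = 600  # first chat reminder
STALL_THRESHOLD_APPROVAL_S = 1800  # later reminders


def recent_event_is_control_request(recent_events: deque[dict[str, Any]]) -> bool:
    """True when the newest buffered JSONL event is a control_request."""
    return bool(recent_events) and recent_events[-1].get("type") == "control_request"


def watchdog_decision(recent_events: deque[dict[str, Any]], idle_s: float) -> str:
    """Classify a tick: 'ok', 'approval_pending' (INFO, no kill) or 'liveness_stall' (WARN)."""
    if idle_s < LIVENESS_THRESHOLD_S:
        return "ok"
    if recent_event_is_control_request(recent_events):
        return "approval_pending"  # waiting on the user, not stuck: skip auto-kill
    return "liveness_stall"


def approval_reminder_due(seconds_since_last: float, reminders_sent: int) -> bool:
    """First reminder after 10 min of waiting, later ones every 30 min."""
    threshold = (
        STALL_THRESHOLD_APPROVAL_FIRST_S if reminders_sent == 0 else STALL_THRESHOLD_APPROVAL_S
    )
    return seconds_since_last >= threshold
```

Keying the predicate on the newest buffered event means any later event (a tool result, a user message) automatically ends the approval-pending classification, so a genuinely stuck session still reaches the WARN.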

Tests cover both code paths:
- tests/test_outbox_delivery.py (existing) — format helper + settings
  default unchanged; no new file-level tests needed.
- tests/test_exec_bridge.py — failed-run surfacing, notify_skipped=false
  suppression, only-overflow filter, two-tier first-reminder threshold,
  reworded copy.
- tests/test_exec_runner.py — predicate truth-table coverage, watchdog
  demotion via integration with a fake codex script emitting
  control_request, watchdog WARN still fires when no control_request is recent.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
