release: v0.35.3 #561

Open
Nathan Schram (nathanschram) wants to merge 60 commits into master from dev

@nathanschram
Member

Summary

Promotes v0.35.3 from dev to master to trigger the PyPI publish.

This is the release gate — squash-merging this PR will:

  1. Fire auto-tag-on-master.yml → creates tag v0.35.3
  2. Fire release.yml → publishes wheel + sdist to PyPI via OIDC trusted publishing
  3. Create the GitHub Release with built artifacts

v0.35.3 milestone: 58 closed, 0 open. CHANGELOG dated 2026-05-20. All CI checks green on the source PR (#560).

Highlights

Full detail in CHANGELOG.md v0.35.3 (38 entries across breaking / changes / fixes / docs / tests).

Post-merge checklist

  • Wait for auto-tag-on-master.yml to push v0.35.3 tag (~1 min)
  • Wait for release.yml to publish wheel + sdist to PyPI (~3 min)
  • Verify PyPI page shows v0.35.3
  • Verify GitHub Release page for v0.35.3 exists
  • Run scripts/run-integration-tests.sh 0.35.3 --manual to write attestation marker
  • Run scripts/fleet-rollout.sh 0.35.3 to roll all 4 hosts (lba-1 + nsd + channelo + mac)
  • /ping smoke on each host's bot

Pre-flight already done

  • v0.35.3 milestone: 58 closed, 0 open
  • ✅ CHANGELOG.md complete + validated (scripts/validate_release.py: 3 passed, 0 failed)
  • ✅ pyproject.toml: 0.35.3
  • ✅ uv.lock synced
  • ✅ Source PR release: v0.35.3 #560 CI green: format / ruff / ty / pytest (3.12, 3.13, 3.14) / build / lockfile / install-test / pip-audit / bandit / CodeQL / docs / release-validation all pass
  • ✅ FAQ shape: 15 question-shaped H2s

🤖 Generated with Claude Code

Enables `[claude] extra_args = ["--chrome"]` so Untether-spawned Claude
Code sessions can opt into the Claude-in-Chrome extension — previously
the `mcp__claude-in-chrome__*` tool namespace was absent from Untether
sessions because Claude Code 2.1.x gates it behind `--chrome` /
`CLAUDE_CODE_ENABLE_CFC=1`, and Untether never passed the flag.

Mirrors `codex.extra_args` and `pi.extra_args`. Flags Untether manages
internally (`-p`, `--print`, `--output-format`, `--input-format`,
`--resume`/`-r`, `--continue`/`-c`, `--permission-mode`,
`--permission-prompt-tool`) are rejected at config-load with a
`ConfigError` so duplicate-argv surprises fail fast. User args land on
argv after the managed stream-json prelude and before resume / model /
effort / allowed-tools / permission flags, preserving the trailing
`-p <prompt>` (or stdin prompt under permission-mode) position.
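
A minimal sketch of the reserved-flag check described above. The helper name `validate_extra_args` and the local `ConfigError` class are hypothetical stand-ins; the actual validation in `build_runner` may be structured differently.

```python
# Flags listed in the commit message as managed by Untether itself.
RESERVED_FLAGS = frozenset({
    "-p", "--print", "--output-format", "--input-format",
    "--resume", "-r", "--continue", "-c",
    "--permission-mode", "--permission-prompt-tool",
})


class ConfigError(ValueError):
    """Hypothetical stand-in for Untether's ConfigError."""


def validate_extra_args(extra_args: list[str]) -> list[str]:
    """Reject reserved flags, including the `--flag=value` spelling."""
    for arg in extra_args:
        flag = arg.split("=", 1)[0]  # "--resume=abc" -> "--resume"
        if flag in RESERVED_FLAGS:
            raise ConfigError(f"extra_args may not contain reserved flag {flag!r}")
    return list(extra_args)
```

Failing at config load rather than at spawn time keeps the error next to the setting that caused it.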

- src/untether/runners/claude.py: add `extra_args` field, thread
  through `build_args`, parse + validate in `build_runner`
- tests/test_build_args.py: +8 tests (argv ordering, permission-mode
  argv, multi-flag order, build_runner parsing, reserved-flag rejection
  for individual flags and `key=value` prefixes)
- docs/reference/config.md, docs/reference/runners/claude/runner.md:
  document the new key, including reserved-flag list
- CHANGELOG.md: v0.35.3 (unreleased) entry

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* chore: staging 0.35.3rc1

Stage Claude extra_args (#407) for TestPyPI. This rc1 is the wheel the Mac
Untether instance will install to validate Claude-in-Chrome end-to-end per
docs/audits/2026-04-21-claude-in-chrome-test-plan.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* deps: bump lxml 6.0.2→6.1.0 and python-dotenv 1.2.1→1.2.2

pip-audit flagged two new transitive CVEs after PR #408 merged:
- lxml 6.0.2: CVE-2026-41066 (fix 6.1.0) — pulled via sulguk
- python-dotenv 1.2.1: CVE-2026-28684 (fix 1.2.2) — pulled via
  pydantic-settings

Both have clean fixes. Lockfile-only change; pyproject.toml constraints
unchanged. Local pip-audit clean after bump.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(security): Group 1A hygiene — 8 issues

Bundles eight low-risk security hygiene fixes for v0.35.3:

- #205 — split runner.start log so prompt content stays at DEBUG
- #206 — flip AMP dangerously_allow_all default to False (opt-in only)
- #207 — Pi session dir created with mode 0o700 + chmod existing
- #208 — extend stderr sanitisation to /Users, /private/var, /tmp,
        /var, /opt, /srv, /etc, /usr/local, /app, /workspace, /root
- #211 — replace stat()+read_bytes() with capped streaming read in
        anyio worker thread; closes TOCTOU window on /file get
- #213 — add OPENAI_PROJECT_KEY_RE for sk-proj-... redaction (the
        underscore/hyphen char set is not covered by the generic
        sk- pattern)
- #402 — bump Pygments 2.19.2 → 2.20.0 via uv lock (CVE-2026-4539
        ReDoS, transitive)
- #403 — replace 123456789:ABCdef… placeholder bot tokens with
        <BOT_ID>:<BOT_TOKEN> in non-test paths (onboarding.py,
        install.md, llms-full.txt); test fixtures kept as-is for
        GitHub-UI dismissal

All 2410 tests pass; ruff check + format clean; uv lock --check ok.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* ci: silence bandit B108 false positive + ignore CVE-2026-3219

- bandit B108 fires on the new /tmp/ regex pattern in
  _PATH_PATTERNS at runner.py — regex for stderr redaction, not
  a hardcoded temp-file write. Suppressed with `# nosec B108`
  matching the existing render.py:111 pattern.

- pip-audit now flags pip 26.0.1 → CVE-2026-3219 (advisory
  published recently; no fix available upstream). Added to the
  --ignore-vuln list alongside CVE-2026-4539 (pygments — kept
  for posterity even though #402 lockfile bump fixed it).

No source/test code changes. CI-only.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
)

`_daily_cost` is a module-level tuple updated via read-modify-write
in record_run_cost(). Concurrent finalize_run callers could both
read (today, X), both write (today, X + cost), and lose one run's
cost — letting a malicious or runaway concurrent workload defeat
the per-day budget gate.

Fix: wrap the RMW block in a `threading.Lock`. Critical section is
a single tuple assignment (sub-microsecond), so the lock is fine
under both async (cooperative) and threaded callers without an
async-signature ripple. get_daily_cost() also acquires the lock for
snapshot consistency.

Trade-off note: kept the function sync rather than pivoting to
`anyio.Lock` because that would require updating the 6 sync test
call sites and the 1 sync caller in runner_bridge.py — needless
churn for a sub-microsecond critical section.

Test: new ThreadPoolExecutor-driven fuzz test (16 workers, 200
calls) asserts the observed total equals n * unit_cost — would
fail under racing RMW.
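
The fix and its fuzz test can be sketched roughly as follows. This is an illustration under assumptions, not the project's code: the tuple layout `(date, total)` and the `fuzz` helper are hypothetical, and only `record_run_cost`, `get_daily_cost`, and `_daily_cost` are names taken from the message above.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from datetime import date

_lock = threading.Lock()
_daily_cost: tuple[date, float] = (date.today(), 0.0)  # assumed layout


def record_run_cost(cost: float) -> None:
    """Add one run's cost to today's total; the lock makes the RMW atomic."""
    global _daily_cost
    today = date.today()
    with _lock:
        day, total = _daily_cost
        # Start a fresh total when the day has rolled over.
        new_total = (total + cost) if day == today else cost
        _daily_cost = (today, new_total)


def get_daily_cost() -> float:
    """Return today's total, taking the lock for a consistent snapshot."""
    with _lock:
        day, total = _daily_cost
    return total if day == date.today() else 0.0


def fuzz(workers: int = 16, calls: int = 200, unit_cost: float = 0.25) -> float:
    """Call record_run_cost from many threads and return the observed total."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(lambda _: record_run_cost(unit_cost), range(calls)))
    return get_daily_cost()
```

A unit cost of 0.25 is used here because it sums exactly in binary floating point, so the equality check is not disturbed by rounding.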

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Brings the voice transcription API key into parity with `bot_token`
(closed #196): SecretStr masks the value in repr()/str()/tracebacks
and any accidental structlog serialisation. Access the raw value
via `.get_secret_value()` at the transport boundary.

Changes:
- `settings.py`: field type `NonEmptyStr | None` → `SecretStr | None`;
  new `_validate_voice_key_not_empty` validator preserves the prior
  no-empty-string contract by round-tripping `""`/whitespace to None
- `telegram/bridge.py`: `TelegramBridgeConfig.voice_transcription_api_key`
  annotation → `SecretStr | None`; `update_from()` unchanged (assigns
  SecretStr to SecretStr)
- `telegram/loop.py:2208`: sole unwrap point — call
  `.get_secret_value()` only when non-None before passing to
  `transcribe_voice` (OpenAI SDK still wants raw `str | None`)
- `telegram/voice.py`: unchanged; boundary stays at the loop caller

Tests:
- `test_settings.py`: new `test_voice_transcription_api_key_is_secret_str`
  (round-trip + repr/str masking), `_empty_string_normalised_to_none`
  (whitespace → None), `_default_none` (omitted → None)
- `test_bridge_config_reload.py`: hot-reload tests updated to use
  `.get_secret_value()` for value comparison
- `test_telegram_backend.py`: updated build_and_run assertion

All 2413 tests pass; ruff check + format clean.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Bump rc1 → rc2 to publish a fresh staging wheel that includes:

- #431 — Group 1A security hygiene (8 issues: #205, #206, #207, #208,
        #211, #213, #402, #403)
- #432 / #379: daily cost tracker race (threading.Lock guard)
- #433 / #378: voice_transcription_api_key SecretStr

rc1 (b6c6ad6) only carried #407 (Claude extra_args). rc2 supersedes
it on TestPyPI.

No CHANGELOG entry — per release-discipline.md §"Staging / rc
versions", entries batch into the stable bump.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ult (#409) (#435)

Self-installed Untether users in heterogeneous environments need to
thread credential-manager tokens (1Password, Doppler, Vault, Infisical,
…) into engine subprocesses. Today the env allowlist is hard-coded in
`utils/env_policy.py` so adding a single var requires a fork + release.

Changes:
- `utils/env_policy.py`:
  - new `is_allowed_with_extras(name, extra_exact=, extra_prefix=)`
  - `filtered_env()` extended with `extra_prefix=` parameter
  - new `log_user_extensions_once()` — module-level latch emits one
    `env_policy.user_extension` INFO per process when user extras are
    active, so the operator sees the addition in journalctl
- `settings.py` `SecuritySettings`:
  - `env_extra_allow: list[str]` (default `[]`)
  - `env_extra_prefix_allow: list[str]` (default `[]`)
  - field validators reject empty/whitespace and enforce `[A-Z_][A-Z0-9_]*`
- `runners/claude.py`, `runners/pi.py`:
  - new `_load_env_extras()` helper (best-effort settings load — never
    blocks a run on a config error, mirrors the env_audit pattern)
  - threads extras through `filtered_env()` + `log_user_extensions_once()`
- `utils/env_audit.py` `audit_proc_env()`:
  - new `user_extra_exact=`/`user_extra_prefix=` params so user-allowed
    names aren't false-flagged as `claude.env_audit.leaked_var`
- Built-in defaults: `BWS_ACCESS_TOKEN` promoted into `_EXACT_ALLOW`
  (Bitwarden Secrets Manager — common enough to ship as a default).
- Docs: `docs/reference/config.md` `[security]` table, CLAUDE.md
  features list.
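
A rough sketch of how the user extras could combine with the built-in allowlist. The built-in entries other than `BWS_ACCESS_TOKEN`, the `validate_names` helper, and the exact signatures are assumptions for illustration; the real lists live in `utils/env_policy.py`.

```python
import re

_NAME_RE = re.compile(r"[A-Z_][A-Z0-9_]*")
# Illustrative built-ins only (assumed); BWS_ACCESS_TOKEN is the one named above.
_EXACT_ALLOW = frozenset({"PATH", "HOME", "BWS_ACCESS_TOKEN"})
_PREFIX_ALLOW = ("LC_",)


def validate_names(names: list[str]) -> list[str]:
    """Reject empty, whitespace, or non-uppercase-identifier names."""
    for name in names:
        if not _NAME_RE.fullmatch(name):
            raise ValueError(f"invalid env var name: {name!r}")
    return names


def is_allowed_with_extras(name, extra_exact=(), extra_prefix=()):
    """True if the name matches a built-in or user-supplied rule."""
    if name in _EXACT_ALLOW or name in extra_exact:
        return True
    return name.startswith(_PREFIX_ALLOW + tuple(extra_prefix))


def filtered_env(env, extra_exact=(), extra_prefix=()):
    """Keep only the variables the policy allows."""
    return {
        key: value
        for key, value in env.items()
        if is_allowed_with_extras(key, extra_exact, extra_prefix)
    }
```

With `extra_exact=["OP_SERVICE_ACCOUNT_TOKEN"]` and `extra_prefix=["DOPPLER_"]` (example values, not defaults), those variables would pass through while anything else unlisted is still dropped.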

Tests: +19 across `tests/test_env_policy.py` (8 user-extension cases +
log latch), `tests/test_env_audit.py` (4 user-extras cases), and
`tests/test_settings.py` (7 round-trip + validator cases).

`uv run pytest` → 2432 passed, 2 skipped; ruff clean.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Bump rc2 → rc3 to publish a fresh staging wheel that includes #435.

Cumulative since rc1:
- #431 — Group 1A security hygiene (8 issues: #205, #206, #207, #208,
        #211, #213, #402, #403)
- #432 / #379: daily cost tracker race (threading.Lock guard)
- #433 / #378: voice_transcription_api_key SecretStr
- #435 / #409: user-extensible env allowlist + BWS_ACCESS_TOKEN default

No CHANGELOG entry — per release-discipline.md §"Staging / rc versions",
entries batch into the stable bump.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
) (#437)

#377 fix:
- `TelegramTransportSettings` gains `allow_any_user: bool = False` (opt-in
  escape hatch) and `_validate_allowed_user_ids_or_optin` model_validator
  raising ValueError when `allowed_user_ids == []` and `allow_any_user is
  False`. Pre-v0.35.3 the empty default silently shipped open bots —
  this is the v0.35.3 promotion of the warning to a hard ConfigError.
- `TelegramBridgeConfig` and `update_from()` carry the new field through
  hot-reload; backend constructs with the value.
- `telegram/loop.py` drops the per-update `security.no_allowed_users`
  warning (the validator now blocks startup) and emits a
  `security.allow_any_user` INFO event on every boot when
  `allow_any_user` is enabled.
- `config_migrations.py` `_migrate_legacy_telegram` relocates a top-level
  `allow_any_user` key into `[transports.telegram]` alongside `bot_token`
  / `chat_id` so legacy configs migrate cleanly.
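
The validation rule can be illustrated with a stdlib-only sketch. The source describes a `model_validator` on `TelegramTransportSettings`; the dataclass and `__post_init__` below are a hypothetical stand-in, and the error wording is invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class TelegramTransportSketch:
    """Stand-in for the real settings model (assumption, not the actual class)."""

    allowed_user_ids: list[int] = field(default_factory=list)
    allow_any_user: bool = False

    def __post_init__(self) -> None:
        # An empty allowlist is only accepted with the explicit escape hatch.
        if not self.allowed_user_ids and not self.allow_any_user:
            raise ValueError(
                "allowed_user_ids is empty; add user IDs or set allow_any_user = true"
            )
```

Setting both a populated allowlist and `allow_any_user = true` passes this check; how the bridge treats that combination is not covered by the sketch.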

CHANGELOG: backfilled `## v0.35.3 (unreleased)` with `### breaking`,
`### changes`, `### fixes` subsections covering all 13 issues that
shipped in rc1-rc4 (#205, #206, #207, #208, #211, #213, #377, #378,
#379, #402, #403, #407, #409). Per release-discipline.md the section
heading stays `(unreleased)` until the dev → master stable bump
populates the date.

Docs sweep:
- `docs/how-to/security.md` — required-allowlist wording, dev/demo
  opt-out callout, env_extra_allow / env_extra_prefix_allow extension
  guide, sk-proj redaction note, voice-key SecretStr note.
- `docs/how-to/troubleshooting.md` — new top-of-page section for
  `allowed_user_ids is empty` startup error.
- `docs/how-to/group-chat.md` — required wording.
- `docs/how-to/operations.md` — `env_extra_allow` + `allow_any_user`
  added to hot-reloadable list.
- `docs/tutorials/install.md` — `allowed_user_ids` added to all three
  example configs (assistant / workspace / handoff).
- `docs/reference/config.md` — `allow_any_user` row added,
  `allowed_user_ids` flipped to required, AMP `dangerously_allow_all`
  default note flipped to `false`.
- `docs/reference/runners/amp/runner.md` — flag is now optional;
  `dangerously_allow_all = false` example.
- `docs/reference/env-vars.md` — `BWS_ACCESS_TOKEN` default mention,
  `[security] env_extra_*` extension subsection.

Test fixtures:
- ~30 test fixtures across `test_settings`, `test_cli_*`,
  `test_projects_config`, `test_telegram_backend`,
  `test_bridge_config_reload`, `test_config_watch`,
  `test_config_path_env`, `test_onboarding*`, `test_runtime_loader`,
  `test_settings_contract`, `test_exec_bridge` patched to add
  `allow_any_user = true` (or `"allow_any_user": True`) where the
  fixture exercises non-allowlist behaviour. Tests that specifically
  cover #377 use `populated allowlist` cases.

#377 tests: 4 new in `test_settings.py` covering block + opt-out +
populated + both-set.

GitHub housekeeping (parallel to this commit, not in the diff):
- Closed #205, #206, #207, #208, #211, #213, #378, #379, #402, #403,
  #409 with implementation references. #377 closes via this PR's body.

Version: 0.35.3rc3 → 0.35.3rc4 (`pyproject.toml`, `uv.lock`).

Verification: 2436 tests pass / 2 skipped (~68s). Ruff check + format
clean. uv lock --check in sync.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace the literal "Basic dXNlcjpwYXNz" string in test_malformed_bearer_header
with a runtime-constructed header so GitHub's secret-scanner stops flagging it.
The test still asserts verify_auth rejects Basic auth — Untether webhooks only
accept Bearer + HMAC.
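
One way to build such a header at runtime, shown as a sketch; the helper name `basic_auth_header` is hypothetical and the test in the repository may construct the value differently.

```python
import base64


def basic_auth_header(user: str = "user", password: str = "pass") -> str:
    """Build a Basic auth header without a scanner-visible literal in source."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode("ascii")
    return f"Basic {token}"
```

The encoded token only exists at runtime, so a pattern-based scanner reading the file has nothing to match.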

The corresponding GitHub secret-scanning alert is a genuine false positive (a
test fixture, not a real credential) and will be dismissed in the GitHub UI as
"Used in tests / false positive".

Closes #404
…-approve safety (#380) (#442)

The 2026-04-20 audit (§ASI02) flagged
`ControlRewindFilesRequest` and `ControlMcpMessageRequest` as worth
a deeper look because rewind could in principle undo state that drove a
prior denial decision and MCP messages could carry tainted payloads
from a compromised MCP server.

Audit verdict: both are safe to auto-approve under the current upstream
Claude Code 2.1.x trust model.

- mcp_message: Untether is a transport pass-through; the message
  payload is opaque storage and is never inspected, executed, or
  rendered. A compromised MCP server is the inherent threat model of
  any MCP server, not specific to auto-approve. Routing this through
  Telegram approval would not block the payload.
- rewind_files: rewind is user-initiated upstream (the model cannot
  trigger it autonomously). Untether's per-session approval state
  (_PLAN_EXIT_APPROVED, _DISCUSS_APPROVED, _HANDLED_REQUESTS) is NOT
  mutated by rewind. Subsequent writes still pass through the standard
  ControlCanUseToolRequest gate.

No code change beyond:

1. Multi-paragraph safety-invariant comment in
   src/untether/runners/claude.py near _AUTO_APPROVE_TYPES, including
   the re-audit trigger (upstream semantic change to either subtype).
2. 3 regression-lock tests in
   tests/test_claude_control.py::TestAutoApproveSafetyInvariant
   that fail loudly if the auto-approve path starts inspecting payloads
   or coupling to per-session approval state.
3. Audit memo at docs/audits/2026-04-27-380-auto-approve-scope-review.md.

Closes #380

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… (#440)

The chat-level message-routing command (`all` / `mentions` / `clear`)
shared a name with the unrelated webhook/cron triggers system, which
became increasingly confusing as `/config` grew separate trigger pages.

User-visible changes:
- New `/listen` command (`all`/`mentions`/`clear`) replaces `/trigger`
- `/trigger` continues to work as a deprecated alias for one release
  cycle and prepends a one-line deprecation notice
- `/config → 📡 Listen` page replaces `📡 Trigger`
- Home page summary renders `Listen: all` instead of `Trigger: all`
- Bot command menu lists `listen` instead of `trigger`

Internal renames:
- `telegram/trigger_mode.py` → `telegram/listen_mode.py`
- `commands/trigger.py` → `commands/listen.py`
- Type `TriggerMode` → `ListenMode`
- Function `resolve_trigger_mode` → `resolve_listen_mode`
- ChatPrefsStore / TopicStateStore: new `*_listen_mode` methods;
  legacy `*_trigger_mode` methods preserved as one-release aliases

Storage: msgspec field is still named `trigger_mode` for backward
compat with existing `telegram_chat_prefs_state.json` /
`telegram_topics_state.json` files. No migration is needed.

Tests: full suite passes (2438 passed, 2 skipped). Two new tests in
test_telegram_agent_trigger_commands.py cover the deprecation prefix
and clean `/listen` output. test_config_command toast expectations
updated to "Listen: ...".

Closes #297

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a global pause control for the trigger system (crons + webhooks)
accessible via /config in Telegram. During pause:
- Cron scheduler skips its tick — run_once crons are NOT consumed and
  fire on the next matching tick after resume
- Webhook server returns 503 (with Retry-After: 60) instead of
  dispatching, so external monitors can distinguish paused-but-up from
  healthy. Returns 404 for unknown paths as before
- /health endpoint surfaces {"status":"paused","paused":true}

Pause is in-memory only — restart auto-resumes. This is the safe
default per the issue's recommendation, and mirrors /at scheduler
behaviour.
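
The webhook behaviour during pause can be sketched as a small decision function. The function name, the `(status, headers)` return shape, and the 202 success status are assumptions for illustration; only the 404, 503, and `Retry-After: 60` values come from the description above.

```python
def webhook_response(path: str, known_paths: set[str], paused: bool):
    """Return (status, headers) for an incoming webhook request."""
    if path not in known_paths:
        return 404, {}  # unknown paths are unaffected by pause
    if paused:
        # Tell external monitors the server is up but not dispatching.
        return 503, {"Retry-After": "60"}
    return 202, {}  # assumed success status for a dispatched webhook
```

Checking the path before the pause flag keeps the 404 behaviour identical whether or not triggers are paused.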

UI:
- New /config home-page row "⏸ Pause triggers" / "▶️ Resume triggers"
  appears only when triggers are configured
- New dedicated "📡 Triggers" page (config:tg) showing state + counts
  with Pause/Resume button; gracefully handles no-trigger-manager
  and zero-config cases
- /ping shows "⏸ triggers paused: … (suspended)" indicator while paused

Tests: 15 new tests across test_trigger_manager.py (8 pause toggle
behaviours including 503 webhook check), test_ping_command.py
(2 paused/resumed indicators), and test_config_command.py
(5 TestTriggersPage covering unavailable/empty/pause/resume/toast).
Full suite: 2445 passed, 2 skipped.

Closes #294

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…fication (#438) (#443)

Adds [watchdog] claude_stream_idle_timeout_ms (default 300_000 ms,
range 30 s – 30 min) so deployments hitting upstream Anthropic API
stalls on long opus 4.7 1M plan-mode generations can raise the
watchdog without forking the codebase. Untether's Claude runner reads
the value via setdefault — shell-set CLAUDE_STREAM_IDLE_TIMEOUT_MS
still wins. Settings load failure falls back to the hardcoded 300_000
default with a debug log entry.

Type-A vs Type-B classification on the failure message:

- Type A — mid-generation stall (num_turns >= 1 && duration_api_ms > 0).
  Often legitimate long opus reasoning that exceeded the watchdog.
  Inline hint suggests raising the new config knob.
- Type B — cold-start zero-byte stall (num_turns <= 1 && duration_api_ms
  == 0). Upstream API outage — raising the timeout will NOT help.
  Inline message says so explicitly.
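
The two conditions translate directly into a classifier. The function name and the `"unknown"` fallback label are hypothetical; the predicates are the ones stated above.

```python
def classify_stall(num_turns: int, duration_api_ms: int) -> str:
    """Classify a stream idle timeout as Type A, Type B, or unknown."""
    if num_turns >= 1 and duration_api_ms > 0:
        return "A"  # mid-generation stall: raising the timeout may help
    if num_turns <= 1 and duration_api_ms == 0:
        return "B"  # cold-start zero-byte stall: raising the timeout will not help
    return "unknown"
```

The `num_turns` ranges overlap at 1, so `duration_api_ms` is what separates the two types in that case.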

Auto-retry on Stream idle timeout deferred to v0.35.4 pending upstream
Anthropic stabilisation (8 duplicate api:anthropic issues filed
2026-04-17→26 across macOS/Windows/web/WSL).

Tests: 5 new tests in test_claude_runner.py. Full suite 2460 passed,
2 skipped. Lint clean.

Closes #438

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…410) (#444)

Promotes claude_usage.schema_mismatch from one-shot per-process to
per-call counter so the issue-watcher catches ongoing API-shape drift
instead of just the first hit. Structured event carries a cumulative
`count` field; new runner_bridge.get_usage_schema_mismatch_count()
exposes the counter for the debug page.

UsageCacheStats added to utils/usage_cache.py tracking last successful
fetch wall time, cache age, last-error class+message; populated on
every fetch path including stale-while-error fallbacks.

_read_token_expiry_ms() added to telegram/commands/usage.py so the
OAuth token expiry can be surfaced without raising on missing
credentials (best-effort: returns None on any read failure).

/usage debug appends a 🔧 debug block (HTML) showing:
- last successful fetch (UTC ISO + age + fresh/stale label)
- last error (class + message, 120-char truncated)
- OAuth token expiry (with hh/mm remaining)
- cumulative schema-mismatch counter

Operator-facing signal so the next time the subscription footer goes
silent, the root cause is visible without grepping journalctl.

Tests: 5 new in test_usage_cache.py::TestCacheStatsObservability;
1 in test_command_engine_gates.py::TestUsageDebugMode; existing
test_schema_mismatch_warning_fires_once repurposed to assert per-call
firing with cumulative counts. Full suite: 2465 passed, 2 skipped.

Closes #410

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…n + last-fired history + /stats breakdown (#271) (#445)

Tier 2: `/config → ⏰ Triggers` now lists every cron and webhook configured
for the current chat. Crons render as `id · describe_cron(...) · proj · eng ·
last X` and webhooks as `id · path · auth · proj · eng · last X`. Lists are
scoped via `crons_for_chat`/`webhooks_for_chat` with the bridge default_chat_id
fallback, capped at 10 entries with an overflow marker, and omitted when the
chat has no triggers (pause/resume controls remain regardless).

Tier 3: new `triggers/history.py` JSON store at
`<config_path>.with_name("triggers_history.json")`. Records `time.time()`
after every successful cron dispatch (cron.py:130) and webhook dispatch
(dispatcher.py:dispatch_webhook + dispatch_action). Recording is best-effort:
an OSError on write logs `triggers.history.write_failed` and is swallowed.
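
A minimal sketch of a best-effort recorder in this style. The flat `{trigger_id: timestamp}` file layout and the `record_fired` helper are assumptions (the tests mention a version field, which is omitted here), and stdlib `logging` stands in for structlog.

```python
import json
import logging
import time
from pathlib import Path

log = logging.getLogger("triggers.history")


def history_path(config_path: Path) -> Path:
    """Place the history file next to the config file."""
    return config_path.with_name("triggers_history.json")


def record_fired(config_path: Path, trigger_id: str) -> bool:
    """Record the last-fired time; never raise on I/O problems."""
    path = history_path(config_path)
    try:
        data = json.loads(path.read_text()) if path.exists() else {}
    except (OSError, ValueError):
        data = {}  # unreadable or corrupt file: start over
    if not isinstance(data, dict):
        data = {}
    data[trigger_id] = time.time()
    try:
        path.write_text(json.dumps(data))
    except OSError:
        log.warning("triggers.history.write_failed")
        return False
    return True
```

Returning a bool instead of raising means a failed history write cannot turn a successful dispatch into an error.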

`/stats` appends `(N triggered, M manual)` per engine line and on the totals
row when at least one count > 0. `DayBucket`/`AggregatedStats` carry additive
`triggered_count`/`manual_count` with `.get(..., 0)` fallbacks so existing
stats.json files load cleanly. `runner_bridge.handle_message` resolves the
split via `triggered=bool(context and context.trigger_source)`.

28 new tests: 10 in test_triggers_history.py (round-trip, corrupt JSON,
version mismatch, persistence), 7 in test_session_stats.py (triggered/manual
split, back-compat with old format), 3 in test_stats_command.py (breakdown
present/omitted/totals), 7 in test_config_command.py::TestTriggersPagePerChat
(crons listed, webhooks listed, chat filtering, default_chat_id fallback,
last-fired rendering, overflow cap), 2 in test_trigger_cron.py (cron firing
records last_fired + history failure resilience), 2 in
test_trigger_dispatcher.py (webhook records last_fired + history failure
resilience). Full suite: 2496 passed, coverage 82.18%.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…) (#446)

After a Claude bidirectional session emits `result`, the CLI keeps stdin
open so multi-turn sessions don't re-spawn. In practice this leaves a
400 MB RSS subprocess + ~200 TCP sockets idling for 30+ minutes between
prompts, and from the user's perspective the session looks "stuck" —
final message rendered, no further indication of state.

Option D hybrid:
- New `[watchdog].post_result_idle_enabled = true` (kill switch) and
  `[watchdog].post_result_idle_timeout = 600.0` (30s–1h) in settings.
- `ClaudeStreamState.result_received_at` armed by `translate_claude_event`
  on every `StreamResultMessage` (re-armed per turn so multi-turn works).
- New `ClaudeRunner._post_result_idle_watchdog` task runs in the existing
  `run_impl` task group when `use_control_channel` is True. Polls the
  timer; when the deadline passes, calls `this_proc_stdin.aclose()`
  (same mechanism as the normal-flow exit at line 2412, just earlier).
  CLI hits stdin EOF and exits gracefully (rc=0).

- Auto-continue safety: the existing `_should_auto_continue` gate
  excludes `last_event_type == "result"` (locked by
  `test_skips_result_event_type` in test_exec_bridge.py), so the clean
  rc=0 exit will not phantom-resume the session.
- Approval-state guard: if `_REQUEST_TO_SESSION` or `_PENDING_ASK_REQUESTS`
  has live entries for this session, defer the close (re-arm the timer)
  to avoid orphaning a button-click control_response in flight.

UX hint #1: a supplementary `StartedEvent` with `meta={"complete":
"✓ turn complete"}` is emitted alongside `CompletedEvent` on successful
results (the supported pattern for late-arriving meta per
runner-development.md). `markdown.format_meta_line` renders it in the
footer so the user sees the turn boundary immediately. Errored results
don't get the hint (no false "complete" tag on a failure).

Two structlog events for ops:
- `claude.post_result_idle.deferred` — approval guard suppressed close
- `claude.post_result_idle.closing_stdin` — deadline passed, stdin closed

7 new tests in test_claude_runner.py: result-event arms timer, emits
turn-complete meta, skips meta on error, watchdog fires when clean,
watchdog defers when pending approval, format_meta_line renders the hint
when present and omits it when absent. Full suite: 2503 passed.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…#447)

Closes #269. The four settings groups in the issue had different states:
- [footer]: already loads fresh per-message via _load_footer_settings (no work)
- [cost]: already loads fresh per-call inside _check_cost_budget (no work)
- [watchdog]: already loads fresh per-run via _load_watchdog_settings at the
  top of handle_message (no work — verified, applies on next run)
- [progress]: was baked in at startup via MarkdownFormatter constructor +
  ExecBridgeConfig.min_render_interval — this PR closes that gap

Changes:
- markdown.py: new MarkdownFormatter.refresh_from(progress_settings) updates
  max_actions + verbosity from a fresh ProgressSettings snapshot. Tolerates
  missing/invalid attributes (clamps negative max_actions to 0; ignores
  unknown verbosity values).
- telegram/bridge.py: new TelegramPresenter.refresh_progress_settings()
  delegates to formatter.refresh_from.
- runner_bridge.py: new _load_progress_settings() sibling of
  _load_footer_settings / _load_watchdog_settings; handle_message reads it
  fresh per-run, calls cfg.presenter.refresh_progress_settings(...) via
  duck-typed getattr (Presenter is a Protocol, so we don't add to it), and
  threads progress_cfg.min_render_interval into each ProgressEdits instance
  instead of the startup snapshot. Per-chat /verbose overrides downstream
  of _resolve_presenter reconstruct from the refreshed defaults.
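
A sketch of the tolerant refresh described for `MarkdownFormatter.refresh_from`. The class below is a stand-in, and the set of valid verbosity values is an assumption made for the example.

```python
class MarkdownFormatterSketch:
    """Stand-in for MarkdownFormatter; only the refresh logic is shown."""

    VALID_VERBOSITY = ("compact", "verbose")  # assumed values

    def __init__(self, max_actions: int = 5, verbosity: str = "compact") -> None:
        self.max_actions = max_actions
        self.verbosity = verbosity

    def refresh_from(self, progress_settings: object) -> None:
        """Update from a fresh settings snapshot, ignoring bad or missing fields."""
        max_actions = getattr(progress_settings, "max_actions", None)
        if isinstance(max_actions, int):
            self.max_actions = max(0, max_actions)  # clamp negatives to 0
        verbosity = getattr(progress_settings, "verbosity", None)
        if verbosity in self.VALID_VERBOSITY:
            self.verbosity = verbosity  # unknown values keep the old setting
```

Because each field is read with `getattr` and a default, a partially populated settings object updates what it can and leaves the rest alone.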

Out of scope (entry-point limitation): engine + command registration still
require pipx upgrade / restart. Documented on the issue.

8 new tests in tests/test_meta_line.py: TestMarkdownFormatterRefresh covers
max_actions update, verbosity update, negative clamp, invalid-verbosity
rejection, missing-attribute tolerance, presenter delegation. Plus
_load_progress_settings defaults / error-fallback. Full suite: 2511 passed.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
All 9 v0.35.3 Group 2 issues have now landed on dev:

- #404 — secret-scanning alert (PR #439)
- #297 — /trigger → /listen rename + alias (PR #440)
- #294 — master trigger pause/resume toggle (PR #441)
- #380 — auto-approve scope review (PR #442)
- #438 — claude_stream_idle_timeout_ms + Type-A/B classification (PR #443)
- #410 — subscription usage observability + /usage debug (PR #444)
- #271 — trigger visibility Tier 2 + Tier 3 (PR #445)
- #333 — Claude post-result idle timeout + ✓ turn complete UX hint (PR #446)
- #269 — hot-reload [progress] settings (PR #447)

Bumps to TestPyPI for staging via @hetz_lba1_bot once integration tests
U1-U7 pass against @untether_dev_bot.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Bumps [dependabot/fetch-metadata](https://github.com/dependabot/fetch-metadata) from 2.5.0 to 3.1.0.
- [Release notes](https://github.com/dependabot/fetch-metadata/releases)
- [Commits](dependabot/fetch-metadata@21025c7...25dd0e3)

---
updated-dependencies:
- dependency-name: dependabot/fetch-metadata
  dependency-version: 3.1.0
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [astral-sh/setup-uv](https://github.com/astral-sh/setup-uv) from 7.4.0 to 8.1.0.
- [Release notes](https://github.com/astral-sh/setup-uv/releases)
- [Commits](astral-sh/setup-uv@6ee6290...0880764)

---
updated-dependencies:
- dependency-name: astral-sh/setup-uv
  dependency-version: 8.1.0
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [actions/upload-artifact](https://github.com/actions/upload-artifact) from 7.0.0 to 7.0.1.
- [Release notes](https://github.com/actions/upload-artifact/releases)
- [Commits](actions/upload-artifact@bbbca2d...043fb46)

---
updated-dependencies:
- dependency-name: actions/upload-artifact
  dependency-version: 7.0.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [github/codeql-action](https://github.com/github/codeql-action) from 3.32.6 to 4.35.2.
- [Release notes](https://github.com/github/codeql-action/releases)
- [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md)
- [Commits](github/codeql-action@820e316...95e58e9)

---
updated-dependencies:
- dependency-name: github/codeql-action
  dependency-version: 4.35.2
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
#471 + #271) (#472)

* fix(at): stamp at:<token> trigger_source on /at-scheduled runs (#271)

Mirror the cron:<id> / webhook:<id> footer markers added in #271 (rc4)
and Tier 2/3 (rc5) so /at-scheduled runs also show provenance.

at_scheduler.schedule_delayed_run wraps the captured chat context (or a
fresh RunContext when the chat is unmapped) with trigger_source =
"at:<token>" via dataclasses.replace. runner_bridge.handle_message's
icon-prefix tuple extends from ("cron:",) to ("cron:", "at:") so the
alarm-clock icon renders for both — semantically /at is a one-shot
delayed cron. record_run's existing triggered=bool(context and
context.trigger_source) gate picks up /at runs in the /stats
triggered/manual breakdown automatically.

Tests: 1 new in test_at_command.py
(test_handle_stamps_trigger_source_on_mapped_chat); the existing
test_handle_captures_global_default_when_unmapped extended to assert
the trigger_source-only RunContext path; existing
test_run_delayed_forwards_captured_context_and_engine updated since
the captured context is no longer reference-equal to the original.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(gemini): pass --skip-trust by default for headless runs (#471)

Gemini CLI rejects runs from any directory not in
~/.gemini/trustedFolders.json — even with --approval-mode yolo — and
there is no interactive prompt path in headless usage, so projects
outside the trust list silently failed before any agent output.

Untether already runs Gemini with yolo for the same "always headless"
reason, so passing --skip-trust extends the same precedent.
GeminiRunner.skip_trust (default True) is the runtime switch; opt out
per deployment with [gemini] skip_trust = false in untether.toml
(security-conscious operators who want Gemini's project-local
extension/MCP trust gate enforced).
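
As a rough illustration of the switch, assuming the runner assembles its argv from a list: the class below is a hypothetical reduction of `GeminiRunner`, and only the `--approval-mode yolo` and `--skip-trust` flags come from this commit.

```python
from dataclasses import dataclass


@dataclass
class GeminiRunner:
    # Hypothetical reduction of the runner; only the switch matters here.
    skip_trust: bool = True

    def build_args(self) -> list[str]:
        """Build the headless flag list; prompt handling is omitted."""
        args = ["gemini", "--approval-mode", "yolo"]
        if self.skip_trust:
            # Default path: bypass the trustedFolders.json gate.
            args.append("--skip-trust")
        return args
```

With `[gemini] skip_trust = false` the flag is simply left off, so Gemini's own trust gate applies again.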

Bump to 0.35.3rc6 for staging.

Tests: 2 new in test_build_args.py::TestGeminiBuildArgs
(test_skip_trust_default_includes_flag,
test_skip_trust_opt_out_omits_flag).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…sing feature coverage (#473)

Audited every issue in the v0.35.3 milestone (26 issues) against the
full repo documentation surface and closed the gaps. Reference issues
covered: #205, #206, #207, #208, #211, #213, #269, #271, #294, #297,
#333, #377, #378, #379, #380, #402, #403, #407, #409, #410, #438, #471.

CHANGELOG.md
- Added missing entry for #297 (/trigger → /listen rename) under
  ### changes. The other "milestone" issues (#224, #228, #239) were
  closed against v0.35.3 for tracking only — their fixes shipped in
  v0.35.0/v0.35.1rc2; per the repo's "no retroactive edits to prior
  sections" rule, they remain undocumented in CHANGELOG (closure
  comments cite the actual versions).

/trigger → /listen rename sweep (#297)
- README.md: command table row, group-chat link
- docs/reference/commands-and-directives.md: command row
- docs/reference/transports/telegram.md: command list + admin note
- docs/reference/integration-testing.md: O3 + Q12 test rows
- docs/explanation/routing-and-sessions.md: pre-routing filter section

Runner specs
- gemini/runner.md: --skip-trust default + opt-out via [gemini]
  skip_trust = false (#471)
- claude/runner.md: post-result idle watchdog + "✓ turn complete"
  meta hint (#333), claude_stream_idle_timeout_ms config + Type-A/B
  classifier (#438)

How-to guides
- schedule-tasks.md: trigger provenance + history + /stats
  triggered/manual breakdown (#271 Tier 3); master pause/resume
  toggle (#294)
- inline-settings.md: new Triggers page (#271 Tier 2 + #294)
- troubleshooting.md: Type-A/B stream idle classification (#438);
  post-result idle watchdog + ✓ turn complete (#333)
- security.md: extended path-redaction coverage (#208); Pi session
  dirs 0o700 (#207)
- subscription-usage.md: /usage debug section (#410)
- operations.md: pause status surfacing in /health (#294); /usage
  debug cross-link (#410); expanded hot-reload list to include
  [progress] (#269), [watchdog] (#333, #438), [footer], [cost]

README.md
- Scheduled tasks bullet: pause/resume toggle (#294); footer
  provenance markers (#271 Tier 3); /stats triggered/manual split
- Inline settings bullet: 📡 Triggers page (#271, #294)
- Commands table: /usage debug (#410); /listen (#297); /config
  Triggers page row

Verified clean:
- python3 scripts/validate_release.py (rc6 pre-release)
- grep -rnE "/trigger\\b" docs/ README.md returns zero non-deprecation
  hits in production docs (test plans and historical results retain
  /trigger by design)
- Cross-references resolve to existing anchors

Plan: ~/.claude/plans/untether-you-are-running-rustling-shannon.md
(also staged in .untether-outbox/v0.35.3-doc-audit-plan.md)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Bumps [github/codeql-action](https://github.com/github/codeql-action) from 4.35.2 to 4.35.3.
- [Release notes](https://github.com/github/codeql-action/releases)
- [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md)
- [Commits](github/codeql-action@95e58e9...e46ed2c)

---
updated-dependencies:
- dependency-name: github/codeql-action
  dependency-version: 4.35.3
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…) + local-context protection (#479)

* fix(security): claude runner.start no longer leaks prompt at INFO (#478)

The Claude runner's run_impl override at src/untether/runners/claude.py
had its own duplicate runner.start log call that was missed when the
base runner was fixed for #205. Every Claude session emitted
`prompt=prompt[:100] + "…"` at INFO level — leaking the first ~100
chars of the Untether preamble (boilerplate, but spec-violating).
Discovered during the v0.35.3 follow-up E2E pass.

Fix mirrors the base runner impl:
- INFO `runner.start`: only `engine`, `resume`, `prompt_len`, `args`
- DEBUG `runner.start_prompt`: preview of first 100 chars (opt-in)

Argv redaction also tightened:
- env -i KEY=VAL pairs redacted via redact_env_i_args (was already
  applied at subprocess.spawn but not at runner.start, so e.g.
  BWS_ACCESS_TOKEN, GEMINI_API_KEY values would land in INFO logs)
- Legacy-mode (no permission_mode) `-- <prompt>` tail collapsed to
  `-- <prompt redacted>` so prompt content never reaches INFO under
  any code path

2 new regression tests cover both control-channel and legacy modes:
- test_runner_start_does_not_log_prompt_at_info
- test_runner_start_redacts_legacy_mode_prompt_in_args

Closes #478.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(faq): add docs/faq/index.md for help-centre FAQPage schema (#477)

Marketing-site infra (FAQPage extractor on
`feature/help-seo-geo-items-1-4` in littlebearapps/littlebearapps.com)
already extracts question-shaped H2s and emits Schema.org FAQPage
JSON-LD on any help article with `category: faq` frontmatter or ≥3
question-shaped H2s. No tool currently has a dedicated FAQ scaffold;
this commit closes the loop for Untether.

The new file lives at docs/faq/index.md (Diátaxis-aligned scaffold —
plain title + description frontmatter, marketing-site sync injects
category/tool/dates). 12 question-shaped H2s exceed the 7-minimum
acceptance criterion:

  1. What is Untether?
  2. How do I install Untether?
  3. Which AI coding agents does Untether support?
  4. Do I need an API key to use Untether?
  5. Where does my code and data go?
  6. How do I approve tool calls from my phone?
  7. What happens if my agent crashes or my phone loses signal mid-run?
  8. How do I keep agents from spending too much money?
  9. Can I send voice notes instead of typing?
  10. How do I update Untether?
  11. How do I uninstall Untether?
  12. Where can I get help or report a bug?

Each answer is a complete paragraph (no TODO / placeholder), sourced
from README + real common-channel topics. Cross-links to existing
help-guide URLs preserve nav chains.

Coordinated mapping in `littlebearapps/littlebearapps.com`
(`scripts/docs-sync.config.ts` → add `untether` → `docs/faq` →
`category: faq`) is a separate one-line PR per the issue's
"Coordinated mapping" section. Once both land, the next nightly sync
surfaces the FAQ at <https://untether.littlebearapps.com/help/untether/faq/>
with a visible `<script type="application/ld+json">` FAQPage block,
unlocking AI-citation surface (ChatGPT, Perplexity, Google AI
Overviews) and SERP rich-snippet eligibility.

Closes #477.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* ctx: protect docs/faq/index.md from deletion + register in local docs (#477)

The FAQ doc is part of the marketing-site FAQPage Schema.org pipeline
(littlebearapps/littlebearapps.com:scripts/docs-sync.config.ts → untether
→ category: faq). Removing it silently breaks the docs-sync mapping and
regresses AI-citation surface. This commit hardens local Claude Code
context so the file:

  - cannot be silently deleted, moved, or truncated by accident
  - has explicit guidance on when/how to update it during releases
  - is registered in CLAUDE.md so future contributors know it exists

Changes:

* `.claude/hooks/help-faq-protect.sh` (new) — PreToolUse Bash hook
  blocking `rm`, `git rm`, `mv`-away, and shell `>` truncation
  targeting `docs/faq/index.md`. Edits via Edit/Write/append `>>` are
  intentionally allowed — the FAQ is meant to evolve. Smoke-tested
  with 7 synthetic inputs covering both deny and allow paths.

* `.claude/hooks/release-guard-protect.sh` (updated) — also protects
  `help-faq-protect.sh` from being weakened or removed via Edit/Write.

* `.claude/hooks.json` (updated) —
  - registers help-faq-protect.sh under PreToolUse Bash
  - extends the existing Edit/Write context-prompt with a docs/faq/*
    branch (HELP-FAQ CONTEXT) reminding contributors of question-shape
    rules and the maintain-as-features-land cadence
  - extends the version-bump-checklist (PostToolUse) with an FAQ
    touch-up step

* `.claude/rules/help-faq.md` (new) — auto-loads when editing
  `docs/faq/**`. Documents the hard rules (NEVER delete; MUST update
  with feature changes), soft conventions (question-shaped H2, ≥7
  Q/A, real behaviour not aspirational), and the release-cadence
  workflow.

* `.claude/rules/release-discipline.md` (updated) — adds an FAQ
  touch-up step to the version-bump checklist.

* `CLAUDE.md` (updated) —
  - new "Help-centre FAQ" section after "Documentation screenshots"
    explaining the file's role and the no-deletion rule
  - Hooks table registers `help-faq-protect`
  - Rules table registers `help-faq.md`

Refs #477.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Bumps pre-release version so TestPyPI can publish a fresh wheel that
includes the v0.35.3 follow-up bundle merged via PR #479:
  - fix(security): claude runner.start no longer leaks prompt at INFO (#478)
  - docs(faq): add docs/faq/index.md for help-centre FAQPage schema (#477)
  - ctx: protect docs/faq/index.md from deletion + register in local docs (#477)

The rc6 wheel on TestPyPI predates this work — without the bump the
publish step skips ("File already exists") and the staging upgrade path
keeps installing the older wheel.

Per release-discipline.md, pre-release versions don't require a
CHANGELOG entry (validate_release.py skips them) and aren't tagged
(auto-tag-on-master.yml skips pre-releases).

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
#481) (#484)

Two coordinated fixes that share the same `progress_edits.stall_detected`
decision branch in `runner_bridge.py`. Reproduction: a 45-min Claude
session on staging looked hung — 10-min Cloudflare deploy poll + 14-min
approval-keyboard wait kept the chat silent, then surfaced unhelpful
stall warnings during legitimate waits.

#470 — Post-result stall suppression + closing message
- New `progress_edits.stall_post_result_suppressed` info log when
  `stream.last_event_type == "result"` and the post-result idle
  watchdog (#333) is the legitimate owner of the silence
- Auto-cancel `_STALL_MAX_WARNINGS` arm gated by the same boolean —
  no more SIGTERM'ing sessions that are about to gracefully close
- Watchdog stamps `ClaudeStreamState.post_result_closed_at` before
  `aclose()`; bridge's heartbeat tick sends a one-shot
  `✓ turn complete · session closed after Nm idle` message
  (idempotency via `post_result_closing_sent` flag)

#481 — Long-tool visibility + suppression matrix
- New `[progress] heartbeat_interval` (default 30 s) drives a tick
  inside `_stall_monitor` that bumps `event_seq` whenever any open
  action is older than 60 s, forcing a re-render with a fresh
  elapsed-time tail
- `format_action_line` gained `elapsed_seconds` kwarg; non-completed
  actions > 60 s render as `▸ Bash · 3m 47s · npm run build`,
  regardless of `/verbose` toggle
- `format_verbose_detail` gained `BashOutput` (renders last line of
  `result_preview` so polling loops show live stdout), `KillShell`,
  `ScheduleWakeup` (countdown + reason), and `Monitor` (countdown)
  branches
- `ActionState` gained `started_at` / `last_update_at` wall-clock
  fields populated from the new `ProgressTracker.clock` callable
- `MarkdownFormatter.render_progress_parts` / `MarkdownPresenter` /
  `Presenter` Protocol / `TelegramPresenter` all gained `now: float | None`
  threaded from `runner_bridge._run_loop`
- New `format_duration` / `format_countdown` helpers
- Five new suppression branches in `_stall_monitor`, gated by
  `not frozen_escalate` so genuinely-frozen sessions still warn:
  - stall_post_result_suppressed (#470)
  - stall_schedule_wakeup_suppressed (engine_state.live_wakeups)
  - stall_monitor_active_suppressed (engine_state.live_monitors)
  - stall_bash_grace_suppressed (new `[watchdog] bash_grace_seconds`,
    default 60 s)
  - stall_long_bash_suppressed (BashOutput within stall_threshold/2)

Bonus fix: `_register_background_handle` now reads `delaySeconds` first
(per upstream Claude Code schema, #289) instead of only `delay_ms` —
production deadlines were always 0.0, breaking countdown rendering.
Backward-compat fallback to `delay_ms`/`timeout_ms` preserved.
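
The lookup order reads roughly like this sketch. `resolve_delay_seconds` is a hypothetical helper name; only the three key names and their precedence come from the commit:

```python
def resolve_delay_seconds(tool_input: dict) -> float:
    """Prefer delaySeconds; fall back to the legacy millisecond keys."""
    if "delaySeconds" in tool_input:
        return float(tool_input["delaySeconds"])
    for key in ("delay_ms", "timeout_ms"):
        if key in tool_input:
            return float(tool_input[key]) / 1000.0
    # No delay information at all: the deadline collapses to "now".
    return 0.0
```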

structlog WARN events at runner.py and runner_bridge.py are unchanged
so untether-issue-watcher and ops dashboards continue to receive the
underlying signals — only the chat-side surfacing decision changed.

Tests: 32 new (11 in test_exec_bridge.py for suppression branches,
auto-cancel gating, frozen-ring precedence, closing-message
idempotency, heartbeat countdown mutation; 3 in test_claude_runner.py
for delaySeconds + post-result state init; 18 in test_verbose_progress.py
for new tool detail branches, format_duration helpers, long-running
tail). Full suite: 2548 passed, 82.26% coverage.

Integration tests: U3 (basic Claude Code) passes cleanly via
@untether_dev_bot — 33 s run, zero stall warnings, "✓ turn complete"
footer rendered. Long-running BashOutput-polling and 30-min
genuinely-frozen tests deferred to staging dogfood.

Out of scope / known constraints:
- Strict 5 s rolling Bash stdout sub-line is not achievable without
  upstream Claude Code interim tool_result deltas. The BashOutput
  polling path is the proxy and refreshes at each polling cycle
  (~15 s in practice).
- ScheduleWakeup countdown rendering depends on #289 (`/loop`
  interception) for the timer to actually fire; suppression of stall
  warnings while a wakeup is pending works today.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(loop): add LoopSettings + EngineOverrides.loop_enabled (#289)

Foundation for /loop and ScheduleWakeup support — Untether-side
observation of Claude Code's session-scoped scheduling tools so
loops keep firing after the subprocess exits.  Default OFF — opt-in
per-chat via /config → 🔁 Loop mode.

src/untether/settings.py — new LoopSettings model:
  enabled (default false), inline_threshold_seconds (300),
  redundancy_check_interval (30), max_iterations (20),
  max_total_duration_hours (4), min_interval_seconds (60),
  expiry_days (7).  Cost limits stay in [cost_budget] —
  the caps in [loop] are runaway-safety only.
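
For reference, the defaults listed above can be pictured as a plain dataclass. This is only a sketch: the real model in settings.py may be built differently and enforces bounds that are not reproduced here, and `from_mapping` is a hypothetical helper standing in for the unknown-key rejection.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoopSettings:
    enabled: bool = False
    inline_threshold_seconds: int = 300
    redundancy_check_interval: int = 30
    max_iterations: int = 20
    max_total_duration_hours: int = 4
    min_interval_seconds: int = 60
    expiry_days: int = 7

    @classmethod
    def from_mapping(cls, data: dict) -> "LoopSettings":
        """Build from a parsed [loop] table, rejecting unknown keys."""
        unknown = set(data) - set(cls.__dataclass_fields__)
        if unknown:
            raise ValueError(f"unknown [loop] keys: {sorted(unknown)}")
        return cls(**data)
```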

src/untether/telegram/engine_overrides.py — new loop_enabled field
on EngineOverrides struct, threaded through normalize_overrides()
and merge_overrides() following the existing budget_enabled pattern.
LOOP_SUPPORTED_ENGINES = frozenset({"claude"}) — Claude-only since
other engines don't expose CronCreate / ScheduleWakeup.

Tests: 7 new in test_settings.py (defaults, TOML round-trip, bounds,
unknown-key rejection); 5 new in test_telegram_engine_overrides.py
(default None, merge topic/chat priority, ChatPrefsStore round-trip,
LOOP_SUPPORTED_ENGINES constant).  76 tests pass across the changed
files.

Empirical pre-work in this session:
  Probe 4 + 4b — hanging tool_use(AskUserQuestion) does NOT cause
  catastrophic resume behaviour; outcome (c) confirmed.  Drops the
  consensus-mandated interactive-state gate from PR1 scope.
  Probe 5 — CronCreate uses field "cron" (not "cron_expression");
  CronDelete takes id; CronList renders one entry per line as
  "<8hex> — <human-schedule> (recurring|one-off) [session-only]: <prompt>".
  Dispatcher rename — Telegram management surface will be /loops
  (PLURAL) so /loop (singular) keeps passing through to Claude;
  the dispatcher in telegram/loop.py:2256–2300 matches first-word
  only and either fully intercepts or never.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(loop): add is_session_alive helper to claude runner (#289)

loop_scheduler._fire (PR1) needs a cheap "is the subprocess for this
session_id currently running?" check before firing a loop iteration.
Spawning claude --resume against an alive subprocess would race the
in-flight turn and almost certainly violate session locking.

src/untether/runners/claude.py — new module-level is_session_alive(sid)
that reads membership of the existing _SESSION_STDIN registry.  The
registry is populated when a runner spawns its subprocess and cleared
in the run_impl finally block, so membership is the canonical signal
of "subprocess is up right now."
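
The membership check amounts to something like the sketch below. The registry and `is_session_alive` names come from the commit; `run_session` is a hypothetical stand-in for the part of `run_impl` that registers and clears the entry.

```python
# Maps session_id to the subprocess stdin handle while a run is live.
_SESSION_STDIN: dict[str, object] = {}


def is_session_alive(session_id: str) -> bool:
    """True only while a subprocess for this session is registered."""
    return session_id in _SESSION_STDIN


def run_session(session_id: str, stdin: object, body) -> None:
    """Register the session for the duration of body(), then clear it."""
    _SESSION_STDIN[session_id] = stdin
    try:
        body()
    finally:
        # Cleared even on error, so stale entries never block a loop fire.
        _SESSION_STDIN.pop(session_id, None)
```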

Tests: 2 in test_claude_runner.py (membership round-trip with cleanup,
unknown session returns False).

* feat(loop): add loop_scheduler module with persistence + tests (#289)

Untether-side scheduler for /loop and ScheduleWakeup. Mirrors
at_scheduler.py shape: 4 install globals + _PENDING dicts + install/
uninstall API. Adds:

- _LoopEntry dataclass with fallback_first_user_message (text, not
  msg id — Gap 4 of the handover) for the <<autonomous-loop-dynamic>>
  sentinel fallback path.
- register_pending_cron / register_pending_wakeup / bind_upstream_id
  for the observer hooks (wired in a follow-up commit — this commit
  is foundation only).
- cancel_by_token / cancel_by_upstream_id / cancel_pending_for_chat
  with do-not-resume sentinel write on user cancel.
- _fire path with race-avoidance (is_session_alive lazy import),
  drop-on-busy, max-iterations / max-total-duration / 7-day expiry caps,
  re-issue prompt wrap "Loop iteration N: ... do the task now; do not
  summarize old results unless necessary." (Probe 3 + consensus).
- Generation counter + cancel_event so old _arm_timer tasks left over
  from a previous round detect they are stale and bail out instead of
  double-firing on the new round's scope.
- Atomic JSON persistence to active_loops.json (sibling to config) via
  utils.json_state.atomic_write_json. Restart resilience: past
  fire_at_wallclock fires immediately (no catch-up multiplier),
  cancelled entries skipped on reload, do-not-resume sentinel persists.
- Cron next-fire computation via existing triggers.cron.cron_matches
  (5-field expressions, 366-day horizon).
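
One common way to get the atomic-write and skip-on-reload behaviour is a temp file plus `os.replace`, sketched below. This is not the project's `utils.json_state.atomic_write_json`; `write_json_atomically`, `load_active`, and the `cancelled` key are assumptions made for illustration.

```python
import json
import os
import tempfile
from pathlib import Path


def write_json_atomically(path: Path, payload: object) -> None:
    """Write to a sibling temp file, then rename over the target."""
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(payload, handle)
            handle.flush()
            os.fsync(handle.fileno())
        os.replace(tmp_name, path)  # atomic within one filesystem
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise


def load_active(path: Path) -> list[dict]:
    """Reload entries, skipping cancelled ones; corrupt files yield []."""
    try:
        entries = json.loads(path.read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        return []
    if not isinstance(entries, list):
        return []
    return [e for e in entries if not e.get("cancelled")]
```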

41 unit tests covering: install/uninstall lifecycle, registration
(cron + wakeup with sentinel fallback), upstream-ID binding,
cancellation paths, inspection helpers, cron parsing edge cases,
fire path (cancelled / max-iter / do-not-resume / busy / race-alive /
success / sentinel-fallback / one-shot expiry), persistence round-trip,
restart resume + skip-cancelled, do-not-resume across restart, corrupt
file handling, persistence-disabled mode.

Coverage of loop_scheduler.py: 84% (above 80% threshold).

NOT WIRED YET — observers in runners/claude.py and drain integration in
telegram/loop.py land in subsequent commits per the v0.35.4 PR1 plan.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(loop): observe CronCreate / ScheduleWakeup / CronDelete in claude runner (#289)

Wires the loop_scheduler module into the JSONL stream-translation path.
Observers run as siblings of (not replacements for) the existing
_register_background_handle / _clear_background_handle hooks at lines
~1028 and ~1090.

Changes:

- src/untether/runners/run_options.py: add `loop_enabled: bool | None`
  to `EngineRunOptions` so the per-chat /config → 🔁 Loop mode toggle
  can short-circuit observers via the existing run-options contextvar.
- src/untether/telegram/loop.py: plumb `loop_enabled` from merged
  EngineOverrides into the resolved EngineRunOptions.
- src/untether/runners/claude.py:
  - `ClaudeStreamState.first_user_message_text` (str | None) — populated
    from the `prompt` arg in `new_state` so loop entries can fall back
    to it when ScheduleWakeup observes the
    `<<autonomous-loop-dynamic>>` sentinel (Probe 3 result).
  - `_loop_enabled_for_chat(chat_id)` — resolves per-chat run-options
    override → global `[loop] enabled` → False fallback. Sync (no async
    prefs lookup; the contextvar is set upstream by executor.py).
  - `_observe_loop_tool_use(state, content)` — handles CronCreate /
    ScheduleWakeup / CronDelete tool_use blocks. Uses the canonical
    field names (`cron`, not `cron_expression`; `id`, not `taskId`)
    confirmed by Probe 5. Skips ScheduleWakeup when `delaySeconds` is
    at or below `[loop] inline_threshold_seconds` so short waits stay
    rendered live by the rc8 countdown.
  - `_observe_loop_tool_result(state, tool_use_id, content)` — parses
    `\bjob ([0-9a-f]{8})\b` from CronCreate result text and binds the
    upstream cron ID via `loop_scheduler.bind_upstream_id`.
  - Calls wired at the existing tool_use / tool_result decode sites
    inside `translate_claude_event`. Master-toggle gate sits at the
    top of the observers so OFF behaviour is identical to today.
- tests/test_claude_runner.py: new `TestLoopObservation` class (10
  tests) covering chat-id-unset no-op, master-toggle off, CronCreate
  registration, `cron` vs `cron_expression` field precedence, missing
  prompt rejection, ScheduleWakeup above/below threshold, CronDelete,
  upstream-ID binding, and `_loop_enabled_for_chat` resolution. Plus
  one sync test for `first_user_message_text` capture in `new_state`.

All 2615 tests pass. The loop_scheduler observer wiring is now live;
PR1 is still default OFF, and the per-chat toggle UI lands in the next
commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(loop): add /config:loop sub-page + home-page button (#289)

The Loop mode toggle is the user-facing master gate for /loop and
ScheduleWakeup observation.  Default OFF — opt-in per chat with an
explicit cost+quota warning before turning ON.

- New `_page_loop()` mirroring `_page_planmode()` shape: tri-state
  per-chat override (On / Off / Clear → fall back to global
  `[loop] enabled`), HTML body explaining behaviour ON vs OFF, "💰 Set
  a budget" deeplink to `config:cu` for one-tap budget setup before
  enabling.
- Engine-aware: only renders for `LOOP_SUPPORTED_ENGINES = {claude}`;
  shows "Only available for Claude Code" message on other engines.
- Home page (Claude only): replace the previous Plan-mode + Engine
  layout to slot in `🔁 Loop mode` next to `📡 Listen`, push
  `⚙️ Engine & model` next to `🧠 Effort`, and break `ℹ️ About` onto
  its own row.  Codex / OpenCode / Pi / Gemini / AMP home pages are
  unchanged — no `config:loop` callback rendered.
- Toast labels for `loop:on`/`loop:off`/`loop:clr` callbacks so
  early-answer dispatch shows confirmation immediately.
- 7 new tests in `TestLoopMode`: page renders with toggle + cost
  warning + budget deeplink, hidden for non-Claude, set-on returns
  home, clear resets per-chat override, no-config-path branch,
  home-page button visibility (Claude vs Codex).

All 240 config_command tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(loop): drain integration + /cancel + /new wiring (#289)

Safety-critical wiring so loops survive shutdown cleanly and respond to
user-initiated cancellation.

- src/untether/telegram/loop.py:
  - Install `loop_scheduler` immediately after `at_scheduler`.  Resolve
    `state_path` from `cfg.runtime.config_path.with_name("active_loops.json")`
    so loop state is persisted alongside `last_update_id.json` and
    `active_progress.json`.
  - Wire an `is_chat_busy(chat_id)` callable that scans `running_tasks`
    for refs in the chat — `loop_scheduler._fire` consults it to drop
    iterations when the chat already has a run in flight (mirrors
    upstream's "no catch-up" semantic).
  - Drain integration: `_drain_and_exit` now logs `pending_loops` from
    `loop_scheduler.active_count()` alongside `pending_at`.  The
    task-group cancel propagates into `_arm_timer` sleeps cleanly via
    the cancel-event primitive added in Commit A.
- src/untether/telegram/commands/cancel.py:
  - `handle_cancel` now also drops pending /loop entries for the chat
    when there's no specific reply target.  Reports
    "❌ cancelled N active loops" alongside the existing /at handling.
  - `cancel_pending_for_chat` writes the do-not-resume sentinel for
    each cancelled loop's session_id (handover default — block only
    `loop_scheduler --resume`, NOT `/continue`).
- src/untether/telegram/commands/topics.py:
  - `_cancel_chat_tasks` (called by `/new`) drops loop entries too so
    the "wipe a chat's state" semantics are complete.

All 2622 tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(loop): document Loop mode + cost interaction (#289)

Five doc files updated as the user-facing surface for Loop mode (default
OFF, opt-in per chat).

- docs/how-to/schedule-tasks.md:
  - New intro callout below H1 stating Loop mode is opt-in and pointing
    to the new section.
  - New "## Loop mode" section between /at and Telegram scheduling
    explaining the observe-and-fire-on-resume architecture, runaway
    caps, cost considerations (cache-warm vs cold per-fire ranges),
    cancel + persistence semantics.
- docs/how-to/cost-budgets.md:
  - Warning callout after "Per-chat overrides" — loop fires count
    toward the same daily/per-run caps; set a budget BEFORE turning
    Loop mode on.
- docs/how-to/troubleshooting.md:
  - New "Loop didn't fire / loop fired too many times" symptom table:
    toggle off, max_iterations, daily_budget_exceeded, "fresh user
    turn" expected behaviour, stale active_loops.json, restore failures.
- docs/faq/index.md:
  - New H2 "Does /loop work via Untether?" answering the question we
    expect to be asked most. Verified against .claude/rules/help-faq.md:
    13 H2s (above floor of 7), all question-shaped, no TODOs.
- docs/reference/config.md:
  - New `[loop]` section between `[watchdog]` and `[auto_continue]`
    documenting all 7 config keys plus the explicit "cost limits are
    NOT in [loop]" pointer to [cost_budget].

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: changelog entry for /loop + ScheduleWakeup support (#289)

v0.35.4 (unreleased) entry summarising the multi-commit Loop-mode work
landed under #289.  Validation passes (pre-release suffix on
pyproject.toml means validate_release.py skips the strict checks; the
entry is forward-looking for the eventual stable release).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: re-target loop-mode PR to v0.35.3rc9 (#289)

Per Nathan's correction — the /loop and ScheduleWakeup work lands
inside the v0.35.3 milestone train as the next staging rc
(0.35.3rc9), not v0.35.4 as the original handover suggested.  Issue
#289 was already correctly milestoned to v0.35.3 on GitHub.

- pyproject.toml: 0.35.3rc8 → 0.35.3rc9
- uv.lock: re-synced
- CHANGELOG.md: fold the loop-mode entries from a forward-looking
  v0.35.4 (unreleased) block into the existing v0.35.3 (unreleased)
  block (### changes + ### docs subsections)
- docs/how-to/schedule-tasks.md: drop the stray "pre-v0.35.4" version
  string from the intro callout (use "prior-version baseline" instead
  so the prose doesn't drift on each rc)

No code or test changes — full suite still 2622 passed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: unblock dev CI — ruff SIM300 + new pip CVE ignore

Two pre-existing CI failures already on dev's last run (acb6ec0).
Both fixes are tiny and unrelated to loop scope:

- tests/test_telegram_engine_overrides.py:235 — apply ruff's suggested
  rewrite of the SIM300 Yoda-condition assertion (semantically
  identical; literal on the right now).
- .github/workflows/ci.yml:210 — add CVE-2026-6357 to the pip-audit
  ignore list.  pip 26.0.1 has the CVE; fix is pip 26.1 which the uv
  tooling hasn't pulled yet.  Sibling of the existing CVE-2026-3219
  ignore from the same audit pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
dependabot Bot and others added 23 commits May 12, 2026 03:56
Bumps [github/codeql-action](https://github.com/github/codeql-action) from 4.35.3 to 4.35.4.
- [Release notes](https://github.com/github/codeql-action/releases)
- [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md)
- [Commits](github/codeql-action@e46ed2c...68bde55)

---
updated-dependencies:
- dependency-name: github/codeql-action
  dependency-version: 4.35.4
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…516)

Closes the rc11/rc12 over-correction on #508 that produced 25k–42k char
(~8–12 Telegram message) finals on staging plan-mode research/audit
runs. User report (Nathan, 2026-05-12): "I had a summary from Claude
Code yesterday which was 11 Telegram messages long!! What I really
want back is to have Claude Code provide summaries like we have here
in command line — summaries of plans (not the entire plan), summaries
of recommendations and/or findings and/or next steps (where relevant)."

Three stacked over-shoots in rc11/rc12:

1. A1 preamble: "expand the bullets into a substantive summary" for
   research/audit → plan body ballooned to 2–5k chars.
2. A2 preamble: "your next assistant message ... MUST repeat the
   substantive findings" → post-approval text ballooned to 0.5–2k
   chars AND was paraphrased rather than literal-copied.
3. Layer E: substring-skip rule (body in final_answer) failed on every
   paraphrased run, so the plan body was unconditionally concatenated
   in front of the post-approval text.

Evidence from `journalctl --user -u untether.service` (last 48h on
staging @hetz_lba1_bot v0.35.3rc12): aushistory finals at 14k / 16k /
28k / 35k / 42k chars; scout finals at 26k / 27k chars. The 42k case
matches the 11-message user repro. Telegram MCP `search_messages` for
the literal "📋 Plan (approved):" returned hits on every recent
plan-mode completion in both chats — confirming Layer E was the
load-bearing over-firer.

rc13 retuning:

- A1 → "concise 3–5 bullet summary; plan is shown for approval, not
  as the final deliverable" (drops the substantive-expansion license).
- A2 → "brief CLI-style summary, 3–7 bullets or 1–2 short paragraphs,
  ~500–1500 chars, do NOT re-paste the full plan content".
- A3 (## Summary Plan/Document Created bullet) → "Path AND a 3–5
  bullet headline summary, not a re-paste of the full content". Note:
  A3 affects the ## Summary block on ALL completed work, not just
  plan-mode runs — intentional, matches user's stated goal.
- _prepend_exitplanmode_plan: substring check replaced with a length
  gate (`len(final_answer) < 600`). Substring check stays as a cheap
  belt-and-braces second skip. Plan body is capped at 1500 chars +
  truncation marker so a runaway body can't ship 30k chars even when
  Layer E does fire (preserves original #508 UX for genuinely empty
  post-approval results without re-introducing concatenation).
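
The length gate and body cap described above can be sketched roughly as follows. This is a minimal illustration, not the actual `_prepend_exitplanmode_plan`: the function name `prepend_plan_sketch`, the constant names, and the truncation-marker wording are assumptions. Only the 600-char gate, the 1500-char cap, the substring skip, and the "📋 Plan (approved):" literal come from the description.

```python
# Hypothetical constants; the values mirror the commit description.
ANSWER_LENGTH_GATE = 600   # prepend only when the final answer is shorter than this
PLAN_BODY_CAP = 1500       # maximum number of plan-body chars to prepend
TRUNCATION_MARKER = "… [plan truncated]"  # assumed wording
PLAN_HEADER = "📋 Plan (approved):"


def prepend_plan_sketch(final_answer: str, plan_body: str) -> str:
    """Return final_answer, optionally prefixed with a capped plan body."""
    if not plan_body:
        return final_answer
    # Length gate: a substantive answer ships on its own.
    if len(final_answer) >= ANSWER_LENGTH_GATE:
        return final_answer
    # Belt-and-braces skip: the plan is already quoted verbatim.
    if plan_body in final_answer:
        return final_answer
    body = plan_body
    if len(body) > PLAN_BODY_CAP:
        body = body[:PLAN_BODY_CAP] + TRUNCATION_MARKER
    return f"{PLAN_HEADER}\n{body}\n\n{final_answer}"
```

With these values, the unprimed 1019-char answer from the live test skips the prepend, while a near-empty answer receives at most 1500 chars of plan body plus the marker.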

Live verification on @untether_dev_bot (test chat -5284581592):

- Primed test (with "keep it short" instruction): answer_len=882
  chars (~1 Telegram message), no "📋 Plan (approved):" literal.
- Unprimed test (default research-task prompt): answer_len=1019 chars
  — preamble is doing its job without user help. Layer E correctly
  skipped (1019 > 600). Quality verified: 3 substantive bullets +
  ## Summary block with Completed / Next Steps.

The original #508 fallback path (Claude exits with very short
post-approval text → Layer E fires with capped plan body) is
unit-tested only; not live-verified because the new preamble makes it
almost impossible to repro intentionally.

Tests: 7 new/updated in tests/test_preamble.py (regression-locks the
rc11 verbosity-driving phrases out of _DEFAULT_PREAMBLE, plus
length-gate / body-cap / substring-skip cases) and 2 in
tests/test_claude_runner.py:
`test_translate_result_skips_prepend_when_answer_substantive` and
`test_translate_result_caps_long_plan_body_when_prepending`.
Full suite: 2652 passed, 2 skipped, 82.38% coverage. ruff format +
check clean.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Updates the requirements on [uv-build](https://github.com/astral-sh/uv) to permit the latest version.
- [Release notes](https://github.com/astral-sh/uv/releases)
- [Changelog](https://github.com/astral-sh/uv/blob/main/CHANGELOG.md)
- [Commits](astral-sh/uv@0.9.18...0.11.13)

---
updated-dependencies:
- dependency-name: uv-build
  dependency-version: 0.11.13
  dependency-type: direct:development
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…x_cost_per_day (#522)

The rc13 audit observed a single Claude session burn ~$102 across 5
sub-runs on legal-librarian-local — each individual run was well under
any reasonable max_cost_per_run, but the session stacked them via
/continue. The config docs implied per-run + per-day were the only
ceilings without spelling out the stacking caveat.

Adds a Note callout under [cost_budget] that:
- Explicitly states sessions can stack many runs
- Cites the audit-observed $100+ session
- Recommends max_cost_per_day for cross-session ceilings
- Notes max_cost_per_session is not provided (file a feature request)

No code change. Closes #517 as docs-only per the rc13 audit handover plan.

Closes #517
session.summary was recording `last_event_type=control_request` (pass 1)
and `last_event_type=control_response` (pass 2) on ok=True completed runs.
Root cause at src/untether/runner.py:810 — `stream.last_event_type` was
written unconditionally from the raw JSONL `type` field, including the
permission-flow stdin/stdout traffic (Claude → Untether control_request,
and the parent-initiated mcp_status response on stdout).

Fix: skip the update when etype is in {"control_request",
"control_response"} so the field reflects the last *stream* event (result,
assistant, etc.). The auto-continue gate at runner_bridge.py:282 still
sees the raw "user" type because non-control events are unchanged.
`recent_events` deque still records control entries — useful for the
stall-diagnostic timeline that surfaced the bug in the audit.
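
A minimal sketch of the skip, assuming a simplified state object: `StreamStateSketch` and `note_event` are hypothetical names, not the code at runner.py:810. It shows the summary field ignoring control-channel traffic while the diagnostic deque still records it.

```python
from collections import deque

CONTROL_EVENT_TYPES = frozenset({"control_request", "control_response"})


class StreamStateSketch:
    """Hypothetical stand-in for the runner's stream state."""

    def __init__(self, max_recent: int = 50) -> None:
        self.last_event_type: str | None = None
        self.recent_events: deque[str] = deque(maxlen=max_recent)

    def note_event(self, etype: str) -> None:
        # The diagnostic timeline keeps everything, control traffic included.
        self.recent_events.append(etype)
        # The summary field only tracks real stream events.
        if etype not in CONTROL_EVENT_TYPES:
            self.last_event_type = etype
```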

Verified via @untether_dev_bot: plan-mode prompt → ExitPlanMode → approve
→ completion now produces `session.summary ... last_event_type=result
ok=True` (was `control_request`/`control_response` on rc13).

Closes #502
…LimitInfo (#521)

The rc13 audit observed `claude.rate_limit_event` log lines with
`retry_after_s=None cumulative_s=0.0` on every event — including during a
5-event burst on 'bip' that preceded a subscription-cap exhaustion across 3
chats. The chat then rendered the generic "⏳ Rate limited — waiting to
retry" with no actionable wait time.

Root cause: the Claude CLI emits two shapes of `rate_limit_event`:

1. Full form with `retry_after_ms` (covered by existing tests).
2. Bare/reset-window form with `requests_reset` / `tokens_reset` ISO
   timestamps but no `retry_after_ms`. This is what subscription-cap
   throttles emit per `docs/reference/runners/claude/stream-json-cheatsheet.md`.

Our handler at runners/claude.py only consumed `retry_after_ms`, so the
reset-window form fell into the "no retry hint" branch — `retry_after_s`
stayed None and `state.rate_limit_total_s` never accumulated, even though
the upstream payload contained actionable wait info.

Fix:
  - New helper `_derive_retry_after_s(info)` that picks the EARLIER of
    `requests_reset` / `tokens_reset` (the rate limit lifts as soon as
    either budget refills), clamped ≥ 0, robust to "Z" and `+00:00`
    suffixes, and returns None for unparseable / missing timestamps.
  - Translate path: when `retry_after_ms` is None but a parseable reset
    timestamp exists, fall back to the derived value. Track which path
    fed the field via a new `retry_after_source=retry_after_ms|reset_ts`
    log field so future audits can tell at a glance.
  - Enrich the structured log to include every present RateLimitInfo
    field (`requests_limit`, `requests_remaining`, `requests_reset`,
    `tokens_limit`, `tokens_remaining`, `tokens_reset`, `retry_after_ms`)
    under `info=...`. The previous one-field log gave no diagnostic
    surface; this lets the watcher and future audits see what upstream
    actually sent.

Out of scope for this PR (filed as follow-ups if needed): pre-emptive
budget warnings at 75/90% subscription usage. That's a larger feature
spanning the cost tracker and chat footer — better as a discrete change.

The two subscription-error message variants observed in the audit
("out of extra usage", "hit your limit") already map to the same friendly
hint via `error_hints.py:52-60`, so no work is needed on that front.

Tests:
  tests/test_claude_runner.py
    - test_translate_rate_limit_event_derives_retry_after_from_reset_ts:
      requests_reset 90s out → cumulative accumulates ~90s
    - test_translate_rate_limit_event_prefers_earlier_reset_when_both_present:
      with both reset fields, pick the earlier one
    - test_translate_rate_limit_event_retry_after_ms_takes_precedence:
      explicit retry_after_ms wins over derived reset_ts
    - test_translate_rate_limit_event_handles_unparseable_reset_ts:
      garbage timestamp silently ignored, falls back to generic copy
    - All four existing tests still pass (no regression)

Verified:
  uv run pytest tests/test_claude_runner.py tests/test_claude_schema.py
    -x --no-cov  → 107 passed
  uv run ruff format/check, uv lock --check — clean
  systemctl --user restart untether-dev — dev service comes up cleanly

Closes #518
…-aware stall message (#520)

Three sub-fixes addressing the rc13 audit's "20-min ExitPlanMode approval-wait
peak_idle" findings, all bundled in one PR because they share `_watchdog_loop`
and the stall-monitor render path.

A. `liveness_stalls` field on session.summary
   The audit observed `stall_warnings=0` despite `subprocess.liveness_stall`
   firing — by design: `_total_stall_warn_count` is the user-facing-threshold
   counter (runner_bridge.py:1143), `subprocess.liveness_stall` is the
   subprocess-health canary in the watchdog loop (runner.py:1023). Conflating
   them would break the user-facing invariant. Added
   `JsonlStreamState.liveness_stalls: int` (0 or 1 today, because
   `liveness_warned` latches after the first warning; kept as int for
   forward-compat) and surfaced it as a new `liveness_stalls=` field in
   the session.summary log.

B. Populate `prev_diag` baseline so `cpu_active` is bool, not None
   `prev_diag` was initialised to None and only assigned *after* the
   one-shot warning fired, so `is_cpu_active(None, diag)` always returned
   None on the audit-observed warning. Take a baseline snapshot on the first
   successful poll instead.

   SEMANTICS CAVEAT: the auto-kill check at runner.py:1039 is
   `cpu_active is not True`. Today None always satisfies that, so the
   auto-kill path triggers (combined with tcp_established == 0). After
   this fix, `cpu_active` is an accurate bool: still-active processes
   return True (skip kill); genuinely-idle ones return False (kill, same
   as before). Auto-kill becomes more accurate, not more aggressive.
   No tests assert on the None-triggers-kill path (verified via
   grep for `_stall_auto_kill = True` in tests/).

C. Approval-aware stall message
   `threshold_reason = "pending_approval"` is already computed for the
   threshold selection (runner_bridge.py:1110) but was never used in the
   message-assembly block, so users saw the same generic "No progress for
   N min — session may be stuck" copy that genuine hangs produce. Added a
   new branch above the `mcp_server is not None` arm with copy
   "⏳ Awaiting your approval ({mins} min)", excluded pending_approval
   from `_genuinely_stuck`, and lifted `_tool_name = None` initialisation
   to the top of the message block to fix a latent UnboundLocalError that
   would have hit other branches if `_tool_name` were accessed before the
   final `else:`.

Tests:
  tests/test_exec_runner.py
    - test_jsonl_stream_state_defaults: asserts liveness_stalls == 0 default
    - new test_liveness_stall_increments_counter: drives a real subprocess
      past _LIVENESS_TIMEOUT_SECONDS=0.2, asserts liveness_stalls==1 (A)
      and cpu_active is bool (B) via structlog.testing.capture_logs
  tests/test_exec_bridge.py
    - test_stall_fires_after_approval_threshold updated to assert the
      message contains "Awaiting your approval", NOT "No progress" or
      "session may be stuck"

Verified locally:
  uv run pytest tests/test_exec_runner.py tests/test_exec_bridge.py
    tests/test_proc_diag.py -x --no-cov  → 255 passed, 2 skipped
  Integration via @untether_dev_bot: session.summary now emits
    `liveness_stalls=0` for a normal codex run (field wired correctly).
    Full liveness-watchdog fire requires a 60s+ idle wait (config min)
    or 30-min approval threshold — covered by the unit test that drives
    a real subprocess with tight timing.

Closes #494
Bundle merged this cycle on dev:
- fix: #502 — skip control-channel events from last_event_type (#519)
- fix: #494 — liveness_stalls counter, cpu_active reliability, approval-aware stall message (#520)
- fix: #518 — derive retry_after_s from reset timestamps, log full RateLimitInfo (#521)
- docs: #517 — note cumulative session cost is not capped, recommend max_cost_per_day (#522)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Brings nsd, channelo, and the Mac into the same operational fold as the
lba-1 staging host. Adds parallel single-stage rollout/rollback scripts
gated by integration-test attestation, ports the issue-watcher daemon
to all 4 hosts, and updates rules + skill + hooks to enforce the new
workflow.

Highlights:

- contrib/untether-issue-watcher.service — systemd template with __HOST__
  placeholder for nsd/channelo deploy. Header documents the deploy steps
  including the chmod +x lesson (scp doesn't preserve the executable bit).
- contrib/com.littlebearapps.untether-issue-watcher.plist — macOS launchd
  agent using __USER__ placeholder. Defaults to Plan B (Unified Logging
  via `log stream`) so it doesn't require modifying Untether's own
  launchd plist.
- scripts/fleet-rollout.sh — single-stage parallel upgrade across all 4
  hosts. Auto-detects per-host install manager (uv tool vs pipx),
  enforces integration-test attestation gate, emits a heartbeat every
  15s during long parallel installs, writes per-host ok/failed status
  back to ~/.untether-dev/fleet-rollout-state.json as branches finish.
- scripts/fleet-rollback.sh — same parallel pattern, no attestation gate.
- scripts/run-integration-tests.sh — writes the per-version attestation
  marker after a successful Telegram MCP integration test run against
  @untether_dev_bot. Manual-mode only this session; auto-orchestration
  reserved as a future enhancement.
- .claude/rules/release-discipline.md — new "Pre-rollout integration
  test attestation" subsection + new "Fleet rollout (rc and stable)"
  section documenting the parallel workflow and rc-supersede semantics.
- .claude/skills/release-coordination/SKILL.md — inserts Phase 5.5
  (attestation marker write) and rewrites Phase 6 as parallel fleet
  rollout instead of single-host staging dogfood.
- CLAUDE.md — 3-phase release workflow now references
  scripts/fleet-rollout.sh; auto-filed sources note multi-host
  watcher coverage.
- .claude/hooks.json — version-bump checklist gets a new FLEET ROLLOUT
  block (steps 8-9) reminding about attestation + fleet-rollout.sh.
- .gitignore — adds docs/prompts/ and docs/research/ (mirror of
  docs/plans/ pattern for working docs that stay local).

Strategic plan: docs/plans/2026-05-13-fleet-monitoring-and-upgrades.md

This branch is the implementation of the strategic plan's Phase 1-4.
Phases 2 (per-host /monitor configs) and 3 (untether-fleet meta-target)
live as local-only configs under ~/.config/monitor/ — those are not
project-tracked but referenced in CLAUDE.md and the rules.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR #537 landed the multi-host fleet rollout scripts and updated CLAUDE.md
+ release-discipline.md + release-coordination skill to cover the new
4-host parallel workflow. This patch fills the cross-references the
remaining surfaces were missing:

- docs/reference/dev-instance.md — note lba-1 staging is one of 4 hosts
- docs/reference/integration-testing.md — replace step 14 ("commit, tag,
  release") with the attestation-marker + fleet-rollout sequence
- .claude/rules/dev-workflow.md — add a "Multi-host fleet rollout"
  section pointing to the scripts and release-discipline.md

Pure cross-references; no behaviour changes, no duplication of the
canonical fleet-rollout content (which lives in release-discipline.md).

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CI pip-audit job has been failing on every PR with:
- CVE-2026-44431 (urllib3)
- CVE-2026-44432 (urllib3)

Both fixed in urllib3 2.7.0. urllib3 is transitive via requests (pulled
in by zensical/mkdocstrings-python in the docs group), not a direct
runtime dependency — bumping via `uv lock --upgrade-package urllib3`
keeps the impact surgical: lockfile-only change, no pyproject.toml edit
needed.

Verified locally: only urllib3 changes in uv.lock; pip-audit should now
pass.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* deps: bump urllib3 2.6.3 → 2.7.0 to clear pip-audit CVEs

CI pip-audit job has been failing on every PR with:
- CVE-2026-44431 (urllib3)
- CVE-2026-44432 (urllib3)

Both fixed in urllib3 2.7.0. urllib3 is transitive via requests (pulled
in by zensical/mkdocstrings-python in the docs group), not a direct
runtime dependency — bumping via `uv lock --upgrade-package urllib3`
keeps the impact surgical: lockfile-only change, no pyproject.toml edit
needed.

Verified locally: only urllib3 changes in uv.lock; pip-audit should now
pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: #497 — debounce catalog.refresh_sent to prevent storms

The opt-in [watchdog] notify_catalog_refresh path (#365) enqueued one
mcp_status control_request on every tool_result batch with no minimum
interval. The 2026-05-09 staging audit observed this firing 183 times
in a single ~18 min Claude run on the 'scout' project — a "storm" that
floods the runner's stdin and Claude Code's catalog-status query path.

Adds a per-session monotonic-clock gate in translate_claude_event's
StreamUserMessage arm, controlled by the new
WatchdogSettings.catalog_refresh_min_interval_s (default 5.0 s,
range 0–60 s; 0 disables and restores pre-#497 behaviour).
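
A monotonic-clock gate of this kind might be sketched as below. `CatalogRefreshGateSketch` and its injectable `clock` are hypothetical; the real gate lives inside `translate_claude_event` and reads `WatchdogSettings.catalog_refresh_min_interval_s`.

```python
import time


class CatalogRefreshGateSketch:
    """Per-session debounce; min_interval_s == 0 disables the gate."""

    def __init__(self, min_interval_s: float = 5.0, clock=time.monotonic) -> None:
        self.min_interval_s = min_interval_s
        self._clock = clock
        self._last_sent_at: float | None = None

    def should_send(self) -> bool:
        now = self._clock()
        if (
            self.min_interval_s > 0
            and self._last_sent_at is not None
            and now - self._last_sent_at < self.min_interval_s
        ):
            return False  # still inside the debounce window
        self._last_sent_at = now
        return True
```

Passing the clock in as a parameter lets a test advance time by hand rather than sleeping.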

Test changes:
- test_tool_result_queues_mcp_status_when_notify_enabled now drives
  time.monotonic() past the debounce window between the two queue
  assertions so the existing semantic still holds.
- New test_tool_result_debounces_back_to_back_batches reproduces the
  scout storm conditions (10 batches 100 ms apart yield exactly 1
  refresh).
- New test_tool_result_debounce_disabled_with_zero_interval confirms
  the off-switch.

CHANGELOG: appended under v0.35.3 ### fixes (above the existing rc14
session.summary entry, after the rc14 rate_limit_event entry).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
) (#542)

Sweep the v0.35.3 milestone close-out:

- #501: claude-schema accepts dict tool_result.content (1ccd73d)
- #498: config.loaded INFO → DEBUG demote (07261df)
- #502: skip control-channel events from last_event_type (ff1a9ab)
- #522: docs note for cumulative session cost — closes #517 + addresses
  the cost-outlier monitor family (#491, #492, #493, #504) (722dd25)
- #530: ops note for the local monitor-config Mac substrate fix
  (out-of-repo configs at ~/.config/monitor/)
- #404: tests note for the runtime-built Basic auth header (84f7f02)

Inserts placed at the end of each section so they don't conflict with
PR #541's pending additions at the top of the v0.35.3 ### fixes block
(rc14 rate_limit_event, rc14 catalog.refresh_sent debounce, rc14
session.summary, rc12 ExitPlanMode prepend).

No code change. Pre-release validator skips changelog validation for
0.35.3rc14, so this entry will be exercised when the rc15 chore lands.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Bumps pyproject.toml from 0.35.3rc14 → 0.35.3rc15 so CI publishes the
post-rc14 dev-branch changes to TestPyPI:

- #539 deps: bump urllib3 2.6.3 → 2.7.0 to clear pip-audit CVEs
- #538 docs: cross-reference fleet rollout from staging/integration docs
- #541 fix: #497 debounce catalog.refresh_sent to prevent storms
- #542 docs(changelog): complete v0.35.3 entries (#404 #498 #501 #502 #522 #530)

Pre-release (rc) — no CHANGELOG entry required; validate_release.py skips
pre-release versions. Stable v0.35.3 changelog already covers the included
fixes via PR #542.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…35.3rc16) (#545)

The rc11 #507 fix added `live_wakeups_arm_delay: dict[str, float]`
populated in `_register_background_handle` and read in
`_post_result_idle_watchdog` to shorten the 600 s post-result idle
timeout to `max_armed_delay + 60 s` when /loop is OFF. But the dict was
wiped by `_clear_background_handle` on the ScheduleWakeup tool_result —
which is the schedule-confirmation, NOT a terminal signal — so by the
time the watchdog ticked (after the `result` event, which lands AFTER
tool_result) the dict was empty and the dead-wakeup shortcut never
engaged.

Live impact: channelo VPS auditor-toolkit session d11739ee-… on rc15,
24+ min hold-open with `pending_wakeup=False` despite
`last_action='tool:ScheduleWakeup (done)'`.

Fix: replace the per-tool_id dict with
`ClaudeStreamState.last_schedule_wakeup_arm_delay: float | None` —
a per-turn scalar high-water-mark (`max` semantics for multi-wakeup
turns) that survives `_clear_background_handle` and resets on each
fresh user prompt (`StreamUserMessage` with non-tool_result content;
mixed batches preserve the scalar so a tool turn still in flight
doesn't lose state).
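
The scalar's lifecycle can be sketched as follows. The class and method names are hypothetical, the `/loop` check is omitted, and capping the shortened timeout at 600 s with `min` is an assumption rather than confirmed behaviour.

```python
class WakeupStateSketch:
    DEFAULT_IDLE_TIMEOUT_S = 600.0
    WAKEUP_GRACE_S = 60.0

    def __init__(self) -> None:
        self.last_schedule_wakeup_arm_delay: float | None = None

    def register_wakeup(self, delay_s: float) -> None:
        # High-water mark: keep the longest delay armed during this turn.
        prev = self.last_schedule_wakeup_arm_delay
        self.last_schedule_wakeup_arm_delay = delay_s if prev is None else max(prev, delay_s)

    def clear_handle(self) -> None:
        # tool_result is only the schedule confirmation, so the scalar survives.
        pass

    def on_user_message(self, has_tool_result: bool) -> None:
        # Only a fresh prompt resets; batches carrying tool_result preserve it.
        if not has_tool_result:
            self.last_schedule_wakeup_arm_delay = None

    def effective_idle_timeout_s(self) -> float:
        delay = self.last_schedule_wakeup_arm_delay
        if delay is None:
            return self.DEFAULT_IDLE_TIMEOUT_S
        # Assumption: never lengthen beyond the default timeout.
        return min(self.DEFAULT_IDLE_TIMEOUT_S, delay + self.WAKEUP_GRACE_S)
```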

The original #507 unit tests directly seeded `live_wakeups_arm_delay`
and bypassed `_clear_background_handle`, which is why the rc11 fix
appeared green in CI but failed on channelo rc15 in production. 4 new
tests in `tests/test_claude_runner.py` cover the full
tool_use → tool_result → result lifecycle (does NOT bypass
`_clear_background_handle`), multi-wakeup max selection, new-turn
reset, and the mixed-batch edge case. The two existing #507 tests now
seed the scalar instead of the dict.

The broader background-task-lifecycle refactor (terminal-vs-arm signal
per primitive + deadline-expiry sweeps) tracked in #374 stays in
v0.35.4. The sibling defect where the 600 s safety-net watchdog
silently doesn't fire stays in #333 for v0.35.4 pending entry/exit
instrumentation.

Refs #507, #374, #333.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…mentation + #347 live_bg_bashes scalar (v0.35.3rc17) (#549)

Channelo VPS on rc16 (which already ships the #544 ScheduleWakeup
arm-delay scalar fix) hit a 43+ min post-result hang on session
`b5c1c3e0-…` with `pending_wakeup=False` — NO ScheduleWakeup involved,
so the #544 fix doesn't apply.

Diagnostics ruled out 3 of 4 #333 candidates via logs + live
`py-spy dump --pid 1127`:
- `post_result=True` in stall logs → `result_received_at` IS set
- `post_result_idle_enabled` defaults True; channelo `[watchdog]`
  config doesn't override
- Subprocess + children still alive → `reader_done` NOT set

The 4th candidate ("task crashed silently / never started") can't be
discriminated without entry/exit instrumentation. The rc16 changelog
deferred #333 to v0.35.4 pending instrumentation — rc17 lands it now
and overrides that deferral.

Instrumentation (`ClaudeRunner._post_result_idle_watchdog`):

- `claude.post_result_idle.task_started` at entry (session_id,
  timeout_s, poll_interval_s)
- `claude.post_result_idle.tick` every iteration (armed, elapsed_s,
  effective_timeout_s, dead_wakeup, pending_requests, pending_asks,
  would_close, last_bg_bash_launched_at_age_s,
  last_schedule_wakeup_arm_delay)
- `claude.post_result_idle.tick_error` (warning + exc_info) on
  transient per-tick failures with one-interval backoff
- `claude.post_result_idle.task_exited` in a guaranteed `finally`
  with reason ∈ {reader_done, stdin_closed, cancelled, loop_exited}

Per-tick `try/except` (not loop-wide) mirrors `_subprocess_watchdog`
(runner.py:1010-1079) and `_drain_catalog_refresh` (claude.py:2586)
so a transient error never cancels the sibling `_iter_jsonl_events`
task in the task group. Verbose by design: a 30 s poll interval
yields about 120 tick lines per hour, which is trivial, and
rate-limiting now would create ambiguity in the next reproduction.
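
The entry/exit logging and per-tick error handling could be shaped like this. It is a sketch using plain `asyncio` and a callable `log`, both assumptions: the real watchdog runs inside the runner's task group and logs via structured events, and `idle_watchdog_sketch` and `on_tick` are hypothetical names.

```python
import asyncio


async def idle_watchdog_sketch(reader_done: asyncio.Event, poll_interval_s: float,
                               on_tick, log) -> str:
    log("task_started")
    reason = "loop_exited"
    try:
        while True:
            if reader_done.is_set():
                reason = "reader_done"
                return reason
            try:
                on_tick()  # may raise transiently
                log("tick")
            except Exception as exc:
                # Per-tick guard: log and carry on after one interval.
                log(f"tick_error {exc!r}")
            await asyncio.sleep(poll_interval_s)
    except asyncio.CancelledError:
        reason = "cancelled"
        raise
    finally:
        # Runs on every exit path, so the exit reason is always recorded.
        log(f"task_exited reason={reason}")
```

Because `asyncio.CancelledError` derives from `BaseException`, the inner `except Exception` does not swallow cancellation.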

`last_bg_bash_launched_at` scalar (sibling latent defect):

`_clear_background_handle` pops `live_bg_bashes` on tool_result,
mirroring the original #507 ScheduleWakeup defect that #544 fixed via
a scalar high-water-mark. New
`ClaudeStreamState.last_bg_bash_launched_at: float | None` is set in
`_register_background_handle` at the `Bash + run_in_background`
branch, NOT cleared in `_clear_background_handle`, and reset on the
same fresh-user-prompt path that resets
`last_schedule_wakeup_arm_delay`.

Critically a LAUNCH tracker, not a LIFETIME tracker — bg-bashes can
outlive multiple user turns (long `npm install`, `tail -f`) so the
ScheduleWakeup arm-delay analogy partially breaks. Observability-only
today; bridge's `_has_fresh_bash_output` / `_has_recent_bash_action`
(runner_bridge.py:1738, 1753) remain the higher-fidelity bash-liveness
proxies and the new scalar deliberately does NOT replace them in any
suppression path. Broader background-task-lifecycle refactor stays in
#374 for v0.35.4.

7 new tests in tests/test_claude_runner.py (mirror df1b793 structure):

- test_bg_bash_register_sets_launched_at
- test_bg_bash_tool_result_preserves_launched_at
- test_multiple_bg_bashes_use_most_recent_launched_at
- test_new_user_turn_resets_bg_bash_launched_at
- test_mixed_user_message_does_not_reset_bg_bash_launched_at
- test_post_result_idle_watchdog_emits_lifecycle_logs
- test_post_result_idle_watchdog_exits_reader_done_on_reader_done

Local smoke test: `untether-dev` restarted on rc17; two Claude runs
via `@untether_dev_bot` (chat 5284581592) both emitted
`task_started` at watchdog entry (timeout_s=600.0, poll_interval_s=30.0)
and `task_exited reason=reader_done` 30 s later on the normal-flow
exit path. Test suite: 441 passed across test_claude_runner +
test_exec_bridge + test_claude_control + test_exec_runner.

The actual fix for whatever the new instrumentation reveals lands in
a follow-up rc — rc17 IS the diagnostic.

Refs #333, #544, #347, #374.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…answer (#552)

The final-question branch of AskQuestionCommand.handle now strips the
inline keyboard via ctx.executor.edit after the structured answer is
sent. Previously the buttons stayed visible/clickable, and subsequent
clicks fired ask_question.flow_missing warnings since the flow state
was already cleaned up.

Failure modes preserved:
- answer_ask_question_with_options returns False -> keyboard NOT cleared
  (so the user can retry).
- ctx.executor.edit raises -> warning logged, answer-sent response still
  returned (cosmetic cleanup must not block the answer).

Adds 4 tests in tests/test_ask_user_question.py exercising the new path,
the failure-preserves-buttons path, the edit-raises path, and the full
2-question Q1->Q2->final flow.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…#549) (#553)

* fix(claude): #333 — post-result hang fix (Tier 1 subcountdown + Tier 2 stall semantics + Tier 3 limbo telemetry) + Task 4a state machine

Root cause (confirmed via rc17 instrumentation + channelo session
8876c902 reproduction 2026-05-17, 26.6 min limbo): when Claude Code
v2.1.143 closes stdout while keeping the subprocess alive, the
_post_result_idle_watchdog exited early via task_exited reason=reader_done,
bypassing the 600 s post-result-idle countdown. Stall-detector suppression
cascades (post_result + active children from MCP heartbeats) hid the
limbo from auto-cancel indefinitely.

Tier 1 (claude.py) — _post_result_subcountdown:
- When reader_done fires while proc.returncode is None, enter a
  stdout-closed subcountdown instead of returning. Poll for natural
  exit, defer if pending control_request / ask_question, SIGTERM the
  process group after timeout_s, 5 s grace, SIGKILL if still alive.
- New task_exited reasons: reader_done_but_alive_timeout,
  subprocess_exited_during_subcountdown.
- New info logs: claude.post_result_idle.reader_done_but_alive,
  subcountdown_deferred, sigterm_after_timeout, sigkill_after_grace.

Tier 2 (runner_bridge.py) — defense-in-depth:
- _POST_RESULT_LIMBO_THRESHOLD_S = 660 s (post-result idle timeout + grace).
- When post-result idle age exceeds the limbo threshold AND no other
  expected-wait flag (ScheduleWakeup / Monitor / bash) is set, stop
  suppressing auto-cancel — the watchdog missed an edge case.
- One-shot warning: progress_edits.post_result_limbo_detected.
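
The Tier 2 rule reduces to a small predicate. The sketch below assumes the caller already knows the session is post-result; the function and parameter names are hypothetical, and reading 660 s as the 600 s timeout plus a 60 s grace is an inference from the description.

```python
POST_RESULT_LIMBO_THRESHOLD_S = 660.0  # presumably 600 s idle timeout + 60 s grace


def suppress_auto_cancel_sketch(post_result_age_s: float, pending_wakeup: bool,
                                monitor_active: bool, bash_active: bool) -> bool:
    """True while auto-cancel should stay suppressed for a post-result session."""
    # Any expected-wait flag keeps the suppression in place.
    if pending_wakeup or monitor_active or bash_active:
        return True
    # Past the limbo threshold the watchdog has missed an edge case: stop suppressing.
    return post_result_age_s <= POST_RESULT_LIMBO_THRESHOLD_S
```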

Tier 3 (claude.py) — runner.limbo_detected warning:
- Fired 30 s into the subcountdown when subprocess is still alive and
  no pending state holds the session open.
- Picked up automatically by untether-issue-watcher → auto:error-report
  GitHub issues for future regressions.

Task 4a (runner.py + claude.py) — subprocess lifecycle state machine:
- JsonlStreamState.lifecycle_state + lifecycle_state_entered_at.
- JsonlSubprocessRunner._transition_lifecycle() helper emits
  ``subprocess.state.<name>`` info logs at every transition.
- States emitted by the watchdog: reader_eof, subcountdown, limbo,
  sigterm_sent, sigkill_sent, exited. Other transitions (streaming,
  idle_post_result, tool_active) deferred to a future patch.

Tests (7 new):
- test_claude_runner.py:
  - test_333_reader_done_but_alive_triggers_subcountdown
  - test_333_subprocess_exits_during_subcountdown
  - test_333_subcountdown_defers_on_pending_request
  - test_333_lifecycle_state_transitions_logged
- test_exec_bridge.py:
  - test_333_post_result_limbo_lets_auto_cancel_fire
  - test_333_post_result_below_limbo_threshold_still_suppresses
  - test_333_post_result_with_pending_wakeup_keeps_suppression

Full suite: 2678 passed, 2 skipped (no regressions from rc17 baseline).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(exec_bridge): fix #333 Tier 2 wakeup test for CI clock frame

In fresh CI containers ``time.monotonic()`` returns small values (~50s),
but the test's fake clock starts at 1000.0. Computing the ScheduleWakeup
deadline from real monotonic time made it look already-expired against
the fake clock — so _has_pending_wakeup returned None in CI, _real_pending
went False, and auto-cancel fired (test asserted not fired).

Express the deadline in the fake clock's frame (1010 + 60 = 1070) so the
comparison is consistent regardless of host monotonic baseline.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…: staging 0.35.3rc18 + Task 4b (#554)

Three independent rc18 changes shipped together. The two #333 / #550
fixes are on separate PRs (fix/333-post-result-hang, fix/550-ask-question-keyboard-clear).

- **Tier 0 — pre-swap outbox delivery (functional, ~3.6 % silent loss
  fix):** At the auto-continue trigger site (runner_bridge.py:~2935),
  call deliver_outbox_files BEFORE subprocess 2 spawns. Without this,
  files written by subprocess 1 during the stuck-after-tool-results
  window were orphaned (subprocess 2 starts fresh, never scans the
  outbox). Delivery is best-effort — a failure logs
  outbox.auto_continue_delivery_failed and does NOT block auto-continue.
- **Tier 1 — reworded notice (UX):** changed the chat-side text from
  "⚠️ Auto-continuing — Claude stopped before processing tool results"
  to "🔁 Auto-resuming session after upstream Claude Code event". The
  🔁 prefix signals recovery rather than failure and discourages
  /cancel-ing the salvage. Extracted into a small
  _format_auto_continue_notice() helper for testability.

Task 4b — stall-suppression counter:
- JsonlStreamState.stall_suppression_counts: dict[str, int].
- _bump_stall_suppression(reason) helper increments at three suppression
  sites: expected_wait (auto-cancel suppression), post_result
  (notification suppression), children_active (sleeping-main + active
  children).
- session.summary now includes
  stall_suppressions=expected_wait:N,post_result:N,children_active:N
  so log audits can see suppression cascades without parsing nested JSON.
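
A sketch of the counter and its flat log rendering, assuming a standalone class: `SuppressionCountsSketch` and its method names are hypothetical, while the three reason keys and the `stall_suppressions=` format come from the description.

```python
SUPPRESSION_REASONS = ("expected_wait", "post_result", "children_active")


class SuppressionCountsSketch:
    def __init__(self) -> None:
        self.counts: dict[str, int] = {}

    def bump(self, reason: str) -> None:
        self.counts[reason] = self.counts.get(reason, 0) + 1

    def summary_field(self) -> str:
        # Fixed key order keeps the log line greppable without JSON parsing.
        parts = ",".join(f"{r}:{self.counts.get(r, 0)}" for r in SUPPRESSION_REASONS)
        return f"stall_suppressions={parts}"
```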

chore: version bump 0.35.3rc17 → 0.35.3rc18 in pyproject.toml; uv.lock
synced. CHANGELOG.md entry for v0.35.3rc18 covers #333 + #550 + #551
(the other two PRs reference the same entry).

Tests (4 new):
- test_exec_bridge.py:
  - test_551_auto_continue_notice_first_attempt
  - test_551_auto_continue_notice_repeat_attempt
  - test_4b_bump_stall_suppression_records_counts
  - test_4b_stall_suppression_count_bumped_on_post_result

Full suite: 2675 passed, 2 skipped.

preservation) deferred per scope decision in the rc18 plan.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(telegram): #528 — Answered: echo no longer truncated to 100 chars

The `↩️ Answered:` confirmation after an AskUserQuestion text reply was
hard-sliced at [:100] with no ellipsis, so users couldn't see whether
their full message reached the agent (the agent path was unaffected and
always received the complete text). Replace the slice with a 300-char
soft cap + ellipsis via the new `_format_answered_echo` helper.
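
A soft cap with ellipsis can be sketched as below. This is not the shipped `_format_answered_echo`: the `_sketch` name, the `rstrip` before the ellipsis, and the exact prefix spacing are assumptions.

```python
ANSWERED_ECHO_SOFT_CAP = 300


def format_answered_echo_sketch(text: str, cap: int = ANSWERED_ECHO_SOFT_CAP) -> str:
    """Echo the reply, truncating visibly instead of silently slicing."""
    shown = text if len(text) <= cap else text[:cap].rstrip() + "…"
    return f"↩️ Answered: {shown}"
```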

Regression tests in tests/test_loop_coverage.py.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(telegram): #525 — dedup cancel.requested triple-fire

Rapid double/triple-tap of the inline Cancel button delivered three
`cancel.requested` events for one user intent (Telegram delivered
duplicate callbacks before the keyboard cleared). Repeat
`cancel_requested.set()` was benign today, but log noise + future
side-effectful cancel actions would inherit the 3x fan-out.

Add a 1-second TTL dedup keyed on (chat_id, progress_message_id) in
all three cancel entry points (text-reply, text-fallback, callback).
Per-test autouse fixture clears the module-level dict so tests that
reuse (chat_id, msg_id) aren't surprised by silent drops.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(runtime): #532 — consolidate per-engine setup.warning to single summary

Previously every `config.reload.applied` emitted one [warning]
setup.warning per engine not on PATH. On a single-engine host (e.g.
channelo runs only claude) that's 5 WARNs per reload, padding warn
filters in untether-issue-watcher, /monitor, and Grafana with
intentional install state.

Replace with one INFO `setup.summary` line per reload that lists
found/missing_on_path/bad_config engines. Loud WARN now reserved for
engines the user actively configured (non-empty [engines.<id>] block)
but that aren't on PATH — those are noteworthy.
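
A rough sketch of the classification, assuming each engine maps to one executable name and that "actively configured" means the engine id has a non-empty `[engines.<id>]` block. The dataclass, the function name, and the `which` parameter are hypothetical; the `bad_config` bucket from the commit is left out for brevity.

```python
from __future__ import annotations

import shutil
from dataclasses import dataclass, field


@dataclass
class SetupSummary:
    found: list[str] = field(default_factory=list)
    missing_on_path: list[str] = field(default_factory=list)
    warn: list[str] = field(default_factory=list)  # configured by the user but missing


def summarise_engines(
    engine_bins: dict[str, str], configured: set[str], which=shutil.which
) -> SetupSummary:
    """Build one summary per reload instead of one WARN per missing engine."""
    summary = SetupSummary()
    for engine_id in sorted(engine_bins):
        if which(engine_bins[engine_id]) is not None:
            summary.found.append(engine_id)
            continue
        summary.missing_on_path.append(engine_id)
        if engine_id in configured:
            # Only engines the user asked for deserve a loud WARN.
            summary.warn.append(engine_id)
    return summary
```

The caller would log `found` and `missing_on_path` in a single INFO line and emit a WARN only for the ids in `warn`.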

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(preamble): #547 axis 1 — warn agents against systemctl restart untether

Agents routinely follow `Edit untether.toml` with `Bash systemctl --user
restart untether` because their training data is full of "restart the
service after config changes". Untether already hot-reloads the file;
the restart shuts down the very session issuing the command, drain hits
the 120s timeout, and the agent's final answer to the user is silently
dropped via outbox.fail_pending.

Add a dedicated "Configuration changes (`untether.toml`)" section to
the default preamble explicitly telling agents NOT to restart after
editing config, with the consequence spelled out and the restart-only
key list (`bot_token`, `chat_id`, `session_mode`, `topics`,
`message_overflow`) provided as the genuine exception.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(telegram): #523 — recognise leading-dot typos for slash commands

`.new`, `.cancel`, etc. previously dispatched as fresh agent prompts, so the
full Claude cold-start cost (OAuth handshake, MCP catalog probe,
preamble injection) was paid before the user could cancel. `.` and `/` are
adjacent on iOS/Android punctuation rows, and several mobile keyboards
auto-replace a leading `/` with `.` on autocorrect.

Add `parse_dot_typo` helper that recognises `.<cmd>` and `.<cmd> args`
shapes where <cmd> matches a registered slash command (case-insensitive,
ellipsis/path-prefix safe). Wired into route_message in a follow-up
commit so detection happens before agent dispatch.
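
One way such a parser might look. The name `parse_dot_typo` is from the commit; the signature, the regular expression, and the return shape are assumptions made for this sketch.

```python
from __future__ import annotations

import re

# `.cmd` or `.cmd args`. The command must start with a letter, so "..." and
# "./script" never match, and a trailing ".txt" prevents a match as well.
_DOT_CMD_RE = re.compile(r"^\.([A-Za-z][A-Za-z0-9_]*)(?:\s+(.*))?$", re.DOTALL)


def parse_dot_typo(text: str, registered: set[str]) -> tuple[str, str] | None:
    """Return (command, args) when text looks like a mistyped slash command."""
    match = _DOT_CMD_RE.match(text.strip())
    if match is None:
        return None
    command = match.group(1).lower()  # case-insensitive lookup
    if command not in registered:
        return None  # e.g. ".gitignore is missing" stays an ordinary prompt
    return command, (match.group(2) or "").strip()
```

Requiring a match against the registered command set keeps ordinary messages that merely start with a dot on the normal agent path.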

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(outbox): #524 — surface skipped outbox entries to the user

When an agent writes a directory (e.g. `guides/`) into `.untether-outbox/`,
the scanner logs `outbox.skipped` and drops it without surfacing
anything to the user. The agent's "I've prepared the guides folder for
you" final message becomes a silent lie.

Wire the skipped tuples through to a new `_format_outbox_skipped_notice`
helper in runner_bridge.py (added in the #547 axis 1 commit alongside
the preamble update) that composes a brief 📎 Outbox skipped block and
sends it as a follow-up message in the same chat. Gated by new
`[transports.telegram.files].outbox_notify_skipped` config flag (default
true so the surface fires automatically on upgrade).
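
A sketch of the formatting step, assuming each skipped tuple is `(name, reason)`. The helper name and the config flag are from the commit; the tuple shape, the bullet layout, and the keyword argument are hypothetical.

```python
from __future__ import annotations


def format_outbox_skipped_notice(
    skipped: list[tuple[str, str]], *, notify_skipped: bool = True
) -> str | None:
    """Compose the follow-up chat message, or None when nothing should be sent."""
    if not notify_skipped or not skipped:
        return None
    lines = ["📎 Outbox skipped"]
    lines.extend(f"• {name} ({reason})" for name, reason in skipped)
    return "\n".join(lines)
```

Returning `None` rather than an empty string lets the caller skip the send entirely when the flag is off or nothing was skipped.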

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(stall): #526 — demote stall WARN to INFO during approval-pending

The runner_bridge.py change shipped in the #547 axis 1 commit (same
file edited for both fixes); this commit adds the regression tests.

When threshold_reason == "pending_approval", emit a paced
`subprocess.approval_pending` INFO (max once per 30 min) instead of the
`progress_edits.stall_detected` WARN. The chat-side "⏳ Awaiting your
approval (N min)" message (#494-C) is unchanged; only the log-side
WARN is suppressed, so warn-filter dashboards and the
untether-issue-watcher daemon stop spamming on legitimate approval
waits. Also closes #533 as a duplicate (daemon-filed
subprocess.liveness_stall on nsd — same root cause).

Tests in test_exec_bridge.py assert:
- progress_edits.stall_detected WARN is NOT emitted when
  approval_pending is true
- subprocess.approval_pending INFO IS emitted with approval_pending=True
- The INFO fires at most once per 30-min window even with rapid ticks
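
The pacing asserted by these tests could be sketched as below. The event names and the 30-minute window are from the commit; the class, its method, and the idea of returning the event name as a string are assumptions.

```python
from __future__ import annotations

APPROVAL_PENDING_REFIRE_S = 30 * 60  # at most one INFO per 30 minutes


class ApprovalPendingPacer:
    """Decide which log event, if any, a stall tick should emit."""

    def __init__(self, refire_s: float = APPROVAL_PENDING_REFIRE_S) -> None:
        self._refire_s = refire_s
        self._last_emit: float | None = None

    def classify(self, threshold_reason: str, now: float) -> str | None:
        if threshold_reason != "pending_approval":
            return "progress_edits.stall_detected"  # WARN path is unchanged
        if self._last_emit is not None and now - self._last_emit < self._refire_s:
            return None  # still inside the window: stay quiet
        self._last_emit = now
        return "subprocess.approval_pending"  # paced INFO
```

Rapid ticks inside one window return `None`, so dashboards see one INFO per window instead of a WARN per tick.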

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(reload): #547 axes 2+3 + #548 — Telegram notification after config.reload

The loop.py wiring (`_notify_reload_applied` + handle_reload changes)
shipped in the #528 commit (same file edited for both fixes); this
commit adds the formatting module, tests, and FAQ entry.

New module `src/untether/config_reload_notification.py` exposes three
message shapes (hot-reload-only / restart-required / partial-reload)
with literal "**No restart needed.**" / "**Restart required**"
headlines — agents read these messages in next-turn context and the
framing flips the trained-in reflex to `systemctl restart` after
editing config.
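
A hedged sketch of how the three shapes might be chosen. The two headlines and the restart-only key list (`bot_token`, `chat_id`, `session_mode`, `topics`, `message_overflow`) appear in this PR; the function name, the body wording, and the choice to lead a partial reload with the restart headline are assumptions.

```python
from __future__ import annotations

# Restart-only keys as listed in the #547 axis 1 commit.
RESTART_ONLY_KEYS = frozenset(
    {"bot_token", "chat_id", "session_mode", "topics", "message_overflow"}
)


def format_reload_notification(changed_keys: list[str]) -> str | None:
    """Pick hot-reload-only, restart-required, or partial-reload wording."""
    changed = set(changed_keys)
    if not changed:
        return None
    restart = sorted(changed & RESTART_ONLY_KEYS)
    hot = sorted(changed - RESTART_ONLY_KEYS)
    if not restart:
        return "**No restart needed.** Applied: " + ", ".join(hot)
    if not hot:
        return "**Restart required** for: " + ", ".join(restart)
    # Partial reload: some keys applied live, others wait for a restart.
    return (
        "**Restart required** for: " + ", ".join(restart)
        + "\nApplied without restart: " + ", ".join(hot)
    )
```

Putting the verdict in the first words matters here because the message is read by agents in next-turn context as well as by people.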

Broadcast follows the same project-chat + admin-DM fan-out pattern as
`_notify_restart_required` (#318) so the affirmation reaches whoever
edited the file even in project-routed deployments.

FAQ entry "Do I need to restart Untether after editing untether.toml?"
documents the hot-reload behaviour, restart-only key list, and the
agent-don't-restart guidance from #547 axis 1.

Axis 3 (drain self-restart heuristic) is deferred to v0.35.4: the obvious
"detect the active session ran systemctl" heuristic is fragile and
yields false positives on legitimate sibling-unit restarts.
The robust path needs a bigger refactor. Axes 1+2 together break the
recurring pattern at its source.

Closes #548.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(telegram): #546 — bypass outbox for answer_callback_query

Rapid taps (e.g. approving plans in two chats inside ~2s) saw
callback-answer latency escalate 6-10x: 1st click ~220ms HTTP baseline,
2nd/3rd clicks 1.4-2.9s. Root cause was that `answer_callback_query`
went through `enqueue_op(chat_id=None)` and stacked on the shared
`_next_at[None]` per-chat pacing bucket (private_interval=1.0s) even
though Telegram doesn't rate-limit callback-answers per chat —
they're keyed off callback-query-id.

Route `answer_callback_query` directly through `self._client.answer_callback_query`,
bypassing the outbox semaphore + per-chat pacing. Retry-after handling
preserved (one retry on TelegramRetryAfter then fail-fast — better than
silent retry loops since the spinner expires after 30s anyway). Add
`queue_wait_ms=0.0` field to `callback.answered` instrumentation so
monitoring dashboards can confirm the bypass survived future refactors.

Regression test asserts the outbox.enqueue path is never reached during
answer_callback_query.
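
The "one retry, then fail fast" shape could look roughly like this. `RetryAfter` is a stand-in for the transport's `TelegramRetryAfter` error, and the client interface, function name, and injectable `sleep` are hypothetical.

```python
import asyncio


class RetryAfter(Exception):
    """Stand-in for the transport's retry-after error (hypothetical)."""

    def __init__(self, retry_after: float) -> None:
        super().__init__(f"retry after {retry_after}s")
        self.retry_after = retry_after


async def answer_callback_direct(client, callback_query_id, text=None, *, sleep=asyncio.sleep):
    """Call the client directly (no outbox, no per-chat pacing)."""
    try:
        return await client.answer_callback_query(callback_query_id, text)
    except RetryAfter as exc:
        # Honour the server's back-off once; a second failure propagates.
        await sleep(exc.retry_after)
        return await client.answer_callback_query(callback_query_id, text)
```

A second `RetryAfter` is deliberately not caught, which matches the fail-fast reasoning above: the spinner expires on its own, so a long retry loop buys nothing.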

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: staging 0.35.3rc19 — /monitor campaign issue sweep

Bundles 9 issues from the recent /monitor campaign (lba-1 staging,
2026-05-13 through 2026-05-16 runs) + the daemon-filed #533 dup:

- #528 — Answered: echo no longer truncated to 100 chars
- #525 — dedup cancel.requested triple-fire
- #532 — consolidate per-engine setup.warning to one summary
- #547 axis 1 — preamble warns agents against systemctl restart
- #523 — recognise leading-dot slash-command typos (`.new` etc.)
- #524 — surface skipped outbox entries (directories etc.) in chat
- #526 — demote stall WARN to INFO during approval-pending (+ #533)
- #547 axes 2+3 + #548 — hot-reload Telegram notification
- #546 — bypass outbox for answer_callback_query (latency fix)

Plus housekeeping: closed #544 / #497 (verified already fixed in
rc16 / rc14), closed #531 (label + monitor TOML config drift fixed
out-of-tree). Axis 3 of #547 (drain self-restart heuristic) deferred
to v0.35.4 — needs a bigger refactor than rc19 wants. #527 (umbrella
predicate refactor) deferred to v0.35.4 per user decision.

2737 tests pass / 82.58% coverage / ruff format + check clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
)

Both issues shipped rc19 fixes (PR #555) but /monitor audits on
2026-05-18 showed each regression still firing in production because
the rc19 patch landed in only one of two code paths.

#524 — outbox silently drops directory entries

rc19 surfaced 📎 Outbox skipped notices on the normal-completion path
in handle_message but missed two adjacent paths: the pre-auto-continue
delivery (subprocess 1 stuck-after-tool-result recovery) and the
run_ok=False failed-run branch. Both still silently dropped the agent's
intended deliverable.

This commit extracts the surfacing logic into _surface_outbox_skipped
in runner_bridge.py and wires it into both gap paths. On a failed run
the code still skips the actual file send (preserving the original
gating) but does a cheap scan_outbox() to collect skipped items and
surface them, so the user always learns what the agent intended to
ship. Honours the existing outbox_notify_skipped config flag and
filters the "..." overflow pseudo-entry from the user-facing block.
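
A sketch of the gating described above, assuming the scan has already produced a list of deliverable files and a list of `(name, reason)` skipped tuples. The `"..."` pseudo-entry and the `outbox_notify_skipped` flag are from the commit; the function name and argument shapes are hypothetical.

```python
from __future__ import annotations

OVERFLOW_MARKER = "..."  # pseudo-entry named in the commit; tuple shape is assumed


def plan_outbox_delivery(
    files: list[str],
    skipped: list[tuple[str, str]],
    *,
    run_ok: bool,
    notify_skipped: bool = True,
) -> tuple[list[str], list[tuple[str, str]]]:
    """Return (files to send, skipped entries to report to the user)."""
    # A failed run still sends no files, preserving the original gating.
    to_send = list(files) if run_ok else []
    if not notify_skipped:
        return to_send, []
    # The overflow marker is bookkeeping, not something the agent wrote.
    to_report = [(name, reason) for name, reason in skipped if name != OVERFLOW_MARKER]
    return to_send, to_report
```

Separating "what to send" from "what to report" is what allows the failed-run branch to stay silent about files while still telling the user what was skipped.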

#526 — approval-pending stalls misclassified

rc19 demoted the bridge-side WARN (progress_edits.stall_detected) to a
paced INFO (subprocess.approval_pending) when _has_pending_approval()
returned true. The watchdog-side detector in runner.py (which emits
subprocess.liveness_stall and is the actual signal untether-issue-watcher
auto-files on) was untouched, so the daemon kept filing GitHub issues
on routine approval-pending sessions. The nsd audit (2026-05-18) also
showed a user cancelling a productive 15-minute investigation because
the chat-side reassurance came too late (1800s threshold).

This commit:
- Adds _recent_event_is_control_request helper in runner.py — uses the
  stream's recent_events ring buffer as the approval-pending signal,
  consistent with the bridge's inline-keyboard predicate but accessible
  to runner-scope code.
- Plumbs the predicate into _watchdog_loop: when the last JSONL event
  is control_request, emit subprocess.approval_pending INFO instead of
  liveness_stall WARN. Skip the auto-kill branch entirely. Pace INFO
  emission to once per 30 min via the shared _APPROVAL_PENDING_REFIRE_S
  constant (now defined once in runner.py and imported by the bridge).
- Splits _STALL_THRESHOLD_APPROVAL into _STALL_THRESHOLD_APPROVAL_FIRST
  (600s) and the existing 1800s refire so the user gets a reassuring
  "tap a button above" chat message at 10 min on first occurrence,
  matching the watchdog's liveness threshold and avoiding the nsd-style
  early cancellation.
- Rewords the chat-side approval reminder copy to make the "tap a
  button above to proceed (no action needed otherwise)" affordance
  explicit, directly quoting the audit's recommended text.
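
The predicate and the two-tier threshold above can be sketched as follows. The 600s and 1800s figures, the `control_request` event type, and the event names are from the commit; the event shape (a dict with a `"type"` key), the function names, and the string return values are assumptions.

```python
from __future__ import annotations

from collections import deque

STALL_THRESHOLD_APPROVAL_FIRST_S = 600  # first chat reminder, per the commit
APPROVAL_PENDING_REFIRE_S = 1800  # later reminders and INFO pacing


def recent_event_is_control_request(recent_events) -> bool:
    """True when the newest event in the ring buffer is a control_request."""
    if not recent_events:
        return False
    return recent_events[-1].get("type") == "control_request"


def approval_reminder_threshold(reminders_sent: int) -> int:
    """First reminder at 10 minutes, later ones every 30 minutes."""
    if reminders_sent == 0:
        return STALL_THRESHOLD_APPROVAL_FIRST_S
    return APPROVAL_PENDING_REFIRE_S


def watchdog_action(recent_events, idle_s: float, liveness_threshold_s: float = 600) -> str:
    """Choose between staying quiet, the paced INFO, and the stall WARN."""
    if idle_s < liveness_threshold_s:
        return "ok"
    if recent_event_is_control_request(recent_events):
        return "subprocess.approval_pending"  # INFO; auto-kill is skipped
    return "subprocess.liveness_stall"  # WARN
```

A `deque(maxlen=N)` is one simple way to model the ring buffer: appends discard the oldest event and `[-1]` is always the latest.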

Tests cover both code paths:
- tests/test_outbox_delivery.py (existing) — format helper + settings
  default unchanged; no new file-level tests needed.
- tests/test_exec_bridge.py — failed-run surfacing, notify_skipped=false
  suppression, only-overflow filter, two-tier first-reminder threshold,
  reworded copy.
- tests/test_exec_runner.py — predicate truth-table coverage, watchdog
  demotion via integration with a fake codex script emitting
  control_request, watchdog WARN still fires when no control_request is recent.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Bumps [github/codeql-action](https://github.com/github/codeql-action) from 4.35.4 to 4.35.5.
- [Release notes](https://github.com/github/codeql-action/releases)
- [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md)
- [Commits](github/codeql-action@68bde55...9e0d7b8)

---
updated-dependencies:
- dependency-name: github/codeql-action
  dependency-version: 4.35.5
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* release: v0.35.3

Prepares v0.35.3 for PyPI publish — bumps version from 0.35.3rc20, dates
the CHANGELOG, and lands release-prep doc updates.

Milestone v0.35.3: 58 closed, 0 open. The 9 issues left open from rc19
(#523, #524, #525, #526, #528, #532, #546, #547, #548) are all closed
against rc19/rc20 fixes; #547 axis 3 (drain-timeout heuristic) is
deferred to v0.35.4 (tracked as #559).

CHANGELOG.md:
- New rc20 entry covers #524 + #526 follow-up patches that extended the
  rc19 surfacing logic to all 3 completion paths and added the
  runner.py-side approval-pending detector.
- New rc19 entry bundles the 7 other /monitor-campaign fixes (#523,
  #525, #528, #532, #546, #547 axes 1+2, #548).
- Added [#N] issue refs to the rc18/rc19/rc20 parent bullets so the
  release validator's per-entry issue-link gate passes on the stable
  bump.

README.md: +1 feature bullet calling out hot-reload configuration +
env_extra_allow / env_extra_prefix_allow.

docs/faq/faq.md: +1 Q/A on outbox file delivery (hot-reload Q already
present from earlier work).

docs/how-to/interactive-approval.md: +1 paragraph on the /config →
Diff preview toggle.

pyproject.toml + uv.lock: version 0.35.3rc20 → 0.35.3.

Pre-commit validation:
- ruff format --check src/ tests/        clean (283 files)
- ruff check src/ tests/                 all checks passed
- uv lock --check                        clean
- python3 scripts/validate_release.py    3 passed, 0 failed
- FAQ shape: 15 question-shaped H2s (≥7 required)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* deps: bump idna 3.11 → 3.15 to clear CVE-2026-45409

pip-audit on PR #560 flagged CVE-2026-45409 (idna < 3.15). idna is a
transitive dep through httpx → httpcore → idna. uv lock --upgrade-package
idna picks up 3.15; local pip-audit run is now clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented May 20, 2026

Important

Review skipped

Too many files!

This PR contains 166 files, which is 16 over the limit of 150.

To get a review, narrow the scope:
• coderabbit review --type committed # exclude uncommitted changes
• coderabbit review --dir # limit to a subdirectory
• coderabbit review --base # compare against a closer base

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 3106b261-ac4b-4cbb-87f6-a4b78b36f468

📥 Commits

Reviewing files that changed from the base of the PR and between c8d2a7e and 540ca68.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (166)
  • .claude/hooks.json
  • .claude/hooks/help-faq-protect.sh
  • .claude/hooks/release-guard-protect.sh
  • .claude/rules/dev-workflow.md
  • .claude/rules/help-faq.md
  • .claude/rules/release-discipline.md
  • .claude/skills/release-coordination/SKILL.md
  • .github/workflows/ci.yml
  • .github/workflows/codeql.yml
  • .github/workflows/dependabot-auto-merge.yml
  • .github/workflows/prerelease-deps.yml
  • .github/workflows/release.yml
  • .gitignore
  • CHANGELOG.md
  • CLAUDE.md
  • README.md
  • SECURITY.md
  • contrib/com.littlebearapps.untether-issue-watcher.plist
  • contrib/untether-issue-watcher.service
  • docs/audits/2026-04-27-380-auto-approve-scope-review.md
  • docs/explanation/architecture.md
  • docs/explanation/module-map.md
  • docs/explanation/routing-and-sessions.md
  • docs/faq/faq.md
  • docs/how-to/cost-budgets.md
  • docs/how-to/file-transfer.md
  • docs/how-to/group-chat.md
  • docs/how-to/inline-settings.md
  • docs/how-to/interactive-approval.md
  • docs/how-to/operations.md
  • docs/how-to/schedule-tasks.md
  • docs/how-to/security.md
  • docs/how-to/subscription-usage.md
  • docs/how-to/troubleshooting.md
  • docs/how-to/verbose-progress.md
  • docs/how-to/webhooks-and-cron.md
  • docs/reference/commands-and-directives.md
  • docs/reference/config.md
  • docs/reference/dev-instance.md
  • docs/reference/env-vars.md
  • docs/reference/integration-testing.md
  • docs/reference/runners/amp/runner.md
  • docs/reference/runners/amp/untether-events.md
  • docs/reference/runners/claude/runner.md
  • docs/reference/runners/claude/stream-json-cheatsheet.md
  • docs/reference/runners/claude/untether-events.md
  • docs/reference/runners/gemini/runner.md
  • docs/reference/runners/pi/runner.md
  • docs/reference/specification.md
  • docs/reference/transports/telegram.md
  • docs/reference/triggers/triggers.md
  • docs/tutorials/install.md
  • llms-full.txt
  • pyproject.toml
  • scripts/fleet-rollback.sh
  • scripts/fleet-rollout.sh
  • scripts/run-integration-tests.sh
  • src/untether/config_migrations.py
  • src/untether/config_reload_notification.py
  • src/untether/cost_tracker.py
  • src/untether/logging.py
  • src/untether/loop_scheduler.py
  • src/untether/markdown.py
  • src/untether/presenter.py
  • src/untether/progress.py
  • src/untether/runner.py
  • src/untether/runner_bridge.py
  • src/untether/runners/amp.py
  • src/untether/runners/claude.py
  • src/untether/runners/gemini.py
  • src/untether/runners/pi.py
  • src/untether/runners/run_options.py
  • src/untether/runtime_loader.py
  • src/untether/schemas/claude.py
  • src/untether/session_stats.py
  • src/untether/settings.py
  • src/untether/telegram/at_scheduler.py
  • src/untether/telegram/backend.py
  • src/untether/telegram/bridge.py
  • src/untether/telegram/chat_prefs.py
  • src/untether/telegram/client.py
  • src/untether/telegram/commands/ask_question.py
  • src/untether/telegram/commands/cancel.py
  • src/untether/telegram/commands/config.py
  • src/untether/telegram/commands/dispatch.py
  • src/untether/telegram/commands/file_transfer.py
  • src/untether/telegram/commands/handlers.py
  • src/untether/telegram/commands/listen.py
  • src/untether/telegram/commands/media.py
  • src/untether/telegram/commands/menu.py
  • src/untether/telegram/commands/parse.py
  • src/untether/telegram/commands/ping.py
  • src/untether/telegram/commands/stats.py
  • src/untether/telegram/commands/topics.py
  • src/untether/telegram/commands/usage.py
  • src/untether/telegram/engine_overrides.py
  • src/untether/telegram/listen_mode.py
  • src/untether/telegram/loop.py
  • src/untether/telegram/onboarding.py
  • src/untether/telegram/topic_state.py
  • src/untether/triggers/cron.py
  • src/untether/triggers/dispatcher.py
  • src/untether/triggers/history.py
  • src/untether/triggers/manager.py
  • src/untether/triggers/server.py
  • src/untether/utils/env_audit.py
  • src/untether/utils/env_policy.py
  • src/untether/utils/usage_cache.py
  • tests/conftest.py
  • tests/test_amp_runner.py
  • tests/test_ask_user_question.py
  • tests/test_at_command.py
  • tests/test_bridge_config_reload.py
  • tests/test_build_args.py
  • tests/test_cancel_dedup.py
  • tests/test_claude_control.py
  • tests/test_claude_runner.py
  • tests/test_claude_schema.py
  • tests/test_cli_auto_router.py
  • tests/test_cli_chat_id.py
  • tests/test_cli_commands.py
  • tests/test_cli_config.py
  • tests/test_cli_doctor.py
  • tests/test_cli_helpers.py
  • tests/test_command_engine_gates.py
  • tests/test_config_command.py
  • tests/test_config_path_env.py
  • tests/test_config_reload_notification.py
  • tests/test_config_watch.py
  • tests/test_cost_tracker.py
  • tests/test_dot_typo_parse.py
  • tests/test_env_audit.py
  • tests/test_env_policy.py
  • tests/test_exec_bridge.py
  • tests/test_exec_runner.py
  • tests/test_logging_redaction.py
  • tests/test_loop_coverage.py
  • tests/test_loop_scheduler.py
  • tests/test_meta_line.py
  • tests/test_onboarding.py
  • tests/test_onboarding_interactive.py
  • tests/test_outbox_delivery.py
  • tests/test_pi_runner.py
  • tests/test_ping_command.py
  • tests/test_preamble.py
  • tests/test_projects_config.py
  • tests/test_runner_utils.py
  • tests/test_runtime_loader.py
  • tests/test_session_stats.py
  • tests/test_settings.py
  • tests/test_settings_contract.py
  • tests/test_stats_command.py
  • tests/test_telegram_agent_trigger_commands.py
  • tests/test_telegram_backend.py
  • tests/test_telegram_engine_overrides.py
  • tests/test_telegram_file_transfer_helpers.py
  • tests/test_telegram_media_command.py
  • tests/test_telegram_queue.py
  • tests/test_telegram_trigger_mode.py
  • tests/test_trigger_auth.py
  • tests/test_trigger_cron.py
  • tests/test_trigger_dispatcher.py
  • tests/test_trigger_manager.py
  • tests/test_triggers_history.py
  • tests/test_usage_cache.py
  • tests/test_verbose_progress.py

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


…xt (#563)

Media-group (2+ file) uploads failed with "no project context available
for file upload" on single-project DM deployments. _handle_media_group
computed ambient_context from only topic_store + _topics_chat_project,
never consulting the per-chat ChatPrefsStore — unlike the single-file
path (loop.py:build_message_context), which has a topic-bound →
chat_prefs → default fallback ladder.

Thread chat_prefs through MediaGroupBuffer → _handle_media_group and
mirror build_message_context's fallback ladder. Add a regression test.
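
The fallback ladder mentioned above (topic-bound, then chat_prefs, then default) reduces to a first-match lookup. This sketch uses a hypothetical function name and plain optional strings in place of the real stores.

```python
from __future__ import annotations


def resolve_project(
    topic_project: str | None,
    chat_pref_project: str | None,
    default_project: str | None,
) -> str | None:
    """Return the first configured project: topic, then chat prefs, then default."""
    for candidate in (topic_project, chat_pref_project, default_project):
        if candidate:
            return candidate
    return None  # caller reports "no project context available"
```

Sharing one resolver between the single-file and media-group paths would keep the two from drifting apart again, though whether the fix does that is not stated here.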

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@socket-security

socket-security Bot commented May 21, 2026

Review the following changes in direct dependencies. Learn more about Socket for GitHub.

| Diff | Package | Supply Chain Security | Vulnerability | Quality | Maintenance | License |
|---|---|---|---|---|---|---|
| Added | msgspec@0.20.0 | 100 | 100 | 100 | 100 | 100 |


Adds a v0.35.3 ### fixes entry for the media-group file-upload bug
fixed in #563. v0.35.3 is unreleased (not yet tagged), so this is the
current changelog section, not a retroactive edit.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>