feat(typing): Add TTL safety mechanism to prevent stuck typing indica…#27428

Closed
Crpdim wants to merge 1 commit into openclaw:main from Crpdim:feat/typing-ttl-safety

@Crpdim Crpdim commented Feb 26, 2026

Summary

  • Problem: Typing indicators in Discord/Telegram get stuck indefinitely when the session lifecycle doesn't trigger cleanup (e.g., NO_REPLY, rate limit errors, subagent timeouts)
  • Why it matters: Degraded UX: users see "typing..." forever with no response, which makes the assistant appear broken
  • What changed: Added a maxDurationMs parameter to createTypingCallbacks() with a 60s default TTL; typing auto-stops if cleanup is not called within that window
  • What did NOT change: No changes to typing start logic, keepalive intervals, or channel-specific implementations; purely additive safety mechanism
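
To make the summary concrete, here is a minimal sketch of how such a TTL guard could be wired into the typing callbacks. It is not the PR's actual code: the option names `start` and `stop`, the `onReplyStart` callback, and the internal `stopTyping` helper are assumptions for illustration. Only `createTypingCallbacks`, `maxDurationMs`, `onIdle`, `onCleanup`, `startTtlTimer`, `clearTtlTimer`, and the warning text come from this PR.

```typescript
// Hypothetical sketch; the real signature in src/channels/typing.ts may differ.
type TypingCallbackOptions = {
  start: () => void; // assumed: asks the channel to show "typing..."
  stop: () => void; // assumed: asks the channel to clear the indicator
  maxDurationMs?: number; // TTL in ms; 0 disables the guard
};

const DEFAULT_MAX_DURATION_MS = 60_000;

function createTypingCallbacks(options: TypingCallbackOptions) {
  const maxDurationMs = options.maxDurationMs ?? DEFAULT_MAX_DURATION_MS;
  let ttlTimer: ReturnType<typeof setTimeout> | undefined;
  let active = false;
  let closed = false;

  const clearTtlTimer = (): void => {
    if (ttlTimer !== undefined) {
      clearTimeout(ttlTimer);
      ttlTimer = undefined;
    }
  };

  // Stops typing at most once per start and always cancels the TTL timer.
  const stopTyping = (): void => {
    clearTtlTimer();
    if (!active) return;
    active = false;
    options.stop();
  };

  const startTtlTimer = (): void => {
    clearTtlTimer();
    if (maxDurationMs <= 0) return; // guard disabled
    ttlTimer = setTimeout(() => {
      console.warn(
        `[typing] TTL exceeded (${maxDurationMs}ms), auto-stopping typing indicator`,
      );
      stopTyping();
    }, maxDurationMs);
  };

  return {
    // Assumed entry point that begins typing and arms the TTL.
    onReplyStart(): void {
      if (closed) return; // no restart after cleanup
      active = true;
      options.start();
      startTtlTimer();
    },
    onIdle(): void {
      stopTyping();
    },
    onCleanup(): void {
      closed = true;
      stopTyping();
    },
  };
}

// Usage: the normal path cancels the TTL timer before it can fire.
const typingDemo = createTypingCallbacks({
  start: () => console.log("typing started"),
  stop: () => console.log("typing stopped"),
  maxDurationMs: 30_000,
});
typingDemo.onReplyStart();
typingDemo.onIdle();
```

In this sketch the TTL path and the explicit paths share one stop routine, so whichever runs first wins and the other becomes a no-op.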

Change Type (select all)

  • Bug fix (defense-in-depth)
  • Feature (new TTL safety parameter)
  • Refactor
  • Docs
  • Security hardening (resource leak prevention)
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration (typing lifecycle)
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations (affects all channels using typing indicators)
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

User-visible / Behavior Changes

Before:

  • Typing indicator could persist indefinitely (>60s) if session cleanup failed
  • Required manual Gateway restart to clear stuck typing

After:

  • Typing auto-stops after 60s (configurable via maxDurationMs)
  • Console warning logged when TTL triggers: [typing] TTL exceeded (60000ms), auto-stopping typing indicator
  • Existing behavior unchanged for normal flows (cleanup triggers before TTL)

Security Impact (required)

  • New permissions/capabilities? (No)
  • Secrets/tokens handling changed? (No)
  • New/changed network calls? (No)
  • Command/tool execution surface changed? (No)
  • Data access scope changed? (No)

Repro + Verification

Environment

  • OS: Linux/macOS
  • Runtime/container: Node.js 22+
  • Model/provider: N/A (typing lifecycle issue)
  • Integration/channel: Discord, Telegram (affects all channels)
  • Relevant config: Default config

Steps

  1. Start OpenClaw Gateway with Discord channel enabled
  2. Send message that triggers subagent spawn (sessions_spawn)
  3. Main session replies but subagent continues running internally
  4. Observe typing indicator persists after reply (before fix)

Expected

  • Typing stops within 60s even if internal processes hang

Actual

  • ✅ With fix: Typing stops at 60s, warning logged
  • ✅ Without fix: Typing persists until subagent completes (can be minutes)

Evidence

  • Failing test/log before + passing after
    • Added 6 new TTL-specific tests in typing.test.ts
    • All 16 tests pass (10 existing + 6 new)
    • Test: "auto-stops typing after maxDurationMs" validates the fix
  • Trace/log snippets
  • Screenshot/recording
  • Perf numbers (if relevant)

Test output:

✓ auto-stops typing after maxDurationMs
✓ does not auto-stop if idle is called before TTL
✓ uses default 60s TTL when not specified
✓ disables TTL when maxDurationMs is 0
✓ resets TTL timer on restart after idle
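
The test file itself is not shown in this thread, so the following is only an illustration of how the first two cases above can be checked without waiting 60 real seconds. `FakeClock` and `startTypingWithTtl` are hypothetical stand-ins: the first mimics what a test framework's fake timers do, the second is a stripped-down TTL guard that takes the clock as a parameter.

```typescript
// Hypothetical stand-in for a test framework's fake timers.
class FakeClock {
  private now = 0;
  private nextId = 1;
  private timers: Array<{ id: number; at: number; fn: () => void }> = [];

  setTimeout(fn: () => void, ms: number): number {
    const id = this.nextId++;
    this.timers.push({ id, at: this.now + ms, fn });
    return id;
  }

  clearTimeout(id: number): void {
    this.timers = this.timers.filter((t) => t.id !== id);
  }

  // Advance virtual time, running every timer that comes due, earliest first.
  tick(ms: number): void {
    const target = this.now + ms;
    for (;;) {
      const due = this.timers
        .filter((t) => t.at <= target)
        .sort((a, b) => a.at - b.at)[0];
      if (!due) break;
      this.clearTimeout(due.id);
      this.now = due.at;
      due.fn();
    }
    this.now = target;
  }
}

// Assumed minimal guard: stop typing when the TTL elapses or when idle is reported.
function startTypingWithTtl(clock: FakeClock, maxDurationMs: number, stop: () => void) {
  const id = clock.setTimeout(stop, maxDurationMs);
  return {
    onIdle(): void {
      clock.clearTimeout(id); // cancel the TTL before stopping explicitly
      stop();
    },
  };
}

// Mirrors "auto-stops typing after maxDurationMs".
function autoStopsAfterTtl(): [number, number] {
  const clock = new FakeClock();
  let stops = 0;
  startTypingWithTtl(clock, 60_000, () => { stops += 1; });
  clock.tick(59_999);
  const before = stops;
  clock.tick(1);
  return [before, stops];
}

// Mirrors "does not auto-stop if idle is called before TTL".
function idleCancelsTtl(): number {
  const clock = new FakeClock();
  let stops = 0;
  const typing = startTypingWithTtl(clock, 60_000, () => { stops += 1; });
  clock.tick(10_000);
  typing.onIdle();
  clock.tick(60_000);
  return stops;
}

console.log(autoStopsAfterTtl()); // 0 stops just before the TTL, 1 once it elapses
console.log(idleCancelsTtl()); // 1 stop in total, from onIdle only
```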

Human Verification (required)

Verified scenarios:

  • TTL triggers correctly after 60s when no cleanup called
  • Early cleanup (onIdle/onCleanup) cancels TTL timer
  • Default 60s TTL applies when parameter not specified
  • TTL disabled when maxDurationMs=0
  • Existing keepalive behavior unchanged (3s interval)

Edge cases checked:

  • Multiple start/stop cycles don't leak timers
  • Closed state prevents restart after cleanup
  • Error in stop() callback doesn't prevent TTL cleanup

What did NOT verify:

  • Actual Discord/Telegram API behavior (tested via unit tests only)
  • Performance impact under high load (timer overhead is expected to be minimal, but was not measured)

Compatibility / Migration

  • Backward compatible? (Yes)
  • Config/env changes? (No)
  • Migration needed? (No)

Existing code without maxDurationMs automatically gets 60s protection. No changes required.

Failure Recovery (if this breaks)

  • How to disable: Set maxDurationMs: 0 in typing callback creation
  • Files to restore: Revert src/channels/typing.ts and src/channels/typing.test.ts
  • Bad symptoms: Typing stops too early (<60s) or warnings spammed (check TTL timer logic)
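
As a rough illustration of the disable switch, the resolution of `maxDurationMs` might look like the sketch below. `resolveTypingTtl` is a hypothetical helper, not a function from the PR, and treating negative values as "disabled" is an assumption; the PR only documents the default of 60s and the meaning of 0.

```typescript
const DEFAULT_MAX_DURATION_MS = 60_000;

// Hypothetical helper: returns the TTL in ms, or undefined when the guard is off.
function resolveTypingTtl(maxDurationMs?: number): number | undefined {
  const value = maxDurationMs ?? DEFAULT_MAX_DURATION_MS;
  // Assumption: 0 (documented) and negative values (not documented) both disable the TTL.
  return value > 0 ? value : undefined;
}

console.log(resolveTypingTtl()); // 60000: callers that pass nothing get the default
console.log(resolveTypingTtl(0)); // undefined: guard disabled
console.log(resolveTypingTtl(120_000)); // 120000: custom TTL
```

Note that `??` only falls back on `undefined` or `null`, so an explicit `0` survives as the disable value instead of being replaced by the default.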

Risks and Mitigations

Risk: 60s default TTL might be too short for legitimate long-running operations

  • Mitigation: TTL only affects typing indicator, not the actual operation; user still receives response when ready. Can be customized per-channel if needed.

Risk: Warning logs could be noisy if TTL triggers frequently

  • Mitigation: Warning only fires when TTL actually saves the day (indicates underlying lifecycle bug); should be investigated, not ignored.

Risk: Timer leak if fireStop() throws

  • Mitigation: TTL timer cleared in finally-equivalent path; error in stop() caught and logged.
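
The last mitigation can be sketched as follows. This is an assumed shape, not the PR's code: `fireStop` is named in the risk above, but its parameters, the `TimerHolder` type, and the log text here are made up for illustration.

```typescript
// Hypothetical holder for the TTL timer handle.
type TimerHolder = { ttlTimer: ReturnType<typeof setTimeout> | undefined };

// Calls stop(), logs any error, and clears the TTL timer no matter what happened.
// Returns true when stop() completed without throwing.
function fireStop(holder: TimerHolder, stop: () => void): boolean {
  try {
    stop();
    return true;
  } catch (err) {
    // Illustrative message only; the PR's actual log text is not shown in this thread.
    console.warn("[typing] stop failed:", err);
    return false;
  } finally {
    if (holder.ttlTimer !== undefined) {
      clearTimeout(holder.ttlTimer);
      holder.ttlTimer = undefined;
    }
  }
}
```

Putting the timer cleanup in `finally` means a throwing `stop()` cannot leave a live timer behind, which is the leak this risk describes.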

AI-Assisted Development 🤖

This PR was vibe-coded with Claude (AI assistant).

  • Testing: Fully tested (6 new TTL tests + 10 existing tests all pass)
  • Code Review: Human-reviewed the logic before submission
  • Understanding: Yes, I understand the TTL mechanism and its integration with the typing lifecycle

greptile-apps bot commented Feb 26, 2026

Greptile Summary

Adds a TTL (Time-To-Live) safety mechanism to typing indicators that auto-stops them after 60 seconds if cleanup isn't triggered normally. This prevents typing indicators from getting stuck indefinitely when session lifecycle fails (e.g., during rate limits, subagent timeouts, or NO_REPLY scenarios).

Implementation:

  • Added maxDurationMs parameter to createTypingCallbacks() with 60s default
  • Timer starts when typing begins and auto-stops if not manually cleaned up
  • Timer properly cleared on normal cleanup paths (onIdle/onCleanup)
  • Disabled when maxDurationMs set to 0

Test Coverage:

  • 6 new comprehensive tests covering TTL triggers, early cleanup, default behavior, disabled mode, and timer lifecycle
  • All tests properly use fake timers and mock console warnings

Code Quality:

  • Clean timer management with proper cleanup in all paths
  • Defensive checks prevent race conditions
  • Single-threaded execution model eliminates concurrency issues
  • Follows repository TypeScript style guidelines

Confidence Score: 5/5

  • This PR is safe to merge with minimal risk
  • The implementation is well-designed with proper timer cleanup, comprehensive test coverage (6 new tests), no breaking changes, and addresses a real UX issue. The default 60s TTL is conservative enough to not interfere with normal operations while preventing indefinite stuck typing states. Code follows repository conventions and handles all edge cases correctly.
  • No files require special attention

Last reviewed commit: c820e91

…tors

Add maxDurationMs parameter (default 60s) to createTypingCallbacks() as a
defense-in-depth protection against typing indicators getting stuck indefinitely.

When typing is started, a TTL timer begins. If onIdle()/onCleanup() is not
called within maxDurationMs, the typing indicator is auto-stopped and a
warning is logged.

This addresses multiple reports of stuck typing indicators:
- Discord typing persists after delivery (openclaw#27033)
- Typing stuck on NO_REPLY (openclaw#27011)
- Discord typing stuck after send (openclaw#26891)
- Telegram typing stuck on rate limit (openclaw#27360)

Changes:
- Add maxDurationMs parameter with 60s default
- Add TTL timer management (startTtlTimer/clearTtlTimer)
- Auto-stop typing when TTL exceeded with warning log
- Add comprehensive TTL test suite (6 new tests)

The fix is backward compatible - existing code without maxDurationMs
automatically gets the 60s safety protection.

Fixes openclaw#27011
Fixes openclaw#27033
Fixes openclaw#26891
Related openclaw#27360 openclaw#27347
@Crpdim Crpdim force-pushed the feat/typing-ttl-safety branch from 06b3781 to c820e91 Compare February 26, 2026 10:33
@Crpdim Crpdim closed this Feb 26, 2026
@Crpdim Crpdim reopened this Feb 26, 2026
@Crpdim Crpdim closed this Feb 26, 2026
@Crpdim Crpdim reopened this Feb 26, 2026
steipete added a commit that referenced this pull request Feb 26, 2026
…hanks @Crpdim)

Co-authored-by: Crpdim <crpdim@users.noreply.github.com>
@steipete
Contributor

Landed on main via commit 0231cac95.

What was landed from this PR:

  • Added TTL/max-duration guard to shared typing callbacks to auto-stop stuck indicators when explicit lifecycle cleanup is missed.
  • Added comprehensive unit coverage for TTL stop behavior, restart behavior, and explicit-stop precedence.
  • Added changelog entry under 2026.2.26 (Unreleased) with attribution.

Original PR commit:

  • c820e91afaf14c66dc518ccbfaf08f395d71df1e

Validation run before landing (/landpr flow):

  • pnpm lint: pass
  • pnpm build: pass
  • pnpm test: pass

Thanks @Crpdim for the safety-net improvement.

@steipete steipete closed this Feb 26, 2026
This was referenced Feb 26, 2026
robbyczgw-cla pushed a commit to robbyczgw-cla/openclaw that referenced this pull request Feb 26, 2026
…27428, thanks @Crpdim)

Co-authored-by: Crpdim <crpdim@users.noreply.github.com>