fix(telegram): increase polling stall threshold from 90s to 300s #57737
Conversation
Greptile Summary: This PR increases the Telegram polling stall watchdog threshold from 90 seconds to 5 minutes (300 seconds) to prevent false-positive gateway restarts during legitimate long-running LLM message generation. It also adds an optional `stallThresholdMs` parameter so the threshold can be tuned later. The implementation is clean and correct.
The test suite remains thorough: stall fires when both …

Confidence Score: 5/5. Safe to merge: a targeted threshold increase with a clean configurability hook and fully passing test coverage. All changes are straightforward, correctly implemented, and well tested. The threshold increase directly addresses the described false-positive restart problem. No regressions are introduced, the API is additive (an optional parameter with a sensible default), and every stall-detection test scenario remains logically valid after the timestamp updates. No files require special attention.
| Filename | Overview |
|---|---|
| extensions/telegram/src/polling-session.ts | Threshold constant renamed and increased to 300s, stallThresholdMs option and #stallThresholdMs field added, watchdog uses instance variable throughout — all correct. |
| extensions/telegram/src/polling-session.test.ts | Mock timestamps updated from 120_001 to 310_001 to match the new 300s threshold; all test scenarios remain logically valid. |
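The change under review can be sketched as a minimal watchdog with a configurable threshold. This is an illustrative sketch only; the class, option, and method names are assumptions, not the exact code in `extensions/telegram/src/polling-session.ts`:

```typescript
// Minimal sketch of a stall watchdog with a configurable threshold.
// DEFAULT_STALL_THRESHOLD_MS mirrors the new 300s default; all names
// here are hypothetical, for illustration only.
const DEFAULT_STALL_THRESHOLD_MS = 300_000;

interface PollingSessionOptions {
  stallThresholdMs?: number; // optional override; falls back to the default
}

class StallWatchdog {
  #stallThresholdMs: number;
  #lastActivityMs: number;

  constructor(opts: PollingSessionOptions = {}, nowMs: number = Date.now()) {
    this.#stallThresholdMs = opts.stallThresholdMs ?? DEFAULT_STALL_THRESHOLD_MS;
    this.#lastActivityMs = nowMs;
  }

  // Call on every successful Telegram API interaction.
  recordActivity(nowMs: number): void {
    this.#lastActivityMs = nowMs;
  }

  // True once the quiet period exceeds the threshold.
  isStalled(nowMs: number): boolean {
    return nowMs - this.#lastActivityMs > this.#stallThresholdMs;
  }
}
```

Under the old 90s constant, a two-minute quiet window during LLM generation already counted as a stall; under a 300s default it does not, which is why the mock timestamps in the tests moved from 120_001 to 310_001.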
Reviews (1): Last reviewed commit: "fix(telegram): increase polling stall th..."
CI Status: all checks related to this PR pass.

The remaining failures (`build-dist`, `security`, and the CI test shards) are pre-existing on `main` or require maintainer secrets. This PR touches only the telegram extension.
@steipete Ready for review. All telegram-related CI checks pass. The other failures are pre-existing on main (shard infra) or require internal secrets (build-dist, security).

What this fixes: the Telegram bot gateway restarts mid-message when the LLM takes more than 90s to respond. Users lose their responses and have to wait 3-7 minutes for recovery. Reported by multiple users in #57660.

The fix is three lines of logic: raising the hardcoded 90s stall threshold to 300s and making it configurable for the future. Tests are updated accordingly. Happy to adjust the threshold value if you prefer a different default.
Update on CI failures: all failing jobs are pre-existing on `main` (CI shard infrastructure) or require maintainer secrets.

This PR touches only files under `extensions/telegram/`.
Maintainer triage from the current Telegram stall reports: I would not merge this as-is. Raising the single watchdog threshold from 90s to 300s helps #57660-style false positives during long model runs, but it also delays recovery when the active long-poll socket genuinely hangs.

Better shape: split the thresholds instead of one global bump, with a short threshold for an in-flight poll that has gone silent and a longer one for quiet windows while the bot is busy generating a response.
Recommendation: do not merge until the change distinguishes false quiet windows from actual active socket hangs.
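The split the triage describes could take roughly this shape. The sketch below is illustrative only; the threshold values and names are assumptions, not landed code:

```typescript
// Two thresholds instead of one global bump: recover quickly from a hung
// in-flight long poll, but tolerate long quiet windows while the bot is
// busy (e.g. during LLM generation). Values here are illustrative.
const ACTIVE_SOCKET_STALL_MS = 90_000;
const QUIET_WINDOW_STALL_MS = 300_000;

function isStalled(pollInFlight: boolean, msSinceLastActivity: number): boolean {
  const threshold = pollInFlight ? ACTIVE_SOCKET_STALL_MS : QUIET_WINDOW_STALL_MS;
  return msSinceLastActivity > threshold;
}
```

This keeps a #57660-style quiet window from restarting the gateway while still catching a genuinely hung socket within 90 seconds.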
Triage note after the recent Telegram polling work landed on `main`: current main deliberately kept …
Force-pushed 9619432 to 3f95fc8.
Landed via squash merge onto main.
Adapted this to a middle-ground 120s default plus a bounded `pollingStallThresholdMs` override (the schema allows 30s-600s). Thanks @Vitalcheffe!
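A rough sketch of how that merged behavior could resolve the effective threshold. The function and constant names are hypothetical; only the 30s-600s bounds come from the Zod schema in this PR:

```typescript
// Hypothetical resolver: 120s middle-ground default; explicit overrides
// must be integers inside the same bounds the Zod schema enforces.
const DEFAULT_STALL_MS = 120_000;
const MIN_STALL_MS = 30_000;
const MAX_STALL_MS = 600_000;

function resolveStallThresholdMs(configured?: number): number {
  if (configured === undefined) return DEFAULT_STALL_MS;
  if (
    !Number.isInteger(configured) ||
    configured < MIN_STALL_MS ||
    configured > MAX_STALL_MS
  ) {
    throw new RangeError(
      `pollingStallThresholdMs must be an integer in [${MIN_STALL_MS}, ${MAX_STALL_MS}]`,
    );
  }
  return configured;
}
```

Rejecting out-of-range values (rather than silently clamping) matches the schema's `.min(30_000).max(600_000)` behavior, which fails validation on bad input.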
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 3f95fc8c3e
```ts
streaming: ChannelPreviewStreamingConfigSchema.optional(),
mediaMaxMb: z.number().positive().optional(),
timeoutSeconds: z.number().int().positive().optional(),
pollingStallThresholdMs: z.number().int().min(30_000).max(600_000).optional(),
```
Regenerate bundled schema metadata for pollingStallThresholdMs
Adding `pollingStallThresholdMs` to the Telegram Zod schema here without regenerating the bundled config metadata leaves the static channel schema out of sync: the Telegram entry in `src/config/bundled-channel-config-metadata.generated.ts` still has `timeoutSeconds` but no `pollingStallThresholdMs`. That drift means consumers that depend on the generated channel metadata for schema/hint surfaces can miss the new option, so config discovery and validation messaging become inconsistent.
Summary
The Telegram polling stall detector fires at 90 seconds of API inactivity, causing false gateway restarts during legitimate LLM message processing (fixes #57660).
Root cause
`POLL_STALL_THRESHOLD_MS` in `extensions/telegram/src/polling-session.ts` was hardcoded to 90 seconds. When the bot processes a message that requires a long LLM response (2-5 minutes), no Telegram API calls are made during that time. The watchdog interprets this silence as a polling stall and forces a gateway restart, which interrupts message generation and causes 3-7 minutes of delivery failures.

Fix
- Raise the default stall threshold from 90 seconds to 300 seconds
- Add an optional `stallThresholdMs` parameter to `TelegramPollingSession` so the threshold can be tuned without code changes in the future

Testing
- `extension-fast (telegram)` ✅ passes in CI
- `check` / `check-additional` / `build-smoke` / `build-artifacts` ✅ all pass
- Updated `polling-session.test.ts` and `monitor.test.ts` to use the new 300s threshold

Note:
`build-dist` and `security` jobs require maintainer secrets; this is expected for external contributions. The remaining `checks-node-test-*` and `checks-windows-node-test-*` failures are pre-existing on `main` (CI shard infrastructure issues, with the same failures on the latest main branch run).

Changes
- `extensions/telegram/src/polling-session.ts`: threshold constant, options type, class field, watchdog logic (13 insertions, 4 deletions)
- `extensions/telegram/src/polling-session.test.ts`: updated mock timestamps (10 lines changed)
- `extensions/telegram/src/monitor.test.ts`: updated timer advances (3 lines changed)

🤖 AI-assisted (OpenClaw agent).