feat: chat_id in API error logs + bot_telegram_api_calls_total counter#118
Merged
Conversation
There was a problem hiding this comment.
Pull request overview
Adds observability around outbound Telegram API usage to better diagnose/quantify 429 cascades by (1) enriching selected warn logs with chat context and (2) introducing a denominator metric for error-rate ratios.
Changes:
- Add chat context (
chat_id,message_thread_id) toRate limitedandHTTP errortelegram-apiwarn logs. - Introduce
bot_telegram_api_calls_total{method,binding}Prometheus counter (binding-labelled withnone/unboundsentinels). - Add unit tests covering chat-context extraction/log formatting and call-counter labeling semantics.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| README.md | Documents the new Telegram API call counter and the updated warn-log format. |
| docs/plans/completed/2026-05-15-telegram-api-call-observability.md | Adds the completed plan / rationale and acceptance checklist for the observability work. |
| bot/src/telegram-bot.ts | Extracts API transformer into a testable factory; adds chat-context helpers and binding-label resolution; wires in call counting. |
| bot/src/metrics.ts | Adds telegramApiCalls counter + recordTelegramApiCall recorder. |
| bot/src/tests/telegram-bot.test.ts | Adds tests for chat-context helpers, transformer logging behavior, and call-counter labeling/cardinality guards. |
| bot/src/tests/metrics.test.ts | Adds tests for recordTelegramApiCall. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Comment on lines
+13
to
+16
| ## Reference: Current rate-limit logging | ||
|
|
||
| `bot/src/telegram-bot.ts:519-534` — inner transformer that logs 429s and records `bot_telegram_api_errors_total`: | ||
|
|
Comment on lines
+610
to
+613
| // Log Telegram API errors, especially 429 rate limits, and count every API | ||
| // call attempt by binding (inner transformer — sees each individual attempt | ||
| // before autoRetry decides whether to retry) | ||
| bot.api.config.use(createApiErrorLoggingTransformer({ bindings: config.bindings }) as Parameters<typeof bot.api.config.use>[0]); |
… counter Two observability improvements for diagnosing 429 cascades on sendMessageDraft (issue #117). Behavior-only changes for diagnosis; no retry/throttle modifications. - Rate-limited and HTTP-error warn logs from the inner Telegram API transformer now carry chat_id and message_thread_id when the API payload has them. Methods without a chat_id (getUpdates, getMe, setWebhook, etc.) continue to log cleanly. This makes a single grep over the bot stderr log sufficient to attribute a burst to its binding, without cross-referencing JSONL modification times. - New Prometheus counter bot_telegram_api_calls_total{method, binding} exposes the denominator for error-rate ratios. The binding label uses the resolved binding's label for chat-targeted methods, "none" for chatless methods, and "unbound" for chat-targeted calls whose chat_id matches no configured binding. - README documents both new diagnostic surfaces. Tests cover the chat_id formatting branch, the counter increment on success and 429, and the sentinel-label branches. This branch was force-pushed to scrub PII flagged by the repository's Gitleaks scan: a real numeric chat_id in tests and README examples, the production binding label "Yulia DM", and one home-directory path in the plan reference section. The functional changes are unchanged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2e6022a to
a9ff6f4
Compare
- bot/src/telegram-bot.ts: import grammy's Transformer type and use it as the return type of createApiErrorLoggingTransformer. Removes the Parameters<typeof bot.api.config.use>[0] cast at the call site so the transformer participates in grammy's per-method type checking. This immediately surfaced 5 mock-payload type errors in tests where sendMessageDraft was called without the required draft_id field. - bot/src/__tests__/telegram-bot.test.ts: add the now-required draft_id to the affected sendMessageDraft mock payloads. - docs/plans/.../2026-05-15-telegram-api-call-observability.md: add a note at the top clarifying that Reference sections describe the pre-PR state (file:line citations point at the legacy inline transformer that was refactored into the factory during implementation). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment on lines
+84
to
+89
| numericChatId = ctx.chatId; | ||
| } else { | ||
| // Numeric strings (rare but valid per Bot API) — resolve normally. | ||
| // Channel-username strings (@-prefixed) cannot match a numeric binding. | ||
| const parsed = Number(ctx.chatId); | ||
| if (!Number.isInteger(parsed)) return BINDING_LABEL_UNBOUND; |
| } | ||
| const binding = resolveBinding(numericChatId, bindings, ctx.messageThreadId); | ||
| if (!binding) return BINDING_LABEL_UNBOUND; | ||
| return binding.label ?? binding.agentId; |
4 tasks
fitz123
added a commit
that referenced
this pull request
May 27, 2026
…127) ## Summary Closes #117. Wrap `autoRetry` in a method-gated transformer that bypasses retries for `sendMessageDraft` only. Drafts are cosmetic fire-and-forget calls — a 3-10 s late retry lands on a stream that has already moved on, and the 5× amplification turns one rate-limited draft into five log lines and five Prometheus error-counter increments. Every other Telegram API method retains the full `AUTO_RETRY_OPTIONS` retry behavior unchanged. Also fixes the misleading "Drafts are cosmetic — no rate limits" comment in `bot/src/stream-relay.ts` that originally motivated the 300 ms debounce, and adds a one-line README caveat that `bot_telegram_api_calls_total{method="sendMessageDraft"}` no longer counts per-attempt (one call = one increment, vs other methods' up-to-5). ## Plan and pipeline - Plan: `docs/plans/2026-05-26-telegram-draft-autoretry-skip.md` - Ralphex pipeline ran: implementation + 7-agent multi-review (claude) passed clean, then a 2nd-pass claude review picked up two findings — one false positive (rejected), one acted on (README per-attempt caveat). Codex external review: `NO ISSUES FOUND`. - Pipeline terminated on the final "claude evaluating codex findings" step due to a Claude session-window limit; all substantive work was complete by that point. ## What's NOT in scope - Changing `DRAFT_DEBOUNCE_MS` (Option A from #117). Deferred until residual `sendMessageDraft` 429 ratio is measured against the new `bot_telegram_api_calls_total` counter (deployed in #118). ## Test plan - [x] `cd bot && npx tsc --noEmit` — passes - [x] `cd bot && npm test` — 1068 / 1068 pass - [ ] After merge + restart: trigger a long streaming reply in a DM, confirm `Rate limited: method=sendMessageDraft` warn lines drop ~5× and `sendMessage`/`sendChatAction` still retry on 429 - [ ] After merge + restart: `curl -s http://127.0.0.1:9091/metrics | grep bot_telegram_api_calls_total` — verify `sendMessageDraft` rows still increment (one per logical draft, not per retry) 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: fitz123 <fitz123@users.noreply.github.com> Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Observability-only changes for diagnosing Telegram API 429 cascades (issue #117):
Rate limitedandHTTP errorwarn logs now carrychat_idandmessage_thread_id(when the payload has them), so a singlegrepovertelegram-bot.stderr.logidentifies the originating binding without external correlation.bot_telegram_api_calls_total{method, binding}exposes the denominator for error-rate ratios. Thebindinglabel uses the resolved binding'slabelfor chat-targeted methods,"none"for chatless methods (getUpdates,getMe), and"unbound"for chat-targeted calls whose chat_id has no configured binding.No retry/throttle behavior changed. The autoRetry skip for
sendMessageDraft(the actual fix for the 429 cascade) is a separate follow-up — this PR is the data plane that lets us measure the effect.Plan and review
docs/plans/completed/2026-05-15-telegram-api-call-observability.md(moved tocompleted/by ralphex after pipeline)Test plan
cd bot && npx tsc --noEmit— passescd bot && npm test— 1055/1056 tests pass; one unrelated pre-existing failure in whisper-model path testgrep "Rate limited" telegram-bot.stderr.logshould showchat_id=<id>on rate-limit lines (if any fire)curl -s http://127.0.0.1:9091/metrics | grep bot_telegram_api_calls_totalshould expose the new counter with per-binding labels🤖 Generated with Claude Code