feat: chat_id in API error logs + bot_telegram_api_calls_total counter by fitz123 · Pull Request #118 · fitz123/claude-code-bot

fitz123 · 2026-05-15T21:24:25Z

Summary

Observability-only changes for diagnosing Telegram API 429 cascades (issue #117):

Rate limited and HTTP error warn logs now carry chat_id and message_thread_id (when the payload has them), so a single grep over telegram-bot.stderr.log identifies the originating binding without external correlation.
New Prometheus counter bot_telegram_api_calls_total{method, binding} exposes the denominator for error-rate ratios. The binding label uses the resolved binding's label for chat-targeted methods, "none" for chatless methods (getUpdates, getMe), and "unbound" for chat-targeted calls whose chat_id has no configured binding.

No retry/throttle behavior changed. The autoRetry skip for sendMessageDraft (the actual fix for the 429 cascade) is a separate follow-up — this PR is the data plane that lets us measure the effect.

Plan and review

Plan: docs/plans/completed/2026-05-15-telegram-api-call-observability.md (moved to completed/ by ralphex after pipeline)
Ralphex multi-agent pipeline: 4 review rounds (Claude quality+implementation × 2, codex × 1, final pass), all clean — no critical/major issues found.

Test plan

cd bot && npx tsc --noEmit — passes
cd bot && npm test — 1055/1056 tests pass; one unrelated pre-existing failure in whisper-model path test
After merge: restart bot, send a message in any DM, trigger a streaming reply, then grep "Rate limited" telegram-bot.stderr.log should show chat_id=<id> on rate-limit lines (if any fire)
After merge: curl -s http://127.0.0.1:9091/metrics | grep bot_telegram_api_calls_total should expose the new counter with per-binding labels

🤖 Generated with Claude Code

Copilot

Pull request overview

Adds observability around outbound Telegram API usage to better diagnose/quantify 429 cascades by (1) enriching selected warn logs with chat context and (2) introducing a denominator metric for error-rate ratios.

Changes:

Add chat context (chat_id, message_thread_id) to Rate limited and HTTP error telegram-api warn logs.
Introduce bot_telegram_api_calls_total{method,binding} Prometheus counter (binding-labelled with none/unbound sentinels).
Add unit tests covering chat-context extraction/log formatting and call-counter labeling semantics.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

Show a summary per file

File	Description
README.md	Documents the new Telegram API call counter and the updated warn-log format.
docs/plans/completed/2026-05-15-telegram-api-call-observability.md	Adds the completed plan / rationale and acceptance checklist for the observability work.
bot/src/telegram-bot.ts	Extracts API transformer into a testable factory; adds chat-context helpers and binding-label resolution; wires in call counting.
bot/src/metrics.ts	Adds `telegramApiCalls` counter + `recordTelegramApiCall` recorder.
bot/src/tests/telegram-bot.test.ts	Adds tests for chat-context helpers, transformer logging behavior, and call-counter labeling/cardinality guards.
bot/src/tests/metrics.test.ts	Adds tests for `recordTelegramApiCall`.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

+## Reference: Current rate-limit logging
+
+`bot/src/telegram-bot.ts:519-534` — inner transformer that logs 429s and records `bot_telegram_api_errors_total`:
+


+  // Log Telegram API errors, especially 429 rate limits, and count every API
+  // call attempt by binding (inner transformer — sees each individual attempt
+  // before autoRetry decides whether to retry)
+  bot.api.config.use(createApiErrorLoggingTransformer({ bindings: config.bindings }) as Parameters<typeof bot.api.config.use>[0]);


… counter Two observability improvements for diagnosing 429 cascades on sendMessageDraft (issue #117). Behavior-only changes for diagnosis; no retry/throttle modifications. - Rate-limited and HTTP-error warn logs from the inner Telegram API transformer now carry chat_id and message_thread_id when the API payload has them. Methods without a chat_id (getUpdates, getMe, setWebhook, etc.) continue to log cleanly. This makes a single grep over the bot stderr log sufficient to attribute a burst to its binding, without cross-referencing JSONL modification times. - New Prometheus counter bot_telegram_api_calls_total{method, binding} exposes the denominator for error-rate ratios. The binding label uses the resolved binding's label for chat-targeted methods, "none" for chatless methods, and "unbound" for chat-targeted calls whose chat_id matches no configured binding. - README documents both new diagnostic surfaces. Tests cover the chat_id formatting branch, the counter increment on success and 429, and the sentinel-label branches. This branch was force-pushed to scrub PII flagged by the repository's Gitleaks scan: a real numeric chat_id in tests and README examples, the production binding label "Yulia DM", and one home-directory path in the plan reference section. The functional changes are unchanged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

- bot/src/telegram-bot.ts: import grammy's Transformer type and use it as the return type of createApiErrorLoggingTransformer. Removes the Parameters<typeof bot.api.config.use>[0] cast at the call site so the transformer participates in grammy's per-method type checking. This immediately surfaced 5 mock-payload type errors in tests where sendMessageDraft was called without the required draft_id field. - bot/src/__tests__/telegram-bot.test.ts: add the now-required draft_id to the affected sendMessageDraft mock payloads. - docs/plans/.../2026-05-15-telegram-api-call-observability.md: add a note at the top clarifying that Reference sections describe the pre-PR state (file:line citations point at the legacy inline transformer that was refactored into the factory during implementation). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Copilot

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

+    numericChatId = ctx.chatId;
+  } else {
+    // Numeric strings (rare but valid per Bot API) — resolve normally.
+    // Channel-username strings (@-prefixed) cannot match a numeric binding.
+    const parsed = Number(ctx.chatId);
+    if (!Number.isInteger(parsed)) return BINDING_LABEL_UNBOUND;


+  }
+  const binding = resolveBinding(numericChatId, bindings, ctx.messageThreadId);
+  if (!binding) return BINDING_LABEL_UNBOUND;
+  return binding.label ?? binding.agentId;


…127) ## Summary Closes #117. Wrap `autoRetry` in a method-gated transformer that bypasses retries for `sendMessageDraft` only. Drafts are cosmetic fire-and-forget calls — a 3-10 s late retry lands on a stream that has already moved on, and the 5× amplification turns one rate-limited draft into five log lines and five Prometheus error-counter increments. Every other Telegram API method retains the full `AUTO_RETRY_OPTIONS` retry behavior unchanged. Also fixes the misleading "Drafts are cosmetic — no rate limits" comment in `bot/src/stream-relay.ts` that originally motivated the 300 ms debounce, and adds a one-line README caveat that `bot_telegram_api_calls_total{method="sendMessageDraft"}` no longer counts per-attempt (one call = one increment, vs other methods' up-to-5). ## Plan and pipeline - Plan: `docs/plans/2026-05-26-telegram-draft-autoretry-skip.md` - Ralphex pipeline ran: implementation + 7-agent multi-review (claude) passed clean, then a 2nd-pass claude review picked up two findings — one false positive (rejected), one acted on (README per-attempt caveat). Codex external review: `NO ISSUES FOUND`. - Pipeline terminated on the final "claude evaluating codex findings" step due to a Claude session-window limit; all substantive work was complete by that point. ## What's NOT in scope - Changing `DRAFT_DEBOUNCE_MS` (Option A from #117). Deferred until residual `sendMessageDraft` 429 ratio is measured against the new `bot_telegram_api_calls_total` counter (deployed in #118). ## Test plan - [x] `cd bot && npx tsc --noEmit` — passes - [x] `cd bot && npm test` — 1068 / 1068 pass - [ ] After merge + restart: trigger a long streaming reply in a DM, confirm `Rate limited: method=sendMessageDraft` warn lines drop ~5× and `sendMessage`/`sendChatAction` still retry on 429 - [ ] After merge + restart: `curl -s http://127.0.0.1:9091/metrics | grep bot_telegram_api_calls_total` — verify `sendMessageDraft` rows still increment (one per logical draft, not per retry) 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: fitz123 <fitz123@users.noreply.github.com> Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

Copilot AI review requested due to automatic review settings May 15, 2026 21:24

Copilot started reviewing on behalf of fitz123 May 15, 2026 21:24 View session

Copilot AI reviewed May 15, 2026

View reviewed changes

fitz123 force-pushed the feat/telegram-api-chat-id-log branch from 2e6022a to a9ff6f4 Compare May 15, 2026 21:34

Copilot AI review requested due to automatic review settings May 15, 2026 21:42

Copilot started reviewing on behalf of fitz123 May 15, 2026 21:43 View session

fitz123 merged commit a9e6bb4 into main May 15, 2026
4 checks passed

fitz123 deleted the feat/telegram-api-chat-id-log branch May 15, 2026 21:45

Copilot AI reviewed May 15, 2026

View reviewed changes

fitz123 mentioned this pull request May 26, 2026

feat: skip autoRetry for sendMessageDraft to stop 429 amplification #127

Merged

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: chat_id in API error logs + bot_telegram_api_calls_total counter#118

feat: chat_id in API error logs + bot_telegram_api_calls_total counter#118
fitz123 merged 2 commits into
mainfrom
feat/telegram-api-chat-id-log

fitz123 commented May 15, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Copilot AI left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

		## Reference: Current rate-limit logging

		`bot/src/telegram-bot.ts:519-534` — inner transformer that logs 429s and records `bot_telegram_api_errors_total`:

Conversation

fitz123 commented May 15, 2026

Summary

Plan and review

Test plan

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants