Skip to content

feat: chat_id in API error logs + bot_telegram_api_calls_total counter#118

Merged
fitz123 merged 2 commits into
mainfrom
feat/telegram-api-chat-id-log
May 15, 2026
Merged

feat: chat_id in API error logs + bot_telegram_api_calls_total counter#118
fitz123 merged 2 commits into
mainfrom
feat/telegram-api-chat-id-log

Conversation

@fitz123
Copy link
Copy Markdown
Owner

@fitz123 fitz123 commented May 15, 2026

Summary

Observability-only changes for diagnosing Telegram API 429 cascades (issue #117):

  • Rate limited and HTTP error warn logs now carry chat_id and message_thread_id (when the payload has them), so a single grep over telegram-bot.stderr.log identifies the originating binding without external correlation.
  • New Prometheus counter bot_telegram_api_calls_total{method, binding} exposes the denominator for error-rate ratios. The binding label uses the resolved binding's label for chat-targeted methods, "none" for chatless methods (getUpdates, getMe), and "unbound" for chat-targeted calls whose chat_id has no configured binding.

No retry/throttle behavior changed. The autoRetry skip for sendMessageDraft (the actual fix for the 429 cascade) is a separate follow-up — this PR is the data plane that lets us measure the effect.

Plan and review

  • Plan: docs/plans/completed/2026-05-15-telegram-api-call-observability.md (moved to completed/ by ralphex after pipeline)
  • Ralphex multi-agent pipeline: 4 review rounds (Claude quality+implementation × 2, codex × 1, final pass), all clean — no critical/major issues found.

Test plan

  • cd bot && npx tsc --noEmit — passes
  • cd bot && npm test — 1055/1056 tests pass; one unrelated pre-existing failure in whisper-model path test
  • After merge: restart bot, send a message in any DM, trigger a streaming reply, then grep "Rate limited" telegram-bot.stderr.log should show chat_id=<id> on rate-limit lines (if any fire)
  • After merge: curl -s http://127.0.0.1:9091/metrics | grep bot_telegram_api_calls_total should expose the new counter with per-binding labels

🤖 Generated with Claude Code

Copilot AI review requested due to automatic review settings May 15, 2026 21:24
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds observability around outbound Telegram API usage to better diagnose/quantify 429 cascades by (1) enriching selected warn logs with chat context and (2) introducing a denominator metric for error-rate ratios.

Changes:

  • Add chat context (chat_id, message_thread_id) to Rate limited and HTTP error telegram-api warn logs.
  • Introduce bot_telegram_api_calls_total{method,binding} Prometheus counter (binding-labelled with none/unbound sentinels).
  • Add unit tests covering chat-context extraction/log formatting and call-counter labeling semantics.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
README.md Documents the new Telegram API call counter and the updated warn-log format.
docs/plans/completed/2026-05-15-telegram-api-call-observability.md Adds the completed plan / rationale and acceptance checklist for the observability work.
bot/src/telegram-bot.ts Extracts API transformer into a testable factory; adds chat-context helpers and binding-label resolution; wires in call counting.
bot/src/metrics.ts Adds telegramApiCalls counter + recordTelegramApiCall recorder.
bot/src/tests/telegram-bot.test.ts Adds tests for chat-context helpers, transformer logging behavior, and call-counter labeling/cardinality guards.
bot/src/tests/metrics.test.ts Adds tests for recordTelegramApiCall.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +13 to +16
## Reference: Current rate-limit logging

`bot/src/telegram-bot.ts:519-534` — inner transformer that logs 429s and records `bot_telegram_api_errors_total`:

Comment thread bot/src/telegram-bot.ts Outdated
Comment on lines +610 to +613
// Log Telegram API errors, especially 429 rate limits, and count every API
// call attempt by binding (inner transformer — sees each individual attempt
// before autoRetry decides whether to retry)
bot.api.config.use(createApiErrorLoggingTransformer({ bindings: config.bindings }) as Parameters<typeof bot.api.config.use>[0]);
… counter

Two observability improvements for diagnosing 429 cascades on
sendMessageDraft (issue #117). Behavior-only changes for diagnosis;
no retry/throttle modifications.

- Rate-limited and HTTP-error warn logs from the inner Telegram API
  transformer now carry chat_id and message_thread_id when the API
  payload has them. Methods without a chat_id (getUpdates, getMe,
  setWebhook, etc.) continue to log cleanly. This makes a single grep
  over the bot stderr log sufficient to attribute a burst to its
  binding, without cross-referencing JSONL modification times.

- New Prometheus counter bot_telegram_api_calls_total{method, binding}
  exposes the denominator for error-rate ratios. The binding label
  uses the resolved binding's label for chat-targeted methods, "none"
  for chatless methods, and "unbound" for chat-targeted calls whose
  chat_id matches no configured binding.

- README documents both new diagnostic surfaces. Tests cover the
  chat_id formatting branch, the counter increment on success and
  429, and the sentinel-label branches.

This branch was force-pushed to scrub PII flagged by the repository's
Gitleaks scan: a real numeric chat_id in tests and README examples,
the production binding label "Yulia DM", and one home-directory path
in the plan reference section. The functional changes are unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@fitz123 fitz123 force-pushed the feat/telegram-api-chat-id-log branch from 2e6022a to a9ff6f4 Compare May 15, 2026 21:34
- bot/src/telegram-bot.ts: import grammy's Transformer type and use it
  as the return type of createApiErrorLoggingTransformer. Removes the
  Parameters<typeof bot.api.config.use>[0] cast at the call site so the
  transformer participates in grammy's per-method type checking. This
  immediately surfaced 5 mock-payload type errors in tests where
  sendMessageDraft was called without the required draft_id field.

- bot/src/__tests__/telegram-bot.test.ts: add the now-required draft_id
  to the affected sendMessageDraft mock payloads.

- docs/plans/.../2026-05-15-telegram-api-call-observability.md: add a
  note at the top clarifying that Reference sections describe the
  pre-PR state (file:line citations point at the legacy inline
  transformer that was refactored into the factory during
  implementation).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 15, 2026 21:42
@fitz123 fitz123 merged commit a9e6bb4 into main May 15, 2026
4 checks passed
@fitz123 fitz123 deleted the feat/telegram-api-chat-id-log branch May 15, 2026 21:45
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

Comment thread bot/src/telegram-bot.ts
Comment on lines +84 to +89
numericChatId = ctx.chatId;
} else {
// Numeric strings (rare but valid per Bot API) — resolve normally.
// Channel-username strings (@-prefixed) cannot match a numeric binding.
const parsed = Number(ctx.chatId);
if (!Number.isInteger(parsed)) return BINDING_LABEL_UNBOUND;
Comment thread bot/src/telegram-bot.ts
}
const binding = resolveBinding(numericChatId, bindings, ctx.messageThreadId);
if (!binding) return BINDING_LABEL_UNBOUND;
return binding.label ?? binding.agentId;
fitz123 added a commit that referenced this pull request May 27, 2026
…127)

## Summary

Closes #117. Wrap `autoRetry` in a method-gated transformer that
bypasses retries for `sendMessageDraft` only. Drafts are cosmetic
fire-and-forget calls — a 3-10 s late retry lands on a stream that has
already moved on, and the 5× amplification turns one rate-limited draft
into five log lines and five Prometheus error-counter increments. Every
other Telegram API method retains the full `AUTO_RETRY_OPTIONS` retry
behavior unchanged.

Also fixes the misleading "Drafts are cosmetic — no rate limits" comment
in `bot/src/stream-relay.ts` that originally motivated the 300 ms
debounce, and adds a one-line README caveat that
`bot_telegram_api_calls_total{method="sendMessageDraft"}` no longer
counts per-attempt (one call = one increment, vs other methods'
up-to-5).

## Plan and pipeline

- Plan: `docs/plans/2026-05-26-telegram-draft-autoretry-skip.md`
- Ralphex pipeline ran: implementation + 7-agent multi-review (claude)
passed clean, then a 2nd-pass claude review picked up two findings — one
false positive (rejected), one acted on (README per-attempt caveat).
Codex external review: `NO ISSUES FOUND`.
- Pipeline terminated on the final "claude evaluating codex findings"
step due to a Claude session-window limit; all substantive work was
complete by that point.

## What's NOT in scope

- Changing `DRAFT_DEBOUNCE_MS` (Option A from #117). Deferred until
residual `sendMessageDraft` 429 ratio is measured against the new
`bot_telegram_api_calls_total` counter (deployed in #118).

## Test plan

- [x] `cd bot && npx tsc --noEmit` — passes
- [x] `cd bot && npm test` — 1068 / 1068 pass
- [ ] After merge + restart: trigger a long streaming reply in a DM,
confirm `Rate limited: method=sendMessageDraft` warn lines drop ~5× and
`sendMessage`/`sendChatAction` still retry on 429
- [ ] After merge + restart: `curl -s http://127.0.0.1:9091/metrics |
grep bot_telegram_api_calls_total` — verify `sendMessageDraft` rows
still increment (one per logical draft, not per retry)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: fitz123 <fitz123@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants