Skip to content

feat(diagnostics): add outbound delivery lifecycle events#71471

Merged
vincentkoc merged 2 commits intomainfrom
feat/otel-delivery-diagnostics
Apr 25, 2026
Merged

feat(diagnostics): add outbound delivery lifecycle events#71471
vincentkoc merged 2 commits intomainfrom
feat/otel-delivery-diagnostics

Conversation

@vincentkoc
Copy link
Copy Markdown
Member

Summary

  • add message.delivery.started, message.delivery.completed, and message.delivery.error diagnostic events at the shared outbound delivery seam
  • export delivery attempts as openclaw.message.delivery.started metrics, openclaw.message.delivery.duration_ms histograms, and openclaw.message.delivery spans
  • update stability snapshots, logging docs, changelog, SDK API baseline, and focused tests

Privacy / Cardinality

  • delivery events include channel, deliveryKind, optional diagnostic sessionKey, duration/result count, and bounded error category
  • OTEL export keeps only low-cardinality attrs: channel, delivery kind, outcome, error category, and result count
  • intentionally omits body text, recipient, room/conversation id, account id, message id, raw channel result, phone/email, and media paths

Context

Validation

  • pnpm docs:list
  • pnpm test src/infra/outbound/deliver.test.ts extensions/diagnostics-otel/src/service.test.ts src/logging/diagnostic.test.ts
  • pnpm plugin-sdk:api:check
  • pnpm check:changed after expanding the worktree with gwt sparse full
  • after final rebase: targeted tests + pnpm plugin-sdk:api:check

@vincentkoc vincentkoc self-assigned this Apr 25, 2026
@vincentkoc vincentkoc marked this pull request as ready for review April 25, 2026 07:57
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Apr 25, 2026

Greptile Summary

This PR adds three new diagnostic events (message.delivery.started, message.delivery.completed, message.delivery.error) at the shared outbound delivery seam in deliver.ts, exports them as OTEL counters, histograms, and spans in the diagnostics-otel extension, and updates docs, stability snapshots, and tests accordingly. The implementation is well-structured, follows established patterns in the codebase, and deliberately omits high-cardinality identifiers (message body, recipient, room ID, media paths) from diagnostic payloads.

Confidence Score: 5/5

Safe to merge — changes are additive, well-tested, and consistent with established patterns in the codebase.

No P0 or P1 issues found. The implementation correctly uses guard flags (deliveryStarted/deliveryFinished) to prevent duplicate or missing diagnostic events, follows the same spanWithDuration + span.end(evt.ts) pattern used by all other OTEL event handlers, and the privacy controls (no body text, recipient, room ID, or media path in spans/metrics) are verified by targeted assertions in the tests.

No files require special attention.

Reviews (1): Last reviewed commit: "Merge branch 'main' into feat/otel-deliv..." | Re-trigger Greptile

@openclaw-barnacle openclaw-barnacle Bot added docs Improvements or additions to documentation extensions: diagnostics-otel Extension: diagnostics-otel size: M maintainer Maintainer-authored PR labels Apr 25, 2026
@vincentkoc vincentkoc force-pushed the feat/otel-delivery-diagnostics branch from f614a30 to 5f990d9 Compare April 25, 2026 08:18
@aisle-research-bot
Copy link
Copy Markdown

aisle-research-bot Bot commented Apr 25, 2026

🔒 Aisle Security Analysis

We found 1 potential security issue(s) in this PR:

# Severity Title
1 🟡 Medium Diagnostic message delivery events expose sessionKey/policyKey to plugin SDK listeners
1. 🟡 Diagnostic message delivery events expose sessionKey/policyKey to plugin SDK listeners
Property Value
Severity Medium
CWE CWE-201
Location src/infra/outbound/deliver.ts:374-405

Description

The new outbound delivery diagnostics emit message.delivery.* events that can include sessionKey derived from mirror.sessionKey, session.key, or session.policyKey.

Because the plugin SDK publicly exports onDiagnosticEvent, any installed plugin can subscribe to these diagnostic events and read the sessionKey value.

  • Input/source: params.mirror.sessionKey, params.session.key, or params.session.policyKey
  • Propagation: attached verbatim to diagnostic payloads via emitDiagnosticEvent({ ..., sessionKey })
  • Sink/exposure: onDiagnosticEvent is exported on the public plugin SDK surface, enabling third-party plugins to receive the full diagnostic event payload.

Vulnerable code (new behavior):

return params.mirror?.sessionKey ?? params.session?.key ?? params.session?.policyKey;
...
emitDiagnosticEvent({
  type: "message.delivery.started",
  channel: params.channel,
  deliveryKind: params.deliveryKind,
  ...(params.sessionKey ? { sessionKey: params.sessionKey } : {}),
});

While the bundled OTEL exporter and stability snapshots currently avoid exporting sessionKey, this is still a data exposure to any diagnostic listener (including third-party plugins) that consumes DiagnosticEventPayload directly.

Recommendation

Avoid exposing stable per-user/session identifiers to untrusted diagnostic listeners.

Options:

  1. Do not include sessionKey in onDiagnosticEvent payloads (keep it only for internal listeners) by splitting the event bus into internal vs public, or by stripping sensitive fields before dispatch to public listeners.

  2. Hash/anonymize the session identifier for diagnostics (rotating salt) so it remains useful for correlation but cannot be used as an identifier outside the process.

Example (strip before public listeners):

export function onDiagnosticEvent(listener: (evt: DiagnosticEventPayload) => void): () => void {
  return onInternalDiagnosticEvent((event) => {
    if (event.type === "log.record") return;// Remove sensitive identifiers from public plugin surface
    const { sessionKey, sessionId, ...rest } = event as any;
    listener(rest);
  });
}

Also consider removing session.policyKey from diagnostic correlation entirely if it may represent an authorization/policy secret.


Analyzed PR: #71471 at commit 5f990d9

Last updated on: 2026-04-25T08:20:49Z

@vincentkoc vincentkoc merged commit bd32b1a into main Apr 25, 2026
65 checks passed
@vincentkoc vincentkoc deleted the feat/otel-delivery-diagnostics branch April 25, 2026 08:26
vincentkoc added a commit that referenced this pull request Apr 25, 2026
… Unreleased

Three of my (vincentkoc) entries were missing closing PR refs, and
several maintainer-fix entries were missing credit for the user who
reported the underlying issue:

- Diagnostics/OTEL outbound delivery: add (#71471) and credit @jlapenna
  whose #70424 framed the broader tracing work.
- Cron malformed legacy jobs: add (#71509).
- OpenAI/Codex OAuth region failures: add (#71501) and credit reporter
  @wulala-xjj (#51175).
- Telegram duplicate pollers: credit reporter @Co-Messi (#56230).
- MCP/CLI one-shot retire: credit reporter @spartoviMD (#71457).
- OpenAI/Codex image baseUrl canonicalize: credit reporter @GodsBoy
  (#71460).
- Feishu TTS Ogg/Opus: credit reporters @sg1416-zg (#61249) and
  @ycjlb2023-peteryi (#37868).
- MiniMax TTS portal OAuth: credit reporter @zx15210404690-hash
  (#55017).
- MCP config reload disposal: credit reporter @xieyuanqing (#60656).
Angfr95 pushed a commit to Angfr95/openclaw that referenced this pull request Apr 25, 2026
… Unreleased

Three of my (vincentkoc) entries were missing closing PR refs, and
several maintainer-fix entries were missing credit for the user who
reported the underlying issue:

- Diagnostics/OTEL outbound delivery: add (openclaw#71471) and credit @jlapenna
  whose openclaw#70424 framed the broader tracing work.
- Cron malformed legacy jobs: add (openclaw#71509).
- OpenAI/Codex OAuth region failures: add (openclaw#71501) and credit reporter
  @wulala-xjj (openclaw#51175).
- Telegram duplicate pollers: credit reporter @Co-Messi (openclaw#56230).
- MCP/CLI one-shot retire: credit reporter @spartoviMD (openclaw#71457).
- OpenAI/Codex image baseUrl canonicalize: credit reporter @GodsBoy
  (openclaw#71460).
- Feishu TTS Ogg/Opus: credit reporters @sg1416-zg (openclaw#61249) and
  @ycjlb2023-peteryi (openclaw#37868).
- MiniMax TTS portal OAuth: credit reporter @zx15210404690-hash
  (openclaw#55017).
- MCP config reload disposal: credit reporter @xieyuanqing (openclaw#60656).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

docs Improvements or additions to documentation extensions: diagnostics-otel Extension: diagnostics-otel maintainer Maintainer-authored PR size: M

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant