
OpenTelemetry metrics for WebFinger and actor discovery #739

@dahlia

Summary

Add OpenTelemetry metrics for WebFinger lookup and actor discovery paths, including lookup counts, durations, and outcomes.

Current state

Fedify already instruments webfinger.lookup, webfinger.handle, and activitypub.get_actor_handle as spans. These spans help when debugging a single lookup, but traces may be sampled, so they cannot provide sample-independent metrics for discovery reliability or latency.

Discovery failures are operationally noisy. A server can appear broken to users when handle resolution is slow, blocked by a remote firewall, or repeatedly returning malformed resource descriptors. Operators need aggregate metrics for those paths.

Proposed solution

Once #619 adds metrics support, add counters and histograms for WebFinger and actor discovery.

Proposed instruments:

  • webfinger.lookup: counter, incremented for outgoing WebFinger lookup attempts.
  • webfinger.lookup.duration: histogram, recording outgoing lookup duration in milliseconds.
  • webfinger.handle: counter, incremented for incoming WebFinger requests handled by Fedify.
  • webfinger.handle.duration: histogram, recording request handling duration in milliseconds.
  • activitypub.actor.discovery: counter, incremented for actor handle discovery attempts.
  • activitypub.actor.discovery.duration: histogram, recording actor discovery duration in milliseconds.
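As a rough sketch of how the six instruments could be registered once #619 lands, the TypeScript below creates them through one meter. `MeterLike`, `CounterLike`, and `HistogramLike` are hypothetical structural stand-ins for the `createCounter`/`createHistogram` calls of the OpenTelemetry JS API (`@opentelemetry/api`), so a real `Meter` should fit them; `createDiscoveryInstruments` is an illustrative name, not an existing Fedify function.

```typescript
// Structural stand-ins for the subset of the OpenTelemetry JS metrics API used
// here; a Meter from @opentelemetry/api should satisfy MeterLike.
type Attrs = Record<string, string | number>;
interface CounterLike {
  add(value: number, attributes?: Attrs): void;
}
interface HistogramLike {
  record(value: number, attributes?: Attrs): void;
}
interface InstrumentOptions {
  description?: string;
  unit?: string;
}
interface MeterLike {
  createCounter(name: string, options?: InstrumentOptions): CounterLike;
  createHistogram(name: string, options?: InstrumentOptions): HistogramLike;
}

interface DiscoveryInstruments {
  lookup: CounterLike;
  lookupDuration: HistogramLike;
  handle: CounterLike;
  handleDuration: HistogramLike;
  actorDiscovery: CounterLike;
  actorDiscoveryDuration: HistogramLike;
}

// Hypothetical helper: registers the six proposed instruments on one meter.
// Durations use milliseconds, as the proposal specifies.
function createDiscoveryInstruments(meter: MeterLike): DiscoveryInstruments {
  return {
    lookup: meter.createCounter("webfinger.lookup", {
      description: "Outgoing WebFinger lookup attempts.",
    }),
    lookupDuration: meter.createHistogram("webfinger.lookup.duration", {
      description: "Duration of outgoing WebFinger lookups.",
      unit: "ms",
    }),
    handle: meter.createCounter("webfinger.handle", {
      description: "Incoming WebFinger requests handled by Fedify.",
    }),
    handleDuration: meter.createHistogram("webfinger.handle.duration", {
      description: "Duration of incoming WebFinger request handling.",
      unit: "ms",
    }),
    actorDiscovery: meter.createCounter("activitypub.actor.discovery", {
      description: "Actor handle discovery attempts.",
    }),
    actorDiscoveryDuration: meter.createHistogram(
      "activitypub.actor.discovery.duration",
      { description: "Duration of actor handle discovery.", unit: "ms" },
    ),
  };
}
```

With the real API this would be called as `createDiscoveryInstruments(metrics.getMeter("fedify"))`; the meter name there is only a placeholder.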

Proposed attributes:

  • webfinger.resource.scheme: acct, https, or another URI scheme.
  • activitypub.discovery.result: resolved, not_found, invalid, network_error, not_acceptable, or error.
  • activitypub.remote.host: hostname only for outgoing lookup targets, when available.
  • http.response.status_code, when a remote HTTP response exists.

Do not include full handles, resource URIs, actor IDs, or lookup URLs as metric attributes.
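The attribute rules above can be captured in a small helper. The sketch below assumes a hypothetical `webFingerLookupAttributes` function that receives the raw resource, the lookup URL, and the response status, and keeps only the scheme, the hostname, and the status code. It relies only on the WHATWG `URL` parser, which lowercases the scheme and exposes `hostname` without the port, path, or query string.

```typescript
type Attrs = Record<string, string | number>;

// Hypothetical helper: derives low-cardinality metric attributes for one
// outgoing WebFinger lookup. The raw resource, handle, and lookup URL are
// read here but never copied into the returned attributes.
function webFingerLookupAttributes(
  resource: string,
  lookupUrl?: URL,
  statusCode?: number,
): Attrs {
  const attrs: Attrs = {};
  try {
    // URL.protocol is lowercased and keeps the trailing colon ("acct:").
    attrs["webfinger.resource.scheme"] = new URL(resource).protocol.slice(0, -1);
  } catch {
    // Not a parsable URI: omit the scheme instead of recording raw input.
  }
  if (lookupUrl !== undefined) {
    // hostname drops the port, path, and query string (which holds the resource).
    attrs["activitypub.remote.host"] = lookupUrl.hostname;
  }
  if (statusCode !== undefined) {
    attrs["http.response.status_code"] = statusCode;
  }
  return attrs;
}
```

For a lookup of `acct:alice@example.com` against `example.com`, only `acct`, `example.com`, and the status code survive; the handle itself never becomes a metric attribute.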

Scope

  • Instrument outgoing WebFinger lookup APIs.
  • Instrument incoming WebFinger handler paths served by Fedify.
  • Instrument actor-handle lookup paths that resolve handles into actor URLs or actor objects.
  • Keep NodeInfo metrics out of scope unless they share implementation paths with WebFinger handling.
  • Update docs/manual/opentelemetry.md with metric names, units, and cardinality guidance.
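For the incoming handler paths, one option is to wrap the handler so that the count and duration are recorded in a `finally` block, which covers both returned responses and thrown errors. `instrumentWebFingerHandler` and the `CounterLike`/`HistogramLike` types are hypothetical, and tagging thrown errors with OpenTelemetry's `error.type` convention is an assumption the proposal does not specify. The wrapper only reads the response status, so the queried resource cannot leak into attributes.

```typescript
type Attrs = Record<string, string | number>;
interface CounterLike {
  add(value: number, attributes?: Attrs): void;
}
interface HistogramLike {
  record(value: number, attributes?: Attrs): void;
}

// Hypothetical wrapper around an incoming WebFinger handler. Any response type
// with a numeric `status` works, including the Fetch API's Response.
function instrumentWebFingerHandler<Req, Res extends { status: number }>(
  handler: (request: Req) => Promise<Res>,
  counter: CounterLike,
  duration: HistogramLike,
): (request: Req) => Promise<Res> {
  return async (request: Req): Promise<Res> => {
    const start = performance.now();
    let attrs: Attrs = {};
    try {
      const response = await handler(request);
      // Only the status code is recorded; the request (and thus the queried
      // resource) is never inspected here.
      attrs = { "http.response.status_code": response.status };
      return response;
    } catch (error) {
      // Assumption: thrown errors are tagged with OpenTelemetry's error.type
      // convention, using the error class name to keep cardinality low.
      attrs = { "error.type": error instanceof Error ? error.name : "_OTHER" };
      throw error;
    } finally {
      // Runs on both paths, so success and failure are counted and timed.
      counter.add(1, attrs);
      duration.record(performance.now() - start, attrs);
    }
  };
}
```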

Acceptance criteria

  • WebFinger lookup count and duration metrics are emitted for success and failure paths.
  • Incoming WebFinger handling metrics are emitted without exposing the queried resource string as a metric attribute.
  • Actor discovery metrics classify resolved, not_found, invalid, network_error, and error (thrown exception) outcomes where Fedify can distinguish them.
  • Metrics use host-only remote attributes and avoid full handles or URLs.
  • Tests cover at least one successful WebFinger lookup and one failed lookup.
  • Documentation describes which discovery paths are covered.
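The criteria ask for metrics on both success and failure paths, tested with at least one successful and one failed lookup. A sketch of an outgoing-lookup wrapper along those lines is given here. `measureLookup`, `classifyFailure`, and `LookupHttpError` are hypothetical, and the mapping they use is an assumed classification, not Fedify's current behavior: a `null` result or HTTP 404/410 becomes `not_found`, 406 becomes `not_acceptable`, a `TypeError` (what `fetch()` rejects with on network failures) becomes `network_error`, and a `SyntaxError` (thrown by `JSON.parse` on a malformed body) becomes `invalid`.

```typescript
type Attrs = Record<string, string | number>;
interface CounterLike {
  add(value: number, attributes?: Attrs): void;
}
interface HistogramLike {
  record(value: number, attributes?: Attrs): void;
}
type DiscoveryResult =
  | "resolved" | "not_found" | "invalid" | "network_error" | "not_acceptable" | "error";

// Hypothetical error carrying the remote HTTP status; real lookup code may
// report failures differently.
class LookupHttpError extends Error {
  status: number;
  constructor(status: number) {
    super(`HTTP ${status}`);
    this.name = "LookupHttpError";
    this.status = status;
  }
}

// Assumed mapping from a thrown error to the proposed result values.
function classifyFailure(error: unknown): { result: DiscoveryResult; status?: number } {
  if (error instanceof LookupHttpError) {
    if (error.status === 404 || error.status === 410) return { result: "not_found", status: error.status };
    if (error.status === 406) return { result: "not_acceptable", status: error.status };
    return { result: "error", status: error.status };
  }
  if (error instanceof TypeError) return { result: "network_error" }; // fetch() network failure
  if (error instanceof SyntaxError) return { result: "invalid" }; // malformed JSON body
  return { result: "error" };
}

// Runs one lookup and records count and duration with host-only attributes.
async function measureLookup<T>(
  lookup: () => Promise<T | null>,
  host: string | undefined,
  counter: CounterLike,
  duration: HistogramLike,
): Promise<T | null> {
  const start = performance.now();
  const attrs: Attrs = {};
  if (host !== undefined) attrs["activitypub.remote.host"] = host;
  try {
    const value = await lookup();
    // Assumption: a null result means the remote side had no such resource.
    attrs["activitypub.discovery.result"] = value === null ? "not_found" : "resolved";
    return value;
  } catch (error) {
    const { result, status } = classifyFailure(error);
    attrs["activitypub.discovery.result"] = result;
    if (status !== undefined) attrs["http.response.status_code"] = status;
    throw error;
  } finally {
    counter.add(1, attrs);
    duration.record(performance.now() - start, attrs);
  }
}
```

Rethrowing after classification keeps the caller's error handling unchanged; the metrics are a side effect of the `finally` block.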

Open questions

  • Should incoming WebFinger request metrics be folded into the HTTP request metrics proposed in #736 (OpenTelemetry metrics for Fedify HTTP request performance), or kept as WebFinger-specific metrics?
  • Should actor discovery metrics distinguish handle-to-URL resolution from URL-to-object resolution?
  • Should NodeInfo lookup metrics be handled in this issue or a separate NodeInfo observability issue?