geoprobe-agent: add Prometheus metrics instrumentation #3371

Merged
nikw9944 merged 3 commits into main from nikw9944/doublezero-2941
Mar 24, 2026

@nikw9944
Contributor

Resolves: #2941

Summary of Changes

  • Add opt-in Prometheus metrics to the geoprobe agent via --metrics-enable and --metrics-addr flags
  • Instrument key operational signals: error counts by type, offset receive/reject/send counters, discovery gauges, and duration histograms for discovery and measurement cycles
  • Expose a /metrics HTTP endpoint using promhttp.Handler() when enabled

Diff Breakdown

| Category    | Files | Lines (+/-) | Net  |
|-------------|-------|-------------|------|
| Core logic  | 1     | +43 / -0    | +43  |
| Scaffolding | 1     | +112 / -0   | +112 |

Mostly metric definitions (scaffolding); the core logic is lightweight instrumentation calls wired into existing code paths.

Key files
  • controlplane/telemetry/internal/metrics/geolocation_metrics.go — new file defining all Prometheus metric vars, constants for error types and rejection reasons
  • controlplane/telemetry/cmd/geoprobe-agent/main.go — adds --metrics-enable/--metrics-addr flags, HTTP metrics server goroutine, and metric increment/observe calls at key points
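
As a rough illustration of what an "observe" call wired into an existing code path might look like, the sketch below times one cycle and reports the elapsed seconds. It is not taken from the PR: `timeCycle`, `recordingObserver`, and `errSimulated` are hypothetical names, and the local `observer` interface is only meant to mirror the `Observe(float64)` method that Prometheus histograms expose, so the example runs with the standard library alone.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// observer mirrors the Observe(float64) method of a Prometheus histogram,
// so a real histogram could be passed where this sketch uses a recorder.
type observer interface {
	Observe(float64)
}

// recordingObserver is a hypothetical in-memory stand-in for a histogram.
type recordingObserver struct {
	samples []float64
}

func (r *recordingObserver) Observe(v float64) {
	r.samples = append(r.samples, v)
}

// errSimulated is a placeholder failure used only by this example.
var errSimulated = errors.New("simulated discovery failure")

// timeCycle runs fn and reports its duration in seconds to obs,
// whether fn succeeds or fails.
func timeCycle(obs observer, fn func() error) error {
	start := time.Now()
	defer func() { obs.Observe(time.Since(start).Seconds()) }()
	return fn()
}

func main() {
	rec := &recordingObserver{}
	err := timeCycle(rec, func() error {
		time.Sleep(10 * time.Millisecond) // stand-in for a discovery RPC
		return errSimulated
	})
	fmt.Printf("err=%v samples=%d\n", err, len(rec.samples))
}
```

Using `defer` keeps the duration recorded on the error path too, which matters if failures tend to be the slow cases.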

Testing Verification

  • Verified make go-build compiles cleanly with the new metrics package and imports
  • Verified make go-lint passes with no new warnings
  • Metrics are opt-in (--metrics-enable defaults to false), so existing deployments are unaffected

Contributor

@ben-dz left a comment

LGTM. A few notes, but could be merged as is.

A future improvement might be to bubble up reporting from the signed TWAMP reflector and the outbound pinger.

@nikw9944
Contributor Author

Addressed PR feedback:

  • Removed unused error type constants (GeoProbeErrorTypeParentDiscovery, GeoProbeErrorTypeTargetDiscovery)
  • Added custom histogram buckets for discovery operations (0.1s–60s) and measurement cycles (0.5s–120s) to better capture RPC-heavy operation timings
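
The comment gives the bucket ranges but not the individual boundaries. One plausible way to cover such ranges is with exponentially spaced upper bounds, sketched below with the standard library only. The helper `exponentialBucketsRange` and the bucket count of 10 are assumptions for illustration; the boundaries actually chosen in the PR are not shown here, and client_golang ships its own bucket helpers (for example prometheus.ExponentialBuckets) that would normally be used instead.

```go
package main

import (
	"fmt"
	"math"
)

// exponentialBucketsRange returns count upper bounds spaced by a constant
// factor, starting at lo and ending at approximately hi.
// It returns nil for arguments that cannot form such a series.
func exponentialBucketsRange(lo, hi float64, count int) []float64 {
	if count < 2 || lo <= 0 || hi <= lo {
		return nil
	}
	factor := math.Pow(hi/lo, 1/float64(count-1))
	buckets := make([]float64, count)
	for i := range buckets {
		buckets[i] = lo * math.Pow(factor, float64(i))
	}
	return buckets
}

func main() {
	// Ranges come from the PR comment; the count of 10 is an assumption.
	discovery := exponentialBucketsRange(0.1, 60, 10)
	measurement := exponentialBucketsRange(0.5, 120, 10)
	fmt.Printf("discovery (s):   %.2f\n", discovery)
	fmt.Printf("measurement (s): %.2f\n", measurement)
}
```

Exponential spacing gives finer resolution at the fast end of the range while still covering slow RPC-heavy cycles, which default histogram buckets (topping out at 10s) would lump into a single overflow bucket.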

@nikw9944 force-pushed the nikw9944/doublezero-2941 branch from cd18020 to 2fd540d on March 24, 2026 20:48
Add build info, error counters, operation duration histograms, offset
tracking counters, and discovery gauges to the geoprobe-agent. Metrics
are defined in a new geolocation_metrics.go file and exposed via an
HTTP /metrics endpoint when --metrics-enable is set.

Remove unused error type constants (parent_discovery, target_discovery)
and add custom histogram buckets for discovery and measurement cycle
duration metrics to better capture RPC-heavy operations.
@nikw9944 force-pushed the nikw9944/doublezero-2941 branch from 2fd540d to e448651 on March 24, 2026 20:49
@nikw9944 enabled auto-merge (squash) on March 24, 2026 20:50
@nikw9944 merged commit ee19a68 into main on Mar 24, 2026
33 checks passed
@nikw9944 deleted the nikw9944/doublezero-2941 branch on March 24, 2026 21:03

Development

Successfully merging this pull request may close these issues.

Add Prometheus metrics to geoProbe-Agent

2 participants