Skip to content

fix(networking): drop metrics port from external Service#120

Merged
bdchatham merged 1 commit intomainfrom
fix/servicemonitor-per-pod-selector
Apr 22, 2026
Merged

fix(networking): drop metrics port from external Service#120
bdchatham merged 1 commit intomainfrom
fix/servicemonitor-per-pod-selector

Conversation

@bdchatham
Copy link
Copy Markdown
Collaborator

@bdchatham bdchatham commented Apr 22, 2026

Summary

Drops metrics:26660 from the external (public-facing) Service's port list, fixing a double-scrape of every SeiNode pod caused by the ServiceMonitor selector matching both the per-pod and external Services.

Context

Recommended by a K8s review of the duplicate-scrape observation:

  • Per-pod headless Service (<sndname>-<ordinal>) carries labels {sei.io/nodedeployment, sei.io/node, sei.io/revision} and exposes all ports (including metrics)
  • Aggregate external Service (<sndname>-external) carries only {sei.io/nodedeployment} but was generated with the same port list — including metrics
  • ServiceMonitor's selector.matchLabels: {sei.io/nodedeployment: ...} matches both → prometheus-operator targets both → Prometheus scrapes each pod twice

Observed live on a gprusak soak test: 3 RPC pods rendered as 6 series on tendermint_consensus_latest_block_height count panels.

The internal ClusterIP aggregate (<sndname>-internal) correctly omits the metrics port already, so this isn't a cross-cutting issue — only the external path was wrong.

Approach

Added externalPortsForMode(mode) that wraps portsForMode(mode) and filters out metrics. Tactical — a better home for this rule is sei-protocol/sei-config next to NodePortsForMode. Followed up with sei-config#7 to add a proper ExternalServicePorts helper; this PR's filter will call through to it once that lands (TODO comment on the helper).

Scope

  • internal/controller/nodedeployment/networking.go — new externalPortsForMode helper; generateExternalService uses it.
  • internal/controller/nodedeployment/networking_test.go — updated two existing expectations (AllPortsForFullMode now has 6 ports not 7; ValidatorModePorts is p2p only).
  • No CRD changes. No change to per-pod Service. No change to internal aggregate Service.

Test plan

  • go test ./internal/controller/nodedeployment/ passes.
  • After merge + image bump + Flux reconcile: kubectl -n gprusak describe svc v6-5-run-3-rpc-0-external shows no metrics port; count by(pod)(tendermint_consensus_latest_block_height{chain_id="v6-5-run-3"}) returns 1 per pod instead of 2.

Refs: sei-protocol/sei-config#7 (upstream helper), #116 / #118 (prior component / chain_id label work).

🤖 Generated with Claude Code

The external (public-facing) Service was carrying the same port list as
the per-pod headless Service, including `metrics:26660`. ServiceMonitor
selectors then matched both Services, so every pod was scraped twice —
visible as 2x the expected series count on dashboards (e.g. 3 RPC pods
rendering as 6 on count-based panels).

Adds a tactical filter in the external-service generator that drops
`metrics` from the port list. Per-pod Services remain scrape targets
unchanged.

Longer-term, the "which ports belong on an external Service" rule
should live in sei-config next to the port definitions themselves; see
sei-protocol/sei-config#7 for the upstream helper proposal. This PR's
externalPortsForMode will call through to that helper once it lands.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@bdchatham bdchatham merged commit ab7faf0 into main Apr 22, 2026
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant