fix(networking): drop metrics port from external Service #120
Merged
The external (public-facing) Service was carrying the same port list as the per-pod headless Service, including `metrics:26660`. ServiceMonitor selectors then matched both Services, so every pod was scraped twice. This showed up as 2x the expected series count on dashboards (e.g. 3 RPC pods rendering as 6 on count-based panels).

Adds a tactical filter in the external-service generator that drops `metrics` from the port list. Per-pod Services are unchanged and remain the scrape targets.

Longer-term, the "which ports belong on an external Service" rule should live in sei-config next to the port definitions themselves; see sei-protocol/sei-config#7 for the upstream helper proposal. This PR's `externalPortsForMode` will call through to that helper once it lands.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
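To make the double-scrape mechanism concrete, here is a minimal, self-contained Go sketch; it is an illustration, not code from this repository. It models Services as plain structs instead of `corev1.Service`, assumes the ServiceMonitor endpoint selects the port by the name `metrics`, and uses a hypothetical deployment named `rpc` with three pods. The `rpc` port name and the `sei.io/revision` value are placeholders. A Service yields one target per selected pod only if its labels satisfy the selector's `matchLabels` and it exposes the named port, so removing `metrics` from the external Service leaves exactly one target per pod.

```go
package main

import "fmt"

// service is a simplified stand-in for a Kubernetes Service: its labels, the
// pods its selector resolves to, and the names of the ports it exposes.
type service struct {
	name      string
	labels    map[string]string
	pods      []string
	portNames []string
}

// matchLabels reports whether every key/value pair in selector is present in
// labels, which is how a label selector's matchLabels is evaluated.
func matchLabels(selector, labels map[string]string) bool {
	for k, v := range selector {
		if got, ok := labels[k]; !ok || got != v {
			return false
		}
	}
	return true
}

// hasPort reports whether the Service exposes a port with the given name.
func hasPort(s service, name string) bool {
	for _, p := range s.portNames {
		if p == name {
			return true
		}
	}
	return false
}

// targetsPerPod counts, per pod, the scrape targets a ServiceMonitor endpoint
// would yield: one for every matching Service that selects the pod and
// exposes the named port.
func targetsPerPod(selector map[string]string, port string, svcs []service) map[string]int {
	counts := map[string]int{}
	for _, s := range svcs {
		if !matchLabels(selector, s.labels) || !hasPort(s, port) {
			continue
		}
		for _, pod := range s.pods {
			counts[pod]++
		}
	}
	return counts
}

// buildServices models a hypothetical deployment "rpc" with three pods: one
// per-pod Service for each pod plus a single external Service that selects
// all of them and exposes externalPorts.
func buildServices(externalPorts []string) []service {
	pods := []string{"rpc-0", "rpc-1", "rpc-2"}
	var svcs []service
	for _, pod := range pods {
		svcs = append(svcs, service{
			name: pod,
			labels: map[string]string{
				"sei.io/nodedeployment": "rpc",
				"sei.io/node":           pod,
				"sei.io/revision":       "r1", // placeholder value
			},
			pods:      []string{pod},
			portNames: []string{"p2p", "rpc", "metrics"},
		})
	}
	svcs = append(svcs, service{
		name:      "rpc-external",
		labels:    map[string]string{"sei.io/nodedeployment": "rpc"},
		pods:      pods,
		portNames: externalPorts,
	})
	return svcs
}

func main() {
	selector := map[string]string{"sei.io/nodedeployment": "rpc"}
	before := targetsPerPod(selector, "metrics", buildServices([]string{"p2p", "rpc", "metrics"}))
	after := targetsPerPod(selector, "metrics", buildServices([]string{"p2p", "rpc"}))
	for _, pod := range []string{"rpc-0", "rpc-1", "rpc-2"} {
		fmt.Printf("%s: before=%d after=%d\n", pod, before[pod], after[pod])
	}
}
```

Running it prints `before=2 after=1` for each pod, the same halving the `count by(pod)` check in the test plan looks for.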
Summary
Drops `metrics:26660` from the external (public-facing) Service's port list, fixing a double-scrape of every SeiNode pod caused by the ServiceMonitor selector matching both the per-pod and external Services.

Context
Recommended by a K8s review of the duplicate-scrape observation:
- Per-pod headless Service (`<sndname>-<ordinal>`) carries labels `{sei.io/nodedeployment, sei.io/node, sei.io/revision}` and exposes all ports (including `metrics`).
- External Service (`<sndname>-external`) carries only `{sei.io/nodedeployment}` but was generated with the same port list, including `metrics`.
- ServiceMonitor `selector.matchLabels: {sei.io/nodedeployment: ...}` matches both → prometheus-operator targets both → Prometheus scrapes each pod twice.

Observed live on a gprusak soak test: 3 RPC pods rendered as 6 series on `tendermint_consensus_latest_block_height` count panels.

The internal ClusterIP aggregate (`<sndname>-internal`) already omits the metrics port, so this isn't a cross-cutting issue; only the external path was wrong.

Approach
Added `externalPortsForMode(mode)`, which wraps `portsForMode(mode)` and filters out `metrics`. This is tactical: a better home for this rule is `sei-protocol/sei-config`, next to `NodePortsForMode`. Followed up with sei-config#7 to add a proper `ExternalServicePorts` helper; this PR's filter will call through to it once that lands (see the TODO comment on the helper).

Scope
- `internal/controller/nodedeployment/networking.go`: new `externalPortsForMode` helper; `generateExternalService` uses it.
- `internal/controller/nodedeployment/networking_test.go`: updated two existing expectations (`AllPortsForFullMode` now has 6 ports, not 7; `ValidatorModePorts` is `p2p` only).

Test plan
- `go test ./internal/controller/nodedeployment/` passes.
- `kubectl -n gprusak describe svc v6-5-run-3-rpc-0-external` shows no `metrics` port; `count by(pod)(tendermint_consensus_latest_block_height{chain_id="v6-5-run-3"})` returns 1 per pod instead of 2.

Refs: sei-protocol/sei-config#7 (upstream helper), #116 / #118 (prior component / `chain_id` label work).
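The filter described under Approach, and the updated expectations listed under Scope, can be sketched as follows. This is a minimal illustration under stated assumptions, not the code merged in this PR: `ServicePort` stands in for `corev1.ServicePort`, `Mode` and its constants are hypothetical, and `portsForMode` is a stub whose port names and numbers (apart from `metrics:26660`) are placeholders, chosen only so that full mode has 7 ports and validator mode has `p2p` plus `metrics`.

```go
package main

import "fmt"

// ServicePort is a pared-down stand-in for corev1.ServicePort.
type ServicePort struct {
	Name string
	Port int32
}

// Mode is a hypothetical node-mode type used only for this sketch.
type Mode string

const (
	ModeFull      Mode = "full"
	ModeValidator Mode = "validator"
)

// portsForMode is a hypothetical stub for the real helper. Only metrics:26660
// comes from this PR; every other name and number is a placeholder.
func portsForMode(mode Mode) []ServicePort {
	switch mode {
	case ModeValidator:
		return []ServicePort{{"p2p", 26656}, {"metrics", 26660}}
	case ModeFull:
		return []ServicePort{
			{"p2p", 26656}, {"rpc", 26657}, {"grpc", 9090}, {"rest", 1317},
			{"evm-http", 8545}, {"evm-ws", 8546}, {"metrics", 26660},
		}
	}
	return nil
}

// externalPortsForMode returns the per-pod port list minus "metrics", so the
// external Service never becomes a second scrape target for the same pods.
// TODO: delegate to the upstream helper once sei-config#7 lands.
func externalPortsForMode(mode Mode) []ServicePort {
	all := portsForMode(mode)
	out := make([]ServicePort, 0, len(all))
	for _, p := range all {
		if p.Name == "metrics" {
			continue
		}
		out = append(out, p)
	}
	return out
}

func main() {
	for _, m := range []Mode{ModeFull, ModeValidator} {
		ext := externalPortsForMode(m)
		fmt.Printf("%s: %d ports -> %d external: %v\n", m, len(portsForMode(m)), len(ext), ext)
	}
}
```

The sketch builds a new slice instead of filtering in place, so it stays safe even if `portsForMode` were to return a shared slice that the per-pod Services also use.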
🤖 Generated with Claude Code