kube-proxy metrics cleanup (and stuff) #124557
Conversation
This issue is currently awaiting triage. If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: danwinship. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
pkg/proxy/metrics/metrics.go (Outdated)
switch mode {
case kubeproxyconfig.ProxyModeIPTables:
	legacyregistry.MustRegister(SyncFullProxyRulesLatency)
	legacyregistry.MustRegister(SyncPartialProxyRulesLatency)
	legacyregistry.MustRegister(IptablesRestoreFailuresTotal)
	legacyregistry.MustRegister(IptablesPartialRestoreFailuresTotal)
	legacyregistry.MustRegister(IptablesRulesTotal)
	legacyregistry.MustRegister(IptablesRulesLastSync)

case kubeproxyconfig.ProxyModeIPVS:
	legacyregistry.MustRegister(IptablesRestoreFailuresTotal)

case kubeproxyconfig.ProxyModeNFTables:
	// FIXME: should not use the iptables-specific metric
	legacyregistry.MustRegister(IptablesRestoreFailuresTotal)

case kubeproxyconfig.ProxyModeKernelspace:
	// currently no winkernel-specific metrics
}
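(For context, a minimal sketch of how a per-mode switch like the one above is typically wrapped. The sync.Once guard and the "common" metric shown here are assumptions for illustration, not necessarily the PR's exact code; the point is that legacyregistry.MustRegister panics on duplicate registration, so registration must run at most once.)

import (
	"sync"

	"k8s.io/component-base/metrics/legacyregistry"
	kubeproxyconfig "k8s.io/kubernetes/pkg/proxy/apis/config"
)

var registerMetricsOnce sync.Once

// Hypothetical wrapper: MustRegister panics if the same collector is
// registered twice, so registration is guarded by sync.Once.
func RegisterMetrics(mode kubeproxyconfig.ProxyMode) {
	registerMetricsOnce.Do(func() {
		// Metrics common to every proxy mode are registered first
		// (SyncProxyRulesLatency is an existing package-level metric
		// var, assumed here for the sketch)...
		legacyregistry.MustRegister(SyncProxyRulesLatency)

		// ...followed by the mode-specific switch shown above.
		// switch mode { ... }
	})
}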
@dgrisonnet since you helped with kube-proxy metrics on another PR... does this make sense? It seemed wrong to me to be registering metrics in modes where they don't get used (e.g., registering the "iptables-restore failure" metric in windows or nftables mode), but maybe there's a rule that says we should always register all the same metrics, even if some of them will always be 0?
There is no such rule; registering conditionally like you did sounds like a cleaner approach to me.
LGTM from sig-instrumentation
Windows proxy metric registration was in a separate file, which had led to some metrics (e.g., the new ProxyHealthzTotal and ProxyLivezTotal) not being registered on Windows even though they were implemented by platform-generic code. (A few other metrics were neither registered nor implemented on Windows, and that's probably a bug.) Also, beyond linux-vs-windows, make it clearer which metrics are specific to individual backends.
pkg/proxy/nftables/proxier.go (Outdated)
// staleChains is now incorrect since we didn't actually flush the
// chains in it. We can recompute it next time.
proxier.staleChains = make(map[string]time.Time)
we used to do this to avoid reallocating memory, no?
proxier.staleChains = proxier.staleChains[:0]
hm... you can't do that with a map though...
ah, no, apparently as of golang 1.21 you can do clear(proxier.staleChains)
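(A minimal, self-contained illustration of the two idioms under discussion; the variable names are invented for the example:)

package main

import "fmt"

func main() {
	// Slices: re-slicing to length 0 keeps the backing array, so
	// subsequent appends reuse it instead of allocating a new one.
	s := []string{"a", "b", "c"}
	s = s[:0]
	fmt.Println(len(s), cap(s)) // 0 3

	// Maps can't be re-sliced, but since Go 1.21 the builtin clear()
	// deletes every entry in place, reusing the map's internal storage
	// rather than allocating a fresh map with make().
	m := map[string]int{"a": 1, "b": 2}
	clear(m)
	fmt.Println(len(m)) // 0
}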
to be honest, I didn't realize it was a map when I commented, just saw the make and remembered the allocation problems ... glad it helped anyway
lgtm once comment #124557 (comment) is resolved. gofmt errors are legit.
(force-pushed from 2dc77f2 to 7f9a3a9)
NFTablesSyncFailuresTotal = metrics.NewCounter(
	&metrics.CounterOpts{
		Subsystem: kubeProxySubsystem,
		Name:      "sync_proxy_rules_nftables_sync_failures_total",
The kubeProxySubsystem will add the kubeproxy prefix, so the final metric would be kubeproxy_sync_proxy_rules_nftables_sync_failures_total. How about kubeproxy_sync_nftables_rules_failures_total or kubeproxy_nftables_sync_rules_failures_total?
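(For reference, the prefixing described above is the standard Prometheus naming convention: non-empty namespace, subsystem, and name components are joined with underscores. A minimal, runnable sketch using client_golang's prometheus.BuildFQName, which implements that convention; treating the kube-proxy wrappers as following it is an assumption here:)

package main

import (
	"fmt"

	"github.com/prometheus/client_golang/prometheus"
)

func main() {
	const kubeProxySubsystem = "kubeproxy"
	// Non-empty components are joined with underscores; the empty
	// namespace is simply skipped.
	fqName := prometheus.BuildFQName("", kubeProxySubsystem,
		"sync_proxy_rules_nftables_sync_failures_total")
	fmt.Println(fqName)
	// Output: kubeproxy_sync_proxy_rules_nftables_sync_failures_total
}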
The corresponding iptables metric is sync_proxy_rules_iptables_restore_failures_total... I was trying to keep it parallel with that. Admittedly, I did drop the second "sync" originally, making it sync_proxy_rules_nftables_failures_total, but then when I added the cleanup failures metric, it seemed unbalanced/ambiguous, so I put it back... I don't have really strong opinions about the names though...
@aojea any opinion?
(FWIW, it makes more sense if you realize that the sync_proxy_rules prefix refers to the syncProxyRules function that the metric is coming from...)
I'm bad at naming ... but consistency sounds better for people that already have scripts or dashboards, so they just need to copy-paste and s/iptables/nftables/ ... I don't think anybody references these from memory
NFTablesCleanupFailuresTotal = metrics.NewCounter(
	&metrics.CounterOpts{
		Subsystem: kubeProxySubsystem,
		Name:      "sync_proxy_rules_nftables_cleanup_failures_total",
something similar for this maybe?
If the sync fails, don't try to clean up, since it's guaranteed to fail too.
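(A runnable toy model of the logic in that commit message; the helper names here are invented for illustration and are not the PR's actual code:)

package main

import (
	"errors"
	"fmt"
)

// syncAndCleanup models the guard: when the sync transaction fails,
// skip cleanup entirely, since cleanup would run against the same
// broken nftables state and fail as well.
func syncAndCleanup(sync, cleanup func() error) {
	if err := sync(); err != nil {
		// e.g. bump NFTablesSyncFailuresTotal and retry later
		fmt.Println("sync failed, skipping cleanup:", err)
		return
	}
	if err := cleanup(); err != nil {
		// e.g. bump NFTablesCleanupFailuresTotal
		fmt.Println("cleanup failed:", err)
	}
}

func main() {
	syncAndCleanup(
		func() error { return errors.New("nft transaction rejected") },
		func() error { return nil },
	)
}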
(force-pushed from 7f9a3a9 to c4dd2c5)
/lgtm (all threads resolved)
/hold for pull-kubernetes-e2e-capz-windows-master
/test pull-kubernetes-e2e-capz-windows-master
/lgtm
LGTM label has been added. Git tree hash: 6b1b4fa81ab759e15f83a101f8651169855ce23d
The Kubernetes project has merge-blocking tests that are currently too flaky to consistently pass. This bot retests PRs for certain kubernetes repos according to the following rules:
You can:
/retest
What type of PR is this?
/kind bug
/kind cleanup
What this PR does / why we need it:
Um, yeah, it's not the most focused PR ever...
Does this PR introduce a user-facing change?
/sig network
/area kube-proxy
/assign @aojea @aroradaman