Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add iptables restore failure metric #81210

Merged
merged 1 commit into from Aug 9, 2019

Conversation

@figo
Copy link
Member

commented Aug 9, 2019

as mentioned in issue #80061, in iptables lock contention case,
we can see increasing rate of iptables restore failures because it trying
to grab iptables file lock, this new metric can provide more insight to administrators,
metrics will be collected in both kube-proxy iptables and ipvs modes

Signed-off-by: Hui Luo luoh@vmware.com

What type of PR is this?
/kind feature

What this PR does / why we need it:
added an new Prometheus counter metric "sync_proxy_rules_iptables_restore_failures_total" for kube-proxy iptables-restore failures (both ipvs and iptables modes)

Which issue(s) this PR fixes:
Fixes #80061

Special notes for your reviewer:
N/A

Does this PR introduce a user-facing change?:
added an new Prometheus counter metric "sync_proxy_rules_iptables_restore_failures_total" for kube-proxy iptables-restore failures (both ipvs and iptables modes)

added an new Prometheus counter metric "sync_proxy_rules_iptables_restore_failures_total" for kube-proxy iptables-restore failures (both ipvs and iptables modes)

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


/sig network
/cc @squeed @vllry @andrewsykim

@figo

This comment has been minimized.

Copy link
Member Author

commented Aug 9, 2019

/test pull-kubernetes-integration

@figo

This comment has been minimized.

Copy link
Member Author

commented Aug 9, 2019

/test pull-kubernetes-bazel-build

@squeed

This comment has been minimized.

Copy link
Contributor

commented Aug 9, 2019

Thanks for this PR!
Are there any other proxy implementations where this metric is useful? I know the userspace proxier executes iptables. What about ipvs?

@figo figo force-pushed the figo:metrics branch from 79491f8 to 07e790f Aug 9, 2019

@figo

This comment has been minimized.

Copy link
Member Author

commented Aug 9, 2019

@squeed when i look at the code, only iptables-restore is grabbing a file lock via grabIptablesLocks(), i think this is where contention can lead to failure, i don't see iptables-save grabbing file lock at all except acquiring mutex, do you see any evidence that iptables-save is failing?

at this moment, i only keep the iptables-restore failure metric, thx

@figo figo force-pushed the figo:metrics branch from 07e790f to f81844a Aug 9, 2019

@figo figo changed the title Add iptable save/restore failure metrics Add iptable restore failure metric Aug 9, 2019

@andrewsykim

This comment has been minimized.

Copy link
Member

commented Aug 9, 2019

Should we increment the iptables restore metrics for clean up https://github.com/kubernetes/kubernetes/blob/master/pkg/proxy/iptables/proxier.go#L429-L433 ?

@andrewsykim

This comment has been minimized.

Copy link
Member

commented Aug 9, 2019

Are there plans for iptables-save metrics? maybe in a follow-up PR?

Add iptables restore failure metrics
As mentioned in issue #80061, in iptables lock contention case,
we can see increasing rate of iptables restore failures because it
need to grab iptables file lock.

The failure metric can provide administrators more insight

Metrics will be collected in kube-proxy iptables and ipvs modes

Signed-off-by: Hui Luo <luoh@vmware.com>

@figo figo force-pushed the figo:metrics branch from f81844a to a2ef00c Aug 9, 2019

@figo figo changed the title Add iptable restore failure metric Add iptables restore failure metric Aug 9, 2019

@figo

This comment has been minimized.

Copy link
Member Author

commented Aug 9, 2019

Should we increment the iptables restore metrics for clean up https://github.com/kubernetes/kubernetes/blob/master/pkg/proxy/iptables/proxier.go#L429-L433 ?

cleanup should not been called frequently, i added it as you suggested because it does not hurt, thx

@figo

This comment has been minimized.

Copy link
Member Author

commented Aug 9, 2019

Are there plans for iptables-save metrics? maybe in a follow-up PR?

iptables-save does not acquire file lock, so it less likely to fail due to contention as iptables-restore, it could be added if there is concrete use case lead to frequent ipstables-save failures, thx

@andrewsykim

This comment has been minimized.

Copy link
Member

commented Aug 9, 2019

/lgtm
/assign @thockin

The release notes section needs to be fixed

@thockin
Copy link
Member

left a comment

Thanks!

/lgtm
/approve

@k8s-ci-robot

This comment has been minimized.

Copy link
Contributor

commented Aug 9, 2019

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: figo, thockin

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit 4069042 into kubernetes:master Aug 9, 2019

22 of 23 checks passed

pull-kubernetes-kubemark-e2e-gce-big Job triggered.
Details
cla/linuxfoundation figo authorized
Details
pull-kubernetes-bazel-build Job succeeded.
Details
pull-kubernetes-bazel-test Job succeeded.
Details
pull-kubernetes-conformance-image-test Skipped.
pull-kubernetes-cross Skipped.
pull-kubernetes-dependencies Job succeeded.
Details
pull-kubernetes-e2e-gce Job succeeded.
Details
pull-kubernetes-e2e-gce-100-performance Job succeeded.
Details
pull-kubernetes-e2e-gce-csi-serial Skipped.
pull-kubernetes-e2e-gce-device-plugin-gpu Job succeeded.
Details
pull-kubernetes-e2e-gce-iscsi Skipped.
pull-kubernetes-e2e-gce-iscsi-serial Skipped.
pull-kubernetes-e2e-gce-storage-slow Skipped.
pull-kubernetes-godeps Skipped.
pull-kubernetes-integration Job succeeded.
Details
pull-kubernetes-local-e2e Skipped.
pull-kubernetes-node-e2e Job succeeded.
Details
pull-kubernetes-node-e2e-containerd Job succeeded.
Details
pull-kubernetes-typecheck Job succeeded.
Details
pull-kubernetes-verify Job succeeded.
Details
pull-publishing-bot-validate Skipped.
tide In merge pool.
Details

@k8s-ci-robot k8s-ci-robot added this to the v1.16 milestone Aug 9, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.