Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[kube-proxy] add log verbosity to endpoint topology hint loop. #123322

Merged
merged 1 commit into from Feb 15, 2024

Conversation

bjhaid
Copy link
Contributor

@bjhaid bjhaid commented Feb 15, 2024

What type of PR is this?

/kind cleanup

What this PR does / why we need it:

We enabled topology hint on one of our services and this log line was emitted ~92 million times in one day from one cluster tripping our log quota for that cluster, as it is the log line cannot be disabled via the -v flag because it does not specify verbosity.

I think more log locations need to set verbosity at which they are logged, but this one is currently hurting the most.

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Feb 15, 2024
@k8s-ci-robot
Copy link
Contributor

Hi @bjhaid. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-priority Indicates a PR lacks a `priority/foo` label and requires one. label Feb 15, 2024
@k8s-ci-robot k8s-ci-robot added area/kube-proxy sig/network Categorizes an issue or PR as relevant to SIG Network. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Feb 15, 2024
@thockin
Copy link
Member

thockin commented Feb 15, 2024

Thanks!

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 15, 2024
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: 9acea8b3a387700d16f806d8ba2df268b862473b

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bjhaid, priestjim, thockin

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 15, 2024
@danwinship
Copy link
Contributor

Does this really solve anything? It seems like the real problem is that we need to throttle this notification

@@ -166,7 +166,7 @@ func canUseTopology(endpoints []Endpoint, svcInfo ServicePort, nodeLabels map[st
continue
}
if endpoint.ZoneHints().Len() == 0 {
klog.InfoS("Skipping topology aware endpoint filtering since one or more endpoints is missing a zone hint", "endpoint", endpoint)
klog.V(2).InfoS("Skipping topology aware endpoint filtering since one or more endpoints is missing a zone hint", "endpoint", endpoint)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we switch all log statements to V(2) ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ping @robscott

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think yes, outside of all the startup logs most or all log sites should be V(N)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you want to change it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean, the two other logs statements in this function, they will have the same issue

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you want to change it?

I didn't want to start combing through all kube-proxy logs sites to update in this PR

I mean, the two other logs statements in this function, they will have the same issue

I can update those as well, just started with the most annoying, the other 2 combined for ~6M lines/day which is way less than the one I started with. I'll push changes to update the other log statements as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated the other 2 locations in this file. If I can find time I'll look into sending a PR for all of kube-proxy, but I can't guarantee that will happen today.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah this seems like a good improvement, thanks @bjhaid!

@bjhaid
Copy link
Contributor Author

bjhaid commented Feb 15, 2024

Does this really solve anything? It seems like the real problem is that we need to throttle this notification

the logs is maybe useful for debugging, but it is noise for me as a cluster operator. I believe one should be able to turn off all none fatal/bootup logs. So yes while throttling might help, the log and a bunch of logs from kube-proxy today is noise, I almost never look at them unless I am troubleshooting stuff and I'll like to be able to turn most of them of and on as I wish.

We enabled topology hint on one of our services and this log line was
emitted ~92 million times in one day from one cluster tripping our log
quota for that cluster, as it is the log line cannot be disabled via the
`-v` flag because it does not specify verbosity.

I think more log locations need to set verbosity at which they are
logged, but this one is currently hurting the most.
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 15, 2024
@aojea
Copy link
Member

aojea commented Feb 15, 2024

/ok-to-test

@danwinship I think that is fair ask to be able to disable them , are you ok with this?

@k8s-ci-robot k8s-ci-robot added the ok-to-test Indicates a non-member PR verified by an org member that is safe to test. label Feb 15, 2024
@k8s-ci-robot k8s-ci-robot removed the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Feb 15, 2024
Copy link
Member

@robscott robscott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @bjhaid! This is a great fix - I'd also be open to @danwinship's suggestion of some kind of throttling here, but setting a log level like this does seems like a great improvement that we could build on in the future.

/lgtm

@@ -166,7 +166,7 @@ func canUseTopology(endpoints []Endpoint, svcInfo ServicePort, nodeLabels map[st
continue
}
if endpoint.ZoneHints().Len() == 0 {
klog.InfoS("Skipping topology aware endpoint filtering since one or more endpoints is missing a zone hint", "endpoint", endpoint)
klog.V(2).InfoS("Skipping topology aware endpoint filtering since one or more endpoints is missing a zone hint", "endpoint", endpoint)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah this seems like a good improvement, thanks @bjhaid!

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 15, 2024
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: e854e0f912af8cff0490b1b035871d0ba41664dc

@robscott
Copy link
Member

/triage accepted
/priority important-soon

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Feb 15, 2024
@k8s-ci-robot k8s-ci-robot merged commit ad6477e into kubernetes:master Feb 15, 2024
14 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v1.30 milestone Feb 15, 2024
@bjhaid bjhaid deleted the bjhaid-topology-verbosity branch February 15, 2024 21:01
k8s-ci-robot added a commit that referenced this pull request Apr 25, 2024
…f-#123322-upstream-release-1.28

Automated cherry pick of #123322: add log verbosity to endpoint topology hint
k8s-ci-robot added a commit that referenced this pull request Apr 25, 2024
…f-#123322-upstream-release-1.29

Automated cherry pick of #123322: add log verbosity to endpoint topology hint
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/kube-proxy cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note-none Denotes a PR that doesn't merit a release note. sig/network Categorizes an issue or PR as relevant to SIG Network. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants