Log get PVC/PV errors in MaxPD predicate only at high verbosity #48226
@@ -237,7 +236,7 @@ func (c *MaxPDVolumeCountChecker) filterVolumes(volumes []v1.Volume, namespace s
 if err != nil {
 	// if the PVC is not found, log the error and count the PV towards the PV limit
 	// generate a random volume ID since its required for de-dup
-	utilruntime.HandleError(fmt.Errorf("Unable to look up PVC info for %s/%s, assuming PVC matches predicate when counting limits: %v", namespace, pvcName, err))
+	glog.V(4).Infof("Unable to look up PVC info for %s/%s, assuming PVC matches predicate when counting limits: %v", namespace, pvcName, err)
HandleError has rate limiting of at most one error per millisecond. It also shows the call stack. I am not sure how useful all of that is, but at least converting an error log to an info log does not seem right to me.
IMO it is not an error; it is basically useless to the reader. It's misleading because "error" implies something has broken, yet the code ignores the error. All this code does is count the number of GCE PVs/PVCs used on a given node. The reader doesn't care if there was an error getting just one of those PVs/PVCs; the count continues regardless, so this quickly becomes spam.
I see your point. It may still deserve a warning log. More importantly, this change removes the message from production logs entirely, since it is reported only at V(4).
@davidopp What do you think?
I don't see why you would warn in this case, it looks like spam for the control flow.
@kubernetes/sig-scheduling-pr-reviews PTAL, the log spam is severe, up to 40000 entries a day from a single PVC!
I agree this can cause a lot of spam, and we have a bug open for this: https://bugzilla.redhat.com/show_bug.cgi?id=1475558 I am personally fine with this change. Another option to explore would be aggregating these errors in utilruntime.HandleError or in the scheduler itself, so that instead of outputting thousands of copies of the same error, we output the error message once along with its count.
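The aggregation idea could look roughly like the sketch below. This is only an illustration using the Go standard library; `errorAggregator`, `Record`, and `Flush` are hypothetical names and are not part of utilruntime or the scheduler. Identical messages are counted in memory, and a periodic flush emits one line per distinct message with its count.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// errorAggregator counts identical messages instead of logging each one.
type errorAggregator struct {
	mu     sync.Mutex
	counts map[string]int
}

func newErrorAggregator() *errorAggregator {
	return &errorAggregator{counts: make(map[string]int)}
}

// Record notes one occurrence of msg without emitting anything.
func (a *errorAggregator) Record(msg string) {
	a.mu.Lock()
	defer a.mu.Unlock()
	a.counts[msg]++
}

// Flush returns one summary line per distinct message, sorted for stable
// output, and resets the counters.
func (a *errorAggregator) Flush() []string {
	a.mu.Lock()
	defer a.mu.Unlock()
	lines := make([]string, 0, len(a.counts))
	for msg, n := range a.counts {
		lines = append(lines, fmt.Sprintf("%s (repeated %d times)", msg, n))
	}
	sort.Strings(lines)
	a.counts = make(map[string]int)
	return lines
}

func main() {
	agg := newErrorAggregator()
	for i := 0; i < 1000; i++ {
		agg.Record("Unable to look up PVC info for default/deleted-claim")
	}
	for _, line := range agg.Flush() {
		fmt.Println(line)
	}
}
```

A caller would invoke `Flush` on a timer. The trade-off is that the first occurrence is delayed until the next flush, which may or may not be acceptable for real errors.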
Question 1: The answers to those 3 questions should make it clear the right path forward.
/assign
Need approval. This is not an error. It is not something the cluster operator (the person reading the logs) can or should do anything about. It is purely debug. V(4) might be too low. Heck we could probably get rid of it entirely and be fine. This generates >80% of all logs on nodes in a real cluster. Gigs and gigs of useless repeating messages at V(2).
/retest
@davidopp @timothysc This is in the 1.8 milestone. Can we get approval on this? /retest
/approve
@timothysc thanks! Can I get "approve no-issue"?
/approve no-issue
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: eparis, timothysc, wongma7 Associated issue requirement bypassed by: timothysc The full list of commands accepted by this bot can be found here.
/retest
/test all [submit-queue is verifying that this PR is safe to merge]
Automatic merge from submit-queue (batch tested with PRs 48226, 52046, 52231, 52344, 52352)
Automatic merge from submit-queue UPSTREAM: 48226: Log get PVC/PV errors in MaxPD predicate only at high verbosity. kubernetes/kubernetes#48226 xref https://bugzilla.redhat.com/show_bug.cgi?id=1475558 @sjenning @eparis @derekwaynecarr
The error is effectively ignored since even if a PVC/PV doesn't exist it gets counted, and it's rarely actionable either so let's reduce the verbosity.
Basically, a user somewhere on the cluster has to have done something "wrong" for this error to occur, e.g. deleting a pod's PVC, or the PV bound to that PVC, while the pod is running. From that point forward the logs will be spammed every time the predicate is evaluated on a node where that "wrong" pod exists.
Release note: