Log get PVC/PV errors in MaxPD predicate only at high verbosity #48226

Merged
merged 1 commit into from Sep 13, 2017

Conversation

wongma7
Contributor

@wongma7 wongma7 commented Jun 28, 2017

The error is effectively ignored since even if a PVC/PV doesn't exist it gets counted, and it's rarely actionable either so let's reduce the verbosity.

Basically, a user somewhere on the cluster will have to have done something "wrong" for this error to occur, e.g. if, while the pod is running, the pod's PVC is deleted or the PV bound to the pod's PVC is deleted. From that point forward the logs will be spammed every time the predicate is evaluated on a node where that "wrong" pod exists.

Release note:

NONE

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jun 28, 2017
@k8s-github-robot k8s-github-robot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. release-note-none Denotes a PR that doesn't merit a release note. labels Jun 28, 2017
@@ -237,7 +236,7 @@ func (c *MaxPDVolumeCountChecker) filterVolumes(volumes []v1.Volume, namespace s
 if err != nil {
 	// if the PVC is not found, log the error and count the PV towards the PV limit
 	// generate a random volume ID since its required for de-dup
-	utilruntime.HandleError(fmt.Errorf("Unable to look up PVC info for %s/%s, assuming PVC matches predicate when counting limits: %v", namespace, pvcName, err))
+	glog.V(4).Infof("Unable to look up PVC info for %s/%s, assuming PVC matches predicate when counting limits: %v", namespace, pvcName, err)
Member

HandleError has rate limiting of at most one error per millisecond. It also shows the call stack. I am not so sure how useful all of that will be, but at least converting an error log to an info log does not seem right to me.

Contributor Author

IMO it is not an error, and it is basically useless to the reader. It's misleading because "error" implies something has broken, yet the code ignores it. All this code does is count the number of GCE PVs/PVCs used on a given node. The reader doesn't care if there was an error getting just one of those PVs/PVCs, since the count will continue regardless, so this quickly becomes spam.
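To make the "count continues regardless" behavior concrete, here is a minimal sketch of the idea, not the scheduler's actual code. The names `pvcLookup`, `countVolumes`, `errNotFound`, and the `debugf` callback are hypothetical stand-ins; the only detail taken from the diff is that a claim that cannot be looked up is still counted, under a random placeholder ID needed for de-duplication.

```go
package main

import (
	"errors"
	"fmt"
	"math/rand"
)

// pvcLookup is a hypothetical stand-in for the scheduler's PVC/PV lookup.
// It returns the ID of the volume behind a claim, or an error if the
// claim cannot be found.
type pvcLookup func(namespace, pvcName string) (string, error)

// errNotFound is a hypothetical error a lookup can return for a deleted claim.
var errNotFound = errors.New("pvc not found")

// countVolumes returns the number of distinct volumes referenced by pvcNames.
// A claim that cannot be looked up is still counted: it is given a random
// placeholder ID so de-duplication keeps working, and the failure is only
// reported through debugf.
func countVolumes(namespace string, pvcNames []string, lookup pvcLookup, debugf func(string, ...interface{})) int {
	seen := make(map[string]bool)
	for _, name := range pvcNames {
		id, err := lookup(namespace, name)
		if err != nil {
			debugf("unable to look up PVC %s/%s, counting it anyway: %v", namespace, name, err)
			id = fmt.Sprintf("unknown-%s/%s-%d", namespace, name, rand.Int63())
		}
		seen[id] = true
	}
	return len(seen)
}
```

Because the lookup failure never changes the control flow, whatever `debugf` prints is purely informational, which is the argument for moving it off the error path.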

Member

I see your point. It may still deserve a warning log. More importantly, this change will completely remove the message from production logs, as it is reported only at V(4).
@davidopp What do you think?

Member

I don't see why you would warn in this case; it looks like spam for the control flow.

@wongma7
Contributor Author

wongma7 commented Sep 1, 2017

@kubernetes/sig-scheduling-pr-reviews PTAL, the log spam is severe, up to 40000 entries a day from a single PVC!

@k8s-ci-robot k8s-ci-robot added the sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. label Sep 1, 2017
@aveshagarwal
Member

I agree this can cause a lot of spam, and we have a bug open for it: https://bugzilla.redhat.com/show_bug.cgi?id=1475558

I am personally fine with this change. Another option to explore would be whether these errors could be aggregated in utilruntime.HandleError or in the scheduler itself, so that instead of outputting thousands of copies of the same error we output the error message once along with its count.
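As a rough illustration of the aggregation idea, the sketch below counts identical messages and emits one summary line per distinct message. It is an assumption-laden example only: `errorAggregator`, `Record`, and `Flush` are hypothetical names, and nothing like this is claimed to exist in utilruntime.HandleError or the scheduler.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// errorAggregator is a hypothetical helper that counts identical messages
// instead of logging each occurrence.
type errorAggregator struct {
	mu     sync.Mutex
	counts map[string]int
}

func newErrorAggregator() *errorAggregator {
	return &errorAggregator{counts: make(map[string]int)}
}

// Record notes one occurrence of msg without emitting anything.
func (a *errorAggregator) Record(msg string) {
	a.mu.Lock()
	defer a.mu.Unlock()
	a.counts[msg]++
}

// Flush returns one summary line per distinct message, sorted for stable
// output, and resets the counters. A caller would invoke it periodically.
func (a *errorAggregator) Flush() []string {
	a.mu.Lock()
	defer a.mu.Unlock()
	lines := make([]string, 0, len(a.counts))
	for msg, n := range a.counts {
		lines = append(lines, fmt.Sprintf("%s (seen %d times)", msg, n))
	}
	sort.Strings(lines)
	a.counts = make(map[string]int)
	return lines
}
```

With something along these lines, a message repeated thousands of times between flushes would collapse into a single line carrying its count, at the cost of delaying the first report until the next flush.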

@eparis
Contributor

eparis commented Sep 5, 2017

Question 1: is this an error?
Question 2: who should do something about this condition?
The cluster admin? Some user of kube?
Question 3: how often does that person need to know about this condition?

The answers to those 3 questions should make it clear the right path forward.
@wongma7 can you take a stab at making it obvious to us the answers to those questions?

@wongma7
Contributor Author

wongma7 commented Sep 5, 2017

  1. No. A failed PVC get is not a scheduler error; it is only an indication that a pod on a node is trying to use a nonexistent PVC (e.g. the PVC was deleted while the pod was running).
  2. The user who created the pod using the nonexistent PVC. They should edit their pod to use a different PVC or (re)create the PVC the pod is trying to use.
  3. Once

@eparis
Contributor

eparis commented Sep 6, 2017

/assign
/lgtm
/retest

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 6, 2017
@eparis eparis added this to the v1.8 milestone Sep 6, 2017
@eparis
Contributor

eparis commented Sep 6, 2017

Need approval. This is not an error. It is not something the cluster operator (the person reading the logs) can or should do anything about. It is purely debug. V(4) might be too low. Heck we could probably get rid of it entirely and be fine. This generates >80% of all logs on nodes in a real cluster. Gigs and gigs of useless repeating messages at V(2).

@davidopp @timothysc @wojtek-t @k82cn
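For readers unfamiliar with verbosity-gated logging, the sketch below shows the general mechanism under stated assumptions. It is a simplified imitation, not glog itself: `leveledLogger`, its `threshold` field, and the `sink` callback are hypothetical, and the threshold of 2 used in the usage note is only an example matching the V(2) level mentioned above.

```go
package main

import "fmt"

// leveledLogger is a hypothetical, simplified imitation of verbosity-gated
// logging. Messages above the configured threshold are dropped.
type leveledLogger struct {
	threshold int               // highest verbosity level that is emitted
	sink      func(line string) // where emitted lines go
}

// Infof formats and emits the message only if level is within the threshold.
func (l *leveledLogger) Infof(level int, format string, args ...interface{}) {
	if level > l.threshold {
		return
	}
	l.sink(fmt.Sprintf(format, args...))
}
```

With a threshold of 2, a call such as `Infof(4, "Unable to look up PVC info for %s/%s", ns, name)` produces no output, while the same call is emitted once the threshold is raised to 4 for debugging.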

@dims
Member

dims commented Sep 9, 2017

/retest

@wongma7
Contributor Author

wongma7 commented Sep 11, 2017

/retest

@sjenning
Contributor

@davidopp @timothysc This is in the 1.8 milestone. Can we get approval on this?

/retest

Member

@timothysc timothysc left a comment

/approve

@timothysc timothysc self-assigned this Sep 12, 2017
@sjenning
Contributor

@timothysc thanks! can i get "approve no-issue"?

@timothysc
Member

/approve no-issue

@k8s-github-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: eparis, timothysc, wongma7

Associated issue requirement bypassed by: timothysc

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@k8s-github-robot k8s-github-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 12, 2017
@sjenning
Contributor

/retest

@ravisantoshgudimetla
Contributor

/retest

@k8s-github-robot

/test all [submit-queue is verifying that this PR is safe to merge]

@k8s-github-robot

Automatic merge from submit-queue (batch tested with PRs 48226, 52046, 52231, 52344, 52352)

@k8s-github-robot k8s-github-robot merged commit 83b4c0a into kubernetes:master Sep 13, 2017
openshift-merge-robot added a commit to openshift/origin that referenced this pull request Sep 16, 2017
Automatic merge from submit-queue

UPSTREAM: 48226: Log get PVC/PV errors in MaxPD predicate only at high verbosity.

kubernetes/kubernetes#48226

xref https://bugzilla.redhat.com/show_bug.cgi?id=1475558

@sjenning @eparis @derekwaynecarr
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note-none Denotes a PR that doesn't merit a release note. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files.