Adjust V level for scheduler messages #17438
Conversation
Can one of the admins verify that this patch is reasonable to test? (reply "ok to test", or if you trust the user, reply "add to whitelist") If this message is too spammy, please complain to ixdy.
/cc @kubernetes/sig-scalability Started running into #14216 behavior again. I'm pretty confident something is wrong with our journald setup, but in the meantime, this seems to better follow the conventions from https://github.com/kubernetes/kubernetes/blob/master/docs/devel/logging.md
Labelling this PR as size/XS
@@ -34,7 +34,7 @@ func calculateScore(requested int64, capacity int64, node string) int {
 		return 0
 	}
 	if requested > capacity {
-		glog.Infof("Combined requested resources %d from existing pods exceeds capacity %d on node %s",
+		glog.V(4).Infof("Combined requested resources %d from existing pods exceeds capacity %d on node %s",
I'm torn between whether this should be V(3) (extended information about changes) or V(4) (debug-level verbosity). I lean towards V(3), since it's extended information, even though it's about why the pod didn't land on a node.
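For context, glog's V levels act as a simple threshold: glog.V(n).Infof only emits when the process runs with -v set to n or higher. Here is a minimal stdlib-only sketch of that gating rule (the `enabled` helper is hypothetical, written to mimic the glog behavior under discussion, not part of the glog package):

```go
package main

import "fmt"

// enabled reports whether a message guarded by glog.V(level) would be
// emitted under a given -v threshold: glog logs it only when
// level <= threshold. (Simplified sketch; real glog also supports
// per-file -vmodule overrides.)
func enabled(level, threshold int) bool {
	return level <= threshold
}

func main() {
	// At a typical production verbosity of -v=2, the scheduler's
	// "Combined requested resources" message is visible at V(2)
	// but hidden at V(3) or V(4).
	for _, level := range []int{2, 3, 4} {
		fmt.Printf("V(%d) at -v=2: %v\n", level, enabled(level, 2))
	}
}
```

This is why the choice of level matters: moving the call from plain Infof (always emitted) to V(3) or V(4) silences it at default verbosity.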
Works for me. It could probably stand to have someone check V-level consistency across the codebase rather than handle it case by case. FWIW, I believe you already get all of the reasons a pod failed to fit on a given node in the other V(2) statement in this PR.
Sounds like a good idea. I'll check to see if there is an issue already created for that.
Force-pushed from 57bf149 to a717857
Looks good to me.
/cc @wojtek-t
lgtm.
@@ -34,7 +34,7 @@ func calculateScore(requested int64, capacity int64, node string) int {
 		return 0
 	}
 	if requested > capacity {
-		glog.Infof("Combined requested resources %d from existing pods exceeds capacity %d on node %s",
+		glog.V(3).Infof("Combined requested resources %d from existing pods exceeds capacity %d on node %s",
Can you change it to V(2)?
ok to test
@spiffxp High-level question: do you observe a lot of those logs in your experiments? They shouldn't be that common.
@wojtek-t Yes, I see more and more of these as the cluster fills up. In my case, they become excessive to the point of blocking after ~1200 of 3000 pods have made it to Running on a 100-node m3.medium cluster in AWS. It feels as if the algorithm rejects more and more nodes with each pod scheduled, then forgets that information when it comes time to schedule the next pod.
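A rough sketch of why this floods: the scorer runs against every node for each pod being scheduled, and every already-full node logs the message once per pass, so total log volume grows roughly quadratically as the cluster fills. The `logLines` model below is hypothetical, not scheduler code, and the numbers are assumptions loosely based on the figures above, not measurements:

```go
package main

import "fmt"

// logLines estimates how many "exceeds capacity" messages are emitted
// while scheduling totalPods onto a cluster of the given node count,
// assuming each full node logs once per scheduling pass.
// (Hypothetical model for illustration only.)
func logLines(totalPods, nodes, podsPerNode int) int {
	lines := 0
	fullNodes := 0
	for pod := 0; pod < totalPods; pod++ {
		lines += fullNodes // one message per already-full node per pass
		if podsPerNode > 0 && (pod+1)%podsPerNode == 0 && fullNodes < nodes {
			fullNodes++ // this pod filled another node
		}
	}
	return lines
}

func main() {
	// 100 nodes, ~30 pods per node: log volume at 1200 pods vs 3000 pods.
	fmt.Println(logLines(1200, 100, 30), logLines(3000, 100, 30))
}
```

Under these assumed numbers the message count grows several-fold between the half-full and full cluster, which matches the "more and more of these as the cluster fills up" observation.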
Force-pushed from a717857 to fad1968
The "Combined requested resources" message becomes excessive as the cluster fills up; drop it down to V(2). Put an explicit V(2) on the only other scheduler Infof call that didn't have a V level specified already.
GCE e2e test build/test passed for commit a7178574d29e5a624ca5cbd95268133a55132e0c.
LGTM
GCE e2e test build/test passed for commit fad1968.
The author of this PR is not in the whitelist for merge; can one of the admins add the 'ok-to-merge' label?
LGTM
@k8s-bot test this [submit-queue is verifying that this PR is safe to merge]
GCE e2e test build/test passed for commit fad1968.
Automatic merge from submit-queue |
Auto commit by PR queue bot
…8-upstream-release-1.1 Auto commit by PR queue bot
…k-of-#17438-upstream-release-1.1 Auto commit by PR queue bot
The "Combined requested resources" message becomes excessive as
the cluster fills up.
Put an explicit V(2) on the only other scheduler Infof call that didn't
have V specified already.