Revert "Revert "[Re-Apply][Distroless] Convert the GCE manifests for master containers."" #78466
Conversation
/hold (Wait until #78465 is merged)
/assign @MaciekPytel This PR is a re-apply of #76396. We pushed a fix in klog.
/test pull-kubernetes-kubemark-e2e-gce-big
fi
params+=" --log-file=${LOG_PATH}"
params+=" --logtostderr=false"
params+=" --log-file-max-size=0"
Could we also add --stderrthreshold=FATAL to avoid logging anything to stderr (as was the case before the change)?
My understanding is that this contributes to the issue: any data written to stderr must be read by docker, and from my experience docker reads that data quite slowly.
I think this accounts for a significant part of the issue.
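A minimal sketch of that suggestion, extending the params block from the diff above (the flag names are real klog flags; the surrounding params/LOG_PATH pattern is assumed from the snippet, not copied from the full configure script):

    params+=" --log-file=${LOG_PATH}"
    params+=" --logtostderr=false"
    params+=" --log-file-max-size=0"
    # Hypothetical addition being suggested here: only FATAL-severity lines
    # could still reach stderr, keeping routine output out of the docker pipe.
    params+=" --stderrthreshold=FATAL"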
I meant adding stderrthreshold here and in other components as well.
With the new klog logic, if logtostderr and alsologtostderr are both false, stderrthreshold will not be considered (since no stderr output would ever happen).
See here
In that case, no data is written to stderr regardless of the log severity.
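To make that behavior concrete, here is a small illustrative paraphrase in shell (the same language as the manifest snippet); it is not klog's actual implementation, which is Go, and only restates the routing rule described in the comment above:

    # Illustrative only: where a log line ends up, given the two stderr flags.
    route_log_line() {
      local logtostderr="$1" alsologtostderr="$2"
      if [[ "${logtostderr}" != "true" && "${alsologtostderr}" != "true" ]]; then
        # Both false: everything goes to --log-file; --stderrthreshold is never consulted.
        echo "file"
      else
        # Otherwise stderr output is possible.
        echo "stderr-possible"
      fi
    }

    route_log_line false false   # -> "file", regardless of severity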
I see. Makes sense.
I think this change makes sense. I think we should add --stderrthreshold=FATAL as well to avoid logging anything to stderr (as was the case before the change), since stderr is redirected to a pipe with docker on the other side. From my experience docker usually reads data from that pipe quite slowly, so this can lead to potential scalability issues. Another thing: given that we have seen changes like this affect the performance of 5k-node tests, I suggest running a manual test first, before we merge this. Could you run something like that?
Can you give some guidance on how to run a 5k-node test?
In my opinion this PR should work, but I'd really prefer testing it before we submit, to avoid a third revert. /test pull-kubernetes-e2e-gce-large-performance This should test the PR at 2k scale. 5k scale would be better, but we don't have the resources to test at 5k scale until the end of next week.
/test pull-kubernetes-e2e-gce-large-performance
I'm afraid the first test failure is not a flake. I took a look at the logs and I see a few problems there:
I see what happened: this PR doesn't contain the klog version update that happens in #78465. We need to rebase this PR to include that commit and rerun the test. At least now we know that the 2000-node test reproduces the issue we saw at 5k-node scale.
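For completeness, a minimal sketch of that rebase step, assuming an upstream remote pointing at kubernetes/kubernetes and a fork remote named origin holding the PR branch (the remote and branch names are illustrative, not taken from the PR):

    git fetch upstream
    git rebase upstream/master                         # pick up the klog v0.3.2 bump from #78465
    git push --force-with-lease origin my-pr-branch    # hypothetical branch name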
As soon as pull-kubernetes-e2e-gce-large-performance passes, I think it's good to submit. /lgtm
/test pull-kubernetes-e2e-gce-large-performance
Let's try this again for 1.17. /hold cancel
/test pull-kubernetes-e2e-gce-large-performance
/retest
6 similar comments
:(
/retest
1 similar comment
/lgtm
/retest
1 similar comment
/hold Holding for a moment given that we currently have a regression (since Friday). Fortunately we seem to know where the problem is already; an issue will be opened later today.
@wojtek-t how are things? Could we try again?
We fixed the duplicate log issue in klog (kubernetes/klog#65). The new klog release (v0.3.2) is introduced to k/k in #78465.
Does this PR introduce a user-facing change?:
What type of PR is this?
/kind cleanup