
Revert "Revert "[Re-Apply][Distroless] Convert the GCE manifests for master containers."" #78466

Conversation

@yuwenma yuwenma commented May 29, 2019

We fixed the duplicate-log issue in klog (kubernetes/klog#65). The new klog release (v0.3.2) is being introduced into k/k in #78465.

Does this PR introduce a user-facing change?:

NONE

What type of PR is this?

/kind cleanup

@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-kind Indicates a PR lacks a `kind/foo` label and requires one. needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels May 29, 2019
@k8s-ci-robot k8s-ci-robot added sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. sig/gcp and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels May 29, 2019
yuwenma commented May 29, 2019

/hold (Wait until #78465 is merged)
/assign @wojtek-t @MaciekPytel
/cc @dims @tallclair

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels May 29, 2019
yuwenma commented May 29, 2019

/assign @MaciekPytel
This PR is a re-apply of #76396. We pushed a fix in klog.

@xichengliudui
/test pull-kubernetes-kubemark-e2e-gce-big

fi
params+=" --log-file=${LOG_PATH}"
params+=" --logtostderr=false"
params+=" --log-file-max-size=0"
Member

Could we also add --stderrthreshold=FATAL to avoid logging anything to stderr (as was the case before the change)?

My understanding is that this contributes to the issue -- any data written to stderr must be read by docker, and in my experience docker reads that data quite slowly.

I think this accounts for a significant part of the issue.

Member

I meant adding stderrthreshold here and in other components as well.

Contributor Author

With the new klog logic, if logtostderr and alsologtostderr are both false, stderrthreshold is not considered (since nothing is ever written to stderr).
See here

In that case, no data is written to stderr regardless of log severity.

Member

I see. Makes sense.
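The flag set discussed in this thread can be sketched as follows. This is illustrative only: LOG_PATH and the component name are placeholders, not the actual GCE manifest values, and the comment on --stderrthreshold assumes the klog behavior described above (the threshold is never consulted once logtostderr and alsologtostderr are both false).

```shell
# Illustrative sketch of the reviewed flag set; paths are placeholders.
LOG_PATH="/var/log/kube-apiserver.log"
params=""
params+=" --log-file=${LOG_PATH}"      # klog writes directly to this file
params+=" --logtostderr=false"         # stop mirroring every line to stderr
params+=" --log-file-max-size=0"       # disable klog's built-in rotation
# --stderrthreshold=FATAL is omitted: with logtostderr and alsologtostderr
# both false, klog never writes to stderr, so the threshold is never used.
echo "kube-apiserver${params}"
```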

mborsz commented May 29, 2019

I think this change makes sense. I also think we should add --stderrthreshold=FATAL to avoid logging anything to stderr (as was the case before the change), since stderr is redirected to a pipe with docker on the other side. In my experience docker reads from that pipe quite slowly, so this can lead to scalability issues.

Another thing: given that we have seen changes like this affect the performance of 5k-node tests, I suggest running a manual test before we merge this. Could you run something like that?

yuwenma commented May 29, 2019

> I think this change makes sense. I think we should add --stderrthreshold=FATAL as well to avoid logging anything to stderr (it is the case before the change), which is redirected to pipe with docker on the other side. From my experience docker usually reads data quite slowly from that pipe so this can lead to potential scalability issues.
>
> Another thing is that given we have seen that changes like that can affect performance of 5k node tests, I suggest running manual test first before we merge this. Could you run something like that?

Can you give some guidance on how to run a 5k-node test?
Just a reminder that we are hitting the v1.15 code freeze (EOD this week), and there's another PR blocked by this change. Is there any chance to get this PR in today?

mborsz commented May 29, 2019

In my opinion this PR should work, but I would really prefer testing it before we submit, to avoid a third revert.

/test pull-kubernetes-e2e-gce-large-performance

This should test this PR at 2k scale. 5k scale would be better, but we won't have the resources to test at 5k until the end of next week.

yuwenma commented May 30, 2019

/test pull-kubernetes-e2e-gce-large-performance

mborsz commented May 30, 2019

I'm afraid the first test failure is not a flake.

I took a look at the logs and I see a few problems there:

  • kube-apiserver is restarting a few times
➜  e2e-021a5abcf8-a7a7f-master zgrep -- --address= kube-apiserver.log*
kube-apiserver.log:I0530 01:41:22.465795       1 flags.go:33] FLAG: --address="127.0.0.1"
kube-apiserver.log-20190530-1559174412.gz:I0529 22:21:45.415090       1 flags.go:33] FLAG: --address="127.0.0.1"
kube-apiserver.log-20190530-1559174412.gz:I0529 23:41:21.603442       1 flags.go:33] FLAG: --address="127.0.0.1"
➜  e2e-021a5abcf8-a7a7f-master
  • some log entries are still repeated, e.g.:
➜  e2e-021a5abcf8-a7a7f-master zgrep 'E0529 23:41:19.450955' kube-apiserver.log*
kube-apiserver.log-20190530-1559174412.gz:E0529 23:41:19.450955       1 metrics.go:96] Error in audit plugin 'buffered' affecting 1 audit events: audit backend shut down
kube-apiserver.log-20190530-1559174412.gz:E0529 23:41:19.450955       1 metrics.go:96] Error in audit plugin 'buffered' affecting 1 audit events: audit backend shut down
kube-apiserver.log-20190530-1559174412.gz:E0529 23:41:19.450955       1 metrics.go:96] Error in audit plugin 'buffered' affecting 1 audit events: audit backend shut down
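A generic way to spot repeated entries like the ones above is to compare sorted lines with `uniq -d`. This sketch uses a small synthetic log file rather than the real apiserver logs:

```shell
# Create a synthetic log with one duplicated entry (illustrative only).
LOG=$(mktemp)
cat > "$LOG" <<'EOF'
E0529 23:41:19.450955 1 metrics.go:96] Error in audit plugin 'buffered' affecting 1 audit events: audit backend shut down
E0529 23:41:19.450955 1 metrics.go:96] Error in audit plugin 'buffered' affecting 1 audit events: audit backend shut down
I0529 23:41:21.603442 1 flags.go:33] FLAG: --address="127.0.0.1"
EOF
# uniq -d prints each line that occurs more than once, exactly once.
dups=$(sort "$LOG" | uniq -d | wc -l | tr -d ' ')
echo "duplicated lines: $dups"   # → duplicated lines: 1
rm -f "$LOG"
```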

mborsz commented May 30, 2019

I see what happened: this PR doesn't contain the klog version update, which happens in #78465.

We need to rebase this PR to include that commit and rerun the test. At least now we know that the 2000-node test reproduces the issue we saw at 5k-node scale.

mborsz commented May 30, 2019

As soon as pull-kubernetes-e2e-gce-large-performance passes, I think this is good to submit.

/lgtm
/approve
/hold

dims commented Sep 18, 2019

/test pull-kubernetes-e2e-gce-large-performance

dims commented Sep 18, 2019

Let's try this again for 1.17.

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 18, 2019
dims commented Sep 18, 2019

/test pull-kubernetes-e2e-gce-large-performance

@fejta-bot

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

6 similar comments

dims commented Sep 19, 2019

W0919 10:10:30.497] ERROR: (gcloud.compute.instances.create) Could not fetch resource:
W0919 10:10:30.498]  - Quota 'CPUS' exceeded.  Limit: 5200.0 in region us-east1.

:(

@fejta-bot

/retest

1 similar comment

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 20, 2019
dims commented Sep 23, 2019

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 23, 2019
@fejta-bot

/retest

1 similar comment

@wojtek-t
/hold

Holding for a moment given that we currently have a regression (since Friday). Fortunately, we seem to already know where the problem is - an issue will be opened later today.

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 23, 2019
dims commented Sep 30, 2019

@wojtek-t how are things? could we try again?

wojtek-t commented Oct 1, 2019

We recovered from the regression. Maybe we can try.

/hold cancel

@mborsz @mm4tt - FYI

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 1, 2019
@k8s-ci-robot k8s-ci-robot merged commit 6610260 into kubernetes:master Oct 1, 2019
@k8s-ci-robot k8s-ci-robot added this to the v1.17 milestone Oct 1, 2019
Labels
approved, area/provider/gcp, cncf-cla: yes, kind/cleanup, lgtm, needs-priority, release-note-none, sig/cluster-lifecycle, size/M

10 participants