Revert "[Re-Apply][Distroless] Convert the GCE manifests for master containers." #77904

Merged: 1 commit merged into kubernetes:master on May 15, 2019

@mborsz (Member) commented May 15, 2019

Reverts #76396

Sorry for reverting the change again, but we have evidence that it causes the GCE 5000-node test to fail.

The problems we are seeing with this PR:

  1. Some log messages are written multiple times (e.g. ERRORs 3x, WARNINGs 2x).
  2. Every non-empty kube-logrotate run makes kube-apiserver slow enough to trigger leader election in kube-controller-manager and kube-scheduler.

While 1. is quite easy to understand, the reason for 2. is not known yet.
I'm happy to collaborate with anyone who is interested in rolling this PR forward again, to understand the issue and make sure the third roll-forward is the successful one :)
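The thread does not spell out why point 1 is "easy to understand". One plausible mechanism, offered purely as an assumption and not as the confirmed diagnosis: glog/klog-style loggers write each message to the sink for its own severity and to every lower-severity sink (so the INFO log also contains WARNINGs and ERRORs). If the new manifests made all severity sinks resolve to the same file, an ERROR would land in that file 3 times and a WARNING 2 times, matching the counts above. The Python sketch below models only that hypothesis; `write`, `run_demo` and the sample messages are hypothetical names, and it says nothing about point 2.

```python
from collections import Counter

# Severities ordered from lowest to highest, as in glog/klog.
SEVERITIES = ["INFO", "WARNING", "ERROR"]


def write(sinks, severity, msg):
    """Append msg to the sink of its own severity and every lower one.

    Hypothetical helper mirroring the glog/klog convention that the INFO
    log also contains WARNING and ERROR messages.
    """
    level = SEVERITIES.index(severity)
    for sev in SEVERITIES[: level + 1]:
        sinks[sev].append(f"{severity}: {msg}")


def run_demo(shared_file):
    """Emit one message per severity and count lines in each sink."""
    if shared_file:
        # Assumed misconfiguration: every severity sink is the same file.
        backing = []
        sinks = {sev: backing for sev in SEVERITIES}
    else:
        # One file per severity, the usual glog/klog layout.
        sinks = {sev: [] for sev in SEVERITIES}
    write(sinks, "INFO", "starting")
    write(sinks, "WARNING", "slow request")
    write(sinks, "ERROR", "connection refused")
    return {sev: Counter(lines) for sev, lines in sinks.items()}


shared = run_demo(shared_file=True)["INFO"]  # every key maps to the same list
separate = run_demo(shared_file=False)
print(shared)
print(separate["INFO"])
```

With separate files, each file still holds every message at most once; only the shared-file case reproduces the 3x/2x pattern.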

/assign @wojtek-t
/cc @mm4tt @krzysied

@k8s-ci-robot (Contributor) commented

@mborsz: Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected, please follow our release note process to remove it.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot added the size/M and do-not-merge/release-note-label-needed labels May 15, 2019
@k8s-ci-robot added the cncf-cla: yes, needs-kind, needs-sig, needs-priority, sig/cluster-lifecycle and sig/gcp labels and removed the needs-sig label May 15, 2019
@mm4tt (Contributor) commented May 15, 2019

/lgtm

@k8s-ci-robot added the lgtm label May 15, 2019
@mborsz (Member, Author) commented May 15, 2019

/test pull-kubernetes-typecheck

@wojtek-t (Member) commented

@yuwenma @tallclair @MaciekPytel @anguslees - FYI

/lgtm
/approve

@k8s-ci-robot (Contributor) commented

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mborsz, wojtek-t

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot added the approved label May 15, 2019
@wojtek-t added the release-note-none, kind/bug and priority/important-soon labels and removed the do-not-merge/release-note-label-needed, needs-kind and needs-priority labels May 15, 2019
@k8s-ci-robot k8s-ci-robot merged commit 4d3d153 into kubernetes:master May 15, 2019
@yuwenma (Contributor) commented May 15, 2019

/cc @dims

Hi Maciej, we really hope this change can go into v1.15. Can you provide more context about what you observed and how you determined that the manifest change is the root cause? I'm happy to help triage the issue, but I would need more input.

@k8s-ci-robot k8s-ci-robot requested a review from dims May 15, 2019 19:17
@mm4tt (Contributor) commented May 16, 2019

@yuwenma,

We are quite confident that this change is causing visible regressions in 5K-node cluster tests.
Below are graphs from a test run with #76396 applied:
[Graph: 5K-node test run with #76396]

There is a strict correlation between logrotate runs and huge spikes in apiserver CPU usage, which in turn increase API latency, cause components to restart (due to slow connections to the master), etc.

Now, exactly the same run but with #76396 reverted:
[Graph: the same 5K-node test run with #76396 reverted]

As you can see, there is a huge difference. We don't know the exact root cause yet, but we're 100% sure that #76396 introduced a regression.

Labels: approved, cncf-cla: yes, kind/bug, lgtm, priority/important-soon, release-note-none, sig/cluster-lifecycle, size/M

5 participants