Network throughput issues with latest version of container-vm #32596

Closed
vishh opened this issue Sep 13, 2016 · 8 comments · Fixed by #32738
Labels: kind/bug, priority/important-soon, sig/network

Comments

@vishh
Contributor

vishh commented Sep 13, 2016

The upgraded version of container-vm (gci) has significant performance regressions with localhost networking when compared to its previous versions.

@Amey-D is working on GCI kernel fixes to resolve this issue.

This issue tracks the GCI fix from the perspective of the 1.4 release.

The fix will require upgrading the GCI version pinned to the release 1.4 branch.

ETA:
Status: Yellow

@vishh vishh added the priority/important-soon, sig/network, and team/cluster labels Sep 13, 2016
@vishh vishh added this to the v1.4 milestone Sep 13, 2016
@vishh
Contributor Author

vishh commented Sep 13, 2016

@Amey-D can you update the ETA for fixing this issue?

@vishh
Contributor Author

vishh commented Sep 13, 2016

cc @pwittrock for release tracking purposes.

@vishh vishh added the kind/bug label Sep 13, 2016
@bprashanth
Contributor

Is this a release blocker (and if so, why?), or are we aiming for a minor release?

@vishh
Contributor Author

vishh commented Sep 13, 2016

@bprashanth It is a blocker because it causes about a 70% drop in throughput, which is a significant regression from the perspective of a k8s-on-GCP user.

@Amey-D
Contributor

Amey-D commented Sep 13, 2016

The kernel fix should get merged into the GCI kernel today, and hopefully we will be able to release a GCI version that includes the fix tomorrow.

@bprashanth
Contributor

Hmm, I somehow fail to see why a Kubernetes release needs to be blocked on a bug fix going into one OS that the release is being deployed on top of, but that's just my opinion.

@vishh
Contributor Author

vishh commented Sep 13, 2016

I get your point of view. But k8s on GCE cannot have serious user-facing issues. The distro is just an implementation detail, but an important one. If we include first-class support for additional distros like Ubuntu, Debian, etc. in the future, we will have to make the very same priority calls.
At the end of the day, the release needs to work for most users in most environments.

@dchen1107
Member

I think GCI-related issues shouldn't block the OSS k8s 1.4 release. I also agree that k8s on GCE cannot have serious user-facing issues, but that can be resolved by reverting the change that made GCI the default for GCE nodes. I am not suggesting that we do that now, but I do suggest that we put together a summary of all the issues with GCI on the node (not only this particular one), deeply understand the risk, and then make the final decision.

On the other hand, we are still planning to make GCI the node default in the 1.4 to 1.5 time frame. cc/ @pwittrock

Amey-D added a commit to Amey-D/kubernetes that referenced this issue Sep 14, 2016
Brief changelog compared to gci-dev-54-8743-3-0:
- Fixed performance regression in veth device driver
- Docker and related binaries are statically linked
- Fixed the issue of systemd being oom-killable
- Updated built-in kubelet version to 1.3.7

Fixes kubernetes#32596
@pwittrock pwittrock modified the milestones: v1.4-nonblocking, v1.4 Sep 19, 2016
Amey-D added a commit to Amey-D/kubernetes that referenced this issue Sep 20, 2016
Brief changelog compared to gci-dev-54-8743-3-0:
- Fixed performance regression in veth device driver
- Docker and related binaries are statically linked
- Fixed the issue of systemd being oom-killable
- Updated built-in kubelet version to 1.3.7
- add ethtool and ebtables binaries expected by kubelet

Fixes kubernetes#32596
k8s-github-robot pushed a commit that referenced this issue Sep 20, 2016
Automatic merge from submit-queue

Bump up GCI version.

```release-note
   Upgrading Container-VM base image for k8s on GCE. Brief changelog as follows:
    - Fixed performance regression in veth device driver
    - Docker and related binaries are statically linked
    - Fixed the issue of systemd being oom-killable
```

Fixes #32596

This needs a cherrypick into v1.4 release branch because it is fixing v1.4 release blocking issues. This patch is easy and safe to rollback in case of emergencies.

@vishh can you please review?

Fixes #32596 and many other issues.
cc/ @kubernetes/goog-image  FYI
Amey-D added a commit to Amey-D/kubernetes that referenced this issue Sep 21, 2016
shyamjvs pushed a commit to shyamjvs/kubernetes that referenced this issue Dec 1, 2016