
Version bump to etcd v3.2.13, grpc v1.7.5 #57480

Merged
5 commits merged into kubernetes:master on Jan 7, 2018

Conversation

jpbetz
Contributor

@jpbetz jpbetz commented Dec 20, 2017

Reapply #57160 but with etcd 3.2.13, which includes etcd-io/etcd#9047 to fix #51099.

We need to scalability-test this PR before merging, since the previous attempt to bump to grpc v1.7+ resulted in a scalability test failure after that PR was merged to master, and we don't want to repeat that. No, no we don't.

Thanks @gyuho for fixing the etcd grpc issue and releasing etcd-3.2.13 on short notice.

Release note:

Upgrade to etcd client 3.2.13 and grpc 1.7.5 to improve HA etcd cluster stability.
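For context, here is a minimal sketch of the kind of large prefix read that previously failed once gRPC's 4 MiB default receive limit was hit. This is not code from this PR; it assumes the coreos/etcd clientv3 package, and the endpoint and key prefix are hypothetical. With the 3.2.13 client the default receive limit is raised (per the client fix in etcd-io/etcd#9047), so reads like this should no longer fail with ResourceExhausted:

```go
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	"github.com/coreos/etcd/clientv3"
)

func main() {
	// Hypothetical local endpoint; a real apiserver's etcd endpoints differ.
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"http://127.0.0.1:2379"},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()

	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()

	// A prefix read over many large objects; with the old client defaults this
	// could exceed gRPC's 4 MiB receive limit and fail with
	// "ResourceExhausted: received message larger than max".
	// "/registry/events/" is an illustrative prefix, not taken from this PR.
	resp, err := cli.Get(ctx, "/registry/events/", clientv3.WithPrefix())
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("got %d keys\n", len(resp.Kvs))
}
```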

@jpbetz jpbetz added sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. sig/scalability Categorizes an issue or PR as relevant to SIG Scalability. labels Dec 20, 2017
@jpbetz jpbetz added this to the v1.10 milestone Dec 20, 2017
@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Dec 20, 2017
@jpbetz
Contributor Author

jpbetz commented Dec 20, 2017

@shyamjvs Is there any way to kick off a scalability test on this PR? We'd like to ensure it's passing before merging, if possible, since the last grpc bump caused scalability test instability on master and we'd prefer to ensure we've fixed that issue before merging.

@gyuho
Member

gyuho commented Dec 20, 2017

@jpbetz etcd part LGTM.

I'd also like to know how kubernetes encountered #51099.
etcd didn't have any tests around this while upgrading gRPC.

If etcd embed.Etcd is used, embed.NewConfig().MaxRequestBytes should be configured based on the test workloads; otherwise the server-side request limit defaults to 1.5 MiB. On the client side, v3.2.12 defaults the response limit to math.MaxInt32, so just upgrading to v3.2.12 should fix the error "rpc error: code = ResourceExhausted desc = grpc: received message larger than max (26710706 vs. 4194304)".
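As a minimal sketch of that server-side knob (assuming the coreos/etcd v3.2 embed package; the data directory and the 32 MiB value are illustrative only, not values from this PR):

```go
package main

import (
	"log"

	"github.com/coreos/etcd/embed"
)

func main() {
	cfg := embed.NewConfig()
	cfg.Dir = "test.etcd" // hypothetical data dir for a test fixture

	// The server-side request limit defaults to 1.5 MiB; raise it if the test
	// workload sends larger requests. The value below is purely illustrative.
	cfg.MaxRequestBytes = 32 * 1024 * 1024

	e, err := embed.StartEtcd(cfg)
	if err != nil {
		log.Fatal(err)
	}
	defer e.Close()

	<-e.Server.ReadyNotify()
	log.Println("embedded etcd ready")
}
```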

Thanks.

@jpbetz
Contributor Author

jpbetz commented Dec 21, 2017

/retest

@porridge
Member

/test pull-kubernetes-kubemark-e2e-gce-big

@jpbetz
Contributor Author

jpbetz commented Dec 21, 2017

/test pull-kubernetes-bazel-test

@wojtek-t
Member

Both big kubemark tests passed, but the suite timed out. My feeling is that it's the long build and we can ignore it. @porridge?

@porridge
Member

Yes, the timeout of pull-kubernetes-kubemark-e2e-gce-big is not a problem here. This looks promising, but I'm running an additional 5k-node test manually to confirm.

@porridge
Member

On second thought, it's actually somewhat suspicious that 80 minutes were not enough for a test which passes in ~50 on master. I'll take a closer look.

@wojtek-t
Member

> On second thought, it's actually somewhat suspicious that 80 minutes were not enough for a test which passes in ~50 on master. I'll take a closer look.

On master we are not building; with presubmits we do a build. It used to take 30m+ in the past, and maybe that's still the case.

@porridge
Member

Right, I just compared, and the additional ~30 minutes are from quick-release. Let me fix the timeouts.

@porridge
Member

/test pull-kubernetes-kubemark-e2e-gce-big
Just checking whether the bumped timeout works; don't worry if it fails again.

@porridge
Member

FWIW, those pull-kubernetes-bazel-test failures do not look like flakes.

@k8s-github-robot k8s-github-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 7, 2018
@jpbetz
Contributor Author

jpbetz commented Jan 7, 2018

Clean rebase. Adding back lgtm.

@jpbetz jpbetz added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 7, 2018
@k8s-github-robot

/test all [submit-queue is verifying that this PR is safe to merge]

@k8s-github-robot

Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here.

@k8s-github-robot k8s-github-robot merged commit 531b97b into kubernetes:master Jan 7, 2018
@porridge
Member

porridge commented Jan 8, 2018

@jpbetz
Contributor Author

jpbetz commented Jan 8, 2018

Thanks for following up, @porridge.

k8s-github-robot pushed a commit that referenced this pull request Feb 3, 2018
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions [here](https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md).

Update etcd server version to 3.2.14

This upgrades the default etcd version used by kubernetes to 3.2.14

We previously [bumped the etcd client to 3.2.14](#57480).

Fixes #56438

```release-note
Upgrade default etcd server version to 3.2.14
```

cc @gyuho
k8s-publishing-bot added a commit to kubernetes/sample-apiserver that referenced this pull request Feb 3, 2018
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions [here](https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md).

Update etcd server version to 3.2.14

This upgrades the default etcd version used by kubernetes to 3.2.14

We previously [bumped the etcd client to 3.2.14](kubernetes/kubernetes#57480).

Fixes kubernetes/kubernetes#56438

```release-note
Upgrade default etcd server version to 3.2.14
```

cc @gyuho

Kubernetes-commit: 0f6354e81b16030f7c2dd9c65a29cd1f5b5e43b2
k8s-publishing-bot added a commit to kubernetes/apiextensions-apiserver that referenced this pull request Feb 3, 2018
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions [here](https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md).

Update etcd server version to 3.2.14

This upgrades the default etcd version used by kubernetes to 3.2.14

We previously [bumped the etcd client to 3.2.14](kubernetes/kubernetes#57480).

Fixes kubernetes/kubernetes#56438

```release-note
Upgrade default etcd server version to 3.2.14
```

cc @gyuho

Kubernetes-commit: 0f6354e81b16030f7c2dd9c65a29cd1f5b5e43b2
k8s-github-robot pushed a commit that referenced this pull request Feb 7, 2018
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions [here](https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md).

Update kubeadm supported etcd version to 3.2.14 in 1.10

**What this PR does / why we need it**:
Kubernetes will upgrade to etcd server 3.2.14 in 1.10 cycle (#58645), update DefaultEtcdVersion in kubeadm to this version.

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
relevant PR: #57480 #58645
fixes: kubernetes/kubeadm#621

**Special notes for your reviewer**:
/cc @kubernetes/sig-cluster-lifecycle-pr-reviews

**Release note**:

```release-note
NONE
```
kubeadm don't need to advertise this in release notes.
@mfojtik
Contributor

mfojtik commented Feb 20, 2018

@jpbetz have you figured out what was causing the "context cancelled" errors?

When trying to pick up the new etcd (3.2.13 or 3.2.16), I'm seeing this panic in k8s tests: https://gist.github.com/mfojtik/a5109e6c752f2569b99b5dc90e5d1801

EDIT: The panic is a timeout, as in openshift/origin we set the default unit test timeout to 120s. However, this test (pkg/master/master_test.go:TestLegacyRestStorageStrategies) takes over 224s after bumping etcd to 3.2.16 (same for 3.2.13)...

When I saw these context canceled errors in the logs last night, I also suspected they might be a red herring. I'll dig in today.

@shyamjvs
Member

shyamjvs commented Mar 6, 2018

FYI: today, while looking at apiserver memory usage in our 5k-node scalability tests, I noticed a huge drop across runs 93 and 95. From the diff, this change seems to have most likely caused the improvement :)

Do we know why we see such an improvement? What changed?

(attached graph: etcd_update_change)

@jpbetz
Contributor Author

jpbetz commented Mar 6, 2018

@shyamjvs I suspect it is a combination of things. The grpc v1.3.0 -> v1.7.5 upgrade includes various performance improvements (see https://github.com/grpc/grpc-go/releases/tag/v1.5.0 and https://github.com/grpc/grpc-go/releases/tag/v1.4.0), and there are quite a few etcd 3.1 -> 3.2 client improvements as well (@gyuho, are there any you know of that would improve memory utilization this much?).

@cheftako It would be great to be able to run the memory analysis tool you've been trying out on scalability runs like this. Ideally, a tool like that would tell us exactly what changed from baseline.

@gyuho
Member

gyuho commented Mar 6, 2018

We did some performance work for 3.3 (https://github.com/coreos/etcd/blob/master/CHANGELOG-3.3.md#improved-1), but I doubt the client upgrade alone would have much effect. Yeah, I would be interested in what has caused this :)

@redbaron
Contributor

Is this PR backportable?

openshift-publish-robot pushed a commit to openshift/kubernetes-sample-apiserver that referenced this pull request Jan 14, 2019
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions [here](https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md).

Update etcd server version to 3.2.14

This upgrades the default etcd version used by kubernetes to 3.2.14

We previously [bumped the etcd client to 3.2.14](kubernetes/kubernetes#57480).

Fixes kubernetes/kubernetes#56438

```release-note
Upgrade default etcd server version to 3.2.14
```

cc @gyuho

Kubernetes-commit: 0f6354e81b16030f7c2dd9c65a29cd1f5b5e43b2
Labels: approved, cncf-cla: yes, lgtm, release-note, sig/api-machinery, sig/cluster-lifecycle, sig/scalability, size/XL

Successfully merging this pull request may close these issues: gRPC update causing failure of API calls with large responses.