
Version bump to etcd v3.2.13, grpc v1.7.5 #57480

Merged
5 commits merged into kubernetes:master on Jan 7, 2018

Conversation

jpbetz
Contributor

@jpbetz jpbetz commented Dec 20, 2017

Reapply #57160 but with etcd 3.2.13, which includes etcd-io/etcd#9047 to fix #51099.

We need to scalability-test this PR before merging, since the previous attempt to bump to grpc v1.7+ resulted in a scalability test failure after that PR was merged to master, and we don't want to repeat that. No, no we don't.

Thanks @gyuho for fixing the etcd grpc issue and releasing etcd-3.2.13 on short notice.

Release note:

Upgrade to etcd client 3.2.13 and grpc 1.7.5 to improve HA etcd cluster stability.
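For context, here is a minimal sketch of the kind of large prefix read that previously failed once gRPC's 4 MiB default receive limit was hit. This is not code from this PR; it assumes the coreos/etcd clientv3 package, and the endpoint and key prefix are hypothetical. With the 3.2.13 client the default receive limit is raised (per the client fix in etcd-io/etcd#9047), so reads like this should no longer fail with ResourceExhausted:

```go
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	"github.com/coreos/etcd/clientv3"
)

func main() {
	// Hypothetical local endpoint; a real apiserver's etcd endpoints differ.
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"http://127.0.0.1:2379"},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()

	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()

	// A prefix read over many large objects; with the old client defaults this
	// could exceed gRPC's 4 MiB receive limit and fail with
	// "ResourceExhausted: received message larger than max".
	// "/registry/events/" is an illustrative prefix, not taken from this PR.
	resp, err := cli.Get(ctx, "/registry/events/", clientv3.WithPrefix())
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("got %d keys\n", len(resp.Kvs))
}
```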

@jpbetz jpbetz added sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. sig/scalability Categorizes an issue or PR as relevant to SIG Scalability. labels Dec 20, 2017
@jpbetz jpbetz added this to the v1.10 milestone Dec 20, 2017
@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Dec 20, 2017
@jpbetz
Contributor Author

jpbetz commented Dec 20, 2017

@shyamjvs Is there any way to kick off a scalability test on this PR? We'd like to ensure it's passing before merging, if possible, since the last grpc bump caused scalability test instability on master and we'd prefer to ensure we've fixed that issue before merging.

@gyuho
Member

gyuho commented Dec 20, 2017

@jpbetz etcd part LGTM.

I'd also like to know how kubernetes encountered #51099.
etcd didn't have any tests around this while upgrading gRPC.

If etcd embed.Etcd is used, embed.NewConfig().MaxRequestBytes should be configured based on the test workloads; otherwise the server-side request limit defaults to 1.5 MiB. On the client side, v3.2.12 defaults the response limit to math.MaxInt32, so just upgrading to v3.2.12 should fix the error "rpc error: code = ResourceExhausted desc = grpc: received message larger than max (26710706 vs. 4194304)".
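As a minimal sketch of that server-side knob (assuming the coreos/etcd v3.2 embed package; the data directory and the 32 MiB value are illustrative only, not values from this PR):

```go
package main

import (
	"log"

	"github.com/coreos/etcd/embed"
)

func main() {
	cfg := embed.NewConfig()
	cfg.Dir = "test.etcd" // hypothetical data dir for a test fixture

	// The server-side request limit defaults to 1.5 MiB; raise it if the test
	// workload sends larger requests. The value below is purely illustrative.
	cfg.MaxRequestBytes = 32 * 1024 * 1024

	e, err := embed.StartEtcd(cfg)
	if err != nil {
		log.Fatal(err)
	}
	defer e.Close()

	<-e.Server.ReadyNotify()
	log.Println("embedded etcd ready")
}
```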

Thanks.

@jpbetz
Contributor Author

jpbetz commented Dec 21, 2017

/retest

@porridge
Member

/test pull-kubernetes-kubemark-e2e-gce-big

@jpbetz
Contributor Author

jpbetz commented Dec 21, 2017

/test pull-kubernetes-bazel-test

@wojtek-t
Member

Both big kubemark tests passed, but the suite timed out. My feeling is that it's the long build and we can ignore it. @porridge?

@porridge
Member

Yes, the timeout of pull-kubernetes-kubemark-e2e-gce-big is not a problem here. This looks promising, but I'm running an additional 5k-node test manually to confirm.

@porridge
Member

On second thought, it's actually somewhat suspicious that 80 minutes were not enough for a test which passes in ~50 on master. I'll take a closer look.

@wojtek-t
Member

> On second thought, it's actually somewhat suspicious that 80 minutes were not enough for a test which passes in ~50 on master. I'll take a closer look.

On master we are not building; with presubmits we do a build. It used to take 30m+ in the past, and maybe that's still the case.

@porridge
Member

Right, I just compared, and the additional ~30 minutes are from quick-release. Let me fix the timeouts.

@porridge
Member

/test pull-kubernetes-kubemark-e2e-gce-big
Just checking whether the bumped timeout works; don't worry if it fails again.

@porridge
Member

FWIW, those pull-kubernetes-bazel-test failures do not look like flakes.

@k8s-github-robot k8s-github-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 7, 2018
@jpbetz
Contributor Author

jpbetz commented Jan 7, 2018

Clean rebase. Adding back lgtm.

@jpbetz jpbetz added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 7, 2018
@k8s-github-robot

/test all [submit-queue is verifying that this PR is safe to merge]

@k8s-github-robot

Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here.

@k8s-github-robot k8s-github-robot merged commit 531b97b into kubernetes:master Jan 7, 2018
@porridge
Member

porridge commented Jan 8, 2018

@jpbetz
Contributor Author

jpbetz commented Jan 8, 2018

Thanks for following up, @porridge.

k8s-github-robot pushed a commit that referenced this pull request Feb 3, 2018
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions [here](https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md).

Update etcd server version to 3.2.14

This upgrades the default etcd version used by kubernetes to 3.2.14

We previously [bumped the etcd client to 3.2.14](#57480).

Fixes #56438

```release-note
Upgrade default etcd server version to 3.2.14
```

cc @gyuho
k8s-publishing-bot added a commit to kubernetes/sample-apiserver that referenced this pull request Feb 3, 2018
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions [here](https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md).

Update etcd server version to 3.2.14

This upgrades the default etcd version used by kubernetes to 3.2.14

We previously [bumped the etcd client to 3.2.14](kubernetes/kubernetes#57480).

Fixes kubernetes/kubernetes#56438

```release-note
Upgrade default etcd server version to 3.2.14
```

cc @gyuho

Kubernetes-commit: 0f6354e81b16030f7c2dd9c65a29cd1f5b5e43b2
k8s-publishing-bot added a commit to kubernetes/apiextensions-apiserver that referenced this pull request Feb 3, 2018
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions [here](https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md).

Update etcd server version to 3.2.14

This upgrades the default etcd version used by kubernetes to 3.2.14

We previously [bumped the etcd client to 3.2.14](kubernetes/kubernetes#57480).

Fixes kubernetes/kubernetes#56438

```release-note
Upgrade default etcd server version to 3.2.14
```

cc @gyuho

Kubernetes-commit: 0f6354e81b16030f7c2dd9c65a29cd1f5b5e43b2
k8s-github-robot pushed a commit that referenced this pull request Feb 7, 2018
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions [here](https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md).

Update kubeadm supported etcd version to 3.2.14 in 1.10

**What this PR does / why we need it**:
Kubernetes will upgrade to etcd server 3.2.14 in 1.10 cycle (#58645), update DefaultEtcdVersion in kubeadm to this version.

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
relevant PR: #57480 #58645
fixes: kubernetes/kubeadm#621

**Special notes for your reviewer**:
/cc @kubernetes/sig-cluster-lifecycle-pr-reviews

**Release note**:

```release-note
NONE
```
kubeadm don't need to advertise this in release notes.
@mfojtik
Contributor

mfojtik commented Feb 20, 2018

@jpbetz have you figured out what was causing the "context cancelled" errors?

When trying to pick up the new etcd (3.2.13 or 3.2.16), I'm seeing this panic in k8s tests: https://gist.github.com/mfojtik/a5109e6c752f2569b99b5dc90e5d1801

EDIT: The panic is a timeout, as in openshift/origin we set the default unit test timeout to 120s. However, this test (pkg/master/master_test.go:TestLegacyRestStorageStrategies) takes over 224s after bumping etcd to 3.2.16 (same for 3.2.13)...

When I saw these context canceled errors in the logs last night, I also suspected they might be a red herring. I'll dig in today.

@shyamjvs
Member

shyamjvs commented Mar 6, 2018

FYI: today, while looking at apiserver memory usage in our 5k-node scalability tests, I noticed a huge drop across runs 93 and 95. From the diff, this change seems to have most likely caused the improvement :)

Do we know why we see such an improvement? What changed?

(attached graph: etcd_update_change)

@jpbetz
Contributor Author

jpbetz commented Mar 6, 2018

@shyamjvs I suspect it is a combination of things. The grpc v1.3.0 -> v1.7.5 upgrade includes various performance improvements (see https://github.com/grpc/grpc-go/releases/tag/v1.5.0 and https://github.com/grpc/grpc-go/releases/tag/v1.4.0), and there are quite a few etcd 3.1 -> 3.2 client improvements as well (@gyuho, are there any you know of that would improve memory utilization this much?).

@cheftako It would be great to be able to run the memory analysis tool you've been trying out on scalability runs like this. Ideally, a tool like that would tell us exactly what changed from baseline.

@gyuho
Member

gyuho commented Mar 6, 2018

We did some performance work for 3.3 (https://github.com/coreos/etcd/blob/master/CHANGELOG-3.3.md#improved-1), but I doubt the client upgrade alone would have much effect. Yeah, I would be interested in what has caused this :)

@redbaron
Contributor

Is this PR backportable?

openshift-publish-robot pushed a commit to openshift/kubernetes-sample-apiserver that referenced this pull request Jan 14, 2019
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions [here](https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md).

Update etcd server version to 3.2.14

This upgrades the default etcd version used by kubernetes to 3.2.14

We previously [bumped the etcd client to 3.2.14](kubernetes/kubernetes#57480).

Fixes kubernetes/kubernetes#56438

```release-note
Upgrade default etcd server version to 3.2.14
```

cc @gyuho

Kubernetes-commit: 0f6354e81b16030f7c2dd9c65a29cd1f5b5e43b2
Labels: approved, cncf-cla: yes, lgtm, release-note, sig/api-machinery, sig/cluster-lifecycle, sig/scalability, size/XL

Successfully merging this pull request may close these issues: gRPC update causing failure of API calls with large responses.