New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update default etcd server to 3.2 for kubernetes 1.11 #61198

Merged
merged 2 commits into from Apr 12, 2018

Conversation

@jpbetz
Contributor

jpbetz commented Mar 14, 2018

Repply #59836 but with latest etcd 3.2 patch version (3.2.18 which includes mvcc fix and leader election timeout fix) and default --snapshot-count to 10k to resolve performance regression in previous etcd 3.2 server upgrade attempt (#60589 (comment)).

See #60589 (comment) for details on the root cause of the performance regression and scalability test results of setting --snapshot-count to 10k.

Upgrade the default etcd server version to 3.2.18

@gyuho @shyamjvs @jdumars @timothysc

@jpbetz

This comment has been minimized.

Contributor

jpbetz commented Mar 14, 2018

@jdumars, @shyamjvs Is there time for this to make this in to k8s 1.10?

@jpbetz jpbetz changed the title from Etcd 3.2 upgrade reattempt to Etcd server 3.2 upgrade reattempt Mar 14, 2018

@timothysc

This comment has been minimized.

Member

timothysc commented Mar 14, 2018

Has this been scale tested? Also what are the details on the fix and it's root cause?

@jbeda jbeda removed their request for review Mar 14, 2018

@@ -39,6 +39,9 @@
},
{ "name": "ETCD_CREDS",
"value": "{{ etcd_creds }}"
},
{ "name": "ETCD_SNAPSHOT_COUNT",
"value": "10000"

This comment has been minimized.

@timothysc

timothysc Mar 14, 2018

Member

You didn't set this default on the kubeadm change.

This comment has been minimized.

@jpbetz

jpbetz Mar 14, 2018

Contributor

Added to cmd/kubeadm/app/phases/etcd/local.go

@shyamjvs

This comment has been minimized.

Member

shyamjvs commented Mar 14, 2018

@jpbetz I would say it would be safer to get this in for next release, given:

  • it's a relatively high risk change to be made this close to release (we don't yet know if there's some other issue than snapshot-count)
  • we may lack bandwidth to scale test this change with the debugging happening in #60589

How critical is this (given that some key fixes have already been backported to 3.1)?

@jpbetz jpbetz added this to the v1.11 milestone Mar 14, 2018

@jpbetz

This comment has been minimized.

Contributor

jpbetz commented Mar 14, 2018

@timothysc Yes, see #60589 (comment) for details on the root cause of the performance regression and scalability test results of setting --snapshot-count to 10k.

@jpbetz

This comment has been minimized.

Contributor

jpbetz commented Mar 14, 2018

@shyamjvs Setting milestone to v1.11

@jpbetz jpbetz changed the title from Etcd server 3.2 upgrade reattempt to Update default etcd server to 3.2 for kubernetes 1.11 Mar 14, 2018

@jpbetz

This comment has been minimized.

Contributor

jpbetz commented Mar 14, 2018

Published k8s.gcr.io images for etcd and etcd-empty-dir-cleanup

@jpbetz

This comment has been minimized.

Contributor

jpbetz commented Apr 11, 2018

Rebased and updated etcd server patch version to 3.2.18.

@timothysc

/lgtm
/approve

@k8s-ci-robot

This comment has been minimized.

Contributor

k8s-ci-robot commented Apr 12, 2018

New changes are detected. LGTM label has been removed.

@k8s-ci-robot k8s-ci-robot added size/M and removed lgtm size/L labels Apr 12, 2018

@timothysc timothysc added the approved label Apr 12, 2018

@k8s-ci-robot

This comment has been minimized.

Contributor

k8s-ci-robot commented Apr 12, 2018

[APPROVALNOTIFIER] This PR is APPROVED

Approval requirements bypassed by manually added approval.

This pull-request has been approved by: jpbetz, timothysc

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-merge-robot

This comment has been minimized.

Contributor

k8s-merge-robot commented Apr 12, 2018

/test all [submit-queue is verifying that this PR is safe to merge]

@k8s-merge-robot

This comment has been minimized.

Contributor

k8s-merge-robot commented Apr 12, 2018

Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here.

@k8s-merge-robot k8s-merge-robot merged commit 9816b43 into kubernetes:master Apr 12, 2018

14 of 15 checks passed

Submit Queue Required Github CI test is not green: pull-kubernetes-verify
Details
cla/linuxfoundation jpbetz authorized
Details
pull-kubernetes-bazel-build Job succeeded.
Details
pull-kubernetes-bazel-test Job succeeded.
Details
pull-kubernetes-cross Job succeeded.
Details
pull-kubernetes-e2e-gce Job succeeded.
Details
pull-kubernetes-e2e-gce-device-plugin-gpu Job succeeded.
Details
pull-kubernetes-e2e-gke Job succeeded.
Details
pull-kubernetes-e2e-kops-aws Job succeeded.
Details
pull-kubernetes-integration Job succeeded.
Details
pull-kubernetes-kubemark-e2e-gce Job succeeded.
Details
pull-kubernetes-local-e2e Skipped
pull-kubernetes-node-e2e Job succeeded.
Details
pull-kubernetes-typecheck Job succeeded.
Details
pull-kubernetes-verify Job succeeded.
Details
@wojtek-t

This comment has been minimized.

Member

wojtek-t commented Apr 23, 2018

There is a hypothesis that his hurts scalability (as I mentioned in #62808 couple days ago).
For now, we will temporarily revert our scalability jobs to use 3.1.12 etcd to verify that.
If that's the case, we will need to think what to do about this PR.

@jpbetz @timothysc @lavalamp - FYI

@jpbetz

This comment has been minimized.

Contributor

jpbetz commented Apr 23, 2018

Thanks for the heads up @wojtek-t

@liggitt

This comment has been minimized.

Member

liggitt commented Apr 23, 2018

@gyuho fyi

@gyuho

This comment has been minimized.

Member

gyuho commented Apr 23, 2018

ack

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment