Enable etcd corruption checks for etcd 3.5 and increase to 4h cycle #9477
Signed-off-by: Marvin Beckers <marvin@kubermatic.com>
/cherry-pick release/v2.20
@embik: once the present PR merges, I will cherry-pick it on top of release/v2.20 in a new PR and assign it to you.
/cherry-pick release/v2.19
@embik: once the present PR merges, I will cherry-pick it on top of release/v2.19 in a new PR and assign it to you.
/cherry-pick release/v2.18
@embik: once the present PR merges, I will cherry-pick it on top of release/v2.18 in a new PR and assign it to you.
/lgtm
/approve
LGTM label has been added. Git tree hash: f46250ab1578a98127672ce384743d419548f3aa
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: embik, xmudrii
LGTM label has been added. Git tree hash: 5068b51ce57ae30a548a2391c3c16ff51ddaa484
@embik: #9477 failed to apply on top of branch "release/v2.20".
@embik: #9477 failed to apply on top of branch "release/v2.19".
@embik: #9477 failed to apply on top of branch "release/v2.18".
…ubermatic#9477)
* Enable etcd corruption checks for etcd 3.5 and increase to 4h cycle
  Signed-off-by: Marvin Beckers <marvin@kubermatic.com>
* Use Check instead of Validate
  Signed-off-by: Marvin Beckers <marvin@kubermatic.com>
* Update fixtures
  Signed-off-by: Marvin Beckers <marvin@kubermatic.com>
* Update testdata for etcd command with corruption flags
  Signed-off-by: Marvin Beckers <marvin@kubermatic.com>
…se to 4h cycle (#9480)
* Enable etcd corruption checks for etcd 3.5 and increase to 4h cycle (#9477)
* Enable etcd corruption checks for etcd 3.5 and increase to 4h cycle
  Signed-off-by: Marvin Beckers <marvin@kubermatic.com>
* Use Check instead of Validate
  Signed-off-by: Marvin Beckers <marvin@kubermatic.com>
* Update fixtures
  Signed-off-by: Marvin Beckers <marvin@kubermatic.com>
* Update testdata for etcd command with corruption flags
  Signed-off-by: Marvin Beckers <marvin@kubermatic.com>
* Regenerate fixtures
  Signed-off-by: Marvin Beckers <marvin@kubermatic.com>

…9477) (#9481)
* Enable etcd corruption checks for etcd 3.5 and increase to 4h cycle
  Signed-off-by: Marvin Beckers <marvin@kubermatic.com>
* Use Check instead of Validate
  Signed-off-by: Marvin Beckers <marvin@kubermatic.com>
* Update fixtures
  Signed-off-by: Marvin Beckers <marvin@kubermatic.com>
* Update testdata for etcd command with corruption flags
  Signed-off-by: Marvin Beckers <marvin@kubermatic.com>

…9477) (#9482)
* Enable etcd corruption checks for etcd 3.5 and increase to 4h cycle
  Signed-off-by: Marvin Beckers <marvin@kubermatic.com>
* Use Check instead of Validate
  Signed-off-by: Marvin Beckers <marvin@kubermatic.com>
* Update fixtures
  Signed-off-by: Marvin Beckers <marvin@kubermatic.com>
* Update testdata for etcd command with corruption flags
  Signed-off-by: Marvin Beckers <marvin@kubermatic.com>
What does this PR do / Why do we need it:
Yesterday, the etcd team announced that they no longer recommend running etcd 3.5 in production because of data consistency issues that can occur when etcd is terminated under high load. They expect an upcoming etcd 3.5.3 release to fix the problem, but no ETA has been communicated so far.
As an initial mitigation, this PR enables corruption checks for etcd 3.5 StatefulSets. Corruption checks were originally added in #2460 as a Seed-level feature flag and are supported by both plain `etcd` and `etcd-launcher`. For the affected etcd version, we now always enable them.

Corruption checks do not repair corrupted members, but they prevent them from starting and therefore, hopefully, keep inconsistent data out of the live cluster. When running plain `etcd`, corrupted members need to be removed and replaced manually. With `etcd-launcher`, a member can be forced to rebuild by deleting its matching PV and letting the automatic PV recovery kick in.

In addition, this PR increases the interval between periodic corruption checks from 10 minutes to 4 hours, based on a comment on the etcd data consistency issue. Since the checks add some overhead, running them every 10 minutes seems excessive.
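To make the flag logic above concrete, here is a minimal Go sketch of how the extra etcd arguments might be assembled. It assumes the etcd 3.5 flags `--experimental-initial-corrupt-check` (check data before serving traffic) and `--experimental-corrupt-check-time` (interval between periodic checks). The helpers `isEtcd35` and `corruptionCheckFlags` and the `seedFeatureEnabled` parameter are hypothetical illustrations, not the actual KKP implementation.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// corruptCheckInterval is the periodic check interval from this PR (previously 10m).
// etcd parses this flag value as a Go duration, so "4h" is accepted.
const corruptCheckInterval = "4h"

// isEtcd35 reports whether version (for example "3.5.2" or "v3.5.1")
// belongs to the etcd 3.5 minor release line.
func isEtcd35(version string) bool {
	parts := strings.SplitN(strings.TrimPrefix(version, "v"), ".", 3)
	if len(parts) < 2 {
		return false
	}
	major, errMajor := strconv.Atoi(parts[0])
	minor, errMinor := strconv.Atoi(parts[1])
	return errMajor == nil && errMinor == nil && major == 3 && minor == 5
}

// corruptionCheckFlags returns extra etcd arguments (hypothetical helper).
// Checks are enabled when the Seed-level feature flag is set, and are
// always forced on for etcd 3.5 regardless of that flag.
func corruptionCheckFlags(version string, seedFeatureEnabled bool) []string {
	if !seedFeatureEnabled && !isEtcd35(version) {
		return nil
	}
	return []string{
		// Check the member's data for corruption before it serves any traffic.
		"--experimental-initial-corrupt-check=true",
		// Re-run the corruption check periodically while the member is running.
		"--experimental-corrupt-check-time=" + corruptCheckInterval,
	}
}

func main() {
	fmt.Println(corruptionCheckFlags("3.5.2", false))
	fmt.Println(corruptionCheckFlags("3.4.13", false))
}
```

Running it prints both flags for `3.5.2` and an empty list for `3.4.13`. ORing the feature flag with the version check mirrors the idea of forcing the checks on for the affected version while leaving other versions under Seed-level control.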
Does this PR close any issues?:
Fixes #
Special notes for your reviewer:
Documentation:
Does this PR introduce a user-facing change?:
Signed-off-by: Marvin Beckers <marvin@kubermatic.com>