Proposal: Node fence design #1416
Signed-off-by: Yaniv Bronhaim <ybronhei@redhat.com>
Thanks for your pull request. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). 📝 Please follow instructions at https://github.com/kubernetes/kubernetes/wiki/CLA-FAQ to sign the CLA. It may take a couple minutes for the CLA signature to be fully registered; after that, please reply here with a new comment and we'll verify. Thanks.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
related to #124
/ok-to-test
@kubernetes/sig-node-proposals
/sig node
@bronhaim: GitHub didn't allow me to request PR reviews from the following users: nirs, aglitke. Note that only kubernetes members can review this PR, and authors cannot review their own PRs. In response to this: Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
### Implementation
#### Fence Controller
- The fence controller is a stateless pod instance running in a cluster that supports fencing.
- The controller identifies unresponsive nodes by checking their apiserver objects; once a node becomes "not ready", a fence treatment is triggered by posting a fence CRD object, which initiates the fence flow in the Fence Executor (sketched below).
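For illustration only, here is a minimal sketch of that detection loop in Go with client-go. The in-cluster config, the 30-second poll, and the `createFenceCR` helper (standing in for posting the fence CRD object) are assumptions, not details fixed by the proposal:

```go
package main

import (
	"context"
	"log"
	"time"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		log.Fatal(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	for {
		nodes, err := client.CoreV1().Nodes().List(context.TODO(), metav1.ListOptions{})
		if err != nil {
			log.Printf("listing nodes: %v", err)
		} else {
			for _, node := range nodes.Items {
				for _, cond := range node.Status.Conditions {
					// Any node whose Ready condition is not True is a fencing candidate.
					if cond.Type == corev1.NodeReady && cond.Status != corev1.ConditionTrue {
						createFenceCR(node.Name)
					}
				}
			}
		}
		time.Sleep(30 * time.Second)
	}
}

// createFenceCR is a hypothetical placeholder for posting the fence custom
// resource that the Fence Executor consumes; the proposal does not pin down
// its schema.
func createFenceCR(nodeName string) {
	log.Printf("would create a fence CR for node %s", nodeName)
}
```

A real controller would presumably use informers and leader election rather than polling, which is exactly the concern raised in the review comments below.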
What will happen if the controller's or executor's node becomes unresponsive?
This looks like a leader election process.
The current behavior in the code base uses Jobs to execute the fence script. In this update I changed the implementation section to fit the current flow.
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: bronhaim. Assign the PR to them by writing `/assign @bronhaim` in a comment when ready. The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these OWNERS files:
You can indicate your approval by writing `/approve` in a comment.
- How to trigger fence devices/APIs: general template parameters (e.g. cluster UPS address and credentials) and per-node override values for specific fields (e.g. ports related to the node)
- How to run a fence flow for each node (each node can be attached to several fence devices/APIs, and the fence flow can differ)

The following is stored in `ConfigMap` objects (sketched after the review thread below):
Is everyone on-board with config maps? I'd have thought we'd be creating a lot of noise for that particular set of objects, particularly for the bare metal case. By creating a CRD for this information we isolate it from regular usage and it allows us to optimise the format for this use-case.
I agree. However, a new CRD means a new object in the cluster with its own version and specific services. For key-value fields, the built-in ConfigMap structure seems good enough for our purposes. How would you optimize the format with a CRD?
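For what it's worth, a ConfigMap along the lines described in the diff above could be written with client-go as below. This is a sketch only: the key names (`agent`, `address`, `secret`, `port`), the `fence-config-<node>` naming scheme, and the `kube-system` namespace are assumptions, not part of the proposal:

```go
package fence

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// writeFenceConfig stores fence parameters for one node, following the
// template-plus-per-node-overrides split described in the proposal.
func writeFenceConfig(ctx context.Context, client kubernetes.Interface, nodeName string) error {
	cm := &corev1.ConfigMap{
		ObjectMeta: metav1.ObjectMeta{
			Name:      "fence-config-" + nodeName, // hypothetical naming scheme
			Namespace: "kube-system",              // illustrative choice
		},
		Data: map[string]string{
			// General template parameters shared by the cluster,
			// e.g. the UPS address and a reference to its credentials.
			"agent":   "fence_apc",
			"address": "10.0.0.5",
			"secret":  "fence-ups-credentials",
			// Per-node override for a node-specific field,
			// e.g. the UPS outlet this node is wired to.
			"port": "7",
		},
	}
	_, err := client.CoreV1().ConfigMaps(cm.Namespace).Create(ctx, cm, metav1.CreateOptions{})
	return err
}
```

The same key-value payload would fit a CRD just as well; the difference discussed above is mainly about isolation and schema validation, not expressiveness.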
Some ways to prevent fencing storms:
- Skip fencing if a certain percentage of hosts in the cluster is non-responsive (will help with issue #2 above; see the sketch after the discussion below)
- Skip fencing if it is detected that the host cannot connect to storage.
The act of checking and/or making sure it's not currently connected to storage is itself a fencing operation, so the above doesn't seem completely accurate.
A fencing storm is a situation in which fencing for several nodes in the cluster is triggered at the same time, due to an environmental issue.
Examples:
1. Switch failure - a switch used to connect a few nodes to the environment fails, and there is no redundancy in the network environment. In such a case, the nodes will be reported as unresponsive while they are still alive and kicking, perhaps providing service through other networks they are attached to. If we fence those nodes, a few services will be offline until restarted, while it might not have been necessary.
Sort of; there are downsides either way (deciding whether to fence or not). Have a read of http://blog.clusterlabs.org/blog/2018/two-node-problems
The problem is that you have no way to know if it is a switch failure.
So either:
- you assume it's not a network failure, initiate fencing, and potentially kill a healthy Pod (does it really count as healthy if no-one can reach it, though?) so it can be recovered, or
- you assume it is a network failure, don't initiate fencing, and risk that there are multiple copies of some Pods.
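To make the percentage-based storm guard from the list above concrete, here is a hedged sketch; the 50% cutoff and the `fencingAllowed` name are illustrative assumptions, not part of the proposal:

```go
package fence

import corev1 "k8s.io/api/core/v1"

// stormThreshold is an illustrative cutoff: if half or more of the cluster
// is unready, the failures are more likely environmental (e.g. a switch
// outage) than many independent node failures.
const stormThreshold = 0.5

// fencingAllowed returns false when the share of NotReady nodes suggests a
// fencing storm rather than isolated node failures.
func fencingAllowed(nodes []corev1.Node) bool {
	if len(nodes) == 0 {
		return false
	}
	unready := 0
	for _, n := range nodes {
		for _, c := range n.Status.Conditions {
			if c.Type == corev1.NodeReady && c.Status != corev1.ConditionTrue {
				unready++
				break
			}
		}
	}
	return float64(unready)/float64(len(nodes)) < stormThreshold
}
```

Note that such a guard deliberately errs on the side of not fencing: it trades slower recovery of genuinely dead nodes for a lower risk of fencing healthy ones during a network outage, which is the same trade-off described in the comment above.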
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with `/close`. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/remove-lifecycle stale
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with `/close`. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/remove-lifecycle stale
What is the status on this one?
@thockin I'm in the process of trying to resurrect it. After additional consultation we're moving to a Machine API-based approach.
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with `/close`. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
Stale issues rot after 30d of inactivity. If this issue is safe to close now please do so with `/close`. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
@beekhof I'm just wondering about your last comment
@rob-nicholson I resurrected it in a different form as a KEP - #2763. Rather than preaching a specific implementation, it tried to describe the interaction points that solutions should respect. In the end it didn't get a lot of attention and was later closed when KEPs moved to a new repo. We also have an implementation based on this work in progress at https://github.com/openshift/machine-api-operator/tree/master/pkg/controller/machinehealthcheck which we are actively working on.
Rotten issues close after 30d of inactivity. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close
@fejta-bot: Closed this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
I know that some people are working on this, thus reopening it
/reopen
@fabiand: You can't reopen an issue/PR unless you authored it or you are a collaborator. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Propose a fencing mechanism for Kubernetes clusters that allows managing cluster nodes with advanced fence actions.