[feasibility-research] Handle machine failure #900

Closed
gaocegege opened this issue Dec 20, 2018 · 4 comments

Comments

@gaocegege
Member

According to the docs about restart policy here: https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#restart-policy, once bound to a node, a Pod will never be rebound to another node.

Thus, when a node goes down, we should try to handle the failure.

There are some options to achieve it:

I will try to investigate the cost of this feature.

@jlewi
Contributor

jlewi commented Feb 4, 2019

@gaocegege Doesn't the CR already handle this? Won't the reconcile logic detect that the pod isn't running and create a new one?

@johnugeorge @richardsliu
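
For illustration only, the reconcile behavior asked about here could look roughly like the sketch below. It is not the operator's actual code: the `Pod` record, the `node_ready` flag, and `pods_to_recreate` are hypothetical names, and the check relies only on what the controller can observe.

```python
from dataclasses import dataclass


@dataclass
class Pod:
    name: str
    phase: str        # e.g. "Pending", "Running", "Failed", "Unknown"
    node_ready: bool  # hypothetical flag: is the pod's node reporting Ready?


def pods_to_recreate(desired_names, observed_pods):
    """Return the replica names a reconcile pass would (re)create.

    A replica counts as healthy only if a pod with that name is observed
    as Pending or Running on a node that is reporting Ready.
    """
    healthy = {
        p.name
        for p in observed_pods
        if p.phase in ("Pending", "Running") and p.node_ready
    }
    return [name for name in desired_names if name not in healthy]


observed = [
    Pod("worker-0", "Running", True),
    Pod("worker-1", "Running", False),  # node stopped reporting Ready
]
# worker-2 is missing entirely, so it is also scheduled for creation.
result = pods_to_recreate(["worker-0", "worker-1", "worker-2"], observed)
print(result)
```

A check like this cannot tell a crashed node from an unreachable one, so a pod it decides to replace may in fact still be running.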

@gaocegege
Member Author

I think it cannot solve the split-brain problem.

@jlewi jlewi added this to To Do in Needs Triage Nov 26, 2019
@jtfogarty

/area engprod
/priority p2

@jtfogarty jtfogarty moved this from To Do to Assigned to Area Owner For Triage in Needs Triage Jan 14, 2020
@stale

stale bot commented Apr 25, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot closed this as completed May 2, 2020
Needs Triage automation moved this from Assigned to Area Owner For Triage to Closed May 2, 2020