Force-restarting a VM exposes a risk of data corruption #9830

Closed
xiesheng211 opened this issue May 31, 2023 · 4 comments · Fixed by #9861

@xiesheng211

What happened:
When a VM is force-restarted with GracePeriod == 0, the virt-launcher pod gets force-deleted.
Code link

This can cause data corruption if triggered while the node is partitioned from the cluster, because two instances of the same VM can end up running at the same time.
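
For illustration, a minimal client-go sketch of what the zero-grace-period delete boils down to; the package and function names are placeholders for the example, not KubeVirt's actual code:

```go
// Sketch: what a zero-grace-period pod deletion looks like via client-go.
package sketch

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// forceDeletePod issues the equivalent of
// `kubectl delete pod <name> --grace-period=0 --force`.
func forceDeletePod(client kubernetes.Interface, namespace, name string) error {
	zero := int64(0)
	// With GracePeriodSeconds == 0 the API server removes the pod object
	// immediately, without waiting for the kubelet to confirm that the
	// containers (and with them the guest) have actually stopped.
	return client.CoreV1().Pods(namespace).Delete(
		context.Background(),
		name,
		metav1.DeleteOptions{GracePeriodSeconds: &zero},
	)
}
```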

What you expected to happen:
SIGKILL should succeed when tearing down the VM.

One way to work around this is to set GracePeriod = 1, which is a similar workaround to the one kubectl uses (link).

cc @rmohr

@xpivarc (Member) commented Jun 1, 2023

I believe this is expected behavior. What do you mean by "SIGKILL should succeed when tearing down the VM"?

@rmohr (Member) commented Jun 1, 2023

I think the issue is that it behaves like `kubectl delete pods --grace-period=0 --force`: it basically does not wait for kubelet confirmation. So, if the kubelet is slow or unresponsive, we risk a split-brain scenario. I don't think this is what we intend to express with a forced restart. I think we just want to express: "Don't wait for a clean shutdown of the guest. I want to press the reset button."

I think, as outlined by @xiesheng211, going with a grace period of 1 would be closer to what users understand as a force-restart: if the system is unhealthy (e.g. the node is unresponsive), the pod will still be stuck until a force delete; in any other case, it would still restart practically right away.

@xpivarc (Member) commented Jun 1, 2023

I see your point. We don't wait for confirmation that the Pod was deleted.

@rmohr (Member) commented Jun 1, 2023

> I see your point. We don't wait for confirmation that the Pod was deleted.

@xpivarc Does it sound reasonable to internally transform a grace period request of 0 into 1 on the pod delete?
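
A minimal sketch of what that transformation could look like, assuming it hooks in wherever the virt-launcher pod's DeleteOptions are built; the helper name is hypothetical, not existing KubeVirt code:

```go
// effectiveGracePeriod clamps a requested grace period of 0 to 1
// before it is placed into the pod's DeleteOptions.
func effectiveGracePeriod(requested *int64) *int64 {
	// A grace period of 0 force-deletes the pod without kubelet
	// confirmation; clamping it to 1 keeps the "press the reset button"
	// semantics while still waiting for the kubelet to acknowledge
	// that the pod actually terminated.
	if requested != nil && *requested == 0 {
		one := int64(1)
		return &one
	}
	return requested
}
```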
