Force-restarting a VM exposes a risk of data corruption #9830

Closed
xiesheng211 opened this issue May 31, 2023 · 4 comments · Fixed by #9861

@xiesheng211

What happened:
When a VM is force-restarted with GracePeriod == 0, the virt-launcher pod gets force-deleted.
Code link

This can cause data corruption if triggered while the node is partitioned from the cluster, because two instances of the same VM can end up running at the same time.
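
For illustration, a minimal client-go sketch of what the zero-grace-period delete boils down to; the package and function names are placeholders for the example, not KubeVirt's actual code:

```go
// Sketch: what a zero-grace-period pod deletion looks like via client-go.
package sketch

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// forceDeletePod issues the equivalent of
// `kubectl delete pod <name> --grace-period=0 --force`.
func forceDeletePod(client kubernetes.Interface, namespace, name string) error {
	zero := int64(0)
	// With GracePeriodSeconds == 0 the API server removes the pod object
	// immediately, without waiting for the kubelet to confirm that the
	// containers (and with them the guest) have actually stopped.
	return client.CoreV1().Pods(namespace).Delete(
		context.Background(),
		name,
		metav1.DeleteOptions{GracePeriodSeconds: &zero},
	)
}
```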

What you expected to happen:
SIGKILL should succeed when tearing down the VM.

One way to work around this is to set GracePeriod = 1, which is a similar workaround to the one kubectl uses (link).

cc @rmohr

@xpivarc (Member) commented Jun 1, 2023

I believe this is expected behavior. What do you mean by "SIGKILL should succeed when tearing down the VM"?

@rmohr (Member) commented Jun 1, 2023

I think the issue is that it behaves like `kubectl delete pods --grace-period=0 --force`: it basically does not wait for kubelet confirmation. So, if the kubelet is slow or unresponsive, we risk a split-brain scenario. I don't think this is what we intend to express with a forced restart. I think we just want to express: "Don't wait for a clean shutdown of the guest. I want to press the reset button."

I think, as outlined by @xiesheng211, going with a grace period of 1 would be closer to what users understand as a force-restart: if the system is unhealthy (e.g. the node is unresponsive), the pod will still be stuck until a force delete; in any other case, it would still restart practically right away.

@xpivarc (Member) commented Jun 1, 2023

I see your point. We don't wait for confirmation that the Pod was deleted.

@rmohr (Member) commented Jun 1, 2023

> I see your point. We don't wait for confirmation that the Pod was deleted.

@xpivarc Does it sound reasonable to internally transform a grace period request of 0 into 1 on the pod delete?
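
A minimal sketch of what that transformation could look like, assuming it hooks in wherever the virt-launcher pod's DeleteOptions are built; the helper name is hypothetical, not existing KubeVirt code:

```go
// effectiveGracePeriod clamps a requested grace period of 0 to 1
// before it is placed into the pod's DeleteOptions.
func effectiveGracePeriod(requested *int64) *int64 {
	// A grace period of 0 force-deletes the pod without kubelet
	// confirmation; clamping it to 1 keeps the "press the reset button"
	// semantics while still waiting for the kubelet to acknowledge
	// that the pod actually terminated.
	if requested != nil && *requested == 0 {
		one := int64(1)
		return &one
	}
	return requested
}
```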
