From 9c14a90c461bf6944589aea5bde7248547797f19 Mon Sep 17 00:00:00 2001 From: Riccardo Mancini Date: Wed, 5 Mar 2025 17:02:18 +0000 Subject: [PATCH] doc(vsock): clarify vsock support on snapshot/restore The current documentation is not clear, mentioning that the device could break if active at time of snapshot. It also mentions that this is fixed by resetting the device at snapshot time, closing all open connections. This patch clarifies that vsock connections are explicitly closed by a transport reset, in order to avoid the aforementioned issue. This doesn't impact listening sockets as they are able to accept new connections after restore. Signed-off-by: Riccardo Mancini --- docs/snapshotting/snapshot-support.md | 39 +++++++++++---------------- 1 file changed, 15 insertions(+), 24 deletions(-) diff --git a/docs/snapshotting/snapshot-support.md b/docs/snapshotting/snapshot-support.md index e1294e1ac7d..dc4235b6aad 100644 --- a/docs/snapshotting/snapshot-support.md +++ b/docs/snapshotting/snapshot-support.md @@ -57,7 +57,10 @@ creation process). Both network and vsock packet loss can be expected on guests that are resumed from snapshots in another Firecracker process. It is also not guaranteed that -the state of the network connections survives the process. +the state of the network connections survives the process. Furthermore, vsock +connections that are open when the snapshot is taken are closed, but existing +vsock listen sockets in the guest still remain active and can accept new +connections after resume (see [Vsock device reset](#vsock-device-reset)). In order to make restoring possible, Firecracker snapshots save the full state of the following resources: @@ -141,8 +144,6 @@ The snapshot functionality is still in developer preview due to the following: - Guest network connectivity is not guaranteed to be preserved after resume. For recommendations related to guest network connectivity for clones please see [Network connectivity for clones](network-for-clones.md). -- Vsock device does not have full snapshotting support. Please see - [Vsock device limitation](#vsock-device-limitation). - Snapshotting on arm64 works for both GICv2 and GICv3 enabled guests. However, restoring between different GIC version is not possible. - If a [CPU template](../cpu_templates/cpu-templates.md) is not used on x86_64, @@ -606,29 +607,19 @@ identifiers, cached random numbers, cryptographic tokens, etc **will** still be replicated across multiple microVMs resumed from the same snapshot. Users need to implement mechanisms for ensuring de-duplication of such state, where needed. -## Vsock device limitation +## Vsock device reset -Vsock must be inactive during snapshot. Vsock device can break if snapshotted -while having active connections. Firecracker snapshots do not capture any -inflight network or vsock (through the linux unix domain socket backend) traffic -that has left or not yet entered Firecracker. - -The above, coupled with the fact that Vsock control protocol is not resilient to -vsock packet loss, leads to Vsock device breakage when doing a snapshot while -there are active Vsock connections. - -As a solution to the above issue, active Vsock connections prior to snapshotting -the VM are forcibly closed by sending a specific event called -`VIRTIO_VSOCK_EVENT_TRANSPORT_RESET`. The event is sent on `SnapshotCreate`. On +The vsock device is reset across snapshot/restore to avoid inconsistent state +between device and driver leading to breakage +([#2218](https://github.com/firecracker-microvm/firecracker/issues/2218)). This +is done by sending a `VIRTIO_VSOCK_EVENT_TRANSPORT_RESET` event to the guest +driver during `SnapshotCreate` +([#2562](https://github.com/firecracker-microvm/firecracker/pull/2562)). On `SnapshotResume`, when the VM becomes active again, the vsock driver closes all -existing connections. Listen sockets still remain active. Users wanting to build -vsock applications that use the snapshot capability have to take this into -consideration. More details about this event can be found in the official Virtio -document [here](https://docs.oasis-open.org/virtio/virtio/v1.1/virtio-v1.1.pdf), -section 5.10.6.6 Device Events. - -Firecracker handles sending the `reset` event to the vsock driver, thus the -customers are no longer responsible for closing active connections. +existing connections. Existing listen sockets still remain active, but their CID +is updated to reflect the current `guest_cid`. More details about this event can +be found in the official Virtio document +[here](https://docs.oasis-open.org/virtio/virtio/v1.1/csprd01/virtio-v1.1-csprd01.html#x1-4080006). ## VMGenID device limitation