Firecracker Snapshots Support #448

plamenmpetrov · 2020-09-22T16:59:53Z

Hello, we have been working on supporting microVM snapshotting in firecracker-containerd, following its introduction in Firecracker. This PR adds new functions to the firecracker-containerd API that together comprise a complete working prototype for working with Firecracker snapshots. The prototype, however, contains workarounds for the calls that are missing in the go-sdk. We also highlight a couple of issues that we would like your feedback on.

We are open to feedback from the community and would be glad to engage in discussions to finalize and contribute this code to upstream.

Authored by @plamenmpetrov and @ustiugov

Summary

We implement functionality for:
- Pausing a microVM - PauseVM
- Creating a snapshot of a microVM - CreateSnapshot
- Resuming a microVM - ResumeVM
- Loading a snapshot of a microVM - LoadSnapshot
- “Offloading” a microVM, which frees up the resources occupied by the microVM - Offload
We refer to these collectively as microVM snapshotting requests.

The firecracker-go-sdk does not support microVM snapshotting as of now. As a result, we embedded the microVM snapshotting requests inside the runtime as HTTP requests. We use our own fork of the firecracker-go-sdk v0.21.0, in which we provide basic support for the new logging and metrics of the firecracker version that we use (see below). Without these changes in the firecracker-go-sdk, we observe an error in the containerd logs concerning the firecracker logging. This prevents us from seeing the firecracker logs and makes debugging difficult.

We use the following firecracker version in our tests: firecracker.

API Extensions Description

We create an HTTP client upon creating a microVM or loading a microVM snapshot. The client is used to send HTTP requests directly to the firecracker process of the respective microVM (rather than going through the firecracker-go-sdk).

ResumeVM, PauseVM and CreateSnapshot

ResumeVM, PauseVM and CreateSnapshot use the HTTP client to send the respective request to the firecracker process. The return code from firecracker is checked to verify that the operation was successful.

Note that CreateSnapshot does not pause the microVM, but assumes that it is paused. This is in line with the prerequisites for creating a microVM snapshot in firecracker.

Offload

Offload kills the firecracker process for the microVM with the respective ID (using SIGKILL) and deletes the firecracker process’ sock file and vsock file so the microVM can later be loaded. This functionality is implemented in the runtime.

In addition, Offload kills the shim using SIGKILL, so that its resources stay free until the microVM is loaded again, if ever. We no longer remove the shim directory of the microVM when the shim terminates, because in our use case we store the guest memory file and the state file in the shim directory. We also remove the shim socket file and the firecracker shim socket file, and recreate the sockets upon LoadSnapshot (see below). This functionality is implemented in the control plugin.

LoadSnapshot

Before doing anything else, the shim needs to be started for the microVM. We recreate the shim socket and the fccontrol shim socket, and start the shim binary. This functionality is implemented in the control plugin.

LoadSnapshot starts a firecracker process listening on the same API socket that the microVM was using prior to being offloaded. The HTTP client is recreated and a load snapshot request is sent to the firecracker process. The return code from firecracker is checked to verify that the operation was successful. This functionality is implemented in the runtime.

Note that LoadSnapshot assumes that a tap with exactly the same name, IP, and MAC as before the VM was offloaded exists. Currently, we recommend removing the tap after calling Offload and re-creating it before calling LoadSnapshot, because calling the two back to back (as may happen in tests) causes a “Tap is busy” error.

Limitations

When calling LoadSnapshot immediately after Offload, we encounter an error that the shimSocket address is in use when trying to load the shim on LoadSnapshot. A workaround is to introduce a sleep of 10-100ms after Offload, depending on the system. This does not happen for the fccontrol shim socket.

ERROR: VM with ID "3" already exists (socket: "/containerd-shim/53d9435747fdf335f1601ccebf98aa71b29f871fcdc68c595c22ca8b0a64d53d.sock")

Calling StopVM on a microVM which has been loaded from a snapshot results in an error, because we lose connection to the agent running inside the microVM.

Performance: in our experiments, re-creating a shim process takes about 30ms before the snapshot is loaded in Firecracker. We haven’t yet investigated this issue. Our intuition is that shim start-up should not exceed 5-10ms, as is the case for starting a Firecracker process.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

Signed-off-by: Plamen Petrov <plamb0brt@gmail.com>


kzys · 2020-09-24T21:02:28Z

This is really cool. Thanks for all the work! I have discussed with @ustiugov a few months back regarding the snapshot support but haven't updated you since then. I apologize for the lack of communication from our side.

ustiugov · 2020-09-24T21:20:10Z

hi @kzys
sure thing! we are happy to contribute. We hope that this PR can establish the ground for finalizing snapshotting support together. Also, we did quite a few performance breakdown studies using this code and the boot-based baseline with different functions that we can contribute too once our submitted paper is published.

kzys · 2020-10-05T22:41:51Z

For starting a micro VM, the proposed API asks clients to call:

- CreateVM to start a Firecracker process and boot Linux inside.
- Offload to kill the Firecracker process.
- LoadSnapshot to start a new Firecracker process again.

However, if a client is going to call LoadSnapshot, CreateVM doesn't have to load a guest OS. Is that understanding correct?

I think it would be better to provide a way to create a brand-new VM from its snapshot, rather than "offloading" a booted VM. Could you explain the reasons behind the design decision?

ustiugov · 2020-10-08T14:22:53Z

@kzys Sorry for the late response.

We expect a workflow in which an idle VM is snapshotted and its resources are then freed with Offload (i.e., the VM is killed) on the same physical host. When a new request comes for this VM, the orchestrator (i.e., the containerd client) can restore the VM into a newly created Firecracker process.

LoadSnapshot loads the VM state and resumes the VM from the exact point at which it was previously Offloaded. Compared to StartVM, LoadSnapshot expects certain files to be present before loading the VM state (including the guest memory). These files include the disk image (the two disk drives, namely fc-dev-thinpool-snap-9 and ctrstub0 for the first started VM) and the tap device with exactly the same characteristics.

As opposed to StopVM, Offload preserves the disk drives whereas the tap is recreated manually (by our custom orchestrator). To simplify the procedure, Offload does NOT remove the VM's shim folder, keeping all the files in place.

I hope these explanations make sense. We are open to feedback.

kzys · 2020-10-08T16:58:48Z

Thanks! Now I can understand the assumption here, but I'd like to give more options to customers regarding how to keep micro VM's artifacts.

Moving micro VMs between multiple hosts would be beneficial for customers. For example, if a host has a hardware failure or needs a system update such as a Linux kernel upgrade, the customer may want to keep their micro VMs somewhere (e.g. cloud storage) and run them somewhere else.

While we don't want to directly integrate cloud storage clients in firecracker-containerd, we should make that possible for higher-level orchestrators such as Fargate.

Would you mind if I ask you to split this pull request into a few ones? Pause, Resume and CreateSnapshot are relatively straightforward. We will move some of the implementation from here to the SDK, but the proposed APIs look good to me. The rest may need more design discussions upfront.

ustiugov · 2020-10-08T17:40:32Z

@kzys Thank you! Indeed, the scenario that you mentioned makes a lot of sense. We took the easier path, limiting our scenario to the same physical host, because it is still not clear to us what firecracker-containerd can assume about the disk state of VMs (both the data and the emulated devices). This is the only piece that we are missing here, and we would really like to hear your opinion/advice on it.

sure, we can split the PR. should we create the following PRs:

PR1 with current implementation of Pause, Resume and CreateSnapshot. Should we merge this one first, and then you will move some parts of the code to the SDK?
PR2 with everything else, namely LoadSnapshot and Offload. Then, we discuss how firecracker-containerd should work with the disk-related state.

What do you think?

kzys · 2020-10-09T22:19:09Z

What do you mean by the disk state of VMs? It would be better to make fewer assumptions and let customers decide what they want to do.

Regarding the PRs, the split makes sense but I'd like to have Pause, Resume and CreateSnapshot on the SDK first. We are going to have the SDK's 0.22.0 release next week. Could you help us to have the APIs on the SDK after the release?

ustiugov · 2020-10-09T23:11:19Z

@kzys I call the disk state the following: 1) the images pulled by firecracker-containerd; 2) the devmapper block devices.
Given the way firecracker-containerd currently manages the VM lifecycle, both must be present on the host machine before loading a VM from a snapshot. So, unless you plan on revisiting these design decisions, we need to find a way to reconstruct this state on the target host. I think that there should be a better way than migrating the complete content of the block devices.

Yes, I think that we should be able to get Pause, Resume and CreateSnapshot to the SDK first. @plamenmpetrov is our leading developer here.

plamenmpetrov · 2020-10-14T06:39:25Z

@kzys Just to clarify, version 0.22.0 of the SDK does not seem to offer support for Pause, Resume, or CreateSnapshot. In that case, would you suggest that we wait for the SDK's 0.22.0 release and then submit a PR to the SDK implementing these calls? Once this is done, then we can use the SDK to implement the calls in firecracker-containerd.

kzys · 2020-10-27T22:03:14Z

@plamenmpetrov Yes! We've finally released 0.22.0 and Firecracker has released 0.23.0. It would be awesome if you could port Pause, Resume and CreateSnapshot changes to the SDK.

ustiugov · 2020-11-11T12:22:44Z

@kzys we merged PauseVM, ResumeVM, and CreateSnapshot into the sdk. what should be our next steps?

kzys · 2020-11-13T00:58:14Z

Thanks for the contribution. The next step would be using the new SDK APIs from firecracker-containerd.

While CreateSnapshot is probably complicated since we need to keep a container's root filesystem in addition to the microVM, Pause and Resume should be straightforward.

Can you make a new PR that exposes Pause and Resume from firecracker-containerd?

ustiugov · 2021-04-02T21:38:47Z

@kzys are there any plans on finalizing the snapshots support? maybe, we could start discussing the next steps

haikyuu · 2023-05-04T10:41:00Z

Any updates on the snapshots feature, now that loading snapshots is available in go SDK v1?

plamenmpetrov added 9 commits September 22, 2020 17:50

Implemented resume and pause call chain.

4bb428d

Signed-off-by: Plamen Petrov <plamb0brt@gmail.com>

Removed deprecated logging

6856f1b

Signed-off-by: Plamen Petrov <plamb0brt@gmail.com>

Added support for creating and loading snapshots of VM.

58dcb44

Notes:
1. Uses logging-only branch from ustiugov/firecracker-go-sdk
2. Firecracker logs path is hard-coded.

Signed-off-by: Plamen Petrov <plamb0brt@gmail.com>

Altered buildVMConfiguration tests

c810db3

Signed-off-by: Plamen Petrov <plamb0brt@gmail.com>

Added dialer for firecracker socket

e0bf2b7

Signed-off-by: Plamen Petrov <plamb0brt@gmail.com>

kill shim functionality

c2c9057

firecracker update

Signed-off-by: Plamen Petrov <plamb0brt@gmail.com>

Remove unnecessary mkdir

407e26a

* Check that shim dir exists when loading shim
* No longer try to create shim dir when loading shim, as it must exist

Signed-off-by: Plamen Petrov <plamb0brt@gmail.com>

Use SIGKILL, remove patchDrive artifacts

b78efd2

Signed-off-by: Plamen Petrov <plamb0brt@gmail.com>

store firecracker logs in shimDir

e2520f3

Signed-off-by: Plamen Petrov <plamb0brt@gmail.com>

plamenmpetrov mentioned this pull request Nov 4, 2020

Implement PauseVM, ResumeVM, and CreateSnapshot firecracker-microvm/firecracker-go-sdk#278

Merged

plamenmpetrov mentioned this pull request Nov 20, 2020

Add PauseVM and ResumeVM support #460

Merged

RoyceDavison changed the base branch from master to main April 22, 2021 00:09

ustiugov mentioned this pull request Mar 10, 2022

LoadSnapshot firecracker-microvm/firecracker-go-sdk#395

Closed
