[RFC] 2020 Roadmap #1104

raduweiss · 2019-05-24T15:38:13Z

This is the time of the year when we plan for 2020, and we're eager to take in proposals from everyone in the open source community. So tell us what you want to see in Firecracker by adding your ideas, asks, and use cases to this issue. We've already given the existing feature requests some thought, but we'll be happy if you come up with more! While we encourage everyone to think big, we will ensure that the resulting roadmap aligns with our mission of enabling secure, multi-tenant, minimal-overhead execution of container and function workloads.

Below is a starting (unsorted) list of items that we're likely to work on. We're keen on hearing your feedback on all of the items below. Let's also add to it!

Roadmap Proposals/Ideas

Accelerated Inference Option for microVMs: Use Case: Some Firecracker users may want to run inference workloads in a Firecracker microVM. Solution: [TBD]; however, GPU pass-through conflicts with Firecracker microVM memory overcommitment.
Block Device Encryption: Use Case: All users should ensure that data is encrypted at rest. In some environments this feature is implemented by default in a lower software layer (or even in hardware). When that's not the case, or when additional defense in depth is desired, Firecracker should be able to encrypt data as it is committed to the block device backing file. Solution: Implement CPU-accelerated AES encryption for block device sector writes (and decryption for reads), with either an ephemeral or a customer-provided key.
Continuous Integration Improvements: Use Case: We need to support multiple platforms; the current CI system is not that friendly for contributors; test reports are hard to follow. Related Issues: PR status checks hard to interpret #973. Solution: Fix all of the above :)
seccomp Improvements: Use Case: Currently, we use a manually vetted seccomp whitelist. However, due to the inherent complexities of how musl and glibc issue system calls, this is a very brittle and labor-intensive process. Solution: [TBD]; we’d like to move beyond a hand-crafted syscall whitelist towards automated generation of "to manually review" syscall differences between releases, and potentially a "blacklist" set of known exploitable syscalls that we never whitelist.
Use rust-vmm Crates: Use Case: The rust-vmm project is creating a set of high-quality building blocks for virtual machine projects to use. Firecracker already uses two crates from there, but we could use many more (as they mature), and end up with less code complexity. Solution: Integrate additional rust-vmm crates as they mature.
Device Model Fuzzing: Use Case: Raise the security bar by additional hardening of the device model. Solution: Apply device model fuzzing as part of the CI.
Release Artifacts: Use Case: We need to identify if the currently released artifacts are sufficient (GitHub Releases with code and binaries). Solution: We clearly want to sign our artifacts, and there will be different release binaries for ARM. Beyond that, [TBD]. Ideas so far include reproducible builds and snap packages. Related Issues: Snap Package Support #1075.
API Stability: Use Case: We need to give users forward-looking guarantees with regards to API compatibility across releases. Solution: [TBD].
Single-Step Configuration: Use Case: It takes between 3 and many API calls to configure Firecracker before starting the guest microVM. For some use cases, this adds needless latency / complexity. A one-step config method would help here. Solution: Take config data from file, stdin, or command line argument (before microVM boot). Related Issues: [UX] Pass configuration as file #923.
microVM Snapshot Support: Use Case: The ability to save the microVM state and restore it at a future point. Solution: Some form of functionality that allows the full state of the microVM to be saved, along with the corresponding restore functionality. Probably driven via the Firecracker API. Related Issues: Create RW disk snapshots of running Firecracker microVMs #886, Support Checkpoint/Restore in Firecracker #1035.
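
To make the Single-Step Configuration item above more concrete, here is a rough sketch of how one configuration document could stand in for the separate API calls. The file layout is purely an assumption for illustration (no such format is defined in this issue), and the request paths and field names are modeled on the existing API resources, so check them against the API specification before relying on them.

```python
import json

# Hypothetical single-file configuration. The section names mirror API
# resources (boot-source, drives, machine-config); the file format itself
# is an assumption, not a shipped Firecracker feature.
EXAMPLE_CONFIG = """
{
  "boot-source": {
    "kernel_image_path": "vmlinux.bin",
    "boot_args": "console=ttyS0 reboot=k panic=1"
  },
  "drives": [
    {
      "drive_id": "rootfs",
      "path_on_host": "rootfs.ext4",
      "is_root_device": true,
      "is_read_only": false
    }
  ],
  "machine-config": {"vcpu_count": 2, "mem_size_mib": 256}
}
"""


def config_to_api_calls(config_text):
    """Expand one config document into the ordered API requests it replaces."""
    config = json.loads(config_text)
    calls = []
    # machine-config is optional; defaults apply when it is omitted.
    if "machine-config" in config:
        calls.append(("PUT", "/machine-config", config["machine-config"]))
    calls.append(("PUT", "/boot-source", config["boot-source"]))
    for drive in config.get("drives", []):
        calls.append(("PUT", "/drives/" + drive["drive_id"], drive))
    # Starting the guest is always the last step.
    calls.append(("PUT", "/actions", {"action_type": "InstanceStart"}))
    return calls


if __name__ == "__main__":
    for method, path, body in config_to_api_calls(EXAMPLE_CONFIG):
        print(method, path, json.dumps(body))
```

With this example the single document replaces four round trips; a config-from-file mode would apply the same settings in-process before boot instead of issuing the requests.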

Collected Feature Request Issues

We'll update all of these once we triage the roadmap, but for clarity, these are the feature request issues we found and discussed:

[Boot]: Support for initrd_path #208 Discussion about adding support for initrd_path
RISC-V support #752 RISC-V support
Proposal: Use Docker CE as jailer #813 Proposal: Use Docker CE as jailer
MMDS: Support Callout Mechanism for Some/All Paths #833 MMDS: Support callout mechanism for some/all paths
Create RW disk snapshots of running Firecracker microVMs #886 Create RW disk snapshots of running Firecracker microVMs
[UX] Pass configuration as file #923 Passing configuration as file instead of using API calls
PR status checks hard to interpret #973 PR status checks hard to interpret
[Platforms] Add support for CPUID customization #998 Add support for customized CPU models
Support Checkpoint/Restore in Firecracker #1035 Support Checkpoint/Restore in Firecracker
Snap Package Support #1075 Snap Package Support


luxas · 2019-06-04T18:32:42Z

xref: #208
it'd be nice to see initrd support in the roadmap

rgooch · 2019-06-05T05:34:10Z

I would like to see a mechanism to capture traffic between the VM and the link-local address and route it over a secure channel to something running on the host. See issue #833 for context.

raduweiss · 2019-06-11T14:55:19Z

@luxas, @rgooch, thanks for the suggestions, we've noted them down. We'll discuss these and then post back here.

Also, we'll keep this issue open until the end of the month. If there are proposals that can be better clarified verbally, we may also set up a conference call in the last week of June.

rgooch · 2019-06-12T05:22:01Z

Something else I'd like to see is the ability to freeze and save a VM (including RAM and CPU state) to persistent storage (with a specified encryption key) and later restore the VM so that all processes within the VM resume where they left off. This would allow rebooting the Hypervisor without having to restart the VMs.

luxas · 2019-07-02T16:50:13Z

There might be good ways to do this already (pardon me if I missed it), but an easy way to monitor the guest VM's CPU/memory/disk utilization would be really nice for instrumentation tooling integrations (like Prometheus).

raduweiss · 2019-07-03T10:03:04Z

@luxas for CPU and Memory, you should be able to do that via the cgroups that the jailer creates. For disk I guess it depends on what kind of backing file and guest file system you use, but there should be tools that tell you that either way.
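
As a minimal sketch of the cgroup approach, assuming cgroup v1 and that you already know the per-microVM cgroup directories on your host (the directory layout depends on how the jailer is invoked, and the metric names below are made up for illustration):

```python
from pathlib import Path


def read_cgroup_usage(cpuacct_dir, memory_dir):
    """Read CPU time and memory usage from cgroup v1 controller directories."""
    # cpuacct.usage: total CPU time consumed by the cgroup, in nanoseconds.
    cpu_ns = int(Path(cpuacct_dir, "cpuacct.usage").read_text().strip())
    # memory.usage_in_bytes: memory currently charged to the cgroup.
    mem_bytes = int(Path(memory_dir, "memory.usage_in_bytes").read_text().strip())
    return {"cpu_seconds": cpu_ns / 1e9, "memory_bytes": mem_bytes}


def to_prometheus_lines(vm_id, usage):
    """Render the readings in Prometheus text exposition format.

    The metric names are hypothetical, chosen only for this example.
    """
    return [
        f'firecracker_cpu_seconds_total{{vm="{vm_id}"}} {usage["cpu_seconds"]}',
        f'firecracker_memory_bytes{{vm="{vm_id}"}} {usage["memory_bytes"]}',
    ]
```

Note that these numbers describe the Firecracker process as seen from the host (VMM plus vCPU threads), not what the guest OS reports internally.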

raduweiss · 2019-07-03T10:05:46Z

Everyone, thank you for your suggestions. We're working through the triage, and will update this post once we're done (we think it's going to take about 10 more days).

petreeftime · 2019-07-03T10:12:34Z

> There might be good ways to do this already (pardon me if I missed it), but an easy way to monitor the guest VM's CPU/memory/disk utilization would be really nice for instrumentation tooling integrations (like Prometheus).

@luxas Virtual Machine Monitors do not typically track memory usage or disk usage. This is because the VMM does not understand disk formats, and memory is completely managed by the operating system running inside the VM and is not visible to the VMM. The typical way to gather that data is to run something like collectd inside the virtual machine itself.

andreeaflorescu · 2019-07-03T10:14:24Z

@luxas for memory utilization we have a metric called dirty_pages. This is computed using KVM_GET_DIRTY_LOG.
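
For intuition: KVM_GET_DIRTY_LOG hands back a bitmap with one bit per guest page, covering pages written since the log was last read. The sketch below only shows the counting step on such a bitmap; the function names and the 4 KiB page size are assumptions for illustration, and how the metric is actually aggregated in Firecracker may differ.

```python
def count_dirty_pages(bitmap: bytes) -> int:
    """Count set bits in a dirty-page bitmap (one bit per guest page)."""
    return sum(bin(byte).count("1") for byte in bitmap)


def dirty_bytes(bitmap: bytes, page_size: int = 4096) -> int:
    """Convert the dirty page count into bytes, assuming a fixed page size."""
    return count_dirty_pages(bitmap) * page_size
```

Keep in mind that this reflects recently written pages rather than total memory in use, so it is a proxy for memory activity, not a full utilization figure.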

luxas · 2019-07-11T16:01:13Z

Cool, thank you everyone for the responses and clarifications 😄

luxas · 2019-07-15T14:52:38Z

Also add #1180 to the roadmap?

raduweiss · 2019-07-15T15:15:01Z

Hi everyone!

It's time to conclude this Roadmap RFC :). We've taken in all the suggestions, discussed and prioritized them, in some cases asked you for feedback directly, and in some cases created prototypes to see how things would work out. Thank you for all the ideas/contributions!

There's now a GitHub project called "Roadmap" in this repository to keep track of each feature's status, from Researching all the way to Shipped.

I'm also reproducing the outcome below. But before that, we also want to say that this is in no way a closed roadmap. We will always be open to discuss new ideas, and to expand our roadmap with features that add value for serverless-type function and container use-cases.

2020 Roadmap

Below is the current snapshot of the Roadmap project. Each section starts with a description of what that section means.

Researching

Feature requests/ideas that users have asked for or that we think users will like, and that fit Firecracker’s charter, but that we haven’t nailed down the design for:

Machine learning acceleration [Devices] Offer support for hardware-accelerated inference in Firecracker #1179: Doing hardware-accelerated inference in a serverless environment is a compelling use case. However, adding straight-up GPU passthrough means that microVMs can't oversubscribe memory, and we would need to add PCI emulation to Firecracker, which comes with a lot of extra complexity/attack surface. The first step here will be to research the options and alternatives (e.g., GPU passthrough, or something else), and figure out the path forward. Related issues: GPU Support #849, Virtio-vfio：Should we support virtio-vfio in the nearly future？ #776.
Host filesystem sharing Addding support for dev-tool build on Windows. #1080: This feature is in high demand, and we previously rejected the 9p-based implementation over security concerns. With the advent of things like virtio-fs and other ideas of how to achieve this functionality, we will need to research our options and revisit the threat model impact. Related issues: Share filesystem between Firecracker guest and Ubuntu host systems? #889.
Add Support for CPUID Customization [Platforms] Add support for CPUID customization #998: Although Firecracker supports CPUID templates, the only ones provided are for the EC2 C3 and EC2 T2 instance types. This item looks at adding support for customized templates.

Contributions Welcome

We want these in Firecracker, but it’s not something that the maintainer team expects to get around to in the next year or so (contributions are of course also welcome for all other feature requests):

Initrd support [Boot]: Support for initrd_path #208: With the initial API when specifying a kernel path, you could also specify the path of the initrd file.
Snap Package Support Snap Package Support #1075.

Upcoming

We want these in Firecracker, and it’s something that the maintainer team expects to get around to in the next year or so.

seccomp Improvements Bundle 1 [Hardening] Seccomp improvements #1177: The current way our seccomp filters are set up can be improved. Issues included in this bundle: Seccomp: apply different filters depending on thread #1178, Seccomp: Improve syscall allow-listing / deny-listing Process #1008, The seccomp filter should check the most frequently used syscalls first #1022, Firecracker panics cause a seccomp violation #1088, Remove timerfd_create() syscall from the seccomp allow-list #962, SYS_open and SYS_mmap get denied by the seccomp filter #485.
Continuous Integration Improvements Bundle 1 [Devel] CI improvements bundle 1 #1181: Our continuous integration system can use a number of upgrades. [TODO]: List specific improvements we target for 2019. Related issues: PR status checks hard to interpret #973.
Use all Applicable rust-vmm Crates: 2020 Edition [Devel] Use rust-vmm/vm-superio|linux-loader|event-manager|vmm-sys-util #1182: Firecracker should use all applicable rust-vmm crates. [TODO]: List specific crates we plan to use in 2019/2020.
Firecracker Release Improvement Bundle 1 [Devel] Firecracker release improvements bundle 1 #1183: Sign Firecracker release artifacts; Ensure reproducible builds; Keep documentation in sync with the latest release.
Run Firecracker without API [UX] Run Firecracker without API #1165.
Pass Configuration as File [UX] Pass configuration as file #923.
Apply Continuous Fuzz Testing to the Firecracker Device Model [Hardening] Continuously fuzz all Firecracker guest-facing attack surfaces #737.
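
To illustrate two of the ideas in the seccomp bundle above (a reviewable diff of the allow-list between releases, and checking the most frequently used syscalls first), here is a small sketch. Everything in it is hypothetical: the deny-list entries are placeholders rather than a vetted set, and the syscall names and hit counts are made up.

```python
# Hypothetical deny-list: syscalls we would never want on the allow-list.
# The entries are illustrative placeholders, not a vetted set.
DENY_LIST = frozenset({"ptrace", "kexec_load", "init_module"})


def review_allow_list_change(old_allowed, new_allowed, deny_list=DENY_LIST):
    """Summarize what changed between two releases' syscall allow-lists."""
    old_set, new_set = set(old_allowed), set(new_allowed)
    return {
        # New entries need a manual review before the release ships.
        "added": sorted(new_set - old_set),
        "removed": sorted(old_set - new_set),
        # Anything both allowed and deny-listed should fail the check.
        "denied": sorted(new_set & set(deny_list)),
    }


def order_by_frequency(allowed, hit_counts):
    """Put the most-used syscalls first so a linear filter matches them early."""
    return sorted(allowed, key=lambda name: (-hit_counts.get(name, 0), name))
```

A CI job could fail whenever `denied` is non-empty and post `added` as the "to manually review" list for each release.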

In Development

Work towards these features is in progress:

microVM Snapshot Support: The ability to create a microVM snapshot is compelling for serverless use cases. It’s also a feature ask from users. We’ve been tinkering with a snapshot + restore prototype, aiming to optimize for the speed of these operations. We’re still thinking about the next steps here. Related issues: Support Checkpoint/Restore in Firecracker #1035, Create RW disk snapshots of running Firecracker microVMs #886.
Support Virtio Vsock [Devices] virtio vsock #650: We want to have vsock support to enable container integration, but we don't want to use vhost since that would be another attack surface that directly exposes the host kernel. Instead, we'll write another back-end for vsock.

Preview

You can test this feature out, but don’t use it in production. Feedback very much welcomed!

AMD Support [Platforms] AMD CPU support #651.
ARM Support [Platforms] Arm CPU support #648.

Shipped

Fit for production usage.

Features not Added to the Roadmap

Block device encryption: Prototyped and found that the complexity/IO penalty doesn’t justify the use case simplification (there are other ways to achieve this that don’t depend on Firecracker).
API Stability: It looks like it’s not time for this yet.
Use Docker CE as jailer Proposal: Use Docker CE as jailer #813: Can be implemented at a higher level in the stack (e.g., Ignite).
MMDS: Support callout mechanism for some/all paths MMDS: Support Callout Mechanism for Some/All Paths #833. Looks like a specific use case, and we’ll have alternatives with vsock.
RISC-V support RISC-V support #752: We think this is best done in rust-vmm, not Firecracker (and Firecracker can then consume it from there in the future).

raduweiss added the Roadmap: New Request label May 24, 2019

raduweiss self-assigned this May 24, 2019

GreenCee mentioned this issue May 24, 2019

open roadmap and planning cnabio/cnab-spec#163

Closed

luxas mentioned this issue Jun 4, 2019

[Boot]: Support for initrd_path #208

Closed

marcov mentioned this issue Aug 30, 2019

bzImage / initrd support (v2) #670

Closed

marcov mentioned this issue Sep 6, 2019

loader: add support for initrd #1246

Merged

7 tasks

alsrdn closed this as completed Feb 1, 2022
