[RFC] 2020 Roadmap #1104

Closed
raduweiss opened this issue May 24, 2019 · 12 comments

@raduweiss
Contributor

raduweiss commented May 24, 2019

This is the time of the year when we plan for 2020, and we're eager to take in proposals from everyone in the open source community. So tell us what you want to see in Firecracker by adding your ideas, asks, and use cases to this issue. We've already given the existing feature requests some thought, but we'll be happy if you come up with more! While we encourage everyone to think big, we will ensure that the resulting roadmap aligns with our mission of enabling secure, multi-tenant, minimal-overhead execution of container and function workloads.

Below is a starting (unsorted) list of items that we're likely to work on. We're keen on hearing your feedback on all of the items below. Let's also add to it!

Roadmap Proposals/Ideas

  • Accelerated Inference Option for microVMs: Use Case: Some Firecracker users may want to run inference workloads in a Firecracker microVM. Solution: [TBD]; however, GPU pass-through conflicts with Firecracker microVM memory overcommitment.
  • Block Device Encryption: Use Case: All users should ensure that data is encrypted at rest. In some environments this feature is implemented by default in a lower software layer (or even in the hardware). When that's not the case, or when additional defense in depth is desired, Firecracker should be able to encrypt data as it is committed to the block device backing file. Solution: Implement CPU-accelerated AES encryption for block device sector writes (and decryption for reads), with either an ephemeral or a customer-provided key.
  • Continuous Integration Improvements: Use Case: We need to support multiple platforms; the current CI system is not that friendly for contributors; test reports are hard to follow. Related Issues: PR status checks hard to interpret #973. Solution: Fix all of the above :)
  • seccomp Improvements: Use Case: Currently, we use a manually vetted seccomp whitelist. However, due to the inherent complexities of how musl and glibc issue system calls, this is a very brittle and labor-intensive process. Solution: [TBD]; we’d like to move beyond a hand-crafted syscall whitelist towards automated generation of "to manually review" syscall differences between releases, and potentially a "blacklist" set of known exploitable syscalls that we never whitelist.
  • Use rust-vmm Crates: Use Case: The rust-vmm project is creating a set of high-quality building blocks for virtual machine projects to use. Firecracker already uses two crates from there, but we could use many more (as they mature), and end up with less code complexity. Solution: Integrate additional rust-vmm crates as they mature.
  • Device Model Fuzzing: Use Case: Raise the security bar by additional hardening of the device model. Solution: Apply device model fuzzing as part of the CI.
  • Release Artifacts: Use Case: We need to identify if the currently released artifacts are sufficient (GitHub Releases with code and binaries). Solution: We clearly want to sign our artifacts, and there will be different release binaries for ARM. Beyond that, [TBD]. Ideas so far include reproducible builds and snap packages. Related Issues: Snap Package Support #1075.
  • API Stability: Use Case: We need to give users forward-looking guarantees with regards to API compatibility across releases. Solution: [TBD].
  • Single-Step Configuration: Use Case: It takes three or more API calls to configure Firecracker before starting the guest microVM. For some use cases, this adds needless latency / complexity. A one-step config method would help here. Solution: Take config data from file, stdin, or command line argument (before microVM boot). Related Issues: [UX] Pass configuration as file #923.
  • microVM Snapshot Support: Use Case: The ability to save the microVM state and restore it at a future point. Solution: Some form of functionality that allows the full state of the microVM to be saved, along with the corresponding restore functionality. Probably driven via the Firecracker API. Related Issues: Create RW disk snapshots of running Firecracker microVMs #886, Support Checkpoint/Restore in Firecracker #1035.
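
The seccomp item's idea of an automated "to manually review" list can be pictured as a set difference between the syscalls seen in two releases, plus a separate check against a never-allow set. Below is a minimal sketch in Python, assuming the syscall names for each release have already been collected by some means. The function name, the sample syscall sets, and the never-allow set are all hypothetical and not taken from Firecracker.

```python
def review_syscall_changes(old_allowed, new_observed, never_allow):
    """Return the syscall changes a human should look at between releases."""
    old_allowed = set(old_allowed)
    new_observed = set(new_observed)
    never_allow = set(never_allow)
    return {
        # Newly needed syscalls: candidates to add to the whitelist after review.
        "added": sorted(new_observed - old_allowed - never_allow),
        # No longer observed: candidates to drop from the whitelist.
        "removed": sorted(old_allowed - new_observed),
        # Observed but on the never-allow list: must not be whitelisted.
        "blocked": sorted(new_observed & never_allow),
    }


if __name__ == "__main__":
    # Hypothetical data, for illustration only.
    report = review_syscall_changes(
        old_allowed={"read", "write", "ioctl", "epoll_wait"},
        new_observed={"read", "write", "ioctl", "epoll_pwait", "ptrace"},
        never_allow={"ptrace"},
    )
    for kind, names in report.items():
        print(kind, names)
```

Keeping the "blocked" bucket separate from "added" means a known-exploitable syscall never shows up as an ordinary review candidate.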

Collected Feature Request Issues

We'll update all of these once we triage the roadmap, but for clarity, these are the feature request issues we found and discussed:

@luxas

luxas commented Jun 4, 2019

xref: #208
it'd be nice to see initrd support in the roadmap

@rgooch

rgooch commented Jun 5, 2019

I would like to see a mechanism to capture traffic between the VM and the link-local address and route it over a secure channel to something running on the host. See issue #833 for context.

@raduweiss
Contributor Author

@luxas, @rgooch, thanks for the suggestions, we've noted them down. We'll discuss these and then post back here.

Also, we'll keep this issue open until the end of the month. If there are proposals that can be better clarified verbally, we may also set up a conference call in the last week of June.

@rgooch

rgooch commented Jun 12, 2019

Something else I'd like to see is the ability to freeze and save a VM (including RAM and CPU state) to persistent storage (with a specified encryption key) and later restore the VM so that all processes within the VM resume where they left off. This would allow rebooting the Hypervisor without having to restart the VMs.

@luxas

luxas commented Jul 2, 2019

There might be good ways to do this already (pardon me if I missed it), but an easy way to monitor the guest VM's CPU/memory/disk utilization would be really nice for instrumentation tooling integrations (like Prometheus).

@raduweiss
Contributor Author

@luxas for CPU and Memory, you should be able to do that via the cgroups that the jailer creates. For disk I guess it depends on what kind of backing file and guest file system you use, but there should be tools that tell you that either way.
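
As a rough illustration of the cgroup approach, the sketch below reads cumulative CPU time and current memory usage from cgroup v1 accounting files. `cpuacct.usage` and `memory.usage_in_bytes` are standard cgroup v1 file names. The directory under which the jailer places a given microVM's cgroups is an assumption here, so the paths in the usage note are placeholders to adapt to your host, and the function names are hypothetical.

```python
from pathlib import Path


def read_cgroup_value(cgroup_dir, filename):
    """Read a single integer counter from a cgroup accounting file."""
    return int((Path(cgroup_dir) / filename).read_text().strip())


def microvm_usage(cpuacct_dir, memory_dir):
    """Collect host-side CPU and memory usage for one microVM's cgroups."""
    return {
        # Total CPU time consumed by the cgroup, in nanoseconds (cgroup v1).
        "cpu_ns": read_cgroup_value(cpuacct_dir, "cpuacct.usage"),
        # Memory currently charged to the cgroup, in bytes (cgroup v1).
        "memory_bytes": read_cgroup_value(memory_dir, "memory.usage_in_bytes"),
    }
```

A call might look like `microvm_usage("/sys/fs/cgroup/cpuacct/firecracker/<id>", "/sys/fs/cgroup/memory/firecracker/<id>")`, where both paths are assumed. These numbers describe the Firecracker process as accounted by the host, not usage as seen from inside the guest.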

@raduweiss
Contributor Author

Everyone, thank you for your suggestions. We're working through the triage, and will update this post once we're done (we think it's going to take about 10 more days).

@petreeftime
Contributor

There might be good ways to do this already (pardon me if I missed it), but an easy way to monitor the guest VM's CPU/memory/disk utilization would be really nice for instrumentation tooling integrations (like Prometheus).

@luxas Virtual Machine Monitors do not typically track memory usage or disk usage. This is because the VMM does not understand disk formats, and memory is completely managed by the operating system running inside the VM and is not visible to the VMM. The typical way to gather that data is to run something like collectd inside the virtual machine itself.

@andreeaflorescu
Member

@luxas for memory utilization we have a metric called dirty_pages. This is computed using KVM_GET_DIRTY_LOG.
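
For context, KVM_GET_DIRTY_LOG returns a bitmap with one bit per guest page in a memory slot, where a set bit means the page was written since the log was last read. The sketch below shows how such a bitmap could be turned into a page count. It is an illustration only, not how Firecracker computes `dirty_pages` internally; the function names and the 4096-byte page size are assumptions.

```python
PAGE_SIZE = 4096  # assumed guest page size in bytes


def count_dirty_pages(bitmap: bytes) -> int:
    """Count set bits in a dirty-page bitmap (one bit per guest page)."""
    return sum(bin(byte).count("1") for byte in bitmap)


def dirty_bytes(bitmap: bytes, page_size: int = PAGE_SIZE) -> int:
    """Approximate how much guest memory was written since the last log read."""
    return count_dirty_pages(bitmap) * page_size


if __name__ == "__main__":
    # Hypothetical two-byte bitmap with three pages marked dirty.
    sample = bytes([0b00000101, 0b10000000])
    print(count_dirty_pages(sample), dirty_bytes(sample))
```

Note that this counts pages written since the previous read, so it is a proxy for memory activity rather than a measure of total guest memory in use.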

@luxas

luxas commented Jul 11, 2019

Cool, thank you everyone for the responses and clarifications 😄

@luxas

luxas commented Jul 15, 2019

Also add #1180 to the roadmap?

@raduweiss
Contributor Author

Hi everyone!

It's time to conclude this Roadmap RFC :). We've taken in all the suggestions, discussed them, prioritized them, in some cases asked you for feedback directly, and in some cases created prototypes to see how things would work out. Thank you for all the ideas/contributions!

There's now a GitHub project called "Roadmap" in this repository to keep track of each feature's status, from Researching all the way to Shipped.

I'm also reproducing the outcome below. But before that, we also want to say that this is in no way a closed roadmap. We will always be open to discuss new ideas, and to expand our roadmap with features that add value for serverless-type function and container use-cases.

2020 Roadmap

Below is the current snapshot of the Roadmap project. Each section starts with a description of what that section means.

Researching

Feature requests/ideas that users have asked for or that we think users will like, and that fit Firecracker’s charter, but that we haven’t nailed down the design for:

Contributions Welcome

We want these in Firecracker, but it’s not something that the maintainer team expects to get around to in the next year or so (contributions are of course also welcome for all other feature requests):

Upcoming

We want these in Firecracker, and it’s something that the maintainer team expects to get around to in the next year or so.

In Development

Work towards these features is in progress:

  • microVM Snapshot Support: The ability to create a microVM snapshot is compelling for serverless use cases. It’s also a feature ask from users. We’ve been tinkering with a snapshot + restore prototype, aiming to optimize for the speed of these operations. We’re still thinking about the next steps here. Related issues: Support Checkpoint/Restore in Firecracker #1035, Create RW disk snapshots of running Firecracker microVMs #886.
  • Support Virtio Vsock ([Devices] virtio vsock #650): We want to have vsock support to enable container integration, but we don't want to use vhost, since that would be another attack surface that directly exposes the host kernel. Instead, we'll write another back-end for vsock.

Preview

You can test this feature out, but don’t use it in production. Feedback very much welcomed!

Shipped

Fit for production usage.

Features not Added to the Roadmap

  • Block device encryption: Prototyped and found that the complexity/IO penalty doesn’t justify the use case simplification (there are other ways to achieve this that don’t depend on Firecracker).
  • API Stability: It looks like it’s not time for this yet.
  • Use Docker CE as jailer (Proposal: Use Docker CE as jailer #813): Can be implemented at a higher level in the stack (e.g., Ignite).
  • MMDS callout mechanism (MMDS: Support Callout Mechanism for Some/All Paths #833): Looks like a specific use case, and we’ll have alternatives with vsock.
  • RISC-V support (RISC-V support #752): We think this is best done in rust-vmm, not Firecracker (and Firecracker can then consume it from there in the future).
