Proposal: Use Docker CE as jailer #813

Closed
frank-dspeed opened this issue Jan 6, 2019 · 26 comments

@frank-dspeed

There are already a few jailers around that support creating Linux/BSD jails; the two biggest are Docker CE (containerd, moby) and SystemdNS.

I would love to see this project adopt one of these options as an additional jailer.

It would enable Firecracker to be spawned in all of these environments, so it could be deployed in a local Kubernetes cluster (containerd & moby) or a CoreOS installation (SystemdNS).

@alexandruag added the Priority: Medium label Feb 15, 2019
@alexandruag
Contributor

Hi, we didn't get the chance to investigate the implications of this proposal, but it does sound quite interesting. We'll update the issue after having a look.

@frank-dspeed
Author

@alexandruag no problem we all got time 👍

@MalteJ

MalteJ commented Apr 1, 2019

Sounds very interesting, especially due to the Docker API that allows an easy automation of spinning up firecracker instances!
Any news on this topic?

@samuelkarp

@alexandruag I think the right component to investigate as a replacement of the jailer (if desired) is runc. This is the low-level container runtime used by most of the popular higher-level tools like Docker, containerd, and CRI-O. systemd-nspawn would also potentially be a reasonable option, but is a bit more tightly coupled to systemd than runc is.

@frank-dspeed If you're interested in running container workloads in Firecracker, we're working on that in the firecracker-containerd project. There is also Kata Containers, which has a working implementation with Firecracker and Kubernetes.

@MalteJ

MalteJ commented Apr 2, 2019

I tried to start firecracker within Docker but still have some problems:

root@firecracker:~# docker run -it --rm \
    -v /dev:/dev \
    -v /tmp/firecracker-socket:/run/firecracker \
    --net host \
    --privileged \
    mjanduda/firecracker \
    /opt/firecracker/bin/firecracker --api-sock /run/firecracker/firecracker.socket
2019-04-02T12:00:57.469944319 [anonymous-instance:ERROR:vmm/src/sigsys_handler.rs:70] Shutting down VM after intercepting a bad syscall (288).
2019-04-02T12:00:57.470275839 [anonymous-instance:ERROR:vmm/src/sigsys_handler.rs:76] Failed to log metrics while stopping: Logger was not initialized.

The syscall error comes up after I tried to define the kernel via the API.
Any ideas what I'm missing here?
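
For readers unfamiliar with that step: "defining the kernel via the API" means a PUT on the boot-source resource over the API socket. A minimal sketch, assuming the host-side socket path implied by the docker run above; the kernel path and boot arguments are placeholders, not values taken from this thread:

```shell
#!/bin/sh
# Host side of the bind mount used in the docker run command above.
API_SOCK=/tmp/firecracker-socket/firecracker.socket
# Hypothetical path: it must be visible to the Firecracker process in the container.
KERNEL=/root/vmlinux

# Build the JSON body for PUT /boot-source (boot_args are placeholder values).
boot_source_body() {
  printf '{"kernel_image_path": "%s", "boot_args": "%s"}' \
    "$1" "console=ttyS0 reboot=k panic=1 pci=off"
}

# Send the request only when SEND=1, so the script is safe to dry-run.
if [ "${SEND:-0}" = "1" ]; then
  curl --unix-socket "$API_SOCK" -i -X PUT 'http://localhost/boot-source' \
    -H 'Content-Type: application/json' \
    -d "$(boot_source_body "$KERNEL")"
else
  boot_source_body "$KERNEL"; echo
fi
```

Run it as is to inspect the request body, or with SEND=1 to issue the request against a running Firecracker.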

@MalteJ

MalteJ commented Apr 2, 2019

Ahh, I compiled with Rust 1.33. I'll try 1.32 (see also #997).

@alexandruag
Contributor

Hi everyone, just picked this up (for real this time :D). I'll have a look at all these things you mentioned and try to figure out if/how they can fit together. Also, just wondering, any particular reason why you would rather build Firecracker via cargo build (or something equivalent), instead of using our tools/devtool build thingie?

@alexandruag self-assigned this Apr 2, 2019
@MalteJ

MalteJ commented Apr 2, 2019

Also, just wondering, any particular reason why you would rather build Firecracker via cargo build (or something equivalent), instead of using our tools/devtool build thingie?

I wanted to add vsock and didn't know how to add this using the devtool 😊

@alexandruag
Contributor

Something like tools/devtool build -- --features vsock should work. All the extra parameters after -- are passed on to cargo build.

@MalteJ

MalteJ commented Apr 2, 2019

I was able to start a microVM within Docker using:

docker run -it --rm \
    -v /dev/kvm:/dev/kvm \
    -v /run/firecracker:/run/firecracker \
    -v /root:/root \
    --privileged \
    mjanduda/firecracker \
    /opt/firecracker/bin/firecracker --api-sock /run/firecracker/firecracker.socket

(I share /root because that's where my kernel & rootfs images are.)

The container is simply a Debian image with a firecracker executable:

FROM debian:latest
RUN mkdir -p /opt/firecracker/bin
COPY firecracker/build/debug/firecracker /opt/firecracker/bin/
WORKDIR /opt/firecracker
CMD ["/opt/firecracker/bin/firecracker", "--api-sock", "/run/firecracker/firecracker.socket"]

Now it's about choosing the required capabilities and maybe narrowing down the seccomp rules to get rid of --privileged.
The target would be to provide a recommended docker run command as well as an OCI spec for runc in the firecracker docs.
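
One possible starting point for dropping --privileged is sketched below. It is untested against Firecracker: the docker flags themselves exist, but whether this particular set of grants is sufficient is an assumption, and the seccomp profile path is hypothetical. The script only prints the command for review.

```shell
#!/bin/sh
# Untested sketch: replace --privileged with explicit, narrower grants.
# Whether these grants are enough for Firecracker is an assumption to verify.
build_run_cmd() {
  seccomp_profile=$1   # hypothetical path to a custom seccomp profile
  set -- docker run -it --rm \
    --device /dev/kvm \
    --cap-drop ALL \
    --security-opt no-new-privileges \
    --security-opt "seccomp=$seccomp_profile" \
    -v /run/firecracker:/run/firecracker \
    -v /root:/root \
    mjanduda/firecracker \
    /opt/firecracker/bin/firecracker --api-sock /run/firecracker/firecracker.socket
  echo "$*"
}

# Print the command instead of running it.
build_run_cmd ./firecracker-seccomp.json
```

Guest networking through a tap device would likely need further grants (for example access to /dev/net/tun and the NET_ADMIN capability); that part is not covered here.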

@frank-dspeed
Author

@MalteJ that's correct and it works, but let me explain a bit first: we don't want to run Firecracker inside Docker this way; we want to enable Firecracker to work on Docker.

@MalteJ

MalteJ commented Apr 2, 2019

@frank-dspeed what exactly do you mean by "firecracker to work on docker"?
I understood that you want to replace the jailer with Docker?!

@frank-dspeed
Author

@MalteJ you're right: replacing the jailer with Docker and offering it as an additional jailer.

@alexandruag
Contributor

Hello everyone,

I've tried spending some time to find something like a unifying perspective over jailing, but ultimately the Firecracker jailer is about implementing our recommendations for building a secure sandbox around each Firecracker process. I didn't attempt to replicate everything the jailer does (outlined here), but it seems possible to achieve the same using something like docker/runc and extra scripting.

We're not actively trying to come up with other options right now, but we're very interested in any use cases you might have which don't align with the jailer, what's missing, and what you consider building as an alternative.

@style95

style95 commented Apr 30, 2019

I am new to this area, but I think users want to run a Firecracker VM with the docker run command.
Since Docker is commonly used in many container-based platforms, they could easily benefit from Firecracker with this approach.
Since Docker is composed of high-level and low-level runtimes, it would be best if we could transparently replace the underlying runtime with Firecracker.
End users could then use the docker command as is, but Docker would create a microVM instead of a container.

@nmeyerhans
Contributor

@style95 What you describe is effectively what the firecracker-containerd and Kata Containers projects are working to build. As you describe, it's not so much the use of Docker as the jailer as it is the ability to run unmodified containers using familiar tools with the addition of an enhanced security boundary provided by Firecracker.

Using Docker in place of the current jailer wouldn't really support such workflows. You'd still be responsible for dealing with a per-application microVM root image in a custom format, handling instantiation and configuration of the microVM, configuring the network stack, etc.

If you're interested in using Firecracker to enhance the isolation boundary of your containerized applications, please consider contributing to either (or both) of the projects I mentioned above.

@style95

style95 commented Apr 30, 2019

@nmeyerhans Thank you so much for the detailed explanation.
I will look into those projects.

@frank-dspeed
Author

@nmeyerhans the jailer part comes only from the documentation; that's why I suggested it. I did not evaluate other scenarios, but I am happy that others picked up the idea. From my point of view, it looked like if we could replace the jailer with Docker, then we could create jailed containers.

But now I am not sure which is the right path:

  • firecracker inside docker
  • docker inside firecracker

@MalteJ

MalteJ commented Apr 30, 2019

I don't understand why you want to force Firecracker users to use the firecracker-jailer instead of the container technologies they already have in their stack.
It's not about running Docker containers in Firecracker. It's about providing a guideline on how to jail Firecracker microVMs using established jailing/container technologies like runc/OCI/Docker.

@MalteJ

MalteJ commented Apr 30, 2019

honestly I don't understand why you have implemented an own jailer at all.

@raduweiss
Contributor

raduweiss commented Apr 30, 2019

honestly I don't understand why you have implemented an own jailer at all.

@MalteJ the way I think about this is not that we built our own jailer, but that out of the box, we provide one minimalist, tightly coupled, and efficient way of isolating a Firecracker microVM. You can definitely use Firecracker without the jailer, or with another jailing technology; it just means that you'll be responsible for tweaking the settings to ensure that things are secure and work well (e.g., when setting up thousands of jails for thousands of microVMs on the same host, and while starting and stopping them all the time). We'd be happy to see integrations with other jailing technologies.

As for users doing something like docker run and getting their workload to run in a microVM, yes, I agree with you 100%! This is what we want for users! As @nmeyerhans mentions above, this is in effect what the firecracker-containerd project and Kata Containers project are doing, and that's fully the direction we want to go in.

Some specific reasons why we made a separate jailer:

  • Keeping with our minimalism tenet, we wanted to have as little code as possible that does specific and tightly coupled jailing for Firecracker, and nothing more (less code is easier to audit, and represents a smaller attack surface). The other jailers folks mentioned are solid, and well tested pieces of technology, but they need to take many more use cases into account, and so they have much larger code bases. Firecracker's jailer is ~1.2k lines of Rust.
  • With thousands of microVMs potentially running on the same host (which is one of the main use cases for Firecracker), we want to keep the compute resource usage per microVM low. The Firecracker jailer drops privileges and then execs into Firecracker, and thus keeps all the same process resources. This approach helped us stay under our limit of 5 MiB of memory per Firecracker microVM.
  • As Firecracker changes, we need to know that we can adjust the jailer properties without much hassle. If we used a more generic jailer, it might be that at times we would need features that aren't there (or we would want to disable features that are there by default), and this may not be in line with that jailer software's direction.
  • Like the main Firecracker binary, we can statically compile the jailer against musl to produce a binary that's more platform independent, reducing the dependencies for running a jailed Firecracker.
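
The "execs into Firecracker" point above can be illustrated with plain shell. This is a sketch of the exec pattern only, not the jailer's actual code: the outer shell stands in for the jailer and the inner one for Firecracker.

```shell
#!/bin/sh
# Illustration only: 'exec' replaces the current process image instead of
# forking a child, so the launcher and the launched program share one PID.
pids=$(sh -c 'echo $$; exec sh -c "echo \$\$"')

before=$(printf '%s\n' "$pids" | sed -n 1p)   # PID printed by the launcher
after=$(printf '%s\n' "$pids" | sed -n 2p)    # PID printed after exec

if [ "$before" = "$after" ]; then
  echo "same PID before and after exec"
fi
```

Because no second process is created, no wrapper stays resident next to each microVM, which is what keeps the per-microVM overhead low.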

So to sum it up, yes, we want to give users the ability to isolate their containers in VMs, as transparently as possible. This is something that other projects are working on, and something we support. It will be up to those projects how to use the jailer (or if they want to use it at all). But out of the box, we provide what we think is a simple and effective way to jail Firecracker at scale.

@MalteJ

MalteJ commented Apr 30, 2019

@raduweiss thanks for the clarification :)

@raduweiss
Contributor

raduweiss commented Jul 15, 2019

@frank-dspeed, @MalteJ, we've looked at this again as part of our 2020 Roadmap exercise, but in the end, beyond the arguments above, it looks like this can be implemented at a higher level in the stack (e.g., like Ignite did).

Therefore, I'll be closing this now. Feel free to re-open it if you want to continue the conversation :)

@raduweiss removed the Priority: Medium label Jul 15, 2019
@luxas

luxas commented Jul 15, 2019

@raduweiss Hi 👋!
(Sorry, didn't read all of the comments here before, but...)

If I understand it correctly, it'd be very nice to make it possible for the jailer to not do a specific set of things (e.g. the chroot, cgroups, maybe the numa node stuff), to let higher-level tools like Ignite run Firecracker in a container (like we do), but also run the jailer there to filter syscalls, reassign uid/gid, etc. Letting the container runtime take care of the cgroups, chroot, netns and similar and the jailer take care of syscall filtering would be a very good split of responsibilities IMO.

xref related Ignite issue weaveworks/ignite#68 (comment)

@raduweiss
Contributor

I think the syscall filtering is now taken care of in Firecracker directly. But I understand the general idea. Do you want to open an issue about it? I think it's different than this one.

@luxas

luxas commented Jul 15, 2019

Ok. Sure!


8 participants