New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add prototype support for macvtap interfaces #2217
Conversation
Hi @upxe , Thanks for the contribution ! This might be very useful. We would like to do some performance tests first, and we can discuss next steps after. In the meanwhile, just dropping a few questions:
|
I think that this depends on how you name your tap devices. The default (on Debian at any rate) isn't very useful: the However, This all means that we would probably need to distinguish between tap and macvtap interfaces at the configuration level, rather than relying on the interface name looking like
Haven't look into this yet - I raised the PR to see whether there was interest in principle in supporting this before diving in too deep. In theory, shouldn't be a problem, because once you've opened the device it's more-or-less a normal tap device. I shall have a look, but have had a few problems running the test suite - the whole unit test + integration test run ( |
Hi @upxe. I will redo the tests and come back with the exact test scenario and numbers. |
Being able to specify the absolute path to the taped is extremely useful, as it allows unique names . In an embedded environment, there is the desire to put each VM into a network namespace. Using MacVTap makes the configuration easier, as the Macvtap can be attached directly to the |
Also asked for in #1933 |
Hi @upxe. I think that we can start working towards merging this, but the first step would be to point the PR towards a feature branch. I've created one called feature/macvtap - please update the PR when you have the time. |
I've updated the PR; unless you're happy with the current approach (which was just a PoC, and I think is a bit too simple-minded), it would be useful to have a steer on how you want to proceed, e.g.
Apart from the above, I'm aware at the moment that I need to:
|
@upxe, indeed, all the items that you mentioned will have to be added, but I haven't looked into what the best approach would be to steer it towards the final implementation, given all the implications that it may have with snapshotting and other areas such as isolation between microVMs. This is why, as a first step, we should start working towards merging it into a feature branch. Related to your first two bullets, preferably we would like to have both Also, I would also test it through the Python framework, not only using rust unit tests. The snapshotting feature would also have to be tested through Python tests when macvtap snapshots are restored. |
Hi @upxe! Do you still plan on contributing to this PR? |
I do; the day job intervened in December. Might get a chance to look at it this weekend. btw, I had (thought that I'd) put it onto a feature branch on 28 Nov so was a little confused by @gc-plp's comment about "merging towards a feature branch" - if I haven't interpreted that correctly then please let me know. Thanks. |
@upxe Been going through this and this PR is a great addition to FC, thank you so much. It happens to be functionality that is a blocker on a concept we are looking at. |
I'm looking at Jailer integration and there's a snag: macvtap interfaces are (AFAICT - shall investigate further) accessed via per-interface device nodes in /dev, rather than via /dev/net/tun. These individual device nodes aren't in the jailed firecracker's view of /dev, and the Jailer can't create then (like it does for /dev/kvm and /dev/net/tun) because it doesn't know which interfaces the VM will use. Options that I can think of (unless I can somehow make the interfaces accessible via /dev/net/tun):
Do the maintainers have any views on those? I'm minded to prefer option 2, unless the host may have lots of interfaces, in which case 1. |
Hi Upxe,
could you drop me a private email? I have a few questions about the various
implementation ideas.
Kind regards
Charles
…On Sat, 23 Jan 2021 at 12:38, upxe ***@***.***> wrote:
I'm looking at Jailer integration and there's a snag: macvtap interfaces
are (AFAICT - shall investigate further) accessed via per-interface device
nodes in /dev, rather than via /dev/net/tun. These individual device nodes
aren't in the jailed firecracker's view of /dev, and the Jailer can't
create then (like it does for /dev/kvm and /dev/net/tun) because it doesn't
know which interfaces the VM will use.
Options that I can think of (unless I can somehow make the interfaces
accessible via /dev/net/tun):
1.
Put a read-only sysfs mount into the chroot. Simplest and most
flexible, but exposes rather more of the host to the VM than is ideal.
2.
Jailer finds all macvtap interfaces (via sysfs) and creates device
nodes in the chroot for all such interfaces that the firecracker uid/gid
would be able to open. Potentially exposes unnecessary interfaces to the
VM, but can be avoided by assigning a unique UID/GID to each VM, which is
probably a good idea anyway. The macvtap interfaces must exist before
running the Jailer.
3.
Make a jailer user declare via command-line args the names of any
macvtap interfaces that the VM will use, so that the Jailer can create the
relevant device nodes. Puts minimal device nodes into the chroot, at the
expense of some managerial/orchestration complexity.
Do the maintainers have any views on those? I'm minded to prefer option 2.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#2217 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAA6C3Z4U6FI2CJXLM632FTS3KRLPANCNFSM4S5T274Q>
.
|
@upxe I think that the question to ask here would be, what happens if a sandbox escape occurs and an attacker is able to access the jailer environment? Which of the three options would ensure that the attacker doesn't disturb a neighboring VM? From this angle, option 3 seems to be ensure a minimal exposure of host devices inside the jailer. As you mentioned, there is some additional complexity passed on to the orchestrator, but a piece of extra logic would still have to be added in order to connect a microVM through a macvtap interface (e.g. creating the actual interface before launching the jailer). What's your opinion here? |
Hi @upxe |
I'm happy to go with a command line argument in the jailer to have it create some additional device nodes. The last remaining issue for now (haven't considered save/restore yet) is that I'm currently using sysfs to map interface names to device nodes, and sysfs isn't available in the jailed root. But if we're creating device nodes in the jailer then we just name them appropriately and have firecracker look for them if /sys isn't mounted. If the host has a macvtap interface called foo that the user wants to use in a VM then we end up with:
Incidentally, we can distinguish tap from mavctap via the ethtool GDRVINFO ioctl, but AFAICT we'd still need sysfs to figure out the major/minor numbers for the device node. |
e421ba8
to
bb07771
Compare
src/jailer/src/env.rs
Outdated
// Since namespaces are shared by default when creating a new process using fork or clone, | ||
// unshare() is used to disassociate the current process from the default shared namespace. | ||
SyscallReturnCode(unsafe { libc::unshare(libc::CLONE_NEWNS) }) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would explicitly add in the comment that this unshares the mount namespace.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
src/jailer/src/env.rs
Outdated
.into_empty_result() | ||
.map_err(Error::MountSysfs)?; | ||
|
||
// Unmount the default sysfs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
// Unmount the default sysfs. | |
// Unmount the current sysfs since it's describing the previous namespace. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
Utilities to detect macvtap network interfaces. We're about to need them in both jailer and firecracker, so move from the latter into a separate module. Needs unit tests Signed-off-by: Ametros <github@ametros.net>
Firecracker network interfaces use /dev/net/tun to create tap interfaces. On the host these are typically bridged, using the standard Linux bridge implementation. Linux macvtap interfaces are similar to bridges but simpler: the kernel doesn't snoop frames to maintain a FIB to map MAC addresses to interfaces, because the full topology is known to the kernel. However, macvtap interfaces are exposed to userland as /dev/tap%d device nodes, which Firecracker doesn't support. This commit adds support by checking whether the `host_dev_name` looks like a path name, and opening the device node if it does. Tested thus: # Create a dummy interface to host the macvtap interfaces, plus two # macvtap interfaces in bridge mode sudo ip link add dummy0 type dummy sudo ip link add link dummy0 name vtap0 type macvtap mode bridge sudo ip link add link dummy0 name vtap1 type macvtap mode bridge sudo ip link set dummy0 up sudo ip link set vtap0 up sudo ip link set vtap1 up sudo chmod 666 /dev/tap* # For proof-of-concept purposes only # Configure two VMs thus; use ip link to get tap d { ... { "network-interfaces" [ { ... "host_dev_name: "/dev/tapX", "guest_mac": "<see ip link output>", } ] } ... } You must use the correct MAC address in the VM, i.e. the MAC address that the host kernel has assigned to the tap interface. To create a Tap instance given an interface name X, see whether /sys/devices/virtual/net/<X>/macvtap is a directory. If so then the interface is a macvtap interface, so look in that directory for a device name and open /dev/<that name>. Otherwise use the original open_named() code to open a tap interface via /dev/net/tun. Signed-off-by: Ametros <github@ametros.net>
macvtap interfaces are opened by device node, and (AFAICT) there's no way to discover the device major/minor numbers without sysfs, which isn't mounted in the jailer chroot. This commit adds a --macvtap=foo argument to the jailer; for each such argument, look up the device major/minor numbers for macvtap interface foo and create /dev/net/foo in the jailer chroot. Signed-off-by: Ametros <github@ametros.net>
Some other changes were made that make possible creating a set up for testing the macvtap support. Signed-off-by: Diana Popa <dpopa@amazon.com>
Signed-off-by: Diana Popa <dpopa@amazon.com>
Signed-off-by: Diana Popa <dpopa@amazon.com>
Signed-off-by: Diana Popa <dpopa@amazon.com>
This commit should be revised once merging to master since we want to add unit test for the functionality. Signed-off-by: Diana Popa <dpopa@amazon.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gc-plp @georgepisaltu @acatangiu @luminitavoicu I addressed your feedback, PTAL! Thanks
Rebase please - the branch conflicts with main. |
done on upxe:macvtap; don't know how to update this pull request because it's showing as merged + closed |
Despite the fact that this is marked merged + closed, it doesn't seem to be in there? Did this "die on the vine" somewhere? This seems like a lot of work, and a lot of value to be dropped. (I would like to make use of it!) What can I do to help? |
Hi @dzenc,
This PR only got merged to a feature branch, which never made it to
If you want, you can comment on #1933 with your usecase, to see if it can gain some traction again :) |
Comment added to #1933. Thanks for listening! :) |
Firecracker uses /dev/net/tun to open/create tap interfaces. On the host these are typically bridged, using the standard Linux bridge implementation.
Linux macvtap interfaces are similar to bridges but simpler: the kernel doesn't snoop frames to maintain a FIB to map MAC addresses to interfaces and doesn't need to use flooding, because the full topology is known to the kernel.
However, macvtap interfaces are exposed to userland as /dev/tap%d device nodes, which Firecracker doesn't support. This commit adds support by checking whether the
host_dev_name
in the network configuration looks like a path name, and opening the device node if it does.Tested thus:
Configure two VMs thus; look in
/dev
for device node names; useip link
to get tap MAC address etc:Start the VMs, give them IP addresses on the same subnet; they can then ping each other etc.
You must use the correct MAC address in the VM configuration, i.e. the MAC address that the host kernel has assigned to the tap interface.
Reason for This PR
macvtap (see #1933) allows VMs on a host to be networked without using the Linux bridge code; it is simpler and not prone to ARP spoofing.
The implementation in the commit proves that macvtap-based networking works; I'm after feedback please as to whether macvtap support would be considered for merging. If so then I'd look at configuring properly rather than leeching off the existing /dev/net/tun configuration code.
Alternatively, a more flexible approach would be to allow Firecracker to adopt a file descriptor as a network interface and read/write frames to it. (In other words, in this case, open /dev/tapXX as e.g. FD 3, then exec() firecracker and have it treat FD 3 as the tap interface). This would let me do other useful things on the host, but would require a jailer change to keep the relevant FD[s] open.
Description of Changes
Extend
Tap::open_named()
invirtio/net/tap.rs
to support opening/dev/tapXX
directly, rather than using/dev/net/tun
to open/create tap interfaces. This allows Firecracker to use tap interfaces create viamacvtap
on the host.rust-vmm
.License Acceptance
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
PR Checklist
I haven't looked at these as this is an RFC; if macvtap (or passing in FDs) is acceptable then I'll submit a PR that meets them.
[Reviewer TODO: Verify that these criteria are met. Request changes if not]
git commit -s
).unsafe
code is properly documented.firecracker/swagger.yaml
.CHANGELOG.md
.