Device Passthrough (Most notably, GPU) #108

Open
qrpike opened this issue May 20, 2016 · 19 comments

@qrpike qrpike commented May 20, 2016

I know it's currently not possible, but is device passthrough something that could realistically be supported, and if so, roughly when?

The main question is about the GPU, so that the Linux VM can spin up/down machine-learning containers.

Thanks,

@xez xez (Collaborator) commented May 27, 2016

Probably not at the PCIe level. I don't know of any common paravirtualized interfaces for GPGPU; is there one?

@brainstorm brainstorm commented Jul 4, 2016

A pity that PCI passthrough didn't carry over from bhyve to xhyve... as I gather from the docs, that was a deliberate design choice, as stated in xhyve's README.md:

(...) xhyve is equivalent to the bhyve process but gains a subset of a userspace port of the vmm kernel module. SVM, PCI passthrough and the VMX host and EPT aspects are dropped.

Which means @NVIDIA cannot implement GPU support for Docker on OSX on top of it.

@bms bms commented Jul 4, 2016

Look at XenServer's design for this.

@brainstorm brainstorm commented Jul 4, 2016

@bms ... not sure what you mean by that. My goal is to be able to run GPU applications with Docker on OSX (which runs on top of xhyve). GPU + Docker on Linux is already well supported by nvidia-docker, since there's no xhyve in the picture there:

https://github.com/NVIDIA/nvidia-docker

So I'm not sure how Xen fits in the picture here... care to explain?

@brainstorm brainstorm mentioned this issue Jul 6, 2016

@bms bms commented Jul 17, 2016

My point was that Xen (specifically Citrix XenServer) already has a mature architecture for GPU virtualization, which -- correct me if I'm wrong -- is not a feature in either BHyve or XHyve yet. How Docker encapsulates a GPU virtualization approach, I have no idea.

@pmj pmj commented Nov 28, 2016

(Stumbled across this as I'm investigating bhyve/xhyve for a project.)

I don't have any personal experience with it, but XenServer's vGPU stuff is fully Nvidia-specific. I don't know if the hypervisor/Dom0 (host) side of it is open at all.

You can do pure (non-mediated) PCI(e) passthrough with bhyve on FreeBSD, and likewise with Xen and KVM/Qemu on Linux. This works via a kernel driver which claims the device on the host (vfio on Linux), plus programming the IOMMU so the device's DMA can only reach the VM's memory. Graphics card passthrough adds extra difficulty, but that's mostly at the firmware/initialisation level.
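
For the curious, the Linux side of that boils down to the vfio UAPI. A trimmed sketch of the dance a VMM does, with no error handling; the IOMMU group number 26 and the device address 0000:01:00.0 are placeholders:

```cpp
// Sketch: claiming a PCI device via vfio and programming the IOMMU so the
// device can only DMA into the VM's memory. Assumes the device has already
// been bound to vfio-pci and sits in IOMMU group 26 (illustrative values).
#include <fcntl.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <linux/vfio.h>
#include <cstdint>
#include <cstdio>

int main() {
    // One container == one IOMMU address space shared by the VM's devices.
    int container = open("/dev/vfio/vfio", O_RDWR);
    // The group is the hardware isolation unit the IOMMU gives us.
    int group = open("/dev/vfio/26", O_RDWR);

    vfio_group_status status = {};
    status.argsz = sizeof(status);
    ioctl(group, VFIO_GROUP_GET_STATUS, &status);
    if (!(status.flags & VFIO_GROUP_FLAGS_VIABLE))
        return 1;  // another device in the group is still bound to a host driver

    ioctl(group, VFIO_GROUP_SET_CONTAINER, &container);
    ioctl(container, VFIO_SET_IOMMU, VFIO_TYPE1_IOMMU);

    // Back 1 GiB of "guest RAM" with ordinary anonymous memory and map it
    // into the IOMMU at guest-physical address 0. The device can now DMA
    // into exactly this region and nothing else on the host.
    size_t ram_size = 1ULL << 30;
    void *ram = mmap(nullptr, ram_size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    vfio_iommu_type1_dma_map dma_map = {};
    dma_map.argsz = sizeof(dma_map);
    dma_map.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE;
    dma_map.vaddr = (uintptr_t)ram;  // host virtual address backing guest RAM
    dma_map.iova  = 0;               // guest-physical address seen by the device
    dma_map.size  = ram_size;
    ioctl(container, VFIO_IOMMU_MAP_DMA, &dma_map);

    // Finally get a file descriptor for the device itself; MMIO BARs, config
    // space and interrupts are then driven through this fd.
    int device = ioctl(group, VFIO_GROUP_GET_DEVICE_FD, "0000:01:00.0");
    printf("vfio device fd: %d\n", device);
    return 0;
}
```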

For basic PCIe passthrough on OSX/MacOS hosts, I guess the first place to look would be Apple's VT-d driver, which is loaded by default on Ivy Bridge and newer Macs as far as I'm aware. This controls the IOMMU.

I've not dealt with this directly beyond writing (PCIe) device drivers for OSX, where DMAs need to select whether they want to use IOMMU address translation or not, but from this I have a sneaking suspicion that Apple just puts all devices in one IOMMU group and that's it. That approach wouldn't be compatible with isolating a selection of devices for assigning to a VM.

I certainly don't see an API there at first glance that a passthrough host driver might be able to call. So implementing this could well require extending Apple's VT-d driver, which will probably require the expertise of someone who understands VT-d really, really well. (Or gaining that expertise; the official documentation is very daunting, however.)

Note also that if you're going to pass through one of your Mac's GPUs, the passthrough driver will need to claim it during early boot and make it completely unavailable to the host OS's graphics drivers, as WindowServer currently does not support any kind of hot-enabling/hot-disabling of IOFramebuffer devices.

@Manouchehri Manouchehri commented Apr 16, 2017

@pmj Apple uses VT-d domains. Loukas/snare's BruCON 0x06 Thunderbolt talk covered it. (Note: snare works for Apple now and this research was done way back in 2014, so some of the security concerns aren't applicable anymore.)

https://www.youtube.com/watch?v=epeZYO9qFbs&feature=youtu.be&t=2068

https://developer.apple.com/library/content/documentation/HardwareDrivers/Conceptual/ThunderboltDevGuide/DebuggingThunderboltDrivers/DebuggingThunderboltDrivers.html

@3XX0 3XX0 mentioned this issue Apr 17, 2017

@RockNHawk RockNHawk commented Aug 19, 2017

+1

@westover westover commented Oct 1, 2017

+1

@pmj pmj commented Oct 1, 2017

@Manouchehri Looks like you might be right. Skimming through the VT-d driver source some more, it appears that a new space is created for each mapper, that most PCI devices get their own mapper, and that kexts can explicitly ask for that mapper.

What's still not obvious is how a particular IOMemoryDescriptor/IODMACommand gets connected to a device's specific mapper in the usual case of using the "system" mapper, which is what most device drivers do. You'll notice the documentation you linked refers through to this doc, which goes into code specifics. Neither of the key calls, IODMACommand::prepare and IODMACommand::gen64IOVMSegments, references the device, so it's not clear how the system works out which device you're going to hand those DMA addresses to.

The other question is how well all of this works together with Hypervisor.framework's VM memory mappings. I'd have to look into it in much more detail, but it does look doable. I doubt I'll ever get around to it in my spare time, though.
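
For reference, the pattern I mean looks roughly like this in a kext. The IOKit names are real, but the surrounding driver function is invented for illustration:

```cpp
// Rough sketch of the usual kext-side DMA pattern: the driver never names an
// IOMMU unit explicitly, it just runs its buffer through IODMACommand and
// gets back IOVM (IOMMU-translated) addresses to program into the device.
#include <IOKit/IODMACommand.h>
#include <IOKit/IOMemoryDescriptor.h>

static void emitDeviceAddresses(IOMemoryDescriptor *buffer)
{
    // kMapped requests IOMMU (VT-d) translation; with no explicit IOMapper
    // argument, the "system" mapper is used, which is the common case.
    IODMACommand *cmd = IODMACommand::withSpecification(
        kIODMACommandOutputHost64,   // segment output format
        64,                          // device supports 64-bit addresses
        0,                           // no maximum segment size
        IODMACommand::kMapped);
    if (cmd == NULL)
        return;

    cmd->setMemoryDescriptor(buffer, false);
    cmd->prepare();                  // wires the memory, sets up the mapping

    UInt64 offset = 0;
    while (offset < buffer->getLength()) {
        IODMACommand::Segment64 segs[8];
        UInt32 numSegs = 8;
        if (cmd->gen64IOVMSegments(&offset, segs, &numSegs) != kIOReturnSuccess)
            break;
        // segs[i].fIOVMAddr is what gets written into the device's descriptor
        // rings. Note that nothing in this call chain identifies the device,
        // which is exactly the open question above.
    }

    cmd->complete();
    cmd->clearMemoryDescriptor();
    cmd->release();
}
```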

@alexkreidler alexkreidler commented Oct 17, 2017

Is this on the roadmap at all?

@lccro lccro commented Nov 6, 2017

+1

@pmj pmj commented Nov 7, 2017

Is xhyve currently even actively maintained? Is there such a thing as a roadmap? If not - is there commercial interest in supporting/developing xhyve further?

@rickard-von-essen rickard-von-essen (Contributor) commented Nov 7, 2017

@pmj see #124

@pmj pmj commented Jan 21, 2018

I've recently taken a deep dive on the current macOS VT-d code, and it seems that @Manouchehri's assertion is correct - each PCI device does indeed end up with its own VT-d mapper and thus domain/space. The bhyve/FreeBSD PCI passthrough code is reasonably straightforward as far as MMIO, interrupts, etc. are concerned. One thing I've yet to investigate is whether there are any dragons lurking in getting the entire VM-physical to host-physical mapping table, which would need to be fed to the IOMMU. In theory this shouldn't be a problem, but who knows, Hypervisor.framework might expect VM memory to be host-pageable or some other assumption.
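
For context, guest RAM on the macOS side is just pageable user-space memory handed to Hypervisor.framework, roughly like the sketch below (simplified from memory, not lifted from xhyve; the 1 GiB size is arbitrary). A passthrough implementation would have to take this same uva-to-gpa relationship, wire the pages, and program it into the VT-d unit for the assigned device:

```cpp
// Minimal sketch of backing guest-physical memory via Hypervisor.framework.
// Build with -framework Hypervisor on a Mac that supports it.
#include <Hypervisor/hv.h>
#include <sys/mman.h>
#include <cstdio>

int main() {
    if (hv_vm_create(HV_VM_DEFAULT) != HV_SUCCESS)
        return 1;

    // Guest RAM is plain, pageable anonymous memory in the VMM process...
    size_t ram_size = 1ULL << 30;
    void *ram = mmap(nullptr, ram_size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (ram == MAP_FAILED)
        return 1;

    // ...mapped at guest-physical address 0 through the EPT that
    // Hypervisor.framework manages on our behalf.
    if (hv_vm_map(ram, 0, ram_size,
                  HV_MEMORY_READ | HV_MEMORY_WRITE | HV_MEMORY_EXEC) != HV_SUCCESS)
        return 1;

    printf("guest RAM: host VA %p -> guest PA 0x0 (%zu bytes)\n", ram, ram_size);
    hv_vm_destroy();
    return 0;
}
```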

Since people have mentioned GPU passthrough: the FreeBSD/bhyve documentation states that GPU passthrough is not supported, and I don't know what black magic would be required to make it work. Is anyone even interested in non-GPU passthrough?

@Manouchehri Manouchehri commented Jan 21, 2018

@pmj My use case is passing through a ConnectX-3 Pro card (40/56GbE fiber).

@lterfloth lterfloth commented Dec 29, 2018

Any information on the progress? Does it work yet? 👍

@pmj pmj commented Dec 29, 2018

Any information on the progress? Does it work yet? 👍

@lterfloth If you're asking me personally: I haven't worked on this beyond the initial research that convinced me it's possible with Apple's VT-d implementation. It's a fairly large chunk of work -- probably a few weeks for the initial burst of development, followed by who knows how much time debugging and tweaking with real devices and guest OSes/drivers for all the various edge cases. That's certainly more than I have spare time for (or can blow off paid contracts for), so at least for me personally it's not happening unless someone sponsors it or I end up needing the feature desperately enough to invest in it myself. I obviously can't speak for anyone else!

@marcj marcj commented Jul 18, 2019

unless someone sponsors it

@pmj, so how much do you need? :)
