[Devices] Offer support for hardware-accelerated inference in Firecracker #1179

Open
Tracked by #6
raduweiss opened this issue Jul 15, 2019 · 14 comments
Assignees
Labels
Roadmap: Tracked (items tracked on the roadmap project), Type: Enhancement (indicates new feature requests)
Projects

Comments

@raduweiss
Contributor

Doing hardware-accelerated inference in a serverless environment is a compelling use case.

However, adding straight-up GPU passthrough means that microVMs can't oversubscribe memory, and that we need to add PCI emulation to Firecracker, which comes with a lot of extra complexity and attack surface.

The first step here will be to research the options and alternatives (e.g., GPU passthrough, or something else), and figure out the path forward.

Related issues: #849, #776.

@raduweiss added the Feature: Emulation and Roadmap: Tracked labels Jul 15, 2019
@raduweiss added this to Researching in Roadmap Jul 15, 2019
@nlauchande

I am very interested in this use case.

@richardliaw

+1, very interested in this use case. Any update on this? (I understand it's still in the research phase)

@raduweiss changed the title from "Machine Learning Acceleration" to "Offer support for hardware-accelerated inference in Firecracker" Sep 18, 2020
@raduweiss changed the title from "Offer support for hardware-accelerated inference in Firecracker" to "[Devices] Offer support for hardware-accelerated inference in Firecracker" Sep 18, 2020
@zaharidichev

@raduweiss is this something that anyone is working on atm? Is it still on the roadmap?

@ananos

ananos commented Nov 18, 2020

Hi @zaharidichev,

we have some thoughts on this [1], which we shared earlier this year in the Slack workspace [2], but a chat is still pending I'm afraid. We have a rough proof-of-concept implementation on Firecracker, based on the design principles of [1], which exhibits negligible overhead for image inference (jetson-inference backend, using TensorRT, tested on an NVIDIA Jetson Nano, a generic x86_64 machine with an RTX 2060 SUPER, and another machine with a T4). We should be able to open-source the whole stack pretty soon. Feel free to drop us a line if you're interested in our early PoC.

Essentially, the idea is that we abstract away the hardware-specific operations via a slim runtime library/system that supports any kind of backend (ranging from a simple CUDA/OpenCL function to a TensorFlow operation/app). Combined with a simple virtio frontend/backend implementation, we are able to forward operations from a guest to the host/monitor, which in turn executes the actual "acceleratable" function on the hardware accelerator.
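
A minimal sketch of the forwarding idea described above, in Python. This is not vAccel's actual API: `HostRuntime`, `GuestFrontend`, the `image_classify` operation name, and the JSON wire format are all hypothetical, and a plain function call stands in for the virtio transport. It only illustrates the shape of the design: the guest names an operation, the host looks up a registered backend and runs it.

```python
import json
from typing import Callable, Dict


class HostRuntime:
    """Hypothetical host/monitor side: maps operation names to backends."""

    def __init__(self) -> None:
        self._backends: Dict[str, Callable[[dict], dict]] = {}

    def register(self, op: str, backend: Callable[[dict], dict]) -> None:
        # A real backend would wrap a CUDA/OpenCL function or a framework call.
        self._backends[op] = backend

    def handle(self, raw_request: bytes) -> bytes:
        request = json.loads(raw_request)
        backend = self._backends.get(request["op"])
        if backend is None:
            response = {"ok": False, "error": f"unsupported op: {request['op']}"}
        else:
            response = {"ok": True, "result": backend(request["args"])}
        return json.dumps(response).encode()


class GuestFrontend:
    """Hypothetical guest side: serializes a call and forwards it to the host."""

    def __init__(self, transport: Callable[[bytes], bytes]) -> None:
        # `transport` stands in for the virtio frontend/backend pair.
        self._transport = transport

    def call(self, op: str, **args) -> dict:
        raw_request = json.dumps({"op": op, "args": args}).encode()
        response = json.loads(self._transport(raw_request))
        if not response["ok"]:
            raise RuntimeError(response["error"])
        return response["result"]


def fake_classify(args: dict) -> dict:
    # Placeholder for real inference: labels an "image" by its mean pixel value.
    pixels = args["pixels"]
    mean = sum(pixels) / len(pixels)
    return {"label": "bright" if mean > 127 else "dark"}


host = HostRuntime()
host.register("image_classify", fake_classify)
guest = GuestFrontend(host.handle)

result = guest.call("image_classify", pixels=[200, 220, 240])
print(result)
```

The guest code never touches accelerator-specific APIs; swapping the backend registered under `image_classify` changes what runs on the host without changing the guest.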

Another option (if latency is not critical to you) could be to use rCUDA, which we plan to try but haven't had the time yet...

BTW, @raduweiss we should plan to have that chat [2] at some point -- give us a shout when you are available!

cheers,
Tassos

[1] https://blog.cloudkernels.net/posts/vaccel/
[2] https://firecracker-microvm.slack.com/archives/CDL3FUR8B/p1591093992140800

@raduweiss
Contributor Author

@ananos, yeah, our bad, we totally dropped the ball here. Our apologies! I'll reply directly so we can talk.

@ananos

ananos commented Dec 4, 2020

Hi @zaharidichev, all

just wanted to share our blog post about our approach on the above: https://blog.cloudkernels.net/posts/vaccel_v2/

Using nvidia-container-runtime and a Docker image we've put together, you can run the jetson-inference image classification example from a Firecracker VM. You can find more info in the above post or at https://vaccel.org. Of course, you can ping us; we will be more than happy to share how to try out vAccel on Firecracker.

cheers,
Tassos

@sandreim mentioned this issue Mar 8, 2021
@amrragab8080

Any update on the GPU support in Firecracker?

@raduweiss
Contributor Author

raduweiss commented Mar 31, 2021

We’ve been thinking about and experimenting in this space over the last few months, and we'll keep at it this year, but there’s no ETA for this feature right now. For maximum utility in a serverless platform paradigm [a], a single GPU hardware resource needs to be safely used by multiple microVMs, without trading off the other capabilities that Firecracker users like (e.g., CPU/memory oversubscription, fast snapshot-restore, or high mutation rate of the host’s microVMs). This is a pretty complex problem, and we’re still exploring our options.

As with the other larger features, as we approach what we think is a good design here, we'll post some form of RFC to get community feedback.

We’d be happy to hear of any use cases so we can factor them in – feel free to update this thread, or share them directly on our Slack [b]!

[a] https://github.com/firecracker-microvm/firecracker/blob/master/CHARTER.md
[b] firecracker-microvm Slack workspace link

@raduweiss moved this from Researching to Coming Soon in Roadmap May 7, 2021
@raduweiss moved this from Coming Soon to We're working on it in Roadmap May 7, 2021
@pdames

pdames commented May 26, 2021

Any updates? My team is interested in running Ray on Firecracker, but the current lack of GPU support would erode the value of doing so.

@raduweiss
Contributor Author

> Any updates? My team is interested in running Ray on Firecracker, but the current lack of GPU support would erode the value of doing so.

Sorry for not getting back here sooner, we were still working through our options. We've settled on implementing plain PCIe GPU passthrough, which comes at the cost of requiring microVMs to start with their full memory mapped, will probably negate the advantages of using snapshot-restore, and requires the full GPU to be attached to a microVM. These are all things we wanted to see if we could improve upon, but we didn't find a way that upholds all our tenets.
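
To make the memory trade-off concrete, here is a rough back-of-the-envelope sketch in Python. All numbers (host size, guest size, oversubscription ratio) are hypothetical and not taken from Firecracker, and the model is deliberately simplistic: it assumes that fully mapped guest memory cannot be oversubscribed at all, and it ignores the one-GPU-per-microVM limit, which would cap density further.

```python
def max_microvms(host_mem_gib: int, guest_mem_gib: int,
                 oversubscription_ratio: float = 1.0) -> int:
    """Guests that fit when total configured guest memory may reach
    oversubscription_ratio x host memory (1.0 means no oversubscription)."""
    return int(host_mem_gib * oversubscription_ratio // guest_mem_gib)


# Hypothetical host and guest sizes, for illustration only.
HOST_MEM_GIB = 256
GUEST_MEM_GIB = 4

# Without passthrough: assume the host can be oversubscribed 2x.
with_oversubscription = max_microvms(HOST_MEM_GIB, GUEST_MEM_GIB, 2.0)

# With passthrough: every guest's memory is fully mapped from the start.
fully_mapped = max_microvms(HOST_MEM_GIB, GUEST_MEM_GIB, 1.0)

print(with_oversubscription, fully_mapped)
```

Under these assumptions, passthrough halves the number of microVMs the host can carry before any GPU constraint is considered.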

We will want to get broad feedback from the community here on how to actually present this as a feature (we'll start a discussion in the following weeks). Given the trade-offs above, we will consider building a separate Firecracker mode or Firecracker variant, or something along those lines.

@zvonkok

zvonkok commented May 9, 2022

@raduweiss I am leading the enablement of GPUs and other NV accelerators on Kata Containers. I tried to use the Slack invite in the README.md, but it is invalid.

What would be the best way to get into the loop on the PCIe implementation in Firecracker? I have fixed, and am currently fixing, several issues (BAR sizes, MDEV support, ...) in Kata's PCIe (QEMU) implementation.

Would be nice if I could get hands-on with some pre-released artifacts to start testing on our side.

@raduweiss
Contributor Author

Hi @zvonkok. We've re-prioritized our roadmap, and for 2022 we're not pursuing the Firecracker PCIe implementation / GPU passthrough work anymore.

@xmarcalx moved this from We're working on it to Researching in Roadmap Jun 9, 2022
@DemiMarie

@raduweiss: what would be needed for a “good” solution? Could https://libvf.io be helpful?

@JonathanWoollett-Light added the Type: Enhancement label and removed the Feature: Emulation label Mar 23, 2023
@mmcclean-aws

mmcclean-aws commented May 27, 2023

Any plans to support Inferentia- and Trainium-based instances? They expose the accelerators to the OS via PCI, but I see PCI support is not planned for Firecracker. See docs for more details on the devices exposed.

Projects
Roadmap
Researching
Development

No branches or pull requests