Kata2.0 + Containerd using gpu in K8S 1.7 #246

Open
han2ni3bal opened this issue Dec 29, 2021 · 7 comments
Labels
question Requires an answer

Comments

@han2ni3bal

Before raising this question

My cluster originally used Docker as the container runtime. I switched some of the nodes to containerd and now run Kata 2.0 on those nodes. In this setup, starting containers with nvidia-docker 2.0 appears to be impossible, because it requires the runtime to be set to runc, whereas we need to set it to Kata. nvidia-docker 1.0 does not support k8s 1.8 either. How can I get a Kata 2.0 container to use an NVIDIA GPU normally?

@jodh-intel
Contributor

@han2ni3bal - You are correct that you cannot use docker with Kata 2.x (see kata-containers/kata-containers#3417).

However, have you tried going through the steps in:

... and replacing:

$ docker run --device /dev/vfio/...

... with:

$ ctr run --device /dev/vfio/...
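
A minimal sketch of how the substitution above might look in practice. It only prints a `ctr` command line rather than running it. The runtime name `io.containerd.kata.v2`, the image, the container ID, and the helper names `iommu_group_of` and `kata_ctr_cmd` are assumptions for illustration and do not come from this thread; the `/dev/vfio/<group>` path is derived from the device's `iommu_group` link in sysfs.

```shell
#!/bin/sh
# Sketch only: builds and prints a ctr command line, it does not start a container.
# Assumed (not from the thread): runtime name, image, container ID, helper names.

# Resolve the IOMMU group number of a PCI device from sysfs.
# $1 = PCI address (for example 0000:01:00.0), $2 = sysfs root (default /sys)
iommu_group_of() {
    link="${2:-/sys}/bus/pci/devices/$1/iommu_group"
    [ -L "$link" ] || return 1          # no IOMMU group link: IOMMU off or wrong address
    basename "$(readlink "$link")"      # link target ends in the group number
}

# Print a ctr invocation that passes /dev/vfio/<group> to the container.
# $1 = IOMMU group, $2 = image, $3 = container ID, $4 = runtime (assumed default)
kata_ctr_cmd() {
    printf 'ctr run --runtime %s --device /dev/vfio/%s --rm -t %s %s sh\n' \
        "${4:-io.containerd.kata.v2}" "$1" "$2" "$3"
}
```

For example, `kata_ctr_cmd "$(iommu_group_of 0000:01:00.0)" docker.io/library/busybox:latest gpu-test` prints a command for a placeholder device address and image. Check the output against your own runtime configuration before running it.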

@han2ni3bal
Author

@jodh-intel Thank you very much for your kind help. We are going to try this approach to use GPU pass-through mode with Kata 2.0. My concern now is how to combine Kata 2.0 with a K8s device plugin in our cluster. One option is to base it on the KubeVirt device plugin, because KubeVirt and Kata both use VFIO with an IOMMU. However, after checking the code of the KubeVirt device plugin, I found that it discovers GPU devices through the VFIO-PCI driver, but Kata does not use this. So I need to figure out how to register the GPU devices used by Kata 2.0 with the cluster. Do you have any suggestions? Thank you very much!
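
For reference, the discovery step described above (finding devices by their binding to the vfio-pci driver) can be sketched against sysfs. This is an illustrative sketch, not the KubeVirt plugin's actual code: the function name `list_vfio_devices` and its output format are made up here, and it only lists candidates without registering anything with the kubelet.

```shell
#!/bin/sh
# Sketch only: lists PCI devices currently bound to vfio-pci.
# Output: one line per device, "<pci-address> <vendor-id> <iommu-group>".
# $1 = sysfs root (default /sys); overridable so it can be tried on a fake tree.
list_vfio_devices() {
    root="${1:-/sys}"
    # Device entries are symlinks named by PCI address, which contains colons;
    # control files such as bind/unbind/new_id do not match this glob.
    for dev in "$root"/bus/pci/drivers/vfio-pci/*:*; do
        [ -e "$dev/vendor" ] || continue    # also skips an unmatched glob
        addr=$(basename "$dev")
        vendor=$(cat "$dev/vendor")         # 0x10de is NVIDIA's PCI vendor ID
        group=$(basename "$(readlink "$dev/iommu_group")")
        echo "$addr $vendor $group"
    done
}
```

Filtering the output for vendor `0x10de` would give the NVIDIA devices, and the third column is the group number that appears in `/dev/vfio/<group>`.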

@fighterhit

fighterhit commented Jan 13, 2022

> @jodh-intel Thank you very much for your kind help. We are going to try this approach to use GPU pass-through mode with Kata 2.0. My concern now is how to combine Kata 2.0 with a K8s device plugin in our cluster. One option is to base it on the KubeVirt device plugin, because KubeVirt and Kata both use VFIO with an IOMMU. However, after checking the code of the KubeVirt device plugin, I found that it discovers GPU devices through the VFIO-PCI driver, but Kata does not use this. So I need to figure out how to register the GPU devices used by Kata 2.0 with the cluster. Do you have any suggestions? Thank you very much!

I have the same question about how to combine Kata 2.0 with a K8s device plugin in our cluster. Does the NVIDIA/k8s-device-plugin no longer work in this environment, meaning we need to develop a device plugin for Kata 2.x? What about Kata 1.x?

@jodh-intel
Contributor

Does anyone in the community with GPU devices have any comments on this?

Maybe @Jimmy-Xu or @flx42 have thoughts on this?

/cc @egernst, @dgibson.

@han2ni3bal
Author

@fighterhit I think we can use the KubeVirt device plugin with Kata 2.0, but I still need to test it.

@fighterhit

fighterhit commented Jan 25, 2022

> @fighterhit I think we can use the KubeVirt device plugin with Kata 2.0, but I still need to test it.

Thank you for the advice, @han2ni3bal. If you make any progress on this, please let me know if you don't mind. 😀

@fighterhit

> @fighterhit I think we can use the KubeVirt device plugin with Kata 2.0, but I still need to test it.

Hi @han2ni3bal, have you tested the KubeVirt GPU device plugin? When I use the latest version of the device plugin, the following error is reported:

failed to create containerd task: failed to create shim: QMP command failed: The device is not writable: Permission denied: unknown 

My kata is v2.3.2 and containerd is v1.5.9.
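
The "Permission denied" in the error above may point at access rights on the VFIO device node, although that is an assumption and not a confirmed diagnosis. A small sketch of a check that could help rule it out follows; the function name `check_vfio_node` and its three result strings are made up for illustration. It tests access for the user running the script, which may differ from the user the hypervisor process runs as.

```shell
#!/bin/sh
# Sketch only: report whether a VFIO node exists and is readable and writable
# by the current user. $1 = path such as /dev/vfio/<group>.
# Prints "missing", "no-access", or "ok".
check_vfio_node() {
    if [ ! -e "$1" ]; then
        echo missing
    elif [ -r "$1" ] && [ -w "$1" ]; then
        echo ok
    else
        echo no-access
    fi
}
```

Running it against the group node that the device plugin hands to the container, together with `ls -l /dev/vfio/`, shows the owner and mode of each node.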
