Custom GPU inference image cannot auto scale across multi GPUs. #924
@rtrobin Can you help elaborate how all pods could possibly be allocated on the same GPU? Can you paste the custom inference service yaml and the errors you saw?
@yuzisun Hi, I don't know the exact code implementation. The Docker image is shipped by another team and provides a web service; the DL model is compiled from TensorFlow into a dynamic library that is called by the main program.

There are two errors in this case. First, not all pods are allocated on GPU worker nodes. Some pods land on CPU worker nodes or even the CPU master node; these pods fail to launch because no GPU card is found, and they fail to load cuDNN and other NVIDIA libraries. Second, I guess the code loads the TF model onto the default GPU device, which is card 0. I'm not sure how KFServing assigns a device to each pod. Maybe it needs to set the environment variable CUDA_VISIBLE_DEVICES to restrict the devices the pod can see.

My yaml file is shown below:

```yaml
apiVersion: serving.kubeflow.org/v1alpha2
kind: InferenceService
metadata:
  name: custom-service
spec:
  default:
    predictor:
      custom:
        container:
          image: custom-image
          resources:
            limits:
              nvidia.com/gpu: "1"
            requests:
              nvidia.com/gpu: "1"
```
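For reference, the CUDA_VISIBLE_DEVICES idea mentioned above can be sketched directly in a pod spec. This is a hypothetical illustration only: with a correctly installed NVIDIA device plugin, CUDA_VISIBLE_DEVICES is normally set per container automatically when `nvidia.com/gpu` is requested, so hard-coding it like this is usually unnecessary.

```yaml
# Hypothetical sketch: pinning a container to GPU 0 by hand.
# Normally the NVIDIA k8s device plugin sets CUDA_VISIBLE_DEVICES
# for each container that requests nvidia.com/gpu, so this only
# illustrates the mechanism; it is not a recommended fix.
apiVersion: v1
kind: Pod
metadata:
  name: pinned-gpu-pod
spec:
  containers:
    - name: predictor
      image: custom-image        # placeholder image name from the thread
      env:
        - name: CUDA_VISIBLE_DEVICES
          value: "0"             # only device 0 is visible to the process
      resources:
        limits:
          nvidia.com/gpu: "1"
```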
Your
@salanki @rtrobin in that case those fields are treated as unknown fields; what we really need here is pruning of unknown fields, which is only available in Kubernetes 1.15+.
@salanki Thanks! Stupid typo... By the way, do we have full docs of which fields can be used and how to set them? Right now only samples are provided, so I have to go through the samples to find whether any fit my need. For example, I want to define an inference service with limited GPU memory, and I can't tell from the samples whether that is possible. @yuzisun
You mean select a specific GPU type? That you have to do with:
You need to have your nodes tagged with the corresponding label. For GPU inference, I have some additional examples that might be helpful in my own repo: https://github.com/coreweave/kubernetes-cloud/tree/master/online-inference.
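As a sketch of the node-tagging approach described here — the label key and value below are assumptions (clusters commonly use an `accelerator` label, but yours may use a different convention):

```yaml
# Assumes the GPU nodes were labeled, e.g.:
#   kubectl label nodes <node-name> accelerator=nvidia-tesla-v100
apiVersion: v1
kind: Pod
metadata:
  name: gpu-type-pinned-pod
spec:
  nodeSelector:
    accelerator: nvidia-tesla-v100   # schedule only onto matching nodes
  containers:
    - name: predictor
      image: custom-image            # placeholder image name from the thread
      resources:
        limits:
          nvidia.com/gpu: "1"
```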
@yuzisun: The UX of being able to set fields that have no effect is pretty bad :/
I just found that a GPU device is occupied exclusively by an individual pod. However, with my former (wrong) configuration the device could be shared, which could give better performance in some scenarios, and the cluster could deploy more models than under the current strategy. I was wondering whether I could restrict GPU memory in the yaml file, so that pods could select a device that has enough memory left in a "shared mode". But it seems device sharing is not supported yet? Your repo seems helpful; I will check it later. Appreciate it! @salanki
NVIDIA explicitly does not support sharing the same GPU across multiple containers in the NVIDIA device plugin. NVIDIA/k8s-device-plugin#134 (comment)
That's unfortunately a pretty common issue for Kubernetes CRDs; once Kubeflow bumps the Kubernetes minimum version requirement, we should be able to use the v1 CRD to prune unknown fields.
@rtrobin The full API doc is here: https://github.com/kubeflow/kfserving/blob/master/docs/apis/README.md. For custom, it is the Kubernetes container spec.
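Since the custom predictor takes a standard Kubernetes container spec, fields such as `env` and `args` should be expressible there. This is a sketch only, assuming the same v1alpha2 layout as the yaml earlier in this thread; the environment variable and flags are hypothetical placeholders:

```yaml
apiVersion: serving.kubeflow.org/v1alpha2
kind: InferenceService
metadata:
  name: custom-service
spec:
  default:
    predictor:
      custom:
        container:
          image: custom-image
          # Any core/v1 Container field should be usable here:
          env:
            - name: MODEL_PATH       # hypothetical variable, for illustration
              value: /mnt/models
          args: ["--workers", "2"]   # hypothetical flags, for illustration
          resources:
            limits:
              nvidia.com/gpu: "1"
```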
For my case, at inference time most models won't use more than half of the GPU memory, so it would be very useful to deploy multiple models on one device. I found a few GPU-sharing device plugins, such as gpushare-device-plugin and shared-gpu-nvidia-k8s-device-plugin. Did you, by any chance, try this kind of plugin before? @salanki
Thanks!
@rtrobin Instead of using a low-level device plugin, KFServing is working on a solution for cohosting multiple models in the same container; you can check out the detailed proposal linked on this issue.
I briefly worked through the discussion in the issue. Hosting multiple models in the same container is a good idea, especially for the sklearn and xgboost frameworks. As someone mentioned in the thread, Triton has a similar feature for hosting multiple models, and Triton uses multiple CUDA streams to improve GPU utilization. For TF, TensorRT, or ONNX models, Triton may already be a good option.

That said, for my specific use case this solution can't solve my problem. The inference service shipped to me is more of a custom service than a model to integrate. As the service provider, I can't ask the algorithm developers to use the same framework version; in most cases I don't even know how the service is implemented or which framework it uses. Besides the framework, there are lots of other packages involved: maybe one service uses OpenCV 4 while another uses OpenCV 3. It is hard to put two such services in the same container.
@rtrobin You can check out https://github.com/volcano-sh/volcano/blob/master/docs/user-guide/how_to_use_gpu_sharing.md; @k82cn is the expert on this.
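From the linked volcano guide, GPU-memory sharing is requested through an extended resource. The resource name and units below are taken from that guide and should be checked against your volcano version; the image name is the placeholder from this thread:

```yaml
# Sketch of volcano-style GPU memory sharing; requires the volcano
# device plugin with GPU sharing enabled on the cluster.
apiVersion: v1
kind: Pod
metadata:
  name: shared-gpu-pod
spec:
  containers:
    - name: predictor
      image: custom-image
      resources:
        limits:
          volcano.sh/gpu-memory: 1024   # MiB of GPU memory, per the guide
```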
We are building the successor to the current Python kfserver in https://github.com/SeldonIO/mlserver
Thanks for the info. The original issue was caused by a typo in the config file. I'm closing this issue now.
/kind bug
What steps did you take and what happened:
I have a custom image which offers a prediction API. When I try the auto-scaling sample, all pods attempt to allocate on the same GPU, and most pods fail to launch and trigger restarts.
What did you expect to happen:
Custom inference pods can auto scale across multiple GPUs, like the tf/pytorch pre-built predictors.
Environment:
- Kubernetes version (use `kubectl version`):
- OS (e.g. from `/etc/os-release`):