
Custom GPU inference image cannot auto-scale across multiple GPUs #924

Closed
rtrobin opened this issue Jul 7, 2020 · 20 comments

rtrobin commented Jul 7, 2020

/kind bug

What steps did you take and what happened:
I have a custom image that serves a prediction API. When I run the auto-scaling sample, all pods attempt to allocate the same GPU, and most pods fail to launch and keep restarting.

What did you expect to happen:
Custom inference pods should auto-scale across multiple GPUs, like the pre-built TF/PyTorch predictors.

Environment:

  • Istio Version:
  • Knative Version:
  • KFServing Version: 0.3
  • Kubeflow version:
  • Kfdef:[k8s_istio/istio_dex/gcp_basic_auth/gcp_iap/aws/aws_cognito/ibm]
  • Minikube version:
  • Kubernetes version: (use kubectl version):
  • OS (e.g. from /etc/os-release):
issue-label-bot commented

Issue-Label Bot is automatically applying the labels:

Label: area/inference (probability 0.53)

Please mark this comment with 👍 or 👎 to give our bot feedback!


yuzisun (Member) commented Jul 7, 2020

@rtrobin Can you elaborate on how all pods could possibly be allocated to the same GPU? Can you paste the custom InferenceService yaml and the errors you saw?

rtrobin (Author) commented Jul 7, 2020

@yuzisun Hi, I don't know the exact code implementation. The Docker image is shipped by another team and provides a web service. The DL model is compiled from TensorFlow into a dynamic library that is called by the main program.

There are two errors in this case. First, not all pods are scheduled onto GPU worker nodes; some land on CPU worker nodes or even the CPU master node, and those pods fail to launch because no GPU card is found and cuDNN and other NVIDIA libraries fail to load. Second, I guess the code loads the TF model onto the default GPU device, which is card 0. I'm not sure how KFServing assigns each pod to a device; maybe it needs to set the environment variable CUDA_VISIBLE_DEVICES to restrict the devices the pod can see.
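A hypothetical sketch of setting such a variable on the custom container (note that when the nvidia.com/gpu limit is placed correctly under container, the NVIDIA device plugin exposes only the allocated GPU to each container, so a manual override like this is normally unnecessary):

      custom:
        container:
          image: custom-image
          env:
            - name: CUDA_VISIBLE_DEVICES   # hypothetical manual override; usually not needed
              value: "0"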

My yaml file is shown below.

apiVersion: serving.kubeflow.org/v1alpha2
kind: InferenceService
metadata:
  name: custom-service
spec:
  default:
    predictor:
      custom:
        container:
          image: custom-image
        resources:
          limits:
            nvidia.com/gpu: "1"
          requests:
            nvidia.com/gpu: "1"

salanki (Contributor) commented Jul 7, 2020

Your resources block is at the wrong indentation level; it should be under container. I am surprised KFServing allowed this, tbh. That is why it's not getting scheduled in the right places and can see all GPUs.
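For reference, a minimal sketch of the corrected spec from above, with resources moved under container and nothing else changed:

apiVersion: serving.kubeflow.org/v1alpha2
kind: InferenceService
metadata:
  name: custom-service
spec:
  default:
    predictor:
      custom:
        container:
          image: custom-image
          resources:
            limits:
              nvidia.com/gpu: "1"
            requests:
              nvidia.com/gpu: "1"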

yuzisun (Member) commented Jul 8, 2020

@salanki @rtrobin In that case those fields are treated as unknown fields. What we really need here is pruning of unknown fields, which is only available in Kubernetes 1.15+.

rtrobin (Author) commented Jul 8, 2020

@salanki Thanks! Stupid typo...

Btw, do we have full docs of which fields can be used and how to set them? Currently only samples are provided, so I have to go through them to find whether any fits my need. For example, I want to define an inference service with limited GPU memory, and I can't tell from the samples whether that is possible. @yuzisun

salanki (Contributor) commented Jul 8, 2020

You mean select a specific GPU type? That you have to do with:

metadata:
  annotations:
    serving.kubeflow.org/gke-accelerator: Tesla_V100

You need to have your nodes tagged with the gke-accelerator label. You are welcome to hit us up on #kfserving on the Kubeflow slack as well for some more interactive discussions.

For GPU inference, I have some additional examples that might be helpful in my own repo: https://github.com/coreweave/kubernetes-cloud/tree/master/online-inference.
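For context, a minimal sketch of how this annotation fits into the InferenceService from earlier in this thread (assuming the GPU nodes are already labeled with the matching accelerator type, as noted above):

apiVersion: serving.kubeflow.org/v1alpha2
kind: InferenceService
metadata:
  name: custom-service
  annotations:
    serving.kubeflow.org/gke-accelerator: Tesla_V100
spec:
  default:
    predictor:
      custom:
        container:
          image: custom-image
          resources:
            limits:
              nvidia.com/gpu: "1"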

salanki (Contributor) commented Jul 8, 2020

@yuzisun: The UX of being able to set fields that take no effect is pretty bad :/

rtrobin (Author) commented Jul 8, 2020

I just realized that a GPU device is occupied exclusively by an individual pod. However, with my earlier (wrong) configuration the device could be shared, which could give better performance in some scenarios, and the cluster could deploy more models than with the current strategy. I was wondering whether GPU memory could be restricted in the yaml file, so that pods could pick a device that still has enough memory free in a "shared" mode. But it seems such a device-sharing mode is not supported yet?

Your repo looks helpful; I will check it out later. Appreciated! @salanki

salanki (Contributor) commented Jul 8, 2020

NVIDIA explicitly does not support sharing the same GPU across multiple containers in the NVIDIA device plugin. NVIDIA/k8s-device-plugin#134 (comment)

yuzisun (Member) commented Jul 8, 2020

> @yuzisun: The UX of being able to set fields that take no effect is pretty bad :/

That's unfortunately a pretty common issue for Kubernetes CRDs; once Kubeflow bumps the Kubernetes minimum requirement we should be able to use the v1 CRD to prune unknown fields.
kubernetes-sigs/kubebuilder#1174
GoogleCloudPlatform/flink-on-k8s-operator#85

yuzisun (Member) commented Jul 8, 2020

> @salanki Thanks! Stupid typo...
>
> Btw, do we have full docs of which fields can be used and how to set them? Currently only samples are provided, so I have to go through them to find whether any fits my need. For example, I want to define an inference service with limited GPU memory, and I can't tell from the samples whether that is possible. @yuzisun

@rtrobin The full API doc is here: https://github.com/kubeflow/kfserving/blob/master/docs/apis/README.md. For custom, it is the Kubernetes container spec.
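For example, since the custom predictor container is a standard Kubernetes Container, fields such as env, ports and resources can be set on it directly. A minimal sketch (the env name and values are placeholders, and which Container fields are honored may depend on the KFServing version):

apiVersion: serving.kubeflow.org/v1alpha2
kind: InferenceService
metadata:
  name: custom-service
spec:
  default:
    predictor:
      custom:
        container:
          image: custom-image
          env:
            - name: MODEL_PATH          # hypothetical variable read by the custom server
              value: /mnt/models
          ports:
            - containerPort: 8080
          resources:
            limits:
              nvidia.com/gpu: "1"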

rtrobin (Author) commented Jul 8, 2020

In my case, at inference time most models don't use more than half of the GPU memory, so it would be very useful to deploy multiple models on one device. I found a few GPU-sharing device plugins, such as gpushare-device-plugin and shared-gpu-nvidia-k8s-device-plugin. Did you, by any chance, try this kind of plugin before? @salanki

rtrobin (Author) commented Jul 8, 2020

> @rtrobin The full API doc is here: https://github.com/kubeflow/kfserving/blob/master/docs/apis/README.md. For custom, it is the Kubernetes container spec.

Thanks!

yuzisun (Member) commented Jul 8, 2020

@rtrobin Instead of using a low-level device plugin, KFServing is working on a solution for co-hosting multiple models in the same container; you can check out the detailed proposal linked on this issue.

rtrobin (Author) commented Jul 9, 2020

> @rtrobin Instead of using a low-level device plugin, KFServing is working on a solution for co-hosting multiple models in the same container; you can check out the detailed proposal linked on this issue.

I briefly worked through the discussion in that issue. Hosting multiple models in the same container is a good idea, especially for the sklearn and xgboost frameworks. As someone mentioned in the thread, Triton has a similar feature for hosting multiple models, and it also uses multiple CUDA streams to improve GPU utilization. For TF, TensorRT or ONNX models, Triton may already be a good option.

That said, for my specific use case this solution can't solve my problem. The inference service shipped to me is more of a custom service than a model to integrate. As the service provider, I can't ask the algorithm developers to use the same framework version; in most cases I don't even know how the service is implemented or which framework it uses. Beyond the framework, there are lots of other packages involved too: maybe one service uses OpenCV 4 while another uses OpenCV 3. It is hard to put two such services in the same container.

yuzisun (Member) commented Jul 10, 2020

@rtrobin You can check out https://github.com/volcano-sh/volcano/blob/master/docs/user-guide/how_to_use_gpu_sharing.md, @k82cn is the expert on this.

ukclivecox (Contributor) commented

We are building the successor to the current Python kfserver in https://github.com/SeldonIO/mlserver
It will support the new V2 dataplane and be a plug-in replacement for the current server for sklearn, xgboost and custom models in KFServing.
Would be happy to discuss how we can ensure this is part of the roadmap.

rtrobin (Author) commented Aug 10, 2020

> @rtrobin You can check out https://github.com/volcano-sh/volcano/blob/master/docs/user-guide/how_to_use_gpu_sharing.md, @k82cn is the expert on this.

Thanks for the info. The original issue was caused by a typo in the config file. I'm closing this issue now.
