
Enable GPU Memory as resource requirement for InferenceService #947

Open

Svendegroote91 opened this issue Jul 14, 2020 · 8 comments

@Svendegroote91

Svendegroote91 commented Jul 14, 2020

/kind feature

Describe the solution you'd like
Would it be possible to add GPU memory as a resource requirement, similar to the GPU count?
For example:

apiVersion: "serving.kubeflow.org/v1alpha2"
kind: "InferenceService"
metadata:
  name: "flowers-sample-gpu"
spec:
  default:
    predictor:
      tensorflow:
        storageUri: "gs://kfserving-samples/models/tensorflow/flowers"
        runtimeVersion: "1.13.0-gpu"
        resources:
          limits:
            aliyun.com/gpu-mem: 3

Is it technically already possible to try this if you have the GPUshare scheduler extender installed on your cluster?

I noticed that they added something related in the Arena repo - kubeflow/arena#211

@issue-label-bot

Issue-Label Bot is automatically applying the labels:

Label            Probability
area/inference   0.74

@issue-label-bot

Issue Label Bot is not confident enough to auto-label this issue.
See dashboard for more details.

@yuzisun
Member

yuzisun commented Jul 14, 2020

@Svendegroote91 technically the inference service spec already allows aliyun.com/gpu-mem: 3, since resource limits is a map. Would you like to try it out and let us know if it works out of the box?

@Svendegroote91
Author

@yuzisun OK, I can give it a try and let you know.

If resource limits is a map, I can see that it will work from a scheduling perspective.
However, I wonder whether the memory constraint will be correctly translated into the --per_process_gpu_memory_fraction command-line argument for TensorFlow Serving (see the similar PR kubeflow/arena#211). If not, I would be happy to help with this. Any thoughts?
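
To make this concrete, here is roughly the translation I have in mind, sketched as a custom predictor that could be deployed today (the image tag, the 0.3 fraction, and the flag wiring are illustrative assumptions on my side, not something KFServing currently generates):

apiVersion: "serving.kubeflow.org/v1alpha2"
kind: "InferenceService"
metadata:
  name: "flowers-sample-gpu-mem"
spec:
  default:
    predictor:
      custom:
        container:
          image: "tensorflow/serving:1.13.0-gpu"
          args:
            # Illustrative only: a limit of aliyun.com/gpu-mem: 3 on a
            # 10 GiB card would correspond to roughly a 0.3 fraction.
            - "--per_process_gpu_memory_fraction=0.3"
            - "--model_name=flowers-sample"
            - "--model_base_path=gs://kfserving-samples/models/tensorflow/flowers"
          resources:
            limits:
              aliyun.com/gpu-mem: 3

The feature request is essentially that the controller derives the fraction from the gpu-mem limit automatically instead of it being hard-coded like this.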

@yuzisun
Member

yuzisun commented Jul 14, 2020

@Svendegroote91 Looks like we will need that argument; feel free to send a PR for it.

@Svendegroote91
Author

Svendegroote91 commented Jul 20, 2020

@yuzisun I tested the setup, but it doesn't work out of the box; a few additional steps will likely need to be implemented.

I'll see if I can file a PR for this somewhere soon.

@yuzisun
Member

yuzisun commented Aug 1, 2020

@Svendegroote91 Sorry for the late reply. KFServing does support GPU images for TFServing; you can specify runtimeVersion: 1.14-gpu, for example. Our v1beta1 API should actually allow you to specify container fields on prepackaged model servers like TFServing. It is worth noting that you can currently achieve all of the above with a KFServing custom predictor.
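
For example, a GPU-backed TFServing predictor would look roughly like this (a sketch only: whole-GPU scheduling via the standard nvidia.com/gpu resource, reusing the model URI from the first example in this issue; the exact runtime tag may vary):

apiVersion: "serving.kubeflow.org/v1beta1"
kind: "InferenceService"
metadata:
  name: "flowers-sample-gpu"
spec:
  predictor:
    tensorflow:
      storageUri: "gs://kfserving-samples/models/tensorflow/flowers"
      runtimeVersion: "1.14.0-gpu"
      resources:
        limits:
          # whole-GPU scheduling through the standard NVIDIA device plugin;
          # GPU-memory sharing would still rely on an extender like GPUshare
          nvidia.com/gpu: 1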

@631068264

631068264 commented Feb 7, 2023

I use Kubeflow 1.6.1 + k8s-gpushare-schd-extender:1.11-d170d8a (the newest version):

apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "firesmoke"
spec:
  predictor:
    containers:
      - name: kserve-container
        image: harbor.xxx.cn/library/model/firesmoke:v1
        env:
          - name: MODEL_NAME
            value: firesmoke
        resources:
          limits:
            aliyun.com/gpu-mem: 1

The error:

Status:
  Components:
    Predictor:
      Latest Created Revision:  firesmoke-predictor-default-00001
  Conditions:
    Last Transition Time:  2023-02-07T16:30:17Z
    Message:               Revision "firesmoke-predictor-default-00001" failed with message: binding rejected: failed bind with extender at URL http://127.0.0.1:32766/gpushare-scheduler/bind, code 500.
    Reason:                RevisionFailed
    Severity:              Info
    Status:                False
    Type:                  PredictorConfigurationReady
    Last Transition Time:  2023-02-07T16:30:17Z
    Message:               Configuration "firesmoke-predictor-default" does not have any ready Revision.
    Reason:                RevisionMissing
    Status:                False
    Type:                  PredictorReady
    Last Transition Time:  2023-02-07T16:30:17Z
    Message:               Configuration "firesmoke-predictor-default" does not have any ready Revision.
    Reason:                RevisionMissing
    Severity:              Info
    Status:                False
    Type:                  PredictorRouteReady
    Last Transition Time:  2023-02-07T16:30:17Z
    Message:               Configuration "firesmoke-predictor-default" does not have any ready Revision.
    Reason:                RevisionMissing
    Status:                False
    Type:                  Ready
Events:                    <none>
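
A first debugging step for the 500 from the extender's /bind endpoint would be to check the extender's logs and the node's advertised resource (the deployment name and namespace below assume the gpushare-scheduler-extender repo's default install):

  # inspect the scheduler extender's logs around the failed bind
  kubectl logs -n kube-system deploy/gpushare-schd-extender
  # confirm the node actually advertises the shared-GPU resource
  kubectl describe node <node-name> | grep aliyun.com/gpu-mem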

Is there any other GPU-share option for Kubeflow? @yuzisun
