Add ability to schedule inference service pods to certain nodes #730

Closed

pradithya opened this issue Mar 12, 2020 · 15 comments

@pradithya
Member

/kind feature

Describe the solution you'd like
Allow users to specify node selector, affinity, anti-affinity, and tolerations so that they can control which nodes the inference service pods are scheduled on.
The implementation should also allow the predictor and transformer to have different configurations (related to #666).

It could look something like this:

apiVersion: serving.kubeflow.org/v1alpha2
kind: InferenceService
metadata:
  name: transformer-cifar10
spec:
  default:
    predictor:
      pytorch:
        modelClassName: Net
        storageUri: gs://kfserving-samples/models/pytorch/cifar10
      nodeSelector:
        disktype: ssd
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: kubernetes.io/e2e-az-name
                operator: In
                values:
                - e2e-az1
                - e2e-az2
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 1
            preference:
              matchExpressions:
              - key: another-node-label-key
                operator: In
                values:
                - another-node-label-value
      tolerations:
      - key: "key"
        operator: "Exists"
        effect: "NoSchedule"
    transformer:
      custom:
        container:
          image: gcr.io/kubeflow-ci/kfserving/image-transformer:latest
          name: user-container

Anything else you would like to add:
Knative Serving has this limitation (knative/serving#1816). Alternatively, it could be implemented in the webhook.

@issue-label-bot

Issue-Label Bot is automatically applying the label: feature (probability 0.99)

@yuzisun
Member

yuzisun commented Mar 13, 2020

This is probably best implemented in your own pod or deployment mutating webhook, IMO.
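
For context, such a webhook would intercept pod creation and patch in the scheduling fields. A minimal sketch of the registration, assuming a hypothetical in-cluster Service named node-placement-webhook that serves the mutation logic (the names, path, and label selector below are illustrative, not part of KFServing):

apiVersion: admissionregistration.k8s.io/v1
kind: MutatingWebhookConfiguration
metadata:
  name: node-placement  # hypothetical
webhooks:
  - name: node-placement.example.com  # hypothetical
    admissionReviewVersions: ["v1"]
    sideEffects: None
    clientConfig:
      service:
        name: node-placement-webhook  # hypothetical Service that patches nodeSelector/tolerations into the pod
        namespace: default
        path: /mutate
    rules:
      - apiGroups: [""]
        apiVersions: ["v1"]
        operations: ["CREATE"]
        resources: ["pods"]
    # Scope the mutation to inference service pods only; the exact label to
    # match depends on what KFServing stamps on pods in your version.
    objectSelector:
      matchLabels:
        serving.kubeflow.org/inferenceservice: transformer-cifar10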

@pradithya
Member Author

@yuzisun Can you explain more? Would this live inside KFServing or outside of it?

@jlewi jlewi removed the feature label Mar 20, 2020
@salanki
Contributor

salanki commented Mar 28, 2020

@yuzisun: I don't think it's reasonable to expect every user to implement a custom admission controller just to add affinities. It also makes more sense to define node requirements as part of the InferenceService itself rather than in custom code in a separate admission controller.

@yuzisun
Member

yuzisun commented Mar 28, 2020

@salanki KFServing has been trying to strike the right balance between keeping the InferenceService API data scientist/developer friendly and leaking operator-level fields into it, and this use case definitely hits that tension. We can discuss the best way to implement this if you can join our WG meeting.

@ellistarn
Contributor

Completely agree with this request.

We actually have this exact same problem with Knative, which made similar restrictions on its API before finally exposing the PodTemplateSpec instead of a ContainerTemplateSpec.

This is a big API discussion and I think we need to solve it before GA. It would be great to enable the full power of the PodTemplateSpec while simultaneously empowering users with framework specs.

One option would be to use variant interfaces to effectively extend the PodTemplateSpec with some of our magic. We will need to discuss this in the working group.

@salanki
Contributor

salanki commented Mar 30, 2020

Happy to join a WG meeting and also write up some use cases / examples if it is helpful!

@dalfos

dalfos commented Jun 9, 2020

Hi. Any news on this subject?

@ellistarn
Contributor

We are addressing this in our new v1beta1 API. Stay tuned.

@yuzisun
Member

yuzisun commented Jul 20, 2020

@salanki @pradithya @dalfos Looks like Knative has lifted the restriction now: https://github.com/knative/serving/pull/8645/files.

@issue-label-bot

Issue-Label Bot is automatically applying the label: area/inference (probability 0.68)

@yuzisun yuzisun closed this as completed Feb 21, 2021
@daganida88

Can you please provide an example?
When I did this:

spec:
  predictor:
    nodeSelector:
      v100: "true"
    triton:
      name: kfserving-container
      ports:
        - containerPort: 9000
          name: h2c
          protocol: TCP
      resources:
        limits:
          cpu: "1"
          memory: 8Gi
        requests:
          cpu: "1"
          memory: 8Gi
      runtimeVersion: 20.10-py3
      storageUri: gs://kfserving-poc/models/

It did not work. I didn't see the node selector in the Knative Service, Configuration, or Revision.

@dalfos

dalfos commented Aug 4, 2021

@daganida88 did you enable the feature in the config-features ConfigMap?

kubectl -n knative-serving get cm config-features -o yaml

apiVersion: v1
data:
  kubernetes.podspec-nodeselector: enabled
kind: ConfigMap
metadata:
...

@ddelange
Contributor

ddelange commented Jun 3, 2022

confirming that running

kubectl patch configmap config-features -n knative-serving -p '{"data": {"kubernetes.podspec-nodeselector": "enabled", "kubernetes.podspec-tolerations": "enabled", "kubernetes.podspec-affinity": "enabled"}}'

allows these fields to be specified in the InferenceService spec (a Deployment gets created with the expected scheduling configuration).
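
Once those three flags are enabled, the scheduling fields can sit directly on the predictor (or transformer) in the v1beta1 spec. A minimal sketch, assuming a hypothetical sklearn predictor; the service name, storage URI, node labels, and taint key are placeholders:

apiVersion: serving.kubeflow.org/v1beta1
kind: InferenceService
metadata:
  name: sklearn-placed  # placeholder name
spec:
  predictor:
    # Placeholder scheduling constraints; substitute your own labels/taints.
    nodeSelector:
      disktype: ssd
    tolerations:
      - key: "nvidia.com/gpu"
        operator: "Exists"
        effect: "NoSchedule"
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
            - matchExpressions:
                - key: kubernetes.io/arch
                  operator: In
                  values:
                    - amd64
    sklearn:
      storageUri: gs://kfserving-samples/models/sklearn/iris  # placeholder model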
