Add ability to schedule inference service pods to certain nodes #730

Closed

pradithya opened this issue Mar 12, 2020 · 15 comments

@pradithya
Member

/kind feature

Describe the solution you'd like
Allow users to specify node selector, affinity, anti-affinity, and tolerations so that they can control which nodes the inference service pods are scheduled on.
The implementation should also allow the predictor and transformer to have different configurations (related to #666).

It could look something like this:

apiVersion: serving.kubeflow.org/v1alpha2
kind: InferenceService
metadata:
  name: transformer-cifar10
spec:
  default:
    predictor:
      pytorch:
        modelClassName: Net
        storageUri: gs://kfserving-samples/models/pytorch/cifar10
      nodeSelector:
        disktype: ssd
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: kubernetes.io/e2e-az-name
                operator: In
                values:
                - e2e-az1
                - e2e-az2
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 1
            preference:
              matchExpressions:
              - key: another-node-label-key
                operator: In
                values:
                - another-node-label-value
      tolerations:
      - key: "key"
        operator: "Exists"
        effect: "NoSchedule"
    transformer:
      custom:
        container:
          image: gcr.io/kubeflow-ci/kfserving/image-transformer:latest
          name: user-container

Anything else you would like to add:
Knative Serving has this limitation (knative/serving#1816). Alternatively, it could be implemented in the webhook.

@issue-label-bot

Issue-Label Bot is automatically applying the label: feature (probability 0.99)

@yuzisun
Member

yuzisun commented Mar 13, 2020

This is probably best implemented in your own pod or deployment mutating webhook, IMO.
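
For context, such a webhook would intercept pod creation and patch in the scheduling fields. A minimal sketch of the registration, assuming a hypothetical in-cluster Service named node-placement-webhook that serves the mutation logic (the names, path, and label selector below are illustrative, not part of KFServing):

apiVersion: admissionregistration.k8s.io/v1
kind: MutatingWebhookConfiguration
metadata:
  name: node-placement  # hypothetical
webhooks:
  - name: node-placement.example.com  # hypothetical
    admissionReviewVersions: ["v1"]
    sideEffects: None
    clientConfig:
      service:
        name: node-placement-webhook  # hypothetical Service that patches nodeSelector/tolerations into the pod
        namespace: default
        path: /mutate
    rules:
      - apiGroups: [""]
        apiVersions: ["v1"]
        operations: ["CREATE"]
        resources: ["pods"]
    # Scope the mutation to inference service pods only; the exact label to
    # match depends on what KFServing stamps on pods in your version.
    objectSelector:
      matchLabels:
        serving.kubeflow.org/inferenceservice: transformer-cifar10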

@pradithya
Member Author

@yuzisun Can you explain more? Would this live inside KFServing or outside of it?

@jlewi jlewi removed the feature label Mar 20, 2020
@salanki
Contributor

salanki commented Mar 28, 2020

@yuzisun: I don't think it's reasonable to expect every user to implement a custom admission controller just to add affinities. It also makes more sense to define node requirements as part of the InferenceService itself rather than in custom code in a separate admission controller.

@yuzisun
Member

yuzisun commented Mar 28, 2020

@salanki KFServing has been trying to strike the right balance between keeping the InferenceService API data scientist/developer friendly and leaking operator-level fields into it, and this use case definitely hits that tension. We can discuss the best way to implement this if you can join our WG meeting.

@ellistarn
Contributor

Completely agree with this request.

We actually have this exact same problem with Knative, which made similar restrictions on its API before finally exposing the PodTemplateSpec instead of a ContainerTemplateSpec.

This is a big API discussion and I think we need to solve it before GA. It would be great to enable the full power of the PodTemplateSpec while simultaneously empowering users with framework specs.

One option would be to use variant interfaces to effectively extend the PodTemplateSpec with some of our magic. We will need to discuss this in the working group.

@salanki
Contributor

salanki commented Mar 30, 2020

Happy to join a WG meeting and also write up some use cases / examples if it is helpful!

@dalfos

dalfos commented Jun 9, 2020

Hi. Any news on this subject?

@ellistarn
Contributor

We are addressing this in our new v1beta1 API. Stay tuned.

@yuzisun
Member

yuzisun commented Jul 20, 2020

@salanki @pradithya @dalfos Looks like Knative has lifted the restriction now: https://github.com/knative/serving/pull/8645/files.

@issue-label-bot

Issue-Label Bot is automatically applying the label: area/inference (probability 0.68)

@yuzisun yuzisun closed this as completed Feb 21, 2021
@daganida88

Can you please provide an example?
When I did this:

spec:
  predictor:
    nodeSelector:
      v100: "true"
    triton:
      name: kfserving-container
      ports:
        - containerPort: 9000
          name: h2c
          protocol: TCP
      resources:
        limits:
          cpu: "1"
          memory: 8Gi
        requests:
          cpu: "1"
          memory: 8Gi
      runtimeVersion: 20.10-py3
      storageUri: gs://kfserving-poc/models/

It did not work. I didn't see the node selector in the Knative Service, Configuration, or Revision.

@dalfos

dalfos commented Aug 4, 2021

@daganida88 did you enable the feature in the config-features ConfigMap?

kubectl -n knative-serving get cm config-features -o yaml

apiVersion: v1
data:
  kubernetes.podspec-nodeselector: enabled
kind: ConfigMap
metadata:
...

@ddelange
Contributor

ddelange commented Jun 3, 2022

confirming that running

kubectl patch configmap config-features -n knative-serving -p '{"data": {"kubernetes.podspec-nodeselector": "enabled", "kubernetes.podspec-tolerations": "enabled", "kubernetes.podspec-affinity": "enabled"}}'

allows these fields to be specified in the InferenceService spec (a Deployment gets created with the expected scheduling configuration).
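
Once those three flags are enabled, the scheduling fields can sit directly on the predictor (or transformer) in the v1beta1 spec. A minimal sketch, assuming a hypothetical sklearn predictor; the service name, storage URI, node labels, and taint key are placeholders:

apiVersion: serving.kubeflow.org/v1beta1
kind: InferenceService
metadata:
  name: sklearn-placed  # placeholder name
spec:
  predictor:
    # Placeholder scheduling constraints; substitute your own labels/taints.
    nodeSelector:
      disktype: ssd
    tolerations:
      - key: "nvidia.com/gpu"
        operator: "Exists"
        effect: "NoSchedule"
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
            - matchExpressions:
                - key: kubernetes.io/arch
                  operator: In
                  values:
                    - amd64
    sklearn:
      storageUri: gs://kfserving-samples/models/sklearn/iris  # placeholder model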
