allow setting Kubernetes tolerations in KFP pipelines #2823

Closed
thesuperzapper opened this issue Jul 7, 2022 · 3 comments · Fixed by #2848
Labels: component:pipeline-editor (pipeline editor), component:pipeline-runtime (issues related to pipeline runtimes, e.g. Kubeflow Pipelines), kind:enhancement (new feature or request)

Comments

@thesuperzapper
Member

Related to #2681

Background

It is common practice in Kubernetes to apply a taint to Nodes with accelerator hardware (like GPUs), so that resources are not wasted by Pods which do not require the accelerator.

Currently, there is no way to specify a Pod toleration in Elyra Kubeflow pipelines. This makes running Elyra pipelines with GPUs effectively impossible on a well-designed Kubernetes cluster.

Solution

We can resolve this by allowing users to specify a list of "tolerations" in a similar way to our "volume mounts" (see PR #2799).

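We only need to allow the user to set key, operator, value, and effect. tolerationSeconds can be left out: it only applies to NoExecute taints and controls how long a Pod stays bound after the taint is added, so it is a cluster-management concern rather than a scheduling one.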
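
To make the proposed shape concrete, here is a minimal sketch of a four-field toleration with basic validation. The `Toleration` class, its method names, and the validation rules are hypothetical illustrations, not Elyra's actual implementation; the allowed operator and effect values are the ones Kubernetes accepts.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Values accepted by Kubernetes for a toleration's operator and effect.
VALID_OPERATORS = {"Equal", "Exists"}
VALID_EFFECTS = {"NoSchedule", "PreferNoSchedule", "NoExecute"}


@dataclass(frozen=True)
class Toleration:
    """Hypothetical container for the four fields proposed in this issue."""

    key: Optional[str] = None
    operator: str = "Equal"
    value: Optional[str] = None
    effect: Optional[str] = None  # unset means "match any effect"

    def __post_init__(self) -> None:
        if self.operator not in VALID_OPERATORS:
            raise ValueError(f"operator must be one of {sorted(VALID_OPERATORS)}")
        if self.effect and self.effect not in VALID_EFFECTS:
            raise ValueError(f"effect must be one of {sorted(VALID_EFFECTS)}")
        if self.operator == "Exists" and self.value:
            raise ValueError("value must be empty when operator is 'Exists'")
        if not self.key and self.operator != "Exists":
            raise ValueError("an empty key requires operator 'Exists'")

    def to_dict(self) -> Dict[str, str]:
        """Return a Pod-spec style mapping, omitting unset fields."""
        fields = {
            "key": self.key,
            "operator": self.operator,
            "value": self.value,
            "effect": self.effect,
        }
        return {name: val for name, val in fields.items() if val}
```

Omitting tolerationSeconds keeps the structure flat, which is what makes a simple list-style property (like volume mounts) feasible.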


Examples

This toleration will match taints of key=my-key and value=my-value with effect=NoSchedule:

- key: "my-key"
  operator: "Equal"
  value: "my-value"
  effect: "NoSchedule"

This toleration will match taints of key=my-key and value=my-value with ANY effect:

- key: "my-key"
  operator: "Equal"
  value: "my-value"

This toleration will match taints of key=my-key and ANY value with ANY effect:

- key: "my-key"
  operator: "Exists"
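
The matching behaviour described in these three examples can be sketched as a small function. This is an illustration written for this thread, not code from Kubernetes or Elyra; `tolerates` and the plain-dict representation are assumptions made for the example.

```python
from typing import Mapping


def tolerates(toleration: Mapping[str, str], taint: Mapping[str, str]) -> bool:
    """Return True if the toleration matches the taint."""
    # An unset effect on the toleration matches every taint effect.
    effect = toleration.get("effect")
    if effect and effect != taint.get("effect"):
        return False

    # An unset key (only valid with operator Exists) matches every key.
    key = toleration.get("key")
    if key and key != taint.get("key"):
        return False

    operator = toleration.get("operator", "Equal")
    if operator == "Exists":
        return True  # any value is tolerated
    if operator == "Equal":
        return toleration.get("value", "") == taint.get("value", "")
    return False  # unknown operator


# The three tolerations from the examples above.
exact = {"key": "my-key", "operator": "Equal", "value": "my-value", "effect": "NoSchedule"}
any_effect = {"key": "my-key", "operator": "Equal", "value": "my-value"}
any_value = {"key": "my-key", "operator": "Exists"}

taint = {"key": "my-key", "value": "my-value", "effect": "NoExecute"}
print(tolerates(exact, taint))       # False: effect differs
print(tolerates(any_effect, taint))  # True
print(tolerates(any_value, taint))   # True
```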

@thesuperzapper added the kind:enhancement label on Jul 7, 2022
@ptitzler added the component:pipeline-editor and component:pipeline-runtime labels and removed the status:Needs Triage label on Jul 7, 2022
@ptitzler added this to the 3.11.0 milestone on Jul 11, 2022

@ptitzler
Member

Thanks for providing the detailed examples and resource links! They made it easy to investigate what it would take to support this, and it looks like a straightforward implementation. Since user-friendly GUI support depends on #2780, we might want to aim for a two-phase delivery if that prerequisite is not ready in time:

  • phase 1 (3.11.0):
    • tolerations can be declared as a pipeline default property and as a node property (generic nodes and custom nodes)
    • zero or more tolerations can be declared (similar to volumes, environment variables, ...)
    • each toleration must be specified as a single string, e.g. key:operator:value:effect, consistent with what is currently required for volumes, environment variables, ...
    • limited support for validation
  • phase 2:

If possible, we'll try to avoid a two-phase delivery, as that would require pipeline migration.
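
As a rough illustration of the single-string form, the sketch below parses and re-encodes a key:operator:value:effect string. The function names are hypothetical and this is not Elyra's parser; it assumes that keys and values never contain ":" and that the operator is always given explicitly.

```python
from typing import Dict, Mapping

# Field order of the encoded form: key:operator:value:effect
FIELDS = ("key", "operator", "value", "effect")


def parse_toleration(encoded: str) -> Dict[str, str]:
    """Split an encoded toleration string into its non-empty fields."""
    parts = [part.strip() for part in encoded.split(":")]
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected key:operator:value:effect, got {encoded!r}")
    toleration = {name: part for name, part in zip(FIELDS, parts) if part}
    # Limited validation only: require one of the two Kubernetes operators.
    if toleration.get("operator") not in ("Equal", "Exists"):
        raise ValueError("operator must be 'Equal' or 'Exists'")
    return toleration


def encode_toleration(toleration: Mapping[str, str]) -> str:
    """Inverse of parse_toleration; unset fields become empty segments."""
    return ":".join(toleration.get(name, "") for name in FIELDS)


print(parse_toleration("my-key:Equal:my-value:NoSchedule"))
print(parse_toleration("my-key:Exists::"))
```

Empty segments stand for omitted fields, so "my-key:Exists::" corresponds to a toleration that matches any value and any effect.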

@kevin-bates
Member

> If possible, we'll try to avoid a two-phase delivery, as that would require pipeline migration.

If these are persisted in the "encoded" format (e.g. key:operator:value:effect) in both phases, would there be a need for migration?

@ptitzler
Member

> > If possible, we'll try to avoid a two-phase delivery, as that would require pipeline migration.
>
> If these are persisted in the "encoded" format (e.g. key:operator:value:effect) in both phases, would there be a need for migration?

You are right! No need in that case.
