Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Core feature] Add PodTemplate configuration to flytekit task definitions #3123

Closed
2 tasks done
hamersaw opened this issue Nov 30, 2022 · 21 comments
Closed
2 tasks done
Assignees
Labels
enhancement New feature or request flytekit FlyteKit Python related issue flytepropeller
Milestone

Comments

@hamersaw
Copy link
Contributor

hamersaw commented Nov 30, 2022

Motivation: Why do you think this is important?

Many users require more complex k8s configuration within the Flyte ecosystem. This may include specifying task-specific tolerations, volume mounts, etc. Currently this is unsupported within flytekit, where the API favors ergonomics in abstracting away the complexities of k8s configuration from end-users. However, some use-cases require this complex configuration and striking a balance is very difficult.

Goal: What should the final outcome look like, ideally?

Users should be able to configure every aspect of the k8s Pod that Flyte creates without bloating the task definition.

One solution is to add a PodTemplate configuration to every flytekit task definition. This would work similarly to the default PodTemplate scheme where this PodTemplate definition serves as the base in creating the k8s Pod. By default this will be empty and therefore not applied, but if it is specified it will serve as the base for Pod configuration. This API could look something like:

@task(
    request=Resources(),
    pod_template=V1PodTemplate{
        spec=V1PodSpec{
            tolerations=[
                V1Toleration{
                    key= "num-gpus"
                    operator= "Equal"
                    value= 1
                    effect= "NoSchedule"
                },
            ]
        }
    }
)

Describe alternatives you've considered

Currently, proposals often include one-off configuration updates that solve a specific use-case but are not applicable in generalized terms. This approach was used in the k8s plugin and has proved unmaintainable in that every option requires code changes which may change between k8s versions.

Propose: Link/Inline OR Additional context

No response

Are you sure this issue hasn't been raised already?

  • Yes

Have you read the Code of Conduct?

  • Yes
@hamersaw hamersaw added enhancement New feature or request untriaged This issues has not yet been looked at by the Maintainers labels Nov 30, 2022
@fg91
Copy link
Member

fg91 commented Nov 30, 2022

I really like this proposal since it would solve a problem I currently face:

Our ML engineers would like to have the ability to switch between T4, A100, and V100 GPUs. The resource name is the same for every GPU type on GKE. Instead, GKE adds a label to the nodes. One could use a Pod task configured with a node selector. However, this only works as a replacement for Python tasks whereas we need this to work for Pytorch tasks.

I had the idea to expose not the pod template but the name of the pod template in the task decorator:

@task(
    request=Resources(),
    pod_template_name="v100_pod_template",
)

I feel this would align well with Flyte's current approach where

the API favors ergonomics in abstracting away the complexities of k8s configuration from end-users.

Non ops-savvy ML engineers might not like the UX of having to specify a complete pod template. As the MLOps engineer (or a K8s savvy ML engineer) who runs the platform for them, however, I don't mind maintaining a set of pod templates for them that are managed with the rest of the infrastructure as IaC.

This would also bloat the task definition less.

@hamersaw hamersaw added flytekit FlyteKit Python related issue flytepropeller and removed untriaged This issues has not yet been looked at by the Maintainers labels Nov 30, 2022
@flixr
Copy link
Contributor

flixr commented Dec 1, 2022

I need the same (especially runtimeClassName and affinity), but mostly for ContainerTask (see #3055).
Volumes and mounts from secrets I worked around using the pod spec template.
Also already discussed this with @eapolinario

@fg91
Copy link
Member

fg91 commented Dec 2, 2022

About exposing pod template or pod template name in the task decorator: of course only one would be enough. Exposing only the name bloats the decorator less in my opinion but in case the other solution is preferred by others, I'm just as happy about this one!

@flixr
Copy link
Contributor

flixr commented Dec 2, 2022

I vote for exposing the pod spec directly, otherwise it always needs an admin to create an appropriate pod template in the cluster first and the users cannot easily inspect those.
We e.g. want to use affinity to select nodes with GPUs larger than certain amount of GPU memory (nodes are tagged using the nvidia device plugin with GPU feature discovery, so we can easily match on those per task).

@hamersaw
Copy link
Contributor Author

We had a chance to discuss this at some length today and decided to move forward supporting both approaches. So the configuration value can be either a PodTemplate or a PodTemplate name. A few things to highlight on this:

  1. The PodTemplate will be applied rather than the default PodTemplate but in the same way, namely serving as the base for Pod configuration. In some scenarios this will require some duplication of configuration, but we feel it provides the best balance of flexibility and understandability.
  2. This can replace the current Pod Task implementation. We will begin by making an update in flytekit to convert the existing Pod definition into a PodTemplate to unify the APIs. This change will be transparent to users, so the Pod Task will continue working identically but we can unify the APIs in the backend. Moving forward, since all Pod configuration will also be available in the Python Task / Container Task these should be preferred for future implementation.

@fg91
Copy link
Member

fg91 commented Dec 14, 2022

  1. The PodTemplate will be applied rather than the default PodTemplate but in the same way, namely serving as the base for Pod configuration. In some scenarios this will require some duplication of configuration

This is perfect, thanks 🙏

Since in this PR we apply the Pod template to kf operator task pods, will the pod template exposed in the task decorator also work for those task types since it will be a replacement for the default pod template? (Our ML engineers cannot choose the GPU type for Pytorch tasks currently since they cannot configure a node selector. For Python tasks we work around this with Pod tasks.)

@hamersaw
Copy link
Contributor Author

hamersaw commented Dec 14, 2022

Since in this PR we apply the Pod template to kf operator task pods, will the pod template exposed in the task decorator also work for those task types since it will be a replacement for the default pod template? (Our ML engineers cannot choose the GPU type for Pytorch tasks currently since they cannot configure a node selector. For Python tasks we work around this with Pod tasks.)

Absolutely, I have not had a chance yet to figure out how this will work in the backend, but presumably we can overload the LoadOrDefault with some task metadata (containing the PodTemplate from flytekit) which will then return the correct PodTemplate. So hopefully this doesn't even require changes anywhere else.

@flixr
Copy link
Contributor

flixr commented Dec 23, 2022

@hamersaw any updates here?

@hamersaw
Copy link
Contributor Author

hamersaw commented Jan 3, 2023

@flixr thanks for checking in on this! Sorry for slow response, as you might imagine many of us took some time to unwind over the holidays. We just discussed our Q1 plan and this is a very high priority item. I'm expecting to address this in the next few weeks.

@flixr
Copy link
Contributor

flixr commented Jan 4, 2023

👍 let me know if I can help out with testing or so...

@xshen8888
Copy link

Is this issue related to #3241 ?

@davidmirror-ops
Copy link
Contributor

By exposing the complete PodSpec, user would have control over (for example) the container name for a particular task?

By reading the docs I understand that the container name in the PodTemplate will be overridden by Flyte.

This is in regards to @stephen37's question here

@kumare3
Copy link
Contributor

kumare3 commented Jan 23, 2023

@xshen8888 this is indeed related and will address your problem. Though passing in the CLI is still hard. I have an eventual goal of making the entire configuration externally configurable at pyflyte run time - maybe some brave soul can help :D

@ByronHsu
Copy link
Contributor

I will work on the flytekit part. Thank @wild-endeavor for offering this chance.

@cosmicBboy
Copy link
Contributor

@eapolinario to write "proto-docs" for this feature in flytesnacks somewhere

@cosmicBboy
Copy link
Contributor

@cosmicBboy to make issues for Spark, Ray, Dask issues

@lynxoid
Copy link

lynxoid commented Mar 27, 2023

Do ShellTasks now support pod_template?

@hamersaw
Copy link
Contributor Author

hamersaw commented Mar 27, 2023

Do ShellTasks now support pod_template?

@lynxoid I do not believe so, but adding support should be a trivial add to flytekit since shell tasks are basically a lighweight wrapper. cc @wild-endeavor @eapolinario can you confirm this? If so, it would be a great contribution!

@ByronHsu
Copy link
Contributor

@hamersaw ShellTasks inherits from PythonInstanceTask, and then PythonAutoContainerTask. pod_template is on PythonAutoContainerTask's constructor, so users should be able to pass pod_template to ShellTask to use. Correct me if I am wrong

@eapolinario
Copy link
Contributor

This feature has been integrated into the SDK and backend. To simplify support going forward, let's get bugs and feature requests as separate github issues.

@hamersaw
Copy link
Contributor Author

hamersaw commented Aug 7, 2023

This issue is not yet fully complete. Waiting on final integration into the dask plugin

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request flytekit FlyteKit Python related issue flytepropeller
Projects
None yet
Development

No branches or pull requests