
Add dynamic namespace configuration ability in each pipeline run for K8s Executor #4488

Closed
cisrudlow opened this issue Jan 30, 2024 · 5 comments · Fixed by #4792

Comments

@cisrudlow

What is needed?
We need the ability to dynamically decide in which K8s namespace the pods of a given pipeline execution (pipeline run) will be deployed.

Problem Description:
Our pipelines are customer agnostic and can be executed in the context of different clients; they are treated as a data service that is invoked through an API.
We do not want a separate copy of each pipeline per client, as deployment and maintenance would become a big problem (we are talking about thousands of clients here).
Each client has a different volume of data, so we need the flexibility of configuring allocated resources, but we also want to be able to easily attribute costs to a specific client. The simplest way to do this is through namespaces, but the ability to tag pods might also be sufficient.

We also need full computational isolation between clients (e.g., no implicitly shared memory). To achieve hard isolation, each client can have a separate K8s cluster, and solutions like Liqo can connect them into one large cluster, with each member cluster visible as a separate namespace.
Workload separation would then be based on deploying pods (and other resources, like service accounts) in separate namespaces.

For full isolation, we would also separate PVs, but that would be very complicated from the point of view of Mage's source code, so this request doesn't cover it.

Possible Solution:
The simplest solution that comes to mind: it is currently possible to configure a namespace for the K8s Executor in the metadata.yaml of a given pipeline (or even of individual steps). Since our pipelines will be triggered via the API, we could also pass the client name as a call parameter, which would be synonymous with the K8s namespace.

If metadata.yaml could be parsed before each pipeline execution and could reference parameters passed via the API, e.g., {{ kwargs[client_id] }}, the problem would be solved.
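To make the idea concrete, here is a minimal, stdlib-only sketch of what "render metadata.yaml against the run's API parameters before execution" could look like. The `{{ kwargs[...] }}` placeholder syntax is taken from the example above; the metadata snippet and the rendering function are illustrative assumptions, not Mage's actual implementation.

```python
import re

def render_metadata(text: str, kwargs: dict) -> str:
    """Replace every {{ kwargs[NAME] }} placeholder with the value of
    kwargs[NAME], leaving the rest of the YAML text untouched."""
    def substitute(match: re.Match) -> str:
        return str(kwargs[match.group(1)])
    return re.sub(r"\{\{\s*kwargs\[(\w+)\]\s*\}\}", substitute, text)

# Hypothetical metadata.yaml fragment for the K8s executor.
metadata = 'executor_config:\n  namespace: "{{ kwargs[client_id] }}"\n'

# Parameters passed with the API call that triggers the pipeline run.
rendered = render_metadata(metadata, {"client_id": "client-a"})
print(rendered)  # namespace resolves to "client-a"
```

A real implementation would more likely run the file through a full template engine (e.g., Jinja2) before parsing the YAML, but the per-run substitution step is the same.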

@tommydangerous
Member

Thank you for submitting this. We’ll add it to the roadmap and get on it ASAP.

@cisrudlow
Author

I would like to extend this ticket. We have an EMR cluster for each client, and we would like to indicate, for each run, on which cluster that client's PySpark notebooks should execute.
It would be great if these dynamic parameters could also be fetched from another service (a configuration service).
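The configuration-service idea could be sketched as a small resolver that is handed the client ID at trigger time and returns the per-run executor settings. Everything here is hypothetical: the response keys (`namespace`, `emr_cluster_id`) and the injected `fetch` callable stand in for whatever the real service would provide.

```python
from typing import Callable

def resolve_run_settings(client_id: str, fetch: Callable[[str], dict]) -> dict:
    """Look up the client's config from an external service and map it
    onto the settings a single pipeline run would use."""
    cfg = fetch(client_id)
    return {
        "namespace": cfg["namespace"],          # K8s namespace for the pods
        "emr_cluster_id": cfg["emr_cluster_id"],  # EMR cluster for PySpark
    }

# Stub standing in for an HTTP call to the configuration service.
def stub_fetch(client_id: str) -> dict:
    return {"namespace": f"ns-{client_id}", "emr_cluster_id": f"j-{client_id.upper()}"}

settings = resolve_run_settings("acme", stub_fetch)
print(settings["namespace"])  # ns-acme
```

Injecting the fetcher keeps the resolver testable offline; in production the callable would wrap the actual configuration-service API.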

@tommydangerous
Member

Thank you for sharing the details. We’ll work on this. When do you need it by?

@cisrudlow
Author

We have quite a tight roadmap, so the answer is ASAP ;) (less than a month).

@tommydangerous
Member

Okay, adding to roadmap for less than a month from now. Are you on Slack by any chance?
