
Add dynamic namespace configuration ability in each pipeline run for K8s Executor #4488

Closed
cisrudlow opened this issue Jan 30, 2024 · 5 comments · Fixed by #4792

Comments

@cisrudlow

What is needed?
We need the ability to dynamically decide in which K8s namespace the pods of a given pipeline execution (pipeline run) will be deployed.

Problem Description:
Our pipelines are customer agnostic and can be executed in the context of different clients; they are treated as a data service that is invoked through an API.
We do not want a separate copy of each pipeline per client, as deployment and maintenance would become a big problem (we are talking about thousands of clients here).
Each client has a different volume of data, so we need the flexibility of configuring allocated resources, but we also want to be able to easily attribute costs to a specific client. The simplest way to do this is through namespaces, but the ability to tag pods might also be sufficient.

We also need full computational isolation between clients (e.g., no implicitly shared memory). To achieve hard isolation, each client can have a separate K8s cluster, and solutions like Liqo can connect them into one large cluster, with each member cluster visible as a separate namespace.
Workload separation would then be based on deploying pods (and other resources, like service accounts) in separate namespaces.

For full isolation, we would also separate PVs, but that would be very complicated from the point of view of Mage's source code, so this request doesn't cover it.

Possible Solution:
The simplest solution that comes to mind: it is currently possible to configure a namespace for the K8s Executor in the metadata.yaml of a given pipeline (or even of individual steps). Since our pipelines will be triggered via the API, we could also pass the client name as a call parameter, which would be synonymous with the K8s namespace.

If metadata.yaml could be parsed before each pipeline execution and could reference parameters passed via the API, e.g., {{ kwargs[client_id] }}, the problem would be solved.
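To make the idea concrete, here is a minimal, stdlib-only sketch of what "render metadata.yaml against the run's API parameters before execution" could look like. The `{{ kwargs[...] }}` placeholder syntax is taken from the example above; the metadata snippet and the rendering function are illustrative assumptions, not Mage's actual implementation.

```python
import re

def render_metadata(text: str, kwargs: dict) -> str:
    """Replace every {{ kwargs[NAME] }} placeholder with the value of
    kwargs[NAME], leaving the rest of the YAML text untouched."""
    def substitute(match: re.Match) -> str:
        return str(kwargs[match.group(1)])
    return re.sub(r"\{\{\s*kwargs\[(\w+)\]\s*\}\}", substitute, text)

# Hypothetical metadata.yaml fragment for the K8s executor.
metadata = 'executor_config:\n  namespace: "{{ kwargs[client_id] }}"\n'

# Parameters passed with the API call that triggers the pipeline run.
rendered = render_metadata(metadata, {"client_id": "client-a"})
print(rendered)  # namespace resolves to "client-a"
```

A real implementation would more likely run the file through a full template engine (e.g., Jinja2) before parsing the YAML, but the per-run substitution step is the same.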

@tommydangerous
Member

Thank you for submitting this. We’ll add it to the roadmap and get on it ASAP.

@cisrudlow
Author

I would like to extend this ticket. We have an EMR cluster for each client, and we would like to indicate, for each run, on which cluster that client's PySpark notebooks should execute.
It would be great if these dynamic parameters could also be fetched from another service (a configuration service).
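The configuration-service idea could be sketched as a small resolver that is handed the client ID at trigger time and returns the per-run executor settings. Everything here is hypothetical: the response keys (`namespace`, `emr_cluster_id`) and the injected `fetch` callable stand in for whatever the real service would provide.

```python
from typing import Callable

def resolve_run_settings(client_id: str, fetch: Callable[[str], dict]) -> dict:
    """Look up the client's config from an external service and map it
    onto the settings a single pipeline run would use."""
    cfg = fetch(client_id)
    return {
        "namespace": cfg["namespace"],          # K8s namespace for the pods
        "emr_cluster_id": cfg["emr_cluster_id"],  # EMR cluster for PySpark
    }

# Stub standing in for an HTTP call to the configuration service.
def stub_fetch(client_id: str) -> dict:
    return {"namespace": f"ns-{client_id}", "emr_cluster_id": f"j-{client_id.upper()}"}

settings = resolve_run_settings("acme", stub_fetch)
print(settings["namespace"])  # ns-acme
```

Injecting the fetcher keeps the resolver testable offline; in production the callable would wrap the actual configuration-service API.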

@tommydangerous
Member

Thank you for sharing the details. We’ll work on this. When do you need it by?

@cisrudlow
Author

We have quite a tight roadmap, so the answer is ASAP ;) (less than a month).

@tommydangerous
Member

Okay, adding to roadmap for less than a month from now. Are you on Slack by any chance?
