Add dynamic namespace configuration ability in each pipeline run for K8s Executor #4488
Comments
Thank you for submitting this. We'll add it to the roadmap and get on it ASAP.
I would like to extend this ticket. We have an EMR cluster for each client, and we would like to indicate, for each run, on which cluster that client's PySpark notebooks should execute.
Thank you for sharing the details. We'll work on this. When do you need it by?
We have quite a tight roadmap, so the answer is ASAP ;) (less than a month).
Okay, adding it to the roadmap for less than a month from now. Are you on Slack by any chance?
What is needed?
We need the ability to dynamically decide in which K8s namespace the pods of a given pipeline execution (pipeline run) will be deployed.
Problem Description:
Our pipelines can be executed in the context of different clients (they are customer-agnostic) - they are treated as data services invoked through an API.
We do not want to have a separate copy of the pipeline for each client, as we would have a big problem with deployment and maintenance (we are talking about thousands of clients here).
Each client has a different volume of data, so we need the flexibility to configure allocated resources per run. We also want to easily allocate costs to a specific client - the simplest way to do this is through namespaces, though the ability to tag pods might also be sufficient.
We also need full computational isolation between clients (e.g., no shared memory). To achieve hard isolation, we can give each client a separate K8s cluster and use solutions like Liqo to join them into one large cluster, in which each member cluster appears as a separate namespace.
Workload separation will take place by deploying pods (and other resources, like service accounts) in separate namespaces.
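To illustrate the separation model described above, here is a minimal sketch (pure Python, no cluster required; the client IDs, label keys, and image name are illustrative assumptions, not Mage's actual schema) of a pod manifest whose namespace and labels identify the client, which is what enables both workload separation and per-client cost attribution:

```python
# Sketch: build a pod manifest whose namespace and labels identify the
# client. All names ("client", "acme-corp", the image) are illustrative.
def build_pod_manifest(client_id: str, image: str) -> dict:
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "name": f"pipeline-run-{client_id}",
            # Deploying into a per-client namespace gives workload
            # separation; the label enables cost allocation per client.
            "namespace": client_id,
            "labels": {"client": client_id},
        },
        "spec": {
            "containers": [{"name": "block", "image": image}],
        },
    }

manifest = build_pod_manifest("acme-corp", "mageai/mageai:latest")
print(manifest["metadata"]["namespace"])  # acme-corp
```

Cost tools that aggregate by namespace or by label would then attribute every pod of a run to the client that triggered it.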
For full isolation, we would also separate PVs, but that would be very complicated from the point of view of Mage's source code, so this request doesn't cover it.
Possible Solution:
The simplest solution that comes to mind: currently, it is possible to configure a namespace for the K8s Executor in the metadata.yaml of a given pipeline (or even of individual steps). Since our pipelines will be triggered via the API, we can also pass the client name as a call parameter, which would be synonymous with the K8s namespace.
If metadata.yaml could be parsed before each pipeline execution and could reference parameters passed through the API, e.g., {{ kwargs['client_id'] }}, the problem would be solved.
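The proposed behavior can be sketched as follows. This is not Mage's actual templating implementation; the metadata.yaml fragment and the regex-based renderer are a minimal stand-in, assuming a Jinja-style `{{ kwargs['...'] }}` placeholder resolved against the parameters of the API trigger:

```python
import re

# Hypothetical metadata.yaml fragment with a templated namespace,
# as proposed in this issue (field names are illustrative).
metadata_yaml = """
executor_type: k8s
executor_config:
  namespace: "{{ kwargs['client_id'] }}"
"""

def render_namespace(template: str, kwargs: dict) -> str:
    # Minimal stand-in for Jinja-style rendering: replace
    # {{ kwargs['<key>'] }} with the value passed to the API trigger.
    def repl(match: re.Match) -> str:
        return str(kwargs[match.group(1)])
    return re.sub(r"\{\{\s*kwargs\['([^']+)'\]\s*\}\}", repl, template)

# A pipeline run triggered via the API for client "acme-corp" would
# then deploy its pods into the "acme-corp" namespace.
rendered = render_namespace(metadata_yaml, {"client_id": "acme-corp"})
print(rendered)
```

If metadata.yaml were re-rendered this way at the start of each run, a single pipeline definition could serve thousands of clients, each isolated in its own namespace.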