Build a bridge between MLflow and Kubeflow #6647
The MLflow Projects specification is available here. Note that we are only interested in MLflow Projects that specify the software environment by means of a Dockerfile/Docker container. We are not interested in supporting MLflow Projects that specify their environment as a conda.yaml file.
@jagane-infinstor Great timing. The Kubeflow user survey identified that a good percentage of Kubeflow users (43%) also leverage MLflow. I believe that Kubeflow needs a Model Registry component, and we need to consider integrating an existing one or building our own. I am interested in bringing this idea to the contributors and users to see if they have opinions on requirements and timing. I believe this would be a good discussion topic for the Sept 27 Kubeflow Community Meeting.
We tried integrating it into the Kubeflow dashboard through an iframe, but it was not able to compare experiments. If we instead add it as an external URI, then we need to think about the authentication part, i.e. how we can use Kubeflow credentials to authenticate in MLflow.
#6564 needs to be merged for MLflow to work properly in an iframe.
Open source MLflow does not have any authentication built in. The commercial offerings such as Databricks, AzureML and InfinStor (I work for InfinStor) include authentication. Databricks, for example, uses a bearer token.
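For context, here is a minimal sketch of how user code in a pod could talk to such a hosted tracking server, assuming the service accepts bearer tokens via MLflow's standard MLFLOW_TRACKING_URI / MLFLOW_TRACKING_TOKEN environment variables; the URL, token, and experiment name are placeholders:

```python
import os
import mlflow

# Placeholder endpoint and token; in Kubeflow these would typically come from a Secret.
os.environ["MLFLOW_TRACKING_URI"] = "https://mlflow.example.com"
os.environ["MLFLOW_TRACKING_TOKEN"] = "<bearer-token>"

mlflow.set_experiment("kubeflow-bridge-demo")  # hypothetical experiment name
with mlflow.start_run():
    mlflow.log_param("optimizer", "adam")
    mlflow.log_metric("accuracy", 0.93)
```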
In another context, we utilized an MIT-licensed piece of software called single-spa - https://single-spa.js.org/ - to integrate the UIs of multiple disparate projects. That may be an option for us here.
Deploy MLflow on Kubernetes using the Helm chart and proxy it on a subpath of the kubeflow-gateway.
For Grafana there is something called an auth-proxy configuration, with which you can use the kubeflow-userid header to create and authenticate users in Grafana. I think that for bridging MLflow to Kubeflow we should configure something like that to keep the multi-tenant aspect of Kubeflow intact. Alternatively, we can configure it with an Istio AuthorizationPolicy.
@Madaditya @amolsr - there are two different usage models for MLflow integrated with Kubeflow. The first is the one you have outlined: using a Helm chart to create an MLflow instance within the K8s cluster. The other is when the user has an external MLflow service such as Databricks, AzureML or InfinStor. In this case, the Kubeflow components and the user code in pods created by Kubeflow components need to be able to access the external MLflow service. It is important to make this work as well.
Two deployment models for MLflow make sense: embedded MLflow vs. managed MLflow. @jbottum @jagane-infinstor maybe it's a data point / point of view for how MLflow and Kubeflow can play nicely with each other. The way we are currently deploying/integrating MLflow is the former model (as highlighted by @amolsr), and it is positioned as our experiment tracking tool of choice for the training step/stage of an end-to-end Kubeflow pipeline, primarily since MLflow has better UX around capturing and querying model stats. If one needs to serve, we have a custom KFP component that looks at the MLflow model registry artifact path and registers a KServe endpoint as well; a sketch of such a component is shown below. So key touch points between MLflow and a Kubeflow pipeline could look like:
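(Regarding the custom KFP component mentioned above, here is a rough, hypothetical sketch of what it could look like, assuming a KFP v2 lightweight component, the kserve Python SDK, and an MLflow model registered under a placeholder name; the exact V1beta1 class names and the "mlflow" model format may vary by KServe version.)

```python
from kfp import dsl

@dsl.component(base_image="python:3.10",
               packages_to_install=["mlflow", "kserve"])
def deploy_from_mlflow(model_name: str, namespace: str, tracking_uri: str):
    """Look up the newest registered model version in MLflow and expose it via KServe."""
    from mlflow.tracking import MlflowClient
    from kserve import (KServeClient, V1beta1InferenceService,
                        V1beta1InferenceServiceSpec, V1beta1PredictorSpec,
                        V1beta1ModelSpec, V1beta1ModelFormat)
    from kubernetes.client import V1ObjectMeta

    client = MlflowClient(tracking_uri=tracking_uri)
    version = client.get_latest_versions(model_name)[0]
    storage_uri = version.source  # artifact path logged by MLflow, e.g. an s3:// URI

    isvc = V1beta1InferenceService(
        api_version="serving.kserve.io/v1beta1",
        kind="InferenceService",
        metadata=V1ObjectMeta(name=model_name, namespace=namespace),
        spec=V1beta1InferenceServiceSpec(
            predictor=V1beta1PredictorSpec(
                model=V1beta1ModelSpec(
                    model_format=V1beta1ModelFormat(name="mlflow"),
                    storage_uri=storage_uri))))
    KServeClient().create(isvc)
```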
I appreciate the discussion and options. If we are going to have a Phase 1 for Kubeflow 1.7, I believe that we will need to set up a review meeting soon (perhaps Thu, Sept 29) and include folks in this thread along with others, i.e. @benjamintanweihao @thesuperzapper @DomFleischmann @kimwnasptd @james-jwu @zijianjoy @richardsliu. Before that review, I think we should raise the topic in the Tuesday, Sept 27 Community Meeting at 8am PT. I would like to gather an initial view on: 1) Is there significant user/distribution interest, especially for KF 1.7 Phase 1? 2) Do we need to support multiple architectures, and can those customizations be developed and supported effectively, perhaps semi-independently (like KFP-Argo and KFP-Tekton)? 3) Do we have teams who can sustain a strategic commitment, as I expect this is an XXL-sized feature that will take multiple releases? 4) Can a team show a 5-10 minute prototype in the Community meeting (to help show the functional vision)?
@jbottum - appreciate your setting the requirements for this. We are happy to make the presentation at the Sep 27, 2022 Community Meeting and to show a demonstration of some parts of this capability. Do you folks have a specific template for the presentation? I realize that we are probably going to be time-bound and I want to make the best use of everybody's time.
I'm really excited for this - I've been hoping for the two communities to collaborate for some time :)
I am attaching a project proposal PDF. Please note that this project has been renamed 'Concurrent' and the new website is available at https://concurrent-ai.org/
This looks really promising! I wonder how this interacts/overlaps with Kubeflow Pipelines, but in general this looks really nice and the demo was excellent. I'd love to contribute to the conversation going forward.
Thanks, Andrew. I will invite you to the 1-hour session that we discussed at the Community meeting this morning. Off the top of my head, I would say that the differences between KFP and Concurrent are the following:
There seems to be some overlap with KFP. KFP already uses Argo Workflows under the hood, which constructs K8s-native pipelines out of the box. Is there an FAQ page that illustrates the differences or relationships?
We do have an FAQ page with some comparisons to KFP here: https://docs.concurrent-ai.org/files/faq/
In summary, while there is overlap in the end goal, there are philosophical differences that result in a very different-looking component and a very different end-user profile. I believe that KFP and Concurrent can co-exist in Kubeflow and serve users of different profiles.
Hello @terrytangyuan - to speak directly to your comment regarding Argo Workflows: Concurrent is designed to use multiple Kubernetes clusters, possibly distributed across the WAN. Argo Workflows is limited to a single k8s cluster, and a single k8s cluster cannot be distributed across the WAN since etcd uses the Raft protocol for consensus, which is not suitable for use across the WAN. Concurrent's design center is multiple k8s clusters across WAN links - stepping out of the bounds of a single k8s cluster enables us to use consensus algorithms that allow us to do this.
Multi-cluster support is on our roadmap. It's the top-voted issue, and we already have a working POC. argoproj/argo-workflows#3523
@terrytangyuan thanks for that pointer. This is within a single Region, i.e. no WAN links between clusters?
@jagane-infinstor I am the annoying one from the meeting with all the security questions ;-)
I think the main requirements are:
You also need to decide whether you want to support KFP v1 and/or v2. I would start with integrating MLflow for lineage, parameters and model tracking first, so just as an alternative to the current Google ml-metadata. There should be a switch to select either MLflow or ml-metadata as the metadata backend in the cluster; KF pipelines executed via Argo should write the appropriate information to the selected endpoint only. This is where you would need to extend KFP to support MLflow too. This also implies some effort to integrate all the MLflow information into the KFP runs pages.

After this is done you might think about implementing "Concurrent" as a KFP component where you just input the MLflow specification, so this preprocessing can become a regular part of a pipeline. Please have a look at https://www.kubeflow.org/docs/components/pipelines/v2/author-a-pipeline/components/#3-custom-container-components
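To illustrate the last suggestion, here is a minimal, hypothetical sketch of wrapping an MLflow Project run as a KFP v2 custom container component (per the linked docs); the runner image, project URI, and pipeline name are placeholders, and how the project's own environment is resolved inside the step is deliberately left open:

```python
from kfp import dsl

@dsl.container_component
def run_mlflow_project(project_uri: str):
    """One pipeline step that executes an MLflow Project via the mlflow CLI."""
    return dsl.ContainerSpec(
        # Hypothetical runner image with the `mlflow` CLI preinstalled;
        # MLFLOW_TRACKING_URI would typically be injected into the pod separately.
        image="ghcr.io/example/mlflow-runner:latest",
        command=["mlflow", "run"],
        args=[project_uri],
    )

@dsl.pipeline(name="mlflow-project-as-kfp-step")
def demo_pipeline(project_uri: str = "https://github.com/example/docker-mlproject"):
    run_mlflow_project(project_uri=project_uri)
```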
Here's the recording of the deep dive meeting held this morning: |
ml-metadata already works well as a metadata backend. What are the additional benefits MLflow brings (as a metadata backend) that ml-metadata does not cover? MLflow has a lot of dependencies. I'd imagine introducing a new metadata backend would add a lot of complexity and maintenance overhead. From the meeting, you mentioned:
It seems like a new project without much adoption and traction. I am not sure if it's worth adding the complexity to the codebase. Who owns that project? Is it vendor-neutral?
I have an important excerpt from the slack channel: "I am just wondering if it would be possible for KFP to open up more the initcontainer approach discussed by

That is already the case in KFP. Instead of ugly NON-rootless FUSE, KFP uses proper PVCs and emptyDirs for S3/GCP data import and export. There is no need to change KFP stuff. FUSE needs root, so it is NOT allowed in serious enterprise environments in any kind of container, no matter whether that is an initcontainer or a sidecar. I spent a lot of time with a former Google KFP developer (@Bobgy on GitHub) to get rid of exactly such unnecessary root stuff by using a proper architecture. If you check the Kubeflow architecture you will understand that giving an initcontainer or sidecar root permissions means giving any user root permissions. And even if that were not the case, any serious company security policy will not allow this. I can only repeat myself: many other projects have done this successfully rootless and MLflow can do so too.

"The customer image for a KFP component (step in a pipeline) would be an issue. Very few DS people in our org are able to write Dockerfiles, even though we have a CI/CD pipeline for building custom images with a GitLab setup."

Customer-specific images are not an issue. Having to build them yourself might be an issue. That is what I am describing above. MLflow must be compatible with arbitrary images from the depths of the internet (Dockerhub) as long as they have python3 installed and can install Python packages as non-root. That would be the same contract that KFP currently demands. If MLflow uses something other than Python, they can just inject a binary.

"The more custom image components involved in a pipeline, the more we also need to migrate the images into an image registry provided by the cloud vendor for cloud migration later on. We will also need to migrate all the secrets for image repositories."

I think you are using the wrong term. A custom image component just means a component that uses a custom image, i.e. specifying a non-default base image in your pipeline. It does not say anything about building it or managing registries. You want custom base images in your pipeline; you just do NOT want to build them at runtime, provide a Dockerfile for them, or manage a registry including secrets. I know that you mean the same as I do, but we really need to use the right and precise terms, otherwise it is too confusing for the other people following this thread. We really need to help InfinStor understand the problem precisely. From the KFP documentation: "base_image – Optional. Specify a CUSTOM OCI container IMAGE to use in the component. For lightweight components, the image needs to have python 3.5+. Default is the python image corresponding to the current python environment." Maybe you meant https://kubeflow-pipelines.readthedocs.io/en/stable/source/kfp.containers.html with "custom image component". I would call this an "image builder component" in KFP. This is something old and ugly that is not usable in serious enterprise environments. All other Kubeflow components have moved away from stuff like this. Actually, the next step discussed with the kubeflow/manifests maintainer is getting rid of the remaining root in Istio by switching to istio-cni.
ml-metadata does not work well as a metadata backend. Please have a look at all the bugs here. The most important things are: 1. it is not namespace-isolated; 2. you cannot delete database entries. If MLflow is willing to solve that, it would be a clear benefit. Also, outside of Kubeflow, MLflow is the dominant metadata backend. I would also appreciate it if ml-metadata gets fixed, but so far that is blocked by upstream issues. InfinStor asked me how to integrate Concurrent, so I gave them a proposal in the comments above. And yes, I think the same: InfinStor should implement Concurrent as a KFP component or compile it to a KFP pipeline. Adding a new backend is overkill. Regarding MLflow metadata tracking, there is a different story: that must be integrated directly into KFP as an alternative to ml-metadata. The good thing is you can work separately on both tasks. There is no need to integrate all MLflow components at once.
We would absolutely LOVE this. Almost started doing something home-cooked already....
Do we have any further traction, or design documents that may have been created? Would love to contribute if possible!
Hello @jbottum, do you know who could give an update on this?
@jagane-infinstor Hey Jagane - could you please provide a status on Concurrent and related activities? Thanks!
I hope to upstream a multi-user-isolation MLflow implementation this year. Not Concurrent, just the normal MLflow stuff. But no guarantees at all.
Hello :) Thanks.
Sorry, I apologize - we have been completely swamped in work relating to Generative AI. We have not been able to spend any time on this particular issue #6647.
Best regards,
Jagane
Do you need help contributing?
Hello Julius - thanks for asking. And yes, we'd love some help. Let's consider the two aspects of Issue #6647.

1. *Make MLflow work as an alternate experiment tracking system and model management system for Kubeflow.* The main problem here is that MLflow does not include any authentication capabilities in its open source form. One way to solve this would be to add support for the auth mechanism used by Kubeflow. Unfortunately, that will not be accepted into the MLflow source tree, so we'll have to live with that. All said and done, the issue here is that customers want integration with their corporate directory, not an MLflow-specific or Kubeflow-specific one.

2. *Make Concurrent for MLflow an alternative to KFP:* For this, the first order of business is that we should finish the discussion regarding sidecars for access to privileged code - we use that today for making FUSE mounts possible. That is essential for making data available to the compute. As a k8s security expert, do you have any suggestions for how we should go about this?

Thanks
Jagane
Hey Jagane, the auth concept has recently been introduced in the latest version of open source MLflow (still experimental).
And at least from an initial look at the docs, the permissions can also be fine-grained for different APIs. Maybe that helps somehow?
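For reference, here is a small sketch of how that experimental auth API can be driven from Python, assuming an MLflow >= 2.5 tracking server started with `mlflow server --app-name basic-auth`; the URL, username, password, and experiment ID are placeholders:

```python
from mlflow.server import get_app_client

# Admin-side client for a tracking server running the experimental basic-auth app.
auth_client = get_app_client("basic-auth", tracking_uri="https://mlflow.example.com")

# Create a user and grant fine-grained, per-experiment permissions.
auth_client.create_user(username="alice", password="s3cret")
auth_client.create_experiment_permission(
    experiment_id="1", username="alice", permission="READ")  # READ / EDIT / MANAGE / NO_PERMISSIONS
```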
I recently found this tutorial: https://medium.com/dkatalis/kubeflow-with-mlflow-702cf2ebf3bf When I finished the tutorial, the Kubeflow dashboard could not load properly (the elements in the 'Quick shortcuts' section were missing). Any idea where the tutorial is failing?
@jagane-infinstor I now have MLflow available, with a central database in the kubeflow namespace and separate credentials per namespace. We can start the per-namespace MLflow server via the Workbench/Workspace UI, so zero-overhead namespaces are still possible. But so far there has not been enough time to upstream it. Concurrent is another topic. Probably with the Workspace 2.0 overhaul we can take a closer look at upstreaming.
Another relevant effort on the subject here: #7396
/close
This belongs to kubeflow/manifests. Please reopen there if necessary and if the model registry is not enough.
@juliusvonkohout: Closing this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
/kind feature
Why you need this feature:
We want to run AI workloads in Kubernetes and use MLflow for Experiment Tracking and Model Management.
Describe the solution you'd like:
We would like to design and run a DAG in Kubernetes, with each node of the DAG being an MLflow Project.
MLflow is very popular among Data Scientists, Data Engineers and MLOps staff. Its strength is ML Experiment Tracking and ML Model Management. However, MLflow does not include any compute capability. Kubeflow, on the other hand, is very strong in managing compute via Kubernetes. It would be useful for Kubeflow to include functionality to build a DAG out of MLflow Projects (a packaged reproducible piece of ML code) and run it in Kubernetes.
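For a sense of the building blocks MLflow already provides, here is a minimal sketch of launching a single Docker-based MLflow Project on Kubernetes with MLflow's (experimental) Kubernetes backend; the project URI, kube context, image repository, and Job template path are placeholders, and orchestrating several such runs into a DAG is exactly the gap this issue describes:

```python
import mlflow

# With the experimental Kubernetes backend, each project run becomes a Kubernetes Job.
backend_config = {
    "kube-context": "my-cluster",                              # placeholder kube context
    "repository-uri": "ghcr.io/example/mlflow-steps",          # placeholder image repository
    "kube-job-template-path": "kubernetes_job_template.yaml",  # placeholder Job template
}

submitted = mlflow.projects.run(
    uri="https://github.com/example/docker-mlproject",  # must be a Docker-based MLflow Project
    backend="kubernetes",
    backend_config=backend_config,
    synchronous=True,
)
print(submitted.run_id)
```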
We believe that for this project to be successful:
Anything else you would like to add:
We have been working on a proof of concept - MLflow Parallels, an Apache Licensed open source project. https://mlflow-parallels.org