Multi-User support for Kubeflow Pipelines #1223

IronPan · 2019-04-24T23:14:07Z

[April/6/2020]
Latest design is in https://docs.google.com/document/d/1R9bj1uI0As6umCTZ2mv_6_tjgFshIKxkSt00QLYjNV4/edit?ts=5e4d8fbb#heading=h.5s8rbufek1ax

Areas we are working on:

Release

- How do we release KFP multi user mode? ([Multi user] How do we release KFP multi user in Kubeflow? #3645)
- Multi user mode early access release ([Multi User] Multi user mode early access release #3693)
- [Deployment] Merge changes to upstream kubeflow repo ([Multi User] Integrate KFP multi user with KF 1.1 #3241)
- Integrate with platforms other than GCP ([Pipelines Multi User] integrate with non GCP platforms manifests#1364)

Areas related to integration with Kubeflow

- [Central Dashboard] Manage contributors for all namespaces I own (Dashboard's contributors section doesn't show all the namespaces I own kubeflow#4569)
- [Central Dashboard] Support login to Kubeflow cluster without creating his/her namespace for a non-admin contributor (kubeflow#4889)
- [Profile CRD] Support more than one owner of a profile CR (kubeflow#4888)
- [Profile CRD] Support updating the owner of a profile (kubeflow#4890)

=============== original description

Some users have expressed interest in isolation between the cluster admin and the cluster user: the cluster admin deploys Kubeflow Pipelines as part of Kubeflow in the cluster;
the cluster user can use Kubeflow Pipelines functionality without being able to access the control plane.

Here are the steps to support this functionality.

Provision the control plane in one namespace, and launch Argo workflow instances in another
- Provision the control plane in the kubeflow namespace, and Argo jobs in namespace FOO (parameterized).
- The API server should update the incoming workflow definition to namespace FOO. Sample code where the API server modifies the workflow
Currently all workflows run under the clusterrole pipeline-runner (definition), which is specified during compilation (link). Workflows should instead run under a role rather than a clusterrole.
- Change pipeline-runner to a role, and specify the namespace during deployment (exposed as a deployment parameter).
- The API server should update the incoming workflow definition to use the pipeline-runner role.
Cluster users can access the UI through the IAP/SimpleAuth endpoint instead of port-forwarding.
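
To make the namespace and role steps above concrete, here is a minimal sketch of the kind of rewrite the API server could apply to an incoming workflow. It is not the actual KFP API server code (which is written in Go); the function name `retarget_workflow` and the sample manifest are hypothetical, and it only assumes the standard Argo Workflow fields `metadata.namespace` and `spec.serviceAccountName`.

```python
import copy


def retarget_workflow(workflow: dict, namespace: str, service_account: str) -> dict:
    """Return a copy of an Argo Workflow manifest pinned to one namespace and service account."""
    patched = copy.deepcopy(workflow)  # never mutate the caller's manifest
    patched.setdefault("metadata", {})["namespace"] = namespace
    patched.setdefault("spec", {})["serviceAccountName"] = service_account
    return patched


# Hypothetical incoming workflow, as a compiler might emit it (no namespace set).
incoming = {
    "apiVersion": "argoproj.io/v1alpha1",
    "kind": "Workflow",
    "metadata": {"generateName": "demo-"},
    "spec": {"entrypoint": "main"},
}

patched = retarget_workflow(incoming, namespace="foo", service_account="pipeline-runner")
print(patched["metadata"]["namespace"], patched["spec"]["serviceAccountName"])
```

With a namespaced role bound to that service account, the workflow can only act on resources in namespace FOO instead of cluster-wide.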

IronPan · 2019-04-24T23:17:34Z

Ideally this should be implemented in a way that gets Kubeflow Pipelines closer to supporting multi-user, e.g. launching workflows in an arbitrary namespace.

jlewi · 2019-04-25T15:41:25Z

What's the priority of this?

How does this align with the broader plans in Kubeflow to support multiple users?

IronPan · 2019-04-26T17:49:32Z

This is not yet being prioritized, although I think it deserves a high priority.

In addition to admin/user isolation, here is a list of items needed to achieve full multi-user support for KFP:

Every user (or group of users) will have a dedicated namespace, plus a service account, role, and role binding in that namespace. These resources should be created by the Kubeflow Profile CRD.
With IAP integration, the incoming request contains the user's email. The Pipeline API server should authorize the email against the Kubernetes API by doing a user impersonation check.

When creating a job/run, the job/run should be created in the user's namespace and run by the service account in that namespace. The Argo CRD or scheduled workflow CRD should be able to control resources across all namespaces.
When creating any resource, the API server needs to add an additional column to the resource table to record the user's identity, namespace, or both, so it can filter resources in Get/List calls.
For Get/List calls, the API server needs to filter resources based on the user's privileges.
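
As a rough illustration of the authorization and filtering items above, the sketch below builds a Kubernetes SubjectAccessReview body for a user email and filters stored runs by namespace. Note that this swaps in a SubjectAccessReview for the "user impersonation check" mentioned above; the `Run` record, its fields, and the helper names are hypothetical and are not the KFP database schema.

```python
from dataclasses import dataclass
from typing import Iterable, List


def subject_access_review(user_email: str, namespace: str, verb: str,
                          group: str = "argoproj.io",
                          resource: str = "workflows") -> dict:
    """Build the request body asking Kubernetes whether a user may perform a verb."""
    return {
        "apiVersion": "authorization.k8s.io/v1",
        "kind": "SubjectAccessReview",
        "spec": {
            "user": user_email,  # e.g. the email taken from the IAP header
            "resourceAttributes": {
                "namespace": namespace,
                "verb": verb,
                "group": group,
                "resource": resource,
            },
        },
    }


@dataclass(frozen=True)
class Run:
    """Hypothetical resource row with the extra namespace/identity columns."""
    run_id: str
    namespace: str
    created_by: str


def list_runs(runs: Iterable[Run], allowed_namespaces: Iterable[str]) -> List[Run]:
    """Return only the runs that live in namespaces the caller is authorized for."""
    allowed = set(allowed_namespaces)
    return [r for r in runs if r.namespace in allowed]


runs = [
    Run("r1", "team-a", "alice@example.com"),
    Run("r2", "team-b", "bob@example.com"),
]
review = subject_access_review("alice@example.com", "team-a", "create")
visible = list_runs(runs, allowed_namespaces=["team-a"])
print([r.run_id for r in visible])
```

In a real deployment the review body would be posted to the Kubernetes API and the `status.allowed` field of the response would decide whether the request proceeds.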

IronPan · 2019-04-26T18:19:10Z

Some references for implementing multi-user support on-prem
https://docs.google.com/document/d/1JbYndTaUwRyr4wrU13TN5fzMpnLCUtpehPB4tQnMuSM/edit#heading=h.xq5kl0qs27mm
kubeflow/kubeflow#3096

jlewi · 2019-07-22T01:58:48Z

@jessiezcc Any update on this work? Do you think this is something that will get done in Q3 and thus be part of 0.7?

jessiezcc · 2019-07-22T18:26:25Z

This work is not currently scheduled for Q3.

IronPan · 2019-07-24T21:15:20Z

Some customers have expressed interest in having ACLs for the API, e.g. locking down the API for deleting resources to admins only.

krishnadurai · 2019-08-08T18:01:32Z

/cc @krishnadurai

songole · 2019-08-14T21:56:56Z

/cc @songole

yanniszark · 2019-08-19T13:05:18Z

Hi @IronPan.
We (Arrikto) have been exploring this problem for the past month and we generally agree with your overview of the steps required to have multi-user functionality in pipelines.

I'm assigning this to myself; we have made good progress and should have initial support for multi-user pipelines in v0.7.

/assign @yanniszark

Bobgy · 2020-07-10T06:26:03Z

Cross posting for clarification #4197 (comment):

EDIT: the features described below will be released with Kubeflow 1.1. You can use these instructions for a preview on GCP. It's NOT RELEASED YET.
Installation for Kubeflow 1.1 rc on GCP: https://github.com/kubeflow/gcp-blueprints/tree/v1.1-branch
KFP Multi User instructions: https://docs.google.com/document/d/1Ws4X1oNlaczhESNuEanZxbF-cnSfO78B1rBHWOkIAzo/edit?usp=sharing

Pipeline runs are already designed to run in user namespaces.
The only resource in the KFP core system that is not namespace separated (as of today) is the static pipeline yaml files you upload to the server. They will remain public to anyone in the cluster. Users can try to launch any pipeline in their own namespaces.

For details about which resources and which services support namespace separation, please read this early access user instruction: https://docs.google.com/document/d/1Ws4X1oNlaczhESNuEanZxbF-cnSfO78B1rBHWOkIAzo/.

A quick list of things for which we don't support multi-user separation in the upcoming KF 1.1 release:

- pipeline resources (the static yaml/tar files you upload)
- minio artifact storage
- MLMD

Bobgy · 2020-07-10T06:30:19Z

If your organization would prefer pipeline resources to be separated by namespace, please upvote here. We can consider adding support if there is enough user interest.

EDIT: enough reactions collected, the issue is tracked in #4197 with priority

animeshsingh · 2020-07-10T06:41:49Z

@Bobgy it should be a feature which is enabled: if users want to "promote" their pipeline resource to be public, it's allowed. Otherwise it stays in their namespace by default.

Bobgy · 2020-07-10T06:43:37Z

> @Bobgy it should be a feature which is enabled: if users want to "promote" their pipeline resource to be public, it's allowed. Otherwise it stays in their namespace by default.

Yes, I agree if we decide to implement, we'll make it configurable.

yaliqin · 2020-07-10T06:45:15Z

Will upvote. Thanks!


jackwhelpton · 2020-07-10T20:34:39Z

Just working my way through the documentation, thanks for pointing me in that direction. It seems geared around using kfp.Client to execute pipelines; what's the corresponding vision when executing through the UI? I was hoping that pipelines would execute in a namespace based on what's selected in the top drop-down, is that the idea?

Bobgy · 2020-07-11T00:36:01Z

@jackwhelpton Yes, the feature you described is already there. They are not mentioned in the doc just because they work seamlessly.

ca-scribner · 2020-07-16T13:34:04Z

@Bobgy re minio artifact store not being supported in KF 1.1 release, does that mean that a pipeline running in my namespace still writes to a shared artifact store? For example, anything my pipeline writes implicitly (eg: data written when piping results between steps in a pipeline like consumer_op(producer_task.output)) is accessible to anyone who can look inside that artifact store?

Bobgy · 2020-07-17T03:08:21Z

@ca-scribner That's right.
The currently suggested workaround is to pass only URLs through minio, and let components read/write GCS/S3 directly and manage permissions there if you care about data separation.
(If you use TFX, that's already the case.)
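
A minimal sketch of that workaround, assuming plain Python functions as stand-ins for pipeline components: only the URL string travels between steps, while the payload goes straight to blob storage. `BLOB_STORE` is a hypothetical in-memory substitute for GCS/S3 so the example runs anywhere; the bucket path is also made up.

```python
from typing import Dict

# Hypothetical stand-in for GCS/S3; real components would use a storage client
# and rely on bucket-level permissions for data separation.
BLOB_STORE: Dict[str, bytes] = {}


def producer(output_url: str) -> str:
    """Write the payload directly to blob storage and return only its URL."""
    BLOB_STORE[output_url] = b"col_a,col_b\n1,2\n"
    return output_url  # this small string is all that passes through minio


def consumer(input_url: str) -> int:
    """Read the payload from blob storage by URL and count its data rows."""
    data = BLOB_STORE[input_url]
    return len(data.splitlines()) - 1  # exclude the header line


url = producer("gs://team-a-bucket/runs/123/train.csv")
rows = consumer(url)
print(url, rows)
```

The trade-off is that every component has to know how to talk to the blob store itself instead of relying on automatic artifact passing.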

Alternatively, I think minio supports multi-tenancy natively: https://docs.min.io/docs/multi-tenant-minio-deployment-guide.html. We'd welcome contributions on how that can be integrated with KFP multi-user mode.

ca-scribner · 2020-07-17T11:36:28Z

@Bobgy ok we lose kfp's helpful automatic piping of real data, but the data is still secure. Only meaningful downside I think is that everyone has to teach their components how to talk to their blob storage rather than offloading it to reusable blob-put/blob-get components. That's a fair compromise.

You're right about minio multi-tenancy (I work in one atm). I'll ask around for ideas.

blairdrummond · 2020-07-17T13:39:43Z

@ca-scribner I think the Minio "Multi tenant" is slightly different than what we're doing; I think we're using OPA or Istio magic or something to provide every namespace with a private bucket on a single tenant (We do have minimal v.s. premium tenants, but that's different). I think the term "tenant" is a bit overloaded here

RoyerRamirez · 2020-07-21T20:09:36Z

> @jackwhelpton Yes, the feature you described is already there. They are not mentioned in the doc just because they work seamlessly.

Hi @Bobgy, we're hoping to get more clarification on multi-tenancy and the expected behavior. When you say "seamlessly", does that mean kubeflow will natively assign new experiments to the user's namespace as long as the headers are passed correctly, or do we need to add more components to our pipeline configuration to get the experiments to run under the user's namespace?

The reason I'm asking this is we're currently seeing the following msg in our [ ml-pipeline-scheduledworkflow ] logs:
time="2020-07-21T06:34:19Z" level=info msg="Processing object (inception-v3-transfer-hq5zv): object has no owner." Workflow=inception-v3-transfer-hq5zv

Bobgy · 2020-07-23T08:20:36Z

@RoyerRamirez Yes, experiments will be assigned to user's namespace (the namespace you selected in Kubeflow dashboard). Actions will be authorized by user's header.

> The reason I'm asking this is we're currently seeing the following msg in our [ ml-pipeline-scheduledworkflow ] logs:
> time="2020-07-21T06:34:19Z" level=info msg="Processing object (inception-v3-transfer-hq5zv): object has no owner." Workflow=inception-v3-transfer-hq5zv

Can you open a separate issue describing how you deployed and what problems you encountered?

Jeffwan · 2020-11-05T06:55:59Z

@Bobgy

> A quick list of things for which we don't support multi-user separation in the upcoming KF 1.1 release:
>
> - pipeline resources (the static yaml/tar files you upload)
> - minio artifact storage
> - MLMD

Any plans for MLMD?
Are you talking about aggregation, i.e. we only read artifacts/executions belonging to KFP resources visible from the user's namespace?
Or native isolation on the MLMD side? I think the MLMD schema currently doesn't provide any concept of users?

Bobgy · 2020-11-05T07:13:28Z

@Jeffwan Yes, your understanding is correct.
So far I'm not aware of any plan for MLMD multi-user separation.

/cc @neuromage @dushyanthsc
Is there anything you can share about this?

maganaluis · 2020-11-05T23:02:50Z

@Jeffwan @Bobgy Based on the initial documents that Karl shared as part of the Model Management group, MLMD was going to support a "Project" context, or at least the ability to create such a context. This project context could be tied to the user's Profile and provide the necessary isolation for metadata.

https://docs.google.com/presentation/d/1HiLIOm-ij0vdS_kEIQSAeICNsGSOl946qhT69WTgK5k/edit#slide=id.g8dfffc9b8a_0_37

Jeffwan · 2020-11-06T02:40:59Z

@maganaluis Hmm. It seems to remove context and bring in a project/product/workflow hierarchy. Has this proposal been reviewed by the MLMD team? I feel like this is a big schema change, and some projects like TFX would need to buy into the proposal, which may take some time. In the meantime, as a short-term solution, we can group artifacts/executions by the user's pipeline runs, as @Bobgy originally proposed. Currently, I think only KFP uses the metadata service, so it's fairly safe to do it this way.
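
A rough sketch of the short-term grouping idea, assuming each metadata record carries the ID of the pipeline run that produced it. The record shape (`id`, `run_id`) and the `runs_by_namespace` mapping are hypothetical illustrations, not the MLMD schema or API.

```python
from typing import Dict, List, Set


def visible_artifacts(artifacts: List[dict],
                      runs_by_namespace: Dict[str, Set[str]],
                      namespace: str) -> List[dict]:
    """Keep only artifacts produced by runs that belong to the given namespace."""
    visible_runs = runs_by_namespace.get(namespace, set())
    return [a for a in artifacts if a["run_id"] in visible_runs]


# Hypothetical metadata records, each tagged with its originating run.
artifacts = [
    {"id": 1, "run_id": "run-a1"},
    {"id": 2, "run_id": "run-b1"},
    {"id": 3, "run_id": "run-a2"},
]
# Run ownership as KFP already tracks it per namespace (assumed mapping).
runs_by_namespace = {"team-a": {"run-a1", "run-a2"}, "team-b": {"run-b1"}}

team_a_view = visible_artifacts(artifacts, runs_by_namespace, "team-a")
print([a["id"] for a in team_a_view])
```

This keeps isolation entirely on the KFP side, so it needs no MLMD schema change, but it only works while KFP is the sole client of the metadata service.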

jlewi · 2020-11-06T14:03:45Z

@maganaluis I think @karlschriek 's doc is just a proposal; so it might change. I think in my discussions with @neuromage we were talking about using labels to group metadata. So "project", "experiment", etc... might just be user defined labels. As such they probably wouldn't be closely tied to multi-user support.

neuromage · 2020-11-06T18:52:48Z

> @Jeffwan Yes, your understanding is correct.
> So far I'm not aware of any plan for MLMD multi-user separation.
>
> /cc @neuromage @dushyanthsc
> Is there anything you can share about this?

Hi, we have no current plans to add multi-user support directly in MLMD at this point in time. As you point out, there is no support for users in the MLMD schemas right now unfortunately. It would be worth exploring the use-cases for multi-user MLMD to figure out the right approach as well.

jlewi · 2020-11-07T20:50:04Z

KFP multi-user shipped in KF 1.1.
I suggest closing this issue and opening up more actionable, scoped issues for further improvements.

jlewi · 2020-11-07T20:50:09Z

/close

k8s-ci-robot · 2020-11-07T20:50:14Z

@jlewi: Closing this issue.

In response to this:

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.


IronPan added the cuj/multi-user label Apr 24, 2019

jlewi added this to To Do in jupyter-1.0 via automation Apr 25, 2019

jlewi added this to New in 0.6.0 via automation Apr 25, 2019

IronPan changed the title ~~Cluster user and admin isolation~~ Multi-User support for Kubeflow Pipelines Apr 26, 2019

IronPan added help wanted The community is welcome to contribute. priority/p1 kind/feature labels Apr 26, 2019

vicaire added area/backend area/frontend area/wide-impact and removed area/backend area/frontend labels May 28, 2019

jlewi removed this from To Do in jupyter-1.0 Jun 4, 2019

jessiezcc added this to the M10 milestone Jul 9, 2019

jessiezcc assigned IronPan Jul 9, 2019

jlewi added this to To Do in 0.7.0 via automation Jul 22, 2019

jlewi removed this from New in 0.6.0 Jul 22, 2019

jlewi mentioned this issue Aug 7, 2019

Kubeflow On-Premise Authentication and Authorization Prototype kubeflow/manifests#195

Merged

5 tasks

yanniszark mentioned this issue Aug 12, 2019

Expose Pipelines as CRD and enable to easy migration from Argo workflow #1132

Closed

k8s-ci-robot assigned yanniszark Aug 19, 2019

Bobgy mentioned this issue Jul 10, 2020

support separate pipeline for each namespace #4197

Open

animeshsingh mentioned this issue Jul 11, 2020

KFP-Tekton V0.3 Plan kubeflow/kfp-tekton#225

Closed

This was referenced Oct 21, 2020

Cannot specify key for input artifact (without full artifact location) argoproj/argo-workflows#3307

Closed

[Multi User] Support separate artifact repository for each namespace #4649

Open

k8s-ci-robot closed this as completed Nov 7, 2020

Kubeflow 1.1 automation moved this from To do to Done Nov 7, 2020

Jeffwan mentioned this issue Nov 20, 2020

[Multi User] Support separate metadata for each namespace #4790

Open
