improve docs for connecting Pipelines SDK to Kubeflow (#3021)
thesuperzapper committed Sep 23, 2022
1 parent 1cf07ec commit 929aba5
Showing 2 changed files with 441 additions and 316 deletions.
257 changes: 130 additions & 127 deletions content/en/docs/components/pipelines/v1/overview/multi-user.md
+++
title = "Multi-user Isolation for Pipelines"
description = "Getting started with Kubeflow Pipelines multi-user isolation"
title = "Multi-user Isolation"
description = "How multi-user isolation works in Kubeflow Pipelines"
weight = 30
+++

Multi-user isolation for Kubeflow Pipelines is part of Kubeflow's overall [multi-tenancy](/docs/components/multi-tenancy/) feature.

{{% alert title="Tip" color="info" %}}
* Kubeflow Pipelines multi-user isolation is only supported in ["full" Kubeflow deployments](/docs/components/pipelines/installation/overview/#full-kubeflow-deployment).
* Refer to [Getting Started with Multi-user isolation](/docs/components/multi-tenancy/getting-started/) for the common Kubeflow multi-user operations
like [Managing contributors](/docs/components/multi-tenancy/getting-started/#managing-contributors-through-the-kubeflow-ui).
{{% /alert %}}

## How are resources separated?

Kubeflow Pipelines separates resources using Kubernetes namespaces that are managed by Kubeflow's [Profile resources](/docs/components/multi-tenancy/overview/#key-concepts).
Other users cannot see resources in your Profile/Namespace without permission, because the Kubeflow Pipelines API server rejects requests for namespaces that the current user is not authorized to access.

"Experiments" belong to namespaces directly; "runs" and "recurring runs" belong to their parent experiment's namespace.

"Pipeline runs" are executed in user namespaces, so that users can leverage Kubernetes namespace isolation.
For example, they can configure different secrets for other services in different namespaces.
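
For example, requests for a namespace that your user is not a contributor of fail with a permission error.
The following is a minimal sketch of that behaviour (the `kfp.Client` connection details are covered under "When using the SDK" below, and the namespace name is a placeholder):

```python
import kfp
import kfp_server_api

# connection details are covered under "When using the SDK" below
client = kfp.Client()

try:
    # the API server rejects requests for namespaces the caller is not authorized to access
    client.list_experiments(namespace="some-other-users-namespace")
except kfp_server_api.ApiException as e:
    # typically surfaces as an unauthorized / permission-denied error
    print(e.status, e.reason)
```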

{{% alert title="Warning" color="warning" %}}
Kubeflow makes no hard security guarantees about Profile isolation.
<br>
User profiles have no additional isolation beyond what is provided by Kubernetes Namespaces.
{{% /alert %}}

## When using the UI

When you visit the Kubeflow Pipelines UI from the Kubeflow Dashboard, it only shows "experiments", "runs", and "recurring runs" in your chosen namespace.
Similarly, when you create resources from the UI, they also belong to the namespace you have chosen.
You can select a different namespace to view resources in other namespaces.

{{% alert title="Warning" color="warning" %}}
Pipeline definitions are not isolated right now; they are shared across all namespaces. See [Current Limitations](#current-limitations) for more details.
{{% /alert %}}

## When using the SDK

How you connect the Pipelines SDK to Kubeflow Pipelines depends on __what kind__ of Kubeflow deployment you have, and __from where you are running your code__:

* [Full Kubeflow (from inside cluster)](/docs/components/pipelines/sdk/connect-api/#full-kubeflow-subfrom-inside-clustersub)
* [Full Kubeflow (from outside cluster)](/docs/components/pipelines/sdk/connect-api/#full-kubeflow-subfrom-outside-clustersub)
* [Standalone Kubeflow Pipelines (from inside cluster)](/docs/components/pipelines/sdk/connect-api/#standalone-kubeflow-pipelines-subfrom-inside-clustersub)
* [Standalone Kubeflow Pipelines (from outside cluster)](/docs/components/pipelines/sdk/connect-api/#standalone-kubeflow-pipelines-subfrom-outside-clustersub)

The following Python code creates an experiment (and an associated run) from a Pod running inside a full Kubeflow cluster.

```python
import kfp

# the namespace in which you deployed Kubeflow Pipelines
kubeflow_namespace = "kubeflow"

# the namespace of your pipelines user (where the pipeline will be executed)
user_namespace = "jane-doe"

# the KF_PIPELINES_SA_TOKEN_PATH environment variable is used when no `path` is set
# the default KF_PIPELINES_SA_TOKEN_PATH is /var/run/secrets/kubeflow/pipelines/token
credentials = kfp.auth.ServiceAccountTokenVolumeCredentials(path=None)

# create a client
client = kfp.Client(host=f"http://ml-pipeline-ui.{kubeflow_namespace}", credentials=credentials)

# create an experiment (the response includes the generated experiment ID)
experiment = client.create_experiment(name="<YOUR_EXPERIMENT_NAME>", namespace=user_namespace)
print(client.list_experiments(namespace=user_namespace))

# create a pipeline run
client.run_pipeline(
    experiment_id=experiment.id,  # the experiment determines the namespace
    job_name="<YOUR_RUN_NAME>",
    pipeline_id="<YOUR_PIPELINE_ID>",  # the pipeline definition to run
)
print(client.list_runs(experiment_id=experiment.id))
print(client.list_runs(namespace=user_namespace))
```
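
When running from __outside__ the cluster, the in-cluster ServiceAccount token is not available, so you must authenticate through your deployment's public endpoint instead; the exact flow is platform-specific (see the guides linked above).
As a hedged sketch only, a Dex-based full Kubeflow deployment can typically be reached by passing an `authservice_session` cookie obtained after logging in (the host and cookie value below are placeholders):

```python
import kfp

# placeholder public endpoint of your Kubeflow deployment
host = "https://kubeflow.example.com/pipeline"

# placeholder session cookie, obtained after authenticating with Dex
# (how you obtain this cookie depends on your identity provider and setup)
session_cookie = "<YOUR_AUTHSERVICE_SESSION_COOKIE>"

client = kfp.Client(
    host=host,
    cookies=f"authservice_session={session_cookie}",
)
print(client.list_experiments(namespace="jane-doe"))
```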

{{% alert title="Tip" color="info" %}}
* To set a default namespace for Pipelines SDK commands, use the [`kfp.Client().set_user_namespace()`](https://kubeflow-pipelines.readthedocs.io/en/stable/source/kfp.client.html#kfp.Client.set_user_namespace) method,
  which stores your user namespace in a configuration file at `$HOME/.config/kfp/context.json`.
* Detailed documentation for `kfp.Client()` can be found in the [Kubeflow Pipelines SDK Reference](https://kubeflow-pipelines.readthedocs.io/en/stable/source/kfp.client.html).
{{% /alert %}}
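
For example, a short sketch of saving and then relying on a default namespace (continuing from the `client` created above; the second namespace is a placeholder):

```python
# save "jane-doe" as the default namespace in $HOME/.config/kfp/context.json;
# you only need to do this once, later clients pick up the saved context
client.set_user_namespace(namespace="jane-doe")
print(client.get_user_namespace())

# SDK methods now default to the saved namespace when no `namespace` argument is given
print(client.list_experiments())

# passing an explicit namespace still overrides the saved default
print(client.list_experiments(namespace="team-data-science"))
```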

## When using the REST API

When calling the [Kubeflow Pipelines REST API](/docs/components/pipelines/reference/api/kubeflow-pipeline-api-spec/), a namespace argument is required for experiment APIs.
<br>
The namespace is specified by a "resource reference" with `type` of `NAMESPACE` and `key.id` equal to the namespace name.

The following code uses the [generated Python API client](https://kubeflow-pipelines.readthedocs.io/en/stable/source/kfp.server_api.html) to create an experiment and a pipeline run.

```python
import kfp
from kfp_server_api import *

# the namespace in which you deployed Kubeflow Pipelines
kubeflow_namespace = "kubeflow"

# the namespace of your pipelines user (where the pipeline will be executed)
user_namespace = "jane-doe"

# the KF_PIPELINES_SA_TOKEN_PATH environment variable is used when no `path` is set
# the default KF_PIPELINES_SA_TOKEN_PATH is /var/run/secrets/kubeflow/pipelines/token
credentials = kfp.auth.ServiceAccountTokenVolumeCredentials(path=None)

# create a client
client = kfp.Client(host=f"http://ml-pipeline-ui.{kubeflow_namespace}", credentials=credentials)

# create an experiment
experiment: ApiExperiment = client._experiment_api.create_experiment(
    body=ApiExperiment(
        name="<YOUR_EXPERIMENT_NAME>",
        resource_references=[
            ApiResourceReference(
                key=ApiResourceKey(
                    id=user_namespace,
                    type=ApiResourceType.NAMESPACE,
                ),
                relationship=ApiRelationship.OWNER,
            )
        ],
    )
)
print("-------- BEGIN: EXPERIMENT --------")
print(experiment)
print("-------- END: EXPERIMENT ----------")

# get the experiment by name (only necessary if you comment out the `create_experiment()` call above)
# experiment: ApiExperiment = client.get_experiment(
#     experiment_name="<YOUR_EXPERIMENT_NAME>",
#     namespace=user_namespace
# )

# create a pipeline run
run: ApiRunDetail = client._run_api.create_run(
    body=ApiRun(
        name="<YOUR_RUN_NAME>",
        pipeline_spec=ApiPipelineSpec(
            # replace <YOUR_PIPELINE_ID> with the UID of a pipeline definition you have previously uploaded
            pipeline_id="<YOUR_PIPELINE_ID>",
        ),
        resource_references=[
            ApiResourceReference(
                key=ApiResourceKey(
                    id=experiment.id,
                    type=ApiResourceType.EXPERIMENT,
                ),
                relationship=ApiRelationship.OWNER,
            )
        ],
    )
)
print("-------- BEGIN: RUN --------")
print(run)
print("-------- END: RUN ----------")

# view the pipeline run
runs: ApiListRunsResponse = client._run_api.list_runs(
    resource_reference_key_type=ApiResourceType.EXPERIMENT,
    resource_reference_key_id=experiment.id,
)
print("-------- BEGIN: RUNS --------")
print(runs)
print("-------- END: RUNS ----------")
```
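
Both examples above reference `<YOUR_PIPELINE_ID>`, the UID of a pipeline definition that has already been uploaded.
As a small sketch (the pipeline name and package path are placeholders), you can obtain such an ID with the `kfp` client:

```python
# look up the UID of an existing pipeline definition by name
# (pipeline definitions are shared across all namespaces, see "Current limitations" below)
pipeline_id = client.get_pipeline_id(name="<YOUR_PIPELINE_NAME>")
print(pipeline_id)

# or upload a compiled pipeline package and use the UID of the new definition
pipeline = client.upload_pipeline(
    pipeline_package_path="pipeline.yaml",
    pipeline_name="<YOUR_PIPELINE_NAME>",
)
pipeline_id = pipeline.id
```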

## Current limitations

### Resources without isolation

The following resources do not currently support isolation and are shared without access control:

* Pipelines (Pipeline definitions).
* Artifacts, Executions, and other metadata entities in [Machine Learning Metadata (MLMD)](https://www.tensorflow.org/tfx/guide/mlmd).
* [Minio artifact storage](https://min.io/) which contains pipeline runs' input/output artifacts.
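
For example (continuing from the `kfp.Client` created in the SDK example above), pipeline definitions are visible to every user regardless of namespace:

```python
# pipeline definitions are not namespaced, so every user sees the same list
print(client.list_pipelines())
```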

## In-cluster API request authentication

Refer to [Connect to Kubeflow Pipelines from the same cluster](/docs/components/pipelines/sdk/connect-api/#connect-to-kubeflow-pipelines-from-the-same-cluster) for details.

Alternatively, in-cluster workloads like Jupyter notebooks or cron tasks can also access the Kubeflow Pipelines API through the public endpoint. This option is platform-specific and explained in
[Connect to Kubeflow Pipelines from outside your cluster](/docs/components/pipelines/sdk/connect-api/#connect-to-kubeflow-pipelines-from-outside-your-cluster).
