improve docs for connecting Pipelines SDK to Kubeflow (#3021)
thesuperzapper committed Sep 23, 2022
1 parent 1cf07ec commit 929aba5
Showing 2 changed files with 441 additions and 316 deletions.
257 changes: 130 additions & 127 deletions content/en/docs/components/pipelines/v1/overview/multi-user.md
+++
title = "Multi-user Isolation for Pipelines"
description = "Getting started with Kubeflow Pipelines multi-user isolation"
title = "Multi-user Isolation"
description = "How multi-user isolation works in Kubeflow Pipelines"
weight = 30
+++

Multi-user isolation for Kubeflow Pipelines is part of Kubeflow's overall [multi-tenancy](/docs/components/multi-tenancy/) feature.

{{% alert title="Tip" color="info" %}}
* Kubeflow Pipelines multi-user isolation is only supported in ["full" Kubeflow deployments](/docs/components/pipelines/installation/overview/#full-kubeflow-deployment).
* Refer to [Getting Started with Multi-user isolation](/docs/components/multi-tenancy/getting-started/) for the common Kubeflow multi-user operations
like [Managing contributors](/docs/components/multi-tenancy/getting-started/#managing-contributors-through-the-kubeflow-ui).
{{% /alert %}}

## How are resources separated?

Kubeflow Pipelines separates resources using Kubernetes namespaces that are managed by Kubeflow's [Profile resources](/docs/components/multi-tenancy/overview/#key-concepts).
Other users cannot see resources in your Profile/Namespace without permission, because the Kubeflow Pipelines API server rejects requests for namespaces that the current user is not authorized to access.

"Experiments" belong to namespaces directly; "runs" and "recurring runs" belong to their parent experiment's namespace.

"Pipeline runs" are executed in user namespaces, so that users can leverage Kubernetes namespace isolation.
For example, they can configure different secrets for other services in different namespaces.
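
For example, requests for a namespace that your user is not a contributor of fail with a permission error.
The following is a minimal sketch of that behaviour (the `kfp.Client` connection details are covered under "When using the SDK" below, and the namespace name is a placeholder):

```python
import kfp
import kfp_server_api

# connection details are covered under "When using the SDK" below
client = kfp.Client()

try:
    # the API server rejects requests for namespaces the caller is not authorized to access
    client.list_experiments(namespace="some-other-users-namespace")
except kfp_server_api.ApiException as e:
    # typically surfaces as an unauthorized / permission-denied error
    print(e.status, e.reason)
```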

{{% alert title="Warning" color="warning" %}}
Kubeflow makes no hard security guarantees about Profile isolation.
<br>
User profiles have no additional isolation beyond what is provided by Kubernetes Namespaces.
{{% /alert %}}

## When using the UI

When you visit the Kubeflow Pipelines UI from the Kubeflow Dashboard, it only shows "experiments", "runs", and "recurring runs" in your chosen namespace.
Similarly, when you create resources from the UI, they also belong to the namespace you have chosen.
You can select a different namespace to view resources in other namespaces.

{{% alert title="Warning" color="warning" %}}
Pipeline definitions are not isolated right now; they are shared across all namespaces. See [Current Limitations](#current-limitations) for more details.
{{% /alert %}}

## When using the SDK

How you connect the Pipelines SDK to Kubeflow Pipelines depends on __what kind__ of Kubeflow deployment you have, and __from where you are running your code__:

* [Full Kubeflow (from inside cluster)](/docs/components/pipelines/sdk/connect-api/#full-kubeflow-subfrom-inside-clustersub)
* [Full Kubeflow (from outside cluster)](/docs/components/pipelines/sdk/connect-api/#full-kubeflow-subfrom-outside-clustersub)
* [Standalone Kubeflow Pipelines (from inside cluster)](/docs/components/pipelines/sdk/connect-api/#standalone-kubeflow-pipelines-subfrom-inside-clustersub)
* [Standalone Kubeflow Pipelines (from outside cluster)](/docs/components/pipelines/sdk/connect-api/#standalone-kubeflow-pipelines-subfrom-outside-clustersub)

The following Python code creates an experiment (and an associated run) from a Pod running inside a full Kubeflow cluster.

```python
import kfp

# the namespace in which you deployed Kubeflow Pipelines
kubeflow_namespace = "kubeflow"

# the namespace of your pipelines user (where the pipeline will be executed)
user_namespace = "jane-doe"

# the KF_PIPELINES_SA_TOKEN_PATH environment variable is used when no `path` is set
# the default KF_PIPELINES_SA_TOKEN_PATH is /var/run/secrets/kubeflow/pipelines/token
credentials = kfp.auth.ServiceAccountTokenVolumeCredentials(path=None)

# create a client
client = kfp.Client(host=f"http://ml-pipeline-ui.{kubeflow_namespace}", credentials=credentials)

# create an experiment (the response includes the generated experiment ID)
experiment = client.create_experiment(name="<YOUR_EXPERIMENT_NAME>", namespace=user_namespace)
print(client.list_experiments(namespace=user_namespace))

# create a pipeline run
client.run_pipeline(
    experiment_id=experiment.id,  # the experiment determines the namespace
    job_name="<YOUR_RUN_NAME>",
    pipeline_id="<YOUR_PIPELINE_ID>",  # the pipeline definition to run
)
print(client.list_runs(experiment_id=experiment.id))
print(client.list_runs(namespace=user_namespace))
```
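
When running from __outside__ the cluster, the in-cluster ServiceAccount token is not available, so you must authenticate through your deployment's public endpoint instead; the exact flow is platform-specific (see the guides linked above).
As a hedged sketch only, a Dex-based full Kubeflow deployment can typically be reached by passing an `authservice_session` cookie obtained after logging in (the host and cookie value below are placeholders):

```python
import kfp

# placeholder public endpoint of your Kubeflow deployment
host = "https://kubeflow.example.com/pipeline"

# placeholder session cookie, obtained after authenticating with Dex
# (how you obtain this cookie depends on your identity provider and setup)
session_cookie = "<YOUR_AUTHSERVICE_SESSION_COOKIE>"

client = kfp.Client(
    host=host,
    cookies=f"authservice_session={session_cookie}",
)
print(client.list_experiments(namespace="jane-doe"))
```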

{{% alert title="Tip" color="info" %}}
* To set a default namespace for Pipelines SDK commands, use the [`kfp.Client().set_user_namespace()`](https://kubeflow-pipelines.readthedocs.io/en/stable/source/kfp.client.html#kfp.Client.set_user_namespace) method,
  which stores your user namespace in a configuration file at `$HOME/.config/kfp/context.json`.
* Detailed documentation for `kfp.Client()` can be found in the [Kubeflow Pipelines SDK Reference](https://kubeflow-pipelines.readthedocs.io/en/stable/source/kfp.client.html).
{{% /alert %}}
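
For example, a short sketch of saving and then relying on a default namespace (continuing from the `client` created above; the second namespace is a placeholder):

```python
# save "jane-doe" as the default namespace in $HOME/.config/kfp/context.json;
# you only need to do this once, later clients pick up the saved context
client.set_user_namespace(namespace="jane-doe")
print(client.get_user_namespace())

# SDK methods now default to the saved namespace when no `namespace` argument is given
print(client.list_experiments())

# passing an explicit namespace still overrides the saved default
print(client.list_experiments(namespace="team-data-science"))
```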

## When using the REST API

When calling the [Kubeflow Pipelines REST API](/docs/components/pipelines/reference/api/kubeflow-pipeline-api-spec/), a namespace argument is required for experiment APIs.
<br>
The namespace is specified by a "resource reference" with `type` of `NAMESPACE` and `key.id` equal to the namespace name.

The following code uses the [generated Python API client](https://kubeflow-pipelines.readthedocs.io/en/stable/source/kfp.server_api.html) to create an experiment and a pipeline run.

```python
import kfp
from kfp_server_api import *

# the namespace in which you deployed Kubeflow Pipelines
kubeflow_namespace = "kubeflow"

# the namespace of your pipelines user (where the pipeline will be executed)
user_namespace = "jane-doe"

# the KF_PIPELINES_SA_TOKEN_PATH environment variable is used when no `path` is set
# the default KF_PIPELINES_SA_TOKEN_PATH is /var/run/secrets/kubeflow/pipelines/token
credentials = kfp.auth.ServiceAccountTokenVolumeCredentials(path=None)

# create a client
client = kfp.Client(host=f"http://ml-pipeline-ui.{kubeflow_namespace}", credentials=credentials)

# create an experiment
experiment: ApiExperiment = client._experiment_api.create_experiment(
    body=ApiExperiment(
        name="<YOUR_EXPERIMENT_NAME>",
        resource_references=[
            ApiResourceReference(
                key=ApiResourceKey(
                    id=user_namespace,
                    type=ApiResourceType.NAMESPACE,
                ),
                relationship=ApiRelationship.OWNER,
            )
        ],
    )
)
print("-------- BEGIN: EXPERIMENT --------")
print(experiment)
print("-------- END: EXPERIMENT ----------")

# get the experiment by name (only necessary if you comment out the `create_experiment()` call above)
# experiment: ApiExperiment = client.get_experiment(
#     experiment_name="<YOUR_EXPERIMENT_NAME>",
#     namespace=user_namespace
# )

# create a pipeline run
run: ApiRunDetail = client._run_api.create_run(
    body=ApiRun(
        name="<YOUR_RUN_NAME>",
        pipeline_spec=ApiPipelineSpec(
            # replace <YOUR_PIPELINE_ID> with the UID of a pipeline definition you have previously uploaded
            pipeline_id="<YOUR_PIPELINE_ID>",
        ),
        resource_references=[
            ApiResourceReference(
                key=ApiResourceKey(
                    id=experiment.id,
                    type=ApiResourceType.EXPERIMENT,
                ),
                relationship=ApiRelationship.OWNER,
            )
        ],
    )
)
print("-------- BEGIN: RUN --------")
print(run)
print("-------- END: RUN ----------")

# view the pipeline run
runs: ApiListRunsResponse = client._run_api.list_runs(
    resource_reference_key_type=ApiResourceType.EXPERIMENT,
    resource_reference_key_id=experiment.id,
)
print("-------- BEGIN: RUNS --------")
print(runs)
print("-------- END: RUNS ----------")
```
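
Both examples above reference `<YOUR_PIPELINE_ID>`, the UID of a pipeline definition that has already been uploaded.
As a small sketch (the pipeline name and package path are placeholders), you can obtain such an ID with the `kfp` client:

```python
# look up the UID of an existing pipeline definition by name
# (pipeline definitions are shared across all namespaces, see "Current limitations" below)
pipeline_id = client.get_pipeline_id(name="<YOUR_PIPELINE_NAME>")
print(pipeline_id)

# or upload a compiled pipeline package and use the UID of the new definition
pipeline = client.upload_pipeline(
    pipeline_package_path="pipeline.yaml",
    pipeline_name="<YOUR_PIPELINE_NAME>",
)
pipeline_id = pipeline.id
```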

## Current limitations

### Resources without isolation

The following resources do not currently support isolation and are shared without access control:

* Pipelines (Pipeline definitions).
* Artifacts, Executions, and other metadata entities in [Machine Learning Metadata (MLMD)](https://www.tensorflow.org/tfx/guide/mlmd).
* [Minio artifact storage](https://min.io/) which contains pipeline runs' input/output artifacts.
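
For example (continuing from the `kfp.Client` created in the SDK example above), pipeline definitions are visible to every user regardless of namespace:

```python
# pipeline definitions are not namespaced, so every user sees the same list
print(client.list_pipelines())
```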

## In-cluster API request authentication

Refer to [Connect to Kubeflow Pipelines from the same cluster](/docs/components/pipelines/sdk/connect-api/#connect-to-kubeflow-pipelines-from-the-same-cluster) for details.

Alternatively, in-cluster workloads like Jupyter notebooks or cron tasks can also access the Kubeflow Pipelines API through the public endpoint. This option is platform-specific and explained in
[Connect to Kubeflow Pipelines from outside your cluster](/docs/components/pipelines/sdk/connect-api/#connect-to-kubeflow-pipelines-from-outside-your-cluster).
