add KFP docstring style and pipeline metadata information to website (#…
connor-mccarthy committed Apr 24, 2023
1 parent bb90517 commit 82708d9
Showing 3 changed files with 78 additions and 3 deletions.
@@ -0,0 +1,70 @@
title = "Additional Functionality"
description = "More information about authoring KFP components"
weight = 6

### Component docstring format
KFP allows you to document your components and pipelines using Python docstrings. The KFP SDK automatically parses your docstrings and includes certain fields in [IR YAML][ir-yaml] when you compile components and pipelines.

For components, KFP can extract your component **input descriptions** and **output descriptions**.

For pipelines, KFP can extract your pipeline **input descriptions** and **output descriptions**, as well as a **description of your full pipeline**.

For the KFP SDK to parse your docstrings correctly, write them in the KFP docstring style. The KFP docstring style is a variant of the [Google docstring style][google-docstring-style] with the following changes:
* The `Returns:` section takes the same structure as the `Args:` section: each return value takes the form `<name>: <description>`. This differs from the typical Google docstring `Returns:` section, which takes the form `<type>: <description>` and does not name return values.
* Component outputs should be included in the `Returns:` section, even though they are declared as component function input parameters. This applies to function parameters annotated with [`dsl.OutputPath`][dsl-outputpath] and to the [`Output[<Artifact>]`][output-type-marker] type marker for declaring [output artifacts][output-artifacts].
* *Suggested:* Omit type information, including whether an input is optional or required, from the input and output descriptions. This information duplicates the type annotations.
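
To make the `Args:`/`Returns:` rules concrete, here is a simplified, standalone sketch of how a parser could read a KFP-style docstring. `parse_kfp_docstring` is a hypothetical helper, not the KFP SDK's own parser; it assumes every entry uses the `<name>: <description>` layout above and treats any other non-blank line inside a section as a continuation of the previous entry.

```python
import re
from typing import Dict, Optional, Tuple


def parse_kfp_docstring(doc: str) -> Tuple[str, Dict[str, str], Dict[str, str]]:
    """Splits a KFP-style docstring into (summary, inputs, outputs).

    Hypothetical sketch: not the parser the KFP SDK actually uses.
    """
    summary_lines = []
    inputs: Dict[str, str] = {}
    outputs: Dict[str, str] = {}
    section: Optional[Dict[str, str]] = None  # None means "still in the summary".
    last_name = None  # Most recent entry, for multi-line descriptions.
    for raw_line in doc.strip().splitlines():
        line = raw_line.strip()
        if line == 'Args:':
            section, last_name = inputs, None
        elif line == 'Returns:':
            section, last_name = outputs, None
        elif section is None:
            if line:
                summary_lines.append(line)
        else:
            # Both sections use the same `<name>: <description>` form.
            match = re.match(r'(\w+):\s*(.*)', line)
            if match:
                last_name = match.group(1)
                section[last_name] = match.group(2)
            elif line and last_name is not None:
                section[last_name] += ' ' + line
    return ' '.join(summary_lines), inputs, outputs


# A docstring written in the KFP docstring style.
JOIN_DATASETS_DOC = """Concatenates two datasets.

    Args:
        dataset_a: First dataset.
        dataset_b: Second dataset.

    Returns:
        out_dataset: The concatenated dataset.
        Output: The concatenated string.
    """

summary, inputs, outputs = parse_kfp_docstring(JOIN_DATASETS_DOC)
print(summary)  # Concatenates two datasets.
print(outputs)  # {'out_dataset': 'The concatenated dataset.', 'Output': 'The concatenated string.'}
```

With this layout, a plain Google-style entry such as `str: The concatenated string.` would be read as an output named `str`, which illustrates why the KFP style requires a name for each return value.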

For example, the KFP SDK can extract input and output descriptions from the following component docstring which uses the KFP docstring style:

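```python
from kfp import dsl
from kfp.dsl import Dataset, Input, Output

@dsl.component
def join_datasets(
    dataset_a: Input[Dataset],
    dataset_b: Input[Dataset],
    out_dataset: Output[Dataset],
) -> str:
    """Concatenates two datasets.

    Args:
        dataset_a: First dataset.
        dataset_b: Second dataset.

    Returns:
        out_dataset: The concatenated dataset.
        Output: The concatenated string.
    """
    ...  # Concatenation logic omitted for brevity.
```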

Similarly, KFP can extract the pipeline input descriptions, the pipeline output descriptions, and the pipeline description from the following pipeline docstring:

```python
from kfp import dsl
from kfp.dsl import Dataset, Input

@dsl.pipeline(display_name='Concatenation pipeline')
def dataset_concatenator(
    string: str,
    in_dataset: Input[Dataset],
) -> Dataset:
    """Pipeline to convert string to a Dataset, then concatenate with in_dataset.

    Args:
        string: String to concatenate to in_dataset.
        in_dataset: Dataset to which to concatenate string.

    Returns:
        Output: The final concatenated dataset.
    """
    ...  # Pipeline body omitted for brevity.
```

Note that if you provide a `description` argument to the [`@dsl.pipeline`][dsl-pipeline] decorator, KFP will use this description instead of the docstring description.
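
The precedence rule can be modeled in a few lines of plain Python. `resolve_pipeline_description` is a hypothetical helper that only restates the rule above; it also assumes that the docstring description is the text before the first `Args:` or `Returns:` section, which may not match exactly what the SDK extracts.

```python
from typing import Optional


def resolve_pipeline_description(
    description: Optional[str], docstring: Optional[str]
) -> Optional[str]:
    """Picks the pipeline description (hypothetical model of the rule)."""
    # An explicit `description` argument to @dsl.pipeline wins.
    if description is not None:
        return description
    if not docstring:
        return None
    # Assumption: the docstring description is the text before `Args:`/`Returns:`.
    summary = []
    for line in docstring.strip().splitlines():
        stripped = line.strip()
        if stripped in ('Args:', 'Returns:'):
            break
        if stripped:
            summary.append(stripped)
    return ' '.join(summary) or None


DOC = """Pipeline to convert string to a Dataset, then concatenate with in_dataset.

    Args:
        string: String to concatenate to in_dataset.
    """
print(resolve_pipeline_description(None, DOC))
print(resolve_pipeline_description('Concatenates strings and datasets.', DOC))
```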

[ir-yaml]: /docs/components/pipelines/v2/compile-a-pipeline#ir-yaml
[output-artifacts]: /docs/components/pipelines/v2/data-types/artifacts#declaring-inputoutput-artifacts
@@ -1,5 +1,5 @@
-title = "Special case: Importer Components"
+title = "Special Case: Importer Components"
description = "Import artifacts from outside your pipeline"
weight = 5
@@ -50,17 +50,21 @@ KFP pipelines are defined inside functions decorated with the `@dsl.pipeline` de
* `name` is the name of your pipeline. If not provided, the name defaults to a sanitized version of the pipeline function name.
* `description` is a description of the pipeline.
* `pipeline_root` is the root path of the remote storage destination within which the tasks in your pipeline will create outputs. `pipeline_root` may also be set or overridden by pipeline submission clients.
* `display_name` is a human-readable name for your pipeline.
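
The text does not spell out what "sanitized" means for the default `name`. The sketch below assumes Kubernetes-resource-name style rules (lowercase, with runs of other characters collapsed to hyphens); `default_pipeline_name` is a hypothetical helper, not part of the KFP SDK, so compare its result against the name the SDK actually produces.

```python
import re


def default_pipeline_name(func_name: str) -> str:
    """Derives a pipeline name from a function name (hypothetical sketch)."""
    # Assumed Kubernetes-style rules: lowercase, and collapse every run of
    # characters outside [a-z0-9] into a single hyphen.
    name = re.sub(r'[^a-z0-9]+', '-', func_name.lower())
    # Leading and trailing hyphens are dropped.
    return name.strip('-')


print(default_pipeline_name('pythagorean'))           # pythagorean
print(default_pipeline_name('dataset_concatenator'))  # dataset-concatenator
```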

You can modify the definition of `pythagorean` to use these arguments:

```python
from kfp import dsl

@dsl.pipeline(description='Solve for the length of a hypotenuse of a triangle with sides length `a` and `b`.',
              display_name='Pythagorean pipeline.')
def pythagorean(a: float, b: float) -> float:
    ...  # Pipeline body unchanged.
```

Also see [Additional Functionality: Component docstring format][component-docstring-format] for information on how to provide pipeline metadata via docstrings.

### Pipeline inputs and outputs

Like [components][components], pipeline inputs and outputs are defined by the parameters and annotations in the pipeline function signature.
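
As a rough, plain-Python illustration of inputs and outputs being "defined by the parameters and annotations", the standard-library `inspect.signature` exposes exactly this information: parameter names, their annotations and defaults, and the return annotation. `describe_signature` is a hypothetical helper, and `pythagorean` below is an ordinary function standing in for the pipeline; this is not how the KFP compiler itself is written.

```python
import inspect
from typing import Any, Dict, Tuple


def describe_signature(func) -> Tuple[Dict[str, Dict[str, Any]], Any]:
    """Returns (inputs, output_type) derived from `func`'s signature."""
    sig = inspect.signature(func)
    inputs = {}
    for name, param in sig.parameters.items():
        # A parameter with a default value is treated as an optional input.
        has_default = param.default is not inspect.Parameter.empty
        inputs[name] = {
            'type': param.annotation,
            'optional': has_default,
            'default': param.default if has_default else None,
        }
    # The return annotation describes the output.
    return inputs, sig.return_annotation


# Plain-Python stand-in with the same signature as the `pythagorean` pipeline.
def pythagorean(a: float = 1.2, b: float = 1.2) -> float:
    return (a ** 2 + b ** 2) ** 0.5


inputs, output_type = describe_signature(pythagorean)
print(inputs['a'])  # {'type': <class 'float'>, 'optional': True, 'default': 1.2}
print(output_type)  # <class 'float'>
```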
@@ -190,4 +194,5 @@ def pythagorean(a: float = 1.2, b: float = 1.2) -> float:
[output-artifacts]: /docs/components/pipelines/v2/data-types/artifacts#using-output-artifacts
[container-component-outputs]: /docs/components/pipelines/v2/components/container-components#create-component-outputs
[parameters-namedtuple]: /docs/components/pipelines/v2/data-types/parameters#multiple-output-parameters
[component-docstring-format]: /docs/components/pipelines/v2/components/additional-functionality#component-docstring-format
