Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(sdk)!: Implement Graph Component #8179

Merged
merged 6 commits into from Aug 26, 2022

Conversation

chensun
Copy link
Member

@chensun chensun commented Aug 22, 2022

Description of your changes:

  • Support re-using a pipeline in another pipeline (aka graph component).
  • Refactoring compiler code, and move pipeline compiling trigger to @pipeline decorator.

Checklist:

@chensun chensun removed the request for review from zijianjoy August 22, 2022 22:24
@connor-mccarthy
Copy link
Member

/retest

Copy link
Member

@connor-mccarthy connor-mccarthy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chensun, very nice to see this in action! My comments are mostly nitpicks, with a few questions/suggestions. Overall, I think adding more compiler test pipelines would be helpful for verifying compilation behavior.

sdk/RELEASE.md Outdated
@@ -2,10 +2,13 @@

## Major Features and Improvements
* Support parallelism setting in ParallelFor [\#8146](https://github.com/kubeflow/pipelines/pull/8146)
* Support pipeline as a component [\#8179](https://github.com/kubeflow/pipelines/pull/8179)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we wait to add this until adding support for pipeline outputs?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(thought I added a reply to this but somehow it wasn't captured)

I think it's fine to add this release note. Pipeline output is a new feature independent from pipeline as component, although without pipeline outputs it limits the use case, this change alone unblocks use case like conditional email notification. WDYT?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chensun, WDYT about this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added my reply but I guess somehow it didn't shown up? Anyway, removed the release note for now.

return msg


print_op2 = components.load_component_from_text("""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this included for a v1-v2 compatibility test? If not, WDYT about using the preferred lightweight component authoring approach instead?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. I included this to make sure the inputs/outputs placeholders are still parsed/resolved correctly. The lightweight component that use {{$}} doesn't have this complex code path. I started this sample before @dsl.container_component was ready. Now I can replace this to the new recommended approach to demo best practice.

I'll still keep a v1 yaml sample in the other more complex example to cover v1-v2 compatibility.

@@ -29,13 +29,14 @@
from kfp.components import structures
import yaml

PROJECT_ROOT = os.path.abspath(os.path.join(__file__, *([os.path.pardir] * 5)))
_DEFAULT_PIPELINE_FUNC_NAME = 'my_pipeline'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:
Why underscore prefix in test file constants? My sense was that we were moving away from this pattern for all files except the main subpackage __init__.py files.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. We are moving away from prefixing most of the methods assuming all of them unless explicitly exposed via __init__.py are private. That being said, I think underscore prefix can still apply to the local constants and methods which are not supposed to be referenced outside individual python files (modules). The prefix helps linter to flag such access--although we don't always pay attention to the lint errors, we probably should in the future.

One example shown in this PR is that _component_human_name reference which shouldn't exist.

WDYT?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I like the underscore to discouraging cross-module object usage. LGTM.

return pipeline_decorator_handler(func) or func
else:
return func
return component_factory.create_graph_component_from_func(func)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, with this change, sample test code needs to be updated:

  File "/workdir/samples/test/utils/kfp/samples/test/utils.py", line 119, in test_wrapper
    pipeline_name = case.pipeline_func._component_human_name
AttributeError: 'GraphComponent' object has no attribute '_component_human_name'
Error: exit status 1

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! Updated.

sdk/python/kfp/components/component_factory.py Outdated Show resolved Hide resolved
return pipeline_decorator_handler(func) or func
else:
return func
return component_factory.create_graph_component_from_func(func)

return _pipeline

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's reasonable to not want to change too much at once here, but for a future PR, I wonder if the GraphComponent class eliminates the need to have this Pipeline class. It would certainly take some refactoring, but I don't think we need to get_default_pipeline anymore because the pipeline will be the GraphComponent being created. The GraphComponent object can handle registration of its tasks.

Curious what you think about this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think you're probably right. I'll give this a bit more thoughts and probably make the change in a follow up PR.

# It can be used by command-line DSL compiler to inject code that runs for every
# pipeline definition.
pipeline_decorator_handler = None


def pipeline(name: Optional[str] = None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to force this decorator to be called? Or enable both patterns as with @dsl.component? I see the error message for uncalled an @dsl.pipeline decorator was removed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I look back at the change, I realize the issue was missing the user provided pipeline name which is used as pipeline context. Enforcing the decorator to be called doesn't really enforce that users must provide value for the name parameter. So I think we should probably find another way to make sure we have pipeline name.
With the introduction of graph component, I don't see we have to enforce the name when the pipeline is defined because it can be used only as an inner step. So maybe we can just rely on check at submission time to make sure there's a name for pipeline context? WDYT?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't pipeline name fall back to a sanitized version of the pipeline function name? It's possible I'm misunderstanding somewhere else name is used. In the following example, I do not provide a pipeline name, yet there is one in IR:

from kfp import dsl
from kfp import compiler
import yaml


@dsl.component
def nothing():
    pass


@dsl.pipeline()
def my_pipeline():
    nothing()


output = 'out.yaml'
compiler.Compiler().compile(pipeline_func=my_pipeline, package_path=output)

with open(output, 'r') as f:
    ir = yaml.safe_load(f.read())

assert ir['pipelineInfo']['name'] == 'my-pipeline'

If I am correct, my current thinking is that irrespective of the introduction of graph components we should consider implementing the @dsl.pipeline decorator similarly to the @dsl.component decorator and not requiring that it be called when used.

This is somewhat unrelated to your PR (other than the removal of those tests) and I think this can be implemented in a separate PR if we go this route.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought we required users to provide a name explicitly because of it importance--being used for pipeline context. Maybe at some point our implementation added the fallback logic so that explicit name is no longer required.

Other than the name requirement, I can't think of any reason that we want to enforce it to be called.


Args:
root_group: The root group of a pipeline.
def merge_deployment_spec(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function causes a problem if there are multiple pipelines defined in a single Python file/execution, but only some of them are included in the final submitted pipeline. For example, try submitting my_pipeline:

from kfp import dsl


@dsl.component
def print_op1(msg: str) -> str:
    print(msg)
    return msg


@dsl.component
def print_op2(msg: str) -> str:
    print(msg)
    return msg


@dsl.pipeline(name='test-graph-component')
def graph_component(msg: str):
    task = print_op1(msg=msg)
    print_op2(msg=task.output)


@dsl.pipeline(name='test-graph-component')
def another_graph_comp(msg: str):
    task = print_op1(msg=msg)
    print_op2(msg=task.output)
    graph_component(msg='fifth')


@dsl.pipeline(name='pipeline-in-pipeline')
def my_pipeline():
    t = print_op1(msg='Hello')
    graph_component(msg='world')

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great insight! This is a critical miss. I updated the code to make sure the merging doesn't modify the "template" in place, and added test case for it.

@chensun
Copy link
Member Author

chensun commented Aug 24, 2022

/test kubeflow-pipelines-samples-v2

Copy link
Contributor

@zichuan-scott-xu zichuan-scott-xu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! There are only a few question/comments.

@@ -46,6 +46,7 @@
'two_step_pipeline_containerized',
'pipeline_with_multiple_exit_handlers',
'pipeline_with_parallelfor_parallelism',
'pipeline_in_pipeline',
],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! There are few comments:

  1. This might be irrevalent, but does this feature allow a component to call itself recursively? Or is it possible that the call stack is like A -> B -> C -> A?
  2. Maybe also put the two compiler tests in v2 sample tests as well to make sure they pass.

Copy link
Member

@connor-mccarthy connor-mccarthy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your updates after the initial review! Only remaining comments are about the release note and sample tests.

command:
- echo
- {inputValue: msg}
""")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍🏻 I like the three styles here

# TODO: remove accessing of protected member
pipeline_name = getattr(case.pipeline_func,
'_component_human_name',
case.pipeline_func.name)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI:

  File "/workdir/samples/test/utils/kfp/samples/test/utils.py", line 122, in test_wrapper
    case.pipeline_func.name)
AttributeError: 'function' object has no attribute 'name'

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm... guessing the default is evaluated first regardless if getattr find the attribute.

# Make a copy of the sub_pipeline_spec so that the "template" remains
# unchanged and works even the pipeline is reused multiple times.
sub_pipeline_spec_copy = pipeline_spec_pb2.PipelineSpec()
sub_pipeline_spec_copy.CopyFrom(sub_pipeline_spec)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍🏻

@google-oss-prow
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

Approval requirements bypassed by manually added approval.

This pull-request has been approved by:

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@google-oss-prow google-oss-prow bot merged commit 5ce4954 into kubeflow:master Aug 26, 2022
@chensun chensun deleted the graph-component branch August 30, 2022 09:53
jlyaoyuli pushed a commit to jlyaoyuli/pipelines that referenced this pull request Jan 5, 2023
* Implement Graph Component

* isort imports

* release note

* address review comments

* address review comments
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants