-
Notifications
You must be signed in to change notification settings - Fork 1.7k
feat(sdk)!: Implement Graph Component #8179
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
/retest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@chensun, very nice to see this in action! My comments are mostly nitpicks, with a few questions/suggestions. Overall, I think adding more compiler test pipelines would be helpful for verifying compilation behavior.
sdk/RELEASE.md
Outdated
@@ -2,10 +2,13 @@ | |||
|
|||
## Major Features and Improvements | |||
* Support parallelism setting in ParallelFor [\#8146](https://github.com/kubeflow/pipelines/pull/8146) | |||
* Support pipeline as a component [\#8179](https://github.com/kubeflow/pipelines/pull/8179) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we wait to add this until adding support for pipeline outputs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(thought I added a reply to this but somehow it wasn't captured)
I think it's fine to add this release note. Pipeline output is a new feature independent from pipeline as component, although without pipeline outputs it limits the use case, this change alone unblocks use case like conditional email notification. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@chensun, WDYT about this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added my reply but I guess somehow it didn't shown up? Anyway, removed the release note for now.
sdk/python/kfp/compiler/test_data/pipelines/pipeline_in_pipeline.py
Outdated
Show resolved
Hide resolved
return msg | ||
|
||
|
||
print_op2 = components.load_component_from_text(""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this included for a v1-v2 compatibility test? If not, WDYT about using the preferred lightweight component authoring approach instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question. I included this to make sure the inputs/outputs placeholders are still parsed/resolved correctly. The lightweight component that use {{$}}
doesn't have this complex code path. I started this sample before @dsl.container_component
was ready. Now I can replace this to the new recommended approach to demo best practice.
I'll still keep a v1 yaml sample in the other more complex example to cover v1-v2 compatibility.
sdk/python/kfp/compiler/test_data/pipelines/pipeline_in_pipeline.py
Outdated
Show resolved
Hide resolved
@@ -29,13 +29,14 @@ | |||
from kfp.components import structures | |||
import yaml | |||
|
|||
PROJECT_ROOT = os.path.abspath(os.path.join(__file__, *([os.path.pardir] * 5))) | |||
_DEFAULT_PIPELINE_FUNC_NAME = 'my_pipeline' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
Why underscore prefix in test file constants? My sense was that we were moving away from this pattern for all files except the main subpackage __init__.py
files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. We are moving away from prefixing most of the methods assuming all of them unless explicitly exposed via __init__.py
are private. That being said, I think underscore prefix can still apply to the local constants and methods which are not supposed to be referenced outside individual python files (modules). The prefix helps linter to flag such access--although we don't always pay attention to the lint errors, we probably should in the future.
One example shown in this PR is that _component_human_name
reference which shouldn't exist.
WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. I like the underscore to discouraging cross-module object usage. LGTM.
return pipeline_decorator_handler(func) or func | ||
else: | ||
return func | ||
return component_factory.create_graph_component_from_func(func) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI, with this change, sample test code needs to be updated:
File "/workdir/samples/test/utils/kfp/samples/test/utils.py", line 119, in test_wrapper
pipeline_name = case.pipeline_func._component_human_name
AttributeError: 'GraphComponent' object has no attribute '_component_human_name'
Error: exit status 1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch! Updated.
return pipeline_decorator_handler(func) or func | ||
else: | ||
return func | ||
return component_factory.create_graph_component_from_func(func) | ||
|
||
return _pipeline | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's reasonable to not want to change too much at once here, but for a future PR, I wonder if the GraphComponent
class eliminates the need to have this Pipeline
class. It would certainly take some refactoring, but I don't think we need to get_default_pipeline
anymore because the pipeline will be the GraphComponent
being created. The GraphComponent
object can handle registration of its tasks.
Curious what you think about this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think you're probably right. I'll give this a bit more thoughts and probably make the change in a follow up PR.
# It can be used by command-line DSL compiler to inject code that runs for every | ||
# pipeline definition. | ||
pipeline_decorator_handler = None | ||
|
||
|
||
def pipeline(name: Optional[str] = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to force this decorator to be called? Or enable both patterns as with @dsl.component
? I see the error message for uncalled an @dsl.pipeline
decorator was removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I look back at the change, I realize the issue was missing the user provided pipeline name which is used as pipeline context. Enforcing the decorator to be called doesn't really enforce that users must provide value for the name
parameter. So I think we should probably find another way to make sure we have pipeline name.
With the introduction of graph component, I don't see we have to enforce the name when the pipeline is defined because it can be used only as an inner step. So maybe we can just rely on check at submission time to make sure there's a name for pipeline context? WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't pipeline name fall back to a sanitized version of the pipeline function name? It's possible I'm misunderstanding somewhere else name is used. In the following example, I do not provide a pipeline name, yet there is one in IR:
from kfp import dsl
from kfp import compiler
import yaml
@dsl.component
def nothing():
pass
@dsl.pipeline()
def my_pipeline():
nothing()
output = 'out.yaml'
compiler.Compiler().compile(pipeline_func=my_pipeline, package_path=output)
with open(output, 'r') as f:
ir = yaml.safe_load(f.read())
assert ir['pipelineInfo']['name'] == 'my-pipeline'
If I am correct, my current thinking is that irrespective of the introduction of graph components we should consider implementing the @dsl.pipeline
decorator similarly to the @dsl.component
decorator and not requiring that it be called when used.
This is somewhat unrelated to your PR (other than the removal of those tests) and I think this can be implemented in a separate PR if we go this route.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought we required users to provide a name explicitly because of it importance--being used for pipeline context. Maybe at some point our implementation added the fallback logic so that explicit name is no longer required.
Other than the name requirement, I can't think of any reason that we want to enforce it to be called.
|
||
Args: | ||
root_group: The root group of a pipeline. | ||
def merge_deployment_spec( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function causes a problem if there are multiple pipelines defined in a single Python file/execution, but only some of them are included in the final submitted pipeline. For example, try submitting my_pipeline:
from kfp import dsl
@dsl.component
def print_op1(msg: str) -> str:
print(msg)
return msg
@dsl.component
def print_op2(msg: str) -> str:
print(msg)
return msg
@dsl.pipeline(name='test-graph-component')
def graph_component(msg: str):
task = print_op1(msg=msg)
print_op2(msg=task.output)
@dsl.pipeline(name='test-graph-component')
def another_graph_comp(msg: str):
task = print_op1(msg=msg)
print_op2(msg=task.output)
graph_component(msg='fifth')
@dsl.pipeline(name='pipeline-in-pipeline')
def my_pipeline():
t = print_op1(msg='Hello')
graph_component(msg='world')
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great insight! This is a critical miss. I updated the code to make sure the merging doesn't modify the "template" in place, and added test case for it.
/test kubeflow-pipelines-samples-v2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! There are only a few question/comments.
@@ -46,6 +46,7 @@ | |||
'two_step_pipeline_containerized', | |||
'pipeline_with_multiple_exit_handlers', | |||
'pipeline_with_parallelfor_parallelism', | |||
'pipeline_in_pipeline', | |||
], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! There are few comments:
- This might be irrevalent, but does this feature allow a component to call itself recursively? Or is it possible that the call stack is like A -> B -> C -> A?
- Maybe also put the two compiler tests in v2 sample tests as well to make sure they pass.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your updates after the initial review! Only remaining comments are about the release note and sample tests.
command: | ||
- echo | ||
- {inputValue: msg} | ||
""") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍🏻 I like the three styles here
# TODO: remove accessing of protected member | ||
pipeline_name = getattr(case.pipeline_func, | ||
'_component_human_name', | ||
case.pipeline_func.name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI:
File "/workdir/samples/test/utils/kfp/samples/test/utils.py", line 122, in test_wrapper
case.pipeline_func.name)
AttributeError: 'function' object has no attribute 'name'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm... guessing the default is evaluated first regardless if getattr
find the attribute.
# Make a copy of the sub_pipeline_spec so that the "template" remains | ||
# unchanged and works even the pipeline is reused multiple times. | ||
sub_pipeline_spec_copy = pipeline_spec_pb2.PipelineSpec() | ||
sub_pipeline_spec_copy.CopyFrom(sub_pipeline_spec) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍🏻
[APPROVALNOTIFIER] This PR is APPROVED Approval requirements bypassed by manually added approval. This pull-request has been approved by: The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
* Implement Graph Component * isort imports * release note * address review comments * address review comments
Description of your changes:
@pipeline
decorator.Checklist: