
[To discuss] @node decorator syntax #2471

Closed
merelcht opened this issue Mar 27, 2023 · 7 comments

@merelcht (Member)

Introduction

ChatGPT once suggested this syntax to a user

[screenshot: ChatGPT-suggested @node decorator syntax]

Is this a syntax we'd like to add to Kedro?

merelcht changed the title from "To Discuss: @node decorator syntax" to "[To discuss] @node decorator syntax" on Mar 29, 2023
@jasonmhite (Contributor)

Some folks on my team have been doing this for a while and it's pretty ergonomic. A basic implementation is trivial:

from kedro.pipeline import node

def make_node(**kwargs):
    # Return a decorator that wraps a function in a Kedro node,
    # forwarding inputs/outputs/name/tags etc. via **kwargs.
    return lambda func: node(func=func, **kwargs)
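
For illustration, usage might then look like this (the dataset names and pandas-style function body are hypothetical). Note that the decorated name is bound to the resulting Node object, not to the plain function:

from kedro.pipeline import pipeline

@make_node(inputs="companies", outputs="preprocessed_companies", name="preprocess")
def preprocess_companies(companies):
    return companies.dropna()

# The resulting node can be collected into a pipeline as usual.
demo_pipeline = pipeline([preprocess_companies])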

@antonymilne (Contributor)

Copying here my comments from #2408.

As noted already, there's a problem here if you want to reuse the same function in multiple nodes and when you start using modular pipelines. These are not insurmountable with the decorator approach but it does get cumbersome.

That said, I actually think there's potentially a lot of value in the decorator method for simple projects. Just like auto-registration of pipelines, anything we can do to make it easier for people to make a simple project is good in my book. Certainly just slapping a node decorator on a function is easier than needing to make a pipeline.py file and probably sufficient in lots of cases.

In fact, a while ago there was a QB tool that did something similar and took the simplification even further:

  • use the name of the function arguments as dataset names by default (which is very common practice already), so you don't even need to write out the inputs mapping unless it's something other than the default
  • use the name of the function to generate the output(s?) name by default, so you don't even need to write out the outputs mapping unless it's something other than the default
  • automatically detect all functions decorated with node so that you don't even need to write a pipeline.py file at all
  • in cases where I don't want to change input/output name or apply tags etc., there's in fact no need to use the decorator at all - go through and automatically detect node functions by finding functions with a certain pattern in their name like node_xyz and automatically make the pipeline based on this

I actually think there's a lot to be said for at least some aspects of this approach, although it won't ever replace the current system.
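
A minimal sketch of the name-inference idea, for illustration only (this is not the QB tool's actual implementation): default inputs to the function's argument names and outputs to the function name, as a hypothetical auto_node helper.

import inspect

from kedro.pipeline import node

def auto_node(func=None, **kwargs):
    # Hypothetical helper: fill in `inputs` from the function's argument
    # names and `outputs` from the function name, unless overridden.
    def wrap(f):
        kwargs.setdefault("inputs", list(inspect.signature(f).parameters) or None)
        kwargs.setdefault("outputs", f.__name__)
        return node(func=f, **kwargs)
    return wrap(func) if func is not None else wrap

@auto_node
def model_input_table(companies, shuttles):
    # Reads the "companies" and "shuttles" datasets and produces a
    # dataset named "model_input_table", with no explicit mapping.
    return companies.merge(shuttles, on="id")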

@noklam (Contributor) commented May 4, 2023

> use the name of the function arguments as dataset names by default (which is very common practice already), so you don't even need to write out the mapping of the input unless it's something other than the default

This is a reasonable default, but I suspect it is less scalable: you are likely to use generic variable names like df, and these names collide.
In addition, I use CAPITAL letters for persisted datasets as a convention (I think I have seen similar patterns elsewhere), to differentiate them from the MemoryDataSets that Kedro generates automatically.
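
For example (dataset and function names hypothetical), the convention makes the persisted datasets stand out in a pipeline definition:

from kedro.pipeline import node, pipeline

def clean(raw):
    return raw

def summarise(companies):
    return companies

# CAPITALISED names would be registered in the catalog and persisted;
# lowercase names fall back to auto-generated MemoryDataSets.
naming_demo = pipeline([
    node(clean, inputs="RAW_COMPANIES", outputs="companies_clean"),
    node(summarise, inputs="companies_clean", outputs="COMPANY_SUMMARY"),
])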

> use the name of the function to generate the output(s?) name by default, so you don't even need to write out the outputs mapping unless it's something other than the default

> automatically detect all functions decorated with node so that you don't even need to write a pipeline.py file at all

This is my favourite part. It's sometimes annoying to search across more than ten pipeline.py files.

> in cases where I don't want to change input/output name or apply tags etc., there's in fact no need to use the decorator at all - go through and automatically detect node functions by finding functions with a certain pattern in their name like node_xyz and automatically make the pipeline based on this

@jasonmhite (Contributor)

Definitely agree with @antonymilne: this functionality would not replace the existing scheme, just act as sugar for what I suspect is a decent chunk of use cases. The ability to automatically infer names and arguments would be slick.

@noklam (Contributor) commented Jun 3, 2023

Coming back to this again: it may be quite useful for smoothing the Kedro learning curve.

I have been teaching some people about Kedro lately, and sometimes they get stuck on how to write a node. Then I ask: "forget about Kedro, how would you write it in a normal Python script?" Suddenly they can write the code properly and then just map it to Kedro.

I suspect this happens more for beginners when they try to jump too fast into thinking about nodes. @astrojuanlu have you ever encountered this? Is it worth checking with other people who teach Kedro?

@astrojuanlu (Member)

> Hi @astrojuanlu, I thought of the decorator syntax but didn't find it satisfactory for two reasons:
>
>   • It conflates functions and nodes. I have functions that are reused in different pipelines (and want to keep open the ability to reuse others in new pipelines). Tying functions to specific inputs and outputs makes reuse harder.
>   • It reduces readability, because the pipeline is now scattered across different places. To grasp a pipeline, one would need to match input/output dataset names and scroll up and down to collect the nodes in their head. Whereas with the proposed pipeline builder, everything is in one block and one can understand a pipeline at a glance.

Originally posted by @mle-els in #2726 (comment)
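
To make the first concern concrete, here is a hypothetical illustration (function and dataset names invented): with plain node() calls the same function can be wired to different datasets, whereas a decorator binds the name to a single Node.

from kedro.pipeline import node

def clean(df):
    return df.dropna()

# Plain function: reusable across nodes and pipelines.
clean_companies = node(clean, inputs="companies", outputs="clean_companies")
clean_shuttles = node(clean, inputs="shuttles", outputs="clean_shuttles")

# With a decorator such as make_node above, `clean` itself would become
# one Node tied to one set of inputs/outputs and could not be re-wired.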

@astrojuanlu (Member)

I think it's safe to say that the current approach will not go away for the time being, so the question is whether to maintain this as an alternative, recommended approach. From that perspective, it looks like a lot of documentation and maintenance effort, plus the potential confusion of users having to pick between two approaches, all for little gain. There have been some very valid concerns about this approach, most importantly that it blends functions with nodes, which hampers reusability of those functions by coupling them to Kedro.

I'm closing this issue; if you disagree and think this deserves further discussion, feel free to leave a comment.

astrojuanlu closed this as not planned on Nov 25, 2023