Namespaced Nodes #3679

Open
lordsoffallen opened this issue Mar 4, 2024 · 7 comments


@lordsoffallen
Contributor

lordsoffallen commented Mar 4, 2024

Description

I was under the impression that when I namespace a node, I could run it via kedro run --nodes mynode --namespace ns. However, Kedro always looks for ns.mynode, so passing just the node name is not enough. Shouldn't the behavior be the other way around? As a user, I shouldn't have to know about Kedro's special syntax for namespacing, so passing the namespace should take care of the rest, no?
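To make the request concrete, here is a minimal sketch in plain Python of the name resolution being asked for. It does not use or reproduce Kedro's code: qualify_node_names is a hypothetical helper, and the registered set simply assumes that namespaced nodes are stored under their dotted names, as described above.

```python
# Hypothetical helper, not part of Kedro: models the requested CLI behaviour.
def qualify_node_names(node_names, namespace=None):
    """Prefix bare node names with the namespace; leave qualified names alone."""
    if namespace is None:
        return list(node_names)
    prefix = f"{namespace}."
    return [n if n.startswith(prefix) else prefix + n for n in node_names]


# Assumed registry: names as they look once a namespace has been applied.
registered = {"ns.mynode", "ns.othernode"}

requested = ["mynode"]
print(set(requested) <= registered)                            # False: bare name is not found
print(set(qualify_node_names(requested, "ns")) <= registered)  # True: qualified name is found
```

With something like this in place, --nodes mynode --namespace ns and --nodes ns.mynode would select the same node.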

Please correct the issue type if it's not considered a bug.

@datajoely
Contributor

This is a good push!

I think the feature has been designed with the mindset:

'I want to run all nodes within this namespace'

The workflow you described is more of a combinatorial filter (boolean AND), and I think it makes a lot of sense. We should do more to support it.
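A rough sketch of what a combinatorial (boolean AND) filter could look like, written in plain Python. NodeInfo and select are hypothetical stand-ins, not Kedro classes, and the namespace comparison is an exact match for simplicity.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NodeInfo:
    """Toy stand-in for a pipeline node; not Kedro's Node class."""
    name: str
    namespace: Optional[str] = None
    tags: frozenset = frozenset()


def select(nodes, names=None, namespace=None, tags=None):
    """Keep only nodes that satisfy every filter that was supplied (AND)."""
    selected = []
    for n in nodes:
        if names is not None and n.name not in names:
            continue
        if namespace is not None and n.namespace != namespace:
            continue
        if tags is not None and not (n.tags & set(tags)):
            continue
        selected.append(n)
    return selected


nodes = [
    NodeInfo("myfunc", "1"),
    NodeInfo("myfunc", "2"),
    NodeInfo("train", None, frozenset({"model"})),
]

# Name AND namespace together narrow the selection to a single node.
picked = select(nodes, names={"myfunc"}, namespace="1")
print([f"{n.namespace}.{n.name}" for n in picked])  # ['1.myfunc']
```

Each filter that is left as None is simply ignored, so the single-filter workflows keep behaving as before.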

@lordsoffallen
Contributor Author

It would be nice to support this case, as it would reduce some duplication. I use namespaces to connect different data sources to the same functions. I could give the nodes different names, but namespacing looks much better in terms of name consistency.

@noklam
Contributor

noklam commented Mar 8, 2024

@datajoely is correct that it's more like an AND. There are more use cases that are only possible with the Pipeline API and not the CLI, and this is one example. On the other hand, it seems very few people structure pipelines in a way that requires complex filtering (I asked about this in Slack but got very few responses). From my observation, people usually use one of the filters and rarely combine several at the same time, e.g. kedro run -p <pipeline> or kedro run --tags <tag1,tag2>.

Can you elaborate on how your pipelines/nodes are structured? I cannot wrap my head around it yet. I think there is value in enabling this, but it shouldn't disrupt the existing workflow, as that seems to work for most people.

@lordsoffallen
Contributor Author

When I think about namespaces, I generally think of k8s. Basically, the same nodes/pipelines can be stored under different namespaces. This helps run flows where the user applies the same function to different input/output datasets.

Imagine you have a function that filters some data on a column. The same column exists in other datasets, so you want to use the same function for another dataset. I can definitely do it by giving the nodes different names, like:

node(myfunc, inputs='data1', outputs='out1', name='myfunc1')
node(myfunc, inputs='data2', outputs='out2', name='myfunc2')

I just figured namespaces would provide this without me having to name the nodes differently. So I expected this instead:

node(myfunc, inputs='data1', outputs='out1', name='myfunc', namespace='1')
node(myfunc, inputs='data2', outputs='out2', name='myfunc', namespace='2')

This way I could filter out the other nodes just by using the namespace.

Regarding the -p pipeline filter, I mainly use it to group certain nodes together. For instance, data processing nodes go in the data pipeline and model logic goes in the model pipeline, which lets me run each group with a single argument.

@noklam
Contributor

noklam commented Apr 4, 2024

@lordsoffallen I think I understand where this is coming from, but I am not sure it is clearly better.

node(myfunc, inputs='data1', outputs='out1', name='myfunc', namespace='a')
node(myfunc, inputs='data2', outputs='out2', name='other_function', namespace='b')

How should the CLI work in an unambiguous way?

kedro run --nodes a.myfunc,b.other_function
kedro run --nodes ??? --namespace a,b

One may argue it's clearly bad to apply multiple namespaces here. But then it's perfectly reasonable to select multiple namespaces (#3056):

kedro run --namespace a,b (This is actually not supported yet, I don't know why)
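The ambiguity can be made concrete with a small plain-Python sketch. resolve is a hypothetical function, not anything in Kedro; it only shows which qualified names a bare name could map to when several namespaces are passed.

```python
def resolve(node_names, namespaces, registered):
    """Map each bare name to the qualified names that exist in the given namespaces."""
    matches = {}
    for name in node_names:
        candidates = [f"{ns}.{name}" for ns in namespaces]
        matches[name] = [c for c in candidates if c in registered]
    return matches


# The two nodes from the example above.
registered = {"a.myfunc", "b.other_function"}
print(resolve(["myfunc", "other_function"], ["a", "b"], registered))
# {'myfunc': ['a.myfunc'], 'other_function': ['b.other_function']}

# If both namespaces reuse the same node name, one bare name matches twice.
reused = {"a.myfunc", "b.myfunc"}
print(resolve(["myfunc"], ["a", "b"], reused))
# {'myfunc': ['a.myfunc', 'b.myfunc']}
```

A CLI built on this would still have to decide two things: whether a name found in only some of the namespaces is an error, and whether a name found in several should run in all of them.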

noklam added this to the "Something about namespace" milestone Apr 4, 2024

@lordsoffallen
Contributor Author

@noklam Your example involves different function names. That case wouldn't require a new namespace IMO, but if a user did that, the following should work:

kedro run --nodes myfunc --namespace a
kedro run --nodes other_function --namespace b
or run both:
kedro run --namespace a,b

For me, namespacing should be similar to how it works in k8s, if that makes sense.

Ideally, running the same function with different inputs/outputs would be super useful to support.

@noklam
Contributor

noklam commented Apr 10, 2024

@lordsoffallen would you mind pointing me to some k8s resources related to this? I am not very familiar with it.

I do think they are conceptually different things. The first example, where you run the nodes separately, will require two runs. We cannot assume two namespaces are completely separate, so the second example you give is only correct for some cases.
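As an illustration of why two namespaces cannot be assumed independent, here is a plain-Python sketch that looks for datasets produced in one namespace and consumed in another. The node layout is a hypothetical variation on the earlier example (here b.other_function reads out1), and cross_namespace_links is not a Kedro function.

```python
def namespace_of(qualified_name):
    """Return the part before the last dot, or '' for an un-namespaced node."""
    return qualified_name.rpartition(".")[0]


def cross_namespace_links(nodes):
    """Return (producer, consumer, dataset) triples where a dataset crosses namespaces."""
    producers = {}
    for name, spec in nodes.items():
        for dataset in spec["outputs"]:
            producers[dataset] = name
    links = []
    for name, spec in nodes.items():
        for dataset in spec["inputs"]:
            producer = producers.get(dataset)
            if producer and namespace_of(producer) != namespace_of(name):
                links.append((producer, name, dataset))
    return links


# Hypothetical layout: namespace b consumes what namespace a produces.
nodes = {
    "a.myfunc": {"inputs": ["data1"], "outputs": ["out1"]},
    "b.other_function": {"inputs": ["out1"], "outputs": ["out2"]},
}
print(cross_namespace_links(nodes))  # [('a.myfunc', 'b.other_function', 'out1')]
```

In a layout like this, running namespace b on its own only works if out1 already exists from an earlier run of a, so the two selections are not interchangeable.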


3 participants