Add kedro run --namespace __default__ and kedro run --namespace a,b #3056

Open
noklam opened this issue Sep 21, 2023 · 2 comments
Labels
Issue: Feature Request New feature or improvement to existing feature

Comments

@noklam
Contributor

noklam commented Sep 21, 2023

Description

See full discussion here.

I have a data_processing pipeline from the spaceflights starter (no namespaces).
I have multiple modular data_science pipelines (namespaced).
I want to run data_processing (without a namespace) plus one of the data_science pipelines (with a single namespace). (edited)

User @marrrcin asks for two things:

  1. To run a pipeline without a namespace.
  2. To specify multiple namespaces.
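
As a rough illustration of how both requests could share one option value, the comma-separated form from the issue title might be parsed as below. This is only a sketch: `parse_namespaces` and the `DEFAULT_NAMESPACE` constant are hypothetical names, not part of Kedro's CLI.

```python
from typing import List

# Hypothetical sentinel meaning "nodes without a namespace", as in the issue title.
DEFAULT_NAMESPACE = "__default__"


def parse_namespaces(value: str) -> List[str]:
    """Split a comma-separated option value into unique namespace names.

    Whitespace around names is ignored, empty entries are dropped,
    and the original order is preserved.
    """
    names: List[str] = []
    for part in value.split(","):
        name = part.strip()
        if name and name not in names:
            names.append(name)
    return names


if __name__ == "__main__":
    # e.g. the value of a hypothetical `--namespace __default__,namespace_a`
    print(parse_namespaces("__default__,namespace_a"))
```

Keeping the sentinel as an ordinary list entry means a later filtering step can decide what `__default__` means, rather than the parser.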

Context

Kedro can run a specific namespace, but not the opposite: there is no option to select only the nodes that have no namespace. To run only the pipeline without a namespace, you have to edit pipeline_registry.py, which is less flexible than a command-line option.

In addition, users would like to specify multiple namespaces. Consider a "base" pipeline without a namespace plus many independent namespaced pipelines (for example, model pipelines).

The user wants to run:

  • kedro run --namespace without_namespace & namespace_a
  • kedro run --namespace without_namespace & namespace_b
  • kedro run --namespace without_namespace & namespace_c

Or different combinations of them. Even though the pipelines may be independent, it is beneficial to run them in a single kedro run because the user can leverage ParallelRunner to speed up execution.
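
The selection described above could look roughly like the sketch below. It uses a minimal stand-in `Node` dataclass rather than Kedro's own classes, and `select_nodes` and `DEFAULT_NAMESPACE` are hypothetical names chosen for illustration. It also assumes a namespace matches on a dot boundary (`namespace_a` matches `namespace_a.tuning` but not `namespace_ab`), which is a design choice of this sketch.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional

# Hypothetical sentinel for "nodes without a namespace".
DEFAULT_NAMESPACE = "__default__"


@dataclass(frozen=True)
class Node:
    """Stand-in for a pipeline node; only the name and namespace matter here."""

    name: str
    namespace: Optional[str] = None


def select_nodes(nodes: Iterable[Node], namespaces: Iterable[str]) -> List[Node]:
    """Keep nodes whose namespace is in `namespaces`.

    The sentinel DEFAULT_NAMESPACE selects nodes that have no namespace.
    """
    wanted = set(namespaces)
    include_default = DEFAULT_NAMESPACE in wanted
    prefixes = wanted - {DEFAULT_NAMESPACE}

    selected: List[Node] = []
    for n in nodes:
        if n.namespace is None:
            # Un-namespaced nodes are only kept when the sentinel was requested.
            if include_default:
                selected.append(n)
        elif any(n.namespace == p or n.namespace.startswith(p + ".") for p in prefixes):
            selected.append(n)
    return selected


if __name__ == "__main__":
    nodes = [
        Node("preprocess"),
        Node("train_a", "namespace_a"),
        Node("train_b", "namespace_b"),
    ]
    # "base" pipeline plus one namespaced pipeline
    print([n.name for n in select_nodes(nodes, [DEFAULT_NAMESPACE, "namespace_a"])])
```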

Possible Implementation

Possible Alternatives

@noklam noklam added the Issue: Feature Request New feature or improvement to existing feature label Sep 21, 2023
@noklam noklam changed the title kedro run --namespace __default__ and kedro run --namespace a,b Add kedro run --namespace __default__ and kedro run --namespace a,b Sep 21, 2023
@noklam
Contributor Author

noklam commented Sep 21, 2023

One possible solution:

def only_nodes_with_namespace(self, node_namespace: str) -> Pipeline:
    """Creates a new ``Pipeline`` containing only nodes with the specified
    namespace.
    Args:
        node_namespace: One node namespace.
    Raises:
        ValueError: When pipeline contains no nodes with the specified namespace.
    Returns:
        A new ``Pipeline`` containing nodes with the specified namespace.
    """
    nodes = [
        n
        for n in self.nodes
        if n.namespace and n.namespace.startswith(node_namespace)
    ]
    if not nodes:
        raise ValueError(
            f"Pipeline does not contain nodes with namespace '{node_namespace}'"
        )
    return Pipeline(nodes)

We can assign a default namespace such as __default__ (similar to the __default__ pipeline).
Pro:

  • Easy to implement, with little change to how namespaces work

Con:

  • It's hacky, and Kedro needs to treat __default__ specially
  • Downstream applications such as kedro-viz need to handle this too
  • Not consistent with the namespace pattern, because in the catalog and parameters you will have entries without the prefix.

This is a bad implementation but

@marrrcin
Contributor

before/after_pipeline_filtered would also do the job here 🤪 #3000
