Define modular pipelines with config file #713

Open
bpmeek opened this issue Jun 4, 2024 · 2 comments

bpmeek commented Jun 4, 2024

Description

As a Kedro user I have always wanted to be able to define modular pipelines in a config file.

Context

I believe that doing so will reduce the likelihood of a user inadvertently impacting a pipeline other than the one intended when making changes to pipeline_registry.

Possible Implementation

I created PR #3904 in the Kedro core repository and @datajoely mentioned it might make more sense to make this a plugin rather than part of Kedro Core, but I'm unsure which directory it would belong in.

Possible Alternatives

@datajoely mentioned a possible alternative here

@astrojuanlu
Member

As a Kedro user I have always wanted to be able to define modular pipelines in a config file.

And you're not alone!

Your data team will go through this cycle, sorry I don’t make the rules

Christian Minich (@ChristianNolan@data-folks.masto.host), 4/25/2024, 5:41:48 PM

First of all, this doesn't have to start its life in kedro-org/kedro-plugins. I encourage you to create your plugin in your personal account, see for example https://github.com/astrojuanlu/kedro-init or https://github.com/noklam/kedro-softfail-runner. Happy to keep this issue open to help you make progress on that.

We don't have a plugin template yet (see kedro-org/kedro#2685), but to create the package you can start with https://github.com/astrojuanlu/copier-pylib (shameless self-plug) and take it from there.

This is just one idea of what the developer experience could look like:

$ kedro new ... && cd my_project && uv venv && source .venv/bin/activate  # assume (.venv) in all prompts
$ uv pip install -r requirements.txt
$ uv pip install kedro-yaml-pipelines  # Your plugin
$ kedro yaml-pipeline create data_processing  # Not particularly beautiful, open to ideas here
$ tree src/my_project/pipelines
src/my_project/pipelines/
├── __init__.py
├── data_processing
│   ├── __init__.py
│   ├── nodes.py
│   └── pipeline.yaml  # <------- The YAML definition!
$ # Or, alternatively, YAML pipelines are defined in a central location?
$ tree src/my_project
src/my_project/
├── __init__.py
├── __main__.py
├── hooks.py
├── pipeline_registry.py
├── pipelines
│   ├── __init__.py
│   ├── pipelines.yaml  # <---- All pipelines are defined here?
│   ├── data_processing.py  # <---- Maybe even no need for `nodes.py`?
$ # Make edits to nodes, YAML definition
$ kedro run
...
# Everything works as usual!
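To make the YAML idea more concrete, here is a minimal sketch of how a plugin might turn a parsed definition into node descriptions. Everything in it is an assumption for illustration: the `pipelines`/`nodes`/`func` schema, the `NodeSpec` class, and the helper names are hypothetical, and the example uses standard-library functions in place of a project's `nodes.py`. The dictionary stands in for what a YAML parser would return after reading `pipeline.yaml`.

```python
from dataclasses import dataclass
from importlib import import_module
from typing import Any, Callable, Dict, List, Union


@dataclass(frozen=True)
class NodeSpec:
    """Hypothetical stand-in for the arguments a plugin would hand to Kedro."""

    func: Callable[..., Any]
    inputs: Union[str, List[str], None]
    outputs: Union[str, List[str], None]
    name: str


def resolve_callable(dotted_path: str) -> Callable[..., Any]:
    """Import 'package.module.function' and return the function object."""
    module_name, _, attr = dotted_path.rpartition(".")
    if not module_name:
        raise ValueError(f"Expected 'module.function', got {dotted_path!r}")
    func = getattr(import_module(module_name), attr)
    if not callable(func):
        raise TypeError(f"{dotted_path!r} does not refer to a callable")
    return func


def load_node_specs(definition: Dict[str, Any]) -> Dict[str, List[NodeSpec]]:
    """Map each pipeline name in the (assumed) schema to its list of node specs."""
    registry: Dict[str, List[NodeSpec]] = {}
    for pipeline_name, body in definition.get("pipelines", {}).items():
        registry[pipeline_name] = [
            NodeSpec(
                func=resolve_callable(entry["func"]),
                inputs=entry.get("inputs"),
                outputs=entry.get("outputs"),
                name=entry["name"],
            )
            for entry in body["nodes"]
        ]
    return registry


# Assumed shape of a parsed pipelines.yaml; math.sqrt and math.floor are
# placeholders for functions that would normally live in nodes.py.
EXAMPLE_DEFINITION = {
    "pipelines": {
        "data_processing": {
            "nodes": [
                {"name": "root", "func": "math.sqrt",
                 "inputs": "raw_value", "outputs": "root_value"},
                {"name": "round_down", "func": "math.floor",
                 "inputs": "root_value", "outputs": "rounded_value"},
            ]
        }
    }
}

specs = load_node_specs(EXAMPLE_DEFINITION)
```

In a real plugin, each spec would presumably be passed to `kedro.pipeline.node(func, inputs, outputs, name=...)` and the resulting nodes wrapped in a `Pipeline`; that step is left out here so the sketch runs without Kedro installed.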

Now, the only blocker I see off the top of my head is the pipeline_registry.py. Maybe you could tell the user to slightly modify it as follows:

 from kedro.framework.project import find_pipelines
 from kedro.pipeline import Pipeline
 
+from kedro_yaml_pipelines.registry import find_pipelines as find_yaml_pipelines
+
 
 def register_pipelines() -> Dict[str, Pipeline]:
     """Register the project's pipelines.
@@ -12,5 +14,6 @@ def register_pipelines() -> Dict[str, Pipeline]:
         A mapping from pipeline names to ``Pipeline`` objects.
     """
     pipelines = find_pipelines()
+    pipelines.update(find_yaml_pipelines())
     pipelines["__default__"] = sum(pipelines.values())
     return pipelines

and hopefully this should be it.
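One thing worth deciding early is what happens when a YAML-defined pipeline has the same name as one discovered from Python code, since both lookups return dictionaries keyed by pipeline name and a plain merge lets one silently replace the other. A minimal sketch of a stricter merge follows; `merge_registries` is a hypothetical helper, not part of Kedro, and the strings in the example are stand-ins for `Pipeline` objects.

```python
from typing import Dict, TypeVar

P = TypeVar("P")  # would be kedro.pipeline.Pipeline in a real project


def merge_registries(
    python_pipelines: Dict[str, P], yaml_pipelines: Dict[str, P]
) -> Dict[str, P]:
    """Combine two registries, refusing to overwrite on a name clash."""
    clashes = sorted(set(python_pipelines) & set(yaml_pipelines))
    if clashes:
        raise ValueError(
            "Pipelines defined in both Python and YAML: " + ", ".join(clashes)
        )
    # Neither input dictionary is mutated; a new one is returned.
    return {**python_pipelines, **yaml_pipelines}


merged = merge_registries(
    {"data_science": "python-defined pipeline"},
    {"data_processing": "yaml-defined pipeline"},
)
```

Failing loudly on a clash fits the motivation of the original request, which is to avoid accidentally affecting a pipeline other than the one being edited. Whether a clash should be an error or a deliberate override is a design choice for the plugin author.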

Of course, there are lots of variations on this. The key points are:

  1. This is your plugin. So feel free to tailor the DX to your needs. Don't let us dictate how it should be - my suggestions above are just suggestions.
  2. We are happy to guide you on how to create such a plugin. Whether or not that becomes official is another story - but if we see enough traction, I think we should seriously consider it!

@astrojuanlu
Member

astrojuanlu commented Jun 5, 2024

Note that this departs from your original request, which was for pipelines.yml to live under conf/. It is based on the opinion I stated in kedro-org/kedro#3904 (comment), but again: your plugin, your rules :)
