
Nested parameters update via command line / config file #750

Closed
Tracked by #1963
noklam opened this issue Apr 19, 2021 · 9 comments
Labels: Issue: Feature Request (new feature or improvement to existing feature)

Comments

@noklam
Contributor

noklam commented Apr 19, 2021

Description

Currently, when passing a parameter to override configuration, e.g.
kedro run --params key:value, only top-level keys are supported, not nested dictionaries. This is not flexible for experiments.

Similarly, overriding via a config file with kedro run --config=config.yml suffers from the same top-level-keys-only problem.

Given this parameters.yml:

# parameters.yml
group1:
   key1: 1
   key2: 2
   key3: 3

Ideally, overriding the nested key (e.g. kedro run --params group1.key3:4) should produce:

# effective parameters
group1:
   key1: 1
   key2: 2
   key3: 4

Current behavior (the top-level key is replaced wholesale, so key1 and key2 are lost):

# effective parameters
group1:
   key3: 4

Related #605

Context

Many parameters are organized in groups rather than as top-level keys.

Possible Implementation

Maybe a simple nested dict update is enough; I am not sure.
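
For illustration, a minimal sketch in plain Python (not existing Kedro API) of what such a nested update could look like:

def update_nested(base: dict, override: dict) -> dict:
    # Recurse into dict values instead of replacing them wholesale.
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(base.get(key), dict):
            update_nested(base[key], value)
        else:
            base[key] = value
    return base


params = {"group1": {"key1": 1, "key2": 2, "key3": 3}}
update_nested(params, {"group1": {"key3": 4}})
# -> {"group1": {"key1": 1, "key2": 2, "key3": 4}}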


noklam added the Issue: Feature Request label on Apr 19, 2021
@antonymilne
Contributor

Hello @noklam, I totally understand the need for this and agree that there should be a better mechanism for overwriting a nested parameter. But I'm not sure how easy it is to resolve just this without some more holistic solution that reworks how configuration is handled more generally, and as per @lorenabalan's comment in #605 there's a bit of a conversation ongoing about this (N.B. @datajoely).

If we allow kedro run --params key.subkey:value to overwrite a nested parameter then it immediately raises the question of why we don't allow this sort of nested overwrite in configuration environments as you wanted to do in #605. So while it sounds like a very useful feature to add, it leads to inconsistency between the CLI and the parameters.yml approaches to specifying configuration. In an ideal world you would be able to do this sort of thing from both CLI and parameters.yml, but that is part of the larger conversation I think (and in addition to overwriting keys should also consider things like deleting and adding keys).

This nested overwriting behaviour is already possible through use of TemplatedConfigLoader and globals. I know that's a bit hacky and not a nice solution, but at least it is possible to currently do this. So my opinion would tend to be: very nice idea, I 100% agree we need a better solution than we currently have, but I think it needs to come as part of the complete solution rather than just implementing part of it that only works using the CLI.
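
For reference, that workaround looks roughly like this; a sketch assuming the Kedro 0.17-era register_config_loader hook (file names and values are illustrative):

# conf/base/globals.yml
key3_value: 3

# conf/base/parameters.yml
group1:
   key1: 1
   key2: 2
   key3: ${key3_value}

# hooks.py
from kedro.config import TemplatedConfigLoader
from kedro.framework.hooks import hook_impl


class ProjectHooks:
    @hook_impl
    def register_config_loader(self, conf_paths):
        # Fill ${...} placeholders from any *globals.yml found in the conf paths.
        return TemplatedConfigLoader(conf_paths, globals_pattern="*globals.yml")

A conf/local/globals.yml (or another environment) can then override key3_value alone, leaving the rest of group1 intact.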

@noklam
Contributor Author

noklam commented Apr 19, 2021

@AntonyMilneQB Thank you for your prompt response. I agree with what you said, it leads to inconsistency, but I really hope the Kedro team may consider adding this. TemplatedConfigLoader only works in a very limited context.

Here is one thing that I want to do: a dynamic pipeline (which isn't easy to do in Kedro). Instead of generating pipelines in code, I want to drive a single pipeline from the CLI.

I have a machine learning model pipeline which takes weeks t + d to t + 4 + d as training data, then predicts the following week. Say I want to do a rolling prediction for every week:

Train: t + d to t + 4 + d, Predict: week t + 4 + d, where d = 1
Train: t + d to t + 4 + d, Predict: week t + 4 + d, where d = 2
Train: t + d to t + 4 + d, Predict: week t + 4 + d, where d = 3
Train: t + d to t + 4 + d, Predict: week t + 4 + d, where d = 4
Train: t + d to t + 4 + d, Predict: week t + 4 + d, where d = 5

As a workaround, to avoid re-writing the pipeline for dynamic pipeline generation, I can do this instead:

kedro run --params d:1
kedro run --params d:2
kedro run --params d:3
kedro run --params d:4
kedro run --params d:5

I can just paste this into a terminal and it will run the 5 pipelines for me. Currently this only works for top-level keys, so I have to promote my sub-key to a top-level key to make this hack work. This kind of "dynamic" pipeline generation is very common for machine learning tasks. At first, I tried to re-write the pipeline to generate pipelines on the fly; over time, I found this CLI style easier to understand and maintain.

I would really love to hear the Kedro team's opinion on the best/common way of handling these kinds of situations with Kedro. So far, every solution I have found online for dynamic pipelines is fragile, easily breaks compatibility between Kedro versions, and makes the code unreadable and less reusable.

@antonymilne
Contributor

The "official" kedro viewpoint is basically that pipelines should be static (see #650). At the moment I think there are a couple of options, but I 100% do sympathise with you here and think we should have a better solution for this sort of thing. Interested to hear what other people think, but my suggestions for now would be as follows.

Modular pipelines (docs)

Generate each pipeline with a different namespace. In pipeline_registry.py, something like this:

from typing import Dict

from kedro.pipeline import Pipeline, pipeline

from my_project.pipelines.model import create_pipeline  # wherever your base pipeline lives


def register_pipelines() -> Dict[str, Pipeline]:
    base_pipeline = create_pipeline()
    pipelines = {}
    for delta in range(1, 4):
        pipelines[f"d_{delta}"] = pipeline(
            base_pipeline,
            # Remap the parameter input so each namespaced copy reads its own value
            parameters={"params:key.subkey.subsubkey_0": f"params:key.subkey.subsubkey_{delta}"},
            namespace=f"d_{delta}",
        )
    return pipelines

And then in parameters.yml:

key:
    subkey:
        subsubkey_0: 0
        subsubkey_1: 1
        subsubkey_2: 2
        subsubkey_3: 3

And then do kedro run --pipeline=d_1, for example. This is kind of hacky, I know. But it is quite flexible, as you can add together pipelines in your pipeline_registry to define, e.g. a pipeline that runs both delta=1 and delta=2. It also means that you can overwrite an arbitrarily nested parameter, so long as you refer to it directly as a node input.
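
For instance (Pipeline objects support +, so combined pipelines can be registered too):

pipelines["d_1_and_2"] = pipelines["d_1"] + pipelines["d_2"]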

Custom config loader

Depending on exactly what you want to do, this is maybe the cleanest solution. It's kind of a more "proper" implementation of the TemplatedConfigLoader workaround. Write your own ConfigLoader that merges parameter dictionaries rather than overwriting top level keys, and then run using a different environment for each delta. Note this new ConfigLoader potentially creates its own problems though (e.g. you can't delete keys without introducing a new syntax for that). @SteveLerQB wrote something for this on a client project and it worked pretty well I think.
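
A rough sketch of such a merging ConfigLoader, assuming the Kedro 0.17-era API (ConfigLoader takes a list of conf paths and .get() takes glob patterns):

from typing import Any, Dict

from kedro.config import ConfigLoader, MissingConfigException


def _deep_merge(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    # Merge nested dicts recursively; scalar values in `override` win.
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = _deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


class MergingConfigLoader(ConfigLoader):
    """Deep-merges config across environments instead of replacing top-level keys."""

    def get(self, *patterns: str) -> Dict[str, Any]:
        merged: Dict[str, Any] = {}
        for conf_path in self.conf_paths:  # base first, then environment overrides
            try:
                config = ConfigLoader([conf_path]).get(*patterns)
            except MissingConfigException:
                continue  # this environment has no files matching the patterns
            merged = _deep_merge(merged, config)
        return merged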

To get this working from the CLI as well as parameters.yml across different environments, it looks like just making a new ConfigLoader wouldn't be enough; you'd additionally need to overwrite KedroContext.params as you did in your PR. Actually, it's worth noting that altering KedroContext is something you can easily do, since you can set your own context class using CONTEXT_CLASS in settings.py. So you could already achieve what you're doing in #751 that way, without needing a new ConfigLoader.
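
A sketch of that context override (hypothetical, assuming the 0.17-era KedroContext, where self._extra_params holds the --params overrides and the stock params property applies them as a top-level dict.update):

# settings.py would set: CONTEXT_CLASS = MergingContext

from kedro.framework.context import KedroContext


class MergingContext(KedroContext):
    @property
    def params(self):
        # Deep-merge the CLI --params overrides instead of replacing
        # top-level keys; reuses _deep_merge from the sketch above.
        params = self.config_loader.get("parameters*", "parameters*/**")
        return _deep_merge(params, self._extra_params or {})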

@noklam
Contributor Author

noklam commented Apr 20, 2021

@AntonyMilneQB Thanks, I like this solution better. Just one thing.

Is there a way to define the 1 and 4 in parameters.yml and pass them to the pipeline, instead of hard-coding them in the code?
for delta in range(1, 4):

@antonymilne
Contributor

I think the basic answer is no, sorry. To do this you would need to access the catalog, which is currently only possible through the context. Registration of pipelines (through configure_project) happens before this, so it won't have the current session or context available yet. In the future it should be possible to load the catalog without needing a context (as you can now do for pipelines with from kedro.framework.project import pipelines), but not yet. Even when it becomes possible, I don't think using the catalog when registering pipelines will be a generally recommended workflow.

For the time being I think you'd need to either manually load the parameters you want from parameters.yml inside register_pipelines or inject them some other way (e.g. os.getenv).
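
A sketch of that manual route (file path, key, and environment variable names are illustrative):

# pipeline_registry.py
import os
from pathlib import Path

import yaml


def _n_deltas() -> int:
    # Environment variable wins; otherwise fall back to reading the YAML
    # file directly, outside the Kedro config system.
    if "N_DELTAS" in os.environ:
        return int(os.environ["N_DELTAS"])
    params = yaml.safe_load(Path("conf/base/parameters.yml").read_text())
    return params.get("n_deltas", 4)

register_pipelines can then loop over range(1, _n_deltas()) instead of a hard-coded range.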

@noklam
Contributor Author

noklam commented Apr 20, 2021 via email

@antonymilne
Contributor

antonymilne commented Apr 21, 2021

> In older Kedro versions, def create_pipeline(**kwargs): actually takes a keyword argument. In later versions, it is removed.

As far as I can tell this still takes **kwargs. What might have changed is that the kedro new template no longer produces any pipelines, since with the addition of modular pipelines these are now made by kedro pipeline create.

> Can you elaborate on why it is not recommended? So you think these "meta-parameters" about the pipeline creation should be kept in separate files?

I think Ivan's post here gives quite a nice explanation of it. Basically it boils down to "pipelines are static and do not change after you deploy your code".

With that said, my personal opinion is actually that sometimes dynamic pipeline creation makes sense. In these cases I think passing arguments into create_pipeline is a good way to do it. Combined with modular pipelines, this is quite powerful and flexible, and you should be able to achieve whatever you need once you've got the "meta-parameters" in the right place.
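
For example (a sketch; n_deltas and make_delta_pipeline are hypothetical names):

from kedro.pipeline import Pipeline


def create_pipeline(**kwargs) -> Pipeline:
    n_deltas = kwargs.get("n_deltas", 5)
    pipe = Pipeline([])
    for d in range(1, n_deltas + 1):
        pipe += make_delta_pipeline(d)  # hypothetical helper building one delta's sub-pipeline
    return pipe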

As for where the meta-parameters should live: I do think that, even if it were possible to put them in parameters.yml, this isn't really the right place for them. Parameters from parameters.yml are treated as catalog entries and can be used as inputs to nodes. A parameter that describes pipeline creation is quite different: it's not an input to a node; it's an input to the pipeline creation mechanism.

So I would recommend using a different place for defining meta-parameters, either a meta_parameters.yml or an environment variable. If you still want to use the same conf system, so that a conf/local/meta_parameters.yml can overwrite conf/base/meta_parameters.yml, you should still be able to use ConfigLoader to achieve this; it will just be used outside a Kedro context. If you don't need that sort of inheritance, it's probably easiest to load the file yourself without worrying about the whole Kedro config system.
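
Loading it with that inheritance but outside a context could look like this (a sketch against the 0.17-era ConfigLoader; the file name follows the suggestion above):

from kedro.config import ConfigLoader

# conf/local/meta_parameters.yml overrides conf/base/meta_parameters.yml
meta_params = ConfigLoader(["conf/base", "conf/local"]).get("meta_parameters*")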

@lorenabalan
Contributor

I'll go ahead and close this as answered for now, but we made a note of it in the wider configuration workstream. Thank you for sharing!

@noklam
Copy link
Contributor Author

noklam commented Aug 30, 2022

#927 is an equivalent PR that got merged.
