Nested parameters update via command line / config file #750
Hello @noklam, I totally understand the need for this and agree that there should be a better mechanism for overwriting a nested parameter. But I'm not sure how easy it is to resolve just this without some more holistic solution that reworks how configuration is handled more generally, and as per @lorenabalan's comment in #605 there's a bit of a conversation ongoing about this (N.B. @datajoely). If we allowed it just for `--params`, it would introduce inconsistency in how configuration is handled. This nested overwriting behaviour is already possible through use of `TemplatedConfigLoader`.
@AntonyMilneQB Thank you for your prompt response. I agree with what you said, that it leads to inconsistency, but I really hope the Kedro team may consider adding this. Here is one thing that I want to do. I want to have a dynamic pipeline (which isn't easy to do in Kedro). I have a machine learning model pipeline which takes weeks t + delta to t + 4 + delta as training data, then gives a prediction for the following week. Say I want to do a rolling prediction for every week.
As a workaround, to avoid re-writing the pipeline for dynamic pipeline generation, I can do this instead (see the sketch below).
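A minimal sketch of what that looks like, assuming `delta` is the (top-level) parameter that shifts the training window:

```bash
# Run the same pipeline five times, shifting the training window each time.
kedro run --params "delta:0"
kedro run --params "delta:1"
kedro run --params "delta:2"
kedro run --params "delta:3"
kedro run --params "delta:4"
```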
I can just paste this in a terminal and it will run the 5 pipelines for me. Now, this only works for top-level keys, so I have to move my sub-key up to be a top-level key to make this hack work. This kind of "dynamic" pipeline generation is very common for machine learning tasks. At first, I tried to re-write the pipeline to generate pipelines on the fly. Over time, I found this CLI style easier to understand and maintain. I would really love to hear the Kedro team's opinion about the best/common way of handling these kinds of situations with Kedro. So far, every solution I have found online for dynamic pipelines is fragile, easily breaks compatibility between Kedro versions, and makes the code less readable and less reusable.
The "official" kedro viewpoint is basically that pipelines should be static (see #650). At the moment I think there are a couple of options, but I 100% do sympathise with you here and think we should have a better solution for this sort of thing. Interested to hear what other people think, but my suggestions for now would be as follows. Modular pipelines (docs)Generate each pipeline with a different namespace. In pipeline_registry.py, something like this:
And then in parameters.yml:
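With illustrative key names, the matching parameters.yml groups the overrides under each namespace:

```yaml
# parameters.yml -- one block per namespace; key names are illustrative
week_1:
  delta: 0
week_2:
  delta: 1
week_3:
  delta: 2
week_4:
  delta: 3
week_5:
  delta: 4
```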
And then run each namespaced pipeline in turn (e.g. `kedro run --pipeline week_1`).

Custom config loader

Depending on exactly what you want to do, this is maybe the cleanest solution. It's kind of a more "proper" implementation of the TemplatedConfigLoader workaround. Write your own ConfigLoader that merges parameter dictionaries rather than overwriting top-level keys, and then run using a different environment for each run. To get this working from the CLI as well as parameters.yml across different environments, it looks like just making a new ConfigLoader wouldn't be enough; you'd additionally need to override some of the context's parameter handling as well.
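Under that approach the runs themselves might look like this (the environment names are an assumption; each environment's parameters.yml defines only the keys it overrides, and the custom loader deep-merges them over conf/base):

```bash
kedro run --env week_1
kedro run --env week_2
kedro run --env week_3
kedro run --env week_4
kedro run --env week_5
```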
@AntonyMilneQB Thanks, I like this solution better. Just one thing: is there a way to define this list of namespaces in parameters.yml, so it can be read when registering the pipelines?
I think the basic answer is no, sorry. To do this you would need to access the catalog, which is currently only possible through the context. Registration of pipelines (through configure_project) happens before this, so the current session or context won't be available yet. In the future it should be possible to load the catalog without needing a context (as you can now do with from kedro.framework.project import pipelines), but not yet. Even when it becomes possible, I don't think it will be a generally recommended workflow to use the catalog when registering pipelines. For the time being I think you'd need to either manually load the parameters you want from parameters.yml inside register_pipelines or inject them some other way (e.g. os.getenv).
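A minimal sketch of the "manually load it yourself" option; the conf path and the `weeks` key are illustrative assumptions rather than any Kedro API:

```python
# pipeline_registry.py -- read a meta-parameter straight from the YAML file,
# since no context/catalog is available at registration time.
from pathlib import Path

import yaml
from kedro.pipeline import Pipeline, pipeline

from my_project.pipelines.model import create_pipeline  # hypothetical factory


def register_pipelines() -> dict:
    params = yaml.safe_load(Path("conf/base/parameters.yml").read_text())
    weeks = params.get("weeks", [1, 2, 3, 4, 5])  # meta-parameter controlling pipeline creation
    weekly = [pipeline(create_pipeline(), namespace=f"week_{w}") for w in weeks]
    return {"__default__": sum(weekly, Pipeline([]))}
```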
In older Kedro versions, `def create_pipeline(**kwargs):` actually took keyword arguments; in later versions this was removed. Can you elaborate on why it is not recommended? So you think these "meta-parameters" about pipeline creation should be kept in separate files?
As far as I can tell this still takes `**kwargs`.
I think Ivan's post here gives quite a nice explanation of it. Basically it boils down to "pipelines are static and do not change after you deploy your code". With that said, my personal opinion is actually that sometimes dynamic pipeline creation makes sense, and in those cases I think passing arguments into create_pipeline is reasonable.

As for where the meta-parameters should live: I do think that, even if it were possible to put them in parameters.yml, this isn't really the right place for them. Parameters from parameters.yml are treated as catalog entries and can be used as inputs to nodes. A parameter that describes a pipeline is quite different from this: it's not an input to a node; it's an input to the pipeline-creation mechanism. So I would recommend defining meta-parameters somewhere else, for example in a separate file or injected some other way (e.g. os.getenv).
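For illustration only (nothing here is a Kedro convention; the node function and the N_WEEKS variable are made up), a pipeline factory that takes its meta-parameter as an argument, with an environment variable as one possible source:

```python
from kedro.pipeline import Pipeline, node


def make_dataset(week: int) -> str:
    # Trivial stand-in for a real node function.
    return f"data for week {week}"


def create_pipeline(n_weeks: int = 5, **kwargs) -> Pipeline:
    # n_weeks is a meta-parameter: it shapes the pipeline itself rather than
    # feeding a single node, so it lives outside parameters.yml.
    return Pipeline(
        [
            node(
                func=lambda w=w: make_dataset(w),
                inputs=None,
                outputs=f"week_{w}_data",
                name=f"make_week_{w}_data",
            )
            for w in range(n_weeks)
        ]
    )


# One way to inject the meta-parameter when registering pipelines:
# n_weeks = int(os.getenv("N_WEEKS", "5"))
# pipelines = {"__default__": create_pipeline(n_weeks=n_weeks)}
```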
I'll go ahead and close this as answered for now, but we've made a note of it in the wider configuration workstream. Thank you for sharing!
#927 is an equivalent PR that got merged.
Description
Currently, when passing a parameter to override configuration, as in
kedro run --params key:value
only top-level keys are supported, not nested dictionaries. This is not flexible for experiments. Similarly, overwriting via a config file with
kedro run --config=config.yml
suffers from the same top-level-keys-only problem.
Ideally, this should be the expected result:
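For illustration (the parameter names are made up, and the exact CLI syntax for addressing a nested key would be up for discussion), something like:

```bash
# parameters.yml:
#   model:
#     learning_rate: 0.1
#     n_estimators: 100
#
# Overriding only the nested key should leave model.n_estimators untouched:
kedro run --params "model.learning_rate:0.01"
```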
Current behavior
Related #605
Context
A lot of parameters are organized in groups rather than as top-level keys.
Possible Implementation
Maybe a simple nested dict update is enough; I am not sure (see the sketch below).
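A minimal sketch of what such a nested (recursive) update could look like, independent of Kedro's actual internals:

```python
def update_nested(base: dict, overrides: dict) -> dict:
    """Recursively merge overrides into base, replacing only leaf values."""
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(base.get(key), dict):
            update_nested(base[key], value)
        else:
            base[key] = value
    return base


params = {"model": {"learning_rate": 0.1, "n_estimators": 100}}
update_nested(params, {"model": {"learning_rate": 0.01}})
# -> {"model": {"learning_rate": 0.01, "n_estimators": 100}}
```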