[Bug] Multirun does not work with differing parameter sets groups #1052

Closed
2 tasks done
goens opened this issue Oct 11, 2020 · 6 comments
Labels
enhancement Enhancement request
Milestone
1.1.0

Comments

@goens

goens commented Oct 11, 2020

🐛 Bug

Description

When running a multirun over two configurations, it's impossible to change a variable that exists in one of the configurations but not in the other. You might not classify this as a bug, but I believe it is, since it makes multirun unusable in such a case. The example below should make this clearer.

Checklist

  • I checked on the latest version of Hydra
  • I created a minimal repro

To reproduce

Minimal Code/Config snippet to reproduce
Imagine you have two database systems (like in the documentation); I'll call them, very creatively, one and two.
You could have a basic setup as follows:

  • main.py
  • config.yaml
  • db/one.yaml
  • db/two.yaml

This is the content of the files in such a minimal config:

main.py:

import hydra

@hydra.main(config_name='config')
def main(cfg):
    # Print only the selected db group (db/one.yaml or db/two.yaml)
    print(cfg['db'])

if __name__ == '__main__':
    main()

config.yaml:

# @package _global_
defaults:
  - db: one

db/one.yaml:

# @package _group_
shared_feature: true

db/two.yaml:

# @package _group_
shared_feature: false
fancy_exclusive_feature: false

Say both databases have a shared feature, shared_feature, and you want to run your program with both databases while sweeping over that shared feature. You can run something like:

$> python main.py db=one,two db.shared_feature=true,false -m
[2020-10-11 21:31:17,032][HYDRA] Launching 4 jobs locally
[2020-10-11 21:31:17,032][HYDRA] 	#0 : db=one db.shared_feature=True
{'shared_feature': True}
[2020-10-11 21:31:17,224][HYDRA] 	#1 : db=one db.shared_feature=False
{'shared_feature': False}
[2020-10-11 21:31:17,409][HYDRA] 	#2 : db=two db.shared_feature=True
{'shared_feature': True, 'fancy_exclusive_feature': False}
[2020-10-11 21:31:17,597][HYDRA] 	#3 : db=two db.shared_feature=False
{'shared_feature': False, 'fancy_exclusive_feature': False}
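The four jobs above come from taking every combination of the two comma-separated lists. A minimal pure-Python sketch of that expansion (a model of what the basic sweeper does, not Hydra's code; `expand_sweep` is a hypothetical helper) yields the same four combinations in the same order as the log:

```python
from itertools import product
from typing import Dict, List


def expand_sweep(sweep: Dict[str, List[str]]) -> List[List[str]]:
    """Model of a basic sweep: one job per combination of the listed values."""
    keys = list(sweep)
    jobs = []
    # product() varies the last key fastest, matching the #0..#3 order in the log
    for values in product(*(sweep[k] for k in keys)):
        jobs.append([f"{k}={v}" for k, v in zip(keys, values)])
    return jobs


if __name__ == "__main__":
    sweep = {"db": ["one", "two"], "db.shared_feature": ["true", "false"]}
    for i, job in enumerate(expand_sweep(sweep)):
        print(f"#{i} : {' '.join(job)}")
```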

This works perfectly fine. Now assume I want to run one of the two databases, two, with a fancy exclusive feature that the other one does not have. I have two options:

python main.py db=one,two +db.fancy_exclusive_feature=true,false -m

which would potentially add it to the first one as well (hopefully without breaking anything, just running it twice unnecessarily). Or the other option:

python main.py db=two,one db.fancy_exclusive_feature=true,false -m

Here it would only change the value for two and, ideally, make three runs: one with db=one, and two with db=two, one for each of db.fancy_exclusive_feature=true and db.fancy_exclusive_feature=false.

However, neither option works. Here are the corresponding error messages:

Stack trace/error message

$> python main.py db=one,two db.fancy_exclusive_feature=true,false -m
Could not override 'db.fancy_exclusive_feature'.
To append to your config use +db.fancy_exclusive_feature=True
Key 'fancy_exclusive_feature' is not in struct
	full_key: db.fancy_exclusive_feature
	reference_type=Optional[Dict[Union[str, Enum], Any]]
	object_type=dict

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.

and in the other case:

$> python main.py db=one,two +db.fancy_exclusive_feature=true,false -m
Could not append to config. An item is already at 'db.fancy_exclusive_feature'.

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.

Evidently, Hydra checks that the override db.fancy_exclusive_feature is valid for every value of db in the full "product" of the sweep, so with either syntax one combination fails, and the whole multirun fails with it.
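To illustrate the failure mode, here is a simplified pure-Python model of the two prefixes as the error messages describe them: a plain `key=value` override requires the key to exist, and `+key=value` requires it to be absent. `DB_GROUP`, `apply_override` and `failing_combinations` are hypothetical names, not Hydra internals. Applying either prefix to all four (db, value) combinations shows that two of them are always rejected:

```python
from itertools import product
from typing import Dict, List

# Hypothetical stand-ins for db/one.yaml and db/two.yaml from the repro
DB_GROUP: Dict[str, Dict[str, bool]] = {
    "one": {"shared_feature": True},
    "two": {"shared_feature": False, "fancy_exclusive_feature": False},
}


def apply_override(cfg: Dict[str, bool], prefix: str, key: str, value: bool) -> None:
    """Simplified model of `key=value` (prefix "") and `+key=value` (prefix "+")."""
    if prefix == "" and key not in cfg:
        # Plain assignment only works on keys that already exist (struct mode)
        raise KeyError(f"{key} does not exist")
    if prefix == "+" and key in cfg:
        # Appending only works on keys that do not exist yet
        raise KeyError(f"{key} already exists")
    cfg[key] = value


def failing_combinations(prefix: str, key: str) -> List[str]:
    """Try the override on every (db, value) pair and collect the rejected ones."""
    failures = []
    for db, value in product(["one", "two"], [True, False]):
        cfg = dict(DB_GROUP[db])  # each job starts from a fresh copy
        try:
            apply_override(cfg, prefix, key, value)
        except KeyError:
            failures.append(f"db={db} {prefix}db.{key}={value}")
    return failures


if __name__ == "__main__":
    for prefix in ["", "+"]:
        print(repr(prefix), failing_combinations(prefix, "fancy_exclusive_feature"))
```

With the plain prefix the rejected jobs are the two db=one combinations; with + they are the two db=two combinations, which lines up with the two error messages above.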

Expected Behavior

Ideally, if this key exists in one option and not in the others, Hydra would apply the override only where it makes sense, producing three configurations when running:

python main.py db=two,one db.fancy_exclusive_feature=true,false -m

It is also conceivable that the other form, which appends the feature, would produce four configurations:

python main.py db=two,one +db.fancy_exclusive_feature=true,false -m
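The three-configuration behavior requested above could be sketched as follows, assuming a hypothetical `conditional_sweep` that sweeps a key only for the db options that define it and runs the remaining options once, unchanged. This is not something Hydra 1.0.3 provides; `DB_GROUP` simply mirrors db/one.yaml and db/two.yaml from the repro:

```python
from typing import Dict, List, Tuple

# Hypothetical stand-ins for db/one.yaml and db/two.yaml from the repro
DB_GROUP: Dict[str, Dict[str, bool]] = {
    "one": {"shared_feature": True},
    "two": {"shared_feature": False, "fancy_exclusive_feature": False},
}


def conditional_sweep(dbs: List[str], key: str, values: List[bool]) -> List[Tuple[str, Dict[str, bool]]]:
    """Hypothetical semantics: sweep `key` only for db options that define it."""
    jobs: List[Tuple[str, Dict[str, bool]]] = []
    for db in dbs:
        base = DB_GROUP[db]
        if key in base:
            # The key exists here: one job per requested value
            for value in values:
                jobs.append((db, {**base, key: value}))
        else:
            # The key is absent: run this option once, unchanged
            jobs.append((db, dict(base)))
    return jobs


if __name__ == "__main__":
    for db, cfg in conditional_sweep(["two", "one"], "fancy_exclusive_feature", [True, False]):
        print(f"db={db}", cfg)
```

Calling it with `["two", "one"]` returns three jobs: two for db=two (feature on and off) and one for db=one.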

System information

  • Hydra Version : 1.0.3
  • Python version : 3.8.6
  • Virtual environment type and version : virtualenv 20.0.29+ds
  • Operating system : debian sid (unstable)
@goens goens added the bug Something isn't working label Oct 11, 2020
@omry
Collaborator

omry commented Oct 11, 2020

Thanks for the report; this is more of a design issue than a bug.
You can try overriding with a dictionary, which behaves differently: it merges rather than assigns, so the semantics allow this if you use +.
You can learn more about dictionaries as overrides here.

Example (here timeout originally exists only in postgresql).

$ python my_app.py  -m '+db={timeout:10}' db=mysql,postgresql
[2020-10-11 13:00:47,752][HYDRA] Launching 2 jobs locally
[2020-10-11 13:00:47,752][HYDRA]        #0 : +db={timeout:10} db=mysql
db:
  driver: mysql
  user: omry
  pass: secret
  timeout: 10

[2020-10-11 13:00:47,860][HYDRA]        #1 : +db={timeout:10} db=postgresql
db:
  driver: postgresql
  user: postgre_user
  pass: drowssap
  timeout: 10

I will keep this open to consider changing (or extending) the behavior of + for 1.1.
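The difference between assignment and merging can be modeled in a few lines of pure Python. `deep_merge` below is a hypothetical stand-in for a config merge, not OmegaConf's or Hydra's code, and it ignores struct mode; it adds keys that are missing and overwrites keys that exist, so the same patch is accepted for both db options from the original report:

```python
from typing import Any, Dict


def deep_merge(base: Dict[str, Any], patch: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `base` with `patch` merged in; missing keys are added."""
    merged = dict(base)
    for key, value in patch.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Recurse into nested mappings instead of replacing them wholesale
            merged[key] = deep_merge(merged[key], value)
        else:
            # Existing leaves are overwritten, new keys are simply added
            merged[key] = value
    return merged


if __name__ == "__main__":
    # Contents of db/one.yaml and db/two.yaml from the repro
    one = {"shared_feature": True}
    two = {"shared_feature": False, "fancy_exclusive_feature": False}
    patch = {"fancy_exclusive_feature": True}
    for name, cfg in [("one", one), ("two", two)]:
        print(name, deep_merge(cfg, patch))
```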

@omry omry added enhancement Enhancement request and removed bug Something isn't working labels Oct 11, 2020
@goens
Author

goens commented Oct 11, 2020

That makes sense, thanks!

@omry omry added this to the 1.1.0 milestone Oct 11, 2020
@omry
Collaborator

omry commented Oct 11, 2020

Great!
Dictionary and list support in overrides is one of the new features in 1.0. Spend some time reading the override grammar documentation; it has many new and awesome features (also for multirun, like glob, shuffle and more).

@omry
Collaborator

omry commented Apr 1, 2021

#1440 adds support for force-adding a config variable (++foo.bar=10).
I think it provides a reasonable solution to this. Please try it and reopen if you run into issues.
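Applied to the original report, the force-add prefix would presumably allow `python main.py db=one,two ++db.fancy_exclusive_feature=true,false -m`, yielding the four configurations described under Expected Behavior rather than three. A pure-Python model of that sweep (hypothetical names, not Hydra's implementation), where `++` overwrites the key if present and adds it otherwise:

```python
from itertools import product
from typing import Dict, List, Tuple

# Hypothetical stand-ins for db/one.yaml and db/two.yaml from the repro
DB_GROUP: Dict[str, Dict[str, bool]] = {
    "one": {"shared_feature": True},
    "two": {"shared_feature": False, "fancy_exclusive_feature": False},
}


def force_add(cfg: Dict[str, bool], key: str, value: bool) -> Dict[str, bool]:
    """Model of `++key=value`: overwrite the key if present, add it otherwise."""
    updated = dict(cfg)  # leave the group defaults untouched
    updated[key] = value
    return updated


def force_add_sweep(key: str, values: List[bool]) -> List[Tuple[str, Dict[str, bool]]]:
    """One job per (db, value) pair; no combination is rejected."""
    return [(db, force_add(DB_GROUP[db], key, value)) for db, value in product(DB_GROUP, values)]


if __name__ == "__main__":
    for i, (db, cfg) in enumerate(force_add_sweep("fancy_exclusive_feature", [True, False])):
        print(f"#{i} : db={db}", cfg)
```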

@omry omry closed this as completed Apr 1, 2021
@IlyasMoutawwakil

Hi @goens, did you find a way to produce only 3 configurations?

@omry
Collaborator

omry commented May 2, 2023

You can define 3 experiment configs and sweep over them.

https://hydra.cc/docs/patterns/configuring_experiments/
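In that pattern, each experiment is its own config that selects a db option and sets only the keys that exist for it; a multirun over those experiment configs (for example with a comma-separated `+experiment=...` list) then launches one job per experiment instead of a Cartesian product. The sketch below models this in pure Python; the experiment names and `experiment_sweep` are hypothetical:

```python
from typing import Dict, List

# Hypothetical experiment names; each stands for one config in an experiment/ group
EXPERIMENTS: Dict[str, Dict[str, object]] = {
    "one": {"db": "one"},
    "two_fancy_on": {"db": "two", "db.fancy_exclusive_feature": True},
    "two_fancy_off": {"db": "two", "db.fancy_exclusive_feature": False},
}


def experiment_sweep(names: List[str]) -> List[List[str]]:
    """Sweeping over experiment names yields exactly one job per experiment."""
    return [[f"{k}={v}" for k, v in EXPERIMENTS[name].items()] for name in names]


if __name__ == "__main__":
    for i, job in enumerate(experiment_sweep(list(EXPERIMENTS))):
        print(f"#{i} : {' '.join(job)}")
```

Because the combinations are listed explicitly, db=one is never paired with a key it does not define.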
