
Question: How to implement CLI shortcuts for nested configs? #2022

Closed
jbaczek opened this issue Feb 9, 2022 · 13 comments
Comments

@jbaczek (Contributor) commented Feb 9, 2022

So far I have resolved tons of config management issues with OmegaConf's custom resolvers. It isn't the neatest solution, but from the user's perspective it gets the job done. But there is one thing that I cannot do: I want to create a shortcut (alias) in the top-level config that resolves to a full path. Example:

trainer:
    criterion:
        _target_: [...]
        arg: 1337
    config:
        batch_size: 32

Typing python hydra_app.py trainer.config.batch_size=32 trainer/criterion=L1 is really cumbersome. I'd like to be able to type this interchangeably: python hydra_app.py batch_size=32 criterion=L1. Is there any way to do this?
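
For reference, the resolvers I mention above are registered roughly like this (a simplified, made-up example, not one of my actual resolvers):

from omegaconf import OmegaConf

# Made-up example: compute a derived value from other config fields,
# used in YAML as e.g. ${mul:${trainer.config.batch_size},2}
OmegaConf.register_new_resolver("mul", lambda a, b: a * b)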

@pixelb (Contributor) commented Feb 9, 2022

I'm not sure that could be achieved in a general way.
One could have a search mechanism, but that would be very brittle: if the config changed, other items might start to match.

@Jasha10 (Collaborator) commented Feb 10, 2022

At least for batch_size, wouldn't interpolation to a top-level key do the trick?

trainer:
    criterion:
        _target_: [...]
        arg: 1337
    config:
        batch_size: ${batch_size}
batch_size: 32

This would allow python hydra_app.py batch_size=64.

@jbaczek (Contributor, Author) commented Feb 10, 2022

This solution would leave batch_size in the root-level config. I want to keep my final config tidy (I use it for recursive instantiation).

@Jasha10 (Collaborator) commented Feb 10, 2022

Got it. In that case, the only workarounds I can think of are:

  • deleting the "aliases" before instantiating, or
  • using the root level as a "scratch space", and having the instantiated config be nested one level deeper.

deleting the top-level "aliases" before instantiating:

from typing import List

import hydra
from hydra.utils import instantiate
from omegaconf import OmegaConf

aliases: List[str] = ["batch_size"]

@hydra.main(...)
def app(cfg):
    OmegaConf.resolve(cfg)
    for alias in aliases:
        if alias in cfg:
            del cfg[alias]
    instantiate(cfg)

Edit: Added a call to OmegaConf.resolve before deleting the aliases.

using the root level as a "scratch space":

instantiate_me:
  trainer:
      criterion:
          _target_: [...]
          arg: 1337
      config:
          batch_size: ${batch_size}
# scratch:
batch_size: 32

@hydra.main(...)
def app(cfg):
    instantiate(cfg.instantiate_me)

@jbaczek (Contributor, Author) commented Feb 11, 2022

This will work with recursive instantiation, but in turn it won't work with duplicate config detection. In experiment_dir/.hydra/config.yaml we have the unresolved config, so the config created with python hydra_app.py batch_size=64 would be different from the one created with python hydra_app.py trainer.config.batch_size=64 (the first contains the scratch-space key and the interpolation; in the latter the scratch-space batch_size would differ from trainer.config.batch_size). I use these configs to detect whether my experiment has already been performed or killed, so I can avoid doing unnecessary computation. I need a solution that is resolved at config construction.

@Jasha10 (Collaborator) commented Feb 12, 2022

I use these configs to detect whether my experiment has already been performed or killed, so I can avoid doing unnecessary computation. I need a solution that is resolved at config construction.

Interesting! Would you be willing to share the code that you use for comparing an experiment with previous experiments? I'm wondering whether a strategically-placed call to OmegaConf.resolve (in the comparison logic) would smooth over the differences between python hydra_app.py batch_size=64 and python hydra_app.py trainer.config.batch_size=64. I'm also curious to see the method you're using to re-load the config from a previous run (as this is an area where Hydra might be improved).

@jbaczek (Contributor, Author) commented Feb 14, 2022

My approach is really simple and not very pretty :) First I set the root of the search to the outputs directory. Then I use os.walk to find all Hydra configs in the tree. I don't need any resolution in my case because all interpolations are computed from other fields, so if the formulas are the same then their values will be the same as well. I leverage the fact that configs have a string representation and compare them that way.

import os

from hydra.utils import get_original_cwd
from omegaconf import OmegaConf


def detect_duplicated_run():
    """
    Returns a list of paths of runs with the same config as the current one
    """
    # This is meant to be called in a trainer class, which means it doesn't have access to the top-level config
    current_config = OmegaConf.load('.hydra/config.yaml')
    # I know it's bad, but the output dirs for run and multirun aren't held in the same place in the config
    rel = os.path.relpath(os.getcwd(), get_original_cwd())
    rel = next(x for x in rel.split(os.path.sep))
    result_dir = os.path.join(get_original_cwd(), rel)

    duplicated = []
    for p, s, f in os.walk(result_dir):
        if '.hydra' in s:
            c = OmegaConf.load(os.path.join(p, '.hydra/config.yaml'))
            if hash(c) == hash(current_config):
                duplicated.append(p)
    # Don't take into account runs that ended before any checkpoint had been saved,
    # or the current run (at this point hydra's config has already been saved)
    duplicated = [p for p in duplicated if os.path.exists(os.path.join(p, 'last_checkpoint.zip'))]

    return duplicated

This gives me potential candidates for duplicated experiments. Then I do artifact checks, and then I check the appropriate log file to know from which point I should resume my experiment.

@Jasha10 (Collaborator) commented Feb 14, 2022

Cool, thanks for sharing!

I'm wondering whether a strategically-placed call to OmegaConf.resolve (in the comparison logic) would smooth over the differences ...

My idea was to modify detect_duplicated_run as below:

def detect_duplicated_run():
    current_config = OmegaConf.load('.hydra/config.yaml')
    OmegaConf.resolve(current_config)
    for alias in aliases:
        current_config.pop(alias, None)
    ...
        if '.hydra' in s:
            OmegaConf.resolve(c)
            for alias in aliases:
                c.pop(alias, None)
    ...

This has the drawback that you'd need the same aliases list to be compatible with both your current config and all of your past configs -- it doesn't solve the problem of experiment-specific aliases.

@Jasha10 (Collaborator) commented Feb 14, 2022

What if you use a top-level key in your config to store a list of experiment-specific alias keys?
You could then use the following logic in your main function:

@hydra.main(...)
def app(cfg):
    OmegaConf.resolve(cfg)
    for alias in cfg.pop("_aliases", []):
        if alias in cfg:
            del cfg[alias]
    instantiate(cfg)

And you could add something like this to your duplicate config detection function (before the call to hash):

def detect_duplicated_run():
    current_config = OmegaConf.load('.hydra/config.yaml')
    OmegaConf.resolve(current_config)
    for alias in current_config.pop("_aliases", []):
        current_config.pop(alias, None)
    ...
        if '.hydra' in s:
            OmegaConf.resolve(c)
            for alias in c.pop("_aliases", []):
                c.pop(alias, None)
    ...

@Jasha10 Jasha10 closed this as completed Feb 14, 2022
@Jasha10 Jasha10 reopened this Feb 14, 2022
@Jasha10 (Collaborator) commented Feb 14, 2022

Closed accidentally.

@jbaczek (Contributor, Author) commented Feb 15, 2022

The idea of maintaining a list of aliases in the root config is a really good one, I think. In my application I have a couple of different entry points which reuse config files (for training/inference/deployment/preprocessing), so I'd have to add a couple of config-specific Python lines to each of them (roughly the helper sketched below). This feels like it should be handled by Hydra. Should I open a feature request?
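
Something like this shared helper is what I have in mind (just a sketch; the name and placement are made up):

from omegaconf import DictConfig, OmegaConf, open_dict

def strip_aliases(cfg: DictConfig) -> DictConfig:
    # Resolve interpolations, then drop the keys listed under cfg._aliases
    OmegaConf.resolve(cfg)
    with open_dict(cfg):  # allow pop/delete even if the config is in struct mode
        for alias in cfg.pop("_aliases", []):
            cfg.pop(alias, None)
    return cfg

Each entry point (training, inference, deployment, preprocessing) would then call strip_aliases(cfg) right before instantiate(cfg).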

@Jasha10 (Collaborator) commented Feb 15, 2022

I can't promise that we'd implement such a feature; it can be implemented on the user side or by a third-party library, and implementing it on the Hydra side would involve committing to a new top-level reserved key (e.g. _aliases). That being said, if you're inclined to open a feature request, that would certainly be welcome, and it would give us a chance to see whether there is broad demand for such a feature among the community of Hydra users.
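
For example, one user-side approach (just an untested sketch with made-up names) would be to expand the short overrides into their full paths before Hydra parses the command line. Since this happens before config composition, the saved .hydra/config.yaml would be identical whether the short or the full form was typed:

import sys

from hydra_app import app  # hypothetical: the @hydra.main-decorated entry point

# Hypothetical alias map: short CLI key -> full override path
ALIASES = {
    "batch_size": "trainer.config.batch_size",
    "criterion": "trainer/criterion",
}

def expand_aliases(argv):
    expanded = []
    for arg in argv:
        key, sep, value = arg.partition("=")
        if sep and key in ALIASES:
            expanded.append(f"{ALIASES[key]}={value}")
        else:
            expanded.append(arg)
    return expanded

if __name__ == "__main__":
    # Rewrite the short overrides in place, then hand control to the Hydra app
    sys.argv[1:] = expand_aliases(sys.argv[1:])
    app()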

@jbaczek (Contributor, Author) commented Feb 16, 2022

Moving discussion to the FR #2032
