multirun is inconsistently misinterpreting cmdline values for float as str, but only for specific (and arbitrary) range of floats [Bug] #999

Closed
2 tasks done
JacobARose opened this issue Sep 22, 2020 · 6 comments · Fixed by #1000
Labels
bug Something isn't working
Milestone

Comments

@JacobARose

🐛 Bug

Description

This bug just cost me about a day of frustration looking deep into tensorflow's source code before I realized it disappeared immediately when I stopped using the '--multirun' cmdline flag.

Basically, it seems like Hydra is interpreting numerical values passed via cmdline as a different type depending on the numerical value, in a way that does not raise any explicit warnings and seems arbitrary enough that I can't imagine an actual reason for its inclusion. Thus, I believe it's a bug.

TL;DR:

  1. --multirun is leading to inconsistent parsing of cmdline floating point overridden parameters that differs from when the user passes the same values without multirun.
  2. The parser seems to have a preference for interpreting any floats <=1e-5 and >=1e-9 as string type, regardless of any possible user specification via formatting.

Checklist

  • I checked on the latest version of Hydra
  • I created a minimal repro

To reproduce

** Minimal Code/Config snippet to reproduce **

demo.py

from omegaconf import DictConfig
from omegaconf import OmegaConf
import hydra
    
@hydra.main(config_path='configs', config_name='config')
def main(config):

    print('='*40)
    print(f'lr={config.model.lr}, type(config.model.lr)={type(config.model.lr)}')
    print(f'l2={config.model.regularization.l2}, type(config.model.regularization.l2)={type(config.model.regularization.l2)}')
    
    print('-'*30)
    # Walk the model section, descending one level into nested sections.
    for k, v in config.model.items():
        if isinstance(v, DictConfig):
            for k0, v0 in v.items():
                print(k0, v0, type(v0))
        else:
            print(k, v, type(v))
          
    print('='*40)
    
if __name__ == "__main__":
    main()

The directory containing demo.py also contains

configs/config.yaml

# @package _global_

model:
    lr: 1.0e-5
    lr_momentum: 0.0
    regularization:
        l2: 4.0e-10
    coefficients:
        c1: 0.01

When I call the script using the command:
python demo.py --multirun model.regularization.l2=1e-3 model.lr=1.0e-5 model.lr_momentum=1e-5,1e-6,1e-7,1e-8,1e-9,1e-10
I get the output:

** Stack trace/error message **

[2020-09-22 00:49:10,280][HYDRA] Launching 6 jobs locally
[2020-09-22 00:49:10,280][HYDRA]        #0 : model.regularization.l2=0.001 model.lr=1e-05 model.lr_momentum=1e-05
========================================
lr=1e-05, type(config.model.lr)=<class 'str'>
l2=0.001, type(config.model.regularization.l2)=<class 'float'>
------------------------------
lr 1e-05 <class 'str'>
lr_momentum 1e-05 <class 'str'>
l2 0.001 <class 'float'>
c1 0.01 <class 'float'>
========================================
[2020-09-22 00:49:10,617][HYDRA]        #1 : model.regularization.l2=0.001 model.lr=1e-05 model.lr_momentum=1e-06
========================================
lr=1e-05, type(config.model.lr)=<class 'str'>
l2=0.001, type(config.model.regularization.l2)=<class 'float'>
------------------------------
lr 1e-05 <class 'str'>
lr_momentum 1e-06 <class 'str'>
l2 0.001 <class 'float'>
c1 0.01 <class 'float'>
========================================
[2020-09-22 00:49:10,950][HYDRA]        #2 : model.regularization.l2=0.001 model.lr=1e-05 model.lr_momentum=1e-07
========================================
lr=1e-05, type(config.model.lr)=<class 'str'>
l2=0.001, type(config.model.regularization.l2)=<class 'float'>
------------------------------
lr 1e-05 <class 'str'>
lr_momentum 1e-07 <class 'str'>
l2 0.001 <class 'float'>
c1 0.01 <class 'float'>
========================================
[2020-09-22 00:49:11,283][HYDRA]        #3 : model.regularization.l2=0.001 model.lr=1e-05 model.lr_momentum=1e-08
========================================
lr=1e-05, type(config.model.lr)=<class 'str'>
l2=0.001, type(config.model.regularization.l2)=<class 'float'>
------------------------------
lr 1e-05 <class 'str'>
lr_momentum 1e-08 <class 'str'>
l2 0.001 <class 'float'>
c1 0.01 <class 'float'>
========================================
[2020-09-22 00:49:11,617][HYDRA]        #4 : model.regularization.l2=0.001 model.lr=1e-05 model.lr_momentum=1e-09
========================================
lr=1e-05, type(config.model.lr)=<class 'str'>
l2=0.001, type(config.model.regularization.l2)=<class 'float'>
------------------------------
lr 1e-05 <class 'str'>
lr_momentum 1e-09 <class 'str'>
l2 0.001 <class 'float'>
c1 0.01 <class 'float'>
========================================
[2020-09-22 00:49:11,950][HYDRA]        #5 : model.regularization.l2=0.001 model.lr=1e-05 model.lr_momentum=1e-10
========================================
lr=1e-05, type(config.model.lr)=<class 'str'>
l2=0.001, type(config.model.regularization.l2)=<class 'float'>
------------------------------
lr 1e-05 <class 'str'>
lr_momentum 1e-10 <class 'float'>
l2 0.001 <class 'float'>
c1 0.01 <class 'float'>
========================================

Highlighting the important info from above:

  • lr_momentum is interpreted as a str for values 1e-5 through 1e-9, and for some reason is then interpreted as a float for 1e-10.

A second, contrasting example below:

(hydra) jacob@serrep3:~/Documents/github issues$ python demo.py --multirun model.regularization.l2=1e-3 model.lr_momentum=1.0,1e-1,1e-2,1e-3,1e-4
[2020-09-22 00:53:56,412][HYDRA] Launching 5 jobs locally
[2020-09-22 00:53:56,412][HYDRA]        #0 : model.regularization.l2=0.001 model.lr_momentum=1.0
========================================
lr=1e-05, type(config.model.lr)=<class 'float'>
l2=0.001, type(config.model.regularization.l2)=<class 'float'>
------------------------------
lr 1e-05 <class 'float'>
lr_momentum 1.0 <class 'float'>
l2 0.001 <class 'float'>
c1 0.01 <class 'float'>
========================================
[2020-09-22 00:53:56,753][HYDRA]        #1 : model.regularization.l2=0.001 model.lr_momentum=0.1
========================================
lr=1e-05, type(config.model.lr)=<class 'float'>
l2=0.001, type(config.model.regularization.l2)=<class 'float'>
------------------------------
lr 1e-05 <class 'float'>
lr_momentum 0.1 <class 'float'>
l2 0.001 <class 'float'>
c1 0.01 <class 'float'>
========================================
[2020-09-22 00:53:57,083][HYDRA]        #2 : model.regularization.l2=0.001 model.lr_momentum=0.01
========================================
lr=1e-05, type(config.model.lr)=<class 'float'>
l2=0.001, type(config.model.regularization.l2)=<class 'float'>
------------------------------
lr 1e-05 <class 'float'>
lr_momentum 0.01 <class 'float'>
l2 0.001 <class 'float'>
c1 0.01 <class 'float'>
========================================
[2020-09-22 00:53:57,410][HYDRA]        #3 : model.regularization.l2=0.001 model.lr_momentum=0.001
========================================
lr=1e-05, type(config.model.lr)=<class 'float'>
l2=0.001, type(config.model.regularization.l2)=<class 'float'>
------------------------------
lr 1e-05 <class 'float'>
lr_momentum 0.001 <class 'float'>
l2 0.001 <class 'float'>
c1 0.01 <class 'float'>
========================================
[2020-09-22 00:53:57,737][HYDRA]        #4 : model.regularization.l2=0.001 model.lr_momentum=0.0001
========================================
lr=1e-05, type(config.model.lr)=<class 'float'>
l2=0.001, type(config.model.regularization.l2)=<class 'float'>
------------------------------
lr 1e-05 <class 'float'>
lr_momentum 0.0001 <class 'float'>
l2 0.001 <class 'float'>
c1 0.01 <class 'float'>
========================================
  • lr_momentum is correctly interpreted as float for the range 1.0,1e-1,1e-2,1e-3,1e-4
  • lr is defined as 1.0e-5 in the config, and when no override is passed via cmdline the config value is interpreted correctly as float. However, if the same exact value formatted identically is passed via cmdline it's interpreted as a str!
  • wrapping each statement in single quotations does not change the behavior.
    e.g.
    'model.lr=1.0e-5'
    is the same as
    model.lr=1.0e-5
  • Similarly, formatting the same floats using decimal instead of scientific notation does not result in any different behavior.
  • Finally, any value < 1e-4 and >= 1e-9 is interpreted as str (e.g. 9.99e-5 is a str, but 9.99e-10 is still a float)

Expected Behavior

I'd expect the cmdline overrides to be interpreted in a consistent way independent of the numerical value. At the moment it appears as if Hydra simply expects the user to manually convert certain numerical values after they've been parsed for the config, and the specification of which values those are is nonexistent and highly arbitrary.
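For reference, the kind of manual conversion described above could look like the sketch below. It is plain Python on an ordinary dict, not a Hydra or OmegaConf API; `coerce_float` and `coerce_floats` are hypothetical helper names, and the `optimizer` entry is made up for illustration.

```python
def coerce_float(value):
    """Return value as a float if it is a string that float() accepts, else unchanged."""
    if isinstance(value, str):
        try:
            return float(value)
        except ValueError:
            return value
    return value


def coerce_floats(tree):
    """Apply coerce_float to every leaf of a nested dict."""
    if isinstance(tree, dict):
        return {key: coerce_floats(val) for key, val in tree.items()}
    return coerce_float(tree)


# Values shaped like the ones printed in the multirun log above.
# "optimizer" is a hypothetical key, added only to show a string that stays a string.
model = {
    "lr": "1e-05",
    "lr_momentum": "1e-09",
    "regularization": {"l2": 0.001},
    "optimizer": "adam",
}
fixed = coerce_floats(model)
print(fixed)
```

This is exactly the workaround I'd like to avoid: it cannot tell a mis-parsed float from a value the user really meant as a string, and `float()` also accepts strings such as "nan" and "inf".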

My expectation, based on having read through the docs, is that the user should expect all of the following to be interpreted as float:

model.lr=1e-5
'model.lr=1e-5'
model.lr=1e-5,1e-6 --multirun
'model.lr=1e-5,1e-6' --multirun

and expect all of the following to be interpreted as str:

model.lr="1e-5"
'model.lr="1e-5"'
model.lr="1e-5","1e-6" --multirun
'model.lr="1e-5","1e-6"' --multirun
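The expectation above can be written down as a small sketch. `parse_value` and `parse_sweep` are hypothetical names and this is not how Hydra's override parser is implemented; it only assumes that a quoted value stays a string and that anything `float()` accepts becomes a float.

```python
def parse_value(text):
    """Quoted text stays a str; unquoted text becomes a float when possible."""
    is_quoted = len(text) >= 2 and text[0] == text[-1] and text[0] in "\"'"
    if is_quoted:
        return text[1:-1]
    try:
        return float(text)
    except ValueError:
        return text


def parse_sweep(text):
    """Split a comma-separated sweep and parse each item (naive: no commas inside quotes)."""
    return [parse_value(item) for item in text.split(",")]


unquoted = parse_sweep("1e-5,1e-6")
quoted = parse_sweep('"1e-5","1e-6"')
zero_padded = parse_value("1e-05")
print(unquoted, quoted, zero_padded)
```

One caveat with the str examples: in a POSIX shell, the inner double quotes in model.lr="1e-5" are removed by the shell before the program sees the argument, unless the whole argument is wrapped in single quotes. The sketch also does not distinguish ints from floats.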

System information

  • Hydra Version : 1.0.2
  • Python version : 3.7.9
  • Virtual environment type and version : conda 4.7.12
  • Operating system : Ubuntu Linux

Please let me know if you'd like any other info, or if you find the cause and a fix that doesn't require me to write a whole config verification script (especially after I just typed all this out!)

Cheers,
Jacob

@JacobARose JacobARose added the bug Something isn't working label Sep 22, 2020
@omry
Collaborator

omry commented Sep 22, 2020

Thanks for reporting, and sorry for the frustrating bug hunt.
At a glance, this looks like a bug.
A smaller (not book long) repro:

$ python examples/tutorials/basic/your_first_hydra_app/1_simple_cli/my_app.py +lr=1.0e-5
lr: 1.0e-05

$ python examples/tutorials/basic/your_first_hydra_app/1_simple_cli/my_app.py -m +lr=1.0e-5
[2020-09-21 23:09:56,378][HYDRA] Launching 1 jobs locally
[2020-09-21 23:09:56,378][HYDRA]        #0 : +lr=1e-05
lr: '1e-05'

@omry omry added this to the 1.0.3 milestone Sep 22, 2020
@omry
Collaborator

omry commented Sep 22, 2020

I strongly recommend that you utilize Structured Configs. They would either tell you of a type error, or in this case actually convert the input to the correct type:

from dataclasses import dataclass

import hydra
from hydra.core.config_store import ConfigStore
from omegaconf import MISSING

@dataclass
class Config:
    lr: float = MISSING


cs = ConfigStore.instance()
# Registering the Config class with the name 'config'.
cs.store(name="config", node=Config)


@hydra.main(config_name="config")
def my_app(cfg: Config) -> None:
    print(cfg)


if __name__ == "__main__":
    my_app()

$ python my_app.py -m +lr=1.0e-5
[2020-09-21 23:14:29,352][HYDRA] Launching 1 jobs locally
[2020-09-21 23:14:29,352][HYDRA]        #0 : +lr=1e-05
{'lr': 1e-05}
$ python my_app.py +lr=1.0e-5
{'lr': 1e-05}

@omry
Collaborator

omry commented Sep 22, 2020

During multirun, the input you provide is parsed and then converted to strings that are passed to the launcher, which in turn parses each value again.

e.g:
Your command line:

--multirun lr=1.0e-5,1.0e-6

Broken down into two floats:
1.0e-5 and 1.0e-6.
converted to strings:

lr=1e-05
lr=1e-06

Oops, can you spot the difference?
Python likes to format the float 1.0e-5 as 1e-05, with a zero-padded exponent.

There is a bug in the parser that causes it to not recognize it as a proper float which means it's interpreted as a string.
#1000 fixes it.
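A rough way to see why only some values are affected, in plain Python. The `STRICT_FLOAT` pattern and the `reparse` and `round_trip` helpers are hypothetical stand-ins, not Hydra's actual grammar; the only assumption is a parser that rejects a zero-padded exponent.

```python
import re

# Hypothetical strict pattern (NOT Hydra's real grammar): the exponent,
# when present, must not start with a leading zero.
STRICT_FLOAT = re.compile(r"[+-]?\d+(?:\.\d*)?(?:[eE][+-]?(?:0|[1-9]\d*))?")


def reparse(text):
    """Return a float if the strict pattern accepts the text, else the text itself."""
    return float(text) if STRICT_FLOAT.fullmatch(text) else text


def round_trip(value):
    """Mimic one sweep item: float -> string for the launcher -> parsed again."""
    return reparse(str(value))


values = [1e-4, 1e-5, 1e-9, 1e-10]
results = {str(v): type(round_trip(v)).__name__ for v in values}
print(results)
# {'0.0001': 'float', '1e-05': 'str', '1e-09': 'str', '1e-10': 'float'}
```

Python switches to scientific notation below 1e-4 and pads the exponent to two digits, so 1e-5 through 1e-9 come out with a leading zero while 1e-10 does not. That matches the range reported above.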

CC @odelalleau, OMG: You killed Kenny.

@odelalleau
Collaborator

My condolences to Kenny's friends and family.

@omry
Collaborator

omry commented Sep 23, 2020

@odelalleau can you make a note to update the upcoming OmegaConf grammar with it? it probably has the same issue.

@odelalleau
Collaborator

> @odelalleau can you make a note to update the upcoming OmegaConf grammar with it? it probably has the same issue.

Yup -- already done, thanks for checking

3 participants