multirun is inconsistently misinterpreting cmdline values for float as str, but only for specific (and arbitrary) range of floats [Bug] #999

JacobARose · 2020-09-22T05:37:01Z

🐛 Bug

Description

This bug cost me about a day of frustration digging through TensorFlow's source code before I realized the problem disappeared as soon as I stopped using the '--multirun' cmdline flag.

Basically, it seems like Hydra is interpreting numerical values passed via cmdline as a different type depending on the numerical value, in a way that does not raise any explicit warnings and seems arbitrary enough that I can't imagine an actual reason for its inclusion. Thus, I believe it's a bug.

TL;DR

--multirun parses floating-point overrides passed on the cmdline differently from how the same values are parsed without multirun.
The parser seems to interpret any float <=1e-5 and >=1e-9 as a str, regardless of how the user formats the value.

Checklist

I checked on the latest version of Hydra
I created a minimal repro

To reproduce

** Minimal Code/Config snippet to reproduce **

demo.py

import hydra
from omegaconf import DictConfig


@hydra.main(config_path='configs', config_name='config')
def main(config):
    print('=' * 40)
    # Report the parsed type of two overridable values.
    print(f'lr={config.model.lr}, type(config.model.lr)={type(config.model.lr)}')
    print(f'l2={config.model.regularization.l2}, type(config.model.regularization.l2)={type(config.model.regularization.l2)}')

    print('-' * 30)
    # Walk one level of nesting and print every leaf with its type.
    for k, v in config.model.items():
        if isinstance(v, DictConfig):
            for k0, v0 in v.items():
                print(k0, v0, type(v0))
        else:
            print(k, v, type(v))

    print('=' * 40)


if __name__ == "__main__":
    main()

The directory containing demo.py also contains

configs/config.yaml

# @package _global_

model:
    lr: 1.0e-5
    lr_momentum: 0.0
    regularization:
        l2: 4.0e-10
    coefficients:
        c1: 0.01

When I call the script using the command:
python demo.py --multirun model.regularization.l2=1e-3 model.lr=1.0e-5 model.lr_momentum=1e-5,1e-6,1e-7,1e-8,1e-9,1e-10
I get the output:

** Stack trace/error message **

[2020-09-22 00:49:10,280][HYDRA] Launching 6 jobs locally
[2020-09-22 00:49:10,280][HYDRA]        #0 : model.regularization.l2=0.001 model.lr=1e-05 model.lr_momentum=1e-05
========================================
lr=1e-05, type(config.model.lr)=<class 'str'>
l2=0.001, type(config.model.regularization.l2)=<class 'float'>
------------------------------
lr 1e-05 <class 'str'>
lr_momentum 1e-05 <class 'str'>
l2 0.001 <class 'float'>
c1 0.01 <class 'float'>
========================================
[2020-09-22 00:49:10,617][HYDRA]        #1 : model.regularization.l2=0.001 model.lr=1e-05 model.lr_momentum=1e-06
========================================
lr=1e-05, type(config.model.lr)=<class 'str'>
l2=0.001, type(config.model.regularization.l2)=<class 'float'>
------------------------------
lr 1e-05 <class 'str'>
lr_momentum 1e-06 <class 'str'>
l2 0.001 <class 'float'>
c1 0.01 <class 'float'>
========================================
[2020-09-22 00:49:10,950][HYDRA]        #2 : model.regularization.l2=0.001 model.lr=1e-05 model.lr_momentum=1e-07
========================================
lr=1e-05, type(config.model.lr)=<class 'str'>
l2=0.001, type(config.model.regularization.l2)=<class 'float'>
------------------------------
lr 1e-05 <class 'str'>
lr_momentum 1e-07 <class 'str'>
l2 0.001 <class 'float'>
c1 0.01 <class 'float'>
========================================
[2020-09-22 00:49:11,283][HYDRA]        #3 : model.regularization.l2=0.001 model.lr=1e-05 model.lr_momentum=1e-08
========================================
lr=1e-05, type(config.model.lr)=<class 'str'>
l2=0.001, type(config.model.regularization.l2)=<class 'float'>
------------------------------
lr 1e-05 <class 'str'>
lr_momentum 1e-08 <class 'str'>
l2 0.001 <class 'float'>
c1 0.01 <class 'float'>
========================================
[2020-09-22 00:49:11,617][HYDRA]        #4 : model.regularization.l2=0.001 model.lr=1e-05 model.lr_momentum=1e-09
========================================
lr=1e-05, type(config.model.lr)=<class 'str'>
l2=0.001, type(config.model.regularization.l2)=<class 'float'>
------------------------------
lr 1e-05 <class 'str'>
lr_momentum 1e-09 <class 'str'>
l2 0.001 <class 'float'>
c1 0.01 <class 'float'>
========================================
[2020-09-22 00:49:11,950][HYDRA]        #5 : model.regularization.l2=0.001 model.lr=1e-05 model.lr_momentum=1e-10
========================================
lr=1e-05, type(config.model.lr)=<class 'str'>
l2=0.001, type(config.model.regularization.l2)=<class 'float'>
------------------------------
lr 1e-05 <class 'str'>
lr_momentum 1e-10 <class 'float'>
l2 0.001 <class 'float'>
c1 0.01 <class 'float'>
========================================

Highlighting the important info from above:

lr_momentum is interpreted as a str for values 1e-5 through 1e-9, and for some reason is then interpreted as a float for 1e-10.

A second, contrasting example below:

(hydra) jacob@serrep3:~/Documents/github issues$ python demo.py --multirun model.regularization.l2=1e-3 model.lr_momentum=1.0,1e-1,1e-2,1e-3,1e-4
[2020-09-22 00:53:56,412][HYDRA] Launching 5 jobs locally
[2020-09-22 00:53:56,412][HYDRA]        #0 : model.regularization.l2=0.001 model.lr_momentum=1.0
========================================
lr=1e-05, type(config.model.lr)=<class 'float'>
l2=0.001, type(config.model.regularization.l2)=<class 'float'>
------------------------------
lr 1e-05 <class 'float'>
lr_momentum 1.0 <class 'float'>
l2 0.001 <class 'float'>
c1 0.01 <class 'float'>
========================================
[2020-09-22 00:53:56,753][HYDRA]        #1 : model.regularization.l2=0.001 model.lr_momentum=0.1
========================================
lr=1e-05, type(config.model.lr)=<class 'float'>
l2=0.001, type(config.model.regularization.l2)=<class 'float'>
------------------------------
lr 1e-05 <class 'float'>
lr_momentum 0.1 <class 'float'>
l2 0.001 <class 'float'>
c1 0.01 <class 'float'>
========================================
[2020-09-22 00:53:57,083][HYDRA]        #2 : model.regularization.l2=0.001 model.lr_momentum=0.01
========================================
lr=1e-05, type(config.model.lr)=<class 'float'>
l2=0.001, type(config.model.regularization.l2)=<class 'float'>
------------------------------
lr 1e-05 <class 'float'>
lr_momentum 0.01 <class 'float'>
l2 0.001 <class 'float'>
c1 0.01 <class 'float'>
========================================
[2020-09-22 00:53:57,410][HYDRA]        #3 : model.regularization.l2=0.001 model.lr_momentum=0.001
========================================
lr=1e-05, type(config.model.lr)=<class 'float'>
l2=0.001, type(config.model.regularization.l2)=<class 'float'>
------------------------------
lr 1e-05 <class 'float'>
lr_momentum 0.001 <class 'float'>
l2 0.001 <class 'float'>
c1 0.01 <class 'float'>
========================================
[2020-09-22 00:53:57,737][HYDRA]        #4 : model.regularization.l2=0.001 model.lr_momentum=0.0001
========================================
lr=1e-05, type(config.model.lr)=<class 'float'>
l2=0.001, type(config.model.regularization.l2)=<class 'float'>
------------------------------
lr 1e-05 <class 'float'>
lr_momentum 0.0001 <class 'float'>
l2 0.001 <class 'float'>
c1 0.01 <class 'float'>
========================================

lr_momentum is correctly interpreted as float for the range 1.0,1e-1,1e-2,1e-3,1e-4.
lr is defined as 1.0e-5 in the config, and when no override is passed via cmdline the config value is correctly interpreted as float. However, if the exact same value, formatted identically, is passed via cmdline, it's interpreted as a str!
Wrapping each statement in single quotes does not change the behavior.
e.g.
'model.lr=1.0e-5'
is the same as
model.lr=1.0e-5
Similarly, formatting the same floats using decimal instead of scientific notation does not change the behavior.
Finally, any value < 1e-4 and >= 1e-9 is interpreted as str (e.g. 9.99e-5 is a str, but 9.99e-10 is still a float).

Expected Behavior

I'd expect the cmdline overrides to be interpreted in a consistent way independent of the numerical value. At the moment it appears as if Hydra simply expects the user to manually convert certain numerical values after they've been parsed for the config, and the specification of which values those are is nonexistent and highly arbitrary.

My expectation, based on having read through the docs, is that the user should expect all of the following to be interpreted as float:

model.lr=1e-5
'model.lr=1e-5'
model.lr=1e-5,1e-6 --multirun
'model.lr=1e-5,1e-6' --multirun

and expect all of the following to be interpreted as str:

model.lr="1e-5"
'model.lr="1e-5"'
model.lr="1e-5","1e-6" --multirun
'model.lr="1e-5","1e-6"' --multirun
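The expectation listed above can be written down as a small rule. The sketch below is only an illustration of that expectation, not Hydra's parser: `parse_value` and `parse_sweep` are hypothetical helper names, and the rule is deliberately simplified (for instance, it would also turn an integer such as 3 into 3.0).

```python
def parse_value(text):
    """Illustrative rule: double-quoted text is a str, anything float() accepts is a float."""
    if len(text) >= 2 and text[0] == text[-1] == '"':
        # Explicit quotes: keep the content as a string.
        return text[1:-1]
    try:
        return float(text)
    except ValueError:
        # Not numeric: fall back to the raw string.
        return text


def parse_sweep(text):
    """Split a comma-separated sweep and apply the same rule to every element."""
    return [parse_value(item) for item in text.split(",")]


print(parse_sweep('1e-5,1e-6'))
print(parse_sweep('"1e-5","1e-6"'))
```

Under this rule the value's magnitude and exponent formatting never affect the resulting type. One caveat for the quoted examples: in a POSIX shell, the double quotes in an unquoted model.lr="1e-5" are removed by the shell before the program sees them, so only the single-quoted forms would actually deliver the quotes to the parser.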

System information

Hydra Version : 1.0.2
Python version : 3.7.9
Virtual environment type and version : conda 4.7.12
Operating system : Ubuntu Linux

Please let me know if you'd like any other info, or if you find the cause and a fix that doesn't require me to write a whole config verification script (especially after I just typed all this out!)

Cheers,
Jacob


omry · 2020-09-22T06:10:05Z

Thanks for reporting, and sorry for the frustrating bug hunt.
At a glance, this looks like a bug.
A smaller (not book long) repro:

$ python examples/tutorials/basic/your_first_hydra_app/1_simple_cli/my_app.py +lr=1.0e-5
lr: 1.0e-05

$ python examples/tutorials/basic/your_first_hydra_app/1_simple_cli/my_app.py -m +lr=1.0e-5
[2020-09-21 23:09:56,378][HYDRA] Launching 1 jobs locally
[2020-09-21 23:09:56,378][HYDRA]        #0 : +lr=1e-05
lr: '1e-05'

omry · 2020-09-22T06:16:46Z

I strongly recommend that you utilize Structured Configs. They would either tell you of a type error, or in this case actually convert the input to the correct type:

from dataclasses import dataclass

import hydra
from hydra.core.config_store import ConfigStore
from omegaconf import MISSING

@dataclass
class Config:
    lr: float = MISSING


cs = ConfigStore.instance()
# Registering the Config class with the name 'config'.
cs.store(name="config", node=Config)


@hydra.main(config_name="config")
def my_app(cfg: Config) -> None:
    print(cfg)


if __name__ == "__main__":
    my_app()

$ python my_app.py -m +lr=1.0e-5
[2020-09-21 23:14:29,352][HYDRA] Launching 1 jobs locally
[2020-09-21 23:14:29,352][HYDRA]        #0 : +lr=1e-05
{'lr': 1e-05}
$ python my_app.py +lr=1.0e-5
{'lr': 1e-05}
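To illustrate why a typed schema hides the problem, here is a stdlib-only sketch of the idea of converting each incoming value to its annotated type. It is not how OmegaConf implements Structured Configs; the `coerce` helper is hypothetical, and the `Config` class here uses a plain default instead of MISSING.

```python
from dataclasses import dataclass, fields


@dataclass
class Config:
    lr: float = 0.0


def coerce(cls, raw):
    """Build cls from a dict of raw values, converting each to its annotated type."""
    kwargs = {}
    for f in fields(cls):
        if f.name in raw:
            # f.type is the annotation (here the float class itself);
            # calling it converts "1e-05" to 1e-05 or raises ValueError.
            kwargs[f.name] = f.type(raw[f.name])
    return cls(**kwargs)


cfg = coerce(Config, {"lr": "1e-05"})
print(cfg, type(cfg.lr).__name__)
```

With a declared float field, a value that arrives as the string '1e-05' is converted to a float, and a value that cannot be converted raises an error instead of passing through silently.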

omry · 2020-09-22T07:18:54Z

During multirun, the input you provide is parsed and then converted back to strings, which are passed to the launcher, which in turn parses each value again.

e.g:
Your command line:

--multirun lr=1.0e-5,1.0e-6

This is broken down into two floats:
1.0e-5 and 1.0e-6.
These are converted back to strings:

lr=1e-05
lr=1e-06

Oops, can you spot the difference?
Python's str() renders the float 1.0e-5 as 1e-05, with a zero-padded exponent.

A bug in the parser causes it not to recognize that form as a proper float, so the value is interpreted as a string.
#1000 fixes it.
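The round trip described above can be reproduced without Hydra. The sketch below relies only on Python's built-in float formatting. The `STRICT_FLOAT` pattern is a hypothetical stand-in for a grammar that rejects a leading zero in the exponent; it is an assumption made for illustration, not Hydra's actual grammar. It suggests why only 1e-05 through 1e-09 are affected: Python switches to scientific notation below 1e-4 and pads the exponent to two digits, so 1e-10 has no leading zero.

```python
import re

# Hypothetical pattern (an assumption, not Hydra's grammar): the exponent
# must be "0" or start with a non-zero digit, so "05" is rejected.
STRICT_FLOAT = re.compile(r"[+-]?\d+(\.\d*)?([eE][+-]?(0|[1-9]\d*))?")


def reparse(text):
    """Return a float if the strict pattern accepts the text, else the text itself."""
    if STRICT_FLOAT.fullmatch(text):
        return float(text)
    return text


def round_trip(value):
    """Stringify a float the way str() does, then re-parse it strictly."""
    return reparse(str(value))


for v in [1.0, 1e-4, 1e-5, 1e-9, 1e-10]:
    result = round_trip(v)
    print(f"{str(v):>8} -> {type(result).__name__}")
```

Running it reports float for 1.0, 0.0001 and 1e-10, and str for 1e-05 and 1e-09, which matches the range observed in the report.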

CC @odelalleau, OMG: You killed Kenny.

odelalleau · 2020-09-23T13:57:06Z

My condolences to Kenny's friends and family.

omry · 2020-09-23T17:10:47Z

@odelalleau can you make a note to update the upcoming OmegaConf grammar with it? it probably has the same issue.

odelalleau · 2020-09-23T17:18:36Z

> @odelalleau can you make a note to update the upcoming OmegaConf grammar with it? it probably has the same issue.

Yup -- already done, thanks for checking

JacobARose added the bug label Sep 22, 2020

omry added this to the 1.0.3 milestone Sep 22, 2020

omry mentioned this issue Sep 22, 2020

Fix float parsing of 1e-05 #1000

Merged

omry closed this as completed in #1000 Sep 22, 2020
