[Feature Request] @dataclass __post_init__ for config validation #377

Closed
Jasha10 opened this issue Sep 19, 2020 · 5 comments
@Jasha10
Collaborator

Jasha10 commented Sep 19, 2020

As per this comment, I am filing this issue against the OmegaConf repository after originally filing here against the Hydra repository.


🚀 Feature Request

The __post_init__ hook, which the dataclass-generated __init__ function calls, can be used for post-processing or validating configuration data.

Motivation

One possible use case for the __post_init__ hook defined in PEP 557 -- Data Classes is the validation of configuration data. Here is an example:

# post_init_example.py
from dataclasses import dataclass

@dataclass
class MySQLConfig:
    host: str = "localhost"
    port: int = 3306
    def __post_init__(self):
        assert 7000 < self.port < 99999

cfg = MySQLConfig()

When running this file, we get an assertion error because the port number is outside the allowable range:

jbss@rig1:~$ python post_init_example.py
Traceback (most recent call last):
  File "post_init_example.py", line 10, in <module>
    cfg = MySQLConfig()
  File "<string>", line 5, in __init__
  File "post_init_example.py", line 8, in __post_init__
    assert 7000 < self.port < 99999
AssertionError
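
One caveat with assert-based validation: Python strips assert statements when run with the -O flag, so the check above would silently disappear. A minimal sketch of the same check with an explicit exception follows; the bounds are copied from the example above, and the choice of ValueError and the error message are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class MySQLConfig:
    host: str = "localhost"
    port: int = 3306

    def __post_init__(self) -> None:
        # Same bounds as the assert above, but raised explicitly so the
        # check still runs under `python -O`.
        if not 7000 < self.port < 99999:
            raise ValueError(f"port {self.port} is outside the allowed range")


try:
    MySQLConfig()  # default port 3306 fails validation
    error = None
except ValueError as exc:
    error = str(exc)

valid = MySQLConfig(port=8000)  # passes validation
print(error)
```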

However, when the MySQLConfig class is used in a @hydra.main app, the validation code from __post_init__ does not get run:

# post_init_example2.py, adapted from Hydra docs
from dataclasses import dataclass

import hydra
from hydra.core.config_store import ConfigStore

@dataclass
class MySQLConfig:
    host: str = "localhost"
    port: int = 3306
    def __post_init__(self):
        assert 7000 < self.port < 99999

cs = ConfigStore.instance()
# Registering the Config class with the name 'config'.
cs.store(name="config", node=MySQLConfig)

@hydra.main(config_name="config")
def my_app(cfg: MySQLConfig) -> None:
    print(cfg)

if __name__ == "__main__":
    my_app()

jbss@rig1:~$ python post_init_example2.py
{'host': 'localhost', 'port': 3306}

No AssertionError was triggered. This may be unexpected behavior for users who are familiar with the normal behavior of the @dataclass decorator.

Pitch

Describe the solution you'd like

In the type signature for the my_app function above, the type hint for the cfg argument is MySQLConfig, but at runtime the type of cfg is omegaconf.dictconfig.DictConfig. What if there were the option to have hydra return a bona fide instance of MySQLConfig, so that assert isinstance(cfg, MySQLConfig) is true within the body of the my_app function? If the option were available to work with instances of the user-defined dataclass, rather than instances of DictConfig, clients could make use of methods defined on their custom config dataclass, including but not limited to the __post_init__ method that is automatically called upon dataclass initialization.

Describe alternatives you've considered

  1. One alternative is to use keyword expansion to convert the parsed DictConfig object into a MySQLConfig object:

     cfg = MySQLConfig(**cfg)  # convert DictConfig -> MySQLConfig
    

    This does not play well with nested config structures, e.g. if MySQLConfig has a field that is another (different) dataclass.

  2. Another alternative is to employ the dacite library to convert the DictConfig object into an instance of MySQLConfig. With dacite, omegaconf, and typing imported, the following definition of my_app achieves the desired effect:

     @hydra.main(config_name="config")
     def my_app(cfg0: omegaconf.DictConfig) -> None:
         cfg1: typing.Dict = omegaconf.OmegaConf.to_container(cfg0)
         cfg2: MySQLConfig = dacite.from_dict(data_class=MySQLConfig, data=cfg1)
         ...
    

    The MySQLConfig.__init__ and MySQLConfig.__post_init__ functions get called when dacite.from_dict creates the MySQLConfig instance. To be clear, this alternative involves a three-step procedure: (1) use hydra to parse the DictConfig object, (2) use omegaconf's to_container to get a native python container, and (3) use dacite to parse the python container as a MySQLConfig instance. This alternative does work with nested dataclass structures; it is the solution that I am using currently.

  3. Of course, there are other packages besides hydra for parsing configuration/settings directly into a dataclass, e.g. argparse-dataclasses or the SimpleParsing package. Would be nice if hydra supported this feature though.
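
To make the difference between alternatives 1 and 2 concrete, here is a stdlib-only sketch. A plain dict stands in for the output of OmegaConf.to_container, AppConfig is a hypothetical outer config, and from_dict is a hypothetical helper playing the role that dacite.from_dict plays above. It is not dacite's implementation and only handles directly nested dataclass fields.

```python
from dataclasses import dataclass, field, fields, is_dataclass
from typing import Any, Dict


@dataclass
class MySQLConfig:
    host: str = "localhost"
    port: int = 3306

    def __post_init__(self) -> None:
        assert 7000 < self.port < 99999


@dataclass
class AppConfig:  # hypothetical outer config with a nested dataclass field
    name: str = "app"
    db: MySQLConfig = field(default_factory=lambda: MySQLConfig(port=8000))


def from_dict(cls: type, data: Dict[str, Any]) -> Any:
    """Recursively build `cls` from a plain dict (hypothetical helper)."""
    kwargs = {}
    for f in fields(cls):
        if f.name not in data:
            continue  # fall back to the dataclass default
        value = data[f.name]
        # Recurse when the field is annotated with a dataclass type.
        if is_dataclass(f.type) and isinstance(value, dict):
            value = from_dict(f.type, value)
        kwargs[f.name] = value
    return cls(**kwargs)  # __init__ and __post_init__ run here


raw = {"name": "app", "db": {"host": "localhost", "port": 3306}}

# Alternative 1: keyword expansion leaves `db` as a dict, so the nested
# __post_init__ never runs and the out-of-range port goes unnoticed.
shallow = AppConfig(**raw)
shallow_db_type = type(shallow.db).__name__

# Recursive conversion builds the nested MySQLConfig and triggers its check.
try:
    from_dict(AppConfig, raw)
    nested_check_ran = False
except AssertionError:
    nested_check_ran = True

print(shallow_db_type, nested_check_ran)
```

The sketch relies on field annotations being real classes; with string annotations (for example under from __future__ import annotations) the is_dataclass check would need the type hints resolved first.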

Are you willing to open a pull request?
I think some brainstorming would be required regarding the API for this feature. How would the user indicate whether they want a DictConfig or a MySQLConfig object? Perhaps a keyword argument could be exposed by the @hydra.main decorator or by the hydra.core.config_store.ConfigStore class? There are some edge cases to consider too, such as how to handle cases of mixed config (e.g. yaml config mixed with dataclasses).

At this point the hydra project seems fairly tightly coupled to the omegaconf project. I'm not sure how easy it would be to implement this feature or where to start. The dacite solution mentioned above continues to be useful. Nevertheless, I am creating this issue to share the idea with others...

@omry
Owner

omry commented Sep 19, 2020

Thanks, copying my response from there here for better context:


Thanks for the thoughtful feature request @Jasha10.
This feature request should be against OmegaConf and not Hydra, but I can respond here.

__post_init__ depends on Structured Configs method support in OmegaConf. Currently methods are not supported and it's something I will consider for a later OmegaConf version.

The use cases you are describing will be addressed by #131.
The post_init method feels a bit too powerful for my liking: it allows raising unexpected exceptions and potentially changing the object in breaking ways.
Another point is that the self for such a method would be the DictConfig instance rather than the dataclass instance, which means it is not really a normal post_init (although for most purposes it can be written normally).
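
The point about self can be illustrated without OmegaConf. In the sketch below, a types.SimpleNamespace stands in for the DictConfig; that stand-in is an assumption made purely for illustration and says nothing about how OmegaConf works internally. Because this __post_init__ only reads attributes off self, the same method body runs against any object exposing those attributes.

```python
from dataclasses import dataclass
from types import SimpleNamespace


@dataclass
class MySQLConfig:
    host: str = "localhost"
    port: int = 3306

    def __post_init__(self) -> None:
        assert 7000 < self.port < 99999


# Stand-in for a config object that is not a MySQLConfig instance.
stand_in = SimpleNamespace(host="localhost", port=8000)

# Accessed through the class, __post_init__ is a plain function, so it can
# be called with any `self` that has a `port` attribute.
MySQLConfig.__post_init__(stand_in)  # passes: 8000 is in range

stand_in.port = 3306
try:
    MySQLConfig.__post_init__(stand_in)
    rejected = False
except AssertionError:
    rejected = True

print(isinstance(stand_in, MySQLConfig), rejected)
```

A __post_init__ that calls other methods on self, or checks isinstance(self, MySQLConfig), would not survive this kind of substitution.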

@omry added the enhancement, wishlist, and blocked labels on Sep 19, 2020
@omry
Owner

omry commented Jun 4, 2021

Given that we now have a method to convert to native dataclasses, I think this is no longer needed and I suggest we close this.

@Jasha10
Collaborator Author

Jasha10 commented Jun 5, 2021

Sounds good to me, closing now :)

Given that we now have a method to convert to native dataclasses...

For people watching this issue or discovering it later: The idea is to use the OmegaConf.to_object method, which is new in OmegaConf 2.1.rc1.

from dataclasses import dataclass

from omegaconf import OmegaConf

@dataclass
class MySQLConfig:
    host: str = "localhost"
    port: int = 3306
    def __post_init__(self):
        assert 7000 < self.port < 99999

cfg = OmegaConf.structured(MySQLConfig)  # create an omegaconf.DictConfig object
# convert the DictConfig into an instance of MySQLConfig; __post_init__ runs here
cfg2 = OmegaConf.to_object(cfg)

OmegaConf.to_object instantiates an instance of MySQLConfig, and the MySQLConfig.__post_init__ method gets run during instantiation.

@Jasha10 closed this as completed on Jun 5, 2021
@karthikprasad

karthikprasad commented Aug 12, 2021

@dataclass
class MySQLConfig:
    host: str = "localhost"
    port: int = 3306
    def __post_init__(self):
        assert 7000 < self.port < 99999

cfg = OmegaConf.structured(MySQLConfig)  # create an omegaconf.DictConfig object
cfg2 = OmegaConf.to_object(cfg)  # convert the DictConfig into an instance of MySQLConfig

OmegaConf.to_object instantiates an instance of MySQLConfig, and the MySQLConfig.__post_init__ method gets run during instantiation.

FTR, you can also trigger __post_init__() by simply running

cfg = OmegaConf.structured(MySQLConfig())  # pass the config instance instead of class

@Jasha10
Collaborator Author

Jasha10 commented Aug 12, 2021

FTR, you can also trigger __post_init__() by simply running

cfg = OmegaConf.structured(MySQLConfig())  # pass the config instance instead of class

To be clear, this is triggering MySQLConfig.__post_init__ when the instance is created, before OmegaConf becomes involved. It is equivalent to the following:

_instance = MySQLConfig()  # __post_init__ is run here
cfg: DictConfig = OmegaConf.structured(_instance)  # this does not run __post_init__
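
A stdlib-only sketch of why this distinction matters: __post_init__ runs once, when the dataclass instance is constructed, so it only sees the values passed to __init__. Anything applied afterwards is not re-checked. Here a plain attribute assignment stands in for a later config override, which is an assumption for illustration; the calls list is a hypothetical counter added for the demonstration, and the default port is changed to 8000 so that construction succeeds.

```python
from dataclasses import dataclass

calls = []  # records each __post_init__ invocation (demo only)


@dataclass
class MySQLConfig:
    host: str = "localhost"
    port: int = 8000  # in-range default so construction succeeds

    def __post_init__(self) -> None:
        calls.append(self.port)
        assert 7000 < self.port < 99999


instance = MySQLConfig()  # __post_init__ runs here
instance.port = 3306      # later change: __post_init__ does not run again

print(calls)              # only the construction-time value was checked
```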
