Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add strict validation against Hydra's supported primitives #163

Merged
merged 38 commits into from Nov 26, 2021

Conversation

rsokl
Copy link
Contributor

@rsokl rsokl commented Nov 17, 2021

The Eyes of T.J. Eckleburg, The Great Gatsby (2013)

The Eyes of T.J. Eckleburg, The Great Gatsby (2013)

Closes #149

Adding strict validation against Hydra's supported primitives

Prior to this PR, hydra-zen would not check configured values to see if they are compatible with Hydra's supported primitives. This means that you could construct a config via builds and make_config that are destined to raise whenever Hydra tries to serialize or instantiate the config. This is no good!

Before:

from hydra_zen import make_config, to_yaml, instantiate, builds

class MyCustomClass:
    pass
>>> not_supported = MyCustomClass()
>>> Conf = make_config(a=not_supported)  # hydra-zen is OK with this even though it is doomed to fail
>>> to_yaml(Conf)   # will raise
>>> instantiate(Conf) # will raise

Also, our annotations did not provide any feedback along these lines

image

After:

Now builds, make_config, and hydrated_dataclass strive to strictly validate all configured values -- whether they are specified by the user or auto-populated from an object's signature. The goal here is to catch mistakes in configs, which would eventually be "snagged" by omegaconf/Hydra, before you even launch your app.

>>> not_supported = MyCustomClass()
>>> Conf = make_config(a=not_supported) 
---------------------------------------------------------------------------
HydraZenUnsupportedPrimitiveError         Traceback (most recent call last)
HydraZenUnsupportedPrimitiveError: Building: dict ..
 The configured value <__main__.MyCustomClass object at 0x000001E01B72B4C0>, for field `a`, is not supported by Hydra. Serializing or instantiating this config would ultimately result in an error.
Consider using `hydra_zen.builds` to create a config for this particular value.

I hope that the error message is sufficiently clear

HydraZenUnsupportedPrimitiveError: Building: dict ..
The configured value <main.MyCustomClass object at 0x000001E01B72B4C0>, for field a, is not supported by Hydra. Serializing or instantiating this config would ultimately result in an error.
Consider using hydra_zen.builds to create a config for this particular value.

We even peek inside containers and validate their contents

>>> builds(dict, a=[1, not_supported])  # will raise  (inside list)
>>> make_config(a={1: not_supported}))  # will raise  (as value in dict)
>>> make_config(a={not_supported: "a"}))  # will raise  (as key in dict)

Note that all of our type-checking used for this validation should adhere to the form: type(x) is in SUPPORTED_TYPE, and not isinstance(x, SUPPORTED_TYPES) because we do not handle/permit subclasses of valid primitive types.

Providing Support for Additional Primitives

This PR isn't just about narc-ing on bad configs, it also adds new capabilities! We can add support for common types that Hydra doesn't handle. Presently, we add support for:

  • bytes
  • bytearray
  • complex
  • Counter
  • deque
  • Path
  • PosixPath
  • WindowsPath
  • range
  • set
  • frozenset
from pathlib import Path
from hydra_zen import to_yaml

def pp(x): print(to_yaml(x))
>>> pp(make_config(a=Path.home()))
a:
  _args_:
  - C:\Users\Ryan Soklaski
  _target_: pathlib.Path

>>> instantiate(make_config(a=Path.home()))
{'a': WindowsPath('C:/Users/Ryan Soklaski')}

>>> pp(make_config(a={1, 2, 3}))
a:
  _target_: builtins.set
  _args_:
  - - 1
    - 2
    - 3

>>> pp(make_config(a=1+2j))
a:
  real: 1.0
  imag: 2.0
  _target_: builtins.complex

We're pretty fancy here - we even permit things like sets of complex numbers!

>>> pp(make_config(a={1+2j, -10j}))
a:
  _target_: builtins.set
  _args_:
  - - real: -0.0
      imag: -10.0
      _target_: builtins.complex
    - real: 1.0
      imag: 2.0
      _target_: builtins.complex

Improved Annotations

The annotation for our config-creating functions are much better now; type-checkers will now flag bad inputs. The following reflects the behavior of pyright/pylance under the "basic" type-checking mode.

image

Feedback on this PR

Exception vs Warning (for now)

Given that we raise an exception for this validation, upgrading from v0.3 to v0.4 could be a rude awakening for folks who are, for some reason, using hydra-zen to a bunch of ultimately-invalid configs.

I have been considering having this PR raise a warning on bad-configurations, and note that in a future version the warning will become an exception. The reasons why I opted for going the more abrupt route of raising an error is:

  1. For current user code for which this PR will introduce exceptions, I expect that their code would be raising already, but just downstream. I suppose that that folks could be using builds and make_config in a way that is detached from Hydra... but that really isn't what hydra_zen is for, and I wouldn't be surprised if literally no one is doing this.
  2. It is harder to track down the source of a warning than it is the emitter of an exception. So going the route of raising an exception actually makes it easier for new users to quickly diagnose the source of their error.
  3. I want people to immediately be able to incorporate this improved validation in the automated tests. Converting warnings to exceptions on the user-side is way too much work.

The above is my reasoning on this topic. I am open to discussion if there are other opinions.

Any Other Low-Hanging Fruit for Supporting Primitives?

I plan to add support for numpy.ndarray and torch.Tensor, but in a separate PR because handling the dynamic imports is a bit trickier.

That all being said, are there other primitives, beyond set, frozenset, complex, and pathlib.Path, that we should add support for? For example, pathlib.Path jumped out at me as being super useful. Obviously there are plenty of fish in the std-lib sea; support can be added incrementally.

Possible Future Capability: Enable Users to Register Their Own Primitives / Conversion Strategies

There are various ways by which we could enable users to register new types and corresponding conversion strategies.

The biggest "Pro" is that if someone is regularly using some custom type in their config, registering a converter makes config-design much more elegantly.

The biggest "Con" that I can think of is that this can introduce "secret" dependencies on dynamic code. E.g. A user's config-generating code can only be run after that converter has been registered. Fortunately, the resulting config (either the dataclass or yaml) should be independent of this dependency. This is an improvement of omegaconf's capability for registering resolvers -- where even yamls can become secretly tethered to third-party code.

Implementation Details

Restricted Primitives for Dictionary Keys

omegaconf does not permit dataclasses as keys in dictionaries, even if they would instantiate to a hashable value. Thus the extended primitive support is not provided for dict-keys

image

No Support for attrs (for now)

We currently do not provide any support for the attrs library; it is likely that attrs instances will get caught by our validation in its current form. This is not likely to impact most hydra-zen users in the near term, but we should eventually add support for it.

Support for Enum

Although Omegaconf provides support for Enum, the yaml serialization process only retains the enum-member's name:

from enum import Enum

from hydra_zen import make_config, builds, to_yaml, instantiate
from omegaconf import OmegaConf, ListConfig, DictConfig

class Color(Enum):
    red = 1
    green = 2
>>> OmegaConf.create(to_yaml({1: Color.red}))
{1: 'red'}

It appears that the enum-member can only be restored if a schema is provided that contains the Enum subclass in the annotation. Thus hydra-zen instead provides specialized support for Enum:

>>> OmegaConf.create(to_yaml(make_config(a={1: Color.red})))
{'a': {1: {'_target_': '__main__.Color', '_args_': [1]}}}

this way, one can roundtrip from a yaml and restore the actual enum-member:

>>> instantiate(OmegaConf.create(to_yaml(make_config(a={1: Color.red}))))
{'a': {1: <Color.red: 1>}}

@rsokl rsokl added this to the hydra-zen 0.4.0 milestone Nov 17, 2021
@rsokl rsokl added the enhancement New feature or request label Nov 17, 2021
@rsokl rsokl marked this pull request as ready for review November 17, 2021 18:37
@rsokl rsokl requested a review from jgbos November 17, 2021 18:45
@jgbos
Copy link
Contributor

jgbos commented Nov 18, 2021

HydraZenUnsupportedPrimitiveError: Building: dict ..
The configured value <main.MyCustomClass object at 0x000001E01B72B4C0>, for field a, is not supported by Hydra. Serializing or instantiating this config would ultimately result in an error.
Consider using hydra_zen.builds to create a config for this particular value.

I'm not sure this is clear enough for beginners. Do you mean that a user should try to build a configuration for the value rather than directly using the value, e.g., builds(MyCustomClass) instead? Or in the "before this PR world" when I needed to create a special class for complex types and also create a configuration?

Maybe just reorder the sentence, "Consider creating a configuration for the particular value using hydra_zen.builds".

@jgbos
Copy link
Contributor

jgbos commented Nov 18, 2021

Your logic on the the "exception vs. warning" is good.

I can only think of one use case that might benefit from warnings instead: what if the user doesn't use the Hydra backend but wants to use the configs?

I know, not a strong argument against exceptions, just trying to think of any good reason to go to warnings.

Copy link
Contributor

@jgbos jgbos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lots of changes, nothing pops out as unusual. Excited to have complex support :)

@rsokl
Copy link
Contributor Author

rsokl commented Nov 18, 2021

I just realized that we should include omegaconf containers among the supported primitives.

Copy link
Contributor

@jgbos jgbos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just used the enum support for some configs :)

@rsokl
Copy link
Contributor Author

rsokl commented Nov 26, 2021

Finally wrapping up on this. I added some additional discussion and remarks on implementation details in the original post.

@rsokl rsokl merged commit d70d29a into main Nov 26, 2021
@rsokl rsokl deleted the improve-value-validation branch November 26, 2021 22:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Complex Numbers
2 participants