Fields annotated as Dict now have their struct flag set to False by default #581

Closed
wants to merge 4 commits into from


omry
Owner

@omry omry commented Mar 7, 2021

close #461

The expectation of a Structured Config with a Dict field is that the dict field is free to accept new keys:

@dataclass
class Foo:
  bar: Dict[str, str]

This expectation is violated if the Structured Config has a parent in struct mode.

cfg = OmegaConf.create({"foo" : Foo})
OmegaConf.set_struct(cfg, True)  # setting struct flag on the parent.
cfg.foo.bar["a"] = 10 # this will fail now.

This PR fixes the issue by explicitly setting the struct flag to False on DictConfig nodes typed as Dict[K, V].
There could be some side effects to this (this will open up any non DictConfig children (not structured configs) of the dict), but that seems pretty unlikely to cause a real issue in practice.
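
The core of the change can be sketched without OmegaConf at all. The helper below, `initial_flags`, is hypothetical (it is not part of OmegaConf). It assumes that "typed as Dict[K, V]" can be detected with the standard `typing.get_origin`, and it returns the flags a freshly wrapped node would start with.

```python
from typing import Any, Dict, Optional, get_origin


def initial_flags(annotation: Any) -> Dict[str, Optional[bool]]:
    """Hypothetical helper: flags for a node created from `annotation`."""
    flags: Dict[str, Optional[bool]] = {}
    # Parameterized Dict[K, V] annotations report `dict` as their origin.
    if get_origin(annotation) is dict:
        # Explicit False, so the node stops inheriting struct mode from a parent.
        flags["struct"] = False
    return flags


print(initial_flags(Dict[str, str]))  # Dict-annotated field: struct pinned to False
print(initial_flags(int))             # any other annotation: no flags set
```

An empty result means "no opinion", so such a node keeps inheriting whatever its parent says.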

This is a potential use case for non-recursive config flags, but for now I am parking that idea and will revisit it if a better use case comes up.

@omry omry force-pushed the 461_annotated_dicts_are_open branch from 8c792dc to ad9580c Compare March 7, 2021 05:21
@lgtm-com

lgtm-com bot commented Mar 8, 2021

This pull request introduces 2 alerts when merging 0b362b6 into 7dbb11a - view on LGTM.com

new alerts:

  • 2 for Unused import

@omry omry force-pushed the 461_annotated_dicts_are_open branch from 0b362b6 to 170786f Compare March 8, 2021 00:52
@omry omry force-pushed the 461_annotated_dicts_are_open branch from 170786f to b096f3f Compare March 8, 2021 01:08
@omry omry requested review from odelalleau and Jasha10 March 8, 2021 01:16
Collaborator

@odelalleau odelalleau left a comment

Besides some minor typos, I'm going to need to better understand this struct stuff in order to review this PR

@@ -906,6 +906,9 @@ def _node_wrap(
)
if is_dict:
key_type, element_type = get_dict_key_value_types(type_)
flags = {}
if is_dict_annotation(type_):
Collaborator

I'm a bit confused by the distinction between type_ and ref_type in this function: they are both set to the same value in the code calling it, but tests seem to use different values sometimes. Also, this function contains occurrences of both ref_type=type_ and ref_type=ref_type, which doesn't help in understanding what is going on.

Would it be possible to clarify in a docstring or something?

Collaborator

Both type_ and ref_type are constant within this function's body.
We can refactor by replacing each type_ with ref_type and dropping type_ from the function signature.
Never mind, @odelalleau you make a good point about the tests calling with different values.

Owner Author

This is complicated and the desired behavior is driven by the tests.
At a high level ref_type is equivalent to an annotation type and object_type is, well, the object type.

x : Any = "str"

In the above, the ref_type of x would be Any and the object_type would be str.
ref_type is usually used when assigning an object:

x : str = "str"
x = 10 # will trigger a validation error.

# in actual OmegaConf code:
cfg = OmegaConf.create({"x" : StringNode("str")})
cfg.x = 10

# or with dataclass:
@dataclass
class Foo:
  x : str = "str"

cfg = OmegaConf.structured(Foo)
cfg.x = 10

There are some scenarios, especially during merge, where we use the ref_type for more things.
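
The ref_type / object_type distinction can be illustrated with a toy slot class. `TypedSlot` is hypothetical and is not one of OmegaConf's node classes. The strict `isinstance` check is an assumption of the sketch; real nodes may convert values instead of rejecting them.

```python
from typing import Any


class TypedSlot:
    """Toy stand-in for a config node; not OmegaConf's actual node classes."""

    def __init__(self, ref_type: Any, value: Any) -> None:
        self.ref_type = ref_type  # the annotation; fixed for the slot's lifetime
        self.set(value)

    def set(self, value: Any) -> None:
        # Assignment is validated against ref_type, not against the old value's type.
        if self.ref_type is not Any and not isinstance(value, self.ref_type):
            raise TypeError(f"{value!r} is not a {self.ref_type.__name__}")
        self.value = value

    @property
    def object_type(self) -> type:
        # object_type follows whatever value is currently stored.
        return type(self.value)


loose = TypedSlot(Any, "str")   # x: Any = "str"
loose.set(10)                   # accepted: ref_type is Any
strict = TypedSlot(str, "str")  # x: str = "str"
try:
    strict.set(10)              # rejected: 10 does not match ref_type str
except TypeError as exc:
    print("rejected:", exc)
```

After the assignments, `loose.ref_type` is still `Any` while `loose.object_type` has moved from `str` to `int`.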

@@ -906,6 +906,9 @@ def _node_wrap(
)
if is_dict:
key_type, element_type = get_dict_key_value_types(type_)
flags = {}
if is_dict_annotation(type_):
flags["struct"] = False
Collaborator

Not directly related to this change, but I was trying to understand how things are working, and I'm confused by this:

cfg = OmegaConf.create({"a": {"b": 0}})
OmegaConf.set_struct(cfg, True)
cfg.a = {"b": 1}

which raises this exception:

ConfigKeyError: Key 'b' is not in struct
    full_key: a.b
    object_type=dict

What's going on there? (it's the same on master btw)

Owner Author

I would call it a bug.
Please file an issue; I expect this assignment to succeed.

@odelalleau
Collaborator

Also, at a higher level, it feels a bit like a hack, and I tend to agree with what you said in #461: "From the perspective of OmegaConf, this does not make sense".

What would make more sense to me would be to support "real" Python dicts / lists as ValueNodes. Something like:

@dataclass
class Example:
    real_dict: Dict[str, int]
    dict_config: DictConfig[str, int]

The first one (real_dict) wouldn't get turned into a DictConfig, but would remain a plain dictionary (while the second one would behave as what we have now with dicts)

That'd be a big change though (and I definitely haven't thought it through)... so I won't be pushing back against this PR, but I still wanted to mention it.

omry and others added 2 commits March 8, 2021 15:12
Co-authored-by: Olivier Delalleau <507137+odelalleau@users.noreply.github.com>
Co-authored-by: Olivier Delalleau <507137+odelalleau@users.noreply.github.com>
@omry
Owner Author

omry commented Mar 8, 2021

Some context about the evolution of OmegaConf.

Initially, there was no type safety at all. The behavior was similar to a plain Python dict, with some additions like interpolation, support for missing values, etc.
There was an obvious need for some kind of type safety; the first attempt to add it was struct mode.

When it was introduced, DictConfig access to a nonexistent field returned None (this is changing in 2.1), and assignment to a nonexistent field created it, as expected.
One easy thing to do is to use the structure of a config object as the schema, making it more like a C struct with fields.
This is struct mode. Struct mode is recursive, like all flags: a node is in struct mode if struct mode is set for it or if its parent is in struct mode.
Struct mode was working nicely, but it was not good enough: it is good for structure validation but not for type validation.

The next step was Structured Configs.
Initially I thought that Structured Configs could be implemented in terms of the struct flag, but that's not the case for various reasons, the primary one being that the effect should not be recursive.
If a node is a Structured Config, it should be closed to changes/access of fields not in the class even if struct mode is not set, and its children should be considered structs automatically just because that node is.

In some scenarios, we want a dataclass that both defines some fields and allows the assignment of arbitrary new fields.
This is implemented by extending Dict[K, V].
K and V also indicate the valid key and value types for fields (I believe this should cover both fields mentioned explicitly and newly added fields; I think this somewhat conflicts with the support @Jasha10 added for new key types, but it's a corner case).

The high-level bit for Structured Configs extending Dict is that they are open to changes even though the node is a Structured Config.

Another thing that is supported is a field that is typed as Dict[K, V].
Such a field is useful if there is no good schema for a node inside a Structured Config (or if a schema is too expensive to create and the user just wants to go free-style); the Dict annotation means that the node can accept new compatible keys.

In both of the above cases, Dict is used to soften the strictness of Structured Configs.
Once a field is annotated as a Dict, or a node extends Dict, there is a question about their behavior in the presence of the struct flag:
If it's on the node itself, we should respect that (the struct flag can also open Structured Config nodes for changes).
If it's on a parent, it's not as clear, and this diff changes the behavior by automatically setting the struct flag to False on such nodes so that they do not inherit the struct flag from the parent.
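
The inheritance rule and the effect of pinning the flag to False can be shown with a toy model. The `Node` class below is hypothetical and much simpler than DictConfig; it only assumes that a flag value of None means "ask the parent".

```python
from typing import Dict, Optional


class Node:
    """Toy dict-like node with an inheritable struct flag (not OmegaConf code)."""

    def __init__(self, struct: Optional[bool] = None) -> None:
        self.struct = struct            # None means "inherit from parent"
        self.parent: Optional["Node"] = None
        self.children: Dict[str, object] = {}

    def in_struct_mode(self) -> bool:
        if self.struct is not None:     # an explicit value on the node wins
            return self.struct
        return self.parent.in_struct_mode() if self.parent else False

    def __setitem__(self, key: str, value: object) -> None:
        # New keys are rejected only when the effective struct flag is True.
        if key not in self.children and self.in_struct_mode():
            raise KeyError(f"Key {key!r} is not in struct")
        if isinstance(value, Node):
            value.parent = self
        self.children[key] = value


root = Node()
root["inherits"] = Node()                # no flag of its own
root["dict_field"] = Node(struct=False)  # what this diff does for Dict[K, V] fields
root.struct = True                       # struct mode set on the parent only

root.children["dict_field"]["a"] = 10    # still open to new keys
try:
    root.children["inherits"]["a"] = 10  # closed through the parent
except KeyError as exc:
    print("rejected:", exc)
```

The side effect mentioned in the description is visible here too: anything nested under `dict_field` that has no flag of its own would also resolve to False.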

If you think about the evolution of OmegaConf it would make more sense but I do agree it's not obviously the right choice.
Generally speaking, the struct flag is a "poor man's Structured Config". Once we have Structured Configs it should not get in the way.

I will take a second look at this to see if there is an alternative way to support the behavior I need. In particular, I am thinking of introducing a mechanism that marks a node as a flags root and stops the recursive querying of parents.
This might solve the problem I have in Hydra in a less intrusive way.
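
A rough sketch of what such a flags root could look like, purely as an illustration of the idea. The `FlagNode` class and its `flags_root` attribute are hypothetical names, not an existing OmegaConf API.

```python
from typing import Optional


class FlagNode:
    """Toy node; `flags_root` is a hypothetical marker, not an existing API."""

    def __init__(
        self,
        parent: Optional["FlagNode"] = None,
        struct: Optional[bool] = None,
        flags_root: bool = False,
    ) -> None:
        self.parent = parent
        self.struct = struct
        self.flags_root = flags_root

    def get_struct(self) -> Optional[bool]:
        node: Optional["FlagNode"] = self
        while node is not None:
            if node.struct is not None:  # a flag set on the node itself wins
                return node.struct
            if node.flags_root:          # stop here: ancestors are not consulted
                return None
            node = node.parent
        return None


root = FlagNode(struct=True)
plain = FlagNode(parent=root)
shielded = FlagNode(parent=root, flags_root=True)
leaf = FlagNode(parent=shielded)

print(plain.get_struct())  # inherits the parent's flag
print(leaf.get_struct())   # lookup stops at `shielded`, so the flag is unset
```

Unlike pinning struct to False, this leaves the flag unset on the shielded node, and a struct flag set directly on that node would still be respected.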

@odelalleau
Collaborator

Thanks a lot for the detailed extra context, that helps! (I'll re-read this PR later in light of this -- will probably happen only tomorrow though)

Just a quick question though: what's the motivation for not setting the struct flag to True on the root node when calling OmegaConf.structured()?
(something that confused me is that Structured Configs would not have this flag set)

@omry
Owner Author

omry commented Mar 9, 2021

The reason I added OmegaConf.structured() was to support duck typing:

user: User = OmegaConf.structured(User)

Your question is actually why the struct flag is not set to True on Structured Config nodes.
The reason is that I did not want the effect to be recursive.
Unstructured nodes inside Structured nodes should not be closed to addition of new fields just because they happen to be inside a Structured Config node.

I think this could have been implemented that way had I introduced non-recursive node flags as well.

@omry
Owner Author

omry commented Mar 9, 2021

Also, at a higher level, it feels a bit like a hack, and I tend to agree with what you said in #461: "From the perspective of OmegaConf, this does not make sense".

What would make more sense to me would be to support "real" Python dicts / lists as ValueNodes. Something like:

@dataclass
class Example:
    real_dict: Dict[str, int]
    dict_config: DictConfig[str, int]

The first one (real_dict) wouldn't get turned into a DictConfig, but would remain a plain dictionary (while the second one would behave as what we have now with dicts)

That'd be a big change though (and I definitely haven't thought it through)... so I won't be pushing back against this PR, but I still wanted to mention it.

I just wanted to respond to it:
OmegaConf converts dicts to DictConfig automatically and ALWAYS.
It also converts dataclasses to DictConfig.
I have no intention of supporting plain dicts inside the config.
I hope the context I provided explains the motivation for supporting Dict-annotated fields (that motivation is not to say that those are actual dicts).

You are trying to (ab)use a different notation to indicate that some things should not happen on the node. What about interpolations? What about type safety? Is Dict[str, int] guaranteed to only contain int values, or is it a free dict where the user can actually put in incompatible values?

In short, I don't think this is a good direction.

@odelalleau
Collaborator

In short, I don't think this is a good direction.

Fair enough. Just to quickly answer your question, this would have been a new subclass of ValueNode, with behavior similar to IntegerNode, StringNode, etc.

@omry
Owner Author

omry commented Mar 9, 2021

Fair enough. Just to quickly answer your question, this would have been a new subclass of ValueNode, with behavior similar to IntegerNode, StringNode, etc.

If I understand you correctly, this means it would get assignment validation only, but later it will be allowed to drift from the schema.

@omry
Owner Author

omry commented Mar 9, 2021

I am closing this in favor of the more subtle #588.

@omry omry closed this Mar 9, 2021
@odelalleau
Collaborator

Fair enough. Just to quickly answer your question, this would have been a new subclass of ValueNode, with behavior similar to IntegerNode, StringNode, etc.

If I understand you correctly, this means it would get assignment validation only, but later it will be allowed to drift from the schema.

That's correct (static type checking should still work though, similar to regular Python dicts)
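
To make "assignment validation only" concrete, here is a small sketch of the trade-off being discussed. `PlainDictSlot` is a hypothetical class written for this illustration; it is not the proposed ValueNode subclass and not OmegaConf code.

```python
from typing import Any, Dict


class PlainDictSlot:
    """Toy value holder: validates a dict once, on assignment (hypothetical)."""

    def __init__(self, value_type: type) -> None:
        self.value_type = value_type
        self.value: Dict[str, Any] = {}

    def assign(self, new_value: Dict[str, Any]) -> None:
        # Collect entries whose values do not match the declared value type.
        bad = {k: v for k, v in new_value.items() if not isinstance(v, self.value_type)}
        if bad:
            raise TypeError(f"values not of type {self.value_type.__name__}: {bad}")
        self.value = new_value  # stored as-is: a plain dict, no wrapping


slot = PlainDictSlot(int)
slot.assign({"a": 1})      # validated at assignment time
slot.value["b"] = "oops"   # later mutation bypasses validation entirely
print(slot.value)
```

Because the stored object is an ordinary dict, nothing checks the second write at runtime; only a static type checker could catch it.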

@omry
Owner Author

omry commented Mar 9, 2021

Fair enough. Just to quickly answer your question, this would have been a new subclass of ValueNode, with behavior similar to IntegerNode, StringNode, etc.

If I understand you correctly, this means it would get assignment validation only, but later it will be allowed to drift from the schema.

That's correct (static type checking should still work though, similar to regular Python dicts)

What about merging from the command line?
How do static type checkers help there? :)

@odelalleau
Collaborator

What about merging from the command line?
How do static type checkers help there? :)

Use a DictConfig :)

I'm definitely not saying these should replace the current DictConfig mechanics. Just that I can see a reason to support plain dicts as well (maybe because we want to rely on some features not implemented in DictConfig, or want to avoid some DictConfig features leaking to the dict -- like the struct flag -- or for performance reasons)

@omry
Owner Author

omry commented Mar 10, 2021

Python supports plain dicts plenty.
DictConfig is an enhanced version.
It comes with a cost. If that's not acceptable, you can use a plain dict directly.

@omry omry deleted the 461_annotated_dicts_are_open branch June 7, 2021 18:11
Successfully merging this pull request may close these issues.

Be able to add field to dict types unless the node itself is marked as structured mode
3 participants