Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inconsistent ordering of Optional[Union[...]] types #71

Closed
gshank opened this issue Feb 15, 2022 · 10 comments
Closed

Inconsistent ordering of Optional[Union[...]] types #71

gshank opened this issue Feb 15, 2022 · 10 comments

Comments

@gshank
Copy link
Contributor

gshank commented Feb 15, 2022

We have a NodeConfig class that has a 'unique_key' attribute defined: unique_key: Optional[Union[str, List[str]]] = None

I have two identical tests in two different test directories ('test' and 'tests'). One of them is failing because "unique_key": "id" is converted to "unique_key": ["i", "d"], which implies that the List is processed first. The other one is not failing.

When I dump out the compiled code, I see a difference in the order:

< raise InvalidFieldValue('unique_key',typing.Union[str, typing.List[str], None],value,cls)
`

            raise InvalidFieldValue('unique_key',typing.Union[typing.List[str], str, None],value,cls)

`
I'm not sure where to look in the code for the order of the Union types.

@Fatal1ty
Copy link
Owner

Hi @gshank.

If str is on the first place of Union args, then it will be on the first place in Union inside of InvalidFieldValue exception. There is no reordering under the hood. I suggest you double check the dataclass field types.

Anyway, if you put List[str] on the first place, then input data might be coerced to List[str] from a str value because str supports iterator protocol and there is no strict validation with isinstance at the moment. However, in this case you can avoid this problem by coercing a value to a list of values:

@dataclass
class NodeConfig(DataClassDictMixin):
    unique_key: Optional[Union[List[str], str]] = None

    @classmethod
    def __pre_deserialize__(cls, d):
        unique_key = d.get('unique_key')
        if isinstance(unique_key, str):
            d['unique_key'] = [unique_key]
        return d

@gshank
Copy link
Contributor Author

gshank commented Feb 16, 2022

'str' is in the first place of the Union args, but somehow it's still ending up in second place in the InvalidFieldValue exception, so something is reordering it. Thanks for the example, but we'd prefer it the other way, where if it's a single str it stays that way. I wonder if this could be a Python bug? I'll try it with some other Python versions.

@gshank
Copy link
Contributor Author

gshank commented Feb 16, 2022

You can see the definition here, in the NodeConfig class: https://github.com/dbt-labs/dbt-core/blob/main/core/dbt/contracts/graph/model_config.py

@Fatal1ty
Copy link
Owner

I wonder if this could be a Python bug?

I don't think so.

I see that you make some non-trivial changes to the input data with hooks. If the problem occurs in the tests only, I think logging the data value at the beginning and at the end of __pre_deserialize__ method can help to figure it out. I'd like to know what's going on with the initial dictionary after pre_hook and post_hook applied.

@gshank
Copy link
Contributor Author

gshank commented Feb 16, 2022

It looks like it's a bug in typing.get_type_hints. In that particular case it returns: 'unique_key': typing.Union[typing.List[str], str, NoneType]. I'm going to change the definition from Optional[Union[str, List[str]] to Union[str, List[str], None], which so far seems to work. It's intermittent (since in most other case it doesn't do that), and I'm not up for debugging Python.

Thanks for taking a look.

@Fatal1ty
Copy link
Owner

Fatal1ty commented Feb 16, 2022

@gshank I think you didn't try mashumaro 2.11 at least? There were some fixes for Union and None: https://github.com/Fatal1ty/mashumaro/releases/tag/v2.11.
Release 2.9.1 also contains a fix for deserialization override for Union types: https://github.com/Fatal1ty/mashumaro/releases/tag/v2.9.1

P.S. I should definitely create a template for the issues :)

@Fatal1ty
Copy link
Owner

Anyway, if the problem is solved, we can close this issue.

@gshank
Copy link
Contributor Author

gshank commented Feb 16, 2022

We're on mashumaro 2.9 right now. I did a typing.get_type_hints(NodeConfig) (which leaves mashumaro out of it entirely), and the order came out different than it was in the code. I believe that for Unions mashumaro constructs code that tries the various types in order?

It's not totally clear to me that this is actually a typing.get_type_hintsbug. In theory, I guess that elements in the Union should be transposable. But this doesn't work very well for some types when trying to serialize by trying them out. Perhaps the solution is some kind of combined str/List[str] type which would handle this properly.

@Fatal1ty
Copy link
Owner

I believe that for Unions mashumaro constructs code that tries the various types in order?

Yes, it does it in the initial order. Union arguments are stored in __args__ attribute.

In theory, I guess that elements in the Union should be transposable.

Ideally, yes. But without strict validation of input data types it's not possible. I'm working on a validation feature that would help with this.

Perhaps the solution is some kind of combined str/List[str] type which would handle this properly.

Or just create a list of one element during deserialization in case of str value since List[str] is a valid type here :)

@gshank
Copy link
Contributor Author

gshank commented Feb 16, 2022

That works too :)

@gshank gshank closed this as completed Feb 16, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants