Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Warn about buggy type resolution #529

Merged

Conversation

DavidCain
Copy link
Contributor

This type annotation resolution is a flawed implementation -- ideally a
proper fix can come later, but for now it's worth warning about the
present behavior.

dataclasses-json does not behave correctly when using string-style
annotations (a frequent tactic for annotating types that have not yet
been defined at that point in the file).

The core problem comes from the assumption that type annotations in a
module will match sys.modules() -- maybe_resolved will frequently
resolve to a different value than a typechecker would!

A short example of the error

example_types.py:

from dataclasses import dataclass

from dataclasses_json import dataclass_json

@dataclass_json
@dataclass
class Config:
    options: list["Option"]

@dataclass_json
@dataclass
class Option:
    label: str

Expected behavior

_decode_items correctly identifies Option as coming from
example_types:

>>> from example_types import Config
>>> print(Config.from_dict({"options": [{"label": "repro"}]}).options)
[Option(label='repro')]

Unexpected behavior

_decode_items() incorrectly identifies "Option" as from a third
party module that is not in scope at time of annotation:

repro.py:

import click.parser  # Has `Option`
from example_types import Config

print(Config.from_dict({"options": [{"label": "repro"}]}).options)
$ python3.10 repro.py
[{'label': 'repro'}]

Related fix -- truthiness

The conditional if maybe_resolved is meant to handle the case of an
attribute not being found in the module (it could basically be written
as if maybe_resolved is not None). However, that doesn't cover the
case of types that are falsy at runtime!

CustomNoneType = None

values: list['CustomNoneType'] = [None, None]

We can just use hasattr() instead.

This type annotation resolution is a flawed implementation -- ideally a
proper fix can come later, but for now it's worth warning about the
present behavior.

`dataclasses-json` does not behave correctly when using string-style
annotations (a frequent tactic for annotating types that have not yet
been defined at that point in the file).

The core problem comes from the assumption that type annotations in a
module will match `sys.modules()` -- `maybe_resolved` will frequently
resolve to a different value than a typechecker would!

A short example of the error
============================

`example_types.py`:
```python
from dataclasses import dataclass

from dataclasses_json import dataclass_json

@dataclass_json
@DataClass
class Config:
    options: list["Option"]

@dataclass_json
@DataClass
class Option:
    label: str
```

Expected behavior
-----------------
`_decode_items` correctly identifies `Option` as coming from
`example_types`:

```python
>>> from example_types import Config
>>> print(Config.from_dict({"options": [{"label": "repro"}]}).options)
[Option(label='repro')]
```

Unexpected behavior
-------------------
`_decode_items()` incorrectly identifies `"Option"` as from a third
party module that is not in scope at time of annotation:

`repro.py`:
```python
import click.parser  # Has `Option`
from example_types import Config

print(Config.from_dict({"options": [{"label": "repro"}]}).options)
```

```bash
$ python3.10 repro.py
[{'label': 'repro'}]
```

Related fix -- truthiness
=========================
The conditional `if maybe_resolved` is meant to handle the case of an
attribute not being found in the module (it could basically be written
as `if maybe_resolved is not None`). However, that doesn't cover the
case of types that are falsy at runtime!

```python
CustomNoneType = None

values: list['CustomNoneType'] = [None, None]
```

We can just use `hasattr()` instead.
@DavidCain
Copy link
Contributor Author

I expect that #442 may well address this issue with a refactor (which is why I didn't make any effort to change behavior here). However, I at least wanted to document this issue & hopefully help others who stumble on it!

@DavidCain
Copy link
Contributor Author

For anybody else reading this (and using this package on Python 3.7 through 3.10) a good workaround is to use from __future__ import annotations -- that way you can use postponed evaluations:

+from __future__ import annotations

 @dataclass_json
 @dataclass
 class Config:
-    options: list["Option"]
+    options: list[Option]

@DavidCain
Copy link
Contributor Author

@lidatong - do you have any thoughts about this? I'm hoping a warning can save some others a lot of time & confusion -- took me a little while to realize that this package resolves names to the first module that happens to have a variable of that name.

@george-zubrienko
Copy link
Collaborator

@DavidCain is it possible for you to add some units proving the fix? Also note that after 3.11 this is not super relevant since 'self' types are resolved there w/o string hacks

@george-zubrienko
Copy link
Collaborator

For anybody else reading this (and using this package on Python 3.7 through 3.10) a good workaround is to use from __future__ import annotations -- that way you can use postponed evaluations:

+from __future__ import annotations



 @dataclass_json

 @dataclass

 class Config:

-    options: list["Option"]

+    options: list[Option]

No you should not use that import with DCJ as it breaks the type handling in a lot of cases. Please review documentation :)

@DavidCain
Copy link
Contributor Author

@DavidCain is it possible for you to add some units proving the fix?

@george-zubrienko - I can certainly add unit tests. Just for clarity though, are you suggesting that I:

  • test this current approach of maintaining the buggy behavior (but emitting warnings?)
  • or change the way string annotations are resolved, fixing the bug

Also note that after 3.11 this is not super relevant since 'self' types are resolved there w/o string hacks

Yup, agreed that Python 3.11 helps! That's why I recommended from __future__ import annotations -- it's basically opting into Python 3.11 postponed evaluations.

Though note that the issue demonstrated here is not annotating the type of the class itself, but instead annotating a class that comes later in the file and hasn't been interpreted yet at time of declaration.

@DavidCain
Copy link
Contributor Author

No you should not use that import with DCJ as it breaks the type handling in a lot of cases. Please review documentation :)

@george-zubrienko , I assume you're referring to the line here -- https://github.com/lidatong/dataclasses-json?tab=readme-ov-file#handle-recursive-dataclasses

I understand that the __future__ import may not be "sanctioned" but given the choice between whatever bugs may arise from Python 3.11-style typing or "pick the first module which happens to have a variable of that name" that's an easy choice.

In other words, using 'Option' creates a bug 100% of the time, but I haven't seen bugs from using annotations). Finally, it should be safe to use Python's __future__ imports! To not support future imports is to be unable to support newer Python versions.

@george-zubrienko
Copy link
Collaborator

Finally, it should be safe to use Python's future imports! To not support future imports is to be unable to support newer Python versions.

You misunderstand the concept of future imports. Those are things that might not even make it into next release. Using those is always at your own risk and no, no stable and reliable code should rely or support future imports. This is especially important when you are working with type handling which in Python is screwed to the rest of the language with rusty nails

@george-zubrienko
Copy link
Collaborator

I understand that the future import may not be "sanctioned" but given the choice between whatever bugs may arise from Python 3.11-style typing or "pick the first module which happens to have a variable of that name" that's an easy choice.

Easy choice that works for easy cases. Metaclasses, apps that rely on pickling such as pyspark will have issues with that specific import, not to mention that by adding it you implicitly add a necessity to verify all your other imports behaviours as they might be affected as well.

@george-zubrienko
Copy link
Collaborator

test this current approach of maintaining the buggy behavior (but emitting warnings?)
or change the way string annotations are resolved, fixing the bug

It is a tough one to fix pre-3.11. You'll have play the string matching game which is tough, I fixed that for one of the cases some months ago, but I feel like its just chasing the mice without addressing the root issue. Unfortunately we have to wait for 3.10 EOL, so a warning should be fine, I'd just like to see smth that covers the code you are adding
If you have a good idea for a proper fix, I would be glad to review and get it in if implemented

@DavidCain
Copy link
Contributor Author

You misunderstand the concept of future imports. Those are things that might not even make it into next release.

Fair enough! I know that any future import may ultimately never be added to the language ("MandatoryRelease may also be None, meaning that a planned feature got dropped or that it is not yet decided.")

https://docs.python.org/3/library/__future__.html

What I meant to say is that all current possible imports in __future__ already have been released (or will be released). They should all be safe to use.

>>> from __future__ import annotations
>>> annotations.getMandatoryRelease()
(3, 11, 0, 'alpha', 0)

Using those is always at your own risk and no, no stable and reliable code should rely or support future imports.

Disagree -- __future__ is a canonical way to support multiple Python versions at once.

This is especially important when you are working with type handling which in Python is screwed to the rest of the language with rusty nails

Hah no disagreement there, but again really can't stress enough how much "pick the first module that happens to have a variable of this name" is also screwed.

@DavidCain
Copy link
Contributor Author

It is a tough one to fix pre-3.11. You'll have play the string matching game which is tough, I fixed that for one of the cases some months ago, but I feel like its just chasing the mice without addressing the root issue. Unfortunately we have to wait for 3.10 EOL, so a warning should be fine, I'd just like to see smth that covers the code you are adding If you have a good idea for a proper fix, I would be glad to review and get it in if implemented

Great, I'd be happy to add some test coverage for what I've done here! If I have the time for a proper fix, I can follow up with a proposal.

@george-zubrienko
Copy link
Collaborator

pick the first module that happens to have a variable of this name" is also screwed.

We have a running joke in the office that all Python development is hope-driven ;)
If you manage to get a quick test for the warning in, I'll release this one and one more for abstract classes on the weekend.
Also small disclaimer, I plan to remove tests for 3.7 and 3.8 as being required. If you run into issues with 3.7 in future contributions, just assert True/accept failure if ver <=3.9

Copy link
Collaborator

@george-zubrienko george-zubrienko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome! Let's get this merged :)

@george-zubrienko george-zubrienko merged commit dc63902 into lidatong:master Jun 9, 2024
9 checks passed
renovate bot added a commit to ixm-one/pytest-cmake-presets that referenced this pull request Jun 9, 2024
[![Mend
Renovate](https://app.renovatebot.com/images/banner.svg)](https://renovatebot.com)

This PR contains the following updates:

| Package | Change | Age | Adoption | Passing | Confidence |
|---|---|---|---|---|---|
| [dataclasses-json](https://togithub.com/lidatong/dataclasses-json)
([changelog](https://togithub.com/lidatong/dataclasses-json/releases)) |
`0.6.6` -> `0.6.7` |
[![age](https://developer.mend.io/api/mc/badges/age/pypi/dataclasses-json/0.6.7?slim=true)](https://docs.renovatebot.com/merge-confidence/)
|
[![adoption](https://developer.mend.io/api/mc/badges/adoption/pypi/dataclasses-json/0.6.7?slim=true)](https://docs.renovatebot.com/merge-confidence/)
|
[![passing](https://developer.mend.io/api/mc/badges/compatibility/pypi/dataclasses-json/0.6.6/0.6.7?slim=true)](https://docs.renovatebot.com/merge-confidence/)
|
[![confidence](https://developer.mend.io/api/mc/badges/confidence/pypi/dataclasses-json/0.6.6/0.6.7?slim=true)](https://docs.renovatebot.com/merge-confidence/)
|

---

### Release Notes

<details>
<summary>lidatong/dataclasses-json (dataclasses-json)</summary>

###
[`v0.6.7`](https://togithub.com/lidatong/dataclasses-json/releases/tag/v0.6.7)

[Compare
Source](https://togithub.com/lidatong/dataclasses-json/compare/v0.6.6...v0.6.7)

#### What's Changed

- feat: support abstract collections by
[@&#8203;PJCampi](https://togithub.com/PJCampi) in
[lidatong/dataclasses-json#532
- Warn about buggy type resolution by
[@&#8203;DavidCain](https://togithub.com/DavidCain) in
[lidatong/dataclasses-json#529

#### New Contributors

- [@&#8203;DavidCain](https://togithub.com/DavidCain) made their first
contribution in
[lidatong/dataclasses-json#529

**Full Changelog**:
lidatong/dataclasses-json@v0.6.6...v0.6.7

</details>

---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined),
Automerge - At any time (no schedule defined).

🚦 **Automerge**: Enabled.

♻ **Rebasing**: Whenever PR is behind base branch, or you tick the
rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about this update
again.

---

- [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check
this box

---

This PR has been generated by [Mend
Renovate](https://www.mend.io/free-developer-tools/renovate/). View
repository job log
[here](https://developer.mend.io/github/ixm-one/pytest-cmake-presets).

<!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiIzNy4zOTMuMCIsInVwZGF0ZWRJblZlciI6IjM3LjM5My4wIiwidGFyZ2V0QnJhbmNoIjoibWFpbiIsImxhYmVscyI6WyJyZW5vdmF0ZTpkZXBlbmRlbmNpZXMiXX0=-->

Signed-off-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
@DavidCain DavidCain deleted the dcain-document-typing-issue branch June 20, 2024 16:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants