Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Type coercion since 0.5.9 #466

Closed
jfsmith-at-coveo opened this issue Aug 9, 2023 · 17 comments
Closed

[BUG] Type coercion since 0.5.9 #466

jfsmith-at-coveo opened this issue Aug 9, 2023 · 17 comments

Comments

@jfsmith-at-coveo
Copy link

jfsmith-at-coveo commented Aug 9, 2023

Description

I'll let you guys explain that this is intended and why, but starting with 0.5.9 there is type coercion going on when using the from_dict() method.

A non-zero int becomes True on a bool field, for instance. An int becomes its stringified version on a string field.

This feels to me like a breaking change, but I don't see this anywhere in the release notes. Nor in the version number in any case.

Was this intended? If so, why?

Code snippet that reproduces the issue

from dataclasses import dataclass
from dataclasses_json import DataClassJsonMixin

@dataclass
class MyDataclass(DataClassJsonMixin):
    this_must_be_bool: bool
    this_must_be_string: str


d = {"this_must_be_bool": 1234, "this_must_be_string": 1234}
c = MyDataclass.from_dict(d)

assert isinstance(c.this_must_be_bool, bool)
assert isinstance(c.this_must_be_string, str)

Describe the results you expected

So this snippet raises with 0.5.8. Anything after and including 0.5.9 does not raise.

Python version you are using

3.11.4

Environment description

Tested with dataclasses-json 0.5.9, 0.5.12 and 0.5.14

@jfsmith-at-coveo jfsmith-at-coveo added the bug Something isn't working label Aug 9, 2023
@george-zubrienko
Copy link
Collaborator

This is not even type coercion, this is defo a regression because 1234 is not a boolean value. I'm looking into this.

@george-zubrienko
Copy link
Collaborator

george-zubrienko commented Aug 11, 2023

This is the culprit in core.py lines 247-250

    elif _issubclass_safe(field_type, (int, float, str, bool)):
        res = (field_value
               if isinstance(field_value, field_type)
               else field_type(field_value))

I'll check where this was introduced. This PR: #375

@george-zubrienko
Copy link
Collaborator

Now to sum up:

  • The 42: str to 42: int is a valid auto-conversion in many frameworks and DCJ should not be an exception, thus this should not raise, so
d = {"this_must_be_int": 1234, "this_must_be_string": 1234}
MyClass.from_dict(d) # MyClass(this_must_be_int=1234, this_must_be_string="1234")

should be fine. Sames goes for floats etc. But for bool this goes side way because bool(any non-empty value) in python will always return True and that is definitely wrong. I will submit a PR fixing this for bool, let me know your thoughts on this change, if you think this warrants a bump to 0.6 @lidatong @s-vitaliy @matt035343 @USSX-Hares

@USSX-Hares
Copy link
Collaborator

@george-zubrienko in v1 all these coercions should be a feature flag. It may still be the default behavior btw.

@matt035343
Copy link
Collaborator

Not sure minor bump is needed, since I would categorize it as a patch

@matt035343
Copy link
Collaborator

@USSX-Hares I agree that it should be a feature flag. @george-zubrienko please add to task list.

I changed my mind, I recommend bumping to 0.6.0.

@jfsmith-at-coveo
Copy link
Author

jfsmith-at-coveo commented Aug 14, 2023

Thanks everyone for the quick feedback! I still have some thoughts on this issue:

  • The 42: str to 42: int is a valid auto-conversion in many frameworks and DCJ should not be an exception, thus this should not raise, so [...] Sames goes for floats etc.

I do not use Python frameworks on a daily basis and I do not know any that would do that to strings and numerals. I think this is a bad idea, for three reasons:

  1. As python is a strongly typed language, no Python developer should ever expect a type change at runtime.

  2. Type annotations are annotations. I think they should stay that way. It feels wrong to have them suddenly begin to have any operational value at runtime.

  3. Regardless of the above (those are just my two cents, you guys do your stuff and it's fine), it still is an unexpected change of behavior between DCJ versions. Whatever you guys decided to do with booleans, I strongly believe the other types have to be treated the same way on that basis alone.

Thanks again all!

@george-zubrienko
Copy link
Collaborator

george-zubrienko commented Aug 14, 2023

As python is a strongly typed language, no Python developer should ever expect a type change at runtime.

Python also has dynamic typing and is an interpreted language, the very existence of Union type contradicts this statement. Type coercion is a common thing where JSON/YAML are used in real life, esp when dealing with configurations, so this is in fact a common practice. If you use app configurations (appsettings.json, HOCON etc), this is daily life, unless you go strict mode for everything. In HOCON you can implicitly convert stuff like 2s to timespan of 2 seconds and this is actually an awesome feature. If you want a strict typecheck on JSON read, this is possible to do, but I agree this should be a feature flag and v0 of DCJ doesn't offer that, but I am working on v1 PR that will enable that.

Type annotations are annotations. I think they should stay that way. It feels wrong to have them suddenly begin to have any operational value at runtime.

This statement is the opposite of the first one. If we treat everything as recommendation, then we should have coercions as default - remember JSON is a text format and doesn't carry schema information like AVRO or PARQUET. In fact, it is a quite simple thing, how target types should be treated, but you seem to be more inclined towards strict compliance, which is one way to look at this, but not the only way ;)

it still is an unexpected change of behavior between DCJ versions

I'd argue with this. As stated in the reasoning from the original author #375, since DCJ has marshmellow interop, quote

This differs from marshmallow decode behavior which properly converts to the integer value.

we should behave the way marshmellow users expect, and lack of type coercion was a bug, that turned out to be a feature for some. Thus, I agree that for some users - definitely, for others - long awaited fix that doesn't warrant a minor bump.

However, even the boolean change I've opened a PR for can cause even more issues to be opened, where people either:

  • are annoyed coercion happens first place
  • are annoyed coercion happens, but stuff like [] or {} are not converted into booleans. Btw, conversion of "empty" objects into booleans is language-implementation specific, so we risk a huge flame war if we try to enforce something specific here

Thus, I will keep this issue opened and ref it to #442 where we will make both crowds happy by moving this behaviour to a feature flag, so everybody can have their own little DCJ doing the right thing. Until then, help welcome/stay tuned :)

@george-zubrienko george-zubrienko added requires-follow-up api/v0 and removed bug Something isn't working labels Aug 14, 2023
@jfsmith-at-coveo
Copy link
Author

jfsmith-at-coveo commented Aug 14, 2023

Python also has dynamic typing and is an interpreted language, the very existence of Union type contradicts this statement.

I don't want this to sidetrack into unrelated discussions and all, but I should point out that you are confusing dynamic/static with weak/strong typing. You should know that python is both dynamic AND strongly typed. Names that refer to objects are not bound to any type and may be reassigned freely. That's the dynamic part, and that's what Union is for. Union is not a type for the object it annotates, i.e. it's not your object that is "Union", it's the annotation itself. And that tells the type checker that the name X may refer to an object of either of A or B (or more) type. At runtime X will be strictly typed, and it will be A or B, and Union will have no existence. That's all what this means. Type is not dynamic. References are.

That's the context of my intervention here. Like I said, you proceed as you see fit, but I do believe that treating python dataclasses as JSON/YAML-like objects (by default or not) is, though convenient, definitely not pythonic. 😅

@george-zubrienko
Copy link
Collaborator

george-zubrienko commented Aug 14, 2023

I don't want this to sidetrack into unrelated discussions and all

That is exactly what is happening here. If you are unhappy with the response to the issue and you think some changes are in order, that is one thing and can be discussed, but so far I do not see any response to the reasons outlined in my response, including the marshmellow interop alignment, which was the main reason for this change in the first place. We have to take into account the broader audience here :)

In case you create an issue and put a bug label on it, we must investigate and decide on the best course of action, which includes you as the issue author, taking responsibility and helping us in this journey - for example push for minor version bump - do you think this is needed, or not? The but I do believe that treating python dataclasses as JSON/YAML-like objects (by default or not) is, though convenient, definitely not pythonic does not really help to decide the course of action and frankfully just adds clutter to the issue thread, we could have gone with a Discussion instead then ;)

You should know that python is both dynamic AND strongly typed

I never said the opposite :) Important part to note that we are dealing with text format serde in a language where you effectively have very little control of the actual type you are getting after deserialization, the magic "at runtime strictly typed" X.a, if you seen that part of DCJ source code, is actually decided by the type hint on a dataclass field, because there is no other way to get that information. This can lead to issues, like the ones we see with booleans. That being said, please focus on the original problem and what do you think of the proposed solution above. I'd like to read that part instead.

@jfsmith-at-coveo
Copy link
Author

Those are fair points. My apologies. I do acknowledge there is a broader context here with DCJ, which I tend to forget. I mainly use from_dict(), with which the stakes are simpler. I have nothing else to say about the solution, in the sense that in the end I believe it is the developers' call. Bumping to 0.6.0 is fine! 😃

@george-zubrienko
Copy link
Collaborator

Bumping to 0.6.0 is fine

Good, I tend to think we should have probably done this too, which @matt035343 agreed as well. But given that the damage has already been done, what do you think is the best course of action? Should we just bump the next release, will that be an acceptable solution?

@jfsmith-at-coveo
Copy link
Author

jfsmith-at-coveo commented Aug 15, 2023

Bumping to a new version and then documenting seems to me the best thing to do in this case. In the release notes for 0.6.0, just specify that there's a fix for a regression introduced in 0.5.9, and then explain what is the intended behavior from this version onward.

@george-zubrienko
Copy link
Collaborator

Alright, then this is what we'll do. I think we'll have some commits to release by end of this week, so expect 0.6.0 around Sunday :)

@george-zubrienko
Copy link
Collaborator

Opened an issue for v1 - when we get there, would be nice if you have time to participate in the discussion :) Thanks a lot for bringing this up!

@george-zubrienko
Copy link
Collaborator

Small update, moving this to next week as we might have another commit which can introduce some funky behaviours, so I'm pushing 0.6.0 to next release some time next week

@george-zubrienko
Copy link
Collaborator

@jfsmith-at-coveo v0.6.0 released with a note on type coercion and a few other things. I'm closing this issue as resolved and promise this will be further addressed in v1 API when it comes out.

renovate bot added a commit to ixm-one/pytest-cmake-presets that referenced this issue Dec 10, 2023
[![Mend
Renovate](https://app.renovatebot.com/images/banner.svg)](https://renovatebot.com)

This PR contains the following updates:

| Package | Change | Age | Adoption | Passing | Confidence |
|---|---|---|---|---|---|
| [dataclasses-json](https://togithub.com/lidatong/dataclasses-json)
([changelog](https://togithub.com/lidatong/dataclasses-json/releases)) |
`^0.5.7` -> `^0.6.0` |
[![age](https://developer.mend.io/api/mc/badges/age/pypi/dataclasses-json/0.6.3?slim=true)](https://docs.renovatebot.com/merge-confidence/)
|
[![adoption](https://developer.mend.io/api/mc/badges/adoption/pypi/dataclasses-json/0.6.3?slim=true)](https://docs.renovatebot.com/merge-confidence/)
|
[![passing](https://developer.mend.io/api/mc/badges/compatibility/pypi/dataclasses-json/0.5.9/0.6.3?slim=true)](https://docs.renovatebot.com/merge-confidence/)
|
[![confidence](https://developer.mend.io/api/mc/badges/confidence/pypi/dataclasses-json/0.5.9/0.6.3?slim=true)](https://docs.renovatebot.com/merge-confidence/)
|

---

> [!WARNING]
> Some dependencies could not be looked up. Check the warning logs for
more information.

---

### Release Notes

<details>
<summary>lidatong/dataclasses-json (dataclasses-json)</summary>

###
[`v0.6.3`](https://togithub.com/lidatong/dataclasses-json/releases/tag/v0.6.3)

[Compare
Source](https://togithub.com/lidatong/dataclasses-json/compare/v0.6.2...v0.6.3)

#### What's Changed

- Fixes catchall inheritance issue by
[@&#8203;jasonrock-a3](https://togithub.com/jasonrock-a3) in
[lidatong/dataclasses-json#500

#### New Contributors

- [@&#8203;jasonrock-a3](https://togithub.com/jasonrock-a3) made their
first contribution in
[lidatong/dataclasses-json#500

**Full Changelog**:
lidatong/dataclasses-json@v0.6.2...v0.6.3

###
[`v0.6.2`](https://togithub.com/lidatong/dataclasses-json/releases/tag/v0.6.2)

[Compare
Source](https://togithub.com/lidatong/dataclasses-json/compare/v0.6.1...v0.6.2)

#### What's Changed

- fix: allow using CatchAll with postponed evaluation of annotations by
[@&#8203;2ynn](https://togithub.com/2ynn) in
[lidatong/dataclasses-json#490
- Fix PEP 0673 before 3.11 by
[@&#8203;george-zubrienko](https://togithub.com/george-zubrienko) in
[lidatong/dataclasses-json#487
- fix mypy error when assigning to dataclass_json_config by
[@&#8203;MickaelBergem](https://togithub.com/MickaelBergem) in
[lidatong/dataclasses-json#486
- Fix an issue introduced with hetero tuple decode by
[@&#8203;george-zubrienko](https://togithub.com/george-zubrienko) in
[lidatong/dataclasses-json#493

#### New Contributors

- [@&#8203;2ynn](https://togithub.com/2ynn) made their first
contribution in
[lidatong/dataclasses-json#490
- [@&#8203;MickaelBergem](https://togithub.com/MickaelBergem) made their
first contribution in
[lidatong/dataclasses-json#486

**Full Changelog**:
lidatong/dataclasses-json@v0.6.1...v0.6.2

###
[`v0.6.1`](https://togithub.com/lidatong/dataclasses-json/releases/tag/v0.6.1)

[Compare
Source](https://togithub.com/lidatong/dataclasses-json/compare/v0.6.0...v0.6.1)

##### What's Changed

- Add links to make PyPI a better maintainer reference by
[@&#8203;pydanny](https://togithub.com/pydanny) in
[lidatong/dataclasses-json#482
- improve Union deserialization when "\__type" field specifier is not
present by [@&#8203;idbentley](https://togithub.com/idbentley) in
[lidatong/dataclasses-json#478

##### New Contributors

- [@&#8203;pydanny](https://togithub.com/pydanny) made their first
contribution in
[lidatong/dataclasses-json#482
- [@&#8203;idbentley](https://togithub.com/idbentley) made their first
contribution in
[lidatong/dataclasses-json#478

**Full Changelog**:
lidatong/dataclasses-json@v0.6.0...v0.6.1

###
[`v0.6.0`](https://togithub.com/lidatong/dataclasses-json/releases/tag/v0.6.0)

[Compare
Source](https://togithub.com/lidatong/dataclasses-json/compare/v0.5.14...v0.6.0)

##### What's Changed

- Improve dataclass_json and \_process_class type annotations by
[@&#8203;ringohoffman](https://togithub.com/ringohoffman) in
[lidatong/dataclasses-json#475
- Update Poetry version used for 3.7 test suite and change
Requires-Python boundary by
[@&#8203;george-zubrienko](https://togithub.com/george-zubrienko) in
[lidatong/dataclasses-json#476
- Fix for
[#&#8203;239](https://togithub.com/lidatong/dataclasses-json/issues/239):
Union inside List or Dict is not deserialized as the correspond… by
[@&#8203;pawelwilczewski](https://togithub.com/pawelwilczewski) in
[lidatong/dataclasses-json#464

Due to a behaviour change discovered in
[lidatong/dataclasses-json#466
and also as a matter of preparation for full deprecation of Py3.7, we
are bumping the minor version to 0.6.0. Most important change is that
since 0.5.9 builtins are coerced automatically without throwing an
exception. Please visit the issue for more info :)

##### New Contributors

- [@&#8203;ringohoffman](https://togithub.com/ringohoffman) made their
first contribution in
[lidatong/dataclasses-json#475
- [@&#8203;pawelwilczewski](https://togithub.com/pawelwilczewski) made
their first contribution in
[lidatong/dataclasses-json#464

**Full Changelog**:
lidatong/dataclasses-json@v0.5.15...v0.6.0

###
[`v0.5.14`](https://togithub.com/lidatong/dataclasses-json/releases/tag/v0.5.14)

[Compare
Source](https://togithub.com/lidatong/dataclasses-json/compare/v0.5.13...v0.5.14)

#### What's Changed

- Temporarily disable code coverage publish for fork PRs by
[@&#8203;george-zubrienko](https://togithub.com/george-zubrienko) in
[lidatong/dataclasses-json#444
- fix mashmallow fields.Tuple creation by
[@&#8203;healthmatrice](https://togithub.com/healthmatrice) in
[lidatong/dataclasses-json#434
- Allow the global config dictionary keys to also be Optional\[type] by
[@&#8203;sumnerevans](https://togithub.com/sumnerevans) in
[lidatong/dataclasses-json#255
- Fix global_config.mm_fields having no effect by
[@&#8203;sigmunau](https://togithub.com/sigmunau) in
[lidatong/dataclasses-json#253
- Fixes recursion bug related to enum flags by
[@&#8203;matt035343](https://togithub.com/matt035343) in
[lidatong/dataclasses-json#447
- Update Python version boundaries to include 3.12 by
[@&#8203;cdce8p](https://togithub.com/cdce8p) in
[lidatong/dataclasses-json#449

#### New Contributors

- [@&#8203;healthmatrice](https://togithub.com/healthmatrice) made their
first contribution in
[lidatong/dataclasses-json#434
- [@&#8203;sigmunau](https://togithub.com/sigmunau) made their first
contribution in
[lidatong/dataclasses-json#253
- [@&#8203;cdce8p](https://togithub.com/cdce8p) made their first
contribution in
[lidatong/dataclasses-json#449

**Full Changelog**:
lidatong/dataclasses-json@v0.5.13...v0.5.14

###
[`v0.5.13`](https://togithub.com/lidatong/dataclasses-json/releases/tag/v0.5.13)

[Compare
Source](https://togithub.com/lidatong/dataclasses-json/compare/v0.5.12...v0.5.13)

#### What's Changed

- Fixes bug related to Tuples defined with ellipsis by
[@&#8203;matt035343](https://togithub.com/matt035343) in
[lidatong/dataclasses-json#440
- Revert type hint in annotation call by
[@&#8203;george-zubrienko](https://togithub.com/george-zubrienko) in
[lidatong/dataclasses-json#441

**Full Changelog**:
lidatong/dataclasses-json@v0.5.12...v0.5.13

###
[`v0.5.12`](https://togithub.com/lidatong/dataclasses-json/releases/tag/v0.5.12)

[Compare
Source](https://togithub.com/lidatong/dataclasses-json/compare/v0.5.9...v0.5.12)

#### What's Changed

- Fix multiline scripts in CI by
[@&#8203;george-zubrienko](https://togithub.com/george-zubrienko) in
[lidatong/dataclasses-json#439

**Full Changelog**:
lidatong/dataclasses-json@v0.5.11...v0.5.12

</details>

---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined),
Automerge - At any time (no schedule defined).

🚦 **Automerge**: Enabled.

♻ **Rebasing**: Whenever PR is behind base branch, or you tick the
rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about this update
again.

---

- [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check
this box

---

This PR has been generated by [Mend
Renovate](https://www.mend.io/free-developer-tools/renovate/). View
repository job log
[here](https://developer.mend.io/github/ixm-one/pytest-cmake-presets).

<!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiIzNi43OS4xIiwidXBkYXRlZEluVmVyIjoiMzcuODEuMyIsInRhcmdldEJyYW5jaCI6Im1haW4ifQ==-->

---------

Signed-off-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Signed-off-by: Izzy Muerte <63051+bruxisma@users.noreply.github.com>
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: Izzy Muerte <63051+bruxisma@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants