
Unexpected casting order of List[Union[float, int]] when importing Pandas (1.2.4) #2835

Closed
3 tasks done
wfranceys opened this issue May 25, 2021 · 9 comments
Labels
bug V1 Bug related to Pydantic V1.X


@wfranceys

Checks

  • I added a descriptive title to this issue
  • I have searched (google, github) for similar issues and couldn't find anything
  • I have read and followed the docs and still think this is a bug

Bug

Output of python -c "import pydantic.utils; print(pydantic.utils.version_info())":

             pydantic version: 1.8.2
            pydantic compiled: True
                 install path: /Users/wfranceys/tmp/pydantic-casting/.venv/lib/python3.7/site-packages/pydantic
               python version: 3.7.4 (default, Sep 13 2019, 10:50:17)  [Clang 10.0.1 (clang-1001.0.46.4)]
                     platform: Darwin-19.6.0-x86_64-i386-64bit
     optional deps. installed: ['typing-extensions']

Output of python -c "import pandas as pd; print(pd.__version__)"

1.2.4
# main.py

from pydantic import BaseModel

from typing import List, Union

# Toggle import on and off to get different result
import pandas as pd

class Model(BaseModel):
    data: List[Union[float, int]]


data = [0.1234, 0.4567]
print(Model(data=data).dict())

What I would expect based on the behaviour of Unions:

{'data': [0.1234, 0.4567]}

What I see running the script above (with the import pandas as pd):

$ python main.py
{'data': [0, 0]}

What I see when removing the pandas import from above:

$ python main.py
{'data': [0.1234, 0.4567]}
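Why would the floats come back as 0? A minimal sketch, assuming pydantic v1 tries each Union member left to right and keeps the first one that coerces without error. The `validate_in_order` helper is hypothetical, and plain `int()`/`float()` stand in for pydantic's own validators:

```python
from typing import Any, Sequence


def validate_in_order(value: Any, types: Sequence[type]) -> Any:
    """Hypothetical left-to-right Union validation: the first type whose
    constructor accepts the value wins, even if information is lost."""
    errors = []
    for t in types:
        try:
            return t(value)  # int(0.1234) succeeds and truncates to 0
        except (TypeError, ValueError) as exc:
            errors.append(exc)
    raise ValueError(f"no member of {types} accepted {value!r}: {errors}")


data = [0.1234, 0.4567]
as_written = [validate_in_order(v, (float, int)) for v in data]
reordered = [validate_in_order(v, (int, float)) for v in data]
print(as_written)  # [0.1234, 0.4567]
print(reordered)   # [0, 0]
```

Under that assumption, the same input produces [0, 0] as soon as the annotation's arguments arrive as (int, float) instead of (float, int).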

Notes

Dockerfile to reproduce

Dockerfile:

FROM python:slim-buster

RUN python -m pip install pydantic==1.8.2 pandas==1.2.4

# See above
COPY main.py .

CMD python main.py

Results:

$ docker build -t pydantic-test .
$ docker run -t pydantic-test
{'data': [0, 0]}
Weirdness with pandas - what is causing the issue?

Now the part that loses me completely.

I've put this in a separate section, as it seems like a red herring.

  • Create a virtual environment (.venv): python -m venv .venv
  • Activate the environment: source .venv/bin/activate
  • Install dependencies: pip install pandas==1.2.4 pydantic==1.8.2
  • Run the main.py script above:
$ python main.py
{'data': [0, 0]}
  • Go to .venv/lib/python3.7/site-packages/pandas/io/formats/format.py
  • Go to ~L1565, it should look like
def format_percentiles(
    percentiles: Union[
        np.ndarray, List[Union[int, float]], List[float], List[Union[str, float]]
    ]
) -> List[str]:
  • Remove the types on the input, it should now look like
def format_percentiles(
    percentiles
) -> List[str]:
  • Now rerun the above script:
$ python main.py
{'data': [0.1234, 0.4567]}

And we now have the correct parsing again 🤔

Toggling the type annotations on and off switches between the wrong and the correct results.

@wfranceys wfranceys added the bug V1 Bug related to Pydantic V1.X label May 25, 2021
@leuduan

leuduan commented May 26, 2021

Actually, I guess this has nothing to do with pydantic; it's something weird coming from pandas.
You can try this snippet (no pydantic model involved):

from typing import List, Union

# Toggle import on and off to get different result
# import pandas as pd

class Model(object):
    data: List[Union[float, int]]

print(Model.__annotations__['data'])

It prints typing.List[typing.Union[float, int]] if pandas is not imported (which is correct), but typing.List[typing.Union[int, float]] if pandas is imported (which is incorrect, and leads to the behavior in your code).
Quite weird!

@PrettyWood
Member

Thanks @leuduan for catching this! The side effect does indeed come from pandas.
FYI @wfranceys, this should not be a problem in v1.9 with Config.smart_union:

from pydantic import BaseModel

from typing import List, Union

import pandas as pd


class Model(BaseModel, smart_union=True):
    data: List[Union[float, int]]


data = [0.1234, 0.4567]
assert Model(data=data).dict() == {'data': [0.1234, 0.4567]}
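The idea behind smart_union, as I understand it, is to prefer a Union member whose type already matches the input before falling back to in-order coercion. A minimal sketch of that idea; `smart_validate` is a hypothetical helper, not pydantic's implementation, and `type(value) is t` is a simplified stand-in for pydantic's own strict checks:

```python
from typing import Any, Sequence


def smart_validate(value: Any, types: Sequence[type]) -> Any:
    # Pass 1: return the value unchanged if its type is exactly one of the
    # members, so member order no longer matters for already-correct input.
    for t in types:
        if type(value) is t:
            return value
    # Pass 2: fall back to left-to-right coercion, as in plain Union validation.
    for t in types:
        try:
            return t(value)
        except (TypeError, ValueError):
            continue
    raise ValueError(f"no member of {types} accepted {value!r}")


data = [0.1234, 0.4567]
# Both orderings keep the floats intact.
print([smart_validate(v, (int, float)) for v in data])
print([smart_validate(v, (float, int)) for v in data])
# Input matching no member exactly is still coerced in order: "3" -> 3 (int).
print(smart_validate("3", (int, float)))
```

The first pass is what makes the result independent of whichever ordering typing happens to hand back.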

@wfranceys
Author

wfranceys commented May 26, 2021

Thanks both @leuduan @PrettyWood - good catch on the separation from Pydantic!

Found a smaller example which shows the side effect:

from typing import List, Union

# Toggle comments for this to change behaviour
# def example(x: List[Union[int, float]]):
#     pass

class Model(object):
    data: List[Union[float, int]]


print(Model.__annotations__['data'])

And an even smaller example showing how the ordering itself changes in typing:

>>> from typing import List, Union
>>> a = List[Union[float, int]]
>>> a.__args__
(typing.Union[float, int],)
>>> b = List[Union[int, float]]
>>> b.__args__
(typing.Union[float, int],)
>>> id(a)
4479983440
>>> id(b)
4479983440

So it looks like, as soon as a is created, defining b just gives us back a, because the argument order is ignored.
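As far as I know, CPython's typing module memoises subscriptions such as List[...] with functools.lru_cache (through a private helper). Since Union[int, float] == Union[float, int] and both hash the same, a later lookup hits the entry built first. A minimal sketch of that mechanism, with a hypothetical `make_list_alias` standing in for List[...]:

```python
import functools
from typing import Union


@functools.lru_cache(maxsize=None)
def make_list_alias(arg):
    # Hypothetical stand-in for List[arg]: records the argument it was built with.
    return ("List", arg)


first = make_list_alias(Union[float, int])
# Union[int, float] compares equal to Union[float, int] and has the same hash,
# so this call is a cache hit and returns the object built for the first ordering.
second = make_list_alias(Union[int, float])

print(second is first)     # True
print(second[1].__args__)  # float comes first, matching the first call
```

Whichever ordering is subscripted first (here, or inside pandas' annotations) is the one every later equal-but-reordered lookup receives.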

@djpugh
Contributor

djpugh commented May 26, 2021

@PrettyWood it would probably be helpful to flag in the docs that this can occur, given that https://docs.python.org/3/library/typing.html#typing.Union says:

When comparing unions, the argument order is ignored, e.g.:
Union[int, str] == Union[str, int]

The docs at https://pydantic-docs.helpmanual.io/usage/types/#unions should perhaps include a warning about this.

@djpugh
Contributor

djpugh commented May 26, 2021

It actually occurs at the level of the Union itself: the two unions print in different orders but have the same hash.

In [23]: x = typing.Union[int, float]

In [24]: y = typing.Union[float, int]

In [25]: x
Out[25]: typing.Union[int, float]

In [26]: y
Out[26]: typing.Union[float, int]

In [27]: x.__hash__()
Out[27]: -8081589475256396770

In [28]: y.__hash__()
Out[28]: -8081589475256396770

djpugh added a commit to djpugh/pydantic that referenced this issue May 26, 2021
Related to pydantic#2835 but should be resolved by pydantic#2092
@djpugh
Contributor

djpugh commented May 26, 2021

I've opened a PR with a docs update, #2839 (though it may not be needed, depending on how the docs are built and the release plan for #2092).

@asmodehn

Just a quick note that I experienced the same issue with Optional[Union[float,str]] and import requests

@wfranceys
Author

> Just a quick note that I experienced the same issue with Optional[Union[float,str]] and import requests

@asmodehn It's a tricky bug, isn't it!

@asmodehn

asmodehn commented Feb 1, 2022

Tricky to track down, but it boils down to this:

The pydantic doc says:

By default, as explained here, pydantic tries to validate (and coerce if it can) in the order of the Union.

However, there is no such thing as "the order of the Union", as explained in the Python docs: https://docs.python.org/3/library/stdtypes.html#types-union

int | str == str | int

5 participants