BUG: Conversion of Series dtype from object to Int16 etc. fails #41060

Open
2 of 3 tasks
RagBlufThim opened this issue Apr 20, 2021 · 1 comment
Labels
API - Consistency Internal Consistency of API/Behavior Bug Dtype Conversions Unexpected or buggy dtype conversions NA - MaskedArrays Related to pd.NA and nullable extension arrays

Comments

@RagBlufThim

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


Code Sample, a copy-pastable example

import pandas as pd

str_obj_ser = pd.Series(["1", "2", "3", None], dtype="object")
mix_obj_ser = pd.Series([1, "2", 3.0, None], dtype="object")
num_obj_ser = pd.Series([1, 2, 3.0, None], dtype="object")

str_obj_ser.astype("Int16")  # Exception
mix_obj_ser.astype("Int16")  # different Exception
num_obj_ser.astype("Int16")  # works

str_obj_ser.astype("string").astype("Int16")  # works
mix_obj_ser.astype("string").astype("Int16")  # Exception
num_obj_ser.astype("string").astype("Int16")  # Exception

str_obj_ser.astype("string").astype("Float64").astype("Int16")  # works
mix_obj_ser.astype("string").astype("Float64").astype("Int16")  # works
num_obj_ser.astype("string").astype("Float64").astype("Int16")  # works

str_obj_ser.astype("Float64").astype("Int16")  # Exception
mix_obj_ser.astype("Float64").astype("Int16")  # works
num_obj_ser.astype("Float64").astype("Int16")  # works

Problem description

The conversion of an object series with some text in it to one of the nullable integer dtypes fails even though all elements of the series are convertible to integers (or to pd.NA).

This issue seems to be related to #40729, but the workaround described there for Floatxx doesn't work in all cases here:
The detour via dtype string is not enough if an element of the object series is a float, because int("3.0") fails (whereas int(3.0) works). A detour via string and then Float64 is necessary for all of the examples above to work (in some cases, but not all, the string step can be omitted).
But even the detour via string and Float64 to Int16 is not guaranteed to always work, e.g. if an element of the series is an object with an __int__() method (returning a number) and a __str__() method (returning a description, not an integer literal).
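
The two failure modes described above can be reproduced with plain Python, without pandas. This is only a sketch of the underlying built-in behaviour, not of what pandas does internally; the class name Described and its return values are made up for illustration.

```python
# int() parses integer literals only, so a float-looking string is rejected,
# while an actual float is accepted and truncated toward zero.
try:
    int("3.0")
    string_parse_ok = True
except ValueError:
    string_parse_ok = False

from_float = int(3.0)             # converting a real float works
via_float = int(float("3.0"))     # the "string -> float -> int" detour works


class Described:
    """Hypothetical object: convertible via __int__, but str() is a description."""

    def __int__(self):
        return 7

    def __str__(self):
        return "seven apples"


obj = Described()
direct = int(obj)                 # uses __int__

# Going through str() first loses the numeric value entirely.
try:
    int(float(str(obj)))
    detour_ok = True
except ValueError:
    detour_ok = False

print(string_parse_ok, from_float, via_float, direct, detour_ok)
```
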

I think the topic of this issue has also been mentioned in the discussion of #39616.

Expected Output

The conversion of a series of dtype object to one of the nullable integer dtypes should always work if all elements of the series are convertible to the target dtype.

At least something along the lines of

  • element is None, pd.NA, np.nan, ... -> pd.NA
  • otherwise -> int(element)

I'd even prefer something like

  • element is None, pd.NA, np.nan, ... -> pd.NA
  • element is string, bytes or bytearray -> int(element, 0)
  • otherwise -> int(element)

such that string literals like "0x7f" work.
As the latter doesn't work with the current string -> Int16 conversion though, that would be more like an enhancement than a bugfix.
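
As a rough illustration of the two rule sets proposed above, here is an element-wise sketch in plain Python. It is not how pandas implements astype; the names is_missing and convert_element and the parse_literals flag are hypothetical, and None stands in for pd.NA so that the snippet runs without pandas.

```python
import math


def is_missing(element):
    # Hypothetical stand-in for pd.isna: recognises only None and float NaN.
    return element is None or (isinstance(element, float) and math.isnan(element))


def convert_element(element, parse_literals=False):
    # Missing values map to None here (pd.NA in the proposal above).
    if is_missing(element):
        return None
    # Extended rule: base 0 lets int() honour prefixes such as "0x", "0o", "0b".
    if parse_literals and isinstance(element, (str, bytes, bytearray)):
        return int(element, 0)
    # Basic rule: defer to int(), which also calls __int__ on arbitrary objects.
    return int(element)


values = [1, "2", 3.0, None, float("nan")]
basic = [convert_element(v) for v in values]
hex_value = convert_element("0x7f", parse_literals=True)
print(basic, hex_value)
```

With pandas available, pd.isna could take the place of is_missing, pd.NA the place of None, and the resulting list could be passed to pd.array(..., dtype="Int16"). Two caveats of this sketch: int() silently truncates floats such as 3.7, and int(element, 0) rejects strings with leading zeros such as "010", so a real implementation would probably need extra checks.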

Output of pd.show_versions()

INSTALLED VERSIONS

commit : 2cb9652
python : 3.9.4.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.18362
machine : AMD64
...

pandas : 1.2.4
numpy : 1.20.2
pytz : 2021.1
dateutil : 2.8.1
...

@RagBlufThim RagBlufThim added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Apr 20, 2021
@mzeitlin11
Member

Thanks for looking into this in such detail, @RagBlufThim. Cleaning up this behavior would be great.

@mzeitlin11 mzeitlin11 added API - Consistency Internal Consistency of API/Behavior Dtype Conversions Unexpected or buggy dtype conversions NA - MaskedArrays Related to pd.NA and nullable extension arrays and removed Needs Triage Issue that has not been reviewed by a pandas team member labels Jul 1, 2021
@mzeitlin11 mzeitlin11 added this to the Contributions Welcome milestone Jul 1, 2021
@mzeitlin11 mzeitlin11 mentioned this issue Nov 6, 2021
3 tasks
@mroeschke mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022

3 participants