convert_dtypes fails with int and str #32117

Closed
pjadzinsky opened this issue Feb 19, 2020 · 4 comments · Fixed by #32126
Labels: Bug, NA - MaskedArrays (Related to pd.NA and nullable extension arrays)

@pjadzinsky

Code Sample, a copy-pastable example if possible

Thanks for an amazing package and for maintaining it.

import pandas as pd

# Object-dtype Series holding two strings and one integer
s = pd.Series(["h", "i", 1])
s.convert_dtypes()  # raises on pandas 1.0.1

Problem description

The example raises an error when it shouldn't. I think this case is missing from the tests.


Expected Output

A Series with object dtype?
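
That expectation can be written as a short check. This is a sketch that assumes a pandas release containing the fix (the issue is marked as fixed by #32126 and was assigned to the 1.0.2 milestone); on 1.0.1 the convert_dtypes() call raises instead.

```python
import pandas as pd

# Same values as in the report: two strings and one integer.
s = pd.Series(["h", "i", 1])

# No nullable extension dtype fits a mix of str and int values,
# so the expectation is that the Series stays object dtype.
converted = s.convert_dtypes()
print(converted.dtype)
```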

Output of pd.show_versions()

In [6]: pd.show_versions()

INSTALLED VERSIONS

commit : None
python : 3.7.6.final.0
python-bits : 64
OS : Darwin
OS-release : 19.2.0
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.0.1
numpy : 1.18.1
pytz : 2019.3
dateutil : 2.8.1
pip : 20.0.2
setuptools : 45.2.0
Cython : None
pytest : 5.3.5
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.1
IPython : 7.12.0
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 0.16.0
pytables : None
pytest : 5.3.5
pyxlsb : None
s3fs : 0.4.0
scipy : 1.4.1
sqlalchemy : 1.3.13
tables : None
tabulate : 0.8.6
xarray : None
xlrd : None
xlwt : None
xlsxwriter : None
numba : None

@pjadzinsky
Author

As far as I can tell,

inferred_dtype = lib.infer_dtype(input_array)

returns 'mixed-integer', which is correct according to the example in the definition of infer_dtype:

>>> infer_dtype(['a', 1])

However, this condition

if isinstance(inferred_dtype, str) and (

evaluates to True, and

inferred_dtype = target_int_dtype

sets inferred_dtype to "Int64", which I think is wrong.

Hope this helps.
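
The label in question can be inspected through the public pandas.api.types.infer_dtype function, which is assumed here to behave the same as the internal lib.infer_dtype call quoted above. A minimal sketch:

```python
from pandas.api.types import infer_dtype

# Values from the report, the shorter example quoted above,
# and an all-integer list for comparison.
reported = infer_dtype(["h", "i", 1])
quoted = infer_dtype(["a", 1])
all_ints = infer_dtype([1, 2, 3])

print(reported, quoted, all_ints)
```

The first two calls yield the same label, which is the string that then reaches the isinstance(inferred_dtype, str) branch.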

@jorisvandenbossche
Member

jorisvandenbossche commented Feb 19, 2020

Yes, treating "mixed-integer" as integer is indeed wrong in this case:

if isinstance(inferred_dtype, str) and (
    inferred_dtype == "mixed-integer"
    or inferred_dtype == "mixed-integer-float"
):

@Dr-Irv, do you remember which use case "mixed-integer" is needed for?
For a mix of integers and floats (which could still all be integer-like), infer_dtype gives "mixed-integer-float".
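
To make the effect of that condition concrete, here is a small pure-Python sketch. The function names, the `fallback` parameter, and the string "Int64" standing in for target_int_dtype are illustrative assumptions, not pandas code. The second function shows one possible narrowing of the condition and is not necessarily what the eventual fix does.

```python
TARGET_INT_DTYPE = "Int64"  # stand-in for target_int_dtype (assumption)


def resolve_as_quoted(inferred_dtype, fallback="object"):
    """Mirror the quoted condition: both labels map to the integer dtype."""
    if isinstance(inferred_dtype, str) and (
        inferred_dtype == "mixed-integer"
        or inferred_dtype == "mixed-integer-float"
    ):
        return TARGET_INT_DTYPE
    return fallback


def resolve_narrowed(inferred_dtype, fallback="object"):
    """Hypothetical variant: only the int/float mix is an integer candidate."""
    if isinstance(inferred_dtype, str) and inferred_dtype == "mixed-integer-float":
        return TARGET_INT_DTYPE
    return fallback


# Compare how each variant routes the two labels discussed in the thread.
for label in ("mixed-integer", "mixed-integer-float"):
    print(label, resolve_as_quoted(label), resolve_narrowed(label))
```

With the quoted condition, the label produced for ["h", "i", 1] is routed to "Int64", which is the step the report identifies as wrong. A real implementation would presumably still need to confirm that the floats in a "mixed-integer-float" array are integer-like before casting.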

jorisvandenbossche added this to the 1.0.2 milestone on Feb 19, 2020
jorisvandenbossche added the Bug and NA - MaskedArrays (Related to pd.NA and nullable extension arrays) labels on Feb 19, 2020
@jorisvandenbossche
Member

And thanks for the report, @pjadzinsky!

@Dr-Irv
Contributor

Dr-Irv commented Feb 20, 2020

> @Dr-Irv do you remember for which use case mixed-integer is needed?
> As for mix of integers and floats (which could still be all integer-like) it gives "mixed-integer-float"

I think this was simply my mistake: I misunderstood the meaning of "mixed-integer" when I wrote convert_dtypes(). I will take a look at this.


3 participants