DataFrame with Int64 columns casts to float64 with .max()/.min() #32651

qwhelan · 2020-03-12T03:23:23Z

Code Sample, a copy-pastable example if possible

import numpy as np
import pandas as pd

# int64 max (2**63 - 1) is not exactly representable as float64.
int64_info = np.iinfo("int64")
s = pd.Series([int64_info.max, None, int64_info.min], dtype=pd.Int64Dtype())
df = pd.DataFrame({"Int64": s})

df.max()
# Int64    9.223372e+18
# dtype: float64

Problem description

Data with the nullable Int64 dtype (pd.Int64Dtype) is converted to np.float64 in certain reduction operations on pd.DataFrame, such as .max() and .min(). This silently corrupts large values, which is exactly the problem the nullable integer dtype is intended to avoid.
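To make the corruption concrete, here is a minimal pure-Python sketch of what a round trip through float64 does to the two extreme int64 values. It does not use pandas at all; `roundtrip_through_float` is a hypothetical helper that only mimics the integer -> float -> integer path.

```python
# Why a round trip through float64 corrupts large int64 values.
# Pure-Python illustration; Python floats are IEEE 754 doubles (float64).

INT64_MAX = 2**63 - 1   # 9223372036854775807
INT64_MIN = -(2**63)    # -9223372036854775808


def roundtrip_through_float(value: int) -> int:
    """Convert an int to float64 and back, as a float-based reduction would."""
    return int(float(value))


# float64 has a 53-bit significand, so 2**63 - 1 rounds up to 2**63.
max_after = roundtrip_through_float(INT64_MAX)
# -(2**63) is a power of two and survives the round trip unchanged.
min_after = roundtrip_through_float(INT64_MIN)

print(max_after, max_after - INT64_MAX)  # 9223372036854775808 1
print(min_after, min_after - INT64_MIN)  # -9223372036854775808 0
```

Note that the rounded maximum is not just off by one: it no longer fits in an int64 at all.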

Expected Output

df.max() should probably return a pd.Series of dtype='object' wrapping a pd.Int64 value.

Output of `pd.show_versions()`

```
INSTALLED VERSIONS
------------------
commit : 27ad779
python : 3.7.5.final.0
python-bits : 64
OS : Linux
OS-release : 5.3.0-29-generic
Version : #31-Ubuntu SMP Fri Jan 17 17:27:26 UTC 2020
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.1.0.dev0+779.g27ad77971
numpy : 1.18.1
pytz : 2019.3
dateutil : 2.8.0
pip : 19.3.1
setuptools : 42.0.2.post20191203
Cython : 0.29.14
pytest : 5.3.5
hypothesis : 5.4.1
sphinx : 2.4.0
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.1
IPython : 7.12.0
pandas_datareader: None
bs4 : 4.8.2
bottleneck : 1.4.0.dev0+62.g8ac3a4c8
fastparquet : 0.3.2
gcsfs : None
matplotlib : None
numexpr : 2.7.1
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 0.11.1
pytables : None
pyxlsb : None
s3fs : 0.4.0
scipy : 1.4.1
sqlalchemy : None
tables : 3.6.1
tabulate : None
xarray : 0.14.1
xlrd : None
xlwt : None
numba : 0.48.0
```


jorisvandenbossche · 2020-03-12T07:20:30Z

This is expected for now (though no less wrong, of course), given how it is implemented: by converting to float. This will be solved by something like #30982 (but then for min/max).
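For readers unfamiliar with the idea, a reduction that skips missing entries through a boolean mask, instead of converting to float and filling with NaN, might look roughly like the sketch below. This is only an illustration under stated assumptions: `masked_max` is a hypothetical helper, not the code from #30982 or from pandas, and the separate `values`/`mask` arrays are a simplified stand-in for how a nullable integer array keeps its data and its missing-value mask.

```python
import numpy as np


def masked_max(values: np.ndarray, mask: np.ndarray):
    """Max of `values`, ignoring positions where `mask` is True (missing).

    Hypothetical helper for illustration; not the pandas implementation.
    Returns None when every value is missing.
    """
    valid = values[~mask]
    if valid.size == 0:
        return None
    return valid.max()  # stays in the integer dtype, no float conversion


int64_info = np.iinfo("int64")
values = np.array([int64_info.max, 0, int64_info.min], dtype="int64")
mask = np.array([False, True, False])  # the middle entry is missing

# Float-based approach: cast to float64 and mark missing entries as NaN.
float_based = values.astype("float64")
float_based[mask] = np.nan

print(int(np.nanmax(float_based)))    # 9223372036854775808 (off by one)
print(int(masked_max(values, mask)))  # 9223372036854775807 (exact)
```

The mask-based version never leaves the integer dtype, so the maximum comes back exactly.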

qwhelan · 2020-03-12T09:30:39Z

@jorisvandenbossche Thanks for the confirmation and the pointer. I put up a PR that's still a bit of a work in progress, but I think I could probably get it working over the weekend.

simonjayhawkins · 2020-07-16T09:32:37Z

fixed in #35254
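A quick way to check whether an installed pandas includes the fix is to rerun the original example and compare the reduced values against the exact int64 limits. This is a sketch that assumes a pandas version containing #35254 (the issue was on the 1.1 milestone). It checks only the values, not the dtype of the resulting Series.

```python
import numpy as np
import pandas as pd

int64_info = np.iinfo("int64")
s = pd.Series([int64_info.max, None, int64_info.min], dtype=pd.Int64Dtype())
df = pd.DataFrame({"Int64": s})

# Reduce the DataFrame and pull out the single column's result.
col_max = df.max()["Int64"]
col_min = df.min()["Int64"]

# On a version with the fix, both comparisons should print True.
print(int(col_max) == int64_info.max)
print(int(col_min) == int64_info.min)
```

On an affected version the first comparison is expected to fail, because the maximum has been rounded through float64.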

jorisvandenbossche added the Bug and NA - MaskedArrays (related to pd.NA and nullable extension arrays) labels Mar 12, 2020

qwhelan mentioned this issue Mar 12, 2020: BUG: process Int64 as ints for preservable ops, not as float64 #32652 (Closed)

simonjayhawkins mentioned this issue May 16, 2020: BUG: DataFrame with Int64 columns casts to float64 with .max()/.min() #34210 (Closed)

jorisvandenbossche added this to the 1.1 milestone Jun 15, 2020

jorisvandenbossche mentioned this issue Jun 15, 2020: ENH/PERF: enable column-wise reductions for EA-backed columns #32867 (Closed)

simonjayhawkins mentioned this issue Jul 13, 2020: BUG: Use correct ExtensionArray reductions in DataFrame reductions #35254 (Merged)

simonjayhawkins closed this as completed Jul 16, 2020
