Problem description
Data of the nullable `Int64` dtype is converted to `np.float64` in certain reduction operations on `pd.DataFrame` (e.g. `df.max()`). This silently corrupts large integer values, which is exactly the problem the `Int64` dtype is meant to avoid.
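Why the float conversion corrupts data: `float64` has a 53-bit significand, so integers above 2**53 cannot all be represented exactly. The snippet below is a minimal illustration using plain NumPy only. It shows the int -> float -> int round trip, not the pandas code path itself, so it behaves the same regardless of the installed pandas version.

```python
import numpy as np

# float64 has a 53-bit significand, so 2**53 + 1 is the smallest positive
# integer it cannot represent exactly.
value = 2**53 + 1

ints = np.array([value], dtype=np.int64)
as_float = ints.astype(np.float64)      # the int -> float step blamed in this report
round_trip = as_float.astype(np.int64)  # casting back does not recover the value

print(ints[0], round_trip[0])
print(round_trip[0] == value)  # False: the value was silently changed
```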
Expected Output
`df.max()` should probably return a `pd.Series` of `dtype='object'` wrapping a `pd.Int64` value.
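A reduction can avoid the float round trip if it works on the integer values and the missing-value mask separately. Below is a hypothetical sketch of that approach, not pandas' implementation. The helper name `masked_max`, the separate values/mask arguments, and the use of `None` to stand in for a missing result are all illustrative assumptions.

```python
import numpy as np


def masked_max(values, mask, skipna=True):
    """Hypothetical helper: max of int64 `values`, ignoring positions where `mask` is True.

    Returns None to stand in for a missing result (pd.NA in pandas terms).
    """
    values = np.asarray(values, dtype=np.int64)
    mask = np.asarray(mask, dtype=bool)
    if not skipna and mask.any():
        return None  # a missing value propagates when skipna=False
    valid = values[~mask]
    if valid.size == 0:
        return None  # all-missing (or empty) input
    # The reduction runs on int64 directly, so no float64 conversion happens.
    return int(valid.max())


big = 2**53 + 1
print(masked_max([big, 0, 7], [False, True, False]))
```

Because `max` is taken on the `int64` values directly, `2**53 + 1` comes back unchanged. A per-column result computed this way could then be collected into an object-dtype Series.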
Output of `pd.show_versions()`
```
INSTALLED VERSIONS
------------------
commit : 27ad779
python : 3.7.5.final.0
python-bits : 64
OS : Linux
OS-release : 5.3.0-29-generic
Version : #31-Ubuntu SMP Fri Jan 17 17:27:26 UTC 2020
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 1.1.0.dev0+779.g27ad77971
numpy : 1.18.1
pytz : 2019.3
dateutil : 2.8.0
pip : 19.3.1
setuptools : 42.0.2.post20191203
Cython : 0.29.14
pytest : 5.3.5
hypothesis : 5.4.1
sphinx : 2.4.0
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.1
IPython : 7.12.0
pandas_datareader: None
bs4 : 4.8.2
bottleneck : 1.4.0.dev0+62.g8ac3a4c8
fastparquet : 0.3.2
gcsfs : None
matplotlib : None
numexpr : 2.7.1
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 0.11.1
pytables : None
pyxlsb : None
s3fs : 0.4.0
scipy : 1.4.1
sqlalchemy : None
tables : 3.6.1
tabulate : None
xarray : 0.14.1
xlrd : None
xlwt : None
numba : 0.48.0
```

This is expected for now (but not less wrong of course) given how it is implemented (by converting to float). This will be solved by something like #30982 (but then for min/max).
@jorisvandenbossche Thanks for confirmation and the pointer. I put up a PR that's a bit of a work in progress still, but I think I could probably get it working over the weekend.