DataFrame.groupby().mean() incorrect result #22487

Closed
tinchoroman opened this issue Aug 23, 2018 · 3 comments

@tinchoroman
commented Aug 23, 2018

Does anybody know why I get different results when I apply the same operation to the same DataFrame with and without groupby?
With groupby, it returns a negative value even though all the values are positive.

from pandas import DataFrame
df = DataFrame({"user": ["A", "A", "A", "A", "A"],
                "connections": [18446744073699999744, 4970, 4749, 4719, 4704]})

df.mean()

connections 3.689349e+18
dtype: float64

df.groupby("user")["connections"].mean()

user
A -1906546.0
Name: connections, dtype: float64

INSTALLED VERSIONS

commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 142 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.23.0
pytest: 3.5.1
pip: 10.0.1
setuptools: 39.1.0
Cython: 0.28.2
numpy: 1.14.3
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: 6.4.0
sphinx: 1.7.4
patsy: 0.5.0
dateutil: 2.7.3
pytz: 2018.4
blosc: None
bottleneck: 1.2.1
tables: 3.4.3
numexpr: 2.6.5
feather: None
matplotlib: 2.2.2
openpyxl: 2.5.3
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.4
lxml: 4.2.1
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.2.7
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

@WillAyd

Member

commented Aug 23, 2018

Can you try on master? Looks like an int overflow somewhere. If it is still present, investigation and PRs are always welcome.
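
For what it's worth, the reported number is consistent with the first value being reinterpreted as a signed 64-bit integer before averaging. The sketch below reproduces only that arithmetic in plain Python; it is an assumption about the cause, not a trace of the pandas internals, and `reinterpret_as_int64` is a hypothetical helper written for this illustration.

```python
import struct

# Values from the report; the first one exceeds the int64 maximum (2**63 - 1).
values = [18446744073699999744, 4970, 4749, 4719, 4704]

def reinterpret_as_int64(value):
    """Reinterpret the 64 bits of an unsigned integer as a signed int64."""
    return struct.unpack("<q", struct.pack("<Q", value))[0]

wrapped = [reinterpret_as_int64(v) for v in values]
wrapped_mean = sum(wrapped) / len(wrapped)
true_mean = sum(values) / len(values)

print(wrapped[0])    # -9551872
print(wrapped_mean)  # -1906546.0, the value groupby().mean() returned
print(true_mean)     # roughly 3.689349e+18, in line with df.mean()
```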

@tinchoroman

Author

commented Aug 23, 2018

Hi WillAyd, thanks for your prompt response. This is the first time I've posted an issue. Could you please explain a little further what exactly you mean by "try on master"? Thanks in advance!

@tinchoroman

Author

commented Aug 23, 2018

I've upgraded to the latest version and the problem still persists. Following the line of investigation that WillAyd suggested, the same example with float numbers works fine.

df = DataFrame({"user": ["A", "A", "A", "A", "A"],
                "connections": [18446744073699999744.0, 4970.0, 4749.0, 4719.0, 4704.0]})

df.mean()
connections    3.689349e+18
dtype: float64

df.groupby("user")["connections"].mean()
user
A    3.689349e+18
Name: connections, dtype: float64

df.mean()[0] == df.groupby("user")["connections"].mean()[0]
True

troels added a commit to troels/pandas that referenced this issue Sep 9, 2018

BUG SeriesGroupBy.mean() overflowed on some integer array (pandas-dev…
…#22487)

When integer arrays contained integers that were outside
the range of int64, the conversion would overflow.
Instead, only allow safe casting, and if a safe cast cannot
be done, cast to float64 instead.
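
The idea in this commit message can be sketched with NumPy's casting rules. This is a minimal illustration under stated assumptions, not the code from the commit: `to_int64_or_float64` is a hypothetical helper, and the `uint64` and `int32` input dtypes are chosen here only to show both branches.

```python
import numpy as np

def to_int64_or_float64(values):
    """Cast to int64 only when NumPy considers the cast safe; else use float64."""
    arr = np.asarray(values)
    if np.can_cast(arr.dtype, np.int64, casting="safe"):
        return arr.astype(np.int64)
    # uint64 -> int64 is not a safe cast, so fall back to float64.
    return arr.astype(np.float64)

big = np.array([18446744073699999744, 4970, 4749, 4719, 4704], dtype=np.uint64)
small = np.array([4970, 4749, 4719, 4704], dtype=np.int32)

converted_big = to_int64_or_float64(big)
converted_small = to_int64_or_float64(small)

print(converted_big.dtype)    # float64
print(converted_small.dtype)  # int64
print(converted_big.mean())   # positive, no wraparound
```

The float64 fallback trades exactness for range: very large integers may be rounded, but the sign and magnitude of the mean are preserved.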

troels added a commit to troels/pandas that referenced this issue Sep 11, 2018

troels added a commit to troels/pandas that referenced this issue Sep 16, 2018

@jreback jreback added this to the 0.24.0 milestone Sep 18, 2018
