BUG: STD modifies groupby target column when as_index=False #10355

jxrossel · 2015-06-15T08:14:27Z

xref #14547 for other tests

In pandas 0.16.2 (and already in 0.16.0), calling std() after groupby('my_column', as_index=False) modifies 'my_column' by taking its square root. Example:

import pandas

df = pandas.DataFrame({
    'a': [1, 1, 1, 2, 2, 2, 3, 3, 3],
    'b': [1, 2, 3, 4, 5, 6, 7, 8, 9],
})
df.groupby('a', as_index=False).std()
Out[5]: 
          a  b
0  1.000000  1
1  1.414214  1
2  1.732051  1

The square root values of 'a' are returned instead of 1, 2, 3.
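The symptom is consistent with std() being computed as the square root of a per-group variance table that already carries the grouping column. The following is a minimal pure-Python sketch under that assumption; it does not reproduce pandas internals, and the helper names grouped_var, std_buggy and std_fixed are hypothetical.

```python
import math
from statistics import variance

data = {
    "a": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "b": [1, 2, 3, 4, 5, 6, 7, 8, 9],
}

def grouped_var(table, key, value):
    """Per-group sample variance (ddof=1); the key is kept as an ordinary column."""
    groups = {}
    for k, v in zip(table[key], table[value]):
        groups.setdefault(k, []).append(v)
    return {key: list(groups), value: [variance(vals) for vals in groups.values()]}

def std_buggy(table, key, value):
    """Assumed faulty behaviour: sqrt is applied to every column, key included."""
    var = grouped_var(table, key, value)
    return {col: [math.sqrt(x) for x in vals] for col, vals in var.items()}

def std_fixed(table, key, value):
    """Intended behaviour: sqrt is applied to the value columns only."""
    var = grouped_var(table, key, value)
    return {
        col: vals if col == key else [math.sqrt(x) for x in vals]
        for col, vals in var.items()
    }

print(std_buggy(data, "a", "b"))  # key column holds sqrt(1), sqrt(2), sqrt(3)
print(std_fixed(data, "a", "b"))  # key column stays 1, 2, 3
```

With the key column excluded from the square root, the group labels come back unchanged, which is the result the report expects.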

INSTALLED VERSIONS

commit: None
python: 2.7.9.final.0
python-bits: 32
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 45 Stepping 7, GenuineIntel
byteorder: little
LC_ALL: None
LANG: fr_CH

pandas: 0.16.2
nose: 1.3.4
Cython: 0.22
numpy: 1.9.2
scipy: 0.15.1
statsmodels: None
IPython: 3.0.0
sphinx: 1.3.1
patsy: 0.3.0
dateutil: 2.4.0
pytz: 2015.2
bottleneck: None
tables: 3.1.1
numexpr: 2.4
matplotlib: 1.4.3
openpyxl: None
xlrd: 0.9.3
xlwt: None
xlsxwriter: 0.7.1
lxml: None
bs4: 4.3.2
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 0.9.9
pymysql: None
psycopg2: None

jreback · 2015-06-15T10:39:29Z

Something like this would fix it. Care to do a pull request (and add some tests)?
(The other function definition should be removed as well.)

diff --git a/pandas/core/groupby.py b/pandas/core/groupby.py
index 4abdd11..3fd2436 100644
--- a/pandas/core/groupby.py
+++ b/pandas/core/groupby.py
@@ -1812,6 +1812,10 @@ class BinGrouper(BaseGrouper):
         'min': 'group_min_bin',
         'max': 'group_max_bin',
         'var': 'group_var_bin',
+        'std': {
+            'name' : 'group_var_bin',
+            'f' : lambda func, a: np.sqrt(func(a)),
+            },
         'ohlc': 'group_ohlc',
         'first': {
             'name': 'group_nth_bin',

jxrossel · 2015-06-15T13:55:43Z

Hi,
I'm kind of a newbie here (and in Python in general). What do you mean by a pull request?

jreback · 2015-06-15T15:01:57Z

see contributing docs here

jxrossel · 2015-06-15T15:11:31Z

Wow, I didn't consider becoming a code contributor when reporting the bug. I don't think I'm the right person for that; I'd probably do more harm than good.

jreback · 2015-06-15T15:19:25Z

best way to start! give it a shot.

jorisvandenbossche · 2017-06-30T08:29:32Z

Another case where it raises an error (when grouping by non-numerical columns): #16799

alohia · 2018-03-12T09:14:26Z

This issue still exists in pandas 0.22. Calling std() after groupby tries to apply std() to the column being grouped by, and raises an error if that column is of type str, for example. This happens when using as_index=False in the groupby() call. How can I contribute to fix this issue?

jreback · 2018-03-12T10:19:12Z

I posted a patch above that might work; it needs tests. See the contributing docs here: http://pandas-docs.github.io/pandas-docs-travis/contributing.html

TakaakiFuruse · 2018-03-31T07:32:27Z

This code raises "ValueError: cannot insert a, already exists" on pandas 0.22 with Python 3.6.4.
(I also tried master, which shows the same error.)

import pandas as pd
df = pd.DataFrame({
               'a' : [1,1,1,2,2,2,3,3,3],
               'b' : [1,2,3,4,5,6,7,8,9],
})
df.groupby('a', as_index=False).agg({'a': 'count'})

Do you think the root cause is the same?
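The error message suggests that the aggregated result already contains a column named a at the point where the group keys are inserted back as a column. That reading is an assumption and is not confirmed in this thread. If only the per-group row count is needed, one possible way to sidestep the name clash is to count under a different column name; n_rows below is an illustrative name, not part of the report.

```python
import pandas as pd

df = pd.DataFrame({
    'a': [1, 1, 1, 2, 2, 2, 3, 3, 3],
    'b': [1, 2, 3, 4, 5, 6, 7, 8, 9],
})

# Count rows per group with the key as the index, then turn the key back
# into a column; the counts get their own name so nothing collides with 'a'.
counts = df.groupby('a').size().reset_index(name='n_rows')
print(counts)
```

This does not address the underlying behaviour of agg() with as_index=False; it only avoids asking for two columns with the same name.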

Output of pd.show_versions() is...

INSTALLED VERSIONS

commit: None
python: 3.6.4.final.0
python-bits: 64
OS: Linux
OS-release: 4.8.0-53-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: ja_JP.UTF-8
LOCALE: ja_JP.UTF-8

pandas: 0.22.0
pytest: 3.5.0
pip: 9.0.3
setuptools: 39.0.1
Cython: 0.28.1
numpy: 1.14.2
scipy: 1.0.1
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: 1.7.2
patsy: 0.5.0
dateutil: 2.7.2
pytz: 2018.3
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.4
feather: None
matplotlib: 2.2.2
openpyxl: 2.5.1
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.2
lxml: 4.2.1
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.2.5
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

TakaakiFuruse · 2018-03-31T14:09:37Z

Regarding #10355 (comment): since I found another similar behavior, I have created a new issue, #20566.

mukherjees · 2018-04-12T17:47:06Z

I can confirm the same problem as reported by @TakaakiFuruse (see two posts above), with pandas 0.22 and python 3.6.2. With the same example dataframe as he has, we see that the describe() command applied to the groupby object shows the correct results in the std columns for both a and b:
df.groupby('a', as_index=False).describe()

       a                                          b
   count  mean  std  min  25%  50%  75%  max  count  mean  std  min  25%  50%  75%  max
0    3.0   1.0  0.0  1.0  1.0  1.0  1.0  1.0    3.0   2.0  1.0  1.0  1.5  2.0  2.5  3.0
1    3.0   2.0  0.0  2.0  2.0  2.0  2.0  2.0    3.0   5.0  1.0  4.0  4.5  5.0  5.5  6.0
2    3.0   3.0  0.0  3.0  3.0  3.0  3.0  3.0    3.0   8.0  1.0  7.0  7.5  8.0  8.5  9.0

However, applying std() directly to the groupby object gives the wrong result for a:
df.groupby('a', as_index=False).std()

                    a    b
0                 1.0  1.0
1  1.4142135623730951  1.0
2  1.7320508075688772  1.0

Clearly, std() is not the same as .apply(np.std, ddof=1) (even though I thought they were equivalent), because the latter again gives the right answer for both a and b (np is numpy):
df.groupby('a', as_index=False).apply(np.std, ddof=1)

     a    b
0  0.0  1.0
1  0.0  1.0
2  0.0  1.0
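Until a fix is available, one possible workaround is to keep the grouping key out of the aggregation entirely: group with the key as the index, select only the value column, and restore the key as a column afterwards. This is a sketch of that idea on the example frame from this thread, not a statement about how pandas resolves the bug.

```python
import pandas as pd

df = pd.DataFrame({
    'a': [1, 1, 1, 2, 2, 2, 3, 3, 3],
    'b': [1, 2, 3, 4, 5, 6, 7, 8, 9],
})

# std() only ever sees column 'b'; 'a' lives in the index during the
# aggregation and is turned back into a regular column by reset_index().
result = df.groupby('a')['b'].std().reset_index()
print(result)
```

The key column then holds the original group labels 1, 2, 3, and b holds the per-group sample standard deviation.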

jreback added Bug Groupby labels Jun 15, 2015

jreback added this to the 0.17.0 milestone Jun 15, 2015

livia-b mentioned this issue Jul 25, 2015

groupby, as_index=False, with pandas.Series.count() as an agg #8381

Closed

jreback modified the milestones: Next Major Release, 0.17.0 Aug 19, 2015

jreback added Prio-low labels Aug 19, 2015

terrytangyuan mentioned this issue Sep 13, 2015

BUG: Fixed bug in groupby.std changing target column when as_index=False #11085

Closed

henrystokeley mentioned this issue Oct 12, 2015

BUG: GH10355 groupby std() no longer sqrts grouping cols #11300

Closed

henrystokeley mentioned this issue Nov 3, 2015

BUG: GH10355 groupby std() doesnt sqrt grouping cols #11507

Closed

jreback mentioned this issue Nov 1, 2016

Group-by/apply unexpected output with some operations when as_index=False #14547

Closed

ivaniadg mentioned this issue Jun 29, 2017

BUG: apply std to groupby with as_index=False #16799

Closed

jorisvandenbossche removed the Prio-low label Jun 30, 2017

TomAugspurger added the good first issue label Oct 11, 2017

jreback removed the Difficulty Novice label Dec 15, 2017

xieyuheng mentioned this issue Feb 14, 2019

... #25315

Closed

alexcwatt added a commit to alexcwatt/pandas that referenced this issue Apr 27, 2019

BUG: Fix pandas-dev#10355, std() groupby calculation

1d65e9f

alexcwatt added a commit to alexcwatt/pandas that referenced this issue Apr 27, 2019

BUG: Fix pandas-dev#10355, std() groupby calculation

517b194

alexcwatt mentioned this issue Apr 27, 2019

BUG: Fix #10355, std() groupby calculation #26229

Closed

4 tasks

alexcwatt added a commit to alexcwatt/pandas that referenced this issue May 7, 2019

BUG: Fix pandas-dev#10355, std() groupby calculation

14eb325

jreback modified the milestones: Contributions Welcome, 0.25.0 May 7, 2019

jreback modified the milestones: 0.25.0, Contributions Welcome Jul 3, 2019

jbrockmendel removed the Effort Low label Oct 21, 2019

rhshadrach mentioned this issue Apr 18, 2020

BUG: DataFrameGroupby std/sem modify grouped column when as_index=False #33630

Merged

5 tasks

jreback modified the milestones: Contributions Welcome, 1.1 May 19, 2020

jreback closed this as completed in #33630 May 19, 2020
