cumsum sums the groupby column #5614

hayd · 2013-11-28T20:37:33Z

cumsum shouldn't sum the grouped-by column (in fact, that column should become the index when grouping with as_index=True).

In [13]: df = pd.DataFrame([[1, 2, np.nan], [1, np.nan, 9], [3, 4, 9]], columns=['A', 'B', 'C'])

In [14]: g = df.groupby('A')

In [16]: g.cumsum()
Out[16]: 
   A   B   C
0  1   2 NaN
1  2 NaN   9
2  3   4   9

[3 rows x 3 columns]

This comes from cumsum being dispatched to each group. Should be fixed up for 0.14, possibly along with some other whitelisted groupby functions.
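For anyone reproducing this on their own install, a minimal check might look like the sketch below. The helper name `grouper_in_result` is made up for illustration, and what it returns depends on the pandas version; selecting the value columns explicitly should sidestep the question either way.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2, np.nan], [1, np.nan, 9], [3, 4, 9]],
                  columns=['A', 'B', 'C'])


def grouper_in_result(frame, key):
    """Hypothetical helper: True if the groupby key shows up in the cumsum output."""
    return key in frame.groupby(key).cumsum().columns


# Version-dependent: True on installs that still cumsum the grouper column.
print(grouper_in_result(df, 'A'))

# Naming the value columns keeps the grouper out of the result regardless.
safe = df.groupby('A')[['B', 'C']].cumsum()
print(safe)
```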


jorisvandenbossche · 2013-11-29T08:01:56Z

What would be the expected output? Something like this?:

In [29]: g.cumsum()
Out[29]:
       B   C
  A
0 1   2 NaN
1   NaN   9
2 3   4   9

And should it then also be the case for cumcount if as_index=True?

hayd · 2013-11-29T08:26:47Z

@jorisvandenbossche Re the index of cumcount: possibly yes, it should respect as_index... I think it's debatable whether this would ever be desired, though. The main problem, however, is that it's slow (I don't think there's an efficient way to append an index to an index to make a MultiIndex), and as_index=True is the default. I thought I had posted about this somewhere but can't find the issue...

I think so, though as I said, I think we need a discussion about as_index (there are at least three different conventions in use in groupby at the moment)... I had a partially written issue about it from a week or so ago... :s I'll look at it again after the weekend and try to post it. It's kind of a mess, and some conventions are of dubious value (e.g. that of head).

jorisvandenbossche · 2013-11-29T08:32:04Z

Yes, you did :-) Here: #4646 (comment)
And for cumcount it would indeed be more consistent, in a way, to also return a MultiIndex, but I also think you mostly wouldn't want it.
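For reference, a small sketch of what cumcount gives on the frame from the original report, assuming a recent pandas: a flat Series on the original row index, with no MultiIndex built.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2, np.nan], [1, np.nan, 9], [3, 4, 9]],
                  columns=['A', 'B', 'C'])

# Position of each row within its group, labelled by the original index.
counts = df.groupby('A').cumcount()
print(counts)
```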

hayd · 2014-01-27T21:36:28Z

I think we should add some UserWarnings in 0.14 about this kind of behaviour; see #5755.

jreback · 2014-04-06T15:08:15Z

@hayd do you have anything in the works for this? Otherwise, push to 0.15.

hayd · 2014-04-06T17:20:14Z

I think I do; I hope to get to it this week.

jreback · 2014-05-01T14:17:59Z

ping!

jreback · 2014-05-01T14:25:45Z

I think this is closed by #7000, maybe just add a test?

hayd · 2014-05-01T16:46:59Z

Weirdly with the above example we don't have A as the index!

In [4]: df = pd.DataFrame([[1, 2, np.nan], [1, np.nan, 9], [3, 4, 9]], columns=['A', 'B', 'C'])

In [5]: g = df.groupby('A')

In [6]: g.cumsum()  # should have A as index
Out[6]:
    B   C
0   2 NaN
1 NaN   9
2   4   9

In [7]: g = df.groupby('A', as_index=False)  # this is correct

In [8]: g.cumsum()
Out[8]:
   A   B   C
0  1   2 NaN
1  2 NaN   9
2  3   4   9

hayd · 2014-05-01T16:48:15Z

Ah wait, this is a feature! Coool!

jreback · 2014-05-01T16:50:15Z

hmm... the result should have a named index (A)... let me fix

hayd · 2014-05-01T16:52:02Z

@jreback I'm not so sure, what are you changing? I think this is good as is!

jreback · 2014-05-01T17:05:32Z

I think the result should be this (it happens to be the same as sum in this case):

DataFrame([[2, 9], [4, 9]], columns=['B', 'C'], index=Index([1,3],name='A'))
   B  C
A      
1  2  9
3  4  9

[2 rows x 2 columns]
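For comparison, a sketch of what the reducer itself returns on the three-row frame from the top of the thread, assuming a recent pandas. sum skips NaN by default, so the values line up with the frame above, and the group keys land in an index named A.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2, np.nan], [1, np.nan, 9], [3, 4, 9]],
                  columns=['A', 'B', 'C'])

# A reducer yields one row per group, indexed by the group key.
reduced = df.groupby('A')[['B', 'C']].sum()
print(reduced)
```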

jreback · 2014-05-01T17:10:31Z

Here's a more realistic example

(Pdb) df = DataFrame([[1, 2, np.nan], [1, np.nan, 9], [1, 1, 2], [3, 4, 9]], columns=['A', 'B', 'C'])
(Pdb) df
   A   B   C
0  1   2 NaN
1  1 NaN   9
2  1   1   2
3  3   4   9

[4 rows x 3 columns]

(Pdb) results = concat([df.iloc[0:3].cumsum(),df.iloc[3:4].cumsum()])
(Pdb) p results
   A   B   C
0  1   2 NaN
1  2 NaN   9
2  3   3  11
3  3   4   9

[4 rows x 3 columns]

(Pdb) results.index = MultiIndex.from_tuples([(1,0),(1,1),(1,2),(3,0)],names=['A',None])
(Pdb) results
      B   C
A             
1 0    2 NaN
  1  NaN   9
  2    3  11
3 0    4   9

[4 rows x 3 columns]

Here's the current result

(Pdb) df.groupby('A').cumsum()
    B   C
0   2 NaN
1 NaN   9
2   3  11
3   4   9

[4 rows x 2 columns]
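If the MultiIndex-labelled form is what you want, one way to get there from the flat result is sketched below. This is only an illustration, not what groupby does internally: it assumes the rows are already ordered by group (as they are in this frame) and uses cumcount for the within-group position.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2, np.nan], [1, np.nan, 9], [1, 1, 2], [3, 4, 9]],
                  columns=['A', 'B', 'C'])
g = df.groupby('A')

# Flat per-group cumulative sums of the value columns only.
res = g[['B', 'C']].cumsum()

# Relabel the rows with (group key, position within group).
res.index = pd.MultiIndex.from_arrays([df['A'], g.cumcount()],
                                      names=['A', None])
print(res)
```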

jorisvandenbossche · 2014-05-01T20:12:47Z

Following our rules, cumsum is not a reducer/aggregator, so it should ignore as_index?

And I would say it is a transformer, and then it is 'correct' to drop the grouper column. At least this is also what transform does:

In [10]: df.groupby('A', as_index=False).transform(lambda x: x.cumsum())
Out[10]:
    B   C
0   2 NaN
1 NaN   9
2   3  11
3   4   9
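A quick way to pin down the "cumsum is a transformer" expectation is to compare it directly against transform. The sketch below selects B and C explicitly so it does not depend on how a given pandas version treats the grouper column.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2, np.nan], [1, np.nan, 9], [1, 1, 2], [3, 4, 9]],
                  columns=['A', 'B', 'C'])
g = df.groupby('A')[['B', 'C']]

via_transform = g.transform(lambda x: x.cumsum())
via_cumsum = g.cumsum()

# Both keep the original row index and should agree value for value.
pd.testing.assert_frame_equal(via_transform, via_cumsum)
print(via_cumsum)
```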

jreback · 2014-05-01T22:23:50Z

@jorisvandenbossche you are right... ok, I've marked it as needing some additional tests in any event (simply to validate this expectation)

ghost assigned hayd Nov 28, 2013

hayd mentioned this issue Nov 29, 2013

DataFrameGroupBy.cumcount() returing Series instead of DataFrame #5608

Closed

jreback mentioned this issue May 1, 2014

TST: tests for groupby not using grouper column, solved in GH7000, (GH5614) #7019

Merged

jreback closed this as completed in #7019 May 1, 2014

hayd mentioned this issue May 1, 2014

Consistency with groupby as_index #5755

Closed
