BUG: odd transform behaviour with integers #7972

Closed
dsm054 opened this Issue Aug 9, 2014 · 5 comments

Contributor

dsm054 commented Aug 9, 2014

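After grouping on an integer column, transform seems to forget that we have groups. Instead of broadcasting each group's mean back to every row, it returns one row per group followed by NaN rows: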

>>> pd.__version__
'0.14.1-172-gab64d58'
>>> x = np.arange(6, dtype=np.int64)
>>> df = pd.DataFrame({"a": x//2, "b": 2.0*x, "c": 3.0*x})
>>> df
   a   b   c
0  0   0   0
1  0   2   3
2  1   4   6
3  1   6   9
4  2   8  12
5  2  10  15
>>> df.groupby("a").transform("mean")
    b     c
0   1   1.5
1   5   7.5
2   9  13.5
3 NaN   NaN
4 NaN   NaN
5 NaN   NaN
>>> df["a"] = df["a"]*1.0
>>> df.groupby("a").transform("mean")
   b     c
0  1   1.5
1  1   1.5
2  5   7.5
3  5   7.5
4  9  13.5
5  9  13.5

To make it even more obvious:

>>> df.index = range(20, 26)
>>> df.groupby("a").transform("mean")
     b     c
0    1   1.5
1    5   7.5
2    9  13.5
20 NaN   NaN
21 NaN   NaN
22 NaN   NaN
23 NaN   NaN
24 NaN   NaN
25 NaN   NaN

Switching to a float index seems to avoid the issues as well.
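
For reference, here is a small sketch of the behaviour I'd expect, assuming transform("mean") should broadcast each group's mean back onto the original index regardless of the key's dtype. The expected frame built with join is only my own construction for the comparison, not something taken from the pandas test suite:

```python
import numpy as np
import pandas as pd

# Same data as in the report: integer group key, non-default integer index
x = np.arange(6, dtype=np.int64)
df = pd.DataFrame({"a": x // 2, "b": 2.0 * x, "c": 3.0 * x},
                  index=range(20, 26))

result = df.groupby("a").transform("mean")

# Build the expected frame by hand: look up each row's group mean via its key
group_means = df.groupby("a")[["b", "c"]].mean()
expected = df[["a"]].join(group_means, on="a")[["b", "c"]]

# Raises AssertionError if transform does not broadcast per-group means
pd.testing.assert_frame_equal(result, expected)
```

On a version with the bug, the final comparison should fail because the result has the wrong shape and index.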

Contributor

jreback commented Aug 9, 2014

I think the .ix should be .iloc here, which should fix it: https://github.com/pydata/pandas/blob/master/pandas/core/groupby.py#L2871

Can you give it a try?
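
To illustrate why label-based and positional indexing can diverge on an integer index, here is a minimal sketch. It uses .loc as a stand-in for label-based lookup, since .ix is not available in current pandas; the Series s is made up for illustration and is not the code at the linked line:

```python
import pandas as pd

# Integer labels that do not coincide with positions 0, 1, 2
s = pd.Series([10.0, 20.0, 30.0], index=[20, 21, 22])

first_by_position = s.iloc[0]   # positional: first element
first_by_label = s.loc[20]      # label-based: element labelled 20

# Label 0 does not exist, so a label-based lookup of 0 fails
try:
    s.loc[0]
    label_zero_found = True
except KeyError:
    label_zero_found = False
```

With a float or default 0..n-1 index the two lookups are harder to confuse, which may be why the problem only shows up with integer keys here.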

Member

cpcloud commented Aug 9, 2014

That doesn't fix it; I just tried.

Contributor

jreback commented Aug 10, 2014

was close though!

Member

cpcloud commented Aug 10, 2014

yep!

jreback added this to the 0.15.0 milestone Aug 10, 2014

cpcloud closed this in #7973 Aug 10, 2014

Member

cpcloud commented Aug 10, 2014

@dsm054 thanks for the report!
