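After grouping on an integer column, transform seems to forget that we have groups: the group means come back once per group key instead of being broadcast to every row of the group: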
>>> pd.__version__
'0.14.1-172-gab64d58'
>>> x = np.arange(6, dtype=np.int64)
>>> df = pd.DataFrame({"a": x//2, "b": 2.0*x, "c": 3.0*x})
>>> df
   a   b   c
0  0   0   0
1  0   2   3
2  1   4   6
3  1   6   9
4  2   8  12
5  2  10  15
>>> df.groupby("a").transform("mean")
     b     c
0    1   1.5
1    5   7.5
2    9  13.5
3  NaN   NaN
4  NaN   NaN
5  NaN   NaN
>>> df["a"] = df["a"]*1.0
>>> df.groupby("a").transform("mean")
   b     c
0  1   1.5
1  1   1.5
2  5   7.5
3  5   7.5
4  9  13.5
5  9  13.5
To make it even more obvious:
>>> df.index = range(20, 26)
>>> df.groupby("a").transform("mean")
      b     c
0     1   1.5
1     5   7.5
2     9  13.5
20  NaN   NaN
21  NaN   NaN
22  NaN   NaN
23  NaN   NaN
24  NaN   NaN
25  NaN   NaN
Switching to a float index seems to avoid the issue as well.
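For reference, here is a minimal sketch of the result I would expect, written as a self-contained check. The helper name expected_group_means is hypothetical (it is not part of pandas); it computes the per-group means and broadcasts them back to the original rows by hand, so the output of transform("mean") can be compared against it. On a pandas build where this issue is fixed, the two frames should agree.

```python
import numpy as np
import pandas as pd


def expected_group_means(df, key):
    """Hypothetical helper: broadcast each group's mean back to its rows by hand."""
    # One row per group, indexed by the group key.
    means = df.groupby(key).mean()
    # Look up the group's row of means for every original row (label-based).
    out = means.loc[df[key].values]
    # Restore the original row labels, which transform is expected to keep.
    out.index = df.index
    return out


x = np.arange(6, dtype=np.int64)
# Integer grouping column "a" and a non-default integer index, as in the report.
df = pd.DataFrame({"a": x // 2, "b": 2.0 * x, "c": 3.0 * x}, index=range(20, 26))

expected = expected_group_means(df, "a")
actual = df.groupby("a").transform("mean")

# Raises AssertionError if transform does not line up with the original rows.
pd.testing.assert_frame_equal(actual, expected)
```

The comparison covers both symptoms shown above: the result must carry exactly the original index (20 to 25, with no extra rows labelled by group key), and every row must hold the mean of its own group rather than NaN.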