GroupBy has another built-in method, "transform", which is similar to apply but imposes more constraints on the kind of function we can use:
- it can produce a scalar value to be broadcast to the shape of the group
- it can produce an object of the same shape as the input group
- it must not mutate its input


In [2]:
import pandas as pd 
import numpy as np 

df = pd.DataFrame({'key': ['a', 'b', 'c'] * 4,
                   'value': np.arange(12.)})
df

   key  value
0    a    0.0
1    b    1.0
2    c    2.0
3    a    3.0
4    b    4.0
5    c    5.0
6    a    6.0
7    b    7.0
8    c    8.0
9    a    9.0
10   b   10.0
11   c   11.0


In [4]:
# here are the group means by key:
g = df.groupby('key')['value']
g.mean()

key
a    4.5
b    5.5
c    6.5
Name: value, dtype: float64

In [5]:
# suppose instead we want a Series of the same shape as df['value'], with each value replaced by the mean of its 'key' group
# we can pass transform a function that computes the mean of a single group

def get_mean(group):
    return group.mean()

g.transform(get_mean)

0     4.5
1     5.5
2     6.5
3     4.5
4     5.5
5     6.5
6     4.5
7     5.5
8     6.5
9     4.5
10    5.5
11    6.5
Name: value, dtype: float64

In [6]:
# for built-in aggregation functions, we can pass a string alias as with the GroupBy agg method
g.transform('mean')

0     4.5
1     5.5
2     6.5
3     4.5
4     5.5
5     6.5
6     4.5
7     5.5
8     6.5
9     4.5
10    5.5
11    6.5
Name: value, dtype: float64
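
Because the transformed result carries the same index as df, it can be stored next to the original data without a merge. The following is a minimal sketch of that idea; the column names group_mean and demeaned are illustrative and do not appear in the notes above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'key': ['a', 'b', 'c'] * 4,
                   'value': np.arange(12.)})
g = df.groupby('key')['value']

# transform returns one value per original row, indexed like df,
# so the result lines up with the existing rows
# ('group_mean' is an illustrative column name)
with_mean = df.assign(group_mean=g.transform('mean'))

# each row's deviation from the mean of its own group
with_mean['demeaned'] = with_mean['value'] - with_mean['group_mean']
print(with_mean.head(3))
```

With agg, the same result would need a join back onto df, since agg returns one row per group rather than one row per input row.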

In [7]:
# like apply, transform works with functions that return a Series, but the result must be the same size as the input
# for example, we can multiply each group by 2 using a helper function

def times_two(group):
    return group * 2

g.transform(times_two)

0      0.0
1      2.0
2      4.0
3      6.0
4      8.0
5     10.0
6     12.0
7     14.0
8     16.0
9     18.0
10    20.0
11    22.0
Name: value, dtype: float64

In [8]:
# as a more complicated example, we can compute the ranks in descending order for each group

def get_ranks(group):
    return group.rank(ascending=False)

g.transform(get_ranks)

0     4.0
1     4.0
2     4.0
3     3.0
4     3.0
5     3.0
6     2.0
7     2.0
8     2.0
9     1.0
10    1.0
11    1.0
Name: value, dtype: float64
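
For ranking specifically, the grouped object also has its own rank method, so the helper function is optional. The sketch below compares the two routes on the same data; it assumes a pandas version in which SeriesGroupBy.rank accepts the ascending argument.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'key': ['a', 'b', 'c'] * 4,
                   'value': np.arange(12.)})
g = df.groupby('key')['value']

def get_ranks(group):
    return group.rank(ascending=False)

# route 1: pass a per-group function to transform
via_transform = g.transform(get_ranks)

# route 2: call the GroupBy rank method directly
via_builtin = g.rank(ascending=False)

# both give one rank per original row, in the original row order
print((via_transform == via_builtin).all())
```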

In [9]:
# consider a group transforming function composed from single aggregations
def normalize(x):
    return (x - x.mean()) / x.std()

# we can obtain equivalent results in this case using either transform or apply
g.transform(normalize)

0    -1.161895
1    -1.161895
2    -1.161895
3    -0.387298
4    -0.387298
5    -0.387298
6     0.387298
7     0.387298
8     0.387298
9     1.161895
10    1.161895
11    1.161895
Name: value, dtype: float64

In [10]:
g.apply(normalize)

key    
a    0    -1.161895
     3    -0.387298
     6     0.387298
     9     1.161895
b    1    -1.161895
     4    -0.387298
     7     0.387298
     10    1.161895
c    2    -1.161895
     5    -0.387298
     8     0.387298
     11    1.161895
Name: value, dtype: float64
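
The two outputs above hold the same values but are indexed differently: transform keeps the original row index, while apply here prepends the group key. One way to check that the values agree is sketched below. It assumes that passing group_keys=False to groupby stops apply from adding the key level, and it sorts by index in case row order differs between pandas versions.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'key': ['a', 'b', 'c'] * 4,
                   'value': np.arange(12.)})

def normalize(x):
    return (x - x.mean()) / x.std()

via_transform = df.groupby('key')['value'].transform(normalize)

# group_keys=False asks apply not to prepend the group key to the index
via_apply = df.groupby('key', group_keys=False)['value'].apply(normalize)

# sort_index guards against any difference in row order
same = np.allclose(via_transform.sort_index(), via_apply.sort_index())
print(same)
```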

- built-in aggregation functions like 'mean' or 'sum' are often much faster than a general apply function.
- they also have a "fast path" when used with transform.
- this fast path lets us perform what is called an "unwrapped" group operation.

In [11]:
g.transform('mean')

0     4.5
1     5.5
2     6.5
3     4.5
4     5.5
5     6.5
6     4.5
7     5.5
8     6.5
9     4.5
10    5.5
11    6.5
Name: value, dtype: float64

In [13]:
normalized = (df['value'] - g.transform('mean')) / g.transform('std')
normalized

0    -1.161895
1    -1.161895
2    -1.161895
3    -0.387298
4    -0.387298
5    -0.387298
6     0.387298
7     0.387298
8     0.387298
9     1.161895
10    1.161895
11    1.161895
Name: value, dtype: float64

In [None]:
# here we do arithmetic between the outputs of several GroupBy operations instead of writing a function and passing it to groupby(...).apply
# that is what is meant by "unwrapped"
# an unwrapped group operation may involve several group aggregations, but the overall benefit of vectorized operations often outweighs that extra cost
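
To get a rough feel for the difference, the wrapped and unwrapped forms can be timed on a larger frame. This is only a sketch: the frame big, the row and group counts, and the helper names wrapped and unwrapped are made up for illustration, and timings will vary with the machine and the pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# synthetic data; the sizes are arbitrary choices for illustration
rng = np.random.default_rng(0)
n_rows, n_groups = 50_000, 500
big = pd.DataFrame({'key': rng.integers(0, n_groups, size=n_rows),
                    'value': rng.standard_normal(n_rows)})
gb = big.groupby('key')['value']

def normalize(x):
    return (x - x.mean()) / x.std()

def wrapped():
    # one Python-level function call per group
    return gb.transform(normalize)

def unwrapped():
    # two built-in group aggregations plus vectorized arithmetic
    return (big['value'] - gb.transform('mean')) / gb.transform('std')

# both forms should agree on the values (NaN would arise for one-row groups)
print(np.allclose(wrapped(), unwrapped(), equal_nan=True))

print('wrapped  :', timeit.timeit(wrapped, number=3))
print('unwrapped:', timeit.timeit(unwrapped, number=3))
```

The gap between the two usually grows with the number of groups, since the wrapped form calls normalize once per group.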