# Pandas: apply computation within groups in a DataFrame

pandas can run a computation within each group of a DataFrame and return one result per original row. This is done with `groupby()` followed by `transform()`.
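
To make the role of `transform()` concrete, the sketch below contrasts it with `agg()` on a tiny hand-made DataFrame. The frame `toy` and its column names are hypothetical and are not part of the iris sample used in the rest of this article.

```python
import pandas as pd

# Hypothetical toy data, not the iris sample used in this article
toy = pd.DataFrame({
    'group': ['a', 'a', 'b', 'b', 'b'],
    'value': [1.0, 3.0, 10.0, 20.0, 30.0],
})

# agg() collapses each group to a single row
per_group = toy.groupby('group')['value'].agg('mean')

# transform() returns one value per original row, aligned on the original index
per_row = toy.groupby('group')['value'].transform('mean')

print(per_group.shape)  # (2,)
print(per_row.shape)    # (5,)
```

Because the result of `transform()` keeps the original index, it can be attached to the DataFrame as a new column without any merge.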

In [1]:
# Import libraries
import pandas as pd
import seaborn as sns

# Load sample data in a DataFrame
df = (
    sns.load_dataset('iris')
    .sample(n=12, random_state=24)
    .sort_values('species')
    .reset_index(drop=True)
    [['species', 'sepal_width']]
)
df

,species,sepal_width
0,setosa,3.4
1,setosa,3.7
2,setosa,3.2
3,setosa,3.1
4,setosa,3.8
5,versicolor,2.4
6,versicolor,2.7
7,versicolor,2.5
8,virginica,2.9
9,virginica,2.8


## Compute mean at group-level

To attach a group-level statistic such as the mean, sum, or count to every row, pass the function name as a string to `transform()`, for example `transform('mean')`.

In [2]:
# Get the group mean, repeated on every row of the group
df.assign(mean=
    df.groupby('species')['sepal_width'].transform('mean')
)

,species,sepal_width,mean
0,setosa,3.4,3.44
1,setosa,3.7,3.44
2,setosa,3.2,3.44
3,setosa,3.1,3.44
4,setosa,3.8,3.44
5,versicolor,2.4,2.533333
6,versicolor,2.7,2.533333
7,versicolor,2.5,2.533333
8,virginica,2.9,3.025
9,virginica,2.8,3.025
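
The same pattern extends to several statistics at once. The following is a minimal sketch on a hand-built frame that reuses the setosa and versicolor widths shown above, so it runs without downloading the iris dataset. The names `small`, `group_mean`, `group_sum`, and `group_count` are illustrative choices.

```python
import pandas as pd

# Hand-built frame reusing the setosa and versicolor widths from the output above
small = pd.DataFrame({
    'species': ['setosa'] * 5 + ['versicolor'] * 3,
    'sepal_width': [3.4, 3.7, 3.2, 3.1, 3.8, 2.4, 2.7, 2.5],
})

# Select the column once and reuse the grouped object
grouped = small.groupby('species')['sepal_width']

result = small.assign(
    group_mean=grouped.transform('mean'),    # mean of the row's group
    group_sum=grouped.transform('sum'),      # sum of the row's group
    group_count=grouped.transform('count'),  # number of non-null values in the group
)
print(result)
```

Selecting `'sepal_width'` before calling `transform()` makes each result a Series, which keeps the assignment unambiguous if the DataFrame later gains more numeric columns.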


## Standardize values

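You can center values within a group by subtracting the group mean from each row. Strictly speaking, this only centers the data; full standardization would also divide by the group standard deviation.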

In [3]:
# Center values on the group mean
df.assign(standardized=
    df.groupby('species')['sepal_width'].transform(lambda x: x - x.mean())
)

,species,sepal_width,standardized
0,setosa,3.4,-0.04
1,setosa,3.7,0.26
2,setosa,3.2,-0.24
3,setosa,3.1,-0.34
4,setosa,3.8,0.36
5,versicolor,2.4,-0.133333
6,versicolor,2.7,0.166667
7,versicolor,2.5,-0.033333
8,virginica,2.9,-0.125
9,virginica,2.8,-0.225
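
If a full z-score is wanted rather than centering alone, one possible approach is to also divide by the group standard deviation. The sketch below assumes a hand-built frame reusing the setosa and versicolor widths from the outputs above; the names `small` and `zscore` are illustrative. Note that pandas' `std()` uses the sample standard deviation (`ddof=1`) by default.

```python
import pandas as pd

# Hand-built frame reusing the setosa and versicolor widths from the outputs above
small = pd.DataFrame({
    'species': ['setosa'] * 5 + ['versicolor'] * 3,
    'sepal_width': [3.4, 3.7, 3.2, 3.1, 3.8, 2.4, 2.7, 2.5],
})

grouped = small.groupby('species')['sepal_width']

# z-score: center on the group mean, then scale by the group standard deviation
# (sample standard deviation, ddof=1, which is the pandas default)
z = (small['sepal_width'] - grouped.transform('mean')) / grouped.transform('std')

result = small.assign(zscore=z)
print(result)
```

After this step, each group has a mean of 0 and a sample standard deviation of 1 in the `zscore` column, up to floating-point error.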


## Rank values inside groups

Besides computing group-wise values, you can also rank values within each group. By default, `rank()` ranks in ascending order, so the smallest value in each group gets rank 1.

In [4]:
# Rank values within each species (1 = smallest)
df.assign(rank=
    df.groupby('species')['sepal_width'].rank()
)

,species,sepal_width,rank
0,setosa,3.4,3.0
1,setosa,3.7,4.0
2,setosa,3.2,2.0
3,setosa,3.1,1.0
4,setosa,3.8,5.0
5,versicolor,2.4,1.0
6,versicolor,2.7,3.0
7,versicolor,2.5,2.0
8,virginica,2.9,2.0
9,virginica,2.8,1.0
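
Group-wise ranking can be tuned through the `ascending` and `method` parameters of `rank()`. The sketch below uses a hypothetical toy frame containing ties, since the sample above has none; the names `toy`, `rank_avg`, `rank_desc`, and `rank_dense` are illustrative.

```python
import pandas as pd

# Hypothetical toy data with ties inside each group
toy = pd.DataFrame({
    'group': ['a', 'a', 'a', 'a', 'b', 'b'],
    'value': [1.0, 2.0, 2.0, 5.0, 7.0, 7.0],
})

grouped = toy.groupby('group')['value']

result = toy.assign(
    rank_avg=grouped.rank(),                  # default: ascending, ties get the average rank
    rank_desc=grouped.rank(ascending=False),  # largest value in each group gets rank 1
    rank_dense=grouped.rank(method='dense'),  # ties share a rank and no ranks are skipped
)
print(result)
```

With the default `method='average'`, the two tied values in group `a` both receive rank 2.5, whereas `method='dense'` gives both rank 2 and the next value rank 3.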
