# Aggregation & Grouping
Grouping and aggregating help you summarize your data by category, answering questions such as:

- “What’s the average salary per department?”
- “How many users joined the gym per month?”


## Common Aggregation Functions

## `.groupby()` Function
`df.groupby()` groups the rows of a DataFrame by the values in one or more columns. You can then apply an aggregation function such as `sum()`, `mean()`, or `count()` to each group. Consider this DataFrame:

```python
import pandas as pd

df = pd.DataFrame({
    "Department": ["HR", "HR", "IT", "IT", "Marketing", "Marketing", "Sales", "Sales"],
    "Team": ["A", "A", "B", "B", "C", "C", "D", "D"],
    "Gender": ["M", "F", "M", "F", "M", "F", "M", "F"],
    "Salary": [85, 90, 78, 85, 92, 88, 75, 80],
    "Age": [23, 25, 30, 22, 28, 26, 21, 27],
    "JoinDate": pd.to_datetime([
        "2020-01-10", "2020-02-15", "2021-03-20", "2021-04-10",
        "2020-05-30", "2020-06-25", "2021-07-15", "2021-08-01"
    ])
})
print(df)
```

```
  Department Team Gender  Salary  Age   JoinDate
0         HR    A      M      85   23 2020-01-10
1         HR    A      F      90   25 2020-02-15
2         IT    B      M      78   30 2021-03-20
3         IT    B      F      85   22 2021-04-10
4  Marketing    C      M      92   28 2020-05-30
5  Marketing    C      F      88   26 2020-06-25
6      Sales    D      M      75   21 2021-07-15
7      Sales    D      F      80   27 2021-08-01
```
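Because `groupby()` accepts one or more keys, you can group by several columns at once, and a key can be a derived Series rather than a column name. A minimal sketch, assuming you want the average salary per join year and gender (the year key is derived from `JoinDate` for illustration; it is not a column in the original table):

```python
import pandas as pd

# Same Gender, Salary and JoinDate values as the DataFrame above
df = pd.DataFrame({
    "Gender": ["M", "F", "M", "F", "M", "F", "M", "F"],
    "Salary": [85, 90, 78, 85, 92, 88, 75, 80],
    "JoinDate": pd.to_datetime([
        "2020-01-10", "2020-02-15", "2021-03-20", "2021-04-10",
        "2020-05-30", "2020-06-25", "2021-07-15", "2021-08-01"
    ]),
})

# Keys can mix a derived Series (the join year) and a column name
by_year_gender = df.groupby([df["JoinDate"].dt.year, "Gender"])["Salary"].mean()
print(by_year_gender)
```

The result is a Series with a two-level index (`JoinDate`, `Gender`); for example, the average salary of men who joined in 2020 is 88.5.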


### .mean()

```python
df.groupby("Department")["Salary"].mean()
```

```
Department
HR           87.5
IT           81.5
Marketing    90.0
Sales        77.5
Name: Salary, dtype: float64
```
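By default the grouped result is a Series indexed by the group labels. If you would rather have an ordinary DataFrame (for example, to merge or export it), one option is `reset_index()`, sketched below; passing `as_index=False` to `groupby()` gives a similar result.

```python
import pandas as pd

df = pd.DataFrame({
    "Department": ["HR", "HR", "IT", "IT", "Marketing", "Marketing", "Sales", "Sales"],
    "Salary": [85, 90, 78, 85, 92, 88, 75, 80],
})

# mean() returns a Series indexed by Department;
# reset_index() moves the group labels back into a regular column
avg = df.groupby("Department")["Salary"].mean().reset_index()
print(avg)
```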

### .sum()

```python
df.groupby("Department")["Salary"].sum()
```

```
Department
HR           175
IT           163
Marketing    180
Sales        155
Name: Salary, dtype: int64
```

### .count()

```python
df.groupby("Department")["Salary"].count()
```

```
Department
HR           2
IT           2
Marketing    2
Sales        2
Name: Salary, dtype: int64
```
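`.count()` counts non-missing values only, so it can differ from the number of rows in a group. The sketch below uses hypothetical data with one missing HR salary to contrast it with `.size()`, which counts every row:

```python
import numpy as np
import pandas as pd

# Hypothetical data: one HR salary is missing (NaN)
df = pd.DataFrame({
    "Department": ["HR", "HR", "IT", "IT"],
    "Salary": [85, np.nan, 78, 85],
})

non_null = df.groupby("Department")["Salary"].count()  # skips NaN values
rows = df.groupby("Department").size()                 # counts every row
print(non_null)
print(rows)
```

Here HR has 2 rows but only 1 non-missing salary.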

### .min()

```python
df.groupby("Department")["Salary"].min()
```

```
Department
HR           85
IT           78
Marketing    88
Sales        75
Name: Salary, dtype: int64
```

### .max()

```python
df.groupby("Department")["Salary"].max()
```

```
Department
HR           90
IT           85
Marketing    92
Sales        80
Name: Salary, dtype: int64
```
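To apply the same aggregation to several columns at once, select a list of columns. A sketch reusing the `Salary` and `Age` values from the table above:

```python
import pandas as pd

df = pd.DataFrame({
    "Department": ["HR", "HR", "IT", "IT", "Marketing", "Marketing", "Sales", "Sales"],
    "Salary": [85, 90, 78, 85, 92, 88, 75, 80],
    "Age": [23, 25, 30, 22, 28, 26, 21, 27],
})

# Selecting a list of columns returns a DataFrame with one column per selection
extremes = df.groupby("Department")[["Salary", "Age"]].max()
print(extremes)
```

Each column is aggregated independently: in IT the highest salary (85) and the highest age (30) come from different employees.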

## Custom Aggregations with `.agg()`

Pass a list of function names to `.agg()` to apply several aggregations at once:

```python
df.groupby("Team")["Salary"].agg(["mean", "max", "min"])
```

| Team | mean | max | min |
|------|------|-----|-----|
| A    | 87.5 | 90  | 85  |
| B    | 81.5 | 85  | 78  |
| C    | 90.0 | 92  | 88  |
| D    | 77.5 | 80  | 75  |
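The list passed to `.agg()` can also contain your own functions. In this sketch `salary_range` is a hypothetical helper; pandas uses the function's name as the column label:

```python
import pandas as pd

df = pd.DataFrame({
    "Team": ["A", "A", "B", "B", "C", "C", "D", "D"],
    "Salary": [85, 90, 78, 85, 92, 88, 75, 80],
})

def salary_range(s):
    """Hypothetical helper: spread between highest and lowest salary in a group."""
    return s.max() - s.min()

# Built-in names (strings) and plain Python functions can be mixed in one list
stats = df.groupby("Team")["Salary"].agg(["mean", salary_range])
print(stats)
```

For team B this gives a range of 85 - 78 = 7.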


Note: In pandas, `.agg()` and `.aggregate()` are aliases for the same method, so they behave identically.
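A quick check that both spellings produce identical results (a sketch on a trimmed copy of the data):

```python
import pandas as pd

df = pd.DataFrame({
    "Department": ["HR", "HR", "IT", "IT"],
    "Salary": [85, 90, 78, 85],
})

a = df.groupby("Department")["Salary"].agg(["mean", "max"])
b = df.groupby("Department")["Salary"].aggregate(["mean", "max"])

pd.testing.assert_frame_equal(a, b)  # raises AssertionError if they differ
print(a.equals(b))
```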

### Name the output columns:

Pass keyword arguments to `.agg()` (named aggregation) to choose the result's column names:

```python
df.groupby("Department")["Salary"].agg(
    avg_salary="mean",
    high_salary="max"
)
```

| Department | avg_salary | high_salary |
|------------|------------|-------------|
| HR         | 87.5       | 90          |
| IT         | 81.5       | 85          |
| Marketing  | 90.0       | 92          |
| Sales      | 77.5       | 80          |
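Named aggregation also works across several columns when you call `.agg()` on the whole grouped DataFrame; each keyword maps to a `(column, function)` tuple. The output names `avg_salary` and `oldest` below are illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    "Department": ["HR", "HR", "IT", "IT", "Marketing", "Marketing", "Sales", "Sales"],
    "Salary": [85, 90, 78, 85, 92, 88, 75, 80],
    "Age": [23, 25, 30, 22, 28, 26, 21, 27],
})

# keyword = (source column, aggregation)
summary = df.groupby("Department").agg(
    avg_salary=("Salary", "mean"),
    oldest=("Age", "max"),
)
print(summary)
```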


### Apply different functions to different columns:

```python
df.groupby("Department").agg({
    "Salary": "mean",
    "Age": "max"
})
```

| Department | Salary | Age |
|------------|--------|-----|
| HR         | 87.5   | 25  |
| IT         | 81.5   | 30  |
| Marketing  | 90.0   | 28  |
| Sales      | 77.5   | 27  |
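Dictionary values can also be lists, in which case the result gets two-level (MultiIndex) column labels of the form `(column, function)`. A sketch:

```python
import pandas as pd

df = pd.DataFrame({
    "Department": ["HR", "HR", "IT", "IT", "Marketing", "Marketing", "Sales", "Sales"],
    "Salary": [85, 90, 78, 85, 92, 88, 75, 80],
    "Age": [23, 25, 30, 22, 28, 26, 21, 27],
})

# Two functions for Salary, one for Age
result = df.groupby("Department").agg({
    "Salary": ["mean", "max"],
    "Age": "min",
})
print(result)
```

Access a single result column with a tuple, e.g. `result[("Salary", "max")]`.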


### Transform vs Aggregate vs Filter

| Operation       | Returns                                  | When to Use                                                    |
|-----------------|------------------------------------------|----------------------------------------------------------------|
| `.aggregate()`  | One value per group (per column)         | For summaries such as mean or sum                              |
| `.transform()`  | Same shape as the original, one per row  | To attach a group-level result to every row, e.g. a new column |
| `.filter()`     | Subset of the original rows              | To keep or drop entire groups based on a condition             |
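The difference between the three operations shows up in the length of what they return. A sketch on the `Team` and `Salary` columns, using the same `mean() > 80` condition as the `.filter()` example in this notebook:

```python
import pandas as pd

df = pd.DataFrame({
    "Team": ["A", "A", "B", "B", "C", "C", "D", "D"],
    "Salary": [85, 90, 78, 85, 92, 88, 75, 80],
})

grouped = df.groupby("Team")["Salary"]
agg_result = grouped.mean()                 # one value per team
trans_result = grouped.transform("mean")    # one value per original row
filt_result = df.groupby("Team").filter(lambda g: g["Salary"].mean() > 80)  # rows of passing teams

print(len(df), len(agg_result), len(trans_result), len(filt_result))  # 8 4 8 6
```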

## `.transform()` Example:

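```python
df["Team Avg"] = df.groupby("Team")["Salary"].transform("mean")
df
```

|   | Department | Team | Gender | Salary | Age | JoinDate   | Team Avg |
|---|------------|------|--------|--------|-----|------------|----------|
| 0 | HR         | A    | M      | 85     | 23  | 2020-01-10 | 87.5     |
| 1 | HR         | A    | F      | 90     | 25  | 2020-02-15 | 87.5     |
| 2 | IT         | B    | M      | 78     | 30  | 2021-03-20 | 81.5     |
| 3 | IT         | B    | F      | 85     | 22  | 2021-04-10 | 81.5     |
| 4 | Marketing  | C    | M      | 92     | 28  | 2020-05-30 | 90.0     |
| 5 | Marketing  | C    | F      | 88     | 26  | 2020-06-25 | 90.0     |
| 6 | Sales      | D    | M      | 75     | 21  | 2021-07-15 | 77.5     |
| 7 | Sales      | D    | F      | 80     | 27  | 2021-08-01 | 77.5     |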
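A common use of `.transform()` is comparing each row with its group. A sketch that computes how far each salary is from its team's average; the column name `Salary vs Team Avg` is hypothetical:

```python
import pandas as pd

df = pd.DataFrame({
    "Team": ["A", "A", "B", "B", "C", "C", "D", "D"],
    "Salary": [85, 90, 78, 85, 92, 88, 75, 80],
})

# transform("mean") is aligned with df's index, so row-wise subtraction works
team_avg = df.groupby("Team")["Salary"].transform("mean")
df["Salary vs Team Avg"] = df["Salary"] - team_avg
print(df)
```

Positive values mean the employee earns more than their team average.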


## `.filter()` Example:

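```python
df.groupby("Team").filter(lambda x: x["Salary"].mean() > 80)
```

|   | Department | Team | Gender | Salary | Age | JoinDate   | Team Avg |
|---|------------|------|--------|--------|-----|------------|----------|
| 0 | HR         | A    | M      | 85     | 23  | 2020-01-10 | 87.5     |
| 1 | HR         | A    | F      | 90     | 25  | 2020-02-15 | 87.5     |
| 2 | IT         | B    | M      | 78     | 30  | 2021-03-20 | 81.5     |
| 3 | IT         | B    | F      | 85     | 22  | 2021-04-10 | 81.5     |
| 4 | Marketing  | C    | M      | 92     | 28  | 2020-05-30 | 90.0     |
| 5 | Marketing  | C    | F      | 88     | 26  | 2020-06-25 | 90.0     |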
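`.filter()` keeps or drops whole groups, never individual rows. In this sketch the condition (every salary in the team is at least 80) is an illustrative assumption; team B is dropped entirely even though one of its salaries is 85:

```python
import pandas as pd

df = pd.DataFrame({
    "Team": ["A", "A", "B", "B", "C", "C", "D", "D"],
    "Salary": [85, 90, 78, 85, 92, 88, 75, 80],
})

# The lambda receives each team's sub-DataFrame and must return one bool
high_paid_teams = df.groupby("Team").filter(lambda g: (g["Salary"] >= 80).all())
print(high_paid_teams)
```

Only teams A and C pass, so rows 0, 1, 4 and 5 remain.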


# Summary
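- `.groupby()` splits a dataset into groups by category so you can summarize each group
- Use built-in aggregations such as `.mean()`, `.sum()`, and `.count()`, or `.agg()` for several or custom metrics at once
- `.transform()` returns one value per original row, so group results can be added back as columns
- `.filter()` keeps only the groups (with all their rows) that meet a condition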