# GroupBy and Aggregation Operations

<!--
Author: RSK World
Website: https://rskworld.in
Email: help@rskworld.in
Phone: +91 93305 39277
Description: Comprehensive guide to GroupBy and aggregation operations in Pandas
-->

## Introduction

This notebook shows how to group the rows of a pandas DataFrame with `groupby` and summarize each group: built-in and custom aggregations, grouping by multiple columns, and `transform`.



In [None]:
import pandas as pd
import numpy as np

# Create sample DataFrame
df = pd.DataFrame({
    'Department': ['IT', 'HR', 'IT', 'HR', 'IT', 'Finance', 'IT', 'HR'],
    'Employee': ['Alice', 'Bob', 'Charlie', 'David', 'Eve', 'Frank', 'Grace', 'Henry'],
    'Salary': [50000, 60000, 70000, 55000, 65000, 80000, 58000, 62000],
    'Experience': [2, 5, 8, 3, 6, 12, 4, 7],
    'City': ['New York', 'London', 'Tokyo', 'Paris', 'Sydney', 'Berlin', 'Mumbai', 'London']
})

print("Sample DataFrame:")
print(df)



## Basic GroupBy Operations


In [None]:
# Group by a single column
grouped = df.groupby('Department')
print("=== Grouped by Department ===")
print(f"Number of groups: {len(grouped)}")
print(f"Groups: {list(grouped.groups.keys())}")

# Iterate through groups
for name, group in grouped:
    print(f"\n{name} Department:")
    print(group)
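
Looping over every group is often more than is needed. As a minimal sketch, assuming the same sample data as above (trimmed to three columns), `size()` reports how many rows each group holds and `get_group()` pulls out a single group as an ordinary DataFrame. The names `group_sizes` and `it_rows` are illustrative only.

```python
import pandas as pd

# Same sample data as the notebook's DataFrame, trimmed to the columns used here
df = pd.DataFrame({
    'Department': ['IT', 'HR', 'IT', 'HR', 'IT', 'Finance', 'IT', 'HR'],
    'Employee': ['Alice', 'Bob', 'Charlie', 'David', 'Eve', 'Frank', 'Grace', 'Henry'],
    'Salary': [50000, 60000, 70000, 55000, 65000, 80000, 58000, 62000],
})

grouped = df.groupby('Department')

# size() counts the rows in each group
group_sizes = grouped.size()
print(group_sizes)

# get_group() returns the rows of one group as a regular DataFrame
it_rows = grouped.get_group('IT')
print(it_rows)
```

Unlike `count()`, which counts non-null values per column, `size()` returns one number per group regardless of missing data.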



## Aggregation Functions


In [None]:
# Mean salary by department
print("=== Mean Salary by Department ===")
print(df.groupby('Department')['Salary'].mean())

# Multiple aggregations (std is NaN for Finance because that group has a single row)
print("\n=== Multiple Aggregations ===")
print(df.groupby('Department')['Salary'].agg(['mean', 'median', 'min', 'max', 'std', 'count']))

# Aggregation on multiple columns
print("\n=== Aggregations on Multiple Columns ===")
print(df.groupby('Department')[['Salary', 'Experience']].mean())
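
When different columns need different statistics, `agg` also accepts a dictionary that maps column names to lists of functions. The following is a small sketch on the same sample data; the variable name `summary` is an assumption for illustration.

```python
import pandas as pd

# Same sample data as the notebook's DataFrame, trimmed to the columns used here
df = pd.DataFrame({
    'Department': ['IT', 'HR', 'IT', 'HR', 'IT', 'Finance', 'IT', 'HR'],
    'Salary': [50000, 60000, 70000, 55000, 65000, 80000, 58000, 62000],
    'Experience': [2, 5, 8, 3, 6, 12, 4, 7],
})

# Map each column to the aggregations it should receive
summary = df.groupby('Department').agg({
    'Salary': ['mean', 'max'],
    'Experience': ['sum'],
})
print(summary)

# The result has two-level column labels: (column, function)
print(summary[('Salary', 'mean')])
```

The two-level column labels are the main trade-off of this form; named aggregations produce flat column names instead.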



## Custom Aggregation Functions


In [None]:
# Custom aggregation function
def salary_range(series):
    return series.max() - series.min()

print("=== Custom Aggregation: Salary Range ===")
print(df.groupby('Department')['Salary'].agg(salary_range))

# Multiple custom functions
print("\n=== Multiple Custom Functions ===")
print(df.groupby('Department')['Salary'].agg(['mean', salary_range]))

# Named aggregations
print("\n=== Named Aggregations ===")
result = df.groupby('Department').agg(
    avg_salary=('Salary', 'mean'),
    total_employees=('Employee', 'count'),
    max_experience=('Experience', 'max')
)
print(result)
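
Named aggregations are not limited to built-in function names. As a sketch under the same sample data, a callable such as a lambda can be supplied in place of the string, which combines the custom-function and named-aggregation approaches shown above. The output column name `salary_range` is chosen here for illustration.

```python
import pandas as pd

# Same sample data as the notebook's DataFrame, trimmed to the columns used here
df = pd.DataFrame({
    'Department': ['IT', 'HR', 'IT', 'HR', 'IT', 'Finance', 'IT', 'HR'],
    'Salary': [50000, 60000, 70000, 55000, 65000, 80000, 58000, 62000],
})

# Each keyword becomes an output column: name=(source column, function)
result = df.groupby('Department').agg(
    avg_salary=('Salary', 'mean'),
    salary_range=('Salary', lambda s: s.max() - s.min()),
)
print(result)
```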



## Grouping by Multiple Columns


In [None]:
# Group by multiple columns
print("=== Group by Department and City ===")
multi_group = df.groupby(['Department', 'City'])['Salary'].mean()
print(multi_group)

# Unstack moves City from the index into columns; Department/City pairs with no rows show as NaN
print("\n=== Unstacked Result ===")
print(multi_group.unstack())
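
Grouping by several columns yields a result indexed by a MultiIndex. If a flat table is easier to work with (for example, before merging or exporting), `reset_index` turns the group keys back into ordinary columns. This is a minimal sketch on the same sample data; the column name `Mean_Salary` is an assumption chosen for the example.

```python
import pandas as pd

# Same sample data as the notebook's DataFrame, trimmed to the columns used here
df = pd.DataFrame({
    'Department': ['IT', 'HR', 'IT', 'HR', 'IT', 'Finance', 'IT', 'HR'],
    'Salary': [50000, 60000, 70000, 55000, 65000, 80000, 58000, 62000],
    'City': ['New York', 'London', 'Tokyo', 'Paris', 'Sydney', 'Berlin', 'Mumbai', 'London'],
})

multi_group = df.groupby(['Department', 'City'])['Salary'].mean()

# reset_index converts the (Department, City) index levels into columns
flat = multi_group.reset_index(name='Mean_Salary')
print(flat)
```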



## Transform and Apply


In [None]:
# Transform - returns a Series aligned with the original rows (same index and length as df)
df['Salary_Mean_Dept'] = df.groupby('Department')['Salary'].transform('mean')
print("=== Adding Mean Salary by Department ===")
print(df[['Employee', 'Department', 'Salary', 'Salary_Mean_Dept']])

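# Transform with a custom function: z-score of each salary within its department
# (groups with a single row, such as Finance, get NaN because their std is NaN)
def normalize_salary(group):
    return (group - group.mean()) / group.std()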

df['Salary_Normalized'] = df.groupby('Department')['Salary'].transform(normalize_salary)
print("\n=== Normalized Salary by Department ===")
print(df[['Employee', 'Department', 'Salary', 'Salary_Normalized']])
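
The cells above use `transform`, which must return one value per input row. `apply` is the more general tool: the function receives each group and may return a result of a different shape. As a hedged sketch on the same sample data, the following keeps the two highest-paid employees in each department. The name `top_earners` is hypothetical, and the columns are selected explicitly so that the grouping column is not passed to the function.

```python
import pandas as pd

# Same sample data as the notebook's DataFrame, trimmed to the columns used here
df = pd.DataFrame({
    'Department': ['IT', 'HR', 'IT', 'HR', 'IT', 'Finance', 'IT', 'HR'],
    'Employee': ['Alice', 'Bob', 'Charlie', 'David', 'Eve', 'Frank', 'Grace', 'Henry'],
    'Salary': [50000, 60000, 70000, 55000, 65000, 80000, 58000, 62000],
})

# apply calls the function once per group; here each call returns up to two rows
top_earners = (
    df.groupby('Department')[['Employee', 'Salary']]
      .apply(lambda g: g.nlargest(2, 'Salary'))
)
print(top_earners)

# The department becomes the outer index level, so one group can be selected by name
print(top_earners.loc['IT'])
```

As a rule of thumb, prefer `agg` or `transform` when they fit, since `apply` is typically slower and its output shape depends on what the function returns.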

