# Day 18: GroupBy and Aggregation in Pandas

---

## Objectives
- Understand what `groupby` is and why it is useful
- Learn how to aggregate data using functions like `sum`, `mean`, `count`, `min`, `max`
- Apply grouping on single or multiple columns
- Perform custom aggregation with `agg()`
- Explore hierarchical indexing after grouping

---

## 1. Load Dataset


In [None]:
import pandas as pd

# Load dataset
df = pd.read_csv('datasets/employee_data.csv')
print(df.head())
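
If `datasets/employee_data.csv` is not at hand, a small stand-in DataFrame can be used to follow along. The sketch below assumes only the three column names this notebook uses (`Department`, `City`, `Salary`); the rows themselves are made up for illustration and are not taken from the real file.

```python
import pandas as pd

# Hypothetical sample rows; only the column names come from this notebook.
sample_df = pd.DataFrame({
    "Department": ["HR", "HR", "IT", "IT", "IT", "Sales"],
    "City": ["Delhi", "Mumbai", "Delhi", "Delhi", "Mumbai", "Mumbai"],
    "Salary": [40000, 50000, 60000, 80000, 70000, 45000],
})

# Quick look at the data and the column types before aggregating
print(sample_df.head())
print(sample_df.dtypes)
```

Checking `dtypes` first is a useful habit: aggregations such as `mean` only make sense when `Salary` has been read as a numeric column.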


## 2. Basic Aggregation Functions


In [None]:
# Sum of a numeric column
print("Total Salary:", df['Salary'].sum())

# Mean, min, max
print("Average Salary:", df['Salary'].mean())
print("Minimum Salary:", df['Salary'].min())
print("Maximum Salary:", df['Salary'].max())

# Count of non-null values
print("Number of non-null salaries:", df['Salary'].count())
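
The separate calls above can also be combined into one. As a minimal sketch on made-up salary values, `Series.agg` accepts a list of function names and returns one result per name:

```python
import pandas as pd

# Hypothetical salary values, used only for illustration
salaries = pd.Series([40000, 50000, 60000, 80000, 70000, 45000], name="Salary")

# One call, several aggregations; the result is a Series indexed by function name
summary = salaries.agg(["sum", "mean", "min", "max", "count"])
print(summary)
```

The same list-of-names form is what `groupby(...).agg(...)` accepts later in this notebook.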


## 3. Grouping Data

### Group by a single column


In [None]:
# Group employees by Department and calculate average salary
dept_group = df.groupby('Department')['Salary'].mean()
print("Average Salary per Department:")
print(dept_group)
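
It can help to see what `groupby` does under the hood: it splits the rows by key, applies a function to each piece, and combines the results. The sketch below, on made-up rows, does the split and apply steps by hand and compares them with the one-line version.

```python
import pandas as pd

# Hypothetical rows for illustration
df_demo = pd.DataFrame({
    "Department": ["HR", "HR", "IT", "IT", "IT", "Sales"],
    "Salary": [40000, 50000, 60000, 80000, 70000, 45000],
})

# Split: iterating a GroupBy yields (key, sub-DataFrame) pairs
manual_means = {}
for dept, rows in df_demo.groupby("Department"):
    # Apply: compute the mean of each piece
    manual_means[dept] = rows["Salary"].mean()

# Combine: groupby does all three steps in one expression
grouped_means = df_demo.groupby("Department")["Salary"].mean()

print(manual_means)
print(grouped_means.sort_values(ascending=False))
```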


### Group by multiple columns


In [None]:
# Average Salary by Department and City
dept_city_group = df.groupby(['Department','City'])['Salary'].mean()
print("Average Salary by Department and City:")
print(dept_city_group)
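
Grouping by two columns produces a result with a hierarchical index (a `MultiIndex`) whose levels are the grouping columns. A minimal sketch on made-up rows shows how to select from it and how to pivot one level into columns:

```python
import pandas as pd

# Hypothetical rows for illustration
df_demo = pd.DataFrame({
    "Department": ["HR", "HR", "IT", "IT", "IT", "Sales"],
    "City": ["Delhi", "Mumbai", "Delhi", "Delhi", "Mumbai", "Mumbai"],
    "Salary": [40000, 50000, 60000, 80000, 70000, 45000],
})

by_dept_city = df_demo.groupby(["Department", "City"])["Salary"].mean()

# The index has two named levels
print(by_dept_city.index.names)

# Select every city within one department (outer level)
it_only = by_dept_city.loc["IT"]

# Select a single (Department, City) combination with a tuple
it_delhi = by_dept_city.loc[("IT", "Delhi")]

# Move the City level into columns; missing combinations become NaN
wide = by_dept_city.unstack("City")
print(wide)
```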


## 4. Aggregating with Multiple Functions


In [None]:
# Aggregate using multiple functions
agg_salary = df.groupby('Department')['Salary'].agg(['mean','sum','max','min'])
print("Salary aggregation by Department:")
print(agg_salary)
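
With a list of function names, the output columns are named after the functions. When more descriptive column names are wanted, pandas also supports named aggregation. The sketch below uses made-up rows, and the output names (`avg_salary`, `total_salary`, `salaries_recorded`) are chosen here for illustration.

```python
import pandas as pd

# Hypothetical rows for illustration
df_demo = pd.DataFrame({
    "Department": ["HR", "HR", "IT", "IT", "IT", "Sales"],
    "Salary": [40000, 50000, 60000, 80000, 70000, 45000],
})

# Named aggregation: output_name=(input_column, function)
summary = df_demo.groupby("Department").agg(
    avg_salary=("Salary", "mean"),
    total_salary=("Salary", "sum"),
    salaries_recorded=("Salary", "count"),  # counts non-null values only
)
print(summary)
```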


## 5. Custom Aggregation


In [None]:
# Define a custom function
def salary_range(x):
    return x.max() - x.min()

custom_agg = df.groupby('Department')['Salary'].agg(['mean','sum', salary_range])
print("Custom aggregation with salary range:")
print(custom_agg)
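
A custom aggregation function receives each group's values as a Series and must return a single value. To make that concrete, the sketch below (made-up rows) pulls out one group with `get_group`, calls the function on it directly, and checks that `agg` gives the same answer.

```python
import pandas as pd

# Hypothetical rows for illustration
df_demo = pd.DataFrame({
    "Department": ["HR", "HR", "IT", "IT", "IT", "Sales"],
    "Salary": [40000, 50000, 60000, 80000, 70000, 45000],
})

def salary_range(x):
    """Difference between the highest and lowest value in one group."""
    return x.max() - x.min()

grouped = df_demo.groupby("Department")["Salary"]

# The Series that salary_range would see for the IT group
it_salaries = grouped.get_group("IT")
manual_it_range = salary_range(it_salaries)

# The same function applied to every group
ranges = grouped.agg(salary_range)
print(ranges)
```

A group with a single row, such as Sales here, has a range of 0 because its maximum and minimum are the same value.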


## 6. Group Size and Count


In [None]:
# Count of employees in each Department
count_group = df.groupby('Department').size()
print("Number of employees per Department:")
print(count_group)

# Count of employees per Department and City
count_group2 = df.groupby(['Department','City']).size()
print("\nEmployees per Department and City:")
print(count_group2)
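
`size()` and `count()` are easy to confuse. `size()` counts rows in each group, while `count()` counts non-null values in a column. The sketch below uses made-up rows with one missing salary so the difference is visible.

```python
import numpy as np
import pandas as pd

# Hypothetical rows; one IT salary is deliberately missing
df_demo = pd.DataFrame({
    "Department": ["HR", "HR", "IT", "IT", "IT", "Sales"],
    "Salary": [40000, 50000, 60000, np.nan, 70000, 45000],
})

# Rows per group, regardless of missing values
sizes = df_demo.groupby("Department").size()

# Non-null Salary values per group
counts = df_demo.groupby("Department")["Salary"].count()

print(pd.DataFrame({"size": sizes, "count": counts}))
```

For a headcount, `size()` is usually the safer choice, since it does not depend on which column happens to have gaps.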


## 7. Resetting Index after Grouping


In [None]:
# reset_index() moves the group keys out of the index and into regular columns,
# turning the grouped Series into an ordinary DataFrame
dept_group_reset = dept_group.reset_index()
print(dept_group_reset.head())
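
Two related options avoid a separate reset step or give the value column a clearer name. This is a minimal sketch on made-up rows; the column name `Employees` is chosen here for illustration.

```python
import pandas as pd

# Hypothetical rows for illustration
df_demo = pd.DataFrame({
    "Department": ["HR", "HR", "IT", "IT", "IT", "Sales"],
    "City": ["Delhi", "Mumbai", "Delhi", "Delhi", "Mumbai", "Mumbai"],
    "Salary": [40000, 50000, 60000, 80000, 70000, 45000],
})

# as_index=False keeps the group key as a regular column from the start
flat_means = df_demo.groupby("Department", as_index=False)["Salary"].mean()
print(flat_means)

# size() returns an unnamed Series; reset_index(name=...) names the new column
headcount = (
    df_demo.groupby(["Department", "City"])
    .size()
    .reset_index(name="Employees")
)
print(headcount)
```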


## 8. Practice Exercises

1. Find total and average salary per Department.  
2. Find the maximum and minimum salary per City.  
3. Count the number of employees per Department.  
4. Calculate custom aggregation: salary range per Department.  
5. Group by Department and City, then calculate average Salary.  
6. Reset the index of a grouped result and display it.

---

# End of Day 18 notebook
