# Topic 04 - Problem 06: Multiple Aggregations per Group

---

## 1. About the Problem

In real datasets, a single aggregation (such as the mean or the sum alone) is often not enough.  
For proper analysis and machine learning feature creation, I need **multiple statistics per group**.

In this problem, I will compute:
- **Mean salary**
- **Maximum salary**
- **Minimum salary**
- **Total number of employees**

for each department.

This kind of grouped summary is commonly used in:
- Exploratory Data Analysis (EDA)
- Feature engineering
- Business reporting

---


## 2. Solution Code

```python
import pandas as pd

# Sample dataset
data = {
    "department": ["IT", "IT", "HR", "HR", "Finance", "Finance", "IT"],
    "salary": [60000, 65000, 48000, 52000, 78000, 75000, 70000]
}

df = pd.DataFrame(data)

# Multiple aggregations per group
grouped_summary = df.groupby("department")["salary"].agg(
    mean_salary="mean",
    max_salary="max",
    min_salary="min",
    number_of_employees="count"
)

print(grouped_summary)
```

Output:

```
            mean_salary  max_salary  min_salary  number_of_employees
department
Finance         76500.0       78000       75000                    2
HR              50000.0       52000       48000                    2
IT              65000.0       70000       60000                    3
```


---

## 3. Explanation (What is happening)

- **groupby("department")**  
  → Splits data into groups based on department

- **["salary"]**  
  → Selects the salary column, so the aggregations run only on it

- **agg(...)**  
  → Computes multiple statistics at once
    - `mean` → average salary
    - `max` → highest salary
    - `min` → lowest salary
    - `count` → number of employees (counts non-null salary values)

- Named aggregations (`new_column_name="function"`) set each output column name directly, which keeps the output clean and readable
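
To make the points above concrete, here is a small sketch comparing the list form of `agg` with the named form. It also shows how `count` differs from `size` when a value is missing. Note that the missing salary (`NaN`) in the last row and the extra `number_of_rows` column are my own assumptions for illustration; they are not part of the dataset in the solution.

```python
import numpy as np
import pandas as pd

# Same sample data, except the last IT salary is set to NaN
# (a hypothetical missing value, added only for this illustration).
df = pd.DataFrame({
    "department": ["IT", "IT", "HR", "HR", "Finance", "Finance", "IT"],
    "salary": [60000, 65000, 48000, 52000, 78000, 75000, np.nan],
})

salary_by_dept = df.groupby("department")["salary"]

# List form: output columns are named after the functions themselves.
plain = salary_by_dept.agg(["mean", "max", "min", "count"])

# Named form: each keyword becomes the output column name.
named = salary_by_dept.agg(
    mean_salary="mean",
    number_of_employees="count",  # counts non-null salaries only
    number_of_rows="size",        # counts every row, including NaN
)

print(plain.columns.tolist())
print(named)
```

With the missing value, `count` and `size` disagree for the IT group. So if some salaries can be missing, `size` is probably the safer choice for a headcount.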

---

## 4. Summary / Takeaways

By solving this problem, I learned:

1. How to calculate multiple statistics per group efficiently
2. How grouped summaries help understand data distribution
3. How this output can directly become ML features
4. Why named aggregations improve clarity and usability
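
As a rough sketch of point 3, the grouped summary can be merged back onto the original rows so that every employee carries their department's statistics as extra columns. The `salary_vs_dept_mean` column is a hypothetical feature I made up for illustration; it is just one possible way to use the summary.

```python
import pandas as pd

df = pd.DataFrame({
    "department": ["IT", "IT", "HR", "HR", "Finance", "Finance", "IT"],
    "salary": [60000, 65000, 48000, 52000, 78000, 75000, 70000],
})

grouped_summary = df.groupby("department")["salary"].agg(
    mean_salary="mean",
    max_salary="max",
    min_salary="min",
    number_of_employees="count",
)

# Turn the department index back into a column, then attach the
# department-level statistics to every employee row.
features = df.merge(grouped_summary.reset_index(), on="department", how="left")

# Hypothetical derived feature: salary relative to the department mean.
features["salary_vs_dept_mean"] = features["salary"] / features["mean_salary"]

print(features)
```

The left merge keeps one row per employee in the original order, so the result has the same number of rows as `df` with the group statistics added as columns.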

This is a **core data science skill** and looks very strong on GitHub.

Next, I want to explore:
- Grouping by **multiple columns**
- Ranking values inside groups
- Time-based grouping

