# Topic 04 - Problem 03: Custom Aggregations & Resetting Index

---

## 1. About the Problem

Sometimes built-in aggregation functions like `mean`, `sum`, or `max` are not enough.  
In real datasets, I may need **custom calculations**, such as:

- Salary range (max − min)
- Experience spread
- Combined metrics for feature creation
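
As a rough sketch of the first two items, a small named function can be reused for both the salary range and the experience spread. The helper name `value_range` and the output column names are my own choices for illustration; the data is the same sample used in the solution below.

```python
import pandas as pd

df = pd.DataFrame({
    "department": ["IT", "IT", "HR", "HR", "Finance", "Finance", "IT"],
    "salary": [60000, 65000, 48000, 52000, 78000, 75000, 70000],
    "experience": [3, 4, 2, 5, 8, 7, 6],
})

def value_range(values):
    """Return max minus min for one group's values (illustrative helper)."""
    return values.max() - values.min()

# Reuse the same custom function for two different columns
spread = df.groupby("department").agg(
    salary_range=("salary", value_range),
    experience_spread=("experience", value_range),
)
print(spread)
```

A named function behaves like a lambda here, but it can be documented, tested, and reused across columns.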

In this problem, I will:
1. Apply **custom aggregation functions**
2. Reset the grouped index into a clean tabular format

This is important because machine learning libraries usually expect a **flat table of columns**, while a grouped result keeps its group keys in the index.
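
Grouping by a single column gives a one-level index; a true multi-level index appears when grouping by more than one key. The sketch below assumes a hypothetical `senior` flag (not part of the original dataset) purely to create a second grouping key, then flattens the result with `reset_index()`.

```python
import pandas as pd

df = pd.DataFrame({
    "department": ["IT", "IT", "HR", "HR", "Finance", "Finance", "IT"],
    "salary": [60000, 65000, 48000, 52000, 78000, 75000, 70000],
    "experience": [3, 4, 2, 5, 8, 7, 6],
})

# Hypothetical second key, added only for this illustration
df["senior"] = df["experience"] >= 5

# Two grouping keys -> the result is indexed by a MultiIndex
grouped = df.groupby(["department", "senior"])["salary"].mean()
print(type(grouped.index).__name__)  # MultiIndex

# reset_index() turns both index levels into ordinary columns
flat = grouped.reset_index(name="avg_salary")
print(flat)
```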

---



## 2. Solution Code

```python
import pandas as pd

# Sample dataset
data = {
    "department": ["IT", "IT", "HR", "HR", "Finance", "Finance", "IT"],
    "salary": [60000, 65000, 48000, 52000, 78000, 75000, 70000],
    "experience": [3, 4, 2, 5, 8, 7, 6]
}

df = pd.DataFrame(data)

# Named aggregation: output_column=(input_column, function)
aggregated_data = df.groupby("department").agg(
    avg_salary=("salary", "mean"),
    salary_range=("salary", lambda x: x.max() - x.min()),  # custom: max - min
    avg_experience=("experience", "mean"),
)

# Move "department" from the index back into a regular column
aggregated_data = aggregated_data.reset_index()
print(aggregated_data)
```

Output:

```text
  department  avg_salary  salary_range  avg_experience
0    Finance     76500.0          3000        7.500000
1         HR     50000.0          4000        3.500000
2         IT     65000.0         10000        4.333333
```


---

## 3. Explanation (What is happening)

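- **groupby("department")**  
  → Groups employees by department

- **avg_salary**  
  → Mean salary per department

- **salary_range (custom lambda)**  
  → Difference between the highest and lowest salary in each department

- **avg_experience**  
  → Mean years of experience per department

- **reset_index()**  
  → Moves `department` from the index back into a regular column and restores the default 0, 1, 2 row index

The result is a flat table with one row per department, which is easier to export, visualize, or join to other data.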
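
To make the effect of `reset_index()` concrete, the sketch below inspects the index before and after the call, and also shows `as_index=False`, a `groupby` option that keeps the group key as a column from the start. Only `avg_salary` is computed to keep the example short.

```python
import pandas as pd

df = pd.DataFrame({
    "department": ["IT", "IT", "HR", "HR", "Finance", "Finance", "IT"],
    "salary": [60000, 65000, 48000, 52000, 78000, 75000, 70000],
    "experience": [3, 4, 2, 5, 8, 7, 6],
})

# Before resetting: "department" is the index, not a column
grouped = df.groupby("department").agg(avg_salary=("salary", "mean"))
print(grouped.index.name)     # department
print(list(grouped.columns))  # ['avg_salary']

# Option 1: reset the index after aggregating
by_reset = grouped.reset_index()

# Option 2: ask groupby not to use the key as the index at all
by_flag = df.groupby("department", as_index=False).agg(
    avg_salary=("salary", "mean")
)

print(list(by_reset.columns))  # ['department', 'avg_salary']
print(list(by_flag.columns))   # ['department', 'avg_salary']
```

Both options end with the same flat layout; which one to use is mostly a matter of preference.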

---

## 4. Summary / Takeaways

By solving this problem, I learned:

1. How to create **custom aggregation logic** using lambda functions
2. How to generate **business-relevant features** (like salary range)
3. Why `reset_index()` is useful after grouping: it turns the group keys back into regular columns
4. How aggregation helps in **feature engineering**
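
One way the aggregated table can feed feature engineering is to join it back onto the original rows, so each employee carries their department's statistics. This is only a sketch: the column names `dept_avg_salary`, `dept_salary_range`, and `salary_vs_dept_avg` are hypothetical names chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "department": ["IT", "IT", "HR", "HR", "Finance", "Finance", "IT"],
    "salary": [60000, 65000, 48000, 52000, 78000, 75000, 70000],
    "experience": [3, 4, 2, 5, 8, 7, 6],
})

# Department-level statistics as a flat table
dept_stats = df.groupby("department").agg(
    dept_avg_salary=("salary", "mean"),
    dept_salary_range=("salary", lambda s: s.max() - s.min()),
).reset_index()

# Attach the department statistics to every employee row
features = df.merge(dept_stats, on="department", how="left")

# Hypothetical derived feature: distance from the department average
features["salary_vs_dept_avg"] = features["salary"] - features["dept_avg_salary"]
print(features)
```

The merge only works this simply because `reset_index()` has already turned `department` into a regular column on the aggregated side.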

This type of aggregation is frequently used in:
- Feature engineering pipelines
- HR analytics
- Financial reporting
- ML preprocessing workflows

This problem exercises **intermediate Pandas skills and analytical thinking**, both of which are valuable for Data Science and ML roles.

Next, I will explore **group-wise transformations and filtering**.
