# Topic 04 - Problem 05: Group-wise Normalization (Feature Scaling by Group)

---

## 1. About the Problem

In many real-world datasets, values need to be **normalized within groups**, not globally.

For example:
- Salary should be compared **within the same department**
- Scores should be scaled **per class or category**

In this problem, I will normalize employee salaries **within each department** using **z-score normalization**:

\[
z = \frac{x - \mu}{\sigma}
\]

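Here \(\mu\) and \(\sigma\) are the mean and standard deviation of the salaries in the employee's own department. The resulting feature tells a machine learning model how far a salary sits above or below its department average, which captures **relative standing inside a group** rather than absolute pay.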
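To make the formula concrete, here is a small hand calculation for the three IT salaries from the sample dataset used in the solution. It is only a sketch using the standard-library `statistics` module; the variable names are illustrative, and `stdev` is the sample standard deviation (dividing by n - 1), which matches what pandas does by default.

```python
from statistics import mean, stdev

# IT salaries taken from the sample dataset in this problem
it_salaries = [60000, 65000, 70000]

mu = mean(it_salaries)      # department mean: 65000
sigma = stdev(it_salaries)  # sample standard deviation (n - 1): 5000.0

# z = (x - mu) / sigma for every salary in the department
z_scores = [(x - mu) / sigma for x in it_salaries]
print(z_scores)  # [-1.0, 0.0, 1.0]
```

The lowest IT salary sits one standard deviation below the department mean, the middle one is exactly at the mean, and the highest is one standard deviation above it.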

---


## 2. Solution Code

```python
import pandas as pd

# Sample dataset
data = {
    "department": ["IT", "IT", "HR", "HR", "Finance", "Finance", "IT"],
    "salary": [60000, 65000, 48000, 52000, 78000, 75000, 70000],
}

df = pd.DataFrame(data)

# Group-wise normalization (z-score): subtract the department mean,
# then divide by the department standard deviation
df["salary_normalized"] = df.groupby("department")["salary"].transform(
    lambda x: (x - x.mean()) / x.std()
)

print(df)
```

Output:

```text
  department  salary  salary_normalized
0         IT   60000          -1.000000
1         IT   65000           0.000000
2         HR   48000          -0.707107
3         HR   52000           0.707107
4    Finance   78000           0.707107
5    Finance   75000          -0.707107
6         IT   70000           1.000000
```
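The lambda-based solution can also be written without a Python-level function, by broadcasting each department's mean and standard deviation back to the rows with the built-in aggregation names `"mean"` and `"std"`. This is a sketch of an equivalent formulation, not a replacement for the solution above; the names `grouped`, `group_mean`, and `group_std` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "department": ["IT", "IT", "HR", "HR", "Finance", "Finance", "IT"],
    "salary": [60000, 65000, 48000, 52000, 78000, 75000, 70000],
})

grouped = df.groupby("department")["salary"]

# Broadcast each department's mean and sample std back to every row
group_mean = grouped.transform("mean")
group_std = grouped.transform("std")

# Plain column arithmetic; the result matches the lambda version
df["salary_normalized"] = (df["salary"] - group_mean) / group_std
print(df)
```

Keeping the group mean and standard deviation as separate Series also makes them easy to inspect or reuse when debugging.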


---

## 3. Explanation (What is happening)

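- **groupby("department")["salary"]**  
  → Selects the salary column and splits it into one group per department

- **transform(...)**  
  → Applies the calculation to each group and returns a result with the same index and length as the original column

- **(x - x.mean()) / x.std()**  
  → Standardizes each salary against its own department's mean and standard deviation; pandas `std()` computes the sample standard deviation (`ddof=1`) by default

- The result is a new feature aligned row by row with the original DataFrame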
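Two details of the standard deviation step are worth checking on real data. pandas uses the sample standard deviation (`ddof=1`) by default, and a department with a single employee has no defined sample standard deviation, so its z-score comes out as `NaN`. The sketch below uses a hypothetical dataset with a one-person "Legal" department. Filling undefined scores with 0 is just one possible convention, assumed here for illustration.

```python
import pandas as pd

# Hypothetical data: "Legal" has a single employee
df = pd.DataFrame({
    "department": ["IT", "IT", "IT", "Legal"],
    "salary": [60000, 65000, 70000, 90000],
})

grouped = df.groupby("department")["salary"]
centered = df["salary"] - grouped.transform("mean")

# Sample standard deviation (ddof=1, the pandas default): NaN for one row
z_sample = centered / grouped.transform("std")

# Population standard deviation (ddof=0): 0 for one row, so 0 / 0 gives NaN
z_population = centered / grouped.transform(lambda x: x.std(ddof=0))

# Assumed convention: treat an undefined z-score as 0 (at the group mean)
df["z_sample"] = z_sample.fillna(0.0)
df["z_population"] = z_population.fillna(0.0)
print(df)
```

For the three IT rows the two variants differ only by a constant factor, so the choice mainly matters for small groups.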

This is different from `agg()`:
- `agg()` reduces each group to a single value (one row per department)
- `transform()` returns one value per original row, so row-level data is preserved
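The difference is easiest to see by comparing the shapes of the two results on the same sample data. This is a minimal sketch; the names `per_department` and `per_row` and the `department_mean` column are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "department": ["IT", "IT", "HR", "HR", "Finance", "Finance", "IT"],
    "salary": [60000, 65000, 48000, 52000, 78000, 75000, 70000],
})

grouped = df.groupby("department")["salary"]

# agg: one value per department, indexed by department name
per_department = grouped.agg("mean")
print(per_department.shape)  # (3,)

# transform: one value per original row, indexed like df
per_row = grouped.transform("mean")
print(per_row.shape)  # (7,)

# The transform result shares df's index, so it can be assigned as a column
df["department_mean"] = per_row
```

Because `per_row` is aligned with the original rows, it can be combined with other columns directly, which is what the z-score calculation relies on.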

---

## 4. Summary / Takeaways

By solving this problem, I learned:

1. How to normalize numerical data within groups
2. Why group-wise scaling is important for ML models
3. The power of `groupby().transform()` for feature engineering
4. How to create meaningful derived features

This technique is used in:
- Feature engineering
- Fair comparisons between groups with different scales
- Recommendation systems
- ML preprocessing pipelines

This problem clearly demonstrates **data science maturity** and is worth showcasing on GitHub.

Next, I will move toward **multi-level grouping and advanced aggregations**.
