# Data Normalization


### 🔹 Data Normalization (Manual Scaling)
### 📌 What is it?

Normalization (here, min-max scaling) = rescaling numerical values into a common range, often [0, 1].

Mathematical Formula:

$$x' = \frac{x - \min(x)}{\max(x) - \min(x)}$$

So if your data = [10, 20, 30],
normalized values = [0.0, 0.5, 1.0].
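To check the formula on the small example above, here is a minimal sketch. The function name `min_max_normalize` is hypothetical, and returning zeros for a constant list is an assumption of this sketch, since the formula itself is undefined when max equals min.

```python
def min_max_normalize(values):
    """Rescale a list of numbers into [0, 1] using (x - min) / (max - min)."""
    lo = min(values)
    hi = max(values)
    span = hi - lo
    if span == 0:
        # Assumption: a constant list has no spread, so map every value to 0.0
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]


print(min_max_normalize([10, 20, 30]))  # [0.0, 0.5, 1.0]
```

The smallest value always maps to 0.0 and the largest to 1.0; everything else lands proportionally in between.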

### 📌 Where it’s used in DS/AI:

#### Machine Learning Preprocessing

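Many algorithms either compare distances between points (KNN) or learn feature weights with gradient descent (Logistic Regression), so the numeric scale of each feature matters.

Example: If one feature is “Salary” (₹50,000) and another is “Age” (25), the much larger salary values dominate any distance calculation, and age barely influences the result.

Normalization puts every feature on a level playing field.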
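The effect on distances can be sketched with three made-up people. The points `p`, `q`, `r` and the helper names are hypothetical; the min/max bounds are assumed values for illustration, not something a library provides.

```python
import math


def euclidean(a, b):
    """Straight-line distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def scale(point, mins, maxs):
    """Min-max scale one point using per-feature bounds."""
    return [(v - lo) / (hi - lo) for v, lo, hi in zip(point, mins, maxs)]


# Each point is [age, salary]
p = [25, 50000]
q = [60, 50500]  # very different age, almost the same salary
r = [26, 52000]  # almost the same age, slightly different salary

# Assumed per-feature bounds: age 22 to 60, salary 45000 to 120000
mins, maxs = [22, 45000], [60, 120000]

raw_pq, raw_pr = euclidean(p, q), euclidean(p, r)
scaled_pq = euclidean(scale(p, mins, maxs), scale(q, mins, maxs))
scaled_pr = euclidean(scale(p, mins, maxs), scale(r, mins, maxs))

print("raw:   ", raw_pq, raw_pr)        # q looks closer to p than r does
print("scaled:", scaled_pq, scaled_pr)  # r is now the closer neighbour
```

On the raw values, the 35-year age gap between `p` and `q` is swamped by a ₹2,000 salary difference; after scaling, both features contribute on comparable terms.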

#### Neural Networks

Inputs generally train best when they sit in a small range (usually -1 to 1 or 0 to 1).

Otherwise, large input values can saturate activations or produce very large gradient updates, which makes training unstable.

Normalization usually helps networks learn faster and converge more smoothly.
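For the -1 to 1 range mentioned above, the same min-max idea can be stretched to any target interval. This is a sketch only: `scale_to_range` is a hypothetical helper, and mapping a constant list to the midpoint of the target range is an assumption.

```python
def scale_to_range(values, new_min=-1.0, new_max=1.0):
    """Linearly map values so min(values) -> new_min and max(values) -> new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: with no spread, place every value at the midpoint
        return [(new_min + new_max) / 2 for _ in values]
    width = new_max - new_min
    return [new_min + (v - lo) * width / (hi - lo) for v in values]


print(scale_to_range([10, 20, 30]))            # [-1.0, 0.0, 1.0]
print(scale_to_range([10, 20, 30], 0.0, 1.0))  # [0.0, 0.5, 1.0]
```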

#### Clustering & Similarity (K-Means, Cosine Similarity)

If features aren’t normalized, clustering tends to group points by whichever feature has the largest numeric range, not by actual patterns.
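A quick way to see this for similarity measures is to compare cosine similarity before and after scaling. The two sample points and the bounds below are hypothetical values chosen for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def scale(point, mins, maxs):
    """Min-max scale one point using per-feature bounds."""
    return [(v - lo) / (hi - lo) for v, lo, hi in zip(point, mins, maxs)]


# Each point is [age, salary]; same salary, very different ages
young = [25, 50000]
older = [60, 50000]

# Assumed per-feature bounds: age 22 to 60, salary 45000 to 120000
mins, maxs = [22, 45000], [60, 120000]

raw_cos = cosine_similarity(young, older)
scaled_cos = cosine_similarity(scale(young, mins, maxs), scale(older, mins, maxs))

print("raw cosine similarity:   ", raw_cos)
print("scaled cosine similarity:", scaled_cos)
```

On the raw data the two vectors point in almost exactly the same direction because salary dwarfs age; once scaled, the age difference becomes visible in the similarity score.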

### 📌 Why it’s a great loop exercise:

Normally we’d use NumPy / Pandas (`.min()`, `.max()`, vectorized ops).

But implementing with only loops makes you understand:

- how to compute min & max,
- how scaling is applied element by element.

So this exercise bridges the gap between pure Python fundamentals and real ML preprocessing.
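For a strictly loops-only version of the two steps listed above, min and max can themselves be found with a single pass instead of the built-in `min()` / `max()`. This is a sketch; the helper names `find_min_max` and `normalize_with_loops` are hypothetical, and it assumes the list is non-empty and not constant.

```python
def find_min_max(values):
    """Return (smallest, largest) using one explicit loop."""
    if not values:
        raise ValueError("values must not be empty")
    smallest = largest = values[0]
    for v in values[1:]:
        if v < smallest:
            smallest = v
        elif v > largest:
            largest = v
    return smallest, largest


def normalize_with_loops(values):
    """Apply (x - min) / (max - min) element by element."""
    lo, hi = find_min_max(values)
    result = []
    for v in values:
        # Assumption: hi != lo, otherwise this division would fail
        result.append((v - lo) / (hi - lo))
    return result


ages = [25, 35, 45, 22, 60]
print(find_min_max(ages))                  # (22, 60)
print(normalize_with_loops([10, 20, 30]))  # [0.0, 0.5, 1.0]
```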

```python
# Sample dataset: a list of lists where each inner list is [age, salary]
data = [
    [25, 50000],
    [35, 80000],
    [45, 120000],
    [22, 45000],
    [60, 95000]
]

# --- Separate the data into individual columns (features) ---
ages = []
salaries = []

for row in data:
    ages.append(row[0])       # Extract age
    salaries.append(row[1])   # Extract salary

# --- Normalize each column independently ---
# Normalization formula: (value - min) / (max - min)
# Note: this assumes max != min for each column; a constant column
# would raise ZeroDivisionError and needs separate handling.

# Normalize ages
min_age = min(ages)
max_age = max(ages)
normalized_ages = []

for age in ages:
    norm_age = (age - min_age) / (max_age - min_age)
    normalized_ages.append(norm_age)

# Normalize salaries
min_salary = min(salaries)
max_salary = max(salaries)
normalized_salaries = []

for salary in salaries:
    norm_salary = (salary - min_salary) / (max_salary - min_salary)
    normalized_salaries.append(norm_salary)

# --- Recombine the normalized columns ---
normalized_data = []

for i in range(len(data)):
    normalized_data.append([normalized_ages[i], normalized_salaries[i]])

# --- Print the results ---
print("Original Data (Age, Salary):")
for row in data:
    print(row)

print("\n------------------------------------\n")

print("Normalized Data (Age, Salary):")
for row in normalized_data:
    print(row)
```


Output:

```text
Original Data (Age, Salary):
[25, 50000]
[35, 80000]
[45, 120000]
[22, 45000]
[60, 95000]

------------------------------------

Normalized Data (Age, Salary):
[0.07894736842105263, 0.06666666666666667]
[0.34210526315789475, 0.4666666666666667]
[0.6052631578947368, 1.0]
[0.0, 0.0]
[1.0, 0.6666666666666666]
```
