# Feature Scaling
Feature scaling transforms the values of numerical features onto a similar scale or range, so that a feature's influence on the model does not depend on the units it happens to be measured in.<br>
This helps machine learning models perform better by preventing some features from dominating others just because of their larger values.<br><br>
**Why It's Important**<br>
Imagine you have a dataset with two features:<br>

Salary, ranging from 20,000 to 200,000.<br>

Age, ranging from 18 to 70.<br>

A model like a Support Vector Machine (SVM) or K-Nearest Neighbors (KNN) uses the distance between data points to make predictions. Without scaling, the "Salary" feature would dominate the distance calculation simply because its values are much larger than the "Age" values. This could lead to a biased model that incorrectly prioritizes one feature over another.
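
A quick numeric check of this claim follows. It uses three hypothetical people; their ages and salaries are made-up illustration values. It also treats the ranges above (Age 18 to 70, Salary 20,000 to 200,000) as the min and max for scaling. This is a minimal sketch using plain Euclidean distance, the kind KNN relies on:

```python
import math

# Hypothetical data points: (age, salary); values are made up for illustration
a = (25, 50_000)
b = (65, 51_000)   # 40 years older than A, similar salary
c = (26, 60_000)   # almost the same age as A, 10,000 more salary

# Raw Euclidean distances: the salary difference swamps the age difference
print(math.dist(a, b))  # about 1000.8
print(math.dist(a, c))  # about 10000.0

def min_max(point):
    """Scale (age, salary) to [0, 1] using the ranges given in the text."""
    age, salary = point
    return ((age - 18) / (70 - 18), (salary - 20_000) / (200_000 - 20_000))

# After scaling, both features contribute on a comparable scale
print(math.dist(min_max(a), min_max(b)))  # about 0.77
print(math.dist(min_max(a), min_max(c)))  # about 0.06
```

Without scaling, A looks ten times closer to B than to C, even though B is 40 years older. After scaling, C (one year apart, modestly higher salary) becomes A's nearest neighbor.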


# Common Methods of Feature Scaling

# 1. Normalization (Min-Max Scaling)
Transforms each feature's values into the range [0, 1].<br>
$$
X_{normalized} = \left( \frac{X - X_{min}}{X_{max} - X_{min}} \right)
$$
where $X$ is the feature value, $X_{min}$ is the minimum value of that feature in the dataset, and $X_{max}$ is its maximum value.<br>


In [43]:
import pandas as pd

df = pd.DataFrame({
    "Age": [25, 45, 35, 50, 23],
    "Income": [50000, 100000, 75000, 120000, 35000]
})
print(df)

   Age  Income
0   25   50000
1   45  100000
2   35   75000
3   50  120000
4   23   35000


Income values are in the tens of thousands, whereas Age values are in the tens. Models that rely on distances or magnitudes might give Income far more importance unless the features are scaled.

In [62]:
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()  # create a MinMaxScaler instance (default feature_range=(0, 1))

scaled_data = scaler.fit_transform(df)  # learns each column's min and max, then returns a 2D NumPy array
print(scaled_data, "\n")
df['Age'], df['Income'] = scaled_data[:, 0], scaled_data[:, 1]  # [:, 0] → all rows of column 0 (Age); [:, 1] → all rows of column 1 (Income)
print(df)

[[0.07407407 0.17647059]
 [0.81481481 0.76470588]
 [0.44444444 0.47058824]
 [1.         1.        ]
 [0.         0.        ]] 

        Age    Income
0  0.074074  0.176471
1  0.814815  0.764706
2  0.444444  0.470588
3  1.000000  1.000000
4  0.000000  0.000000
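
The same result can be computed directly from the formula with pandas, which makes the arithmetic explicit. The sketch below does this; the names `data` and `manual` are only for illustration. The data is recreated because the cell above overwrote `df` with scaled values:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Same values as the example above
data = pd.DataFrame({
    "Age": [25, 45, 35, 50, 23],
    "Income": [50000, 100000, 75000, 120000, 35000]
})

# Apply (X - X_min) / (X_max - X_min) to every column at once
manual = (data - data.min()) / (data.max() - data.min())
print(manual)

# Should agree with MinMaxScaler up to floating-point rounding
print(np.allclose(manual, MinMaxScaler().fit_transform(data)))  # True
```

In a real workflow, the scaler is usually fitted on the training data only. Its `transform` is then reused on validation or test data, so their minimum and maximum do not leak into preprocessing.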


**Drawback**<br>
i) Sensitive to outliers:<br>
>Example:<br>
Original Age data → [25, 35, 40, 30, 150]<br>

* Here, 150 is an outlier.<br>

After scaling, the four typical ages fall between 0 and 0.12, while the outlier alone reaches 1 → the differences within the majority are squeezed into a small part of the range.
<br>
<br>
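
The example above can be reproduced with scikit-learn's `MinMaxScaler`. This is a minimal sketch on the same five ages:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Age data from the example above; 150 is the outlier
ages = np.array([25, 35, 40, 30, 150]).reshape(-1, 1)  # scikit-learn expects a 2D array

scaled = MinMaxScaler().fit_transform(ages).ravel()
print(scaled)
```

The four typical ages map to about 0, 0.08, 0.12 and 0.04. Together they occupy only 12% of the [0, 1] range, while the single outlier takes the value 1.
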
**Common uses**<br>
It is mostly used with distance-based algorithms such as K-Nearest Neighbors (KNN), Support Vector Machines (SVM), and K-Means clustering.

# 2. Vector Normalization
A vector is just a list of numbers that represents features of a data point.<br>
For example:<br>

    "Age": [25, 45, 35, 50, 23],
    "Income": [50000, 100000, 75000, 120000, 35000]

Each row is such a vector; for example, the first row gives V = [25, 50000].<br>
This vector can be thought of as a point in 2D space (x=25, y=50000), or as an arrow pointing from the origin (0,0) to the point (25,50000).<br>
The direction of a vector is where it points in space — that is, the way the arrow is oriented.<br><br>
For example:<br>
[25, 50000] and [50, 100000] point in the same direction!
They just differ in how long the arrow is.

Its length (magnitude) is:
$$
\lVert V \rVert = \sqrt{25^2 + 50000^2} \approx 50000.006
$$

**What is Vector Normalization?**<br>

Vector normalization adjusts the values in a vector (one row of data, i.e. one sample) so that the vector’s length (or magnitude) becomes 1. Unlike the other scaling methods here, it works on each row rather than on each feature column.<br><br>
✔ It’s often used when the direction of the data matters more than its magnitude.<br>
✔ It is common wherever cosine similarity is used (for example, comparing documents in text mining), and it also appears in KNN and neural networks.<br>
$$
\lVert v \rVert = \sqrt{x_{1}^2 + x_{2}^2 + \dots + x_{n}^2}
$$
$$
x_{i,\text{normalized}} = \frac{x_i}{\lVert v \rVert}
$$
This way, the normalized vector has a length of 1 but keeps the same direction.
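
The two formulas can be sketched in a few lines of NumPy. The helper name `l2_normalize` is hypothetical, and it assumes the input is not the zero vector. The sketch also checks the earlier claim that [25, 50000] and [50, 100000] share a direction:

```python
import numpy as np

def l2_normalize(v):
    """Divide a (non-zero) vector by its Euclidean length so its length becomes 1."""
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v)

u1 = l2_normalize([25, 50000])
u2 = l2_normalize([50, 100000])   # same direction, twice as long

print(u1, np.linalg.norm(u1))     # the normalized vector has length 1 (up to rounding)
print(np.allclose(u1, u2))        # True: same direction gives the same unit vector
```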

In [101]:
import pandas as pd

df = pd.DataFrame({
    "Age": [25, 45, 35, 50, 23],
    "Income": [50000, 100000, 75000, 120000, 35000]
})
print(df)

   Age  Income
0   25   50000
1   45  100000
2   35   75000
3   50  120000
4   23   35000


In [107]:
from sklearn.preprocessing import Normalizer
normalizer = Normalizer()  # default norm='l2': each row (sample) is divided by its Euclidean length
df_normalized = normalizer.fit_transform(df)
print(df_normalized)

[[4.99999938e-04 9.99999875e-01]
 [4.49999954e-04 9.99999899e-01]
 [4.66666616e-04 9.99999891e-01]
 [4.16666630e-04 9.99999913e-01]
 [6.57142715e-04 9.99999784e-01]]


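NOTE: Vector normalization doesn’t change proportions. The ratio between the values within a row stays the same, and only the row's length is rescaled to 1. Because Income is so much larger than Age here, every normalized row is close to [0, 1]; the Income component dominates the direction.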

# 3. Mean Normalization

Mean normalization is a feature scaling technique that shifts and scales the data so that each feature has a mean of 0 and its values lie within [-1, 1].<br>
$$
X_{normalized} = \left( \frac{X - mean(X)}{X_{max} - X_{min}} \right)
$$
**Why is it used?**<br>

1. Centering the data helps many algorithms perform better, especially models trained with gradient descent such as linear regression and neural networks.

2. It prevents large-valued features from dominating models.

3. It is a linear transformation, so the ordering of values and their relative distances are preserved while the scale changes.

A negative value means the original value was below the mean; a positive value means it was above the mean.

The magnitude shows how far the value is from the mean relative to the feature's range ($X_{max} - X_{min}$).

It is sensitive to outliers, because a single extreme value shifts the mean and stretches the range.
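
The formula is a one-liner, so the sketch below applies it directly with pandas on the Age/Income data used earlier. The names `df_mean` and `mean_normalized` are illustrative:

```python
import pandas as pd

df_mean = pd.DataFrame({
    "Age": [25, 45, 35, 50, 23],
    "Income": [50000, 100000, 75000, 120000, 35000]
})

# (X - mean(X)) / (X_max - X_min), applied column by column
mean_normalized = (df_mean - df_mean.mean()) / (df_mean.max() - df_mean.min())
print(mean_normalized)

print(mean_normalized.mean())                          # about 0 for each column
print(mean_normalized.max() - mean_normalized.min())   # 1 for each column (up to rounding)
```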

# 4. Absolute Maximum Scaling
Absolute Maximum Scaling rescales each feature by dividing all values by the maximum absolute value of that feature. This ensures the feature values fall within the range of -1 to 1.<br>

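Like min-max scaling, it is sensitive to outliers: a single extreme value sets the divisor and pushes all other values toward 0, which makes it less suitable for noisy datasets.<br>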

$$
X_{scaled}=\frac{X_{i}}{\max_{j} |X_{j}|}
$$
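
The sketch below applies the formula by hand and with scikit-learn's `MaxAbsScaler`. The feature values are made up for illustration, and include negatives so the [-1, 1] range is visible:

```python
import numpy as np
from sklearn.preprocessing import MaxAbsScaler

# Hypothetical feature with negative and positive values (made-up numbers)
x = np.array([-10.0, 5.0, 20.0, -40.0]).reshape(-1, 1)

# Manual version of the formula: divide by the largest absolute value (40 here)
manual = x / np.abs(x).max()

# MaxAbsScaler applies the same per-feature division
scaled = MaxAbsScaler().fit_transform(x)

print(manual.ravel())               # values -0.25, 0.125, 0.5, -1.0
print(np.allclose(manual, scaled))  # True
```

Because this method only divides and never shifts, zeros stay zero and every value keeps its sign. That is why it suits data that is already centered at zero, or sparse data.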


# 5. Robust Scaling
Robust Scaling uses the median and interquartile range (IQR), which makes the transformation robust to outliers and skewed distributions.<br>

The median and IQR are robust statistics → they barely change when extreme values are present, which is why scikit-learn calls this scaler RobustScaler.<br>
$$
X_{Scaled} = \left( \frac{X - median(X)}{IQR} \right)
$$
Where:
* median(X) = the 50th percentile (middle value)<br>
* IQR = Q3 - Q1 (75th percentile – 25th percentile)
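
The formula can be checked by hand with NumPy on the same Income values used in the example that follows. As far as the defaults go, `RobustScaler` uses the 25th and 75th percentiles (its `quantile_range` parameter). This sketch assumes NumPy's default linear interpolation for percentiles:

```python
import numpy as np

# Income values; 1,200,000 is the outlier
income = np.array([50000, 60000, 55000, 58000, 1200000], dtype=float)

median = np.median(income)                # 58000
q1, q3 = np.percentile(income, [25, 75])  # 55000 and 60000
iqr = q3 - q1                             # 5000

print((income - median) / iqr)            # values -1.6, 0.4, -0.6, 0.0, 228.4
```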

In [40]:
import pandas as pd
from sklearn.preprocessing import RobustScaler
df = pd.DataFrame({
    "Income": [50000, 60000, 55000, 58000, 1200000]  #last value is an outlier
})

print(df)

    Income
0    50000
1    60000
2    55000
3    58000
4  1200000


In [42]:
scaler = RobustScaler()  # subtracts each column's median and divides by its IQR
df_scaled = scaler.fit_transform(df)
print(df_scaled)

[[ -1.6]
 [  0.4]
 [ -0.6]
 [  0. ]
 [228.4]]


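RobustScaler can help models learn better when outliers are present, because the outlier no longer sets the scale: the four typical incomes keep a spread of 2 units (from -1.6 to 0.4). Note that the outlier itself is not removed or clipped; it still maps to 228.4.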
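
To make "the outlier no longer sets the scale" concrete, the sketch below scales the same Income column with both `MinMaxScaler` and `RobustScaler`. It then compares how much room the four typical incomes get under each:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, RobustScaler

income = np.array([50000, 60000, 55000, 58000, 1200000], dtype=float).reshape(-1, 1)

minmax = MinMaxScaler().fit_transform(income).ravel()
robust = RobustScaler().fit_transform(income).ravel()

# Spread of the four typical incomes (first four rows, excluding the outlier)
print("MinMax spread:", minmax[:4].max() - minmax[:4].min())  # about 0.0087
print("Robust spread:", robust[:4].max() - robust[:4].min())  # about 2.0
```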

# 6. Standardization (Also known as Z-Score Normalization)

Standardization is a feature scaling technique where you rescale the data so that it has:<br>
* Mean = 0<br>
* Standard deviation = 1<br>
$$
X_{Scaled} = \frac{X_{i} - \mu}{\sigma}
$$
where $\mu$ is the mean and $\sigma$ is the standard deviation of the feature.

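✔ Does not squeeze values into a fixed range, but it is still sensitive to outliers, because the mean and standard deviation are both pulled by extreme values (see the Income column below)<br>
✔ Centers data and normalizes variance → often improves model performance, especially for gradient-based and distance-based models<br>
✔ Works best when the data is approximately normally distributed.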



In [65]:
import pandas as pd
from sklearn.preprocessing import StandardScaler
df = pd.DataFrame({
    "Age":[20,35,18,25,65],
    "Income": [50000, 60000, 55000, 58000, 1200000]  # outlier present
})
print(df)

   Age   Income
0   20    50000
1   35    60000
2   18    55000
3   25    58000
4   65  1200000


In [67]:
scaler = StandardScaler()  # subtracts each column's mean and divides by its standard deviation
df_scaled = scaler.fit_transform(df)
print(df_scaled)

[[-0.73107693 -0.51254893]
 [ 0.13925275 -0.49070115]
 [-0.84712088 -0.50162504]
 [-0.44096703 -0.49507071]
 [ 1.8799121   1.99994582]]


Each column now has mean 0 and standard deviation 1. Because standardization is a linear transformation, it does not change the shape of the distribution: skewed data stays skewed, and outliers stay outliers. In the Income column, the outlier pulls up the mean and inflates the standard deviation, so the four typical incomes all collapse to about -0.5 while the outlier sits near 2.0.<br>
Centers the data → the mean becomes 0; normalizes variance → each feature ends up with a variance of 1.<br>
Features measured in very different units (years vs. currency) end up on a comparable scale.
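
The sketch below checks these properties and reproduces `StandardScaler` by hand. It relies on the fact that scikit-learn divides by the population standard deviation (`ddof=0`), whereas pandas' `.std()` defaults to the sample standard deviation (`ddof=1`). The names `data`, `scaled` and `manual` are illustrative:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

data = pd.DataFrame({
    "Age": [20, 35, 18, 25, 65],
    "Income": [50000, 60000, 55000, 58000, 1200000]
})

scaled = StandardScaler().fit_transform(data)

print(scaled.mean(axis=0))  # about [0, 0]
print(scaled.std(axis=0))   # about [1, 1]

# Manual z-scores: pass ddof=0 so pandas uses the population standard deviation
manual = (data - data.mean()) / data.std(ddof=0)
print(np.allclose(manual, scaled))  # True
```

With pandas' default `ddof=1`, the manual result would differ slightly from `StandardScaler`. That mismatch is a common source of confusion when checking results by hand.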