# Feature Scaling

## What is Feature Scaling?

Imagine we’re comparing two things:

1. The height of a person (in meters): e.g., 1.75, 1.80, 1.65.
2. The weight of a person (in kilograms): e.g., 70, 80, 90.

Now, height values are much smaller than weight values. If we give these numbers to a model that relies on distances or gradient magnitudes, it may treat weight as more important simply because its numbers are bigger. But the size of the numbers only reflects the units we picked, not how informative each feature is!

To fix this, we "scale" the features so that all the numbers are on the same level. This process is called **Feature Scaling**.


More formally:
> Feature scaling is the process of transforming numerical features so that they are on a comparable scale. This helps many machine learning algorithms work better by avoiding bias toward features with larger values.
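
A minimal sketch of this effect, using made-up heights and weights and assumed feature ranges (1.5 to 2.0 m and 50 to 100 kg) for a min-max step. The names `query`, `tall_person`, and `similar_person` are hypothetical and exist only for this illustration. On the raw numbers the Euclidean distance is driven almost entirely by weight; after scaling, the large height difference counts as well.

```python
import numpy as np

# Columns: height (m), weight (kg). All values are invented for illustration.
query = np.array([1.60, 70.0])
tall_person = np.array([1.95, 71.0])     # very different height, similar weight
similar_person = np.array([1.61, 74.0])  # nearly the same height, 4 kg heavier

def euclidean(a, b):
    """Plain Euclidean distance between two feature vectors."""
    return float(np.linalg.norm(a - b))

# Distances on the raw values: the weight column dominates.
raw_tall = euclidean(query, tall_person)
raw_similar = euclidean(query, similar_person)

# Assumed plausible ranges for each feature (not taken from real data).
mins = np.array([1.50, 50.0])
maxs = np.array([2.00, 100.0])

def min_max(x):
    """Rescale each feature to [0, 1] using the assumed ranges."""
    return (x - mins) / (maxs - mins)

# Distances after scaling: both features contribute on equal footing.
scaled_tall = euclidean(min_max(query), min_max(tall_person))
scaled_similar = euclidean(min_max(query), min_max(similar_person))

print(f"raw:    tall={raw_tall:.3f}, similar={raw_similar:.3f}")
print(f"scaled: tall={scaled_tall:.3f}, similar={scaled_similar:.3f}")
```

On the raw values the 35 cm taller person looks closer to the query than the person who is 1 cm taller; after scaling, the order flips.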


## Why Do We Need Feature Scaling?

1. Avoid bias: Features with larger numeric ranges can dominate distance calculations and gradient updates, even if they’re not more informative.

2. Faster learning: Optimizers based on Gradient Descent usually converge faster when features are on similar scales.

3. Better results: Distance metrics and regularization penalties weigh all features evenly when their scales are comparable.
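
One way to see the "faster learning" point is the condition number of $X^T X$, which for a least-squares loss governs how quickly plain gradient descent can converge (closer to 1 is better). The sketch below uses synthetic height and weight data, which is an assumption for illustration only, and omits the intercept column for brevity.

```python
import numpy as np

# Synthetic data (assumed for illustration): height in meters, weight in kg.
rng = np.random.default_rng(0)
height_m = rng.normal(1.70, 0.10, size=200)
weight_kg = rng.normal(70.0, 12.0, size=200)
X_raw = np.column_stack([height_m, weight_kg])

# Standardize each column: subtract its mean, divide by its standard deviation.
X_std = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0)

# For the loss 0.5 * ||Xw - y||^2 the Hessian is X^T X; its condition number
# bounds how fast gradient descent with a fixed step size can converge.
cond_raw = np.linalg.cond(X_raw.T @ X_raw)
cond_std = np.linalg.cond(X_std.T @ X_std)

print(f"condition number, raw features:          {cond_raw:.1f}")
print(f"condition number, standardized features: {cond_std:.2f}")
```

A much smaller condition number after scaling means the loss surface is closer to round, so a single learning rate works well in every direction.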

### Affected Algorithms (algorithms that work better with scaling)
  - KNN, K-Means, SVM, Logistic/Linear Regression (especially when regularized or trained with gradient descent), PCA, Neural Nets
### Unaffected Algorithms
  - Tree-based models (Decision Tree, Random Forest, XGBoost)
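
A rough way to check the tree-based claim is to fit the same decision tree on raw and on standardized features and compare predictions. The data and the labeling rule below are synthetic assumptions made up for this sketch; trees split on thresholds, so a monotonic rescaling of a feature should leave the resulting partitions unchanged.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic data (assumed): height in cm, weight in kg, and an arbitrary label rule.
rng = np.random.default_rng(42)
X = np.column_stack([rng.normal(170, 10, 100), rng.normal(70, 12, 100)])
y = (X[:, 0] + X[:, 1] > 240).astype(int)

X_scaled = StandardScaler().fit_transform(X)

# Same hyperparameters and random_state, different feature scales.
tree_raw = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
tree_scaled = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_scaled, y)

# Compare the predictions each tree makes on its own version of the data.
same_predictions = np.array_equal(tree_raw.predict(X), tree_scaled.predict(X_scaled))
print("Identical predictions with and without scaling:", same_predictions)
```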

## Types of Feature Scaling
1. **Standardization (Z-Score Normalization)**:
   - Converts data to have a mean of 0 and a standard deviation of 1.
   - Formula:  
     $$
     z = \frac{x - \text{mean}}{\text{standard deviation}}
     $$
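   - Use case: When data is roughly normally distributed or the algorithm expects zero-centered features. Outliers still shift the mean and inflate the standard deviation, but the result is not squeezed into a fixed range.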

2. **Normalization (Min-Max Scaling)**:
   - Scales data to a fixed range (usually 0 to 1).
   - Formula:  
     $$
     x_{\text{scaled}} = \frac{x - \text{min}}{\text{max} - \text{min}}
     $$
   - Use case: When you need all features in the same range (e.g., image data).
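
Both formulas can be applied by hand with NumPy. The sketch below uses five example heights in centimeters; note that `ndarray.std()` defaults to the population standard deviation (`ddof=0`), which is also what `StandardScaler` uses.

```python
import numpy as np

heights = np.array([150, 160, 170, 180, 190], dtype=float)

# Standardization: z = (x - mean) / standard deviation
z_scores = (heights - heights.mean()) / heights.std()  # population std (ddof=0)

# Min-max normalization: x_scaled = (x - min) / (max - min)
min_max = (heights - heights.min()) / (heights.max() - heights.min())

print("z-scores:", np.round(z_scores, 6))
print("min-max: ", min_max)
```

The z-scores come out symmetric around 0 (about ±1.414 and ±0.707), and the min-max values are evenly spaced from 0 to 1 in steps of 0.25.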

## Tools for Feature Scaling
Scikit-learn's `sklearn.preprocessing` module provides a scaler for each method:
- `StandardScaler`: For standardization.
- `MinMaxScaler`: For normalization.

## Example Workflow
1. Create your dataset.
2. Apply a scaler (`StandardScaler` or `MinMaxScaler`).
3. Transform the data and convert it back to a DataFrame for readability.
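
The three steps might look like the sketch below. It also shows one common practice that is an addition here rather than part of the steps above: fit the scaler once, then reuse the fitted scaler with `transform` on new rows so they are scaled with the same mean and standard deviation. The `new` row is hypothetical.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Step 1: create the dataset.
train = pd.DataFrame({"Height(cm)": [150, 160, 170, 180, 190],
                      "Weight(kg)": [50, 60, 70, 80, 90]})
new = pd.DataFrame({"Height(cm)": [175], "Weight(kg)": [65]})  # hypothetical unseen row

# Step 2: apply a scaler. fit_transform learns the mean/std and scales in one call.
scaler = StandardScaler()
train_array = scaler.fit_transform(train)

# Step 3: convert the NumPy result back to a DataFrame for readability.
train_scaled = pd.DataFrame(train_array, columns=train.columns, index=train.index)

# Reuse the already-fitted scaler on new data: transform only, no refitting.
new_scaled = pd.DataFrame(scaler.transform(new), columns=new.columns, index=new.index)

print(train_scaled)
print(new_scaled)
```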

## Key Takeaways
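- Standardization suits roughly normal data and has no fixed output range; outliers still influence its mean and standard deviation.
- Normalization is useful when you need a fixed range (e.g., 0 to 1).
- Scale your features before training scale-sensitive models (KNN, K-Means, SVM, PCA, Neural Nets); tree-based models do not need it.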
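
A small sketch of how a single outlier affects both methods, using made-up weights where 500 is an artificial outlier. Both scalers are linear rescalings, so the extreme value compresses the remaining points in either case; the difference is that min-max pins the outlier at 1 and pushes everything else toward 0, while z-scores stay unbounded.

```python
import numpy as np

# Made-up weights in kg; the last value is an artificial outlier.
weights = np.array([50, 60, 70, 80, 90, 500], dtype=float)

min_max = (weights - weights.min()) / (weights.max() - weights.min())
z_scores = (weights - weights.mean()) / weights.std()

# Look at the five ordinary values separately from the outlier.
inliers_min_max = min_max[:-1]
inliers_z = z_scores[:-1]

print("min-max, ordinary values:", np.round(inliers_min_max, 3))
print("min-max, outlier:        ", min_max[-1])
print("z-score, ordinary values:", np.round(inliers_z, 3))
print("z-score, outlier:        ", round(float(z_scores[-1]), 3))
```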

In [9]:
import pandas as pd
from sklearn.preprocessing import StandardScaler, MinMaxScaler

In [10]:
# Sample dataset
data = {"Height(cm)": [150, 160, 170, 180, 190], "Weight(kg)": [50, 60, 70, 80, 90]}
df = pd.DataFrame(data)
print("Original Data:")
print(df)

Original Data:
   Height(cm)  Weight(kg)
0         150          50
1         160          60
2         170          70
3         180          80
4         190          90


In [11]:
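# --------- Standard Scaler --------------
# Fit once on both columns; each column is standardized independently,
# and the fitted scaler stays usable for both features afterwards.
std_scaler = StandardScaler()
df[["Height_std", "Weight_std"]] = std_scaler.fit_transform(df[["Height(cm)", "Weight(kg)"]])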

In [12]:
# ----------- Min-Max Scaler ------------
# Fit once on both original columns; each column is rescaled to [0, 1] independently.
minmax_scaler = MinMaxScaler()
df[["Height_mm", "Weight_mm"]] = minmax_scaler.fit_transform(df[["Height(cm)", "Weight(kg)"]])

In [13]:
print("\nScaled Data:")
print(df)


Scaled Data:
   Height(cm)  Weight(kg)  Height_std  Weight_std  Height_mm  Weight_mm
0         150          50   -1.414214   -1.414214       0.00       0.00
1         160          60   -0.707107   -0.707107       0.25       0.25
2         170          70    0.000000    0.000000       0.50       0.50
3         180          80    0.707107    0.707107       0.75       0.75
4         190          90    1.414214    1.414214       1.00       1.00
