In [None]:
import pandas as pd
from sklearn.preprocessing import StandardScaler, MinMaxScaler

df = pd.DataFrame({
    'Hours Study': [2, 3, 4, 5, 6],
    'Marks Scored': [50, 60, 70, 80, 90]
})
df

Unnamed: 0,Hours Study,Marks Scored
0,2,50
1,3,60
2,4,70
3,5,80
4,6,90


### Standard scaler

In [2]:
# StandardScaler is a scikit-learn preprocessing tool that standardizes numerical features.
# It transforms each feature (column) so that it has a mean of 0 and a standard deviation of 1.
# Formula: z = (x - mean) / std_dev, where std_dev is the population standard deviation (ddof=0).
# This helps algorithms that are sensitive to feature scale (e.g. SVM, PCA); it does not make the data normally distributed.
# scaler.fit_transform(df) computes each column's mean and standard deviation, then applies the transformation.
scaler = StandardScaler()
df_standardized = pd.DataFrame(scaler.fit_transform(df), columns=df.columns)
df_standardized

Unnamed: 0,Hours Study,Marks Scored
0,-1.414214,-1.414214
1,-0.707107,-0.707107
2,0.0,0.0
3,0.707107,0.707107
4,1.414214,1.414214
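
The formula in the cell above can be checked by hand. The following is a minimal sketch, assuming the same `df` as above; the names `manual`, `scaled`, and `matches` are introduced here only for illustration. It uses `ddof=0` because StandardScaler divides by the population standard deviation, whereas pandas defaults to the sample standard deviation (`ddof=1`).

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame({
    'Hours Study': [2, 3, 4, 5, 6],
    'Marks Scored': [50, 60, 70, 80, 90],
})

# Manual z-score per column, using the population standard deviation (ddof=0)
manual = (df - df.mean()) / df.std(ddof=0)

# Same transformation done by scikit-learn
scaled = StandardScaler().fit_transform(df)

# True when both results agree to floating-point precision
matches = np.allclose(manual.to_numpy(), scaled)
print(matches)
```

With the pandas default (`ddof=1`) the manual values would come out slightly smaller in magnitude and would not match the scaler output.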


### Min-max scaler

In [3]:
# MinMaxScaler is another data scaling technique from scikit-learn, used to transform features into a fixed range, usually [0, 1].
# It scales each feature individually such that the minimum value of the feature becomes 0 and the maximum value becomes 1.
# Formula: X_scaled = (X - X_min) / (X_max - X_min), where X is the original value, X_min is the minimum value of the feature, and X_max is the maximum value of the feature.
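# This is useful when values must be bounded or when the data is not Gaussian.
# Note: zeros are preserved only if a feature's minimum is 0, and MinMaxScaler does not accept sparse input (MaxAbsScaler is meant for sparse data).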
min_max_scaler = MinMaxScaler()
df_min_max_scaled = pd.DataFrame(min_max_scaler.fit_transform(df), columns=df.columns)
df_min_max_scaled

Unnamed: 0,Hours Study,Marks Scored
0,0.0,0.0
1,0.25,0.25
2,0.5,0.5
3,0.75,0.75
4,1.0,1.0
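
The min-max formula can be reproduced by hand in the same way. This is a small sketch under the assumption that `df` is the frame defined at the top; `manual_minmax`, `scaled_minmax`, `same`, and `scaled_sym` are illustrative names. It also shows the `feature_range` parameter, which rescales to an interval other than [0, 1].

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

df = pd.DataFrame({
    'Hours Study': [2, 3, 4, 5, 6],
    'Marks Scored': [50, 60, 70, 80, 90],
})

# Manual min-max scaling per column: (X - X_min) / (X_max - X_min)
manual_minmax = (df - df.min()) / (df.max() - df.min())

# Same transformation done by scikit-learn (default range [0, 1])
scaled_minmax = MinMaxScaler().fit_transform(df)
same = np.allclose(manual_minmax.to_numpy(), scaled_minmax)
print(same)

# feature_range maps the minimum to -1 and the maximum to 1 instead
scaled_sym = MinMaxScaler(feature_range=(-1, 1)).fit_transform(df)
print(scaled_sym[:, 0])
```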


| Feature / Point        | **StandardScaler**                                 | **MinMaxScaler**                                 |
| ---------------------- | -------------------------------------------------- | ------------------------------------------------ |
| What it does           | Scales data so mean = 0 and standard deviation = 1 | Scales data to a fixed range (default 0 to 1)    |
| Formula                | (x - mean) / standard deviation                    | (x - x_min) / (x_max - x_min)                    |
| Output Range           | Not fixed (can be negative or >1)                  | Fixed (usually 0 to 1, can change range)         |
| Effect on Distribution | Keeps shape but changes scale                      | Keeps shape but compresses data                  |
| Sensitive to Outliers  | Sensitive (outliers shift the mean and std)        | Highly sensitive (outliers set the min and max)  |
| Common Use Cases       | Linear/Logistic Regression, SVM, K-means, PCA      | Neural Networks, Gradient Descent-based models   |
| When to Use            | When features are on very different scales         | When data needs to be bounded (like input to NN) |
| Example Output         | Centered around 0                                  | Between 0 and 1                                  |
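
The outlier row of the table can be illustrated with a small experiment. This is only a sketch: the `hours` array is hypothetical data (the study hours from above plus one extreme value of 100), and the variable names are introduced here for illustration.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler

# Hypothetical data: the original study hours plus one extreme value
hours = np.array([[2], [3], [4], [5], [6], [100]], dtype=float)

# Scale the single feature with both scalers and flatten to 1-D for printing
minmax = MinMaxScaler().fit_transform(hours).ravel()
standard = StandardScaler().fit_transform(hours).ravel()

print(minmax.round(3))
print(standard.round(3))
```

With MinMaxScaler the outlier becomes the new maximum, so the five ordinary values are squeezed into roughly the bottom 4% of the [0, 1] range. StandardScaler is affected too, because the outlier pulls the mean and inflates the standard deviation. For data with heavy outliers, scikit-learn's `RobustScaler` (based on the median and interquartile range) is a common alternative.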