P.S. Feature scaling: `StandardScaler` standardizes each feature to a mean of 0 and a variance of 1. In a machine learning workflow, fit the scaler on the training set only, then use it to transform both the training and test sets. Fitting it on the full dataset would leak information from the test set into the training process.
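A minimal sketch of that workflow. It assumes a hypothetical random feature matrix `X` (not from this notebook) and uses scikit-learn's `train_test_split` with an arbitrary 70/30 split:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical toy feature matrix: 10 rows, 2 features (salary-like, age-like)
rng = np.random.default_rng(0)
X = np.column_stack([
    rng.normal(60000, 8000, size=10),
    rng.normal(40, 5, size=10),
])

X_train, X_test = train_test_split(X, test_size=0.3, random_state=42)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn mean/std from the training rows only
X_test_scaled = scaler.transform(X_test)        # reuse the training statistics, no refit

print(X_train_scaled.mean(axis=0))  # approximately [0, 0]
print(X_test_scaled.mean(axis=0))   # usually close to, but not exactly, 0
```

Because `transform` reuses the mean and standard deviation learned from `X_train`, the scaled test set is not guaranteed to have exactly mean 0 and variance 1.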

Import required libraries 

In [1]:
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from sklearn.preprocessing import StandardScaler

Load data

In [2]:
data = pd.DataFrame({
    "people": ['A', 'B', 'C'], 
    "salary": [70000,60000,52000], 
    "age": [45,44,40]
})

Print data

In [3]:
print(data)

  people  salary  age
0      A   70000   45
1      B   60000   44
2      C   52000   40


Normalization

In [4]:
# Min-max normalization, column by column: (x - min) / (max - min)
c_1 = (data.iloc[:,1]-data.iloc[:,1].min())/(data.iloc[:,1].max()-data.iloc[:,1].min())  # salary
c_2 = (data.iloc[:,2]-data.iloc[:,2].min())/(data.iloc[:,2].max()-data.iloc[:,2].min())  # age

result = pd.DataFrame({
    "people": data["people"], 
    "salary": c_1, 
    "age": c_2
})

print(result)

  people    salary  age
0      A  1.000000  1.0
1      B  0.444444  0.8
2      C  0.000000  0.0
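The middle row can be checked by hand: for salary, (60000 - 52000) / (70000 - 52000) = 8000 / 18000 ≈ 0.444; for age, (44 - 40) / (45 - 40) = 0.8. A small plain-Python sketch of the same arithmetic (the helper `min_max` is hypothetical, written only for this check):

```python
# Min-max normalization: x' = (x - min) / (max - min)
salary = [70000, 60000, 52000]
age = [45, 44, 40]

def min_max(values):
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

print(min_max(salary))  # [1.0, 0.4444444444444444, 0.0]
print(min_max(age))     # [1.0, 0.8, 0.0]
```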


In [5]:
## Using a function
result = data.copy()  # copy, so that modifying result does not modify data
def normalize_column(column):
    """Min-max normalize a pandas Series to the range [0, 1]."""
    min_val = column.min()
    max_val = column.max()
    normalized_column = (column - min_val) / (max_val - min_val)
    return normalized_column

result["salary"] = normalize_column(data["salary"])
result["age"] = normalize_column(data["age"])


print(result)

  people    salary  age
0      A  1.000000  1.0
1      B  0.444444  0.8
2      C  0.000000  0.0
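`normalize_column` divides by `max_val - min_val`, which is zero when every value in a column is the same. A sketch of a more defensive variant, assuming a hypothetical helper `normalize_columns` and a hypothetical constant `bonus` column; mapping a constant column to 0.0 is a choice made here, not a standard rule:

```python
import pandas as pd

def normalize_columns(df, columns):
    """Return a copy of df with the given columns min-max normalized to [0, 1]."""
    out = df.copy()  # leave the caller's DataFrame untouched
    for col in columns:
        lo, hi = out[col].min(), out[col].max()
        span = hi - lo
        # Assumption: a constant column (span == 0) is mapped to 0.0 to avoid dividing by zero
        out[col] = 0.0 if span == 0 else (out[col] - lo) / span
    return out

df = pd.DataFrame({
    "people": ["A", "B", "C"],
    "salary": [70000, 60000, 52000],
    "age": [45, 44, 40],
    "bonus": [5000, 5000, 5000],  # hypothetical constant column
})
normalized = normalize_columns(df, ["salary", "age", "bonus"])
print(normalized)
```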


In [6]:
# Initialize MinMaxScaler
result = data.copy()  # copy, so that modifying result does not modify data
scaler = MinMaxScaler()

# Normalize the "salary" and "age" columns
result[["salary", "age"]] = scaler.fit_transform(data[["salary", "age"]])
print(result)

  people    salary  age
0      A  1.000000  1.0
1      B  0.444444  0.8
2      C  0.000000  0.0
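`MinMaxScaler` also accepts a `feature_range` argument for a target interval other than [0, 1], and its `inverse_transform` method maps scaled values back to the original units. A short sketch on the same salary and age values, with (-1, 1) chosen only as an example:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

df = pd.DataFrame({"salary": [70000, 60000, 52000], "age": [45, 44, 40]})

# feature_range sets the target interval of the scaled values
scaler = MinMaxScaler(feature_range=(-1, 1))
scaled = scaler.fit_transform(df)
print(scaled)

# inverse_transform undoes the scaling
restored = scaler.inverse_transform(scaled)
print(np.allclose(restored, df.to_numpy()))  # True
```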


Standardization

In [7]:
# Initialize StandardScaler
scaler = StandardScaler()

# Fit and transform the "salary" and "age" columns, keeping the column names
result = pd.DataFrame(scaler.fit_transform(data[['salary', 'age']]), columns=['salary', 'age'])

print(result)

     salary      age
0  1.267500  0.92582
1 -0.090536  0.46291
2 -1.176965 -1.38873
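After fitting, `StandardScaler` exposes the statistics it learned as `mean_` and `scale_`. This sketch reproduces the transform by hand and checks that `scale_` is the population standard deviation (NumPy's default `ddof=0`):

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame({"salary": [70000, 60000, 52000], "age": [45, 44, 40]})

scaler = StandardScaler().fit(df)
print(scaler.mean_)   # per-column means learned during fit
print(scaler.scale_)  # per-column standard deviations learned during fit

# Reproduce the transform manually: z = (x - mean) / std
manual = (df.to_numpy() - scaler.mean_) / scaler.scale_
print(np.allclose(manual, scaler.transform(df)))                 # True
print(np.allclose(scaler.scale_, df.to_numpy().std(axis=0)))     # True: ddof=0
```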


In [8]:
# OR

# Standardize with pandas. ddof=0 uses the population standard deviation,
# matching StandardScaler; pandas' default .std() uses ddof=1 (sample std)
cols = ['salary', 'age']
result = (data[cols] - data[cols].mean()) / data[cols].std(ddof=0)

print(result)

     salary      age
0  1.267500  0.92582
1 -0.090536  0.46291
2 -1.176965 -1.38873
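Dividing by the sample standard deviation (`ddof=1`) instead of the population standard deviation (`ddof=0`) changes every z-score by the same constant factor, sqrt((n - 1) / n). That is why pandas' default `.std()` gives smaller z-scores (in absolute value) than `StandardScaler` on the same data. A sketch on the salary column:

```python
import pandas as pd

salary = pd.Series([70000, 60000, 52000])

sample_std = salary.std()            # pandas default: ddof=1, divides by n - 1
population_std = salary.std(ddof=0)  # divides by n, like StandardScaler

z_sample = (salary - salary.mean()) / sample_std
z_population = (salary - salary.mean()) / population_std

# Every row's ratio z_sample / z_population equals the same constant
n = len(salary)
factor = ((n - 1) / n) ** 0.5
print(((z_sample / z_population) - factor).abs().max() < 1e-12)  # True
```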


Guidelines:

For algorithms that rely on distance calculations (e.g., k-nearest neighbors) or on gradient descent, normalization is often preferred because it maps every feature to the same fixed range, so no single feature dominates the others simply because of its units.
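To see why scale matters for distance-based methods, compare the Euclidean distance between persons A and C from the example data before and after min-max scaling. This sketch hard-codes the column minima and maxima of that data:

```python
import numpy as np

# Persons A and C from the example data: (salary, age)
a = np.array([70000.0, 45.0])
c = np.array([52000.0, 40.0])

raw_distance = np.linalg.norm(a - c)
print(raw_distance)  # about 18000: almost entirely the salary difference

# Min-max scale each feature using the example's column ranges
mins = np.array([52000.0, 40.0])
maxs = np.array([70000.0, 45.0])
a_scaled = (a - mins) / (maxs - mins)
c_scaled = (c - mins) / (maxs - mins)

scaled_distance = np.linalg.norm(a_scaled - c_scaled)
print(scaled_distance)  # sqrt(2): salary and age now contribute equally here
```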

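For algorithms that work best with zero-centered features of comparable variance, or whose behavior depends on the mean and variance of the features (e.g., regularized linear or logistic regression), standardization is usually more appropriate. Note that standardization does not make a feature normally distributed; it only shifts and rescales it.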
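Standardization changes a feature's location and scale but not the shape of its distribution. A sketch with a hypothetical right-skewed `income` series (not from this notebook) shows that the skewness is the same before and after:

```python
import pandas as pd

# Hypothetical right-skewed feature: most values are moderate, one is very large
income = pd.Series([30000, 32000, 35000, 38000, 40000, 45000, 150000], dtype=float)

# Standardize with the population standard deviation (as StandardScaler does)
z = (income - income.mean()) / income.std(ddof=0)

print(z.mean(), z.std(ddof=0))  # approximately 0 and 1
print(income.skew(), z.skew())  # equal up to floating-point error
```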

In many cases, trying both standardization and normalization and comparing the performance of your model can help determine which preprocessing technique works better for your specific problem.
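One way to run that comparison, sketched under assumptions: scikit-learn's bundled wine dataset, a k-nearest neighbors classifier, and 5-fold cross-validation are all illustrative choices, not part of the original notebook. Putting each scaler inside a pipeline keeps the scaling fitted on the training folds only:

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Bundled dataset whose features are on very different scales
X, y = load_wine(return_X_y=True)

results = {}
for name, scaler in [("normalization", MinMaxScaler()), ("standardization", StandardScaler())]:
    # The scaler is refit on each training fold, so the held-out fold
    # does not influence the scaling statistics
    model = make_pipeline(scaler, KNeighborsClassifier())
    scores = cross_val_score(model, X, y, cv=5)
    results[name] = scores.mean()
    print(f"{name}: mean accuracy {results[name]:.3f}")
```

Which scaler wins depends on the data and model, so the printed scores are only meaningful for this particular setup.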

Remember that the choice between standardization and normalization is not strictly black and white, and the optimal choice may depend on the characteristics of your data and the requirements of your chosen machine learning algorithm.

| Normalization                    | Standardization                                   |
|:-----------------------------------:|:---------------------------------------------------:|
| Scales each feature using its minimum and maximum values. | Scales each feature using its mean and standard deviation. |
| Useful when features are on different scales. | Useful when each feature should have a mean of 0 and a standard deviation of 1. |
| Scaled values fall in a fixed range, usually [0, 1] (or [-1, 1]). | Scaled values are not bounded to a fixed range. |
| Also known as min-max scaling. | Also known as Z-score normalization. |
| Useful when the feature distribution is unknown. | Useful when the feature distribution is approximately normal (Gaussian). |
