## Data Transformation 

The Standard Scaler standardizes each feature to a mean of 0 and a standard deviation of 1 by subtracting the column mean and dividing by the column standard deviation. It is commonly used in machine learning preprocessing for algorithms that are sensitive to feature scale (e.g., regularized linear regression, SVMs, PCA).

In [1]:
# import libraries
import pandas as pd
from sklearn.preprocessing import StandardScaler

In [2]:
# Example dataset
data = {
    "ID": [1, 2, 3, 4, 5],
    "Age": [25, 30, 35, 40, 45],
    "Years_of_Experience": [2, 5, 7, 10, 15],
    "Salary": [50000, 60000, 70000, 80000, 90000]
}

# Convert to pandas DataFrame
df = pd.DataFrame(data)

# Display the first few rows
print(df.head())

   ID  Age  Years_of_Experience  Salary
0   1   25                    2   50000
1   2   30                    5   60000
2   3   35                    7   70000
3   4   40                   10   80000
4   5   45                   15   90000


In [3]:
# instantiate the scaler
scaler = StandardScaler()

# fit the scaler on the data and transform it (returns a NumPy array)
scaled_array = scaler.fit_transform(df)

# convert the scaled array back into a pandas DataFrame
scaled_df = pd.DataFrame(scaled_array, columns=df.columns)
scaled_df.head()

         ID       Age  Years_of_Experience    Salary
0 -1.414214 -1.414214            -1.304772 -1.414214
1 -0.707107 -0.707107            -0.629890 -0.707107
2  0.000000  0.000000            -0.179969  0.000000
3  0.707107  0.707107             0.494913  0.707107
4  1.414214  1.414214             1.619717  1.414214
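
As a sanity check on the output above, the standardization can be reproduced by hand. The sketch below is illustrative only: it rebuilds the example columns, leaves out `ID` on the assumption that it is an identifier rather than a feature, and uses the population standard deviation (`ddof=0`), which is what `StandardScaler` uses.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Same example values as above; ID is omitted (assumed to be an identifier)
df = pd.DataFrame({
    "Age": [25, 30, 35, 40, 45],
    "Years_of_Experience": [2, 5, 7, 10, 15],
    "Salary": [50000, 60000, 70000, 80000, 90000],
})

# z = (x - mean) / std, with the population standard deviation (ddof=0)
manual = (df - df.mean()) / df.std(ddof=0)

# Compare against scikit-learn's result
sk = StandardScaler().fit_transform(df)
print(np.allclose(manual.to_numpy(), sk))

# Each scaled column should have mean ~0 and standard deviation ~1
print(manual.mean().round(6))
print(manual.std(ddof=0).round(6))
```

Note that pandas' `std()` defaults to the sample standard deviation (`ddof=1`), so leaving out `ddof=0` would give slightly different numbers.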


Min-max scaling is a data normalization technique that rescales each feature to a fixed range, typically 0 to 1, by subtracting the column minimum and dividing by the column range (max minus min). Putting features on a common scale keeps those with large ranges from dominating those with smaller ranges. Because the result depends on the observed minimum and maximum, it is sensitive to outliers.

In [4]:
# import and instantiate the scaler
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()

# fit the scaler on the data and transform it (returns a NumPy array)
scaled_array = scaler.fit_transform(df)

# convert the scaled array back into a pandas DataFrame
scaled_df = pd.DataFrame(scaled_array, columns=df.columns)
scaled_df.head()

     ID   Age  Years_of_Experience  Salary
0  0.00  0.00             0.000000    0.00
1  0.25  0.25             0.230769    0.25
2  0.50  0.50             0.384615    0.50
3  0.75  0.75             0.615385    0.75
4  1.00  1.00             1.000000    1.00
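
The min-max formula can be checked by hand in the same way. This is a minimal sketch that rebuilds the example columns (without `ID`, assumed to be an identifier) and also shows the `feature_range` parameter of `MinMaxScaler`, which selects a target interval other than the default (0, 1).

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Same example values as above; ID is omitted (assumed to be an identifier)
df = pd.DataFrame({
    "Age": [25, 30, 35, 40, 45],
    "Years_of_Experience": [2, 5, 7, 10, 15],
    "Salary": [50000, 60000, 70000, 80000, 90000],
})

# x_scaled = (x - min) / (max - min), computed per column
manual = (df - df.min()) / (df.max() - df.min())

# Compare against scikit-learn's result
sk = MinMaxScaler().fit_transform(df)
print(np.allclose(manual.to_numpy(), sk))

# feature_range rescales to a different interval, here [-1, 1]
wide = MinMaxScaler(feature_range=(-1, 1)).fit_transform(df)
print(wide.min(axis=0), wide.max(axis=0))
```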


The MaxAbsScaler is a normalization technique that scales each feature individually by dividing it by the maximum absolute value of that feature. It maps the data into the range [-1, 1] without shifting or centering it, so zero entries stay zero. This makes it particularly useful for sparse data, since sparsity is preserved. In the example below all values are positive, so the scaled values fall between 0 and 1.

In [5]:
# import the scaler
from sklearn.preprocessing import MaxAbsScaler

# Instantiate the MaxAbsScaler
scaler = MaxAbsScaler()

# Fit and transform the data
scaled_array = scaler.fit_transform(df)

# Convert the scaled array back to a pandas DataFrame
scaled_df = pd.DataFrame(scaled_array, columns=df.columns)

# Display the first few rows of the scaled DataFrame
print(scaled_df.head())


    ID       Age  Years_of_Experience    Salary
0  0.2  0.555556             0.133333  0.555556
1  0.4  0.666667             0.333333  0.666667
2  0.6  0.777778             0.466667  0.777778
3  0.8  0.888889             0.666667  0.888889
4  1.0  1.000000             1.000000  1.000000
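
Since the example dataset has no negative or zero values, it does not show the full [-1, 1] range or the preserved zeros. The sketch below uses a small made-up array (not part of the original dataset) to illustrate both, and checks the result against the formula x / max(|x|).

```python
import numpy as np
from sklearn.preprocessing import MaxAbsScaler

# Hypothetical data with negatives and zeros, chosen only for illustration
X = np.array([
    [-4.0,   0.0],
    [ 2.0,   0.0],
    [ 0.0,   5.0],
    [ 1.0, -10.0],
])

scaled = MaxAbsScaler().fit_transform(X)

# Manual version: divide each column by its maximum absolute value
manual = X / np.abs(X).max(axis=0)

print(scaled)
print(np.allclose(scaled, manual))

# Zeros stay zeros because nothing is subtracted from the data
print((X == 0).sum(), (scaled == 0).sum())
```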


The Robust Scaler is a preprocessing technique that scales features by subtracting the median and dividing by the interquartile range (IQR, the 75th percentile minus the 25th by default). Because the median and IQR are far less affected by extreme values than the mean and standard deviation, it is particularly useful for datasets with outliers.

In [6]:
# import and instantiate the scaler
from sklearn.preprocessing import RobustScaler
scaler = RobustScaler()

# fit the scaler on the data and transform it (returns a NumPy array)
scaled_array = scaler.fit_transform(df)

# convert the scaled array back into a pandas DataFrame
scaled_df = pd.DataFrame(scaled_array, columns=df.columns)
scaled_df.head()

    ID  Age  Years_of_Experience  Salary
0 -1.0 -1.0                 -1.0    -1.0
1 -0.5 -0.5                 -0.4    -0.5
2  0.0  0.0                  0.0     0.0
3  0.5  0.5                  0.6     0.5
4  1.0  1.0                  1.6     1.0
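
To make the outlier claim concrete, the sketch below first reproduces the robust scaling by hand, then replaces the largest salary with a made-up extreme value (900000, an assumption for illustration only). The rebuilt DataFrame omits `ID`, assumed to be an identifier.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import RobustScaler

# Same example values as above; ID is omitted (assumed to be an identifier)
df = pd.DataFrame({
    "Age": [25, 30, 35, 40, 45],
    "Years_of_Experience": [2, 5, 7, 10, 15],
    "Salary": [50000, 60000, 70000, 80000, 90000],
})

# x_scaled = (x - median) / IQR, where IQR = 75th - 25th percentile
iqr = df.quantile(0.75) - df.quantile(0.25)
manual = (df - df.median()) / iqr

sk = RobustScaler().fit_transform(df)
print(np.allclose(manual.to_numpy(), sk))

# Hypothetical outlier: the top salary becomes 900000
df_outlier = df.copy()
df_outlier.loc[4, "Salary"] = 900000
robust_outlier = RobustScaler().fit_transform(df_outlier)

# Median and IQR of Salary are unchanged, so the first four rows are too
print(robust_outlier[:, 2])
```

In this case the outlier changes neither the median nor the quartiles of `Salary`, so the first four scaled salaries keep the values shown in the output above and only the outlier itself lands far from the rest.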
