### Performing Feature Scaling

Many machine learning algorithms are sensitive to the scale of the variables. For example, the coefficients of linear models depend on the scale of the feature; that is, changing the feature scale will change the coefficient's value. In linear models, as well as in algorithms that depend on distance calculations such as clustering and principal component analysis, features with larger value ranges tend to dominate over features with smaller ranges. Therefore, having features on a similar scale allows us to compare feature importance and may help algorithms converge faster, which can improve performance and reduce training times.
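To make the point about coefficients concrete, here is a minimal sketch in plain Python. The toy data and the helper `ols_slope` are hypothetical and serve only as illustration: the same feature is expressed in kilometres and in metres, and the fitted slope changes by the same factor of 1,000 even though the underlying relationship is identical.

```python
from statistics import mean


def ols_slope(x, y):
    """Slope of a one-feature least-squares line: cov(x, y) / var(x)."""
    x_bar, y_bar = mean(x), mean(y)
    cov = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    var = sum((xi - x_bar) ** 2 for xi in x)
    return cov / var


# Hypothetical toy data: the target is exactly 2 * distance_km + 1
distance_km = [1.0, 2.0, 3.0, 4.0, 5.0]
target = [3.0, 5.0, 7.0, 9.0, 11.0]

# The same feature expressed in metres (values are 1,000 times larger)
distance_m = [d * 1000 for d in distance_km]

slope_km = ols_slope(distance_km, target)
slope_m = ols_slope(distance_m, target)

print(f"slope per kilometre: {slope_km}")
print(f"slope per metre:     {slope_m}")
```

The slope per metre is 1,000 times smaller than the slope per kilometre, so comparing raw coefficients across features measured in different units says little about their relative importance.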


We will cover the following scaling methods:

- Standardizing the features
- Scaling to the maximum and minimum values
- Scaling with the median and quantiles
- Performing mean normalization
- Implementing maximum absolute scaling
- Scaling to vector unit length

### Standardizing the features

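Standardization is the process of centering the variable at 0 and scaling its variance to 1. To standardize a feature, we subtract the mean from each observation and then divide the result by the standard deviation:

$$z = \frac{x - \mu}{\sigma}$$

Here, $\mu$ and $\sigma$ are the mean and standard deviation of the feature. Let's begin by importing the required libraries, loading the California housing dataset, and splitting it into train and test sets: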

In [1]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.datasets import fetch_california_housing
from sklearn.preprocessing import StandardScaler

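# Load the California housing data as a DataFrame (X) and a Series (y)
X, y = fetch_california_housing(return_X_y=True, as_frame=True)
# drop() returns a new DataFrame and the result is not assigned here, so X
# keeps Latitude and Longitude, which is why they appear in the scaled output
X.drop(labels=['Latitude','Longitude'], axis=1)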

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.3, random_state=0)

Next, we'll set up scikit-learn's StandardScaler() transformer and fit it to the train set so that it learns each variable's mean and standard deviation:

In [2]:
scaler = StandardScaler().set_output(transform='pandas')
scaler.fit(X_train)
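To see what fitting actually stores, the following sketch fits a separate scaler to a small hypothetical array (`X_toy` and `toy_scaler` are illustrative names, not part of the recipe) and compares the learned attributes `mean_` and `scale_` with values computed directly in NumPy. Note that scikit-learn uses the population standard deviation (`ddof=0`), whereas the pandas `std()` method defaults to the sample version (`ddof=1`).

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy training data: two features on very different scales
X_toy = np.array([
    [1.0, 100.0],
    [2.0, 300.0],
    [3.0, 500.0],
    [4.0, 700.0],
])

toy_scaler = StandardScaler()
toy_scaler.fit(X_toy)

# Per-feature mean learned during fit
print(toy_scaler.mean_)
# Per-feature standard deviation learned during fit
print(toy_scaler.scale_)

# The same quantities computed by hand; ddof=0 is the population
# standard deviation, which is what StandardScaler uses
manual_mean = X_toy.mean(axis=0)
manual_std = X_toy.std(axis=0, ddof=0)

print(np.allclose(toy_scaler.mean_, manual_mean))
print(np.allclose(toy_scaler.scale_, manual_std))
```

The same attributes are available on the `scaler` fitted to the train set above, with one entry per column.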

Now, let's standardize the train and test sets with the trained scaler:

In [4]:
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)
X_train_scaled.head()

| | MedInc | HouseAge | AveRooms | AveBedrms | Population | AveOccup | Latitude | Longitude |
|---|---|---|---|---|---|---|---|---|
| 1989 | -1.000304 | 1.85621 | -1.146823 | -0.871975 | -1.07187 | 0.259828 | 0.513964 | -0.111314 |
| 256 | -0.849386 | 1.141712 | -0.765855 | 0.197335 | -0.512516 | -0.003271 | 0.999935 | -1.317384 |
| 7887 | 1.286205 | -0.922393 | 0.461027 | -0.015158 | -0.033194 | 0.112104 | -0.822456 | 0.760844 |
| 4581 | -1.1352 | -0.922393 | -1.2701 | 0.223585 | 1.414342 | 0.001526 | -0.733673 | 0.641234 |
| 1993 | -0.870432 | 1.697432 | -0.350004 | 0.147772 | -0.712596 | 0.085648 | 0.513964 | -0.121281 |
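As a sanity check on what the transformation does, here is a minimal NumPy sketch of standardizing a single feature by hand. The values in `x_train` and `x_test` are hypothetical. The mean and standard deviation are estimated from the train values only and then reused for the test values, which mirrors fitting the scaler on the train set and transforming both sets.

```python
import numpy as np

# Hypothetical toy values for a single feature
x_train = np.array([2.0, 4.0, 6.0, 8.0])
x_test = np.array([5.0, 10.0])

# Parameters are estimated from the train set only
mu = x_train.mean()
sigma = x_train.std()  # population standard deviation (ddof=0)

# Subtract the mean, then divide by the standard deviation
z_train = (x_train - mu) / sigma
z_test = (x_test - mu) / sigma

# The standardized train values have mean 0 and standard deviation 1
print(z_train.mean(), z_train.std())
# The test values are expressed in train-set standard deviations from the train mean
print(z_test)
```

The standardized test values will not, in general, have a mean of exactly 0 or a standard deviation of exactly 1, because they are scaled with parameters learned from the train set.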
