# Feature Scaling

Different features in a dataset can have very different ranges. For example, "income" might range from 20,000 to 200,000, while "age" ranges from 18 to 70.
Many algorithms (such as logistic regression, k-nearest neighbors, and other distance-based or gradient descent-based models) can perform poorly when features are on very different scales, because features with large values dominate distance computations and gradient updates.
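To make the "large-valued features dominate" point concrete, here is a minimal sketch using the Euclidean distance. The three (income, age) points are made-up values chosen only for illustration.

```python
import numpy as np

# Hypothetical people described as (income, age); values are invented for illustration
a = np.array([50_000.0, 25.0])
b = np.array([51_000.0, 65.0])   # similar income, 40 years older
c = np.array([60_000.0, 26.0])   # different income, almost the same age

dist_ab = np.linalg.norm(a - b)  # Euclidean distance between a and b
dist_ac = np.linalg.norm(a - c)  # Euclidean distance between a and c

print("distance a-b:", dist_ab)
print("distance a-c:", dist_ac)
```

Even though `b` is 40 years older than `a`, it ends up roughly ten times closer to `a` than `c` does, because the income differences swamp the age differences.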

That's why feature scaling is an important preprocessing step.
I'll present two common methods: Min-Max scaling (0-1 scaling) and Standardization (normalizing to mean 0, std 1).

In [22]:
import numpy as np
import pandas as pd

np.random.seed(42)

# Create a dataset with 3 features, one of which has a much larger scale
X = np.random.rand(100, 3)
X[:, 0] = X[:, 0] * 10000
X[:, 1] = X[:, 1] + 10

print("Unscaled:")
print(X[:5])
print("1st feature: min", min(X[:, 0]), "max", max(X[:, 0]))
print("2nd feature: min", min(X[:, 1]), "max", max(X[:, 1]))
print("3rd feature: min", min(X[:, 2]), "max", max(X[:, 2]))

Unscaled:
[[3.74540119e+03 1.09507143e+01 7.31993942e-01]
 [5.98658484e+03 1.01560186e+01 1.55994520e-01]
 [5.80836122e+02 1.08661761e+01 6.01115012e-01]
 [7.08072578e+03 1.00205845e+01 9.69909852e-01]
 [8.32442641e+03 1.02123391e+01 1.81824967e-01]]
1st feature: min 55.22117123602399 max 9900.538501042633
2nd feature: min 10.005061583846219 max 10.985650454110601
3rd feature: min 0.006952130531190703 max 0.9699098521619943


## Min-max scaling

$x' = \frac{x - \min(x)}{\max(x) - \min(x)}$

In [24]:
X_minmax = (X-X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

print("Min-Max scaled:")
print(X_minmax[:5])

print("1st feature: min", min(X_minmax[:, 0]), "max", max(X_minmax[:, 0]))
print("2nd feature: min", min(X_minmax[:, 1]), "max", max(X_minmax[:, 1]))
print("3rd feature: min", min(X_minmax[:, 2]), "max", max(X_minmax[:, 2]))

Min-Max scaled:
[[0.37481575 0.96437228 0.75293213]
 [0.60245531 0.15394531 0.15477563]
 [0.0533873  0.87816065 0.61701866]
 [0.71358844 0.01583019 1.        ]
 [0.83991251 0.21138066 0.1815997 ]]
1st feature: min 0.0 max 1.0
2nd feature: min 0.0 max 1.0
3rd feature: min 0.0 max 1.0
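As a small worked example of min-max scaling, the sketch below scales three hypothetical income values by hand. The numbers are illustrative only: the smallest value maps to 0, the largest to 1, and everything else falls proportionally in between.

```python
import numpy as np

# Hypothetical income values, chosen so the arithmetic is easy to follow
incomes = np.array([20_000.0, 65_000.0, 200_000.0])

lo = incomes.min()   # 20,000
hi = incomes.max()   # 200,000

# (x - min) / (max - min), applied element-wise
incomes_minmax = (incomes - lo) / (hi - lo)

print(incomes_minmax)  # 65,000 maps to 45,000 / 180,000 = 0.25
```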


## Standardizing

$x'= \frac{x-\mu}{\sigma}$

In [27]:
X_standard = (X - X.mean(axis=0)) / X.std(axis=0)

print("Standardized:")
print(X_standard[:5])

print("1st feature: min:", min(X_standard[:, 0]), " max:", max(X_standard[:, 0]))
print("2nd feature: min:", min(X_standard[:, 1]), " max:", max(X_standard[:, 1]))
print("3rd feature: min:", min(X_standard[:, 2]), " max:", max(X_standard[:, 2]))

Standardized:
[[-0.35958026  1.4198383   0.82216804]
 [ 0.42811633 -1.15479213 -1.18711031]
 [-1.47181267  1.14595419  0.36561862]
 [ 0.81266807 -1.59356748  1.65209819]
 [ 1.24978474 -0.97232681 -1.09700508]]
1st feature: min: -1.6565476851937493  max: 1.8037322538398184
2nd feature: min: -1.6438581271334913  max: 1.5330233528481956
3rd feature: min: -1.7070199406136317  max: 1.652098194094035
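Unlike min-max scaling, standardization does not squeeze values into a fixed interval; it guarantees a mean of 0 and a standard deviation of 1. A minimal sketch with three hypothetical age values checks this. It uses the population standard deviation (NumPy's default, `ddof=0`), as in the cell above.

```python
import numpy as np

# Hypothetical age values, chosen so the arithmetic is easy to follow
ages = np.array([18.0, 44.0, 70.0])

mu = ages.mean()      # 44.0
sigma = ages.std()    # population standard deviation (ddof=0)

# (x - mu) / sigma, applied element-wise
ages_standard = (ages - mu) / sigma

print("standardized:", ages_standard)
print("mean:", ages_standard.mean(), "std:", ages_standard.std())
```

The standardized values are symmetric around 0 here because the ages are evenly spaced; the minimum and maximum depend on the data and are not fixed at particular values.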


# Example: Support Vector Machine Classification

Let's try SVM with unscaled and standardized data:

In [48]:
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import classification_report

# Make synthetic data with features on very different scales
X, y = make_classification(
    n_samples=1000,
    n_features=3,
    n_informative=3,
    n_redundant=0,
    random_state=42
)

# Artificially rescale two features: one to be huge, one to be tiny
X[:, 0] = X[:, 0] * 10000   # one dominant feature
X[:, 1] = X[:, 1] * 0.01   # one tiny feature

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)


In [43]:
from sklearn.svm import SVC

# SVM on unscaled data
svm_unscaled = SVC()
svm_unscaled.fit(X_train, y_train)
y_pred_unscaled = svm_unscaled.predict(X_test)

print("SVM (Unscaled):")
print(classification_report(y_test, y_pred_unscaled))

SVM (Unscaled):
              precision    recall  f1-score   support

           0       0.91      0.84      0.87       140
           1       0.81      0.89      0.85       110

    accuracy                           0.86       250
   macro avg       0.86      0.86      0.86       250
weighted avg       0.86      0.86      0.86       250



In [51]:
# Apply StandardScaler
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

svm_scaled = SVC()
svm_scaled.fit(X_train_scaled, y_train)
y_pred_scaled = svm_scaled.predict(X_test_scaled)

print("SVM (Standardized):")
print(classification_report(y_test, y_pred_scaled))

SVM (Standardized):
              precision    recall  f1-score   support

           0       0.96      0.91      0.93       140
           1       0.90      0.95      0.92       110

    accuracy                           0.93       250
   macro avg       0.93      0.93      0.93       250
weighted avg       0.93      0.93      0.93       250



## Conclusion

Scaling has a clear impact on SVM performance in this example. On unscaled data, the feature with the largest range dominates the model, and accuracy reaches only 0.86, with lower precision and recall for both classes.
After standardization, all features are on a comparable scale, and the SVM reaches an accuracy of 0.93 with similar F1-scores for both classes (0.93 and 0.92).

Key takeaway: SVMs (and other distance-based models) almost always require feature scaling to perform optimally.
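One way to make scaling hard to forget is to bundle the scaler and the classifier into a single scikit-learn pipeline. The sketch below assumes the same synthetic data setup as in the example above and uses `make_pipeline` from `sklearn.pipeline`; the variable names `model` and `pipeline_accuracy` are introduced here for illustration only.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Same synthetic setup as in the SVM example: features on very different scales
X, y = make_classification(
    n_samples=1000,
    n_features=3,
    n_informative=3,
    n_redundant=0,
    random_state=42
)
X[:, 0] = X[:, 0] * 10000   # one dominant feature
X[:, 1] = X[:, 1] * 0.01    # one tiny feature

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# The pipeline fits the scaler on the training data only,
# then applies the same transformation before every prediction
model = make_pipeline(StandardScaler(), SVC())
model.fit(X_train, y_train)

pipeline_accuracy = model.score(X_test, y_test)
print("SVM in pipeline, test accuracy:", pipeline_accuracy)
```

Because the scaler is fitted inside `model.fit`, the test set never influences the scaling statistics, which mirrors the `fit_transform` / `transform` split used in the manual version.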