# Feature Scaling for Infrastructure Metrics

## Context
In SRE and observability, metrics live on vastly different scales. For instance, CPU utilization is typically a percentage in `[0, 100]`, while Network Bytes In can run into the millions or billions, roughly `[10^6, 10^9]`.

When using Machine Learning models that rely on distances or variances (like K-Means Clustering, PCA, or Support Vector Machines), features with larger numeric ranges dominate the result. To prevent an alert from firing *only* because of network traffic while ignoring CPU spikes, we must **scale** our features.
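
To make the dominance effect concrete, here is a minimal sketch that breaks a Euclidean distance into per-feature contributions. The two servers and their values are made up for illustration; they are not part of the dataset used later in this notebook.

```python
import numpy as np

# Two hypothetical servers: (CPU %, network bytes/sec). Illustrative values only.
server_a = np.array([20.0, 5_000_000.0])
server_b = np.array([95.0, 5_000_500.0])  # very different CPU, nearly identical traffic

# Per-feature squared differences that add up to the squared Euclidean distance
squared_diff = (server_a - server_b) ** 2
distance = np.sqrt(squared_diff.sum())

# Share of the squared distance contributed by each feature
contribution = squared_diff / squared_diff.sum()

print(f"Euclidean distance: {distance:.1f}")
print(f"CPU share of squared distance:     {contribution[0]:.1%}")
print(f"Network share of squared distance: {contribution[1]:.1%}")
```

A 75-point CPU jump contributes only a small fraction of the distance here, while a 0.01% wobble in network traffic accounts for almost all of it.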

## Objectives
- Load synthetic server telemetry with variables of drastically different scales.
- Implement and understand **StandardScaler** (Z-score normalization).
- Implement and understand **MinMaxScaler** (Normalization to a specific range).
- Visualize the effect of scaling on the data distribution.

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.model_selection import train_test_split

plt.style.use('ggplot')

### 1. Generating Disparate Telemetry Data
Let's create a dataset where Network Traffic dwarfs CPU usage.

In [None]:
np.random.seed(42)
n_samples = 500

# Feature 1: CPU Usage, mostly between 10% and 90%.
# A normal distribution is unbounded, so clip to the valid [0, 100] range.
cpu_usage = np.clip(np.random.normal(loc=50, scale=15, size=n_samples), 0, 100)

# Feature 2: Network Traffic (Bytes/sec) ranges in the millions
network_traffic = np.random.normal(loc=5_000_000, scale=1_000_000, size=n_samples)

df = pd.DataFrame({
    'CPU_Usage_pct': cpu_usage,
    'Network_Bytes_Sec': network_traffic
})

# Plotting the unscaled data
plt.figure(figsize=(8, 5))
sns.scatterplot(x='CPU_Usage_pct', y='Network_Bytes_Sec', data=df, alpha=0.7)
plt.title("Unscaled Server Metrics")
plt.xlabel("CPU Usage (%)")
plt.ylabel("Network Traffic (Bytes/sec)")
plt.show()

print("Summary statistics (notice the massive difference in scale):")
print(df.describe().round(2))

### 2. Standardization (`StandardScaler`)

Standardization (or Z-score normalization) transforms the data so that it has a **mean of 0** and a **standard deviation of 1**.
This is the go-to default scaling method for many ML algorithms (like Logistic Regression, SVMs, and PCA).

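Formula: $z = \frac{x - \mu}{\sigma}$, where $\mu$ and $\sigma$ are the mean and standard deviation of the feature, estimated from the training data.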
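
As a quick sanity check on the formula, the sketch below computes z-scores by hand with NumPy and compares them with `StandardScaler`. The five CPU readings are made-up values chosen only to keep the arithmetic easy to follow.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Small illustrative sample of CPU readings (made-up values), one feature column
x = np.array([[10.0], [30.0], [50.0], [70.0], [90.0]])

mu = x.mean(axis=0)
sigma = x.std(axis=0)  # NumPy defaults to ddof=0 (population std), as StandardScaler does
z_manual = (x - mu) / sigma

z_sklearn = StandardScaler().fit_transform(x)

print("Manual z-scores:", z_manual.ravel().round(3))
print("Matches StandardScaler:", np.allclose(z_manual, z_sklearn))
```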

In [None]:
# Split into train and test sets to prevent data leakage
X_train, X_test = train_test_split(df, test_size=0.3, random_state=42)

# Initialize StandardScaler
std_scaler = StandardScaler()

# Fit on training data AND transform it
X_train_std = pd.DataFrame(std_scaler.fit_transform(X_train), columns=X_train.columns)

# Transform test data based on the mean/std learned from training data
X_test_std = pd.DataFrame(std_scaler.transform(X_test), columns=X_test.columns)

print("Mean after standardization (should be ~0):")
print(X_train_std.mean().round(2))
print("\nStandard Deviation after standardization (should be ~1):")
# StandardScaler divides by the population std (ddof=0); pandas defaults to ddof=1
print(X_train_std.std(ddof=0).round(2))
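
The fit-on-train, transform-on-test pattern can be inspected directly. A fitted `StandardScaler` stores what it learned in `mean_` and `scale_`, and `transform` reuses those values on any new data. The sketch below uses its own small synthetic arrays (an assumption made for brevity) rather than the notebook's DataFrame, and also shows `inverse_transform`, which maps scaled values back to the original units.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
train = rng.normal(loc=50, scale=15, size=(200, 1))  # synthetic "CPU" training data
test = rng.normal(loc=50, scale=15, size=(50, 1))    # synthetic held-out data

scaler = StandardScaler().fit(train)  # statistics come from the training set only
print("Learned mean:", scaler.mean_, "learned std:", scaler.scale_)

# The test set is scaled with the TRAINING mean and std, not its own
test_scaled = scaler.transform(test)
expected = (test - train.mean(axis=0)) / train.std(axis=0)
print("Test scaled with training statistics:", np.allclose(test_scaled, expected))

# inverse_transform recovers the original units, e.g. for reporting in alerts
restored = scaler.inverse_transform(test_scaled)
print("Round trip recovers original values:", np.allclose(restored, test))
```

Because the test set is scaled with training statistics, its mean and standard deviation after scaling will be close to, but generally not exactly, 0 and 1.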

### 3. Normalization (`MinMaxScaler`)

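Min-Max scaling linearly rescales each feature into a fixed range, by default **0 to 1**.
This is often used for Neural Networks or when you need bounded values (e.g., image pixel intensities). Because the range is set by the observed minimum and maximum, the result is sensitive to outliers.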
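
For the default 0-to-1 range, the transformation is $x' = \frac{x - x_{min}}{x_{max} - x_{min}}$. The sketch below applies that formula by hand and compares it with `MinMaxScaler`; the five readings are made-up values for illustration.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Small illustrative sample of CPU readings (made-up values), one feature column
x = np.array([[10.0], [30.0], [50.0], [70.0], [90.0]])

x_min = x.min(axis=0)
x_max = x.max(axis=0)
x_manual = (x - x_min) / (x_max - x_min)

x_sklearn = MinMaxScaler().fit_transform(x)

print("Manual min-max values:", x_manual.ravel())
print("Matches MinMaxScaler:", np.allclose(x_manual, x_sklearn))
```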

In [None]:
# Initialize MinMaxScaler
minmax_scaler = MinMaxScaler()

X_train_minmax = pd.DataFrame(minmax_scaler.fit_transform(X_train), columns=X_train.columns)
X_test_minmax = pd.DataFrame(minmax_scaler.transform(X_test), columns=X_test.columns)

print("Min values after scaling (should be 0):")
print(X_train_minmax.min().round(2))
print("\nMax values after scaling (should be 1):")
print(X_train_minmax.max().round(2))
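
One caveat: the 0-to-1 guarantee holds only for the data the scaler was fitted on. Held-out or live values outside the training range land below 0 or above 1. The sketch below shows this with made-up numbers, along with the `clip=True` option of `MinMaxScaler` (assuming scikit-learn 0.24 or newer, where this parameter is available), which caps transformed values to the target range.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

train = np.array([[20.0], [40.0], [60.0], [80.0]])  # training range is [20, 80]
test = np.array([[10.0], [50.0], [95.0]])           # 10 and 95 fall outside it

# Default behaviour: out-of-range inputs map outside [0, 1]
scaled = MinMaxScaler().fit(train).transform(test)
print("Default:", scaled.ravel().round(3))

# clip=True caps transformed values at the feature_range bounds
clipped = MinMaxScaler(clip=True).fit(train).transform(test)
print("Clipped:", clipped.ravel().round(3))
```

Whether to clip is a design choice: clipping keeps inputs bounded for the model, but it also hides how far a new reading sits beyond anything seen in training, which may itself be the interesting signal.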

### 4. Visualizing the Difference
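Both scalers apply a linear transformation to each feature independently, so the shape of the point cloud stays the same while the axes (scales) change drastically. Let's plot all three versions side by side.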

In [None]:
fig, axes = plt.subplots(1, 3, figsize=(18, 5))

# 1. Original
sns.scatterplot(x='CPU_Usage_pct', y='Network_Bytes_Sec', data=X_train, ax=axes[0], alpha=0.7)
axes[0].set_title("Original Data")

# 2. Standardized
sns.scatterplot(x='CPU_Usage_pct', y='Network_Bytes_Sec', data=X_train_std, ax=axes[1], alpha=0.7, color='blue')
axes[1].set_title("StandardScaler (Mean=0, Std=1)")

# 3. Normalized
sns.scatterplot(x='CPU_Usage_pct', y='Network_Bytes_Sec', data=X_train_minmax, ax=axes[2], alpha=0.7, color='green')
axes[2].set_title("MinMaxScaler (0 to 1)")

plt.tight_layout()
plt.show()

# Notice that the geometric shape of the points is the same across all three plots.
# Only the X and Y axis scales differ: after scaling, both features span comparable
# ranges, so Network Traffic no longer overshadows CPU Usage.
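
To see what this means for a distance-based model, here is a hedged sketch that clusters a separate synthetic dataset with K-Means, once on raw values and once after standardization. The data is hypothetical and built so that the two server groups differ only in CPU usage, with network traffic as unrelated noise. The adjusted Rand index measures how well the clusters agree with the true groups (1 is perfect agreement, values near 0 are chance level).

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 100  # servers per group (hypothetical)

# Two groups that differ only in CPU usage; network traffic is unrelated noise
cpu = np.concatenate([rng.normal(20, 3, n), rng.normal(80, 3, n)])
net = rng.normal(5_000_000, 1_000_000, 2 * n)
X = np.column_stack([cpu, net])
true_groups = np.repeat([0, 1], n)


def cluster(features):
    """Run 2-cluster K-Means and return the predicted labels."""
    return KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(features)


ari_raw = adjusted_rand_score(true_groups, cluster(X))
ari_scaled = adjusted_rand_score(true_groups, cluster(StandardScaler().fit_transform(X)))

print(f"Agreement with the CPU groups, unscaled: {ari_raw:.2f}")
print(f"Agreement with the CPU groups, scaled:   {ari_scaled:.2f}")
```

On unscaled data, K-Means should split the servers along network traffic and largely miss the CPU groups; after standardization, the CPU separation becomes visible to the algorithm.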