The terms scaling, standardizing, and normalizing are often used interchangeably.

# 1. Standardizing Data

### Description

Also called:
* Z-score scaling
* Standard score scaling
* Z-score normalization.

Formula:

z = (x - u) / s

where z is the standardized value, x is the original value, u is the mean of the values, and s is the standard deviation of the values.
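
As a minimal sketch of the formula, the snippet below standardizes a small array by hand with NumPy. The sample values and variable names are illustrative assumptions, and `s` is taken to be the population standard deviation (`ddof=0`, NumPy's default).

```python
import numpy as np

# Hypothetical sample values, chosen only for illustration
x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

u = x.mean()        # mean of the values
s = x.std()         # population standard deviation (ddof=0)
z = (x - u) / s     # standardized values

print("u =", u)     # 5.0
print("s =", s)     # 2.0
print("z =", z)
```

After the transformation, `z` has a mean of zero and a standard deviation of one, up to floating-point rounding.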

### Usage

* Works best when each feature is roughly normally distributed. Standardization only shifts and rescales the values; it does not change the shape of the distribution, so for heavily skewed data other scaling or transformation techniques may be more effective.

* Outliers can have a significant impact because they distort both the mean and the standard deviation. It is important to handle outliers before standardizing the data.

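* Can magnify noise: a feature with very little variance is stretched to unit variance, so its small fluctuations are scaled up as well.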

* Usually unnecessary for tree-based algorithms (decision trees and random forests), which split on feature thresholds and are largely unaffected by the scale of a feature.
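
To illustrate the point about outliers, here is a small sketch under assumed data: the same five measurements are standardized with and without one extreme value. The arrays and the helper `standardize` are hypothetical and exist only for this illustration.

```python
import numpy as np

def standardize(x):
    """Return z-scores using the mean and population standard deviation."""
    return (x - x.mean()) / x.std()

# Hypothetical measurements; the second array adds one extreme value
clean = np.array([10.0, 11.0, 12.0, 13.0, 14.0])
with_outlier = np.append(clean, 100.0)

z_clean = standardize(clean)
z_outlier = standardize(with_outlier)

# Spread of the five ordinary points before and after the outlier is added
range_clean = z_clean.max() - z_clean.min()
range_outlier = z_outlier[:5].max() - z_outlier[:5].min()

print("Range without outlier:", range_clean)
print("Range with outlier:   ", range_outlier)
```

The single extreme value inflates the standard deviation, so the five ordinary points end up squeezed into a much narrower band of z-scores.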

### Example 1

In [3]:
from sklearn import datasets
from sklearn import preprocessing

# Load sample data
iris = datasets.load_iris()
X = iris.data

# Scale data
X_scaled = preprocessing.scale(X)

# Print mean and standard deviation of the input data
print("Mean of X:", X.mean(axis=0))
print("Standard deviation of X:", X.std(axis=0))

print()

# Print mean and standard deviation of the scaled data
print("Mean of scaled data:", X_scaled.mean(axis=0))
print("Standard deviation of scaled data:", X_scaled.std(axis=0))

Mean of X: [5.84333333 3.05733333 3.758      1.19933333]
Standard deviation of X: [0.82530129 0.43441097 1.75940407 0.75969263]

Mean of scaled data: [-1.69031455e-15 -1.84297022e-15 -1.69864123e-15 -1.40924309e-15]
Standard deviation of scaled data: [1. 1. 1. 1.]


### Example 2

In [12]:
from sklearn import datasets
from sklearn import preprocessing

# Load sample data
iris = datasets.load_iris()
X = iris.data

scaler = preprocessing.StandardScaler().fit(X)

# Print mean and standard deviation of the input data
print("Mean of X:", scaler.mean_)
print("Standard deviation of X:", scaler.scale_)

# Scale data
X_scaled = scaler.transform(X)
print()

# Print mean and standard deviation of the scaled data
print("Mean of scaled data:", X_scaled.mean(axis=0))
print("Standard deviation of scaled data:", X_scaled.std(axis=0))

Mean of X: [5.84333333 3.05733333 3.758      1.19933333]
Standard deviation of X: [0.82530129 0.43441097 1.75940407 0.75969263]

Mean of scaled data: [-1.69031455e-15 -1.84297022e-15 -1.69864123e-15 -1.40924309e-15]
Standard deviation of scaled data: [1. 1. 1. 1.]
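
One reason to keep a fitted `StandardScaler` object is that the statistics it learned can be reused on other data. The sketch below assumes a simple hold-out split (the 70/30 ratio and the random seed are arbitrary choices, not part of the example above): the scaler is fitted on the training rows only and then applied to both subsets.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X = datasets.load_iris().data

# Hold out part of the data; the split ratio and seed are arbitrary choices
X_train, X_test = train_test_split(X, test_size=0.3, random_state=0)

# Learn the mean and standard deviation from the training rows only
scaler = StandardScaler().fit(X_train)

# Apply the same training statistics to both subsets
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

print("Train mean:", X_train_scaled.mean(axis=0))  # approximately zero
print("Test mean: ", X_test_scaled.mean(axis=0))   # generally not exactly zero
```

Because the test rows are scaled with the training mean and standard deviation, their scaled mean and standard deviation are usually only close to 0 and 1.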


### Example 3

In [16]:
from sklearn import datasets
from scipy.stats import zscore

# Load sample data
iris = datasets.load_iris()
X = iris.data

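# Scale data (zscore standardizes each column by default)
X_scaled = zscore(X)

# Print mean and standard deviation of the input data
# (computed directly from X; no scaler object is defined in this cell)
print("Mean of X:", X.mean(axis=0))
print("Standard deviation of X:", X.std(axis=0))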

print()

# Print the mean and standard deviation of the scaled dataset
print("Mean of scaled data:", X_scaled.mean(axis=0))
print("Standard deviation of scaled data:", X_scaled.std(axis=0))

Mean of X: [5.84333333 3.05733333 3.758      1.19933333]
Standard deviation of X: [0.82530129 0.43441097 1.75940407 0.75969263]

Mean of scaled data: [-1.69031455e-15 -1.84297022e-15 -1.69864123e-15 -1.40924309e-15]
Standard deviation of scaled data: [1. 1. 1. 1.]
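
The three approaches above agree because each divides by the population standard deviation by default; `scipy.stats.zscore` exposes this choice through its `ddof` parameter. As a rough NumPy-only sketch with made-up values, this is what changes when the sample standard deviation (`ddof=1`) is used instead:

```python
import numpy as np

# Hypothetical sample values, chosen only for illustration
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

s_population = x.std(ddof=0)   # divides by n
s_sample = x.std(ddof=1)       # divides by n - 1

z_population = (x - x.mean()) / s_population
z_sample = (x - x.mean()) / s_sample

# Both checks below use NumPy's default (population) standard deviation
print("Std of z (ddof=0 scaling):", z_population.std())
print("Std of z (ddof=1 scaling):", z_sample.std())
```

With `ddof=1` the divisor is slightly larger, so the scaled values no longer have a population standard deviation of exactly 1. The difference shrinks as the number of samples grows.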


# 2. Scaling features to a range

# 3. Scaling sparse data

# 4. Scaling data with outliers