# Feature Scaling

Feature scaling through standardization and normalization can be an important preprocessing step for many machine learning algorithms. Standardization rescales each feature so that it has a mean of zero and a standard deviation of one, the mean and standard deviation of a standard normal distribution; it shifts and stretches the data but does not change the shape of its distribution. Normalization, in this context, refers to removing the skew from the data (here, with a Box-Cox transformation).
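As a quick check of the definition of standardization, here is a minimal sketch on a small made-up array. Standardizing means computing `(x - mean) / std` for each column, and `StandardScaler` should produce the same result because it uses the population standard deviation (`ddof=0`). The last check shows that skewness is unchanged by standardization, which is why deskewing is treated as a separate step.

```python
import numpy as np
from scipy.stats import skew
from sklearn.preprocessing import StandardScaler

# Made-up data: 4 samples, 2 features on very different scales (both right-skewed)
X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0],
              [10.0, 400.0]])

# Standardize by hand: subtract each column's mean, divide by its population std
X_manual = (X - X.mean(axis=0)) / X.std(axis=0)

# The same transformation with scikit-learn
X_scaled = StandardScaler().fit_transform(X)

print(np.allclose(X_manual, X_scaled))              # True
print(X_scaled.mean(axis=0), X_scaled.std(axis=0))  # means ~0, standard deviations 1

# Skewness is invariant to shifting and positive rescaling, so it is unchanged
print(np.allclose(skew(X), skew(X_scaled)))         # True
```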

**NOTE:** Remember that "normalization" can mean many different things:

https://en.wikipedia.org/wiki/Normalization_(statistics)

Many algorithms (such as SVMs, k-nearest neighbors, and logistic regression) are sensitive to the scale of their input features, but Principal Component Analysis (PCA) is perhaps the most intuitive example of why scaling matters. PCA looks for the components that maximize variance. If one feature (e.g. human height) varies less than another (e.g. weight) only because of their respective units (meters vs. kilograms), PCA on unscaled data may conclude that the direction of maximal variance corresponds closely with the weight axis. Since a change in height of one meter is arguably far more significant than a change in weight of one kilogram, this is clearly incorrect.
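The height/weight argument can be sketched with synthetic data. The numbers below are hypothetical (heights with a standard deviation of about 0.1 m, weights of about 10 kg, positively correlated) and are not taken from the wine dataset. Without scaling, the first principal component points almost entirely along the weight axis; after `StandardScaler`, two positively correlated features get equal loadings, because the covariance matrix of standardized data has equal diagonal entries.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 500

# Hypothetical data: height in meters, weight in kilograms, positively correlated
height = rng.normal(1.70, 0.10, n)                          # std ~0.1 m
weight = 70 + 60 * (height - 1.70) + rng.normal(0, 8, n)    # std ~10 kg
X = np.column_stack([height, weight])

# First principal component with and without standardization
pc1_raw = PCA(n_components=1).fit(X).components_[0]
pc1_std = PCA(n_components=1).fit(StandardScaler().fit_transform(X)).components_[0]

print('unscaled loadings (height, weight):', np.abs(pc1_raw).round(3))  # dominated by weight
print('scaled loadings   (height, weight):', np.abs(pc1_std).round(3))  # roughly equal
```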

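To illustrate this, we run PCA on the wine dataset three ways: on unscaled data, after applying `StandardScaler`, and after deskewing with a Box-Cox transformation followed by `StandardScaler`. In each case the first two principal components are fed to a logistic regression classifier. We compare test accuracy and then plot the projected training data, where the difference is clearly visible.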

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.stats import boxcox
from sklearn import metrics
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
%matplotlib inline

In [None]:
np.random.seed(1)

In [None]:
CLF = LogisticRegression(C=1E-6)

In [None]:
wine_df = pd.read_csv('data/wine.csv', header=None)

In [None]:
y = wine_df[0]
X = wine_df.drop(0, axis=1)

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y)

## Compare the Performance of Three Models

### Train and Predict an Unscaled Classifier

In [None]:
from sklearn.base import clone  # give each model its own unfitted copy of CLF

unscaled_clf = make_pipeline(PCA(n_components=2),
                             clone(CLF))
unscaled_clf.fit(X_train, y_train)
pred_test = unscaled_clf.predict(X_test)

### Train and Predict a Scaled Classifier

In [None]:
from sklearn.base import clone  # give each model its own unfitted copy of CLF

std_clf = make_pipeline(StandardScaler(),
                        PCA(n_components=2),
                        clone(CLF))
std_clf.fit(X_train, y_train)
pred_test_std = std_clf.predict(X_test)

### Train and Predict a Deskewed, Scaled Classifier

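SciPy's `boxcox` is a plain function rather than an `sklearn` transformer, so here the deskewed model is built step by step instead of with `make_pipeline`. (scikit-learn 0.20 and later does provide `PowerTransformer(method='box-cox')`, which can be placed in a `Pipeline`.)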
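For comparison, a minimal sketch of how the deskewed model could be written as a single pipeline with scikit-learn's `PowerTransformer(method='box-cox')`. It assumes `sklearn.datasets.load_wine` stands in for `data/wine.csv` (the same UCI wine data, with classes labelled 0 to 2 instead of 1 to 3) and uses an arbitrary `random_state`, so its accuracy will not match the manual cells exactly. With `standardize=True` (the default), `PowerTransformer` also rescales its output to zero mean and unit variance, so it should stand in for both `boxcox` and `StandardScaler`.

```python
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PowerTransformer

# Assumption: load_wine is used here in place of data/wine.csv
X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# Box-Cox (one lambda per feature, fitted on the training data) + standardization,
# then PCA and the same heavily regularized classifier as in this notebook
dsk_clf = make_pipeline(PowerTransformer(method='box-cox', standardize=True),
                        PCA(n_components=2),
                        LogisticRegression(C=1E-6))
dsk_clf.fit(X_train, y_train)

print('{:.2%}'.format(dsk_clf.score(X_test, y_test)))
print(dsk_clf.named_steps['powertransformer'].lambdas_)  # fitted lambda per feature
```

Note that `method='box-cox'` only accepts strictly positive inputs; `method='yeo-johnson'` is the alternative for data containing zeros or negative values.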

In [None]:
from sklearn.base import clone  # independent, unfitted copy of the CLF template

sc = StandardScaler()
pc = PCA(n_components=2)
clf = clone(CLF)

In [None]:
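# Box-Cox each column separately. lambda is estimated on the training data only
# and then reused for the test data, so no information leaks from the test set.
X_tr_bc = pd.DataFrame(index=X_train.index)
X_ts_bc = pd.DataFrame(index=X_test.index)
for col in X_train.columns:
    box_cox_trans_tr, lmbda = boxcox(X_train[col])  # requires strictly positive values
    box_cox_trans_ts = boxcox(X_test[col], lmbda)
    X_tr_bc[col] = box_cox_trans_tr
    X_ts_bc[col] = box_cox_trans_ts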
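The loop above fits one λ per feature on the training set and reuses it on the test set. A minimal, self-contained sketch of that idea on a single hypothetical log-normal feature (synthetic data, not the wine dataset) shows the effect on skewness. For log-normal data the estimated λ should be close to 0, where Box-Cox reduces to a log transform.

```python
import numpy as np
from scipy.stats import boxcox, skew

rng = np.random.default_rng(42)

# Hypothetical right-skewed, strictly positive feature (log-normal)
train = rng.lognormal(mean=0.0, sigma=0.8, size=1000)
test = rng.lognormal(mean=0.0, sigma=0.8, size=200)

# Estimate lambda on the training data only...
train_bc, lmbda = boxcox(train)
# ...then apply that same lambda to the test data (returns only the transformed array)
test_bc = boxcox(test, lmbda)

print('lambda = {:.3f}'.format(lmbda))
print('train skew: {:.2f} -> {:.2f}'.format(skew(train), skew(train_bc)))
print('test skew:  {:.2f} -> {:.2f}'.format(skew(test), skew(test_bc)))
```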

In [None]:
X_tr_bc_sc = sc.fit_transform(X_tr_bc)
X_ts_bc_sc = sc.transform(X_ts_bc)

In [None]:
X_tr_bc_sc_pc = pc.fit_transform(X_tr_bc_sc)
X_ts_bc_sc_pc = pc.transform(X_ts_bc_sc)

In [None]:
clf.fit(X_tr_bc_sc_pc, y_train)
pred_test_dsk_std = clf.predict(X_ts_bc_sc_pc)

#### Prediction accuracy for the unscaled test dataset with PCA

In [None]:
print('{:.2%}\n'.format(metrics.accuracy_score(y_test, pred_test)))

#### Prediction accuracy for the standardized test dataset with PCA

In [None]:
print('{:.2%}\n'.format(metrics.accuracy_score(y_test, pred_test_std)))

#### Prediction accuracy for the deskewed, standardized test dataset with PCA

In [None]:
print('{:.2%}\n'.format(metrics.accuracy_score(y_test, pred_test_dsk_std)))

## Visualize the Three Principal Component Analyses

### Pull the PCA models from the Pipelines

In [None]:
pca = unscaled_clf.named_steps['pca']
pca_std = std_clf.named_steps['pca']
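To see why the unscaled projection looks so different, one can inspect each fitted PCA's `components_` (loadings) and `explained_variance_ratio_`. This self-contained sketch assumes `sklearn.datasets.load_wine` as a stand-in for `data/wine.csv` and fits on the full dataset rather than the training split. Without scaling, the first component is expected to be dominated by `proline`, whose values are in the hundreds to thousands while most other features are single digits.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Assumption: load_wine is used here in place of data/wine.csv
data = load_wine()
X, names = data.data, data.feature_names

pca_raw = PCA(n_components=2).fit(X)
pca_scaled = PCA(n_components=2).fit(StandardScaler().fit_transform(X))

for label, model in (('unscaled', pca_raw), ('standardized', pca_scaled)):
    top = np.argmax(np.abs(model.components_[0]))  # feature with the largest PC1 loading
    print('{:>12}: PC1 explains {:.1%} of variance; largest loading on {!r}'.format(
        label, model.explained_variance_ratio_[0], names[top]))
```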

### Perform the Data Transformations

In [None]:
scaler = std_clf.named_steps['standardscaler']
X_train_pca = pca.transform(X_train)
X_train_std_pca = pca_std.transform(scaler.transform(X_train))

### Visualize the unscaled, standardized, and deskewed-plus-standardized datasets after PCA

In [None]:
fig, axes = plt.subplots(ncols=3, figsize=(15, 6))

# One panel per preprocessing strategy: (projected training data, panel title)
panels = [
    (X_train_pca, 'Training dataset after PCA'),
    (X_train_std_pca, 'Standardized training dataset after PCA'),
    (X_tr_bc_sc_pc, 'Deskewed, standardized training dataset after PCA'),
]

labels = np.asarray(y_train)  # plain array for boolean masking of the NumPy projections
for ax, (X_proj, title) in zip(axes, panels):
    # One colour/marker per wine class (classes are labelled 1 to 3)
    for l, c, m in zip(range(1, 4), ('blue', 'red', 'green'), ('^', 's', 'o')):
        ax.scatter(X_proj[labels == l, 0], X_proj[labels == l, 1],
                   color=c, label='class %s' % l, alpha=0.5, marker=m)
    ax.set_title(title)
    ax.set_xlabel('1st principal component')
    ax.set_ylabel('2nd principal component')
    ax.legend(loc='upper right')
    ax.grid()

plt.tight_layout()