# Feature Scaling

Feature scaling is a way of transforming your data into a common range of values. There are two common scalings:

-  **Standardization**
For every value, subtract the mean and divide by the standard deviation.
This gives each feature zero mean and unit variance.

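-  **Normalization**
Rescale every value so that it lies between 0 and 1: subtract the feature's minimum, then divide by its range (maximum minus minimum). This is also called min-max scaling.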
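Written as formulas for a single feature column $x$, where $\mu$, $\sigma$, $x_{\min}$ and $x_{\max}$ are names introduced here for the column's mean, standard deviation, minimum and maximum:

$$
x_{\text{standard}} = \frac{x - \mu}{\sigma}, \qquad
x_{\text{normal}} = \frac{x - x_{\min}}{x_{\max} - x_{\min}}
$$

In both cases the whole difference in the numerator is divided, so the parentheses must survive when the formula is translated into code.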

Below are some examples. Suppose you have a dataframe ``df`` with two columns, ``height`` and ``weight``.
I will standardize the height and normalize the weight.


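```python
# Standardize: subtract the mean, then divide by the standard deviation.
# Note: pandas' .std() is the sample standard deviation (ddof=1).
df['height_standard'] = (df['height'] - df['height'].mean()) / df['height'].std()

# Normalize (min-max scaling): the minimum maps to 0 and the maximum to 1
df['weight_normal'] = (df['weight'] - df['weight'].min()) / (df['weight'].max() - df['weight'].min())
```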
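A quick way to check hand-written scaling formulas is to compare them with scikit-learn's `StandardScaler` and `MinMaxScaler` on a small frame. The values below are made up for illustration. One detail worth knowing: `StandardScaler` divides by the population standard deviation (`ddof=0`), while pandas' `.std()` defaults to the sample standard deviation (`ddof=1`), so `ddof=0` is passed here to make the two agree.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Hypothetical toy data; the column names follow the example above.
df = pd.DataFrame({
    "height": [150.0, 160.0, 170.0, 180.0, 190.0],
    "weight": [50.0, 60.0, 65.0, 80.0, 95.0],
})

# Standardize with pandas; ddof=0 matches StandardScaler's population std.
h = df["height"]
df["height_standard"] = (h - h.mean()) / h.std(ddof=0)

# Normalize (min-max scaling) to the range [0, 1].
w = df["weight"]
df["weight_normal"] = (w - w.min()) / (w.max() - w.min())

# The same transformations with scikit-learn, which expects 2-D input.
height_sk = StandardScaler().fit_transform(df[["height"]]).ravel()
weight_sk = MinMaxScaler().fit_transform(df[["weight"]]).ravel()

print(np.allclose(df["height_standard"], height_sk))  # True
print(np.allclose(df["weight_normal"], weight_sk))    # True
print(df["weight_normal"].min(), df["weight_normal"].max())  # 0.0 1.0
```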

## When Should I Use Feature Scaling?

In many machine learning algorithms, the result will change depending on the units of your data. This is especially true in two specific cases:

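-  **When your algorithm uses a distance-based metric to predict.**

Examples include support vector machines and k-nearest neighbours. A feature measured in large units contributes much more to the distance than one measured in small units, so it dominates the result.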

-  **When you incorporate regularization.**

Regularization behaves differently for different scalings. Suppose you fit a linear model $y = w_1 x_1 + w_2 x_2$ with an $\ell_2$ penalty, where $x_2$ is an area measured in m². Because the penalty $\lambda (w_1^2 + w_2^2)$ grows quadratically, $\ell_2$ regularization pushes larger weights towards zero more strongly than smaller ones. Now suppose you obtain optimal values of $w_1$ and $w_2$ from your unnormalized data matrix $X$, and then change the unit of area from m² to ft². The corresponding column of $X$ is multiplied by about 10.76 (1 m² ≈ 10.76 ft²), so you would expect the optimal $w_2$ to shrink by the same factor to keep the predicted $y$ unchanged. But a smaller coefficient is penalized less, so $\ell_2$ regularization now has a weaker effect on $w_2$, and you end up with a larger $w_2$ than that simple rescaling predicts. This does not make sense: you only changed the units, not the information content of the data, so the fitted model and its predictions should not have changed. Standardizing the features before fitting removes this dependence on units.
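To make the distance-based case concrete, here is a hand-written nearest-neighbour lookup, the kind of Euclidean distance comparison k-nearest neighbours relies on. The houses (number of rooms and area) and the helper `nearest` are hypothetical, chosen only to show that the nearest neighbour can flip when the unit of area changes, and that standardizing each column first makes the answer unit-independent.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

SQFT_PER_SQM = 10.7639  # 1 m^2 is about 10.76 ft^2

# Hypothetical houses, columns = [rooms, area in m^2].
# Row 0 is the query; rows 1-3 are the candidate neighbours.
X_m2 = np.array([
    [3.0, 100.0],  # query
    [3.0, 101.5],  # candidate 1: same rooms, 1.5 m^2 larger
    [5.0, 100.0],  # candidate 2: two more rooms, same area
    [4.0, 130.0],  # candidate 3: one more room, 30 m^2 larger
])
# The same houses with area expressed in ft^2.
X_ft2 = X_m2 * np.array([1.0, SQFT_PER_SQM])


def nearest(X):
    """Return the index (1-3) of the candidate closest to row 0 (Euclidean distance)."""
    dists = np.linalg.norm(X[1:] - X[0], axis=1)
    return int(np.argmin(dists)) + 1


# Raw features: the nearest neighbour depends on the unit of area.
print(nearest(X_m2), nearest(X_ft2))  # 1 2

# Standardized features: the unit factor cancels, so both agree.
Z_m2 = StandardScaler().fit_transform(X_m2)
Z_ft2 = StandardScaler().fit_transform(X_ft2)
print(nearest(Z_m2), nearest(Z_ft2))  # 1 1
```

Standardization is unaffected by multiplying a column by a positive constant, because both the mean and the standard deviation scale by that same constant.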
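The regularization argument can be reproduced with scikit-learn's `Ridge` ($\ell_2$-regularized linear regression). The synthetic data, its true coefficients, the noise level and both `alpha` values below are arbitrary assumptions chosen to make the effect visible, not values from the text.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

SQFT_PER_SQM = 10.7639  # 1 m^2 is about 10.76 ft^2

# Synthetic data (assumed): price depends on rooms and area, plus noise.
rng = np.random.default_rng(0)
n = 50
rooms = rng.integers(1, 6, size=n).astype(float)
area_m2 = rng.uniform(40.0, 200.0, size=n)
y = 10.0 * rooms + 2.0 * area_m2 + rng.normal(0.0, 5.0, size=n)

X_m2 = np.column_stack([rooms, area_m2])
X_ft2 = np.column_stack([rooms, area_m2 * SQFT_PER_SQM])

# Unscaled features with a strong l2 penalty.
raw_m2 = Ridge(alpha=1e4).fit(X_m2, y)
raw_ft2 = Ridge(alpha=1e4).fit(X_ft2, y)

# Area coefficient expressed per m^2 for both fits. Pure rescaling would make
# them equal; the ft^2 fit is penalized less, so its value comes out larger.
print(raw_m2.coef_[1], raw_ft2.coef_[1] * SQFT_PER_SQM)
print(np.allclose(raw_m2.predict(X_m2), raw_ft2.predict(X_ft2)))  # False

# Standardize inside a pipeline: the unit change no longer affects the fit.
scaled_m2 = make_pipeline(StandardScaler(), Ridge(alpha=10.0)).fit(X_m2, y)
scaled_ft2 = make_pipeline(StandardScaler(), Ridge(alpha=10.0)).fit(X_ft2, y)
print(np.allclose(scaled_m2.predict(X_m2), scaled_ft2.predict(X_ft2)))  # True
```

Multiplying the area column by $c$ is equivalent to keeping it in m² and dividing the penalty on $w_2$ by $c^2$, which is why the unscaled ft² fit shrinks the area coefficient less.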

## Exercise

```python
# Imports
import pandas as pd
from sklearn import linear_model
from sklearn.preprocessing import StandardScaler

# Load the data: all columns but the last are predictors; the last is the outcome
train_data = pd.read_csv('regularization_data.txt', header=None)
X = train_data.iloc[:, :-1]
y = train_data.iloc[:, -1]

# Create the standardization scaling object
scaler = StandardScaler()

# Fit the standardization parameters (per-column mean and std) and scale the data
X_scaled = scaler.fit_transform(X)

# Create the linear regression model with lasso (L1) regularization
lasso_reg = linear_model.Lasso()

# Fit the model on the standardized features
lasso_reg.fit(X_scaled, y)

# Retrieve and print the coefficients from the regression model
reg_coef = lasso_reg.coef_
print(reg_coef)
```

```
[  0.           3.90753617   9.02575748  -0.         -11.78303187
   0.45340137]
```
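One practical point the exercise does not cover: new data must be scaled with the mean and standard deviation learned from the training data. A scikit-learn `Pipeline` handles this automatically. The sketch below uses synthetic data from `make_regression` as a stand-in for `regularization_data.txt` (which is not shown here), with the feature scale factors chosen arbitrarily to mimic mixed units.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data: 100 rows, 6 features, 3 of them informative.
X, y = make_regression(n_samples=100, n_features=6, n_informative=3,
                       noise=5.0, random_state=0)
# Give the features very different (arbitrary) scales, like mixed units.
X = X * np.array([1.0, 10.0, 100.0, 0.1, 1.0, 1000.0])

# The pipeline fits the scaler on the training data, then the lasso model.
model = make_pipeline(StandardScaler(), Lasso(alpha=1.0))
model.fit(X, y)

lasso = model.named_steps["lasso"]
scaler = model.named_steps["standardscaler"]
print(lasso.coef_)  # coefficients on the standardized features

# Predicting on new rows reuses the stored training mean and std; this is
# the same as scaling by hand with the fitted scaler, then predicting.
X_new = X[:3]
print(np.allclose(model.predict(X_new), lasso.predict(scaler.transform(X_new))))  # True
```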
