#Feature Scaling
**Feature scaling** rescales the values of each feature so that features measured on different scales contribute more equally when training a machine learning model. A common choice is min-max scaling, which maps each feature to the range 0 to 1 using the following formula:

>$x_{rescaled} = \frac{x - x_{min}}{x_{max} - x_{min}}$

Algorithms that compute distances between data points are affected by feature scaling, for example:
- Support Vector Machine with RBF kernel
- K-means Clustering

##Min/Max Scaler in sklearn

In [1]:
from sklearn.preprocessing import MinMaxScaler
import numpy as np
x_train = np.array([[1.,-1.,2.],[2.,0.,0.],[0.,1.,-1.]])

mms = MinMaxScaler()

x_train_mms = mms.fit_transform(x_train)

x_train_mms

array([[ 0.5       ,  0.        ,  1.        ],
       [ 1.        ,  0.5       ,  0.33333333],
       [ 0.        ,  1.        ,  0.        ]])
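
As a quick check, the formula above can be applied by hand with NumPy. This is a minimal sketch that reuses the same `x_train` array; the names `x_min`, `x_max`, and `x_rescaled` are chosen here to mirror the symbols in the formula. Each column (feature) is scaled independently.

```python
import numpy as np

x_train = np.array([[1., -1., 2.], [2., 0., 0.], [0., 1., -1.]])

# Column-wise minimum and maximum: one value per feature
x_min = x_train.min(axis=0)
x_max = x_train.max(axis=0)

# Apply the rescaling formula; broadcasting handles each column separately
x_rescaled = (x_train - x_min) / (x_max - x_min)

print(x_rescaled)
```

The result should match the array returned by `MinMaxScaler` above.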

#Feature Selection

Feature selection is the process of choosing the smallest set of features that retains the most predictive power.

The process of adding new features is:
- Use your intuition to decide whether a candidate feature is likely to have predictive power
- Code up the new feature
- Visualize the new feature through plots
- Repeat

It is sometimes a viable option to remove features because they are too noisy, may cause overfitting, or are strongly related to a feature that is already present.  Removing such features can speed up training and help the model generalize, which leads to better predictions on new data.


It is good to note that features and information are separate things.  A feature attempts to capture information, but it is not the information itself.

##Feature Selection in sklearn
Feature selection can be done with a dedicated component (*SelectPercentile* or *SelectKBest*), or it can be done inside a text vectorizer component such as *TfidfVectorizer*.  The parameters that control feature selection in the Tfidf vectorizer are:
- max_df
- max_features
- stop_words

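Each of these parameters drops words from the vocabulary: *max_df* ignores words that appear in more than a given fraction (or number) of documents, *max_features* keeps only the most frequent words, and *stop_words* removes words from a predefined list.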
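
The sketch below shows these three parameters together. The four-sentence corpus `docs` and the chosen parameter values are made up for illustration and are not from the original notebook.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Tiny illustrative corpus (hypothetical data)
docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "the cat watched a bird",
    "a bird sang in the tree",
]

vectorizer = TfidfVectorizer(
    stop_words="english",  # drop common English words such as "the" and "on"
    max_df=0.5,            # drop words found in more than 50% of the documents
    max_features=5,        # keep at most 5 of the remaining words
)
X = vectorizer.fit_transform(docs)

# Words that survived the filtering
vocab = sorted(vectorizer.vocabulary_)
print(vocab)
print(X.shape)
```

Here "cat" appears in three of the four documents, so *max_df* removes it, while "the" is removed as a stop word.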

In [2]:
from sklearn.feature_selection import f_classif, SelectPercentile

sel = SelectPercentile(f_classif, percentile=10)

The code above shows how to set up feature selection based on a percentile threshold.  Once fitted, the selector scores every feature with *f_classif* (an ANOVA F-test) and keeps the top 10% of features, which (hopefully) carry the most information.
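
The selector does nothing until it is fitted on data. The following is a minimal sketch of that step; the synthetic dataset from `make_classification` and its settings are assumptions chosen only to make the example runnable.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectPercentile, f_classif

# Synthetic data (hypothetical): 20 features, only 3 of them informative
X, y = make_classification(
    n_samples=200, n_features=20, n_informative=3,
    n_redundant=0, random_state=0,
)

sel = SelectPercentile(f_classif, percentile=10)

# Score each feature against the labels and keep the top 10%
X_selected = sel.fit_transform(X, y)

# Column indices of the features that were kept
kept = sel.get_support(indices=True)

print(X.shape, X_selected.shape)
print(kept)
```

With 20 input features, keeping the top 10% leaves 2 columns.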

##Bias Variance Tradeoff and Feature Selection

If you use too few features, you run the risk of having a **high-bias** model because it is oversimplified.  Conversely, if you use a lot of features, you run the risk of having a **high-variance** model because it is overfitted.  This is why choosing the right number of features is important when creating a model that generalizes well.

##Regularization (Lasso Regression)

**Regularization** is a method for penalizing extra features in a regression model.  **Lasso Regression** performs regularization by minimizing the following expression:

>$\min SSE + \lambda|\beta|$

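where $SSE$ is the sum of squared errors, $\lambda$ is a penalty parameter, $\beta$ is the coefficient vector of the regression model, and $|\beta|$ denotes the sum of the absolute values of its entries.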

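Features that do not help the regression model end up with a coefficient of exactly zero, which effectively removes them from the model.  Conceptually, Lasso Regression weighs each feature's contribution to reducing the error against the penalty its coefficient incurs in the formula above, and keeps a nonzero coefficient only when the feature is worth that penalty.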


##Lasso regression in sklearn

In [3]:
from sklearn.linear_model import Lasso

regr = Lasso()

After fitting the model, we can check the coefficients of *regr* through the attribute **coef_**.
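
A minimal sketch of fitting a Lasso model and reading its coefficients follows. The synthetic data is an assumption made for illustration: only the first two of five features influence the target. In sklearn the penalty parameter is called `alpha`, and the squared-error term is additionally scaled by the number of samples, so its value is not directly interchangeable with $\lambda$ in the formula above.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Synthetic data (hypothetical): y depends on features 0 and 1 only
rng = np.random.RandomState(0)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.normal(size=200)

# alpha plays the role of the penalty parameter
regr = Lasso(alpha=0.1)
regr.fit(X, y)

# Coefficients of the unhelpful features should be driven to zero
print(regr.coef_)
```

A larger `alpha` penalizes coefficients more strongly and tends to zero out more features.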