## CHAPTER 10
---
# DIMENSIONALITY REDUCTION USING FEATURE SELECTION

---
In Chapter 9, we discussed how to reduce the dimensionality of our feature matrix by creating new features with (ideally) similar ability to train quality models but with significantly fewer dimensions. This is called `feature extraction`. In this chapter we will cover an alternative approach: selecting high-quality, informative features and dropping less useful features. This is called `feature selection`.

There are three types of feature selection methods:
- `Filter`: select the best features by examining their statistical properties
- `Wrapper`: use trial and error to find the subset of features that produces models with the highest quality predictions
- `Embedded`: select the best feature subset as part or as an extension of a learning algorithm’s training process

In this chapter we cover only filter and wrapper feature selection methods.

## 10.1 Thresholding Numerical Feature Variance

- You have a set of numerical features and want to remove those with low variance (i.e., likely containing little information).
- Select a subset of features with variances above a given threshold.

In [1]:
# Load libraries
from sklearn import datasets
from sklearn.feature_selection import VarianceThreshold

# Load the iris dataset
iris = datasets.load_iris()

# Create features and target
features = iris.data
target = iris.target

# Create thresholder
thresholder = VarianceThreshold(threshold=.5)

# Create high variance feature matrix
features_high_variance = thresholder.fit_transform(features)

# View high variance feature matrix
features_high_variance[0:3]

array([[5.1, 1.4, 0.2],
       [4.9, 1.4, 0.2],
       [4.7, 1.3, 0.2]])
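
The transformed matrix no longer carries column names, so it can be hard to tell which feature was dropped. As a minimal sketch (the `kept` and `dropped` variable names are illustrative, not part of the original recipe), the selector's `get_support()` mask can be paired with `iris.feature_names`:

```python
from sklearn import datasets
from sklearn.feature_selection import VarianceThreshold

# Load the same iris feature matrix used above
iris = datasets.load_iris()
features = iris.data

# Fit the same thresholder as in the recipe
thresholder = VarianceThreshold(threshold=0.5)
thresholder.fit(features)

# Boolean mask: True where a feature's variance exceeds the threshold
mask = thresholder.get_support()

# Pair the mask with the feature names (illustrative variable names)
kept = [name for name, keep in zip(iris.feature_names, mask) if keep]
dropped = [name for name, keep in zip(iris.feature_names, mask) if not keep]

print("Kept:", kept)
print("Dropped:", dropped)
```

Here the only feature dropped should be sepal width, whose variance falls below 0.5.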

#### Discussion:
Variance thresholding (VT) is one of the most basic approaches to feature selection. It is motivated by the idea that features with low variance are likely less interesting (and useful) than features with high variance. VT first calculates the variance of each feature, then drops all features whose variance does not exceed a chosen threshold:
$$
\operatorname{Var}(x) = \frac{1}{n}\sum_{i=1}^{n}(x_i - \mu)^2
$$
where
- $x$ is the feature vector, 
- $x_i$ is an individual feature value, and 
- $\mu$ is that feature’s mean value. 

In [2]:
# View variances
thresholder.fit(features).variances_

array([0.68112222, 0.18871289, 3.09550267, 0.57713289])
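
As a quick sanity check, the variance formula above can be computed by hand with NumPy and compared against `variances_`. This is a minimal sketch; `manual_variances` is an illustrative name, and the comparison assumes a nonzero threshold such as the 0.5 used in this recipe:

```python
import numpy as np
from sklearn import datasets
from sklearn.feature_selection import VarianceThreshold

# Load the iris feature matrix
features = datasets.load_iris().data

# Population variance per column: (1/n) * sum((x_i - mu)^2)
mu = features.mean(axis=0)
manual_variances = ((features - mu) ** 2).mean(axis=0)

# Variances computed by the thresholder
thresholder = VarianceThreshold(threshold=0.5)
thresholder.fit(features)

# Both approaches divide by n (not n - 1), so they should agree
print(np.allclose(manual_variances, thresholder.variances_))
```

Note that the formula divides by $n$, which matches NumPy's default `np.var(features, axis=0)` with `ddof=0`.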

If the features have been standardized (to mean zero and unit variance), every feature has a variance of 1, so there are no differences in variance left to threshold on and variance thresholding will not work correctly:

In [3]:
# Load library
from sklearn.preprocessing import StandardScaler

# Standardize feature matrix
scaler = StandardScaler()
features_std = scaler.fit_transform(features)

# Calculate variance of each feature
selector = VarianceThreshold()
selector.fit(features_std).variances_

array([1., 1., 1., 1.])
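
One way to work around this, if standardized features are still needed downstream, is to threshold the raw features first and standardize afterward. The following is a sketch under that assumption rather than part of the original recipe; the step names `"threshold"` and `"scale"` are arbitrary labels:

```python
from sklearn import datasets
from sklearn.feature_selection import VarianceThreshold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Load the iris feature matrix
features = datasets.load_iris().data

# Threshold on the raw variances, then standardize what remains
pipeline = Pipeline([
    ("threshold", VarianceThreshold(threshold=0.5)),
    ("scale", StandardScaler()),
])
features_selected_std = pipeline.fit_transform(features)

# Three of the four features survive the threshold
print(features_selected_std.shape)
```

Because the threshold is applied in the features' original units, this ordering is only meaningful when those units are comparable across features.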

## 10.2 Thresholding Binary Feature Variance

- You have a set of binary categorical features and want to remove those with low variance (i.e., likely containing little information).
- Select a subset of features with a Bernoulli random variable variance above a given threshold: