### VarianceThreshold
VarianceThreshold is a simple baseline approach to feature selection.
It removes all features whose variance doesn’t meet some threshold. 
By default, it removes all zero-variance features, i.e. features that have the same value in all samples.

As an example, suppose that we have a dataset with boolean features, and we want to remove all features that are either one or zero (on or off) in more than 80% of the samples. 
Boolean features are Bernoulli random variables, and the variance of such variables is given by $\mathrm{Var}[X] = p(1 - p)$,
so we can select using the threshold `.8 * (1 - .8)`.

In [1]:
from sklearn.feature_selection import VarianceThreshold

In [6]:
X = [[0, 0, 1], 
     [0, 1, 1], 
     [1, 0, 1], 
     [0, 1, 1], 
     [0, 1, 1], 
     [0, 1, 1]];

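Features (columns of `X`):\
         x1: 0, 0, 1, 0, 0, 0  mostly zero: 5/6 of the samples, which is more than 0.8, so it is removed \
         x2: 0, 1, 0, 1, 1, 1  one in 4/6 of the samples, which is less than 0.8, so it is kept \
         x3: 1, 1, 1, 1, 1, 1  all ones (zero variance), so it is removed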

In [9]:
t = .8 * (1 - .8)
sel = VarianceThreshold(threshold = t)
t

0.15999999999999998

In [10]:
sel.fit_transform(X)    

array([[0],
       [1],
       [0],
       [1],
       [1],
       [1]])
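To see why only the second column survives, it can help to compute the per-column variances directly and compare them with the threshold. The following is a minimal sketch that reuses the same toy matrix; the variable names (`p`, `variances`, `kept`) are illustrative and not part of the original notebook.

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold

X = np.array([[0, 0, 1],
              [0, 1, 1],
              [1, 0, 1],
              [0, 1, 1],
              [0, 1, 1],
              [0, 1, 1]])

threshold = 0.8 * (1 - 0.8)

# For a 0/1 column, the population variance equals p * (1 - p),
# where p is the fraction of ones in that column.
p = X.mean(axis=0)
variances = X.var(axis=0)

sel = VarianceThreshold(threshold=threshold).fit(X)
kept = sel.get_support()  # boolean mask over the columns of X

print(variances)
print(kept)
```

The first column has variance 5/36 (about 0.139) and the third has variance 0, so both fall below the 0.16 threshold, while the second column has variance 2/9 (about 0.222) and is kept.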

## Univariate feature selection

Univariate feature selection works by selecting the best features based on univariate statistical tests. It can be seen as a preprocessing step to an estimator.
SelectKBest removes all but the `k` highest scoring features.

In [11]:
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import chi2

In [12]:
iris = load_iris()
X = iris.data
y = iris.target

In [13]:
s = SelectKBest(chi2, k=1)

In [17]:
Xnew = s.fit_transform(X, y)

Xnew[:10]

array([[1.4],
       [1.4],
       [1.3],
       [1.5],
       [1.4],
       [1.7],
       [1.4],
       [1.5],
       [1.4],
       [1.5]])
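The transformed array alone does not say which feature was chosen or how the candidates ranked. As a rough sketch, the fitted selector's `scores_` attribute and `get_support()` mask can be printed next to the iris feature names; the name `selector` is introduced here for illustration only.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2

iris = load_iris()
X, y = iris.data, iris.target

selector = SelectKBest(chi2, k=1).fit(X, y)

# One chi-squared score per feature; the mask marks the k best ones.
for name, score, keep in zip(iris.feature_names,
                             selector.scores_,
                             selector.get_support()):
    print(f"{name:20s} chi2={score:8.2f} kept={keep}")
```

The single kept feature is the one with the highest chi-squared score, which matches the column shown in the output above.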

## L1-based feature selection

In [25]:
from sklearn.svm import LinearSVC
from sklearn.feature_selection import SelectFromModel

In [26]:
m = LinearSVC(C=0.01, penalty="l1", dual=False, max_iter=10000)  # the smaller C the fewer features selected
clf = m.fit(X, y);

In [27]:
s = SelectFromModel(clf, prefit=True)

In [28]:
Xnew = s.transform(X)
Xnew.shape

(150, 3)
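The shape tells us how many features remain, but not which ones. A minimal sketch, assuming the same iris data and the same `LinearSVC` settings as above, is to inspect the fitted coefficients together with the selector's mask; the names `svc`, `selector`, and `mask` are illustrative.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectFromModel
from sklearn.svm import LinearSVC

iris = load_iris()
X, y = iris.data, iris.target

svc = LinearSVC(C=0.01, penalty="l1", dual=False, max_iter=10000).fit(X, y)
selector = SelectFromModel(svc, prefit=True)

mask = selector.get_support()  # True for the features that are kept

print(svc.coef_)  # one row per class, one column per feature
print(mask)
print(np.array(iris.feature_names)[mask])
```

With an L1 penalty, some coefficients are driven to exactly zero; a feature whose coefficients are zero for every class is the kind that `SelectFromModel` drops here.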

## Tree-based feature selection

In [29]:
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import SelectFromModel

In [30]:
clf = ExtraTreesClassifier(n_estimators=50)
clf = clf.fit(X, y)

In [31]:
clf.feature_importances_  

array([0.10890282, 0.04632831, 0.39845873, 0.44631014])

In [32]:
model = SelectFromModel(clf, prefit=True)
Xnew = model.transform(X)
Xnew.shape

(150, 2)
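For an estimator without an L1 penalty, `SelectFromModel` uses the mean of the feature importances as its default threshold, which is why two of the four features are kept above. The sketch below checks this by hand; `random_state=0` is an assumption added here so that repeated runs agree, so the importances will differ from the ones printed above.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import SelectFromModel

iris = load_iris()
X, y = iris.data, iris.target

# random_state is fixed only to make this sketch reproducible
forest = ExtraTreesClassifier(n_estimators=50, random_state=0).fit(X, y)
selector = SelectFromModel(forest, prefit=True)

importances = forest.feature_importances_
manual_mask = importances >= importances.mean()  # default "mean" threshold

print(importances)
print(importances.mean())
print(selector.get_support())
```

A different cut-off can be requested through the `threshold` parameter of `SelectFromModel`, for example `threshold="median"`.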