## Removing features with low variance
> [VarianceThreshold](http://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.VarianceThreshold.html#sklearn.feature_selection.VarianceThreshold) is a simple baseline approach to feature selection. It removes all features whose variance doesn’t meet some threshold. 

In [4]:
from sklearn.feature_selection import VarianceThreshold
X = [[0, 0, 1],
     [0, 1, 0],
     [1, 0, 0],
     [0, 1, 1],
     [0, 1, 0],
     [0, 1, 1]]

# Remove every feature that is one (or zero) in more than 80% of the samples.
# Boolean features are Bernoulli random variables, whose variance is p * (1 - p).
threshold = 0.8 * (1 - 0.8)
sel = VarianceThreshold(threshold=threshold)
X = sel.fit_transform(X)

print(X)

[[0 1]
 [1 0]
 [0 0]
 [1 1]
 [1 0]
 [1 1]]
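
As a quick check of the threshold logic, the per-column variances of the original matrix can be computed directly with NumPy and compared with the selector's mask. This is a minimal sketch; `X_orig` and `keep` are illustrative names that do not appear in the cell above.

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold

# Same toy matrix as above, kept under a separate (illustrative) name.
X_orig = np.array([[0, 0, 1],
                   [0, 1, 0],
                   [1, 0, 0],
                   [0, 1, 1],
                   [0, 1, 0],
                   [0, 1, 1]])

threshold = 0.8 * (1 - 0.8)  # Bernoulli variance p * (1 - p) at p = 0.8

# Population variance of each column (ddof=0).
variances = X_orig.var(axis=0)
keep = variances > threshold  # columns at or below the threshold are dropped

sel = VarianceThreshold(threshold=threshold).fit(X_orig)
print(variances)
print(keep)
print(sel.get_support())
```

The first column is zero in five of six rows, so its variance (5/36) falls below the threshold and it is the only column removed.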


## Univariate feature selection
> Univariate feature selection works by selecting the best features based on univariate statistical tests. 
> - [SelectKBest](http://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.SelectKBest.html#sklearn.feature_selection.SelectKBest) removes all but the k highest scoring features
> - [SelectPercentile](http://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.SelectPercentile.html#sklearn.feature_selection.SelectPercentile) removes all but a user-specified highest scoring percentage of features
> - using common univariate statistical tests for each feature: false positive rate [SelectFpr](http://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.SelectFpr.html#sklearn.feature_selection.SelectFpr), false discovery rate [SelectFdr](http://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.SelectFdr.html#sklearn.feature_selection.SelectFdr), or family wise error [SelectFwe](http://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.SelectFwe.html#sklearn.feature_selection.SelectFwe).
> - [GenericUnivariateSelect](http://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.GenericUnivariateSelect.html#sklearn.feature_selection.GenericUnivariateSelect) performs univariate feature selection with a configurable strategy, which makes it possible to choose the best univariate selection strategy with a hyper-parameter search estimator.
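
The configurable strategy can be sketched as follows: the same `GenericUnivariateSelect` class is switched between "k best" and "percentile" behaviour through its `mode` and `param` arguments. The variable names (`X_iris`, `k_best`, `pct`) are illustrative only.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import GenericUnivariateSelect, chi2

X_iris, y_iris = load_iris(return_X_y=True)

# mode="k_best" with param=2 behaves like SelectKBest(chi2, k=2).
k_best = GenericUnivariateSelect(score_func=chi2, mode="k_best", param=2)
X_k = k_best.fit_transform(X_iris, y_iris)

# mode="percentile" with param=50 keeps the top-scoring 50% of the features.
pct = GenericUnivariateSelect(score_func=chi2, mode="percentile", param=50)
X_pct = pct.fit_transform(X_iris, y_iris)

print(X_k.shape, X_pct.shape)
```

Because `mode` and `param` are ordinary constructor parameters, they can in principle be placed in a parameter grid and tuned like any other hyper-parameter.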

In [5]:
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import chi2
iris = load_iris()
X, y = iris.data, iris.target
print(X.shape)
X_new = SelectKBest(chi2, k=2).fit_transform(X, y)
print(X_new.shape)

(150, 4)
(150, 2)


> These selectors take a scoring function that returns univariate scores and p-values (or scores only, for `SelectKBest` and `SelectPercentile`).
>
> Scoring functions:
> - For regression: [f_regression](http://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.f_regression.html#sklearn.feature_selection.f_regression), [mutual_info_regression](http://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.mutual_info_regression.html#sklearn.feature_selection.mutual_info_regression)
> - For classification: [chi2](http://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.chi2.html#sklearn.feature_selection.chi2), [f_classif](http://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.f_classif.html#sklearn.feature_selection.f_classif), [mutual_info_classif](http://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.mutual_info_classif.html#sklearn.feature_selection.mutual_info_classif)
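
To make the return values concrete, the sketch below calls three of the classification scoring functions directly on iris. `chi2` and `f_classif` each return a pair of arrays (scores and p-values), while `mutual_info_classif` returns scores only. The variable names are illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import chi2, f_classif, mutual_info_classif

X_iris, y_iris = load_iris(return_X_y=True)

# chi2 and f_classif return one score and one p-value per feature.
chi2_scores, chi2_pvalues = chi2(X_iris, y_iris)
f_scores, f_pvalues = f_classif(X_iris, y_iris)

# Mutual information is an estimate without p-values; random_state fixes it.
mi_scores = mutual_info_classif(X_iris, y_iris, random_state=0)

print(chi2_scores, chi2_pvalues)
print(f_scores, f_pvalues)
print(mi_scores)
```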

## Recursive feature elimination
> Given an external estimator that assigns weights to features (e.g., the coefficients of a linear model), recursive feature elimination (RFE) selects features by recursively considering smaller and smaller sets of features. First, the estimator is trained on the initial set of features and the importance of each feature is obtained through either a **coef_** attribute or a **feature_importances_** attribute. Then, the least important features are pruned from the current set of features. That procedure is repeated recursively on the pruned set until the desired number of features is reached.

[Example](http://scikit-learn.org/stable/auto_examples/feature_selection/plot_rfe_digits.html#sphx-glr-auto-examples-feature-selection-plot-rfe-digits-py)
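
The procedure described above can be tried on iris with scikit-learn's `RFE` class. This is a minimal sketch: the choice of `LogisticRegression` as the external estimator and of two target features are assumptions made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

X_iris, y_iris = load_iris(return_X_y=True)

# The estimator exposes coef_, which RFE uses to rank the features.
estimator = LogisticRegression(max_iter=1000)

# step=1 prunes one feature per iteration until two remain.
rfe = RFE(estimator=estimator, n_features_to_select=2, step=1)
rfe.fit(X_iris, y_iris)

print(rfe.support_)   # boolean mask of the selected features
print(rfe.ranking_)   # 1 = selected; larger values were pruned earlier
X_rfe = rfe.transform(X_iris)
print(X_rfe.shape)
```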

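## Feature selection using SelectFromModel
> SelectFromModel is a meta-transformer that can be used with any estimator that assigns an importance to each feature. Features are considered unimportant and removed if the corresponding **coef_** or **feature_importances_** values are below the provided `threshold` parameter.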

> Note: Apart from specifying the threshold numerically, there are built-in heuristics for finding a threshold using a string argument. Available heuristics are “mean”, “median” and float multiples of these like “0.1*mean”.
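
The string heuristics can be compared side by side. The sketch below fits a random forest once and then counts how many iris features survive each threshold. The forest and the names `forest` and `kept` are assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

X_iris, y_iris = load_iris(return_X_y=True)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_iris, y_iris)
importances = forest.feature_importances_
print("importances:", importances, "median:", np.median(importances))

kept = {}
for threshold in ("mean", "median", "0.1*mean"):
    # prefit=True reuses the already fitted forest instead of refitting it.
    selector = SelectFromModel(forest, prefit=True, threshold=threshold)
    kept[threshold] = selector.transform(X_iris).shape[1]

print(kept)
```

A feature is kept when its importance is greater than or equal to the resolved threshold, so a scaled-down threshold such as "0.1*mean" never keeps fewer features than "mean".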

### L1-based feature selection
> Linear models penalized with the L1 norm have sparse solutions: many of their estimated coefficients are zero. When the goal is to reduce the dimensionality of the data to use with another classifier, they can be used along with [feature_selection.SelectFromModel](http://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.SelectFromModel.html#sklearn.feature_selection.SelectFromModel) to select the non-zero coefficients. 


In [6]:
from sklearn.svm import LinearSVC
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectFromModel
iris = load_iris()
X, y = iris.data, iris.target
print(X.shape)
lsvc = LinearSVC(C=0.01, penalty="l1", dual=False).fit(X, y)
model = SelectFromModel(lsvc, prefit=True)
X_new = model.transform(X)
print(X_new.shape)

(150, 4)
(150, 3)


### Tree-based feature selection
> Tree-based estimators (see the [sklearn.tree](http://scikit-learn.org/stable/modules/classes.html#module-sklearn.tree) module and forest of trees in the [sklearn.ensemble](http://scikit-learn.org/stable/modules/classes.html#module-sklearn.ensemble) module) can be used to compute feature importances, which in turn can be used to discard irrelevant features.

In [7]:
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectFromModel
iris = load_iris()
X, y = iris.data, iris.target
print(X.shape)

clf = ExtraTreesClassifier()
clf = clf.fit(X, y)
clf.feature_importances_  

model = SelectFromModel(clf, prefit=True)
X_new = model.transform(X)
print(X_new.shape)

(150, 4)
(150, 2)


### Note: The content above is based on [sklearn - feature selection](http://scikit-learn.org/stable/modules/feature_selection.html).

## Sequential Feature Selector
> Sequential feature selection algorithms are a family of greedy search algorithms that are used to reduce an initial d-dimensional feature space to a k-dimensional feature subspace where k < d. The motivation behind feature selection algorithms is to automatically select a subset of features that is most relevant to the problem. The goal of feature selection is two-fold: We want to improve the computational efficiency and reduce the generalization error of the model by removing irrelevant features or noise. A wrapper approach such as sequential feature selection is especially useful if embedded feature selection -- for example, a regularization penalty like LASSO -- is not applicable.

> In a nutshell, sequential feature algorithms (SFAs) remove or add one feature at a time based on classifier performance until a feature subset of the desired size k is reached. There are four flavors of SFAs available via mlxtend's `SequentialFeatureSelector`:
> - Sequential Forward Selection (SFS)
> - Sequential Backward Selection (SBS)
> - Sequential Floating Forward Selection (SFFS)
> - Sequential Floating Backward Selection (SFBS)

> The floating algorithms have an additional exclusion or inclusion step to remove features once they were included (or excluded), so that a larger number of feature subset combinations can be sampled. 

**Note: The pseudocode for each algorithm is described in detail in the reference.**
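
To show the greedy idea without relying on mlxtend, here is a hand-written sketch of plain Sequential Forward Selection built on scikit-learn only. It is not mlxtend's implementation: `forward_select` is a hypothetical helper, it scores candidates with 5-fold cross-validation, and it has no floating step.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier


def forward_select(estimator, X, y, k_features, cv=5):
    """Greedy forward selection: add the single best feature each round."""
    selected = []
    remaining = list(range(X.shape[1]))
    while len(selected) < k_features:
        scored = []
        for j in remaining:
            # Score the current subset extended by one candidate feature.
            candidate = selected + [j]
            score = cross_val_score(estimator, X[:, candidate], y,
                                    cv=cv, scoring="accuracy").mean()
            scored.append((score, j))
        # Highest mean accuracy wins; ties go to the larger column index.
        best_score, best_j = max(scored)
        selected.append(best_j)
        remaining.remove(best_j)
    return selected


X_iris, y_iris = load_iris(return_X_y=True)
knn = KNeighborsClassifier(n_neighbors=4)
chosen = forward_select(knn, X_iris, y_iris, k_features=3)
print(chosen)
```

Backward selection follows the same pattern in reverse: start from all features and drop the one whose removal hurts the score least.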

In [16]:
from sklearn.neighbors import KNeighborsClassifier
from sklearn.datasets import load_iris
from mlxtend.feature_selection import SequentialFeatureSelector as SFS

iris = load_iris()
X = iris.data
y = iris.target
knn = KNeighborsClassifier(n_neighbors=4)
print(X.shape)

sfs1 = SFS(knn, 
           k_features=3, 
           forward=True, 
           floating=False, 
           verbose=1,
           scoring='accuracy',
           cv=0)

sfs1 = sfs1.fit(X, y)
X_new = sfs1.transform(X)
print(X_new.shape)

[Parallel(n_jobs=1)]: Done   4 out of   4 | elapsed:    0.0s finished
Features: 1/3[Parallel(n_jobs=1)]: Done   3 out of   3 | elapsed:    0.0s finished
Features: 2/3[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:    0.0s finished
Features: 3/3

(150, 4)
(150, 3)


## Using k-fold cross validation to assess model performance

### The holdout method
![image.png](./holdout.png)
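
A minimal sketch of the holdout method, assuming iris as the dataset and a 70/30 split (neither is specified in this section): the data is split once, the model is fitted on the training part, and accuracy is reported on the held-out part.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X_iris, y_iris = load_iris(return_X_y=True)

# One stratified split: 70% for training, 30% held out for evaluation.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_iris, y_iris, test_size=0.3, random_state=1, stratify=y_iris)

model = KNeighborsClassifier(n_neighbors=4).fit(X_tr, y_tr)
holdout_accuracy = model.score(X_te, y_te)
print('Holdout accuracy: {:.3f}'.format(holdout_accuracy))
```
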
### K-fold cross-validation
![image.png](./img.png)
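
What k-fold cross-validation does can be written out as an explicit loop. This is a sketch under assumptions: iris and a k-nearest-neighbors classifier stand in for the data and model, and `StratifiedKFold` is used so each fold keeps the class proportions.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier

X_iris, y_iris = load_iris(return_X_y=True)
kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=1)

fold_scores = []
for train_idx, val_idx in kfold.split(X_iris, y_iris):
    # Fit a fresh model on nine folds and score it on the remaining fold.
    model = KNeighborsClassifier(n_neighbors=4)
    model.fit(X_iris[train_idx], y_iris[train_idx])
    fold_scores.append(model.score(X_iris[val_idx], y_iris[val_idx]))

fold_scores = np.array(fold_scores)
print('Fold accuracies: {}'.format(np.round(fold_scores, 3)))
print('Mean accuracy: {:.3f} +/- {:.3f}'.format(fold_scores.mean(), fold_scores.std()))
```

Every sample is used for validation exactly once, which is why the averaged score is usually a steadier estimate than a single holdout split.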

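In [1]:
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import cross_val_score, train_test_split

# Assumption: this cell originally relied on clf, train_X and train_Y being
# defined elsewhere; an iris split and an ExtraTreesClassifier stand in here.
iris = load_iris()
train_X, test_X, train_Y, test_Y = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=1, stratify=iris.target)
clf = ExtraTreesClassifier(random_state=1)

scores = cross_val_score(estimator=clf,
                         X=train_X,
                         y=train_Y,
                         cv=10,
                         n_jobs=1)
np.set_printoptions(precision=3)
print('CV accuracy scores: {}'.format(scores))
print('CV accuracy: {:.3f} +/- {:.3f}'.format(np.mean(scores), np.std(scores)))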

## Fine-tuning machine learning models via grid search

### Tuning hyperparameters via grid search
[GridSearchCV sklearn document](http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html)

In [5]:
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

parameters = {'clf__kernel':['linear', 'rbf'], 'clf__C': [0.01, 0.05, 10], 'clf__gamma': [0.1, 1, 'auto']}
clf = Pipeline([('scl', StandardScaler()), ('clf', SVC(random_state=1))])
gs = GridSearchCV(estimator=clf, 
                  param_grid=parameters, 
                  scoring='accuracy', 
                  cv=3,
                  n_jobs=-1)

# train_X and train_Y are assumed to be the training split prepared earlier in the notebook.
gs = gs.fit(train_X, train_Y)
print(gs.best_score_)
print(gs.best_params_)

In [None]:
clf = gs.best_estimator_
print('Test accuracy: {:.3f}'.format(clf.score(test_X, test_Y)))
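
To see what the grid search actually enumerates, the same parameter dictionary can be expanded with `ParameterGrid`. This sketch only counts and lists candidates and fits no models. `n_candidates` and `n_fits` are illustrative names.

```python
from sklearn.model_selection import ParameterGrid

parameters = {'clf__kernel': ['linear', 'rbf'],
              'clf__C': [0.01, 0.05, 10],
              'clf__gamma': [0.1, 1, 'auto']}

grid = ParameterGrid(parameters)
n_candidates = len(grid)      # 2 kernels * 3 C values * 3 gamma values
n_fits = n_candidates * 3     # one fit per candidate per fold with cv=3

print(n_candidates, n_fits)
for params in list(grid)[:3]:
    print(params)
```

The linear kernel ignores `gamma`, so several of these candidates are effectively duplicates. Passing a list of two dictionaries (one per kernel) as the parameter grid would avoid the redundant fits.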

### Note: The Sequential Feature Selector material above is based on the mlxtend [Sequential Feature Selector](https://rasbt.github.io/mlxtend/user_guide/feature_selection/SequentialFeatureSelector/) user guide and its [API documentation](https://rasbt.github.io/mlxtend/api_subpackages/mlxtend.feature_selection/).