# Feature Selection

Feature selection is the process of selecting a subset of relevant features for use in building a machine learning model. Given an initial set of variables, how do we reduce the number of variables we look at and focus only on the most important ones?

## Why should we select features?

- Simpler models are easier to interpret
- Shorter training times
- Better generalisation through reduced overfitting (eliminating irrelevant features improves predictions on unseen data)
- Easier for developers to implement
- Lower risk of data errors when the model is in use (fewer inputs means less exposure to errors in the incoming data)
- Less variable redundancy (highly correlated variables carry largely the same information)
- Avoiding poor learning behaviour in high-dimensional spaces

## Feature Selection Procedure
A feature selection algorithm can be seen as the combination of a search technique, which proposes new feature subsets, and an evaluation measure, which scores the different subsets. Ideally, the procedure would try every combination of features and select the best-performing subset, but:

- This is computationally expensive, since the number of possible subsets grows exponentially with the number of features
- Different feature subsets give optimal performance for different ML algorithms
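
To make the cost of the ideal procedure concrete, here is a minimal exhaustive-search sketch. The nested loops play the role of the search technique and `score` plays the role of the evaluation measure. `exhaustive_search`, `toy_score` and the `relevance` values are hypothetical illustrations, not a real scoring method. With `n` features there are `2**n - 1` non-empty subsets, so the scorer is called 15 times for 4 features and 1,023 times for 10.

```python
from itertools import combinations
from typing import Callable, Sequence, Tuple


def exhaustive_search(
    n_features: int, score: Callable[[Tuple[int, ...]], float]
) -> Tuple[Tuple[int, ...], int]:
    """Score every non-empty subset of feature indices and return the best one.

    The loops are the search technique; `score` is the evaluation measure.
    Also returns how many subsets were scored.
    """
    best_subset: Tuple[int, ...] = ()
    best_score = float("-inf")
    n_evaluated = 0
    for size in range(1, n_features + 1):
        for subset in combinations(range(n_features), size):
            n_evaluated += 1
            s = score(subset)
            if s > best_score:
                best_subset, best_score = subset, s
    return best_subset, n_evaluated


# Hypothetical evaluation measure: made-up per-feature relevance minus a size penalty.
relevance = [0.9, 0.1, 0.5, 0.0]


def toy_score(subset: Sequence[int]) -> float:
    return sum(relevance[i] for i in subset) - 0.2 * len(subset)


best, n_evaluated = exhaustive_search(4, toy_score)
print(best, n_evaluated)  # (0, 2) 15
```

In practice `score` would train and validate a model on each subset, which is what makes this approach impractical beyond a handful of features.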

For these reasons, several families of feature selection methods exist.

## Feature Selection Methods
Feature selection methods can be divided into three groups:

- Filter methods
- Wrapper methods
- Embedded methods

### Filter methods
Filter methods select features using only the characteristics of the data itself, called __feature characteristics__, and not the ML algorithm that will eventually use them. This makes them fast to compute.

- Do not use ML algorithms
- Model agnostic
- Tend to be less computationally expensive
- Usually give lower prediction performance than a wrapper method
- Are very well suited for a quick screen and removal of irrelevant features
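
As an example of a quick filter screen, the sketch below drops constant and quasi-constant columns by looking only at each column's variance, with no model involved. It assumes NumPy is available. The function name `variance_screen`, the toy data and the `0.01` threshold are illustrative choices, not recommended values.

```python
import numpy as np


def variance_screen(X: np.ndarray, threshold: float = 0.0) -> np.ndarray:
    """Return the column indices of X whose variance is strictly above `threshold`.

    threshold=0.0 removes only constant features; a small positive value
    also removes quasi-constant ones.
    """
    return np.flatnonzero(X.var(axis=0) > threshold)


rng = np.random.default_rng(0)
n = 200
X = np.column_stack([
    np.ones(n),                   # constant feature
    np.r_[np.zeros(n - 1), 1.0],  # quasi-constant: a single non-zero value
    rng.normal(size=n),           # ordinary feature
])
print(variance_screen(X))        # [1 2]
print(variance_screen(X, 0.01))  # [2]
```

scikit-learn's `VarianceThreshold` class in `sklearn.feature_selection` implements the same idea.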

__Variance, Correlation, Univariate selection__

Two-step procedure:

1. Rank features according to a certain criterion (each feature is scored independently of the others)

    - Chi-square
    - Univariate parametric tests (ANOVA)
    - Mutual information
    - Variance, to remove uninformative features (duplicated or correlated features need a pairwise comparison instead)
        - Constant features
        - Quasi-constant features
2. Select the highest-ranking features
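
A minimal NumPy sketch of the two steps, using the one-way ANOVA F statistic as the ranking criterion for a classification target. `anova_f`, `select_top_k` and the synthetic data (one noise column, one strongly and one moderately class-dependent column) are assumptions made for illustration.

```python
import numpy as np


def anova_f(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Step 1: one-way ANOVA F statistic of each column of X against class labels y.

    Each feature is scored on its own, which is what makes this a univariate filter.
    """
    classes = np.unique(y)
    n, k = len(y), len(classes)
    grand_mean = X.mean(axis=0)
    ss_between = np.zeros(X.shape[1])
    ss_within = np.zeros(X.shape[1])
    for c in classes:
        Xc = X[y == c]
        class_mean = Xc.mean(axis=0)
        ss_between += len(Xc) * (class_mean - grand_mean) ** 2
        ss_within += ((Xc - class_mean) ** 2).sum(axis=0)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def select_top_k(scores: np.ndarray, k: int) -> np.ndarray:
    """Step 2: indices of the k highest-scoring features, best first."""
    return np.argsort(scores)[::-1][:k]


rng = np.random.default_rng(0)
y = np.repeat([0, 1], 100)
X = np.column_stack([
    rng.normal(size=200),            # noise
    y * 2.0 + rng.normal(size=200),  # strongly related to the class
    y * 1.0 + rng.normal(size=200),  # moderately related to the class
])
scores = anova_f(X, y)
print(select_top_k(scores, 2))
```

In scikit-learn the same two steps are usually written as `SelectKBest(f_classif, k=2)`; `chi2` or `mutual_info_classif` can be passed as the score function instead.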

Because each feature is ranked on its own, filter methods may select redundant variables: they do not consider the relationships between features.
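
A common complement is a pairwise correlation check. The sketch below greedily keeps a feature only if its absolute Pearson correlation with every feature already kept is at most a threshold. The function name, the `0.9` threshold and the left-to-right order are assumptions; a different column order can keep a different member of a correlated pair.

```python
from typing import List

import numpy as np


def drop_correlated(X: np.ndarray, threshold: float = 0.9) -> List[int]:
    """Greedy redundancy filter: keep column j only if |corr(j, i)| <= threshold
    for every already-kept column i.

    Assumes constant columns were removed first (their correlation is undefined).
    """
    corr = np.abs(np.corrcoef(X, rowvar=False))
    kept: List[int] = []
    for j in range(X.shape[1]):
        if all(corr[j, i] <= threshold for i in kept):
            kept.append(j)
    return kept


rng = np.random.default_rng(0)
a = rng.normal(size=200)
X = np.column_stack([
    a,
    2 * a + 0.01 * rng.normal(size=200),  # near-duplicate of column 0
    rng.normal(size=200),                 # independent feature
])
print(drop_correlated(X))  # [0, 2]
```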

### Wrapper methods
Wrapper methods use a predictive ML model to score each candidate feature subset.

- Train a new model on each feature subset
- Tend to be very computationally expensive
- Usually provide the best performing feature subset for a given ML algorithm
- They may not provide the best feature combination for a different ML model

__Forward selection, Backwards selection, Exhaustive search__
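
Forward selection can be sketched as a greedy loop: start from an empty set and, at each step, add the feature whose inclusion gives the best validation score, training a new model for every candidate. This minimal NumPy version assumes a linear least-squares model scored by R² on a fixed hold-out split; `holdout_r2`, `forward_selection` and the synthetic data are hypothetical, and a real pipeline would normally use cross-validation.

```python
import numpy as np


def holdout_r2(X, y, features, n_train):
    """Evaluation measure: fit least squares (with intercept) on the first n_train
    rows using only `features`, then return R^2 on the remaining rows."""
    A = np.column_stack([np.ones(len(X)), X[:, features]])
    coef, *_ = np.linalg.lstsq(A[:n_train], y[:n_train], rcond=None)
    y_val = y[n_train:]
    residuals = y_val - A[n_train:] @ coef
    return 1.0 - (residuals ** 2).sum() / ((y_val - y_val.mean()) ** 2).sum()


def forward_selection(X, y, n_select, n_train):
    """Greedy search: repeatedly add the feature that gives the best hold-out score."""
    selected, remaining = [], list(range(X.shape[1]))
    while len(selected) < n_select:
        # Train and score one new model per candidate subset.
        scores = {j: holdout_r2(X, y, selected + [j], n_train) for j in remaining}
        best = max(scores, key=scores.get)
        selected.append(best)
        remaining.remove(best)
    return selected


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = 3 * X[:, 0] - 2 * X[:, 3] + 0.5 * rng.normal(size=200)  # only features 0 and 3 matter
print(forward_selection(X, y, n_select=2, n_train=150))
```

Backward selection runs the same loop in reverse, starting from all features and removing the least useful one at each step. scikit-learn provides both as `SequentialFeatureSelector` (with `direction="forward"` or `direction="backward"`), which scores candidates by cross-validation.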

### Embedded methods
Embedded methods perform feature selection as part of the model construction process.

- Consider the interaction between features and models
- They are less computationally expensive than wrapper methods because they fit the ML model only once

__LASSO, Tree importance__
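
Both embedded approaches named above can be sketched with scikit-learn (assumed to be installed). The model is fitted once and the selection is read off the fitted model: the non-zero LASSO coefficients, or the ranking given by a random forest's `feature_importances_`. The synthetic data (only features 0 and 3 matter), `alpha=0.1` and the forest settings are illustrative assumptions. LASSO also assumes features on comparable scales, which holds here because all columns are drawn from a standard normal.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 6))  # all features on the same scale
y = 3 * X[:, 0] - 2 * X[:, 3] + 0.1 * rng.normal(size=300)

# LASSO: the L1 penalty drives the coefficients of unhelpful features to exactly
# zero, so the selected features are those with non-zero coefficients.
lasso = Lasso(alpha=0.1).fit(X, y)
lasso_selected = np.flatnonzero(lasso.coef_)

# Tree importance: rank features by the forest's impurity-based importances.
forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
forest_ranking = np.argsort(forest.feature_importances_)[::-1]

print(lasso_selected)  # [0 3]
print(forest_ranking[:2])
```

`SelectFromModel` in `sklearn.feature_selection` can wrap either estimator to drop features whose coefficient or importance falls below a threshold.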
