# How to Choose a Feature Selection Method For Machine Learning
by Jason Brownlee on June 30, 2020. [Here](https://machinelearningmastery.com/feature-selection-with-real-and-categorical-data/) in [Data Preparation](https://machinelearningmastery.com/category/data-preparation/)

Feature selection is the process of `reducing the number of input variables` when developing a predictive model.

It is desirable to reduce the number of input variables to both `reduce the computational cost` of modeling and, in some cases, to `improve the performance` of the model.

In this post, you will discover how to choose statistical measures for filter-based feature selection with `numerical` and `categorical` data.

After reading this post, you will know:

- There are two main types of feature selection techniques: `supervised` and `unsupervised`, and supervised methods may be divided into *wrapper*, *filter* and *intrinsic*.
- Filter-based feature selection methods use statistical measures to score the correlation or dependence between input variables that can be filtered to choose the most relevant features.
- Statistical measures for feature selection *`must be carefully chosen based on the data type of the input variable and the output or response variable`*.

## Overview
This tutorial is divided into 4 parts; they are:

1. Feature Selection Methods
2. Statistics for Filter Feature Selection Methods
    1. Numerical Input, Numerical Output
    2. Numerical Input, Categorical Output
    3. Categorical Input, Numerical Output
    4. Categorical Input, Categorical Output
3. Tips and Tricks for Feature Selection
    1. Correlation Statistics
    2. Selection Method
    3. Transform Variables
    4. What Is the Best Method?
4. Worked Examples
    - 1. Regression Feature Selection (Numerical Input, Numerical Output)
    - 2.a Classification Feature Selection (Numerical Input, Categorical Output)
    - 2.b Classification Feature Selection (Categorical Input, Categorical Output)

## 1. Feature Selection Methods
Feature selection methods are intended to reduce the number of input variables to those that are believed to be most useful to a model in order to predict the target variable.

Many models, especially those based on `regression slopes` and `intercepts`, will estimate parameters for every term in the model. Because of this, the presence of non-informative variables (*those that are not relevant to the target variable*) can add uncertainty to the predictions and reduce the overall effectiveness of the model.

We can summarize feature selection as follows.

- Feature Selection: Select a subset of input features from the dataset.
    - Unsupervised: Do not use the target variable (e.g. remove redundant variables).
        - Correlation
    - Supervised: Use the target variable (e.g. remove irrelevant variables).
        - Wrapper: Search for well-performing subsets of features (maximizes model performance).
            - RFE
        - Filter: Select subsets of features based on their relationship with the target.
            - Statistical Methods
            - Feature Importance Methods
        - Intrinsic: Algorithms that perform automatic feature selection during training (intrinsically conduct feature selection).
            - Decision Trees
            - Rule-based models, MARS, random forest, and the lasso, for example.
- Dimensionality Reduction: Project input data into a lower-dimensional feature space (create a projection of the data resulting in entirely new input features).
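
To make the wrapper and intrinsic categories above a little more concrete, here is a minimal sketch that runs RFE (a wrapper method) and the lasso (an intrinsic method) on the same synthetic regression problem. The dataset sizes, the `alpha=1.0` penalty and the use of `SelectFromModel` to read off the lasso's non-zero coefficients are assumptions made for illustration, not part of the original summary.

```python
# wrapper vs. intrinsic feature selection on a synthetic regression dataset
from sklearn.datasets import make_regression
from sklearn.feature_selection import RFE, SelectFromModel
from sklearn.linear_model import Lasso, LinearRegression

# generate dataset (sizes chosen arbitrarily for this sketch)
X, y = make_regression(n_samples=200, n_features=20, n_informative=5,
                       noise=0.1, random_state=1)

# wrapper: recursively drop the weakest feature until 5 remain
rfe = RFE(estimator=LinearRegression(), n_features_to_select=5)
X_rfe = rfe.fit_transform(X, y)

# intrinsic: the lasso shrinks some coefficients to zero while training;
# SelectFromModel keeps the features whose coefficients stay non-zero
lasso_selector = SelectFromModel(Lasso(alpha=1.0))
X_lasso = lasso_selector.fit_transform(X, y)

print(X_rfe.shape, X_lasso.shape)
```

RFE returns exactly the number of features requested, whereas the number kept by the lasso depends on the strength of the penalty.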

## 2. Statistics for Filter Feature Selection Methods
It is common to use correlation-type statistical measures between input and output variables as the basis for filter feature selection. The choice of statistical measure is highly dependent upon the variable data types.

Common input variable data types:

- Numerical Variables
    - Integer Variables (1, 2, 3).
    - Floating Point Variables (0.1, 0.2, 0.3).
- Categorical Variables.
    - Boolean Variables (dichotomous) (True, False).
    - Ordinal Variables (1st, 2nd, 3rd).
    - Nominal Variables (r, g, b).

`Input variables` are those that are provided as input to a model. In feature selection, it is this group of variables that we wish to reduce in size. `Output variables` are those that a model is intended to predict, often called response variables.

The type of response variable typically indicates the type of predictive modeling problem being performed.
- `Numerical Output`: Regression predictive modeling problem.
- `Categorical Output`: Classification predictive modeling problem.

Most of these techniques are `univariate`, *`meaning that they evaluate each predictor in isolation`*. In this case, the existence of correlated predictors makes it possible to select important, but redundant, predictors. The obvious consequences of this issue are that too many predictors are chosen and, as a result, `collinearity problems arise`.
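
The redundancy issue can be seen in a small sketch. The data below is made up for illustration: `signal` drives the target, `near_copy` is almost identical to `signal`, `other` is a weaker but independent predictor, and `noise` is irrelevant. Because each column is scored in isolation, a univariate filter asked for two features is expected to pick the two nearly identical columns and skip the independent one.

```python
# univariate filters score each feature alone, so redundant copies both rank highly
import numpy as np
from sklearn.feature_selection import SelectKBest, f_regression

rng = np.random.default_rng(0)
n = 500

# hypothetical columns, constructed only for this demonstration
signal = rng.normal(size=n)                            # strong predictor
near_copy = signal + rng.normal(scale=0.01, size=n)    # redundant with signal
other = rng.normal(size=n)                             # weaker, independent predictor
noise = rng.normal(size=n)                             # irrelevant
X = np.column_stack([signal, near_copy, other, noise])
y = 3.0 * signal + 1.0 * other + rng.normal(scale=0.5, size=n)

# keep the two highest-scoring features
fs = SelectKBest(score_func=f_regression, k=2).fit(X, y)
selected = fs.get_support(indices=True)
print(selected)

# the two selected columns are almost perfectly correlated with each other
print(np.corrcoef(X[:, 0], X[:, 1])[0, 1])
```

A follow-up step that removes highly correlated inputs, or a wrapper method that evaluates subsets, is one way to deal with this.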

Univariate statistical measures used for filter-based feature selection.
- Input Variable
    - Numerical
        - Output Variable
            - Numerical (`regression` predictive modeling)
                - Pearson’s correlation coefficient (linear).
                - Spearman’s rank coefficient (nonlinear).
            - Categorical (`classification` predictive modeling)
                - ANOVA correlation coefficient (linear).
                - Kendall’s rank coefficient (nonlinear) (assume that the categorical variable is ordinal).
    - Categorical
        - Output Variable
            - Numerical (`regression` predictive modeling - *not encountered often*)
                - ANOVA (use “Numerical Input, Categorical Output” methods in reverse)
                - Kendall’s (use “Numerical Input, Categorical Output” methods in reverse)
            - Categorical (`classification` predictive modeling)
                - Chi-Squared (contingency tables)
                - Mutual Information (agnostic to the data types, useful for categorical and numerical)

## 3. Tips and Tricks for Feature Selection
Some additional considerations when using filter-based feature selection:

### 1. Correlation Statistics
The scikit-learn library provides implementations of several of these statistical measures:
- Pearson’s Correlation Coefficient: [f_regression()](https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.f_regression.html)
- ANOVA: [f_classif()](https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.f_classif.html)
- Chi-Squared: [chi2()](https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.chi2.html)
- Mutual Information: [mutual_info_classif()](https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.mutual_info_classif.html) and [mutual_info_regression()](https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.mutual_info_regression.html)

The SciPy library provides the rank correlation measures:
- Kendall’s tau ([kendalltau](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.kendalltau.html))
- Spearman’s rank correlation ([spearmanr](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.spearmanr.html)).
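
The SciPy functions score one pair of variables at a time, so one way to use them for feature selection is to loop over the columns and rank the results yourself. The sketch below does this with `spearmanr`; the dataset, the choice of `k = 5` and the use of the absolute correlation as the score are assumptions made for illustration.

```python
# rank features by absolute Spearman correlation with the target
import numpy as np
from scipy.stats import spearmanr
from sklearn.datasets import make_regression

# generate dataset (sizes chosen arbitrarily for this sketch)
X, y = make_regression(n_samples=500, n_features=20, n_informative=5,
                       random_state=1)

# score each input column against the target, one at a time;
# element 0 of the result is the correlation, element 1 the p-value
scores = np.array([abs(spearmanr(X[:, j], y)[0]) for j in range(X.shape[1])])

# keep the k highest-scoring columns, preserving their original order
k = 5
top_k = np.sort(np.argsort(scores)[::-1][:k])
X_selected = X[:, top_k]
print(top_k)
print(X_selected.shape)
```

The same loop should work with `kendalltau` in place of `spearmanr`, since both return the statistic first and the p-value second.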

### 2. Selection Method
The scikit-learn library provides two common filtering methods:
- Select the top k variables: [SelectKBest](https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.SelectKBest.html) (most useful)
- Select the top percentile variables: [SelectPercentile](https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.SelectPercentile.html)
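
`SelectPercentile` is used in the same way as `SelectKBest`, except that the cut-off is given as a percentage of the features rather than a count. A minimal sketch, assuming a synthetic classification dataset and an arbitrary cut-off of 10 percent:

```python
# keep the top 10 percent of features ranked by the ANOVA F measure
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectPercentile
from sklearn.feature_selection import f_classif

# generate dataset (sizes chosen arbitrarily for this sketch)
X, y = make_classification(n_samples=1000, n_features=20, n_informative=2,
                           random_state=1)

# define feature selection: 10 percent of 20 features
fs = SelectPercentile(score_func=f_classif, percentile=10)

# apply feature selection
X_selected = fs.fit_transform(X, y)
print(X_selected.shape)
```

A percentile can be more convenient than a fixed `k` when the same pipeline is reused on datasets with different numbers of columns.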

### 3. Transform Variables
Consider transforming the variables in order to access different statistical methods.
- Transform a `categorical` variable to `ordinal`
- Make a `numerical` variable discrete (e.g. bins) and try `categorical-based` measures
- Some statistical measures assume properties of the variables.
    - Pearson’s assumes a `Gaussian probability distribution` for the observations and a linear relationship
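
As a sketch of the binning idea, the example below discretizes numerical inputs with `KBinsDiscretizer` and then scores the binned columns with mutual information, telling the scorer to treat them as discrete. The number of bins, the `uniform` binning strategy and the choice of mutual information as the categorical-style measure are assumptions made for illustration.

```python
# discretize numerical inputs, then score them with a measure for discrete data
from functools import partial

from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.preprocessing import KBinsDiscretizer

# generate dataset (sizes chosen arbitrarily for this sketch)
X, y = make_classification(n_samples=1000, n_features=20, n_informative=2,
                           random_state=1)

# map each numerical column to 5 equal-width bins coded 0..4
binner = KBinsDiscretizer(n_bins=5, encode="ordinal", strategy="uniform")
X_binned = binner.fit_transform(X)

# mutual information, with every column treated as discrete
score_func = partial(mutual_info_classif, discrete_features=True, random_state=1)
fs = SelectKBest(score_func=score_func, k=2)

# apply feature selection to the binned data
X_selected = fs.fit_transform(X_binned, y)
print(X_selected.shape)
```

Binning discards information, so it is worth comparing the result against a measure applied to the original numerical values.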

### 4. What Is the Best Method?
- There is no best feature selection method, just as there is no best set of input variables or best machine learning algorithm.
- Use careful systematic experimentation. Try a range of `different models fit` on `different subsets of features` chosen via `different statistical measures` and discover what works best for your specific problem.
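
One way to set up this kind of experiment is to put the selector and the model into a `Pipeline` and let a grid search try different statistical measures and subset sizes under cross-validation. The sketch below is only one possible arrangement; the logistic regression model, the candidate values of `k` and the accuracy metric are assumptions made for illustration.

```python
# compare statistical measures and subset sizes with a cross-validated grid search
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif, mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# generate dataset (sizes chosen arbitrarily for this sketch)
X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           random_state=1)

# selection happens inside the pipeline, so it is refit on each training fold
pipe = Pipeline([
    ("select", SelectKBest()),
    ("model", LogisticRegression(max_iter=1000)),
])

# candidate measures and numbers of features to keep
grid = {
    "select__score_func": [f_classif, mutual_info_classif],
    "select__k": [2, 5, 10, 20],
}

search = GridSearchCV(pipe, grid, scoring="accuracy", cv=5)
search.fit(X, y)
print(search.best_params_)
print(search.best_score_)
```

Keeping the selector inside the pipeline avoids scoring features on data that is later used for evaluation.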

## 4. Worked Examples
This section provides worked examples of feature selection cases that you can use as a starting point.

### 1. Regression Feature Selection
#### (Numerical Input, Numerical Output)

Feature selection is performed using `Pearson’s Correlation Coefficient` via the `f_regression()` function.

`Note`. Running the example first creates the regression dataset, then defines the feature selection and applies the feature selection procedure to the dataset, returning a subset of the selected input features.

```python
# pearson's correlation feature selection for numeric input and numeric output
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import f_regression

# generate dataset sample
X, y = make_regression(n_samples=1000, n_features=100, n_informative=10)

# define feature selection
fs = SelectKBest(score_func=f_regression, k=10)

# apply feature selection
X_selected = fs.fit_transform(X, y)
print(X_selected.shape)
```

```
(1000, 10)
```


### 2.a Classification Feature Selection
#### (Numerical Input, Categorical Output)

Feature selection is performed using `ANOVA F measure` via the `f_classif()` function.

`Note`. Running the example first creates the classification dataset, then defines the feature selection and applies the feature selection procedure to the dataset, returning a subset of the selected input features.

```python
# ANOVA feature selection for numeric input and categorical output
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import f_classif

# generate dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=2)

# define feature selection
fs = SelectKBest(score_func=f_classif, k=2)

# apply feature selection
X_selected = fs.fit_transform(X, y)
print(X_selected.shape)
```

```
(1000, 2)
```


### 2.b Classification Feature Selection
#### (Categorical Input, Categorical Output)

For feature selection with categorical inputs and categorical outputs, see:

- [How to Perform Feature Selection with Categorical Data](https://machinelearningmastery.com/feature-selection-with-categorical-data/).
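
As a rough sketch of this case, the example below builds a small made-up categorical dataset, encodes the inputs with `OrdinalEncoder` and the target with `LabelEncoder`, and scores the inputs with `chi2()`. The column names, category probabilities and `k=1` are hypothetical. Note also that scikit-learn's `chi2()` treats the encoded values as non-negative frequencies, so the scores depend on the integer codes; `mutual_info_classif()` with `discrete_features=True` is an alternative worth comparing.

```python
# chi-squared feature selection for categorical input and categorical output
import numpy as np
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.preprocessing import LabelEncoder, OrdinalEncoder

rng = np.random.default_rng(1)
n = 1000

# hypothetical target and inputs, constructed only for this demonstration
y_raw = rng.choice(["no", "yes"], size=n)
color = np.where(
    y_raw == "yes",
    rng.choice(["r", "g", "b"], size=n, p=[0.7, 0.2, 0.1]),
    rng.choice(["r", "g", "b"], size=n, p=[0.1, 0.2, 0.7]),
)                                                  # depends on the class
size = rng.choice(["S", "M", "L"], size=n)         # unrelated to the class
shape = rng.choice(["circle", "square"], size=n)   # unrelated to the class
X_raw = np.column_stack([color, size, shape])

# encode string categories as integers
X = OrdinalEncoder().fit_transform(X_raw)
y = LabelEncoder().fit_transform(y_raw)

# define and apply feature selection
fs = SelectKBest(score_func=chi2, k=1)
X_selected = fs.fit_transform(X, y)
print(fs.scores_)
print(fs.get_support(indices=True))
```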


## Tutorials
- [How to Calculate Nonparametric Rank Correlation in Python](https://machinelearningmastery.com/how-to-calculate-nonparametric-rank-correlation-in-python/)
- [How to Calculate Correlation Between Variables in Python](https://machinelearningmastery.com/how-to-use-correlation-to-understand-the-relationship-between-variables/)
- [Feature Selection For Machine Learning in Python](https://machinelearningmastery.com/feature-selection-machine-learning-python/)
- [An Introduction to Feature Selection](https://machinelearningmastery.com/an-introduction-to-feature-selection/)