# How to Choose a Feature Selection Method For Machine Learning

Author: Jason Brownlee

Article from [machinelearningmastery](https://machinelearningmastery.com/feature-selection-with-real-and-categorical-data/).

> Note: This notebook works through the article linked above. The code may differ slightly from the original where I changed it while implementing it.

# Regression Feature Selection
## (Numerical Input, Numerical Output)

### Pearson's correlation feature selection for numeric input and numeric output

In [1]:
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import f_regression
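
`f_regression` scores each feature by turning its Pearson correlation with the target into an F-statistic. The following is a minimal sketch of that relationship, assuming the default `center=True` behaviour. The helper name `pearson_f_scores`, the `_demo` variables, and the fixed `random_state` are illustrative additions, not part of the original article.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.feature_selection import f_regression


def pearson_f_scores(X, y):
    """Convert each feature's Pearson r with y into a univariate F-statistic."""
    n_samples = X.shape[0]
    # Center the data, then compute Pearson r for every column of X against y
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    r = (Xc * yc[:, None]).sum(axis=0) / (
        np.sqrt((Xc ** 2).sum(axis=0)) * np.sqrt((yc ** 2).sum())
    )
    # F = r^2 / (1 - r^2) * (n - 2), using n - 2 degrees of freedom
    return r ** 2 / (1 - r ** 2) * (n_samples - 2)


# Hypothetical demo data; random_state is fixed only for reproducibility
X_demo, y_demo = make_regression(
    n_samples=100, n_features=100, n_informative=10, random_state=1
)
manual_f = pearson_f_scores(X_demo, y_demo)
sklearn_f, p_values = f_regression(X_demo, y_demo)

# Check whether the hand-computed scores agree with scikit-learn's
print(np.allclose(manual_f, sklearn_f))
```

Because the F-statistic grows monotonically with `r ** 2`, ranking features by this score is equivalent to ranking them by the absolute value of their Pearson correlation.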

### Generate dataset

In [2]:
X, y = make_regression(n_samples=100, n_features=100, n_informative=10)

### Define feature selection

In [3]:
fs = SelectKBest(score_func=f_regression, k=10)
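
Once a selector like the one defined above is fitted, it exposes the per-feature scores and a mask of the columns it kept. The sketch below compares the kept columns with the truly informative ones. It assumes a fixed `random_state` and passes `coef=True` so that `make_regression` also returns the ground-truth coefficients. Names such as `selector` and `true_informative` are illustrative.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectKBest, f_regression

# coef=True additionally returns the coefficients of the underlying linear model
X_demo, y_demo, coef = make_regression(
    n_samples=100, n_features=100, n_informative=10, coef=True, random_state=1
)

selector = SelectKBest(score_func=f_regression, k=10)
selector.fit(X_demo, y_demo)

# Column indices that the selector kept
selected = selector.get_support(indices=True)
# Columns whose true coefficient is nonzero, i.e. the informative features
true_informative = np.flatnonzero(coef)
overlap = np.intersect1d(selected, true_informative)

print("selected columns:   ", selected)
print("informative columns:", true_informative)
print("recovered:", len(overlap), "of", len(true_informative))
```

With only 100 samples for 100 features, univariate scores are noisy, so the selector may recover only part of the informative set.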

### Apply feature selection

In [5]:
X_selected = fs.fit_transform(X, y)
print(X_selected.shape)
X_selected

(100, 10)


array([[-1.16229135e+00, -1.05503437e+00, -3.08969404e-01,
         2.60453264e-01,  7.59393326e-01, -3.49248338e-01,
        -3.73927128e-01,  2.55724082e+00, -1.20316256e+00,
         1.21019212e+00],
       [-4.76555275e-01, -2.30927302e-01, -1.74843065e+00,
        -1.20635897e+00, -5.34707094e-01,  1.95235061e-01,
        -2.94345287e-01,  8.57314321e-01, -5.36617067e-01,
         2.02579419e+00],
       [-2.65612556e+00,  8.27355866e-01,  1.92813807e-01,
         4.10003818e-01,  1.04880563e-01, -6.95203924e-01,
        -7.42591444e-01, -1.11435294e+00,  1.11744173e+00,
         9.97855673e-01],
       [-7.97424906e-01,  1.14812496e+00,  1.95890008e-01,
        -9.83714920e-01, -9.32145576e-02,  3.55200776e-01,
        -4.03016529e-01,  1.26149161e+00,  1.90881534e-02,
         2.11505516e-01],
       [ 3.35188789e-01, -6.05198115e-01, -2.11733487e-01,
        -1.03705128e+00,  1.45562596e+00,  8.70376384e-01,
         9.00966157e-01,  1.41705197e+00, -2.18905087e+00,
        -3.

# Classification Feature Selection
## (Numerical Input, Categorical Output)

### ANOVA feature selection for numeric input and categorical output

In [6]:
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import f_classif
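
`f_classif` computes a one-way ANOVA F-statistic for each feature, treating the class labels as the groups. As a rough check, the sketch below compares it with `scipy.stats.f_oneway` on a single feature. It assumes SciPy is available (it is a scikit-learn dependency) and uses a fixed `random_state`. The `_demo` variable names are illustrative.

```python
import numpy as np
from scipy.stats import f_oneway
from sklearn.datasets import make_classification
from sklearn.feature_selection import f_classif

X_demo, y_demo = make_classification(
    n_samples=100, n_features=20, n_informative=2, random_state=1
)

# F-statistic and p-value for every feature at once
f_sklearn, p_sklearn = f_classif(X_demo, y_demo)

# Same statistic for one feature: split its values by class, then run one-way ANOVA
feature = 0
groups = [X_demo[y_demo == label, feature] for label in np.unique(y_demo)]
f_scipy, p_scipy = f_oneway(*groups)

# Check whether both computations agree for this feature
print(np.isclose(f_sklearn[feature], f_scipy))
```

A large F-value means the feature's mean differs strongly between classes relative to its within-class variance, which is why it works as a relevance score for numeric inputs and a categorical target.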

### Generate dataset

In [7]:
X, y = make_classification(n_samples=100, n_features=20, n_informative=2)

### Define feature selection

In [8]:
fs = SelectKBest(score_func=f_classif, k=2)
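
To see which columns a selector like this one prefers, it can help to generate the data without shuffling, so that the informative columns sit at known positions. This is a sketch under that assumption: `shuffle=False` and the fixed `random_state` are additions here, and the variable names are illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

# shuffle=False keeps the generated columns in order:
# informative features first, then redundant ones, then noise
X_demo, y_demo = make_classification(
    n_samples=100, n_features=20, n_informative=2, shuffle=False, random_state=1
)

selector = SelectKBest(score_func=f_classif, k=2).fit(X_demo, y_demo)

# Rank all columns by ANOVA F-score, highest first
ranking = np.argsort(selector.scores_)[::-1]
kept = selector.get_support(indices=True)

print("top 5 columns by F-score:", ranking[:5])
print("columns kept (k=2):      ", kept)
```

With the default `n_redundant=2`, columns 0 to 3 carry signal (two informative plus two redundant), so the kept columns would be expected to fall in that range, although a univariate test on 100 samples does not guarantee it.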

### Apply feature selection

In [10]:
X_selected = fs.fit_transform(X, y)
print(X_selected.shape)
X_selected

(100, 2)


array([[-0.56943   ,  0.28180863],
       [ 0.00627038,  0.08574417],
       [-0.34142192,  0.30965929],
       [-0.25665827, -0.16242079],
       [-1.11653296,  1.0629519 ],
       [-0.25790876,  0.15935724],
       [ 1.05746933, -0.5916832 ],
       [-0.24073026,  0.28590474],
       [-1.22799121,  0.85327792],
       [ 0.25941541, -0.51686396],
       [-2.0817453 ,  1.7954972 ],
       [ 0.03390062, -0.04492714],
       [-0.14669345,  0.03191718],
       [-0.96918876,  0.62043682],
       [-0.1604327 ,  0.0276893 ],
       [ 0.27795059, -0.26745862],
       [-2.82943671,  2.97164807],
       [ 0.62672246, -0.12588204],
       [-1.83741528,  1.0260884 ],
       [-1.14234673,  1.07837916],
       [ 0.76021502, -0.82015157],
       [-1.02331392,  1.20832387],
       [ 1.22827709, -0.8796595 ],
       [ 1.33904867, -0.79494072],
       [-1.08871459,  0.86324912],
       [-0.46844392,  0.50777241],
       [ 2.1568822 , -1.80664268],
       [ 1.02024937, -0.64265394],
       [ 0.05346047,

Feature selection for categorical input data is covered in [a separate article](https://machinelearningmastery.com/feature-selection-with-categorical-data/).