# Feature selection
- Feature selection is the process of selecting a subset of relevant features for use in model construction.

- Statistical tests can be used to select those features that have the strongest relationship with the output variable.

- The ***scikit-learn*** library provides the ***SelectKBest*** class that can be used with different statistical tests to select a specific number of features.

***ANOVA F-value*** (`f_regression`):
- Univariate linear regression tests that return an F-statistic and a p-value for each feature.
- A quick linear model for testing the individual effect of each of many regressors, one at a time.

This is done in 2 steps:

- The correlation between each regressor and the target is computed, that is, the Pearson correlation coefficient.
- It is converted to an F score and then to a p-value.
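The two steps above can be sketched by hand and compared with what `f_regression` returns. This is a minimal sketch on synthetic data, not the notebook's dataset: the names `X_demo`, `y_demo`, `f_manual` and `p_manual` are assumptions introduced for illustration, and the conversion shown assumes the default `center=True` behaviour of `f_regression`.

```python
import numpy as np
from scipy import stats
from sklearn.feature_selection import f_regression

# Synthetic data (an assumption for illustration, not the notebook's dataset)
rng = np.random.default_rng(0)
n_samples = 200
X_demo = rng.normal(size=(n_samples, 3))
y_demo = 2.0 * X_demo[:, 0] - 0.5 * X_demo[:, 1] + rng.normal(size=n_samples)

# Step 1: Pearson correlation between each feature and the target
Xc = X_demo - X_demo.mean(axis=0)
yc = y_demo - y_demo.mean()
r = (Xc * yc[:, None]).sum(axis=0) / (
    np.sqrt((Xc ** 2).sum(axis=0)) * np.sqrt((yc ** 2).sum())
)

# Step 2: convert r to an F score, then to a p-value
dof = n_samples - 2  # a slope and an intercept are estimated per feature
f_manual = r ** 2 / (1 - r ** 2) * dof
p_manual = stats.f.sf(f_manual, 1, dof)

# Compare against scikit-learn's implementation
f_sklearn, p_sklearn = f_regression(X_demo, y_demo)
print(np.allclose(f_manual, f_sklearn), np.allclose(p_manual, p_sklearn))
```

Because the F score grows with the squared correlation, ranking features by F score is the same as ranking them by the absolute value of their correlation with the target.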

In [None]:
import numpy as np
import pandas as pd

# Import the f_regression scoring function and the SelectKBest selector
from sklearn.feature_selection import f_regression, SelectKBest

# Load the dataset
df = pd.read_excel('energy-efficient-dataset.xlsx')

X = df.iloc[:, 0:8]  # select the first 8 columns as the features X
y1 = df.iloc[:, 8]   # load the 9th column into y1 (the target used below)
y2 = df.iloc[:, 9]   # load the 10th column into y2

# Number of features before applying f_regression
print("Features: {}".format(X.columns))
print('# features before f_regression: %d\n' % X.shape[1])

# SelectKBest keeps the k features with the highest scores
test1 = SelectKBest(score_func=f_regression, k=6).fit(X, y1)
X_new = test1.transform(X)
X_new = pd.DataFrame(X_new)

# Number of features after applying f_regression
print('# features after f_regression: %d\n' % X_new.shape[1])

***SelectKBest*** selects the k best features (the value of k is set manually).
- In the above case I set k = 6, and SelectKBest chose the 6 features with the highest F-scores.

In [31]:
for i in range(len(X.columns)):
    print('X%d F_value: %.2f'%(i+1, test1.scores_[i]))

Features: Index(['X1', 'X2', 'X3', 'X4', 'X5', 'X6', 'X7', 'X8'], dtype='object')
# features before f_regression: 8

# features after f_regression: 6

X1 F_value: 484.05
X2 F_value: 585.26
X3 F_value: 200.73
X4 F_value: 2211.62
X5 F_value: 2900.59
X6 F_value: 0.01
X7 F_value: 60.16
X8 F_value: 5.89
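
With k = 6 and the scores above, the two lowest-scoring features (X6 and X8) would be the ones dropped. Since `transform` returns a plain array without column names, one way to see which columns were kept is the selector's `get_support()` mask. The sketch below uses a small synthetic DataFrame rather than the Excel file, so the column names `A` to `E` and the variables `X_demo`, `y_demo` and `selector` are assumptions for illustration only.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import SelectKBest, f_regression

# Synthetic stand-in for the notebook's data (an assumption for illustration)
rng = np.random.default_rng(42)
X_demo = pd.DataFrame(rng.normal(size=(500, 5)), columns=['A', 'B', 'C', 'D', 'E'])
y_demo = 3.0 * X_demo['A'] - 2.0 * X_demo['C'] + rng.normal(scale=0.1, size=500)

# Fit the selector, keeping the 2 highest-scoring features
selector = SelectKBest(score_func=f_regression, k=2).fit(X_demo, y_demo)

# Boolean mask over the input columns: True where a feature was kept
mask = selector.get_support()
selected = X_demo.columns[mask]
print(list(selected))

# Rebuild the reduced DataFrame with its original column names
X_selected = pd.DataFrame(selector.transform(X_demo), columns=selected, index=X_demo.index)

# Rank all features by F score, highest first
ranking = pd.Series(selector.scores_, index=X_demo.columns).sort_values(ascending=False)
print(ranking)
```

Applied to the notebook's own objects, the same idea would read `X.columns[test1.get_support()]`.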
