## Feature Selection Techniques

<!-- <hr>

### Agenda
1. Introduction to Feature Selection
2. VarianceThreshold
3. Chi-squared stats
4. ANOVA using f_classif
5. Univariate Linear Regression Tests using f_regression
6. F-score vs Mutual Information
7. Mutual Information for discrete value
8. Mutual Information for continuous value
9. SelectKBest
10. SelectPercentile
11. SelectFromModel
12. Recursive Feature Elimination

<hr> -->

### Feature Selection
* Feature selection keeps a subset of the dataset's features for training
* It can improve an estimator's accuracy
* It can boost performance on high-dimensional datasets
* The sections below cover univariate selection methods
* They also cover model-based selection and recursive feature elimination

In [18]:
from sklearn import feature_selection
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()

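### Mutual Information for classification using mutual_info_classif
* Returns a non-negative dependency score between each feature and the target: 0 means the two are independent, and higher values mean stronger dependency (the score is not capped at 1)
* Captures any kind of dependency, including non-linear ones
* The target must be discrete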
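
The points above can be illustrated with a minimal sketch on synthetic data. The variable names (`x_nonlinear`, `x_noise`) and the data itself are assumptions made for illustration and are not part of the notebook's dataset. The class label depends on the magnitude of one feature, so the class means are nearly identical and the ANOVA F-test has little to pick up, while mutual information should still flag the dependency.

```python
import numpy as np
from sklearn.feature_selection import f_classif, mutual_info_classif

# Synthetic data (hypothetical, for illustration only).
rng = np.random.default_rng(0)
n = 1000
x_nonlinear = rng.uniform(-1, 1, n)  # the class depends on |x|, not on the sign of x
x_noise = rng.uniform(-1, 1, n)      # unrelated to the class
y = (np.abs(x_nonlinear) > 0.5).astype(int)

X = np.column_stack([x_nonlinear, x_noise])

# Mutual information: non-negative, 0 means independent.
mi = mutual_info_classif(X, y, random_state=0)

# ANOVA F-test: compares class means, so it mostly sees linear shifts.
f_scores, p_values = f_classif(X, y)

print("mutual information:", mi)
print("F-scores:", f_scores)
print("p-values:", p_values)
```

Comparing the two printed score vectors side by side is the quickest way to see which features only one of the methods considers relevant.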

In [19]:
df = pd.read_csv('data.csv')

In [20]:
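# Drop the identifier columns 住院号 (hospital admission number) and CT号 (CT number),
# along with the row labelled 1.
df.drop(columns=["住院号", "CT号"], index=1, inplace=True)
df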

Unnamed: 0,窦-连合LCC,窦-连合RCC,窦-连合NCC,周长LCC,周长RCC,周长NCC,AV面积,Valsalva窦,AV-annulus,STJ,...,mPA面积,LPA近端直径,LPA近端面积,RPA近端直径,RPA近端面积,LPA远端直径,LPA远端面积,RPA远端直径,RPA远端面积,M_rate
0,11.453,10.933,10.843,13.450,12.87,13.03,103.021,11.260,7.044,8.519,...,70.930,5.8005,26.726,6.2190,29.408,5.3275,23.481,6.2260,28.217,2.200000
2,15.841,16.684,17.376,18.330,18.64,18.66,218.995,18.601,10.912,15.496,...,74.419,10.3965,79.154,6.9230,38.345,10.4315,85.120,9.1750,59.943,1.600000
3,12.851,15.529,14.140,14.060,21.15,13.20,171.261,14.534,10.363,10.906,...,59.150,7.2950,38.526,7.7135,45.081,8.4760,55.135,8.0475,49.607,2.300000
4,13.544,15.585,13.894,15.030,18.30,16.04,179.615,15.632,11.879,11.100,...,31.678,4.9165,20.290,5.3655,22.471,5.0575,23.188,5.5550,22.715,1.500000
5,14.807,14.079,14.253,14.650,21.55,12.85,177.990,14.243,9.348,10.739,...,93.060,10.1610,83.448,8.1320,60.321,10.2425,72.613,10.2495,82.881,3.400000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
95,38.022,33.406,35.830,49.940,39.79,46.07,1210.000,28.680,38.848,36.935,...,239.771,18.3445,300.189,18.6595,288.627,20.1660,336.092,21.2175,361.737,1.400000
96,15.572,16.821,15.630,18.530,22.30,18.63,228.308,14.045,17.075,13.601,...,32.095,5.5115,24.453,7.7345,43.181,5.9665,29.450,9.7500,73.396,2.000000
97,17.198,17.711,17.396,20.301,18.95,24.29,281.499,19.777,15.244,14.001,...,66.164,6.0490,48.913,7.4500,43.808,13.0615,129.879,6.4935,32.027,1.936252
98,15.160,16.009,13.796,15.520,13.54,13.50,179.746,14.711,11.568,12.291,...,10.076,1.1270,2.513,4.3415,14.275,2.3375,4.835,8.4390,55.165,0.500000


In [22]:
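## LabelEncoder.fit_transform() learns the sorted unique values of a column and
## replaces each value with its integer code (0 to n_unique - 1). This is label
## encoding, not scaling: it also turns the continuous target M_rate into discrete classes.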

for col in df.columns:
    le = LabelEncoder()
    df[col] = le.fit_transform(df[col])
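
As a small sketch of what the loop above does to a single numeric column, the example below encodes a handful of illustrative values (chosen here for demonstration, not read from `data.csv`). Each value is replaced by the position of that value in the sorted list of unique values, so the result preserves order but discards distances between values.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Illustrative values only; the real column comes from the DataFrame.
values = np.array([2.2, 1.6, 2.3, 1.5, 3.4, 1.6])

le = LabelEncoder()
codes = le.fit_transform(values)

print(le.classes_)  # sorted unique values: [1.5 1.6 2.2 2.3 3.4]
print(codes)        # integer codes:        [2 1 3 0 4 1]

# The mapping is reversible as long as the fitted encoder is kept.
restored = le.inverse_transform(codes)
```

Because the loop in the notebook creates a fresh encoder per column and does not store it, the original values cannot be recovered afterwards unless the encoders are kept, for example in a dictionary keyed by column name.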

In [27]:
mutual_info_classification = feature_selection.mutual_info_classif(df.drop('M_rate', axis=1), df.M_rate) # mutual_info_classif: discrete target

In [30]:
f_classif, pval_classif = feature_selection.f_classif(df.drop('M_rate', axis=1), df.M_rate) # ANOVA F-value and p-value per feature

In [31]:
f_regression, pval_regression = feature_selection.f_regression(df.drop('M_rate', axis=1), df.M_rate) # univariate linear regression F-value and p-value per feature

In [32]:
mutual_info_regression = feature_selection.mutual_info_regression(df.drop('M_rate', axis=1), df.M_rate) # mutual_info_regression: continuous target

In [39]:
d = {
    'features': ['窦-连合LCC', '窦-连合RCC', '窦-连合NCC', '周长LCC', '周长RCC', '周长NCC', 'AV面积',
       'Valsalva窦', 'AV-annulus', 'STJ', 'AO根部直径', 'AO根部面积', 'mPA直径', 'mPA面积',
       'LPA近端直径', 'LPA近端面积', 'RPA近端直径', 'RPA近端面积', 'LPA远端直径', 'LPA远端面积',
       'RPA远端直径', 'RPA远端面积'],
    'mic': mutual_info_classification,
    'f_classif': f_classif,
    'f_regression': f_regression,
    'mir': mutual_info_regression
}
table = pd.DataFrame(data = d)
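
A score table like the one above is easier to read once it is sorted. The sketch below rebuilds a similar table on synthetic data (the column names `informative`, `noise_a`, and `noise_b` are hypothetical) and takes the feature names from the DataFrame itself, which is one way to avoid keeping a hand-written list in sync with the columns.

```python
import numpy as np
import pandas as pd
from sklearn import feature_selection

# Synthetic stand-in for the real feature matrix (assumption for illustration).
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "informative": rng.normal(size=200),
    "noise_a": rng.normal(size=200),
    "noise_b": rng.normal(size=200),
})
y = (X["informative"] > 0).astype(int)  # target driven by one column only

f_scores, _ = feature_selection.f_classif(X, y)
mi_scores = feature_selection.mutual_info_classif(X, y, random_state=0)

# Feature names come from X.columns, so they always match the score arrays.
scores = pd.DataFrame(
    {"features": X.columns, "f_classif": f_scores, "mic": mi_scores}
)

# Rank features by mutual information, highest first.
scores = scores.sort_values("mic", ascending=False).reset_index(drop=True)
print(scores)
```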


In [49]:
# df.columns
# table

### 11. SelectFromModel
* Selects important features based on the weights of a fitted model
* The estimator must expose a feature_importances_ or coef_ attribute after fitting
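
A minimal sketch of this idea, assuming a random forest as the estimator and synthetic data in which only the first two columns influence the target (both are choices made here for illustration). `threshold="mean"` keeps the features whose importance is at least the mean importance.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

# Synthetic data (hypothetical): columns 0 and 1 drive the label, the rest are noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

selector = SelectFromModel(
    RandomForestClassifier(n_estimators=100, random_state=0),
    threshold="mean",  # keep features with importance >= mean importance
)
X_selected = selector.fit_transform(X, y)

support = selector.get_support()  # boolean mask over the original columns
importances = selector.estimator_.feature_importances_

print("kept columns:", np.flatnonzero(support))
print("importances:", importances)
```

Any estimator with `coef_` (for example a linear model) could be swapped in for the forest; the threshold then applies to the absolute coefficient values.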

### 12. Recursive Feature Elimination
* Uses an external estimator to assign weights to features
* First, the estimator is trained on the initial set of features and the importance of each feature is obtained either through a coef_ attribute or through a feature_importances_ attribute.
* Then, the least important features are pruned from the current set of features.
* This procedure is repeated recursively on the pruned set until the desired number of features is reached.
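
The steps above can be sketched with scikit-learn's `RFE` class. The logistic regression estimator, the synthetic data, and the choice of keeping two features are assumptions made for this example; `step=1` removes one feature per iteration.

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# Synthetic data (hypothetical): only columns 0 and 3 influence the label.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 6))
y = (2.0 * X[:, 0] - 1.5 * X[:, 3] > 0).astype(int)

# The estimator supplies coef_, which RFE uses to rank features each round.
rfe = RFE(
    estimator=LogisticRegression(max_iter=1000),
    n_features_to_select=2,
    step=1,  # prune one feature per iteration
)
rfe.fit(X, y)

print("selected mask:", rfe.support_)
print("ranking (1 = selected):", rfe.ranking_)

# Reduce the matrix to the selected columns.
X_reduced = rfe.transform(X)
```

Features eliminated earlier receive a higher number in `ranking_`, so the ranking also records the order in which features were pruned.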