# Feature Selection

Most of us have faced the problem of identifying the important features in a dataset and removing the irrelevant or less important ones, which contribute little to the model's decisions, in order to improve the model's accuracy.

In machine learning and statistics, feature selection, also known as **variable selection**, **attribute selection** or **variable subset selection**, is the process of reducing the number of input variables when developing a predictive model. Feature selection techniques are used for several reasons:

* It reduces model complexity by dropping irrelevant features.
* It helps the ML algorithm train a model faster.
* Reducing dimensionality helps avoid overfitting.

In this notebook I will discuss three common feature selection techniques that are easy to implement and, depending on the problem, can give good results:

1. **Univariate Selection**
2. **Feature Importance** 
3. **Correlation Matrix with Heatmap**



Before discussing these three techniques, let us go through the basic methodologies used for feature selection.

### 1. Filter Method:
Filter feature selection methods use statistical techniques to evaluate the relationship **between each input variable and the target variable**, and these scores are used as the basis to choose (filter) those input variables that will be used in the model.

The statistical measures used in filter-based feature selection are generally calculated one input variable at a time with the target variable. As such, they are referred to as **univariate statistical measures**. This may mean that any interaction between input variables is not considered in the filtering process.
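
To make the last point concrete, here is a small synthetic sketch (the XOR-style data is made up for illustration and is not from any dataset in this notebook): two binary features that are individually uncorrelated with the target, yet together determine it exactly. A univariate filter that scores each feature against the target on its own would rank both near zero.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Synthetic XOR data (illustrative only): y depends on a and b *jointly*
rng = np.random.default_rng(0)
a = rng.integers(0, 2, size=1000)
b = rng.integers(0, 2, size=1000)
y = a ^ b  # target is 1 exactly when a and b differ

# Univariate view: correlation of each feature with the target taken alone
corr_a = np.corrcoef(a, y)[0, 1]
corr_b = np.corrcoef(b, y)[0, 1]
print(f"corr(a, y) = {corr_a:.3f}, corr(b, y) = {corr_b:.3f}")  # both close to 0

# Multivariate view: a model that sees both features at once fits y perfectly
X = np.column_stack([a, b])
tree = DecisionTreeClassifier(random_state=0).fit(X, y)
print("training accuracy using both features:", tree.score(X, y))
```

A filter method would likely discard both `a` and `b` here, while a wrapper or embedded method that evaluates them together would keep them.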

**Note:** Because each predictor is scored independently, correlated predictors can all receive high scores, so important but redundant predictors may be selected together. The consequences are that too many predictors are chosen and, as a result, collinearity problems arise.


<img src = "https://www.analyticsindiamag.com/wp-content/uploads/2019/04/filte.jpg">
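
The redundancy problem from the note above can be reproduced in a few lines. This is a sketch on synthetic data (assumed for illustration), using scikit-learn's `SelectKBest` with the ANOVA F-test `f_classif` as the univariate filter score: column 1 is a near-copy of column 0, and the filter happily keeps both.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif

rng = np.random.default_rng(0)
n = 500
signal = rng.normal(size=n)
y = (signal > 0).astype(int)  # binary target driven by `signal`

X = np.column_stack([
    signal + 0.1 * rng.normal(size=n),  # col 0: informative
    signal + 0.1 * rng.normal(size=n),  # col 1: near-duplicate of col 0 (redundant)
    rng.normal(size=n),                 # col 2: pure noise
    rng.normal(size=n),                 # col 3: pure noise
])

# Keep the 2 features with the highest univariate F-scores
selector = SelectKBest(f_classif, k=2).fit(X, y)
print("selected mask:", selector.get_support())
print("corr(col 0, col 1):", round(np.corrcoef(X[:, 0], X[:, 1])[0, 1], 3))
```

Both selected columns carry essentially the same information, so the second slot is "wasted" on a redundant predictor.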


### 2. Wrapper Method:
Wrapper feature selection methods create many models with different subsets of input features and select those features that result in the best performing model according to a performance metric. These methods are unconcerned with the variable types, although they can be computationally expensive. Recursive Feature Elimination (RFE) is a good example of a wrapper feature selection method.

<img src = "https://www.analyticsvidhya.com/wp-content/uploads/2016/11/Wrapper_1.png">
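
In its most literal form, the wrapper idea is: fit the same model on every candidate subset and keep the subset with the best cross-validated score. The sketch below does exactly that on scikit-learn's built-in iris data (an assumed stand-in dataset) with a logistic regression model. It is only feasible because iris has 4 features (15 non-empty subsets); the count grows as 2^n - 1, which is why greedy searches such as forward selection, backward elimination and RFE are used in practice.

```python
from itertools import combinations

import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
n_features = X.shape[1]

best_score, best_subset = -np.inf, None
# Exhaustive wrapper search: evaluate every non-empty subset of columns
for k in range(1, n_features + 1):
    for subset in combinations(range(n_features), k):
        model = LogisticRegression(max_iter=1000)
        score = cross_val_score(model, X[:, list(subset)], y, cv=5).mean()
        if score > best_score:
            best_score, best_subset = score, subset

print("best subset (column indices):", best_subset)
print("mean CV accuracy:", round(best_score, 3))
```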

1. __Forward Selection__: Forward selection is an iterative method in which we start with no features in the model. In each iteration, we add the feature that best improves the model, until adding a new variable no longer improves the model's performance.

2. __Backward Elimination__: In backward elimination, we start with all the features and, at each iteration, remove the least significant feature, i.e. the one whose removal most improves the model's performance. We repeat this until removing features no longer improves performance.

3. __Recursive Feature Elimination__: It is a greedy optimization algorithm that aims to find the best-performing feature subset. It repeatedly builds models and sets aside the best- or worst-performing feature at each iteration, then constructs the next model with the remaining features until all features are exhausted. Finally, it ranks the features based on the order of their elimination.
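
All three searches are available in scikit-learn: `SequentialFeatureSelector` covers forward selection and backward elimination via its `direction` parameter, and `RFE` implements recursive feature elimination. The sketch below assumes the built-in iris data, a k-nearest-neighbors model for the sequential searches and a logistic regression for RFE (RFE needs a model exposing `coef_` or `feature_importances_`). Note one difference from the descriptions above: here the searches stop at a fixed number of features (2) rather than at "no further improvement"; recent scikit-learn versions also accept `n_features_to_select="auto"` together with a `tol` for improvement-based stopping.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import RFE, SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True, as_frame=True)
knn = KNeighborsClassifier(n_neighbors=3)

# Forward selection: start empty, greedily add the feature that best improves CV score
forward = SequentialFeatureSelector(
    knn, n_features_to_select=2, direction="forward", cv=5
).fit(X, y)

# Backward elimination: start with all features, greedily drop the least useful one
backward = SequentialFeatureSelector(
    knn, n_features_to_select=2, direction="backward", cv=5
).fit(X, y)

# RFE: refit the model, drop the feature with the smallest coefficient magnitude, repeat
rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=2).fit(X, y)

print("forward :", list(X.columns[forward.get_support()]))
print("backward:", list(X.columns[backward.get_support()]))
print("RFE ranking (1 = kept):", dict(zip(X.columns, rfe.ranking_)))
```

The forward and backward searches are not guaranteed to agree, since each makes greedy choices from a different starting point.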

### 3. Embedded Method:
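Embedded methods combine the qualities of filter and wrapper methods: feature selection is performed as part of model training itself. Common examples are linear models with an L1 (LASSO) penalty, which shrink the coefficients of unhelpful features to exactly zero, and tree-based models, whose splits implicitly favor informative features.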

<img src = "https://www.analyticsvidhya.com/wp-content/uploads/2016/11/Embedded_1.png">
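
A minimal embedded-method sketch, assuming synthetic regression data from `make_regression` (10 features, only 3 informative) and an L1-penalized `Lasso` model with an arbitrarily chosen `alpha=1.0`. `SelectFromModel` keeps the features whose fitted coefficients are not (numerically) zero; for L1-penalized estimators its default threshold is very small, so in effect it keeps the non-zero coefficients.

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso

# Synthetic data (illustrative only): 10 features, 3 of which drive the target
X, y = make_regression(
    n_samples=200, n_features=10, n_informative=3, noise=1.0, random_state=0
)

# The L1 penalty pushes coefficients of unhelpful features to exactly zero during training
selector = SelectFromModel(Lasso(alpha=1.0)).fit(X, y)

print("Lasso coefficients:", selector.estimator_.coef_.round(2))
print("kept feature indices:", selector.get_support(indices=True))

# Reduced design matrix containing only the kept columns
X_reduced = selector.transform(X)
print("reduced shape:", X_reduced.shape)
```

The selection here is a by-product of fitting a single model, which is what distinguishes embedded methods from wrappers that refit many models.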

# SUMMARY

<p>We can summarize feature selection as follows.</p>
<ul>
<li><strong>Feature Selection</strong>: Select a subset of input features from the dataset.
<ul>
<li><strong>Unsupervised</strong>: Do not use the target variable when selecting input variables (e.g. remove redundant variables).
<ul>
<li>Correlation</li>
</ul>
</li>
<li><strong>Supervised</strong>: Use the target variable (e.g. remove irrelevant input features).
<ul>
<li><strong>Wrapper Method</strong>: Search for well-performing subsets of features.
<ul>
<li>Recursive Feature Elimination (RFE)</li>
</ul>
</li>
<li><strong>Filter Method</strong>: Select subsets of features based on their relationship with the target.
<ul>
<li>Statistical Methods</li>
<li>Feature Importance Methods</li>
</ul>
</li>
<li><strong>Intrinsic</strong>: Algorithms that perform automatic feature selection during training.
<ul>
<li>Decision Trees</li>
</ul>
</li>
</ul>
</li>
</ul>
</li>
<li><strong>Dimensionality Reduction</strong>: Project input data into a lower-dimensional feature space.</li>
</ul>
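
The last item differs from feature selection in an important way: selection keeps a subset of the original columns, whereas dimensionality reduction builds new columns. The sketch below assumes PCA from scikit-learn on the built-in iris data; each principal component is a weighted combination of all four original features, so the reduced data no longer has directly interpretable column names.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, y = load_iris(return_X_y=True, as_frame=True)

# Project the 4 original features onto 2 new axes (principal components)
pca = PCA(n_components=2).fit(X)
X_projected = pca.transform(X)

print("projected shape:", X_projected.shape)
# Each row of components_ holds the weights of ALL original features in one new axis
print("component weights:\n", pca.components_.round(2))
print("explained variance ratio:", pca.explained_variance_ratio_.round(3))
```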


**For a cheat sheet of feature selection techniques in ML, see the following link:**
https://www.kaggle.com/getting-started/186082


# Univariate Selection

* Statistical tests can be used to select those features that have the strongest relationship with the output variable.

* The scikit-learn library provides the SelectKBest class that can be used with a suite of different statistical tests to select a specific number of features.

* The example below uses the chi-squared (chi²) statistical test, which requires non-negative features, to select the k = 10 best features from the Mobile Price Range Prediction Dataset.
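
To see what the chi² score actually measures, the sketch below recomputes it by hand and compares it with scikit-learn's `chi2`. For each feature, scikit-learn sums the feature's values within each class ("observed"), compares them with what we would expect if the feature were independent of the class (overall feature total times class frequency), and adds up (observed - expected)² / expected over the classes. The built-in iris data is assumed here purely because it is small and non-negative.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import chi2

X, y = load_iris(return_X_y=True)
classes = np.unique(y)

# One-hot encode the target: shape (n_samples, n_classes)
Y = (y[:, None] == classes[None, :]).astype(float)

# Observed: per-class sum of each feature, shape (n_classes, n_features)
observed = Y.T @ X
# Expected under independence: class frequency x total of each feature
expected = Y.mean(axis=0)[:, None] * X.sum(axis=0)[None, :]

manual_scores = ((observed - expected) ** 2 / expected).sum(axis=0)
sklearn_scores, p_values = chi2(X, y)

print("manual :", manual_scores.round(2))
print("sklearn:", sklearn_scores.round(2))
```

A large score means the feature's mass is distributed across classes very differently from what independence would predict, which is why higher scores indicate more relevant features.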

In [None]:
import pandas as pd
import numpy as np
from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import chi2

mobile_data = pd.read_csv("../input/mobile-price-classification/train.csv")

X = mobile_data.iloc[:,0:20]  #independent variables
y = mobile_data.iloc[:,-1]    #target variable i.e price range

In [None]:
mobile_data.head()

In [None]:
#apply SelectKBest class to extract top 10 best features

BestFeatures = SelectKBest(score_func=chi2, k=10)
fit = BestFeatures.fit(X,y)


In [None]:
df_scores = pd.DataFrame(fit.scores_)
df_columns = pd.DataFrame(X.columns)

In [None]:
#concatenating two dataframes for better visualization

f_Scores = pd.concat([df_columns,df_scores],axis=1)               # feature scores
f_Scores.columns = ['Specs','Score']  

In [None]:
f_Scores                # a higher chi² score means a stronger dependence between the feature and the target

In [None]:
print(f_Scores.nlargest(10,'Score'))       # print 10 best features in descending order

### Note: From the scores above we can conclude that 'ram' is the most important feature, which agrees with what we would expect in practice.
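
The cells above only print scores; to actually keep the selected columns, the fitted selector's boolean mask from `get_support()` can be applied to the column names (for the mobile data that would be `X.columns[fit.get_support()]`). The sketch below assumes the built-in iris data as a small, self-contained, non-negative stand-in and keeps the 2 best features.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2

# Iris is used only as a small, non-negative stand-in for the mobile dataset
X, y = load_iris(return_X_y=True, as_frame=True)

selector = SelectKBest(score_func=chi2, k=2).fit(X, y)

# Boolean mask over the original columns -> names of the k best features
selected_columns = X.columns[selector.get_support()]
print(list(selected_columns))

# Keep only those columns as a DataFrame, so the column names survive
X_selected = X[selected_columns]
print(X_selected.shape)
```

Indexing the DataFrame with the selected names, instead of calling `selector.transform(X)`, keeps the result as a labelled DataFrame.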

# Feature Importance
* You can get the importance of each feature in your dataset from the model's feature importance property.

* Feature importance gives you a score for each feature of your data; the higher the score, the more important or relevant the feature is to your output variable.

* Feature importance is a built-in attribute (`feature_importances_`) of tree-based classifiers. In this example I will use XGBoost's gradient-boosted tree classifier (`XGBClassifier`) to extract the top 10 features of the dataset.

In [None]:
import xgboost
import matplotlib.pyplot as plt

model = xgboost.XGBClassifier()
model.fit(X,y)

In [None]:
print(model.feature_importances_) 

In [None]:
# plot the top 10 feature importances for better visualization
# (create the figure first so that figsize applies to this plot)
plt.figure(figsize=(8,6))
feat_imp = pd.Series(model.feature_importances_, index=X.columns)
feat_imp.nlargest(10).plot(kind='barh')
plt.show()
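
If XGBoost is not installed, scikit-learn's tree ensembles expose the same `feature_importances_` attribute. The sketch below assumes a `RandomForestClassifier` on the built-in iris data; its impurity-based importances are non-negative and normalized to sum to 1, so each value can be read as a share of the model's total split improvement.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True, as_frame=True)

forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# One importance score per column, sorted from most to least important
importances = pd.Series(forest.feature_importances_, index=X.columns)
importances = importances.sort_values(ascending=False)

print(importances)
print("sum of importances:", round(importances.sum(), 6))
```

Impurity-based importances can favor features with many distinct values, so it is worth cross-checking the ranking with another method such as the univariate scores above.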

# Correlation Matrix with Heatmap

* Correlation states how the features are related to each other or to the target variable.

* Correlation can be positive (an increase in a feature's value is associated with an increase in the target variable) or negative (an increase in a feature's value is associated with a decrease in the target variable).

* A heatmap makes it easy to identify which features are most related to the target variable; we will plot a heatmap of the correlation matrix using the seaborn library.
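
The sign convention can be checked on a tiny hand-made DataFrame (values assumed purely for illustration): a feature that rises with the target gets a correlation of +1, and one that falls as the target rises gets -1. `DataFrame.corr()` computes the Pearson correlation by default, the same quantity shown in the heatmap.

```python
import numpy as np
import pandas as pd

# Tiny hand-made frame (illustrative values only)
x = np.arange(10, dtype=float)
df = pd.DataFrame({
    "feature_up": x,               # increases together with the target
    "feature_down": 100 - 3 * x,   # decreases as the target increases
    "target": 2 * x + 1,
})

# Pearson correlation of every column with the target, sorted
print(df.corr()["target"].sort_values())
```

For the mobile data, the equivalent one-liner would be `mobile_data.corr()["price_range"].sort_values()`.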

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt

# get the pairwise correlations of all columns in the dataset (including the target)
corrmat = mobile_data.corr()

plt.figure(figsize=(20,20))

# plot the heatmap of the correlation matrix
g = sns.heatmap(corrmat, annot=True, cmap="RdYlGn")

### Note: From the correlation plot above we can conclude that 'price_range' and 'ram' are highly correlated, which matches real-world experience: as a phone's RAM increases, its price also tends to increase.
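
The same correlation matrix can also drive the unsupervised step from the summary: removing redundant features that are highly correlated with each other. The sketch below is a common recipe, not the only one; it assumes synthetic data and an arbitrary threshold of 0.9. It looks only at the upper triangle of the absolute correlation matrix, so each pair is checked once, and drops one column from every pair above the threshold. On the mobile data, the target would be excluded first (e.g. `mobile_data.drop(columns="price_range")`), since a feature strongly correlated with the target is exactly what we want to keep.

```python
import numpy as np
import pandas as pd

# Synthetic frame (illustrative only): "b" is almost a rescaled copy of "a", "c" is unrelated
rng = np.random.default_rng(0)
a = rng.normal(size=200)
df = pd.DataFrame({
    "a": a,
    "b": 2 * a + 0.05 * rng.normal(size=200),
    "c": rng.normal(size=200),
})

threshold = 0.9  # assumed cut-off; tune it for your data

# Absolute pairwise correlations, keeping only the upper triangle (each pair once)
corr = df.corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))

# Drop every column that is highly correlated with some column to its left
to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
print("dropping:", to_drop)

df_reduced = df.drop(columns=to_drop)
print("remaining columns:", list(df_reduced.columns))
```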