**Feature Selection** is one of the core concepts in machine learning, and it has a large impact on model performance. It is the process of automatically selecting the features that contribute most to predicting the output.

Irrelevant features in your data can decrease the accuracy of many models, especially linear algorithms like linear regression or logistic regression.

**There are 3 benefits of using feature selection before modelling:**
- Reduces overfitting
- Improves accuracy
- Reduces training time

Let's get started with feature selection methods. We will look at six of the most commonly used ones.

In [None]:
#import libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

In [None]:
#load the data
data = pd.read_csv('/kaggle/input/mobile-price-classification/train.csv')
df = data.copy()  #create a copy
df.shape

In [None]:
df.head()

* In this problem we have to predict the price_range of mobile phones from their features. We will use several feature selection methods to pick the top 10 features.

In [None]:
from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import chi2

## 1) Univariate Selection
Statistical tests can be used to select those features that have the strongest relationship with the output variable.

The scikit-learn library provides the SelectKBest class that can be used with a suite of different statistical tests to select a specific number of features.

The example below uses the chi-squared (chi²) statistical test for non-negative features to select 10 of the best features from the Mobile Price Range Prediction Dataset.

In [None]:
x = data.iloc[:, :-1]
y = data.iloc[:, -1]

In [None]:
#apply selectKBest to select top 10 features
best_features = SelectKBest(score_func=chi2, k=10)
fit = best_features.fit(x, y)

dfscores = pd.DataFrame(fit.scores_)
dfcolumns = pd.DataFrame(x.columns)

#concat 2 dataFrame for better visualization
feature_score = pd.concat([dfcolumns, dfscores], axis=1)
feature_score.columns = ['Features', 'Score']
print(feature_score.nlargest(10, 'Score'))

* Many different statistical tests can be used with this selection method. For example, the ANOVA F-value is appropriate for numerical inputs and a categorical target. It is available via the f_classif() function.
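As a rough sketch of the ANOVA alternative mentioned above, the snippet below swaps `chi2` for `f_classif` in `SelectKBest`. It runs on synthetic data from `make_classification` rather than the mobile price dataset, so the frame `X_demo`, the column names `f0`..`f7`, and `k=4` are illustrative assumptions only.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

# Synthetic stand-in for the notebook's data: 8 numeric features, 4 classes.
X_arr, y_demo = make_classification(
    n_samples=300, n_features=8, n_informative=4, n_redundant=0,
    n_classes=4, n_clusters_per_class=1, random_state=0,
)
X_demo = pd.DataFrame(X_arr, columns=[f"f{i}" for i in range(8)])

# Score every feature with the ANOVA F-test and keep the 4 highest.
selector = SelectKBest(score_func=f_classif, k=4).fit(X_demo, y_demo)

anova_scores = pd.DataFrame({
    "Features": X_demo.columns,
    "Score": selector.scores_,
    "p_value": selector.pvalues_,
})
top_anova = anova_scores.nlargest(4, "Score")
print(top_anova)
```

Unlike `chi2`, `f_classif` does not require non-negative inputs, which is why the synthetic features here can take negative values.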

## 2) Feature Importance
We can get the importance of each feature from the feature importance property of a fitted model. It gives a score for each feature in the data: the higher the score, the more relevant the feature is for predicting the output.

Bagged decision trees like Random Forest and Extra Trees can be used to estimate the importance of features.

We will use the Extra Trees classifier to extract the top 10 features of the dataset.

In [None]:
from sklearn.ensemble import ExtraTreesClassifier

model = ExtraTreesClassifier()
model.fit(x, y)

In [None]:
feat_importance = pd.Series(model.feature_importances_, index=x.columns)
feat_importance.nlargest(10).plot(kind='barh')
plt.show()
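Since Random Forest is mentioned above as an alternative, here is a minimal sketch of the same idea with `RandomForestClassifier`. It uses synthetic data (an assumption, not the mobile price dataset) and fixes `random_state`, because tree ensembles are randomized and the importances otherwise change from run to run.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data: 8 features, of which 3 carry signal.
X_arr, y_demo = make_classification(
    n_samples=300, n_features=8, n_informative=3, n_redundant=0,
    random_state=0,
)
X_demo = pd.DataFrame(X_arr, columns=[f"f{i}" for i in range(8)])

# Fit the forest with a fixed seed so the ranking is reproducible.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_demo, y_demo)

# Impurity-based importances, sorted from most to least important.
rf_importance = pd.Series(
    forest.feature_importances_, index=X_demo.columns
).sort_values(ascending=False)
print(rf_importance.head(3))
```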

## 3) Correlation Matrix using HeatMap

Correlation states how features are related to each other or to the target variable. It can be positive (the two variables increase together) or negative (one decreases as the other increases).

A heatmap makes it easy to identify which features are highly correlated with each other.

In [None]:
corr_mat = df.corr()

In [None]:
plt.figure(figsize=(20,14))
sns.heatmap(corr_mat, annot=True, cmap='RdYlGn')
plt.show()
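With many columns the heatmap gets crowded, so it can help to rank features by their correlation with the target directly. The sketch below does that on a small synthetic frame; the column names (`strong_pos`, `strong_neg`, `noise`, `target`) are made up for illustration, and on the notebook's data the target column would be `price_range`.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 200
target = rng.integers(0, 4, size=n)  # 4 classes, like a price range

demo = pd.DataFrame({
    "strong_pos": target + rng.normal(0, 0.3, n),   # rises with the target
    "strong_neg": -target + rng.normal(0, 0.3, n),  # falls as the target rises
    "noise": rng.normal(0, 1, n),                   # unrelated to the target
    "target": target,
})

# Correlation of every feature with the target, ordered by absolute value.
target_corr = demo.corr()["target"].drop("target")
ranked = target_corr.reindex(
    target_corr.abs().sort_values(ascending=False).index
)
print(ranked)
```

Sorting by absolute value keeps strongly negative features near the top, since they are as informative as strongly positive ones.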

### How to remove the correlated features?
We can remove highly correlated features by applying a threshold to the absolute correlation coefficient.

In [None]:
threshold = 0.8

# find correlated features: for each pair above the threshold, flag the later column
def correlation(dataset, threshold):
    col_corr = set()             # Set of all the names of correlated columns
    corr_matrix = dataset.corr()
    for i in range(len(corr_matrix.columns)):
        for j in range(i):
            if abs(corr_matrix.iloc[i, j]) > threshold: # we are interested in absolute coeff value
                colname = corr_matrix.columns[i]  # getting the name of column
                col_corr.add(colname)
    return col_corr

In [None]:
correlation(df.iloc[:, :-1], threshold)
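The function above only returns the names of the correlated columns; it does not remove them. A minimal sketch of the removal step follows, using a small synthetic frame (the columns `a`, `a_copy`, `b` and the helper name `correlated_columns` are illustrative assumptions).

```python
import numpy as np
import pandas as pd

def correlated_columns(dataset, threshold):
    """Return columns whose |correlation| with an earlier column exceeds threshold."""
    corr_matrix = dataset.corr().abs()
    to_drop = set()
    for i in range(len(corr_matrix.columns)):
        for j in range(i):
            if corr_matrix.iloc[i, j] > threshold:
                to_drop.add(corr_matrix.columns[i])
    return to_drop

rng = np.random.default_rng(0)
a = rng.normal(size=200)
demo = pd.DataFrame({
    "a": a,
    "a_copy": a * 2 + rng.normal(scale=0.01, size=200),  # nearly a duplicate of "a"
    "b": rng.normal(size=200),                           # independent of "a"
})

to_drop = correlated_columns(demo, threshold=0.8)
reduced = demo.drop(columns=sorted(to_drop))  # keep one column from each correlated pair
print(to_drop)
print(reduced.columns.tolist())
```

Only the later column of each correlated pair is flagged, so one representative of every correlated group stays in the data.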

## 4) Information Gain
We can select the most important features using the information gain (mutual information) between each feature and the target.

In [None]:
from sklearn.feature_selection import mutual_info_classif

In [None]:
mutual_info = mutual_info_classif(x,y)

In [None]:
mutual_data = pd.Series(mutual_info,index=x.columns)
mutual_data.sort_values(ascending=False)
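To make the score less of a black box, here is a small worked example of mutual information for discrete variables, computed as the sum over all value pairs of p(x, y) * log(p(x, y) / (p(x) * p(y))). The helper `discrete_mutual_info` is a hypothetical illustration, not part of scikit-learn, and `mutual_info_classif` uses a nearest-neighbour estimator for continuous features, so its numbers will not match this plug-in estimate exactly.

```python
import numpy as np

def discrete_mutual_info(x, y):
    """Plug-in estimate of I(X;Y) in nats for two discrete arrays."""
    x = np.asarray(x)
    y = np.asarray(y)
    mi = 0.0
    for xv in np.unique(x):
        for yv in np.unique(y):
            p_xy = np.mean((x == xv) & (y == yv))  # joint probability
            if p_xy > 0:
                p_x = np.mean(x == xv)             # marginal of x
                p_y = np.mean(y == yv)             # marginal of y
                mi += p_xy * np.log(p_xy / (p_x * p_y))
    return mi

y_demo = np.array([0, 0, 1, 1, 0, 0, 1, 1])
informative = y_demo.copy()                           # determines the target exactly
uninformative = np.array([0, 1, 0, 1, 0, 1, 0, 1])    # independent of the target

mi_informative = discrete_mutual_info(informative, y_demo)
mi_uninformative = discrete_mutual_info(uninformative, y_demo)
print(mi_informative, mi_uninformative)
```

A feature that fully determines a balanced binary target scores log(2), about 0.693 nats, while an independent feature scores 0.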

## 5) Recursive Feature Elimination
Recursive Feature Elimination (RFE) works by recursively removing attributes and building a model on the attributes that remain.

It uses the fitted model's coefficients or feature importances to rank the attributes and identify which ones contribute the most to predicting the target attribute.

The example below uses RFE with the logistic regression algorithm to select the top 10 features. The choice of algorithm does not matter too much as long as it is skillful and consistent.

In [None]:
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
import warnings
warnings.filterwarnings("ignore")

In [None]:
model = LogisticRegression(solver='lbfgs')
rfe = RFE(model, n_features_to_select=10)  # must be passed by keyword in recent scikit-learn versions
fit = rfe.fit(x,y)

print("Num Features: %d" % fit.n_features_)
print("Selected Features: %s" % fit.support_)
print("Feature Ranking: %s" % fit.ranking_)
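The boolean mask in `support_` is hard to read on its own, so one option is to map it back to column names. The sketch below does this on synthetic data with 8 features and `n_features_to_select=3`; those sizes and the names `X_demo`, `rfe_demo` are assumptions chosen to keep the example small.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data: 8 features, of which 3 carry signal.
X_arr, y_demo = make_classification(
    n_samples=300, n_features=8, n_informative=3, n_redundant=0,
    random_state=0,
)
X_demo = pd.DataFrame(X_arr, columns=[f"f{i}" for i in range(8)])

# Eliminate one feature per round until 3 remain.
rfe_demo = RFE(LogisticRegression(max_iter=1000), n_features_to_select=3)
rfe_demo.fit(X_demo, y_demo)

# support_ is a boolean mask over columns; ranking_ is 1 for kept features.
selected = X_demo.columns[rfe_demo.support_].tolist()
ranking = pd.Series(rfe_demo.ranking_, index=X_demo.columns).sort_values()
print(selected)
print(ranking)
```

Features with rank 1 are the selected ones; higher ranks were eliminated earlier.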

## 6) Principal Component Analysis
Principal Component Analysis (PCA) uses linear algebra to transform the dataset into a compressed form. Unlike the methods above, it does not pick a subset of the original features; it builds new features as linear combinations of them.

Generally this is called a data reduction technique. A property of PCA is that you can choose the number of dimensions, or principal components, in the transformed result.

In [None]:
from sklearn.decomposition import PCA
pca = PCA(n_components = 10)
pca_fit = pca.fit(x)

In [None]:
# summarize components
print("Explained Variance: %s" % pca_fit.explained_variance_ratio_)
print(pca_fit.components_)
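One common way to choose the number of components is to look at the cumulative explained variance. The sketch below does this on synthetic data and standardizes the features first, since PCA is variance-based and features on larger scales would otherwise dominate; the 95% cutoff and the names `X_scaled`, `pca_demo` are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data with some redundant (linearly dependent) features.
X_arr, _ = make_classification(
    n_samples=300, n_features=8, n_informative=4, n_redundant=2,
    random_state=0,
)

# Put every feature on the same scale before PCA.
X_scaled = StandardScaler().fit_transform(X_arr)

# Keep all components so the cumulative ratio reaches 1.
pca_demo = PCA(n_components=8).fit(X_scaled)
cumulative = np.cumsum(pca_demo.explained_variance_ratio_)

# Smallest number of components that explains at least 95% of the variance.
n_needed = int(np.argmax(cumulative >= 0.95)) + 1
print(cumulative.round(3))
print(n_needed)
```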