# Feature Selection
---
Feature selection is the process of choosing the features that hold the most predictive power for the target. Removing unnecessary features reduces model complexity and the computational resources required for training and inference, which makes feature selection an important step for model efficiency in production settings.

The methods of feature selection we will perform are:
* Filter Methods
    * Correlation
    * Univariate Feature Selection
* Wrapper Methods
    * Forward Selection
    * Backward Selection
    * Recursive Feature Elimination
* Embedded Methods
    * Feature Importance (Tree-based)
    * L1 Regularization
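
As a rough orientation before working with the real data, the three families can be sketched on a small synthetic dataset. This is a minimal illustration, not the notebook's actual pipeline: the data from `make_classification`, the choice of `LogisticRegression` as the wrapped estimator, the choice of three features to keep, and the variable names are all assumptions made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE, SelectFromModel, SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data (an assumption; this is not the Census Income dataset)
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=3, n_redundant=0, random_state=0
)

# Filter: score each feature independently with the ANOVA F-test, keep the top 3
filter_mask = SelectKBest(score_func=f_classif, k=3).fit(X, y).get_support()

# Wrapper: recursive feature elimination, repeatedly refitting a model
# and dropping the weakest feature until 3 remain
wrapper_mask = RFE(
    LogisticRegression(max_iter=1000), n_features_to_select=3
).fit(X, y).get_support()

# Embedded: keep features whose random-forest importance is at or above the mean
embedded_mask = SelectFromModel(
    RandomForestClassifier(n_estimators=100, random_state=0), threshold="mean"
).fit(X, y).get_support()

print("filter:  ", filter_mask.nonzero()[0])
print("wrapper: ", wrapper_mask.nonzero()[0])
print("embedded:", embedded_mask.nonzero()[0])
```

Each selector returns a boolean mask over the columns, so the three approaches can be compared directly. Filter methods never fit a predictive model, wrapper methods refit one many times, and embedded methods read the selection off a single fitted model.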

In this notebook, we will demonstrate the feature selection methods above on the [Census Income](https://archive.ics.uci.edu/ml/datasets/Census-Income+%28KDD%29) dataset from the UCI repository. The dataset contains both numerical and categorical features, and the goal is to predict whether a person's salary is greater than or equal to $50k.

# Imports

In [None]:
# for data processing and manipulation
import pandas as pd
import numpy as np

# scikit-learn modules for feature selection and model evaluation
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE, SelectKBest, SelectFromModel, SequentialFeatureSelector, chi2, f_classif
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, roc_auc_score, precision_score, recall_score, f1_score
from sklearn.preprocessing import StandardScaler, MinMaxScaler

# libraries for visualization
import seaborn as sns
import matplotlib
import matplotlib.pyplot as plt