# Feature Selection Techniques
This notebook covers several feature selection techniques and when to use each one, which can help you decide which are worth applying in your own projects. The techniques covered here are:
1. Filter based methods
2. Wrapper based methods
3. Embedded methods
4. Feature importance
5. Recursive Feature Elimination
6. Principal Component Analysis

Please also refer to https://scikit-learn.org/1.5/modules/feature_selection.html for more built-in feature selection techniques in scikit-learn.

The first step in any machine learning project is to understand the data, including the features and how much each one matters. Feature selection is the process of choosing a subset of relevant features to train the models on. The four main reasons for using feature selection techniques are:
1. Simpler models that are easier to explain
2. Shorter training times
3. Lower risk of overfitting caused by the curse of dimensionality
4. Better predictive accuracy

## Filter based methods

In [1]:
# missing value ratio
import pandas as pd
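def missing_value_ratio(df: pd.DataFrame, threshold: float) -> pd.DataFrame:
    """
    Return a copy of df without the columns whose percentage of missing
    values is greater than threshold (a percentage between 0 and 100).
    """
    # Number of missing values in each column
    missing = df.isnull().sum()
    # Convert the counts to a percentage of the total number of rows
    missing_ratio = (missing / df.shape[0]) * 100
    # Columns whose missing percentage exceeds the threshold
    columns_to_drop = missing_ratio[missing_ratio > threshold].index
    return df.drop(columns=columns_to_drop)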
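
To make the missing value ratio filter concrete, here is a minimal sketch on a toy DataFrame. The data, the column names `a`, `b`, `c`, and the 50% cutoff are all made up for illustration. It computes the same per-column percentage in a compact form using `isna().mean()` and keeps only the columns at or below the threshold, mirroring the "drop if greater than threshold" rule in the function above.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical): 'b' is 75% missing, 'c' is 25% missing, 'a' is complete.
toy = pd.DataFrame({
    "a": [1, 2, 3, 4],
    "b": [np.nan, np.nan, np.nan, 1.0],
    "c": [1.0, np.nan, 3.0, 4.0],
})

# isna().mean() is the fraction of missing values per column; scale it to a percentage.
missing_pct = toy.isna().mean() * 100

threshold = 50.0  # assumed cutoff, expressed as a percentage

# Keep the columns whose missing percentage does not exceed the threshold.
kept = toy.loc[:, missing_pct <= threshold]

print(missing_pct.to_dict())
print(list(kept.columns))
```

Note that the threshold is on a 0 to 100 scale, not 0 to 1. A sensible cutoff depends on the dataset and on whether the missing values can be imputed instead.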