# Feature Selection

This chapter introduces feature selection methods.

## Preamble

In [None]:
import data_science_learning_paths
data_science_learning_paths.setup_plot_style()

In [None]:
import pandas
import seaborn
import matplotlib.pyplot as plt
import numpy

## Why Feature Selection?

There are a number of reasons for explicitly limiting the number of features used for a model:

- **generalization/predictive performance**: Avoid overfitting due to a large number of features.
- **interpretability**: A model that relies on few features can be easier to explain and interpret.
- **saving resources**: Fewer input features can result in significantly shorter training times and less memory use, depending on the learning algorithm.
- **avoiding the curse of dimensionality**: The [curse of dimensionality](https://en.m.wikipedia.org/wiki/Curse_of_dimensionality) refers to problems resulting from high dimensionality of the data. Because the volume of the attribute space grows exponentially with the number of dimensions/features, a fixed amount of data quickly becomes sparse, making it difficult to detect structure or obtain statistically significant results.
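
The sparsity effect from the last point can be made tangible with a small simulation. The following is a minimal sketch that is not part of the original notebook: it draws uniform random points in the hypercube $[-1, 1]^d$ and measures which fraction lies within distance 1 of the center. The sample size, the chosen dimensions and the helper name `fraction_near_center` are arbitrary assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_points = 1000  # arbitrary sample size, chosen for illustration


def fraction_near_center(n_dims: int) -> float:
    """Fraction of uniform points in [-1, 1]^n_dims that lie inside the unit ball."""
    points = rng.uniform(-1.0, 1.0, size=(n_points, n_dims))
    inside = np.linalg.norm(points, axis=1) <= 1.0
    return float(inside.mean())


# Hypothetical set of dimensions to compare
fractions = {d: fraction_near_center(d) for d in (1, 2, 5, 10, 20)}
for d, frac in fractions.items():
    print(f"{d:>2} dimensions: {frac:.3f} of the points lie inside the unit ball")
```

With a fixed number of points, the region around the center empties out as the number of dimensions grows, which is one way to see why structure becomes harder to detect.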

In the process of feature selection, we can decide to remove a feature for two different reasons:
- **irrelevance**: There is no meaningful association between the feature and the target.
- **redundancy**: Another feature is already present that carries the same information.
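
As a rough illustration of these two reasons, the sketch below builds a small synthetic table. It is not the house price data, and all column names and coefficients are hypothetical. One feature duplicates another in different units (redundant), and one is pure noise (irrelevant). A correlation matrix is only one possible way to spot both cases, and it only captures linear relationships.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 500

# Synthetic, hypothetical data for illustration only
area_m2 = rng.normal(120, 30, n)
toy = pd.DataFrame({
    "area_m2": area_m2,
    "area_sqft": area_m2 * 10.7639,  # redundant: same information in other units
    "noise": rng.normal(0, 1, n),    # irrelevant: generated independently of the target
})
toy["price"] = 2000 * area_m2 + rng.normal(0, 10000, n)

corr = toy.corr()
print(corr.round(2))
```

A feature-feature correlation close to 1 hints at redundancy, while a feature-target correlation close to 0 hints at (linear) irrelevance.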

**Note: Feature Selection vs. Dimensionality Reduction**

Feature selection should be distinguished from **dimensionality reduction** methods (like Principal Component Analysis). In both cases we reduce the number of attributes in the dataset, but
- dimensionality reduction methods do so by creating new attributes as combinations of the original ones
- feature selection methods keep or drop attributes already present in the data, without altering them
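
The difference can be sketched on synthetic data (an assumption for illustration, with hypothetical column names `a` to `d`). scikit-learn's `SelectKBest` returns a subset of the original columns, whereas `PCA` returns components that are weighted combinations of all columns.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_regression

rng = np.random.default_rng(0)

# Synthetic data: only columns "a" and "c" influence the target
X = pd.DataFrame(rng.normal(size=(200, 4)), columns=["a", "b", "c", "d"])
y = 3 * X["a"] - 2 * X["c"] + rng.normal(scale=0.1, size=200)

# Feature selection: keeps two of the original columns unchanged
selector = SelectKBest(score_func=f_regression, k=2).fit(X, y)
selected = list(X.columns[selector.get_support()])
print("selected columns:", selected)

# Dimensionality reduction: each component mixes all original columns
pca = PCA(n_components=2).fit(X)
print(pd.DataFrame(pca.components_, columns=X.columns).round(2))
```

Each row of the printed PCA table holds the weights of one new attribute, so the result can no longer be read as "column a" or "column c".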



## Example: House Price Data Set

In [None]:
data = data_science_learning_paths.datasets.read_house_prices()

In [None]:
target = "SalePrice"
features = data.columns.difference([target])

In [None]:
data.head()

## Exploring Features

### Pairwise Scatter Plot

In [None]:
selected_columns = [target, "1stFlrSF", "LotArea", "OverallQual", "BedroomAbvGr"]
seaborn.pairplot(
    data = data[selected_columns],
    plot_kws={"alpha": 0.1}
)

### Correlation Matrix

In [None]:
from yellowbrick.features import Rank2D

f, ax = plt.subplots(1, 1, figsize=(10, 10))

# Instantiate the visualizer with the Pearson correlation ranking algorithm
visualizer = Rank2D(
    features=features,
    algorithm='pearson',
    ax=ax
)
visualizer.fit(data[features], data[target])  # Fit the data to the visualizer
visualizer.transform(data[features])          # Transform the same feature columns
visualizer.poof()                             # Draw the figure (renamed to show() in yellowbrick 1.0)

## Feature Selection Algorithms

- **filter**: Filter feature selection methods apply a statistical measure to assign a score to each feature. The features are ranked by score and then either kept or removed from the dataset. These methods are often univariate: they consider each feature independently, or only in relation to the dependent variable. An example: filtering features by their correlation with the target variable.
- **search**: Here the selection is treated as a combinatorial search problem, in which different feature combinations are prepared, evaluated and compared with one another. A predictive model is used to evaluate each combination of features and assign a score based on model performance. An example is the Recursive Feature Elimination algorithm.
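
The filter example mentioned above, ranking features by their correlation with the target, could look roughly like this. It is a minimal sketch on synthetic data. The helper `filter_by_correlation`, the column names and the coefficients are all hypothetical and not part of the original notebook.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 300

# Synthetic, hypothetical data for illustration only
frame = pd.DataFrame({
    "strong": rng.normal(size=n),
    "moderate": rng.normal(size=n),
    "unrelated": rng.normal(size=n),
})
frame["target"] = 5 * frame["strong"] + 3 * frame["moderate"] + rng.normal(size=n)


def filter_by_correlation(df: pd.DataFrame, target: str, k: int) -> pd.Series:
    """Score each feature by |Pearson correlation| with the target, keep the top k."""
    scores = df.drop(columns=[target]).corrwith(df[target]).abs()
    return scores.sort_values(ascending=False).head(k)


top = filter_by_correlation(frame, "target", k=2)
print(top)
```

Each feature is scored on its own, so this filter is cheap to compute, but it cannot detect redundancy between the selected features.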


### Recursive Feature Elimination

[Recursive Feature Elimination with Cross-Validation (RFECV)](https://www.scikit-yb.org/en/latest/api/features/rfecv.html) is a feature set search algorithm that uses cross-validation to choose how many features to keep. Its basic mechanism, Recursive Feature Elimination (RFE),

> ... fits a model and removes the weakest feature (or features) until the specified number of features is reached. Features are ranked by the model’s `coef_` or `feature_importances_` attributes, and by recursively eliminating a small number of features per loop, RFE attempts to eliminate dependencies and collinearity that may exist in the model.
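
To make the elimination loop concrete, here is a minimal sketch using scikit-learn's plain `RFE` class (without cross-validation) on synthetic data. The dataset, the choice of `LinearRegression` as the ranking model and the number of features to keep are assumptions made for illustration only.

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression

# Synthetic data: 8 features, of which 3 carry signal (assumption for illustration)
X, y = make_regression(
    n_samples=200, n_features=8, n_informative=3, noise=1.0, random_state=0
)

# Drop the weakest feature (by coef_) one at a time until 3 remain
rfe = RFE(estimator=LinearRegression(), n_features_to_select=3, step=1)
rfe.fit(X, y)

print("kept feature mask:", rfe.support_)
print("ranking (1 = kept):", rfe.ranking_)
```

In contrast to this sketch, RFECV does not need a fixed `n_features_to_select`: it scores the candidate feature counts via cross-validation and picks the best one.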

In [None]:
from data_science_learning_paths.mlp import root_mean_squared_error

In [None]:
import warnings

In [None]:
from sklearn.ensemble import RandomForestRegressor

In order to limit search time, we randomly sample 30 of the available features as the search space:

In [None]:
sampled_features = pandas.Series(features).sample(n=30)

The [`yellowbrick`](https://www.scikit-yb.org/en/latest/) library provides a wrapper over the `sklearn` implementation that visualizes the feature selection process:

In [None]:
from sklearn.metrics import make_scorer, mean_squared_error
from yellowbrick.features import RFECV

with warnings.catch_warnings():
    # needed due to flood of deprecation warnings
    warnings.simplefilter("ignore")
    
    viz = RFECV(
        model=RandomForestRegressor(),
        scoring=make_scorer(root_mean_squared_error, greater_is_better=False),
        cv=3,
    )
    viz.fit(data[sampled_features], data[target])
    viz.poof()
    print("selected features:\n ", sampled_features[viz.support_])

## Feature Selection as Part of Model Selection

> A mistake would be to perform feature selection first to prepare your data, then perform model selection and training on the selected features.

-- [An Introduction to Feature Selection](https://machinelearningmastery.com/an-introduction-to-feature-selection/)

Feature selection should be treated as an integral part of model selection. One should be careful not to do feature selection on the same data that the model is tested on: the selection then leaks information from the test data into the model, which leads to overly optimistic performance estimates, overfitting and poor generalization. In particular, when using cross-validation to select a model, feature selection should happen within the cross-validation loop, separately on the training portion of each fold.
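
One common way to keep feature selection inside the cross-validation loop is to wrap the selector and the model in a scikit-learn pipeline, so that the selector is re-fitted on the training portion of every fold. The following is a minimal sketch on synthetic data. The selector, the model and the value of `k` are arbitrary choices made for illustration, not the notebook's method.

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Synthetic data (assumption for illustration)
X, y = make_regression(
    n_samples=200, n_features=20, n_informative=5, noise=10.0, random_state=0
)

# Selector and model form one estimator, so selection is repeated per fold
pipeline = make_pipeline(
    SelectKBest(score_func=f_regression, k=5),
    LinearRegression(),
)

# scikit-learn reports the negated RMSE, so flip the sign
scores = cross_val_score(
    pipeline, X, y, cv=5, scoring="neg_root_mean_squared_error"
)
rmse_per_fold = -scores
print("RMSE per fold:", rmse_per_fold.round(2))
```

Because the held-out part of each fold never influences which features are selected, the resulting scores are a fairer estimate of generalization performance.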

## Exercise: Model Engineering with Feature Selection

Apply feature selection methods to engineer a better house price prediction model. Experiment with different feature selection methods. Use RMSE as the error function and properly evaluate model performance using cross-validation.

In [None]:
# Your code here...

## References/Further Reading

- [An Introduction to Feature Selection](https://machinelearningmastery.com/an-introduction-to-feature-selection/)
- [scikit-learn: Feature Selection](https://scikit-learn.org/stable/modules/feature_selection.html)

---
_This notebook is licensed under a [Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)](https://creativecommons.org/licenses/by-nc-sa/4.0/). Copyright © 2018-2025 [Point 8 GmbH](https://point-8.de)_