Stepwise Feature Selection with mlxtend


    * Forward Selection: Starts with no features and, at each step, adds the feature that most improves the model's score.
    * Backward Elimination: Starts with all features and, at each step, removes the feature whose removal hurts the score least.
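
To make the greedy idea concrete, here is a minimal hand-rolled sketch of forward selection using only scikit-learn. It is not how mlxtend implements it internally. The helper name forward_select, the n_keep argument, and the demo data (with n_informative=3) are assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score


def forward_select(estimator, X, y, n_keep, cv=5):
    """Hypothetical helper: greedily add the feature that gives the best mean CV R^2."""
    selected = []                          # column indices chosen so far
    remaining = list(range(X.shape[1]))    # column indices still available
    history = []                           # (subset, score) after each step
    while remaining and len(selected) < n_keep:
        # Score every candidate subset formed by adding one remaining feature
        scores = {
            j: cross_val_score(estimator, X[:, selected + [j]], y,
                               cv=cv, scoring="r2").mean()
            for j in remaining
        }
        best = max(scores, key=scores.get)
        selected.append(best)
        remaining.remove(best)
        history.append((tuple(selected), scores[best]))
    return selected, history


# Illustrative data only: 3 of the 10 features carry signal (an assumption for this demo)
X_demo, y_demo = make_regression(n_samples=100, n_features=10, n_informative=3,
                                 noise=0.1, random_state=0)
chosen, history = forward_select(LinearRegression(), X_demo, y_demo, n_keep=3)
for subset, score in history:
    print(subset, round(score, 4))
```

Backward elimination mirrors this loop: it starts from the full column list and, at each step, drops the feature whose removal leaves the highest score.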

Detailed Steps

    1. Prepare the Data: Load or generate your dataset.
    2. Choose a Model: Select a regression or classification model.
    3. Initialize Forward and Backward Selection: Use SequentialFeatureSelector.
    4. Fit the Selectors: Run forward selection and backward elimination on the training data.
    5. Evaluate the Results: Inspect selected features and model performance.

In [1]:
# Step 1: Prepare the Data
import pandas as pd
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

# Generate synthetic data
X, y = make_regression(n_samples=100, n_features=10, noise=0.1, random_state=42)
feature_names = [f'feature_{i}' for i in range(X.shape[1])]
df = pd.DataFrame(X, columns=feature_names)
df['target'] = y

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(df.drop('target', axis=1), df['target'], test_size=0.2, random_state=42)


In [2]:
# Step 2: Choose a Model
from sklearn.linear_model import LinearRegression

# Initialize the linear regression model
model = LinearRegression()


In [3]:
# Step 3: Initialize Forward Selection
from mlxtend.feature_selection import SequentialFeatureSelector

# Initialize Forward Selection
sfs_forward = SequentialFeatureSelector(model,
                                        k_features='best',
                                        forward=True,
                                        floating=False,
                                        scoring='r2',
                                        cv=0)  # No cross-validation: each subset is scored on the training data itself


In [4]:
# Step 4: Fit the Model
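# Fit Forward Selection
# Passing X_train.values (a NumPy array) drops the column names,
# so k_feature_names_ holds index strings such as '0' instead of 'feature_0'
sfs_forward.fit(X_train.values, y_train)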

# Get selected features
selected_features_forward = list(sfs_forward.k_feature_names_)
print("Selected features after forward selection (mlxtend):", selected_features_forward)



Selected features after forward selection (mlxtend): ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']


In [5]:
# Steps 3 and 4 (continued): Initialize and Fit Backward Elimination
# Initialize Backward Elimination
sfs_backward = SequentialFeatureSelector(model,
                                         k_features='best',
                                         forward=False,
                                         floating=False,
                                         scoring='r2',
                                         cv=0)  # No cross-validation: each subset is scored on the training data itself

# Fit Backward Elimination
sfs_backward.fit(X_train.values, y_train)

# Get selected features
selected_features_backward = list(sfs_backward.k_feature_names_)
print("Selected features after backward elimination (mlxtend):", selected_features_backward)


Selected features after backward elimination (mlxtend): ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']


In [13]:
# Step 5: Evaluate the Results
from sklearn.metrics import mean_squared_error

# Predict with selected features from forward selection
# k_feature_idx_ is a flat tuple of column positions, so convert it to a list for iloc
X_train_selected = X_train.iloc[:, list(sfs_forward.k_feature_idx_)]
X_test_selected = X_test.iloc[:, list(sfs_forward.k_feature_idx_)]

# Fit the model with selected features
model.fit(X_train_selected, y_train)
y_pred_forward = model.predict(X_test_selected)

# Calculate MSE for forward selection
mse_forward = mean_squared_error(y_test, y_pred_forward)
print("Mean Squared Error with forward-selected features:", mse_forward)

# Predict with selected features from backward elimination
X_train_selected_backward = X_train.iloc[:, list(sfs_backward.k_feature_idx_)]  # iloc takes integer column positions
X_test_selected_backward = X_test.iloc[:, list(sfs_backward.k_feature_idx_)]

# Fit the model with selected features
model.fit(X_train_selected_backward, y_train)
y_pred_backward = model.predict(X_test_selected_backward)

# Calculate MSE for backward elimination
mse_backward = mean_squared_error(y_test, y_pred_backward)
print("Mean Squared Error with backward-selected features:", mse_backward)


Mean Squared Error with forward-selected features: 0.010265673458298246
Mean Squared Error with backward-selected features: 0.010265673458298246
