# 4.4 Feature Selection

This notebook reproduces the example from Section 4.4 of the paper 'Linux Kernel Configurations at Scale: A Dataset for Performance and Evolution Analysis' (EASE 2025). It uses LASSO regression to reduce the number of configuration options used to predict binary size in TuxKConfig version 5.8 (OpenML ID: 46744).
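For reference, the objective minimized by scikit-learn's `Lasso` can be written as below. The symbols are introduced here for illustration and are not taken from the paper: $n$ is the number of configurations, $X$ the option matrix, $y$ the binary size, $w$ the coefficient vector, and $\alpha$ the `alpha` parameter.

$$
\min_{w} \; \frac{1}{2n} \lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_1
$$

The $\ell_1$ penalty pushes some coefficients to exactly zero, so the options whose coefficients remain nonzero are the ones treated as selected. A larger $\alpha$ generally leaves fewer options.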

## Steps:
1. **Load Dataset**: Fetch version 5.8 from OpenML.
2. **Apply LASSO**: Fit the model and keep the features with nonzero coefficients.
3. **Report Reduction**: Show the number of selected features and a sample.
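The steps above can be tried without downloading anything. The following is a minimal from-scratch sketch of LASSO via coordinate descent on a small synthetic dataset. It is not scikit-learn's implementation and not the paper's code. The data, the option indices, and the helper names (`soft_threshold`, `lasso_coordinate_descent`, `center`) are all hypothetical and exist only to show why some coefficients end up exactly zero.

```python
import random


def soft_threshold(rho, lam):
    """Shrink rho toward zero by lam; anything within [-lam, lam] becomes exactly 0."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iter=100):
    """Minimize (1/(2n)) * ||y - Xw||^2 + alpha * ||w||_1, without an intercept."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # equals y - Xw while w is all zeros
    for _ in range(n_iter):
        for j in range(p):
            col = [row[j] for row in X]
            z = sum(v * v for v in col) / n
            if z == 0.0:
                continue  # an all-zero column carries no information
            # Correlation of column j with the residual, ignoring j's own contribution
            rho = sum(c * (r + c * w[j]) for c, r in zip(col, residual)) / n
            new_w = soft_threshold(rho, alpha) / z
            delta = new_w - w[j]
            if delta != 0.0:
                residual = [r - c * delta for c, r in zip(col, residual)]
                w[j] = new_w
    return w


def center(values):
    """Subtract the mean so the model needs no intercept term."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


# Synthetic stand-in for a configuration dataset (hypothetical, not TuxKConfig):
# 8 on/off options, of which only options 0 and 3 influence the "size".
random.seed(0)
n_samples, n_options = 200, 8
raw = [[random.randint(0, 1) for _ in range(n_options)] for _ in range(n_samples)]
size = [5.0 * row[0] + 3.0 * row[3] + random.gauss(0.0, 0.1) for row in raw]

# Center every column and the target before fitting
cols = [center([row[j] for row in raw]) for j in range(n_options)]
X = [[cols[j][i] for j in range(n_options)] for i in range(n_samples)]
y = center(size)

w = lasso_coordinate_descent(X, y, alpha=0.1)
selected = [j for j, wj in enumerate(w) if wj != 0.0]
print(f'Reduced from {n_options} to {len(selected)} options: {selected}')
```

The soft-thresholding step is what sets weak coefficients to exactly zero rather than merely shrinking them, which is why the `coef_ != 0` test in the notebook works as a selection rule. The surviving coefficients are also shrunk toward zero, so they slightly underestimate the true effects.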



In [None]:
import openml
from sklearn.linear_model import Lasso
import pandas as pd

# Step 1: Load TuxKConfig v5.8
dataset = openml.datasets.get_dataset(46744)
# get_data returns (X, y, categorical_indicator, attribute_names)
X, y, _, _ = dataset.get_data(target='Binary_Size')

# Step 2: Apply LASSO
model = Lasso(alpha=0.01, random_state=42)
model.fit(X, y)
selected_features = X.columns[model.coef_ != 0]

# Step 3: Report results
print(f'Reduced from {X.shape[1]} to {len(selected_features)} features')
print(f'Sample selected: {selected_features[:5].tolist()}')