# FRAME Feature Selection: Demo Notebook

### This notebook demonstrates how to use the FRAMESelector library for feature selection on a regression task with a real-world dataset.

In [19]:
! pip install pandas numpy scikit-learn xgboost


Collecting xgboost
  Using cached xgboost-3.0.0-py3-none-win_amd64.whl.metadata (2.1 kB)
Using cached xgboost-3.0.0-py3-none-win_amd64.whl (150.0 MB)
Installing collected packages: xgboost
Successfully installed xgboost-3.0.0
Note: you may need to restart the kernel to use updated packages.





In [14]:
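## 1. Setup
# Add the parent of the current working directory to sys.path
# so the `frame` package imported below can be found
import sys
import os
sys.path.insert(0, os.path.abspath(os.path.join(os.getcwd(), '..')))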

In [20]:
## 2. Imports
import pandas as pd
import numpy as np
from sklearn.datasets import fetch_california_housing
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score
from xgboost import XGBRegressor  # regression task; not used directly in this notebook

In [21]:
# Import FRAMESelector
from frame.frame_selector import FRAMESelector

In [22]:
## 3. Load Example Dataset
# We'll use the California Housing dataset for this demo
data = fetch_california_housing(as_frame=True)
X = data.data
y = data.target

print("Original shape of features:", X.shape)

Original shape of features: (20640, 8)


In [23]:
## 4. Preprocess Data
# For simplicity, we’ll just split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
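
As a quick sanity check on the split, the row counts can be worked out by hand. This is a minimal sketch that assumes scikit-learn's usual behaviour when only `test_size` is given: the test share is rounded up and the remainder goes to training. The variable names are illustrative and not part of the notebook.

```python
import math

n_samples = 20640   # rows in the dataset, from the shape printed above
test_size = 0.2     # same fraction as passed to train_test_split

# Assumption: the test share is rounded up, the rest is used for training.
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test

print("Training rows:", n_train)
print("Test rows:", n_test)
```

The training count should agree with the first dimension of the transformed training set printed later in the notebook.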


In [26]:
## 5. Use FRAMESelector for Feature Selection
# Instantiate the FRAMESelector
selector = FRAMESelector(top_k=8, num_features=5)
# Note: the selector is fitted on the training split only (next cell),
# so the test set does not influence which features are chosen.

In [27]:
# Fit the selector
selector.fit(X_train, y_train)

In [29]:
# Transform training and test sets
X_train_selected = selector.transform(X_train)
X_test_selected = selector.transform(X_test)

print("Selected features:", selector.selected_features_)
print("Transformed shape:", X_train_selected.shape)

Selected features: ['MedInc', 'AveRooms', 'AveOccup', 'Latitude', 'Longitude']
Transformed shape: (16512, 5)
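
It can also help to see which columns were left out. The sketch below is a standalone illustration: the full column list is the California Housing feature set as returned by scikit-learn, and the selected list is copied from the output above. The names `all_features`, `selected`, and `dropped` are illustrative and not part of FRAMESelector.

```python
# Column names of the California Housing feature frame.
all_features = ["MedInc", "HouseAge", "AveRooms", "AveBedrms",
                "Population", "AveOccup", "Latitude", "Longitude"]

# Copied from the "Selected features" output above.
selected = ["MedInc", "AveRooms", "AveOccup", "Latitude", "Longitude"]

# Keep the original column order when listing what was removed.
dropped = [name for name in all_features if name not in selected]
print("Dropped features:", dropped)
```

Inside the notebook, `all_features` could be replaced by `list(X_train.columns)` and `selected` by `selector.selected_features_`.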


In [30]:
## 6. Train Model with Selected Features
model = GradientBoostingRegressor(random_state=42)
model.fit(X_train_selected, y_train)

In [31]:
# Predict
y_pred = model.predict(X_test_selected)

In [32]:
# Evaluate
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print("Model performance with FRAME selected features:")
print("MSE:", mse)
print("R²:", r2)

Model performance with FRAME selected features:
MSE: 0.2974852695399802
R²: 0.7729828820775281


In [33]:
## 7. Compare with No Feature Selection
model_full = GradientBoostingRegressor(random_state=42)
model_full.fit(X_train, y_train)
y_pred_full = model_full.predict(X_test)

mse_full = mean_squared_error(y_test, y_pred_full)
r2_full = r2_score(y_test, y_pred_full)

print("Model performance with all features:")
print("MSE:", mse_full)
print("R²:", r2_full)

Model performance with all features:
MSE: 0.2939973248643864
R²: 0.7756446042829697
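
To put the two runs side by side, the differences can be computed directly from the printed metrics. This is a small sketch using the numbers above; the variable names ending in `_selected` and `_change` are illustrative, and a single train/test split is not enough to say whether the gap is statistically meaningful.

```python
# Metrics copied from the two outputs above.
mse_selected, r2_selected = 0.2974852695399802, 0.7729828820775281
mse_full, r2_full = 0.2939973248643864, 0.7756446042829697

# Relative change in MSE and absolute change in R² when going
# from all 8 features to the 5 selected ones.
mse_change_pct = 100 * (mse_selected - mse_full) / mse_full
r2_change = r2_selected - r2_full

print(f"MSE change with 5 of 8 features: {mse_change_pct:+.2f}%")
print(f"R² change with 5 of 8 features: {r2_change:+.4f}")
```

A positive MSE change and a negative R² change both mean the reduced model is slightly worse on this split.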


### In this run, FRAMESelector reduced the feature set from 8 to 5 with only a small loss in performance (MSE 0.297 vs. 0.294, R² 0.773 vs. 0.776).