# **Feature Transformation Recommender - Performance Evaluation on Multiple Datasets**

This notebook evaluates the Feature Transformation Recommender on four scikit-learn datasets (two regression, two classification). On each dataset it is compared against a raw-feature baseline and against AutoFeat.

Clone the project repository:

In [1]:
!git clone https://github.com/ronigot/Flight-Price-Analysis.git

Cloning into 'Flight-Price-Analysis'...
remote: Enumerating objects: 112, done.
remote: Counting objects: 100% (112/112), done.
remote: Compressing objects: 100% (93/93), done.
remote: Total 112 (delta 38), reused 64 (delta 13), pack-reused 0 (from 0)
Receiving objects: 100% (112/112), 2.19 MiB | 9.17 MiB/s, done.
Resolving deltas: 100% (38/38), done.


Install the required dependencies:

In [2]:
!pip install -r /content/Flight-Price-Analysis/final_project/requirements.txt

Collecting autofeat (from -r /content/Flight-Price-Analysis/final_project/requirements.txt (line 2))
  Downloading autofeat-2.1.3-py3-none-any.whl.metadata (1.7 kB)
Collecting pint<1.0,>=0.17 (from autofeat->-r /content/Flight-Price-Analysis/final_project/requirements.txt (line 2))
  Downloading Pint-0.24.4-py3-none-any.whl.metadata (8.5 kB)
Collecting flexcache>=0.3 (from pint<1.0,>=0.17->autofeat->-r /content/Flight-Price-Analysis/final_project/requirements.txt (line 2))
  Downloading flexcache-0.3-py3-none-any.whl.metadata (7.0 kB)
Collecting flexparser>=0.4 (from pint<1.0,>=0.17->autofeat->-r /content/Flight-Price-Analysis/final_project/requirements.txt (line 2))
  Downloading flexparser-0.4-py3-none-any.whl.metadata (18 kB)
Downloading autofeat-2.1.3-py3-none-any.whl (23 kB)
Downloading Pint-0.24.4-py3-none-any.whl (302 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 302.0/302.0 kB 4.8 MB/s eta 0:00:00
Downloading flexcache-0.3-py3-non

Load the feature transformation recommender and evaluator modules:

In [3]:
import sys
import importlib.util

SRC_DIR = "/content/Flight-Price-Analysis/final_project/src"
sys.path.append(SRC_DIR)  # make the project's src directory importable


def load_module(name, path):
    """Load a Python source file as a module object under the given name."""
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


feature_transformation_recommender = load_module(
    "feature_transformation_recommender", f"{SRC_DIR}/feature_transformation_recommender.py")
basic_evaluator = load_module("basic_evaluator", f"{SRC_DIR}/basic_evaluator.py")

Import required libraries:

In [4]:
from autofeat import AutoFeatRegressor
from autofeat import AutoFeatClassifier
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.metrics import mean_squared_error, r2_score, accuracy_score, f1_score
from sklearn.linear_model import LinearRegression, LogisticRegression
from tabulate import tabulate
from scipy.stats import skew
import pandas as pd
import time
import warnings
warnings.filterwarnings('ignore')

## System Evaluation


In [5]:
class SystemEvaluator:
    """
    Compares model performance on raw data against two feature-engineering pipelines:
    the automated transformation recommender and AutoFeat.
    Supports both regression (LinearRegression) and classification (LogisticRegression).
    """
    def __init__(self):
        self.results = {}

    def evaluate_regression(self, X, y, cv=3):
        # Split data into training and test sets
        X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
        model_baseline = LinearRegression()
        model_transformed = LinearRegression()
        model_autofeat = LinearRegression()

        # ==== Baseline Evaluation ====
        baseline_cv_scores = cross_val_score(model_baseline, X_train, y_train, cv=cv, scoring='neg_mean_squared_error')

        start_time = time.perf_counter()
        model_baseline.fit(X_train, y_train)
        baseline_time_train = time.perf_counter() - start_time

        start_time = time.perf_counter()
        baseline_pred = model_baseline.predict(X_test)
        baseline_time_test = time.perf_counter() - start_time

        baseline_mse = mean_squared_error(y_test, baseline_pred)
        baseline_r2 = r2_score(y_test, baseline_pred)

        # ==== Feature Transformation Recommender ====
        recommender = feature_transformation_recommender.FeatureTransformationRecommender(model_type='linear', min_improvement=0.01, cv_folds=cv)

        start_time = time.perf_counter()
        X_train_transformed = recommender.fit_transform(X_train, y_train)
        transformation_time_train = time.perf_counter() - start_time

        start_time = time.perf_counter()
        X_test_transformed = recommender.transform(X_test)
        transformation_time_test = time.perf_counter() - start_time

        transformed_cv_scores = cross_val_score(model_transformed, X_train_transformed, y_train, cv=cv, scoring='neg_mean_squared_error')
        model_transformed.fit(X_train_transformed, y_train)
        transformed_pred = model_transformed.predict(X_test_transformed)
        transformed_mse = mean_squared_error(y_test, transformed_pred)
        transformed_r2 = r2_score(y_test, transformed_pred)

        # ==== AutoFeat Evaluation ====
        autofeat = AutoFeatRegressor(verbose=1)

        start_time = time.perf_counter()
        X_train_autofeat = autofeat.fit_transform(X_train, y_train)
        autofeat_time_train = time.perf_counter() - start_time

        start_time = time.perf_counter()
        X_test_autofeat = autofeat.transform(X_test)
        autofeat_time_test = time.perf_counter() - start_time

        autofeat_cv_scores = cross_val_score(model_autofeat, X_train_autofeat, y_train, cv=cv, scoring='neg_mean_squared_error')
        model_autofeat.fit(X_train_autofeat, y_train)
        autofeat_pred = model_autofeat.predict(X_test_autofeat)
        autofeat_mse = mean_squared_error(y_test, autofeat_pred)
        autofeat_r2 = r2_score(y_test, autofeat_pred)

        # ==== Statistical Distribution Improvements ====
        stat_improvements_transformed = {}
        for col in X_train.columns:
            raw_skew = skew(X_train[col])
            transformed_skew = skew(X_train_transformed[col])
            improvement = abs(raw_skew) - abs(transformed_skew)
            stat_improvements_transformed[col] = {
                'raw_skew': raw_skew,
                'transformed_skew': transformed_skew,
                'skew_improvement': improvement
            }

        # ==== Feature Interpretability Preservation ====
        interpretable_methods = ['none', 'log', 'sqrt', 'standardization', 'minmax']
        num_total = len(recommender.transformations)
        num_interpretable = sum(
            1 for details in recommender.transformations.values()
            if details.get('method', '').lower() in interpretable_methods
        )
        interpretability_score = (num_interpretable / num_total * 100) if num_total > 0 else 100

        results = {
            'baseline_cv_mse': -baseline_cv_scores.mean(),
            'transformed_cv_mse': -transformed_cv_scores.mean(),
            'autofeat_cv_mse': -autofeat_cv_scores.mean(),

            'baseline_test_mse': baseline_mse,
            'transformed_test_mse': transformed_mse,
            'autofeat_test_mse': autofeat_mse,

            'baseline_r2': baseline_r2,
            'transformed_r2': transformed_r2,
            'autofeat_r2': autofeat_r2,

            'baseline_time_train': baseline_time_train,
            'baseline_time_test': baseline_time_test,

            'transformation_time_train': transformation_time_train,
            'transformation_time_test': transformation_time_test,

            'autofeat_time_train': autofeat_time_train,
            'autofeat_time_test': autofeat_time_test,

            'transformed_statistical_distribution_improvements': stat_improvements_transformed,

            'feature_interpretability_score': interpretability_score,

            'transformations': recommender.transformations,
            'autofeat_features': X_train_autofeat.columns.tolist()
        }

        self.results['regression'] = results
        return results

    def evaluate_classification(self, X, y, cv=3):
        # Split data with stratification for classification
        X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)
        model_baseline = LogisticRegression(max_iter=1000)
        model_transformed = LogisticRegression(max_iter=1000)
        model_autofeat = LogisticRegression(max_iter=1000)

        # ==== Baseline Evaluation ====
        baseline_cv_scores = cross_val_score(model_baseline, X_train, y_train, cv=cv, scoring='accuracy')

        start_time = time.perf_counter()
        model_baseline.fit(X_train, y_train)
        baseline_time_train = time.perf_counter() - start_time

        start_time = time.perf_counter()
        baseline_pred = model_baseline.predict(X_test)
        baseline_time_test = time.perf_counter() - start_time

        baseline_acc = accuracy_score(y_test, baseline_pred)
        baseline_f1 = f1_score(y_test, baseline_pred, average='weighted')

        # ==== Feature Transformation Recommender ====
        recommender = feature_transformation_recommender.FeatureTransformationRecommender(model_type='logistic', min_improvement=0.01, cv_folds=cv)

        start_time = time.perf_counter()
        X_train_transformed = recommender.fit_transform(X_train, y_train)
        transformation_time_train = time.perf_counter() - start_time

        start_time = time.perf_counter()
        X_test_transformed = recommender.transform(X_test)
        transformation_time_test = time.perf_counter() - start_time

        transformed_cv_scores = cross_val_score(model_transformed, X_train_transformed, y_train, cv=cv, scoring='accuracy')
        model_transformed.fit(X_train_transformed, y_train)
        transformed_pred = model_transformed.predict(X_test_transformed)
        transformed_acc = accuracy_score(y_test, transformed_pred)
        transformed_f1 = f1_score(y_test, transformed_pred, average='weighted')

        # ==== AutoFeat Evaluation ====
        autofeat = AutoFeatClassifier(verbose=1)

        start_time = time.perf_counter()
        X_train_autofeat = autofeat.fit_transform(X_train, y_train)
        autofeat_time_train = time.perf_counter() - start_time

        start_time = time.perf_counter()
        X_test_autofeat = autofeat.transform(X_test)
        autofeat_time_test = time.perf_counter() - start_time

        autofeat_cv_scores = cross_val_score(model_autofeat, X_train_autofeat, y_train, cv=cv, scoring='accuracy')
        model_autofeat.fit(X_train_autofeat, y_train)
        autofeat_pred = model_autofeat.predict(X_test_autofeat)
        autofeat_acc = accuracy_score(y_test, autofeat_pred)
        autofeat_f1 = f1_score(y_test, autofeat_pred, average='weighted')

        # ==== Statistical Distribution Improvements ====
        stat_improvements_transformed = {}
        for col in X_train.columns:
            raw_skew = skew(X_train[col])
            transformed_skew = skew(X_train_transformed[col])
            improvement = abs(raw_skew) - abs(transformed_skew)
            stat_improvements_transformed[col] = {
                'raw_skew': raw_skew,
                'transformed_skew': transformed_skew,
                'skew_improvement': improvement
            }

        # ==== Feature Interpretability Preservation ====
        interpretable_methods = ['none', 'log', 'sqrt', 'standardization', 'minmax']
        num_total = len(recommender.transformations)
        num_interpretable = sum(
            1 for details in recommender.transformations.values()
            if details.get('method', '').lower() in interpretable_methods
        )
        interpretability_score = (num_interpretable / num_total * 100) if num_total > 0 else 100


        results = {
            'baseline_cv_accuracy': baseline_cv_scores.mean(),
            'transformed_cv_accuracy': transformed_cv_scores.mean(),
            'autofeat_cv_accuracy': autofeat_cv_scores.mean(),

            'baseline_test_accuracy': baseline_acc,
            'transformed_test_accuracy': transformed_acc,
            'autofeat_test_accuracy': autofeat_acc,

            'baseline_test_f1': baseline_f1,
            'transformed_test_f1': transformed_f1,
            'autofeat_test_f1': autofeat_f1,

            'baseline_time_train': baseline_time_train,
            'baseline_time_test': baseline_time_test,

            'transformation_time_train': transformation_time_train,
            'transformation_time_test': transformation_time_test,

            'autofeat_time_train': autofeat_time_train,
            'autofeat_time_test': autofeat_time_test,

            'transformed_statistical_distribution_improvements': stat_improvements_transformed,

            'feature_interpretability_score': interpretability_score,

            'transformations': recommender.transformations,
            'autofeat_features': X_train_autofeat.columns.tolist()
        }
        self.results['classification'] = results
        return results
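Besides predictive metrics, `SystemEvaluator` reports two auxiliary diagnostics: a per-feature skewness improvement, `abs(raw_skew) - abs(transformed_skew)`, and an interpretability score, the share of features whose chosen `method` is in a fixed list of simple transformations. The sketch below re-implements both with only the standard library so they can be checked in isolation. `sample_skew` is assumed to match `scipy.stats.skew` with its default `bias=True` (biased moments, g1 = m3 / m2^1.5). The transformation dict, including the `yeo-johnson` method name, is a hypothetical example and not actual output from the recommender.

```python
import math
import random


def sample_skew(values):
    """Fisher-Pearson skewness g1 = m3 / m2**1.5 using biased (population) moments,
    the quantity scipy.stats.skew returns with bias=True."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def skew_improvement(raw, transformed):
    # Positive when the transformation moved the feature closer to symmetric
    return abs(sample_skew(raw)) - abs(sample_skew(transformed))


def interpretability_score(transformations,
                           interpretable=("none", "log", "sqrt", "standardization", "minmax")):
    # Percentage of features whose method is in the interpretable list;
    # an empty dict counts as fully interpretable (100%), as in the class above
    if not transformations:
        return 100.0
    hits = sum(1 for d in transformations.values()
               if d.get("method", "").lower() in interpretable)
    return hits / len(transformations) * 100


# A right-skewed (log-normal) feature becomes nearly symmetric after log()
random.seed(0)
raw = [random.lognormvariate(0.0, 1.0) for _ in range(5000)]
logged = [math.log(v) for v in raw]
print(f"skew improvement: {skew_improvement(raw, logged):.2f}")

# Hypothetical dict shaped like recommender.transformations
example = {
    "age": {"method": "log"},
    "bmi": {"method": "none"},
    "bp": {"method": "yeo-johnson"},  # hypothetical non-interpretable method
    "s1": {"method": "minmax"},
}
print(f"interpretability: {interpretability_score(example):.1f}%")  # → interpretability: 75.0%
```

Note that the score only counts method names. A feature whose method falls outside the list lowers the score even if that transformation is monotone and easy to explain.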

## Helper Functions for Evaluation Results

In [6]:
def print_regression_results_as_table(results, dataset_name):
    # Compute improvement percentages:
    cv_mse_improve_transformed = ((results['baseline_cv_mse'] - results['transformed_cv_mse']) / results['baseline_cv_mse'] * 100
                                  if results['baseline_cv_mse'] != 0 else 0)
    cv_mse_improve_autofeat = ((results['baseline_cv_mse'] - results['autofeat_cv_mse']) / results['baseline_cv_mse'] * 100
                               if results['baseline_cv_mse'] != 0 else 0)

    test_mse_improve_transformed = ((results['baseline_test_mse'] - results['transformed_test_mse']) / results['baseline_test_mse'] * 100
                                     if results['baseline_test_mse'] != 0 else 0)
    test_mse_improve_autofeat = ((results['baseline_test_mse'] - results['autofeat_test_mse']) / results['baseline_test_mse'] * 100
                                  if results['baseline_test_mse'] != 0 else 0)

    r2_improve_transformed = ((results['transformed_r2'] - results['baseline_r2']) / abs(results['baseline_r2']) * 100
                               if results['baseline_r2'] != 0 else 0)
    r2_improve_autofeat = ((results['autofeat_r2'] - results['baseline_r2']) / abs(results['baseline_r2']) * 100
                            if results['baseline_r2'] != 0 else 0)

    # Create a table for main regression metrics.
    data = [
        ["CV MSE", f"{results['baseline_cv_mse']:.2f}", f"{results['transformed_cv_mse']:.2f}", f"{results['autofeat_cv_mse']:.2f}",
         f"{cv_mse_improve_transformed:.2f}%", f"{cv_mse_improve_autofeat:.2f}%"],
        ["Test MSE", f"{results['baseline_test_mse']:.2f}", f"{results['transformed_test_mse']:.2f}", f"{results['autofeat_test_mse']:.2f}",
         f"{test_mse_improve_transformed:.2f}%", f"{test_mse_improve_autofeat:.2f}%"],
        ["R²", f"{results['baseline_r2']:.3f}", f"{results['transformed_r2']:.3f}", f"{results['autofeat_r2']:.3f}",
         f"{r2_improve_transformed:.2f}%", f"{r2_improve_autofeat:.2f}%"]
    ]
    df_metrics = pd.DataFrame(data, columns=["Metric", "Baseline", "Our Approach", "AutoFeat", "Our Approach Improvement", "AutoFeat Improvement"])


    print(f"\n=== Regression Evaluation Results for {dataset_name} ===")
    print(tabulate(df_metrics, headers="keys", tablefmt="psql", showindex=False))

    # Time comparison
    time_data = [
        ["Train Time (Baseline)", f"{results['baseline_time_train']:.4f} sec", "-"],
        ["Train Time (Our Approach)", f"{results['transformation_time_train']:.4f} sec",
         f"{((results['transformation_time_train'] - results['baseline_time_train']) / results['baseline_time_train'] * 100):.2f}% increase"],
        ["Train Time (AutoFeat)", f"{results['autofeat_time_train']:.4f} sec",
         f"{((results['autofeat_time_train'] - results['baseline_time_train']) / results['baseline_time_train'] * 100):.2f}% increase"],
        ["Test Time (Baseline)", f"{results['baseline_time_test']:.4f} sec", "-"],
        ["Test Time (Our Approach)", f"{results['transformation_time_test']:.4f} sec",
         f"{((results['transformation_time_test'] - results['baseline_time_test']) / results['baseline_time_test'] * 100):.2f}% increase"],
        ["Test Time (AutoFeat)", f"{results['autofeat_time_test']:.4f} sec",
         f"{((results['autofeat_time_test'] - results['baseline_time_test']) / results['baseline_time_test'] * 100):.2f}% increase"],
    ]

    print("\n--- Processing Time Overhead ---")
    print(tabulate(time_data, headers=["Metric", "Time", "Overhead"], tablefmt="grid"))


    print("\n--- Feature Interpretability Preservation ---")
    print(f"Interpretability Score: {results['feature_interpretability_score']:.2f}%")

    print("\n--- Statistical Distribution Improvements (Skewness) ---")
    trans_rows = []
    for feature, stats in results['transformed_statistical_distribution_improvements'].items():
        trans_rows.append([feature, f"{stats['raw_skew']:.2f}", f"{stats['transformed_skew']:.2f}", f"{stats['skew_improvement']:.2f}"])
    df_stat = pd.DataFrame(trans_rows, columns=["Feature", "Raw Skew", "Transformed Skew", "Skew Improvement"])
    print(tabulate(df_stat, headers="keys", tablefmt="psql", showindex=False))

    # Create a table for transformation details.
    trans_rows = []
    for feature, details in results['transformations'].items():
        transformer_type = type(details['transformer']).__name__ if details['transformer'] is not None else ""
        trans_rows.append([feature, details['method'], transformer_type])
    df_trans = pd.DataFrame(trans_rows, columns=["Feature", "Transformation", "Transformer Type"])
    print("\n--- Transformation Details ---")
    print(tabulate(df_trans, headers="keys", tablefmt="psql", showindex=False))
    print("\n" + "=" * 50 + "\n")

def print_classification_results_as_table(results, dataset_name):
    # Compute improvement percentages:
    cv_acc_improve_transformed = ((results['transformed_cv_accuracy'] - results['baseline_cv_accuracy']) / results['baseline_cv_accuracy'] * 100
                                  if results['baseline_cv_accuracy'] != 0 else 0)
    cv_acc_improve_autofeat = ((results['autofeat_cv_accuracy'] - results['baseline_cv_accuracy']) / results['baseline_cv_accuracy'] * 100
                               if results['baseline_cv_accuracy'] != 0 else 0)

    test_acc_improve_transformed = ((results['transformed_test_accuracy'] - results['baseline_test_accuracy']) / results['baseline_test_accuracy'] * 100
                                    if results['baseline_test_accuracy'] != 0 else 0)
    test_acc_improve_autofeat = ((results['autofeat_test_accuracy'] - results['baseline_test_accuracy']) / results['baseline_test_accuracy'] * 100
                                    if results['baseline_test_accuracy'] != 0 else 0)

    test_f1_improve_transformed = ((results['transformed_test_f1'] - results['baseline_test_f1']) / results['baseline_test_f1'] * 100
                                   if results['baseline_test_f1'] != 0 else 0)
    test_f1_improve_autofeat = ((results['autofeat_test_f1'] - results['baseline_test_f1']) / results['baseline_test_f1'] * 100
                                if results['baseline_test_f1'] != 0 else 0)

    # Create a table for main classification metrics.
    data = [
        ["CV Accuracy", f"{results['baseline_cv_accuracy']:.3f}", f"{results['transformed_cv_accuracy']:.3f}",
         f"{results['autofeat_cv_accuracy']:.3f}", f"{cv_acc_improve_transformed:.2f}%", f"{cv_acc_improve_autofeat:.2f}%"],

        ["Test Accuracy", f"{results['baseline_test_accuracy']:.3f}", f"{results['transformed_test_accuracy']:.3f}",
         f"{results['autofeat_test_accuracy']:.3f}", f"{test_acc_improve_transformed:.2f}%", f"{test_acc_improve_autofeat:.2f}%"],

        ["Test F1", f"{results['baseline_test_f1']:.3f}", f"{results['transformed_test_f1']:.3f}",
         f"{results['autofeat_test_f1']:.3f}", f"{test_f1_improve_transformed:.2f}%", f"{test_f1_improve_autofeat:.2f}%"]
    ]
    df_metrics = pd.DataFrame(data, columns=["Metric", "Baseline", "Our Approach", "AutoFeat", "Our Approach Improvement", "AutoFeat Improvement"])

    print(f"\n=== Classification Evaluation Results for {dataset_name} ===")
    print(tabulate(df_metrics, headers="keys", tablefmt="psql", showindex=False))

    # Time comparison
    time_data = [
        ["Train Time (Baseline)", f"{results['baseline_time_train']:.4f} sec", "-"],
        ["Train Time (Our Approach)", f"{results['transformation_time_train']:.4f} sec",
         f"{((results['transformation_time_train'] - results['baseline_time_train']) / results['baseline_time_train'] * 100):.2f}% increase"],
        ["Train Time (AutoFeat)", f"{results['autofeat_time_train']:.4f} sec",
         f"{((results['autofeat_time_train'] - results['baseline_time_train']) / results['baseline_time_train'] * 100):.2f}% increase"],
        ["Test Time (Baseline)", f"{results['baseline_time_test']:.4f} sec", "-"],
        ["Test Time (Our Approach)", f"{results['transformation_time_test']:.4f} sec",
         f"{((results['transformation_time_test'] - results['baseline_time_test']) / results['baseline_time_test'] * 100):.2f}% increase"],
        ["Test Time (AutoFeat)", f"{results['autofeat_time_test']:.4f} sec",
         f"{((results['autofeat_time_test'] - results['baseline_time_test']) / results['baseline_time_test'] * 100):.2f}% increase"],
    ]

    print("\n--- Processing Time Overhead ---")
    print(tabulate(time_data, headers=["Metric", "Time", "Overhead"], tablefmt="grid"))

    print("\n--- Feature Interpretability Preservation ---")
    print(f"Interpretability Score: {results['feature_interpretability_score']:.2f}%")

    print("\n--- Statistical Distribution Improvements (Skewness) ---")
    trans_rows = []
    for feature, stats in results['transformed_statistical_distribution_improvements'].items():
        trans_rows.append([feature, f"{stats['raw_skew']:.2f}", f"{stats['transformed_skew']:.2f}", f"{stats['skew_improvement']:.2f}"])
    df_stat = pd.DataFrame(trans_rows, columns=["Feature", "Raw Skew", "Transformed Skew", "Skew Improvement"])
    print(tabulate(df_stat, headers="keys", tablefmt="psql", showindex=False))

    # Create a table for transformation details.
    trans_rows = []
    for feature, details in results['transformations'].items():
        transformer_type = type(details['transformer']).__name__ if details['transformer'] is not None else ""
        trans_rows.append([feature, details['method'], transformer_type])
    df_trans = pd.DataFrame(trans_rows, columns=["Feature", "Transformation", "Transformer Type"])
    print("\n--- Transformation Details ---")
    print(tabulate(df_trans, headers="keys", tablefmt="psql", showindex=False))
    print("\n" + "="*50 + "\n")
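Both helpers repeat one improvement formula with two sign conventions. For error metrics (MSE), a decrease counts as an improvement: `(baseline - new) / baseline`. For scores (R², accuracy, F1), an increase counts: `(new - baseline) / baseline`. R² is divided by `abs(baseline)`, and a zero baseline yields 0. A possible consolidation is sketched below. The name `relative_improvement` is hypothetical and not part of the project. Using `abs(baseline)` everywhere gives the same result for MSE and accuracy, because those baselines are non-negative.

```python
def relative_improvement(baseline, new, lower_is_better):
    """Percent improvement of `new` over `baseline`.

    lower_is_better=True  -> error metrics such as MSE (a decrease is an improvement)
    lower_is_better=False -> scores such as R², accuracy, F1 (an increase is an improvement)
    Dividing by abs(baseline) keeps the sign meaningful when baseline R² is negative;
    a zero baseline returns 0, matching the guards in the helpers above.
    """
    if baseline == 0:
        return 0.0
    delta = (baseline - new) if lower_is_better else (new - baseline)
    return delta / abs(baseline) * 100


# Reproduce two "Our Approach" cells of the Diabetes table from its printed values
print(f"{relative_improvement(3081.43, 3006.35, lower_is_better=True):.2f}%")  # → 2.44%
print(f"{relative_improvement(2900.19, 2705.63, lower_is_better=True):.2f}%")  # → 6.71%
```

With this helper, each of the six improvement expressions in the two printing functions could be replaced by a single call.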

## Dataset 1: Diabetes (regression)

In [7]:
from sklearn.datasets import load_diabetes

In [8]:
# Evaluate on Diabetes (regression)
diabetes = load_diabetes()
X_diabetes = pd.DataFrame(diabetes.data, columns=diabetes.feature_names)
y_diabetes = pd.Series(diabetes.target)
eval_reg_diabetes = SystemEvaluator().evaluate_regression(X_diabetes, y_diabetes)
print_regression_results_as_table(eval_reg_diabetes, "Diabetes")

[featsel] Scaling data...done.
[AutoFeat]     6/    7 new features
=== Regression Evaluation Results for Diabetes ===
+----------+------------+----------------+------------+----------------------------+------------------------+
| Metric   |   Baseline |   Our Approach |   AutoFeat | Our Approach Improvement   | AutoFeat Improvement   |
|----------+------------+----------------+------------+----------------------------+------------------------|
| CV MSE   |   3081.43  |       3006.35  |    2904.76 | 2.44%                      | 5.73%                  |
| Test MSE |   2900.19  |       2705.63  |    2545.37 | 6.71%                      | 12.23%                 |
| R²       |      0.453 |          0.489 |       0.52 | 8.11%                      | 14.80%                 |
+----------+------------+----------------+------------+----------------------------+------------------------+

--- Processing Time Overhead ---
+---------------------------+-------------+----------------------+
| Metric   
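The Test MSE and R² rows are not independent. `r2_score` computes 1 - SS_res / SS_tot. Dividing both sums by the number of test samples gives R² = 1 - MSE / Var(y_test), where Var is the population variance of the test targets. The sketch below backs out Var(y_test) from the baseline row of the Diabetes table, then predicts the other two R² values from their test MSEs. Because the inputs are the rounded printed values, agreement is only expected up to rounding.

```python
def r2_from_mse(mse, var_y):
    """R² = 1 - MSE / Var(y_test), with Var the population variance of y_test."""
    return 1 - mse / var_y


# Rounded values as printed in the Diabetes results table
baseline_mse, baseline_r2 = 2900.19, 0.453
others = {"Our Approach": (2705.63, 0.489), "AutoFeat": (2545.37, 0.520)}

# Invert the identity on the baseline row: Var(y_test) = MSE / (1 - R²)
var_y_test = baseline_mse / (1 - baseline_r2)

predicted = {name: r2_from_mse(mse, var_y_test) for name, (mse, _) in others.items()}
for name, (_, reported) in others.items():
    print(f"{name}: predicted R² {predicted[name]:.3f}, reported {reported:.3f}")
```

Both predictions match the reported R² to within rounding. On a fixed test set, R² is a decreasing function of MSE, so the R² row repeats the ranking of the Test MSE row rather than adding independent evidence.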

## Dataset 2: California Housing (regression)

In [9]:
from sklearn.datasets import fetch_california_housing

In [10]:
# Evaluate on California Housing (regression)
cal_housing = fetch_california_housing()
X_cal = pd.DataFrame(cal_housing.data, columns=cal_housing.feature_names)
y_cal = pd.Series(cal_housing.target)
eval_reg_cal = SystemEvaluator().evaluate_regression(X_cal, y_cal)
print_regression_results_as_table(eval_reg_cal, "California Housing")

[featsel] Scaling data...done.

=== Regression Evaluation Results for California Housing ===
+----------+------------+----------------+------------+----------------------------+------------------------+
| Metric   |   Baseline |   Our Approach |   AutoFeat | Our Approach Improvement   | AutoFeat Improvement   |
|----------+------------+----------------+------------+----------------------------+------------------------|
| CV MSE   |      0.52  |          0.44  |      0.42  | 15.37%                     | 18.65%                 |
| Test MSE |      0.56  |          0.46  |      0.43  | 17.10%                     | 22.04%                 |
| R²       |      0.576 |          0.648 |      0.669 | 12.60%                     | 16.24%                 |
+----------+------------+----------------+------------+----------------------------+------------------------+

--- Processing Time Overhead ---
+---------------------------+-------------+---------------------+
| Metric                    | Time   

## Dataset 3: Breast Cancer (classification)

In [11]:
from sklearn.datasets import load_breast_cancer

In [12]:
# Evaluate on Breast Cancer (classification)
cancer = load_breast_cancer()
X_cancer = pd.DataFrame(cancer.data, columns=cancer.feature_names)
y_cancer = pd.Series(cancer.target)
eval_clf_cancer = SystemEvaluator().evaluate_classification(X_cancer, y_cancer)
print_classification_results_as_table(eval_clf_cancer, "Breast Cancer")

[featsel] Scaling data...done.

=== Classification Evaluation Results for Breast Cancer ===
+---------------+------------+----------------+------------+----------------------------+------------------------+
| Metric        |   Baseline |   Our Approach |   AutoFeat | Our Approach Improvement   | AutoFeat Improvement   |
|---------------+------------+----------------+------------+----------------------------+------------------------|
| CV Accuracy   |      0.945 |          0.976 |      0.969 | 3.26%                      | 2.56%                  |
| Test Accuracy |      0.956 |          0.982 |      0.991 | 2.75%                      | 3.67%                  |
| Test F1       |      0.956 |          0.982 |      0.991 | 2.76%                      | 3.68%                  |
+---------------+------------+----------------+------------+----------------------------+------------------------+

--- Processing Time Overhead ---
+---------------------------+---------------+----------------------+
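For a float `test_size`, `train_test_split` sizes the test set as ceil(test_size × n_samples). For the 569-sample Breast Cancer dataset, that is ceil(0.2 × 569) = 114 samples. Each printed test accuracy therefore corresponds to a whole number of correct predictions. The sketch below converts the printed accuracies back to counts.

```python
import math

n_samples, test_size = 569, 0.2
n_test = math.ceil(test_size * n_samples)  # 114 held-out samples

# Test accuracies as printed in the Breast Cancer table
accuracies = {"Baseline": 0.956, "Our Approach": 0.982, "AutoFeat": 0.991}
correct = {name: round(acc * n_test) for name, acc in accuracies.items()}
for name, k in correct.items():
    print(f"{name}: {k}/{n_test} correct ({k / n_test:.3f})")
```

The 2.75% and 3.67% test-accuracy gains amount to 3 and 4 additional correct predictions out of 114 (109 → 112 and 109 → 113). With differences this small, the cross-validation row is a useful second view.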


## Dataset 4: Iris (classification)

In [13]:
from sklearn.datasets import load_iris

In [14]:
# Evaluate on Iris (classification)
iris = load_iris()
X_iris = pd.DataFrame(iris.data, columns=iris.feature_names)
y_iris = pd.Series(iris.target)
eval_clf_iris = SystemEvaluator().evaluate_classification(X_iris, y_iris)
print_classification_results_as_table(eval_clf_iris, "Iris")

[featsel] Scaling data...done.

=== Classification Evaluation Results for Iris ===
+---------------+------------+----------------+------------+----------------------------+------------------------+
| Metric        |   Baseline |   Our Approach |   AutoFeat | Our Approach Improvement   | AutoFeat Improvement   |
|---------------+------------+----------------+------------+----------------------------+------------------------|
| CV Accuracy   |      0.958 |          0.975 |      0.967 | 1.74%                      | 0.87%                  |
| Test Accuracy |      0.967 |          0.967 |      0.967 | 0.00%                      | 0.00%                  |
| Test F1       |      0.967 |          0.967 |      0.967 | 0.00%                      | 0.00%                  |
+---------------+------------+----------------+------------+----------------------------+------------------------+

--- Processing Time Overhead ---
+---------------------------+-------------+---------------------+
| Metric    
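The 0.00% test-set change on Iris reflects the resolution of the metric. The stratified split holds out ceil(0.2 × 150) = 30 samples, so a single sample moves test accuracy by about 0.033. All three pipelines score 29/30 = 0.967, so any difference smaller than one sample cannot show up in that row. The CV accuracies are means over three folds of the 120 training samples. Assuming the three stratified folds are equal in size (40 each), they convert to correct-prediction counts in the same way.

```python
import math

n_samples, test_size = 150, 0.2
n_test = math.ceil(test_size * n_samples)  # 30 held-out samples
n_train = n_samples - n_test               # 120 samples used for 3-fold CV

print(f"one test sample = {1 / n_test:.3f} accuracy")
print(f"test accuracy 0.967 = {round(0.967 * n_test)}/{n_test}")

# Assumes 3 equal folds of 40, so mean CV accuracy = total correct / 120
cv_accuracies = {"Baseline": 0.958, "Our Approach": 0.975, "AutoFeat": 0.967}
cv_correct = {name: round(acc * n_train) for name, acc in cv_accuracies.items()}
for name, k in cv_correct.items():
    print(f"{name}: {k}/{n_train} correct across folds")
```

Under that assumption, the 1.74% and 0.87% CV gains correspond to 2 and 1 additional correct predictions out of 120.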