# **ALL TOOLS ON DECK**

---
AutoGluon is an AutoML framework for tabular datasets. The objective of this notebook is to investigate how several other AutoML tools (Auto-sklearn, FLAML, TPOT, and MLJAR) perform in comparison with AutoGluon's predictions.

While AutoGluon typically benefits from longer training durations, our focus is on short training times of less than a minute per tool. We then contrast the combined result of these short runs with a single AutoGluon predictor trained for the equivalent total duration.
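
To make the "equivalent total duration" concrete, the sketch below works out the matched budget for the benchmark run. It assumes every tool gets the same per-tool limit (45 seconds, as in the parameter cell of this notebook); the variable names are illustrative and not part of the notebook.

```python
# Illustrative budget arithmetic (names are assumptions, not from the notebook).
per_tool_seconds = 45
tools = ["Auto-sklearn", "FLAML", "AutoGluon", "MLJAR"]

# The benchmark AutoGluon run gets the combined budget of all tools.
benchmark_seconds = per_tool_seconds * len(tools)
print(f"{len(tools)} tools x {per_tool_seconds} s = {benchmark_seconds} s benchmark budget")
```

With TPOT included as a fifth tool, the same arithmetic gives 225 seconds.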

# Overview of AutoML Tools
## 1. AutoGluon
Description: AutoGluon is an open-source AutoML framework from Amazon that focuses on ease of use and high performance. It automates machine learning workflows, including feature engineering, model selection, and hyperparameter tuning.
### Strengths:
*   Versatility: Supports multiple data types and tasks (e.g., regression, classification).
*   Ensemble Learning: Automatically builds ensembles of models.
*   Efficiency: Optimized for performance with multi-threading and GPU support.

## 2. MLJAR
Description: MLJAR is a Python library that automates the machine learning pipeline with a focus on simplicity and interpretability. It also supports multiple types of data and tasks.
### Strengths:
*   Easy-to-Use Interface: Simplified API for quick model training.
*   Ensemble Learning: Combines multiple models to improve performance.
*   Feature Importance: Provides insights into feature importance.

## 3. TPOT
Description: TPOT (Tree-based Pipeline Optimization Tool) is an AutoML tool that uses genetic algorithms to optimize machine learning pipelines. It's part of the scikit-learn ecosystem.
### Strengths:
*   Pipeline Optimization: Automatically designs and optimizes machine learning pipelines.
*   Genetic Algorithms: Uses evolutionary algorithms to find the best models.
*   Customization: Allows for detailed control over the optimization process.

## 4. Auto-sklearn
Description: Auto-sklearn is an open-source AutoML tool built on top of the scikit-learn library. It uses Bayesian optimization to automate the process of model selection and hyperparameter tuning.
### Strengths:
*   Bayesian Optimization: Efficiently searches the hyperparameter space.
*   Meta-Learning: Leverages prior knowledge to warm-start the optimization.
*   Ensemble Learning: Automatically constructs ensembles of models to improve performance.
*   Extensible: Integrates well with the scikit-learn ecosystem, allowing for custom pipelines and preprocessing.

## 5. FLAML
Description: FLAML (Fast and Lightweight AutoML) is a lightweight open-source library developed by Microsoft Research. It focuses on efficient and fast AutoML for both classification and regression tasks.
### Strengths:
*   Efficiency: Optimized for fast performance and low computational cost.
*   Simplicity: Easy-to-use interface with minimal setup.
*   Customization: Allows users to specify constraints and customize the search space.
*   Versatility: Supports a range of machine learning tasks including time series forecasting.





Install all required packages.
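
Because not every tool can be installed into the same environment, it can help to check which packages the active kernel can actually import once the installs below have run. The following is a minimal sketch using only the standard library; the list of import names is an assumption based on the packages used in this notebook.

```python
import importlib.util

# Import names of the packages used in this notebook
# (mljar-supervised is imported as "supervised", scikit-optimize as "skopt").
modules = ["autogluon", "autosklearn", "flaml", "tpot", "supervised", "sklearn", "skopt"]

# find_spec returns None when a top-level module cannot be found in the active environment.
availability = {name: importlib.util.find_spec(name) is not None for name in modules}

for name, found in availability.items():
    print(f"{name:12s} {'available' if found else 'missing'}")
```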

In [None]:
!pip install flaml
!pip install tpot
!pip install mljar-supervised
!pip install autogluon
!pip install scikit-learn
!pip install scikit-optimize

In [None]:
from __future__ import annotations  # must be the first statement in the cell

from autogluon.tabular import TabularDataset, TabularPredictor

# import autosklearn.regression  # The environments needed for Auto-sklearn and FLAML appear to be incompatible with this setup. Save one tool's predictions before switching to the other environment.

# from flaml import AutoML

import pandas as pd
import numpy as np
import os
import torch
import matplotlib.pyplot as plt
import argparse
import logging
import pickle

from sklearn.metrics import accuracy_score, r2_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.utils import shuffle

from IPython.display import Image, display

from datetime import datetime, timedelta

from scipy.stats import entropy

from pathlib import Path


Parameter settings

In [None]:

random_seed = 42
time = 45

# Load the dataset that matches the experiment
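
Features and targets are stored in separate files, so any subsampling has to keep each row paired with its own label. One way to guarantee this is to join the two frames first and sample once. The sketch below illustrates the idea on toy data; the frames and column names are hypothetical stand-ins for `X_train` and `y_train`.

```python
import pandas as pd

# Toy stand-ins for X_train / y_train (hypothetical values and column names).
X = pd.DataFrame({"feature": range(10)})
y = pd.DataFrame({"target": [v * 2 for v in range(10)]})

# Join features and target first, then sample once so every row keeps its own label.
combined = pd.concat([X, y], axis=1)
sample = combined.sample(n=4, random_state=42)

X_sample = sample.drop(columns=["target"])
y_sample = sample["target"]
print(sample)
```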

In [None]:
# Final dataset

base_path = '../../data/exam_dataset'

X_train = pd.read_parquet(f'{base_path}/X_train.parquet')
y_train = pd.read_parquet(f'{base_path}/y_train.parquet')
train_dataset = pd.concat([X_train, y_train], axis=1)
#test = train_dataset.sample(frac=0.2, replace=False, random_state=random_seed)
exam_X_values = pd.read_parquet(f'{base_path}/X_test.parquet')

# Also instantiate the target column
label = 'price'



print(train_dataset.info())

In [None]:
# Change path respective to the dataset you are testing:
#  * Bike_Sharing_Demand (361099) - label: 'count'
#  * Brazilian Houses (361098) - label: 'total_(BRL)'
#  * y_prop_4_1 (361092) - label: 'oz252'
# The exam dataset has ~20,000 entries, so the training data is subsampled to about the same size below.

# Set the base path for the dataset
base_path = '../../data/361099'

# Initialize variables for training data
X_train = None
y_train = None

# Loop through each fold of the dataset
for fold_number in range(1, 11):
    # Read the X_train and y_train data for the current fold
    x_fold = pd.read_parquet(f'{base_path}/{fold_number}/X_train.parquet')
    y_fold = pd.read_parquet(f'{base_path}/{fold_number}/y_train.parquet')

    # Concatenate the data to the existing training data
    if X_train is None:
        X_train = x_fold
        y_train = y_fold
    else:
        X_train = pd.concat([X_train, x_fold])
        y_train = pd.concat([y_train, y_fold])

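# Sample the training data to around 20,000 entries.
# Reset the index first so rows from different folds do not share index labels,
# then draw the rows once and pick the matching labels by index.
X_train = X_train.reset_index(drop=True)
y_train = y_train.reset_index(drop=True)
X_train = X_train.sample(n=min(20000, len(X_train)), random_state=random_seed)
y_train = y_train.loc[X_train.index]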

# Create a concatenated dataset for gluon
train_dataset = pd.concat([X_train, y_train], axis=1)

# Initialize variable for test data
test = None

# Loop through each fold of the dataset
for fold_number in range(1, 11):
    # Read the X_test and y_test data for the current fold
    x_fold = pd.read_parquet(f'{base_path}/{fold_number}/X_test.parquet')
    y_fold = pd.read_parquet(f'{base_path}/{fold_number}/y_test.parquet')

    # Concatenate the data to the existing test data
    concat_fold = pd.concat([x_fold, y_fold], axis=1)
    if test is None:
        test = concat_fold
    else:
        test = pd.concat([test, concat_fold])

# Set the label for the dataset
label = 'count'

# Print the information of the training dataset
print(train_dataset.info())

## Run this job first, then switch from conda kernel and run the script again excluding this cell

# AutoSKLearn Training

Auto-sklearn automates model selection and hyperparameter tuning to optimize machine learning pipelines. The predictions on the exam dataset are saved to a pickle file.
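
Predictions are pickled so that they survive the kernel switch between environments. The round trip can be sketched as follows; the values and the file name are hypothetical, and a temporary directory is used so the example leaves no files behind.

```python
import os
import pickle
import tempfile

# Hypothetical predictions; in the notebook these come from a fitted model's predict() call.
predictions = [13.77, 12.98, 12.32]

# Write to a temporary directory so the example leaves no files behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "example_pred.pkl")
    with open(path, "wb") as f:
        pickle.dump(predictions, f)
    with open(path, "rb") as f:
        restored = pickle.load(f)

print(restored == predictions)
```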

In [None]:
# Import the autosklearn.regression module
import autosklearn.regression

# Fit the AutoSklearnRegressor with the given time budget.
# The fitted model gets its own name so it does not shadow the autosklearn module.
autosklearn_model = autosklearn.regression.AutoSklearnRegressor(time_left_for_this_task=time, n_jobs=-1).fit(X_train, y_train)

# Make predictions on the exam dataset
autosklearn_pred = autosklearn_model.predict(exam_X_values)

# Save the predictions to a pickle file
with open('autosklearn_pred_exam.pkl', 'wb') as f:
    pickle.dump(autosklearn_pred, f)





## Switch kernel to python default before running this
### Also need to run imports and variables again

# FLAML Training

Train a FLAML model, which uses efficient hyperparameter optimization to search over a fixed list of learners within the time budget. The predictions on the exam dataset are saved to a pickle file.

In [None]:
from flaml import AutoML
from sklearn.utils import shuffle

# Instantiate the AutoML object
flaml = AutoML()

# Define the list of learners to be used
learners = ['lgbm', 'rf', 'catboost', 'extra_tree', 'kneighbor']

# Fit the AutoML model
flaml.fit(
    np.array(X_train),          # Training data features
    np.array(y_train).ravel(),  # Training data labels, flattened to 1-D
    task="regression",          # Task type is regression
    time_budget=time,           # Time budget for the AutoML search, in seconds
    estimator_list=learners,    # List of learners to be used
    verbose=1                   # Set verbosity level to 1 for progress updates
)

# Make predictions on the exam dataset, using the same array type as in fit()
flaml_pred = flaml.predict(np.array(exam_X_values))

# Save the predictions to a pickle file
with open('flaml_pred_exam.pkl', 'wb') as f:
    pickle.dump(flaml_pred, f)
#flaml_score = r2_score(test[label], flaml_pred)


# AutoGluon Training

Train an AutoGluon `TabularPredictor`, which automates model training, tuning, and ensembling, within the given time budget. The predictions on the exam dataset are saved to a pickle file.

In [None]:
# Fit the AutoGluon model with the given time budget
gluon = TabularPredictor(label=label, problem_type='regression', eval_metric='r2').fit(train_dataset, time_limit=time, presets='medium_quality', verbosity=1)

# Make predictions on the exam dataset
gluon_pred = gluon.predict(exam_X_values)

# Save the predictions to a pickle file
with open('gluon_predictions_EXAM.pkl', 'wb') as f:
    pickle.dump(gluon_pred, f)

# TPOT Training
Train a TPOT model which uses genetic algorithms to optimize machine learning pipelines. The best pipeline is saved and then used for making predictions. TPOT focuses on pipeline optimization and offers a high degree of automation.
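
TPOT's search is more elaborate (it evolves whole pipelines), but the basic loop of an evolutionary algorithm can be sketched in a few lines. The toy below is not TPOT code: it evolves a single number toward the maximum of a made-up fitness function, and every name in it is illustrative.

```python
import random

def fitness(candidate):
    # Toy objective with its maximum at x = 3 (a stand-in for a pipeline's validation score).
    return -(candidate - 3.0) ** 2

def evolve(generations=20, population_size=10, seed=42):
    rng = random.Random(seed)
    population = [rng.uniform(-10, 10) for _ in range(population_size)]
    best_per_generation = []
    for _ in range(generations):
        # Selection: keep the better half of the population unchanged (elitism).
        population.sort(key=fitness, reverse=True)
        survivors = population[: population_size // 2]
        # Variation: each child is a survivor plus a small random mutation.
        children = [parent + rng.gauss(0, 0.5) for parent in survivors]
        population = survivors + children
        best_per_generation.append(max(fitness(c) for c in population))
    return max(population, key=fitness), best_per_generation

best, history = evolve()
print(f"best candidate: {best:.3f}")
```

Because the survivors are carried over unchanged, the best fitness can never get worse from one generation to the next.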

In [None]:
# Import the necessary libraries
from tpot import TPOTRegressor
import joblib

# Instantiate the TPOTRegressor
tpot = TPOTRegressor(
    random_state=random_seed,
    n_jobs=-1,              # Utilize all CPU cores
    max_time_mins=1,        # Max total time in minutes
    max_eval_time_mins=0.2  # Max time per pipeline in minutes
)

# Fit the TPOTRegressor on the training data
tpot.fit(X_train, y_train)

# Save the fitted pipeline
joblib.dump(tpot.fitted_pipeline_, "tpot_pipeline.joblib")

# At prediction time, load the saved pipeline
loaded_pipeline = joblib.load("tpot_pipeline.joblib")

# Make predictions using the loaded pipeline
tpot_predictions = loaded_pipeline.predict(test.drop(columns=[label]))

# Calculate the R2 score of TPOT predictions
tpot_score = r2_score(test[label], tpot_predictions)

# Print the TPOT R2 score
print("TPOT R2 score:", tpot_score)

TPOT R2 score: 0.8568653401098989


# MLJAR Training
Train an MLJAR model on the full training dataset. MLJAR performs automated machine learning with a focus on an easy-to-use interface and interpretability. The fitted model and its predictions on the exam dataset are saved.

In [None]:
import pickle
from supervised import AutoML

# Prepare data
X_train = train_dataset.drop(columns=[label])
y_train = train_dataset[label]

# Initialize AutoML for regression
mljar_automl_regressor = AutoML(
    mode="Compete",
    total_time_limit=90,
    random_state=random_seed,
    n_jobs=-1
)

# Fit AutoML
mljar_automl_regressor.fit(X_train, y_train)

# Make predictions on the exam dataset
mljar_predict = mljar_automl_regressor.predict(exam_X_values)

# Save the model as a pickle file
with open('mljar_model_EXAM.pkl', 'wb') as f:
    pickle.dump(mljar_automl_regressor, f)

# Save the predictions to a pickle file
with open('mljar_predictions_EXAM.pkl', 'wb') as f:
    pickle.dump(mljar_predict, f)

# Print completion messages
print("Model training completed and model saved as 'mljar_model_EXAM.pkl'.")
print("Predictions saved as 'mljar_predictions_EXAM.pkl'.")


# Load all predictions
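
Since the predictions come from different libraries, they may arrive as lists, arrays, or column vectors. Before combining them it is worth checking that they line up. The helper below is a hypothetical sketch (it is not part of the notebook) that flattens each prediction to one dimension and verifies that all lengths match.

```python
import numpy as np

def as_aligned_arrays(predictions):
    """Flatten each prediction to 1-D and check that all have the same length."""
    arrays = {name: np.asarray(values, dtype=float).ravel() for name, values in predictions.items()}
    lengths = {name: arr.shape[0] for name, arr in arrays.items()}
    if len(set(lengths.values())) != 1:
        raise ValueError(f"Prediction lengths differ: {lengths}")
    return arrays

# Hypothetical predictions; one is a column vector, as some libraries return 2-D output.
example = {
    "model_a": [13.7, 13.0, 12.3],
    "model_b": np.array([[13.8], [12.9], [12.3]]),
}
aligned = as_aligned_arrays(example)
print({name: arr.shape for name, arr in aligned.items()})
```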

In [None]:
# Load the mljar pred
with open('mljar_predictions_EXAM.pkl', 'rb') as f:
    mljar_pred = pickle.load(f)

#mljar_score = r2_score(test[label][0:len(mljar_pred)], mljar_pred)
print(mljar_pred, len(mljar_pred))

[13.67054082 12.99571632 12.26738305 ... 12.68263049 12.77947423
 12.12948934] 2162


In [None]:
# Load the FLAML predictions
with open('flaml_pred_exam.pkl', 'rb') as f:
    flaml_pred = pickle.load(f)

#flaml_score = r2_score(test[label], flaml_pred)
print(flaml_pred)

0.980278110859552


In [None]:
# Load the Auto-sklearn predictions
with open('autosklearn_pred_exam.pkl', 'rb') as f:
    autosklearn_pred = pickle.load(f)

#autosklearn_score = r2_score(test[label], autosklearn_pred)
print(autosklearn_pred, len(autosklearn_pred))

[13.77257299 12.98191023 12.3235178  ... 12.63088846 12.71451759
 12.15282154] 2162


In [None]:
# Load the gluon predictions
with open('gluon_predictions_EXAM.pkl', 'rb') as f:
    gluon_pred = pickle.load(f)

#gluon_score = r2_score(test[label], gluon_pred)
print(gluon_pred, len(gluon_pred))

# Optimize a weighted ensemble prediction

### TPOT is excluded due to poor performance measures
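
The idea behind the weight optimization can be illustrated without a Bayesian optimizer. The sketch below swaps in a plain exhaustive grid search over non-negative weights that sum to 1, using synthetic data; it is a simplified stand-in for the `gp_minimize` call in this notebook, and all data and names in it are made up.

```python
import itertools
import numpy as np

def r2(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Synthetic targets and three hypothetical model predictions with different noise levels.
rng = np.random.default_rng(0)
y_true = rng.normal(size=200)
preds = np.stack([y_true + rng.normal(scale=s, size=200) for s in (0.3, 0.5, 1.0)])

# Enumerate weight vectors on a grid that are non-negative and sum to 1.
steps = 10
best_weights, best_score = None, -np.inf
for combo in itertools.product(range(steps + 1), repeat=preds.shape[0]):
    if sum(combo) != steps:
        continue
    weights = np.array(combo) / steps
    score = r2(y_true, weights @ preds)
    if score > best_score:
        best_weights, best_score = weights, score

print("weights:", best_weights, "R2:", round(best_score, 4))
```

The grid contains every single-model weighting, so the best ensemble found this way scores at least as well as the best individual model on the data used for the search.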

In [None]:
import numpy as np
from sklearn.metrics import r2_score
from skopt import gp_minimize
from skopt.space import Real

# Initial R2 scores, floored at 0 so that a negative score cannot produce a negative weight
r2_flaml = max(0.0, flaml_score)
r2_autosklearn = max(0.0, autosklearn_score)
r2_gluon = max(0.0, eval_gluon['r2'])
r2_mljar = max(0.0, mljar_score)

# Normalize R2 scores to use as initial weights
total_r2 = r2_flaml + r2_autosklearn + r2_gluon + r2_mljar
initial_weights = [r2_flaml/total_r2, r2_autosklearn/total_r2, r2_gluon/total_r2, r2_mljar/total_r2]

def objective(weights):
    """
    Objective function for optimization.
    Calculates the ensemble prediction using the given weights and returns the negative R2 score.
    """
    w1, w2, w3, w4 = weights
    ensemble_pred = (w1 * flaml_pred + w2 * autosklearn_pred + w3 * gluon_pred +
                     w4 * mljar_pred) / (w1 + w2 + w3 + w4)
    return -r2_score(test[label], ensemble_pred)

# Define the search space: each weight may move up to 0.3 away from its initial value, within [0, 1]
space = [Real(max(0, w-0.3), min(1, w+0.3), name=f'w{i+1}') for i, w in enumerate(initial_weights)]
# Run the optimization, starting from the initial weights (x0)
res = gp_minimize(objective, space, n_calls=50, random_state=random_seed, x0=initial_weights)

best_weights = res.x
total = sum(best_weights)
normalized_weights = [w/total for w in best_weights]
w1, w2, w3, w4 = normalized_weights

optimal_ensemble = (w1 * flaml_pred + w2 * autosklearn_pred + w3 * gluon_pred +
                    w4 * mljar_pred) / (w1 + w2 + w3 + w4)
optimal_r2 = r2_score(test[label], optimal_ensemble)


# Make the final ensemble-prediction
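
Applying fixed weights is a weighted mean over the model axis. As a small illustration, the sketch below uses `np.average` with the three weights from the cell below on made-up prediction values.

```python
import numpy as np

# Hypothetical predictions from three models for four rows.
model_preds = np.array([
    [13.7, 13.0, 12.3, 12.7],   # model 1
    [13.8, 12.9, 12.4, 12.6],   # model 2
    [13.6, 13.1, 12.2, 12.8],   # model 3
])
weights = np.array([0.067, 0.584, 0.349])  # weights from the cell below; they sum to 1

# np.average divides by the sum of the weights, so the result stays a proper weighted mean.
ensemble = np.average(model_preds, axis=0, weights=weights)
print(ensemble)
```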

In [None]:
# Predictions of the final exam X dataset

# Ensemble to use from testing: AutoSklearn: 0.067, Gluon: 0.584, TPOT: 0.0000, MLJAR: 0.34891

final_exam_pred = (0.067 * autosklearn_pred + 0.584 * gluon_pred + 0.349 * mljar_pred)

# Convert to numpy array
predictions_array = final_exam_pred.to_numpy()

np.save('final_test_preds.npy', predictions_array)

loaded_preds = np.load('final_test_preds.npy')
print(loaded_preds)

In [None]:
# Compare it with a longer gluon run (benchmark)

gluon_benchmark = TabularPredictor(label=label, problem_type='regression', eval_metric='r2').fit(train_dataset, time_limit=180, presets='medium_quality', verbosity=1)
benchmark_pred = gluon_benchmark.predict(test.drop(columns=[label]))
eval_benchmark = gluon_benchmark.evaluate(test)

print('A 180 sec autogluon run on same dataset.', eval_benchmark['r2'])

# Print results of the different experiments

Compare the performance of the models trained by AutoGluon, MLJAR, FLAML, and Auto-sklearn based on the R2 score, which measures the goodness of fit.
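
A brief reminder of the metric may help when reading the numbers below. With true values $y_i$, predictions $\hat{y}_i$, and the mean of the true values $\bar{y}$, the coefficient of determination computed by `r2_score` is:

$$
R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}
$$

A model that always predicts $\bar{y}$ scores exactly 0, a perfect model scores 1, and predictions worse than the mean give negative values. This is presumably why the optimization cell clips each score at 0 before turning it into an initial weight.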

In [None]:
print(f'Experiment 21 - EXAM, random_seed={random_seed}, {time} sec each \n\n')

print('autosklearn r2:', autosklearn_score)
print('flaml r2:', flaml_score)
print('gluon r2:', eval_gluon['r2'])
print('mljar r2:', mljar_score)

print(f"Initial weights: FLAML: {initial_weights[0]:.3f}, AutoSklearn: {initial_weights[1]:.3f}, Gluon: {initial_weights[2]:.3f}, MLJAR: {initial_weights[3]:.4f}")
print(f"Optimal weights: FLAML: {w1:.3f}, AutoSklearn: {w2:.3f}, Gluon: {w3:.3f}, MLJAR: {w4:.4f}")
print(f"Optimal ensemble R2: {optimal_r2:.6f}")

Experiment 21 - EXAM, random_seed=42, 45 sec each 


autosklearn r2: 0.9018991352967078
flaml r2: 0.903215268789459
gluon r2: 0.9086400096192336
mljar r2: 0.9066244754568729
Initial weights: FLAML: 0.249, AutoSklearn: 0.249, Gluon: 0.251, MLJAR: 0.2510
Optimal weights: FLAML: 0.000, AutoSklearn: 0.121, Gluon: 0.656, MLJAR: 0.2225
Optimal ensemble R2: 0.909100


In [None]:
print(f'Experiment 20 - Bike, random_seed={random_seed}, {time} sec each \n\n')

print('autosklearn r2:', autosklearn_score)
print('flaml r2:', flaml_score)
print('gluon r2:', eval_gluon['r2'])
print('mljar r2:', mljar_score)

print(f"Initial weights: FLAML: {initial_weights[0]:.3f}, AutoSklearn: {initial_weights[1]:.3f}, Gluon: {initial_weights[2]:.3f}, MLJAR: {initial_weights[3]:.4f}")
print(f"Optimal weights: FLAML: {w1:.3f}, AutoSklearn: {w2:.3f}, Gluon: {w3:.3f}, MLJAR: {w4:.4f}")
print(f"Optimal ensemble R2: {optimal_r2:.6f}")

Experiment 20 - Bike, random_seed=21, 180 sec each 


autosklearn r2: 0.9711783604432226
flaml r2: 0.980278110859552
gluon r2: 0.9789584859714217
mljar r2: 0.9807043791233425
Initial weights: FLAML: 0.251, AutoSklearn: 0.248, Gluon: 0.250, MLJAR: 0.2503
Optimal weights: FLAML: 0.486, AutoSklearn: 0.000, Gluon: 0.000, MLJAR: 0.5142
Optimal ensemble R2: 0.986615


In [None]:
print(f'Experiment 19 - Brazil, random_seed={random_seed}, {time} sec each \n\n')

print('autosklearn r2:', autosklearn_score)
print('flaml r2:', flaml_score)
print('gluon r2:', eval_gluon['r2'])
print('mljar r2:', mljar_score)

print(f"Initial weights: FLAML: {initial_weights[0]:.3f}, AutoSklearn: {initial_weights[1]:.3f}, Gluon: {initial_weights[2]:.3f}, MLJAR: {initial_weights[3]:.4f}")
print(f"Optimal weights: FLAML: {w1:.3f}, AutoSklearn: {w2:.3f}, Gluon: {w3:.3f}, MLJAR: {w4:.4f}")
print(f"Optimal ensemble R2: {optimal_r2:.6f}")

Experiment 19 - Brazil, random_seed=21, 180 sec each 


autosklearn r2: 0.9997569800698058
flaml r2: 0.9999512347718132
gluon r2: 0.999960956341897
mljar r2: 0.9999051208946943
Initial weights: FLAML: 0.250, AutoSklearn: 0.250, Gluon: 0.250, MLJAR: 0.2500
Optimal weights: FLAML: 0.309, AutoSklearn: 0.008, Gluon: 0.428, MLJAR: 0.2547
Optimal ensemble R2: 0.999976


In [None]:
print(f'Experiment 18 - property, random_seed={random_seed}, {time} sec each \n\n')

print('autosklearn r2:', autosklearn_score)
print('flaml r2:', flaml_score)
print('gluon r2:', eval_gluon['r2'])
print('mljar r2:', mljar_score)

print(f"Initial weights: FLAML: {initial_weights[0]:.3f}, AutoSklearn: {initial_weights[1]:.3f}, Gluon: {initial_weights[2]:.3f}, MLJAR: {initial_weights[3]:.4f}")
print(f"Optimal weights: FLAML: {w1:.3f}, AutoSklearn: {w2:.3f}, Gluon: {w3:.3f}, MLJAR: {w4:.4f}")
print(f"Optimal ensemble R2: {optimal_r2:.6f}")

Experiment 18 property, random_seed=21, 180 sec each 


autosklearn r2: 0.6493061332266326
flaml r2: 0.9216527799760716
gluon r2: 0.8899306049614928
mljar r2: 0.9076180029394667
Initial weights: FLAML: 0.274, AutoSklearn: 0.193, Gluon: 0.264, MLJAR: 0.2642
Optimal weights: FLAML: 0.560, AutoSklearn: 0.000, Gluon: 0.000, MLJAR: 0.4404
Optimal ensemble R2: 0.946380


In [None]:
print(f'Experiment 17 (exam set), split train/test: 0.2, random_seed={random_seed}, {time} sec each \n\n')

print('autosklearn r2:', autosklearn_score)
print('flaml r2:', flaml_score)
print('gluon r2:', eval_gluon['r2'])
print('tpot r2:', tpot_score)
print('mljar r2:', mljar_score)

print(f"Initial weights: FLAML: {initial_weights[0]:.3f}, AutoSklearn: {initial_weights[1]:.3f}, Gluon: {initial_weights[2]:.3f}, TPOT: {initial_weights[3]:.4f}, MLJAR: {initial_weights[4]:.5f}")
print(f"Optimal weights: FLAML: {w1:.3f}, AutoSklearn: {w2:.3f}, Gluon: {w3:.3f}, TPOT: {w4:.4f}, MLJAR: {w5:.5f}")
print(f"Optimal ensemble R2: {optimal_r2:.6f}")

Experiment 17 (exam set), split train/test: 0.2, random_seed=22, 300 sec each 


autosklearn r2: 0.8969039991304489
flaml r2: 0.8978766056739506
gluon r2: 0.9036253357963947
tpot r2: 0.8568653401098989
mljar r2: 0.9008634374359363
Initial weights: FLAML: 0.249, AutoSklearn: 0.249, Gluon: 0.251, TPOT: 0.2511, MLJAR: 0.25106
Optimal weights: FLAML: 0.000, AutoSklearn: 0.000, Gluon: 0.615, TPOT: 0.0000, MLJAR: 0.38536
Optimal ensemble R2: 0.903428


In [None]:
print(f'Experiment 16 (exam set), split train/test: 0.2, random_seed={random_seed}, {time} sec each \n\n')

print('autosklearn r2:', autosklearn_score)
print('flaml r2:', flaml_score)
print('gluon r2:', eval_gluon['r2'])
print('tpot r2:', tpot_score)
print('mljar r2:', mljar_score)

print(f"Initial weights: FLAML: {initial_weights[0]:.3f}, AutoSklearn: {initial_weights[1]:.3f}, Gluon: {initial_weights[2]:.3f}, TPOT: {initial_weights[3]:.4f}, MLJAR: {initial_weights[4]:.5f}")
print(f"Optimal weights: FLAML: {w1:.3f}, AutoSklearn: {w2:.3f}, Gluon: {w3:.3f}, TPOT: {w4:.4f}, MLJAR: {w5:.5f}")
print(f"Optimal ensemble R2: {optimal_r2:.6f}")

Experiment 16 (exam set), split train/test: 0.2, random_seed=21, 180 sec each 


autosklearn r2: 0.8989279448427613
flaml r2: 0.8960400316232903
gluon r2: 0.9008207868994915
tpot r2: 0.8568653401098989
mljar r2: 0.8999698409260453
Initial weights: FLAML: 0.201, AutoSklearn: 0.202, Gluon: 0.202, TPOT: 0.2023, MLJAR: 0.20231
Optimal weights: FLAML: 0.000, AutoSklearn: 0.261, Gluon: 0.460, TPOT: 0.0000, MLJAR: 0.27934
Optimal ensemble R2: 0.902039


In [None]:
print(f'Experiment 15 (exam set), split train/test: 0.2, random_seed={random_seed}, {time} sec each \n\n')

print('autosklearn r2:', autosklearn_score)
print('flaml r2:', flaml_score)
print('gluon r2:', eval_gluon['r2'])
print('tpot r2:', tpot_score)
print('mljar r2:', mljar_score)

print(f"Initial weights: FLAML: {initial_weights[0]:.3f}, AutoSklearn: {initial_weights[1]:.3f}, Gluon: {initial_weights[2]:.3f}, TPOT: {initial_weights[3]:.4f}, MLJAR: {initial_weights[4]:.5f}")
print(f"Optimal weights: FLAML: {w1:.3f}, AutoSklearn: {w2:.3f}, Gluon: {w3:.3f}, TPOT: {w4:.4f}, MLJAR: {w5:.5f}")
print(f"Optimal ensemble R2: {optimal_r2:.6f}")

Experiment 15 (exam set), split train/test: 0.2, random_seed=20, 90 sec each 


autosklearn r2: 0.8999241727897213
flaml r2: 0.9015054467686099
gluon r2: 0.9052667552869527
tpot r2: 0.8596347668809607
mljar r2: 0.9035029820680053
Initial weights: FLAML: 0.202, AutoSklearn: 0.201, Gluon: 0.203, TPOT: 0.2025, MLJAR: 0.20253
Optimal weights: FLAML: 0.072, AutoSklearn: 0.000, Gluon: 0.670, TPOT: 0.0000, MLJAR: 0.25814
Optimal ensemble R2: 0.905631


In [None]:
import matplotlib.pyplot as plt
import numpy as np

# Data for optimal ensemble weights
experiments = ['Bike', 'Brazil', 'Property', 'Exam']
weights = [
    {'FLAML': 0.495, 'AutoSklearn': 0.000, 'Gluon': 0.238, 'TPOT': 0.0000, 'MLJAR': 0.26731},
    {'FLAML': 0.273, 'AutoSklearn': 0.000, 'Gluon': 0.544, 'TPOT': 0.0000, 'MLJAR': 0.18274},
    {'FLAML': 0.790, 'AutoSklearn': 0.000, 'Gluon': 0.066, 'TPOT': 0.0000, 'MLJAR': 0.14347},
    {'FLAML': 0.000, 'AutoSklearn': 0.067, 'Gluon': 0.584, 'TPOT': 0.0000, 'MLJAR': 0.34891}
]

# R2 scores
ensemble_r2 = [0.983029, 0.999959, 0.929676, 0.908760]
autogluon_r2 = [0.980324167040719, 0.9999610596082303, 0.8973623343877719, 0.9090960050759133]

# Fixed color for each model, so a model keeps the same color in every chart
model_colors = {
    'FLAML': '#ff9999',
    'AutoSklearn': '#66b3ff',
    'Gluon': '#99ff99',
    'TPOT': '#ffcc99',
    'MLJAR': '#ff99cc',
}

# Create pie plots for each experiment
fig, axs = plt.subplots(2, 2, figsize=(12, 12))
fig.suptitle('Optimal Ensemble Weights for Each Experiment', fontsize=16)

for i, ax in enumerate(axs.flatten()):
    # Keep only the models with a non-zero weight
    shares = {key: value for key, value in weights[i].items() if value > 0}
    ax.pie(list(shares.values()), labels=list(shares.keys()), autopct='%1.1f%%', startangle=90,
           colors=[model_colors[key] for key in shares])
    ax.set_title(f'Experiment: {experiments[i]}')

plt.tight_layout()
plt.savefig('ensemble_weights_pie_charts.png')
plt.close()

# Create average distribution pie chart
avg_weights = {key: np.mean([w[key] for w in weights]) for key in weights[0]}
shares = {key: value for key, value in avg_weights.items() if value > 0}

plt.figure(figsize=(8, 8))
plt.pie(list(shares.values()), labels=list(shares.keys()), autopct='%1.1f%%', startangle=90,
        colors=[model_colors[key] for key in shares])
plt.title('Average Distribution of Optimal Ensemble Weights', fontsize=16)
plt.savefig('average_ensemble_weights_pie_chart.png')
plt.close()

# Create bar plot comparing R2 scores
x = np.arange(len(experiments))
width = 0.35

fig, ax = plt.subplots(figsize=(10, 6))
rects1 = ax.bar(x - width/2, ensemble_r2, width, label='Optimal Ensemble', color='#66b3ff')
rects2 = ax.bar(x + width/2, autogluon_r2, width, label='AutoGluon (225 sec)', color='#ff9999')

ax.set_ylabel('R2 Score')
ax.set_title('Comparison of R2 Scores: Optimal Ensemble vs AutoGluon', fontsize=16)
ax.set_xticks(x)
ax.set_xticklabels(experiments)
ax.legend()

ax.bar_label(rects1, padding=3, fmt='%.4f')
ax.bar_label(rects2, padding=3, fmt='%.4f')

fig.tight_layout()
plt.savefig('r2_score_comparison.png')
plt.close()

In [None]:
print(f'Experiment 14 (exam set), split train/test: 0.2, random_seed={random_seed}, {time} sec each \n\n')

print('autosklearn r2:', autosklearn_score)
print('flaml r2:', flaml_score)
print('gluon r2:', eval_gluon['r2'])
print('tpot r2:', tpot_score)
print('mljar r2:', mljar_score)

print(f"Initial weights: FLAML: {initial_weights[0]:.3f}, AutoSklearn: {initial_weights[1]:.3f}, Gluon: {initial_weights[2]:.3f}, TPOT: {initial_weights[3]:.4f}, MLJAR: {initial_weights[4]:.5f}")
print(f"Optimal weights: FLAML: {w1:.3f}, AutoSklearn: {w2:.3f}, Gluon: {w3:.3f}, TPOT: {w4:.4f}, MLJAR: {w5:.5f}")
print(f"Optimal ensemble R2: {optimal_r2:.6f}")

print('\nCompared with a 225 sec autogluon run on same dataset.', eval_benchmark['r2'])

Experiment 14 (exam set), split train/test: 0.2, random_seed=42, 45 sec each 


autosklearn r2: 0.9018991352967078
flaml r2: 0.9021714011999556
gluon r2: 0.9082444600501948
tpot r2: 0.85345478487002
mljar r2: 0.9066244754568729
Initial weights: FLAML: 0.202, AutoSklearn: 0.202, Gluon: 0.203, TPOT: 0.2031, MLJAR: 0.20308
Optimal weights: FLAML: 0.000, AutoSklearn: 0.067, Gluon: 0.584, TPOT: 0.0000, MLJAR: 0.34891
Optimal ensemble R2: 0.908760

Compared with a 225 sec autogluon run on same dataset. 0.9090960050759133


In [None]:
print(f'Experiment 13 - property, pre split train/test, random_seed={random_seed}, {time} sec each \n\n')

print('autosklearn r2:', autosklearn_score)
print('flaml r2:', flaml_score)
print('gluon r2:', eval_gluon['r2'])
print('tpot r2:', tpot_score)
print('mljar r2:', mljar_score)

print(f"Initial weights: FLAML: {initial_weights[0]:.3f}, AutoSklearn: {initial_weights[1]:.3f}, Gluon: {initial_weights[2]:.3f}, TPOT: {initial_weights[3]:.4f}, MLJAR: {initial_weights[4]:.5f}")
print(f"Optimal weights: FLAML: {w1:.3f}, AutoSklearn: {w2:.3f}, Gluon: {w3:.3f}, TPOT: {w4:.4f}, MLJAR: {w5:.5f}")
print(f"Optimal ensemble R2: {optimal_r2:.6f}")

print('\nCompared with a 225 sec autogluon run on same dataset.', eval_benchmark['r2'])

Experiment 13 - property, pre split train/test, random_seed=42, 45 sec each 


autosklearn r2: 0.025615876031262363
flaml r2: 0.9292393814353446
gluon r2: 0.8970115852805034
tpot r2: 0.3901742233619726
mljar r2: 0.9038038161802924
Initial weights: FLAML: 0.295, AutoSklearn: 0.008, Gluon: 0.285, TPOT: 0.2851, MLJAR: 0.28514
Optimal weights: FLAML: 0.790, AutoSklearn: 0.000, Gluon: 0.066, TPOT: 0.0000, MLJAR: 0.14347
Optimal ensemble R2: 0.929676

Compared with a 225 sec autogluon run on same dataset. 0.8973623343877719


In [None]:
print(f'Experiment 12 - Brazil, pre split train/test, random_seed={random_seed}, {time} sec each \n\n')

print('autosklearn r2:', autosklearn_score)
print('flaml r2:', flaml_score)
print('gluon r2:', eval_gluon['r2'])
print('tpot r2:', tpot_score)
print('mljar r2:', mljar_score)

print(f"Initial weights: FLAML: {initial_weights[0]:.3f}, AutoSklearn: {initial_weights[1]:.3f}, Gluon: {initial_weights[2]:.3f}, TPOT: {initial_weights[3]:.4f}, MLJAR: {initial_weights[4]:.5f}")
print(f"Optimal weights: FLAML: {w1:.3f}, AutoSklearn: {w2:.3f}, Gluon: {w3:.3f}, TPOT: {w4:.4f}, MLJAR: {w5:.5f}")
print(f"Optimal ensemble R2: {optimal_r2:.6f}")

print('\nCompared with a 225 sec autogluon run on same dataset.', eval_benchmark['r2'])

Experiment 12 - Brazil, pre split train/test, random_seed=42, 45 sec each 


autosklearn r2: 0.9987812976782319
flaml r2: 0.9998829775590306
gluon r2: 0.9999361241713965
tpot r2: 0.994005936813352
mljar r2: 0.9998062947255467
Initial weights: FLAML: 0.200, AutoSklearn: 0.200, Gluon: 0.200, TPOT: 0.2003, MLJAR: 0.20029
Optimal weights: FLAML: 0.273, AutoSklearn: 0.000, Gluon: 0.544, TPOT: 0.0000, MLJAR: 0.18274
Optimal ensemble R2: 0.999959

Compared with a 225 sec autogluon run on same dataset. 0.9999610596082303


In [None]:
print(f'Experiment 11 - bike set, pre split train/test, random_seed={random_seed}, {time} sec each \n\n')

print('autosklearn r2:', autosklearn_score)
print('flaml r2:', flaml_score)
print('gluon r2:', eval_gluon['r2'])
print('tpot r2:', tpot_score)
print('mljar r2:', mljar_score)

print(f"Initial weights: FLAML: {initial_weights[0]:.3f}, AutoSklearn: {initial_weights[1]:.3f}, Gluon: {initial_weights[2]:.3f}, TPOT: {initial_weights[3]:.4f}, MLJAR: {initial_weights[4]:.5f}")
print(f"Optimal weights: FLAML: {w1:.3f}, AutoSklearn: {w2:.3f}, Gluon: {w3:.3f}, TPOT: {w4:.4f}, MLJAR: {w5:.5f}")
print(f"Optimal ensemble R2: {optimal_r2:.6f}")

print('\nCompared with a 225 sec autogluon run on same dataset.', eval_benchmark['r2'])

Experiment 11 - bike set, pre split train/test, random_seed=42, 45 sec each 


autosklearn r2: 0.96424752975455
flaml r2: 0.9812247807406366
gluon r2: 0.9801217428409494
tpot r2: 0.9385249767051255
mljar r2: 0.9795437216375422
Initial weights: FLAML: 0.203, AutoSklearn: 0.199, Gluon: 0.202, TPOT: 0.2024, MLJAR: 0.20235
Optimal weights: FLAML: 0.495, AutoSklearn: 0.000, Gluon: 0.238, TPOT: 0.0000, MLJAR: 0.26731
Optimal ensemble R2: 0.983029

Compared with a 225 sec autogluon run on same dataset. 0.980324167040719


In [None]:
print(f'Experiment 10 (exam set), split train/test: 0.2, random_seed={random_seed}, {time} sec each \n\n')

print('autosklearn r2:', autosklearn_score)
print('flaml r2:', flaml_score)
print('gluon r2:', eval_gluon['r2'])
print('tpot r2:', tpot_score)
print('mljar r2:', mljar_score)

print(f"Initial weights: FLAML: {initial_weights[0]:.3f}, AutoSklearn: {initial_weights[1]:.3f}, Gluon: {initial_weights[2]:.3f}, TPOT: {initial_weights[3]:.4f}, MLJAR: {initial_weights[4]:.5f}")
print(f"Optimal weights: FLAML: {w1:.3f}, AutoSklearn: {w2:.3f}, Gluon: {w3:.3f}, TPOT: {w4:.4f}, MLJAR: {w5:.5f}")
print(f"Optimal ensemble R2: {optimal_r2:.6f}")

print('\nCompared with a 225 sec autogluon run on same dataset.', eval_benchmark['r2'])

Experiment 10 (exam set), split train/test: 0.2, random_seed=42, 45 sec each 


autosklearn r2: 0.9464678404550451
flaml r2: 0.92094008726315
gluon r2: 0.9477057918725242
tpot r2: 0.8802628203042405
mljar r2: 0.9429443640791044
Initial weights: FLAML: 0.199, AutoSklearn: 0.204, Gluon: 0.204, TPOT: 0.2043, MLJAR: 0.20432
Optimal weights: FLAML: 0.000, AutoSklearn: 0.443, Gluon: 0.557, TPOT: 0.0000, MLJAR: 0.00000
Optimal ensemble R2: 0.949917

Compared with a 225 sec autogluon run on same dataset. 0.9466974619226166


In [None]:
print(f'Experiment 9, {base_path}, pre split train/test, train subsample = 20 000, random_seed={random_seed}, {time} sec each \n\n')

print('autosklearn r2:', r2_score(test[label], autosklearn_pred))
print('flaml r2:', flaml_score)
print('gluon r2:', eval_gluon['r2'])
print('tpot r2:', tpot_score)

print(f"Initial weights: FLAML: {initial_weights[0]:.3f}, AutoSklearn: {initial_weights[1]:.3f}, "
      f"Gluon: {initial_weights[2]:.3f}, TPOT: {initial_weights[3]:.3f}, MLJAR: {initial_weights[4]:.3f}")
print(f"Optimal weights: FLAML: {w1:.3f}, AutoSklearn: {w2:.3f}, Gluon: {w3:.3f}, TPOT: {w4:.3f}, MLJAR: {w5:.3f}")
print(f"Optimal ensemble R2: {optimal_r2:.6f}")

print('\nCompared with a 195 sec autogluon run on same dataset.', eval_benchmark['r2'])

In [None]:
print(f'Experiment 8 (exam set), split train/test: 0.2, random_seed={random_seed}, {time} sec each \n\n')

print('autosklearn r2:', r2_score(test[label], autosklearn_pred))
print('flaml r2:', flaml_score)
print('gluon r2:', eval_gluon['r2'])
print('tpot r2:', tpot_score)

print(f"Initial weights: FLAML: {initial_weights[0]:.3f}, AutoSklearn: {initial_weights[1]:.3f}, Gluon: {initial_weights[2]:.3f}, TPOT: {initial_weights[3]:.4f}")
print(f"Optimal weights: FLAML: {w1:.3f}, AutoSklearn: {w2:.3f}, Gluon: {w3:.3f}, TPOT: {w4:.4f}")
print(f"Optimal ensemble R2: {optimal_r2:.6f}")

print('\nCompared with a 195 sec autogluon run on same dataset.', eval_benchmark['r2'])

Experiment 8 (exam set), split train/test: 0.2, random_seed=100, 45 sec each 


autosklearn r2: 0.943178769284982
flaml r2: 0.9157893012151598
gluon r2: 0.9430203638395451
tpot r2: 0.8530380803372511
Initial weights: FLAML: 0.251, AutoSklearn: 0.258, Gluon: 0.258, TPOT: 0.2580
Optimal weights: FLAML: 0.000, AutoSklearn: 0.345, Gluon: 0.337, TPOT: 0.0000
Optimal ensemble R2: 0.945840

Compared with a 195 sec autogluon run on same dataset. 0.9430960612867438


In [None]:
print(f'Experiment 7, {base_path}, pre split train/test, train subsample = 20 000, random_seed={random_seed}, {time} sec each \n\n')

print('autosklearn r2:', r2_score(test[label], autosklearn_pred))
print('flaml r2:', flaml_score)
print('gluon r2:', eval_gluon['r2'])
print('tpot r2 (60 sec):', tpot_score)

print(f"Initial weights: FLAML: {initial_weights[0]:.3f}, AutoSklearn: {initial_weights[1]:.3f}, Gluon: {initial_weights[2]:.3f}, TPOT: {initial_weights[3]:.4f}")
print(f"Optimal weights: FLAML: {w1:.3f}, AutoSklearn: {w2:.3f}, Gluon: {w3:.3f}, TPOT: {w4:.4f}         # using skopt.gp_minimize")
print(f"Optimal ensemble R2: {optimal_r2:.6f}")

print('\nCompared with a 180 sec autogluon run on same dataset.', eval_benchmark['r2'])

Experiment 7, ../../data/361098, pre split train/test, train subsample = 20 000, random_seed=42, 45 sec each 


autosklearn r2: -2.0545698919960387e-08
flaml r2: 0.9997542121304047
gluon r2: 0.9999007470404888
tpot r2 (60 sec): 0.9966102873818953
Initial weights: FLAML: 0.334, AutoSklearn: 0.000, Gluon: 0.334, TPOT: 0.3337
Optimal weights: FLAML: 0.634, AutoSklearn: 0.000, Gluon: 0.564, TPOT: 0.0326         # using skopt.gp_minimize
Optimal ensemble R2: 0.999894

Compared with a 180 sec autogluon run on same dataset. 0.9999499985304586


In [None]:
print(f'Experiment 6, {base_path}, pre split train/test, train subsample = 20 000, random_seed={random_seed}, {time} sec each \n\n')

print('autosklearn r2:', r2_score(test[label], autosklearn_pred))
print('flaml r2:', flaml_score)
print('gluon r2:', eval_gluon['r2'])
ensemble_4 = (flaml_pred + autosklearn_pred + gluon_pred) / 3
# r2_score expects (y_true, y_pred); the order matters because R2 is not symmetric.
print('R2 score 33/33/33 FLAML, gluon, autosklearn', r2_score(test[label], ensemble_4))
ensemble_3 = (gluon_pred + flaml_pred) / 2
print('R2 score ensemble 50/50 FLAML and gluon', r2_score(test[label], ensemble_3))

print(f"Initial weights: FLAML: {initial_weights[0]:.3f}, AutoSklearn: {initial_weights[1]:.3f}, Gluon: {initial_weights[2]:.3f}")
print(f"Optimal weights: FLAML: {w1:.3f}, AutoSklearn: {w2:.3f}, Gluon: {w3:.3f}          # using skopt.gp_minimize")
print(f"Optimal ensemble R2: {optimal_r2:.6f}")

print('\nCompared with a 135 sec autogluon run on same dataset.', eval_benchmark['r2'])

Experiment 6, ../../data/361098, pre split train/test, train subsample = 20 000, random_seed=42, 45 sec each 


autosklearn r2: -2.0545698919960387e-08
flaml r2: 0.9997542121304047
gluon r2: 0.9999007470404888
R2 score 33/33/33 FLAML, gluon, autosklearn 0.7494935506893418
R2 score ensemble 50/50 FLAML and gluon 0.9999030672087769
Initial weights: FLAML: 0.500, AutoSklearn: 0.000, Gluon: 0.500
Optimal weights: FLAML: 0.200, AutoSklearn: 0.000, Gluon: 0.711          # using skopt.gp_minimize
Optimal ensemble R2: 0.999920

Compared with a 135 sec autogluon run on same dataset. 0.9999479563293344
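
scikit-learn's `r2_score` takes its arguments in the order `(y_true, y_pred)`. R2 is not symmetric, because the denominator is the variance of whichever argument is treated as ground truth. The pure-Python sketch below illustrates this on made-up toy numbers. The helper `r2` is a stand-in written for illustration, not the scikit-learn implementation.

```python
def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy values chosen only to make the asymmetry visible.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [2.0, 2.0, 3.0, 3.0]

score_correct = r2(y_true, y_pred)  # ground truth first
score_swapped = r2(y_pred, y_true)  # arguments reversed

print("correct order:", score_correct)
print("swapped order:", score_swapped)
```

The residual sum of squares is the same in both calls; only the total sum of squares changes, so the gap grows when the predictions have much less variance than the targets.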


In [None]:
print(f'Experiment 5, {base_path}, pre split train/test, train subsample = 20 000, random_seed={random_seed}, {time} sec each \n\n')

print('autosklearn r2:', r2_score(test[label], autosklearn_pred))
print('flaml r2:', flaml_score)
print('gluon r2:', eval_gluon['r2'])
ensemble_4 = (flaml_pred + autosklearn_pred + gluon_pred) / 3
# r2_score expects (y_true, y_pred); the order matters because R2 is not symmetric.
print('R2 score 33/33/33 FLAML, gluon, autosklearn', r2_score(test[label], ensemble_4))
ensemble_3 = (gluon_pred + flaml_pred) / 2
print('R2 score ensemble 50/50 FLAML and gluon', r2_score(test[label], ensemble_3))

print(f"Initial weights: FLAML: {initial_weights[0]:.3f}, AutoSklearn: {initial_weights[1]:.3f}, Gluon: {initial_weights[2]:.3f}")
print(f"Optimal weights: FLAML: {w1:.3f}, AutoSklearn: {w2:.3f}, Gluon: {w3:.3f}          # using skopt.gp_minimize")
print(f"Optimal ensemble R2: {optimal_r2:.6f}")

print('\nCompared with a 135 sec autogluon run on same dataset.', eval_benchmark['r2'])

Experiment 5, ../../data/361099, pre split train/test, train subsample = 20 000, random_seed=42, 45 sec each 


autosklearn r2: 0.96424752975455
flaml r2: 0.9793072031648467
gluon r2: 0.9775395392560148
R2 score 33/33/33 FLAML, gluon, autosklearn 0.9777513822896438
R2 score ensemble 50/50 FLAML and gluon 0.9806118558444856
Initial weights: FLAML: 0.335, AutoSklearn: 0.330, Gluon: 0.335
Optimal weights: FLAML: 0.635, AutoSklearn: 0.030, Gluon: 0.447          # using skopt.gp_minimize
Optimal ensemble R2: 0.981362

Compared with a 135 sec autogluon run on same dataset. 0.9801217428409494
