## AutoML
## Empirical Tests - Auto-sklearn

This project aims to explore some of the main **AutoML tools** available, which involves the following tasks:
1. Reading of technical articles concerning the automated machine learning field.
2. Discussion about machine learning pipelines and the automation of some of their components.
3. Identification of the most interesting Python libraries for automatic ML pipeline construction.
4. Quick implementation of the selected tools with simulated data.
5. Careful exploration of the APIs of the selected tools.
6. Comparison among selected tools concerning: model performance, computation time, and usability.

All of these activities derive from the **objectives** of this project, which are: i) reflection on ML pipeline components; ii) discussion and analysis of AutoML tools; iii) identification of the key points of AutoML frameworks; iv) definition of the advantages and disadvantages of the main AutoML tools and, above all, of the relevance and adequacy of implementing AutoML.

---------------------

In this series of notebooks, we test out different AutoML Python libraries and compare them according to the following criteria: performance metrics of developed pipelines evaluated on test data; computation time (i.e., the performance relative to the available time budget of the search process); and usability of the tool.

* **Performance:** each tool receives the training data (to which it applies its own validation approach) and searches for the best ML pipeline; the selected pipeline is then evaluated on a hold-out dataset (25% of the complete dataset). Since the supervised learning task here is binary classification, the model assessment is based on the following metrics: [ROC-AUC](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.roc_auc_score.html), [average precision score](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.average_precision_score.html), [Brier score](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.brier_score_loss.html), [accuracy](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.accuracy_score.html), and [MCC](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.matthews_corrcoef.html).

* **Computation time:** all tested AutoML tools have some sort of time budget for the search process. Therefore, instead of minimizing the computation time across all tested tools, we explore three different time budgets: 20 minutes, 1 hour, and 6 hours. Consequently, one of the main aspects of the comparison among tools is the performance each of them achieves under each time budget, in addition to the average performance across all time budgets.

* **Usability:** this criterion covers how easy it is to set up the search in each tool, as well as the outputs of the search process, mainly the visualization and assessment of the constructed and selected pipelines. The diversity of the information produced about the search, and how clearly it can be accessed and interpreted, also matters. Finally, the more straightforward it is to use a selected pipeline, the better the tool.
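
As a minimal sketch of the performance criterion, the five metrics can be computed with scikit-learn from predicted probabilities. The labels and probabilities below are made-up toy values, not results from this project, and the 0.5 threshold for accuracy and MCC is an assumption that mirrors the one used later in this notebook.

```python
import numpy as np
from sklearn.metrics import (
    roc_auc_score, average_precision_score, brier_score_loss,
    accuracy_score, matthews_corrcoef,
)

# Toy labels and predicted probabilities of the positive class (illustrative only):
y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])
y_prob = np.array([0.1, 0.4, 0.8, 0.65, 0.3, 0.2, 0.9, 0.55])

# Threshold-free metrics use the probabilities directly:
toy_metrics = {
    'roc_auc': roc_auc_score(y_true, y_prob),
    'avg_prec': average_precision_score(y_true, y_prob),
    'brier': brier_score_loss(y_true, y_prob),
}

# Accuracy and MCC need hard labels, here obtained with a 0.5 threshold:
y_pred = (y_prob > 0.5).astype(int)
toy_metrics['acc'] = accuracy_score(y_true, y_pred)
toy_metrics['mcc'] = matthews_corrcoef(y_true, y_pred)

for name, value in toy_metrics.items():
    print(f'{name}: {value:.4f}')
```

ROC-AUC, average precision, and the Brier score depend only on the probabilities, whereas accuracy and MCC change with the chosen threshold.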

The empirical tests follow the reading and discussion of the APIs of all selected tools. Since the main initialization arguments, methods, and attributes have already been identified, they are used accordingly in these notebooks.

The data used for the empirical tests comes from the Kaggle repository of datasets. It is a binary classification dataset whose objective is the [identification of malware apps](https://www.kaggle.com/saurabhshahane/android-permission-dataset). It has 27310 unique instances (mobile phone applications) and 184 variables, among which one is the binary outcome variable and another is the name of the app. Since the main objective of this project is to explore AutoML tools, only some basic feature engineering operations were implemented, in addition to a short description and exploration of the data.

------------

**Summary:**
1. [Libraries](#libraries).
2. [Functions and classes](#functions_classes).
3. [Settings](#settings).
4. [Importing datasets](#imports).
    * [Features and outcome variables](#feats_outcomes).

<a id='libraries'></a>

## Libraries

In [None]:
from google.colab import drive
drive.mount('/content/gdrive')

Drive already mounted at /content/gdrive; to attempt to forcibly remount, call drive.mount("/content/gdrive", force_remount=True).


In [None]:
cd "/content/gdrive/MyDrive/Studies/autoML/Codes"

/content/gdrive/MyDrive/Studies/autoML/Codes


In [None]:
# !curl https://raw.githubusercontent.com/automl/auto-sklearn/master/requirements.txt | xargs -n 1 -L 1 pip3 install

In [None]:
# pip install auto-sklearn

In [None]:
# pip install -r requirements.txt

In [None]:
# pip uninstall scikit-learn

In [None]:
# pip install scikit-learn==0.24.2

In [None]:
import pandas as pd
import numpy as np
import os
import json
from datetime import datetime
from time import time
import pickle

from sklearn.metrics import roc_auc_score, accuracy_score, average_precision_score, brier_score_loss, matthews_corrcoef

# sudo apt-get install build-essential swig
# curl https://raw.githubusercontent.com/automl/auto-sklearn/master/requirements.txt | xargs -n 1 -L 1 pip3 install
# pip install auto-sklearn
import autosklearn # Auto-sklearn
from autosklearn.classification import AutoSklearnClassifier
print(f'Auto-sklearn: version {autosklearn.__version__}.')

Auto-sklearn: version 0.13.0.


<a id='functions_classes'></a>

## Functions and classes

In [None]:
from utils import running_time, correct_col_name, train_test_split
from pre_process import pre_process
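
The `utils` module is not shown in this notebook. As a rough sketch of what two of its helpers might do, assuming `correct_col_name` normalizes raw column names into lower-case, underscore-separated identifiers and `train_test_split` splits a DataFrame by rows, one could write the following. The function names ending in `_sketch`, the `seed` argument, and the raw column name are hypothetical.

```python
import re
import pandas as pd

def correct_col_name_sketch(name):
    """Lower-case a column name and collapse non-alphanumeric runs into '_' (assumed behavior)."""
    return re.sub(r'[^0-9a-z]+', '_', str(name).strip().lower())

def train_test_split_sketch(df, test_ratio=0.25, shuffle=True, seed=1):
    """Split a DataFrame by rows into (train, test); `seed` is a hypothetical extra argument."""
    if shuffle:
        df = df.sample(frac=1.0, random_state=seed)
    n_test = int(round(len(df) * test_ratio))
    return df.iloc[n_test:].copy(), df.iloc[:n_test].copy()

# Toy data with a made-up raw column name:
toy = pd.DataFrame({'Access DRM content.': range(8), 'Class': [0, 1] * 4})
toy.columns = [correct_col_name_sketch(c) for c in toy.columns]
toy_train, toy_test = train_test_split_sketch(toy)
print(list(toy.columns), toy_train.shape, toy_test.shape)
```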

<a id='settings'></a>

## Settings

### Data management

In [None]:
# Identification of the test:
estimation_id = str(int(time()))

# Declare whether to export results:
export = True

### ML pipeline search

#### Search complexity parameters

In [None]:
# Time (seconds) for constructing a ML pipeline:
time_constraint_search = 6*60*60

# Time constraint for training a model:
time_constraint_models = None
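
The introduction mentions three time budgets (20 minutes, 1 hour, and 6 hours), and the value set above corresponds to the longest one. As a small sketch, the budgets can be kept in minutes and converted to the seconds stored in `time_constraint_search`. The dictionary names and labels below are hypothetical.

```python
# Time budgets (in minutes) compared in this project:
time_budgets_min = {'short': 20, 'medium': 60, 'long': 6 * 60}

# The search time constraint is expressed in seconds:
time_budgets_sec = {label: minutes * 60 for label, minutes in time_budgets_min.items()}
print(time_budgets_sec)
```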

#### Estimation parameters

In [None]:
# Number of models for the final ensemble:
ensemble_size = 50

# Collection of ML algorithms to include into and exclude from the search:
include_estimators, exclude_estimators = (None, None)

# Collection of data preprocessing procedures to include into and exclude from the search:
include_preprocessors, exclude_preprocessors = (None, None)

# Performance metric for choosing the best models.
val_metric = autosklearn.metrics.roc_auc

# Validation (resampling) strategy used to choose parameters and models:
val_strategy = 'cv'

# Arguments for the chosen validation strategy:
val_strategy_args = {'folds': 5}

#### Computation parameters

In [None]:
# Number of jobs to run in parallel:
n_jobs = 1

<a id='imports'></a>

## Importing datasets

<a id='feats_outcomes'></a>

### Features and outcome variables

In [None]:
# Importing data:
df = pd.read_csv('../Datasets/Android_Permission.csv')

# Columns names:
df.columns = [correct_col_name(c) for c in df.columns]

# Auxiliary variables:
drop_vars = ['app', 'package', 'class']

# Removing duplicates:
df.drop_duplicates(inplace=True)

print(f'Shape of data: {df.shape}.')
df.head(3)

Shape of data: (27310, 184).


Unnamed: 0,app,package,category,description,rating,number_of_ratings,price,related_apps,dangerous_permissions_count,safe_permissions_count,access_drm_content_,access_email_provider_data,access_all_system_downloads,access_download_manager_,advanced_download_manager_functions_,audio_file_access,install_drm_content_,modify_google_service_configuration,modify_google_settings,move_application_resources,read_google_settings,send_download_notifications_,voice_search_shortcuts,access_surfaceflinger,access_checkin_properties,access_the_cache_filesystem,access_to_passwords_for_google_accounts,act_as_an_account_authenticator,bind_to_a_wallpaper,bind_to_an_input_method,change_screen_orientation,coarse,control_location_update_notifications,control_system_backup_and_restore,delete_applications,delete_other_applications_caches,delete_other_applications_data,directly_call_any_phone_numbers,directly_install_applications,disable_or_modify_status_bar,...,your_accounts_access_other_google_services,your_accounts_act_as_an_account_authenticator,your_accounts_act_as_the_accountmanagerservice,your_accounts_contacts_data_in_google_accounts,your_accounts_discover_known_accounts,your_accounts_manage_the_accounts_list,your_accounts_read_google_service_configuration,your_accounts_use_the_authentication_credentials_of_an_account,your_accounts_view_configured_accounts,your_location_access_extra_location_provider_commands,your_location_coarse,your_location_fine,your_location_mock_location_sources_for_testing,your_messages_read_email_attachments,your_messages_send_gmail,your_messages_edit_sms_or_mms,your_messages_modify_gmail,your_messages_read_gmail,your_messages_read_gmail_attachment_previews,your_messages_read_sms_or_mms,your_messages_read_instant_messages,your_messages_receive_mms,your_messages_receive_sms,your_messages_receive_wap,your_messages_send_sms_received_broadcast,your_messages_send_wap_push_received_broadcast,your_messages_write_instant_messages,your_personal_information_add_or_modify_calendar_events_and_send_email_to_guests,your_personal_information_choose_widgets,your_personal_information_read_browsers_history_and_bookmarks,your_personal_information_read_calendar_events,your_personal_information_read_contact_data,your_personal_information_read_sensitive_log_data,your_personal_information_read_user_defined_dictionary,your_personal_information_retrieve_system_internal_state,your_personal_information_set_alarm_in_alarm_clock,your_personal_information_write_browsers_history_and_bookmarks,your_personal_information_write_contact_data,your_personal_information_write_to_user_defined_dictionary,class
0,Canada Post Corporation,com.canadapost.android,Business,Canada Post Mobile App gives you access to som...,3.1,77,0.0,"{com.adaffix.pub.ca.android, com.kevinquan.gas...",7.0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0
1,Word Farm,com.realcasualgames.words,Brain & Puzzle,Speed and strategy combine in this exciting wo...,4.3,199,0.0,"{air.com.zubawing.FastWordLite, com.joybits.do...",3.0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Fortunes of War FREE,fortunesofwar.free,Cards & Casino,"Fortunes of War is a fast-paced, easy to learn...",4.1,243,0.0,"{com.kevinquan.condado, hu.monsta.pazaak, net....",1.0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


#### Features names

In [None]:
features_names = list(df.drop(drop_vars, axis=1).columns)

# Printing the names in chunks of 10:
for i in range(0, len(features_names), 10):
  print(features_names[i:i+10], '\n')

['category', 'description', 'rating', 'number_of_ratings', 'price', 'related_apps', 'dangerous_permissions_count', 'safe_permissions_count', 'access_drm_content_', 'access_email_provider_data'] 

['access_all_system_downloads', 'access_download_manager_', 'advanced_download_manager_functions_', 'audio_file_access', 'install_drm_content_', 'modify_google_service_configuration', 'modify_google_settings', 'move_application_resources', 'read_google_settings', 'send_download_notifications_'] 

...

#### Data types

In [None]:
data_types = pd.DataFrame(df.dtypes, columns=['type']).reset_index(drop=False)
data_types.columns = ['feature', 'type']

print('\033[1mDistribution of data types:\033[0m')
print(data_types.type.value_counts())

Distribution of data types:
int64      176
object       5
float64      3
Name: type, dtype: int64


#### Train-test split

In [None]:
df_train, df_test = train_test_split(df, test_ratio=0.25, shuffle=True)

#### Feature engineering

Related apps

In [None]:
# Creating the variable with the number of related apps:
df_train['related_apps'] = df_train['related_apps'].apply(lambda x: x if pd.isna(x) else x.replace('{', '').replace('}', ''))
df_train['num_related_apps'] = df_train.related_apps.apply(lambda x: x if pd.isna(x) else len(x.split(',')))
df_test['num_related_apps'] = df_test.related_apps.apply(lambda x: x if pd.isna(x) else len(x.split(',')))

# Updating the list of auxiliary variables:
drop_vars.append('related_apps')

Description

In [None]:
# Creating the variable that indicates the number of words in a description:
df_train['num_words_desc'] = df_train.description.apply(lambda x: x if pd.isna(x) else len(x.split(' ')))
df_test['num_words_desc'] = df_test.description.apply(lambda x: x if pd.isna(x) else len(x.split(' ')))

# Updating the list of auxiliary variables:
drop_vars.append('description')
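
The two count features above can be illustrated on toy data. This is a minimal sketch with made-up package names and descriptions; it shows that missing values are passed through unchanged, so the resulting count columns can contain `NaN`.

```python
import numpy as np
import pandas as pd

# Toy rows with made-up values (illustrative only):
toy = pd.DataFrame({
    'related_apps': ['{com.a.one, com.b.two, com.c.three}', np.nan, '{com.d.four}'],
    'description': ['Fast word game', 'Track your parcels with ease', np.nan],
})

# Count comma-separated package names; missing values stay missing:
toy['num_related_apps'] = toy['related_apps'].apply(
    lambda x: x if pd.isna(x) else len(x.strip('{}').split(','))
)

# Count space-separated tokens in the description:
toy['num_words_desc'] = toy['description'].apply(
    lambda x: x if pd.isna(x) else len(x.split(' '))
)
print(toy[['num_related_apps', 'num_words_desc']])
```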

Category

Even though one-hot encoding is strictly a transformation of categorical features rather than feature engineering, we apply it here to translate this categorical attribute into numerical ones, since some AutoML tools explored in this project do not accept textual inputs.
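
The project's own `applying_one_hot` helper is not shown in this notebook. As a hedged sketch of the general idea, the function below (hypothetical name `one_hot_sketch`) builds the dummies from the training data with `pandas.get_dummies` and aligns the test data to the same columns. It ignores any variance-based filtering that the project's helper may perform.

```python
import pandas as pd

def one_hot_sketch(training_data, test_data, cat_var):
    """One-hot encode `cat_var` using only the categories seen in the training data."""
    train_dummies = pd.get_dummies(training_data[cat_var], prefix=cat_var)
    test_dummies = pd.get_dummies(test_data[cat_var], prefix=cat_var)
    # Align test columns to the training columns: unseen categories are dropped,
    # and categories missing from the test data become all-zero columns.
    test_dummies = test_dummies.reindex(columns=train_dummies.columns, fill_value=0)
    train_out = pd.concat([training_data.drop(columns=cat_var), train_dummies], axis=1)
    test_out = pd.concat([test_data.drop(columns=cat_var), test_dummies], axis=1)
    return train_out, test_out

# Toy data (illustrative values only):
toy_train = pd.DataFrame({'category': ['Business', 'Cards & Casino', 'Business'],
                          'rating': [3.1, 4.1, 4.3]})
toy_test = pd.DataFrame({'category': ['Business', 'Weather'], 'rating': [2.0, 5.0]})
enc_train, enc_test = one_hot_sketch(toy_train, toy_test, 'category')
print(list(enc_train.columns))
```

Fitting the set of dummy columns on the training data only keeps the training and test matrices aligned, which the later `fit` and `predict_proba` calls require.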

In [None]:
from transformations import applying_one_hot

In [None]:
transf_data = applying_one_hot(training_data=df_train, cat_vars=['category'], variance_param=-1, test_data=df_test)
df_train = transf_data['training_data']
df_test = transf_data['test_data']

Number of categorical features: 1
Number of overall selected dummies: 30.


<a id='data_description'></a>

## Data description

<a id='features_types'></a>

### Features types

In [None]:
feature_types = pd.DataFrame(df_train.drop(drop_vars, axis=1).dtypes, columns=['type']).reset_index(drop=False)
feature_types.columns = ['feature', 'type']

print('\033[1mDistribution of data types (features):\033[0m')
print(feature_types.type.value_counts())

Distribution of data types (features):
int64      175
uint8       30
float64      5
Name: type, dtype: int64


<a id='ml_pipeline'></a>

## ML pipeline

In [None]:
help(AutoSklearnClassifier)

Help on class AutoSklearnClassifier in module autosklearn.estimators:

class AutoSklearnClassifier(AutoSklearnEstimator, sklearn.base.ClassifierMixin)
 |  AutoSklearnClassifier(time_left_for_this_task=3600, per_run_time_limit=None, initial_configurations_via_metalearning=25, ensemble_size: int = 50, ensemble_nbest=50, max_models_on_disc=50, seed=1, memory_limit=3072, include_estimators=None, exclude_estimators=None, include_preprocessors=None, exclude_preprocessors=None, resampling_strategy='holdout', resampling_strategy_arguments=None, tmp_folder=None, delete_tmp_folder_after_terminate=True, n_jobs: Union[int, NoneType] = None, dask_client: Union[distributed.client.Client, NoneType] = None, disable_evaluator_output=False, get_smac_object_callback=None, smac_scenario_args=None, logging_config=None, metadata_directory=None, metric=None, scoring_functions: Union[List[autosklearn.metrics.Scorer], NoneType] = None, load_models: bool = True, get_trials_callback=None)
 |  
 |  This class imp

<a id='ml_pipeline_search'></a>

### ML pipeline search

#### Setting the search

In [None]:
# Creating the AutoML object:
model = AutoSklearnClassifier(
    # Search complexity parameters:
    time_left_for_this_task=time_constraint_search, per_run_time_limit=time_constraint_models, memory_limit=3072,

    # Estimation parameters:
    ensemble_size=ensemble_size, ensemble_nbest=ensemble_size, max_models_on_disc=None, seed=1,
    include_estimators=include_estimators, exclude_estimators=exclude_estimators,
    include_preprocessors=include_preprocessors, exclude_preprocessors=exclude_preprocessors,
    resampling_strategy=val_strategy, resampling_strategy_arguments=val_strategy_args,
    metric=val_metric, scoring_functions=None,
    
    # Computation parameters:
    n_jobs=n_jobs, dask_client=None
)

#### Running the search

In [None]:
start_time = datetime.now()

# Running the search:
model.fit(df_train.drop(drop_vars, axis=1), df_train['class'])

# Total elapsed time:
end_time = datetime.now()
search_time = running_time(start_time=start_time, end_time=end_time)

------------------------------------
Running time: 360.0 minutes.
Start time: 2021-09-05, 20:20:39
End time: 2021-09-06, 02:20:39
------------------------------------
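
`running_time` comes from the project's `utils` module, which is not shown in this notebook. A minimal sketch of a helper producing a similar report, assuming it simply takes the difference between two `datetime` objects and returns it in minutes, could look like this. The name `running_time_sketch`, the `print_time` argument, and the exact formatting are assumptions.

```python
from datetime import datetime

def running_time_sketch(start_time, end_time, print_time=True):
    """Return the elapsed time in minutes between two datetimes (hypothetical helper)."""
    minutes = round((end_time - start_time).total_seconds() / 60, 2)
    if print_time:
        fmt = '%Y-%m-%d, %H:%M:%S'
        print('------------------------------------')
        print(f'Running time: {minutes} minutes.')
        print(f'Start time: {start_time.strftime(fmt)}')
        print(f'End time: {end_time.strftime(fmt)}')
        print('------------------------------------')
    return minutes

# Start and end times taken from the report above:
elapsed = running_time_sketch(datetime(2021, 9, 5, 20, 20, 39), datetime(2021, 9, 6, 2, 20, 39))
```

Returning a plain number keeps the value easy to store in a JSON file alongside the performance metrics.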


### Assessing the outcomes

#### ML pipeline

Search process

In [None]:
print(model.sprint_statistics())

auto-sklearn results:
  Dataset name: bc8176dc-0e86-11ec-8213-0242ac1c0002
  Metric: roc_auc
  Best validation score: 0.903132
  Number of target algorithm runs: 80
  Number of successful target algorithm runs: 57
  Number of crashed target algorithm runs: 1
  Number of target algorithms that exceeded the time limit: 5
  Number of target algorithms that exceeded the memory limit: 17



Constructed pipelines

In [None]:
ml_pipelines = pd.DataFrame(data=model.cv_results_)

print(f'Shape of ml_pipelines: {ml_pipelines.shape}.')
print(f'Number of tested pipelines: {len(ml_pipelines)}.')

ml_pipelines.head(3)

Shape of ml_pipelines: (80, 173).
Number of tested pipelines: 80.


Unnamed: 0,mean_test_score,mean_fit_time,params,rank_test_scores,status,budgets,param_balancing:strategy,param_classifier:__choice__,param_data_preprocessing:categorical_transformer:categorical_encoding:__choice__,param_data_preprocessing:categorical_transformer:category_coalescence:__choice__,param_data_preprocessing:numerical_transformer:imputation:strategy,param_data_preprocessing:numerical_transformer:rescaling:__choice__,param_feature_preprocessor:__choice__,param_classifier:adaboost:algorithm,param_classifier:adaboost:learning_rate,param_classifier:adaboost:max_depth,param_classifier:adaboost:n_estimators,param_classifier:bernoulli_nb:alpha,param_classifier:bernoulli_nb:fit_prior,param_classifier:decision_tree:criterion,param_classifier:decision_tree:max_depth_factor,param_classifier:decision_tree:max_features,param_classifier:decision_tree:max_leaf_nodes,param_classifier:decision_tree:min_impurity_decrease,param_classifier:decision_tree:min_samples_leaf,param_classifier:decision_tree:min_samples_split,param_classifier:decision_tree:min_weight_fraction_leaf,param_classifier:extra_trees:bootstrap,param_classifier:extra_trees:criterion,param_classifier:extra_trees:max_depth,param_classifier:extra_trees:max_features,param_classifier:extra_trees:max_leaf_nodes,param_classifier:extra_trees:min_impurity_decrease,param_classifier:extra_trees:min_samples_leaf,param_classifier:extra_trees:min_samples_split,param_classifier:extra_trees:min_weight_fraction_leaf,param_classifier:gradient_boosting:early_stop,param_classifier:gradient_boosting:l2_regularization,param_classifier:gradient_boosting:learning_rate,param_classifier:gradient_boosting:loss,...,param_feature_preprocessor:liblinear_svc_preprocessor:loss,param_feature_preprocessor:liblinear_svc_preprocessor:multi_class,param_feature_preprocessor:liblinear_svc_preprocessor:penalty,param_feature_preprocessor:liblinear_svc_preprocessor:tol,param_feature_preprocessor:nystroem_sampler:kernel,param_feature_preprocessor:nystroem_sampler:n_components,param_feature_preprocessor:pca:keep_variance,param_feature_preprocessor:pca:whiten,param_feature_preprocessor:polynomial:degree,param_feature_preprocessor:polynomial:include_bias,param_feature_preprocessor:polynomial:interaction_only,param_feature_preprocessor:random_trees_embedding:bootstrap,param_feature_preprocessor:random_trees_embedding:max_depth,param_feature_preprocessor:random_trees_embedding:max_leaf_nodes,param_feature_preprocessor:random_trees_embedding:min_samples_leaf,param_feature_preprocessor:random_trees_embedding:min_samples_split,param_feature_preprocessor:random_trees_embedding:min_weight_fraction_leaf,param_feature_preprocessor:random_trees_embedding:n_estimators,param_feature_preprocessor:select_percentile_classification:percentile,param_feature_preprocessor:select_percentile_classification:score_func,param_feature_preprocessor:select_rates_classification:alpha,param_feature_preprocessor:select_rates_classification:score_func,param_classifier:gradient_boosting:n_iter_no_change,param_classifier:gradient_boosting:validation_fraction,param_classifier:lda:shrinkage_factor,param_classifier:libsvm_svc:coef0,param_classifier:libsvm_svc:degree,param_classifier:mlp:validation_fraction,param_classifier:sgd:epsilon,param_classifier:sgd:eta0,param_classifier:sgd:l1_ratio,param_classifier:sgd:power_t,param_feature_preprocessor:fast_ica:n_components,param_feature_preprocessor:kernel_pca:coef0,param_feature_preprocessor:kernel_pca:degree,param_feature_preprocessor:kernel_pca:gamma,param_feature_preprocessor:nystroem_sampler:coef0,param_feature_preprocessor:nystroem_sampler:degree,param_feature_preprocessor:nystroem_sampler:gamma,param_feature_preprocessor:select_rates_classification:mode
0,0.0,118.147032,"{'balancing:strategy': 'none', 'classifier:__c...",58,Memout,0.0,none,random_forest,one_hot_encoding,minority_coalescer,mean,standardize,no_preprocessing,,,,,,,,,,,,,,,,,,,,,,,,,,,,...,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
1,0.8461,265.246051,"{'balancing:strategy': 'weighting', 'classifie...",26,Success,0.0,weighting,mlp,no_encoding,no_coalescense,mean,standardize,feature_agglomeration,,,,,,,,,,,,,,,,,,,,,,,,,,,,...,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
2,0.781926,411.26472,"{'balancing:strategy': 'none', 'classifier:__c...",42,Success,0.0,none,libsvm_svc,no_encoding,minority_coalescer,mean,robust_scaler,feature_agglomeration,,,,,,,,,,,,,,,,,,,,,,,,,,,,...,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,


In [None]:
print('Distribution of pipelines by status:')
print(ml_pipelines.status.value_counts())

Distribution of pipelines by status:
Success    57
Memout     17
Timeout     5
Crash       1
Name: status, dtype: int64


Final ensemble

In [None]:
# Collecting the constructed pipeline:
selected_pipeline = []

# Loop over pipelines:
for p in model.get_models_with_weights():
  selected_pipeline.append((p[0], str(p[1])))

In [None]:
# Selected ML pipelines and their weights in the ensemble:
print(f'Number of models in the final ensemble: {len(model.get_models_with_weights())}.')
model.get_models_with_weights()

Number of models in the final ensemble: 12.


[(0.56,
  SimpleClassificationPipeline({'balancing:strategy': 'weighting', 'classifier:__choice__': 'qda', 'data_preprocessing:categorical_transformer:categorical_encoding:__choice__': 'no_encoding', 'data_preprocessing:categorical_transformer:category_coalescence:__choice__': 'minority_coalescer', 'data_preprocessing:numerical_transformer:imputation:strategy': 'median', 'data_preprocessing:numerical_transformer:rescaling:__choice__': 'minmax', 'feature_preprocessor:__choice__': 'nystroem_sampler', 'classifier:qda:reg_param': 0.9079051309300096, 'data_preprocessing:categorical_transformer:category_coalescence:minority_coalescer:minimum_fraction': 0.3413101525454199, 'feature_preprocessor:nystroem_sampler:kernel': 'rbf', 'feature_preprocessor:nystroem_sampler:n_components': 188, 'feature_preprocessor:nystroem_sampler:gamma': 0.0009110507785680229},
  dataset_properties={
    'task': 1,
    'sparse': False,
    'multilabel': False,
    'multiclass': False,
    'target_type': 'classificat

In [None]:
# Statistics concerning the selected ML pipelines:
ensemble_assess = model.leaderboard(detailed=False)
ensemble_assess

model_id,rank,ensemble_weight,type,cost,duration
46,1,0.06,gradient_boosting,0.096868,35.922419
12,2,0.1,gradient_boosting,0.09709,114.912219
41,3,0.06,gradient_boosting,0.097497,27.913286
79,4,0.06,gradient_boosting,0.097712,102.14776
17,5,0.04,gradient_boosting,0.098254,23.485076
26,6,0.02,gradient_boosting,0.100196,27.322839
65,7,0.02,gradient_boosting,0.105482,485.348571
49,8,0.02,gradient_boosting,0.107242,23.552509
68,9,0.02,gradient_boosting,0.157131,46.842356
30,10,0.02,k_nearest_neighbors,0.166713,600.451983


In [None]:
# Statistics concerning the selected ML pipelines:
ensemble_assess_detailed = model.leaderboard(detailed=True)
ensemble_assess_detailed

model_id,rank,ensemble_weight,type,cost,duration,config_id,train_loss,seed,start_time,end_time,budget,status,data_preprocessors,feature_preprocessors,balancing_strategy,config_origin
46,1,0.06,gradient_boosting,0.096868,35.922419,45,0.071783,0,1630886000.0,1630886000.0,0.0,StatusType.SUCCESS,"[one_hot_encoding, minority_coalescer, minmax]",[select_rates_classification],none,Local Search
12,2,0.1,gradient_boosting,0.09709,114.912219,11,0.070078,0,1630875000.0,1630875000.0,0.0,StatusType.SUCCESS,"[no_encoding, no_coalescense, robust_scaler]",[feature_agglomeration],none,Initial design
41,3,0.06,gradient_boosting,0.097497,27.913286,40,0.080366,0,1630884000.0,1630884000.0,0.0,StatusType.SUCCESS,"[one_hot_encoding, no_coalescense, minmax]",[select_rates_classification],weighting,Local Search
79,4,0.06,gradient_boosting,0.097712,102.14776,78,0.083312,0,1630894000.0,1630894000.0,0.0,StatusType.SUCCESS,"[no_encoding, minority_coalescer, robust_scaler]",[no_preprocessing],weighting,Local Search
17,5,0.04,gradient_boosting,0.098254,23.485076,16,0.064328,0,1630876000.0,1630876000.0,0.0,StatusType.SUCCESS,"[one_hot_encoding, no_coalescense, none]",[select_percentile_classification],weighting,Initial design
26,6,0.02,gradient_boosting,0.100196,27.322839,25,0.08732,0,1630880000.0,1630880000.0,0.0,StatusType.SUCCESS,"[one_hot_encoding, no_coalescense, standardize]",[select_rates_classification],none,Random Search (sorted)
65,7,0.02,gradient_boosting,0.105482,485.348571,64,0.050628,0,1630890000.0,1630891000.0,0.0,StatusType.SUCCESS,"[one_hot_encoding, no_coalescense, none]",[fast_ica],none,Local Search
49,8,0.02,gradient_boosting,0.107242,23.552509,48,0.070971,0,1630886000.0,1630886000.0,0.0,StatusType.SUCCESS,"[encoding, no_coalescense, robust_scaler]",[select_percentile_classification],none,Local Search
68,9,0.02,gradient_boosting,0.157131,46.842356,67,0.121718,0,1630891000.0,1630891000.0,0.0,StatusType.SUCCESS,"[one_hot_encoding, no_coalescense, power_trans...",[feature_agglomeration],weighting,Local Search
30,10,0.02,k_nearest_neighbors,0.166713,600.451983,29,0.065745,0,1630881000.0,1630881000.0,0.0,StatusType.SUCCESS,"[no_encoding, minority_coalescer, quantile_tra...",[feature_agglomeration],weighting,Random Search
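
The `cost` column is the loss minimized by the search. Assuming the convention that, for a metric with optimum 1 such as ROC-AUC, the cost equals 1 minus the validation score, the costs can be converted back to scores as sketched below. The smallest cost (0.096868) then matches the best validation score reported by `sprint_statistics()` (0.903132). The column name `val_roc_auc` is hypothetical.

```python
import pandas as pd

# First three rows of the leaderboard above (model_id and cost):
toy_leaderboard = pd.DataFrame(
    {'cost': [0.096868, 0.09709, 0.097497]},
    index=pd.Index([46, 12, 41], name='model_id'),
)

# For ROC-AUC (optimum 1, greater is better) the validation score is 1 - cost:
toy_leaderboard['val_roc_auc'] = 1 - toy_leaderboard['cost']
print(toy_leaderboard)
```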


#### Model evaluation

In [None]:
# Predictions for hold-out data:
y_hat = [p[1] for p in model.predict_proba(df_test.drop(drop_vars, axis=1))]

# Performance metrics of the best model:
test_roc_auc = roc_auc_score(df_test['class'], y_hat)
test_avg_prec = average_precision_score(df_test['class'], y_hat)
test_brier = brier_score_loss(df_test['class'], y_hat)
test_acc = accuracy_score(df_test['class'], [1 if p > 0.5 else 0 for p in y_hat])
test_mcc = matthews_corrcoef(df_test['class'], [1 if p > 0.5 else 0 for p in y_hat])

print(f'Test ROC-AUC: {test_roc_auc:.4f}.')
print(f'Test average-precision score: {test_avg_prec:.4f}.')
print(f'Test Brier score: {test_brier:.4f}.')
print(f'Test accuracy: {test_acc:.4f}.')
print(f'Test MCC: {test_mcc:.4f}.')

Test ROC-AUC: 0.9051.
Test average-precision score: 0.9575.
Test Brier score: 0.1550.
Test accuracy: 0.8179.
Test MCC: 0.5856.


#### Model assessment

In [None]:
model_assess = {
    "estimation_id": str(estimation_id),
    "autoML": "auto_sklearn",
    "parameters": {
      "search_complexity": {
        "time_constraint_search": time_constraint_search, "time_constraint_models": time_constraint_models
      },
      "estimation": {
        "ensemble_size": ensemble_size, "include_estimators": include_estimators, "exclude_estimators": exclude_estimators,
        "include_preprocessors": include_preprocessors, "exclude_preprocessors": exclude_preprocessors, "val_metric": str(val_metric),
        "val_strategy": val_strategy, "val_strategy_args": val_strategy_args
      },
      "computation": {"n_jobs": n_jobs}
    },
    "running_time": search_time,
    "performance_metrics": {
        "test_roc_auc": test_roc_auc, "test_avg_prec": test_avg_prec, "test_brier": test_brier, "test_acc": test_acc, "test_mcc": test_mcc
    }
}

### Exporting the outcomes

In [None]:
if export:
  # Constructed ML pipelines:
  ml_pipelines.to_csv(f'../Datasets/Outcomes/auto_sklearn/ml_pipelines_{estimation_id}.csv', index=False)
  # The leaderboard index holds the model_id, so it is kept in the exported files:
  ensemble_assess.to_csv(f'../Datasets/Outcomes/auto_sklearn/ensemble_assess_{estimation_id}.csv', index=True)
  ensemble_assess_detailed.to_csv(f'../Datasets/Outcomes/auto_sklearn/ensemble_assess_detailed_{estimation_id}.csv', index=True)

  # Selected ML pipeline:
  with open(f'../Datasets/Outcomes/auto_sklearn/selected_pipeline_{estimation_id}.json', 'w') as json_file:
    json.dump(selected_pipeline, json_file, indent=2)

  # ML pipeline (the context manager closes the file after writing):
  with open(f'../Datasets/Outcomes/auto_sklearn/model_{estimation_id}.pickle', 'wb') as pickle_file:
    pickle.dump(model, pickle_file)

  # Model assessment:
  with open(f'../Datasets/Outcomes/auto_sklearn/model_assess_{estimation_id}.json', 'w') as json_file:
    json.dump(model_assess, json_file, indent=2)
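
To use the exported outcomes in a later session, the files can be read back with `json.load` and `pickle.load`. The sketch below round-trips small stand-in objects through a temporary directory; the file names and values are illustrative. Note that `None` values, such as `time_constraint_models`, are written as JSON `null` and come back as `None`.

```python
import json
import os
import pickle
import tempfile

# Stand-ins for the exported objects (illustrative values only):
toy_assess = {'estimation_id': 'toy_run', 'parameters': {'time_constraint_models': None}}
toy_model = {'note': 'any picklable object, e.g. a fitted estimator'}

with tempfile.TemporaryDirectory() as out_dir:
    json_path = os.path.join(out_dir, 'model_assess_toy.json')
    pickle_path = os.path.join(out_dir, 'model_toy.pickle')

    # Export, mirroring the cell above:
    with open(json_path, 'w') as json_file:
        json.dump(toy_assess, json_file, indent=2)
    with open(pickle_path, 'wb') as pickle_file:
        pickle.dump(toy_model, pickle_file)

    # Reload, as one would in a later session:
    with open(json_path) as json_file:
        loaded_assess = json.load(json_file)
    with open(pickle_path, 'rb') as pickle_file:
        loaded_model = pickle.load(pickle_file)

print(loaded_assess == toy_assess, loaded_model == toy_model)
```

Unpickling a fitted model typically requires an environment with the same library versions that were used when it was saved.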