## AutoML
## Empirical Tests - MLJAR

This project aims to explore some of the main **AutoML tools** available, which involves the following tasks:
1. Reading of technical articles concerning the automated machine learning field.
2. Discussion about machine learning pipelines and the automation of some of their components.
3. Identification of the most interesting Python libraries for automatic ML pipeline construction.
4. Quick implementation of the selected tools with simulated data.
5. Careful exploration of the APIs of the selected tools.
6. Comparison among selected tools concerning: model performance, computation time, and usability.

All of these activities derive from the **objectives** of this project, which are: i) reflection about ML pipeline components; ii) discussion and analysis of AutoML tools; iii) identification of key-points of AutoML frameworks; iv) definition of: the advantages and disadvantages of main AutoML tools, and, first of all, the relavance and adequacy of implementing AutoML.

---------------------

In this series of notebooks, we test out different AutoML Python libraries and compare them according to the following criteria: performance metrics of developed pipelines evaluated on test data; computation time (i.e., the performance relative to the available time budget of the search process); and usability of the tool.

* **Performance:** for each tool, after providing them with a training data (that will receive the appropriate validation approach by each tool), and after the search for the best ML pipeline, the selected one will be evaluated on a hold-out dataset (25% of the complete dataset). The model assessment will be based on the following metrics, since the supervised learning task is a binary classification here: [ROC-AUC](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.roc_auc_score.html), [average precision score](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.average_precision_score.html), [Brier score](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.brier_score_loss.html), [accuracy](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.accuracy_score.html), and [MCC](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.matthews_corrcoef.html).

* **Computation time:** all tested AutoML tools have some sort of time budget for the search process. Therefore, instead of minimizing the computation time across all tested tools, we will explore three different time budgets: 20 minutes, 1 hour, and 6 hours. Consequently, one of the main aspects of the comparison among tools will be the performance achieved by each one of them given different time budgets, besides of the average performance throughout all time budgets.

* **Usability:** this aspect of the comparison refers to how easy it is to set up the search for each one of the tested tools. Also important are the outputs of the search process, mainly in terms of the visualization and assessment of constructed and selected pipelines. Besides, the diversity of produced information about the search and how clear it is to access and interpret these data are also an aspect to have in mind. Finally, the more straightforward it is to use a selected pipeline the better is the tool.

The empirical tests follow the reading and discussing of the APIs of all selected tools. So, since the main initialization arguments, methods and attributes have been defined, they will be used accordingly in these notebooks.

The data used for the empirical tests was found in Kaggle repository of datasets. It consists of a dataset for binary classification whose objective is to construct a classification algorithm for the [identification of malware apps](https://www.kaggle.com/saurabhshahane/android-permission-dataset). It has 27310 unique instances (mobile phone applications) and 184 variables, among which one is the binary outcome variable and another is the name of the app. Since the main objective of this project is to explore AutoML tools, only some basic feature engineering operations were implemented, besides of a short description and exploration of the data.

------------

**Summary:**
1. [Libraries](#libraries)<a href='#libraries'></a>.
2. [Functions and classes](#functions_classes)<a href='#functions_classes'></a>.
3. [Settings](#settings)<a href='#settings'></a>.
4. [Importing datasets](#imports)<a href='#imports'></a>.
    * [Features and outcome variables](#feats_outcomes)<a href='#feats_outcomes'></a>.


5. [Data description](#data_description)<a href='#data_description'></a>.
6. [ML pipeline](#ml_pipeline)<a href='#ml_pipeline'></a>.

<a id='libraries'></a>

## Libraries

In [1]:
from google.colab import drive
drive.mount('/content/gdrive')

Drive already mounted at /content/gdrive; to attempt to forcibly remount, call drive.mount("/content/gdrive", force_remount=True).


In [2]:
cd "/content/gdrive/MyDrive/Studies/autoML/Codes"

/content/gdrive/MyDrive/Studies/autoML/Codes


In [3]:
# pip install mljar-supervised

In [4]:
# pip install -r requirements.txt

In [5]:
# pip uninstall scikit-learn

In [6]:
# pip install scikit-learn==0.24.2

In [8]:
import pandas as pd
import numpy as np
import os
import json
from datetime import datetime
from time import time
import pickle
from IPython.display import Markdown, display

from sklearn.metrics import roc_auc_score, accuracy_score, average_precision_score, brier_score_loss, accuracy_score, matthews_corrcoef

# pip install mljar-supervised
import supervised
from supervised import AutoML # MLJAR
print(f'MLJAR: version {supervised.__version__}.')

pandas.util.testing is deprecated. Use the functions in the public API at pandas.testing instead.


MLJAR: version 0.11.0.


<a id='functions_classes'></a>

## Functions and classes

In [9]:
from utils import running_time, correct_col_name, train_test_split
from pre_process import pre_process

<a id='settings'></a>

## Settings

### Data management

In [10]:
# Identification of the test:
estimation_id = str(int(time()))

# Declare whether to export results:
export = True

### ML pipeline search

#### Search complexity parameters

In [11]:
# Modes for the AutoML implementation:
# modes = ['Compete', 'Optuna']
modes = ['Optuna']

# Time budget (seconds) for the search:
total_time_limit = 6*60*60

# Time budget (seconds) for each developed model:
model_time_limit = None

# Time budget (seconds) for the Optuna hyperparameters tuning:
optuna_time_budget = int(total_time_limit/14)

#### Estimation parameters

In [12]:
# Whether an ensemble and stack of models should be constructed from the collection of tested models:
train_ensemble, stack_models = (True, True)

# Performance metric that should be optimized during the search:
eval_metric = 'auc'

# Dictionary with the validation strategy and its parameters:
validation_strategy = {'validation_type': 'kfold', 'k_folds': 5, 'shuffle': True, 'stratify': True}

# Level of explanations to produced for the final model:
explain_level = 2

#### Computation parameters

In [13]:
# Number of processes to use in parallel:
n_jobs = -1

<a id='imports'></a>

## Importing datasets

<a id='feats_outcomes'></a>

### Features and outcome variables

In [14]:
# Importing data:
df = pd.read_csv('../Datasets/Android_Permission.csv')

# Columns names:
df.columns = [correct_col_name(c) for c in df.columns]

# Auxiliary variables:
drop_vars = ['app', 'package', 'class']

# Removing duplicates:
df.drop_duplicates(inplace=True)

print(f'Shape of data: {df.shape}.')
df.head(3)

Shape of data: (27310, 184).


Unnamed: 0,app,package,category,description,rating,number_of_ratings,price,related_apps,dangerous_permissions_count,safe_permissions_count,access_drm_content_,access_email_provider_data,access_all_system_downloads,access_download_manager_,advanced_download_manager_functions_,audio_file_access,install_drm_content_,modify_google_service_configuration,modify_google_settings,move_application_resources,read_google_settings,send_download_notifications_,voice_search_shortcuts,access_surfaceflinger,access_checkin_properties,access_the_cache_filesystem,access_to_passwords_for_google_accounts,act_as_an_account_authenticator,bind_to_a_wallpaper,bind_to_an_input_method,change_screen_orientation,coarse,control_location_update_notifications,control_system_backup_and_restore,delete_applications,delete_other_applications_caches,delete_other_applications_data,directly_call_any_phone_numbers,directly_install_applications,disable_or_modify_status_bar,...,your_accounts_access_other_google_services,your_accounts_act_as_an_account_authenticator,your_accounts_act_as_the_accountmanagerservice,your_accounts_contacts_data_in_google_accounts,your_accounts_discover_known_accounts,your_accounts_manage_the_accounts_list,your_accounts_read_google_service_configuration,your_accounts_use_the_authentication_credentials_of_an_account,your_accounts_view_configured_accounts,your_location_access_extra_location_provider_commands,your_location_coarse,your_location_fine,your_location_mock_location_sources_for_testing,your_messages_read_email_attachments,your_messages_send_gmail,your_messages_edit_sms_or_mms,your_messages_modify_gmail,your_messages_read_gmail,your_messages_read_gmail_attachment_previews,your_messages_read_sms_or_mms,your_messages_read_instant_messages,your_messages_receive_mms,your_messages_receive_sms,your_messages_receive_wap,your_messages_send_sms_received_broadcast,your_messages_send_wap_push_received_broadcast,your_messages_write_instant_messages,your_personal_information_add_or_modify_calendar_events_and_send_email_to_guests,your_personal_information_choose_widgets,your_personal_information_read_browsers_history_and_bookmarks,your_personal_information_read_calendar_events,your_personal_information_read_contact_data,your_personal_information_read_sensitive_log_data,your_personal_information_read_user_defined_dictionary,your_personal_information_retrieve_system_internal_state,your_personal_information_set_alarm_in_alarm_clock,your_personal_information_write_browsers_history_and_bookmarks,your_personal_information_write_contact_data,your_personal_information_write_to_user_defined_dictionary,class
0,Canada Post Corporation,com.canadapost.android,Business,Canada Post Mobile App gives you access to som...,3.1,77,0.0,"{com.adaffix.pub.ca.android, com.kevinquan.gas...",7.0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0
1,Word Farm,com.realcasualgames.words,Brain & Puzzle,Speed and strategy combine in this exciting wo...,4.3,199,0.0,"{air.com.zubawing.FastWordLite, com.joybits.do...",3.0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Fortunes of War FREE,fortunesofwar.free,Cards & Casino,"Fortunes of War is a fast-paced, easy to learn...",4.1,243,0.0,"{com.kevinquan.condado, hu.monsta.pazaak, net....",1.0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


#### Features names

In [15]:
features_names = list(df.drop(drop_vars, axis=1).columns)

for i in range(len(features_names)):
  print(features_names[i:i+10], '\n')

['category', 'description', 'rating', 'number_of_ratings', 'price', 'related_apps', 'dangerous_permissions_count', 'safe_permissions_count', 'access_drm_content_', 'access_email_provider_data'] 

['description', 'rating', 'number_of_ratings', 'price', 'related_apps', 'dangerous_permissions_count', 'safe_permissions_count', 'access_drm_content_', 'access_email_provider_data', 'access_all_system_downloads'] 

['rating', 'number_of_ratings', 'price', 'related_apps', 'dangerous_permissions_count', 'safe_permissions_count', 'access_drm_content_', 'access_email_provider_data', 'access_all_system_downloads', 'access_download_manager_'] 

['number_of_ratings', 'price', 'related_apps', 'dangerous_permissions_count', 'safe_permissions_count', 'access_drm_content_', 'access_email_provider_data', 'access_all_system_downloads', 'access_download_manager_', 'advanced_download_manager_functions_'] 

['price', 'related_apps', 'dangerous_permissions_count', 'safe_permissions_count', 'access_drm_content_

#### Data types

In [16]:
data_types = pd.DataFrame(df.dtypes, columns=['type']).reset_index(drop=False)
data_types.columns = ['feature', 'type']

print('\033[1mDistribution of data types:\033[0m')
print(data_types.type.value_counts())

[1mDistribution of data types:[0m
int64      176
object       5
float64      3
Name: type, dtype: int64


#### Train-test split

In [17]:
df_train, df_test = train_test_split(df, test_ratio=0.25, shuffle=True)

#### Feature engineering

Related apps

In [18]:
# Creating the variable with the number of related apps:
df_train['related_apps'] = df_train['related_apps'].apply(lambda x: x if pd.isna(x) else x.replace('{', '').replace('}', ''))
df_train['num_related_apps'] = df_train.related_apps.apply(lambda x: x if pd.isna(x) else len(x.split(',')))
df_test['num_related_apps'] = df_test.related_apps.apply(lambda x: x if pd.isna(x) else len(x.split(',')))

# Updating the list of auxiliary variables:
drop_vars.append('related_apps')

Description

In [19]:
# Creating the variable that indicates the number of words in a description:
df_train['num_words_desc'] = df_train.description.apply(lambda x: x if pd.isna(x) else len(x.split(' ')))
df_test['num_words_desc'] = df_test.description.apply(lambda x: x if pd.isna(x) else len(x.split(' ')))

# Updating the list of auxiliary variables:
drop_vars.append('description')

Category

Even though this feature engineering is actually a transformation applied over categorical features, we first implement one-hot encoding in order to translate this categorical attribute into a numerical one, since some AutoML tools explored within this project do not allow textual inputs.

In [20]:
from transformations import applying_one_hot

In [21]:
transf_data = applying_one_hot(training_data=df_train, cat_vars=['category'], variance_param=-1, test_data=df_test)
df_train = transf_data['training_data']
df_test = transf_data['test_data']

[1mNumber of categorical features:[0m 1
[1mNumber of overall selected dummies:[0m 30.


<a id='data_description'></a>

## Data description

<a id='features_types'></a>

### Features types

In [22]:
feature_types = pd.DataFrame(df_train.drop(drop_vars, axis=1).dtypes, columns=['type']).reset_index(drop=False)
feature_types.columns = ['feature', 'type']

print('\033[1mDistribution of data types (features):\033[0m')
print(feature_types.type.value_counts())

[1mDistribution of data types (features):[0m
int64      175
uint8       30
float64      5
Name: type, dtype: int64


<a id='ml_pipeline'></a>

## ML pipeline

In [23]:
help(AutoML)

Help on class AutoML in module supervised.automl:

class AutoML(supervised.base_automl.BaseAutoML)
 |  AutoML(results_path=None, total_time_limit=3600, mode='Explain', ml_task='auto', model_time_limit=None, algorithms='auto', train_ensemble=True, stack_models='auto', eval_metric='auto', validation_strategy='auto', explain_level='auto', golden_features='auto', features_selection='auto', start_random_models='auto', hill_climbing_steps='auto', top_models_to_improve='auto', boost_on_errors='auto', kmeans_features='auto', mix_encoding='auto', max_single_prediction_time=None, optuna_time_budget=None, optuna_init_params={}, optuna_verbose=True, n_jobs=-1, verbose=1, random_state=1234)
 |  
 |  Automated Machine Learning for supervised tasks (binary classification, multiclass classification, regression).
 |  
 |  Method resolution order:
 |      AutoML
 |      supervised.base_automl.BaseAutoML
 |      sklearn.base.BaseEstimator
 |      abc.ABC
 |      builtins.object
 |  
 |  Methods defined h

<a id='ml_pipeline_search'></a>

### ML pipeline search

#### Setting and running the search

In [24]:
# Dictionary with the model of each implemented AutoML mode:
models = {}
search_times = {}

for mode in modes:
  # Path to the folder where outcomes should be placed:
  results_path = f'../Datasets/Outcomes/mljar/{mode}/{estimation_id}'

  # Creating the AutoML object:
  models[mode] = AutoML(
      # Search complexity parameters:
      mode=mode, total_time_limit=total_time_limit, model_time_limit=model_time_limit, optuna_time_budget=optuna_time_budget,
      
      # Estimation parameters:
      train_ensemble=train_ensemble, stack_models=stack_models, eval_metric=eval_metric, validation_strategy=validation_strategy,

      # Computation parameters:
      results_path=results_path, n_jobs=n_jobs,
                  )

  start_time = datetime.now()

  # Running the search:
  models[mode].fit(df_train.drop(drop_vars, axis=1), df_train['class'])

  # Total elapsed time:
  end_time = datetime.now()
  search_times[mode] = running_time(start_time=start_time, end_time=end_time)

AutoML directory: ../Datasets/Outcomes/mljar/Optuna/1631290500
Expected computing time:
Time for tuning with Optuna: len(algorithms) * optuna_time_budget = 9252 seconds
There is no time limit for ML model training after Optuna tuning (total_time_limit parameter is ignored).
The task is binary_classification with evaluation metric auc
AutoML will use algorithms: ['Random Forest', 'Extra Trees', 'LightGBM', 'Xgboost', 'CatBoost', 'Neural Network']
AutoML will stack models
AutoML will ensemble available models
AutoML steps: ['simple_algorithms', 'default_algorithms', 'ensemble', 'stack', 'ensemble_stacked']
Skip simple_algorithms because no parameters were generated.
* Step default_algorithms will try to check up to 6 models


[32m[I 2021-09-10 16:15:24,606][0m A new study created in memory with name: no-name-06c5079c-f627-440e-b3d7-75cb13ea6825[0m


Optuna optimizes LightGBM with time budget 1542 seconds eval_metric auc (maximize)


[32m[I 2021-09-10 16:15:25,705][0m Trial 0 finished with value: 0.894486676158713 and parameters: {'learning_rate': 0.1, 'num_leaves': 1598, 'lambda_l1': 2.840098794801191e-06, 'lambda_l2': 3.0773599420974e-06, 'feature_fraction': 0.8613105322932351, 'bagging_fraction': 0.970697557159987, 'bagging_freq': 7, 'min_data_in_leaf': 36, 'extra_trees': False}. Best is trial 0 with value: 0.894486676158713.[0m
[32m[I 2021-09-10 16:15:29,411][0m Trial 1 finished with value: 0.9000389044757905 and parameters: {'learning_rate': 0.0125, 'num_leaves': 30, 'lambda_l1': 0.09024841733204539, 'lambda_l2': 0.8785585624049705, 'feature_fraction': 0.5554201923798203, 'bagging_fraction': 0.7307773310574073, 'bagging_freq': 1, 'min_data_in_leaf': 37, 'extra_trees': True}. Best is trial 1 with value: 0.9000389044757905.[0m
[32m[I 2021-09-10 16:15:32,516][0m Trial 2 finished with value: 0.8977490290023987 and parameters: {'learning_rate': 0.025, 'num_leaves': 1781, 'lambda_l1': 8.42482357544477e-05, '

1_Optuna_LightGBM auc 0.900586 trained in 15.12 seconds


[32m[I 2021-09-10 16:43:11,244][0m A new study created in memory with name: no-name-2ab5de7a-ead7-4242-a8c0-f52397e5b541[0m


Optuna optimizes Xgboost with time budget 1542 seconds eval_metric auc (maximize)



ntree_limit is deprecated, use `iteration_range` or model slicing instead.

[32m[I 2021-09-10 16:43:14,813][0m Trial 0 finished with value: 0.8673522840640164 and parameters: {'eta': 0.1, 'max_depth': 10, 'lambda': 2.840098794801191e-06, 'alpha': 3.0773599420974e-06, 'colsample_bytree': 0.8613105322932351, 'subsample': 0.970697557159987, 'min_child_weight': 88}. Best is trial 0 with value: 0.8673522840640164.[0m

ntree_limit is deprecated, use `iteration_range` or model slicing instead.

[32m[I 2021-09-10 16:43:19,117][0m Trial 1 finished with value: 0.8602506029386601 and parameters: {'eta': 0.1, 'max_depth': 6, 'lambda': 0.0011239983523033718, 'alpha': 0.0003370920325799477, 'colsample_bytree': 0.30963791485116204, 'subsample': 0.8409786428569279, 'min_child_weight': 89}. Best is trial 0 with value: 0.8673522840640164.[0m

ntree_limit is deprecated, use `iteration_range` or model slicing instead.

[32m[I 2021-09-10 16:43:25,694][0m Trial 2 finished with value: 0.857516393151

2_Optuna_Xgboost auc 0.902536 trained in 33.06 seconds


[32m[I 2021-09-10 17:09:43,183][0m A new study created in memory with name: no-name-1d370055-b159-46ea-a50e-4b7e75a9beb7[0m


Optuna optimizes CatBoost with time budget 1542 seconds eval_metric auc (maximize)


[32m[I 2021-09-10 17:09:49,258][0m Trial 0 finished with value: 0.9029648116280798 and parameters: {'learning_rate': 0.1, 'depth': 8, 'l2_leaf_reg': 7.7997800836072235, 'random_strength': 2.7259260601004898, 'rsm': 0.34881782962878705, 'min_data_in_leaf': 81}. Best is trial 0 with value: 0.9029648116280798.[0m
[32m[I 2021-09-10 17:09:56,266][0m Trial 1 finished with value: 0.9036843830008428 and parameters: {'learning_rate': 0.05, 'depth': 6, 'l2_leaf_reg': 6.834661005427845, 'random_strength': 7.127020272701981, 'rsm': 0.43322567931135547, 'min_data_in_leaf': 57}. Best is trial 1 with value: 0.9036843830008428.[0m
[32m[I 2021-09-10 17:10:01,704][0m Trial 2 finished with value: 0.900070921298882 and parameters: {'learning_rate': 0.2, 'depth': 9, 'l2_leaf_reg': 3.648923350415333, 'random_strength': 6.1539617881809745, 'rsm': 0.16784311747867892, 'min_data_in_leaf': 37}. Best is trial 1 with value: 0.9036843830008428.[0m
[32m[I 2021-09-10 17:10:10,279][0m Trial 3 finished with

3_Optuna_CatBoost auc 0.903371 trained in 20.79 seconds


[32m[I 2021-09-10 17:36:19,999][0m A new study created in memory with name: no-name-7ee7ecd0-87fd-4a31-8097-7a6a337c4cfc[0m


Optuna optimizes Neural Network with time budget 1542 seconds eval_metric auc (maximize)


[32m[I 2021-09-10 17:36:30,119][0m Trial 0 finished with value: 0.8746322101078455 and parameters: {'dense_1_size': 22, 'dense_2_size': 63, 'learning_rate': 0.01, 'learning_rate_type': 'adaptive', 'alpha': 0.7645285581846926}. Best is trial 0 with value: 0.8746322101078455.[0m
[32m[I 2021-09-10 17:36:36,163][0m Trial 1 finished with value: 0.868313999476969 and parameters: {'dense_1_size': 38, 'dense_2_size': 51, 'learning_rate': 0.01, 'learning_rate_type': 'adaptive', 'alpha': 0.8785585624049705}. Best is trial 0 with value: 0.8746322101078455.[0m
[32m[I 2021-09-10 17:36:43,042][0m Trial 2 finished with value: 0.8670040002195438 and parameters: {'dense_1_size': 39, 'dense_2_size': 62, 'learning_rate': 0.05, 'learning_rate_type': 'constant', 'alpha': 0.0012968444078239212}. Best is trial 0 with value: 0.8746322101078455.[0m
[32m[I 2021-09-10 17:37:02,535][0m Trial 3 finished with value: 0.8707833305890342 and parameters: {'dense_1_size': 88, 'dense_2_size': 45, 'learning_rat

4_Optuna_NeuralNetwork auc 0.869787 trained in 55.89 seconds


[32m[I 2021-09-10 18:03:14,197][0m A new study created in memory with name: no-name-b37a48d6-45bf-4563-9275-b336c99a70d1[0m


Optuna optimizes Random Forest with time budget 1542 seconds eval_metric auc (maximize)


[32m[I 2021-09-10 18:03:36,920][0m Trial 0 finished with value: 0.8692665672261808 and parameters: {'criterion': 'entropy', 'max_depth': 15, 'min_samples_split': 79, 'min_samples_leaf': 78, 'max_features': 0.27986667922981523}. Best is trial 0 with value: 0.8692665672261808.[0m
[32m[I 2021-09-10 18:04:19,869][0m Trial 1 finished with value: 0.8864468214020893 and parameters: {'criterion': 'entropy', 'max_depth': 31, 'min_samples_split': 88, 'min_samples_leaf': 36, 'max_features': 0.5059851742682241}. Best is trial 1 with value: 0.8864468214020893.[0m
[32m[I 2021-09-10 18:04:24,846][0m Trial 2 finished with value: 0.8650917349243812 and parameters: {'criterion': 'entropy', 'max_depth': 13, 'min_samples_split': 57, 'min_samples_leaf': 51, 'max_features': 0.023630765094775418}. Best is trial 1 with value: 0.8864468214020893.[0m
[32m[I 2021-09-10 18:04:55,584][0m Trial 3 finished with value: 0.8885423090209955 and parameters: {'criterion': 'entropy', 'max_depth': 13, 'min_sample

5_Optuna_RandomForest auc 0.896113 trained in 88.5 seconds


[32m[I 2021-09-10 18:31:27,803][0m A new study created in memory with name: no-name-183a91bc-6038-4c3d-b494-e4db62d4f5c5[0m


Optuna optimizes Extra Trees with time budget 1542 seconds eval_metric auc (maximize)


[32m[I 2021-09-10 18:31:54,928][0m Trial 0 finished with value: 0.8414419300924775 and parameters: {'criterion': 'entropy', 'max_depth': 15, 'min_samples_split': 79, 'min_samples_leaf': 78, 'max_features': 0.27986667922981523}. Best is trial 0 with value: 0.8414419300924775.[0m
[32m[I 2021-09-10 18:32:56,732][0m Trial 1 finished with value: 0.8691761667845102 and parameters: {'criterion': 'entropy', 'max_depth': 31, 'min_samples_split': 88, 'min_samples_leaf': 36, 'max_features': 0.5059851742682241}. Best is trial 1 with value: 0.8691761667845102.[0m
[32m[I 2021-09-10 18:33:00,906][0m Trial 2 finished with value: 0.8025043612832128 and parameters: {'criterion': 'entropy', 'max_depth': 13, 'min_samples_split': 57, 'min_samples_leaf': 51, 'max_features': 0.023630765094775418}. Best is trial 1 with value: 0.8691761667845102.[0m
[32m[I 2021-09-10 18:33:34,353][0m Trial 3 finished with value: 0.8513702393136453 and parameters: {'criterion': 'entropy', 'max_depth': 13, 'min_sample

6_Optuna_ExtraTrees auc 0.879138 trained in 103.83 seconds
* Step ensemble will try to check up to 1 model
Ensemble auc 0.904018 trained in 4.68 seconds
* Step stack will try to check up to 6 models
3_Optuna_CatBoost_Stacked auc 0.904975 trained in 15.49 seconds
2_Optuna_Xgboost_Stacked auc 0.896752 trained in 15.52 seconds
1_Optuna_LightGBM_Stacked auc 0.904322 trained in 12.52 seconds
5_Optuna_RandomForest_Stacked auc 0.905996 trained in 107.08 seconds
6_Optuna_ExtraTrees_Stacked auc 0.904128 trained in 109.33 seconds
4_Optuna_NeuralNetwork_Stacked auc 0.897783 trained in 48.93 seconds
* Step ensemble_stacked will try to check up to 1 model
Ensemble_Stacked auc 0.90659 trained in 6.75 seconds
AutoML fit time: 10192.69 seconds
AutoML best model: Ensemble_Stacked
------------------------------------
[1mRunning time:[0m 169.89 minutes.
Start time: 2021-09-10, 16:15:22
End time: 2021-09-10, 19:05:16
------------------------------------


### Assessing the outcomes

In [25]:
# compete_model = models['Compete']
optuna_model = models['Optuna']

#### ML pipeline

Constructed pipelines (Compete mode)

In [26]:
# Dataframe with tested models:
compete_leaderboard = compete_model.get_leaderboard()

print(f'Number of tested pipelines: {len(compete_leaderboard)}.')
compete_leaderboard.sort_values('metric_value', ascending=True)

In [None]:
# Information about tested models and the best one:
compete_model.report()

Final pipeline (Compete mode)

In [None]:
# Best model:
best_model_compete = compete_leaderboard.sort_values('metric_value', ascending=True)['name'].iloc[0]
display(Markdown(f'../Datasets/Outcomes/mljar/Compete/{estimation_id}/{best_model_compete}/README.md'))

Constructed pipelines (Optuna mode)

In [27]:
# Dataframe with tested models:
optuna_leaderboard = optuna_model.get_leaderboard()

print(f'Number of tested pipelines: {len(optuna_leaderboard)}.')
optuna_leaderboard

Number of tested pipelines: 14.


Unnamed: 0,name,model_type,metric_type,metric_value,train_time
0,1_Optuna_LightGBM,LightGBM,auc,-0.900586,16.67
1,2_Optuna_Xgboost,Xgboost,auc,-0.902536,34.46
2,3_Optuna_CatBoost,CatBoost,auc,-0.903371,22.27
3,4_Optuna_NeuralNetwork,Neural Network,auc,-0.869787,57.4
4,5_Optuna_RandomForest,Random Forest,auc,-0.896113,90.03
5,6_Optuna_ExtraTrees,Extra Trees,auc,-0.879138,105.29
6,Ensemble,Ensemble,auc,-0.904018,4.68
7,3_Optuna_CatBoost_Stacked,CatBoost,auc,-0.904975,17.02
8,2_Optuna_Xgboost_Stacked,Xgboost,auc,-0.896752,17.08
9,1_Optuna_LightGBM_Stacked,LightGBM,auc,-0.904322,13.96


In [28]:
# Information about tested models and the best one:
optuna_model.report()

Final pipeline (Optuna mode)

In [29]:
# Best model:
best_model_optuna = optuna_leaderboard.sort_values('metric_value', ascending=True)['name'].iloc[0]
display(Markdown(f'../Datasets/Outcomes/mljar/Optuna/{estimation_id}/{best_model_compete}/README.md'))

NameError: ignored

#### Model evaluation

In [30]:
perf_metrics = {}

for model in models:
  # Predictions for hold-out data:
  y_hat = [p[1] for p in models[model].predict_proba(df_test.drop(drop_vars, axis=1))]

  # Performance metrics of the best model:
  perf_metrics[model] = {
      f'test_roc_auc': roc_auc_score(df_test['class'], y_hat),
      f'test_avg_prec': average_precision_score(df_test['class'], y_hat),
      f'test_brier': brier_score_loss(df_test['class'], y_hat),
      f'test_acc': accuracy_score(df_test['class'], [1 if p > 0.5 else 0 for p in y_hat]),
      f'test_mcc': matthews_corrcoef(df_test['class'], [1 if p > 0.5 else 0 for p in y_hat])
  }

  print(f'{models[model].mode} mode:\n')
  print(f'Test ROC-AUC: {perf_metrics[model]["test_roc_auc"]:.4f}.')
  print(f'Test average-precision score: {perf_metrics[model]["test_avg_prec"]:.4f}.')
  print(f'Test Brier score: {perf_metrics[model]["test_brier"]:.4f}.')
  print(f'Test accuracy: {perf_metrics[model]["test_acc"]:.4f}.')
  print(f'Test MCC: {perf_metrics[model]["test_mcc"]:.4f}.\n')

Optuna mode:

Test ROC-AUC: 0.9109.
Test average-precision score: 0.9587.
Test Brier score: 0.1145.
Test accuracy: 0.8242.
Test MCC: 0.6266.



In [31]:
# Accuracy evaluated on test data:
for model in models:
  print(f'{mode} mode:')
  print(f'Test accuracy: {models[model].score(df_test.drop(drop_vars, axis=1), df_test["class"]):.4f}.')

Optuna mode:
Test accuracy: 0.8226.


#### Model assessment

In [32]:
model_assess = {
    "estimation_id": str(estimation_id),
    "autoML": "mljar",
    "parameters": {
      "search_complexity": {
        "modes": modes, "total_time_limit": total_time_limit, "model_time_limit": model_time_limit, "optuna_time_budget": optuna_time_budget
      },
      "estimation": {
        "train_ensemble": train_ensemble, "stack_models": stack_models, "eval_metric": eval_metric, "validation_strategy": validation_strategy,
        "explain_level": explain_level
      },
      "computation": {
        "n_jobs": n_jobs
      }
    },
    "running_time": search_times,
    "performance_metrics": perf_metrics
}

### Exporting the outcomes

In [33]:
if export:
  # Constructed ML pipelines:
  # compete_leaderboard.to_csv(f'../Datasets/Outcomes/mljar/compete_leaderboard_{estimation_id}.csv', index=False)
  optuna_leaderboard.to_csv(f'../Datasets/Outcomes/mljar/optuna_leaderboard_{estimation_id}.csv', index=False)

  # ML pipeline:
  # for mode in models:
    # pickle.dump(models[mode], open(f'../Datasets/Outcomes/mljar/model_{mode}_{estimation_id}.pickle', 'wb'))

  # Model assessment:
  with open(f'../Datasets/Outcomes/mljar/model_assess_{estimation_id}.json', 'w') as json_file:
    json.dump(model_assess, json_file, indent=2)