## End-to-end machine learning application
## Data engineering

This project integrates the main components of a machine learning system into an end-to-end ML project. The final product is an app (hypothetically called *AppSafe*) composed of a model that estimates the risk that a mobile app is malware and an API that could integrate with an app store and warn users when the mobile app they are about to download is too risky.

The project follows the traditional [CRISP-DM](https://pt.wikipedia.org/wiki/Cross_Industry_Standard_Process_for_Data_Mining) methodology, whose main stages form the core of the project: data engineering, data preparation, data modeling, and deployment.

-----------

This notebook imports all relevant libraries, custom functions and classes, and the data, in order to understand the data and prepare it for pre-processing and modeling, so that a binary classifier can be built to predict whether a given mobile app is malware.

Accordingly, this notebook has a [data understanding and cleaning](#data_und_clean)<a href='#data_und_clean'></a> section, where the data type and domain of every variable are established by collecting the number of unique values and a sample of them. Next, the number of missing values is assessed, and features are classified both by data type and by an empirical classification that reflects how a mobile app works.

An [exploratory data analysis](#eda)<a href='#eda'></a> section extracts insights from the data by computing several distributions of the input variables and the binary target. The same statistics are computed after new features are created from the original data in a [feature engineering](#feat_eng)<a href='#feat_eng'></a> section.

By the end of this notebook, the data is ready to be processed so that the designed solution can be developed. Data engineering can therefore be seen here as a collection of ETL (extract, transform and load) operations, except for the data understanding and EDA tasks, which are analytical in nature.

**Summary:**
1. [Libraries](#libraries)<a href='#libraries'></a>.
2. [Functions and classes](#functions_classes)<a href='#functions_classes'></a>.
3. [Settings](#settings)<a href='#settings'></a>.
4. [Data imports](#data_imports)<a href='#data_imports'></a>.
  * [Features and labels](#features_labels)<a href='#features_labels'></a>.

5. [Data understanding and cleaning](#data_und_clean)<a href='#data_und_clean'></a>.
  * [Data types](#data_types)<a href='#data_types'></a>.
  * [Unique values](#unique_values)<a href='#unique_values'></a>.
  * [Missings](#missings)<a href='#missings'></a>.
  * [Features](#features)<a href='#features'></a>.
  * [Data cleaning](#data_cleaning)<a href='#data_cleaning'></a>.


6. [Exploratory data analysis](#eda)<a href='#eda'></a>.
  * [Distribution of labels (P(Y))](#dist_y)<a href='#dist_y'></a>.
  * [Distribution of covariates (P(X))](#dist_x)<a href='#dist_x'></a>.
  * [Distribution of covariates conditional on labels (P(X|Y))](#dist_x_y)<a href='#dist_x_y'></a>.
  * [Distribution of labels conditional on covariates (P(Y|X))](#dist_y_x)<a href='#dist_y_x'></a>.
  

7. [Feature engineering](#feat_eng)<a href='#feat_eng'></a>.
  * [Number of related apps](#num_related_apps)<a href='#num_related_apps'></a>.
  * [Number of words in description](#num_words_desc)<a href='#num_words_desc'></a>.
  * [Share of malware related apps](#share_malware_related_apps)<a href='#share_malware_related_apps'></a>.
  * [Natural language processing](#nlp)<a href='#nlp'></a>.
  * [Exporting training and test data](#export_data)<a href='#export_data'></a>.

<a id='libraries'></a>

## Libraries





In [None]:
from google.colab import drive
drive.mount('/content/gdrive')

Drive already mounted at /content/gdrive; to attempt to forcibly remount, call drive.mount("/content/gdrive", force_remount=True).


In [None]:
cd "/content/gdrive/MyDrive/Studies/end_to_end_ml/notebooks/"

/content/gdrive/MyDrive/Studies/end_to_end_ml/model_dev


In [None]:
# !pip install -r ../requirements.txt

In [None]:
import pandas as pd
import numpy as np
import os
import json
from datetime import datetime
import time

In [None]:
import sys

# Make the custom modules in ../src importable (path relative to the working directory):
sys.path.append(
    os.path.abspath(
        os.path.join(
            os.getcwd(), '..', 'src'
        )
    )
)

<a id='functions_classes'></a>

## Functions and classes

In [None]:
from utils import train_test_split, correct_col_name
from kfolds import Kfolds_fit
from feat_eng import known_related_apps, related_malwares
from data_vis import plot_bar
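
The helper `correct_col_name` is imported from the project's `utils` module, and its implementation is not shown in this notebook. As a rough illustration only, a column-name normalizer of this kind might look like the sketch below. The function name `normalize_col_name` and its exact rules are assumptions, not the project's actual code.

```python
import re


def normalize_col_name(name: str) -> str:
    """Hypothetical stand-in for `correct_col_name` (assumed behavior):
    lowercase the name and replace each run of characters other than
    a-z and 0-9 with a single underscore."""
    return re.sub(r'[^0-9a-z]+', '_', name.strip().lower())


# Made-up raw column names, used only for illustration:
raw_names = ['Number of ratings', 'Wi-Fi  State']
clean_names = [normalize_col_name(c) for c in raw_names]
print(clean_names)
```

Applying one such function to every column keeps the later `str.startswith` filters on feature names predictable.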

<a id='settings'></a>

## Settings

In [None]:
# Declare whether outcomes should be exported:
EXPORT = False

<a id='data_imports'></a>

## Data imports

<a id='features_labels'></a>

### Features and labels

In [None]:
df_train = pd.read_csv('../data/Android_Permission.csv')

# Columns names:
df_train.columns = [correct_col_name(c) for c in df_train.columns]

print(f'Shape of df: {df_train.shape}.')

# Removing duplicates:
df_train.drop_duplicates(inplace=True)
print(f'Number of instances after removing duplicates: {len(df_train)}.')

# Creating an id variable for each app:
df_train['app_id'] = [i+1 for i in range(len(df_train))]

# Missings in the response variable:
if df_train['class'].isnull().sum() > 0:
  print('There are missings in the response variable!')

# Missings in the primary key:
if df_train['app_id'].isnull().sum() > 0:
  print('There are missings in the primary key!')

# Auxiliary variables:
drop_vars = ['app', 'package', 'class', 'app_id']

df_train.head(3)

Shape of df: (29999, 184).
Number of instances after removing duplicates: 27310.


Unnamed: 0,app,package,category,description,rating,number_of_ratings,price,related_apps,dangerous_permissions_count,safe_permissions_count,access_drm_content_,access_email_provider_data,access_all_system_downloads,access_download_manager_,advanced_download_manager_functions_,audio_file_access,install_drm_content_,modify_google_service_configuration,modify_google_settings,move_application_resources,read_google_settings,send_download_notifications_,voice_search_shortcuts,access_surfaceflinger,access_checkin_properties,access_the_cache_filesystem,access_to_passwords_for_google_accounts,act_as_an_account_authenticator,bind_to_a_wallpaper,bind_to_an_input_method,change_screen_orientation,coarse,control_location_update_notifications,control_system_backup_and_restore,delete_applications,delete_other_applications_caches,delete_other_applications_data,directly_call_any_phone_numbers,directly_install_applications,disable_or_modify_status_bar,...,your_accounts_act_as_an_account_authenticator,your_accounts_act_as_the_accountmanagerservice,your_accounts_contacts_data_in_google_accounts,your_accounts_discover_known_accounts,your_accounts_manage_the_accounts_list,your_accounts_read_google_service_configuration,your_accounts_use_the_authentication_credentials_of_an_account,your_accounts_view_configured_accounts,your_location_access_extra_location_provider_commands,your_location_coarse,your_location_fine,your_location_mock_location_sources_for_testing,your_messages_read_email_attachments,your_messages_send_gmail,your_messages_edit_sms_or_mms,your_messages_modify_gmail,your_messages_read_gmail,your_messages_read_gmail_attachment_previews,your_messages_read_sms_or_mms,your_messages_read_instant_messages,your_messages_receive_mms,your_messages_receive_sms,your_messages_receive_wap,your_messages_send_sms_received_broadcast,your_messages_send_wap_push_received_broadcast,your_messages_write_instant_messages,your_personal_information_add_or_modify_calendar_events_and_send_email_to_guests,your_personal_information_choose_widgets,your_personal_information_read_browsers_history_and_bookmarks,your_personal_information_read_calendar_events,your_personal_information_read_contact_data,your_personal_information_read_sensitive_log_data,your_personal_information_read_user_defined_dictionary,your_personal_information_retrieve_system_internal_state,your_personal_information_set_alarm_in_alarm_clock,your_personal_information_write_browsers_history_and_bookmarks,your_personal_information_write_contact_data,your_personal_information_write_to_user_defined_dictionary,class,app_id
0,Canada Post Corporation,com.canadapost.android,Business,Canada Post Mobile App gives you access to som...,3.1,77,0.0,"{com.adaffix.pub.ca.android, com.kevinquan.gas...",7.0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,1
1,Word Farm,com.realcasualgames.words,Brain & Puzzle,Speed and strategy combine in this exciting wo...,4.3,199,0.0,"{air.com.zubawing.FastWordLite, com.joybits.do...",3.0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2
2,Fortunes of War FREE,fortunesofwar.free,Cards & Casino,"Fortunes of War is a fast-paced, easy to learn...",4.1,243,0.0,"{com.kevinquan.condado, hu.monsta.pazaak, net....",1.0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3


<a id='data_und_clean'></a>

## Data understanding and cleaning

<a id='data_types'></a>

### Data types

In [None]:
# Dataframe with data types:
data_types = pd.DataFrame(df_train.dtypes, columns=['type']).reset_index(drop=False)
data_types.columns = ['feature', 'type']

# Dictionary with data types:
data_types_dict = dict(zip(data_types['feature'], data_types['type']))

print('\033[1mDistribution of data types:\033[0m')
print(data_types.type.value_counts())

Distribution of data types:
int64      177
object       5
float64      3
Name: type, dtype: int64


<a id='unique_values'></a>

### Unique values

In [None]:
n_unique_df = pd.DataFrame(data={
    'feature': [c for c in df_train.columns],
    'n_unique': [df_train[c].nunique() for c in df_train.columns]
}).sort_values('n_unique', ascending=False)

n_unique_df['sample_values'] = n_unique_df.feature.apply(lambda x: list(df_train[x].unique()) if len(list(df_train[x].unique())) <= 10 else
                                                         np.random.choice(list(df_train[x].unique()), size=10, replace=False))

n_unique_df.head(10)

Unnamed: 0,feature,n_unique,sample_values
184,app_id,27310,"[8277, 14107, 3219, 5482, 3626, 2618, 95, 1996..."
7,related_apps,23868,"[{com.swarcon.cumin.full, com.skycomuk.android..."
3,description,23552,[Rocket dialer - the most professional dialer ...
1,package,23485,"[org.hou.qoutes.love, com.untappdllc.app, elet..."
0,app,22823,"[IHideUFind-Colors, Crazy Orchid HD Wallpaper,..."
5,number_of_ratings,5312,"[15835, 25371, 1612, 2962, 1438180, 73905, 192..."
6,price,425,"[3.89, 14.75, 4.69, 5.58, 4.82, 1.02, 2.44, 3...."
4,rating,42,"[4.4, 2.2, 2.6, 2.7, 2.4, 4.6, 1.9, 4.2, 4.9, ..."
2,category,30,"[Lifestyle, Sports Games, Libraries & Demo, Ca..."
8,dangerous_permissions_count,28,"[23.0, 22.0, 12.0, 11.0, 10.0, 15.0, 18.0, 7.0..."


#### Unique values of the primary key

In [None]:
print(f'Number of rows: {len(df_train)}.')
n_unique_df[n_unique_df.feature=='app_id']

Number of rows: 27310.


Unnamed: 0,feature,n_unique,sample_values
184,app_id,27310,"[8277, 14107, 3219, 5482, 3626, 2618, 95, 1996..."


#### Unique values of the response variable

In [None]:
n_unique_df[n_unique_df.feature=='class']

Unnamed: 0,feature,n_unique,sample_values
183,class,2,"[0, 1]"


<a id='missings'></a>

### Missings

In [None]:
missings_df = pd.DataFrame(data={
    'feature': df_train.isnull().sum().index,
    'num_missings': df_train.isnull().sum().values,
    'share_missings': [v/len(df_train) for v in df_train.isnull().sum().values]
}).sort_values('num_missings', ascending=False)
missings_df.head(10)

Unnamed: 0,feature,num_missings,share_missings
7,related_apps,720,0.026364
8,dangerous_permissions_count,201,0.00736
3,description,3,0.00011
0,app,1,3.7e-05
128,system_tools_set_wallpaper_size_hints,0,0.0
119,system_tools_read_sync_statistics,0,0.0
120,system_tools_read_write_to_resources_owned_by_...,0,0.0
121,system_tools_reorder_running_applications,0,0.0
122,system_tools_retrieve_running_applications,0,0.0
123,system_tools_send_package_removed_broadcast,0,0.0


#### Missings by label

In [None]:
# Observations with y = 0:
missings_y0_df = pd.DataFrame(data={
    'feature': df_train[df_train['class']==0].isnull().sum().index,
    'num_missings_y0': df_train[df_train['class']==0].isnull().sum().values,
    'share_missings_y0': [v/len(df_train[df_train['class']==0]) for v in df_train[df_train['class']==0].isnull().sum().values]
}).sort_values('num_missings_y0', ascending=False)

# Observations with y = 1:
missings_y1_df = pd.DataFrame(data={
    'feature': df_train[df_train['class']==1].isnull().sum().index,
    'num_missings_y1': df_train[df_train['class']==1].isnull().sum().values,
    'share_missings_y1': [v/len(df_train[df_train['class']==1]) for v in df_train[df_train['class']==1].isnull().sum().values]
}).sort_values('num_missings_y1', ascending=False)

missings_by_label_df = missings_y0_df.merge(missings_y1_df, on='feature', how='left').sort_values('num_missings_y1', ascending=False)
missings_by_label_df.head(10)

Unnamed: 0,feature,num_missings_y0,share_missings_y0,num_missings_y1,share_missings_y1
0,related_apps,81,0.008923,639,0.035048
2,dangerous_permissions_count,3,0.00033,198,0.01086
3,app,0,0.0,1,5.5e-05
128,access_download_manager_,0,0.0,0,0.0
119,access_checkin_properties,0,0.0,0,0.0
120,category,0,0.0,0,0.0
121,rating,0,0.0,0,0.0
122,number_of_ratings,0,0.0,0,0.0
123,price,0,0.0,0,0.0
124,safe_permissions_count,0,0.0,0,0.0
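
The per-label missing counts above filter the dataframe once per label. A more compact way to get the same kind of summary is a single `groupby` on the null mask, sketched below on a small made-up dataframe. The `toy` data and the variable names are illustrative assumptions, not part of the project.

```python
import numpy as np
import pandas as pd

# Made-up data with a binary label, for illustration only:
toy = pd.DataFrame({
    'related_apps': ['a, b', np.nan, np.nan, 'c'],
    'rating': [4.1, 3.0, np.nan, 2.5],
    'class': [0, 0, 1, 1],
})

# Boolean mask of missing values for every feature except the label:
is_missing = toy.drop(columns='class').isnull()

# Count and share of missing values per label, with features as rows:
num_by_label = is_missing.groupby(toy['class']).sum().T
share_by_label = is_missing.groupby(toy['class']).mean().T

num_by_label.columns = [f'num_missings_y{c}' for c in num_by_label.columns]
share_by_label.columns = [f'share_missings_y{c}' for c in share_by_label.columns]

summary = pd.concat([num_by_label, share_by_label], axis=1)
print(summary)
```

Grouping the mask rather than the data means each label's denominator is handled by `mean()` automatically.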


#### Missings by observation

In [None]:
missings_rows_df = pd.DataFrame(data={
    'idx_obs': df_train.T.isnull().sum().index,
    'num_missings': df_train.T.isnull().sum().values,
    'share_missings': [v/len(df_train) for v in df_train.T.isnull().sum().values]
}).sort_values('num_missings', ascending=False)
missings_rows_df.head(10)

Unnamed: 0,idx_obs,num_missings,share_missings
26092,28548,2,7.3e-05
19011,20322,2,7.3e-05
11180,11639,2,7.3e-05
14087,14813,2,7.3e-05
27059,29694,2,7.3e-05
27062,29699,2,7.3e-05
9515,9832,2,7.3e-05
8751,9017,2,7.3e-05
7670,7869,2,7.3e-05
2413,2434,2,7.3e-05


In [None]:
display(missings_rows_df.num_missings.describe())
print('\n')
display(missings_rows_df.num_missings.value_counts())

count    27310.000000
mean         0.033870
std          0.217818
min          0.000000
25%          0.000000
50%          0.000000
75%          0.000000
max          2.000000
Name: num_missings, dtype: float64





0    26586
1      523
2      201
Name: num_missings, dtype: int64
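
In the cell above, `share_missings` is expressed relative to the number of rows. If the share of missing *features* per observation is wanted instead, the denominator would be the number of columns. The sketch below shows that variant on made-up data, using row-wise aggregation rather than a transpose; the `toy` dataframe is an assumption for illustration.

```python
import numpy as np
import pandas as pd

# Made-up data, for illustration only:
toy = pd.DataFrame({
    'app': ['x', None, 'z'],
    'rating': [4.0, np.nan, 3.5],
    'price': [0.0, 0.99, np.nan],
    'class': [0, 1, 1],
})

rows_summary = pd.DataFrame({
    # Number of missing values in each row:
    'num_missings': toy.isnull().sum(axis=1),
    # Share of columns that are missing in each row:
    'share_missings': toy.isnull().mean(axis=1),
}).sort_values('num_missings', ascending=False)

print(rows_summary)
```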

<a id='features'></a>

### Features

In [None]:
data_und = data_types.merge(n_unique_df, on='feature', how='left')
data_und = data_und.merge(missings_df, on='feature', how='left')

data_und.sample(10)

Unnamed: 0,feature,type,n_unique,sample_values,num_missings,share_missings
85,network_communication_download_files_without_n...,int64,1,[0],0,0.0
107,system_tools_expand_collapse_status_bar,int64,2,"[0, 1]",0,0.0
92,phone_calls_modify_phone_state,int64,2,"[0, 1]",0,0.0
127,system_tools_set_wallpaper,int64,2,"[0, 1]",0,0.0
172,your_personal_information_choose_widgets,int64,2,"[0, 1]",0,0.0
35,delete_other_applications_caches,int64,2,"[0, 1]",0,0.0
101,system_tools_change_background_data_usage_setting,int64,1,[0],0,0.0
117,system_tools_read_subscribed_feeds,int64,2,"[0, 1]",0,0.0
169,your_messages_send_wap_push_received_broadcast,int64,2,"[0, 1]",0,0.0
10,access_drm_content_,int64,2,"[0, 1]",0,0.0


#### Features by type

In [None]:
data_und['var_class'] = ''

Text variables

In [None]:
data_und[data_und['type']==object]

Unnamed: 0,feature,type,n_unique,sample_values,num_missings,share_missings,var_class
0,app,object,22823,"[Alabama Crimson Tide News, Blood Demon Movie,...",1,3.7e-05,
1,package,object,23485,"[com.estrongs.android.pop.app.shortcut, com.tw...",0,0.0,
2,category,object,30,"[Shopping, Racing, Productivity, Sports Games,...",0,0.0,
3,description,object,23552,"[Enjoy Navionics??? Anytime, Anywhere.<p>The W...",3,0.00011,
7,related_apps,object,23868,[{com.warting.blogg.wis_trevortransdgtl_feed_n...,720,0.026364,


In [None]:
cat_vars = list(data_und[data_und['type']==object]['feature'])
data_und.loc[data_und.feature.isin(cat_vars), 'var_class'] = 'categorical'

Binary variables

In [None]:
data_und[data_und['feature'].isin([c for c in data_und['feature'] if (data_types_dict[c]!=object) & (df_train[c].nunique()==2)])]

Unnamed: 0,feature,type,n_unique,sample_values,num_missings,share_missings,var_class
10,access_drm_content_,int64,2,"[0, 1]",0,0.0,
11,access_email_provider_data,int64,2,"[0, 1]",0,0.0,
13,access_download_manager_,int64,2,"[0, 1]",0,0.0,
14,advanced_download_manager_functions_,int64,2,"[0, 1]",0,0.0,
15,audio_file_access,int64,2,"[0, 1]",0,0.0,
...,...,...,...,...,...,...,...
179,your_personal_information_set_alarm_in_alarm_c...,int64,2,"[0, 1]",0,0.0,
180,your_personal_information_write_browsers_histo...,int64,2,"[0, 1]",0,0.0,
181,your_personal_information_write_contact_data,int64,2,"[1, 0]",0,0.0,
182,your_personal_information_write_to_user_define...,int64,2,"[0, 1]",0,0.0,


In [None]:
binary_vars = list(data_und[data_und['feature'].isin([c for c in data_und['feature'] if (data_types_dict[c]!=object) &
                                                      (df_train[c].nunique()==2)])]['feature'])
data_und.loc[data_und.feature.isin(binary_vars), 'var_class'] = 'binary'

Continuous variables

In [None]:
data_und[data_und['feature'].isin([c for c in data_und['feature'] if (data_types_dict[c]!=object) & (df_train[c].nunique()>2)])]

Unnamed: 0,feature,type,n_unique,sample_values,num_missings,share_missings,var_class
4,rating,float64,42,"[0.0, 1.9, 1.1, 4.6, 2.5, 3.6, 1.0, 2.9, 4.5, ...",0,0.0,
5,number_of_ratings,int64,5312,"[6366, 9556, 1937, 2115, 13590, 625, 3685, 377...",0,0.0,
6,price,float64,425,"[6.78, 1.07, 19.95, 2.41, 1.29, 4.55, 7.95, 4....",0,0.0,
8,dangerous_permissions_count,float64,28,"[4.0, 13.0, 12.0, 0.0, 22.0, 15.0, 11.0, 10.0,...",201,0.00736,
9,safe_permissions_count,int64,16,"[11, 16, 2, 10, 1, 4, 13, 14, 7, 9]",0,0.0,
184,app_id,int64,27310,"[2602, 22638, 16825, 23771, 16064, 17568, 2043...",0,0.0,


In [None]:
cont_vars = list(data_und[data_und['feature'].isin([c for c in data_und['feature'] if (data_types_dict[c]!=object) &
                                                    (df_train[c].nunique()>2)])]['feature'])
data_und.loc[data_und.feature.isin(cont_vars), 'var_class'] = 'numerical'

In [None]:
data_und[data_und.var_class=='']

Unnamed: 0,feature,type,n_unique,sample_values,num_missings,share_missings,var_class
12,access_all_system_downloads,int64,1,[0],0,0.0,
17,modify_google_service_configuration,int64,1,[0],0,0.0,
26,access_to_passwords_for_google_accounts,int64,1,[0],0,0.0,
27,act_as_an_account_authenticator,int64,1,[0],0,0.0,
31,coarse,int64,1,[0],0,0.0,
40,discover_known_accounts,int64,1,[0],0,0.0,
45,full_internet_access,int64,1,[0],0,0.0,
48,mock_location_sources_for_testing,int64,1,[0],0,0.0,
52,modify_delete_usb_storage_contents_modify_dele...,int64,1,[0],0,0.0,
55,permanently_disable_device,int64,1,[0],0,0.0,
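
The features left without a class in the table above all have a single unique value. The three rules used so far (text, two unique values, more than two unique values) can be folded into one function that also labels such columns explicitly. This is a sketch under assumptions: the `'constant'` label and the function `classify_variable` do not appear in the notebook.

```python
import pandas as pd


def classify_variable(series: pd.Series) -> str:
    """Classify a column following the notebook's rules, plus an
    assumed 'constant' label for columns with at most one unique value."""
    if not pd.api.types.is_numeric_dtype(series):
        return 'categorical'
    n_unique = series.nunique()
    if n_unique <= 1:
        return 'constant'
    if n_unique == 2:
        return 'binary'
    return 'numerical'


# Made-up data, for illustration only:
toy = pd.DataFrame({
    'category': ['Tools', 'Casual', 'Tools'],
    'full_internet_access': [0, 0, 0],
    'set_wallpaper': [0, 1, 0],
    'rating': [4.1, 3.0, 2.5],
})

var_class = {c: classify_variable(toy[c]) for c in toy.columns}
print(var_class)
```

Constant columns carry no information for a classifier, so flagging them makes it easy to drop them later.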


#### Features by category

In [None]:
# Features related with attributes of the app:
app_attributes = [
	"related_apps",
	"price",
	"number_of_ratings",
	"rating",
	"description",
	"category",
	"package",
	"app",
	"app_id"
]

# Features related with actions performed by the app:
actions_calls = list(data_und[(data_und.feature.str.startswith('phone_call')) | (data_und.feature=='directly_call_any_phone_numbers')]['feature'])
actions_write = list(data_und[data_und.feature.str.startswith('write_')]['feature'])
actions_read = list(data_und[data_und.feature.str.startswith('read_')]['feature'])
actions_modify = list(data_und[data_und.feature.str.startswith('modify_')]['feature'])
actions_force = list(data_und[data_und.feature.str.startswith('force_')]['feature'])
actions_control = list(data_und[data_und.feature.str.startswith('control_')]['feature'])
actions_delete = list(data_und[data_und.feature.str.startswith('delete_')]['feature'])
actions_bind = list(data_und[data_und.feature.str.startswith('bind_')]['feature'])
actions_access = list(data_und[data_und.feature.str.startswith('access_')]['feature'])

# Features related with interaction between app and system (device, data, other applications):
interactions_personal_info = list(data_und[data_und.feature.str.startswith('your_personal_info')]['feature'])
interactions_messages = list(data_und[data_und.feature.str.startswith('your_message')]['feature'])
interactions_location = list(data_und[data_und.feature.str.startswith('your_loca')]['feature'])
interactions_accounts = list(data_und[data_und.feature.str.startswith('your_acc')]['feature'])
interactions_sys_tools = list(data_und[data_und.feature.str.startswith('system_tools')]['feature'])
interactions_networks = list(data_und[data_und.feature.str.startswith('network_comm')]['feature'])
interactions_hardware = list(data_und[data_und.feature.str.startswith('hardware_contr')]['feature'])
interactions_dev_tools = list(data_und[data_und.feature.str.startswith('development_tool')]['feature'])

# Other features related with actions performed by the app:
actions_others = [
	"directly_install_applications",
	"dangerous_permissions_count",
	"safe_permissions_count",
	"services_that_cost_you_money_directly_call_phone_numbers",
	"services_that_cost_you_money_send_sms_messages",
	"start_im_service",
	"full_internet_access",
	"enable_or_disable_application_components",
	"manage_application_tokens",
	"prevent_app_switches",
	"update_component_usage_statistics",
	"run_in_factory_test_mode", "coarse", "voice_search_shortcuts",
	"disable_or_modify_status_bar",
	"display_unauthorized_windows",
	"partial_shutdown",
	"power_device_on_or_off",
	"set_time",
	"change_screen_orientation",
	"press_keys_and_control_buttons",
	"send_download_notifications_",
	"permanently_disable_device",
]

# Other features related with interaction between app and system (device, data, other applications):
interactions_others = [
	"storage_modify_delete_usb_storage_contents_modify_delete_sd_card_contents",
	 "set_wallpaper_size_hints",
	"reset_system_to_factory_defaults",
	"record_what_you_type_and_actions_you_take",
	"permission_to_install_a_location_provider",
	"monitor_and_control_all_application_launching",
	"mock_location_sources_for_testing",
	"interact_with_a_device_admin",
	"act_as_an_account_authenticator",
	"move_application_resources",
	"install_drm_content_",
	"audio_file_access",
	"advanced_download_manager_functions_",
	"discover_known_accounts"
]

# Categories of features:
categories_feat = ["app_attributes",
"actions_calls",
"actions_write",
"actions_read",
"actions_modify",
"actions_force",
"actions_control",
"actions_delete",
"actions_bind",
"actions_access",
"interactions_personal_info",
"interactions_messages",
"interactions_location",
"interactions_accounts",
"interactions_sys_tools",
"interactions_networks",
"interactions_hardware",
"interactions_dev_tools",
"actions_others",
"interactions_others"
]

In [None]:
categories_feat_dict = {}
categories_feat_dict['class'] = 'target'

# Loop over categories:
for c in categories_feat:
  # Loop over features:
  for f in globals()[c]:
    categories_feat_dict[f] = c

# Category of each variable:
data_und['category'] = data_und.feature.apply(lambda x: categories_feat_dict[x])
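
The feature-to-category lookup above resolves each list variable by its name. An alternative design, sketched below, keeps the groups in a single dictionary so that no name resolution is needed. Only a handful of groups and features are shown, and the names `feature_groups` and `feature_to_category` are assumptions introduced for this example.

```python
import pandas as pd

# A small subset of groups, for illustration only:
feature_groups = {
    'app_attributes': ['price', 'rating'],
    'actions_read': ['read_google_settings'],
    'interactions_location': ['your_location_coarse', 'your_location_fine'],
}

# Invert the mapping: one entry per feature, pointing to its category.
feature_to_category = {'class': 'target'}
for category, features in feature_groups.items():
    for feature in features:
        feature_to_category[feature] = category

features = pd.Series(['price', 'your_location_fine', 'class'])
categories = features.map(feature_to_category)
print(categories.tolist())
```

Note that `Series.map` returns `NaN` for a feature missing from the dictionary, whereas a direct dictionary lookup raises `KeyError`.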

In [None]:
if EXPORT:
    data_und.to_csv('../data/features.csv', index=False)
    
    # Original data:
    input_data = pd.read_csv('../data/Android_Permission.csv')
    input_data.columns = [correct_col_name(c) for c in input_data.columns]

    # Schema of original data:
    schema = dict(
        zip(
            [c for c in input_data.drop(['class'], axis=1).columns],
            ['str' if type(input_data[c].iloc[0])==str else 'numeric' for c in input_data.drop(['class'], axis=1).columns]
        )
    )

<a id='data_cleaning'></a>

### Data cleaning

In [None]:
# Removing signs from the text of related apps:
df_train['related_apps'] = df_train['related_apps'].apply(lambda x: x if pd.isna(x) else x.replace('{', '').replace('}', ''))
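
Once the braces are removed, `related_apps` is still one string per app. The sketch below shows how such a string could be split into a list of package names and counted. It assumes packages are separated by commas, as the sample values suggest; the helper `split_related_apps` and the package names are hypothetical.

```python
import numpy as np
import pandas as pd


def split_related_apps(value):
    """Return the list of related packages; an empty list when missing.
    Assumes a comma-separated string, optionally wrapped in braces."""
    if pd.isna(value):
        return []
    cleaned = value.replace('{', '').replace('}', '')
    return [pkg.strip() for pkg in cleaned.split(',') if pkg.strip()]


# Made-up values, for illustration only:
toy = pd.DataFrame({
    'related_apps': ['{com.example.one, com.example.two}', np.nan],
})
toy['related_list'] = toy['related_apps'].apply(split_related_apps)
toy['num_related_apps'] = toy['related_list'].apply(len)
print(toy[['related_list', 'num_related_apps']])
```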

#### Train-test split

In [None]:
df_train, df_test = train_test_split(df_train, test_ratio=0.33, shuffle=True)
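
The `train_test_split` used here is the project's own function from `utils`, not the scikit-learn one, and its code is not shown. A minimal version with the same call shape might look like the sketch below. The name `simple_train_test_split`, the `seed` parameter and the rounding rule are assumptions.

```python
import pandas as pd


def simple_train_test_split(df, test_ratio=0.33, shuffle=True, seed=0):
    """Hypothetical holdout split: optionally shuffle the rows, then
    reserve the first `test_ratio` share of them for the test set."""
    if shuffle:
        df = df.sample(frac=1.0, random_state=seed)
    n_test = int(round(len(df) * test_ratio))
    test = df.iloc[:n_test]
    train = df.iloc[n_test:]
    return train, test


# Made-up data, for illustration only:
toy = pd.DataFrame({'app_id': range(1, 101), 'class': [0, 1] * 50})
toy_train, toy_test = simple_train_test_split(toy, test_ratio=0.33)
print(len(toy_train), len(toy_test))
```

Splitting before the exploratory analysis keeps the statistics computed below from using test observations.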

<a id='eda'></a>

## Exploratory data analysis

In [None]:
print('\033[1mDistribution of features by category:\033[0m')
data_und.category.value_counts()

Distribution of features by category:


interactions_sys_tools        35
actions_others                23
interactions_accounts         21
interactions_messages         14
interactions_others           14
interactions_personal_info    12
interactions_networks          9
app_attributes                 9
actions_access                 8
interactions_hardware          6
actions_modify                 6
interactions_location          4
actions_calls                  4
actions_read                   4
interactions_dev_tools         4
actions_delete                 3
actions_control                2
actions_write                  2
actions_force                  2
actions_bind                   2
target                         1
Name: category, dtype: int64

<a id='dist_y'></a>

### Distribution of labels (P(Y))

In [None]:
df_train['class'].value_counts()/len(df_train)

1    0.667122
0    0.332878
Name: class, dtype: float64
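
For reference, the empirical P(Y) and, for a categorical covariate, P(Y=1|X) can each be computed in one line. The sketch below uses a small made-up dataframe; the values are illustrative only and say nothing about the real data.

```python
import pandas as pd

# Made-up data, for illustration only:
toy = pd.DataFrame({
    'category': ['Tools', 'Tools', 'Casual', 'Casual', 'Casual'],
    'class': [1, 0, 1, 1, 0],
})

# P(Y): relative frequency of each label.
p_y = toy['class'].value_counts(normalize=True)

# P(Y=1 | X=category): mean of the binary label within each category.
p_y_given_x = toy.groupby('category')['class'].mean()

print(p_y)
print(p_y_given_x)
```

Because the label is coded 0/1, its mean within a group equals the share of positive cases in that group.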

<a id='dist_x'></a>

### Distribution of covariates (P(X))

#### Category

In [None]:
df_train['category'].value_counts().head(10)/len(df_train)

Entertainment        0.095530
Travel & Local       0.072303
Books & Reference    0.067931
Arcade & Action      0.060826
Brain & Puzzle       0.059515
Casual               0.054214
Personalization      0.052738
Lifestyle            0.048912
Tools                0.044431
Education            0.043775
Name: category, dtype: float64

#### Rating

In [None]:
df_train['rating'].describe()

count    18298.000000
mean         3.500628
std          1.451806
min          0.000000
25%          3.300000
50%          4.000000
75%          4.400000
max          5.000000
Name: rating, dtype: float64

In [None]:
df_train['number_of_ratings'].describe()

count    1.829800e+04
mean     5.602205e+03
std      4.641166e+04
min      0.000000e+00
25%      4.000000e+00
50%      3.800000e+01
75%      4.920000e+02
max      1.908590e+06
Name: number_of_ratings, dtype: float64

#### Price

In [None]:
df_train['price'].describe()

count    18298.000000
mean         0.664250
std          3.406534
min          0.000000
25%          0.000000
50%          0.000000
75%          0.000000
max        149.990000
Name: price, dtype: float64

#### Interaction with system tools

In [None]:
df_train[list(data_und[data_und.category=='interactions_sys_tools']['feature'])].mean().sort_values(ascending=False)

system_tools_prevent_device_from_sleeping             0.195868
system_tools_automatically_start_at_boot              0.092251
system_tools_retrieve_running_applications            0.048202
system_tools_modify_global_system_settings            0.042136
system_tools_set_wallpaper                            0.037928
system_tools_change_wi_fi_state                       0.028473
system_tools_kill_background_processes                0.022953
system_tools_disable_keylock                          0.021368
system_tools_change_network_connectivity              0.017925
system_tools_bluetooth_administration                 0.013991
system_tools_mount_and_unmount_filesystems            0.011094
system_tools_display_system_level_alerts              0.009291
system_tools_read_sync_settings                       0.008689
system_tools_send_sticky_broadcast                    0.008307
system_tools_write_sync_settings                      0.007979
system_tools_change_your_ui_settings                  0

#### Interaction with accounts

In [None]:
df_train[list(data_und[data_und.category=='interactions_accounts']['feature'])].mean().sort_values(ascending=False)

your_accounts_discover_known_accounts                             0.057438
your_accounts_use_the_authentication_credentials_of_an_account    0.012898
your_accounts_manage_the_accounts_list                            0.010821
your_accounts_act_as_an_account_authenticator                     0.006613
your_accounts_view_configured_accounts                            0.003990
your_accounts_act_as_the_accountmanagerservice                    0.001749
your_accounts_read_google_service_configuration                   0.001694
your_accounts_access_other_google_services                        0.001585
your_accounts_google_mail                                         0.000601
your_accounts_youtube_usernames                                   0.000383
your_accounts_youtube                                             0.000383
your_accounts_google_spreadsheets                                 0.000328
your_accounts_google_maps                                         0.000328
your_accounts_google_docs

#### Interaction with personal information

In [None]:
df_train[list(data_und[data_und.category=='interactions_personal_info']['feature'])].mean().sort_values(ascending=False)

your_personal_information_read_contact_data                                         0.084709
your_personal_information_write_contact_data                                        0.037490
your_personal_information_read_sensitive_log_data                                   0.022625
your_personal_information_read_browsers_history_and_bookmarks                       0.017762
your_personal_information_read_calendar_events                                      0.014428
your_personal_information_write_browsers_history_and_bookmarks                      0.014264
your_personal_information_add_or_modify_calendar_events_and_send_email_to_guests    0.013225
your_personal_information_choose_widgets                                            0.001476
your_personal_information_write_to_user_defined_dictionary                          0.001202
your_personal_information_read_user_defined_dictionary                              0.001148
your_personal_information_set_alarm_in_alarm_clock                    

#### Interaction with network resources

In [None]:
df_train[list(data_und[data_und.category=='interactions_networks']['feature'])].mean().sort_values(ascending=False)

network_communication_full_internet_access                        0.808012
network_communication_view_network_state                          0.554979
network_communication_view_wi_fi_state                            0.133840
network_communication_receive_data_from_internet                  0.039786
network_communication_create_bluetooth_connections                0.022243
network_communication_control_near_field_communication            0.001421
network_communication_make_receive_internet_calls                 0.000273
network_communication_broadcast_data_messages_to_applications_    0.000055
network_communication_download_files_without_notification         0.000000
dtype: float64

#### Interaction with the hardware

In [None]:
df_train[list(data_und[data_und.category=='interactions_hardware']['feature'])].mean().sort_values(ascending=False)

hardware_controls_control_vibrator              0.221062
hardware_controls_take_pictures_and_videos      0.065417
hardware_controls_record_audio                  0.038419
hardware_controls_change_your_audio_settings    0.027981
hardware_controls_control_flashlight            0.015029
hardware_controls_test_hardware                 0.001366
dtype: float64

#### Interaction with development tools

In [None]:
df_train[list(data_und[data_und.category=='interactions_dev_tools']['feature'])].mean().sort_values(ascending=False)

development_tools_send_linux_signals_to_applications        0.000164
development_tools_enable_application_debugging              0.000109
development_tools_make_all_background_applications_close    0.000055
development_tools_limit_number_of_running_processes         0.000000
dtype: float64
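
The per-category listings above rely on the fact that the mean of a 0/1 column equals the share of rows equal to 1, so each value is the fraction of training apps that request the permission. A minimal sketch on toy data (the column names `perm_internet`, `perm_camera` and `perm_debug` are made up for illustration and are not columns of `df_train`):

```python
import pandas as pd

# Toy data: each row is an app, each column a hypothetical 0/1 permission flag.
toy = pd.DataFrame({
    "perm_internet": [1, 1, 1, 0],
    "perm_camera":   [0, 1, 0, 0],
    "perm_debug":    [0, 0, 0, 0],
})

# The mean of a binary column is the fraction of apps requesting that permission.
prevalence = toy.mean().sort_values(ascending=False)
print(prevalence)
```

A permission with prevalence 0.0, like the hypothetical `perm_debug` here, is constant in the sample and cannot help separate the two classes.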

<a id='dist_x_y'></a>

### Distribution of covariates conditional on labels (P(X|Y))

#### Rating

In [None]:
df_train.groupby('class').describe()[['rating']]

rating
class,count,mean,std,min,25%,50%,75%,max
0,6091.0,3.965441,0.696474,0.0,3.7,4.1,4.4,5.0
1,12207.0,3.268698,1.660093,0.0,2.9,3.9,4.4,5.0


In [None]:
df_train.groupby('class').describe()[['number_of_ratings']]

number_of_ratings
class,count,mean,std,min,25%,50%,75%,max
0,6091.0,9031.736661,60879.754529,0.0,40.0,240.0,1825.5,1897622.0
1,12207.0,3890.951503,37025.969904,0.0,2.0,12.0,155.5,1908590.0


#### Price

In [None]:
df_train.groupby('class').describe()[['price']]

price
class,count,mean,std,min,25%,50%,75%,max
0,6091.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,12207.0,0.995695,4.131012,0.0,0.0,0.0,0.99,149.99
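
The conditional summaries in this section all follow one pattern: group by the label, then describe a covariate. The sketch below reproduces that pattern on a toy frame; the data and the derived `is_paid` flag are illustrative assumptions, not part of the notebook.

```python
import pandas as pd

# Toy stand-in for df_train with the label and two covariates.
toy = pd.DataFrame({
    "class":  [0, 0, 0, 1, 1, 1],
    "rating": [4.0, 4.2, 3.8, 0.0, 4.5, 3.0],
    "price":  [0.0, 0.0, 0.0, 0.0, 0.99, 2.99],
})

# Selecting the column before describe() avoids summarising every other column.
rating_by_class = toy.groupby("class")["rating"].describe()

# Share of paid apps per class (hypothetical helper column, for illustration only).
paid_share = toy.assign(is_paid=toy["price"] > 0).groupby("class")["is_paid"].mean()

print(rating_by_class)
print(paid_share)
```

Selecting `["rating"]` before `describe()` yields a single-level header, which is easier to read than the two-level header produced by `describe()[['rating']]`.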


<a id='dist_y_x'></a>

### Distribution of labels conditional on covariates (P(Y|X))

#### Category

In [None]:
df_train.groupby('category')[['class']].mean().sort_values('class', ascending=False).head(10)

category,class
Transportation,0.98524
Medical,0.983051
Travel & Local,0.981859
Sports,0.956186
News & Magazines,0.928082
Shopping,0.926606
Photography,0.89071
Tools,0.884379
Music & Audio,0.803797
Productivity,0.797531
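
A group mean of the binary label estimates P(Y=1 | X=x), but it says nothing about how many apps stand behind the estimate. The sketch below, on made-up data, returns the rate together with the group size; the helper name `malware_rate` and the toy columns are assumptions introduced here for illustration.

```python
import pandas as pd

# Toy data: one rare and one common hypothetical permission flag.
toy = pd.DataFrame({
    "perm_rare":   [1, 0, 0, 0, 0, 0, 0, 0],
    "perm_common": [1, 1, 1, 1, 0, 0, 0, 0],
    "class":       [1, 1, 0, 1, 0, 1, 1, 0],
})

def malware_rate(df, feature, target="class"):
    """Return the mean of `target` per level of `feature`, with the row count per level."""
    return (df.groupby(feature)[target]
              .agg(rate="mean", n="count")
              .sort_values("rate", ascending=False))

rare = malware_rate(toy, "perm_rare")
common = malware_rate(toy, "perm_common")
print(rare)
print(common)
```

In the toy frame the rare flag shows a rate of 1.0 that rests on a single row, which is why reading the rate next to `n` is useful before drawing conclusions from extreme values.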


#### Interaction with system tools

In [None]:
sel_vars = list(data_und[data_und.category=='interactions_sys_tools']['feature'])

# Loop over variables:
for v in sel_vars:
  display(df_train.groupby(v)[['class']].mean().sort_values('class', ascending=False))

system_tools_allow_wi_fi_multicast_reception,class
0,0.667233
1,0.625

system_tools_automatically_start_at_boot,class
0,0.670921
1,0.629739

system_tools_bluetooth_administration,class
0,0.66772
1,0.625

system_tools_change_wi_fi_state,class
1,0.704415
0,0.666029

system_tools_change_background_data_usage_setting,class
0,0.667122

system_tools_change_network_connectivity,class
1,0.716463
0,0.666221

system_tools_change_your_ui_settings,class
1,0.772277
0,0.666538

system_tools_delete_all_application_cache_data,class
0,0.667506
1,0.472222

system_tools_disable_keylock,class
0,0.667281
1,0.659847

system_tools_display_system_level_alerts,class
0,0.667696
1,0.605882

system_tools_expand_collapse_status_bar,class
1,0.679245
0,0.667087

system_tools_force_stop_other_applications,class
1,0.75
0,0.667104

system_tools_format_external_storage,class
0,0.667159
1,0.571429

system_tools_kill_background_processes,class
0,0.668363
1,0.614286

system_tools_make_application_always_run,class
1,0.677966
0,0.667087

system_tools_measure_application_storage_space,class
0,0.667543
1,0.441176

system_tools_modify_global_animation_speed,class
1,1.0
0,0.667086

system_tools_modify_global_system_settings,class
0,0.672505
1,0.544747

system_tools_mount_and_unmount_filesystems,class
0,0.667809
1,0.605911

system_tools_prevent_device_from_sleeping,class
0,0.673372
1,0.641462

system_tools_read_subscribed_feeds,class
0,0.66736
1,0.411765

system_tools_read_sync_settings,class
0,0.66889
1,0.465409

system_tools_read_sync_statistics,class
0,0.6678
1,0.467742

system_tools_read_write_to_resources_owned_by_diag,class
0,0.667177
1,0.5

system_tools_reorder_running_applications,class
0,0.66725
1,0.55

system_tools_retrieve_running_applications,class
0,0.671566
1,0.579365

system_tools_send_package_removed_broadcast,class
1,0.75
0,0.667104

system_tools_send_sticky_broadcast,class
1,0.802632
0,0.665987

system_tools_set_preferred_applications,class
0,0.667233
1,0.625

system_tools_set_time_zone,class
1,1.0
0,0.667049

system_tools_set_wallpaper,class
0,0.673086
1,0.51585

system_tools_set_wallpaper_size_hints,class
0,0.667325
1,0.606557

system_tools_write_access_point_name_settings,class
1,0.693878
0,0.66705

system_tools_write_subscribed_feeds,class
0,0.667414
1,0.352941

system_tools_write_sync_settings,class
0,0.668852
1,0.452055


#### Interaction with accounts

In [None]:
sel_vars = list(data_und[data_und.category=='interactions_accounts']['feature'])

# Loop over variables:
for v in sel_vars:
  display(df_train.groupby(v)[['class']].mean().sort_values('class', ascending=False))

your_accounts_blogger,class
0,0.667122

your_accounts_google_app_engine,class
0,0.667177
1,0.333333

your_accounts_google_docs,class
0,0.667122
1,0.666667

your_accounts_google_finance,class
0,0.667195
1,0.0

your_accounts_google_maps,class
0,0.667177
1,0.5

your_accounts_google_spreadsheets,class
0,0.667122
1,0.666667

your_accounts_google_voice,class
0,0.667122

your_accounts_google_mail,class
0,0.667359
1,0.272727

your_accounts_picasa_web_albums,class
0,0.667122

your_accounts_youtube,class
0,0.667213
1,0.428571

your_accounts_youtube_usernames,class
0,0.667213
1,0.428571

your_accounts_access_all_google_services,class
0,0.667159
1,0.5

your_accounts_access_other_google_services,class
1,0.862069
0,0.666813

your_accounts_act_as_an_account_authenticator,class
0,0.668482
1,0.46281

your_accounts_act_as_the_accountmanagerservice,class
0,0.66747
1,0.46875

your_accounts_contacts_data_in_google_accounts,class
0,0.667122
1,0.666667

your_accounts_discover_known_accounts,class
0,0.67461
1,0.544244

your_accounts_manage_the_accounts_list,class
0,0.669337
1,0.464646

your_accounts_read_google_service_configuration,class
0,0.667543
1,0.419355

your_accounts_use_the_authentication_credentials_of_an_account,class
0,0.668365
1,0.572034

your_accounts_view_configured_accounts,class
0,0.667654
1,0.534247


#### Interaction with personal information

In [None]:
sel_vars = list(data_und[data_und.category=='interactions_personal_info']['feature'])

# Loop over variables:
for v in sel_vars:
  display(df_train.groupby(v)[['class']].mean().sort_values('class', ascending=False))

your_personal_information_add_or_modify_calendar_events_and_send_email_to_guests,class
1,0.747934
0,0.666039

your_personal_information_choose_widgets,class
1,0.740741
0,0.667013

your_personal_information_read_browsers_history_and_bookmarks,class
0,0.671452
1,0.427692

your_personal_information_read_calendar_events,class
1,0.753788
0,0.665853

your_personal_information_read_contact_data,class
0,0.670886
1,0.626452

your_personal_information_read_sensitive_log_data,class
0,0.67127
1,0.487923

your_personal_information_read_user_defined_dictionary,class
1,0.714286
0,0.667068

your_personal_information_retrieve_system_internal_state,class
0,0.667268
1,0.4

your_personal_information_set_alarm_in_alarm_clock,class
0,0.667141
1,0.636364

your_personal_information_write_browsers_history_and_bookmarks,class
0,0.671453
1,0.367816

your_personal_information_write_contact_data,class
0,0.669032
1,0.618076

your_personal_information_write_to_user_defined_dictionary,class
0,0.667159
1,0.636364


#### Interaction with network resources

In [None]:
sel_vars = list(data_und[data_und.category=='interactions_networks']['feature'])

# Loop over variables:
for v in sel_vars:
  display(df_train.groupby(v)[['class']].mean().sort_values('class', ascending=False))

network_communication_broadcast_data_messages_to_applications_,class
1,1.0
0,0.667104

network_communication_control_near_field_communication,class
0,0.66736
1,0.5

network_communication_create_bluetooth_connections,class
0,0.667151
1,0.665848

network_communication_download_files_without_notification,class
0,0.667122

network_communication_full_internet_access,class
0,0.75548
1,0.646128

network_communication_make_receive_internet_calls,class
1,0.8
0,0.667086

network_communication_receive_data_from_internet,class
0,0.667786
1,0.651099

network_communication_view_wi_fi_state,class
1,0.671703
0,0.666414

network_communication_view_network_state,class
0,0.744811
1,0.604825


#### Interaction with the hardware

In [None]:
sel_vars = list(data_und[data_und.category=='interactions_hardware']['feature'])

# Loop over variables:
for v in sel_vars:
  display(df_train.groupby(v)[['class']].mean().sort_values('class', ascending=False))

hardware_controls_change_your_audio_settings,class
0,0.667491
1,0.654297

hardware_controls_control_flashlight,class
0,0.667702
1,0.629091

hardware_controls_control_vibrator,class
0,0.680138
1,0.621261

hardware_controls_record_audio,class
0,0.668315
1,0.637269

hardware_controls_take_pictures_and_videos,class
1,0.698413
0,0.664932

hardware_controls_test_hardware,class
0,0.667159
1,0.64


#### Interaction with development tools

In [None]:
sel_vars = list(data_und[data_und.category=='interactions_dev_tools']['feature'])

# Loop over variables:
for v in sel_vars:
  display(df_train.groupby(v)[['class']].mean().sort_values('class', ascending=False))

development_tools_enable_application_debugging,class
0,0.66714
1,0.5

development_tools_limit_number_of_running_processes,class
0,0.667122

development_tools_make_all_background_applications_close,class
1,1.0
0,0.667104

development_tools_send_linux_signals_to_applications,class
0,0.667177
1,0.333333


#### Bivariate modeling
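
The cell in this subsection fits one L1-penalized logistic regression per feature through the project's custom `Kfolds_fit` class, whose internals are not shown here. As a rough analogue that uses only scikit-learn, the sketch below tunes `C` by 5-fold cross-validation on ROC AUC for each single feature and scores it on held-out data. The synthetic data, the feature names and the shorter `C` grid are assumptions for illustration, not the notebook's setup.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic data: one feature shifted by the label, one pure-noise feature.
rng = np.random.default_rng(0)
n = 2000
y = rng.integers(0, 2, size=n)
X = np.column_stack([
    y + rng.normal(scale=1.0, size=n),  # informative feature
    rng.normal(size=n),                 # uninformative feature
])
names = ["informative", "noise"]

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

results = {}
for j, name in enumerate(names):
    # One model per feature: grid search over C, selected by CV ROC AUC.
    search = GridSearchCV(
        LogisticRegression(penalty="l1", solver="liblinear", max_iter=500),
        param_grid={"C": [0.001, 0.01, 0.1, 1, 10]},
        scoring="roc_auc",
        cv=5,
    )
    search.fit(X_tr[:, [j]], y_tr)
    # Score the refitted best model on the held-out split.
    proba = search.predict_proba(X_te[:, [j]])[:, 1]
    results[name] = roc_auc_score(y_te, proba)

print(results)
```

Ranking features by their single-feature test ROC AUC gives a quick screen of marginal predictive power; a value near 0.5 means the feature alone does not order the two classes better than chance.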

In [None]:
features, test_roc_auc, coefs = [], [], []

# Grid of hyper-parameters:
grid_param = {'C': [0.0001, 0.0003, 0.001, 0.003, 0.01, 0.03, 0.1, 0.25, 0.3, 0.5, 0.75, 1, 3, 10]}
default_param = {'C': 1.0}
fixed_params = {'penalty':'l1', 'solver':'liblinear', 'warm_start':True, 'max_iter':500}

# Loop over variables:
for v in list(data_und[~(data_und.feature.isin(drop_vars)) & (data_und['type']!=object)]['feature']):
  # Creating K-folds CV object:
  kfolds = Kfolds_fit(task='classification', method='logistic_regression', num_folds=5, metric='roc_auc',
                      shuffle=False,
                      random_search=False,
                      grid_param=grid_param, default_param=default_param, fixed_params=fixed_params,
                      pre_selecting=False,
                      parallelize=False)

  # Running K-folds CV:
  kfolds.fit(train_inputs=df_train.dropna()[[v]], train_output=df_train.dropna()['class'],
            test_inputs=df_test.dropna()[[v]], test_output=df_test.dropna()['class'])

  # Collecting outcomes:
  features.append(v)
  test_roc_auc.append(kfolds.performance_metrics['test_roc_auc'])
  coefs.append(kfolds.model.coef_)



---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.003}.
   CV performance metric associated with best hyper-parameters: 0.5725.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5689
   test_prec_avg = 0.774
   test_brier = 0.2157
---------------------------------------------------------------------


------------------------------------
Running time: 0.08 minutes.
Start time: 2022-01-29, 17:58:14
End time: 2022-01-29, 17:58:19
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.0001}.
   CV performance metric associated with best hyper-parameters: 0.7342.


Performance metrics evaluated at test data:
   test_roc_auc = 0.7351
   test_prec_avg = 0.8605
   test_brier = 0.25
---------------------------------------------------------------------


------------------------------------
Running time: 0.92 minutes.
Start time: 2022-01-29, 17:58:19
End time: 2022-01-29, 17:59:14
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.0003}.
   CV performance metric associated with best hyper-parameters: 0.6382.


Performance metrics evaluated at test data:
   test_roc_auc = 0.6367
   test_prec_avg = 0.762
   test_brier = 0.2404
---------------------------------------------------------------------


------------------------------------
Running time: 5.44 minutes.
Start time: 2022-01-29, 17:59:15
End time: 2022-01-29, 18:04:41
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.01}.
   CV performance metric associated with best hyper-parameters: 0.5321.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5343
   test_prec_avg = 0.6994
   test_brier = 0.2205
---------------------------------------------------------------------


------------------------------------
Running time: 0.06 minutes.
Start time: 2022-01-29, 18:04:41
End time: 2022-01-29, 18:04:44
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.003}.
   CV performance metric associated with best hyper-parameters: 0.5645.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5629
   test_prec_avg = 0.7119
   test_brier = 0.2207
---------------------------------------------------------------------


------------------------------------
Running time: 0.05 minutes.
Start time: 2022-01-29, 18:04:45
End time: 2022-01-29, 18:04:48
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 3}.
   CV performance metric associated with best hyper-parameters: 0.5003.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5002
   test_prec_avg = 0.6725
   test_brier = 0.2205
---------------------------------------------------------------------


------------------------------------
Running time: 0.05 minutes.
Start time: 2022-01-29, 18:04:48
End time: 2022-01-29, 18:04:51
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.0001}.
   CV performance metric associated with best hyper-parameters: 0.5.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5
   test_prec_avg = 0.6724
   test_brier = 0.25
---------------------------------------------------------------------


------------------------------------
Running time: 0.05 minutes.
Start time: 2022-01-29, 18:04:51
End time: 2022-01-29, 18:04:54
------------------------------------
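
The run just above reports `test_roc_auc = 0.5` together with `test_brier = 0.25`. Assuming `test_brier` is the standard Brier score (mean squared difference between predicted probability and the 0/1 label), that pair is what a constant prediction of 0.5 would produce, which is a plausible outcome when a very small `C` shrinks the model to nothing. The sketch below checks the arithmetic on made-up labels; it does not inspect `Kfolds_fit` itself.

```python
import numpy as np

def brier(y_true, p):
    """Mean squared error between predicted probabilities and 0/1 labels."""
    y_true = np.asarray(y_true, dtype=float)
    p = np.asarray(p, dtype=float)
    return float(np.mean((p - y_true) ** 2))

# Made-up labels with a base rate of 4/6.
y = np.array([1, 1, 0, 1, 0, 1])

# A constant prediction of 0.5 always gives (0.5)^2 = 0.25, whatever the labels.
brier_half = brier(y, np.full(y.shape, 0.5))

# A constant prediction equal to the base rate r gives r * (1 - r).
brier_base = brier(y, np.full(y.shape, y.mean()))

print(brier_half, brier_base)
```

Under the same assumption, a constant prediction at the base rate scores `r * (1 - r)`, which is below 0.25 whenever the classes are imbalanced, so a Brier score of exactly 0.25 suggests uncalibrated constant output rather than an informative model.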




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.0001}.
   CV performance metric associated with best hyper-parameters: 0.5.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5
   test_prec_avg = 0.6724
   test_brier = 0.25
---------------------------------------------------------------------


------------------------------------
Running time: 0.05 minutes.
Start time: 2022-01-29, 18:04:54
End time: 2022-01-29, 18:04:57
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 3}.
   CV performance metric associated with best hyper-parameters: 0.5002.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5
   test_prec_avg = 0.6724
   test_brier = 0.2205
---------------------------------------------------------------------


------------------------------------
Running time: 0.05 minutes.
Start time: 2022-01-29, 18:04:57
End time: 2022-01-29, 18:05:00
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.0001}.
   CV performance metric associated with best hyper-parameters: 0.5.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5
   test_prec_avg = 0.6724
   test_brier = 0.25
---------------------------------------------------------------------


------------------------------------
Running time: 0.05 minutes.
Start time: 2022-01-29, 18:05:00
End time: 2022-01-29, 18:05:03
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 3}.
   CV performance metric associated with best hyper-parameters: 0.5002.


Performance metrics evaluated at test data:
   test_roc_auc = 0.4998
   test_prec_avg = 0.6724
   test_brier = 0.2207
---------------------------------------------------------------------


------------------------------------
Running time: 0.05 minutes.
Start time: 2022-01-29, 18:05:03
End time: 2022-01-29, 18:05:06
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.75}.
   CV performance metric associated with best hyper-parameters: 0.5004.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5008
   test_prec_avg = 0.6728
   test_brier = 0.2204
---------------------------------------------------------------------


------------------------------------
Running time: 0.05 minutes.
Start time: 2022-01-29, 18:05:06
End time: 2022-01-29, 18:05:09
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.0001}.
   CV performance metric associated with best hyper-parameters: 0.5.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5
   test_prec_avg = 0.6724
   test_brier = 0.25
---------------------------------------------------------------------


------------------------------------
Running time: 0.05 minutes.
Start time: 2022-01-29, 18:05:09
End time: 2022-01-29, 18:05:12
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 3}.
   CV performance metric associated with best hyper-parameters: 0.5002.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5
   test_prec_avg = 0.6724
   test_brier = 0.2205
---------------------------------------------------------------------


------------------------------------
Running time: 0.05 minutes.
Start time: 2022-01-29, 18:05:12
End time: 2022-01-29, 18:05:15
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.0001}.
   CV performance metric associated with best hyper-parameters: 0.5.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5
   test_prec_avg = 0.6724
   test_brier = 0.25
---------------------------------------------------------------------


------------------------------------
Running time: 0.05 minutes.
Start time: 2022-01-29, 18:05:15
End time: 2022-01-29, 18:05:18
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 1}.
   CV performance metric associated with best hyper-parameters: 0.5002.


Performance metrics evaluated at test data:
   test_roc_auc = 0.4998
   test_prec_avg = 0.6724
   test_brier = 0.2206
---------------------------------------------------------------------


------------------------------------
Running time: 0.05 minutes.
Start time: 2022-01-29, 18:05:18
End time: 2022-01-29, 18:05:21
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.0001}.
   CV performance metric associated with best hyper-parameters: 0.5.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5
   test_prec_avg = 0.6724
   test_brier = 0.25
---------------------------------------------------------------------


------------------------------------
Running time: 0.05 minutes.
Start time: 2022-01-29, 18:05:21
End time: 2022-01-29, 18:05:24
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 3}.
   CV performance metric associated with best hyper-parameters: 0.5002.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5
   test_prec_avg = 0.6724
   test_brier = 0.2205
---------------------------------------------------------------------


------------------------------------
Running time: 0.05 minutes.
Start time: 2022-01-29, 18:05:24
End time: 2022-01-29, 18:05:27
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.0001}.
   CV performance metric associated with best hyper-parameters: 0.5.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5
   test_prec_avg = 0.6724
   test_brier = 0.25
---------------------------------------------------------------------


------------------------------------
Running time: 0.05 minutes.
Start time: 2022-01-29, 18:05:27
End time: 2022-01-29, 18:05:30
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 3}.
   CV performance metric associated with best hyper-parameters: 0.5001.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5003
   test_prec_avg = 0.6726
   test_brier = 0.2205
---------------------------------------------------------------------


------------------------------------
Running time: 0.05 minutes.
Start time: 2022-01-29, 18:05:30
End time: 2022-01-29, 18:05:33
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.0001}.
   CV performance metric associated with best hyper-parameters: 0.5.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5
   test_prec_avg = 0.6724
   test_brier = 0.25
---------------------------------------------------------------------


------------------------------------
Running time: 0.05 minutes.
Start time: 2022-01-29, 18:05:33
End time: 2022-01-29, 18:05:36
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.0001}.
   CV performance metric associated with best hyper-parameters: 0.5.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5
   test_prec_avg = 0.6724
   test_brier = 0.25
---------------------------------------------------------------------


------------------------------------
Running time: 0.05 minutes.
Start time: 2022-01-29, 18:05:36
End time: 2022-01-29, 18:05:39
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.0001}.
   CV performance metric associated with best hyper-parameters: 0.5.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5
   test_prec_avg = 0.6724
   test_brier = 0.25
---------------------------------------------------------------------


------------------------------------
Running time: 0.05 minutes.
Start time: 2022-01-29, 18:05:39
End time: 2022-01-29, 18:05:42
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.75}.
   CV performance metric associated with best hyper-parameters: 0.5006.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5009
   test_prec_avg = 0.6728
   test_brier = 0.2204
---------------------------------------------------------------------
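
A quick arithmetic check can help read the two recurring values of `test_brier` in this log (0.25 and roughly 0.2204). The sketch below assumes, without confirmation from the notebook, that the positive rate in the test set is close to 0.6724 (the `test_prec_avg` of the near-random runs, since the average precision of an uninformative ranking is close to the positive rate). It computes the Brier score of a model that outputs one constant probability for every row. The helper `brier_constant` is hypothetical.

```python
def brier_constant(prediction: float, prevalence: float) -> float:
    """Brier score of a model that outputs the same probability for every row."""
    # Positives contribute (1 - p)^2 each, negatives contribute p^2 each.
    return prevalence * (1 - prediction) ** 2 + (1 - prevalence) * prediction ** 2


# Assumption: share of positives in the test data, inferred from test_prec_avg.
prevalence = 0.6724

brier_half = brier_constant(0.5, prevalence)              # always predict 0.5
brier_base_rate = brier_constant(prevalence, prevalence)  # always predict the base rate

print(round(brier_half, 4))
print(round(brier_base_rate, 4))
```

Under these assumptions, a constant 0.5 gives 0.25 and a constant equal to the base rate gives about 0.2203. Both are in line with the logged values, which suggests that these fits carry almost no signal from the features.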


------------------------------------
Running time: 0.05 minutes.
Start time: 2022-01-29, 18:05:42
End time: 2022-01-29, 18:05:45
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.75}.
   CV performance metric associated with best hyper-parameters: 0.5003.


Performance metrics evaluated at test data:
   test_roc_auc = 0.4999
   test_prec_avg = 0.6724
   test_brier = 0.2205
---------------------------------------------------------------------


------------------------------------
Running time: 0.05 minutes.
Start time: 2022-01-29, 18:05:45
End time: 2022-01-29, 18:05:48
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.5}.
   CV performance metric associated with best hyper-parameters: 0.5007.


Performance metrics evaluated at test data:
   test_roc_auc = 0.499
   test_prec_avg = 0.6721
   test_brier = 0.2206
---------------------------------------------------------------------


------------------------------------
Running time: 0.05 minutes.
Start time: 2022-01-29, 18:05:48
End time: 2022-01-29, 18:05:51
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.0001}.
   CV performance metric associated with best hyper-parameters: 0.5.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5
   test_prec_avg = 0.6724
   test_brier = 0.25
---------------------------------------------------------------------


------------------------------------
Running time: 0.05 minutes.
Start time: 2022-01-29, 18:05:51
End time: 2022-01-29, 18:05:54
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.5}.
   CV performance metric associated with best hyper-parameters: 0.5005.


Performance metrics evaluated at test data:
   test_roc_auc = 0.4999
   test_prec_avg = 0.6724
   test_brier = 0.2205
---------------------------------------------------------------------


------------------------------------
Running time: 0.05 minutes.
Start time: 2022-01-29, 18:05:55
End time: 2022-01-29, 18:05:57
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 10}.
   CV performance metric associated with best hyper-parameters: 0.5001.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5001
   test_prec_avg = 0.6725
   test_brier = 0.2205
---------------------------------------------------------------------


------------------------------------
Running time: 0.05 minutes.
Start time: 2022-01-29, 18:05:58
End time: 2022-01-29, 18:06:00
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.25}.
   CV performance metric associated with best hyper-parameters: 0.5015.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5004
   test_prec_avg = 0.6726
   test_brier = 0.2205
---------------------------------------------------------------------


------------------------------------
Running time: 0.05 minutes.
Start time: 2022-01-29, 18:06:00
End time: 2022-01-29, 18:06:03
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.75}.
   CV performance metric associated with best hyper-parameters: 0.5004.


Performance metrics evaluated at test data:
   test_roc_auc = 0.4999
   test_prec_avg = 0.6724
   test_brier = 0.2205
---------------------------------------------------------------------


------------------------------------
Running time: 0.05 minutes.
Start time: 2022-01-29, 18:06:04
End time: 2022-01-29, 18:06:06
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.75}.
   CV performance metric associated with best hyper-parameters: 0.5005.


Performance metrics evaluated at test data:
   test_roc_auc = 0.501
   test_prec_avg = 0.6729
   test_brier = 0.2204
---------------------------------------------------------------------


------------------------------------
Running time: 0.05 minutes.
Start time: 2022-01-29, 18:06:07
End time: 2022-01-29, 18:06:09
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.5}.
   CV performance metric associated with best hyper-parameters: 0.5005.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5014
   test_prec_avg = 0.6731
   test_brier = 0.2205
---------------------------------------------------------------------


------------------------------------
Running time: 0.05 minutes.
Start time: 2022-01-29, 18:06:10
End time: 2022-01-29, 18:06:13
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.25}.
   CV performance metric associated with best hyper-parameters: 0.5012.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5001
   test_prec_avg = 0.6725
   test_brier = 0.2205
---------------------------------------------------------------------


------------------------------------
Running time: 0.05 minutes.
Start time: 2022-01-29, 18:06:13
End time: 2022-01-29, 18:06:16
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.0001}.
   CV performance metric associated with best hyper-parameters: 0.5.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5
   test_prec_avg = 0.6724
   test_brier = 0.25
---------------------------------------------------------------------


------------------------------------
Running time: 0.05 minutes.
Start time: 2022-01-29, 18:06:16
End time: 2022-01-29, 18:06:19
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.0001}.
   CV performance metric associated with best hyper-parameters: 0.5.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5
   test_prec_avg = 0.6724
   test_brier = 0.25
---------------------------------------------------------------------


------------------------------------
Running time: 0.05 minutes.
Start time: 2022-01-29, 18:06:19
End time: 2022-01-29, 18:06:22
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 3}.
   CV performance metric associated with best hyper-parameters: 0.5002.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5005
   test_prec_avg = 0.6727
   test_brier = 0.2204
---------------------------------------------------------------------


------------------------------------
Running time: 0.05 minutes.
Start time: 2022-01-29, 18:06:22
End time: 2022-01-29, 18:06:25
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.0001}.
   CV performance metric associated with best hyper-parameters: 0.5.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5
   test_prec_avg = 0.6724
   test_brier = 0.25
---------------------------------------------------------------------


------------------------------------
Running time: 0.05 minutes.
Start time: 2022-01-29, 18:06:25
End time: 2022-01-29, 18:06:28
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.0001}.
   CV performance metric associated with best hyper-parameters: 0.5.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5
   test_prec_avg = 0.6724
   test_brier = 0.25
---------------------------------------------------------------------


------------------------------------
Running time: 0.05 minutes.
Start time: 2022-01-29, 18:06:28
End time: 2022-01-29, 18:06:31
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 10}.
   CV performance metric associated with best hyper-parameters: 0.5003.


Performance metrics evaluated at test data:
   test_roc_auc = 0.501
   test_prec_avg = 0.6729
   test_brier = 0.2204
---------------------------------------------------------------------


------------------------------------
Running time: 0.05 minutes.
Start time: 2022-01-29, 18:06:31
End time: 2022-01-29, 18:06:34
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.0001}.
   CV performance metric associated with best hyper-parameters: 0.5.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5
   test_prec_avg = 0.6724
   test_brier = 0.25
---------------------------------------------------------------------


------------------------------------
Running time: 0.05 minutes.
Start time: 2022-01-29, 18:06:34
End time: 2022-01-29, 18:06:37
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 10}.
   CV performance metric associated with best hyper-parameters: 0.5001.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5003
   test_prec_avg = 0.6726
   test_brier = 0.2205
---------------------------------------------------------------------


------------------------------------
Running time: 0.05 minutes.
Start time: 2022-01-29, 18:06:37
End time: 2022-01-29, 18:06:40
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.0001}.
   CV performance metric associated with best hyper-parameters: 0.5.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5
   test_prec_avg = 0.6724
   test_brier = 0.25
---------------------------------------------------------------------


------------------------------------
Running time: 0.05 minutes.
Start time: 2022-01-29, 18:06:40
End time: 2022-01-29, 18:06:43
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.0001}.
   CV performance metric associated with best hyper-parameters: 0.5.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5
   test_prec_avg = 0.6724
   test_brier = 0.25
---------------------------------------------------------------------


------------------------------------
Running time: 0.05 minutes.
Start time: 2022-01-29, 18:06:43
End time: 2022-01-29, 18:06:46
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.0001}.
   CV performance metric associated with best hyper-parameters: 0.5.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5
   test_prec_avg = 0.6724
   test_brier = 0.25
---------------------------------------------------------------------


------------------------------------
Running time: 0.05 minutes.
Start time: 2022-01-29, 18:06:46
End time: 2022-01-29, 18:06:49
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.25}.
   CV performance metric associated with best hyper-parameters: 0.5015.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5006
   test_prec_avg = 0.6727
   test_brier = 0.2206
---------------------------------------------------------------------


------------------------------------
Running time: 0.05 minutes.
Start time: 2022-01-29, 18:06:49
End time: 2022-01-29, 18:06:52
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 3}.
   CV performance metric associated with best hyper-parameters: 0.5001.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5003
   test_prec_avg = 0.6726
   test_brier = 0.2205
---------------------------------------------------------------------


------------------------------------
Running time: 0.05 minutes.
Start time: 2022-01-29, 18:06:52
End time: 2022-01-29, 18:06:55
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.0001}.
   CV performance metric associated with best hyper-parameters: 0.5.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5
   test_prec_avg = 0.6724
   test_brier = 0.25
---------------------------------------------------------------------


------------------------------------
Running time: 0.05 minutes.
Start time: 2022-01-29, 18:06:55
End time: 2022-01-29, 18:06:58
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.0001}.
   CV performance metric associated with best hyper-parameters: 0.5.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5
   test_prec_avg = 0.6724
   test_brier = 0.25
---------------------------------------------------------------------


------------------------------------
Running time: 0.05 minutes.
Start time: 2022-01-29, 18:06:58
End time: 2022-01-29, 18:07:01
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.0001}.
   CV performance metric associated with best hyper-parameters: 0.5.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5
   test_prec_avg = 0.6724
   test_brier = 0.25
---------------------------------------------------------------------


------------------------------------
Running time: 0.05 minutes.
Start time: 2022-01-29, 18:07:01
End time: 2022-01-29, 18:07:04
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.0001}.
   CV performance metric associated with best hyper-parameters: 0.5.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5
   test_prec_avg = 0.6724
   test_brier = 0.25
---------------------------------------------------------------------


------------------------------------
Running time: 0.05 minutes.
Start time: 2022-01-29, 18:07:04
End time: 2022-01-29, 18:07:07
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.0001}.
   CV performance metric associated with best hyper-parameters: 0.5.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5
   test_prec_avg = 0.6724
   test_brier = 0.25
---------------------------------------------------------------------


------------------------------------
Running time: 0.05 minutes.
Start time: 2022-01-29, 18:07:07
End time: 2022-01-29, 18:07:10
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.25}.
   CV performance metric associated with best hyper-parameters: 0.5014.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5019
   test_prec_avg = 0.6734
   test_brier = 0.2204
---------------------------------------------------------------------


------------------------------------
Running time: 0.05 minutes.
Start time: 2022-01-29, 18:07:10
End time: 2022-01-29, 18:07:13
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.5}.
   CV performance metric associated with best hyper-parameters: 0.5006.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5
   test_prec_avg = 0.6724
   test_brier = 0.2206
---------------------------------------------------------------------


------------------------------------
Running time: 0.05 minutes.
Start time: 2022-01-29, 18:07:13
End time: 2022-01-29, 18:07:16
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.0001}.
   CV performance metric associated with best hyper-parameters: 0.5.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5
   test_prec_avg = 0.6724
   test_brier = 0.25
---------------------------------------------------------------------


------------------------------------
Running time: 0.05 minutes.
Start time: 2022-01-29, 18:07:16
End time: 2022-01-29, 18:07:19
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 1}.
   CV performance metric associated with best hyper-parameters: 0.5002.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5002
   test_prec_avg = 0.6725
   test_brier = 0.2205
---------------------------------------------------------------------


------------------------------------
Running time: 0.05 minutes.
Start time: 2022-01-29, 18:07:20
End time: 2022-01-29, 18:07:23
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.0001}.
   CV performance metric associated with best hyper-parameters: 0.5.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5
   test_prec_avg = 0.6724
   test_brier = 0.25
---------------------------------------------------------------------


------------------------------------
Running time: 0.05 minutes.
Start time: 2022-01-29, 18:07:23
End time: 2022-01-29, 18:07:26
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.0001}.
   CV performance metric associated with best hyper-parameters: 0.5.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5
   test_prec_avg = 0.6724
   test_brier = 0.25
---------------------------------------------------------------------


------------------------------------
Running time: 0.05 minutes.
Start time: 2022-01-29, 18:07:26
End time: 2022-01-29, 18:07:29
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.0001}.
   CV performance metric associated with best hyper-parameters: 0.5.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5
   test_prec_avg = 0.6724
   test_brier = 0.25
---------------------------------------------------------------------


------------------------------------
Running time: 0.05 minutes.
Start time: 2022-01-29, 18:07:29
End time: 2022-01-29, 18:07:32
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.0001}.
   CV performance metric associated with best hyper-parameters: 0.5.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5
   test_prec_avg = 0.6724
   test_brier = 0.25
---------------------------------------------------------------------


------------------------------------
Running time: 0.05 minutes.
Start time: 2022-01-29, 18:07:32
End time: 2022-01-29, 18:07:35
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.0001}.
   CV performance metric associated with best hyper-parameters: 0.5.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5
   test_prec_avg = 0.6724
   test_brier = 0.25
---------------------------------------------------------------------


------------------------------------
Running time: 0.05 minutes.
Start time: 2022-01-29, 18:07:35
End time: 2022-01-29, 18:07:38
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.0001}.
   CV performance metric associated with best hyper-parameters: 0.5.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5
   test_prec_avg = 0.6724
   test_brier = 0.25
---------------------------------------------------------------------


------------------------------------
Running time: 0.05 minutes.
Start time: 2022-01-29, 18:07:39
End time: 2022-01-29, 18:07:42
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.0001}.
   CV performance metric associated with best hyper-parameters: 0.5.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5
   test_prec_avg = 0.6724
   test_brier = 0.25
---------------------------------------------------------------------


------------------------------------
Running time: 0.05 minutes.
Start time: 2022-01-29, 18:07:42
End time: 2022-01-29, 18:07:45
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.0001}.
   CV performance metric associated with best hyper-parameters: 0.5.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5
   test_prec_avg = 0.6724
   test_brier = 0.25
---------------------------------------------------------------------


------------------------------------
Running time: 0.05 minutes.
Start time: 2022-01-29, 18:07:45
End time: 2022-01-29, 18:07:48
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.0001}.
   CV performance metric associated with best hyper-parameters: 0.5.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5
   test_prec_avg = 0.6724
   test_brier = 0.25
---------------------------------------------------------------------


------------------------------------
Running time: 0.05 minutes.
Start time: 2022-01-29, 18:07:48
End time: 2022-01-29, 18:07:51
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.0001}.
   CV performance metric associated with best hyper-parameters: 0.5.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5
   test_prec_avg = 0.6724
   test_brier = 0.25
---------------------------------------------------------------------


------------------------------------
Running time: 0.05 minutes.
Start time: 2022-01-29, 18:07:51
End time: 2022-01-29, 18:07:54
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.0001}.
   CV performance metric associated with best hyper-parameters: 0.5.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5
   test_prec_avg = 0.6724
   test_brier = 0.25
---------------------------------------------------------------------


------------------------------------
[1mRunning time:[0m 0.05 minutes.
Start time: 2022-01-29, 18:07:54
End time: 2022-01-29, 18:07:57
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.0001}.
   CV performance metric associated with best hyper-parameters: 0.5.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5
   test_prec_avg = 0.6724
   test_brier = 0.25
---------------------------------------------------------------------


------------------------------------
[1mRunning time:[0m 0.05 minutes.
Start time: 2022-01-29, 18:07:57
End time: 2022-01-29, 18:08:00
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.0001}.
   CV performance metric associated with best hyper-parameters: 0.5.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5
   test_prec_avg = 0.6724
   test_brier = 0.25
---------------------------------------------------------------------


------------------------------------
[1mRunning time:[0m 0.05 minutes.
Start time: 2022-01-29, 18:08:01
End time: 2022-01-29, 18:08:04
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.0001}.
   CV performance metric associated with best hyper-parameters: 0.5.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5
   test_prec_avg = 0.6724
   test_brier = 0.25
---------------------------------------------------------------------


------------------------------------
[1mRunning time:[0m 0.05 minutes.
Start time: 2022-01-29, 18:08:04
End time: 2022-01-29, 18:08:07
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 3}.
   CV performance metric associated with best hyper-parameters: 0.5002.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5001
   test_prec_avg = 0.6725
   test_brier = 0.2205
---------------------------------------------------------------------


------------------------------------
[1mRunning time:[0m 0.05 minutes.
Start time: 2022-01-29, 18:08:07
End time: 2022-01-29, 18:08:10
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.3}.
   CV performance metric associated with best hyper-parameters: 0.502.


Performance metrics evaluated at test data:
   test_roc_auc = 0.4992
   test_prec_avg = 0.6721
   test_brier = 0.2206
---------------------------------------------------------------------


------------------------------------
[1mRunning time:[0m 0.05 minutes.
Start time: 2022-01-29, 18:08:10
End time: 2022-01-29, 18:08:13
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.1}.
   CV performance metric associated with best hyper-parameters: 0.5025.


Performance metrics evaluated at test data:
   test_roc_auc = 0.4994
   test_prec_avg = 0.6722
   test_brier = 0.2206
---------------------------------------------------------------------


------------------------------------
[1mRunning time:[0m 0.05 minutes.
Start time: 2022-01-29, 18:08:13
End time: 2022-01-29, 18:08:16
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.03}.
   CV performance metric associated with best hyper-parameters: 0.52.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5162
   test_prec_avg = 0.6797
   test_brier = 0.2203
---------------------------------------------------------------------


------------------------------------
[1mRunning time:[0m 0.05 minutes.
Start time: 2022-01-29, 18:08:17
End time: 2022-01-29, 18:08:20
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.1}.
   CV performance metric associated with best hyper-parameters: 0.504.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5023
   test_prec_avg = 0.6734
   test_brier = 0.2205
---------------------------------------------------------------------


------------------------------------
[1mRunning time:[0m 0.05 minutes.
Start time: 2022-01-29, 18:08:20
End time: 2022-01-29, 18:08:23
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.1}.
   CV performance metric associated with best hyper-parameters: 0.5045.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5111
   test_prec_avg = 0.6778
   test_brier = 0.2203
---------------------------------------------------------------------


------------------------------------
[1mRunning time:[0m 0.05 minutes.
Start time: 2022-01-29, 18:08:23
End time: 2022-01-29, 18:08:26
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 1}.
   CV performance metric associated with best hyper-parameters: 0.5003.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5003
   test_prec_avg = 0.6726
   test_brier = 0.2205
---------------------------------------------------------------------


------------------------------------
[1mRunning time:[0m 0.05 minutes.
Start time: 2022-01-29, 18:08:26
End time: 2022-01-29, 18:08:29
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.0001}.
   CV performance metric associated with best hyper-parameters: 0.5.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5
   test_prec_avg = 0.6724
   test_brier = 0.25
---------------------------------------------------------------------


------------------------------------
[1mRunning time:[0m 0.05 minutes.
Start time: 2022-01-29, 18:08:30
End time: 2022-01-29, 18:08:33
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.5}.
   CV performance metric associated with best hyper-parameters: 0.5006.


Performance metrics evaluated at test data:
   test_roc_auc = 0.4999
   test_prec_avg = 0.6724
   test_brier = 0.2205
---------------------------------------------------------------------


------------------------------------
[1mRunning time:[0m 0.05 minutes.
Start time: 2022-01-29, 18:08:33
End time: 2022-01-29, 18:08:36
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.0001}.
   CV performance metric associated with best hyper-parameters: 0.5.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5
   test_prec_avg = 0.6724
   test_brier = 0.25
---------------------------------------------------------------------


------------------------------------
[1mRunning time:[0m 0.05 minutes.
Start time: 2022-01-29, 18:08:36
End time: 2022-01-29, 18:08:39
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.0001}.
   CV performance metric associated with best hyper-parameters: 0.5.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5
   test_prec_avg = 0.6724
   test_brier = 0.25
---------------------------------------------------------------------


------------------------------------
[1mRunning time:[0m 0.05 minutes.
Start time: 2022-01-29, 18:08:39
End time: 2022-01-29, 18:08:42
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.01}.
   CV performance metric associated with best hyper-parameters: 0.5351.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5372
   test_prec_avg = 0.691
   test_brier = 0.2197
---------------------------------------------------------------------


------------------------------------
[1mRunning time:[0m 0.06 minutes.
Start time: 2022-01-29, 18:08:42
End time: 2022-01-29, 18:08:46
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.0001}.
   CV performance metric associated with best hyper-parameters: 0.5.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5
   test_prec_avg = 0.6724
   test_brier = 0.25
---------------------------------------------------------------------


------------------------------------
[1mRunning time:[0m 0.05 minutes.
Start time: 2022-01-29, 18:08:46
End time: 2022-01-29, 18:08:49
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.5}.
   CV performance metric associated with best hyper-parameters: 0.5014.


Performance metrics evaluated at test data:
   test_roc_auc = 0.4994
   test_prec_avg = 0.6722
   test_brier = 0.2205
---------------------------------------------------------------------


------------------------------------
[1mRunning time:[0m 0.05 minutes.
Start time: 2022-01-29, 18:08:50
End time: 2022-01-29, 18:08:53
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.0001}.
   CV performance metric associated with best hyper-parameters: 0.5.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5
   test_prec_avg = 0.6724
   test_brier = 0.25
---------------------------------------------------------------------


------------------------------------
[1mRunning time:[0m 0.05 minutes.
Start time: 2022-01-29, 18:08:53
End time: 2022-01-29, 18:08:56
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.01}.
   CV performance metric associated with best hyper-parameters: 0.5745.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5743
   test_prec_avg = 0.7088
   test_brier = 0.2166
---------------------------------------------------------------------


------------------------------------
[1mRunning time:[0m 0.06 minutes.
Start time: 2022-01-29, 18:08:56
End time: 2022-01-29, 18:09:00
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.0001}.
   CV performance metric associated with best hyper-parameters: 0.5.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5
   test_prec_avg = 0.6724
   test_brier = 0.25
---------------------------------------------------------------------


------------------------------------
[1mRunning time:[0m 0.05 minutes.
Start time: 2022-01-29, 18:09:00
End time: 2022-01-29, 18:09:03
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.25}.
   CV performance metric associated with best hyper-parameters: 0.5012.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5012
   test_prec_avg = 0.6729
   test_brier = 0.2205
---------------------------------------------------------------------


------------------------------------
[1mRunning time:[0m 0.05 minutes.
Start time: 2022-01-29, 18:09:03
End time: 2022-01-29, 18:09:06
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.01}.
   CV performance metric associated with best hyper-parameters: 0.5316.


Performance metrics evaluated at test data:
   test_roc_auc = 0.539
   test_prec_avg = 0.6903
   test_brier = 0.2199
---------------------------------------------------------------------


------------------------------------
[1mRunning time:[0m 0.06 minutes.
Start time: 2022-01-29, 18:09:06
End time: 2022-01-29, 18:09:10
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.03}.
   CV performance metric associated with best hyper-parameters: 0.5149.


Performance metrics evaluated at test data:
   test_roc_auc = 0.511
   test_prec_avg = 0.6777
   test_brier = 0.2203
---------------------------------------------------------------------


------------------------------------
[1mRunning time:[0m 0.05 minutes.
Start time: 2022-01-29, 18:09:10
End time: 2022-01-29, 18:09:13
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.25}.
   CV performance metric associated with best hyper-parameters: 0.5025.


Performance metrics evaluated at test data:
   test_roc_auc = 0.502
   test_prec_avg = 0.6733
   test_brier = 0.2205
---------------------------------------------------------------------


------------------------------------
[1mRunning time:[0m 0.05 minutes.
Start time: 2022-01-29, 18:09:13
End time: 2022-01-29, 18:09:16
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.03}.
   CV performance metric associated with best hyper-parameters: 0.5227.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5127
   test_prec_avg = 0.6781
   test_brier = 0.2205
---------------------------------------------------------------------


------------------------------------
[1mRunning time:[0m 0.06 minutes.
Start time: 2022-01-29, 18:09:16
End time: 2022-01-29, 18:09:20
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.5}.
   CV performance metric associated with best hyper-parameters: 0.5007.


Performance metrics evaluated at test data:
   test_roc_auc = 0.4999
   test_prec_avg = 0.6724
   test_brier = 0.2205
---------------------------------------------------------------------


------------------------------------
[1mRunning time:[0m 0.05 minutes.
Start time: 2022-01-29, 18:09:20
End time: 2022-01-29, 18:09:23
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.03}.
   CV performance metric associated with best hyper-parameters: 0.512.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5066
   test_prec_avg = 0.6753
   test_brier = 0.2205
---------------------------------------------------------------------


------------------------------------
[1mRunning time:[0m 0.05 minutes.
Start time: 2022-01-29, 18:09:23
End time: 2022-01-29, 18:09:26
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.0001}.
   CV performance metric associated with best hyper-parameters: 0.5.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5
   test_prec_avg = 0.6724
   test_brier = 0.25
---------------------------------------------------------------------


------------------------------------
[1mRunning time:[0m 0.05 minutes.
Start time: 2022-01-29, 18:09:26
End time: 2022-01-29, 18:09:29
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.25}.
   CV performance metric associated with best hyper-parameters: 0.5022.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5017
   test_prec_avg = 0.6732
   test_brier = 0.2205
---------------------------------------------------------------------


------------------------------------
[1mRunning time:[0m 0.05 minutes.
Start time: 2022-01-29, 18:09:29
End time: 2022-01-29, 18:09:32
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.0001}.
   CV performance metric associated with best hyper-parameters: 0.5.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5
   test_prec_avg = 0.6724
   test_brier = 0.25
---------------------------------------------------------------------


------------------------------------
[1mRunning time:[0m 0.05 minutes.
Start time: 2022-01-29, 18:09:32
End time: 2022-01-29, 18:09:35
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.0001}.
   CV performance metric associated with best hyper-parameters: 0.5.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5
   test_prec_avg = 0.6724
   test_brier = 0.25
---------------------------------------------------------------------


------------------------------------
[1mRunning time:[0m 0.07 minutes.
Start time: 2022-01-29, 18:09:35
End time: 2022-01-29, 18:09:40
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.25}.
   CV performance metric associated with best hyper-parameters: 0.5014.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5007
   test_prec_avg = 0.6728
   test_brier = 0.2205
---------------------------------------------------------------------


------------------------------------
[1mRunning time:[0m 0.07 minutes.
Start time: 2022-01-29, 18:09:40
End time: 2022-01-29, 18:09:45
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.3}.
   CV performance metric associated with best hyper-parameters: 0.5007.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5
   test_prec_avg = 0.6725
   test_brier = 0.2205
---------------------------------------------------------------------


------------------------------------
[1mRunning time:[0m 0.08 minutes.
Start time: 2022-01-29, 18:09:45
End time: 2022-01-29, 18:09:50
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.0001}.
   CV performance metric associated with best hyper-parameters: 0.5.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5
   test_prec_avg = 0.6724
   test_brier = 0.25
---------------------------------------------------------------------


------------------------------------
[1mRunning time:[0m 0.09 minutes.
Start time: 2022-01-29, 18:09:50
End time: 2022-01-29, 18:09:55
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.0001}.
   CV performance metric associated with best hyper-parameters: 0.5.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5
   test_prec_avg = 0.6724
   test_brier = 0.25
---------------------------------------------------------------------


------------------------------------
[1mRunning time:[0m 0.08 minutes.
Start time: 2022-01-29, 18:09:56
End time: 2022-01-29, 18:10:00
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 3}.
   CV performance metric associated with best hyper-parameters: 0.5.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5
   test_prec_avg = 0.6724
   test_brier = 0.2205
---------------------------------------------------------------------


------------------------------------
[1mRunning time:[0m 0.07 minutes.
Start time: 2022-01-29, 18:10:01
End time: 2022-01-29, 18:10:05
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 3}.
   CV performance metric associated with best hyper-parameters: 0.5.


Performance metrics evaluated at test data:
   test_roc_auc = 0.4998
   test_prec_avg = 0.6724
   test_brier = 0.2205
---------------------------------------------------------------------


------------------------------------
[1mRunning time:[0m 0.06 minutes.
Start time: 2022-01-29, 18:10:05
End time: 2022-01-29, 18:10:09
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.0001}.
   CV performance metric associated with best hyper-parameters: 0.5.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5
   test_prec_avg = 0.6724
   test_brier = 0.25
---------------------------------------------------------------------


------------------------------------
[1mRunning time:[0m 0.06 minutes.
Start time: 2022-01-29, 18:10:09
End time: 2022-01-29, 18:10:12
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.25}.
   CV performance metric associated with best hyper-parameters: 0.5013.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5039
   test_prec_avg = 0.6742
   test_brier = 0.2205
---------------------------------------------------------------------


------------------------------------
[1mRunning time:[0m 0.05 minutes.
Start time: 2022-01-29, 18:10:12
End time: 2022-01-29, 18:10:15
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 10}.
   CV performance metric associated with best hyper-parameters: 0.5002.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5002
   test_prec_avg = 0.6725
   test_brier = 0.2205
---------------------------------------------------------------------


------------------------------------
[1mRunning time:[0m 0.05 minutes.
Start time: 2022-01-29, 18:10:16
End time: 2022-01-29, 18:10:18
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.5}.
   CV performance metric associated with best hyper-parameters: 0.5007.


Performance metrics evaluated at test data:
   test_roc_auc = 0.501
   test_prec_avg = 0.6729
   test_brier = 0.2205
---------------------------------------------------------------------


------------------------------------
[1mRunning time:[0m 0.05 minutes.
Start time: 2022-01-29, 18:10:19
End time: 2022-01-29, 18:10:22
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.0001}.
   CV performance metric associated with best hyper-parameters: 0.5.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5
   test_prec_avg = 0.6724
   test_brier = 0.25
---------------------------------------------------------------------


------------------------------------
Running time: 0.05 minutes.
Start time: 2022-01-29, 18:10:22
End time: 2022-01-29, 18:10:25
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.03}.
   CV performance metric associated with best hyper-parameters: 0.5091.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5113
   test_prec_avg = 0.6775
   test_brier = 0.2202
---------------------------------------------------------------------


------------------------------------
Running time: 0.05 minutes.
Start time: 2022-01-29, 18:10:25
End time: 2022-01-29, 18:10:28
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.25}.
   CV performance metric associated with best hyper-parameters: 0.5016.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5016
   test_prec_avg = 0.6731
   test_brier = 0.2205
---------------------------------------------------------------------


------------------------------------
Running time: 0.06 minutes.
Start time: 2022-01-29, 18:10:28
End time: 2022-01-29, 18:10:31
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.03}.
   CV performance metric associated with best hyper-parameters: 0.5118.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5185
   test_prec_avg = 0.6807
   test_brier = 0.2203
---------------------------------------------------------------------


------------------------------------
Running time: 0.08 minutes.
Start time: 2022-01-29, 18:10:31
End time: 2022-01-29, 18:10:36
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.75}.
   CV performance metric associated with best hyper-parameters: 0.5005.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5004
   test_prec_avg = 0.6726
   test_brier = 0.2205
---------------------------------------------------------------------


------------------------------------
Running time: 0.08 minutes.
Start time: 2022-01-29, 18:10:37
End time: 2022-01-29, 18:10:42
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.1}.
   CV performance metric associated with best hyper-parameters: 0.5035.


Performance metrics evaluated at test data:
   test_roc_auc = 0.503
   test_prec_avg = 0.6737
   test_brier = 0.2204
---------------------------------------------------------------------


------------------------------------
Running time: 0.08 minutes.
Start time: 2022-01-29, 18:10:42
End time: 2022-01-29, 18:10:47
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.25}.
   CV performance metric associated with best hyper-parameters: 0.5014.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5011
   test_prec_avg = 0.6729
   test_brier = 0.2205
---------------------------------------------------------------------


------------------------------------
Running time: 0.07 minutes.
Start time: 2022-01-29, 18:10:47
End time: 2022-01-29, 18:10:51
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.0001}.
   CV performance metric associated with best hyper-parameters: 0.5.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5
   test_prec_avg = 0.6724
   test_brier = 0.25
---------------------------------------------------------------------


------------------------------------
Running time: 0.07 minutes.
Start time: 2022-01-29, 18:10:52
End time: 2022-01-29, 18:10:56
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 1}.
   CV performance metric associated with best hyper-parameters: 0.5003.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5001
   test_prec_avg = 0.6725
   test_brier = 0.2205
---------------------------------------------------------------------


------------------------------------
Running time: 0.05 minutes.
Start time: 2022-01-29, 18:10:56
End time: 2022-01-29, 18:10:59
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.03}.
   CV performance metric associated with best hyper-parameters: 0.5086.


Performance metrics evaluated at test data:
   test_roc_auc = 0.51
   test_prec_avg = 0.6769
   test_brier = 0.2203
---------------------------------------------------------------------


------------------------------------
Running time: 0.05 minutes.
Start time: 2022-01-29, 18:10:59
End time: 2022-01-29, 18:11:02
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 3}.
   CV performance metric associated with best hyper-parameters: 0.5002.


Performance metrics evaluated at test data:
   test_roc_auc = 0.4999
   test_prec_avg = 0.6724
   test_brier = 0.2206
---------------------------------------------------------------------


------------------------------------
Running time: 0.05 minutes.
Start time: 2022-01-29, 18:11:02
End time: 2022-01-29, 18:11:05
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.1}.
   CV performance metric associated with best hyper-parameters: 0.5029.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5027
   test_prec_avg = 0.6739
   test_brier = 0.2204
---------------------------------------------------------------------


------------------------------------
Running time: 0.05 minutes.
Start time: 2022-01-29, 18:11:05
End time: 2022-01-29, 18:11:08
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.0001}.
   CV performance metric associated with best hyper-parameters: 0.5.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5
   test_prec_avg = 0.6724
   test_brier = 0.25
---------------------------------------------------------------------


------------------------------------
Running time: 0.05 minutes.
Start time: 2022-01-29, 18:11:08
End time: 2022-01-29, 18:11:11
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 3}.
   CV performance metric associated with best hyper-parameters: 0.5001.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5
   test_prec_avg = 0.6724
   test_brier = 0.2205
---------------------------------------------------------------------


------------------------------------
Running time: 0.05 minutes.
Start time: 2022-01-29, 18:11:11
End time: 2022-01-29, 18:11:14
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.03}.
   CV performance metric associated with best hyper-parameters: 0.5102.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5119
   test_prec_avg = 0.6777
   test_brier = 0.22
---------------------------------------------------------------------


------------------------------------
Running time: 0.05 minutes.
Start time: 2022-01-29, 18:11:14
End time: 2022-01-29, 18:11:17
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.0001}.
   CV performance metric associated with best hyper-parameters: 0.5.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5
   test_prec_avg = 0.6724
   test_brier = 0.25
---------------------------------------------------------------------


------------------------------------
Running time: 0.06 minutes.
Start time: 2022-01-29, 18:11:18
End time: 2022-01-29, 18:11:21
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.0001}.
   CV performance metric associated with best hyper-parameters: 0.5.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5
   test_prec_avg = 0.6724
   test_brier = 0.25
---------------------------------------------------------------------


------------------------------------
Running time: 0.05 minutes.
Start time: 2022-01-29, 18:11:21
End time: 2022-01-29, 18:11:24
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.75}.
   CV performance metric associated with best hyper-parameters: 0.5005.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5006
   test_prec_avg = 0.6727
   test_brier = 0.2205
---------------------------------------------------------------------


------------------------------------
Running time: 0.05 minutes.
Start time: 2022-01-29, 18:11:24
End time: 2022-01-29, 18:11:27
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.1}.
   CV performance metric associated with best hyper-parameters: 0.5034.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5032
   test_prec_avg = 0.6739
   test_brier = 0.2203
---------------------------------------------------------------------


------------------------------------
Running time: 0.05 minutes.
Start time: 2022-01-29, 18:11:27
End time: 2022-01-29, 18:11:30
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.0001}.
   CV performance metric associated with best hyper-parameters: 0.5.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5
   test_prec_avg = 0.6724
   test_brier = 0.25
---------------------------------------------------------------------


------------------------------------
Running time: 0.05 minutes.
Start time: 2022-01-29, 18:11:30
End time: 2022-01-29, 18:11:33
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.0001}.
   CV performance metric associated with best hyper-parameters: 0.5.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5
   test_prec_avg = 0.6724
   test_brier = 0.25
---------------------------------------------------------------------


------------------------------------
Running time: 0.05 minutes.
Start time: 2022-01-29, 18:11:33
End time: 2022-01-29, 18:11:37
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.0001}.
   CV performance metric associated with best hyper-parameters: 0.5.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5
   test_prec_avg = 0.6724
   test_brier = 0.25
---------------------------------------------------------------------


------------------------------------
Running time: 0.05 minutes.
Start time: 2022-01-29, 18:11:37
End time: 2022-01-29, 18:11:40
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.0001}.
   CV performance metric associated with best hyper-parameters: 0.5.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5
   test_prec_avg = 0.6724
   test_brier = 0.25
---------------------------------------------------------------------


------------------------------------
Running time: 0.05 minutes.
Start time: 2022-01-29, 18:11:40
End time: 2022-01-29, 18:11:43
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 10}.
   CV performance metric associated with best hyper-parameters: 0.5002.


Performance metrics evaluated at test data:
   test_roc_auc = 0.4999
   test_prec_avg = 0.6724
   test_brier = 0.2205
---------------------------------------------------------------------


------------------------------------
Running time: 0.05 minutes.
Start time: 2022-01-29, 18:11:43
End time: 2022-01-29, 18:11:46
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.0001}.
   CV performance metric associated with best hyper-parameters: 0.5.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5
   test_prec_avg = 0.6724
   test_brier = 0.25
---------------------------------------------------------------------


------------------------------------
Running time: 0.05 minutes.
Start time: 2022-01-29, 18:11:46
End time: 2022-01-29, 18:11:49
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 10}.
   CV performance metric associated with best hyper-parameters: 0.5001.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5
   test_prec_avg = 0.6724
   test_brier = 0.2205
---------------------------------------------------------------------


------------------------------------
Running time: 0.05 minutes.
Start time: 2022-01-29, 18:11:49
End time: 2022-01-29, 18:11:52
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.75}.
   CV performance metric associated with best hyper-parameters: 0.5004.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5002
   test_prec_avg = 0.6725
   test_brier = 0.2205
---------------------------------------------------------------------


------------------------------------
Running time: 0.05 minutes.
Start time: 2022-01-29, 18:11:52
End time: 2022-01-29, 18:11:55
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.0001}.
   CV performance metric associated with best hyper-parameters: 0.5.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5
   test_prec_avg = 0.6724
   test_brier = 0.25
---------------------------------------------------------------------


------------------------------------
Running time: 0.05 minutes.
Start time: 2022-01-29, 18:11:55
End time: 2022-01-29, 18:11:58
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 10}.
   CV performance metric associated with best hyper-parameters: 0.5002.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5001
   test_prec_avg = 0.6725
   test_brier = 0.2205
---------------------------------------------------------------------


------------------------------------
Running time: 0.05 minutes.
Start time: 2022-01-29, 18:11:58
End time: 2022-01-29, 18:12:01
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 10}.
   CV performance metric associated with best hyper-parameters: 0.5002.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5002
   test_prec_avg = 0.6725
   test_brier = 0.2205
---------------------------------------------------------------------


------------------------------------
Running time: 0.05 minutes.
Start time: 2022-01-29, 18:12:01
End time: 2022-01-29, 18:12:04
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.75}.
   CV performance metric associated with best hyper-parameters: 0.5005.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5009
   test_prec_avg = 0.6728
   test_brier = 0.2205
---------------------------------------------------------------------


------------------------------------
Running time: 0.05 minutes.
Start time: 2022-01-29, 18:12:13
End time: 2022-01-29, 18:12:16
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 3}.
   CV performance metric associated with best hyper-parameters: 0.5002.


Performance metrics evaluated at test data:
   test_roc_auc = 0.4997
   test_prec_avg = 0.6723
   test_brier = 0.2207
---------------------------------------------------------------------


------------------------------------
Running time: 0.05 minutes.
Start time: 2022-01-29, 18:12:16
End time: 2022-01-29, 18:12:19
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.03}.
   CV performance metric associated with best hyper-parameters: 0.5151.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5127
   test_prec_avg = 0.6781
   test_brier = 0.22
---------------------------------------------------------------------


------------------------------------
Running time: 0.05 minutes.
Start time: 2022-01-29, 18:12:19
End time: 2022-01-29, 18:12:22
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.1}.
   CV performance metric associated with best hyper-parameters: 0.5047.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5024
   test_prec_avg = 0.6735
   test_brier = 0.2205
---------------------------------------------------------------------


------------------------------------
Running time: 0.05 minutes.
Start time: 2022-01-29, 18:12:22
End time: 2022-01-29, 18:12:26
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.25}.
   CV performance metric associated with best hyper-parameters: 0.501.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5005
   test_prec_avg = 0.6727
   test_brier = 0.2205
---------------------------------------------------------------------


------------------------------------
Running time: 0.05 minutes.
Start time: 2022-01-29, 18:12:26
End time: 2022-01-29, 18:12:29
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.1}.
   CV performance metric associated with best hyper-parameters: 0.503.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5012
   test_prec_avg = 0.673
   test_brier = 0.2205
---------------------------------------------------------------------


------------------------------------
Running time: 0.05 minutes.
Start time: 2022-01-29, 18:12:29
End time: 2022-01-29, 18:12:32
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.25}.
   CV performance metric associated with best hyper-parameters: 0.5013.


Performance metrics evaluated at test data:
   test_roc_auc = 0.4993
   test_prec_avg = 0.6721
   test_brier = 0.2206
---------------------------------------------------------------------


------------------------------------
Running time: 0.05 minutes.
Start time: 2022-01-29, 18:12:32
End time: 2022-01-29, 18:12:35
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.03}.
   CV performance metric associated with best hyper-parameters: 0.506.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5085
   test_prec_avg = 0.677
   test_brier = 0.2203
---------------------------------------------------------------------


------------------------------------
Running time: 0.05 minutes.
Start time: 2022-01-29, 18:12:35
End time: 2022-01-29, 18:12:38
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.03}.
   CV performance metric associated with best hyper-parameters: 0.5059.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5041
   test_prec_avg = 0.6743
   test_brier = 0.2206
---------------------------------------------------------------------


------------------------------------
Running time: 0.05 minutes.
Start time: 2022-01-29, 18:12:38
End time: 2022-01-29, 18:12:41
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.01}.
   CV performance metric associated with best hyper-parameters: 0.5297.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5315
   test_prec_avg = 0.6876
   test_brier = 0.2197
---------------------------------------------------------------------


------------------------------------
Running time: 0.05 minutes.
Start time: 2022-01-29, 18:12:41
End time: 2022-01-29, 18:12:44
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.03}.
   CV performance metric associated with best hyper-parameters: 0.51.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5105
   test_prec_avg = 0.6784
   test_brier = 0.2198
---------------------------------------------------------------------


------------------------------------
Running time: 0.05 minutes.
Start time: 2022-01-29, 18:12:45
End time: 2022-01-29, 18:12:48
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.5}.
   CV performance metric associated with best hyper-parameters: 0.5005.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5004
   test_prec_avg = 0.6726
   test_brier = 0.2205
---------------------------------------------------------------------


------------------------------------
Running time: 0.05 minutes.
Start time: 2022-01-29, 18:12:48
End time: 2022-01-29, 18:12:51
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 10}.
   CV performance metric associated with best hyper-parameters: 0.5002.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5
   test_prec_avg = 0.6724
   test_brier = 0.2205
---------------------------------------------------------------------


------------------------------------
Running time: 0.05 minutes.
Start time: 2022-01-29, 18:12:51
End time: 2022-01-29, 18:12:54
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.25}.
   CV performance metric associated with best hyper-parameters: 0.5014.


Performance metrics evaluated at test data:
   test_roc_auc = 0.4992
   test_prec_avg = 0.6721
   test_brier = 0.2206
---------------------------------------------------------------------


------------------------------------
Running time: 0.05 minutes.
Start time: 2022-01-29, 18:12:54
End time: 2022-01-29, 18:12:57
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.0001}.
   CV performance metric associated with best hyper-parameters: 0.5.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5
   test_prec_avg = 0.6724
   test_brier = 0.25
---------------------------------------------------------------------


------------------------------------
Running time: 0.05 minutes.
Start time: 2022-01-29, 18:12:57
End time: 2022-01-29, 18:13:00
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 1}.
   CV performance metric associated with best hyper-parameters: 0.5004.


Performance metrics evaluated at test data:
   test_roc_auc = 0.501
   test_prec_avg = 0.6729
   test_brier = 0.2205
---------------------------------------------------------------------


------------------------------------
Running time: 0.05 minutes.
Start time: 2022-01-29, 18:13:00
End time: 2022-01-29, 18:13:03
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.0001}.
   CV performance metric associated with best hyper-parameters: 0.5.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5
   test_prec_avg = 0.6724
   test_brier = 0.25
---------------------------------------------------------------------


------------------------------------
Running time: 0.06 minutes.
Start time: 2022-01-29, 18:13:03
End time: 2022-01-29, 18:13:06
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.25}.
   CV performance metric associated with best hyper-parameters: 0.5017.


Performance metrics evaluated at test data:
   test_roc_auc = 0.4981
   test_prec_avg = 0.6716
   test_brier = 0.2206
---------------------------------------------------------------------


------------------------------------
Running time: 0.05 minutes.
Start time: 2022-01-29, 18:13:07
End time: 2022-01-29, 18:13:10
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.0001}.
   CV performance metric associated with best hyper-parameters: 0.5.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5
   test_prec_avg = 0.6724
   test_brier = 0.25
---------------------------------------------------------------------


------------------------------------
Running time: 0.05 minutes.
Start time: 2022-01-29, 18:13:10
End time: 2022-01-29, 18:13:13
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.25}.
   CV performance metric associated with best hyper-parameters: 0.5012.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5019
   test_prec_avg = 0.6733
   test_brier = 0.2204
---------------------------------------------------------------------


------------------------------------
Running time: 0.06 minutes.
Start time: 2022-01-29, 18:13:13
End time: 2022-01-29, 18:13:16
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.25}.
   CV performance metric associated with best hyper-parameters: 0.5015.


Performance metrics evaluated at test data:
   test_roc_auc = 0.4987
   test_prec_avg = 0.6719
   test_brier = 0.2206
---------------------------------------------------------------------


------------------------------------
Running time: 0.05 minutes.
Start time: 2022-01-29, 18:13:16
End time: 2022-01-29, 18:13:19
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.75}.
   CV performance metric associated with best hyper-parameters: 0.5004.


Performance metrics evaluated at test data:
   test_roc_auc = 0.4997
   test_prec_avg = 0.6723
   test_brier = 0.2205
---------------------------------------------------------------------


------------------------------------
Running time: 0.05 minutes.
Start time: 2022-01-29, 18:13:19
End time: 2022-01-29, 18:13:22
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.0001}.
   CV performance metric associated with best hyper-parameters: 0.5.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5
   test_prec_avg = 0.6724
   test_brier = 0.25
---------------------------------------------------------------------


------------------------------------
Running time: 0.05 minutes.
Start time: 2022-01-29, 18:13:22
End time: 2022-01-29, 18:13:25
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 10}.
   CV performance metric associated with best hyper-parameters: 0.5001.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5001
   test_prec_avg = 0.6725
   test_brier = 0.2205
---------------------------------------------------------------------


------------------------------------
Running time: 0.05 minutes.
Start time: 2022-01-29, 18:13:26
End time: 2022-01-29, 18:13:29
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.0001}.
   CV performance metric associated with best hyper-parameters: 0.5.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5
   test_prec_avg = 0.6724
   test_brier = 0.25
---------------------------------------------------------------------


------------------------------------
Running time: 0.05 minutes.
Start time: 2022-01-29, 18:13:29
End time: 2022-01-29, 18:13:32
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.1}.
   CV performance metric associated with best hyper-parameters: 0.5033.


Performance metrics evaluated at test data:
   test_roc_auc = 0.502
   test_prec_avg = 0.6734
   test_brier = 0.2205
---------------------------------------------------------------------


------------------------------------
Running time: 0.05 minutes.
Start time: 2022-01-29, 18:13:32
End time: 2022-01-29, 18:13:35
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.0001}.
   CV performance metric associated with best hyper-parameters: 0.5.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5
   test_prec_avg = 0.6724
   test_brier = 0.25
---------------------------------------------------------------------


------------------------------------
Running time: 0.05 minutes.
Start time: 2022-01-29, 18:13:35
End time: 2022-01-29, 18:13:38
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.03}.
   CV performance metric associated with best hyper-parameters: 0.5095.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5104
   test_prec_avg = 0.6771
   test_brier = 0.2197
---------------------------------------------------------------------


------------------------------------
Running time: 0.05 minutes.
Start time: 2022-01-29, 18:13:38
End time: 2022-01-29, 18:13:41
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.1}.
   CV performance metric associated with best hyper-parameters: 0.5034.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5021
   test_prec_avg = 0.6735
   test_brier = 0.2205
---------------------------------------------------------------------


------------------------------------
Running time: 0.05 minutes.
Start time: 2022-01-29, 18:13:41
End time: 2022-01-29, 18:13:44
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.03}.
   CV performance metric associated with best hyper-parameters: 0.5099.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5052
   test_prec_avg = 0.6747
   test_brier = 0.2205
---------------------------------------------------------------------


------------------------------------
Running time: 0.05 minutes.
Start time: 2022-01-29, 18:13:44
End time: 2022-01-29, 18:13:47
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.03}.
   CV performance metric associated with best hyper-parameters: 0.5071.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5099
   test_prec_avg = 0.6768
   test_brier = 0.2202
---------------------------------------------------------------------


------------------------------------
Running time: 0.05 minutes.
Start time: 2022-01-29, 18:13:47
End time: 2022-01-29, 18:13:50
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.0001}.
   CV performance metric associated with best hyper-parameters: 0.5.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5
   test_prec_avg = 0.6724
   test_brier = 0.25
---------------------------------------------------------------------


------------------------------------
Running time: 0.05 minutes.
Start time: 2022-01-29, 18:13:50
End time: 2022-01-29, 18:13:53
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.0001}.
   CV performance metric associated with best hyper-parameters: 0.5.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5
   test_prec_avg = 0.6724
   test_brier = 0.25
---------------------------------------------------------------------


------------------------------------
Running time: 0.05 minutes.
Start time: 2022-01-29, 18:13:53
End time: 2022-01-29, 18:13:56
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.0001}.
   CV performance metric associated with best hyper-parameters: 0.5.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5
   test_prec_avg = 0.6724
   test_brier = 0.25
---------------------------------------------------------------------


------------------------------------
Running time: 0.05 minutes.
Start time: 2022-01-29, 18:13:56
End time: 2022-01-29, 18:13:59
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.03}.
   CV performance metric associated with best hyper-parameters: 0.5092.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5107
   test_prec_avg = 0.6772
   test_brier = 0.2195
---------------------------------------------------------------------


------------------------------------
Running time: 0.05 minutes.
Start time: 2022-01-29, 18:13:59
End time: 2022-01-29, 18:14:02
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.1}.
   CV performance metric associated with best hyper-parameters: 0.5036.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5034
   test_prec_avg = 0.6739
   test_brier = 0.2205
---------------------------------------------------------------------


------------------------------------
Running time: 0.05 minutes.
Start time: 2022-01-29, 18:14:02
End time: 2022-01-29, 18:14:05
------------------------------------




---------------------------------------------------------------------
Train-test estimation outcomes:


Outcomes from K-folds CV estimation:
   Number of data folds: 5.
   Estimation method: logistic regression.
   Metric for choosing best hyper-parameter: roc_auc.
   Best hyper-parameters: {'C': 0.0001}.
   CV performance metric associated with best hyper-parameters: 0.5.


Performance metrics evaluated at test data:
   test_roc_auc = 0.5
   test_prec_avg = 0.6724
   test_brier = 0.25
---------------------------------------------------------------------


------------------------------------
Running time: 0.05 minutes.
Start time: 2022-01-29, 18:14:06
End time: 2022-01-29, 18:14:08
------------------------------------


In [None]:
indiv_modeling = pd.DataFrame(data={
    'feature': features, 'test_roc_auc': test_roc_auc, 'coefs': [c[0][0] for c in coefs]
}).sort_values('test_roc_auc', ascending=False)
indiv_modeling.head(10)

Unnamed: 0,feature,test_roc_auc,coefs
1,number_of_ratings,0.735114,-7.181053e-08
2,price,0.636725,0.0839036
85,network_communication_view_network_state,0.574346,-0.4501545
0,rating,0.568921,-0.1237575
4,safe_permissions_count,0.562917,-0.03663021
88,phone_calls_read_phone_state_and_identity,0.538968,-0.1249008
81,network_communication_full_internet_access,0.537204,-0.1659488
3,dangerous_permissions_count,0.534299,-0.01500169
150,your_location_fine,0.531454,0.2409798
111,system_tools_prevent_device_from_sleeping,0.518462,-0.085723
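
The log above reports, for each feature, a logistic regression whose `C` is chosen by 5-fold cross-validation on ROC-AUC and which is then scored on test data. The estimation code itself is not shown in this section, so what follows is only a minimal sketch of how one such single-feature fit could be reproduced with scikit-learn. The synthetic data, the grid of `C` values, and the helper name `fit_single_feature` are assumptions made for illustration, not the notebook's own implementation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, brier_score_loss, roc_auc_score
from sklearn.model_selection import GridSearchCV


def fit_single_feature(x_train, y_train, x_test, y_test, c_grid):
    """Tune C by 5-fold CV on ROC-AUC and score the refit model on test data."""
    search = GridSearchCV(
        estimator=LogisticRegression(max_iter=1000),
        param_grid={'C': c_grid},
        scoring='roc_auc',
        cv=5,
    )
    # A single feature must be passed as a 2D array of shape (n_samples, 1).
    search.fit(x_train.reshape(-1, 1), y_train)
    scores = search.predict_proba(x_test.reshape(-1, 1))[:, 1]
    return {
        'best_params': search.best_params_,
        'cv_roc_auc': search.best_score_,
        'test_roc_auc': roc_auc_score(y_test, scores),
        'test_prec_avg': average_precision_score(y_test, scores),
        'test_brier': brier_score_loss(y_test, scores),
        'coef': search.best_estimator_.coef_[0][0],
    }


# Synthetic data (hypothetical): one informative feature and a binary target.
rng = np.random.default_rng(0)
x = rng.normal(size=600)
y = (x + rng.normal(scale=1.0, size=600) > 0).astype(int)

c_grid = [0.0001, 0.03, 0.1, 0.25, 1, 10]
result = fit_single_feature(x[:400], y[:400], x[400:], y[400:], c_grid)
print(result)
```

A test ROC-AUC of 0.5 together with a Brier score of 0.25, as in several runs above, is what a model produces when it assigns a probability close to 0.5 to every observation, which can happen when a very small `C` shrinks the coefficient towards zero.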


In [None]:
plot_bar(data=indiv_modeling.head(10), x=['feature'], y=['test_roc_auc'], pos=[(1,1)],
         titles=['Test ROC-AUC of individual modeling'], width=900, height=600)

In [None]:
plot_bar(data=indiv_modeling.head(10), x=['feature'], y=['coefs'], pos=[(1,1)],
         titles=['Coefficient of individual modeling'], width=900, height=600)

In [None]:
if EXPORT:
  indiv_modeling.to_csv('../experiments/indiv_modeling.csv', index=False)

<a id='feat_eng'></a>

## Feature engineering

<a id='num_related_apps'></a>

### Number of related apps

In [None]:
# Creating the variable with the number of related apps:
df_train['num_related_apps'] = df_train['related_apps'].apply(lambda x: np.nan if pd.isna(x) else len(x.split(', ')))
df_test['num_related_apps'] = df_test['related_apps'].apply(lambda x: np.nan if pd.isna(x) else len(x.split(', ')))

# Updating the list of auxiliary variables:
drop_vars.append('related_apps')
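
The count above is computed element by element with `apply`. As a hedged illustration, the same count can also be obtained with pandas string methods, which propagate missing values on their own. The toy dataframe and its identifiers are hypothetical and only serve to compare both approaches.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical): comma-separated identifiers and one missing entry.
toy = pd.DataFrame({'related_apps': ['a.b.c, d.e.f, g.h.i', 'a.b.c', np.nan]})

# Element-wise version, mirroring the lambda used above.
by_apply = toy['related_apps'].apply(lambda x: np.nan if pd.isna(x) else len(x.split(', ')))

# Vectorized version: .str methods return NaN for missing entries.
by_str = toy['related_apps'].str.split(', ').str.len()

print(by_apply.tolist())
print(by_str.tolist())
```

Both versions depend on the separator being exactly a comma followed by a single space, so a different delimiter in the raw data would change the counts.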

In [None]:
new_var = 'num_related_apps'

# Describing the new feature:
unique_vals = list(df_train[new_var].unique())
data_und_update = pd.DataFrame(data={
    'feature': new_var, 'type': df_train.dtypes[new_var], 'n_unique': df_train[new_var].nunique(),
    # All unique values if there are at most 10, otherwise a random sample of 10:
    'sample_values': str(unique_vals if len(unique_vals) <= 10
                         else np.random.choice(unique_vals, size=10, replace=False)),
    'num_missings': df_train[new_var].isnull().sum(), 'share_missings': df_train[new_var].isnull().sum()/len(df_train),
    'var_class': 'numerical', 'category': 'app_attributes'
}, index=[0])

# Updating the data understanding dataframe:
data_und = pd.concat([data_und, data_und_update], axis=0, sort=False)
data_und.tail(1)

Unnamed: 0,feature,type,n_unique,sample_values,num_missings,share_missings,var_class,category
0,num_related_apps,float64,4,"[4.0, 1.0, nan, 3.0, 2.0]",484,0.026451,numerical,app_attributes
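
The same one-row summary is built for every engineered feature in this section. A possible way to avoid repeating the block is to wrap it in a small function. The sketch below is an assumption about how that could look: `describe_feature` is a hypothetical helper, it stores the dtype as a string, and it is demonstrated on toy data rather than on `df_train`.

```python
import numpy as np
import pandas as pd


def describe_feature(df, name, var_class='numerical', category='app_attributes', max_values=10):
    """Build a one-row summary of a feature (hypothetical helper)."""
    unique_vals = list(df[name].unique())
    if len(unique_vals) > max_values:
        # Too many distinct values: keep a random sample without replacement.
        unique_vals = list(np.random.choice(unique_vals, size=max_values, replace=False))
    n_missing = df[name].isnull().sum()
    return pd.DataFrame(data={
        'feature': name, 'type': str(df[name].dtype), 'n_unique': df[name].nunique(),
        'sample_values': str(unique_vals),
        'num_missings': n_missing, 'share_missings': n_missing / len(df),
        'var_class': var_class, 'category': category,
    }, index=[0])


# Toy data (hypothetical) with one missing value.
toy = pd.DataFrame({'num_related_apps': [4.0, 1.0, np.nan, 4.0]})
summary = describe_feature(toy, 'num_related_apps')
print(summary)
```

With such a helper, each update would reduce to concatenating `describe_feature(df_train, new_var)` to the data understanding dataframe.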


#### P(X)

In [None]:
df_train['num_related_apps'].describe()

count    17814.000000
mean         3.926069
std          0.418756
min          1.000000
25%          4.000000
50%          4.000000
75%          4.000000
max          4.000000
Name: num_related_apps, dtype: float64

In [None]:
share_miss_new = df_train.num_related_apps.isnull().sum()/len(df_train)
print(f'Share of missings: {share_miss_new:.4f}.')

Share of missings: 0.0265.


#### P(X|Y)

In [None]:
df_train.groupby('class').describe()[['num_related_apps']]

Summary of num_related_apps by class:
class,count,mean,std,min,25%,50%,75%,max
0,6040.0,3.930464,0.409523,1.0,4.0,4.0,4.0,4.0
1,11774.0,3.923815,0.423415,1.0,4.0,4.0,4.0,4.0
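
The cell above describes every column per class and then keeps a single one. As a small sketch on toy data (the dataframe and the column `other_feature` are hypothetical), selecting the column before calling `describe` yields the same statistics with a flat header and without computing summaries for unused columns.

```python
import pandas as pd

toy = pd.DataFrame({
    'class': [0, 0, 1, 1, 1],
    'num_related_apps': [4.0, 3.0, 4.0, 4.0, 1.0],
    'other_feature': [10, 20, 30, 40, 50],
})

# Describe every column, then keep one (the pattern used above).
wide = toy.groupby('class').describe()[['num_related_apps']]

# Select the column first, so only its statistics are computed.
narrow = toy.groupby('class')['num_related_apps'].describe()

print(narrow)
```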


<a id='num_words_desc'></a>

### Number of words in description

In [None]:
# Creating the variable that indicates the number of words in a description:
df_train['num_words_desc'] = df_train.description.apply(lambda x: x if pd.isna(x) else len(x.split(' ')))
df_test['num_words_desc'] = df_test.description.apply(lambda x: x if pd.isna(x) else len(x.split(' ')))

# Updating the list of auxiliary variables:
drop_vars.append('description')
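
The word count above splits each description on a single space. The short example below, built on a made-up string, shows how that differs from splitting on any whitespace. Whether the descriptions in the data contain repeated spaces or line breaks is not shown in this section, so this only illustrates Python's behavior and is not a correction of the reported figures.

```python
text = 'Fast  and secure\nphoto editor '

# Splitting on a single space keeps the empty strings produced by repeated or
# trailing spaces, and does not split on the newline.
tokens_single_space = text.split(' ')

# Splitting with no argument treats any run of whitespace as one separator.
tokens_any_whitespace = text.split()

print(len(tokens_single_space), len(tokens_any_whitespace))  # prints: 6 5
```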

In [None]:
new_var = 'num_words_desc'

# Describing the new feature:
unique_vals = list(df_train[new_var].unique())
data_und_update = pd.DataFrame(data={
    'feature': new_var, 'type': df_train.dtypes[new_var], 'n_unique': df_train[new_var].nunique(),
    # All unique values if there are at most 10, otherwise a random sample of 10:
    'sample_values': str(unique_vals if len(unique_vals) <= 10
                         else np.random.choice(unique_vals, size=10, replace=False)),
    'num_missings': df_train[new_var].isnull().sum(), 'share_missings': df_train[new_var].isnull().sum()/len(df_train),
    'var_class': 'numerical', 'category': 'app_attributes'
}, index=[0])

# Updating the data understanding dataframe:
data_und = pd.concat([data_und, data_und_update], axis=0, sort=False)
data_und.tail(1)

Unnamed: 0,feature,type,n_unique,sample_values,num_missings,share_missings,var_class,category
0,num_words_desc,float64,700,[ 23. 490. 234. nan 183. 593. 660. 587. 626. ...,3,0.000164,numerical,app_attributes


#### P(X)

In [None]:
df_train['num_words_desc'].describe()

count    18295.000000
mean       171.399344
std        151.119600
min          1.000000
25%         56.000000
50%        120.000000
75%        248.000000
max       2140.000000
Name: num_words_desc, dtype: float64

In [None]:
share_miss_new = df_train.num_words_desc.isnull().sum()/len(df_train)
print(f'Share of missings: {share_miss_new:.4f}.')

Share of missings: 0.0002.


#### P(X|Y)

In [None]:
df_train.groupby('class').describe()[['num_words_desc']]

Summary of num_words_desc by class:
class,count,mean,std,min,25%,50%,75%,max
0,6088.0,150.166393,135.681855,1.0,53.0,103.0,204.0,771.0
1,12207.0,181.988859,157.19406,1.0,58.0,131.0,269.0,2140.0


<a id='share_malware_related_apps'></a>

### Share of malware among related apps

In [None]:
# Number of known related apps:
df_train['num_known_apps'] = df_train[['app_id', 'related_apps']].apply(lambda x: known_related_apps(data=df_train[df_train.app_id!=x['app_id']],
                                                                                                     related_apps=x['related_apps']),
                                                                        axis=1)
df_test['num_known_apps'] = df_test[['app_id', 'related_apps']].apply(lambda x: known_related_apps(data=df_train[df_train.app_id!=x['app_id']],
                                                                                                   related_apps=x['related_apps']),
                                                                      axis=1)

# Share of related apps that are known:
df_train['share_known'] = df_train['num_known_apps']/df_train['num_related_apps']
df_test['share_known'] = df_test['num_known_apps']/df_test['num_related_apps']

# Number of known related apps that are malware:
df_train['num_known_malwares'] = df_train[['app_id', 'related_apps']].apply(lambda x: related_malwares(data=df_train[df_train.app_id!=x['app_id']],
                                                                                                       related_apps=x['related_apps']),
                                                                            axis=1)
df_test['num_known_malwares'] = df_test[['app_id', 'related_apps']].apply(lambda x: related_malwares(data=df_train[df_train.app_id!=x['app_id']],
                                                                                                     related_apps=x['related_apps']),
                                                                          axis=1)

# Share of known related apps that are malware:
df_train['share_known_malwares'] = df_train['num_known_malwares']/df_train['num_known_apps']
df_test['share_known_malwares'] = df_test['num_known_malwares']/df_test['num_known_apps']

Streaming output truncated to the last 5000 lines.

Mean of empty slice.


invalid value encountered in double_scalars
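
The helpers `known_related_apps` and `related_malwares` are defined outside this section, so their exact logic is not visible here. The sketch below is an assumption about what such helpers might compute: how many of an app's related apps appear in a reference dataset, and how many of those are labeled as malware. The function names `count_known` and `count_known_malwares`, the toy identifiers, and the toy labels are all hypothetical.

```python
import numpy as np
import pandas as pd

# Toy reference data (hypothetical): app identifiers with a binary malware label.
reference = pd.DataFrame({'app_id': ['a', 'b', 'c'], 'class': [1, 0, 1]})


def count_known(reference, related_apps):
    """Number of related apps that appear in the reference data (assumed logic)."""
    if pd.isna(related_apps):
        return np.nan
    return float(reference['app_id'].isin(related_apps.split(', ')).sum())


def count_known_malwares(reference, related_apps):
    """Number of related apps found in the reference data and labeled as malware."""
    if pd.isna(related_apps):
        return np.nan
    known = reference[reference['app_id'].isin(related_apps.split(', '))]
    return float((known['class'] == 1).sum())


apps = pd.DataFrame({'related_apps': ['a, b, x', 'x, y', np.nan]})
apps['num_known_apps'] = apps['related_apps'].apply(lambda r: count_known(reference, r))
apps['num_known_malwares'] = apps['related_apps'].apply(lambda r: count_known_malwares(reference, r))

# 0 / 0 gives NaN in pandas, so apps with no known related apps have a missing share.
apps['share_known_malwares'] = apps['num_known_malwares'] / apps['num_known_apps']
print(apps)
```

Under this assumed logic, a share is undefined whenever no related app is known, which would be consistent with `share_known_malwares` having many more missing values than the count features it is derived from.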


In [None]:
new_var = 'num_known_apps'

# Describing the new feature:
unique_vals = list(df_train[new_var].unique())
data_und_update = pd.DataFrame(data={
    'feature': new_var, 'type': df_train.dtypes[new_var], 'n_unique': df_train[new_var].nunique(),
    # All unique values if there are at most 10, otherwise a random sample of 10:
    'sample_values': str(unique_vals if len(unique_vals) <= 10
                         else np.random.choice(unique_vals, size=10, replace=False)),
    'num_missings': df_train[new_var].isnull().sum(), 'share_missings': df_train[new_var].isnull().sum()/len(df_train),
    'var_class': 'numerical', 'category': 'app_attributes'
}, index=[0])

# Updating the data understanding dataframe:
data_und = pd.concat([data_und, data_und_update], axis=0, sort=False)
data_und.tail(1)

Unnamed: 0,feature,type,n_unique,sample_values,num_missings,share_missings,var_class,category
0,num_known_apps,float64,5,"[0.0, 1.0, 2.0, 3.0, nan, 4.0]",484,0.026451,numerical,app_attributes


In [None]:
new_var = 'share_known'

# Describing the new feature:
unique_vals = list(df_train[new_var].unique())
data_und_update = pd.DataFrame(data={
    'feature': new_var, 'type': df_train.dtypes[new_var], 'n_unique': df_train[new_var].nunique(),
    # All unique values if there are at most 10, otherwise a random sample of 10:
    'sample_values': str(unique_vals if len(unique_vals) <= 10
                         else np.random.choice(unique_vals, size=10, replace=False)),
    'num_missings': df_train[new_var].isnull().sum(), 'share_missings': df_train[new_var].isnull().sum()/len(df_train),
    'var_class': 'numerical', 'category': 'app_attributes'
}, index=[0])

# Updating the data understanding dataframe:
data_und = pd.concat([data_und, data_und_update], axis=0, sort=False)
data_und.tail(1)

Unnamed: 0,feature,type,n_unique,sample_values,num_missings,share_missings,var_class,category
0,share_known,float64,7,"[0.0, 0.25, 0.5, 0.75, nan, 0.3333333333333333...",484,0.026451,numerical,app_attributes


In [None]:
new_var = 'num_known_malwares'

# Describing the new feature:
unique_vals = list(df_train[new_var].unique())
data_und_update = pd.DataFrame(data={
    'feature': new_var, 'type': df_train.dtypes[new_var], 'n_unique': df_train[new_var].nunique(),
    # All unique values if there are at most 10, otherwise a random sample of 10:
    'sample_values': str(unique_vals if len(unique_vals) <= 10
                         else np.random.choice(unique_vals, size=10, replace=False)),
    'num_missings': df_train[new_var].isnull().sum(), 'share_missings': df_train[new_var].isnull().sum()/len(df_train),
    'var_class': 'numerical', 'category': 'app_attributes'
}, index=[0])

# Updating the data understanding dataframe:
data_und = pd.concat([data_und, data_und_update], axis=0, sort=False)
data_und.tail(1)

Unnamed: 0,feature,type,n_unique,sample_values,num_missings,share_missings,var_class,category
0,num_known_malwares,float64,5,"[0.0, 1.0, 3.0, 2.0, nan, 4.0]",484,0.026451,numerical,app_attributes


In [None]:
new_var = 'share_known_malwares'

# Describing the new feature:
unique_vals = list(df_train[new_var].unique())
data_und_update = pd.DataFrame(data={
    'feature': new_var, 'type': df_train.dtypes[new_var], 'n_unique': df_train[new_var].nunique(),
    # All unique values if there are at most 10, otherwise a random sample of 10:
    'sample_values': str(unique_vals if len(unique_vals) <= 10
                         else np.random.choice(unique_vals, size=10, replace=False)),
    'num_missings': df_train[new_var].isnull().sum(), 'share_missings': df_train[new_var].isnull().sum()/len(df_train),
    'var_class': 'numerical', 'category': 'app_attributes'
}, index=[0])

# Updating the data understanding dataframe:
data_und = pd.concat([data_und, data_und_update], axis=0, sort=False)
data_und.tail(1)

Unnamed: 0,feature,type,n_unique,sample_values,num_missings,share_missings,var_class,category
0,share_known_malwares,float64,7,"[nan, 1.0, 0.0, 0.5, 0.3333333333333333, 0.666...",10047,0.549076,numerical,app_attributes


#### P(X)

In [None]:
df_train[['num_known_apps', 'share_known', 'num_known_malwares', 'share_known_malwares']].describe()

Unnamed: 0,num_known_apps,share_known,num_known_malwares,share_known_malwares
count,17814.0,17814.0,17814.0,8251.0
mean,0.732514,0.187493,0.440272,0.615582
std,0.961518,0.247452,0.782749,0.465083
min,0.0,0.0,0.0,0.0
25%,0.0,0.0,0.0,0.0
50%,0.0,0.0,0.0,1.0
75%,1.0,0.25,1.0,1.0
max,4.0,1.0,4.0,1.0


In [None]:
# Share of missing values for each of the new features:
for var in ['num_known_apps', 'share_known', 'num_known_malwares', 'share_known_malwares']:
    share_miss_new = df_train[var].isnull().sum()/len(df_train)
    print(f'Share of missings of {var}: {share_miss_new:.4f}.')

Share of missings of num_known_apps: 0.0265.
Share of missings of share_known: 0.0265.
Share of missings of num_known_malwares: 0.0265.
Share of missings of share_known_malwares: 0.5491.


#### P(X|Y)

In [None]:
# Selecting the columns before describing avoids computing statistics for every column:
df_train.groupby('class')[['num_known_apps', 'share_known', 'num_known_malwares', 'share_known_malwares']].describe()

feature,class,count,mean,std,min,25%,50%,75%,max
num_known_apps,0,6040.0,0.928808,1.025861,0.0,0.0,1.0,2.0,4.0
num_known_apps,1,11774.0,0.631816,0.910546,0.0,0.0,0.0,1.0,4.0
share_known,0,6040.0,0.237252,0.262961,0.0,0.0,0.25,0.5,1.0
share_known,1,11774.0,0.161967,0.235064,0.0,0.0,0.0,0.25,1.0
num_known_malwares,0,6040.0,0.264735,0.599399,0.0,0.0,0.0,0.0,4.0
num_known_malwares,1,11774.0,0.530321,0.847833,0.0,0.0,0.0,1.0,4.0
share_known_malwares,0,3393.0,0.296616,0.423771,0.0,0.0,0.0,0.666667,1.0
share_known_malwares,1,4858.0,0.838359,0.348251,0.0,1.0,1.0,1.0,1.0


In [None]:
# Selecting the columns before describing avoids computing statistics for every column:
df_test.groupby('class')[['num_known_apps', 'share_known', 'num_known_malwares', 'share_known_malwares']].describe()

feature,class,count,mean,std,min,25%,50%,75%,max
num_known_apps,0,2957.0,0.909706,1.037995,0.0,0.0,1.0,2.0,4.0
num_known_apps,1,5819.0,0.608696,0.890344,0.0,0.0,0.0,1.0,4.0
share_known,0,2957.0,0.231654,0.265502,0.0,0.0,0.25,0.5,1.0
share_known,1,5819.0,0.155052,0.227901,0.0,0.0,0.0,0.25,1.0
num_known_malwares,0,2957.0,0.251945,0.562501,0.0,0.0,0.0,0.0,4.0
num_known_malwares,1,5819.0,0.500258,0.814398,0.0,0.0,0.0,1.0,4.0
share_known_malwares,0,1606.0,0.30054,0.423323,0.0,0.0,0.0,0.666667,1.0
share_known_malwares,1,2334.0,0.826799,0.360071,0.0,1.0,1.0,1.0,1.0


<a id='nlp'></a>

### Natural language processing

In [None]:
### Possibilities:
  ### BOW and TF-IDF.
  ### LLR for selecting risky words.
  ### Other approaches for selecting risky words.
    ### For instance: clean the texts, create a binary (dummy) variable for each word, drop irrelevant words (dummies with variance below 0.01), train
    ### bivariate models (outcome variable against each dummy variable), and keep only the words whose model has a ROC-AUC, estimated through K-fold CV,
    ### higher than 0.5.
    ### Alternatively, instead of fitting supervised learning models, compute, for each dummy variable, the difference in its mean between y = 1 and
    ### y = 0, or the difference in the mean of y between observations with the dummy equal to 1 and those with the dummy equal to 0.
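
The last alternative listed above (differences in means) could look roughly like the sketch below. It assumes the texts are already cleaned and can be split on whitespace. The function `word_mean_differences`, its column names, and the toy texts and labels are all illustrative and are not part of this notebook or its data.

```python
import pandas as pd


def word_mean_differences(texts, y, min_variance=0.01):
    """Rank words by the difference in the mean of y with vs. without the word."""
    y = pd.Series(list(y), dtype=float)
    # One binary (dummy) column per word: 1 if the word occurs in the text.
    tokens = pd.Series([sorted(set(t.lower().split())) for t in texts])
    dummies = pd.get_dummies(tokens.explode()).groupby(level=0).max().astype(int)
    # Drop near-constant dummies (variance below the threshold).
    dummies = dummies.loc[:, dummies.var(ddof=0) >= min_variance]
    rows = []
    for word in dummies.columns:
        present = dummies[word] == 1
        rows.append({
            'word': word,
            'mean_y_with_word': y[present].mean(),
            'mean_y_without_word': y[~present].mean(),
        })
    out = pd.DataFrame(rows)
    out['diff'] = out['mean_y_with_word'] - out['mean_y_without_word']
    return out.sort_values('diff', ascending=False).reset_index(drop=True)


# Illustrative toy data, not taken from the dataset (1 = malware, 0 = benign).
texts = ['free prize click now', 'free wallpaper pack', 'photo editor',
         'click to win prize', 'calendar and photo notes', 'notes app']
labels = [1, 1, 0, 1, 0, 0]
ranking = word_mean_differences(texts, labels)
```

Words at the top of `ranking` occur mostly in texts labeled 1 and would be the candidates for "risky" words. Any cutoff on `diff` is a modeling choice that would still need validation on held-out data.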

<a id='export_data'></a>

### Exporting training and test data

In [None]:
if EXPORT:
  data_und.to_csv('../data/features.csv', index=False)
  df_train.to_csv('../data/training_data.csv', index=False)
  df_test.to_csv('../data/test_data.csv', index=False)