# Transactions Fraud Detection

**Authors:** [Peter Macinec](https://github.com/pmacinec), [Timotej Zatko](https://github.com/timzatko)

## Preprocessing

In this Jupyter notebook, we preprocess the data. The preprocessed data can then be used for classification.

### Setup and reading the data

First, we import the libraries and set the initial configuration.

In [1]:
# Automatically reload imported modules
%load_ext autoreload
%autoreload 2

In [2]:
import sys
sys.path.append('..')

# Suppress all warnings (e.g. deprecation warnings raised when importing libraries)
import warnings
warnings.filterwarnings('ignore')

In [3]:
import pandas as pd
import numpy as np

from sklearn.pipeline import make_pipeline

from src.preprocessing.transformers import *
from src.preprocessing.pandas_feature_union import PandasFeatureUnion
from src.preprocessing.pandas_one_hot_encoder import PandasOneHotEncoder
from src.preprocessing.pandas_simple_imputer import PandasSimpleImputer
from src.preprocessing.pandas_missing_indicator import PandasMissingIndicator

from src.dataset import load_data, split_and_save_processed_data

In [4]:
pd.set_option('display.max_columns', 600)
pd.set_option('display.width', 1000)

The data is loaded with our `load_data` function, which optimizes the data types of the attributes and thereby saves a lot of memory:
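The internals of `load_data` are not shown in this notebook. A common way to get this kind of memory saving, sketched below under that assumption, is to downcast every numeric column to the smallest dtype that can hold its values using `pd.to_numeric(..., downcast=...)`. The helper name `downcast_numeric_columns` and the synthetic data are hypothetical.

```python
import numpy as np
import pandas as pd


def downcast_numeric_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return a copy of df with smaller numeric dtypes.

    Integer columns are downcast to the smallest integer type that fits
    their values; float columns are downcast to float32 when pandas judges
    the conversion close enough to the original values. Other columns
    (e.g. strings) are left untouched.
    """
    out = df.copy()
    for column in out.columns:
        dtype = out[column].dtype
        if pd.api.types.is_integer_dtype(dtype):
            out[column] = pd.to_numeric(out[column], downcast='integer')
        elif pd.api.types.is_float_dtype(dtype):
            out[column] = pd.to_numeric(out[column], downcast='float')
    return out


# Usage example on a small synthetic frame
example = pd.DataFrame({
    'count': np.arange(1000, dtype='int64') % 100,
    'amount': np.linspace(0.0, 10.0, 1000),
    'card': ['visa'] * 1000,
})
smaller = downcast_numeric_columns(example)
print(example.memory_usage(deep=True).sum(), '->', smaller.memory_usage(deep=True).sum())
```

Downcasting floats to `float32` trades a little precision for half the memory, which is usually acceptable for model inputs.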

In [5]:
df = load_data()

In [6]:
df.shape

(590540, 434)

### Define preprocessing pipeline

The preprocessing is done with a preprocessing **pipeline**. Pipelines are commonly used because they make preprocessing reproducible: the same fitted steps are applied in the same order to any data passed through them.
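To illustrate why a pipeline helps reproducibility, the sketch below uses standard scikit-learn components (`SimpleImputer`, `StandardScaler`) rather than the project's own transformers, on synthetic data. The pipeline learns its statistics (the mean used for imputation, the mean and standard deviation used for scaling) once in `fit` and reuses exactly those statistics whenever `transform` is called, for example on a later test set.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic training and test data with missing values (NaN)
X_train = np.array([[1.0], [2.0], [np.nan], [5.0]])
X_test = np.array([[np.nan], [3.0]])

# Mean imputation followed by standardization, fitted on the training data only
pipeline = make_pipeline(SimpleImputer(strategy='mean'), StandardScaler())
pipeline.fit(X_train)

# The learned statistics are stored in the fitted steps...
imputer = pipeline.named_steps['simpleimputer']
print(imputer.statistics_)  # training mean, used to fill NaN values

# ...and are reused unchanged when new data is transformed
X_test_transformed = pipeline.transform(X_test)
print(X_test_transformed)
```

The missing test value is filled with the training mean, not with a statistic computed from the test data, so the test set cannot leak into the preprocessing.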

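The preprocessing steps are based on the results of the data analysis phase. The pipeline processes three groups of columns separately and joins the results:

* **numeric features** (all non-object columns except `TransactionID` and the label): drop columns with too many missing values (`FilterColumnsByCountOfMissingValues(0.5)`), impute missing values with the column mean, add missing-value indicators and normalize;
* **categorical features** (object columns): transform the e-mail domain columns `P_emaildomain` and `R_emaildomain` (`EmailProviderTransform`), impute missing values with the most frequent value, add missing-value indicators, merge small categories and one-hot encode;
* **label** (`isFraud`): passed through unchanged.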
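`FilterColumnsByCountOfMissingValues(0.5)` is a project-specific transformer whose code is not shown. Assuming it drops every column whose fraction of missing values exceeds the threshold, a scikit-learn-compatible equivalent could look like the sketch below; the class name `DropMostlyMissingColumns` and the toy data are hypothetical. The columns to keep are decided in `fit`, so exactly the same columns are kept when the transformer is later applied to other data.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin


class DropMostlyMissingColumns(BaseEstimator, TransformerMixin):
    """Hypothetical transformer: drop columns with too many missing values."""

    def __init__(self, threshold=0.5):
        # Maximum allowed fraction of missing values in a kept column
        self.threshold = threshold

    def fit(self, X, y=None):
        missing_fraction = X.isna().mean()
        # Remember which columns to keep, so that transform() is consistent
        self.columns_ = missing_fraction[missing_fraction <= self.threshold].index.tolist()
        return self

    def transform(self, X):
        return X[self.columns_]


df_example = pd.DataFrame({
    'mostly_present': [1.0, 2.0, np.nan, 4.0],   # 25 % missing, kept
    'mostly_missing': [np.nan, np.nan, np.nan, 1.0],  # 75 % missing, dropped
})
filtered = DropMostlyMissingColumns(threshold=0.5).fit_transform(df_example)
print(filtered.columns.tolist())
```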

In [7]:
# np.object was removed in NumPy 1.24, so the dtype is given as a string
categoric_features = df.select_dtypes(include='object').columns.to_list()
numeric_features = df.select_dtypes(exclude='object').columns.to_list()
label_feature = 'isFraud'

numeric_features.remove('TransactionID') # ID should not be used
numeric_features.remove(label_feature)

pipeline = PandasFeatureUnion([
    ('numeric_features', make_pipeline(
        SelectFeatures(numeric_features),
        FilterColumnsByCountOfMissingValues(0.5),
        PandasSimpleImputer(strategy='mean'),
        PandasMissingIndicator(),
        Normalizer()
    )),
    ('categoric_features', make_pipeline(
        SelectFeatures(categoric_features),
        EmailProviderTransform(['P_emaildomain', 'R_emaildomain']),
        PandasSimpleImputer(strategy='most_frequent'),
        PandasMissingIndicator(),
        MergeSmallCategories(),
        PandasOneHotEncoder()
    )),
    ('label_feature', make_pipeline(
        SelectFeatures([label_feature])
    ))
])
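`PandasFeatureUnion` comes from the project's own `src` package. The idea behind it, sketched below under the assumption that it behaves like scikit-learn's `FeatureUnion` but returns a pandas DataFrame with column names, is to fit every sub-pipeline on the same input frame and join their outputs column-wise with `pd.concat`. The classes `SelectColumns`, `FillWithMean` and `DataFrameUnion` are hypothetical stand-ins, not the project's transformers.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import make_pipeline


class SelectColumns(BaseEstimator, TransformerMixin):
    """Hypothetical: keep only the given columns of a DataFrame."""

    def __init__(self, columns):
        self.columns = columns

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X[self.columns]


class FillWithMean(BaseEstimator, TransformerMixin):
    """Hypothetical: fill missing values with column means learned in fit()."""

    def fit(self, X, y=None):
        self.means_ = X.mean()
        return self

    def transform(self, X):
        return X.fillna(self.means_)


class DataFrameUnion(BaseEstimator, TransformerMixin):
    """Hypothetical: run several transformers on the same frame and join the results column-wise."""

    def __init__(self, transformer_list):
        self.transformer_list = transformer_list

    def fit(self, X, y=None):
        for _, transformer in self.transformer_list:
            transformer.fit(X, y)
        return self

    def transform(self, X):
        parts = [transformer.transform(X) for _, transformer in self.transformer_list]
        # Rows are aligned on the index; column blocks are placed side by side
        return pd.concat(parts, axis=1)


frame = pd.DataFrame({
    'amount': [10.0, np.nan, 30.0],
    'card': ['visa', 'mastercard', 'visa'],
    'isFraud': [False, True, False],
})
union = DataFrameUnion([
    ('numeric', make_pipeline(SelectColumns(['amount']), FillWithMean())),
    ('label', make_pipeline(SelectColumns(['isFraud']))),
])
result = union.fit_transform(frame)
print(result)
```

Keeping the output as a DataFrame preserves the column names, which is what makes the preprocessed frame below readable.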

In [8]:
%%time
df_preprocessed = pipeline.fit_transform(df)

CPU times: user 14min 5s, sys: 23.3 s, total: 14min 28s
Wall time: 14min 25s


In [11]:
df_preprocessed.shape

(590540, 277)

After preprocessing, the dataset has 590540 rows and 277 columns: 276 features plus the `isFraud` label.
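The encoded column names in the preprocessed frame, such as `card4_visa` and `card4_other`, come from merging rare categories and one-hot encoding. Assuming `MergeSmallCategories` replaces categories whose share falls below some threshold with the label `other` (its actual threshold and rules are not shown here), the effect can be reproduced on a toy column with plain pandas. The function `merge_small_categories`, its `min_share` parameter and the toy data are hypothetical.

```python
import pandas as pd


def merge_small_categories(series: pd.Series, min_share: float, other_label: str = 'other') -> pd.Series:
    """Hypothetical: replace categories rarer than min_share with other_label."""
    shares = series.value_counts(normalize=True)
    small = shares[shares < min_share].index
    # Keep frequent categories, replace the rare ones
    return series.where(~series.isin(small), other_label)


card4 = pd.Series(['visa', 'visa', 'visa', 'mastercard', 'mastercard', 'discover'], name='card4')
merged = merge_small_categories(card4, min_share=0.2)  # 'discover' (1/6) becomes 'other'

# One-hot encode; column names follow the <feature>_<category> pattern
encoded = pd.get_dummies(merged, prefix='card4', dtype='uint8')
print(encoded)
```

In a real pipeline step, the set of small categories would be learned in `fit` and reused in `transform`, so that the training and test data get the same columns.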

In [12]:
df_preprocessed

Output (abridged): a DataFrame with 590540 rows × 277 columns, showing rows 0 to 4 and 590535 to 590538. The columns are the 203 retained numeric features (`TransactionDT`, `TransactionAmt`, `C1` to `C14`, `D1` to `D4`, `D10`, `D11`, `D15`, `V1` to `V137` and `V279` to `V321`, all normalized), 73 encoded categorical columns (for example `ProductCD_H`, `card4_visa`, `card4_other`, `card6_debit`, `M4_M1`, `id_30_other` and `DeviceInfo_nan`) and the label `isFraud`.


In [18]:
df_preprocessed.dtypes.value_counts()

float64    203
uint8       73
bool         1
dtype: int64

Finally, the preprocessed data is split into training and test sets (`test_size=0.2`) and saved:

In [10]:
%%time

split_and_save_processed_data(df_preprocessed, test_size=0.2)

splitting the data...
saving...
CPU times: user 2min 58s, sys: 8.06 s, total: 3min 6s
Wall time: 3min 58s
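`split_and_save_processed_data` is part of the project's `src.dataset` module and its implementation is not shown. A minimal sketch of a comparable step, assuming a random split stratified on the label and CSV output, is shown below; the function name `split_and_save`, the file names, the stratification and the seed are all assumptions.

```python
import os
import tempfile

import pandas as pd
from sklearn.model_selection import train_test_split


def split_and_save(df: pd.DataFrame, label: str, test_size: float, out_dir: str, random_state: int = 42):
    """Hypothetical: split df into train/test sets and save both as CSV files."""
    train, test = train_test_split(
        df,
        test_size=test_size,
        stratify=df[label],         # keep the fraud ratio equal in both parts
        random_state=random_state,  # fixed seed for a reproducible split
    )
    train.to_csv(os.path.join(out_dir, 'train.csv'), index=False)
    test.to_csv(os.path.join(out_dir, 'test.csv'), index=False)
    return train, test


toy = pd.DataFrame({'feature': range(20), 'isFraud': [False] * 15 + [True] * 5})
with tempfile.TemporaryDirectory() as out_dir:
    train, test = split_and_save(toy, label='isFraud', test_size=0.2, out_dir=out_dir)
    saved_files = sorted(os.listdir(out_dir))
print(len(train), len(test), saved_files)
```

If `TransactionDT` encodes the transaction time, as its name suggests, a split by time (earlier transactions for training, later ones for testing) is a possible alternative to a random split.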
