# Transactions Fraud Detection

**Authors:** [Peter Macinec](https://github.com/pmacinec), [Timotej Zatko](https://github.com/timzatko)

## Preprocessing

In this Jupyter notebook, we preprocess the data so that it can then be used for classification.

### Setup and reading the data

First, we import the libraries and set the initial configuration.

In [1]:
# Automatically reload imported modules
%load_ext autoreload
%autoreload 2

In [2]:
import sys
sys.path.append('..')

# Suppress library deprecation warnings
import warnings
warnings.filterwarnings('ignore')

In [3]:
import pandas as pd
import numpy as np

from sklearn.pipeline import make_pipeline

from src.preprocessing.transformers import *
from src.preprocessing.pandas_feature_union import PandasFeatureUnion
from src.preprocessing.pandas_one_hot_encoder import PandasOneHotEncoder
from src.preprocessing.pandas_simple_imputer import PandasSimpleImputer
from src.preprocessing.pandas_missing_indicator import PandasMissingIndicator

from src.dataset import load_data, split_and_save_processed_data

In [4]:
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 1000)

The data is loaded with our own function, which optimizes the data types of the attributes and thereby saves a lot of memory:

In [5]:
df = load_data()

In [6]:
df.shape

(590540, 434)
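
The implementation of `load_data` is not shown in this notebook. As a rough sketch of what "optimizing data types" can look like, the snippet below downcasts numeric columns with `pd.to_numeric`. The helper name `downcast_numeric` and the toy frame are assumptions made for illustration, not the project's actual code.

```python
import numpy as np
import pandas as pd


def downcast_numeric(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy whose numeric columns use the smallest dtype pandas deems safe.

    Hypothetical helper; the real load_data() may work differently.
    """
    out = df.copy()
    for col in out.select_dtypes(include=[np.integer]).columns:
        out[col] = pd.to_numeric(out[col], downcast="integer")
    for col in out.select_dtypes(include=[np.floating]).columns:
        out[col] = pd.to_numeric(out[col], downcast="float")
    return out


# Toy frame standing in for the real transactions table.
toy = pd.DataFrame({
    "TransactionID": np.arange(1000, dtype="int64"),
    "TransactionAmt": np.linspace(1.0, 500.0, 1000),
})
small = downcast_numeric(toy)

before = toy.memory_usage(deep=True).sum()
after = small.memory_usage(deep=True).sum()
print(before, "->", after)
```

Downcasting floats trades precision for memory, which is consistent with the `float32` dtype reported by `mean()` and `std()` later in this notebook.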

### Define preprocessing pipeline

Our preprocessing is done via a preprocessing **pipeline**. Pipelines are commonly used for preprocessing because they help ensure reproducibility.

The preprocessing steps are defined according to the results of the data analysis phase:

TODO
* define main steps identified in data analysis

In [7]:
# np.object was removed in NumPy 1.24; the builtin `object` selects the same columns
categoric_features = df.select_dtypes(include=object).columns.to_list()
numeric_features = df.select_dtypes(exclude=object).columns.to_list()

pipeline = PandasFeatureUnion([
    ('numeric_features', make_pipeline(
        SelectFeatures(numeric_features),
        FilterColumnsByCountOfMissingValues(0.5),
        PandasSimpleImputer(strategy='mean'),
        PandasMissingIndicator(),
        Normalizer(),
        OutliersFilter()
    )),
    ('categoric_features', make_pipeline(
        SelectFeatures(categoric_features),
        EmailProviderTransform(['P_emaildomain', 'R_emaildomain']),    
        KeepOnlyMostCommonValues(10),
        PandasOneHotEncoder()
    ))
])
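
The transformers imported from `src.preprocessing.transformers` are not shown in this notebook. To illustrate how a DataFrame-in, DataFrame-out step can plug into `make_pipeline`, here is a minimal sketch. The classes `DropSparseColumns` and `MeanImputer` are hypothetical stand-ins, and treating the `0.5` threshold as a maximum share of missing values is an assumption.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import make_pipeline


class DropSparseColumns(TransformerMixin, BaseEstimator):
    """Hypothetical stand-in: drop columns whose share of NaN exceeds `threshold`."""

    def __init__(self, threshold=0.5):
        self.threshold = threshold

    def fit(self, X, y=None):
        # Remember which columns to keep so transform() is repeatable on new data.
        missing_ratio = X.isna().mean()
        self.columns_ = missing_ratio[missing_ratio <= self.threshold].index.tolist()
        return self

    def transform(self, X):
        return X[self.columns_]


class MeanImputer(TransformerMixin, BaseEstimator):
    """Hypothetical stand-in: fill NaN with the column means learned in fit()."""

    def fit(self, X, y=None):
        self.means_ = X.mean()
        return self

    def transform(self, X):
        return X.fillna(self.means_)


toy = pd.DataFrame({
    "mostly_missing": [np.nan, np.nan, np.nan, 1.0],
    "some_missing": [1.0, np.nan, 3.0, 5.0],
})
pipe = make_pipeline(DropSparseColumns(0.5), MeanImputer())
result = pipe.fit_transform(toy)
print(result)
```

Because the state learned in `fit` (kept columns, means) is stored on the transformer, the same fitted pipeline can later be applied to unseen data with `pipe.transform`.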

In [8]:
%%time
df_preprocessed = pipeline.fit_transform(df)

CPU times: user 1min 28s, sys: 31.8 s, total: 2min
Wall time: 2min


In [9]:
df_preprocessed.shape

(590540, 539)

After preprocessing, the data has 590540 rows and 539 features.
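
The number of columns grows because each categorical attribute is expanded into indicator columns. The sketch below shows one way to keep only the most frequent values and then one-hot encode them with plain pandas. The helper `keep_most_common` and the toy data are assumptions; the project's `KeepOnlyMostCommonValues` and `PandasOneHotEncoder` may behave differently.

```python
import pandas as pd


def keep_most_common(series: pd.Series, n: int, other: str = "Other") -> pd.Series:
    """Replace values outside the n most frequent ones with `other`; NaN stays NaN.

    Hypothetical helper written for illustration only.
    """
    top = series.value_counts().nlargest(n).index
    return series.where(series.isin(top) | series.isna(), other)


toy = pd.DataFrame({
    "card4": ["visa", "visa", "mastercard", "discover", None, "visa", "mastercard"],
})
reduced = toy.assign(card4=keep_most_common(toy["card4"], n=2))

# dummy_na=True adds a separate indicator column for missing values.
encoded = pd.get_dummies(reduced, columns=["card4"], dummy_na=True)
print(encoded.columns.tolist())
```

With `n=2`, the single categorical column becomes four indicator columns: two for the kept values, one for the replacement value and one for missing values.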

In [17]:
aaa = df[['V1', 'V2', 'V3', 'TransactionAmt']]

In [21]:
aaa

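|  | V1 | V2 | V3 | TransactionAmt |
|---|---|---|---|---|
| 0 | 1.0 | 1.0 | 1.0 | 68.500000 |
| 1 | NaN | NaN | NaN | 29.000000 |
| 2 | 1.0 | 1.0 | 1.0 | 59.000000 |
| 3 | NaN | NaN | NaN | 50.000000 |
| 4 | NaN | NaN | NaN | 50.000000 |
| ... | ... | ... | ... | ... |
| 590535 | 1.0 | 1.0 | 1.0 | 49.000000 |
| 590536 | 1.0 | 1.0 | 1.0 | 39.500000 |
| 590537 | 1.0 | 1.0 | 1.0 | 30.953125 |
| 590538 | 1.0 | 1.0 | 1.0 | 117.000000 |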


In [27]:
aaa.mean()

V1                  0.999945
V2                  1.045204
V3                  1.078075
TransactionAmt    134.877899
dtype: float32

In [26]:
aaa.std()

V1                  0.007390
V2                  0.240133
V3                  0.320890
TransactionAmt    239.157440
dtype: float32

In [22]:
aaa.dropna().describe()

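|  | V1 | V2 | V3 | TransactionAmt |
|---|---|---|---|---|
| count | 311253.0 | 311253.0 | 311253.0 | 311253.0 |
| mean | 0.999945 | 1.045204 | 1.078075 | NaN |
| std | 0.00739 | 0.240133 | 0.32089 | NaN |
| min | 0.0 | 0.0 | 0.0 | 2.0 |
| 25% | 1.0 | 1.0 | 1.0 | 49.0 |
| 50% | 1.0 | 1.0 | 1.0 | 78.5 |
| 75% | 1.0 | 1.0 | 1.0 | 146.0 |
| max | 1.0 | 8.0 | 9.0 | 31936.0 |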


In [20]:
(aaa - aaa.mean()) / aaa.std()

|  | V1 | V2 | V3 | TransactionAmt |
|---|---|---|---|---|
| 0 | 0.007388 | -0.188247 | -0.243307 | -0.277588 |
| 1 | NaN | NaN | NaN | -0.442871 |
| 2 | 0.007388 | -0.188247 | -0.243307 | -0.317383 |
| 3 | NaN | NaN | NaN | -0.354980 |
| 4 | NaN | NaN | NaN | -0.354980 |
| ... | ... | ... | ... | ... |
| 590535 | 0.007388 | -0.188247 | -0.243307 | -0.359131 |
| 590536 | 0.007388 | -0.188247 | -0.243307 | -0.398926 |
| 590537 | 0.007388 | -0.188247 | -0.243307 | -0.434570 |
| 590538 | 0.007388 | -0.188247 | -0.243307 | -0.074768 |
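
The manual check above standardizes each column as `(x - mean) / std`. A toy version of the same computation, on made-up values, shows two properties worth keeping in mind: pandas skips `NaN` when computing `mean()` and `std()`, so missing entries stay missing, and `std()` uses the sample formula (`ddof=1`) by default.

```python
import numpy as np
import pandas as pd

# Made-up values for illustration; not taken from the dataset.
toy = pd.DataFrame({
    "V1": [1.0, np.nan, 1.0, 0.0],
    "TransactionAmt": [68.5, 29.0, 59.0, 50.0],
})

# mean() and std() ignore NaN by default; std() uses ddof=1.
z = (toy - toy.mean()) / toy.std()

print(z)
print(z.mean().round(6))  # close to 0 for every column
print(z.std().round(6))   # close to 1 for every column
```

A scaler that divides by the population standard deviation (`ddof=0`) would produce slightly different values, which can matter when comparing results against this manual check.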


In [None]:
# Note: isFraud, TransactionID, TransactionDT: drop all of these. addr2, addr1? object, and it is not

In [10]:
df_preprocessed

|  | TransactionID | isFraud | TransactionDT | TransactionAmt | addr2 | ... | DeviceInfo_nan |
|---|---|---|---|---|---|---|---|
| 0 | NaN | NaN | NaN | NaN | NaN | ... | 0 |
| 1 | -1.732041 | -0.190417 | -1.577985 | -0.443337 | 0.0786 | ... | 0 |
| 2 | -1.732035 | -0.190417 | -1.577970 | -0.317897 | 0.0786 | ... | 0 |
| 3 | NaN | NaN | NaN | NaN | NaN | ... | 0 |
| 4 | NaN | NaN | NaN | NaN | NaN | ... | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 590535 | NaN | NaN | NaN | NaN | NaN | ... | 0 |
| 590536 | 1.732029 | -0.190417 | 1.827665 | -0.399433 | 0.0786 | ... | 0 |
| 590537 | 1.732035 | -0.190417 | 1.827671 | -0.435170 | 0.0786 | ... | 0 |
| 590538 | NaN | NaN | NaN | NaN | NaN | ... | 0 |

(Preview truncated to the first five columns and the last column. The numeric columns in the output are TransactionID, isFraud, TransactionDT, TransactionAmt, addr2, C1 to C14, D1 to D4, D10, D11, D15, V1 to V137 and V279 to V321. They are followed by the one-hot indicator columns, which run from ProductCD_H to DeviceInfo_nan.)


In [13]:
df_preprocessed.dtypes.value_counts()

uint8      333
float64    206
dtype: int64

In [10]:
%%time

split_and_save_processed_data(df_preprocessed, test_size=0.2)

splitting the data...
saving...
CPU times: user 2min 58s, sys: 8.06 s, total: 3min 6s
Wall time: 3min 58s
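
The body of `split_and_save_processed_data` is not shown here. As a hedged sketch of what a split with `test_size=0.2` could look like, the snippet below uses scikit-learn's `train_test_split` on a toy frame. Stratifying on `isFraud` and the fixed `random_state` are assumptions, and the saving step is omitted.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy frame with a 10 % positive class, standing in for the real data.
rng = np.random.default_rng(42)
toy = pd.DataFrame({
    "TransactionAmt": rng.normal(size=100),
    "isFraud": np.r_[np.ones(10, dtype=int), np.zeros(90, dtype=int)],
})

# stratify keeps the class ratio the same in both parts (an assumption here).
train, test = train_test_split(
    toy, test_size=0.2, stratify=toy["isFraud"], random_state=42
)

print(len(train), len(test))
print(train["isFraud"].mean(), test["isFraud"].mean())
```

Stratification is mainly useful when the target is heavily imbalanced, because a purely random split can otherwise leave the smaller part with very few positive examples.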
