This is the final preprocessing notebook before modeling. Here, I'll compare models' performance under various variable-encoding strategies. Since only about 10% of items are reordered, I'll focus on F1, Cohen's kappa and ROC AUC as metrics, and I might also balance the data to see whether that actually improves model performance. I might also need to explore feature reduction: I created multiple features in the previous notebook but am not yet sure whether they'll be valuable for making predictions. I can begin exploring various models along the way. 

Notebook on which this one builds: https://github.com/fractaldatalearning/Capstone2/blob/main/notebooks/preprocessing2_feature_engineering.ipynb

One thing to look out for in this notebook: if the computer handles the dataset comfortably at this size during modeling, I could go back to the preprocessing1 notebook, add further increments of rows from the full original dataset, concatenate them, re-run all the feature-engineering steps on the larger dataset, and come back here to model with more rows (from twice as many, perhaps up to 10 times as many). I could also try a Naive Bayes classifier, which can be trained incrementally when not all of the training data fits in memory. 
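scikit-learn's naive Bayes classifiers (e.g. `GaussianNB`) have a `partial_fit` method, which is what makes them usable when the training data doesn't fit in memory: the model is updated one chunk at a time instead of being fit on everything at once. A minimal sketch, with synthetic chunks standing in for reading the real CSV via `pd.read_csv(..., chunksize=...)`; the choice of `GaussianNB` and the `make_chunk` helper are illustrative assumptions:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)

def make_chunk(n_rows):
    """Hypothetical stand-in for one chunk of a large CSV: 3 numeric features, binary target."""
    X = rng.normal(size=(n_rows, 3))
    # The target depends on the first feature, so there is something to learn
    y = (X[:, 0] + rng.normal(scale=0.5, size=n_rows) > 1.0).astype(int)
    return X, y

model = GaussianNB()
for _ in range(5):
    X_chunk, y_chunk = make_chunk(10_000)
    # classes is required on the first call; repeating the same classes later is allowed
    model.partial_fit(X_chunk, y_chunk, classes=np.array([0, 1]))

X_new, y_new = make_chunk(2_000)
accuracy = (model.predict(X_new) == y_new).mean()
print(f"rows seen: {model.class_count_.sum():.0f}, accuracy on a new chunk: {accuracy:.3f}")
```

Only one chunk is in memory at a time; the model keeps running per-class means and variances.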

In [40]:
import pandas as pd
import numpy as np
import os
from library.sb_utils import save_file

import matplotlib.pyplot as plt
import seaborn as sns

import category_encoders as ce
from sklearn.ensemble import RandomForestClassifier, BaggingClassifier

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn import metrics

from sklearn.decomposition import PCA
from sklearn.preprocessing import scale

import random

from IPython.display import Audio
sound_file = './alert.wav'

In [2]:
df = pd.read_csv('../data/processed/features_engineered.csv')
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 218232 entries, 0 to 218231
Data columns (total 27 columns):
 #   Column                   Non-Null Count   Dtype  
---  ------                   --------------   -----  
 0   order_id                 218232 non-null  int64  
 1   user_id                  218232 non-null  int64  
 2   order_by_user_sequence   218232 non-null  int64  
 3   days_since_prior_order   218232 non-null  float64
 4   add_to_cart_sequence     218232 non-null  int64  
 5   reordered                218232 non-null  int64  
 6   product_name             218232 non-null  object 
 7   aisle_name               218232 non-null  object 
 8   dept_name                218232 non-null  object 
 9   prior_purchases          218232 non-null  int64  
 10  purchased_percent_prior  218232 non-null  float64
 11  free                     218232 non-null  int64  
 12  fresh                    218232 non-null  int64  
 13  mix                      218232 non-null  int64  
 14  natu

In [3]:
# order_id is redundant as a combination of user and order_by_user_sequence. Delete it. 
df = df.drop(columns='order_id')
df.columns

Index(['user_id', 'order_by_user_sequence', 'days_since_prior_order',
       'add_to_cart_sequence', 'reordered', 'product_name', 'aisle_name',
       'dept_name', 'prior_purchases', 'purchased_percent_prior', 'free',
       'fresh', 'mix', 'natural', 'organic', 'original', 'sweet', 'white',
       'whole', 'rice', 'fruit', 'gluten', 'dow_sin', 'dow_cos', 'hour_sin',
       'hour_cos'],
      dtype='object')

In [54]:
df['reordered'].mean()

0.0973001209721764

I can judge the effect of my work by comparing model scores with the scores I'd get from simply guessing that a random 10% of items get reordered (since 0.097 is the mean of the whole 'reordered' column in this dataset). 
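This baseline can also be worked out on paper. If a fraction p of items are truly reordered and a fraction q of items are flagged at random, independently of the truth, then precision is about p and recall about q, so the expected F1 is 2pq/(p+q). Kappa should be about 0 and ROC AUC about 0.5, because the guesses carry no information. A small sketch of that arithmetic (the function name is mine, for illustration):

```python
def expected_random_f1(p, q):
    """Approximate F1 of predictions that flag a random fraction q of rows,
    when a fraction p of rows are truly positive (guesses independent of the truth)."""
    precision = p  # among flagged rows, the positive rate is just the base rate
    recall = q     # a random fraction q of the true positives gets flagged
    return 2 * precision * recall / (precision + recall)

p = 0.0973  # mean of the 'reordered' column
q = 0.10    # flag 10% of rows at random
print(round(expected_random_f1(p, q), 4))  # → 0.0986
```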

In [47]:
# Make an array with 21823 ones (10% of the 218232 rows) randomly dispersed among zeroes. Then use
# that fake array as predictions to see what scores I'd get without using any of the work I've done/ will do. 
ones = [1] * 21823
zeroes = [0] * 196409
array = np.concatenate([ones, zeroes])
len(array)

218232

In [48]:
array[0:6]

array([1, 1, 1, 1, 1, 1])

In [49]:
random.shuffle(array)
array[0:6]

array([0, 0, 0, 0, 0, 0])

In [52]:
y_true = df['reordered']  # y isn't defined until a later cell, so use the column directly
print('fake baseline conf matrix: ', metrics.confusion_matrix(y_true, array))
print('fake baseline f1 score: ', metrics.f1_score(y_true, array))
print('fake baseline kappa score: ', metrics.cohen_kappa_score(y_true, array))
print('fake baseline roc_auc score: ', metrics.roc_auc_score(y_true, array))

fake conf matrix:  [[177291  19707]
 [ 19118   2116]]
fake RF f1 score:  0.09828831548877069
fake RF kappa score:  -0.0003803398371760025
fake RF roc_auc score:  0.499807476856609
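scikit-learn ships this kind of baseline as `DummyClassifier`; with `strategy="stratified"` it guesses labels at random in proportion to the training classes. That is close to the shuffled array above, except that it matches the prevalence (about 9.7%) rather than exactly 10%. A sketch on synthetic labels with the same positive rate, assumed here in place of the real data:

```python
import numpy as np
from sklearn import metrics
from sklearn.dummy import DummyClassifier

rng = np.random.default_rng(0)
n = 200_000
# Synthetic stand-in for the 'reordered' column: about 9.7% positives
y = (rng.random(n) < 0.0973).astype(int)
X = np.zeros((n, 1))  # features are ignored by DummyClassifier

dummy = DummyClassifier(strategy="stratified", random_state=0)
dummy.fit(X, y)
y_pred = dummy.predict(X)

f1 = metrics.f1_score(y, y_pred)
kappa = metrics.cohen_kappa_score(y, y_pred)
print(f"dummy f1: {f1:.3f}, kappa: {kappa:.3f}")
```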


My understanding is that categorical features should be encoded prior to any standardization of ordinal features. Start here. 

I'd like to try multiple encoders for categorical data. A summary of my current understanding of the encoders that could make sense for this data:
- One-hot could work for the dept_name column because it has only 19 categories, far fewer than any of the other categorical columns. It wouldn't work for the others. 
- Hashing works with high-cardinality variables but isn't reversible and can lead to some (usually minimal, from what I've read) information loss when different values collide. Each value is hashed on its own, without reference to other rows or to the target, so it shouldn't introduce leakage across rows. 
- My understanding of binary encoding is that it's the best of both worlds from one-hot and hashing: far fewer resulting columns than one-hot, but interpretable and without information loss, unlike hashing. 
- Bayesian (target-based) encoders build the encoding from the target, so they can leak target information into the features; make sure to split into training and test sets before encoding. LeaveOneOut is one of these: it does use the dependent variable, but it excludes each row's own target value when encoding that row, which reduces leakage. I also read that it is especially good for classification tasks, so it's a good one to consider here.
- I know very little about WeightofEvidence, but it's another Bayesian encoder recommended by Springboard, and I can try it out along with the Target encoder (though I'd expect Target to over-fit compared with LeaveOneOut). 
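On the leakage question for hashing: a hashing encoder picks a column for each value using only a hash of that value, so one row's encoding never depends on other rows or on the target. A minimal pure-Python sketch of the idea, assuming an MD5 hash and 8 output columns (both illustrative choices; category_encoders' `HashingEncoder` has its own defaults); `hash_encode` is a hypothetical helper:

```python
import hashlib

def hash_encode(value, n_components=8):
    """Return a one-hot style vector of length n_components for a single category value.
    Different values can collide in the same column, which is where the info loss comes from."""
    digest = hashlib.md5(str(value).encode("utf-8")).hexdigest()
    column = int(digest, 16) % n_components
    vector = [0] * n_components
    vector[column] = 1
    return vector

products = ["Banana", "Organic Strawberries", "Banana", "Whole Milk"]
encoded = [hash_encode(p) for p in products]
# The same value always lands in the same column, regardless of the other rows
print(encoded[0] == encoded[2])  # → True
```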

I'd like any encoder(s) I use to be included in an eventual modeling pipeline, but first I want to explore and try them out individually to see better how they would each work with the data. 
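Putting the encoder inside a pipeline means it is fit only on whatever training data the pipeline is fit on, which matters most for the target-based encoders. A minimal sketch using scikit-learn's own `OneHotEncoder` for `dept_name` as a stand-in (the category_encoders classes are scikit-learn-style transformers, so they should slot into a `Pipeline` the same way), with a tiny made-up frame in place of the real data:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data with one categorical and two numeric columns
toy = pd.DataFrame({
    "dept_name": ["produce", "dairy eggs", "produce", "snacks", "dairy eggs", "produce"] * 20,
    "days_since_prior_order": [7.0, 30.0, 3.0, 14.0, 7.0, 2.0] * 20,
    "prior_purchases": [2, 0, 5, 1, 3, 4] * 20,
    "reordered": [1, 0, 1, 0, 0, 1] * 20,
})
X_toy = toy.drop(columns="reordered")
y_toy = toy["reordered"]

preprocess = ColumnTransformer([
    # Encode the categorical column...
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["dept_name"]),
    # ...and standardize the numeric columns separately
    ("num", StandardScaler(), ["days_since_prior_order", "prior_purchases"]),
])
model = Pipeline([("prep", preprocess), ("rf", RandomForestClassifier(random_state=0))])
model.fit(X_toy, y_toy)  # encoder and scaler are fit inside, on the training data only
print(model.predict(X_toy.head(3)))
```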

In [5]:
# Start by just predicting the reordered column. Perhaps try predicting the add_to_cart_sequence
# column later. Create independent & dependent variables, encode independent categories. 
X = df.drop(columns=['reordered', 'add_to_cart_sequence'])
y = df['reordered']

categorical_columns = ['user_id', 'product_name', 'aisle_name', 'dept_name']
ce_bin = ce.BinaryEncoder(cols=categorical_columns)
Xbin = ce_bin.fit_transform(X,y)

In [6]:
Xbin.columns

Index(['user_id_0', 'user_id_1', 'user_id_2', 'user_id_3', 'user_id_4',
       'user_id_5', 'user_id_6', 'user_id_7', 'order_by_user_sequence',
       'days_since_prior_order', 'product_name_0', 'product_name_1',
       'product_name_2', 'product_name_3', 'product_name_4', 'product_name_5',
       'product_name_6', 'product_name_7', 'product_name_8', 'product_name_9',
       'product_name_10', 'product_name_11', 'product_name_12', 'aisle_name_0',
       'aisle_name_1', 'aisle_name_2', 'aisle_name_3', 'aisle_name_4',
       'aisle_name_5', 'aisle_name_6', 'aisle_name_7', 'dept_name_0',
       'dept_name_1', 'dept_name_2', 'dept_name_3', 'dept_name_4',
       'prior_purchases', 'purchased_percent_prior', 'free', 'fresh', 'mix',
       'natural', 'organic', 'original', 'sweet', 'white', 'whole', 'rice',
       'fruit', 'gluten', 'dow_sin', 'dow_cos', 'hour_sin', 'hour_cos'],
      dtype='object')
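The column counts above follow from how binary encoding works: each category first gets an integer code, and that code is written out in base 2, one column per bit. So k categories need only about log2(k) columns; 8 bit columns can represent codes up to 255 and 13 can represent codes up to 8,191. A pure-Python sketch, assuming codes start at 1 (as I believe category_encoders' ordinal step does); `binary_encode` is a hypothetical helper:

```python
def binary_encode(values):
    """Map each distinct value to a 1-based integer code, then to a fixed-width list of bits."""
    codes = {}
    for v in values:
        if v not in codes:
            codes[v] = len(codes) + 1  # codes start at 1 in this sketch
    width = max(codes.values()).bit_length()  # bits needed for the largest code
    rows = [[int(b) for b in format(codes[v], f"0{width}b")] for v in values]
    return rows, width

rows, width = binary_encode(["produce", "dairy eggs", "snacks", "produce"])
print(width)  # → 2 (largest code is 3)
print(rows)   # → [[0, 1], [1, 0], [1, 1], [0, 1]]
```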

In [7]:
# Test out encoder performance in Bagging and RandomForest models. 
# These were overwhelmingly better than other models when I tried them out with a practice user.
# Standardize first (tree-based models like these don't actually require it, since their splits
# don't depend on feature scale). Don't bother yet with tuning model hyperparameters.

Xbin_train, Xbin_test, ybin_train, ybin_test = train_test_split(Xbin, y, test_size=0.3)

scaler = StandardScaler()
Xbin_train_scaled = scaler.fit_transform(Xbin_train)
Xbin_test_scaled = scaler.transform(Xbin_test)

bgg_clf = BaggingClassifier()
bgg_clf = bgg_clf.fit(Xbin_train_scaled, ybin_train)
ybin_pred = bgg_clf.predict(Xbin_test_scaled)
print('binary bagging conf matrix: ', metrics.confusion_matrix(ybin_test, ybin_pred))
print('binary bagging f1 score: ', metrics.f1_score(ybin_test, ybin_pred))
print('binary bagging kappa score: ', metrics.cohen_kappa_score(ybin_test, ybin_pred))
print('binary bagging roc_auc score: ', metrics.roc_auc_score(ybin_test, ybin_pred))

binary bagging conf matrix:  [[57890  1077]
 [ 5211  1292]]
binary bagging f1 score:  0.2912533814247069
binary bagging kappa score:  0.2515519082836193
binary bagging roc_auc score:  0.5902065402234833


In [8]:
rf_clf = RandomForestClassifier()
rf_clf = rf_clf.fit(Xbin_train_scaled, ybin_train)
ybin_pred = rf_clf.predict(Xbin_test_scaled)
print('binary RF conf matrix: ', metrics.confusion_matrix(ybin_test, ybin_pred))
print('binary RF f1 score: ', metrics.f1_score(ybin_test, ybin_pred))
print('binary RF kappa score: ', metrics.cohen_kappa_score(ybin_test, ybin_pred))
print('binary RF roc_auc score: ', metrics.roc_auc_score(ybin_test, ybin_pred))

binary RF conf matrix:  [[58252   715]
 [ 5149  1354]]
binary RF f1 score:  0.3159122725151656
binary RF kappa score:  0.2814589323552821
binary RF roc_auc score:  0.5980430842814234
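One caveat about these roc_auc numbers: `roc_auc_score` is given hard 0/1 predictions, so it scores a single threshold (it then equals balanced accuracy) rather than the full ROC curve. Passing the predicted probability of the positive class from `predict_proba` gives the usual AUC. A sketch of a reusable evaluation helper, with synthetic imbalanced data standing in for the encoded features and a stratified, seeded split so different encoders could be scored on the same rows; `evaluate` is a hypothetical name:

```python
from sklearn import metrics
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

def evaluate(clf, X_test, y_test):
    """Return the metrics used in this notebook; one ROC AUC from labels, one from probabilities."""
    y_pred = clf.predict(X_test)
    y_score = clf.predict_proba(X_test)[:, 1]  # probability of class 1 (reordered)
    return {
        "conf_matrix": metrics.confusion_matrix(y_test, y_pred),
        "f1": metrics.f1_score(y_test, y_pred),
        "kappa": metrics.cohen_kappa_score(y_test, y_pred),
        "roc_auc_labels": metrics.roc_auc_score(y_test, y_pred),  # single-threshold version
        "roc_auc_proba": metrics.roc_auc_score(y_test, y_score),  # full ROC curve
    }

# Synthetic, roughly 90/10 imbalanced data as a stand-in for the real features
X, y = make_classification(n_samples=5_000, n_features=10, weights=[0.9], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)
rf = RandomForestClassifier(random_state=0).fit(X_train, y_train)
scores = evaluate(rf, X_test, y_test)
print({k: v for k, v in scores.items() if k != "conf_matrix"})
```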


In [9]:
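# Try a Bayesian encoder. Start with LeaveOneOut. Even though it has less contamination
# than other Bayesian encoders, it's a good idea to split data first. 

Xloo_train, Xloo_test, yloo_train, yloo_test = train_test_split(X, y, test_size=0.3)

ce_loo = ce.leave_one_out.LeaveOneOutEncoder(cols=categorical_columns, random_state=43)
ce_loo.fit(Xloo_train, yloo_train)
# As I understand the category_encoders docs, transform() without y encodes rows with the full
# category target means (each row's own target included); passing yloo_train, or using
# fit_transform, is what applies leave-one-out to the training rows.
Xloo_train = ce_loo.transform(Xloo_train)
Xloo_test = ce_loo.transform(Xloo_test)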

Xloo_train

,user_id,order_by_user_sequence,days_since_prior_order,product_name,aisle_name,dept_name,prior_purchases,purchased_percent_prior,free,fresh,...,sweet,white,whole,rice,fruit,gluten,dow_sin,dow_cos,hour_sin,hour_cos
207885,0.102878,32,8.0,0.251969,0.101280,0.138683,2,0.062500,0,0,...,0,0,0,0,0,0,0.974928,-0.222521,-5.000000e-01,-8.660254e-01
2565,0.200077,5,6.0,0.126638,0.108402,0.124511,1,0.200000,0,0,...,0,0,0,0,0,0,-0.781831,0.623490,-8.660254e-01,5.000000e-01
74749,0.133904,8,9.0,0.096774,0.150568,0.138683,1,0.125000,0,0,...,0,0,0,0,0,0,0.000000,1.000000,-8.660254e-01,-5.000000e-01
52504,0.099180,43,1.0,0.093750,0.019435,0.034083,5,0.116279,0,0,...,0,0,0,0,0,0,-0.433884,-0.900969,1.224647e-16,-1.000000e+00
167960,0.185663,50,7.0,0.028571,0.138187,0.132272,3,0.060000,0,0,...,0,0,0,0,0,0,-0.974928,-0.222521,-7.071068e-01,-7.071068e-01
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
138319,0.067222,64,7.0,0.000000,0.074534,0.091347,1,0.015625,0,0,...,0,0,0,0,0,0,0.781831,0.623490,2.588190e-01,9.659258e-01
22463,0.128951,1,-1.0,0.046358,0.088060,0.138683,0,0.000000,0,0,...,0,0,0,0,0,0,-0.974928,-0.222521,9.659258e-01,2.588190e-01
110838,0.147027,15,30.0,0.128205,0.108402,0.124511,1,0.066667,0,0,...,0,0,0,0,0,0,-0.781831,0.623490,-7.071068e-01,7.071068e-01
89280,0.134167,45,6.0,0.222222,0.097122,0.088838,11,0.244444,0,0,...,0,0,0,0,0,0,-0.433884,-0.900969,5.000000e-01,-8.660254e-01
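Leave-one-out encoding does use the target; what it leaves out is each training row's own target value. For a row in category c, the encoding is (sum of y in c minus the row's own y) / (count of c minus 1). If training rows are instead encoded with the full category mean, each row's own label leaks into its feature, most obviously for categories that appear only once. A pandas sketch of both on made-up data; the column names and the global-mean fallback for singletons are assumptions of this sketch:

```python
import pandas as pd

train = pd.DataFrame({
    "user": ["a", "a", "a", "b", "b", "c"],  # 'c' appears only once
    "reordered": [1, 0, 1, 0, 0, 1],
})
grp = train.groupby("user")["reordered"]
total = grp.transform("sum")
count = grp.transform("count")

# Full-category mean: includes each row's own target (leaky for training rows)
train["full_mean"] = total / count
# Leave-one-out: exclude the row's own target; singletons fall back to the global mean here
loo = (total - train["reordered"]) / (count - 1)
train["loo"] = loo.where(count > 1, train["reordered"].mean())
print(train)
```

For user 'c', the full-mean encoding is exactly that row's label, which is the leak.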


In [10]:
# Now try this encoded data in models after standardization

scaler = StandardScaler()
Xloo_train_scaled = scaler.fit_transform(Xloo_train)
Xloo_test_scaled = scaler.transform(Xloo_test)

bgg_clf = BaggingClassifier()
bgg_clf = bgg_clf.fit(Xloo_train_scaled, yloo_train)
yloo_pred = bgg_clf.predict(Xloo_test_scaled)
print('LeaveOneOut bagging conf matrix: ', metrics.confusion_matrix(yloo_test, yloo_pred))
print('LeaveOneOut bagging f1 score: ', metrics.f1_score(yloo_test, yloo_pred))
print('LeaveOneOut bagging kappa score: ', metrics.cohen_kappa_score(yloo_test, yloo_pred))
print('LeaveOneOut bagging roc_auc score: ', metrics.roc_auc_score(yloo_test, yloo_pred))

LeaveOneOut bagging conf matrix:  [[57784  1356]
 [ 4829  1501]]
LeaveOneOut bagging f1 score:  0.3267660825078916
LeaveOneOut bagging kappa score:  0.2836907700887702
LeaveOneOut bagging roc_auc score:  0.6070980793159029


In [11]:
rf_clf = RandomForestClassifier()
rf_clf = rf_clf.fit(Xloo_train_scaled, yloo_train)
yloo_pred = rf_clf.predict(Xloo_test_scaled)
print('LeaveOneOut RF conf matrix: ', metrics.confusion_matrix(yloo_test, yloo_pred))
print('LeaveOneOut RF f1 score: ', metrics.f1_score(yloo_test, yloo_pred))
print('LeaveOneOut RF kappa score: ', metrics.cohen_kappa_score(yloo_test, yloo_pred))
print('LeaveOneOut RF roc_auc score: ', metrics.roc_auc_score(yloo_test, yloo_pred))

LeaveOneOut RF conf matrix:  [[58096  1044]
 [ 4729  1601]]
LeaveOneOut RF f1 score:  0.35676880222841234
LeaveOneOut RF kappa score:  0.317897202447431
LeaveOneOut RF roc_auc score:  0.6176347820605081


In [12]:
# Now see whether LeaveOneOut performs better if I set drop_invariant to True.

Xloo_train, Xloo_test, yloo_train, yloo_test = train_test_split(X, y, test_size=0.3)

ce_loo = ce.leave_one_out.LeaveOneOutEncoder(cols=categorical_columns, random_state=43,
                                            drop_invariant=True)
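ce_loo.fit(Xloo_train, yloo_train)
# transform() without y encodes the training rows with full-category means rather than leave-one-out
Xloo_train = ce_loo.transform(Xloo_train)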
Xloo_test = ce_loo.transform(Xloo_test)

Xloo_train

,user_id,order_by_user_sequence,days_since_prior_order,product_name,aisle_name,dept_name,prior_purchases,purchased_percent_prior,free,fresh,...,sweet,white,whole,rice,fruit,gluten,dow_sin,dow_cos,hour_sin,hour_cos
142503,0.174734,12,15.0,0.000000,0.105725,0.081617,1,0.083333,0,0,...,0,0,0,1,0,0,0.000000,1.000000,2.588190e-01,-9.659258e-01
148008,0.092706,25,10.0,0.138340,0.122594,0.125702,3,0.120000,0,0,...,0,0,0,0,0,0,0.781831,0.623490,7.071068e-01,-7.071068e-01
42984,0.117207,23,10.0,0.072368,0.089572,0.137006,2,0.086957,0,0,...,0,0,0,0,0,0,0.433884,-0.900969,-9.659258e-01,-2.588190e-01
210745,0.094412,2,21.0,0.333333,0.138936,0.137006,1,0.500000,0,0,...,0,0,0,0,0,0,-0.974928,-0.222521,1.224647e-16,-1.000000e+00
173982,0.118372,3,17.0,0.050000,0.066869,0.050336,1,0.333333,0,0,...,0,0,1,0,0,0,0.781831,0.623490,-9.659258e-01,2.588190e-01
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
86479,0.199451,26,7.0,0.055556,0.089572,0.137006,1,0.038462,0,0,...,0,0,0,0,0,0,-0.974928,-0.222521,-5.000000e-01,8.660254e-01
174893,0.118372,16,29.0,0.022727,0.064130,0.081617,1,0.062500,0,0,...,0,0,0,0,0,0,-0.433884,-0.900969,8.660254e-01,-5.000000e-01
115102,0.072350,40,19.0,0.034483,0.071550,0.081617,2,0.050000,0,0,...,0,0,0,0,0,0,0.000000,1.000000,-7.071068e-01,-7.071068e-01
172613,0.049054,12,2.0,0.000000,0.009091,0.032550,1,0.083333,0,0,...,0,0,0,0,0,0,0.781831,0.623490,-2.588190e-01,-9.659258e-01


In [13]:
scaler = StandardScaler()
Xloo_train_scaled = scaler.fit_transform(Xloo_train)
Xloo_test_scaled = scaler.transform(Xloo_test)

rf_clf = RandomForestClassifier()
rf_clf = rf_clf.fit(Xloo_train_scaled, yloo_train)
yloo_pred = rf_clf.predict(Xloo_test_scaled)
print('LOO drop_invariant conf matrix: ', metrics.confusion_matrix(yloo_test, yloo_pred))
print('LOO drop_invariant f1 score: ', metrics.f1_score(yloo_test, yloo_pred))
print('LOO drop_invariant kappa score: ', metrics.cohen_kappa_score(yloo_test, yloo_pred))
print('LOO drop_invariant roc_auc score: ', metrics.roc_auc_score(yloo_test, yloo_pred))

LOO drop_invariant conf matrix:  [[58096  1039]
 [ 4754  1581]]
LOO drop_invariant f1 score:  0.35309882747068677
LOO drop_invariant kappa score:  0.3142729521833262
LOO drop_invariant roc_auc score:  0.6159979683424727
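`drop_invariant=True` tells the encoder to drop output columns with zero (or near-zero) variance. The same check can be run directly on a frame to see whether any column would actually be dropped, which helps tell whether a score change came from the setting or just from a different random split. A pandas sketch on a made-up frame; `find_invariant_columns` is a hypothetical helper, and it checks for strictly constant columns rather than category_encoders' exact threshold:

```python
import pandas as pd

def find_invariant_columns(df):
    """Return the names of columns that hold a single value in every row."""
    return [col for col in df.columns if df[col].nunique(dropna=False) <= 1]

frame = pd.DataFrame({
    "organic": [0, 1, 0, 1],
    "gluten": [0, 0, 0, 0],  # constant: would be dropped
    "dow_sin": [0.0, 0.78, -0.43, 0.97],
})
print(find_invariant_columns(frame))  # → ['gluten']
```

Running it on the encoded training frame would show whether `drop_invariant` removed anything here.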


In [14]:
# Scores were very slightly lower with drop_invariant=True, but the displayed columns look the same
# as before and this run used a new random split, so the difference may just be split-to-split noise.
# Try some of the other Bayesian encoders. Start with the Target encoder.
# It has hyperparameters min_samples_leaf and smoothing that I could tune if the Target encoder
# seems worthwhile compared with the others. 

Xtar_train, Xtar_test, ytar_train, ytar_test = train_test_split(X, y, test_size=0.3)

ce_tar = ce.target_encoder.TargetEncoder(cols=categorical_columns)
ce_tar.fit(Xtar_train, ytar_train)
Xtar_train = ce_tar.transform(Xtar_train)
Xtar_test = ce_tar.transform(Xtar_test)

Xtar_train



,user_id,order_by_user_sequence,days_since_prior_order,product_name,aisle_name,dept_name,prior_purchases,purchased_percent_prior,free,fresh,...,sweet,white,whole,rice,fruit,gluten,dow_sin,dow_cos,hour_sin,hour_cos
192482,0.026677,10,2.0,9.615385e-02,0.074484,0.086256,1,0.100000,0,0,...,0,0,0,0,0,0,-0.433884,-0.900969,5.000000e-01,-0.866025
46172,0.068069,16,5.0,9.243697e-02,0.122973,0.125823,1,0.062500,0,0,...,0,0,0,0,0,0,-0.781831,0.623490,-8.660254e-01,0.500000
172513,0.053446,10,7.0,4.590839e-16,0.064567,0.056469,1,0.100000,0,0,...,0,0,0,0,0,0,0.000000,1.000000,-8.660254e-01,0.500000
187970,0.079087,48,7.0,1.162791e-01,0.041597,0.044824,7,0.145833,0,0,...,0,0,0,0,0,0,0.781831,0.623490,8.660254e-01,-0.500000
47198,0.068069,23,3.0,7.777778e-02,0.110469,0.125823,1,0.043478,0,0,...,0,0,0,0,0,0,0.000000,1.000000,-7.071068e-01,-0.707107
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
194792,0.040936,5,30.0,9.259259e-02,0.110469,0.125823,1,0.200000,0,0,...,0,0,0,0,0,0,-0.433884,-0.900969,-7.071068e-01,0.707107
80157,0.045455,19,14.0,4.273504e-02,0.061269,0.053123,2,0.105263,0,0,...,0,0,0,0,0,0,0.781831,0.623490,1.224647e-16,-1.000000
212662,0.072175,12,6.0,7.117438e-02,0.110469,0.125823,1,0.083333,0,0,...,0,0,0,0,0,0,-0.781831,0.623490,-8.660254e-01,-0.500000
8925,0.038462,2,30.0,8.969646e-05,0.110469,0.125823,0,0.000000,0,0,...,0,0,0,0,0,0,0.781831,0.623490,7.071068e-01,-0.707107


In [15]:
scaler = StandardScaler()
Xtar_train_scaled = scaler.fit_transform(Xtar_train)
Xtar_test_scaled = scaler.transform(Xtar_test)

rf_clf = RandomForestClassifier()
rf_clf = rf_clf.fit(Xtar_train_scaled, ytar_train)
ytar_pred = rf_clf.predict(Xtar_test_scaled)
print('Target RF conf matrix: ', metrics.confusion_matrix(ytar_test, ytar_pred))
print('Target RF f1 score: ', metrics.f1_score(ytar_test, ytar_pred))
print('Target RF kappa score: ', metrics.cohen_kappa_score(ytar_test, ytar_pred))
print('Target RF roc_auc score: ', metrics.roc_auc_score(ytar_test, ytar_pred))

Target RF conf matrix:  [[58258  1018]
 [ 4617  1577]]
Target RF f1 score:  0.3588576629878257
Target RF kappa score:  0.320919341297921
Target RF roc_auc score:  0.6187136643100778


In [16]:
# TargetEncoder seems to perform slightly better than LeaveOneOut, even on these metrics,
# which account for class imbalance (though the margin is small and each encoder used a different split).
# Try WeightofEvidence. It has some parameters that could be tuned, but don't bother for now. 

Xwoe_train, Xwoe_test, ywoe_train, ywoe_test = train_test_split(X, y, test_size=0.3)

ce_woe = ce.woe.WOEEncoder(cols=categorical_columns)
ce_woe.fit(Xwoe_train, ywoe_train)
Xwoe_train = ce_woe.transform(Xwoe_train)
Xwoe_test = ce_woe.transform(Xwoe_test)

Xwoe_train

,user_id,order_by_user_sequence,days_since_prior_order,product_name,aisle_name,dept_name,prior_purchases,purchased_percent_prior,free,fresh,...,sweet,white,whole,rice,fruit,gluten,dow_sin,dow_cos,hour_sin,hour_cos
79316,-0.722054,5,7.0,-0.396136,-0.485932,-0.658074,1,0.200000,0,0,...,0,0,0,0,0,0,0.781831,0.623490,1.224647e-16,-1.000000
156437,-0.455851,16,6.0,-0.628249,-0.367699,-1.153288,1,0.062500,0,0,...,0,0,0,0,0,0,-0.433884,-0.900969,9.659258e-01,-0.258819
214845,-0.340601,26,6.0,0.476094,0.469844,0.388614,2,0.076923,0,0,...,0,0,0,0,0,0,-0.974928,-0.222521,5.000000e-01,-0.866025
94258,-0.088164,36,5.0,-1.349567,-1.644169,-0.942742,1,0.027778,0,0,...,0,0,0,0,0,0,0.433884,-0.900969,1.224647e-16,-1.000000
71645,-0.208395,23,4.0,-0.810570,-0.485932,-0.658074,1,0.043478,0,0,...,0,0,0,0,0,0,-0.974928,-0.222521,-2.588190e-01,-0.965926
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
149969,-0.143982,17,4.0,-2.083536,0.469844,0.388614,1,0.058824,0,0,...,0,0,0,0,0,0,-0.974928,-0.222521,2.588190e-01,-0.965926
175072,0.276265,18,25.0,-0.901542,-0.315713,-0.197089,1,0.055556,0,0,...,0,0,0,0,0,0,-0.433884,-0.900969,-9.659258e-01,-0.258819
63739,-0.394849,5,20.0,-0.656420,-1.155050,-1.153288,1,0.200000,0,0,...,0,0,0,0,0,0,-0.781831,0.623490,8.660254e-01,-0.500000
191111,0.864465,2,7.0,0.801848,0.446848,0.347489,1,0.500000,0,0,...,0,0,0,0,0,0,0.433884,-0.900969,-7.071068e-01,0.707107
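The WOE values above are mostly negative, which fits how the encoding is defined: for each category it's the log of the category's share of all positives divided by its share of all negatives, so categories whose reorder rate is below the overall rate come out negative. A pandas sketch of that definition, assuming a small additive `regularization` term on the counts to avoid log(0) (category_encoders' `WOEEncoder` has a `regularization` parameter, but its exact formula may differ from this); the data are made up:

```python
import numpy as np
import pandas as pd

def woe_table(categories, y, regularization=1.0):
    """Weight of evidence per category: ln(share of all positives / share of all negatives)."""
    df = pd.DataFrame({"cat": categories, "y": y})
    stats = df.groupby("cat")["y"].agg(["sum", "count"])
    pos = stats["sum"]
    neg = stats["count"] - stats["sum"]
    share_pos = (pos + regularization) / (pos.sum() + 2 * regularization)
    share_neg = (neg + regularization) / (neg.sum() + 2 * regularization)
    return np.log(share_pos / share_neg)

cats = ["produce"] * 10 + ["snacks"] * 10
y = [1] * 6 + [0] * 4 + [1] * 1 + [0] * 9  # produce is reordered more often than snacks
print(woe_table(cats, y))
```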


In [17]:
scaler = StandardScaler()
Xwoe_train_scaled = scaler.fit_transform(Xwoe_train)
Xwoe_test_scaled = scaler.transform(Xwoe_test)

rf_clf = RandomForestClassifier()
rf_clf = rf_clf.fit(Xwoe_train_scaled, ywoe_train)
ywoe_pred = rf_clf.predict(Xwoe_test_scaled)
print('WeightofEvidence RF conf matrix: ', metrics.confusion_matrix(ywoe_test, ywoe_pred))
print('WeightofEvidence RF f1 score: ', metrics.f1_score(ywoe_test, ywoe_pred))
print('WeightofEvidence RF kappa score: ', metrics.cohen_kappa_score(ywoe_test, ywoe_pred))
print('WeightofEvidence RF roc_auc score: ', metrics.roc_auc_score(ywoe_test, ywoe_pred))

WeightofEvidence RF conf matrix:  [[57995  1018]
 [ 4829  1628]]
WeightofEvidence RF f1 score:  0.3576842799077227
WeightofEvidence RF kappa score:  0.3186168226815662
WeightofEvidence RF roc_auc score:  0.6174395177732184


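WeightOfEvidence performed slightly worse than the Target encoder and about on par with LeaveOneOut (marginally higher F1 and kappa, marginally lower ROC AUC). All of these differences are small, and each encoder was scored on a different random split, so they may not be meaningful. 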
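One way to check whether differences this small mean anything is to score the same model over several stratified folds and look at the spread; if the encoders' scores differ by less than that spread, the comparison isn't conclusive. A sketch with `cross_val_score`, on synthetic imbalanced data standing in for one encoded feature matrix:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic, roughly 90/10 imbalanced data as a stand-in for an encoded X
X, y = make_classification(n_samples=3_000, n_features=10, weights=[0.9], random_state=0)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)  # same folds for every encoder
f1_scores = cross_val_score(RandomForestClassifier(n_estimators=100, random_state=0),
                            X, y, cv=cv, scoring="f1")
print(f"F1 per fold: {np.round(f1_scores, 3)}")
print(f"mean {f1_scores.mean():.3f}, std {f1_scores.std():.3f}")
```

For the target-based encoders, the encoder would need to sit inside a `Pipeline` passed to `cross_val_score` so it is refit on each training fold; encoding the whole dataset first would leak the held-out folds' targets.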

Move forward with hyperparameter tuning. I've already decided to keep LOO's drop_invariant at its default of False; the only other LOO parameter that might be worth changing is sigma, which adds Gaussian noise to the training encodings to reduce over-fitting. For Target, min_samples_leaf and smoothing take values greater than 0 (int and float, respectively). I've seen examples with these set to 2 instead of 1 (the defaults have changed between category_encoders versions, so it's worth checking the installed one). Try a range of values to see what makes sense. 
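To get a feel for what min_samples_leaf and smoothing do before tuning them: the classic target-encoding scheme blends each category's mean with the overall mean using a sigmoid weight that depends on how many rows the category has. I believe category_encoders' `TargetEncoder` uses this form, though the details can vary by version. A numpy sketch; `smoothed_target_mean` is a hypothetical helper:

```python
import numpy as np

def smoothed_target_mean(category_mean, category_count, prior, min_samples_leaf=1, smoothing=1.0):
    """Blend a category's target mean with the overall (prior) mean.
    The weight on the category mean is 0.5 when category_count == min_samples_leaf and
    approaches 1 as the count grows; a larger smoothing makes that transition more gradual."""
    weight = 1.0 / (1.0 + np.exp(-(category_count - min_samples_leaf) / smoothing))
    return prior * (1.0 - weight) + category_mean * weight

prior = 0.097  # overall reorder rate
for count in [1, 2, 5, 20]:
    encoded = smoothed_target_mean(0.5, count, prior, min_samples_leaf=2, smoothing=2.0)
    print(count, round(encoded, 3))
```

Rare products or users get pulled toward the overall reorder rate; common ones keep close to their own rate.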

In [None]:
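min_samples_leaf_options = range(1, 11)
smoothing_options = [float(s) for s in range(1, 11)]
# Re-split the raw X (Xtar_train was overwritten with encoded values); a fixed seed keeps the rows the same
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

for leaf in min_samples_leaf_options:
    for smooth in smoothing_options:
        enc = ce.TargetEncoder(cols=categorical_columns, min_samples_leaf=leaf, smoothing=smooth)
        X_train_enc = enc.fit_transform(X_train, y_train)
        y_pred = RandomForestClassifier().fit(X_train_enc, y_train).predict(enc.transform(X_test))
        # (Ideally, tune on a validation split or with cross-validation rather than the test set.)
        print(leaf, smooth, metrics.f1_score(y_test, y_pred), metrics.cohen_kappa_score(y_test, y_pred))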

For resampling, under-sampling the non-reorders might actually be better than over-sampling the reorders, because I have such a big dataset; try out different methods and ratios of reordered to not reordered. And/or try SMOTE to generate synthetic minority-class samples, and try penalized SVM or other ways of making models pay more for errors on the minority class (reorders). 
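Random under-sampling of the majority class is simple enough to sketch in pandas: keep every reordered row and a random subset of non-reordered rows sized to hit a chosen ratio. It should only be applied to the training split, so the test set keeps the real class balance. A minimal sketch on synthetic data; `undersample` is a hypothetical helper (imbalanced-learn's `RandomUnderSampler` is a library alternative):

```python
import numpy as np
import pandas as pd

def undersample(df, target="reordered", ratio=1.0, random_state=0):
    """Keep all positive rows and sample ratio * n_positive negative rows (without replacement)."""
    pos = df[df[target] == 1]
    neg = df[df[target] == 0]
    n_neg = min(len(neg), int(round(ratio * len(pos))))
    neg_sample = neg.sample(n=n_neg, random_state=random_state)
    # Shuffle so the two classes aren't stacked in blocks
    return pd.concat([pos, neg_sample]).sample(frac=1, random_state=random_state)

rng = np.random.default_rng(0)
train = pd.DataFrame({"x": rng.normal(size=10_000),
                      "reordered": (rng.random(10_000) < 0.1).astype(int)})
balanced = undersample(train, ratio=2.0)  # 2 non-reorders per reorder
print(balanced["reordered"].value_counts())
```

The penalized route doesn't need resampling: scikit-learn's `RandomForestClassifier` and `SVC` accept `class_weight="balanced"`, which weights errors on the rarer class more heavily.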