In [1]:
import pandas as pd
df_train = pd.read_csv('/Users/samuelhosmer/Downloads/facebook-recruiting-iv-human-or-bot/train.csv')
df_bids = pd.read_csv('/Users/samuelhosmer/Downloads/facebook-recruiting-iv-human-or-bot/bids.csv') #add data to git?

We take on the Kaggle classification competition Human or Robot, which was hosted by Facebook in 2014. We are given two tabular datasets: one containing the labels, the other containing raw bid records from which features can be built. The goal is to produce a model that reliably distinguishes the bidding activity of humans from that of bots.

https://www.kaggle.com/c/facebook-recruiting-iv-human-or-bot/overview

In [2]:
df_train.info() 

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2013 entries, 0 to 2012
Data columns (total 4 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   bidder_id        2013 non-null   object 
 1   payment_account  2013 non-null   object 
 2   address          2013 non-null   object 
 3   outcome          2013 non-null   float64
dtypes: float64(1), object(3)
memory usage: 63.0+ KB


In [3]:
df_bids.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7656334 entries, 0 to 7656333
Data columns (total 9 columns):
 #   Column       Dtype 
---  ------       ----- 
 0   bid_id       int64 
 1   bidder_id    object
 2   auction      object
 3   merchandise  object
 4   device       object
 5   time         int64 
 6   country      object
 7   ip           object
 8   url          object
dtypes: int64(2), object(7)
memory usage: 525.7+ MB


As we can see, the labeled dataset is very small: 2,013 labeled bidders against 7,656,334 bids. Nevertheless, we will feed it to an ML algorithm, after first reducing noise by condensing the raw bids into per-bidder features.

To begin, we sort the bids by bidder and then by time. We then compute the time differences between each bidder's successive bids and store them in an intermediate column, from which we will generate new features.

In [4]:
df = df_bids.sort_values(['bidder_id','time'])
df['difference']=df.groupby('bidder_id')['time'].diff()
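
To make the intermediate column concrete, here is a minimal sketch on a toy frame. The two bidders and their timestamps are made up for illustration; only the sort-then-diff pattern is taken from the cell above.

```python
import pandas as pd

# Toy bids: two hypothetical bidders, rows deliberately out of time order.
toy = pd.DataFrame({
    "bidder_id": ["a", "b", "a", "a", "b"],
    "time":      [30, 5, 10, 60, 25],
})

# Same pattern as above: sort within bidder by time, then take successive differences.
toy = toy.sort_values(["bidder_id", "time"])
toy["difference"] = toy.groupby("bidder_id")["time"].diff()
print(toy)
```

The first bid of each bidder has no predecessor, so its difference is NaN. A bidder with a single bid therefore has no differences at all, which is one reason the time-difference features end up with fewer non-null entries than the labels.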

Now we build some sensible features to feed an algorithm.

In [5]:
df_avg_dif = df.groupby('bidder_id')['difference'].mean().reset_index().rename(columns={'difference':'avg_diff'}) 

df_min_dif = df.groupby('bidder_id')['difference'].min().reset_index().rename(columns={'difference':'min_diff'})

df_var_dif = df.groupby('bidder_id')['difference'].var().reset_index().rename(columns={'difference':'var_diff'})

In [6]:
df_de = df.groupby('bidder_id')['device'].nunique().reset_index().rename(columns={'device':'unique_devices'})

df_ip = df.groupby('bidder_id')['ip'].nunique().reset_index().rename(columns={'ip':'ip_count'})

df_ur = df.groupby('bidder_id')['url'].nunique().reset_index().rename(columns={'url':'url_count'})

df_au = df.groupby('bidder_id')['auction'].nunique().reset_index().rename(columns={'auction':'auc_count'})

Inspecting the data, two promising categorical features are merchandise and country. We write a simple function that returns a comma-separated string containing the 10 most frequent items in a given list.

In [7]:
import numpy as np
from collections import Counter

def ten_mc(lst):
    """Return the 10 most frequent items of lst as a comma-separated string."""
    cnt = Counter(lst)
    # most_common yields (item, count) pairs, most frequent first
    return ','.join(str(item) for item, _ in cnt.most_common(10))
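
As a quick illustration of what such a helper returns, the sketch below applies the same Counter logic to a made-up list of country codes. The name top_k and its k parameter are hypothetical and only serve this example.

```python
from collections import Counter

def top_k(items, k=10):
    # Hypothetical helper: same idea as ten_mc, with the cutoff as a parameter.
    return ",".join(str(item) for item, _ in Counter(items).most_common(k))

countries = ["us", "in", "us", "de", "in", "us"]
print(top_k(countries))        # us,in,de
print(top_k(countries, k=2))   # us,in
```

Items come back ordered from most to least frequent, so the string also encodes a ranking, although the encoding used later in this notebook ignores the order.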

We then apply our ten_mc function to obtain the 10 most common merchandise categories and countries for each bidder.

In [8]:
df_me = df.groupby('bidder_id')['merchandise'].apply(ten_mc).reset_index() 

In [9]:
df_me.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6614 entries, 0 to 6613
Data columns (total 2 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   bidder_id    6614 non-null   object
 1   merchandise  6614 non-null   object
dtypes: object(2)
memory usage: 103.5+ KB


In [10]:
df_me['merchandise'].apply(lambda x: len(x.split(','))).sum()

6615

It turns out that all but one bidder have exactly one recorded merchandise type (6,615 entries across 6,614 bidders), which is odd and inconvenient.
Thankfully, this isn't the case for the country feature.

In [11]:
df['country'] = df['country'].astype(str)
df_ct = df.groupby('bidder_id')['country'].apply(ten_mc).reset_index() 

In [12]:
df_ct.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6614 entries, 0 to 6613
Data columns (total 2 columns):
 #   Column     Non-Null Count  Dtype 
---  ------     --------------  ----- 
 0   bidder_id  6614 non-null   object
 1   country    6614 non-null   object
dtypes: object(2)
memory usage: 103.5+ KB


We also engineer some features to inspect how bidders behave within a given auction.

In [13]:
df_vtpa = df.groupby(by = ['bidder_id','auction'])['time'].var().reset_index() 
df_vtpa = df_vtpa.groupby('bidder_id')['time'].median().reset_index().rename(columns = {'time':'med_var_time_per_auc'}) 
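
The per-auction features in this and the following cells all use the same two-stage aggregation: first summarise each (bidder, auction) pair, then collapse those summaries to one value per bidder. A minimal sketch on made-up data, using distinct IPs per auction as the example statistic:

```python
import pandas as pd

# Toy bids for two hypothetical bidders across two auctions.
toy = pd.DataFrame({
    "bidder_id": ["a", "a", "a", "a", "b", "b"],
    "auction":   ["x", "x", "y", "y", "x", "x"],
    "ip":        ["1", "2", "3", "3", "9", "9"],
})

# Stage 1: one row per (bidder, auction) with the number of distinct IPs.
per_auction = (toy.groupby(["bidder_id", "auction"])["ip"]
                  .nunique()
                  .reset_index(name="ip_per_auc"))

# Stage 2: collapse to one row per bidder by averaging over that bidder's auctions.
per_bidder = (per_auction.groupby("bidder_id")["ip_per_auc"]
                         .mean()
                         .reset_index(name="avg_ip_per_auc"))
print(per_bidder)
```

Swapping nunique for var, min or count in stage 1, and mean for median in stage 2, gives the other variants.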

In [14]:
df_med_avg_dpa = df.groupby(by=['bidder_id','auction'])['difference'].mean().reset_index() 
df_med_avg_dpa = df_med_avg_dpa.rename(columns={'difference':'med_avg_dif_per_auc'})
df_med_avg_dpa = df_med_avg_dpa.groupby('bidder_id')['med_avg_dif_per_auc'].median().reset_index() 

In [15]:
df_med_min_dpa = df.groupby(by=['bidder_id','auction'])['difference'].min().reset_index()
df_med_min_dpa = df_med_min_dpa.rename(columns={'difference':'med_min_dif_per_auc'})
df_med_min_dpa = df_med_min_dpa.groupby('bidder_id')['med_min_dif_per_auc'].median().reset_index()

In [16]:
df_aupa = df.groupby(by=['bidder_id','auction'])['url'].nunique().reset_index()
df_aupa = df_aupa.rename(columns={'url':'avg_url_per_auc'})
df_aupa = df_aupa.groupby('bidder_id')['avg_url_per_auc'].mean().reset_index()


In [17]:
df_adpa = df.groupby(by=['bidder_id','auction'])['device'].nunique().reset_index()
df_adpa = df_adpa.rename(columns={'device':'med_avg_dev_per_auc'})
df_adpa = df_adpa.groupby('bidder_id')['med_avg_dev_per_auc'].median().reset_index()

In [18]:
df_avg_bid_per_auc = df.groupby(by=['bidder_id','auction'])['bid_id'].count().reset_index()
df_avg_bid_per_auc = df_avg_bid_per_auc.rename(columns={'bid_id':'avg_bid_per_auc'})
df_avg_bid_per_auc = df_avg_bid_per_auc.groupby('bidder_id')['avg_bid_per_auc'].mean().reset_index()

In [19]:
ip_per_auc = df.groupby(by = ['bidder_id','auction'])['ip'].nunique().reset_index()
ip_per_auc = ip_per_auc.drop(columns = ['auction'])
df_ic = ip_per_auc.groupby('bidder_id')['ip'].mean().reset_index().rename(columns = {'ip':'avg_ip_per_auc'})

In [20]:
dff = df_train.set_index('bidder_id')

In [21]:
dfs = [df_min_dif, df_avg_dif, df_med_min_dpa, df_med_avg_dpa, df_vtpa, df_avg_bid_per_auc, df_ip, df_ur, 
       df_ct, df_de, df_me, df_au, df_ic, df_aupa, df_adpa, df_var_dif]

for j in dfs:
    dff = dff.merge(j, how = 'left', on = 'bidder_id')
dft = dff.set_index('bidder_id').copy()
dft = dft.replace({pd.NA: np.nan})

In [22]:
dft[dft['outcome']==1].describe()

statistic,outcome,min_diff,avg_diff,med_min_dif_per_auc,med_avg_dif_per_auc,med_var_time_per_auc,avg_bid_per_auc,ip_count,url_count,unique_devices,auc_count,avg_ip_per_auc,avg_url_per_auc,med_avg_dev_per_auc,var_diff
count,103.0,98.0,98.0,98.0,98.0,98.0,103.0,103.0,103.0,103.0,103.0,103.0,103.0,103.0,98.0
mean,1.0,80021480.0,53323320000.0,25104990000.0,36399580000.0,6.745955e+24,23.154672,2387.796117,544.582524,163.61165,145.038835,12.062625,6.308186,1.84466,9.075986e+23
std,0.0,360723400.0,160672500000.0,202940000000.0,217284800000.0,9.595151e+24,42.999725,11269.674137,1163.909786,222.811854,195.103186,26.301128,11.697261,1.895595,2.496103e+24
min,1.0,0.0,99282300.0,0.0,99115440.0,2.751016e+19,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,5472533000000000.0
25%,1.0,0.0,5556730000.0,105263200.0,889431200.0,7.781177e+23,6.753145,34.0,4.5,4.5,23.0,2.249042,1.234326,1.0,4.905921e+20
50%,1.0,0.0,18749250000.0,1184211000.0,6427769000.0,5.089716e+24,11.863014,290.0,88.0,78.0,74.0,5.89,3.432271,1.0,1.7266e+22
75%,1.0,0.0,46518440000.0,3190789000.0,15144740000.0,1.044141e+25,22.253428,1089.0,591.0,219.0,170.5,11.909535,8.292066,2.0,4.14889e+23
max,1.0,3052632000.0,1516000000000.0,2007868000000.0,2140171000000.0,8.198439e+25,325.0,111918.0,8551.0,1144.0,1018.0,212.714286,109.0,15.0,1.853533e+25


In [23]:
dft[dft['outcome']==0].describe()

statistic,outcome,min_diff,avg_diff,med_min_dif_per_auc,med_avg_dif_per_auc,med_var_time_per_auc,avg_bid_per_auc,ip_count,url_count,unique_devices,auc_count,avg_ip_per_auc,avg_url_per_auc,med_avg_dev_per_auc,var_diff
count,1910.0,1584.0,1584.0,1584.0,1584.0,1319.0,1881.0,1881.0,1881.0,1881.0,1881.0,1881.0,1881.0,1881.0,1438.0
mean,0.0,1180262000000.0,3395101000000.0,1884196000000.0,2290625000000.0,1.15637e+26,6.441525,581.256247,335.187135,73.947368,58.070707,4.422164,2.666974,1.481393,1.009949e+26
std,0.0,6993803000000.0,8378402000000.0,7605833000000.0,7935211000000.0,3.610681e+26,29.986961,4140.67818,2735.527301,184.560908,142.933476,24.201059,6.301742,3.110539,2.727733e+26
min,0.0,0.0,70843430.0,0.0,62656640.0,1385042000000000.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1385042000000000.0
25%,0.0,52631580.0,100917000000.0,8315789000.0,35231360000.0,3.250935e+24,1.0,2.0,1.0,2.0,2.0,1.0,1.0,1.0,1.453019e+23
50%,0.0,789473700.0,709050600000.0,91947370000.0,232789500000.0,9.192128e+24,1.61039,11.0,4.0,8.0,9.0,1.4,1.172414,1.0,2.12242e+24
75%,0.0,27223680000.0,2763599000000.0,869815800000.0,1372539000000.0,1.668084e+25,3.75,88.0,34.0,52.0,41.0,3.0,2.0,1.0,3.398833e+25
max,0.0,76102950000000.0,76102950000000.0,76102950000000.0,76102950000000.0,2.590265e+27,1023.5,109159.0,81376.0,2618.0,1623.0,980.0,129.166667,82.0,2.377356e+27


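From the above per-label summary statistics, we choose the features that look most promising as inputs. For example, the median avg_bid_per_auc is about 11.9 for bots (outcome 1) versus about 1.6 for humans (outcome 0).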
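
One way to make this comparison less of an eyeballing exercise is to put the per-class medians side by side. The sketch below does this on made-up numbers. The ratio heuristic is an assumption for illustration, not the selection rule used in this notebook.

```python
import pandas as pd

# Toy per-bidder features with labels (1.0 = bot, 0.0 = human); values are invented.
toy = pd.DataFrame({
    "outcome":         [1.0, 1.0, 0.0, 0.0, 0.0],
    "avg_bid_per_auc": [12.0, 10.0, 1.0, 2.0, 3.0],
    "url_count":       [5.0, 3.0, 4.0, 2.0, 6.0],
})

# Median of every feature within each class.
medians = toy.groupby("outcome").median()

# Ratio of bot median to human median: values far from 1 suggest a separating feature.
ratio = medians.loc[1.0] / medians.loc[0.0]
print(ratio.sort_values(ascending=False))
```

Medians are used here rather than means because several features above are heavily skewed, with maxima orders of magnitude larger than their 75th percentiles.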

In [24]:
num_feat_nms = [ 'ip_count','avg_bid_per_auc','avg_ip_per_auc',
                'med_min_dif_per_auc', 'med_avg_dif_per_auc',
                 'med_var_time_per_auc','auc_count','avg_diff',
                'min_diff', 'var_diff','unique_devices']   
                                              
cat_feat_nms = ['country'] 

In [25]:
x_train = dft[cat_feat_nms + num_feat_nms ]
y_train = dft["outcome"]

To feed the country data to a scikit-learn algorithm, we first implement the following custom preprocessing transformer. The idea is to map each country to a unit coordinate vector in R^d, where d is the number of distinct countries in our data. For each bidder we then take the top 10 countries and sum the corresponding coordinate vectors, which gives a multi-hot vector. We accomplish this with CountVectorizer.

In [26]:
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.feature_extraction.text import CountVectorizer

class CountVectPre(BaseEstimator, TransformerMixin):
    """Multi-hot encode a single column of comma-separated country codes."""
    def fit(self, X, y=None):
        # Learn the country vocabulary once, in fit, so that later calls to
        # transform (e.g. on test data) produce the same columns.
        self.vec_ = CountVectorizer()
        self.vec_.fit(np.asarray(X)[:, 0])
        return self
    def transform(self, X):
        # The ColumnTransformer passes only the country column, at position 0.
        x = self.vec_.transform(np.asarray(X)[:, 0])
        return x.toarray().astype('float32')
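
To see the encoding on its own, here is a minimal sketch with three made-up bidders and their comma-separated top countries. Only CountVectorizer itself is taken from the cell above.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical "top countries" strings for three bidders.
top_countries = ["us,in", "de", "us,de,fr"]

vec = CountVectorizer()
X = vec.fit_transform(top_countries).toarray()

# Column order follows the learned vocabulary (alphabetical by token).
columns = sorted(vec.vocabulary_, key=vec.vocabulary_.get)
print(columns)   # ['de', 'fr', 'in', 'us']
print(X)
```

Each row is the sum of the unit vectors of that bidder's countries. Note that the default tokenizer only keeps tokens of two or more characters, which is fine for two-letter country codes but would silently drop single-character values.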
        

Now we're ready to build a preprocessing/cleaning pipeline to feed in our data.

In [27]:
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import OneHotEncoder

num_pipe = Pipeline([
        ("imputer", SimpleImputer(strategy="median")),
        ("scaler", StandardScaler())
    ])

cat_pipe = Pipeline([
        ("imputer", SimpleImputer(strategy="most_frequent")), 
        ("cat_encoder", CountVectPre()) 
    ])

In [28]:
from sklearn.compose import ColumnTransformer
full_pipe = ColumnTransformer([
            ("cat",cat_pipe,cat_feat_nms),
            ("num",num_pipe,num_feat_nms)
            
        ])

In [29]:
x_train = full_pipe.fit_transform(x_train)
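
The cell above fits the preprocessing once on the full training set, so the imputer and scaler see every fold during cross-validation. A common variation, sketched here on synthetic data, is to put the preprocessing and the model in one Pipeline so that each fold refits both. The data, step names and model settings below are assumptions for illustration, not this notebook's setup.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced stand-in for the bidder features.
X, y = make_classification(n_samples=300, n_features=8,
                           weights=[0.9, 0.1], random_state=1)
X[::15, 0] = np.nan  # inject some missing values

# Preprocessing and classifier travel together, so each CV fold
# fits the imputer and scaler on its own training split only.
model = Pipeline([
    ("imputer", SimpleImputer(strategy="median")),
    ("scaler", StandardScaler()),
    ("forest", RandomForestClassifier(n_estimators=50, random_state=1)),
])

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)
scores = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
print(scores.mean(), scores.std())
```

For tree ensembles the difference is usually small, since they are insensitive to feature scaling, but it can matter more for the neural net later on.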

In [30]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=1)
fc = RandomForestClassifier(max_features="sqrt",random_state=1)
fc.fit(x_train,y_train)

score_fc = cross_val_score(fc, x_train, y_train, cv=kfold, scoring="roc_auc")
print(score_fc.mean()) #.9265 before change .mean() to .median()
print(score_fc.std())

0.9290361732508329
0.047423782060115646


In [31]:
from sklearn.model_selection import RandomizedSearchCV, GridSearchCV

param_space_fc = {'criterion':['gini','entropy'], 'max_features': np.arange(1,13,2),
                  'n_estimators': [300,400,500]}

rnd_srch_fc = RandomizedSearchCV(fc, param_space_fc, cv=kfold, n_iter=30,
                                 scoring="roc_auc", verbose=0) #set verbose=3 to inspect
rnd_srch_fc.fit(x_train,y_train)

print(rnd_srch_fc.best_params_)

{'n_estimators': 400, 'max_features': 5, 'criterion': 'entropy'}


In [32]:
fcb = rnd_srch_fc.best_estimator_
fcb.fit(x_train,y_train)
#{'n_estimators': 400, 'max_features': 5, 'criterion': 'entropy'}

score_fcb = cross_val_score(fcb, x_train, y_train, cv=kfold, scoring="roc_auc")
print(score_fcb.mean()) #0.93326 before change in features
print(score_fcb.std()) 

0.9334316991908616
0.03870021426063308


And just like that, our cross-validated score would sit in the top 50 on the leaderboard of the Kaggle competition, assuming it carries over to the held-out test set. That is well within the top 10% of all participants who achieved scores above the sample submission benchmark (0.5 ROC AUC).

Instead of improving this model to attain a potentially higher score, let's test-drive a Multilayer Perceptron, i.e. a Neural Net, on the task.

Neural nets are universal approximators of continuous functions on compact subsets of R^n [cf. Hornik, Kurt. 1991. https://www.sciencedirect.com/science/article/abs/pii/089360809190009T], and are well suited to classification tasks. To test one, or lots, we use SciKeras, a Keras wrapper that lets us use the model selection functionality of scikit-learn.
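
For reference, the universal approximation property mentioned above is usually stated along the following lines for networks with a single hidden layer. This is a paraphrase under the common assumptions that the activation $\sigma$ is continuous, bounded and nonconstant, and that $K \subset \mathbb{R}^n$ is compact. The symbols $N$, $\alpha_i$, $w_i$ and $b_i$ are notation chosen for this note.

```latex
% For every continuous f : K -> R and every eps > 0 there exist
% a width N, output weights alpha_i, input weights w_i and biases b_i with
\sup_{x \in K} \Bigl|\, f(x) - \sum_{i=1}^{N} \alpha_i \, \sigma\bigl(w_i^{\top} x + b_i\bigr) \Bigr| < \varepsilon
```

The statement guarantees that a good approximation exists. It says nothing about how many hidden units are needed or whether training on a small sample will find it.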

In [33]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
import warnings

tf.get_logger().setLevel('ERROR')
warnings.filterwarnings("ignore", message="Setting the random state for TF")

keras.backend.clear_session()
np.random.seed(1)
tf.random.set_seed(1)

In [34]:
def clf_net(n_hidden,n_nodes,dropout,meta):
    n_features_in_ = meta["n_features_in_"]
    n_classes_ = meta["n_classes_"]
    model = keras.models.Sequential()
    model.add(layers.Input(shape=(n_features_in_,)))
    for k in range(n_hidden):
        model.add(layers.Dense(n_nodes, activation="relu"))
    model.add(layers.Dropout(dropout))
    model.add(layers.Dense(1, activation="sigmoid"))
    return model


In [35]:
from scikeras.wrappers import KerasClassifier

kc = KerasClassifier(model=clf_net,
                     optimizer="adam",
                     optimizer__learning_rate=8e-3,
                     loss="binary_crossentropy",
                     n_hidden=1,
                     n_nodes=128,
                     dropout=0.3,
                     metrics="AUC",
                     epochs=30,
                     verbose =0, #
                     random_state=1
                    )

kc.fit(x_train,y_train,verbose=False)



KerasClassifier(
	model=<function clf_net at 0x186fb45f0>
	build_fn=None
	warm_start=False
	random_state=1
	optimizer=adam
	loss=binary_crossentropy
	metrics=AUC
	batch_size=None
	validation_batch_size=None
	verbose=0
	callbacks=None
	validation_split=0.0
	shuffle=True
	run_eagerly=False
	epochs=30
	optimizer__learning_rate=0.008
	n_hidden=1
	n_nodes=128
	dropout=0.3
	class_weight=None
)

In [36]:
score_kc = cross_val_score(kc, x_train, y_train, cv=kfold, scoring="roc_auc")

print(score_kc.mean()) 
print(score_kc.std())


0.8003331746787244
0.05265215029050663


This is not terrible, but it is underwhelming. Neural networks are famously finicky; perhaps we can find the right hyperparameters and get some positive movement on the score. As the data is fairly imbalanced, we can also use the class_weight argument in an attempt to balance the classes without resampling [cf. https://keras.io/examples/structured_data/imbalanced_classification/].

In [37]:
pos = y_train.sum() #103
neg = len(y_train) - y_train.sum() #1910
weight_for_0 = 1
weight_for_1 = neg//pos
class_weight = {0: weight_for_0, 1: weight_for_1}
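
The arithmetic behind these weights can be checked by hand. The sketch below plugs in the label counts reported above (103 bots, 1910 humans) and compares the integer ratio with the common "balanced" heuristic, n_samples / (n_classes * class_count). That heuristic is shown as an alternative, not as what the cell above computes.

```python
pos, neg = 103, 1910          # label counts from the training set above
total = pos + neg

# Integer ratio, as in the cell above: each bot counts as much as 18 humans.
weight_for_1 = neg // pos
print(weight_for_1)           # 18

# "Balanced" heuristic: n_samples / (n_classes * class_count).
balanced = {0: total / (2 * neg), 1: total / (2 * pos)}
print(balanced)

# Only the ratio between the two weights matters for the balance,
# and here it equals neg / pos, slightly more than 18.
print(balanced[1] / balanced[0])
```

Both schemes upweight the minority class by roughly the same factor. The balanced version additionally keeps the total weight of the sample unchanged, which can interact differently with the learning rate.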

In [38]:
from sklearn.model_selection import RandomizedSearchCV, GridSearchCV
from scipy.stats import reciprocal

param_space_kc = { "n_hidden": [1,2,3], "n_nodes": [32,64,128],
                   "dropout": np.random.uniform(0,0.5, 20),
                   "optimizer__learning_rate": reciprocal(1e-5,1e-1),
                   "class_weight": [None, class_weight]
              }
              
rnd_srch_kc = RandomizedSearchCV(kc, param_space_kc, n_iter=30, 
                                 cv=kfold, scoring="roc_auc",
                                 verbose=0
                                ) 

                                 
rnd_srch_kc.fit(x_train,y_train)



RandomizedSearchCV(cv=StratifiedKFold(n_splits=10, random_state=1, shuffle=True),
                   estimator=KerasClassifier(dropout=0.3, epochs=30, loss='binary_crossentropy', metrics='AUC', model=<function clf_net at 0x186fb45f0>, n_hidden=1, n_nodes=128, optimizer='adam', optimizer__learning_rate=0.008, random_state=1, verbose=0),
                   n_iter=30,
                   param_distributions={'class_weight': [N...
       7.33779454e-02, 4.61692974e-02, 9.31301057e-02, 1.72780364e-01,
       1.98383737e-01, 2.69408367e-01, 2.09597257e-01, 3.42609750e-01,
       1.02226125e-01, 4.39058718e-01, 1.36937966e-02, 3.35233755e-01,
       2.08652401e-01, 2.79344914e-01, 7.01934693e-02, 9.90507445e-02]),
                                        'n_hidden': [1, 2, 3],
                                        'n_nodes': [32, 64, 128],
                                        'optimizer__learning_rate': <scipy.stats._distn_infrastructure.rv_frozen object at 0x187e7a910>},
                 

In [39]:
print(rnd_srch_kc.best_params_) #show best parameters

{'class_weight': None, 'dropout': 0.34260975019837975, 'n_hidden': 1, 'n_nodes': 128, 'optimizer__learning_rate': 0.00038967931033848135}
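
The search above draws the learning rate from scipy.stats.reciprocal, a log-uniform distribution, so each order of magnitude between 1e-5 and 1e-1 is equally likely to be tried. A minimal sketch that checks this empirically, with the sample size and seed chosen arbitrarily:

```python
import numpy as np
from scipy.stats import reciprocal

# Same distribution as in the search space above.
dist = reciprocal(1e-5, 1e-1)
samples = dist.rvs(size=10_000, random_state=1)

# Fraction of draws falling in each of the four decades
# [1e-5, 1e-4), [1e-4, 1e-3), [1e-3, 1e-2), [1e-2, 1e-1].
edges = np.logspace(-5, -1, 5)
counts, _ = np.histogram(samples, bins=edges)
fractions = counts / counts.sum()
print(fractions)   # each entry should be close to 0.25
```

A plain uniform distribution on the same interval would put about 90% of its draws in the top decade, which is why log-uniform sampling is the usual choice for learning rates.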


In [40]:
kcb = rnd_srch_kc.best_estimator_
kcb.fit(x_train,y_train)

KerasClassifier(
	model=<function clf_net at 0x186fb45f0>
	build_fn=None
	warm_start=False
	random_state=1
	optimizer=adam
	loss=binary_crossentropy
	metrics=AUC
	batch_size=None
	validation_batch_size=None
	verbose=0
	callbacks=None
	validation_split=0.0
	shuffle=True
	run_eagerly=False
	epochs=30
	optimizer__learning_rate=0.00038967931033848135
	n_hidden=1
	n_nodes=128
	dropout=0.34260975019837975
	class_weight=None
)

In [41]:
score_kcb = cross_val_score(kcb, x_train, y_train, cv=kfold, scoring="roc_auc")
#predict_kcb = cross_val_predict(kcb, x_train, y_train, cv=kfold, method="predict_proba")

print(score_kcb.mean()) #.8624 #'n_nodes': 232, 'n_hidden': 2, 'lrn_rate': 0.009133769442335437, 'drp_rt': 0.39647750153506484
print(score_kcb.std())


0.8570918610185625
0.05885763081323874


Not as much improvement as we had hoped for. Our Neural Net still lags far behind the bagged trees, and sits well below the median score of participants who scored above the sample submission benchmark.

Perhaps Neural Networks have a bias problem in the setting of small, imbalanced tabular datasets; this is something to think about.

We finish by checking if the most famous boosting algorithm can come reasonably close to the most famous bagging one.

In [42]:
from sklearn.ensemble import GradientBoostingClassifier
gc = GradientBoostingClassifier(n_estimators=200, random_state=1)
gc.fit(x_train,y_train)




score_gc = cross_val_score(gc,x_train,y_train,cv=kfold,scoring='roc_auc')
print(score_gc.mean()) 
print(score_gc.std())

0.905233222275107
0.05798335083042643


In [43]:
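# Note: 'deviance' is named 'log_loss' in scikit-learn >= 1.1. The first grid point,
# learning_rate=0, gives the trees zero weight (some scikit-learn versions reject it
# outright), so any candidate that draws it is wasted.
param_space_gc =  {'loss':['deviance','exponential'], 'n_estimators': np.arange(200,501,50),
             'max_depth':np.arange(1,5), 'learning_rate': np.linspace(0,.1,20)
                }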
              
              
rnd_srch_gc = RandomizedSearchCV(gc, param_space_gc, n_iter=30, 
                                 cv=kfold, scoring="roc_auc",
                                 verbose=0
                                ) 

                                 
rnd_srch_gc.fit(x_train,y_train)

RandomizedSearchCV(cv=StratifiedKFold(n_splits=10, random_state=1, shuffle=True),
                   estimator=GradientBoostingClassifier(n_estimators=200,
                                                        random_state=1),
                   n_iter=30,
                   param_distributions={'learning_rate': array([0.        , 0.00526316, 0.01052632, 0.01578947, 0.02105263,
       0.02631579, 0.03157895, 0.03684211, 0.04210526, 0.04736842,
       0.05263158, 0.05789474, 0.06315789, 0.06842105, 0.07368421,
       0.07894737, 0.08421053, 0.08947368, 0.09473684, 0.1       ]),
                                        'loss': ['deviance', 'exponential'],
                                        'max_depth': array([1, 2, 3, 4]),
                                        'n_estimators': array([200, 250, 300, 350, 400, 450, 500])},
                   scoring='roc_auc')

In [44]:
print(rnd_srch_gc.best_params_,"\n",rnd_srch_gc.best_score_) 
gcb = rnd_srch_gc.best_estimator_
# {'n_estimators': 250, 'max_depth': 3, 'loss': 'exponential', 'learning_rate': 0.021052631578947368} 0.9136387434554974

{'n_estimators': 250, 'max_depth': 3, 'loss': 'exponential', 'learning_rate': 0.021052631578947368} 
 0.9135292717753452


In [45]:
score_gcb = cross_val_score(gcb,x_train,y_train,cv=kfold,scoring="roc_auc")
print(score_gcb.mean()) 
print(score_gcb.std())

0.9135292717753452
0.052317718075555554


Not bad, but not great! This sits at about the 60th percentile of all participants above the sample benchmark, while our Random Forest Classifier is at about the 94th percentile.


We would want to experiment further with the Random Forest Classifier in order to achieve the best possible score, given the noise in our data.

Some discussion: 

Observing the variances of the Neural Net and the Gradient Boosting Classifier, it seems that both might benefit from a bit of bagging themselves. My suspicion is that the Gradient Boosting Classifier would be more likely to benefit from this, potentially making it competitive with the Random Forest Classifier, while the Neural Net would directly trade off its variance for bias.
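
A minimal sketch of how this bagging idea could be tried for the boosting model, using scikit-learn's BaggingClassifier on synthetic data. The data, ensemble sizes and seeds are assumptions for illustration, and the scores say nothing about what would happen on the competition data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, GradientBoostingClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic, imbalanced stand-in for the bidder features.
X, y = make_classification(n_samples=300, n_features=8,
                           weights=[0.9, 0.1], random_state=1)

booster = GradientBoostingClassifier(n_estimators=50, random_state=1)
# The base model is passed positionally because its keyword name
# differs between scikit-learn versions.
bagged = BaggingClassifier(booster, n_estimators=5, random_state=1)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)
results = {}
for name, model in [("boosting", booster), ("bagged boosting", bagged)]:
    results[name] = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
    print(name, results[name].mean(), results[name].std())
```

Comparing the standard deviations across folds, and not just the means, is the relevant check for the variance argument made above.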