# IEEE-CIS Fraud Detection

## Background

[IEEE-CIS Fraud Detection](https://www.kaggle.com/c/ieee-fraud-detection/overview) is a competition hosted by Kaggle for the [IEEE Computational Intelligence Society (IEEE-CIS)](https://cis.ieee.org/). The challenge requires participants to predict the probability that an online transaction is fraudulent, as denoted by the binary target `isFraud`.

## Dataset

The [data](https://www.kaggle.com/c/14242/download-all) is broken into two files: identity and transaction, which are joined by TransactionID; not all transactions have corresponding identity information.

### Import packages

In [1]:
from keras.callbacks import Callback
from keras.layers import Dense, Input, Dropout, BatchNormalization, Activation
from keras.models import Model
from keras.optimizers import Adam, Nadam
from keras.utils import plot_model
from keras.utils.generic_utils import get_custom_objects
from sklearn.metrics import roc_auc_score

import keras
import keras.backend as K
import numpy as np
import pandas as pd
import random
import tensorflow as tf
import warnings
warnings.filterwarnings("ignore")

np.random.seed(42) # NumPy
random.seed(42) # Python
tf.set_random_seed(42) # Tensorflow

Using TensorFlow backend.


### Data Loading and Preprocessing

A brief investigation showed no significant improvement from using all features, so only the features from the transaction table were used. The transaction data contains both categorical and numerical features:

- for numerical features:
    - a log transform was applied to skewed features to reduce the skew and bring them closer to a normal distribution
    - standardization was applied to improve the numerical conditioning of the optimization problem, which makes training better behaved.
- for categorical features:
    - one-hot encoding was applied using only the top 50 categories per feature, to limit sparsity.
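
The `dataproc` module is not shown here, so the following is only a minimal sketch of the three steps listed above, not the code actually used. The helper names `log_standardize` and `top_k_one_hot`, the `__other__` bucket for rare categories, and the toy values are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def log_standardize(col: pd.Series) -> pd.Series:
    """log1p-transform a non-negative, skewed column, then z-score it."""
    logged = np.log1p(col.astype(float))
    return (logged - logged.mean()) / logged.std(ddof=0)


def top_k_one_hot(col: pd.Series, k: int = 50, other: str = "__other__") -> pd.DataFrame:
    """One-hot encode the k most frequent categories.

    Assumption: everything outside the top k (including missing values)
    is collapsed into a single `other` bucket.
    """
    top = col.value_counts().nlargest(k).index
    reduced = col.where(col.isin(top), other)
    return pd.get_dummies(reduced, prefix=col.name)


# Toy data, not taken from the competition files.
demo = pd.DataFrame({
    "TransactionAmt": [10.0, 25.0, 40.0, 5000.0],
    "ProductCD": ["W", "W", "C", "H"],
})

amt = log_standardize(demo["TransactionAmt"])
cats = top_k_one_hot(demo["ProductCD"], k=1)  # k=1 only to keep the demo small
print(amt.round(3).tolist())
print(list(cats.columns))
```

In a real pipeline the mean, standard deviation, and top-category lists would be computed on the training split only and then reused for the validation and test splits.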

In [2]:
import dataproc
X_tr, X_val, X_test, y_tr, y_val, sub = dataproc.main(upsample=True)

### Modeling

Two loss functions were used: focal loss and binary cross-entropy.

In [3]:
# Compatible with tensorflow backend
class roc_callback(Callback):
    """Print the validation ROC AUC at the end of every epoch."""

    def __init__(self, training_data, validation_data):
        super().__init__()
        # Training data is stored for completeness; only validation data is scored.
        self.x = training_data[0]
        self.y = training_data[1]
        self.x_val = validation_data[0]
        self.y_val = validation_data[1]

    # The other Callback hooks are inherited as no-ops, so only this one is overridden.
    def on_epoch_end(self, epoch, logs=None):
        y_pred_val = self.model.predict(self.x_val)
        roc_val = roc_auc_score(self.y_val, y_pred_val)
        print('\rroc-auc_val: %s' % (str(round(roc_val, 4))), end=100*' '+'\n')
    
def focal_loss(gamma=2., alpha=.25):
    def focal_loss_fixed(y_true, y_pred):
        pt_1 = tf.where(tf.equal(y_true, 1), y_pred, tf.ones_like(y_pred))
        pt_0 = tf.where(tf.equal(y_true, 0), y_pred, tf.zeros_like(y_pred))
        return -K.mean(alpha * K.pow(1. - pt_1, gamma) * K.log(K.epsilon()+pt_1))-K.mean((1-alpha) * K.pow( pt_0, gamma) * K.log(1. - pt_0 + K.epsilon()))
    return focal_loss_fixed

def custom_gelu(x):
    return 0.5 * x * (1 + tf.tanh(tf.sqrt(2 / np.pi) * (x + 0.044715 * tf.pow(x, 3))))

get_custom_objects().update({'custom_gelu': Activation(custom_gelu)})
get_custom_objects().update({'focal_loss_fn': focal_loss()})
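
To make the relationship between the two losses concrete, here is a small NumPy sketch that mirrors the structure of `focal_loss` above. It is an illustration only: the names `binary_cross_entropy` and `focal_loss_np` are made up for this example, and predictions are clipped rather than offset by an epsilon inside the logarithm as the Keras version does.

```python
import numpy as np


def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy over all samples."""
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))


def focal_loss_np(y_true, y_pred, gamma=2.0, alpha=0.25, eps=1e-7):
    """Focal loss: cross-entropy with each term scaled by how wrong the prediction is."""
    y_pred = np.clip(y_pred, eps, 1 - eps)
    pos = alpha * (1 - y_pred) ** gamma * np.log(y_pred)        # terms for y = 1
    neg = (1 - alpha) * y_pred ** gamma * np.log(1 - y_pred)    # terms for y = 0
    return -np.mean(y_true * pos + (1 - y_true) * neg)


# Toy labels and predictions, chosen only for illustration.
y_true = np.array([1.0, 0.0, 0.0, 0.0])
easy = np.array([0.9, 0.1, 0.1, 0.1])   # confident and correct
hard = np.array([0.1, 0.9, 0.9, 0.9])   # confident and wrong

# Fraction of the cross-entropy that survives the focal weighting.
easy_ratio = focal_loss_np(y_true, easy) / binary_cross_entropy(y_true, easy)
hard_ratio = focal_loss_np(y_true, hard) / binary_cross_entropy(y_true, hard)
print(easy_ratio, hard_ratio)
```

With `gamma=0` and `alpha=0.5` the focal loss reduces to half the binary cross-entropy. With the default `gamma=2`, well-classified examples keep a much smaller share of their cross-entropy than misclassified ones, which is the property that makes focal loss attractive for a heavily imbalanced target.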



The model is a feed-forward network of fully connected (`Dense`) layers with batch normalization and dropout; the architecture is shown below:

In [4]:
def create_model(loss_fn):
    inps = Input(shape=(X_tr.shape[1],))
    x = Dense(512, activation=custom_gelu)(inps)
    x = BatchNormalization()(x)
    x = Dropout(0.3)(x)
    x = Dense(256, activation=custom_gelu)(x)
    x = BatchNormalization()(x)
    x = Dropout(0.2)(x)
    x = Dense(1, activation='sigmoid')(x)
    model = Model(inputs=inps, outputs=x)
    model.compile(
        optimizer=Nadam(),
        loss=[loss_fn]
    )
    model.summary()
    # Save a diagram of the architecture (file name no longer refers to an unrelated CNN/CIFAR-10 model)
    plot_model(model, to_file='model.png', show_shapes=True)
    return model

In [5]:
model_focal = create_model('focal_loss_fn')
model_bce = create_model('binary_crossentropy')


_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_1 (InputLayer)         (None, 504)               0         
_________________________________________________________________
dense_1 (Dense)              (None, 512)               258560    
_________________________________________________________________
batch_normalization_1 (Batch (None, 512)               2048      
_________________________________________________________________
dropout_1 (Dropout)          (None, 512)               0         
_________________________________________________________________
dense_2 (Dense)              (None, 256)               131328    
_________________________________________________________________
batch_normalization_2 (Batch (None, 256)               1024      
_________________________________________________________________
dropout_2 (Dropout)          (None, 256)               0         
__________
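
The `Param #` column of the summary can be reproduced by hand, which is a quick way to confirm the layer sizes. The sketch below assumes the usual counting rules (a dense layer has one weight per input-output pair plus one bias per output; batch normalization keeps four vectors per feature), and the helper names are hypothetical.

```python
def dense_params(n_in, n_out):
    """Weights (n_in * n_out) plus one bias per output unit."""
    return n_in * n_out + n_out


def batchnorm_params(n_features):
    """gamma, beta, moving mean, and moving variance for each feature."""
    return 4 * n_features


n_inputs = 504  # input width reported by the summary

counts = {
    "dense_1": dense_params(n_inputs, 512),
    "batch_normalization_1": batchnorm_params(512),
    "dense_2": dense_params(512, 256),
    "batch_normalization_2": batchnorm_params(256),
}
for name, n in counts.items():
    print(f"{name}: {n}")
```

Dropout layers add no parameters, which matches the zeros shown for `dropout_1` and `dropout_2`.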

In [6]:
# Train both models with identical settings; the callback prints validation ROC AUC per epoch.
model_bce.fit(
    X_tr, y_tr, epochs=8, batch_size=2048, validation_data=(X_val, y_val), verbose=True,
    callbacks=[roc_callback(training_data=(X_tr, y_tr), validation_data=(X_val, y_val))]
)
model_focal.fit(
    X_tr, y_tr, epochs=8, batch_size=2048, validation_data=(X_val, y_val), verbose=True,
    callbacks=[roc_callback(training_data=(X_tr, y_tr), validation_data=(X_val, y_val))]
)

Train on 912022 samples, validate on 118108 samples
Epoch 1/8
roc-auc_val: 0.9367                                                                                                    
Epoch 2/8
roc-auc_val: 0.9532                                                                                                    
Epoch 3/8
roc-auc_val: 0.9565                                                                                                    
Epoch 4/8
roc-auc_val: 0.9565                                                                                                    
Epoch 5/8
roc-auc_val: 0.9592                                                                                                    
Epoch 6/8
roc-auc_val: 0.9618                                                                                                    
Epoch 7/8
roc-auc_val: 0.9599                                                                                                    
Epoch 8/8
roc-auc_val: 0.9581         

<keras.callbacks.History at 0x19ad78e45f8>

In [7]:
val_preds_bce = model_bce.predict(X_val).flatten()
val_preds_focal = model_focal.predict(X_val).flatten()

Averaging the predictions of the same architecture trained with the two loss functions scored higher on the validation set than either model alone, and averaging their ranks (computed with `rankdata`) scored higher still.

In [8]:
from scipy.stats import rankdata

print('BCE preds: ', roc_auc_score(y_val, val_preds_bce))
print('Focal preds: ',roc_auc_score(y_val, val_preds_focal))
print('Averaging: ', roc_auc_score(y_val, val_preds_bce + val_preds_focal))
print('Rank averaging: ', roc_auc_score(y_val, rankdata(val_preds_bce, method='dense') + rankdata(val_preds_focal, method='dense')))

BCE preds:  0.9580845100827509
Focal preds:  0.9580170076420838
Averaging:  0.961566422819819
Rank averaging:  0.9624829550385177
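
A toy example may help show why rank averaging can behave differently from plain averaging. It is a sketch with made-up scores, not the notebook's predictions: when two models output scores on very different scales, the plain average follows the larger-scale model, while ranks put both models on an equal footing.

```python
import numpy as np
from scipy.stats import rankdata

# Made-up scores for four transactions from two hypothetical models.
scores_a = np.array([0.10, 0.20, 0.90, 0.15])      # spread over a wide range
scores_b = np.array([0.001, 0.004, 0.002, 0.003])  # compressed into a narrow range

# Plain averaging: model A dominates because its scores are much larger.
plain = (scores_a + scores_b) / 2

# Rank averaging: each model contributes its ordering only.
rank_sum = rankdata(scores_a, method='dense') + rankdata(scores_b, method='dense')
rank_blend = rank_sum / rank_sum.max()  # rescale to (0, 1], as in the submission step

print(np.argsort(plain))       # same ordering as model A alone
print(np.argsort(rank_blend))  # ordering influenced by both models
```

Because ROC AUC depends only on the ordering of the scores, summing instead of dividing by two, or rescaling by the maximum, does not change the metric.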


### Predicting

In [10]:
sub = sub.reset_index()
sub.loc[:, 'TransactionID'] = sub.loc[:, 'TransactionID'].astype(int)
sub.loc[:, 'isFraud'] = rankdata(model_bce.predict(X_test).flatten(), method='dense') + rankdata(model_focal.predict(X_test).flatten(), method='dense')
sub.loc[:, 'isFraud'] = sub.loc[:, 'isFraud']/sub.loc[:, 'isFraud'].max()
sub.to_csv('submission.csv', index=False)