## <font color='green'> Binary Classification (Dense Matrix) </font>

### <font color='green'> 1. Description</font>

This notebook demonstrates credit card fraud detection with a decision tree classifier. Download the data manually from https://www.kaggle.com/mlg-ulb/creditcardfraud (registration required) and place the file (`creditcard.csv`) in the `datasets` directory.

The dataset contains transactions made with credit cards in September 2013 by European cardholders. It covers transactions that
occurred over two days, with 492 frauds out of 284,807 transactions, so the classes are highly imbalanced. It contains only numerical
input variables, which are the result of a PCA transformation.
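To make the imbalance concrete, the sketch below computes the fraud rate from the two counts quoted above. The variable names are illustrative and are not part of the original notebook. The all-negative baseline suggests why recall and F1 are worth reading next to accuracy later on.

```python
# Counts quoted in the dataset description above.
n_transactions = 284_807
n_fraud = 492

# Fraction of all transactions that are labelled as fraud.
fraud_rate = n_fraud / n_transactions
print("fraud rate: {:.3f}%".format(100 * fraud_rate))

# Accuracy of a trivial classifier that always predicts "Not Fraud".
# Any accuracy figure should be read against this baseline.
baseline_accuracy = 1 - fraud_rate
print("all-negative baseline accuracy: {:.2f}%".format(100 * baseline_accuracy))
```

With these counts the fraud rate is roughly 0.17%, so a model can score above 99.8% accuracy without detecting a single fraud.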

### <font color='green'> 2. Data Preprocessing</font>

In [1]:
# prepare data
import numpy as np
import pandas as pd
df = pd.read_csv('./datasets/creditcard.csv')
class_names = {0:'Not Fraud', 1:'Fraud'}
print(df.Class.value_counts().rename(index = class_names))

data_features = df.drop(['Time', 'Class'], axis=1).values
data_target = df['Class'].values

from sklearn.model_selection import train_test_split
np.random.seed(123)
X_train, X_test, y_train, y_test = train_test_split(data_features, data_target, train_size=0.70, test_size=0.30, random_state=1)

Not Fraud    284315
Fraud           492
Name: Class, dtype: int64


### <font color='green'> 3. Implementation using Frovedis</font>

In [2]:
# train
import os, time
from frovedis.exrpc.server import FrovedisServer
from frovedis.mllib.tree import DecisionTreeClassifier as frovDecisionTreeClassifier
FrovedisServer.initialize("mpirun -np 8 {}".format(os.environ['FROVEDIS_SERVER']))

model = frovDecisionTreeClassifier(max_depth=8)
t1 = time.time()
model.fit(X_train, y_train)
t2 = time.time()
print("train time: {:.3f} sec".format(t2-t1))

train time: 0.211 sec


In [3]:
# predict
from sklearn.metrics import confusion_matrix
from sklearn.metrics import f1_score, recall_score
pred = model.predict(X_test)
cmat = confusion_matrix(y_test, pred)
# rows are true classes, columns are predicted classes; class 1 (Fraud) is the positive class
tneg = cmat[0][0]
tpos = cmat[1][1]
fpos = cmat[0][1]
fneg = cmat[1][0]
f1Score = round(f1_score(y_test, pred), 2)
recallScore = round(recall_score(y_test, pred), 2)
print('confusion matrix:')
print(cmat)
print('Accuracy: '+ str(np.round(100*float(tpos+tneg)/float(tpos+tneg + fpos + fneg),2))+'%')
print("Recall : {recall_score}".format(recall_score = recallScore))
print("F1 Score : {f1_score}".format(f1_score = f1Score))

FrovedisServer.shut_down()

confusion matrix:
[[85292    16]
 [   34   101]]
Accuracy: 99.94%
Recall : 0.75
F1 Score : 0.8


### <font color='green'> 4. Implementation using scikit-learn</font>

In [4]:
# train
import os, time
from sklearn.tree import DecisionTreeClassifier as skDecisionTreeClassifier

model = skDecisionTreeClassifier(max_depth=8)
t1 = time.time()
model.fit(X_train, y_train)
t2 = time.time()
print("train time: {:.3f} sec".format(t2-t1))

train time: 5.762 sec


In [5]:
# predict
from sklearn.metrics import confusion_matrix
from sklearn.metrics import f1_score, recall_score
pred = model.predict(X_test)
cmat = confusion_matrix(y_test, pred)
# rows are true classes, columns are predicted classes; class 1 (Fraud) is the positive class
tneg = cmat[0][0]
tpos = cmat[1][1]
fpos = cmat[0][1]
fneg = cmat[1][0]
f1Score = round(f1_score(y_test, pred), 2)
recallScore = round(recall_score(y_test, pred), 2)
print('confusion matrix:')
print(cmat)
print('Accuracy: '+ str(np.round(100*float(tpos+tneg)/float(tpos+tneg + fpos + fneg),2))+'%')
print("Recall : {recall_score}".format(recall_score = recallScore))
print("F1 Score : {f1_score}".format(f1_score = f1Score))

confusion matrix:
[[85295    13]
 [   37    98]]
Accuracy: 99.94%
Recall : 0.73
F1 Score : 0.8
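
The two runs can be compared side by side by recomputing the metrics directly from the confusion matrices printed above. The following is a minimal sketch: the `results` dictionary and the `summarize` helper are illustrative names, and the numbers are copied from the outputs of this notebook. The timings come from a single run each, so the ratio is only indicative.

```python
import numpy as np

# Confusion matrices and train times copied from the cell outputs above.
# Rows are true classes, columns are predicted classes (0 = Not Fraud, 1 = Fraud).
results = {
    "frovedis": {"cmat": np.array([[85292, 16], [34, 101]]), "train_sec": 0.211},
    "scikit-learn": {"cmat": np.array([[85295, 13], [37, 98]]), "train_sec": 5.762},
}

def summarize(cmat):
    """Return (precision, recall, f1) for the positive class (Fraud)."""
    tn, fp, fn, tp = cmat.ravel()
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

for name, r in results.items():
    precision, recall, f1 = summarize(r["cmat"])
    print("{:<13} precision={:.2f} recall={:.2f} f1={:.2f} train={:.3f}s".format(
        name, precision, recall, f1, r["train_sec"]))

# Ratio of the two reported train times (single runs, not a rigorous benchmark).
speedup = results["scikit-learn"]["train_sec"] / results["frovedis"]["train_sec"]
print("train-time ratio (scikit-learn / frovedis): {:.1f}x".format(speedup))
```

On the reported outputs, both models reach similar recall and F1, while the reported train times differ by a factor of about 27.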
