# Predicting credit card fraud with autoencoders

Detecting credit card fraud is a hard problem:

- The data set is heavily imbalanced: fraud is a tiny share of all transactions.
- Labeled data is often of low quality, which makes fraud patterns hard to find.
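
To see why the imbalance matters, consider a model that labels every transaction as normal. The sketch below uses made-up counts (1,000 normal, 5 fraudulent), not figures from this data set, to show that such a model reaches high accuracy while catching no fraud at all.

```python
# Hypothetical counts, chosen only for illustration.
n_normal = 1000
n_fraud = 5

# True labels: 0 = normal, 1 = fraud.
y_true = [0] * n_normal + [1] * n_fraud

# A "model" that always predicts normal.
y_pred = [0] * len(y_true)

# Accuracy: share of all transactions labeled correctly.
correct = sum(t == p for t, p in zip(y_true, y_pred))
accuracy = correct / len(y_true)

# Recall: share of fraudulent transactions that were caught.
caught = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
recall = caught / n_fraud

print(f"accuracy: {accuracy:.3f}")  # accuracy: 0.995
print(f"recall:   {recall:.3f}")    # recall:   0.000
```

This is why accuracy alone is a poor yardstick for this kind of problem.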

We will try a neural network technique called *autoencoders*.

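Originally used for computer vision, autoencoders are neural networks trained to reconstruct their own input through a narrower hidden layer. The idea here is to train one on normal transactions only: transactions it reconstructs poorly are then treated as suspicious.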
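
The core idea, scoring a point by how badly it is reconstructed, can be sketched without a neural network. The example below is only an illustration under stated assumptions: it uses synthetic data and a linear stand-in (projection onto the top principal directions, computed with an SVD) in place of the Keras model built later in this notebook.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "normal" data: 4 features that really live near a 2-D plane.
latent = rng.normal(size=(500, 2))
mixing = rng.normal(size=(2, 4))
normal_points = latent @ mixing + 0.05 * rng.normal(size=(500, 4))

# Synthetic "anomalies": points without that structure.
odd_points = rng.normal(size=(20, 4)) * 3.0

# Linear stand-in for an autoencoder: keep the top 2 principal directions
# of the normal data (this is PCA, not a neural network).
mean = normal_points.mean(axis=0)
_, _, vt = np.linalg.svd(normal_points - mean, full_matrices=False)
components = vt[:2]  # shape (2, 4)

def reconstruction_error(points):
    """Encode to 2 dimensions, decode back, return the per-row mean squared error."""
    centered = points - mean
    encoded = centered @ components.T   # "encoder": 4 -> 2
    decoded = encoded @ components      # "decoder": 2 -> 4
    return np.mean((centered - decoded) ** 2, axis=1)

normal_error = reconstruction_error(normal_points)
odd_error = reconstruction_error(odd_points)

print("mean error, normal points:", normal_error.mean())
print("mean error, odd points:   ", odd_error.mean())
```

Points that follow the structure learned from normal data are reconstructed almost exactly, while the others are not. That gap is what the threshold later in the notebook relies on.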

In [1]:
import pandas as pd

In [2]:
df = pd.read_csv("./data/creditcard.csv")

FileNotFoundError: File b'./data/creditcard.csv' does not exist

In [None]:
df.head()

In [None]:
df.groupby(['Class'])['Class'].count()

We have 492 fraudulent transactions against roughly 284k normal ones, so fraud is a very small share of the data.

In [None]:
%matplotlib inline
# Bar chart of the class counts; a log scale keeps the small fraud bar visible.
df['Class'].value_counts().plot(kind='bar', logy=True)

In [None]:
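# Copy the slices so that later column assignments do not trigger
# pandas' SettingWithCopyWarning.
fraud = df[df['Class']==1].copy()
normal = df[df['Class']==0].copy()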

In [None]:
normal['Amount'].hist()

In [None]:
fraud['Amount'].hist()

In [None]:
from sklearn.preprocessing import StandardScaler

In [None]:
scl = StandardScaler()

In [None]:
normal['Amount'] = scl.fit_transform(normal['Amount'].values.reshape(-1,1))

In [None]:
normal.head()

In [None]:
X = normal.drop(['Class','Time'], axis=1)

In [None]:
from sklearn.model_selection import train_test_split

In [None]:
X_train, X_test = train_test_split(X, test_size=0.2)
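
Note that the scaler above was fitted on all normal transactions before the split, so the held-out rows influence the scaling statistics. One possible alternative is to split first and compute the mean and standard deviation from the training part only. The sketch below does this by hand with NumPy on synthetic amounts (the values are made up). `StandardScaler` does the equivalent when `fit` is called on the training split and `transform` on the rest.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic transaction amounts, for illustration only.
amounts = rng.exponential(scale=80.0, size=1000)

# Split first: 80% train, 20% test.
shuffled = rng.permutation(amounts)
train, test = shuffled[:800], shuffled[800:]

# Fit: the statistics come from the training part only.
mu = train.mean()
sigma = train.std()  # population std (ddof=0), which is what StandardScaler uses

# Transform: the same statistics are applied to both parts.
train_scaled = (train - mu) / sigma
test_scaled = (test - mu) / sigma

print("train mean/std:", train_scaled.mean(), train_scaled.std())
print("test mean/std: ", test_scaled.mean(), test_scaled.std())
```

The training part ends up with mean 0 and standard deviation 1 by construction. The test part will be close to that but not exactly on it, which is expected.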

## Building the model

Let's use `keras` to create our autoencoder network.


In [None]:
from keras.layers import Input, Dense
from keras.models import Model

In [None]:
X_train.shape

In [None]:
input_dim = X_train.shape[1]

In [None]:
inner_dim = int(input_dim/2)

In [None]:
inner_dim

In [None]:
input_layer = Input(shape=(input_dim,))

In [None]:
encoder = Dense(inner_dim, activation = 'tanh')(input_layer)

In [None]:
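# Linear output: the inputs are standardized and contain negative values,
# which a ReLU output could never reproduce.
decoder = Dense(input_dim, activation='linear')(encoder)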
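
The choice of output activation matters because a ReLU can only emit non-negative values, while a standardized column such as `Amount` is negative for every below-average value. The small NumPy sketch below, with made-up target values, compares the best reconstruction a ReLU output can reach with that of a linear output.

```python
import numpy as np

def relu(x):
    """Rectified linear unit: negative values are clipped to zero."""
    return np.maximum(x, 0.0)

# Made-up standardized values the decoder is asked to reproduce.
target = np.array([-1.5, -0.2, 0.0, 0.7])

# Best case: the layer's pre-activation already equals the target exactly.
pre_activation = target.copy()

relu_out = relu(pre_activation)   # negatives are lost
linear_out = pre_activation       # passed through unchanged

relu_mse = np.mean((target - relu_out) ** 2)
linear_mse = np.mean((target - linear_out) ** 2)

print("ReLU output:  ", relu_out, "MSE:", relu_mse)
print("linear output:", linear_out, "MSE:", linear_mse)
```

Even in this best case the ReLU output carries an error on every negative entry, so the reconstruction error would be inflated for normal and fraudulent transactions alike.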

In [None]:
autoencoder = Model(inputs = input_layer, outputs=decoder)

In [None]:
autoencoder.summary()

In [None]:
# Accuracy is not meaningful for a reconstruction task, so only the MSE loss is tracked.
autoencoder.compile(optimizer='adam', loss='mean_squared_error')

In [None]:
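# Input and target are the same: the network learns to reconstruct X_train.
# X_test (normal transactions only) is used to monitor the validation loss.
history = autoencoder.fit(X_train, X_train,
                          epochs=2,
                          batch_size=32,
                          validation_data=(X_test, X_test)).history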

In [None]:
import numpy as np

In [None]:
fraud['Amount'] = scl.transform(fraud['Amount'].values.reshape(-1,1))

In [None]:
true_fraud = fraud['Class'].values

In [None]:
X_fraud = fraud.drop(['Class','Time'], axis=1)

In [None]:
preds = autoencoder.predict(X_fraud)

In [None]:
# Reconstruction error per transaction: mean squared difference across features.
mse = np.mean(np.power(X_fraud - preds, 2), axis=1)

In [None]:
mse.describe()

In [None]:
threshold = 0.4
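
Rather than fixing the threshold by hand, one option is to derive it from the reconstruction errors of held-out normal transactions, for example by taking a high percentile. In this notebook that would mean computing the errors on `X_test`. The sketch below uses synthetic errors instead, and the 99th percentile is an arbitrary choice for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in for reconstruction errors on held-out normal transactions (synthetic).
normal_errors = rng.gamma(shape=2.0, scale=0.1, size=10_000)

# Accept that roughly 1% of normal transactions will be flagged.
threshold_p99 = np.percentile(normal_errors, 99)

# Share of normal transactions at or above the threshold (false positive rate).
flagged_share = np.mean(normal_errors >= threshold_p99)

print(f"threshold: {threshold_p99:.3f}")
print(f"share of normal transactions flagged: {flagged_share:.4f}")
```

Picking the percentile is a trade-off: a lower one catches more fraud but flags more legitimate transactions.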

In [None]:
suspicious = np.where(mse>=threshold, 1, 0)

In [None]:
suspicious

In [None]:
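# Every row here is a real fraud, so a mismatch is a missed fraud:
# this is the percentage of fraudulent transactions the model failed to flag.
np.sum(suspicious!=true_fraud)/len(true_fraud)*100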
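
The cell above only looks at fraudulent transactions, so it measures how many frauds are missed but not how many normal transactions are wrongly flagged. A fuller check would score normal and fraudulent transactions together and count both kinds of error. The sketch below shows the bookkeeping on a handful of made-up errors and labels. The threshold of 0.4 mirrors the one above, and all other values are illustrative.

```python
import numpy as np

# Made-up reconstruction errors and labels (1 = fraud), for illustration only.
errors = np.array([0.05, 0.10, 0.45, 0.20, 0.90, 0.30, 0.60, 0.08])
labels = np.array([0,    0,    0,    0,    1,    1,    1,    0])
threshold = 0.4

# Flag every transaction whose error reaches the threshold.
flagged = (errors >= threshold).astype(int)

tp = int(np.sum((flagged == 1) & (labels == 1)))  # frauds caught
fp = int(np.sum((flagged == 1) & (labels == 0)))  # normal transactions flagged
fn = int(np.sum((flagged == 0) & (labels == 1)))  # frauds missed

recall = tp / (tp + fn)      # share of frauds that were caught
precision = tp / (tp + fp)   # share of flags that were real frauds

print(f"tp={tp} fp={fp} fn={fn}")                           # tp=2 fp=1 fn=1
print(f"recall={recall:.2f} precision={precision:.2f}")    # recall=0.67 precision=0.67
```

With real data, the normal rows would come from `X_test`, which the autoencoder did not see during training.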