# DETECTING CREDIT CARD FRAUD WITH A ONE-CLASS SVM
### Daniel Loden, April 2017

# Overview


----------
The motivation for this project is to apply anomaly detection to identify credit card fraud. The model was trained and tested on a large set of anonymised credit card transactions.

Given the very low incidence of fraudulent transactions, the analysis uses a One-Class SVM. The model performed relatively well on recall, but precision was a shortcoming. In practice, however, model performance would be expected to improve with identifiable cardholder-level data.

# Data Processing


----------

The dataset consisted of 284,807 transactions, which were split evenly into training and test sets. Fraudulent transactions were then removed from the training set, so that the One-Class SVM was trained on legitimate transactions only.

Min-max normalisation was instantiated for the model pipeline.
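
As a minimal sketch of what min-max normalisation does, the example below rescales each feature with (x - min) / (max - min), using the minimum and maximum learned from the training data. The toy arrays are hypothetical values chosen for illustration and do not come from the credit card dataset.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Toy data (hypothetical values, not taken from the credit card dataset)
train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 40.0]])
test = np.array([[4.0, 25.0]])

# Fit on the training data only, then apply the same scaling to both sets
scaler = MinMaxScaler().fit(train)
train_scaled = scaler.transform(train)
test_scaled = scaler.transform(test)

# Manual equivalent, using the training minimum and maximum per column
col_min = train.min(axis=0)
col_max = train.max(axis=0)
manual = (train - col_min) / (col_max - col_min)

print(train_scaled)
print(test_scaled)
```

Because the scaler is fitted on the training data alone, test values that lie beyond the training range map outside [0, 1]. Placing the scaler inside a pipeline gives the same behaviour: it is fitted only when the pipeline is fitted.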

In [None]:
# Load packages
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import classification_report
from sklearn.svm import OneClassSVM
from sklearn.pipeline import Pipeline

# Load dataset
data = pd.read_csv('../input/creditcard.csv')
data.drop('Time', axis=1, inplace=True)

# Split into features and target
X = data.iloc[:, 0:29]  # .ix has been removed from pandas; .iloc selects by position
y = data.iloc[:, 29]

# Split into training and testing sets (50%, 50%)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.5, random_state = 2008)

# Remove fraudulent transactions from training data for the One-Class SVM
X_train_good = X_train[y_train == 0]
y_train_good = y_train[y_train == 0]

# Instantiate normalisation 
nrm = MinMaxScaler()

# Checking Incidence of Fraud


----------
The incidence of fraud is extremely low.   As such, supervised learning will be avoided in favour of an anomaly detection approach.

In [None]:
print('Proportion of fraudulent transactions', round(np.mean(y_train), 3))
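
To illustrate why a very low fraud incidence makes plain accuracy uninformative, here is a small sketch with hypothetical labels (2 fraudulent transactions in 1,000, an assumed figure for illustration only). A trivial "model" that never flags fraud scores very high accuracy while catching nothing.

```python
import numpy as np

# Hypothetical labels: 2 fraudulent transactions out of 1,000
y_true = np.zeros(1000, dtype=int)
y_true[:2] = 1

# A trivial "model" that labels every transaction as legitimate
y_pred = np.zeros_like(y_true)

accuracy = np.mean(y_pred == y_true)
recall = np.sum((y_pred == 1) & (y_true == 1)) / np.sum(y_true == 1)

print(f"accuracy={accuracy:.3f}, recall={recall:.3f}")
```

This is one reason recall and precision on the fraud class are more useful than accuracy for this kind of problem.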

# Train One-Class SVM


----------

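One-Class SVM was chosen because it learns a boundary around the legitimate transactions alone and flags anything falling outside it as an anomaly, which makes it useful for detecting rare events.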

In [None]:
# nu set by trial and error; random_state is omitted because recent
# scikit-learn versions no longer accept it for OneClassSVM
svm = OneClassSVM(nu=0.2)
svm_pl = Pipeline([('Normalise', nrm),
                   ('SVM', svm)])
svm_pl.fit(X_train_good)
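
The role of `nu` can be sketched on synthetic data. In scikit-learn, `nu` is an upper bound on the fraction of training errors and a lower bound on the fraction of support vectors, so the share of training points flagged as outliers tends to track it. The Gaussian data, the seed and the `gamma="scale"` setting below are assumptions made for illustration; they are not the settings used in this project.

```python
import numpy as np
from sklearn.svm import OneClassSVM

# Synthetic "normal" data (hypothetical, for illustration only)
rng = np.random.RandomState(0)
X_normal = rng.normal(size=(500, 2))

results = {}
for nu in (0.05, 0.2):
    model = OneClassSVM(nu=nu, gamma="scale").fit(X_normal)
    # predict returns -1 for points treated as outliers, 1 for inliers
    results[nu] = np.mean(model.predict(X_normal) == -1)
    print(f"nu={nu}: fraction of training points flagged = {results[nu]:.3f}")
```

A larger `nu` therefore flags more legitimate transactions, which trades precision for recall when the model is applied to unseen data.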

# Test Model


----------

The model detected most fraudulent transactions.  Whilst a large majority of legitimate transactions were correctly identified, the proportion flagged as fraudulent was nonetheless substantial.  

This model demonstrates the relevance of One-Class SVM to credit card fraud detection. In practice, higher levels of recall and precision would be necessary, and could likely be achieved by using identifiable cardholder data, which were not available for this project.

In [None]:
# Predict fraudulent transactions in test set
y_test_pred = svm_pl.predict(X_test) # Outputs -1 (outlier) or 1 (inlier)
y_test_pred = np.where(y_test_pred == -1, 1, 0) # Convert to 1 (fraud) or 0 (legitimate)

# Evaluate
print(classification_report(y_test, y_test_pred))
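
To make the recall and precision figures concrete, the sketch below maps One-Class SVM outputs to fraud labels and derives both metrics from a confusion matrix. The labels and raw predictions are hypothetical values chosen for illustration; they are not results from this model.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical true labels (1 = fraud) and raw One-Class SVM outputs
y_true = np.array([0, 0, 0, 0, 0, 0, 1, 1])
raw_pred = np.array([1, 1, 1, -1, -1, 1, -1, 1])

# Map -1 (outlier) to 1 (fraud) and 1 (inlier) to 0 (legitimate)
y_pred = np.where(raw_pred == -1, 1, 0)

# For binary labels, ravel() returns the counts in the order tn, fp, fn, tp
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

precision = tp / (tp + fp)  # share of flagged transactions that are fraud
recall = tp / (tp + fn)     # share of fraud that was flagged

print(f"tn={tn}, fp={fp}, fn={fn}, tp={tp}")
print(f"precision={precision:.3f}, recall={recall:.3f}")
```

When fraud is rare, even a small false positive rate among legitimate transactions can outnumber the true positives, which is how good recall can coexist with poor precision.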