#### Problem: Cluster-based Unsupervised Anomaly Detection 

Technique: One-class SVM

Language: Python

Library: svm.OneClassSVM (sklearn)

Source: https://www.kaggle.com/amarnayak/once-class-svm-to-detect-anomaly

Original paper of the technique: https://papers.nips.cc/paper/1723-support-vector-method-for-novelty-detection.pdf

Data at Kaggle: https://www.kaggle.com/mlg-ulb/creditcardfraud

### Import Libraries

In [1]:
import pandas as pd
import numpy as np
from sklearn import svm

### Load Data

In [2]:
BASE = 'D:\\ResearchDataGtx1060\\AnomalyDetectionData\\'

In [3]:
cc =  pd.read_csv(BASE+"creditcard.csv")
cc.head(5)

Unnamed: 0,Time,V1,V2,V3,V4,V5,V6,V7,V8,V9,...,V21,V22,V23,V24,V25,V26,V27,V28,Amount,Class
0,0.0,-1.359807,-0.072781,2.536347,1.378155,-0.338321,0.462388,0.239599,0.098698,0.363787,...,-0.018307,0.277838,-0.110474,0.066928,0.128539,-0.189115,0.133558,-0.021053,149.62,0
1,0.0,1.191857,0.266151,0.16648,0.448154,0.060018,-0.082361,-0.078803,0.085102,-0.255425,...,-0.225775,-0.638672,0.101288,-0.339846,0.16717,0.125895,-0.008983,0.014724,2.69,0
2,1.0,-1.358354,-1.340163,1.773209,0.37978,-0.503198,1.800499,0.791461,0.247676,-1.514654,...,0.247998,0.771679,0.909412,-0.689281,-0.327642,-0.139097,-0.055353,-0.059752,378.66,0
3,1.0,-0.966272,-0.185226,1.792993,-0.863291,-0.010309,1.247203,0.237609,0.377436,-1.387024,...,-0.1083,0.005274,-0.190321,-1.175575,0.647376,-0.221929,0.062723,0.061458,123.5,0
4,2.0,-1.158233,0.877737,1.548718,0.403034,-0.407193,0.095921,0.592941,-0.270533,0.817739,...,-0.009431,0.798278,-0.137458,0.141267,-0.20601,0.502292,0.219422,0.215153,69.99,0


In [4]:
# The column name 'Class' can cause naming conflicts, so rename it to 'Category'.
cc = cc.rename(columns={'Class': 'Category'})

In [5]:
# For convenience, split the dataframe cc by label.
nor_obs = cc.loc[cc.Category==0]    # Data frame with normal observations
ano_obs = cc.loc[cc.Category==1]    # Data frame with anomalous observations

The dataframe 'cc' is divided into three sets:

Training set: train_feature

Test observations/features: X_test

Test labels: Y_test

One-class SVM is trained on observations from a single class only. Here, the model is trained on the normal transactions among rows 0 to 200000 of the data. The remaining normal transactions are merged with the anomalous observations to create the test set.

In [6]:
# .loc slices by row label and includes both endpoints, so this keeps the
# normal transactions with labels 0 through 200000.
train_feature = nor_obs.loc[0:200000, :]
train_feature = train_feature.drop(columns='Category')

In [7]:
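# Creating test observations/features
# Note: .loc is end-inclusive, so if row 200000 is a normal transaction it
# appears in both the training slice and this test slice.
X_test_1 = nor_obs.loc[200000:, :].drop(columns='Category')
X_test_2 = ano_obs.drop(columns='Category')
# DataFrame.append was removed in pandas 2.0; pd.concat is the replacement.
X_test = pd.concat([X_test_1, X_test_2])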

In [8]:
# Y_test is used to evaluate the model.
# The labels of the remaining normal observations (from row 200000 onwards)
# are concatenated with the labels of the anomalous observations.
Y_1 = nor_obs.loc[200000:, 'Category']
Y_2 = ano_obs['Category']
Y_test = pd.concat([Y_1, Y_2])
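
The split described above can also be written with positional slicing, which avoids any overlap between the training and test rows. The following is only a minimal sketch on a made-up toy dataframe; the helper name `split_one_class` and the toy values are assumptions for illustration and are not part of the original notebook.

```python
import pandas as pd


def split_one_class(df, label_col, n_train):
    """Hypothetical helper: train on normal rows only, test on the rest."""
    normal = df[df[label_col] == 0]
    anomalous = df[df[label_col] == 1]

    # iloc is positional and end-exclusive, so the two slices never overlap.
    train = normal.iloc[:n_train]
    held_out = normal.iloc[n_train:]

    # The test set is the held-out normal rows plus every anomalous row.
    test = pd.concat([held_out, anomalous])

    X_train = train.drop(columns=label_col)
    X_test = test.drop(columns=label_col)
    y_test = test[label_col]
    return X_train, X_test, y_test


# Toy data standing in for the credit card dataframe (values are made up).
toy = pd.DataFrame({
    "Amount": [10.0, 12.5, 900.0, 8.0, 11.0, 9.5, 750.0, 10.5, 13.0, 9.0],
    "Category": [0, 0, 1, 0, 0, 0, 1, 0, 0, 0],
})

X_train, X_test, y_test = split_one_class(toy, "Category", n_train=5)
print(len(X_train), len(X_test), int(y_test.sum()))
```

With this variant the training set holds exactly `n_train` normal rows, whereas the label-based `.loc` slices in the notebook depend on where the anomalous rows fall in the index.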

### Train Model

In [9]:
# Setting the hyperparameters for One-Class SVM
# We tried various combinations of hyperparameters:
# kernel: linear, rbf, poly; gamma: 0.001, 0.0001; nu: 0.25, 0.5, 0.75, 0.95
# This combination gave the most accurate results.
# Note: scikit-learn uses gamma only for the rbf, poly and sigmoid kernels,
# so it has no effect with kernel='linear'.
oneclass = svm.OneClassSVM(kernel='linear', gamma=0.001, nu=0.95)

# Training the algorithm with the features.
# This stage is very time-consuming.
# On my laptop it took half an hour to train on 200,000 observations.
# For rbf, training takes even longer.
oneclass.fit(train_feature)

OneClassSVM(cache_size=200, coef0=0.0, degree=3, gamma=0.001, kernel='linear',
            max_iter=-1, nu=0.95, shrinking=True, tol=0.001, verbose=False)
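
To get a feel for the `nu` hyperparameter chosen above, it may help to recall that scikit-learn documents it as an upper bound on the fraction of training errors and a lower bound on the fraction of support vectors. The sketch below checks this on synthetic 2-D Gaussian data with an rbf kernel; the data, the kernel choice and the `nu` values are assumptions for illustration, not the notebook's settings.

```python
import numpy as np
from sklearn.svm import OneClassSVM

# Synthetic stand-in data (not the credit card dataset).
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(500, 2))

results = {}
for nu in (0.05, 0.25, 0.5):
    model = OneClassSVM(kernel="rbf", gamma="scale", nu=nu).fit(X_toy)

    # Fraction of training points the fitted model labels as outliers (-1).
    rejected = float(np.mean(model.predict(X_toy) == -1))

    # Fraction of training points kept as support vectors.
    sv_fraction = len(model.support_) / len(X_toy)

    results[nu] = (rejected, sv_fraction)
    print(f"nu={nu:.2f}  rejected={rejected:.3f}  support vectors={sv_fraction:.3f}")
```

On data like this, the rejected fraction tends to sit close to `nu`, which is one way to sanity-check a value before committing to a long training run.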

In [10]:
# Test the algorithm on the test set
fraud_pred = oneclass.predict(X_test)

In [11]:
# Check the number of outliers predicted by the algorithm
unique, counts = np.unique(fraud_pred, return_counts=True)
print (np.asarray((unique, counts)).T)

[[   -1   371]
 [    1 84821]]


In [12]:
#Convert Y-test and fraud_pred to dataframe for ease of operation
Y_test= Y_test.to_frame()
Y_test=Y_test.reset_index()
fraud_pred = pd.DataFrame(fraud_pred)
fraud_pred= fraud_pred.rename(columns={0: 'prediction'})

In [13]:
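## Performance check of the model
# Convention used here: the normal class (Category 0, predicted +1) is
# treated as "positive"; fraud (Category 1, predicted -1) is "negative".

TP = FN = FP = TN = 0
for j in range(len(Y_test)):
    if Y_test['Category'][j]== 0 and fraud_pred['prediction'][j] == 1:
        TP = TP+1    # normal transaction accepted as normal
    elif Y_test['Category'][j]== 0 and fraud_pred['prediction'][j] == -1:
        FN = FN+1    # normal transaction flagged as an outlier
    elif Y_test['Category'][j]== 1 and fraud_pred['prediction'][j] == 1:
        FP = FP+1    # fraud accepted as normal (missed)
    else:
        TN = TN +1   # fraud flagged as an outlier (caught)
print(TP, FN, FP, TN)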

84700 0 121 371
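
The same four counts can be obtained without an explicit loop. The sketch below uses `sklearn.metrics.confusion_matrix` on a small made-up example; the toy arrays and the variable names are assumptions for illustration. It maps the SVM output (+1 inlier, -1 outlier) onto the 0/1 label encoding first.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Toy labels and predictions (made up): 0 = normal, 1 = fraud.
y_true = np.array([0, 0, 0, 0, 1, 1, 1])
svm_pred = np.array([1, 1, -1, 1, -1, 1, -1])  # +1 = inlier, -1 = outlier

# Map the SVM output onto the label encoding: outlier (-1) -> 1 (fraud).
y_pred = np.where(svm_pred == -1, 1, 0)

# Rows are true labels, columns are predicted labels, in the order [0, 1].
cm = confusion_matrix(y_true, y_pred, labels=[0, 1])
normal_accepted, normal_flagged, fraud_missed, fraud_flagged = cm.ravel()
print(cm)
```

In the notebook's naming, `normal_accepted`, `normal_flagged`, `fraud_missed` and `fraud_flagged` correspond to TP, FN, FP and TN respectively.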


In [14]:
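# Performance metrics
# The normal class is treated as "positive" here, so sensitivity is the
# share of normal transactions accepted as normal, and specificity is the
# share of frauds flagged as outliers.
accuracy = (TP+TN)/(TP+FN+FP+TN)
print(accuracy)
sensitivity = TP/(TP+FN)
print(sensitivity)
specificity = TN/(TN+FP)
print(specificity)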

0.9985796788430839
1.0
0.7540650406504065
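
Because frauds make up well under 1% of this test set, accuracy is dominated by the normal class. As a complementary view, the sketch below recomputes the figures with fraud treated as the positive class, using only the four counts printed above; the variable names are chosen here for illustration.

```python
# Counts copied from the notebook output above.
normal_accepted = 84700  # normal, predicted +1
normal_flagged = 0       # normal, predicted -1
fraud_missed = 121       # fraud, predicted +1
fraud_flagged = 371      # fraud, predicted -1

# Recall for the fraud class: share of frauds that were flagged.
fraud_recall = fraud_flagged / (fraud_flagged + fraud_missed)

# Precision for the fraud class: share of flagged transactions that are frauds.
fraud_precision = fraud_flagged / (fraud_flagged + normal_flagged)

# F1 is the harmonic mean of precision and recall.
fraud_f1 = 2 * fraud_precision * fraud_recall / (fraud_precision + fraud_recall)

print(fraud_recall, fraud_precision, fraud_f1)
```

Fraud recall here equals the notebook's specificity, since both count the share of frauds that were flagged.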


### Simple Explanation of One-class SVM

The explanation below is quoted from: https://stats.stackexchange.com/questions/99162/what-is-one-class-svm-and-how-does-it-work

I will assume you understand how a standard SVM works. To summarise, it separates two classes using a hyperplane with the largest possible margin.

One-Class SVM is similar, but instead of using a hyperplane to separate two classes of instances, it uses a hypersphere to encompass all of the instances. Now think of the "margin" as referring to the outside of the hypersphere -- so by "the largest possible margin", we mean "the smallest possible hypersphere".

That's about it. Note the following facts, true of SVM, still apply to One-Class SVM:

If we insist that there are no margin violations, by seeking the smallest hypersphere, the margin will end up touching a small number of instances. These are the "support vectors", and they fully determine the model. As long as they are within the hypersphere, all of the other instances can be changed without affecting the model.

We can allow for some margin violations if we don't want the model to be too sensitive to noise.

We can do this in the original space, or in an enlarged feature space (implicitly, using the kernel trick), which can result in a boundary with a complex shape in the original space.

Note: this is my account of the model as described here. I believe this is the version of One-Class SVM proposed by Tax and Duin. There are other approaches, such as that of Schölkopf et al, which is similar, but instead of using a small hypersphere, it uses a hyperplane which is far from the origin; this is the version implemented by LIBSVM and thus scikit-learn.
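
For the hyperplane version implemented in scikit-learn, `decision_function` is documented as the signed distance to the separating hyperplane: positive for inliers and negative for outliers. The sketch below illustrates this on synthetic data; the data, kernel and probe points are assumptions for illustration only.

```python
import numpy as np
from sklearn.svm import OneClassSVM

# Synthetic training data: a single 2-D Gaussian blob around the origin.
rng = np.random.default_rng(42)
X_toy = rng.normal(size=(300, 2))

model = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05).fit(X_toy)

# Two probe points: one at the centre of the blob, one far away from it.
probes = np.array([[0.0, 0.0], [10.0, 10.0]])

scores = model.decision_function(probes)  # signed distance to the hyperplane
labels = model.predict(probes)            # +1 = inlier, -1 = outlier

for point, score, label in zip(probes, scores, labels):
    print(point, round(float(score), 4), int(label))
```

The raw scores can be useful when a different trade-off is needed than the default threshold at zero, for example by ranking transactions from most to least anomalous.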