**Project Title: Credit Card Fraud Detection**<br>
File No: 02

**Support Vector Machine** using the Gaussian radial basis function (RBF) kernel and the polynomial kernel

This notebook applies Support Vector Machine (SVM) classifiers with polynomial and radial basis function (RBF) kernels to detect credit card fraud. The data was balanced with random oversampling and standardized before the models were trained. Both kernels were evaluated on precision, recall, and F1-score.
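
As a rough illustration of what the two kernels compute, the sketch below evaluates both in plain Python. The formulas follow scikit-learn's documented definitions for `kernel="poly"` and `kernel="rbf"`; the function names and the `gamma` value are chosen here for illustration only (scikit-learn's default is `gamma="scale"`, which is derived from the data).

```python
import math

def poly_kernel(x, y, gamma=0.5, coef0=0.0, degree=3):
    """Polynomial kernel: (gamma * <x, y> + coef0) ** degree."""
    dot = sum(a * b for a, b in zip(x, y))
    return (gamma * dot + coef0) ** degree

def rbf_kernel(x, y, gamma=0.5):
    """Gaussian RBF kernel: exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

a = [1.0, 2.0]   # made-up feature vectors
b = [2.0, 0.0]

print(poly_kernel(a, b))  # grows with the dot product of the two vectors
print(rbf_kernel(a, b))   # decays with the squared distance between them
print(rbf_kernel(a, a))   # identical points give the maximum similarity
```

The RBF kernel depends only on distance, which is one reason feature scaling matters for it.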

In [1]:
# Import basic libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')

In [2]:
# mounting google drive
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [3]:
# Load the dataset (creditcard.csv)
df = pd.read_csv("/content/drive/MyDrive/Data Science Project/Credit Card Fraud Detection /creditcard.csv")


In [4]:
# View the first 5 rows
pd.set_option("display.max_columns",None)   # To display all the columns
df.head()

Unnamed: 0,Time,V1,V2,V3,V4,V5,V6,V7,V8,V9,V10,V11,V12,V13,V14,V15,V16,V17,V18,V19,V20,V21,V22,V23,V24,V25,V26,V27,V28,Amount,Class
0,0.0,-1.359807,-0.072781,2.536347,1.378155,-0.338321,0.462388,0.239599,0.098698,0.363787,0.090794,-0.5516,-0.617801,-0.99139,-0.311169,1.468177,-0.470401,0.207971,0.025791,0.403993,0.251412,-0.018307,0.277838,-0.110474,0.066928,0.128539,-0.189115,0.133558,-0.021053,149.62,0
1,0.0,1.191857,0.266151,0.16648,0.448154,0.060018,-0.082361,-0.078803,0.085102,-0.255425,-0.166974,1.612727,1.065235,0.489095,-0.143772,0.635558,0.463917,-0.114805,-0.183361,-0.145783,-0.069083,-0.225775,-0.638672,0.101288,-0.339846,0.16717,0.125895,-0.008983,0.014724,2.69,0
2,1.0,-1.358354,-1.340163,1.773209,0.37978,-0.503198,1.800499,0.791461,0.247676,-1.514654,0.207643,0.624501,0.066084,0.717293,-0.165946,2.345865,-2.890083,1.109969,-0.121359,-2.261857,0.52498,0.247998,0.771679,0.909412,-0.689281,-0.327642,-0.139097,-0.055353,-0.059752,378.66,0
3,1.0,-0.966272,-0.185226,1.792993,-0.863291,-0.010309,1.247203,0.237609,0.377436,-1.387024,-0.054952,-0.226487,0.178228,0.507757,-0.287924,-0.631418,-1.059647,-0.684093,1.965775,-1.232622,-0.208038,-0.1083,0.005274,-0.190321,-1.175575,0.647376,-0.221929,0.062723,0.061458,123.5,0
4,2.0,-1.158233,0.877737,1.548718,0.403034,-0.407193,0.095921,0.592941,-0.270533,0.817739,0.753074,-0.822843,0.538196,1.345852,-1.11967,0.175121,-0.451449,-0.237033,-0.038195,0.803487,0.408542,-0.009431,0.798278,-0.137458,0.141267,-0.20601,0.502292,0.219422,0.215153,69.99,0


In [7]:
# Separate the independent variables (inputs) X from the dependent variable (target) Y
X = df.drop("Class",axis=1)   # All input features, i.e. every column except 'Class'

Y = df["Class"]               # Target feature only

In [8]:
# Split the dataset into a training set (70%) and a test set (30%).
from sklearn.model_selection import train_test_split

X_train,X_test,Y_train,Y_test = train_test_split(X,Y,test_size=0.3,random_state=1)


In [10]:
# Importing the required libraries
from imblearn.over_sampling import RandomOverSampler
from sklearn.svm import SVC
from sklearn.metrics import classification_report,confusion_matrix
from sklearn.preprocessing import StandardScaler

In [13]:
# Checking value count
Y_train.value_counts()

Class,count
0,199007
1,357


In [14]:
# Creating object of RandomOverSampler class
ros = RandomOverSampler()

In [16]:
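# Oversample the minority class and store the balanced data.
# Note: the test set is oversampled here as well; evaluation is more commonly
# done on the original, untouched test distribution.
X_train_sample,Y_train_sample = ros.fit_resample(X_train,Y_train)
X_test_sample,Y_test_sample = ros.fit_resample(X_test,Y_test)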

In [17]:
# Again checking the value counts
pd.Series(Y_train_sample).value_counts()

Class,count
0,199007
1,199007


In [18]:
pd.Series(Y_test).value_counts()

Class,count
0,85308
1,135


In [19]:
pd.Series(Y_test_sample).value_counts()

Class,count
0,85308
1,85308


In [21]:
# Create object of StandardScaler
ss = StandardScaler()

In [22]:
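# Standardize the features of the training and test sets.
# Note: fit_transform re-fits the scaler on the test set, so the test data is
# scaled with its own statistics; ss.transform(X_test_sample) would reuse the
# statistics learned from the training set instead.
X_train_sample = ss.fit_transform(X_train_sample)
X_test_sample = ss.fit_transform(X_test_sample)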

In [26]:
X_train_sample

array([[ 0.87356627,  0.41163965, -0.44411576, ..., -0.08703438,
        -0.04117112, -0.36150262],
       [ 0.26451388,  0.44500494, -0.22654503, ..., -0.16865643,
        -0.00692955, -0.38408455],
       [ 1.51533988,  0.80824859, -0.47831878, ..., -0.14323745,
        -0.19111325, -0.41360506],
       ...,
       [ 1.17226726,  0.09224896,  0.4573147 , ...,  0.38833026,
        -0.182813  , -0.41772615],
       [-0.94758853,  0.06801478, -0.21827471, ...,  0.32817363,
        -0.55093272,  2.95282592],
       [ 1.07818854,  0.03075061, -0.04364377, ...,  1.00595674,
         0.82172281,  2.63907593]])
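
Cell 22 calls `fit_transform` on both sets, so the test data is standardized with its own mean and standard deviation. A common alternative is to learn the statistics on the training data only and reuse them for the test data. The sketch below shows that idea in plain Python on made-up numbers; the helper names are hypothetical, and with scikit-learn the equivalent would be fitting `ss` on the training set and calling `ss.transform(...)` on the test set.

```python
from statistics import fmean, pstdev

def fit_scaler(column):
    """Learn the mean and population standard deviation from training values."""
    return fmean(column), pstdev(column)

def transform(column, mean, std):
    """Standardize values with statistics that were learned elsewhere."""
    return [(v - mean) / std for v in column]

train_amounts = [10.0, 20.0, 30.0, 40.0]   # made-up training values
test_amounts = [25.0, 100.0]               # made-up test values

mean, std = fit_scaler(train_amounts)              # fit on the training data only
train_scaled = transform(train_amounts, mean, std)
test_scaled = transform(test_amounts, mean, std)   # reuse the training statistics

print(train_scaled)
print(test_scaled)
```

With this approach an unusually large test value stays unusually large after scaling, instead of being pulled toward zero by the test set's own statistics.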

In [34]:
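# Keep the first quarter of each oversampled set (99,503 of 398,014 training rows
# and 42,654 of 170,616 test rows), presumably to keep SVM training time manageable.
# Caution: the support column in the reports below (42,585 non-fraud vs 69 fraud)
# shows these leading rows are still heavily imbalanced, which is consistent with
# RandomOverSampler appending the duplicated fraud rows after the original rows.
X_train1 = X_train_sample[:99503]
Y_train1 = Y_train_sample[:99503]
X_test1  = X_test_sample[:42654]
Y_test1  = Y_test_sample[:42654]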
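
One way to see what the slice in cell 34 keeps is to mimic random oversampling on a toy label list. The sketch below assumes, as the class counts in the reports suggest for imbalanced-learn's `RandomOverSampler`, that duplicated minority rows are appended after the original rows. `random_oversample` is a hypothetical helper written for this illustration, not the library's implementation.

```python
import random
from collections import Counter

def random_oversample(labels, seed=0):
    """Append randomly duplicated minority labels until the classes are balanced."""
    rng = random.Random(seed)
    counts = Counter(labels)
    majority_size = max(counts.values())
    resampled = list(labels)                      # original rows stay in front
    for cls, n in counts.items():
        pool = [y for y in labels if y == cls]
        resampled += rng.choices(pool, k=majority_size - n)
    return resampled

labels = [0] * 95 + [1] * 5          # toy data with a 5% minority class
random.Random(1).shuffle(labels)

balanced = random_oversample(labels)
head = balanced[: len(labels)]       # slice from the front, like [:99503]

print(Counter(balanced))             # both classes now have 95 rows
print(Counter(head))                 # the front slice is still 95 vs 5
```

Under that assumption, shuffling the balanced data before slicing would keep both classes represented in the subset.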

**Support vector Machine (Kernel: poly)**

In [35]:
# Creating object of Support vector Machine (Poly kernel) algorithm
svc_poly = SVC(random_state=1,kernel="poly")

In [36]:
# Creating a user defined function
def create_model(model):
    """Fit the model on the training subset and report metrics on the test subset."""
    model.fit(X_train1, Y_train1)
    Y_pred = model.predict(X_test1)
    print(classification_report(Y_test1, Y_pred))
    print("Confusion Matrix: ")
    print(confusion_matrix(Y_test1, Y_pred))
    return model

In [37]:
# Calling the function
create_model(svc_poly)

              precision    recall  f1-score   support

           0       1.00      1.00      1.00     42585
           1       0.80      0.51      0.62        69

    accuracy                           1.00     42654
   macro avg       0.90      0.75      0.81     42654
weighted avg       1.00      1.00      1.00     42654

Confusion Matrix: 
[[42576     9]
 [   34    35]]
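
The fraud-class figures in the report can be reproduced by hand from the confusion matrix, which scikit-learn lays out with true classes as rows and predicted classes as columns. A minimal sketch using the numbers printed above:

```python
# Confusion matrix from the polynomial-kernel run: rows = true, columns = predicted
tn, fp = 42576, 9
fn, tp = 34, 35

precision = tp / (tp + fp)   # share of flagged transactions that were really fraud
recall = tp / (tp + fn)      # share of actual frauds that were caught
f1 = 2 * precision * recall / (precision + recall)

print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
# precision=0.80 recall=0.51 f1=0.62
```

A recall of 0.51 means 34 of the 69 fraudulent transactions in the test subset were missed.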


**Support vector Machine (Kernel: rbf)**

In [38]:
radial_svc = SVC(random_state=1,kernel="rbf")

# calling the function
create_model(radial_svc)

              precision    recall  f1-score   support

           0       1.00      1.00      1.00     42585
           1       0.85      0.65      0.74        69

    accuracy                           1.00     42654
   macro avg       0.92      0.83      0.87     42654
weighted avg       1.00      1.00      1.00     42654

Confusion Matrix: 
[[42577     8]
 [   24    45]]


### **Support Vector Machine** algorithm is not a suitable algorithm for this dataset.

Both kernels classified non-fraudulent transactions almost perfectly, but recall on the fraud class was only 0.51 with the polynomial kernel and 0.65 with the RBF kernel, so 34 and 24 of the 69 fraudulent test transactions were missed. The overall accuracy of 1.00 is therefore misleading on data this imbalanced. Given SVM's limitations on imbalanced datasets, alternative algorithms may be more effective for this task.
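
To make the point about accuracy concrete, the sketch below compares each model's accuracy with that of a trivial classifier that labels every transaction as non-fraudulent, using only the confusion matrices reported above. Balanced accuracy (the mean of the per-class recalls) is included as one imbalance-aware alternative; it is not a metric used in the original analysis, and `summarize` is a helper name chosen for this illustration.

```python
def summarize(tn, fp, fn, tp):
    """Return accuracy, all-negative baseline accuracy, and balanced accuracy."""
    total = tn + fp + fn + tp
    accuracy = (tn + tp) / total
    baseline = (tn + fp) / total                      # always predict class 0
    balanced = (tn / (tn + fp) + tp / (tp + fn)) / 2  # mean of per-class recall
    return accuracy, baseline, balanced

results = {
    "poly": summarize(42576, 9, 34, 35),
    "rbf": summarize(42577, 8, 24, 45),
}
for name, (acc, base, bal) in results.items():
    print(f"{name}: accuracy={acc:.4f} baseline={base:.4f} balanced={bal:.4f}")
```

Both models beat the all-negative baseline by less than 0.1 percentage points of accuracy, while balanced accuracy separates them clearly.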

