<a href="https://colab.research.google.com/github/guilhermelaviola/PaymentFraudDetector/blob/main/FraudDetector.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Fraud detection is the process of using tools and procedures to identify and prevent the theft of money, information, and assets. It acts as a security barrier against many forms of fraud, ranging from minor infractions to felonies. This notebook trains a Logistic Regression classifier on a payment dataset to flag fraudulent transactions.

In [22]:
# Importing all the necessary libraries:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix

In [23]:
# Loading the dataset and previewing the first 10 rows:
df = pd.read_csv('payment-fraud.csv')
df.head(10)

Unnamed: 0,accountAgeDays,numItems,localTime,paymentMethod,paymentMethodAgeDays,label
0,29,1,4.745402,2,28.204861,0
1,725,1,4.742303,3,0.0,0
2,845,1,4.921318,1,0.0,0
3,503,1,4.886641,1,0.0,0
4,2000,1,5.040929,1,0.0,0
5,119,1,4.962055,2,0.0,0
6,2000,1,4.921349,2,0.0,0
7,371,1,4.876771,1,0.0,0
8,2000,1,4.748314,1,0.0,0
9,4,1,4.461622,1,0.0,0


In [24]:
# Splitting the data into training (67%) and test (33%) sets:
X_train, X_test, y_train, y_test = train_test_split(
    df.drop('label', axis=1), df['label'],
    test_size=0.33, random_state=17)
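
Because fraud is usually a rare class, a purely random split can leave the training and test sets with slightly different fraud ratios. The sketch below shows one way to keep the ratio consistent using the `stratify` parameter of `train_test_split`. The `toy` frame and its values are hypothetical stand-ins for `df`, not data from `payment-fraud.csv`.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy frame standing in for df: 1,000 rows, 2% labelled as fraud.
toy = pd.DataFrame({
    'accountAgeDays': range(1000),
    'label': [1] * 20 + [0] * 980,
})

# stratify keeps the share of fraud cases roughly equal in both subsets.
X_tr, X_te, y_tr, y_te = train_test_split(
    toy.drop('label', axis=1), toy['label'],
    test_size=0.33, random_state=17,
    stratify=toy['label'])

# Fraud ratio in each subset; both should sit close to the overall 2%.
print(y_tr.mean(), y_te.mean())
```

Applied to the notebook, this would amount to passing `stratify=df['label']` to the existing call.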

In [26]:
# This is a binary classification problem (fraud vs. not fraud), so we use
# Logistic Regression to train the fraud detection model:
clf = LogisticRegression().fit(X_train, y_train)
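
The features in this dataset sit on very different scales (for example, `accountAgeDays` reaches 2000 while `localTime` stays near 5), which can slow the convergence of Logistic Regression's solver. A minimal sketch of one common remedy, standardizing features inside a pipeline, is shown below. The toy arrays, the labelling rule, and the `max_iter=1000` setting are illustrative assumptions, not part of the original notebook.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(17)

# Hypothetical toy features on very different scales, loosely mimicking
# accountAgeDays (0 to 2000) and localTime (between 4 and 5).
X_toy = np.column_stack([rng.uniform(0, 2000, 500), rng.uniform(4, 5, 500)])
y_toy = (X_toy[:, 0] < 200).astype(int)  # made-up rule, for illustration only

# The scaler is fitted on the training data only, then applied before the model.
scaled_clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scaled_clf.fit(X_toy, y_toy)

print(scaled_clf.score(X_toy, y_toy))
```

In the notebook, the same pipeline could be fitted with `X_train` and `y_train` in place of the toy arrays.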

In [27]:
# Making predictions on the test set and measuring accuracy:
y_pred = clf.predict(X_test)
print(accuracy_score(y_test, y_pred))

1.0


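The model reaches 100% accuracy on the held-out test set. A perfect score is unusual and deserves scrutiny before concluding that Logistic Regression suits this problem: fraud datasets are typically heavily imbalanced, so accuracy alone says little, and a flawless result can also mean that the classes in this particular dataset are trivially separable.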
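
To put the accuracy figure in context, it helps to compare it with a baseline that always predicts the majority class. The sketch below uses the test-set class counts reported by the confusion matrix in this notebook (12,753 legitimate, 190 fraudulent). The rebuilt label vector and the placeholder feature matrix are assumptions made only so that the example runs on its own.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score

# Label vector rebuilt from the test-set class counts (assumption: order is irrelevant).
y = np.array([0] * 12753 + [1] * 190)
X = np.zeros((len(y), 1))  # placeholder features; the dummy model ignores them

# A classifier that always predicts the most frequent class ("not fraud").
dummy = DummyClassifier(strategy='most_frequent').fit(X, y)
baseline_accuracy = accuracy_score(y, dummy.predict(X))

print(f"{baseline_accuracy:.4f}")  # 0.9853
```

A model that never flags fraud would already score about 98.5% here, so accuracy needs to be read alongside per-class metrics.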

In [28]:
# Evaluating the model with a confusion matrix (rows: true labels, columns: predicted labels):
print(confusion_matrix(y_test, y_pred))

[[12753     0]
 [    0   190]]


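The confusion matrix shows that, of the 12,943 transactions in the test set, all 190 fraudulent ones were correctly flagged as fraud and all 12,753 legitimate ones were correctly classified as non-fraudulent. There are no false positives and no false negatives.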
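
For an imbalanced problem like this one, precision and recall on the fraud class are usually more informative than accuracy. As a small worked example, the sketch below derives both from the confusion matrix printed above. The matrix is retyped by hand here so the snippet is self-contained.

```python
import numpy as np

# Confusion matrix copied from the output above
# (scikit-learn layout: rows are true labels, columns are predicted labels).
cm = np.array([[12753, 0],
               [0, 190]])

# For a binary problem, ravel() yields TN, FP, FN, TP in that order.
tn, fp, fn, tp = cm.ravel()

precision = tp / (tp + fp)  # share of flagged transactions that are truly fraud
recall = tp / (tp + fn)     # share of fraudulent transactions that were flagged

print(precision, recall)  # 1.0 1.0
```

In the notebook itself, `classification_report(y_test, y_pred)` from `sklearn.metrics` would report the same quantities directly.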