### Project - Online Payments Fraud Detection 

In [None]:
- Online payment systems have made payments much easier, but they have also led to an increase in payment fraud.
  Online payment fraud can happen to anyone using any payment system, especially when paying with a credit card.
  That is why detecting online payment fraud is very important for credit card companies: it ensures that customers are not
  charged for products and services they never purchased.
  In this project, we will detect online payment fraud with machine learning using Python.

In [1]:
import pandas as pd
import numpy as np

In [3]:
dtst = pd.read_csv("credit card.csv")
dtst.head()

Unnamed: 0,step,type,amount,nameOrig,oldbalanceOrg,newbalanceOrig,nameDest,oldbalanceDest,newbalanceDest,isFraud,isFlaggedFraud
0,1,PAYMENT,9839.64,C1231006815,170136.0,160296.36,M1979787155,0.0,0.0,0,0
1,1,PAYMENT,1864.28,C1666544295,21249.0,19384.72,M2044282225,0.0,0.0,0,0
2,1,TRANSFER,181.0,C1305486145,181.0,0.0,C553264065,0.0,0.0,1,0
3,1,CASH_OUT,181.0,C840083671,181.0,0.0,C38997010,21182.0,0.0,1,0
4,1,PAYMENT,11668.14,C2048537720,41554.0,29885.86,M1230701703,0.0,0.0,0,0


In [None]:
- To identify online payment fraud with machine learning, we need to train a model that classifies payments as fraudulent or
  non-fraudulent.
  Below are the columns of the dataset we are using :
1. step           : represents a unit of time where 1 step equals 1 hour
2. type           : type of online transaction
3. amount         : the amount of the transaction
4. nameOrig       : customer starting the transaction
5. oldbalanceOrg  : balance before the transaction
6. newbalanceOrig : balance after the transaction
7. nameDest       : recipient of the transaction
8. oldbalanceDest : initial balance of recipient before the transaction
9. newbalanceDest : the new balance of recipient after the transaction
10. isFraud       : whether the transaction is fraudulent (1) or not (0)

In [4]:
dtst.shape

(6362620, 11)

In [5]:
# Check for any null values present in the dataset
dtst.isnull().sum()

step              0
type              0
amount            0
nameOrig          0
oldbalanceOrg     0
newbalanceOrig    0
nameDest          0
oldbalanceDest    0
newbalanceDest    0
isFraud           0
isFlaggedFraud    0
dtype: int64

- So this dataset does not have any null values.
- Let's check the types of transactions in the dataset :

In [6]:
dtst.type.value_counts()  # Counts the rows for each transaction type

type
CASH_OUT    2237500
PAYMENT     2151495
CASH_IN     1399284
TRANSFER     532909
DEBIT         41432
Name: count, dtype: int64
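
The counts above can also be read as shares of the whole. The following is a minimal sketch on a made-up series (the values in `toy_types` are illustrative and are not taken from the dataset); it relies only on the `normalize` parameter of `value_counts`.

```python
import pandas as pd

# Toy stand-in for the "type" column; these values are made up for illustration.
toy_types = pd.Series(
    ["CASH_OUT", "PAYMENT", "CASH_OUT", "TRANSFER", "PAYMENT", "CASH_OUT"],
    name="type",
)

counts = toy_types.value_counts()                 # absolute count per category
shares = toy_types.value_counts(normalize=True)   # fraction of rows per category

print(counts)
print(shares.round(3))
```

Applied to the real data, the same call would be `dtst["type"].value_counts(normalize=True)`.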

In [7]:
import matplotlib.pyplot as plt

In [8]:
type_counts = dtst["type"].value_counts()    # Count of rows per transaction type

# type_counts.index holds the unique transaction types; type_counts.values holds their counts.
plot_df = pd.DataFrame({"Type": type_counts.index, "quantity": type_counts.values})

import plotly.express as px
figure = px.pie(plot_df,
                values="quantity",
                names="Type", hole=0.5,
                title="Distribution of Transaction Type",
                width=700, height=400)
figure.show()

In [None]:
- Now let’s look at the correlation between the features and the isFraud column :

In [None]:
dtst.dtypes

In [54]:
numeric_data = dtst.select_dtypes(include=[float, int])     # Select only numeric columns.

# Checking correlation 
correlation = numeric_data.corr()         # Compute the correlation matrix
print(correlation["isFraud"].sort_values(ascending=False))     # Sort by correlation with 'isFraud'.

# This helps identify which features are most strongly correlated with fraudulent transactions (the isFraud column).

isFraud           1.000000
amount            0.076688
isFlaggedFraud    0.044109
step              0.031578
oldbalanceOrg     0.010154
newbalanceDest    0.000535
oldbalanceDest   -0.005885
newbalanceOrig   -0.008148
Name: isFraud, dtype: float64
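
To make the correlation step easier to follow, here is a small sketch on synthetic data. The frame `toy`, the `noise` column, and the rule that labels the largest 5% of amounts as fraud are all assumptions made for illustration; they do not describe how fraud arises in the real dataset. Note that `corr()` computes Pearson correlation, which only captures linear association, so a low value does not by itself mean a feature is useless to a tree-based model.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 1000

# Synthetic data: one informative column and one pure-noise column.
toy = pd.DataFrame({
    "amount": rng.exponential(scale=1000.0, size=n),
    "noise": rng.normal(size=n),
})

# Hypothetical labelling rule (illustration only): the largest 5% of amounts are "fraud".
threshold = toy["amount"].quantile(0.95)
toy["isFraud"] = (toy["amount"] > threshold).astype(int)

# Correlation of every numeric feature with the target, strongest first.
corr_with_target = (
    toy.select_dtypes(include="number")
    .corr()["isFraud"]
    .drop("isFraud")
    .sort_values(ascending=False)
)
print(corr_with_target)
```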


In [None]:
Now let’s transform the categorical features into numerical values. Here we will also map the values of the isFraud column
to No Fraud and Fraud labels to make the output easier to read:

In [9]:
dtst["type"] = dtst["type"].map({"CASH_OUT":1,"PAYMENT":2, 
                                 "CASH_IN":3,"TRANSFER":4,
                                 "DEBIT":5})
dtst["isFraud"] = dtst["isFraud"].map({0:"No Fraud",1:"Fraud"})
print(dtst.head())

   step  type    amount     nameOrig  oldbalanceOrg  newbalanceOrig  \
0     1     2   9839.64  C1231006815       170136.0       160296.36   
1     1     2   1864.28  C1666544295        21249.0        19384.72   
2     1     4    181.00  C1305486145          181.0            0.00   
3     1     1    181.00   C840083671          181.0            0.00   
4     1     2  11668.14  C2048537720        41554.0        29885.86   

      nameDest  oldbalanceDest  newbalanceDest   isFraud  isFlaggedFraud  
0  M1979787155             0.0             0.0  No Fraud               0  
1  M2044282225             0.0             0.0  No Fraud               0  
2   C553264065             0.0             0.0     Fraud               0  
3    C38997010         21182.0             0.0     Fraud               0  
4  M1230701703             0.0             0.0  No Fraud               0  
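
One thing to keep in mind with `Series.map` and a dictionary: any value that is not a key of the dictionary becomes `NaN`. The sketch below shows this on a toy frame; the `"REFUND"` category and the names `toy`, `type_codes`, and `unmapped` are hypothetical and only serve the illustration.

```python
import pandas as pd

# Same encoding as in the notebook.
type_codes = {"CASH_OUT": 1, "PAYMENT": 2, "CASH_IN": 3, "TRANSFER": 4, "DEBIT": 5}

# "REFUND" is a made-up category that the dictionary does not cover.
toy = pd.DataFrame({"type": ["PAYMENT", "TRANSFER", "REFUND"]})
toy["type_code"] = toy["type"].map(type_codes)

# Categories that were silently turned into NaN by the mapping.
unmapped = toy.loc[toy["type_code"].isna(), "type"].unique().tolist()

print(toy)
print("Unmapped categories:", unmapped)
```

For the same reason, running the mapping cell a second time on the already-encoded `type` column would turn every value into `NaN`, because the integers 1 to 5 are not keys of the dictionary.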


### Online Payments Fraud Detection Model :

In [None]:
Now let’s train a classification model to classify fraud and non-fraud transactions. 
Before training the model, I will split the data into training and test sets:

In [13]:
import sklearn
from sklearn.model_selection import train_test_split

In [14]:
# Define features (x) and target variable (y)

x = np.array(dtst[["type", "amount", "oldbalanceOrg", "newbalanceOrig"]])   # Features
y = np.array(dtst[["isFraud"]])   # Target variable (a 2D column array of shape (n, 1))

#### Now let’s train the online payments fraud detection model :

In [15]:
from sklearn.tree import DecisionTreeClassifier

In [16]:
# Split the dataset into training and testing sets

xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.2, random_state=42)

In [20]:
# Print the shape of the split datasets

print(xtrain.shape, xtest.shape, ytrain.shape, ytest.shape)

(5090096, 4) (1272524, 4) (5090096, 1) (1272524, 1)
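
The shapes above show that `y` is a column array of shape `(n, 1)`. A common alternative is to flatten the target to 1D with `ravel()` before splitting. The sketch below does this on made-up arrays; the `stratify` argument is an addition that the notebook does not use, shown here as one possible way to keep the class proportions the same in both splits.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up data: 50 rows, 4 features, 10 of the rows labelled "Fraud".
X_toy = np.arange(200, dtype=float).reshape(50, 4)
y_2d = np.array([["No Fraud"]] * 40 + [["Fraud"]] * 10)   # shape (50, 1)
y_1d = y_2d.ravel()                                       # shape (50,)

# stratify=y_1d is an assumption for this sketch, not part of the original notebook.
X_train, X_test, y_train, y_test = train_test_split(
    X_toy, y_1d, test_size=0.2, random_state=42, stratify=y_1d
)

print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)
print("Fraud rows in test set:", (y_test == "Fraud").sum())
```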


In [21]:
print(x)
print(x.dtype)  # Check the data type of the array

[[2.00000000e+00 9.83964000e+03 1.70136000e+05 1.60296360e+05]
 [2.00000000e+00 1.86428000e+03 2.12490000e+04 1.93847200e+04]
 [4.00000000e+00 1.81000000e+02 1.81000000e+02 0.00000000e+00]
 ...
 [1.00000000e+00 6.31140928e+06 6.31140928e+06 0.00000000e+00]
 [4.00000000e+00 8.50002520e+05 8.50002520e+05 0.00000000e+00]
 [1.00000000e+00 8.50002520e+05 8.50002520e+05 0.00000000e+00]]
float64


In [22]:
for col in range(x.shape[1]):    # x.shape[1] gives the number of columns in x (features).
    print(f"Column {col}:")      # Prints the column index
    print(x[:, col])         # prints all values in that column
    print(f"Data type: {x[:, col].dtype}")      # x[:, col].dtype retrieves the data type of that column (e.g., int, float)

Column 0:
[2. 2. 4. ... 1. 4. 1.]
Data type: float64
Column 1:
[9.83964000e+03 1.86428000e+03 1.81000000e+02 ... 6.31140928e+06
 8.50002520e+05 8.50002520e+05]
Data type: float64
Column 2:
[1.70136000e+05 2.12490000e+04 1.81000000e+02 ... 6.31140928e+06
 8.50002520e+05 8.50002520e+05]
Data type: float64
Column 3:
[160296.36  19384.72      0.   ...      0.        0.        0.  ]
Data type: float64


In [23]:
# training a machine learning model

xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.2, random_state=42)
model = DecisionTreeClassifier()    # creates a decision tree model.
model.fit(xtrain, ytrain)           # trains the model using the training data.
print(model.score(xtest, ytest))

# .score(xtest, ytest) calculates the accuracy of the model: Accuracy = Correct Predictions/Total Predictions

0.9997060959164621
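
The accuracy formula in the comment above can be checked by hand. The sketch below uses made-up labels and a trivial "model" that never predicts fraud (both are assumptions for illustration); it shows that when fraud is rare, accuracy can be high even if no fraud is caught, which is why recall on the Fraud class and the confusion matrix are often inspected alongside it.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, recall_score

# Made-up labels: 95 legitimate transactions and 5 fraudulent ones.
y_true = np.array(["No Fraud"] * 95 + ["Fraud"] * 5)
# A trivial baseline that labels every transaction as "No Fraud".
y_pred = np.array(["No Fraud"] * 100)

# Accuracy = correct predictions / total predictions
manual_accuracy = (y_true == y_pred).mean()
accuracy = accuracy_score(y_true, y_pred)

# Share of actual fraud cases that were detected.
fraud_recall = recall_score(y_true, y_pred, pos_label="Fraud")

# Rows are true labels, columns are predicted labels, in the order given by `labels`.
cm = confusion_matrix(y_true, y_pred, labels=["No Fraud", "Fraud"])

print(manual_accuracy, accuracy, fraud_recall)
print(cm)
```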


#### Now let’s classify whether a transaction is fraudulent or not by feeding information about the transaction into the model :

In [24]:
# prediction
# features = [type, amount, oldbalanceOrg, newbalanceOrig]; type 4 is TRANSFER
features = np.array([[4, 9000.60, 9000.60, 0.0]])
print(model.predict(features))

['Fraud']
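
Because the model expects the encoded transaction type, it can be convenient to build the feature row from the type name rather than from the raw code. The following is a sketch only: `build_features` is a hypothetical helper, and `toy_model` is trained on four made-up rows so that the example runs on its own. It is not the model trained above.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Same encoding as used for the "type" column in the notebook.
TYPE_CODES = {"CASH_OUT": 1, "PAYMENT": 2, "CASH_IN": 3, "TRANSFER": 4, "DEBIT": 5}


def build_features(tx_type, amount, old_balance, new_balance):
    """Hypothetical helper: return a (1, 4) array in the training column order."""
    if tx_type not in TYPE_CODES:
        raise ValueError(f"Unknown transaction type: {tx_type!r}")
    return np.array(
        [[TYPE_CODES[tx_type], amount, old_balance, new_balance]], dtype=float
    )


# Tiny made-up training set: [type, amount, oldbalanceOrg, newbalanceOrig]
X_toy = np.array([
    [2, 100.0, 1000.0, 900.0],
    [2, 250.0, 5000.0, 4750.0],
    [4, 8000.0, 8000.0, 0.0],
    [1, 9000.0, 9000.0, 0.0],
])
y_toy = np.array(["No Fraud", "No Fraud", "Fraud", "Fraud"])
toy_model = DecisionTreeClassifier(random_state=0).fit(X_toy, y_toy)

features = build_features("TRANSFER", 9000.60, 9000.60, 0.0)
print(features.shape)
print(toy_model.predict(features))
```

Raising an error for an unknown type avoids silently passing a wrong or missing code to the model.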
