# Understanding the Dataset

### To detect online payment fraud with machine learning, we train a model that classifies payments as fraudulent or non-fraudulent. This requires a dataset of labelled online transactions, so that the model can learn which kinds of transactions tend to be fraudulent.

### The dataset used here contains the following columns:

#### step: a unit of time, where 1 step equals 1 hour
#### type: the type of online transaction
#### amount: the amount of the transaction
#### nameOrig: the customer initiating the transaction
#### oldbalanceOrg: the originator's balance before the transaction
#### newbalanceOrig: the originator's balance after the transaction
#### nameDest: the recipient of the transaction
#### oldbalanceDest: the recipient's balance before the transaction
#### newbalanceDest: the recipient's balance after the transaction
#### isFraud: whether the transaction is fraudulent (1) or not (0)
#### isFlaggedFraud: a flag column that is also present in the data; it is not used as a feature in this notebook
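
Since one step equals one hour, a step value can be turned into a day and an hour of day. The helper below is a minimal sketch of that conversion. The function name `step_to_day_hour` is hypothetical, and it assumes that step 1 corresponds to hour 0 of day 1, which the dataset description does not state explicitly.

```python
def step_to_day_hour(step):
    """Convert a 1-based hourly step into (day, hour_of_day).

    Assumption: step 1 is hour 0 of day 1.
    """
    zero_based = step - 1
    day = zero_based // 24 + 1   # days are counted from 1
    hour = zero_based % 24       # hours run from 0 to 23
    return day, hour


first = step_to_day_hour(1)
end_of_first_day = step_to_day_hour(24)
start_of_second_day = step_to_day_hour(25)
```

A derived hour-of-day column of this kind could be tried as an extra feature, but the model in this notebook does not use `step` at all.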

# ---------------------------------------------------------------------------------------

# Importing Libraries & Dataset

In [1]:
import pandas as pd
import numpy as np
import pickle
import warnings
warnings.filterwarnings('ignore')
import plotly.express as px
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import roc_auc_score
from sklearn.ensemble import GradientBoostingRegressor

In [2]:

"""
with open("score.csv","+a") as file:
    writer = csv.writer(file)
    writer.writerow(["Module","Accuracy Score","Precision Score","Recall Score","F1 Score"])
"""

'\nwith open("score.csv","+a") as file:\n    writer = csv.writer(file)\n    writer.writerow(["Module","Accuracy Score","Precision Score","Recall Score","F1 Score"])\n'

In [3]:
data = pd.read_csv(r"C:\Users\ASUS\Desktop\Project\CC\onlinefraud.csv")
print(data.head())

   step      type    amount     nameOrig  oldbalanceOrg  newbalanceOrig  \
0     1   PAYMENT   9839.64  C1231006815       170136.0       160296.36   
1     1   PAYMENT   1864.28  C1666544295        21249.0        19384.72   
2     1  TRANSFER    181.00  C1305486145          181.0            0.00   
3     1  CASH_OUT    181.00   C840083671          181.0            0.00   
4     1   PAYMENT  11668.14  C2048537720        41554.0        29885.86   

      nameDest  oldbalanceDest  newbalanceDest  isFraud  isFlaggedFraud  
0  M1979787155             0.0             0.0        0               0  
1  M2044282225             0.0             0.0        0               0  
2   C553264065             0.0             0.0        1               0  
3    C38997010         21182.0             0.0        1               0  
4  M1230701703             0.0             0.0        0               0  


In [4]:
data.shape

(6362620, 11)

In [5]:
data.dtypes

step                int64
type               object
amount            float64
nameOrig           object
oldbalanceOrg     float64
newbalanceOrig    float64
nameDest           object
oldbalanceDest    float64
newbalanceDest    float64
isFraud             int64
isFlaggedFraud      int64
dtype: object

In [6]:
# Checking Null Values

print(data.isnull().sum())

step              0
type              0
amount            0
nameOrig          0
oldbalanceOrg     0
newbalanceOrig    0
nameDest          0
oldbalanceDest    0
newbalanceDest    0
isFraud           0
isFlaggedFraud    0
dtype: int64


In [7]:
print(data.type.value_counts())

type
CASH_OUT    2237500
PAYMENT     2151495
CASH_IN     1399284
TRANSFER     532909
DEBIT         41432
Name: count, dtype: int64
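
The raw counts are easier to compare as shares of all transactions. The sketch below recomputes those shares from the counts printed above; the dictionary is copied by hand from that output rather than read from the CSV.

```python
# Counts copied from the value_counts() output above
type_counts = {
    "CASH_OUT": 2237500,
    "PAYMENT": 2151495,
    "CASH_IN": 1399284,
    "TRANSFER": 532909,
    "DEBIT": 41432,
}

total = sum(type_counts.values())

# Share of each transaction type among all rows
type_shares = {name: count / total for name, count in type_counts.items()}

for name, share in type_shares.items():
    print(f"{name:<9} {share:.2%}")
```

On the real DataFrame, `data["type"].value_counts(normalize=True)` gives the same shares directly.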


In [8]:
# Now let's transform the categorical features into numerical.
type_encod = LabelEncoder()
data["type"] = type_encod.fit_transform(data["type"])
print(data.head())

   step  type    amount     nameOrig  oldbalanceOrg  newbalanceOrig  \
0     1     3   9839.64  C1231006815       170136.0       160296.36   
1     1     3   1864.28  C1666544295        21249.0        19384.72   
2     1     4    181.00  C1305486145          181.0            0.00   
3     1     1    181.00   C840083671          181.0            0.00   
4     1     3  11668.14  C2048537720        41554.0        29885.86   

      nameDest  oldbalanceDest  newbalanceDest  isFraud  isFlaggedFraud  
0  M1979787155             0.0             0.0        0               0  
1  M2044282225             0.0             0.0        0               0  
2   C553264065             0.0             0.0        1               0  
3    C38997010         21182.0             0.0        1               0  
4  M1230701703             0.0             0.0        0               0  
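
`LabelEncoder` assigns integer codes in sorted order of the labels, which is why `PAYMENT` became 3, `TRANSFER` became 4 and `CASH_OUT` became 1 in the output above. The sketch below makes the mapping explicit on a standalone encoder; `toy_encoder` is an illustrative name and is separate from the notebook's `type_encod`.

```python
from sklearn.preprocessing import LabelEncoder

# Fit a separate encoder on the five transaction types seen in the data
toy_encoder = LabelEncoder()
toy_encoder.fit(["PAYMENT", "TRANSFER", "CASH_OUT", "CASH_IN", "DEBIT"])

# classes_ holds the labels in sorted order; the position is the code
mapping = {str(label): code for code, label in enumerate(toy_encoder.classes_)}
print(mapping)

# inverse_transform recovers the original labels from the codes
decoded = [str(label) for label in toy_encoder.inverse_transform([3, 4, 1])]
```

The integer codes impose an arbitrary ordering on the transaction types. Tree-based models can usually cope with that, but it is worth keeping in mind if a different model is tried later.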


In [9]:
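# Select the feature columns and the target

X = np.array(data[["type", "amount", "oldbalanceOrg", "newbalanceOrig"]])
# Use a 1-D target array; a column vector of shape (n, 1) triggers a scikit-learn warning in fit()
y = data["isFraud"].to_numpy()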

In [10]:
skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
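
Stratification matters when one class is rare, because it keeps the fraud rate the same in every fold. The sketch below illustrates this on toy labels; the 10% fraud rate and the names `toy_X`, `toy_y` and `toy_skf` are assumptions chosen for the example, not values from the dataset.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Toy labels: 90 legitimate rows (0) and 10 fraudulent rows (1)
toy_y = np.array([0] * 90 + [1] * 10)
toy_X = np.arange(len(toy_y)).reshape(-1, 1)

toy_skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# Count the fraudulent rows that land in each validation fold
fold_fraud_counts = [
    int(toy_y[val_idx].sum()) for _, val_idx in toy_skf.split(toy_X, toy_y)
]
print(fold_fraud_counts)
```

Each validation fold receives the same number of fraudulent rows, so no fold ends up without positive examples, which would make ROC-AUC undefined for that fold.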

# Gradient Boosting

In [12]:
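# GradientBoostingRegressor outputs continuous scores rather than class labels;
# roc_auc_score only needs a ranking, so the scores can be used directly.
model = GradientBoostingRegressor()

for fold, (train_index, val_index) in enumerate(skf.split(X, y), 1):
    # Hold out one stratified fold for evaluation and train on the rest
    X_train, X_test = X[train_index], X[val_index]
    y_train, y_test = y[train_index], y[val_index]

    # With the default warm_start=False, fit() retrains from scratch on every call
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    auc = roc_auc_score(y_test, y_pred)
    print(f"Fold {fold} ROC-AUC Score: {auc:.4f}")
    print("\n")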


Fold 1 ROC-AUC Score: 0.9919


Fold 2 ROC-AUC Score: 0.9906


Fold 3 ROC-AUC Score: 0.9866


Fold 4 ROC-AUC Score: 0.9897


Fold 5 ROC-AUC Score: 0.9911


Fold 6 ROC-AUC Score: 0.9893


Fold 7 ROC-AUC Score: 0.9903


Fold 8 ROC-AUC Score: 0.9885


Fold 9 ROC-AUC Score: 0.9906


Fold 10 ROC-AUC Score: 0.9893
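
The ten fold scores are easier to report as a mean and a spread. The sketch below summarizes the values printed above; the list is copied by hand from that output, so it carries the same four-decimal rounding.

```python
import statistics

# ROC-AUC scores copied from the fold output above
fold_aucs = [
    0.9919, 0.9906, 0.9866, 0.9897, 0.9911,
    0.9893, 0.9903, 0.9885, 0.9906, 0.9893,
]

mean_auc = statistics.mean(fold_aucs)
std_auc = statistics.stdev(fold_aucs)  # sample standard deviation

print(f"Mean ROC-AUC: {mean_auc:.4f}")
print(f"Std  ROC-AUC: {std_auc:.4f}")
print(f"Range: {min(fold_aucs):.4f} to {max(fold_aucs):.4f}")
```

In the notebook itself, the same summary could be obtained by appending `auc` to a list inside the loop.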




In [13]:
# Bundle the model (as fitted on the final fold) with the fitted label encoder
info = {"model": model, "type_encod": type_encod}
with open("model.pkl", "wb") as file:
    pickle.dump(info, file)

In [14]:
# Reload the bundle; only unpickle files that come from a trusted source
with open("model.pkl", "rb") as file:
    info1 = pickle.load(file)

loaded_model = info1["model"]
type_encod1 = info1["type_encod"]
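
Saving the encoder together with the model is what makes it possible to score a new transaction later, because the transaction type has to be encoded exactly as it was during training. The sketch below shows that flow end to end on toy stand-ins. The helper `score_transaction`, the names `toy_model` and `toy_encoder`, and the in-memory pickle round trip are all illustrative assumptions; the four training rows are taken from the `data.head()` output earlier in the notebook.

```python
import pickle

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.preprocessing import LabelEncoder

# Toy stand-ins for the notebook's fitted objects
toy_encoder = LabelEncoder().fit(
    ["CASH_IN", "CASH_OUT", "DEBIT", "PAYMENT", "TRANSFER"]
)
toy_X = np.array([
    [3, 9839.64, 170136.0, 160296.36],  # PAYMENT, not fraud
    [3, 1864.28, 21249.0, 19384.72],    # PAYMENT, not fraud
    [4, 181.00, 181.0, 0.0],            # TRANSFER, fraud
    [1, 181.00, 181.0, 0.0],            # CASH_OUT, fraud
])
toy_y = np.array([0, 0, 1, 1])
toy_model = GradientBoostingRegressor(random_state=0).fit(toy_X, toy_y)

# Round-trip through pickle in memory, mirroring the dump and load cells
blob = pickle.dumps({"model": toy_model, "type_encod": toy_encoder})
restored = pickle.loads(blob)


def score_transaction(bundle, tx_type, amount, old_balance, new_balance):
    """Encode the type with the saved encoder, then score one transaction."""
    type_code = bundle["type_encod"].transform([tx_type])[0]
    features = np.array([[type_code, amount, old_balance, new_balance]])
    return float(bundle["model"].predict(features)[0])


score = score_transaction(restored, "TRANSFER", 181.00, 181.0, 0.0)
print(f"Fraud score: {score:.4f}")
```

The feature order inside `score_transaction` has to match the column order used to build `X`: type, amount, old balance, new balance.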

In [15]:
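# Sanity check: score the full dataset with the reloaded model
# (this includes rows the model was trained on, so it is not an evaluation)
pred = loaded_model.predict(X)
pred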

array([5.40684573e-05, 1.39831968e-05, 2.42917005e-02, ...,
       1.02834236e+00, 4.76670109e-01, 5.21223858e-01])
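
Because the model is a regressor, its outputs are continuous scores and can fall outside the range 0 to 1, as the value 1.028 above shows. To obtain class labels, the scores have to be thresholded. The sketch below does this on the six scores visible in the output; the cut-off of 0.5 is an assumption, not a value chosen in the notebook.

```python
import numpy as np

# The six scores visible in the output above
sample_scores = np.array([
    5.40684573e-05, 1.39831968e-05, 2.42917005e-02,
    1.02834236e+00, 4.76670109e-01, 5.21223858e-01,
])

threshold = 0.5  # assumed cut-off; tune it on validation data in practice

# Optionally clip the scores into [0, 1] so they are easier to read
clipped_scores = np.clip(sample_scores, 0.0, 1.0)

# Label a transaction as fraud (1) when its score reaches the threshold
labels = (sample_scores >= threshold).astype(int)
print(labels)
```

With heavily imbalanced data, the threshold trades missed fraud against false alarms, so it is usually chosen from a precision-recall analysis rather than fixed at 0.5.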