**Problem Statement**

The rise in online transactions has led to an increase in credit card fraud, making it a significant concern for financial institutions and cardholders. Identifying fraudulent transactions in real time is crucial for minimizing financial losses and maintaining trust. The challenge lies in detecting fraudulent behavior efficiently while keeping false positives low, so that legitimate transactions are not flagged incorrectly.

In [None]:
from google.colab import files
uploaded = files.upload()

# **Importing the Dependencies**

**Importing the libraries**

In [54]:
# Importing the necessary libraries

import numpy as np
import pandas as pd

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

from sklearn.metrics import accuracy_score

# **Exploratory Data Analysis**

**Reading and understanding the data**

In [55]:
# loading the dataset to a Pandas DataFrame
credit_card_data = pd.read_csv('/content/file.csv')

In [56]:
# print first 5 rows of the dataset
credit_card_data.head()

Unnamed: 0,Time,V1,V2,V3,V4,V5,V6,V7,V8,V9,V10,V11,V12,V13,V14,V15,V16,V17,V18,V19,V20,V21,V22,V23,V24,V25,V26,V27,V28,Amount,Class
0,0.0,-1.359807,-0.072781,2.536347,1.378155,-0.338321,0.462388,0.239599,0.098698,0.363787,0.090794,-0.5516,-0.617801,-0.99139,-0.311169,1.468177,-0.470401,0.207971,0.025791,0.403993,0.251412,-0.018307,0.277838,-0.110474,0.066928,0.128539,-0.189115,0.133558,-0.021053,149.62,0
1,0.0,1.191857,0.266151,0.16648,0.448154,0.060018,-0.082361,-0.078803,0.085102,-0.255425,-0.166974,1.612727,1.065235,0.489095,-0.143772,0.635558,0.463917,-0.114805,-0.183361,-0.145783,-0.069083,-0.225775,-0.638672,0.101288,-0.339846,0.16717,0.125895,-0.008983,0.014724,2.69,0
2,1.0,-1.358354,-1.340163,1.773209,0.37978,-0.503198,1.800499,0.791461,0.247676,-1.514654,0.207643,0.624501,0.066084,0.717293,-0.165946,2.345865,-2.890083,1.109969,-0.121359,-2.261857,0.52498,0.247998,0.771679,0.909412,-0.689281,-0.327642,-0.139097,-0.055353,-0.059752,378.66,0
3,1.0,-0.966272,-0.185226,1.792993,-0.863291,-0.010309,1.247203,0.237609,0.377436,-1.387024,-0.054952,-0.226487,0.178228,0.507757,-0.287924,-0.631418,-1.059647,-0.684093,1.965775,-1.232622,-0.208038,-0.1083,0.005274,-0.190321,-1.175575,0.647376,-0.221929,0.062723,0.061458,123.5,0
4,2.0,-1.158233,0.877737,1.548718,0.403034,-0.407193,0.095921,0.592941,-0.270533,0.817739,0.753074,-0.822843,0.538196,1.345852,-1.11967,0.175121,-0.451449,-0.237033,-0.038195,0.803487,0.408542,-0.009431,0.798278,-0.137458,0.141267,-0.20601,0.502292,0.219422,0.215153,69.99,0


In [57]:
# checking the shape (number of rows and columns) of the dataset
credit_card_data.shape

(284807, 31)

In [58]:
# displaying the last 5 rows of the dataset
credit_card_data.tail()

Unnamed: 0,Time,V1,V2,V3,V4,V5,V6,V7,V8,V9,V10,V11,V12,V13,V14,V15,V16,V17,V18,V19,V20,V21,V22,V23,V24,V25,V26,V27,V28,Amount,Class
284802,172786.0,-11.881118,10.071785,-9.834783,-2.066656,-5.364473,-2.606837,-4.918215,7.305334,1.914428,4.35617,-1.593105,2.711941,-0.689256,4.626942,-0.924459,1.107641,1.991691,0.510632,-0.68292,1.475829,0.213454,0.111864,1.01448,-0.509348,1.436807,0.250034,0.943651,0.823731,0.77,0
284803,172787.0,-0.732789,-0.05508,2.03503,-0.738589,0.868229,1.058415,0.02433,0.294869,0.5848,-0.975926,-0.150189,0.915802,1.214756,-0.675143,1.164931,-0.711757,-0.025693,-1.221179,-1.545556,0.059616,0.214205,0.924384,0.012463,-1.016226,-0.606624,-0.395255,0.068472,-0.053527,24.79,0
284804,172788.0,1.919565,-0.301254,-3.24964,-0.557828,2.630515,3.03126,-0.296827,0.708417,0.432454,-0.484782,0.411614,0.063119,-0.183699,-0.510602,1.329284,0.140716,0.313502,0.395652,-0.577252,0.001396,0.232045,0.578229,-0.037501,0.640134,0.265745,-0.087371,0.004455,-0.026561,67.88,0
284805,172788.0,-0.24044,0.530483,0.70251,0.689799,-0.377961,0.623708,-0.68618,0.679145,0.392087,-0.399126,-1.933849,-0.962886,-1.042082,0.449624,1.962563,-0.608577,0.509928,1.113981,2.897849,0.127434,0.265245,0.800049,-0.163298,0.123205,-0.569159,0.546668,0.108821,0.104533,10.0,0
284806,172792.0,-0.533413,-0.189733,0.703337,-0.506271,-0.012546,-0.649617,1.577006,-0.41465,0.48618,-0.915427,-1.040458,-0.031513,-0.188093,-0.084316,0.041333,-0.30262,-0.660377,0.16743,-0.256117,0.382948,0.261057,0.643078,0.376777,0.008797,-0.473649,-0.818267,-0.002415,0.013649,217.0,0


In [59]:
# dataset information
credit_card_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 284807 entries, 0 to 284806
Data columns (total 31 columns):
 #   Column  Non-Null Count   Dtype  
---  ------  --------------   -----  
 0   Time    284807 non-null  float64
 1   V1      284807 non-null  float64
 2   V2      284807 non-null  float64
 3   V3      284807 non-null  float64
 4   V4      284807 non-null  float64
 5   V5      284807 non-null  float64
 6   V6      284807 non-null  float64
 7   V7      284807 non-null  float64
 8   V8      284807 non-null  float64
 9   V9      284807 non-null  float64
 10  V10     284807 non-null  float64
 11  V11     284807 non-null  float64
 12  V12     284807 non-null  float64
 13  V13     284807 non-null  float64
 14  V14     284807 non-null  float64
 15  V15     284807 non-null  float64
 16  V16     284807 non-null  float64
 17  V17     284807 non-null  float64
 18  V18     284807 non-null  float64
 19  V19     284807 non-null  float64
 20  V20     284807 non-null  float64
 21  V21     28

In [60]:
# another method for checking the number of missing values in each column
credit_card_data.isnull().sum()

Column,Missing values
Time,0
V1,0
V2,0
V3,0
V4,0
V5,0
V6,0
V7,0
V8,0
V9,0


**Checking the distribution of legit transactions and fraudulent transactions**

In [61]:
# distributions of legit transactions (0) and fraudulent transactions (1) in the 'Class' column
credit_card_data['Class'].value_counts()

Class,count
0,284315
1,492


In a highly imbalanced dataset like this one, where legitimate transactions (Class = 0) outnumber fraudulent ones (Class = 1) by 284,315 to 492, comparing the per-class mean of each feature is a quick way to see which features differ most between the two groups. Large gaps point to features that are likely to help separate fraud from legitimate behavior and can guide feature selection. The comparison does not correct the imbalance itself; that is handled by the under-sampling step later in the notebook.
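As a rough illustration of the per-class comparison described above, the sketch below ranks features by the gap between the fraud and legitimate means. It runs on synthetic data, not the notebook's CSV: the column names `V1`, `V2` and `Class` are borrowed from the dataset, every value is made up, and dividing the gap by the overall standard deviation is an assumption added here so that features on different scales can be compared.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the real data: the column names mimic the notebook,
# but every value here is made up for illustration.
rng = np.random.default_rng(0)
n_legit, n_fraud = 2000, 20
demo = pd.DataFrame({
    "V1": np.concatenate([rng.normal(0, 1, n_legit), rng.normal(-5, 1, n_fraud)]),
    "V2": np.concatenate([rng.normal(0, 1, n_legit), rng.normal(0, 1, n_fraud)]),
    "Class": [0] * n_legit + [1] * n_fraud,
})

# Share of each class: shows how skewed the labels are.
class_share = demo["Class"].value_counts(normalize=True)

# Per-class means, then the gap between fraud and legit, scaled by the
# overall standard deviation of each feature.
means = demo.groupby("Class").mean()
gap = (means.loc[1] - means.loc[0]) / demo.drop(columns="Class").std()
ranked = gap.abs().sort_values(ascending=False)

print(class_share)
print(ranked)
```

In this toy setup only `V1` was shifted for the fraud rows, so it should come out at the top of `ranked`.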

In [62]:
# Separating the dataset into legitimate and fraudulent transactions for analysis

legit = credit_card_data[credit_card_data.Class == 0]
fraud = credit_card_data[credit_card_data.Class == 1]


In [63]:
# Printing the number of rows and columns for both legitimate and fraudulent transactions
print(legit.shape)
print(fraud.shape)

(284315, 31)
(492, 31)


In [64]:
# statistical summary of the Amount column for legitimate transactions

legit.Amount.describe()

Statistic,Amount
count,284315.0
mean,88.291022
std,250.105092
min,0.0
25%,5.65
50%,22.0
75%,77.05
max,25691.16


In [65]:
# statistical summary of the Amount column for fraudulent transactions
fraud.Amount.describe()

Statistic,Amount
count,492.0
mean,122.211321
std,256.683288
min,0.0
25%,1.0
50%,9.25
75%,105.89
max,2125.87


In [66]:
# Comparing the mean values of all features for both legitimate (Class = 0) and fraudulent (Class = 1) transactions
credit_card_data.groupby('Class').mean()

Class,Time,V1,V2,V3,V4,V5,V6,V7,V8,V9,V10,V11,V12,V13,V14,V15,V16,V17,V18,V19,V20,V21,V22,V23,V24,V25,V26,V27,V28,Amount
0,94838.202258,0.008258,-0.006271,0.012171,-0.00786,0.005453,0.002419,0.009637,-0.000987,0.004467,0.009824,-0.006576,0.010832,0.000189,0.012064,0.000161,0.007164,0.011535,0.003887,-0.001178,-0.000644,-0.001235,-2.4e-05,7e-05,0.000182,-7.2e-05,-8.9e-05,-0.000295,-0.000131,88.291022
1,80746.806911,-4.771948,3.623778,-7.033281,4.542029,-3.151225,-1.397737,-5.568731,0.570636,-2.581123,-5.676883,3.800173,-6.259393,-0.109334,-6.971723,-0.092929,-4.139946,-6.665836,-2.246308,0.680659,0.372319,0.713588,0.014049,-0.040308,-0.10513,0.041449,0.051648,0.170575,0.075667,122.211321


# **Under-Sampling**

Build a balanced sample dataset containing the same number of legitimate and fraudulent transactions.

Number of fraudulent transactions: 492

In [67]:
# Taking a random sample of legitimate transactions to match the number of fraudulent transactions (492).
# This balances the dataset for training by creating an equal number of legitimate and fraudulent samples.
# Note: no random_state is set, so a different set of rows is drawn on every run.
legit_sample = legit.sample(n = len(fraud))

**Concatenating the two DataFrames**

In [68]:
# Concatenating the sampled legitimate transactions and all fraudulent transactions along the rows (axis = 0).
# This creates a new balanced dataset with an equal number of legitimate and fraudulent transactions for analysis.
new_dataset = pd.concat([legit_sample, fraud], axis = 0)
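The sampling and concatenation steps above can also be written as one reproducible helper. This is a minimal sketch on toy data: the function name `undersample`, its `random_state` default and the final shuffle are assumptions added here and are not part of the notebook.

```python
import pandas as pd

def undersample(df, label_col="Class", random_state=42):
    """Downsample every class to the size of the smallest class."""
    smallest = df[label_col].value_counts().min()
    parts = [
        group.sample(n=smallest, random_state=random_state)
        for _, group in df.groupby(label_col)
    ]
    # Shuffle so the classes are not stored in contiguous blocks.
    return pd.concat(parts, axis=0).sample(frac=1, random_state=random_state)

# Toy data: 95 legitimate rows and 5 fraudulent rows (made-up values).
toy = pd.DataFrame({"Amount": range(100), "Class": [0] * 95 + [1] * 5})
balanced = undersample(toy)
print(balanced["Class"].value_counts())
```

Fixing `random_state` means the same rows are drawn on every run, which keeps later outputs comparable between runs.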

In [69]:
# Displaying the first few rows of the newly created balanced dataset to verify the structure and content.
new_dataset.head()

Unnamed: 0,Time,V1,V2,V3,V4,V5,V6,V7,V8,V9,V10,V11,V12,V13,V14,V15,V16,V17,V18,V19,V20,V21,V22,V23,V24,V25,V26,V27,V28,Amount,Class
166829,118332.0,2.078643,0.184265,-1.717599,0.40153,0.475511,-0.841439,0.210846,-0.255092,0.43934,-0.389501,-0.54993,0.333755,0.458698,-0.924179,0.166908,0.313962,0.432074,-0.337455,0.070343,-0.129117,-0.363577,-0.923079,0.328391,0.492517,-0.238034,0.173654,-0.060291,-0.030129,1.98,0
257054,157981.0,2.024362,-0.293731,-2.084112,0.121534,0.670296,-0.227201,0.271029,-0.08441,0.363953,0.289564,0.023846,0.039459,-1.506905,0.854262,-0.847141,-0.143876,-0.441334,-0.011684,0.669984,-0.203029,-0.021219,-0.059848,0.014009,0.054333,0.177284,0.553323,-0.11915,-0.083987,39.95,0
70945,54083.0,1.055734,0.236773,0.56867,1.293892,-0.276693,-0.535278,0.203764,-0.153475,-0.238325,-0.055648,0.355729,1.086714,1.27664,0.128883,1.073869,-0.226302,-0.165655,-0.87886,-0.934893,0.013343,0.043039,0.116361,-0.012126,0.418599,0.44233,-0.423407,0.035906,0.038097,60.0,0
281151,169969.0,-1.572974,0.829427,1.579634,4.158323,-1.114807,3.952307,0.534024,1.156335,-1.794404,1.04501,0.4029,-0.392637,-0.337976,0.363681,1.45286,0.201986,0.469545,0.106602,0.421161,0.210494,0.134758,0.488702,0.002805,-1.656156,-0.179709,0.526028,0.366859,-0.09065,379.29,0
267600,162844.0,1.973858,-0.210757,-2.016852,0.371836,0.535093,-0.948506,0.685969,-0.483203,0.270684,0.052118,-1.283418,0.529665,0.697786,0.282231,-0.459127,-0.517564,-0.263225,-0.796173,0.33197,-0.015969,0.021123,0.147091,-0.106894,-0.550986,0.303074,0.638074,-0.108649,-0.070326,85.8,0


In [70]:
# Displaying the last few rows of the newly created balanced dataset
new_dataset.tail()

Unnamed: 0,Time,V1,V2,V3,V4,V5,V6,V7,V8,V9,V10,V11,V12,V13,V14,V15,V16,V17,V18,V19,V20,V21,V22,V23,V24,V25,V26,V27,V28,Amount,Class
279863,169142.0,-1.927883,1.125653,-4.518331,1.749293,-1.566487,-2.010494,-0.88285,0.697211,-2.064945,-5.587794,2.115795,-5.417424,-1.235123,-6.665177,0.401701,-2.897825,-4.570529,-1.315147,0.391167,1.252967,0.778584,-0.319189,0.639419,-0.294885,0.537503,0.788395,0.29268,0.147968,390.0,1
280143,169347.0,1.378559,1.289381,-5.004247,1.41185,0.442581,-1.326536,-1.41317,0.248525,-1.127396,-3.232153,2.858466,-3.096915,-0.792532,-5.210141,-0.613803,-2.155297,-3.267116,-0.688505,0.737657,0.226138,0.370612,0.028234,-0.14564,-0.081049,0.521875,0.739467,0.389152,0.186637,0.76,1
280149,169351.0,-0.676143,1.126366,-2.2137,0.468308,-1.120541,-0.003346,-2.234739,1.210158,-0.65225,-3.463891,1.794969,-2.775022,-0.41895,-4.057162,-0.712616,-1.603015,-5.035326,-0.507,0.266272,0.247968,0.751826,0.834108,0.190944,0.03207,-0.739695,0.471111,0.385107,0.194361,77.89,1
281144,169966.0,-3.113832,0.585864,-5.39973,1.817092,-0.840618,-2.943548,-2.208002,1.058733,-1.632333,-5.245984,1.93352,-5.030465,-1.127455,-6.416628,0.141237,-2.549498,-4.614717,-1.478138,-0.03548,0.306271,0.583276,-0.269209,-0.456108,-0.183659,-0.328168,0.606116,0.884876,-0.2537,245.0,1
281674,170348.0,1.991976,0.158476,-2.583441,0.40867,1.151147,-0.096695,0.22305,-0.068384,0.577829,-0.888722,0.49114,0.728903,0.380428,-1.948883,-0.832498,0.519436,0.903562,1.197315,0.593509,-0.017652,-0.16435,-0.295135,-0.072173,-0.450261,0.313267,-0.289617,0.002988,-0.015309,42.53,1


In [71]:
# Checking the distribution of the 'Class' column in the new balanced dataset.
# This ensures that the dataset now contains an equal number of legitimate (Class = 0) and fraudulent (Class = 1) transactions.
new_dataset['Class'].value_counts()

Unnamed: 0_level_0,count
Class,Unnamed: 1_level_1
0,492
1,492


In [72]:
# Calculating the mean of all features for each class (legitimate and fraudulent) in the new balanced dataset.
# This helps in comparing feature values between legitimate (Class = 0) and fraudulent (Class = 1) transactions to identify any differences.
new_dataset.groupby('Class').mean()

Class,Time,V1,V2,V3,V4,V5,V6,V7,V8,V9,V10,V11,V12,V13,V14,V15,V16,V17,V18,V19,V20,V21,V22,V23,V24,V25,V26,V27,V28,Amount
0,98132.705285,0.094448,-0.001166,-0.008342,-0.030896,-0.00835,-0.012984,0.004144,-0.004144,0.025728,0.055398,-0.032863,0.024221,0.00632,0.002075,-0.051124,-0.030744,-0.016372,0.044462,0.006144,-0.025083,-0.010762,0.017642,0.003222,-0.028923,0.020602,-0.001721,-0.048865,0.009382,80.003618
1,80746.806911,-4.771948,3.623778,-7.033281,4.542029,-3.151225,-1.397737,-5.568731,0.570636,-2.581123,-5.676883,3.800173,-6.259393,-0.109334,-6.971723,-0.092929,-4.139946,-6.665836,-2.246308,0.680659,0.372319,0.713588,0.014049,-0.040308,-0.10513,0.041449,0.051648,0.170575,0.075667,122.211321


# **Separating the features (X) and the target variable (Y) in the new dataset.**

X contains all the columns except 'Class' (the features), while Y contains the 'Class' column (the labels indicating legitimate or fraudulent transactions).

In [73]:
X = new_dataset.drop(columns = 'Class')
Y = new_dataset['Class']

In [74]:
# Displaying the feature set (X) to verify that it contains all the relevant features for model training, excluding the 'Class' column.
print(X)

            Time        V1        V2        V3  ...       V26       V27       V28  Amount
166829  118332.0  2.078643  0.184265 -1.717599  ...  0.173654 -0.060291 -0.030129    1.98
257054  157981.0  2.024362 -0.293731 -2.084112  ...  0.553323 -0.119150 -0.083987   39.95
70945    54083.0  1.055734  0.236773  0.568670  ... -0.423407  0.035906  0.038097   60.00
281151  169969.0 -1.572974  0.829427  1.579634  ...  0.526028  0.366859 -0.090650  379.29
267600  162844.0  1.973858 -0.210757 -2.016852  ...  0.638074 -0.108649 -0.070326   85.80
...          ...       ...       ...       ...  ...       ...       ...       ...     ...
279863  169142.0 -1.927883  1.125653 -4.518331  ...  0.788395  0.292680  0.147968  390.00
280143  169347.0  1.378559  1.289381 -5.004247  ...  0.739467  0.389152  0.186637    0.76
280149  169351.0 -0.676143  1.126366 -2.213700  ...  0.471111  0.385107  0.194361   77.89
281144  169966.0 -3.113832  0.585864 -5.399730  ...  0.606116  0.884876 -0.253700  245.00
281674  17

In [75]:
# Displaying the target variable (Y) to verify the class labels (legitimate or fraudulent).
print(Y)

166829    0
257054    0
70945     0
281151    0
267600    0
         ..
279863    1
280143    1
280149    1
281144    1
281674    1
Name: Class, Length: 984, dtype: int64


# **Splitting the dataset into training and testing sets.**

The training data (X_train, Y_train) will be used to train the model, and the testing data (X_test, Y_test) will be used to evaluate the model's performance.

In [76]:
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size = 0.2, stratify = Y, random_state = 2)

In [77]:
print(X.shape, X_train.shape, X_test.shape)

(984, 30) (787, 30) (197, 30)
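The `stratify` argument used in the split above keeps the class ratio the same in both parts. The sketch below shows the effect on toy labels; the arrays `X_demo` and `y_demo` are made up for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy labels: 90 zeros and 10 ones; the features are just row ids.
y_demo = np.array([0] * 90 + [1] * 10)
X_demo = np.arange(100).reshape(-1, 1)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, stratify=y_demo, random_state=2
)

# With stratify, both parts keep the 10% positive rate of the full set.
print(y_tr.mean(), y_te.mean())
```

Without `stratify`, the share of positives in the test part depends on the random draw, which matters most when one class is small.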


# **Model Training**

**Logistic Regression**

In [78]:
model = LogisticRegression(max_iter=1000)

In [79]:
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
model.fit(X_train_scaled, Y_train)
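The cell above fits the scikit-learn model on scaled training data, so any later prediction needs the same fitted scaler applied to the test rows first. One way to keep the two steps together is a pipeline. The sketch below is an illustration on synthetic data, not the notebook's method; `X_demo`, `y_demo` and `pipe` are names introduced here.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic two-class data (made up): class 1 is shifted by 3 in every feature.
rng = np.random.default_rng(0)
X_demo = np.vstack([rng.normal(0, 1, (200, 5)), rng.normal(3, 1, (200, 5))])
y_demo = np.array([0] * 200 + [1] * 200)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, stratify=y_demo, random_state=2
)

# The pipeline fits the scaler on the training rows only and reuses the
# same scaling when predicting on the test rows.
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
pipe.fit(X_tr, y_tr)
test_accuracy = pipe.score(X_te, y_te)
print(f"Test accuracy: {test_accuracy:.3f}")
```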


In [85]:
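# Define a custom logistic regression class trained with batch gradient descent.
# Note: this definition reuses the name LogisticRegression and therefore shadows
# the scikit-learn class imported earlier for the rest of the notebook.
class LogisticRegression():
    def __init__(self, lr, epochs):
        # store the hyperparameters: learning rate and number of passes over the data
        self.lr = lr
        self.epochs = epochs

    def fit(self, X, y):
        # number of samples and number of features
        self.num_of_data, self.features = X.shape

        # initialize the parameters (weights and bias) to zero
        self.W = np.zeros(self.features)
        self.b = 0

        self.X = X
        self.y = y

        for _ in range(self.epochs):
            self.update_params()

    def update_params(self):
        # predicted probability of class 1 via the sigmoid function
        y_hat = 1 / (1 + np.exp(-(self.X.dot(self.W) + self.b)))

        # gradients of the mean cross-entropy loss with respect to W and b
        dw = (1 / self.num_of_data) * np.dot(self.X.T, (y_hat - self.y))
        db = (1 / self.num_of_data) * np.sum(y_hat - self.y)

        # gradient descent step on the weights and bias
        self.W = self.W - self.lr * dw
        self.b = self.b - self.lr * db

    def predict(self, X):
        # probability of class 1, thresholded at 0.5 to produce a 0/1 label
        y_pred = 1 / (1 + np.exp(-(X.dot(self.W) + self.b)))
        y_pred = np.where(y_pred < 0.5, 0, 1)
        return y_pred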
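The `dw` and `db` expressions in the class above are the gradients of the mean binary cross-entropy loss. A quick way to gain confidence in them is to compare against finite differences. The sketch below does that on random data; the helper names `sigmoid`, `loss` and `gradients` are introduced here for illustration and do not appear in the notebook.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(W, b, X, y):
    """Mean binary cross-entropy."""
    p = sigmoid(X @ W + b)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def gradients(W, b, X, y):
    """Same expressions as dw and db in the class's update step."""
    err = sigmoid(X @ W + b) - y
    return X.T @ err / len(y), np.sum(err) / len(y)

# Random toy problem (made-up data).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 3))
y_demo = (rng.random(50) < 0.5).astype(float)
W, b = rng.normal(size=3) * 0.1, 0.05

dw, db = gradients(W, b, X_demo, y_demo)

# Central finite differences as an independent estimate of the gradient.
eps = 1e-6
dw_num = np.array([
    (loss(W + eps * e, b, X_demo, y_demo) - loss(W - eps * e, b, X_demo, y_demo)) / (2 * eps)
    for e in np.eye(3)
])
db_num = (loss(W, b + eps, X_demo, y_demo) - loss(W, b - eps, X_demo, y_demo)) / (2 * eps)

print(np.max(np.abs(dw - dw_num)), abs(db - db_num))
```

Both printed differences should be tiny, which indicates that the analytic gradients match the loss.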

In [86]:
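# initialize the StandardScaler to standardize the feature set
scaler = StandardScaler()

# fit the scaler to the feature data (X) and transform it to standardized features
# note: the scaler is fitted on all rows before the train/test split, so the
# test rows contribute to the scaling statistics
scaler.fit(X)
std_features = scaler.transform(X)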

# Reassign the standardized features to X and the target variable to y
X = std_features
y = Y

print(X)
print(y)

[[ 0.5991571   0.8001174  -0.44679578 ... -0.11507468 -0.1624335
  -0.4590186 ]
 [ 1.42138398  0.79028558 -0.57805649 ... -0.17098368 -0.28284612
  -0.28319512]
 [-0.73321585  0.61483901 -0.43237676 ... -0.02369856 -0.0098995
  -0.19035181]
 ...
 [ 1.657171    0.30114619 -0.18808889 ...  0.30800088  0.33946627
  -0.10751057]
 [ 1.66992466 -0.14038962 -0.33651414 ...  0.78272071 -0.66227911
   0.66630721]
 [ 1.67784644  0.78441949 -0.45387767 ... -0.05496722 -0.12929998
  -0.2712482 ]]
166829    0
257054    0
70945     0
281151    0
267600    0
         ..
279863    1
280143    1
280149    1
281144    1
281674    1
Name: Class, Length: 984, dtype: int64
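The standardization cell above fits the scaler on every row, including rows that later end up in the test set. A common alternative is to compute the scaling statistics from the training rows only and reuse them on the held-out rows. The sketch below shows the idea with plain NumPy on made-up data; with scikit-learn the equivalent would be `scaler.fit(X_train)` followed by `scaler.transform(X_test)`.

```python
import numpy as np

rng = np.random.default_rng(0)
X_demo = rng.normal(loc=50.0, scale=10.0, size=(100, 3))  # made-up features

# Hold out the last 20 rows before computing any statistics.
X_tr, X_te = X_demo[:80], X_demo[80:]

# Statistics come from the training rows only...
mu = X_tr.mean(axis=0)
sigma = X_tr.std(axis=0)

# ...and are reused unchanged on the held-out rows.
X_tr_std = (X_tr - mu) / sigma
X_te_std = (X_te - mu) / sigma

print(X_tr_std.mean(axis=0).round(6), X_tr_std.std(axis=0).round(6))
```

The training columns end up with mean 0 and standard deviation 1, while the held-out columns are only approximately standardized, as they would be for genuinely unseen data.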


# **Model Evaluation & Accuracy Score**

Model evaluation and accuracy scoring are crucial for assessing a model's performance. After training, it is important to check how well the model performs on both the training and the test data to confirm that it generalizes and isn't overfitting. This helps determine how reliable the model's predictions are, especially for real-world applications like fraud detection.
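Accuracy is the only metric computed in this notebook. For fraud detection it is often useful to also look at precision and recall on the fraud class. The sketch below derives them from the confusion-matrix counts using NumPy; the function name `binary_report` and the toy labels are assumptions for illustration.

```python
import numpy as np

def binary_report(y_true, y_pred):
    """Confusion-matrix counts plus accuracy, precision and recall for class 1."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    return {
        "tp": tp, "tn": tn, "fp": fp, "fn": fn,
        "accuracy": (tp + tn) / len(y_true),
        # Guard against division by zero when a class is never predicted/present.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }

# Toy labels (made up): 3 fraud cases, 5 legitimate ones.
y_true_demo = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred_demo = [1, 1, 0, 0, 0, 0, 0, 1]
report = binary_report(y_true_demo, y_pred_demo)
print(report)
```

Here one fraud case is missed and one legitimate transaction is flagged, so accuracy is 0.75 while precision and recall are both 2/3.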

In [88]:
# Split the dataset into training and testing sets with 80% for training and 20% for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 2)

# print the shapes of the feature and target arrays

print("Features sized")
print(X.shape)
print(X_train.shape)
print(X_test.shape)

print("Target sized")
print(y.shape)
print(y_train.shape)
print(y_test.shape)

Features sized
(984, 30)
(787, 30)
(197, 30)
Target sized
(984,)
(787,)
(197,)


In [89]:
# Set the learning rate and the number of epochs for training the logistic regression model
lr = 0.01
epochs = 50000

# Initialize the custom Logistic Regression model with specified learning rate and epochs
classifier = LogisticRegression(lr = lr, epochs = epochs)

# Train the model using the training data
classifier.fit(X_train,y_train)

# make predictions on the training data
train_prediction = classifier.predict(X_train)

# calculate the accuracy of the model on training data
train_accuracy_score = accuracy_score(y_train, train_prediction)*100

# make predictions on the test data
test_prediction = classifier.predict(X_test)

# calculate the accuracy of the model on the test data
test_accuracy_score = accuracy_score(y_test, test_prediction)*100

print("\n========== My Custom Model Summary ==========\n")
# Display the learning rate and number of epochs used for training
print(f"Learning Rate: {lr}")
print(f"Epochs: {epochs} \n")
# Display the accuracy on the training and test datasets
print(f"Accuracy on train data: {train_accuracy_score}%")
print(f"Accuracy on test data: {test_accuracy_score}%")



Learning Rate: 0.01
Epochs: 50000 

Accuracy on train data: 94.91740787801778%
Accuracy on test data: 93.90862944162437%


The custom logistic regression model reaches an accuracy of 94.92% on the training data and 93.91% on the test data. The small gap between the two suggests little overfitting on this sample. Both figures are measured on the balanced, under-sampled dataset of 984 transactions (492 per class), so they describe how well the model separates the two classes when they are equally common. On the original data, where fraud makes up about 0.17% of transactions, accuracy alone says little, and metrics such as precision and recall on the fraud class would be needed to judge real-world usefulness.
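To see why results on a balanced sample do not carry over directly to the original class ratio, the arithmetic below applies Bayes' rule to a fixed classifier at two different fraud rates. The sensitivity (0.90) and specificity (0.97) are hypothetical values chosen for illustration and are not measured from this notebook's model; only the counts 492 and 284807 come from the dataset.

```python
def precision_at_prevalence(sensitivity, specificity, prevalence):
    """Share of flagged transactions that are truly fraudulent."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)

# Fraud rate in the full dataset: 492 fraud cases out of 284807 rows.
prevalence = 492 / 284807

# Hypothetical classifier quality, NOT taken from the notebook's results.
sensitivity, specificity = 0.90, 0.97

balanced_precision = precision_at_prevalence(sensitivity, specificity, 0.5)
realistic_precision = precision_at_prevalence(sensitivity, specificity, prevalence)

print(f"Precision at 50% fraud:    {balanced_precision:.3f}")
print(f"Precision at {prevalence:.4%} fraud: {realistic_precision:.3f}")
```

Under these assumed rates, precision is about 0.97 on a balanced sample but drops to roughly 0.05 at the dataset's real fraud rate, because false positives from the much larger legitimate class dominate.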