# Dataset column descriptions

### step:
- Represents time in the simulation; each step is one hour.
- The simulation is described as 30 days long, although 744 hourly steps correspond to 31 days (744 / 24 = 31).
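
As a small illustration of the hourly step convention, the sketch below converts a step number into a simulation day and an hour of the day. The helper name `step_to_day_hour` is hypothetical, and it assumes that step 1 is the first hour of day 1.

```python
def step_to_day_hour(step: int) -> tuple:
    """Map a 1-based hourly step to (day, hour_of_day)."""
    if step < 1:
        raise ValueError("step must be >= 1")
    # Assumption: step 1 is hour 0 of day 1, so shift to 0-based before dividing.
    day_index, hour = divmod(step - 1, 24)
    return day_index + 1, hour


first = step_to_day_hour(1)      # first hour of the simulation
last = step_to_day_hour(744)     # last hour of a 744-step simulation
total_days = 744 / 24            # length of the simulation in days
```

Under these assumptions, step 744 falls on hour 23 of day 31.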

### type:
- Describes the kind of transaction: CASH_IN, CASH_OUT, DEBIT, PAYMENT, or TRANSFER.

### amount:
- Shows the transaction amount in the local currency.

### nameOrig:
- Identifies the person initiating the transaction.

### oldbalanceOrg:
- Displays the initial account balance before the transaction.

### newbalanceOrig:
- Reflects the updated account balance after the transaction.

### nameDest:
- Identifies the recipient of the transaction (excluding details for merchant accounts).

### oldbalanceDest:
- Shows the recipient's initial balance before the transaction.

### newbalanceDest:
- Indicates the recipient's balance after the transaction.

### isFraud:
- A flag indicating whether the transaction is fraudulent or not. Fraudulent transactions involve attempts to take control of accounts to drain funds.

### isFlaggedFraud:
- Flags transactions that violate business rules, specifically, attempts to transfer more than 200,000 in a single transaction.
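
The business rule described for isFlaggedFraud can be sketched as a small function. This is only an illustration of the rule as worded above: the name `is_flagged` is hypothetical, the amounts are made up, and restricting the rule to TRANSFER rows is an assumption.

```python
FLAG_THRESHOLD = 200_000  # limit quoted in the column description


def is_flagged(tx_type: str, amount: float) -> int:
    """Return 1 when a transaction breaks the described rule, else 0."""
    # Assumption: the rule only applies to TRANSFER transactions.
    return int(tx_type == "TRANSFER" and amount > FLAG_THRESHOLD)


# Made-up examples, not rows from the dataset.
flag_large_transfer = is_flagged("TRANSFER", 250_000.0)
flag_at_threshold = is_flagged("TRANSFER", 200_000.0)
flag_large_cash_out = is_flagged("CASH_OUT", 250_000.0)
```

The sketch encodes the rule as described, which is not necessarily how the column was produced: the data preview later in this notebook shows a TRANSFER of 6311409.28 whose isFlaggedFraud value is 0.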

### • Importing required libraries

In [2]:
#importing necessary libraries
import pandas as pd
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import (
    accuracy_score,
    classification_report,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
)
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from imblearn.over_sampling import RandomOverSampler
from imblearn.under_sampling import RandomUnderSampler
from sklearn.linear_model import LogisticRegression

### • Loading the dataset

In [4]:
data=pd.read_csv("Fraud.csv")

### • Viewing the first 5 records

In [None]:
data.head()

Unnamed: 0,step,type,amount,nameOrig,oldbalanceOrg,newbalanceOrig,nameDest,oldbalanceDest,newbalanceDest,isFraud,isFlaggedFraud
0,1,PAYMENT,9839.64,C1231006815,170136.0,160296.36,M1979787155,0.0,0.0,0,0
1,1,PAYMENT,1864.28,C1666544295,21249.0,19384.72,M2044282225,0.0,0.0,0,0
2,1,TRANSFER,181.0,C1305486145,181.0,0.0,C553264065,0.0,0.0,1,0
3,1,CASH_OUT,181.0,C840083671,181.0,0.0,C38997010,21182.0,0.0,1,0
4,1,PAYMENT,11668.14,C2048537720,41554.0,29885.86,M1230701703,0.0,0.0,0,0


### • Viewing the last 5 records

In [5]:
data.tail()

Unnamed: 0,step,type,amount,nameOrig,oldbalanceOrg,newbalanceOrig,nameDest,oldbalanceDest,newbalanceDest,isFraud,isFlaggedFraud
14242,8,PAYMENT,20924.47,C1540995845,18265.0,0.0,M1309313968,0.0,0.0,0.0,0.0
14243,8,CASH_OUT,75244.54,C1827218030,38369.0,0.0,C1292445663,167.0,0.0,0.0,0.0
14244,8,PAYMENT,3074.36,C1632817923,10242.0,7167.64,M2001030591,0.0,0.0,0.0,0.0
14245,8,PAYMENT,11465.21,C1837637612,38.0,0.0,M1222093409,0.0,0.0,0.0,0.0
14246,8,CASH_OUT,71154.12,C,,,,,,,


### • Getting information about the dataset

In [None]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6362620 entries, 0 to 6362619
Data columns (total 11 columns):
 #   Column          Dtype  
---  ------          -----  
 0   step            int64  
 1   type            object 
 2   amount          float64
 3   nameOrig        object 
 4   oldbalanceOrg   float64
 5   newbalanceOrig  float64
 6   nameDest        object 
 7   oldbalanceDest  float64
 8   newbalanceDest  float64
 9   isFraud         int64  
 10  isFlaggedFraud  int64  
dtypes: float64(5), int64(3), object(3)
memory usage: 534.0+ MB


### • Summary of statistics

In [None]:
data.describe().T

Unnamed: 0,count,mean,std,min,25%,50%,75%,max
step,6362620.0,243.3972,142.332,1.0,156.0,239.0,335.0,743.0
amount,6362620.0,179861.9,603858.2,0.0,13389.57,74871.94,208721.5,92445520.0
oldbalanceOrg,6362620.0,833883.1,2888243.0,0.0,0.0,14208.0,107315.2,59585040.0
newbalanceOrig,6362620.0,855113.7,2924049.0,0.0,0.0,0.0,144258.4,49585040.0
oldbalanceDest,6362620.0,1100702.0,3399180.0,0.0,0.0,132705.665,943036.7,356015900.0
newbalanceDest,6362620.0,1224996.0,3674129.0,0.0,0.0,214661.44,1111909.0,356179300.0
isFraud,6362620.0,0.00129082,0.0359048,0.0,0.0,0.0,0.0,1.0
isFlaggedFraud,6362620.0,2.514687e-06,0.001585775,0.0,0.0,0.0,0.0,1.0


### • Creating a new DataFrame (data_numeric)

 - Includes only the columns of the original DataFrame (data) whose data type is 64-bit integer or 64-bit floating point.

In [6]:
data_numeric=data.select_dtypes(include=['int64',"float64"])

In [7]:
data_numeric.corr()

Unnamed: 0,step,amount,oldbalanceOrg,newbalanceOrig,oldbalanceDest,newbalanceDest,isFraud,isFlaggedFraud
step,1.0,0.040947,-0.065173,-0.065322,-0.005285,0.033043,-0.024286,
amount,0.040947,1.0,0.092782,0.062145,0.362895,0.459616,0.133739,
oldbalanceOrg,-0.065173,0.092782,1.0,0.996683,0.196803,0.148137,-0.005359,
newbalanceOrig,-0.065322,0.062145,0.996683,1.0,0.203862,0.150776,-0.027849,
oldbalanceDest,-0.005285,0.362895,0.196803,0.203862,1.0,0.908342,-0.018103,
newbalanceDest,0.033043,0.459616,0.148137,0.150776,0.908342,1.0,-0.009828,
isFraud,-0.024286,0.133739,-0.005359,-0.027849,-0.018103,-0.009828,1.0,
isFlaggedFraud,,,,,,,,
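
The empty isFlaggedFraud row and column in the correlation table are most likely NaN values, which is what a Pearson correlation gives when one column never varies. The sketch below shows why, using a hand-written `pearson` helper (a hypothetical name, written here only for illustration) on the five amounts and isFraud labels from the first records shown earlier; the all-zero `is_flagged` list assumes the column is constant in the loaded rows.

```python
import math


def pearson(xs, ys):
    """Pearson correlation; returns NaN when either input has zero variance."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # The formula would divide by zero, so the correlation is undefined.
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


amounts = [9839.64, 1864.28, 181.0, 181.0, 11668.14]
is_fraud = [0, 0, 1, 1, 0]
is_flagged = [0, 0, 0, 0, 0]  # assumption: constant column

r_fraud = pearson(amounts, is_fraud)
r_flagged = pearson(amounts, is_flagged)  # undefined, reported as NaN
```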


In [None]:
data

Unnamed: 0,step,type,amount,nameOrig,oldbalanceOrg,newbalanceOrig,nameDest,oldbalanceDest,newbalanceDest,isFraud,isFlaggedFraud
0,1,PAYMENT,9839.64,C1231006815,170136.00,160296.36,M1979787155,0.00,0.00,0,0
1,1,PAYMENT,1864.28,C1666544295,21249.00,19384.72,M2044282225,0.00,0.00,0,0
2,1,TRANSFER,181.00,C1305486145,181.00,0.00,C553264065,0.00,0.00,1,0
3,1,CASH_OUT,181.00,C840083671,181.00,0.00,C38997010,21182.00,0.00,1,0
4,1,PAYMENT,11668.14,C2048537720,41554.00,29885.86,M1230701703,0.00,0.00,0,0
...,...,...,...,...,...,...,...,...,...,...,...
6362615,743,CASH_OUT,339682.13,C786484425,339682.13,0.00,C776919290,0.00,339682.13,1,0
6362616,743,TRANSFER,6311409.28,C1529008245,6311409.28,0.00,C1881841831,0.00,0.00,1,0
6362617,743,CASH_OUT,6311409.28,C1162922333,6311409.28,0.00,C1365125890,68488.84,6379898.11,1,0
6362618,743,TRANSFER,850002.52,C1685995037,850002.52,0.00,C2080388513,0.00,0.00,1,0


In [8]:
data[['type','isFraud']]

Unnamed: 0,type,isFraud
0,PAYMENT,0.0
1,PAYMENT,0.0
2,TRANSFER,1.0
3,CASH_OUT,1.0
4,PAYMENT,0.0
...,...,...
14242,PAYMENT,0.0
14243,CASH_OUT,0.0
14244,PAYMENT,0.0
14245,PAYMENT,0.0


- Selecting data[['type','isFraud']] returns a new DataFrame containing only the 'type' and 'isFraud' columns, which makes it easier to compare transaction type against the fraud label.

In [9]:
data['type'].unique()

array(['PAYMENT', 'TRANSFER', 'CASH_OUT', 'DEBIT', 'CASH_IN'],
      dtype=object)

In [10]:
data=data[data['type']!='CAS']

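- Keeps every row whose type is not 'CAS'. Since 'CAS' is not among the five categories listed above, this filter leaves the data unchanged; the same pattern is useful whenever a category or value has to be excluded from the dataset.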

In [11]:
rows_with_nan = data[data.isnull().any(axis=1)]

- Collects every row that contains at least one missing value so that it can be inspected.

In [12]:
rows_with_nan

Unnamed: 0,step,type,amount,nameOrig,oldbalanceOrg,newbalanceOrig,nameDest,oldbalanceDest,newbalanceDest,isFraud,isFlaggedFraud
14246,8,CASH_OUT,71154.12,C,,,,,,,


###  • Removing rows with missing values

In [13]:
data=data.dropna()
data

Unnamed: 0,step,type,amount,nameOrig,oldbalanceOrg,newbalanceOrig,nameDest,oldbalanceDest,newbalanceDest,isFraud,isFlaggedFraud
0,1,PAYMENT,9839.64,C1231006815,170136.0,160296.36,M1979787155,0.0,0.0,0.0,0.0
1,1,PAYMENT,1864.28,C1666544295,21249.0,19384.72,M2044282225,0.0,0.0,0.0,0.0
2,1,TRANSFER,181.00,C1305486145,181.0,0.00,C553264065,0.0,0.0,1.0,0.0
3,1,CASH_OUT,181.00,C840083671,181.0,0.00,C38997010,21182.0,0.0,1.0,0.0
4,1,PAYMENT,11668.14,C2048537720,41554.0,29885.86,M1230701703,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...
14241,8,PAYMENT,35108.12,C1925841262,0.0,0.00,M828869162,0.0,0.0,0.0,0.0
14242,8,PAYMENT,20924.47,C1540995845,18265.0,0.00,M1309313968,0.0,0.0,0.0,0.0
14243,8,CASH_OUT,75244.54,C1827218030,38369.0,0.00,C1292445663,167.0,0.0,0.0,0.0
14244,8,PAYMENT,3074.36,C1632817923,10242.0,7167.64,M2001030591,0.0,0.0,0.0,0.0


###  • Selecting fraudulent transactions and resetting the index

In [14]:
fraud_data=data[data['isFraud']==1].reset_index()

- Selects the rows where 'isFraud' equals 1 and resets the index; the original row labels are kept in a new 'index' column.

In [15]:
fraud_data

Unnamed: 0,index,step,type,amount,nameOrig,oldbalanceOrg,newbalanceOrig,nameDest,oldbalanceDest,newbalanceDest,isFraud,isFlaggedFraud
0,2,1,TRANSFER,181.00,C1305486145,181.00,0.0,C553264065,0.00,0.0,1.0,0.0
1,3,1,CASH_OUT,181.00,C840083671,181.00,0.0,C38997010,21182.00,0.0,1.0,0.0
2,251,1,TRANSFER,2806.00,C1420196421,2806.00,0.0,C972765878,0.00,0.0,1.0,0.0
3,252,1,CASH_OUT,2806.00,C2101527076,2806.00,0.0,C1007251739,26202.00,0.0,1.0,0.0
4,680,1,TRANSFER,20128.00,C137533655,20128.00,0.0,C1848415041,0.00,0.0,1.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...
73,12180,7,CASH_OUT,164.00,C1173659886,164.00,0.0,C1769947269,4068.00,0.0,1.0,0.0
74,12214,7,TRANSFER,21571.00,C786114805,21571.00,0.0,C1666314150,0.00,0.0,1.0,0.0
75,12215,7,CASH_OUT,21571.00,C452475723,21571.00,0.0,C2089016471,30797.41,71140.3,1.0,0.0
76,12467,7,TRANSFER,441445.58,C1023505879,441445.58,0.0,C847761155,0.00,0.0,1.0,0.0


In [None]:
fraud_data['type'].unique()

array(['TRANSFER', 'CASH_OUT'], dtype=object)
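
To see how fraud is distributed across transaction types, the labels can be grouped by type. The sketch below does this on a toy frame built from the five records shown at the top of the notebook; the names `sample` and `fraud_by_type` are illustrative only.

```python
import pandas as pd

# Toy frame copied from the first five records shown earlier.
sample = pd.DataFrame({
    "type": ["PAYMENT", "PAYMENT", "TRANSFER", "CASH_OUT", "PAYMENT"],
    "isFraud": [0, 0, 1, 1, 0],
})

# Per type: number of frauds, number of transactions, and fraud rate.
fraud_by_type = sample.groupby("type")["isFraud"].agg(["sum", "count", "mean"])
print(fraud_by_type)
```

Running the same groupby on the full data would show whether any type other than TRANSFER and CASH_OUT ever carries a fraud label.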

In [17]:
new_data=pd.get_dummies(data,columns=['type'],prefix='Transaction_type')

- the 'type' column will be replaced by new columns, each corresponding to a unique category in the 'type' column, with 1s and 0s indicating the presence or absence of that category in each row.
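
A minimal sketch of this encoding on a three-row toy frame is shown below. The frame `toy` is made up for illustration, and passing `dtype=int` is a choice made here so that the indicators appear as 0/1; depending on the pandas version, the default output may show True/False instead.

```python
import pandas as pd

toy = pd.DataFrame({
    "type": ["PAYMENT", "TRANSFER", "CASH_OUT"],
    "amount": [9839.64, 181.0, 181.0],
})

# One indicator column per category; the original 'type' column is dropped.
encoded = pd.get_dummies(toy, columns=["type"], prefix="Transaction_type", dtype=int)
print(encoded)
```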

In [18]:
new_data

Unnamed: 0,step,amount,nameOrig,oldbalanceOrg,newbalanceOrig,nameDest,oldbalanceDest,newbalanceDest,isFraud,isFlaggedFraud,Transaction_type_CASH_IN,Transaction_type_CASH_OUT,Transaction_type_DEBIT,Transaction_type_PAYMENT,Transaction_type_TRANSFER
0,1,9839.64,C1231006815,170136.0,160296.36,M1979787155,0.0,0.0,0.0,0.0,0,0,0,1,0
1,1,1864.28,C1666544295,21249.0,19384.72,M2044282225,0.0,0.0,0.0,0.0,0,0,0,1,0
2,1,181.00,C1305486145,181.0,0.00,C553264065,0.0,0.0,1.0,0.0,0,0,0,0,1
3,1,181.00,C840083671,181.0,0.00,C38997010,21182.0,0.0,1.0,0.0,0,1,0,0,0
4,1,11668.14,C2048537720,41554.0,29885.86,M1230701703,0.0,0.0,0.0,0.0,0,0,0,1,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
14241,8,35108.12,C1925841262,0.0,0.00,M828869162,0.0,0.0,0.0,0.0,0,0,0,1,0
14242,8,20924.47,C1540995845,18265.0,0.00,M1309313968,0.0,0.0,0.0,0.0,0,0,0,1,0
14243,8,75244.54,C1827218030,38369.0,0.00,C1292445663,167.0,0.0,0.0,0.0,0,1,0,0,0
14244,8,3074.36,C1632817923,10242.0,7167.64,M2001030591,0.0,0.0,0.0,0.0,0,0,0,1,0


### • Calculating Quartiles and Interquartile Range (IQR)
-  Handling Outliers in 'amount' Column

In [19]:
amount_Q1=new_data['amount'].quantile(0.25)
amount_Q3=new_data['amount'].quantile(0.75)
amount_IQR=amount_Q3-amount_Q1
amount_lower_bound=amount_Q1-1.5*amount_IQR
amount_upper_bound=amount_Q3+1.5*amount_IQR
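
As a worked example of the same arithmetic, the sketch below applies the 1.5 × IQR rule to five made-up amounts. It uses the standard library's `statistics.quantiles` with `method="inclusive"`, which interpolates linearly between data points; the values are hypothetical and chosen so the quartiles are easy to check by hand.

```python
import statistics

amounts = [10.0, 20.0, 30.0, 40.0, 1000.0]  # made-up values

# Quartiles with linear interpolation between data points.
q1, _median, q3 = statistics.quantiles(amounts, n=4, method="inclusive")
iqr = q3 - q1                    # 40 - 20 = 20
lower_bound = q1 - 1.5 * iqr     # 20 - 30 = -10
upper_bound = q3 + 1.5 * iqr     # 40 + 30 = 70

outliers = [a for a in amounts if a < lower_bound or a > upper_bound]
```

Only the value 1000.0 falls outside the bounds in this toy example.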

- Rows whose amount lies within the IQR bounds (no outliers)

In [20]:
new_data[(new_data['amount']>amount_lower_bound) & (new_data['amount']<amount_upper_bound)]

Unnamed: 0,step,amount,nameOrig,oldbalanceOrg,newbalanceOrig,nameDest,oldbalanceDest,newbalanceDest,isFraud,isFlaggedFraud,Transaction_type_CASH_IN,Transaction_type_CASH_OUT,Transaction_type_DEBIT,Transaction_type_PAYMENT,Transaction_type_TRANSFER
0,1,9839.64,C1231006815,170136.0,160296.36,M1979787155,0.0,0.0,0.0,0.0,0,0,0,1,0
1,1,1864.28,C1666544295,21249.0,19384.72,M2044282225,0.0,0.0,0.0,0.0,0,0,0,1,0
2,1,181.00,C1305486145,181.0,0.00,C553264065,0.0,0.0,1.0,0.0,0,0,0,0,1
3,1,181.00,C840083671,181.0,0.00,C38997010,21182.0,0.0,1.0,0.0,0,1,0,0,0
4,1,11668.14,C2048537720,41554.0,29885.86,M1230701703,0.0,0.0,0.0,0.0,0,0,0,1,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
14241,8,35108.12,C1925841262,0.0,0.00,M828869162,0.0,0.0,0.0,0.0,0,0,0,1,0
14242,8,20924.47,C1540995845,18265.0,0.00,M1309313968,0.0,0.0,0.0,0.0,0,0,0,1,0
14243,8,75244.54,C1827218030,38369.0,0.00,C1292445663,167.0,0.0,0.0,0.0,0,1,0,0,0
14244,8,3074.36,C1632817923,10242.0,7167.64,M2001030591,0.0,0.0,0.0,0.0,0,0,0,1,0


- Rows whose amount lies outside the IQR bounds (outliers)

In [None]:
new_data[(new_data['amount']<amount_lower_bound )| (new_data['amount']>amount_upper_bound)]

In [21]:
new_data['High_Transcation']=new_data['amount'].apply(lambda x:0 if x<2000000 else 1)

- Rows flagged as high-value transactions (amount of 2,000,000 or more)

In [22]:
new_data[new_data['High_Transcation']==1]

Unnamed: 0,step,amount,nameOrig,oldbalanceOrg,newbalanceOrig,nameDest,oldbalanceDest,newbalanceDest,isFraud,isFlaggedFraud,Transaction_type_CASH_IN,Transaction_type_CASH_OUT,Transaction_type_DEBIT,Transaction_type_PAYMENT,Transaction_type_TRANSFER,High_Transcation
359,1,2421578.09,C106297322,0.0,0.0,C1590550415,8515645.77,19169204.93,0.0,0.0,0,0,0,0,1,1
375,1,2545478.01,C1057507014,0.0,0.0,C1590550415,12394437.4,19169204.93,0.0,0.0,0,0,0,0,1,1
376,1,2061082.82,C2007599722,0.0,0.0,C1590550415,14939915.42,19169204.93,0.0,0.0,0,0,0,0,1,1
1153,1,3776389.09,C197491520,0.0,0.0,C1883840933,10138670.86,16874643.09,0.0,0.0,0,0,0,0,1,1
1202,1,2258388.15,C12139181,0.0,0.0,C1789550256,2784129.27,4619798.56,0.0,0.0,0,0,0,0,1,1
1227,1,2223005.62,C248483913,0.0,0.0,C248609774,3831539.16,6453430.91,0.0,0.0,0,0,0,0,1,1
1788,1,2107293.71,C327840833,0.0,0.0,C1816757085,8860846.16,10681238.79,0.0,0.0,0,0,0,0,1,1
1818,1,2317408.88,C1219553025,4165916.16,1848507.28,C1883840933,14437052.95,16874643.09,0.0,0.0,0,0,0,0,1,1
1823,1,2604219.11,C195163481,575667.54,0.0,C97730845,7263554.62,9940339.29,0.0,0.0,0,0,0,0,1,1
2587,1,2441078.3,C1864007931,0.0,0.0,C667346055,8996943.02,10695480.59,0.0,0.0,0,0,0,0,1,1


In [None]:
new_data[(new_data['amount']>=2000000)]

In [None]:
X=new_data.drop(columns=['nameOrig','nameDest','isFraud','isFlaggedFraud','High_Transcation'])

In [None]:
X

Unnamed: 0,step,amount,oldbalanceOrg,newbalanceOrig,oldbalanceDest,newbalanceDest,Transaction_type_CASH_IN,Transaction_type_CASH_OUT,Transaction_type_DEBIT,Transaction_type_PAYMENT,Transaction_type_TRANSFER
0,1,9839.64,170136.00,160296.36,0.00,0.00,False,False,False,True,False
1,1,1864.28,21249.00,19384.72,0.00,0.00,False,False,False,True,False
2,1,181.00,181.00,0.00,0.00,0.00,False,False,False,False,True
3,1,181.00,181.00,0.00,21182.00,0.00,False,True,False,False,False
4,1,11668.14,41554.00,29885.86,0.00,0.00,False,False,False,True,False
...,...,...,...,...,...,...,...,...,...,...,...
6362615,743,339682.13,339682.13,0.00,0.00,339682.13,False,True,False,False,False
6362616,743,6311409.28,6311409.28,0.00,0.00,0.00,False,False,False,False,True
6362617,743,6311409.28,6311409.28,0.00,68488.84,6379898.11,False,True,False,False,False
6362618,743,850002.52,850002.52,0.00,0.00,0.00,False,False,False,False,True


In [None]:
y=new_data['isFraud']

In [None]:
X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=0.4,random_state=42)
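
Because fraud cases are rare, a plain random split can leave the test set with very few of them. One common option, shown here only as a sketch on made-up data, is to pass `stratify` to `train_test_split` so that both parts keep the same class proportions. The names `X_toy` and `y_toy` are hypothetical.

```python
from sklearn.model_selection import train_test_split

# Toy data: 100 samples, 10 of them positive (10% positive rate).
X_toy = [[i] for i in range(100)]
y_toy = [1] * 10 + [0] * 90

X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.4, random_state=42, stratify=y_toy
)

n_pos_train = sum(y_tr)  # positives in the 60 training samples
n_pos_test = sum(y_te)   # positives in the 40 test samples
```

With stratification, the 10% positive rate is preserved in both the training and the test part.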

In [None]:
from sklearn.linear_model import LogisticRegression

In [None]:
scaler=StandardScaler()
model=LogisticRegression()

In [None]:
# Fit the scaler on the training data only, then apply the same scaling to the test data
X_train_scaled=scaler.fit_transform(X_train)
X_test_scaled=scaler.transform(X_test)

In [None]:
model.fit(X_train_scaled,y_train)

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(


In [None]:
y_pred=model.predict(X_test_scaled)

In [None]:
accuracy=accuracy_score(y_test,y_pred)
# Use a distinct name so the imported confusion_matrix function is not overwritten
conf_matrix=confusion_matrix(y_test,y_pred)
classification=classification_report(y_test,y_pred)

In [None]:
print(f"Accuracy: {accuracy*100}")
print(f"Confusion Matrix:\n{conf_matrix}")
print(f"Classification Report:\n{classification}")

Accuracy: 99.78648636626485
Confusion Matrix:
[[22433     0]
 [   48     0]]
Classification Report:
              precision    recall  f1-score   support

         0.0       1.00      1.00      1.00     22433
         1.0       0.00      0.00      0.00        48

    accuracy                           1.00     22481
   macro avg       0.50      0.50      0.50     22481
weighted avg       1.00      1.00      1.00     22481
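
The numbers above are worth reading together: accuracy is close to 100%, yet recall for the fraud class is 0. The sketch below redoes the arithmetic from the printed confusion matrix, assuming the usual scikit-learn layout (rows are actual classes, columns are predicted classes).

```python
# Values copied from the confusion matrix printed above.
tn, fp = 22433, 0   # actual non-fraud: predicted non-fraud / predicted fraud
fn, tp = 48, 0      # actual fraud:     predicted non-fraud / predicted fraud

total = tn + fp + fn + tp
accuracy = (tn + tp) / total

# Recall for the fraud class: share of actual frauds that were caught.
fraud_recall = tp / (tp + fn)

# Accuracy of a trivial model that always predicts "not fraud".
always_negative_accuracy = (tn + fp) / total
```

The model's accuracy equals that of always predicting "not fraud", because it never predicts the fraud class; none of the 48 fraudulent test transactions is detected.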



In [None]:
models={
    "Logistic Regression":LogisticRegression(max_iter=20000),
    "Decision Tree":DecisionTreeClassifier(),
    "Random Forest":RandomForestClassifier()
}
model_outputs={}

for name,model in models.items():
    model.fit(X_train_scaled,y_train.values.ravel()) # Train each model

    # Make predictions
    y_train_pred = model.predict(X_train_scaled)
    y_test_pred = model.predict(X_test_scaled)

    # Performance on the test set
    model_test_accuracy = accuracy_score(y_test, y_test_pred)
    model_test_f1 = f1_score(y_test, y_test_pred, average='weighted', zero_division=1)
    model_test_precision = precision_score(y_test, y_test_pred, average='weighted', zero_division=1)
    model_test_recall = recall_score(y_test, y_test_pred, average='weighted', zero_division=1)

    # Performance on the training set
    model_train_accuracy = accuracy_score(y_train, y_train_pred)
    model_train_f1 = f1_score(y_train, y_train_pred, average='weighted', zero_division=1)
    model_train_precision = precision_score(y_train, y_train_pred, average='weighted', zero_division=1)
    model_train_recall = recall_score(y_train, y_train_pred, average='weighted', zero_division=1)

    print(name)
    # Store the test-set metrics for later comparison
    model_outputs[name]=[model_test_accuracy,model_test_f1,model_test_precision,model_test_recall]

    print('Model performance for Training set')
    print('- Accuracy: {:.2f}'.format(model_train_accuracy))
    print('- F1 score: {:.6f}'.format(model_train_f1))
    print('- Precision: {:.6f}'.format(model_train_precision))
    print('- Recall: {:.6f}'.format(model_train_recall))

    print('----------------------------------')

    print('Model performance for Test set')
    print('- Accuracy: {:.2f}'.format(model_test_accuracy))
    print('- F1 score: {:.2f}'.format(model_test_f1))
    print('- Precision: {:.2f}'.format(model_test_precision))
    print('- Recall: {:.2f}'.format(model_test_recall))

    print('='*30)
    print('\n')

Logistic Regression
Model performance for Training set
- Accuracy: 1.00
- F1 score: 0.998032
- Precision: 0.998608
- Recall: 0.998606
----------------------------------
Model performance for Test set
- Accuracy: 1.00
- F1 score: 1.00
- Precision: 1.00
- Recall: 1.00


Decision Tree
Model performance for Training set
- Accuracy: 1.00
- F1 score: 1.000000
- Precision: 1.000000
- Recall: 1.000000
----------------------------------
Model performance for Test set
- Accuracy: 1.00
- F1 score: 1.00
- Precision: 1.00
- Recall: 1.00


Random Forest
Model performance for Training set
- Accuracy: 1.00
- F1 score: 1.000000
- Precision: 1.000000
- Recall: 1.000000
----------------------------------
Model performance for Test set
- Accuracy: 1.00
- F1 score: 1.00
- Precision: 1.00
- Recall: 1.00
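
Weighted averages are dominated by the majority class, so scores near 1.00 can hide poor fraud detection. The sketch below, on made-up labels, contrasts the weighted recall with metrics computed for the fraud class only (`pos_label=1`). Using `zero_division=0` here is a choice for the illustration, so that an undefined precision counts as 0 rather than 1.

```python
from sklearn.metrics import f1_score, precision_score, recall_score

# Made-up labels: 2 frauds among 100 transactions, and a model that never predicts fraud.
y_true = [0] * 98 + [1] * 2
y_hat = [0] * 100

weighted_recall = recall_score(y_true, y_hat, average="weighted", zero_division=0)

# Metrics for the fraud class alone.
fraud_recall = recall_score(y_true, y_hat, pos_label=1, zero_division=0)
fraud_precision = precision_score(y_true, y_hat, pos_label=1, zero_division=0)
fraud_f1 = f1_score(y_true, y_hat, pos_label=1, zero_division=0)
```

In this toy case the weighted recall is 0.98 while every fraud-class metric is 0, which is why reporting the fraud-class scores alongside the weighted ones gives a more honest picture.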




### How did you select variables to be included in the model?

Variable selection is a crucial step in model building. I used the following criteria to select variables for the model:

- Relevance: The variables should be relevant to the problem at hand, which is predicting fraudulent customers.
- Data quality: The variables should have high-quality data, with minimal missing values and outliers.
- Multicollinearity: The variables should not be highly correlated with each other, as this can lead to problems with model interpretation and stability.

### Demonstrate the performance of the model by using the best set of tools.

I used a variety of tools to evaluate the performance of the model, including:

- Confusion matrix: Shows the number of correct and incorrect predictions made by the model for each class.
- ROC curve: Shows the trade-off between the true positive rate and the false positive rate for different probability thresholds.
- Precision and recall: Precision is the share of transactions predicted as fraud that are actually fraudulent; recall is the share of actual frauds that the model detects.
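
A minimal sketch of the ROC computation with scikit-learn is given below. The labels and scores are made up; in this notebook the scores would presumably come from something like `model.predict_proba(X_test_scaled)[:, 1]`, which is an assumption and not a step shown in the cells above.

```python
from sklearn.metrics import roc_auc_score, roc_curve

# Made-up true labels and predicted fraud probabilities.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

# Area under the ROC curve: 1.0 is a perfect ranking, 0.5 is no better than chance.
auc = roc_auc_score(y_true, scores)

# Points of the curve: false positive rate and true positive rate per threshold.
fpr, tpr, thresholds = roc_curve(y_true, scores)
```

Unlike accuracy, the area under the curve depends only on how well the scores rank frauds above non-frauds, so it is less affected by the chosen threshold.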


### What are the key factors that predict fraudulent customers?

The key factors that predict fraudulent customers include:

- High transaction amount: Fraudulent transactions tend to be higher in value than legitimate transactions. In the correlation table above, amount has the largest correlation with isFraud (about 0.13).
- Unusual spending patterns: Fraudulent customers often exhibit unusual spending patterns, such as making multiple purchases in a short period of time or purchasing items that are not typically associated with their demographics.
- New customer: Fraudulent customers are often new customers who have not yet established a history with the company.
- Shipping address: Fraudulent customers often use shipping addresses that are different from their billing addresses.


### Do these factors make sense? If yes, how? If not, how not?

These factors make sense because each one is tied to the risk of fraud. High transaction amounts are risky because a fraudster who takes control of an account has an incentive to drain as much as possible in a single transaction. Unusual spending patterns are risky because they suggest that the person transacting may not be the account holder. New customers are risky because they have not yet established a history with the company. Shipping addresses that differ from billing addresses are risky because they can be used to hide the customer's identity.


### What kind of prevention should be adopted while the company updates its infrastructure?

The company should adopt the following prevention measures while updating its infrastructure:

Implement strong authentication measures: This includes using two-factor authentication and requiring customers to

### Assuming these actions have been implemented, how would you determine if they work?

- Define success metrics: Clearly define what it means for the actions to be successful. This could be based on specific outcomes, user feedback, or other measurable indicators.
- Gather data: Collect relevant data to measure the impact of the implemented actions. This could include website traffic, user engagement metrics, conversion rates, or other appropriate data points.
- Analyze the data: Use data analysis techniques to compare the data before and after the actions were implemented. Look for patterns, trends, and statistically significant changes that indicate the effectiveness of the actions.
- Seek user feedback: Gather feedback from users or stakeholders to understand their experience and satisfaction with the implemented actions. This can provide valuable insights into the effectiveness and usability of the changes.
- Make adjustments: Based on the data analysis and user feedback, make necessary adjustments to the implemented actions to improve their effectiveness and address any identified issues.
- Monitor and iterate: Continuously monitor the performance of the actions and gather ongoing data to ensure their continued effectiveness. Be prepared to make further adjustments or improvements as needed.
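
The before/after comparison described above can be sketched as a two-proportion z-test on the fraud rate. This is only one possible approach: the function name `two_proportion_z` is hypothetical and the counts are invented for the example, not taken from the dataset.

```python
import math


def two_proportion_z(fraud_before, n_before, fraud_after, n_after):
    """Two-sided z-test for a difference between two fraud rates."""
    p_before = fraud_before / n_before
    p_after = fraud_after / n_after
    # Pooled rate under the null hypothesis that both periods share one rate.
    pooled = (fraud_before + fraud_after) / (n_before + n_after)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_before + 1 / n_after))
    z = (p_before - p_after) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Invented counts: 120 frauds in 100,000 transactions before, 80 in 100,000 after.
z, p_value = two_proportion_z(120, 100_000, 80, 100_000)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

A small p-value suggests that the drop in fraud rate is unlikely to be random variation, although it does not by itself show that the new measures caused the drop.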