## Datafest 2023

##### Dataset Content

The dataset encompasses a wide range of features, including but not limited to:

1. **Transaction Data**: Transaction ID, User ID, Transaction Amount, Transaction Date and Time, Merchant ID, Payment Method, Country Code, and Transaction Type.

2. **User Data**: User Age, User Gender, User Account Status, User's Transaction History, User's Credit Score, and User's Email Domain.

3. **Merchant Data**: Merchant Category and Merchant's Reputation Score.

4. **Transaction Details**: Transaction Status, Location Distance, Time Taken for Transaction, and Transaction Currency.

5. **Device Information**: Device Type, IP Address, Browser Type, and Operating System.

6. **Additional Context**: Transaction Purpose and User's Device Location.

In [1]:
# Importing the necessary tools
import pandas as pd
import numpy as np
from sklearn.model_selection import RandomizedSearchCV
# Transform the categorical columns and the string columns into numerical representations
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
import warnings
# Suppress all warnings to keep the notebook output short
warnings.simplefilter('ignore')



In [2]:
df = pd.read_csv('Fraud Detection Dataset.csv')
df

Unnamed: 0,Transaction ID,User ID,Transaction Amount,Transaction Date and Time,Merchant ID,Payment Method,Country Code,Transaction Type,Device Type,IP Address,...,User's Transaction History,Merchant's Reputation Score,User's Device Location,Transaction Currency,Transaction Purpose,User's Credit Score,User's Email Domain,Merchant's Business Age,Transaction Authentication Method,Fraudulent Flag
0,51595306,9822,163.08,2023-01-02 07:47:54,4044,ACH Transfer,KOR,Charity,GPS Device,42.23.223.120,...,26,2.71,United Kingdom,NOK,Consultation Fee,343,cox.co.uk,3,Bluetooth Authentication,0
1,85052974,4698,430.74,2021-09-12 15:15:41,4576,2Checkout,VNM,Cashback,Medical Device,39.52.212.120,...,60,3.95,Mexico,EGP,Cashback Reward,688,gmail.com,13,NFC Tag,1
2,23954324,8666,415.74,2023-01-12 17:25:58,4629,Google Wallet,MEX,Reward,Vehicle Infotainment System,243.180.236.29,...,81,3.81,Qatar,MXN,Acquisition,371,rocketmail.com,7,Token,1
3,44108303,9012,565.89,2021-02-27 11:31:00,3322,Check,SGP,Purchase,Kiosk,212.186.227.14,...,18,2.67,Spain,CLP,Loan Repayment,687,roadrunner.co.uk,15,Time-Based OTP,1
4,66622683,5185,955.49,2022-09-24 04:06:38,7609,Worldpay,HKG,Acquisition,Smart Mirror,166.113.10.199,...,98,3.19,Israel,RUB,Dividend Reinvestment,605,protonmail.co.uk,17,Password,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5999995,61037029,7480,448.99,2021-10-20 15:56:32,3346,Discover,SGP,Scholarship,Server,255.134.160.201,...,34,2.78,Russia,CHF,Invoice Payment,679,aim.com,14,Retina Scan,0
5999996,56515851,5636,841.39,2021-06-14 02:10:00,8415,Alipay,ZAF,Loan,Digital Camera,48.190.84.14,...,80,2.60,Malaysia,HUF,Membership,706,cox.net,10,Social Media Login,1
5999997,66863972,5554,197.28,2021-11-06 22:33:19,4231,Afterpay,CAN,Service Charge,Barcode Scanner,7.21.196.39,...,12,1.35,Egypt,HKD,Admission,310,live.co.uk,14,Mobile App Notification,0
5999998,13449701,1275,358.33,2022-03-13 15:02:35,9614,JCB,UK,Fine,Robot,211.202.242.100,...,57,1.29,China,AED,Expense Reimbursement,460,rediffmail.com,16,Authentication App,0


In [3]:
df.dtypes

Transaction ID                         int64
User ID                                int64
Transaction Amount                   float64
Transaction Date and Time             object
Merchant ID                            int64
Payment Method                        object
Country Code                          object
Transaction Type                      object
Device Type                           object
IP Address                            object
Browser Type                          object
Operating System                      object
Merchant Category                     object
User Age                               int64
User Occupation                       object
User Income                          float64
User Gender                           object
User Account Status                   object
Transaction Status                    object
Location Distance                    float64
Time Taken for Transaction           float64
Transaction Time of Day               object
User's Tra

In [4]:
# Convert the timestamp column from object (string) to datetime64
df['Transaction Date and Time'] = pd.to_datetime(df['Transaction Date and Time'])
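
The conversion above relies on pandas inferring the timestamp layout. A minimal sketch of a stricter variant follows. It assumes the `YYYY-MM-DD HH:MM:SS` layout seen in the preview rows, and the three-row `sample` frame (including the malformed value) is hypothetical, not taken from the dataset.

```python
import pandas as pd

# Hypothetical sample in the same "YYYY-MM-DD HH:MM:SS" layout as the preview rows.
sample = pd.DataFrame({
    "Transaction Date and Time": [
        "2023-01-02 07:47:54",
        "2021-09-12 15:15:41",
        "not a timestamp",  # deliberately malformed, for illustration only
    ]
})

# An explicit format skips format inference; errors="coerce" turns values
# that cannot be parsed into NaT instead of raising an exception.
parsed = pd.to_datetime(
    sample["Transaction Date and Time"],
    format="%Y-%m-%d %H:%M:%S",
    errors="coerce",
)

# Count how many rows failed to parse.
n_bad = int(parsed.isna().sum())
print(n_bad)
```

Counting the `NaT` values afterwards gives a quick check that no timestamps were lost during the conversion.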

In [5]:
df.isnull().sum()

Transaction ID                       0
User ID                              0
Transaction Amount                   0
Transaction Date and Time            0
Merchant ID                          0
Payment Method                       0
Country Code                         0
Transaction Type                     0
Device Type                          0
IP Address                           0
Browser Type                         0
Operating System                     0
Merchant Category                    0
User Age                             0
User Occupation                      0
User Income                          0
User Gender                          0
User Account Status                  0
Transaction Status                   0
Location Distance                    0
Time Taken for Transaction           0
Transaction Time of Day              0
User's Transaction History           0
Merchant's Reputation Score          0
User's Device Location               0
Transaction Currency     

In [6]:
fraud_data=df.copy()

In [7]:
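# Preview the rows in chronological order (the result is not assigned, so fraud_data itself stays unsorted)
fraud_data.sort_values(by=['Transaction Date and Time'])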

Unnamed: 0,Transaction ID,User ID,Transaction Amount,Transaction Date and Time,Merchant ID,Payment Method,Country Code,Transaction Type,Device Type,IP Address,...,User's Transaction History,Merchant's Reputation Score,User's Device Location,Transaction Currency,Transaction Purpose,User's Credit Score,User's Email Domain,Merchant's Business Age,Transaction Authentication Method,Fraudulent Flag
5422292,80535648,5364,668.78,2021-01-01 00:00:34,7640,Apple Pay,CHN,Purchase,Wearable Device,95.105.165.10,...,62,1.08,China,PLN,Expense Reimbursement,478,cox.co.uk,6,Transaction Confirmation Number,1
4245261,70167155,6554,262.40,2021-01-01 00:00:35,5704,E-check,UAE,Interest,IoT Device,148.0.220.63,...,32,1.76,Greece,MYR,Utility Payment,814,rocketmail.co.uk,9,QR Code,1
5799703,83247394,6492,90.94,2021-01-01 00:01:20,6931,Bitcoin,NLD,Payment,GPS Device,182.192.110.127,...,67,1.61,Vietnam,QAR,Invoice Payment,572,yahoo.co.uk,4,Biometric Scan,1
4569860,32723085,3530,709.43,2021-01-01 00:01:44,4980,Cash,ESP,Acquisition,Home Automation Hub,205.54.50.122,...,89,3.09,United Kingdom,CZK,Admission,519,hotmail.com,6,Email Verification,0
4577615,59130569,3756,796.16,2021-01-01 00:01:53,2642,NFC Payment,AUT,Tax,Digital Camera,117.210.185.41,...,82,1.76,Taiwan,ZAR,Donation to Nonprofit,414,cox.net,17,Password,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3275650,40945224,5866,93.80,2023-07-30 23:59:18,2227,Discover,ISR,Invoice,Smart Appliance,239.70.146.80,...,10,1.86,Qatar,KES,Insurance Premium,576,yahoo.com,14,Behavioral Analytics,0
1346978,94104426,7192,102.68,2023-07-30 23:59:31,7291,Diners Club,EGY,Rental,Smart Thermostat,16.155.235.250,...,32,4.90,Switzerland,RUB,Payout,736,live.co.uk,12,SMS Code,1
2305893,30981055,7211,553.03,2023-07-30 23:59:34,5728,Amazon Pay,ISR,Invoice,Smart Doorbell,200.58.155.181,...,85,4.32,Thailand,EUR,Dividend Reinvestment,605,tutanota.co.uk,17,USB Security Key,0
538894,68625426,4051,399.75,2023-07-30 23:59:35,9237,Cash,ISR,Royalty,Virtual Reality Headset,144.208.232.43,...,94,2.44,Mexico,CNY,Invoice Payment,436,yandex.co.uk,16,Mobile App Notification,0


In [8]:
# Feature engineering: split the timestamp into day, month, and year, then drop the original column
fraud_data['Transaction Day'] = fraud_data['Transaction Date and Time'].dt.day
fraud_data['Transaction Month'] = fraud_data['Transaction Date and Time'].dt.month
fraud_data['Transaction Year'] = fraud_data['Transaction Date and Time'].dt.year
fraud_data.drop('Transaction Date and Time', axis=1, inplace=True)
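
The same `.dt` accessor exposes other components of a timestamp. As a sketch only, the block below derives an hour, a day of week, and a weekend flag from two timestamps copied from the preview rows. The column names `Transaction Hour`, `Transaction Day of Week`, and `Is Weekend` are hypothetical and are not used elsewhere in this notebook; whether such features help the model is not tested here.

```python
import pandas as pd

# Two timestamps copied from the preview rows above.
timestamps = pd.to_datetime(
    pd.Series(["2023-01-02 07:47:54", "2021-09-12 15:15:41"])
)

# Hypothetical extra features built with the .dt accessor.
features = pd.DataFrame({
    "Transaction Hour": timestamps.dt.hour,
    "Transaction Day of Week": timestamps.dt.dayofweek,  # Monday = 0, Sunday = 6
})

# Saturday (5) and Sunday (6) count as the weekend.
features["Is Weekend"] = features["Transaction Day of Week"] >= 5
print(features)
```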

In [9]:
fraud_data.nunique()

Transaction ID                       5805013
User ID                                 9000
Transaction Amount                     99901
Merchant ID                             9000
Payment Method                            40
Country Code                              40
Transaction Type                          38
Device Type                               38
IP Address                           5995699
Browser Type                              39
Operating System                          40
Merchant Category                         40
User Age                                  63
User Occupation                           26
User Income                          4498549
User Gender                                7
User Account Status                       18
Transaction Status                        40
Location Distance                       9901
Time Taken for Transaction              5901
Transaction Time of Day                    3
User's Transaction History               100
Merchant's
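
The counts above show that some string columns, such as `IP Address`, have almost as many distinct values as there are rows, so one-hot encoding them would create millions of columns. One possible way to separate such columns before encoding is sketched below. The `toy` frame and the `MAX_CATEGORIES` threshold are assumptions chosen for illustration, not values from this notebook.

```python
import pandas as pd

# Hypothetical toy frame; the values mimic the preview rows.
toy = pd.DataFrame({
    "Payment Method": ["Cash", "Check", "Cash", "Check"],
    "IP Address": ["42.23.223.120", "39.52.212.120",
                   "243.180.236.29", "212.186.227.14"],
    "Transaction Amount": [163.08, 430.74, 415.74, 565.89],
})

MAX_CATEGORIES = 3  # assumed cut-off; tune it for the real data

# Distinct values per column, as in the nunique() call above.
cardinality = toy.nunique()

# Keep only string columns, then split them by cardinality.
string_cols = [c for c in toy.columns if pd.api.types.is_string_dtype(toy[c])]
low_card = [c for c in string_cols if cardinality[c] <= MAX_CATEGORIES]
high_card = [c for c in string_cols if cardinality[c] > MAX_CATEGORIES]

print("one-hot candidates:", low_card)
print("needs another treatment:", high_card)
```

Columns in `high_card` could be dropped or encoded some other way; that choice depends on the modelling goal and is left open here.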

In [10]:
# List the columns that hold string data
for label, content in fraud_data.items():
    if pd.api.types.is_string_dtype(content):
        print(label)

Payment Method
Country Code
Transaction Type
Device Type
IP Address
Browser Type
Operating System
Merchant Category
User Occupation
User Gender
User Account Status
Transaction Status
Transaction Time of Day
User's Device Location
Transaction Currency
Transaction Purpose
User's Email Domain
Transaction Authentication Method


In [11]:
# List the columns that hold numeric data
for label, content in fraud_data.items():
    if pd.api.types.is_numeric_dtype(content):
        print(label)

Transaction ID
User ID
Transaction Amount
Merchant ID
User Age
User Income
Location Distance
Time Taken for Transaction
User's Transaction History
Merchant's Reputation Score
User's Credit Score
Merchant's Business Age
Fraudulent Flag
Transaction Day
Transaction Month
Transaction Year


In [12]:
# Separate the features (X) from the target label (y)
X = fraud_data.drop('Fraudulent Flag', axis=1)
y = fraud_data['Fraudulent Flag']
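
The imports in the first cell suggest that the next steps are one-hot encoding, a train/test split, and a random forest. The block below is a minimal sketch of how those pieces can fit together, not the notebook's actual pipeline. It runs on a small random `toy` frame, and `Pipeline`, `n_estimators=20`, `test_size=0.2`, and `handle_unknown="ignore"` are all assumptions made for the example. Because the labels are random, the printed accuracy carries no meaning.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data with random labels, used only to show the wiring.
rng = np.random.default_rng(0)
n = 200
toy = pd.DataFrame({
    "Payment Method": rng.choice(["Cash", "Check", "Bitcoin"], size=n),
    "Transaction Amount": rng.uniform(1, 1000, size=n).round(2),
    "Fraudulent Flag": rng.integers(0, 2, size=n),
})
X = toy.drop("Fraudulent Flag", axis=1)
y = toy["Fraudulent Flag"]

# One-hot encode the categorical column; pass numeric columns through unchanged.
preprocess = ColumnTransformer(
    transformers=[
        ("onehot", OneHotEncoder(handle_unknown="ignore"), ["Payment Method"]),
    ],
    remainder="passthrough",
)

# Chain preprocessing and the classifier so both are fitted on training data only.
model = Pipeline([
    ("preprocess", preprocess),
    ("forest", RandomForestClassifier(n_estimators=20, random_state=0)),
])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)
model.fit(X_train, y_train)
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"toy accuracy: {accuracy:.2f}")
```

Wrapping the encoder and the classifier in one `Pipeline` means the encoder learns its categories from the training split alone, which avoids leaking information from the test split.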