<a id='top'></a>
<p style="background-color:#6A5ACD;font-family:Tahoma, Geneva, sans-serif;color:#FFFFFF;font-size:150%;text-align:center;border-radius:10px;padding:10px;"> 🚗 Fastag Fraud Detection System 🛣️ </p>

### `Problem Statement`

<p style="font-family: 'Segoe UI', Tahoma, Geneva, sans-serif; font-size: 16px;">
This internship project focuses on leveraging machine learning classification techniques to develop an
effective fraud detection system for Fastag transactions. The dataset comprises key features such as
transaction details, vehicle information, geographical location, and transaction amounts. The goal is to
create a robust model that can accurately identify instances of fraudulent activity, ensuring the integrity
and security of Fastag transactions.
</p>

### `Dataset Description`

<h4 style="color:orange">Columns in Dataset:</h4>
<ol style="font-family: 'Segoe UI', Tahoma, Geneva, sans-serif; font-size: 16px;">
    <li><strong>Transaction_ID:</strong> Unique identifier for each transaction.</li>
    <li><strong>Timestamp:</strong> Date and time of the transaction.</li>
    <li><strong>Vehicle_Type:</strong> Type of vehicle involved in the transaction.</li>
    <li><strong>FastagID:</strong> Unique identifier for Fastag.</li>
    <li><strong>TollBoothID:</strong> Identifier for the toll booth.</li>
    <li><strong>Lane_Type:</strong> Type of lane used for the transaction.</li>
    <li><strong>Vehicle_Dimensions:</strong> Dimensions of the vehicle.</li>
    <li><strong>Transaction_Amount:</strong> Amount associated with the transaction.</li>
    <li><strong>Amount_paid:</strong> Amount paid for the transaction.</li>
    <li><strong>Geographical_Location:</strong> Location details of the transaction.</li>
    <li><strong>Vehicle_Speed:</strong> Speed of the vehicle during the transaction.</li>
    <li><strong>Vehicle_Plate_Number:</strong> License plate number of the vehicle.</li>
    <li><strong>Fraud_indicator:</strong> Binary indicator of fraudulent activity (target variable).</li>
</ol>

<!-- .......................................................................................................................... -->

<p style="background-color:#20B2AA;font-family:Arial, sans-serif;color:#FFFFFF;font-size:150%;text-align:center;border-radius:10px;padding:10px;border:2px solid #FFFFFF;"> 🎯 Project Objectives 📝 </p>

<ul style="font-size: 18px; font-family: 'Segoe UI';">
    <li><strong>Data Exploration:</strong> Explore the dataset to understand the distribution of features and the prevalence of fraud indicators.</li>
    <li><strong>Feature Engineering:</strong> Identify and engineer relevant features that contribute to fraud detection accuracy.</li>
    <li><strong>Model Development:</strong>
        <ul>
            <li>Build a machine learning classification model to predict and detect Fastag transaction fraud.</li>
            <li>Evaluate and fine-tune model performance using appropriate metrics.</li>
        </ul>
    </li>
    <li><strong>Real-time Fraud Detection:</strong> Explore the feasibility of implementing the model for real-time Fastag fraud detection.</li>
    <li><strong>Explanatory Analysis:</strong> Provide insights into the factors contributing to fraudulent transactions.</li>
</ul>

<!-- >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> -->

<p style="background-color: #FF6347; font-family: Arial, sans-serif; color: #ffffff; font-size: 24px; text-align: center; padding: 10px; border-radius: 10px;">🚧 Challenges 🚧</p>

<ul style="font-size: 18px; font-family: 'Segoe UI';">
    <li>Imbalanced dataset issues due to the likely low occurrence of fraud.</li>
    <li>Feature engineering to capture nuanced patterns indicative of fraud.</li>
</ul>

<!-- ................................................................................................................................................. -->

<p style="background-color:#4682B4;font-family:Arial, sans-serif;color:#FFFFFF;font-size:150%;text-align:center;border-radius:10px;padding:10px;border:2px solid #FFFFFF;">📊 Evaluation Criteria 📊</p>

<ul style="font-size: 18px; font-family: 'Segoe UI';">
    <li>Model performance assessed using metrics such as precision, recall, F1 score, and accuracy.</li>
</ul>
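
As a rough illustration of how these four metrics relate, the sketch below computes them from confusion-matrix counts. The counts are hypothetical and chosen only to make the arithmetic easy to follow; they are not results from this project, and `classification_metrics` is an illustrative helper, not part of any library.

```python
# Hypothetical confusion-matrix counts (not taken from this dataset).
tp, fp, fn, tn = 80, 20, 40, 260


def classification_metrics(tp, fp, fn, tn):
    """Return precision, recall, F1 and accuracy for the positive (fraud) class."""
    precision = tp / (tp + fp)   # share of flagged transactions that are truly fraud
    recall = tp / (tp + fn)      # share of fraud cases that were flagged
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


metrics = classification_metrics(tp, fp, fn, tn)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these counts accuracy (0.850) looks much better than recall (0.667), which is why accuracy alone can be misleading when fraud cases are the minority.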

<!-- ................................................................................................................................................. -->

<p style="background-color:#32CD32;font-family:Arial, sans-serif;color:#FFFFFF;font-size:150%;text-align:center;border-radius:10px;padding:10px;border:2px solid #FFFFFF;">📁 Deliverables 📁</p>

<ul style="font-size: 18px; font-family: 'Segoe UI';">
    <li>Trained machine learning model for Fastag fraud detection.</li>
    <li>Evaluation metrics and analysis report.</li>
    <li>Documentation on relevant features and their impact on fraud detection.</li>
</ul>

<!-- ................................................................................................................................................. -->

<p style="background-color:#FFD700;font-family:Arial, sans-serif;color:#FFFFFF;font-size:150%;text-align:center;border-radius:10px;padding:10px;border:2px solid #FFFFFF;">🎉 Expected Outcome 🎉</p>

<ul style="font-size: 18px; font-family: 'Segoe UI';">
    <li>An effective and scalable Fastag fraud detection system capable of minimizing financial losses and ensuring the security of digital toll transactions.</li>
</ul>




# <p style="background-color:#682F2F;font-family:newtimeroman;color:#FFF9ED;font-size:80%;text-align:center;border-radius:10px; border: 2px solid #FFA500; padding: 10px;"><b>1|</b> 📚 IMPORTING LIBRARIES 📚</p>

In [None]:
import numpy as np
import pandas as pd
import sklearn
from sklearn.preprocessing import OrdinalEncoder
from sklearn.preprocessing import OneHotEncoder
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.metrics import classification_report,accuracy_score, confusion_matrix, precision_score, f1_score, recall_score



# <p style="background-color: #20B2AA; font-family: Arial, sans-serif; color: #FFFFFF; font-size: 80%; text-align: center; border-radius: 10px; padding: 15px; border: 4px solid yellow; border-style: dashed;"><b>2|</b> 🔄 LOADING DATASET 🤖 </p>




In [None]:
data = pd.read_csv("/kaggle/input/fastagfruddetection/FastagFraudDetection.csv")

# <div style="color: #FFA500; display: inline-block; border-radius: 15px; background-color: #FFEFD5; font-family: 'Arial', sans-serif; overflow: hidden; border: 5px solid #FFA500;width:60%"><p style="padding: 15px; color: #FFA500; overflow: hidden; font-size: 70%; letter-spacing: 1px; margin: 0; width: 750%;"><b> 2) Data Checks </b></p>
</div>
  <ul style="border: 2px solid #4CAF50; border-radius: 8px; margin-top: 10px; width:57%">
    <li>Check Head</li>
    <li>Check Tail</li>
    <li>Check data type</li>
    <li>Check info</li>
    <li>Check Describe</li>
    <li>Check Columns</li>
    <li>Check Size</li>
    <li>Check Shape</li>
  </ul>

In [None]:
styled_df = data.head()
styled_df = styled_df.style.set_table_styles([
    {"selector": "th", "props": [("color", 'black'), ("background-color", "#FF00CC")]}
])
styled_df

Unnamed: 0,Transaction_ID,Timestamp,Vehicle_Type,FastagID,TollBoothID,Lane_Type,Vehicle_Dimensions,Transaction_Amount,Amount_paid,Geographical_Location,Vehicle_Speed,Vehicle_Plate_Number,Fraud_indicator
0,1,1/6/2023 11:20,Bus,FTG-001-ABC-121,A-101,Express,Large,350,120,"13.059816123454882, 77.77068662374292",65,KA11AB1234,Fraud
1,2,1/7/2023 14:55,Car,FTG-002-XYZ-451,B-102,Regular,Small,120,100,"13.059816123454882, 77.77068662374292",78,KA66CD5678,Fraud
2,3,1/8/2023 18:25,Motorcycle,,D-104,Regular,Small,0,0,"13.059816123454882, 77.77068662374292",53,KA88EF9012,Not Fraud
3,4,1/9/2023 2:05,Truck,FTG-044-LMN-322,C-103,Regular,Large,350,120,"13.059816123454882, 77.77068662374292",92,KA11GH3456,Fraud
4,5,1/10/2023 6:35,Van,FTG-505-DEF-652,B-102,Express,Medium,140,100,"13.059816123454882, 77.77068662374292",60,KA44IJ6789,Fraud


In [None]:
styled_df = data.tail()
styled_df = styled_df.style.set_table_styles([
    {"selector": "th", "props": [("color", 'black'), ("background-color", "#FF00CC")]}
])
styled_df

Unnamed: 0,Transaction_ID,Timestamp,Vehicle_Type,FastagID,TollBoothID,Lane_Type,Vehicle_Dimensions,Transaction_Amount,Amount_paid,Geographical_Location,Vehicle_Speed,Vehicle_Plate_Number,Fraud_indicator
4995,4996,1/1/2023 22:18,Truck,FTG-445-EDC-765,C-103,Regular,Large,330,330,"13.21331620748757, 77.55413526894684",81,KA74ST0123,Not Fraud
4996,4997,1/17/2023 13:43,Van,FTG-446-LMK-432,B-102,Express,Medium,125,125,"13.21331620748757, 77.55413526894684",64,KA38UV3456,Not Fraud
4997,4998,2/5/2023 5:08,Sedan,FTG-447-PLN-109,A-101,Regular,Medium,115,115,"13.21331620748757, 77.55413526894684",93,KA33WX6789,Not Fraud
4998,4999,2/20/2023 20:34,SUV,FTG-458-VFR-876,B-102,Express,Large,145,145,"13.21331620748757, 77.55413526894684",57,KA35YZ0123,Not Fraud
4999,5000,3/10/2023 0:59,Bus,FTG-459-WSX-543,C-103,Regular,Large,330,125,"13.21331620748757, 77.55413526894684",86,KA37AB3456,Fraud


In [None]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5000 entries, 0 to 4999
Data columns (total 13 columns):
 #   Column                 Non-Null Count  Dtype 
---  ------                 --------------  ----- 
 0   Transaction_ID         5000 non-null   int64 
 1   Timestamp              5000 non-null   object
 2   Vehicle_Type           5000 non-null   object
 3   FastagID               4451 non-null   object
 4   TollBoothID            5000 non-null   object
 5   Lane_Type              5000 non-null   object
 6   Vehicle_Dimensions     5000 non-null   object
 7   Transaction_Amount     5000 non-null   int64 
 8   Amount_paid            5000 non-null   int64 
 9   Geographical_Location  5000 non-null   object
 10  Vehicle_Speed          5000 non-null   int64 
 11  Vehicle_Plate_Number   5000 non-null   object
 12  Fraud_indicator        5000 non-null   object
dtypes: int64(4), object(9)
memory usage: 507.9+ KB


In [None]:
# Summary statistics for the numeric columns
styled_df = data.describe()
styled_df = styled_df.style.set_table_styles([
    {"selector": "th", "props": [("color", 'black'), ("background-color", "#FF00CC")]}
])
styled_df

Unnamed: 0,Transaction_ID,Transaction_Amount,Amount_paid,Vehicle_Speed
count,5000.0,5000.0,5000.0,5000.0
mean,2500.5,161.062,141.261,67.8512
std,1443.520003,112.44995,106.480996,16.597547
min,1.0,0.0,0.0,10.0
25%,1250.75,100.0,90.0,54.0
50%,2500.5,130.0,120.0,67.0
75%,3750.25,290.0,160.0,82.0
max,5000.0,350.0,350.0,118.0


In [None]:
data.shape

(5000, 13)

In [None]:
data.size

65000

In [None]:
data.columns

Index(['Transaction_ID', 'Timestamp', 'Vehicle_Type', 'FastagID',
       'TollBoothID', 'Lane_Type', 'Vehicle_Dimensions', 'Transaction_Amount',
       'Amount_paid', 'Geographical_Location', 'Vehicle_Speed',
       'Vehicle_Plate_Number', 'Fraud_indicator'],
      dtype='object')

In [None]:
data.isnull().sum()

Transaction_ID             0
Timestamp                  0
Vehicle_Type               0
FastagID                 549
TollBoothID                0
Lane_Type                  0
Vehicle_Dimensions         0
Transaction_Amount         0
Amount_paid                0
Geographical_Location      0
Vehicle_Speed              0
Vehicle_Plate_Number       0
Fraud_indicator            0
dtype: int64

<p style = "color: #19bd7c;
            font: bold 18px arial;
            padding: 15px;
            background-color: #111;
            border: 3px solid lightgreen;
            border-radius: 8px">
    ♣ Feature Engineering 👨‍
</p>

<div style="border: 2px solid #1E90FF; border-radius: 10px; margin-top: 10px; width:50%">
    <ol style="list-style-type: none; padding: 10px;">
        <li>Identify and engineer relevant features that contribute to fraud detection accuracy.</li>
    </ol>
</div>
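
One candidate feature, offered here only as a sketch, is the gap between the amount charged and the amount actually paid. The rows below are copied from the `head()` and `tail()` previews earlier in the notebook; the function name `amount_shortfall` and the column name `Amount_shortfall` are hypothetical. In these six preview rows every fraud case has a positive shortfall, but whether that holds across all 5,000 rows is not checked here.

```python
# Rows copied from the head()/tail() previews:
# (Transaction_Amount, Amount_paid, Fraud_indicator)
rows = [
    (350, 120, "Fraud"),
    (120, 100, "Fraud"),
    (0, 0, "Not Fraud"),
    (330, 330, "Not Fraud"),
    (125, 125, "Not Fraud"),
    (330, 125, "Fraud"),
]


def amount_shortfall(charged, paid):
    """Hypothetical engineered feature: how much of the toll went unpaid."""
    return charged - paid


for charged, paid, label in rows:
    gap = amount_shortfall(charged, paid)
    print(f"charged={charged:>3} paid={paid:>3} shortfall={gap:>3} label={label}")

# On the DataFrame this would be a single extra column, for example:
# data["Amount_shortfall"] = data["Transaction_Amount"] - data["Amount_paid"]
```

If such a feature separates the classes almost perfectly, very high model scores are to be expected and say more about the data than about the model.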


> <span style='font-size:15px; font-family:Verdana;color: #FF00CC;'><b>OneHot Encoding</b></span>



In [None]:
data['Vehicle_Type'].unique()

array(['Bus ', 'Car', 'Motorcycle', 'Truck', 'Van', 'Sedan', 'SUV'],
      dtype=object)
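
The first category is `'Bus '` with a trailing space, which then becomes a column name after one-hot encoding. A minimal sketch of cleaning the labels before encoding follows; the list is copied from the output above, and the commented pandas line is a suggestion rather than something the notebook does.

```python
# Values as reported by data['Vehicle_Type'].unique(); note the space in 'Bus '.
vehicle_types = ['Bus ', 'Car', 'Motorcycle', 'Truck', 'Van', 'Sedan', 'SUV']

# Strip leading/trailing whitespace from every label.
cleaned = [v.strip() for v in vehicle_types]
print(cleaned)

# pandas equivalent for the real column, applied before encoding:
# data['Vehicle_Type'] = data['Vehicle_Type'].str.strip()
```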

In [None]:
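# Category order defines the integer code each label receives (first = 0).
Lane_order=['Express', 'Regular']
# Note: this order is not by size, so the codes are Large=0, Small=1, Medium=2.
Vehicle_Dimensions_order=['Large', 'Small', 'Medium']
Fraud_indicator_order=['Not Fraud','Fraud']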

In [None]:
ohe = OneHotEncoder()
encode0 = ohe.fit_transform(data[['Vehicle_Type']]).toarray()

In [None]:
feature_labels = ohe.categories_
np.array(feature_labels).ravel()

array(['Bus ', 'Car', 'Motorcycle', 'SUV', 'Sedan', 'Truck', 'Van'],
      dtype=object)

In [None]:
feature_labels = np.array(feature_labels).ravel()
print(feature_labels)

['Bus ' 'Car' 'Motorcycle' 'SUV' 'Sedan' 'Truck' 'Van']


In [None]:
features = pd.DataFrame(encode0, columns = feature_labels)

In [None]:
df_new = pd.concat([data, features], axis=1)

In [None]:
new_dataset=df_new.drop(['Timestamp','FastagID','Vehicle_Type','TollBoothID','Geographical_Location','Vehicle_Plate_Number'], axis=1)

In [None]:
styled_df = new_dataset.head()
styled_df = styled_df.style.set_table_styles([
    {"selector": "th", "props": [("color", 'black'), ("background-color", "#FF00CC")]}
])
styled_df

Unnamed: 0,Transaction_ID,Lane_Type,Vehicle_Dimensions,Transaction_Amount,Amount_paid,Vehicle_Speed,Fraud_indicator,Bus,Car,Motorcycle,SUV,Sedan,Truck,Van
0,1,Express,Large,350,120,65,Fraud,1.0,0.0,0.0,0.0,0.0,0.0,0.0
1,2,Regular,Small,120,100,78,Fraud,0.0,1.0,0.0,0.0,0.0,0.0,0.0
2,3,Regular,Small,0,0,53,Not Fraud,0.0,0.0,1.0,0.0,0.0,0.0,0.0
3,4,Regular,Large,350,120,92,Fraud,0.0,0.0,0.0,0.0,0.0,1.0,0.0
4,5,Express,Medium,140,100,60,Fraud,0.0,0.0,0.0,0.0,0.0,0.0,1.0


> <span style='font-size:15px; font-family:Verdana;color: #FF00CC;'><b>Ordinal Encoding</b></span>



In [None]:
encode1 = OrdinalEncoder(categories=[Lane_order])
encode2 = OrdinalEncoder(categories=[Vehicle_Dimensions_order])
encode3 = OrdinalEncoder(categories=[Fraud_indicator_order])

In [None]:
encode1.fit(new_dataset[['Lane_Type']])
encode2.fit(new_dataset[['Vehicle_Dimensions']])
encode3.fit(new_dataset[['Fraud_indicator']])

In [None]:
new_lane=pd.DataFrame(encode1.transform(new_dataset[['Lane_Type']]))
new_dimensions=pd.DataFrame(encode2.transform(new_dataset[['Vehicle_Dimensions']]))
new_fraud_indicator=pd.DataFrame(encode3.transform(new_dataset[['Fraud_indicator']]))

In [None]:
new_dataset['Lane_Type']= new_lane
new_dataset['Vehicle_Dimensions']= new_dimensions
new_dataset['Fraud_indicator']=new_fraud_indicator

In [None]:
styled_df = new_dataset.head()
styled_df = styled_df.style.set_table_styles([
    {"selector": "th", "props": [("color", 'black'), ("background-color", "#FF00CC")]}
])
styled_df

Unnamed: 0,Transaction_ID,Lane_Type,Vehicle_Dimensions,Transaction_Amount,Amount_paid,Vehicle_Speed,Fraud_indicator,Bus,Car,Motorcycle,SUV,Sedan,Truck,Van
0,1,0.0,0.0,350,120,65,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0
1,2,1.0,1.0,120,100,78,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0
2,3,1.0,1.0,0,0,53,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0
3,4,1.0,0.0,350,120,92,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0
4,5,0.0,2.0,140,100,60,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0
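
The integer codes in the table follow the position of each label in the category lists passed to `OrdinalEncoder`. The sketch below reproduces that mapping in plain Python for the first five rows and contrasts it with a size-ordered alternative. The alternative ordering and the helper `codes_from_order` are assumptions for illustration, not the notebook's choice.

```python
def codes_from_order(order):
    """Map each label to its position in the list, as an ordinal encoding does."""
    return {label: index for index, label in enumerate(order)}


# Order used in this notebook: Large=0, Small=1, Medium=2.
notebook_codes = codes_from_order(['Large', 'Small', 'Medium'])

# Hypothetical alternative: codes that increase with vehicle size.
size_ordered_codes = codes_from_order(['Small', 'Medium', 'Large'])

# Vehicle_Dimensions of the first five rows shown in the preview.
preview = ['Large', 'Small', 'Small', 'Large', 'Medium']

print([notebook_codes[d] for d in preview])      # [0, 1, 1, 0, 2]
print([size_ordered_codes[d] for d in preview])  # [2, 0, 0, 2, 1]
```

Tree-based models can split on either coding, while linear models such as logistic regression treat the code as a magnitude, so a size-ordered list may be the safer choice there.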


> <span style='font-size:15px; font-family:Verdana;color: #FF00CC;'><b>Handling Missing values</b></span>

In [None]:
# Drop rows with missing values. FastagID, the only column with nulls, was
# already dropped above, so no rows are removed here.
data = new_dataset.dropna(how='any')

In [None]:
# checking the missing values of each column
data.isnull().sum()

Transaction_ID        0
Lane_Type             0
Vehicle_Dimensions    0
Transaction_Amount    0
Amount_paid           0
Vehicle_Speed         0
Fraud_indicator       0
Bus                   0
Car                   0
Motorcycle            0
SUV                   0
Sedan                 0
Truck                 0
Van                   0
dtype: int64



<div style="text-align: center; background:  #FF00CC; font-family: 'Trebuchet MS', Arial, sans-serif; color: white; padding: 15px; font-size: 26px; font-weight: bold; line-height: 1; border-radius: 50% 0 50% 0 / 40px; margin-bottom: 20px; box-shadow: 0px 4px 6px rgba(0, 0, 0, 0.1);">Distribution of legit and fraud transactions</div>

In [None]:
# Fraud_indicator is already ordinal-encoded (0.0 = Not Fraud, 1.0 = Fraud); cast it to integer labels.
data = data.assign(Fraud_indicator=data['Fraud_indicator'].astype(int))


In [None]:
data['Fraud_indicator'].value_counts()

Fraud_indicator
0    4017
1     983
Name: count, dtype: int64

<h2 style= "background-color: #222;
            padding: 10px;
            font: bold 26px tahoma;
            text-align:center;
            color:gold;
            border: 2px solid tomato;
            border-radius: 5px;">   
    😍🥰Highly Imbalanced Dataset🤩📊
</h2>

<div style="border: 2px solid #1E90FF; border-radius: 10px; margin-top: 10px; width:50%">
    <ol style="list-style-type: none; padding: 10px;">
        <li>0-> normal transaction</li>
        <li>1-> Fraud transaction</li>
    </ol>
</div>
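
To make the imbalance concrete, the sketch below uses the class counts from `value_counts()` above to work out what a trivial classifier that always predicts "not fraud" would score. This is a worked calculation, not a model from the notebook.

```python
# Class counts reported by value_counts() above.
n_legit, n_fraud = 4017, 983
total = n_legit + n_fraud

fraud_share = n_fraud / total

# A classifier that always predicts "not fraud" is right on every legit row...
baseline_accuracy = n_legit / total
# ...but it never flags a single fraud case.
baseline_recall = 0 / n_fraud

print(f"fraud share:       {fraud_share:.4f}")
print(f"baseline accuracy: {baseline_accuracy:.4f}")
print(f"baseline recall:   {baseline_recall:.4f}")
```

Any model trained on the full data therefore needs to beat roughly 80% accuracy before it is doing anything useful, which is one motivation for balancing the classes.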

In [None]:
# separating data for analysis
legit = data[data.Fraud_indicator == 0]
fraud = data[data.Fraud_indicator == 1]

In [None]:
print(legit.shape)
print(fraud.shape)

(4017, 14)
(983, 14)


In [None]:
# summary statistics of Transaction_Amount for legit transactions
legit.Transaction_Amount.describe()

count    4017.000000
mean      153.110530
std       114.435986
min         0.000000
25%        90.000000
50%       125.000000
75%       290.000000
max       350.000000
Name: Transaction_Amount, dtype: float64

In [None]:
fraud.Transaction_Amount.describe()

count    983.000000
mean     193.555443
std       97.465586
min       60.000000
25%      120.000000
50%      145.000000
75%      300.000000
max      350.000000
Name: Transaction_Amount, dtype: float64

In [None]:
# compare feature means for legit (0) and fraud (1) transactions
data.groupby('Fraud_indicator').mean()

Unnamed: 0_level_0,Transaction_ID,Lane_Type,Vehicle_Dimensions,Transaction_Amount,Amount_paid,Vehicle_Speed,Bus,Car,Motorcycle,SUV,Sedan,Truck,Van
Fraud_indicator,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1
0,2618.24695,0.588748,0.86582,153.11053,153.11053,67.731392,0.13418,0.147374,0.177745,0.131939,0.137665,0.138412,0.132686
1,2019.330621,0.501526,0.819939,193.555443,92.83825,68.340793,0.180061,0.12411,0.0,0.187182,0.163784,0.160732,0.18413


<h2 style= "background-color: #222;
            padding: 10px;
            font: bold 26px tahoma;
            text-align:center;
            color:gold;
            border: 2px solid tomato;
            border-radius: 5px;">   
    😍Under Sampling📊
</h2>



In [None]:
# Undersample: keep as many legit rows as there are fraud rows (983).
legit_sample = legit.sample(n=len(fraud))
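
The sample above is drawn without a seed, so the selected legit rows (and every score computed from them) change on each run. A minimal sketch of seeded undersampling is shown below with plain Python lists standing in for row indices. The helper `undersample` and the seed value 42 are assumptions for illustration.

```python
import random


def undersample(majority, minority, seed=42):
    """Randomly keep as many majority items as there are minority items."""
    rng = random.Random(seed)  # fixed seed: the same sample on every run
    kept = rng.sample(majority, k=len(minority))
    return kept + minority


# Toy stand-ins for row indices: 4017 legit rows and 983 fraud rows.
legit_ids = list(range(4017))
fraud_ids = list(range(4017, 5000))

balanced = undersample(legit_ids, fraud_ids)
print(len(balanced))                                   # 1966
print(undersample(legit_ids, fraud_ids) == balanced)   # True

# pandas equivalent for the notebook's DataFrames:
# legit_sample = legit.sample(n=len(fraud), random_state=42)
```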

> <span style='font-size:15px; font-family:Verdana;color: #FF00CC;'><b>Concatenating two DataFrames</b></span>

In [None]:
new_dataset = pd.concat([legit_sample, fraud], axis=0)

In [None]:
new_dataset.head()

Unnamed: 0,Transaction_ID,Lane_Type,Vehicle_Dimensions,Transaction_Amount,Amount_paid,Vehicle_Speed,Fraud_indicator,Bus,Car,Motorcycle,SUV,Sedan,Truck,Van
2320,2321,0.0,0.0,330,330,57,0,0.0,0.0,0.0,0.0,0.0,1.0,0.0
624,625,0.0,1.0,60,60,55,0,0.0,1.0,0.0,0.0,0.0,0.0,0.0
716,717,1.0,1.0,0,0,55,0,0.0,0.0,1.0,0.0,0.0,0.0,0.0
4956,4957,0.0,0.0,145,145,58,0,0.0,0.0,0.0,1.0,0.0,0.0,0.0
2073,2074,1.0,1.0,90,90,79,0,0.0,1.0,0.0,0.0,0.0,0.0,0.0


In [None]:
# distribution of legit and fraud transactions in the new dataset
new_dataset['Fraud_indicator'].value_counts()

Fraud_indicator
0    983
1    983
Name: count, dtype: int64

In [None]:
new_dataset.groupby('Fraud_indicator').mean()

Unnamed: 0_level_0,Transaction_ID,Lane_Type,Vehicle_Dimensions,Transaction_Amount,Amount_paid,Vehicle_Speed,Bus,Car,Motorcycle,SUV,Sedan,Truck,Van
Fraud_indicator,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1
0,2600.619532,0.5941,0.885046,152.772126,152.772126,68.324517,0.134283,0.141404,0.178026,0.131231,0.158698,0.132248,0.12411
1,2019.330621,0.501526,0.819939,193.555443,92.83825,68.340793,0.180061,0.12411,0.0,0.187182,0.163784,0.160732,0.18413


<p style="background-color: #4cbb17; font-family: Arial, sans-serif; color: #ffffff; font-size: 24px; text-align: center; padding: 10px; border-radius: 10px;">📋 Splitting the data into features and targets 📋</p>



In [None]:
# Note: Transaction_ID, a row identifier, is still part of X here.
X = new_dataset.drop(columns='Fraud_indicator')
Y = new_dataset['Fraud_indicator']

<p style="background-color:#6A5ACD;font-family:Tahoma, Geneva, sans-serif;color:#FFFFFF;font-size:150%;text-align:center;border-radius:10px;padding:10px;"> 📊 Split data into training and testing 📈 </p>


In [None]:
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, train_size=0.8, stratify=Y, random_state=2)

In [None]:
print(X.shape, X_train.shape, X_test.shape)

(1966, 13) (1572, 13) (394, 13)


In [None]:


styled_df = pd.DataFrame(X_train).head()
styled_df = styled_df.style.set_table_styles([
    {"selector": "th", "props": [("color", 'black'), ("background-color", "#FF00CC")]}
])
styled_df

Unnamed: 0,Transaction_ID,Lane_Type,Vehicle_Dimensions,Transaction_Amount,Amount_paid,Vehicle_Speed,Bus,Car,Motorcycle,SUV,Sedan,Truck,Van
4618,4619,0.0,2.0,125,125,55,0.0,0.0,0.0,0.0,0.0,0.0,1.0
3758,3759,0.0,2.0,120,100,57,0.0,0.0,0.0,0.0,1.0,0.0,0.0
2181,2182,1.0,2.0,125,90,95,0.0,0.0,0.0,0.0,0.0,0.0,1.0
2543,2544,1.0,1.0,120,120,91,0.0,1.0,0.0,0.0,0.0,0.0,0.0
3318,3319,0.0,0.0,130,130,67,0.0,0.0,0.0,1.0,0.0,0.0,0.0


<p style="background-color:#20B2AA;font-family:Arial, sans-serif;color:#FFFFFF;font-size:150%;text-align:center;border-radius:10px;padding:10px;border:2px solid #FFFFFF;"> 🔄 Model Development 🤖 </p>



In [None]:
import pandas as pd
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

# Assume X_train, X_test, Y_train, Y_test are defined and split appropriately

Models = {
    "Decision Tree": DecisionTreeClassifier(),
    "Random Forest": RandomForestClassifier(),
    "Logistic Regression": LogisticRegression(),
    "SVM Classification": SVC()
}

# Initialize lists to store results
model_names = []
train_accuracies = []
train_precisions = []
train_recalls = []
train_f1_scores = []
test_accuracies = []
test_precisions = []
test_recalls = []
test_f1_scores = []

# Evaluate each model and store results
for name, model in Models.items():
    # Train Model
    model.fit(X_train, Y_train)

    # Make predictions
    Y_train_pred = model.predict(X_train)
    Y_test_pred = model.predict(X_test)

    # Training Performance
    train_accuracy = accuracy_score(Y_train, Y_train_pred)
    train_precision = precision_score(Y_train, Y_train_pred)
    train_recall = recall_score(Y_train, Y_train_pred)
    train_f1 = f1_score(Y_train, Y_train_pred, average='weighted')

    # Testing Performance
    test_accuracy = accuracy_score(Y_test, Y_test_pred)
    test_precision = precision_score(Y_test, Y_test_pred)
    test_recall = recall_score(Y_test, Y_test_pred)
    test_f1 = f1_score(Y_test, Y_test_pred, average='weighted')

    # Append results to lists
    model_names.append(name)
    train_accuracies.append(train_accuracy)
    train_precisions.append(train_precision)
    train_recalls.append(train_recall)
    train_f1_scores.append(train_f1)
    test_accuracies.append(test_accuracy)
    test_precisions.append(test_precision)
    test_recalls.append(test_recall)
    test_f1_scores.append(test_f1)

# Create DataFrame from the results
results_df = pd.DataFrame({
    'Model': model_names,
    'Train Accuracy': train_accuracies,
    'Train Precision': train_precisions,
    'Train Recall': train_recalls,
    'Train F1 Score': train_f1_scores,
    'Test Accuracy': test_accuracies,
    'Test Precision': test_precisions,
    'Test Recall': test_recalls,
    'Test F1 Score': test_f1_scores
})

# Display the results
# print(results_df)
# Apply styling to the entire DataFrame
styled_results = results_df.style\
    .set_properties(**{'font-size': '12pt', 'font-weight': 'bold', 'color': '#6A5ACD'})\
    .background_gradient(cmap='Pastel1')\
    .set_caption('Model Evaluation Results')\
    .set_table_styles([{
        'selector': 'caption',
        'props': [('color', '#6A5ACD'), ('font-size', '16pt'), ('font-weight', 'bold')]
    }])

# Display the styled results
styled_results

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(


Unnamed: 0,Model,Train Accuracy,Train Precision,Train Recall,Train F1 Score,Test Accuracy,Test Precision,Test Recall,Test F1 Score
0,Decision Tree,1.0,1.0,1.0,1.0,0.992386,1.0,0.984772,0.992385
1,Random Forest,1.0,1.0,1.0,1.0,0.979695,1.0,0.959391,0.979687
2,Logistic Regression,0.976463,1.0,0.952926,0.97645,0.984772,1.0,0.969543,0.984768
3,SVM Classification,0.696565,0.746411,0.59542,0.693429,0.708122,0.753086,0.619289,0.7058
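
The convergence warning printed above suggests scaling the data or raising `max_iter`. The sketch below shows one way to follow that advice with a scikit-learn pipeline. It runs on synthetic data from `make_classification` rather than the notebook's `X` and `Y`, so the names `X_demo`, `y_demo` and `scaled_models` are placeholders and the printed scores say nothing about the Fastag results.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (an assumption); the notebook would use its own X and Y.
X_demo, y_demo = make_classification(n_samples=500, n_features=8, random_state=0)
X_demo[:, 0] *= 1000  # mimic one large-valued column such as Transaction_ID

Xtr, Xte, ytr, yte = train_test_split(
    X_demo, y_demo, test_size=0.2, stratify=y_demo, random_state=2
)

# Each pipeline standardizes the features before fitting the classifier.
scaled_models = {
    "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "SVM Classification": make_pipeline(StandardScaler(), SVC()),
}

for name, model in scaled_models.items():
    model.fit(Xtr, ytr)  # the scaler is fit on the training split only
    print(name, round(model.score(Xte, yte), 3))
```

Putting the scaler inside the pipeline keeps test-set statistics out of training, and distance-based models such as SVC are usually the ones that benefit most from scaling.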


<div style="text-align: center; background:  #FF00CC; font-family: 'Trebuchet MS', Arial, sans-serif; color: white; padding: 15px; font-size: 26px; font-weight: bold; line-height: 1; border-radius: 50% 0 50% 0 / 40px; margin-bottom: 20px; box-shadow: 0px 4px 6px rgba(0, 0, 0, 0.1);">Outcomes</div>


<span style="font-size: 14px; font-family: Verdana; background-color: #F5F5F5; border: 2px solid #ccc; padding: 10px; border-radius: 10px; display: inline-block;">
  <strong>Conclusion:</strong> Evaluating four classification algorithms on the undersampled data (1,572 training rows, 394 test rows) gave the following results:

  - Decision Tree and Random Forest both reach 100% training accuracy; on the test set they score 99.2% and 98.0%, respectively. Logistic Regression reaches 97.6% (train) and 98.5% (test), while SVC trails at 69.7% (train) and 70.8% (test).
  - Decision Tree, Random Forest, and Logistic Regression all achieve 100% test precision, with test recall of 98.5%, 95.9%, and 97.0%, respectively. SVC is far weaker, with 75.3% test precision and 61.9% test recall.
  - Overall, the Decision Tree detected fraud best on the test set, closely followed by Logistic Regression and Random Forest. Logistic Regression stopped at its iteration limit before converging, so scaling the features or raising max_iter may change its scores.

  - These results might be improved with a larger sample or with deep learning models, at the cost of more computation. More sophisticated anomaly detection models might also catch additional fraudulent cases.

  
</span>