## Fraudulent Transactions Analysis and Visualization

This notebook analyzes fraudulent transactions to uncover patterns and insights. The analysis is organized around four questions, each answered through data aggregation and visualization. The dataset contains information on individual transactions, including their financial impact and the circumstances under which they occurred.


### 1. Setup and Data Loading

**Importing Libraries**: Import the necessary libraries (`pandas`, `numpy`, `plotly`, `calendar`) and define two helper functions, `line_graph` and `bar_graph`, so that line plots and bar plots are drawn consistently throughout the notebook.

In [1]:
from typing import List, Optional
import pandas as pd
import numpy as np
from plotly import express as px
from plotly import graph_objects as go
from plotly import offline as pyo
import calendar

In [2]:
def line_graph(
        dataset: pd.DataFrame, x: str, y: str,
        title: Optional[str] = None,
        xlabel: Optional[str] = None, ylabel: Optional[str] = None,
        xticks: Optional[List] = None, yticks: Optional[List] = None,
        color: Optional[str] = None, marker: bool = True
):
    # px.line treats color=None as "no color grouping", so one call covers both cases.
    fig = px.line(data_frame= dataset, x= x, y= y, title= title, color= color, markers= marker)
    if xlabel:
        fig.update_layout(xaxis_title= xlabel)
    if ylabel:
        fig.update_layout(yaxis_title= ylabel)
    if xticks:
        fig.update_xaxes(
            tickvals= xticks[0] if len(xticks) > 0 else None,
            ticktext= xticks[1] if len(xticks) > 1 else None,
            tickangle= xticks[2] if len(xticks) > 2 else None
        )
    if yticks:
        fig.update_yaxes(
            tickvals= yticks[0] if len(yticks) > 0 else None,
            ticktext= yticks[1] if len(yticks) > 1 else None,
            tickangle= yticks[2] if len(yticks) > 2 else None
        )
    fig.show()

In [3]:
def bar_graph(
        dataset: pd.DataFrame, x: str, y: str,
        title: Optional[str] = None,
        xlabel: Optional[str] = None, ylabel: Optional[str] = None,
        xticks: Optional[List] = None, yticks: Optional[List] = None,
        color: Optional[str] = None  
):
    # px.bar treats color=None as "no color grouping", so one call covers both cases.
    fig = px.bar(data_frame= dataset, x= x, y= y, title= title, color= color)
    if xlabel:
        fig.update_layout(xaxis_title= xlabel)
    if ylabel:
        fig.update_layout(yaxis_title= ylabel)
    if xticks:
        fig.update_xaxes(
            tickvals= xticks[0] if len(xticks) > 0 else None,
            ticktext= xticks[1] if len(xticks) > 1 else None,
            tickangle= xticks[2] if len(xticks) > 2 else None
        )
    if yticks:
        fig.update_yaxes(
            tickvals= yticks[0] if len(yticks) > 0 else None,
            ticktext= yticks[1] if len(yticks) > 1 else None,
            tickangle= yticks[2] if len(yticks) > 2 else None
        )
    fig.show()

**Loading Data**: Load the fraudulent transactions dataset from a CSV file and display the first few rows for an initial overview.

In [4]:
file_path = r"datasets/fraudulent_data.csv"
fraudulent_data = pd.read_csv(file_path)
pd.set_option('display.max_columns', 50)
fraudulent_data.drop(columns='index', inplace= True)
fraudulent_data.head()

Unnamed: 0,year,month,hour,txn_type,txn_status,error_code,remitter_bank,beneficiary_bank,payer_handle,payer_app,payee_handle,payee_app,payee_requested_amount,payee_settlement_amount,difference_amount,payer_state,payee_state,cred_type,cred_subtype,time_of_day
0,2020,2,0,Reversal,Successful,0,Utkarsh Small Finance Bank,Lakshmi Vilas Bank,MAHB,BHIM Maha UPI(Bank of Maharashtra),CITI,CITI Bank (Mobile Banking App),10282,10282,0,Uttar Pradesh,Jammu and Kashmir,Home Loan,Adjustable-Rate Mortgage (ARM),LateNight
1,2020,8,0,Reversal,Successful,0,Allahabad Bank,Canara Bank,RBL,BHIM RBL Pay,APMAHESH,APMAHESH,42022,42022,0,West Bengal,Goa,Auto Loan,New Car Loan,LateNight
2,2020,2,0,Reversal,Successful,0,Utkarsh Small Finance Bank,Lakshmi Vilas Bank,MAHB,BHIM Maha UPI(Bank of Maharashtra),CITI,CITI Bank (Mobile Banking App),10282,10282,0,Uttar Pradesh,Jammu and Kashmir,Home Loan,Adjustable-Rate Mortgage (ARM),LateNight
3,2020,8,0,Reversal,Successful,0,Allahabad Bank,Canara Bank,RBL,BHIM RBL Pay,APMAHESH,APMAHESH,42022,42022,0,West Bengal,Goa,Auto Loan,New Car Loan,LateNight
4,2020,8,0,Reversal,Successful,0,Allahabad Bank,Canara Bank,RBL,BHIM RBL Pay,APMAHESH,APMAHESH,42022,42022,0,West Bengal,Goa,Auto Loan,New Car Loan,LateNight


### 2. Analysis and Visualization

#### Q1. How does the total loss amount (payee settlement amount) vary with (payee) state?
- **Data Aggregation**:
    - Group data by `payee_state` and calculate the total loss amount.
    - Convert the amounts to lakhs for readability.
- **Visualization**:
    - Create a bar plot to show the top 10 states by loss amount.


In [5]:
df1 = fraudulent_data.groupby('payee_state')['payee_settlement_amount'].sum().reset_index()
df1['loss_amount_(in_lakhs)'] = np.round(df1.payee_settlement_amount/1e5, 2)
df1.drop(columns='payee_settlement_amount', inplace= True)
df1.sort_values(by= 'loss_amount_(in_lakhs)', ascending= False, inplace= True)
df1

Unnamed: 0,payee_state,loss_amount_(in_lakhs)
10,Kerala,184.83
5,Haryana,169.06
13,Odisha,157.91
3,Goa,152.4
1,Bihar,151.39
17,Tamil Nadu,137.24
21,West Bengal,135.43
0,Andhra Pradesh,130.5
16,Rajasthan,119.79
12,Maharashtra,98.95


In [6]:
bar_graph(dataset= df1,
          x= 'loss_amount_(in_lakhs)',
          y= 'payee_state',
          color= 'loss_amount_(in_lakhs)',
          title= 'Loss Amount By States',
          xlabel= 'Loss Amount (Lakh Rs.)',
          ylabel= 'Payee State')

In [7]:
df1 = df1.nlargest(n= 10, columns='loss_amount_(in_lakhs)').sort_values(by= 'loss_amount_(in_lakhs)', ascending= False)

bar_graph(
    dataset= df1,
    x= 'loss_amount_(in_lakhs)',
    y= 'payee_state',
    color= 'loss_amount_(in_lakhs)',
    title= 'Top 10 States By Loss Amount',
    xlabel= 'Loss Amount (Lakh Rs.)',
    ylabel= 'Payee State'
)

#### Q2. How have fraud incidents fluctuated over the years?
- **Yearly Loss Amount**:
    - Group data by `year` and calculate the total loss amount.
    - Plot a line chart to show the trend over the years.
- **Monthly Loss Amount**:
    - Group data by `month` and calculate the total loss amount.
    - Plot a line chart to show the monthly trend.
- **Yearly Fraud Counts**:
    - Group data by `year` and count the number of fraudulent incidents.
    - Plot a line chart to show the yearly trend.
- **Monthly Fraud Counts**:
    - Group data by `month` and count the number of fraudulent incidents.
    - Plot a line chart to show the monthly trend.
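
The cells below aggregate by `year` and `month` together, or by `month` alone. A yearly roll-up of the kind described in the first and third bullets could be sketched as follows. This is a minimal sketch, not the notebook's own code: the `toy` frame and its values are made up for illustration, and only the column names are taken from the dataset.

```python
import pandas as pd

# Toy stand-in for `fraudulent_data`; the values are invented for illustration.
toy = pd.DataFrame({
    "year": [2019, 2019, 2020, 2020, 2020],
    "month": [1, 2, 1, 1, 3],
    "payee_settlement_amount": [100_000, 250_000, 50_000, 150_000, 300_000],
})

# One row per year: total loss and number of fraudulent incidents.
yearly = (
    toy.groupby("year")
       .agg(loss_amt=("payee_settlement_amount", "sum"),
            fraud_counts=("payee_settlement_amount", "size"))
       .reset_index()
)

# Convert rupees to lakhs (1 lakh = 100,000) and drop the raw total.
yearly["loss_amt_(in_lakhs)"] = (yearly["loss_amt"] / 1e5).round(2)
yearly = yearly.drop(columns="loss_amt")
print(yearly)
```

A frame shaped like `yearly` could then be passed to `line_graph` with `x= 'year'` to plot the year-over-year trend.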


In [8]:
df2 = fraudulent_data.groupby(['year', 'month'])['payee_settlement_amount'].sum().reset_index()
df2['loss_amt_(in_lakhs)'] = np.round(df2['payee_settlement_amount']/1e5, 2)
df2.drop(columns='payee_settlement_amount', inplace= True)
# groupby already returns rows in chronological (year, month) order, which the line chart needs.
df2.head(10)

Unnamed: 0,year,month,loss_amt_(in_lakhs)
0,2019,1,22.6
1,2019,2,19.08
2,2019,3,22.87
3,2019,4,30.94
4,2019,5,30.56
5,2019,6,29.31
6,2019,7,27.52
7,2019,8,27.94
8,2019,9,31.86
9,2019,10,41.75


In [9]:
line_graph(
    dataset= df2,
    x= 'month', y= 'loss_amt_(in_lakhs)',
    title= 'Loss Amount Monthly Trend (for each Year)',
    xlabel= 'Month', ylabel= 'Loss Amount (Lakh Rs.)',
    xticks= [list(range(1, 13)), calendar.month_abbr[1:13]],
    color= 'year'
)

In [10]:
df3 = fraudulent_data.groupby('month')['payee_settlement_amount'].sum().reset_index()
df3['loss_amt_(in_lakhs)'] = np.round(df3['payee_settlement_amount']/1e5, 2)
df3.drop(columns='payee_settlement_amount', inplace= True)
# groupby already returns rows in month order, which the line chart needs.
df3

Unnamed: 0,month,loss_amt_(in_lakhs)
0,1,137.08
1,2,136.77
2,3,160.69
3,4,159.44
4,5,165.87
5,6,135.32
6,7,170.85
7,8,161.68
8,9,144.6
9,10,162.84


In [11]:
line_graph(
    dataset= df3,
    x= 'month', y= 'loss_amt_(in_lakhs)',
    title= 'Loss Amount Monthly Trend (All Years Combined)',
    xlabel= 'Month', ylabel= 'Loss Amount (Lakh Rs.)',
    xticks= [df3['month'], calendar.month_abbr[1:13]]
)

In [12]:
df4 = fraudulent_data.groupby(['year', 'month']).size().reset_index(name='fraud_counts')
df5 = fraudulent_data.groupby('month').size().reset_index(name='fraud_counts')

print(df4, "\n", df5)

    year  month  fraud_counts
0   2019      1            51
1   2019      2            57
2   2019      3            45
3   2019      4            54
4   2019      5            63
5   2019      6            57
6   2019      7            60
7   2019      8            63
8   2019      9            66
9   2019     10            72
10  2019     11            42
11  2019     12            51
12  2020      1            45
13  2020      2            66
14  2020      3            81
15  2020      4            60
16  2020      5            69
17  2020      6            48
18  2020      7            57
19  2020      8            81
20  2020      9            63
21  2020     10            66
22  2020     11            42
23  2020     12            72
24  2021      1            63
25  2021      2            66
26  2021      3            69
27  2021      4            87
28  2021      5            63
29  2021      6            42
30  2021      7            54
31  2021      8            48
32  2021  

In [13]:
line_graph(
    dataset= df4,
    x= 'month', y= 'fraud_counts',
    title= 'Fraud Incidents Monthly Trend (for each Year)',
    xlabel= 'Months', ylabel= 'Fraud Incidents',
    xticks= [list(range(1, 13)), calendar.month_abbr[1:13]],
    color= 'year'
    )

In [14]:
line_graph(dataset= df5,
           x= 'month', y= 'fraud_counts',
           title= 'Fraud Incidents Monthly Trend (All Years Combined)',
           xlabel= 'Month', ylabel= 'Fraud Incidents',
           xticks=[df5['month'], calendar.month_abbr[1:13]])

#### Q3. Which types of credit fraud are causing the most financial damage?
- **Data Aggregation**:
    - Group data by `cred_type` and calculate the total loss amount.
    - Convert the amounts to lakhs for readability.
- **Visualization**:
    - Create a bar plot to show the loss amount by credit type.


In [15]:
df6 = fraudulent_data.groupby('cred_type')['payee_settlement_amount'].sum().reset_index()
df6['loss_amt_(in_lakhs)'] = np.round(df6.payee_settlement_amount/1e5, 2)
df6.drop(columns= 'payee_settlement_amount', inplace= True)
df6

Unnamed: 0,cred_type,loss_amt_(in_lakhs)
0,Auto Loan,249.19
1,Credit Card,348.98
2,Debit Card,267.67
3,Home Loan,230.67
4,Line of Credit,228.97
5,Overdraft,275.05
6,Personal Loan,280.06


In [16]:
bar_graph(
    dataset= df6,
    x= 'cred_type', y= 'loss_amt_(in_lakhs)',
    xlabel= 'Credit Account Type', ylabel= 'Loss Amount (Lakh Rs.)',
    title= 'Loss Amount by Credit type',
    color= 'loss_amt_(in_lakhs)'
)

#### Q4. How does user behavior vary by time of day in terms of the number of fraudulent transactions and fraudulent transaction amount?
- **Segmentation of Time of Day**:
  - The time of day is segmented based on the hour of the transaction completion time:
    - **LateNight**: 0:00 - 2:59
    - **EarlyMorning**: 3:00 - 5:59
    - **Morning**: 6:00 - 8:59
    - **LateMorning**: 9:00 - 11:59
    - **Afternoon**: 12:00 - 14:59
    - **LateAfternoon**: 15:00 - 17:59
    - **Evening**: 18:00 - 20:59
    - **Night**: 21:00 - 23:59
- **Data Aggregation**:
  - Group data by `time_of_day` and calculate the total loss amount and total difference amount.
  - Convert the amounts to lakhs and sort by custom-defined time segments.
- **Visualization**:
  - Create bar plots to show the loss amount and difference amount by time of day.
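
The dataset already ships with a `time_of_day` column, so the notebook does not derive it. For reference, the segmentation listed above amounts to splitting the day into eight 3-hour windows, which could be sketched as below. The names `SEGMENTS_BY_HOUR` and `hour_to_segment` are hypothetical helpers introduced for illustration, and the sketch assumes the hour is an integer from 0 to 23.

```python
# Segment labels ordered by their 3-hour window, starting at midnight.
SEGMENTS_BY_HOUR = [
    "LateNight",      # 0:00 - 2:59
    "EarlyMorning",   # 3:00 - 5:59
    "Morning",        # 6:00 - 8:59
    "LateMorning",    # 9:00 - 11:59
    "Afternoon",      # 12:00 - 14:59
    "LateAfternoon",  # 15:00 - 17:59
    "Evening",        # 18:00 - 20:59
    "Night",          # 21:00 - 23:59
]


def hour_to_segment(hour: int) -> str:
    """Map an hour of day (0-23) to its time-of-day segment label."""
    if not 0 <= hour <= 23:
        raise ValueError(f"hour must be in 0..23, got {hour}")
    # Each segment spans exactly three hours, so integer division picks the window.
    return SEGMENTS_BY_HOUR[hour // 3]


print([hour_to_segment(h) for h in (0, 5, 8, 12, 23)])
```

Assuming the `hour` column holds integer hours, something like `fraudulent_data['hour'].map(hour_to_segment)` could be used to cross-check the existing `time_of_day` values.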


In [17]:
segments = ['EarlyMorning', 'Morning', 'LateMorning', 'Afternoon', 'LateAfternoon', 'Evening', 'Night', 'LateNight']
df7 = fraudulent_data.groupby('time_of_day')['payee_settlement_amount'].sum().reset_index()
df7['loss_amt_(in_lakhs)'] = np.round(df7.payee_settlement_amount/1e5, 2)
df7.drop(columns='payee_settlement_amount', inplace= True)
df7['time_of_day'] = pd.Categorical(df7['time_of_day'], categories=segments, ordered=True)
df7.sort_values('time_of_day', inplace= True)
df7

Unnamed: 0,time_of_day,loss_amt_(in_lakhs)
1,EarlyMorning,171.5
6,Morning,342.23
4,LateMorning,133.67
0,Afternoon,333.16
3,LateAfternoon,179.96
2,Evening,308.09
7,Night,198.92
5,LateNight,213.09


In [18]:
bar_graph(
    dataset= df7,
    x= 'time_of_day', y= 'loss_amt_(in_lakhs)',
    title= 'Fraudulent Transactions by Time of Day',
    xlabel= 'Time Of Day', ylabel= 'Loss Amount (Lakh Rs.)',
    xticks= [segments, segments],
    color= 'loss_amt_(in_lakhs)'
)

In [19]:
segments = ['EarlyMorning', 'Morning', 'LateMorning', 'Afternoon', 'LateAfternoon', 'Evening', 'Night', 'LateNight']
df8 = fraudulent_data.groupby('time_of_day')['difference_amount'].sum().reset_index()
df8['diff_amt_(in_lakhs)'] = np.round(df8['difference_amount'] / 1e5, 2)
df8.drop(columns='difference_amount', inplace=True)
df8['time_of_day'] = pd.Categorical(df8['time_of_day'], categories=segments, ordered=True)
df8.sort_values('time_of_day', inplace=True)
df8

Unnamed: 0,time_of_day,diff_amt_(in_lakhs)
1,EarlyMorning,-10.86
6,Morning,-20.47
4,LateMorning,-4.4
0,Afternoon,-18.26
3,LateAfternoon,-8.37
2,Evening,-5.72
7,Night,-9.11
5,LateNight,-8.57


In [20]:
bar_graph(
    dataset= df8,
    x= 'time_of_day', y= 'diff_amt_(in_lakhs)',
    title= 'Difference Amount by Time of Day',
    xlabel= 'Time Of Day', ylabel= 'Difference Amount (Lakh Rs.)',
    xticks= [segments, segments],
    color= 'diff_amt_(in_lakhs)'
)

In [21]:
underpayments = fraudulent_data[fraudulent_data.difference_amount > 0].groupby('time_of_day')['difference_amount'].sum().reset_index()
underpayments['difference_amount_(in_lakhs)'] = np.round(underpayments.difference_amount / 1e5, 3)
underpayments.drop(columns= ['difference_amount'], inplace= True)

overpayments = fraudulent_data[fraudulent_data.difference_amount < 0].groupby('time_of_day')['difference_amount'].sum().reset_index()
overpayments['difference_amount_(in_lakhs)'] = np.round(overpayments.difference_amount / 1e5, 3)
overpayments.drop(columns= ['difference_amount'], inplace= True)

underpayments.rename(columns={'difference_amount_(in_lakhs)': 'underpayments'}, inplace=True)
overpayments.rename(columns={'difference_amount_(in_lakhs)': 'overpayments'}, inplace=True)

df9 = pd.merge(underpayments, overpayments, on= 'time_of_day', how= 'outer').fillna(0)

df9['time_of_day'] = pd.Categorical(df9['time_of_day'], categories=segments, ordered=True)
df9.sort_values('time_of_day', inplace=True)

df9

Unnamed: 0,time_of_day,underpayments,overpayments
1,EarlyMorning,0.018,-10.879
6,Morning,0.698,-21.17
4,LateMorning,0.183,-4.583
0,Afternoon,0.161,-18.417
3,LateAfternoon,0.088,-8.454
2,Evening,2.106,-7.822
7,Night,0.71,-9.818
5,LateNight,0.416,-8.989
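
The filter, aggregate, and merge steps in the cell above can also be expressed in a single pass by clipping the difference at zero from each side. The following is a sketch of that alternative on made-up data, not a replacement for the notebook's cell: the `toy` frame and its values are invented, and, as in the notebook, positive differences are treated as underpayments and negative ones as overpayments.

```python
import pandas as pd

# Toy stand-in for `fraudulent_data`; the values are invented for illustration.
toy = pd.DataFrame({
    "time_of_day": ["Morning", "Morning", "Evening", "Evening", "Night"],
    "difference_amount": [-200_000, 50_000, 100_000, -25_000, -75_000],
})

split = (
    toy.assign(
        # Keep only the positive part of each difference (negatives become 0).
        underpayments=toy["difference_amount"].clip(lower=0),
        # Keep only the negative part of each difference (positives become 0).
        overpayments=toy["difference_amount"].clip(upper=0),
    )
    .groupby("time_of_day")[["underpayments", "overpayments"]]
    .sum()
    .div(1e5)   # rupees -> lakhs
    .round(3)
    .reset_index()
)
print(split)
```

Because clipping leaves a zero where a segment has no positive (or no negative) differences, no outer merge or `fillna(0)` is needed.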


In [22]:
trace1 = go.Bar(
    x= df9['time_of_day'],
    y= df9['underpayments'],
    name= 'Underpayments (Lakh Rs.)',
    marker= dict(color= 'red')
)
trace2 = go.Bar(
    x= df9['time_of_day'],
    y= df9['overpayments'],
    name= 'Overpayments (Lakh Rs.)',
    marker= dict(color= 'blue')
)
layout = go.Layout(
    title='Underpayments and Overpayments by Time of Day',
    xaxis=dict(title='Time of Day'),
    yaxis=dict(title='Amount (in Lakhs)'),
    barmode='group'
)
# Create the figure
fig = go.Figure(data=[trace1, trace2], layout=layout)

# Show the plot
pyo.iplot(fig)

## Conclusion
This notebook provided a comprehensive analysis of fraudulent transactions, focusing on various aspects such as loss amounts by state, yearly and monthly trends in fraudulent incidents, the impact of different credit types, and user behavior based on the time of day.

### Key Insights:

#### Loss Amount by State:
- Losses are unevenly distributed across states: the ten highest-loss states range from 98.95 lakhs (Maharashtra) to 184.83 lakhs (Kerala).
- Kerala tops the list with 184.83 lakhs in losses, followed by Haryana (169.06 lakhs) and Odisha (157.91 lakhs).

#### Yearly and Monthly Trends:

- *Yearly Loss Amounts*:
    - Total loss amounts fluctuate over the years, reaching a peak of 398.56 lakhs in 2022.
    - This indicates a dynamic landscape of fraudulent activity that organizations need to adapt to.
- *Monthly Loss Amounts*:
    - Monthly trends highlight peaks in March and November, suggesting potential seasonal patterns in fraudulent behavior.
- *Fraudulent Incidents*:
    - Monthly incident counts vary between 42 and 87 in the rows displayed above, so fraud remains a persistent threat throughout the period.

#### Credit Type Impact:

- Credit card fraud accounts for the highest loss amount (348.98 lakhs), ahead of personal loans (280.06 lakhs) and overdrafts (275.05 lakhs).

#### Time of Day Analysis:

- *Loss Amounts*:
    - Morning (342.23 lakhs), Afternoon (333.16 lakhs), and Evening (308.09 lakhs) record the highest loss amounts, indicating potential peak periods of fraudulent activity.
- *Difference Amounts*:
    - The net difference amount is negative in every segment, most strongly in the Morning (-20.47 lakhs) and Afternoon (-18.26 lakhs), suggesting a tendency toward overpayments at these times.

### Next Steps:
- *Deep Dive into High-Risk States*: Further analysis of high-loss states like Kerala could reveal specific factors driving the elevated levels of fraud in these regions.
- *Seasonal and Monthly Patterns*: Investigating peaks in March and November could provide insights into seasonal variations in fraudulent behavior.
- *Credit Type Specific Strategies*: Given the high impact of credit card fraud, implementing targeted prevention measures for credit card transactions could help mitigate financial losses.
- *Behavioral Analysis*: Analyzing user behavior during peak fraudulent periods could uncover patterns that can be used to enhance fraud detection and prevention strategies.

---
This analysis underscores the importance of proactive monitoring and adaptation in combating financial fraud. By understanding the patterns and trends highlighted in this analysis, organizations can better protect themselves against fraudulent activities.
