# Fraud Detection Analysis

**With this dataset, I want to understand which factors indicate that a transaction is fraud, find trends among the fraud transactions, and compare them with transactions that are not fraud.**

In [None]:
# Importing Libraries

import numpy as np 
import pandas as pd 
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

In [None]:
df = pd.read_csv('/kaggle/input/fraud-detection-example/fraud_dataset_example.csv')

In [None]:
df.head(5)

Understanding the columns:

1. step is the unit of time; according to this data source, 1 step is 1 hour.
2. type is the transaction type (CASH_IN, CASH_OUT, DEBIT, PAYMENT and TRANSFER).
3. amount is the transaction amount.
4. nameOrig is the transaction originator, which likely identifies the customer sending the payment.
5. oldbalanceOrg is the originator's balance before the transaction.
6. newbalanceOrig is the originator's balance after the transaction.
7. nameDest is the transaction recipient, i.e. the customer receiving the payment.
8. oldbalanceDest is the recipient's balance before the transaction.
9. newbalanceDest is the recipient's balance after the transaction.
10. isFraud marks transactions made by fraudulent agents who take control of customers' accounts and try to empty them by transferring the funds to another account and then cashing out.
11. isFlaggedFraud marks an illegal attempt to transfer a massive amount of money in a single transaction.

In [None]:
# Identifying any null values
df.isnull().sum().sum()

In [None]:
df.info()

In [None]:
df.describe()

In [None]:
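# numeric_only=True skips the text columns (type, nameOrig, nameDest);
# recent pandas versions raise an error on them otherwise
sns.heatmap(df.corr(numeric_only=True), annot=True)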

isFraud seems to be most correlated with amount, which fits the common intuition that unusually large transactions are a strong sign of fraud.

In [None]:
sns.catplot(x="isFraud", y="amount", col="type", data=df)

0 = Not Fraud

1 = Fraud

This graph shows that fraud transactions are not only the large ones: many small amounts are also fraud. There are some outliers in this trend, including two very high fraud transactions.
Normal transactions, on the other hand, form a dense, continuous band of amounts, and every fraud transaction is either a TRANSFER or a CASH_OUT.
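The claim that every fraud transaction is a TRANSFER or a CASH_OUT can be checked directly instead of read off the plot. A minimal sketch, assuming `df` is the DataFrame loaded above; the helper names `fraud_counts_by_type` and `types_with_fraud` are hypothetical.

```python
import pandas as pd


def fraud_counts_by_type(data: pd.DataFrame) -> pd.DataFrame:
    """Count normal (0) and fraud (1) transactions for each transaction type."""
    return pd.crosstab(data["type"], data["isFraud"])


def types_with_fraud(data: pd.DataFrame) -> set:
    """Return the set of transaction types that contain at least one fraud."""
    return set(data.loc[data["isFraud"] == 1, "type"].unique())


# In the notebook:
# fraud_counts_by_type(df)
# types_with_fraud(df)  # the plot suggests only TRANSFER and CASH_OUT
```

A cross-tabulation also shows how many normal transactions each type has, which the strip plot hides when points overlap.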

So I think other factors influence whether a transaction is fraud, not just how large the amount is.

According to the heatmap, after amount, the columns most correlated with isFraud are newbalanceOrig and then oldbalanceOrg.
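Reading the ranking off a heatmap is error-prone, so the same correlations can be listed in order. A minimal sketch, assuming `df` is the DataFrame loaded above; `correlations_with_target` is a hypothetical helper. It sorts by absolute value so that strong negative correlations are not pushed to the bottom.

```python
import pandas as pd


def correlations_with_target(data: pd.DataFrame, target: str = "isFraud") -> pd.Series:
    """Correlation of each numeric column with `target`, strongest (by absolute value) first."""
    # numeric_only=True skips text columns such as type, nameOrig and nameDest
    corr = data.corr(numeric_only=True)[target].drop(target)
    return corr.sort_values(key=lambda s: s.abs(), ascending=False)


# In the notebook:
# correlations_with_target(df)
```

A constant column (for example isFlaggedFraud, if it is always 0 in this sample) has an undefined correlation; it shows up as NaN and is sorted last.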

In [None]:
sns.lmplot(x="oldbalanceOrg", y="amount", hue="isFraud", col="type", data=df)
sns.lmplot(x="newbalanceOrig", y="amount", hue="isFraud", col="type", data=df)

In [None]:
sns.lmplot(x="oldbalanceDest", y="amount", hue="isFraud", col="type", data=df)
sns.lmplot(x="newbalanceDest", y="amount", hue="isFraud", col="type", data=df)

The CASH_OUT type shows more outliers than the other types, which suggests that this transaction type is also associated with fraud.
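Raw fraud counts can be misleading because the transaction types have very different volumes, so a fraud rate per type gives a fairer comparison. A minimal sketch, assuming `df` is the DataFrame loaded above; `fraud_rate_by_type` is a hypothetical helper.

```python
import pandas as pd


def fraud_rate_by_type(data: pd.DataFrame) -> pd.DataFrame:
    """Number of transactions, number of frauds and fraud rate for each type."""
    # isFraud is 0/1, so its sum is the number of frauds in each type
    summary = data.groupby("type")["isFraud"].agg(transactions="count", frauds="sum")
    summary["fraud_rate"] = summary["frauds"] / summary["transactions"]
    return summary.sort_values("fraud_rate", ascending=False)


# In the notebook:
# fraud_rate_by_type(df)
```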

**Let's take a look at the highest transactions for both fraud and normal transactions**

In [None]:
df[df['isFraud'] == 0]['amount'].nlargest(10)

The top 10 normal transactions have similar values.

In [None]:
df[df['isFraud'] == 1]['amount'].nlargest(10)

The top 10 fraud transactions, by contrast, span a much wider range.
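The difference between "similar values" and "a very diverse range" can be summarised with one number per class: the ratio between the largest and smallest of the top amounts. A minimal sketch, assuming `df` is the DataFrame loaded above; `top_amount_spread` is a hypothetical helper, and the max/min ratio is just one simple choice of spread measure.

```python
import pandas as pd


def top_amount_spread(data: pd.DataFrame, n: int = 10) -> pd.DataFrame:
    """Summarise the n largest amounts in each isFraud group (min, max, max/min ratio)."""
    rows = {}
    for label, group in data.groupby("isFraud"):
        top = group["amount"].nlargest(n)
        rows[label] = {
            "min": top.min(),
            "max": top.max(),
            # close to 1 means the top amounts are similar; larger means a wider range
            "max_over_min": top.max() / top.min(),
        }
    return pd.DataFrame.from_dict(rows, orient="index")


# In the notebook:
# top_amount_spread(df)
```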

In [None]:
# Largest single amount in each class
fraud = df[df['isFraud'] == 1]['amount'].max()
not_fraud = df[df['isFraud'] == 0]['amount'].max()

print("Highest amount for fraud transaction:", fraud)
print("Highest amount for normal transaction:", not_fraud)

# "How many times larger" is a ratio, not a difference
print("\nHighest fraud transaction is", round(fraud / not_fraud, 2), "times the highest normal transaction")

Now let's compare their balances.

In [None]:
df[df['amount'] == fraud]

In both rows, the amount is very high.

In the second row, the entire balance was sent in the transaction, leaving nothing behind.

In [None]:
df[df['amount'] == not_fraud]

After the transaction, a good amount is still left in the account, so nothing looks suspicious.
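The two rows above are single examples; the "account emptied" pattern can be measured across the whole dataset. A minimal sketch, assuming `df` is the DataFrame loaded above and that an emptied balance is stored as exactly 0.0; `emptied_account_share` is a hypothetical helper.

```python
import pandas as pd


def emptied_account_share(data: pd.DataFrame) -> pd.Series:
    """Fraction of transactions in each isFraud group that take a positive
    originator balance down to zero."""
    # Assumption: an emptied account has newbalanceOrig stored as exactly 0
    emptied = (data["oldbalanceOrg"] > 0) & (data["newbalanceOrig"] == 0)
    # The mean of a boolean Series is the share of True values
    return emptied.groupby(data["isFraud"]).mean()


# In the notebook:
# emptied_account_share(df)
```

Requiring a positive starting balance avoids counting accounts that were already at zero before the transaction.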

# Summary of Analysis

What indicates whether a transaction is fraud:
1. An abnormally large transaction amount
2. An originator balance after the transaction that is very low or close to zero
3. The transaction type: CASH_OUT and TRANSFER transactions are more likely to be fraud
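The three indicators above can be combined into a simple rule and compared with the isFraud labels, to see how much of the fraud they capture on their own. This is a hypothetical sketch, not a validated detector: the notebook does not define "abnormally large", so the `quantile` cut-off below is an assumption, and "close to none" is simplified to a balance of exactly zero.

```python
import pandas as pd


def rule_based_flag(data: pd.DataFrame, quantile: float = 0.99) -> pd.Series:
    """Flag transactions that match all three indicators from the summary."""
    # Hypothetical cut-off: amounts at or above this quantile count as "abnormally large"
    threshold = data["amount"].quantile(quantile)
    large = data["amount"] >= threshold
    # Simplification: "very low or close to none" is treated as exactly zero
    emptied = data["newbalanceOrig"] == 0
    risky_type = data["type"].isin(["CASH_OUT", "TRANSFER"])
    return large & emptied & risky_type


def compare_with_labels(data: pd.DataFrame, flag: pd.Series) -> pd.DataFrame:
    """Cross-tabulate the rule's flag against the isFraud label."""
    return pd.crosstab(flag.rename("flagged"), data["isFraud"])


# In the notebook:
# flag = rule_based_flag(df)
# compare_with_labels(df, flag)
```

Because the analysis above found many small fraud amounts, a rule that requires a large amount will likely miss some fraud; the cross-tabulation makes that trade-off visible.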