
# Fraud Transaction Detection EDA

This notebook performs exploratory data analysis (EDA) on the **Fraud.csv** dataset for the Accredian assessment.

---

## Dataset Description
- **step**: Unit of simulated time; 1 step = 1 hour. Total steps: 744 (31 days).
- **type**: Transaction type: `CASH_IN`, `CASH_OUT`, `DEBIT`, `PAYMENT`, `TRANSFER`.
- **amount**: Transaction amount in local currency.
- **nameOrig**: ID of the customer who initiated the transaction.
- **oldbalanceOrg**: Originator's balance before the transaction.
- **newbalanceOrig**: Originator's balance after the transaction.
- **nameDest**: ID of the recipient.
- **oldbalanceDest**: Recipient's balance before the transaction (NaN for merchants).
- **newbalanceDest**: Recipient's balance after the transaction (NaN for merchants).
- **isFraud**: 1 if the transaction was made by a fraudulent agent, 0 otherwise.
- **isFlaggedFraud**: 1 if the transaction was flagged as an illegal attempt (a transfer of more than 200,000 in a single transaction), 0 otherwise.
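
Because 1 step is 1 hour, `step` can be split into a day index and an hour of day. The following is a minimal sketch on a toy frame, assuming steps are 1-based (so step 1 is hour 0 of day 1). The `day` and `hour_of_day` column names are illustrative and not part of the dataset.

```python
import pandas as pd

# Toy values standing in for the `step` column (illustrative only)
toy = pd.DataFrame({"step": [1, 24, 25, 744]})

# Assumption: steps are 1-based, so shift by 1 before dividing into 24-hour days
toy["day"] = (toy["step"] - 1) // 24 + 1      # 1-based day index
toy["hour_of_day"] = (toy["step"] - 1) % 24   # 0..23

print(toy)
```

Under this assumption, step 744 falls on hour 23 of day 31.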


In [None]:

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

plt.rcParams["figure.figsize"] = (10, 6)
sns.set_style("whitegrid")

df = pd.read_csv("Fraud.csv")
df.head()


In [None]:
df.info()

In [None]:
df.describe().T

In [None]:
df.isnull().sum()
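
The dataset description says recipient balances are missing for merchants. One way to check this is to split rows by recipient kind and count missing and zero balances in each group. This is a sketch on a toy frame with invented values; it assumes merchant IDs start with `M`, which should be verified against the real `nameDest` column.

```python
import pandas as pd

# Toy rows with invented IDs and balances (not taken from Fraud.csv)
toy = pd.DataFrame({
    "nameDest": ["M123", "C456", "M789", "C012"],
    "oldbalanceDest": [None, 100.0, 0.0, 50.0],
    "newbalanceDest": [None, 150.0, 0.0, 50.0],
})

# Assumption: merchant IDs start with "M"; everything else is treated as a customer
kind = toy["nameDest"].str.startswith("M").map({True: "merchant", False: "customer"})

balance_cols = ["oldbalanceDest", "newbalanceDest"]
missing_by_kind = toy[balance_cols].isna().groupby(kind).sum()  # NaN balances per group
zero_by_kind = toy[balance_cols].eq(0).groupby(kind).sum()      # zero balances per group

print(missing_by_kind)
print(zero_by_kind)
```

Counting zeros alongside NaNs matters because a file may encode "no information" as 0 rather than as a missing value, in which case `isnull()` alone would report nothing.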

In [None]:

df['type'].value_counts().plot(kind='bar', color="#4C72B0")
plt.title("Transaction Type Distribution")
plt.ylabel("Count")
plt.show()


In [None]:

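# Reindex so classes 0 and 1 are always present and in the same order as the bar labels
fraud_counts = df['isFraud'].value_counts().reindex([0, 1], fill_value=0)
fraud_percentage = (fraud_counts[1] / len(df)) * 100

plt.bar(['Non-Fraud', 'Fraud'], fraud_counts.values, color=["#4C72B0", "#DD8452"])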
plt.title(f"Fraud vs Non-Fraud Transactions ({fraud_percentage:.4f}% Fraud)")
plt.ylabel("Count")
plt.show()


In [None]:

fraud_by_type = df.groupby('type')['isFraud'].mean().sort_values(ascending=False) * 100
sns.barplot(x=fraud_by_type.index, y=fraud_by_type.values, color="#55A868")
plt.title("Fraud Rate by Transaction Type (%)")
plt.ylabel("Fraud Rate (%)")
plt.show()
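
A rate on its own hides how many transactions sit behind it. The sketch below, on a toy frame with made-up values, reports the transaction count, the number of fraud cases, and the rate side by side. The output column names are illustrative.

```python
import pandas as pd

# Toy rows with invented values (not taken from Fraud.csv)
toy = pd.DataFrame({
    "type": ["TRANSFER", "TRANSFER", "DEBIT", "DEBIT", "DEBIT", "PAYMENT"],
    "isFraud": [1, 0, 1, 0, 0, 0],
})

# Named aggregation: one output column per statistic
summary = toy.groupby("type")["isFraud"].agg(
    transactions="count",
    fraud_cases="sum",
    fraud_rate_pct=lambda s: s.mean() * 100,
).sort_values("fraud_rate_pct", ascending=False)

print(summary)
```

Reading the rate next to the counts helps avoid over-interpreting a high percentage that rests on very few transactions.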


In [None]:
df['isFlaggedFraud'].value_counts()

In [None]:

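fraud_df = df[df['isFraud'] == 1]
sns.boxplot(data=fraud_df, x='type', y='amount', color="#C44E52")
plt.title("Fraudulent Transaction Amounts")
# Clip the y-axis at the 99th percentile of fraud amounts so outliers do not flatten the boxes
plt.ylim(0, fraud_df['amount'].quantile(0.99))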
plt.show()


In [None]:

df[(df['type'] == 'TRANSFER') & (df['amount'] > 200000)]
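
To see how closely `isFlaggedFraud` follows the stated rule, the rule can be recomputed and cross-tabulated against the column. This is a sketch on a toy frame with invented values. `rule_hit` is an illustrative name, and the 200,000 threshold is the one given in the dataset description.

```python
import pandas as pd

# Toy rows with invented values (not taken from Fraud.csv)
toy = pd.DataFrame({
    "type": ["TRANSFER", "TRANSFER", "TRANSFER", "PAYMENT"],
    "amount": [250_000.0, 300_000.0, 1_000.0, 500_000.0],
    "isFlaggedFraud": [1, 0, 0, 0],
})

# Recompute the documented rule: a transfer of more than 200,000
rule_hit = (toy["type"] == "TRANSFER") & (toy["amount"] > 200_000)

# Rows: whether the rule fires; columns: the value of isFlaggedFraud
comparison = pd.crosstab(rule_hit.rename("rule_hit"), toy["isFlaggedFraud"])

print(comparison)
```

Any cell where the rule fires but the flag is 0 (or the reverse) points to rows worth inspecting, since it suggests the flag depends on more than the amount threshold.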


In [None]:

fraud_trend = df.groupby('step')['isFraud'].sum()
plt.plot(fraud_trend.index, fraud_trend.values, marker='o', color="#8172B3")
plt.title("Fraudulent Transactions Over Time")
plt.xlabel("Step (Hour)")
plt.ylabel("Number of Fraud Cases")
plt.show()
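
Raw counts per step mix fraud activity with overall traffic. A sketch, on toy values, of the share of fraudulent transactions per step, which separates the two. The output column names are illustrative.

```python
import pandas as pd

# Toy rows with invented values (not taken from Fraud.csv)
toy = pd.DataFrame({
    "step": [1, 1, 1, 1, 2, 2],
    "isFraud": [0, 0, 0, 1, 0, 1],
})

# Fraud cases and total transactions for each step
per_step = toy.groupby("step")["isFraud"].agg(fraud_cases="sum", transactions="count")

# Share of fraudulent transactions within each step, in percent
per_step["fraud_rate_pct"] = per_step["fraud_cases"] / per_step["transactions"] * 100

print(per_step)
```

In the toy data both steps have one fraud case, yet the rate doubles in step 2 because traffic is halved, which the count plot alone would not show.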
