# Fraud Detection Model – Accredian Assignment

## Objective
The objective of this project is to build a machine learning model to detect fraudulent
financial transactions and derive actionable business insights to help prevent fraud
in a financial company.


## Data Dictionary

- **step**: Unit of time where 1 step equals 1 hour. The dataset covers 744 steps, i.e. 744 hours (31 days).
- **type**: Type of transaction (CASH-IN, CASH-OUT, DEBIT, PAYMENT, TRANSFER).
- **amount**: Transaction amount in local currency.
- **nameOrig**: Customer initiating the transaction.
- **oldbalanceOrg**: Origin account balance before transaction.
- **newbalanceOrig**: Origin account balance after transaction.
- **nameDest**: Recipient of the transaction.
- **oldbalanceDest**: Destination balance before transaction (not available for merchants).
- **newbalanceDest**: Destination balance after transaction (not available for merchants).
- **isFraud**: Target label; 1 if the transaction is fraudulent, 0 otherwise.
- **isFlaggedFraud**: Rule-based flag raised for transfers above 200,000 in a single transaction.
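
As a rough illustration of the `isFlaggedFraud` rule described above, the sketch below applies a threshold check to a few made-up rows. The rows, the `rule_flag` column, and the `FLAG_THRESHOLD` constant are assumptions for illustration only; the type labels follow the data dictionary and may be spelled differently in the CSV.

```python
import pandas as pd

# Toy rows invented for illustration; not taken from fraud.csv.
toy = pd.DataFrame({
    "type": ["TRANSFER", "TRANSFER", "CASH-OUT", "PAYMENT"],
    "amount": [250_000.0, 150_000.0, 300_000.0, 500.0],
})

FLAG_THRESHOLD = 200_000  # threshold stated in the data dictionary

# Flag only TRANSFER rows whose amount exceeds the threshold.
toy["rule_flag"] = (
    (toy["type"] == "TRANSFER") & (toy["amount"] > FLAG_THRESHOLD)
).astype(int)

print(toy)
```

Only the first row is flagged here: the large CASH-OUT is ignored because the rule, as described, applies to transfers only.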


In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, roc_auc_score, roc_curve
from sklearn.ensemble import RandomForestClassifier

sns.set_style("whitegrid")

In [None]:
# Load dataset
df = pd.read_csv("fraud.csv")
df.head()

In [None]:
# Only the last expression in a cell is displayed, so print the shape explicitly
print(df.shape)
df.info()

## Data Cleaning

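- No missing values were observed.
- Outliers in `amount` were retained: unusually large transactions are a potential fraud signal, so removing them could discard the cases the model needs to learn.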


In [None]:
df.isnull().sum()

In [None]:
plt.figure(figsize=(6,3))
sns.boxplot(x=df['amount'])
plt.title("Transaction Amount Distribution")
plt.show()
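
One way to sanity-check the decision to keep outliers is to see how the fraud label is distributed among them. The sketch below uses the common 1.5 × IQR rule on a small invented frame; the data, the `is_outlier` column, and the choice of the IQR rule are assumptions, not the method used in this notebook.

```python
import pandas as pd

# Invented amounts and labels for illustration; not taken from fraud.csv.
toy = pd.DataFrame({
    "amount": [10.0, 12.0, 11.0, 13.0, 9.0, 14.0, 10.5, 5000.0],
    "isFraud": [0, 0, 0, 0, 0, 0, 0, 1],
})

# Upper fence of the 1.5 * IQR rule.
q1, q3 = toy["amount"].quantile([0.25, 0.75])
upper_fence = q3 + 1.5 * (q3 - q1)

toy["is_outlier"] = toy["amount"] > upper_fence

# Cross-tabulate outlier status against the fraud label.
print(pd.crosstab(toy["is_outlier"], toy["isFraud"]))
```

If the same cross-tabulation on the real data shows fraud concentrated among the outliers, that supports retaining them.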

## Feature Engineering

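Customer identifiers (`nameOrig`, `nameDest`) were dropped because they are account IDs rather than behavioral features and do not generalize to unseen accounts.
Transaction type was one-hot encoded, with the first category dropped to avoid a redundant column.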


In [None]:
df = df.drop(['nameOrig', 'nameDest'], axis=1)
df = pd.get_dummies(df, columns=['type'], drop_first=True)
df.head()
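
To make the effect of `drop_first=True` concrete, here is a minimal sketch on an invented frame. The rows are assumptions for illustration; the type labels follow the data dictionary and may be spelled differently in the CSV.

```python
import pandas as pd

# Invented rows for illustration; not taken from fraud.csv.
toy = pd.DataFrame({
    "type": ["CASH-IN", "CASH-OUT", "TRANSFER", "CASH-OUT"],
    "amount": [100.0, 250.0, 900.0, 40.0],
})

# drop_first=True removes the first category in sorted order (CASH-IN here),
# so a row with all dummy columns equal to 0 stands for that category.
encoded = pd.get_dummies(toy, columns=["type"], drop_first=True)

print(encoded.columns.tolist())
print(encoded)
```

The dropped category becomes the implicit baseline, which removes one perfectly redundant column without losing information.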

## Train-Test Split

In [None]:
X = df.drop('isFraud', axis=1)
y = df['isFraud']

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
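
The `stratify=y` argument keeps the class ratio the same in both splits, which matters when the positive class is rare. The following sketch checks this on synthetic data with 1% positives; the arrays and the 1% rate are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic features and labels with 1% positives; a stand-in for the real data.
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(1000, 3))
y_toy = np.zeros(1000, dtype=int)
y_toy[:10] = 1

X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=42, stratify=y_toy
)

# With stratification, both splits keep the original positive rate.
print("train positive rate:", y_tr.mean())
print("test positive rate:", y_te.mean())
```

The same check can be run on the real split by comparing `y_train.mean()` and `y_test.mean()`.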

## Model Building – Random Forest

In [None]:
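# Fit a 100-tree random forest; random_state fixes the seed for reproducibility
rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)

# Hard class predictions and predicted probability of the fraud class (label 1)
y_pred = rf.predict(X_test)
y_prob = rf.predict_proba(X_test)[:, 1]

# Per-class precision/recall/F1, plus ROC-AUC computed from the probabilities
print(classification_report(y_test, y_pred))
print("ROC-AUC:", roc_auc_score(y_test, y_prob))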

In [None]:
fpr, tpr, _ = roc_curve(y_test, y_prob)
plt.plot(fpr, tpr, label="Random Forest")
plt.plot([0,1],[0,1],'k--')
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.title("ROC Curve")
plt.legend()
plt.show()
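
If fraud cases are rare in the data (an assumption; the class balance is not reported here), ROC-AUC can look strong even when precision on the fraud class is modest. Average precision, which summarizes the precision-recall curve, is a common complement. The sketch below computes both on synthetic imbalanced data; the dataset, the smaller forest, and all variable names are stand-ins rather than the notebook's actual setup.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import average_precision_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data (roughly 2% positives); not fraud.csv.
X_syn, y_syn = make_classification(
    n_samples=5000, n_features=10, weights=[0.98, 0.02], random_state=42
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_syn, y_syn, test_size=0.2, random_state=42, stratify=y_syn
)

clf = RandomForestClassifier(n_estimators=50, random_state=42)
clf.fit(X_tr, y_tr)
prob_syn = clf.predict_proba(X_te)[:, 1]

roc_auc_syn = roc_auc_score(y_te, prob_syn)
pr_auc_syn = average_precision_score(y_te, prob_syn)
baseline = y_te.mean()  # average precision of a random ranking is about this

print("ROC-AUC:", roc_auc_syn)
print("Average precision:", pr_auc_syn)
print("Positive rate (PR baseline):", baseline)
```

On the real split, the equivalent call would be `average_precision_score(y_test, y_prob)`, compared against `y_test.mean()` as the no-skill baseline.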

## Key Fraud Factors

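Based on the model's feature importances, fraud is mainly associated with:
- High transaction amounts
- TRANSFER and CASH-OUT transaction types
- Sudden balance changes

These factors are consistent with how the simulation models fraudulent agents, who move funds out of an account via transfers and then cash out.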


In [None]:
# Wrap the chained call in parentheses so it can span multiple lines
(
    pd.Series(rf.feature_importances_, index=X.columns)
    .sort_values(ascending=False)
    .head(10)
)
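
Feature importances rank features but do not show the direction of an effect. A simple cross-check of the transaction-type factor listed above is the raw fraud rate per type. The sketch below does this on invented rows; the data and the `fraud_rate_by_type` name are assumptions for illustration.

```python
import pandas as pd

# Invented rows for illustration; not taken from fraud.csv.
toy = pd.DataFrame({
    "type": ["TRANSFER", "TRANSFER", "CASH-OUT", "CASH-OUT",
             "PAYMENT", "PAYMENT", "CASH-IN", "DEBIT"],
    "isFraud": [1, 0, 1, 0, 0, 0, 0, 0],
})

# The mean of a 0/1 label within each group is that group's fraud rate.
fraud_rate_by_type = (
    toy.groupby("type")["isFraud"].mean().sort_values(ascending=False)
)

print(fraud_rate_by_type)
```

On the real data this has to be run on the frame as loaded from the CSV, because `type` no longer exists as a single column after one-hot encoding.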

## Fraud Prevention Recommendations

- Real-time transaction monitoring
- Flagging large transfers
- Multi-factor authentication
- Behavioral analytics
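
As one possible form of behavioral analytics, the sketch below raises a velocity alert when an origin account makes unusually many transactions within the same step (hour). The rows, the `tx_in_step` and `velocity_alert` columns, and the `MAX_TX_PER_STEP` limit are hypothetical; a real limit would need to be tuned on historical data.

```python
import pandas as pd

# Invented rows for illustration; not taken from fraud.csv.
toy = pd.DataFrame({
    "step": [1, 1, 1, 2, 2],
    "nameOrig": ["C1", "C1", "C1", "C2", "C1"],
    "amount": [500.0, 700.0, 900.0, 50.0, 20.0],
})

MAX_TX_PER_STEP = 2  # hypothetical limit, chosen only for this example

# Count transactions per origin account within each step.
toy["tx_in_step"] = (
    toy.groupby(["nameOrig", "step"])["amount"].transform("count")
)

# Alert on every transaction in a burst that exceeds the limit.
toy["velocity_alert"] = toy["tx_in_step"] > MAX_TX_PER_STEP

print(toy)
```

Note that this relies on `nameOrig`, so it would run on the raw data or in a live monitoring system rather than on the model's feature matrix, from which the identifiers were dropped.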


## Measuring Effectiveness

- Reduction in fraud rate
- Improved ROC-AUC score
- Decrease in false positives
- Customer complaint trends
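
Two of the measures above, false positives and missed fraud, can be tracked directly from a confusion matrix. The sketch below compares a hypothetical "before" and "after" set of predictions on invented labels; the arrays and the helper name `rates` are assumptions for illustration.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

def rates(y_true, y_pred):
    """Return (false positive rate, recall) for binary 0/1 labels."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return fp / (fp + tn), tp / (tp + fn)

# Invented labels and predictions; not model output from this notebook.
y_true = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])
pred_before = np.array([1, 1, 0, 0, 0, 0, 0, 0, 1, 0])
pred_after = np.array([1, 0, 0, 0, 0, 0, 0, 0, 1, 1])

fpr_before, recall_before = rates(y_true, pred_before)
fpr_after, recall_after = rates(y_true, pred_after)

print("before: FPR", fpr_before, "recall", recall_before)
print("after:  FPR", fpr_after, "recall", recall_after)
```

Tracking both numbers together guards against reducing false positives simply by catching less fraud.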
