🛡️ FraudShield — Credit Card Fraud Detection System

A machine learning-based fraud detection system designed to identify anomalous financial transactions using both supervised and unsupervised learning approaches. The project includes model training, evaluation, explainability, and a live Streamlit dashboard for real-time fraud prediction.

🚀 Overview

Fraud detection is a highly imbalanced classification problem where fraudulent transactions represent a very small portion of total activity (~0.17%). This project aims to build a robust system capable of detecting fraud with high recall while maintaining precision.

📊 Dataset

Source: Kaggle Credit Card Fraud Dataset
Download creditcard.csv from Kaggle and place it inside the data/ folder.
Total Records: 284,807
Fraud Cases: 492 (~0.17%)
Features:
- V1–V28: PCA-transformed anonymized features
- Time, Amount
- Target: Class (0 = Normal, 1 = Fraud)

🧱 Project Architecture

ALLANAI.FraudShield/ ├── dashboard/ │ └── app.py # Streamlit dashboard ├── models/ │ ├── xgboost_fraud_model.pkl │ └── scaler.pkl ├── notebooks/ │ └── 01_fraudshield_eda_model.ipynb ├── requirements.txt └── README.md

🔍 Exploratory Data Analysis (EDA)

Key insights:

Dataset is highly imbalanced (~0.17% fraud)
Most transactions are low-value with long-tail distribution
Fraud patterns are not linearly separable
PCA features require non-linear models

🤖 Models Implemented

1. XGBoost (Supervised Learning)

Handles imbalanced data using scale_pos_weight
Captures complex non-linear relationships

Performance:

Precision: 0.71
Recall: 0.87
F1 Score: 0.78

2. Isolation Forest (Unsupervised Learning)

Detects anomalies without labelled data
Useful as baseline anomaly detector

Performance:

Precision: 0.29
Recall: 0.30
F1 Score: 0.30

⚖️ Model Comparison

Model	Precision	Recall	F1 Score
XGBoost	0.71	0.87	0.78
Isolation Forest	0.29	0.30	0.30

Key Insight

Supervised learning significantly outperforms unsupervised anomaly detection in this dataset due to availability of labelled fraud data.

🔍 Model Explainability (SHAP)

SHAP (SHapley Additive exPlanations) is used to interpret model predictions by identifying the contribution of each feature.

This improves:

Transparency
Trust in predictions
Debugging capability

🖥️ Streamlit Dashboard

An interactive dashboard allows real-time fraud prediction.

Features:

Input transaction details
Predict fraud probability
Display risk level
Show model performance metrics

Run locally:

streamlit run dashboard/app.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

🛡️ FraudShield — Credit Card Fraud Detection System

🚀 Overview

📊 Dataset

🧱 Project Architecture

🔍 Exploratory Data Analysis (EDA)

🤖 Models Implemented

1. XGBoost (Supervised Learning)

2. Isolation Forest (Unsupervised Learning)

⚖️ Model Comparison

Key Insight

🔍 Model Explainability (SHAP)

🖥️ Streamlit Dashboard

Features:

Run locally:

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
dashboard		dashboard
models		models
notebooks		notebooks
.gitignore		.gitignore
README.md		README.md
requirements.txt		requirements.txt

Folders and files

Latest commit

History

Repository files navigation

🛡️ FraudShield — Credit Card Fraud Detection System

🚀 Overview

📊 Dataset

🧱 Project Architecture

🔍 Exploratory Data Analysis (EDA)

🤖 Models Implemented

1. XGBoost (Supervised Learning)

2. Isolation Forest (Unsupervised Learning)

⚖️ Model Comparison

Key Insight

🔍 Model Explainability (SHAP)

🖥️ Streamlit Dashboard

Features:

Run locally:

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages