This repository contains an end-to-end machine learning project for detecting fraudulent insurance claims, served through a Flask web application. The project covers data ingestion, validation, preprocessing, clustering, model training with multiple algorithms (including XGBoost), model selection, and prediction through a web interface.
The pipeline handles:
- Raw data validation
- Data preprocessing (handling missing values, feature engineering)
- Clustering (K-Means for grouping similar data)
- Training multiple ML models
- Selecting the best model using cross-validation
- Saving the trained model
- Batch and single prediction via Flask API
- Monitoring dashboard integration
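These stages can be combined in several ways. The sketch below assumes one common arrangement rather than reproducing the repository's code: rows are grouped with K-Means, a few candidate classifiers are compared by cross-validated accuracy within each cluster, and the winner is refit on that cluster and saved with pickle. RandomForestClassifier and LogisticRegression stand in for the project's own candidates (which include XGBoost); the function names, the fixed cluster count, and the pickle file names are hypothetical.

```python
import os
import pickle

import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score


def make_candidates(random_state=0):
    """Hypothetical candidate models; the project itself also trains XGBoost."""
    return {
        "random_forest": RandomForestClassifier(n_estimators=50, random_state=random_state),
        "logistic_regression": LogisticRegression(max_iter=1000),
    }


def train_per_cluster(X, y, n_clusters=2, cv=3, random_state=0):
    """Cluster X with K-Means, then keep the best candidate per cluster by CV accuracy."""
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state)
    labels = kmeans.fit_predict(X)
    best_models = {}
    for cluster_id in range(n_clusters):
        mask = labels == cluster_id
        X_c, y_c = X[mask], y[mask]
        candidates = make_candidates(random_state)
        # Mean cross-validated accuracy of each candidate on this cluster's rows.
        scores = {name: cross_val_score(model, X_c, y_c, cv=cv).mean()
                  for name, model in candidates.items()}
        best_name = max(scores, key=scores.get)
        # cross_val_score fits clones, so refit the winner on the whole cluster.
        best_models[cluster_id] = (best_name, candidates[best_name].fit(X_c, y_c))
    return kmeans, best_models


def predict(kmeans, best_models, X_new):
    """Send each row to its cluster's model and return one label per row."""
    clusters = kmeans.predict(X_new)
    preds = np.empty(len(X_new), dtype=int)
    for cluster_id, (_, model) in best_models.items():
        mask = clusters == cluster_id
        if mask.any():
            preds[mask] = model.predict(X_new[mask])
    return preds


def save_models(kmeans, best_models, directory):
    """Pickle the K-Means model and each cluster's model (file names are hypothetical)."""
    os.makedirs(directory, exist_ok=True)
    with open(os.path.join(directory, "kmeans.pkl"), "wb") as f:
        pickle.dump(kmeans, f)
    for cluster_id, (name, model) in best_models.items():
        with open(os.path.join(directory, f"cluster{cluster_id}_{name}.pkl"), "wb") as f:
            pickle.dump(model, f)
```

At prediction time the same K-Means model assigns each new row to a cluster and that cluster's model scores it, which is what `predict` does. A real pipeline would also need to handle clusters that are too small, or contain only one class, for cross-validation.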
The application runs locally and provides a simple web form to upload CSV files for prediction.
Below is a step-by-step summary of how this project was set up and run (as executed on January 3, 2026):
- Navigated to the project directory:
    cd Desktop
    cd fraud
- Created a dedicated Conda environment:
  - Accepted the Anaconda Terms of Service for the required channels.
  - Created a new environment with Python 3.8 and activated it:
        conda create -n fraud-env python=3.8 -y
        conda activate fraud-env
- Installed the dependencies listed in requirements.txt (Flask, scikit-learn, XGBoost, pandas, numpy, Flask-MonitoringDashboard, etc.):
    pip install -r requirements.txt
- Initialized a Git repository and made the first commit:
    git init
    git add .
    git commit -m "Initial commit - Fraud detection Flask ML project"
    git branch -M main
- Pushed the code to GitHub after adding the remote origin:
    git remote add origin https://github.com/sujith52/fraud.git
    git push -u origin main
- Ran the Flask application:
    python main.py
  - The server started on http://127.0.0.1:5001, and the scheduler and monitoring dashboard were initialized.
  - The app served the main page and handled prediction requests through the /predict endpoint.
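To confirm the server is answering without opening a browser, a small standard-library check such as the following can be used. It is a hypothetical helper, not part of the repository, and only assumes the app listens on http://127.0.0.1:5001 as described in the run step.

```python
import urllib.error
import urllib.request


def server_is_up(url="http://127.0.0.1:5001/", timeout=3.0):
    """Return True if a GET to `url` answers with HTTP 200, otherwise False."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Covers refused connections, timeouts, and HTTP error statuses (HTTPError).
        return False


if __name__ == "__main__":
    print("server up:", server_is_up())
```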
Prerequisites:
- Miniconda/Anaconda installed
- Git installed
Setup and usage:
- Clone the repository:
    git clone https://github.com/sujith52/fraud.git
    cd fraud
- Create and activate the Conda environment:
    conda create -n fraud-env python=3.8 -y
    conda activate fraud-env
- Install the dependencies:
    pip install -r requirements.txt
- Run the application:
    python main.py
- Open a browser and go to http://127.0.0.1:5001.
- Use the web form to upload a CSV file in the expected format for fraud prediction.
- Access performance monitoring at http://127.0.0.1:5001/dashboard, which uses Flask-MonitoringDashboard to track API usage and performance.
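Since requirements.txt pins specific versions, it can help to confirm that the active environment actually matches those pins after `pip install -r requirements.txt`. The helper below is a hypothetical standard-library sketch (it needs Python 3.8+ for `importlib.metadata`); it only understands simple `name==version` lines and compares version strings exactly.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_pins(requirements_text):
    """Return {package: pinned_version} for every simple `name==version` line."""
    pins = {}
    for line in requirements_text.splitlines():
        # Drop comments and environment markers, then surrounding whitespace.
        line = line.split("#", 1)[0].split(";", 1)[0].strip()
        if "==" in line:
            name, pinned = line.split("==", 1)
            pins[name.strip()] = pinned.strip()
    return pins


def find_mismatches(pins, get_version=version):
    """Return {package: (pinned, installed_or_None)} for every unmet pin."""
    mismatches = {}
    for name, pinned in pins.items():
        try:
            installed = get_version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches
```

For example, `find_mismatches(parse_pins(open("requirements.txt").read()))` returns an empty dict when every pinned package is installed at its pinned version.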
Key directories and files:
- data/ - Contains the training dataset (insuranceFraud.csv)
- models/ - Stores trained model files
- main.py - Entry point for the Flask app
- predictFromModel.py - Prediction logic
- trainingModel.py - Model training pipeline
- templates/index.html - Web interface
- requirements.txt - All Python dependencies (pinned to compatible versions)
- schema_training.json / schema_prediction.json - Data schema validation rules
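The schema files drive raw data validation. Their exact structure is not documented here, so the following sketch assumes a hypothetical schema of the form {"columns": {"name": "Integer" or "varchar", ...}} and checks the header, the number of fields per row, and integer columns. The function name, the schema keys, and the treatment of blanks and "?" as missing values are all assumptions.

```python
import csv
import io

MISSING = {"", "?"}  # assumption: how missing values are marked in the CSV


def validate_csv(csv_text, schema):
    """Return a list of problems found when checking `csv_text` against `schema`."""
    problems = []
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])
    expected = schema["columns"]  # hypothetical key: {column_name: type_name}
    if header != list(expected):
        problems.append(f"header mismatch: expected {list(expected)}, got {header}")
        return problems
    for line_no, row in enumerate(reader, start=2):
        if len(row) != len(header):
            problems.append(f"line {line_no}: expected {len(header)} fields, got {len(row)}")
            continue
        for name, value in zip(header, row):
            # Only "Integer" columns are type-checked; missing values are allowed.
            if expected[name] == "Integer" and value.strip() not in MISSING:
                try:
                    int(value)
                except ValueError:
                    problems.append(f"line {line_no}: {name}={value!r} is not an integer")
    return problems
```

In practice the schema would be loaded with `json.load` from schema_training.json or schema_prediction.json, after adapting the key names to whatever those files actually contain.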
The model was trained on insurance claims data, with a target column indicating whether a claim is fraudulent (fraud_reported or similar).
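Handling missing values and preparing the fraud target can be illustrated with a minimal standard-library sketch. Everything here is an assumption rather than the repository's code: that missing values appear as blanks or "?", that they are filled with each column's most frequent value, and that the target column is fraud_reported with "Y" marking fraud.

```python
from collections import Counter

MISSING = {"", "?"}  # assumption: how missing values are marked
TARGET = "fraud_reported"  # assumption: the README says "fraud_reported or similar"


def impute_with_mode(rows, columns):
    """Replace missing values in `columns` with each column's most frequent value (in place)."""
    for col in columns:
        observed = [r[col] for r in rows if r[col] not in MISSING]
        if not observed:
            continue  # nothing to impute from; leave the column unchanged
        mode = Counter(observed).most_common(1)[0][0]
        for r in rows:
            if r[col] in MISSING:
                r[col] = mode
    return rows


def encode_target(rows, positive="Y"):
    """Map the target column to 1 (fraud) or 0 (not fraud), in place."""
    for r in rows:
        r[TARGET] = 1 if r[TARGET] == positive else 0
    return rows
```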
Tech stack:
- Python 3.8
- Flask (Web Framework)
- scikit-learn, XGBoost (ML Models)
- Pandas, NumPy (Data Processing)
- K-Means Clustering (Data Grouping)
- Flask-MonitoringDashboard (App Monitoring)
- Git & GitHub (Version Control)
Notes:
- This project uses older library versions (e.g., Flask 1.1.1, scikit-learn 0.22.1) for compatibility.
- For production, use Gunicorn + Nginx instead of Flask's development server.
- The Procfile suggests the project is ready for deployment on platforms such as Heroku.
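For the Gunicorn setup mentioned in the notes, Gunicorn can read its settings from a Python file. The values below are a hypothetical starting point, not configuration shipped with this repository; `bind`, `workers`, and `timeout` are standard Gunicorn settings, and the 120-second timeout is an assumed value.

```python
# gunicorn.conf.py: hypothetical Gunicorn settings file (not part of this repository).
# Gunicorn executes this file as Python; each module-level name is a setting.
import multiprocessing
import os

# Listen on the PORT environment variable if set (as on Heroku),
# otherwise on port 5001, the port used by the development server.
bind = "0.0.0.0:" + os.environ.get("PORT", "5001")

# Gunicorn's documentation suggests (2 x CPU cores) + 1 workers as a starting point.
workers = multiprocessing.cpu_count() * 2 + 1

# Allow slow batch predictions more time before a worker is killed and restarted.
timeout = 120
```

It could be started with `gunicorn -c gunicorn.conf.py main:app`, assuming main.py exposes its Flask instance as `app`. With Nginx in front, Gunicorn would more typically bind to 127.0.0.1 and let Nginx proxy requests to it.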
Author: @sujith52
Feel free to fork, improve, or raise issues!