The project consists of implementing an autoencoder-based fraud detector

konkinit/fraud_detection

Fraud Detection

The project consists of implementing an autoencoder-based fraud detector on customers' data

To-Do

  • Cache S3 access keys in the Docker image

Description

Fraud is a phenomenon that every business faces sooner or later. It arises when a customer shows an irregular pattern of events (transactions, visits, ...) with a business. Customers therefore fall into two groups: atypical customers, or fraudsters, and typical customers. Fraud is also a rare event: in a sample of 1000 customers, up to 5 show fraudulent behaviour. A customer base therefore contains a large number of typical customers, which makes it realistic to train a model that learns regular behaviour and reconstructs a typical customer profile. Autoencoders are well suited to this task.

Model

The fraud detection model is an autoencoder trained on a group of customers labelled as typical on the basis of their closed relationship with the business.
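The idea can be sketched with a deliberately tiny example. Everything below is an assumption made for illustration and not the repository's code: a linear autoencoder with a single code dimension and tied weights, synthetic two-feature "typical" customers, and a cutoff set at the 99th percentile of the reconstruction errors of those typical customers. The actual model takes its dimensions from the `--codedim` and `--hiddendim` arguments and may differ in every other respect.

```python
import random

def encode(w, x):
    """Compress a profile x into a single code value (code dimension 1)."""
    return sum(wi * xi for wi, xi in zip(w, x))

def decode(w, z):
    """Rebuild a profile from its code, reusing the encoder weights (tied weights)."""
    return [wi * z for wi in w]

def reconstruction_error(w, x):
    """Mean squared difference between a profile and its reconstruction."""
    x_hat = decode(w, encode(w, x))
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)

def train(data, lr=0.01, epochs=50, seed=0):
    """Fit the weights by stochastic gradient descent on the squared error."""
    rng = random.Random(seed)
    dim = len(data[0])
    w = [rng.uniform(-0.5, 0.5) for _ in range(dim)]
    for _ in range(epochs):
        for x in data:
            z = encode(w, x)
            # Residual between the reconstruction and the input.
            r = [xh - xi for xh, xi in zip(decode(w, z), x)]
            rw = sum(ri * wi for ri, wi in zip(r, w))
            # Gradient of sum_i (w_i * z - x_i)^2 with z = sum_j w_j * x_j.
            grad = [2 * (r[k] * z + x[k] * rw) for k in range(dim)]
            w = [wk - lr * gk for wk, gk in zip(w, grad)]
    return w

# Synthetic "typical" customers (hypothetical data): two features that
# move together, plus a little noise.
data_rng = random.Random(1)
typical = []
for _ in range(200):
    t = data_rng.uniform(-1.0, 1.0)
    typical.append([t + data_rng.gauss(0, 0.05), t + data_rng.gauss(0, 0.05)])

weights = train(typical)

# Cutoff: 99th percentile of the errors observed on typical customers
# (an assumed rule, chosen only for this sketch).
errors = sorted(reconstruction_error(weights, x) for x in typical)
cutoff = errors[int(0.99 * (len(errors) - 1))]

def is_atypical(x):
    """Flag a profile that the autoencoder reconstructs poorly."""
    return reconstruction_error(weights, x) > cutoff
```

A profile whose two features move in opposite directions, such as `[1.0, -1.0]`, does not fit the pattern learned from typical customers, so its reconstruction error lands far above the cutoff and `is_atypical` flags it.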

Experiments

There are two ways to try the project.

  1. Through repo cloning: this route lets you train a model or update its weights
  • Clone the repo and move into the fraud_detection/ directory
git clone https://github.com/konkinit/fraud_detection.git

cd ./fraud_detection

pip install -r requirements.txt --upgrade
  • Create a .env file
S3_KEY=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
S3_SECRET=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
S3_REGION=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
  • To train a new model, run the training command with arguments tuned to your data (the first command below lists the available options)
python training.py --help
python training.py --mode 'train' --idmodel 'simulated_data' --trainingdatapath "data/fraudulent_obs_data.gzip" --cutoffevaldatapath "data/non_fraudulent_obs_data.gzip" --splitfrac 0.6 0.2 0.2 --codedim 5 --hiddendim 15 --lr 1e-3 --nepochs 50
  • To update the weights of an existing model (make sure the dimensions passed through the args match the dimensions of the current model):
python training.py --mode 'retrain' --idmodel 'simulated_data' --trainingdatapath 'data/fraudulent_obs_data.gzip' --splitfrac 0.6 0.2 0.2 --codedim 5 --hiddendim 15 --lr 1e-3 --nepochs 50
  • After training or retraining a model, start the inference service by running:
uvicorn production:app --port 8800 --reload

The endpoint has the form /customer_id/{customer}?model={model_id}, where {customer} is the identifier of a customer and {model_id} identifies the deployed fraud detector model.
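A minimal sketch of calling this endpoint from Python with the standard library is given here. The helper names `build_url` and `fetch_prediction` are hypothetical, and so are the assumptions that the service answers GET requests on localhost:8800 and returns a JSON body; none of this is confirmed by the repository.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

def build_url(customer, model_id, host="localhost", port=8800):
    """Build the /customer_id/{customer}?model={model_id} URL."""
    path = "/customer_id/" + quote(str(customer), safe="")
    return f"http://{host}:{port}{path}?{urlencode({'model': model_id})}"

def fetch_prediction(customer, model_id, **kwargs):
    """Query the running service (assumes a GET endpoint returning JSON)."""
    with urlopen(build_url(customer, model_id, **kwargs), timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `build_url(42, "simulated_data")` gives `http://localhost:8800/customer_id/42?model=simulated_data`. `fetch_prediction` only works while the uvicorn server is running.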

  2. Through the Docker image, by running the following commands
docker pull kidrissa/fraud_detector_app:latest

docker run -p 8800:8800 kidrissa/fraud_detector_app:latest

In a web browser, connect to <container-ip>:8800 (with the port published as above, localhost:8800 on the Docker host also works).
