Fraud Detection

This project implements an autoencoder-based fraud detector on customer data.

To-Do

Cache S3 access keys in the Docker image

Description

Fraud is a phenomenon that businesses inevitably face. It arises when a customer shows an irregular pattern of events (transactions, visits, ...) with a business. Customers then fall into two groups: atypical customers, or fraudsters, and typical customers. Fraud is a rare event: in a sample of 1000 customers, up to 5 exhibit fraudulent behaviour. Gathering a large number of typical customers from a customer base is therefore realistic, and it becomes possible to train a model that learns regular behaviour and reconstructs a typical customer profile. Autoencoders are well suited to this task.

Model

The fraud detection model is an autoencoder trained on a group of customers labelled as typical on the basis of their closed relationship.
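One common way to turn an autoencoder into a detector is to score each customer by reconstruction error and flag scores above a cutoff chosen on typical customers. The sketch below illustrates that idea only: the function names, the mean-squared-error score, and the percentile cutoff are assumptions, not the repository's code.

```python
from statistics import quantiles


def reconstruction_error(x, x_hat):
    """Mean squared error between a profile and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def fit_cutoff(errors, pct=99):
    """Cutoff = pct-th percentile (1..99) of errors measured on typical customers."""
    # quantiles(..., n=100) returns 99 cut points; index pct - 1 is the pct-th one.
    return quantiles(errors, n=100, method="inclusive")[pct - 1]


def is_atypical(x, x_hat, cutoff):
    """Flag a customer whose profile is reconstructed worse than the cutoff."""
    return reconstruction_error(x, x_hat) > cutoff
```

Here `x_hat` stands for whatever the trained autoencoder outputs for the input `x`; a typical customer should be reconstructed closely, so its error stays below the cutoff.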

Experiments

There are two ways to try the project.

By cloning the repository: this route allows training a model or updating its weights.

Clone the repo, move into the fraud_detection/ directory, and install the dependencies:

git clone https://github.com/konkinit/fraud_detection.git

cd ./fraud_detection

pip install -r requirements.txt --upgrade

Create a .env file with the following variables:

S3_KEY=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
S3_SECRET=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
S3_REGION=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
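Before launching a training run, it can help to confirm that the .env file defines the three variables listed above. The following is a minimal sketch using only the standard library; `parse_env` and `missing_keys` are hypothetical helpers, and how the project itself loads the file is not shown in this README.

```python
REQUIRED_KEYS = ("S3_KEY", "S3_SECRET", "S3_REGION")


def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Drop surrounding whitespace and optional quotes around the value.
        env[key.strip()] = value.strip().strip("'\"")
    return env


def missing_keys(env):
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not env.get(key)]
```

Reading the file and calling `missing_keys(parse_env(content))` returns an empty list when all three credentials are present.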

To train a new model, run the training command with your tuned arguments (the --help call lists the available options):

python training.py --help

python training.py --mode 'train' --idmodel 'simulated_data' --trainingdatapath "data/fraudulent_obs_data.gzip" --cutoffevaldatapath "data/non_fraudulent_obs_data.gzip" --splitfrac 0.6 0.2 0.2 --codedim 5 --hiddendim 15 --lr 1e-3 --nepochs 50
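The three values passed to --splitfrac look like train, validation, and test fractions. That reading is an assumption, as is everything in the sketch below, which shows one way such fractions could be turned into disjoint index sets; `split_indices` is a hypothetical helper, not a function from the repository.

```python
import math
import random


def split_indices(n, fracs, seed=0):
    """Shuffle range(n) and cut it into train/validation/test index lists."""
    if len(fracs) != 3 or not math.isclose(sum(fracs), 1.0):
        raise ValueError("expected three fractions summing to 1")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # seeded for reproducible splits
    n_train = round(n * fracs[0])
    n_val = round(n * fracs[1])
    # The test set takes whatever remains, so no row is lost to rounding.
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]
```

With 100 rows and fractions 0.6 0.2 0.2 this yields 60, 20, and 20 indices.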

To update the weights of an existing model (make sure the dimensions passed as arguments match those of the current model):

python training.py --mode 'retrain' --idmodel 'simulated_data' --trainingdatapath 'data/fraudulent_obs_data.gzip' --splitfrac 0.6 0.2 0.2 --codedim 5 --hiddendim 15 --lr 1e-3 --nepochs 50
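For readers who want to see how the flags in the two commands above fit together, here is a sketch of an argparse declaration that would accept them. The flag names come from the commands; the types, defaults, and which flags are required are assumptions, and the actual training.py may be written differently.

```python
import argparse


def build_parser():
    """Illustrative parser mirroring the flags used in the commands above."""
    parser = argparse.ArgumentParser(description="Train or retrain a fraud detector")
    parser.add_argument("--mode", choices=["train", "retrain"], required=True)
    parser.add_argument("--idmodel", required=True, help="model identifier")
    parser.add_argument("--trainingdatapath", required=True)
    # Only the 'train' example passes this flag, so it is optional here.
    parser.add_argument("--cutoffevaldatapath", default=None)
    parser.add_argument("--splitfrac", type=float, nargs=3, default=[0.6, 0.2, 0.2])
    parser.add_argument("--codedim", type=int, default=5)
    parser.add_argument("--hiddendim", type=int, default=15)
    parser.add_argument("--lr", type=float, default=1e-3)
    parser.add_argument("--nepochs", type=int, default=50)
    return parser
```

Calling `build_parser().parse_args()` inside a script would then expose the values as attributes such as `args.codedim` and `args.splitfrac`.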

After training or retraining a model, serve it for inference by running:

uvicorn production:app --port 8800 --reload

The endpoint has the form /customer_id/{customer}?model={model_id}, where {customer} is a customer identifier and {model_id} identifies the deployed fraud detector model.
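A small client can build that URL and query the running server. The sketch below assumes the app is reachable at a base URL such as http://localhost:8800 (matching the uvicorn port above) and that the route answers GET requests; the response format is not documented here, so the raw body is returned as text. Both function names are hypothetical.

```python
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def build_url(base_url, customer, model_id):
    """Build <base>/customer_id/{customer}?model={model_id} with safe escaping."""
    path = "/customer_id/" + quote(str(customer), safe="")
    return base_url.rstrip("/") + path + "?" + urlencode({"model": model_id})


def fetch_prediction(base_url, customer, model_id, timeout=10):
    """Send a GET request to the endpoint and return the response body as text."""
    with urlopen(build_url(base_url, customer, model_id), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

For example, `fetch_prediction("http://localhost:8800", "12345", "simulated_data")` would query the model trained in the commands above for a customer with the illustrative identifier 12345.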

Through the Docker image, by running the following commands:

docker pull kidrissa/fraud_detector_app:latest

docker run -p 8800:8800 kidrissa/fraud_detector_app:latest

In a web browser, go to <container-ip>:8800
