This project implements an autoencoder-based fraud detector on customer data.
- Cache s3 access keys in docker image
Fraud is a problem that virtually every business faces. It occurs when a customer shows an irregular pattern of events (transactions, visits, ...) with a business. Customers therefore fall into two groups: atypical customers, or fraudsters, and typical customers. Fraud is a rare event: in a sample of 1000 customers, up to 5 show fraudulent behaviour. Gathering a large number of typical customers from a customer base is therefore realistic, which makes it possible to train a model that learns regular behaviour and reconstructs a typical customer profile. Autoencoders are well suited to this task.
The fraud detection model is an autoencoder trained on a group of customers labelled as typical on the basis of their closed relationships.
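To make the idea concrete, here is a minimal sketch of how reconstruction errors might be turned into a fraud flag. It assumes the autoencoder has already produced a reconstruction for each customer profile. The function names, the quantile-based cutoff and the toy numbers are all illustrative assumptions, not the project's actual code.

```python
from math import ceil


def reconstruction_error(original, reconstructed):
    """Mean squared error between a customer profile and its reconstruction."""
    return sum((x - y) ** 2 for x, y in zip(original, reconstructed)) / len(original)


def cutoff_from_typical(errors, quantile=0.99):
    """Choose the cutoff as an empirical quantile of errors on typical customers."""
    ranked = sorted(errors)
    # Nearest-rank quantile: the smallest error with at least
    # `quantile` of the sample at or below it.
    index = max(0, ceil(quantile * len(ranked)) - 1)
    return ranked[index]


def is_atypical(error, cutoff):
    """Flag a customer whose profile is reconstructed poorly."""
    return error > cutoff


# Hypothetical reconstruction errors on held-out typical customers.
typical_errors = [0.01, 0.02, 0.015, 0.03, 0.025]
cutoff = cutoff_from_typical(typical_errors, quantile=0.8)

# Hypothetical customer profile and its reconstruction.
error = reconstruction_error([1.0, 0.0, 2.0], [0.2, 0.9, 1.1])
print(is_atypical(error, cutoff))
```

The reasoning is that a model trained only on typical customers should reconstruct typical profiles well and atypical ones badly, so a large error is a signal worth investigating. The quantile is a tuning choice that trades missed fraud against false alarms.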
There are two ways to get started with the project.
- By cloning the repo: this route lets you train a model or update its weights
- Clone the repo, move into the fraud_detection/ directory and install the dependencies
git clone https://github.com/konkinit/fraud_detection.git
cd ./fraud_detection
pip install -r requirements.txt --upgrade
- Create a .env file
S3_KEY=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
S3_SECRET=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
S3_REGION=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
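Before training, it can help to check that the .env file is complete. The sketch below parses KEY=VALUE lines using only the standard library and reports which of the three keys above are missing. How the project itself reads this file is not stated here (packages such as python-dotenv are a common choice), so load_env_file and missing_keys are hypothetical helpers and the demo values are placeholders.

```python
import os
import tempfile


def load_env_file(path):
    """Parse KEY=VALUE lines from a .env file into a dict."""
    values = {}
    with open(path, encoding="utf-8") as handle:
        for raw_line in handle:
            line = raw_line.strip()
            # Skip blank lines, comments and lines without an '=' sign.
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # Drop surrounding whitespace and optional quotes.
            values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_keys(values, required=("S3_KEY", "S3_SECRET", "S3_REGION")):
    """Return the required keys that are absent or empty."""
    return [key for key in required if not values.get(key)]


# Demo on a throwaway file with placeholder values (S3_REGION left out on purpose).
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, ".env")
    with open(demo_path, "w", encoding="utf-8") as handle:
        handle.write("S3_KEY=abc\nS3_SECRET=def\n")
    demo_values = load_env_file(demo_path)

print(missing_keys(demo_values))
```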
- To train a new model, run the following command with your tuned arguments (the first command lists the available options)
python training.py --help
python training.py --mode 'train' --idmodel 'simulated_data' --trainingdatapath "data/fraudulent_obs_data.gzip" --cutoffevaldatapath "data/non_fraudulent_obs_data.gzip" --splitfrac 0.6 0.2 0.2 --codedim 5 --hiddendim 15 --lr 1e-3 --nepochs 50
- To update the weights of an existing model (make sure the dimensions passed as arguments match the current model's dimensions):
python training.py --mode 'retrain' --idmodel 'simulated_data' --trainingdatapath 'data/fraudulent_obs_data.gzip' --splitfrac 0.6 0.2 0.2 --codedim 5 --hiddendim 15 --lr 1e-3 --nepochs 50
- After training or retraining a model, run inference on individual customers by starting the API server:
uvicorn production:app --port 8800 --reload
The endpoint has the form /customer_id/{customer}?model={model_id}, where {customer} is a customer identifier and {model_id} is the deployed fraud detector model.
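As an illustration of the endpoint pattern above, the following sketch builds the request URL and, optionally, calls the running service. It assumes the server is reachable at http://localhost:8800, that the endpoint answers GET requests and that it returns JSON. None of this is confirmed here, and the customer identifier "12345" is a made-up example.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def build_url(customer, model_id, base_url="http://localhost:8800"):
    """Build /customer_id/{customer}?model={model_id} with safe escaping."""
    path = f"/customer_id/{quote(str(customer), safe='')}"
    return f"{base_url}{path}?{urlencode({'model': model_id})}"


def score_customer(customer, model_id, base_url="http://localhost:8800"):
    """Query the service; assumes a GET endpoint that returns JSON."""
    with urlopen(build_url(customer, model_id, base_url), timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


# Only builds the URL, so this runs without the server being up.
print(build_url("12345", "simulated_data"))
```

With the uvicorn server running, score_customer("12345", "simulated_data") would send the request. The shape of the response depends on the application and is not assumed here.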
- Through the Docker image, by running the following commands
docker pull kidrissa/fraud_detector_app:latest
docker run -p 8800:8800 kidrissa/fraud_detector_app:latest
In a web browser, connect to <container-ip>:8800