
Fraud Case Study


Our team of three was given two days to build a model and a user interface to detect possible fraud. The system pulls JSON records from an API, classifies them, and stores the results in MongoDB. Clients can then review the activity through a web user interface.
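The pull, classify, and store pipeline can be sketched as a polling loop. This is a minimal illustration, not the project's code: the `fetch`, `score`, and `store` callables and the `fraud_probability` field name are hypothetical. The one-second default interval follows the README's note that the server hits the endpoint every second.

```python
import time
from typing import Callable, Optional


def poll_and_store(
    fetch: Callable[[], Optional[dict]],
    score: Callable[[dict], float],
    store: Callable[[dict], None],
    interval: float = 1.0,
    max_iterations: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Repeatedly fetch one JSON record, score it, and store it with its score.

    Returns the number of records stored. max_iterations=None loops forever.
    """
    stored = 0
    iteration = 0
    while max_iterations is None or iteration < max_iterations:
        iteration += 1
        record = fetch()  # parsed JSON from the API, or None if nothing arrived
        if record is not None:
            doc = dict(record)  # copy so the caller's record is not mutated
            doc["fraud_probability"] = score(record)  # hypothetical field name
            store(doc)
            stored += 1
        sleep(interval)  # wait before hitting the endpoint again
    return stored
```

In a deployment, `fetch` might wrap `requests.get(url, timeout=5).json()` for the API URL (not recorded here), `score` might call the pickled model, and `store` could be a pymongo `Collection.insert_one`. Injecting `sleep` keeps the loop testable without real delays.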

Files:
- Model build script: builds and pickles the tested model for use. It also contains the FraudModel class, which encapsulates the API functionality and the data cleaning/preprocessing.
- Data cleaning class: performs the required preprocessing of the data before it is modeled, and also cleans any new data before the model predicts the probability of fraud.
- model.pkl: the stored model used to analyze new data.
- Database helper: connects to the database and sends/receives data.
- Server launcher: launches the web server, then runs a function that hits the endpoint every second, continuously updating the database.
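A class like FraudModel, which bundles the cleaning step with the estimator so a single pickled object can both clean and score new records, might look roughly like this. It is a hedged sketch: the field names are hypothetical stand-ins for the 12 fields, and `PriorClassifier` is a stdlib-only placeholder that predicts training-set class frequencies instead of the RandomForest the project actually used.

```python
import pickle
from typing import List

FIELDS = ["body_length", "user_age"]  # hypothetical subset of the 12 model fields


class PriorClassifier:
    """Placeholder estimator: predicts the class frequencies seen in training."""

    def fit(self, X: List[List[float]], y: List[int]) -> "PriorClassifier":
        self.classes_ = sorted(set(y))
        self.priors_ = [y.count(c) / len(y) for c in self.classes_]
        return self

    def predict_proba(self, X: List[List[float]]) -> List[List[float]]:
        return [list(self.priors_) for _ in X]


class FraudModel:
    """Cleaning + estimator in one object, so model.pkl carries both."""

    def __init__(self, estimator, fields: List[str]):
        self.estimator = estimator
        self.fields = fields

    def clean(self, record: dict) -> List[float]:
        # Keep only the model's fields; treat missing or null values as 0.0.
        return [float(record.get(f) or 0.0) for f in self.fields]

    def fit(self, records: List[dict], labels: List[int]) -> "FraudModel":
        self.estimator.fit([self.clean(r) for r in records], labels)
        return self

    def predict_proba(self, record: dict) -> List[float]:
        # One probability per class (0, 1, 2) for a single raw record.
        return self.estimator.predict_proba([self.clean(record)])[0]

    def save(self, path: str) -> None:
        with open(path, "wb") as fh:
            pickle.dump(self, fh)

    @staticmethod
    def load(path: str) -> "FraudModel":
        with open(path, "rb") as fh:
            return pickle.load(fh)
```

Pickling the whole object means the class definitions must be importable wherever the model is loaded, and a pickle should only be loaded from a trusted source, since unpickling can execute arbitrary code.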


We trained a RandomForest classifier on 12 fields. The classifier has three possible outcomes, with labels derived from the 'acct_type' field as follows:

- 0 = Not Fraud: 'premium'
- 1 = Maybe Fraud: 'spammer_warn', 'spammer_limited', 'spammer_noinvite', 'locked', 'tos_lock', 'tos_warn', 'fraudster_att', 'spammer_web', 'spammer'
- 2 = Fraud: 'fraudster_event', 'fraudster'
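The mapping above can be written as a lookup table applied to each record's 'acct_type' before training. This is a sketch; raising an error on an unseen account type is an assumed design choice (the README does not say how unknown values were handled), chosen so new categories are noticed rather than silently mislabeled.

```python
NOT_FRAUD, MAYBE_FRAUD, FRAUD = 0, 1, 2

MAYBE_FRAUD_TYPES = (
    "spammer_warn", "spammer_limited", "spammer_noinvite", "locked",
    "tos_lock", "tos_warn", "fraudster_att", "spammer_web", "spammer",
)

ACCT_TYPE_TO_LABEL = {
    "premium": NOT_FRAUD,
    **{t: MAYBE_FRAUD for t in MAYBE_FRAUD_TYPES},
    "fraudster_event": FRAUD,
    "fraudster": FRAUD,
}


def label_from_acct_type(acct_type: str) -> int:
    """Translate an 'acct_type' value into the 0/1/2 training label."""
    try:
        return ACCT_TYPE_TO_LABEL[acct_type]
    except KeyError:
        # Assumed policy: fail loudly on categories the model was not trained on.
        raise ValueError(f"unexpected acct_type: {acct_type!r}") from None
```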


The current model has a cross-validation score of 92%.
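The README does not state how many folds or which metric produced the 92% figure. As an illustration only, the sketch below computes mean k-fold accuracy in plain Python; `fit` is a hypothetical callable that trains on one fold and returns a prediction function. The folds are contiguous and unshuffled, so for real data you would shuffle first or use stratified folds (for example scikit-learn's `StratifiedKFold` with `cross_val_score`), which matters when fraud labels are imbalanced.

```python
from typing import Callable, List, Sequence, Tuple


def k_fold_indices(n: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Split indices 0..n-1 into k contiguous folds, returning (train, test) pairs."""
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= n")
    # The first n % k folds get one extra sample so every index is tested once.
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds = []
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        folds.append((train, test))
        start += size
    return folds


def cross_val_accuracy(
    X: Sequence, y: Sequence, fit: Callable[[list, list], Callable], k: int = 5
) -> float:
    """Mean accuracy over k folds; fit(X_train, y_train) returns predict(row)."""
    scores = []
    for train, test in k_fold_indices(len(y), k):
        predict = fit([X[i] for i in train], [y[i] for i in train])
        correct = sum(1 for i in test if predict(X[i]) == y[i])
        scores.append(correct / len(test))
    return sum(scores) / len(scores)
```

Each sample is scored only by a model that never saw it during training, which is what distinguishes a cross-validation score from training accuracy.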


The following steps should still be taken:
1) Modify the model.
   a) Analyze the fields further to determine their importance.
   b) Add TF-IDF classification of the description field.
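For step 1b, TF-IDF turns each description into a weighted bag of words, where terms that are frequent in one description but rare across all descriptions get the highest weights. The stdlib-only sketch below uses the same smoothed IDF and L2 normalization as scikit-learn's `TfidfVectorizer` defaults, but a simplified tokenizer; in practice the library class would be the natural choice, and how the vectors are combined with the existing 12 fields is left open here.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    # Simplified tokenizer: lowercase, keep runs of letters, digits, apostrophes.
    return re.findall(r"[a-z0-9']+", text.lower())


def tfidf(docs: List[str]) -> List[Dict[str, float]]:
    """Return one sparse, L2-normalized TF-IDF vector (term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed IDF: ln((1 + n) / (1 + df)) + 1, as in scikit-learn's default.
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    vectors = []
    for toks in tokenized:
        vec = {t: c * idf[t] for t, c in Counter(toks).items()}  # raw count * idf
        norm = math.sqrt(sum(v * v for v in vec.values()))
        vectors.append({t: v / norm for t, v in vec.items()} if norm else {})
    return vectors
```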