A Comparison Between Models and Sampling Methods for Imbalanced Fraudulent Credit Card Data
Given that credit card fraud in the UK rose 55.7% between 2016 and 2020, the need for better fraud detection solutions is paramount, and this paper aims to address that need. Credit card transaction data is by nature imbalanced and skewed, because instances of fraud are far fewer than instances of non-fraud. To counter this bias, this paper uses resampling methods to rebalance the data, in an attempt to build robust solutions. First, a comparison between Logistic Regression, Random Forest and Sequential DNN models determines which model generalises best on imbalanced data. Synthetic Minority Oversampling Technique (SMOTE), Random Under-Sampling (RUS), class weighting, and a hybrid approach are then applied to the best-performing model, in an effort to maximise Matthews correlation coefficient (MCC), Recall and F1 performance, with an emphasis on recall.
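To make the evaluation criteria concrete, the following is a minimal sketch of how Recall, F1 and MCC can be computed from confusion-matrix counts, together with a simple form of Random Under-Sampling. It is an illustration only, not the implementation used in this paper: the function names, the label convention (fraud = 1 as the minority class, non-fraud = 0) and the 1:1 target ratio for under-sampling are all assumptions made for the example.

```python
import math
import random


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn), assuming fraud is labelled 1 and non-fraud 0."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def recall(tp, fn):
    """Share of actual fraud cases that were detected."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall, written in terms of counts."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when it is undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def random_under_sample(X, y, seed=0):
    """Keep every minority (fraud) row and an equal-sized random subset of
    majority rows. The 1:1 ratio is an assumption made for this sketch."""
    rng = random.Random(seed)
    minority = [i for i, label in enumerate(y) if label == 1]
    majority = [i for i, label in enumerate(y) if label == 0]
    kept = rng.sample(majority, k=min(len(minority), len(majority)))
    idx = sorted(minority + kept)
    return [X[i] for i in idx], [y[i] for i in idx]


# Toy example: 2 fraud cases among 10 transactions.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
tp, tn, fp, fn = confusion_counts(y_true, y_pred)
```

On this toy example the classifier is correct on 8 of 10 transactions (accuracy 0.8), yet it detects only one of the two fraud cases: recall is 0.5, F1 is 0.5 and MCC is 0.375. This gap is why accuracy alone is a poor guide on imbalanced data and why the metrics above are preferred here.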