luisosorio3214/Credit-Card-Fraud-Detection-

Credit Card Fraud Detection

Badge Source

Authors

Table of Contents

Business Problem

Credit card fraud has long been a major concern for banks and other financial institutions, and it results in unnecessary fees and losses. Fraudsters use several methods to carry out unauthorized transactions, and we want to identify the key components that distinguish a fraudulent transaction. The idea is that if we can correctly identify a fraudulent transaction, we can stop it from going through and prevent the money from being taken. To achieve this, we need a policy that detects fraudulent transactions without misclassifying normal transactions as fraudulent, since false alarms also increase costs. In short, we need an efficient system that balances the identification of normal and fraudulent transactions.

Data Source

Methods

  • Exploratory Data Analysis
  • Multivariate Analysis
  • Visualizations
  • Modeling
  • App Deployment

Tech Stack

  • Python (Machine Learning Modeling and App preparation)
  • AWS S3 (Model Storage)
  • Gradio (Interface for app)
  • Hugging Face (App Deployment)

Quick Glance at the Results

Correlation Matrix between numeric features.

Confusion Matrix of XGBoost Classifier (Testing Set).

XGBoost Feature Importance Plot.

Top 3 models on the testing set (with default parameters)

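| Model               | Best Threshold | F1 Score | Accuracy | Recall | Precision |
|---------------------|----------------|----------|----------|--------|-----------|
| Logistic Regression | 0.315789       | 79.48%   | 96.5%    | 76.67% | 82.5%     |
| Random Forest       | 0.315789       | 99.9%    | 99.9%    | 99.9%  | 100%      |
| XGBoost             | 0.105263       | 99.9%    | 99.9%    | 99.9%  | 99.9%     |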

  • Final Model used: XGBoost Classifier
  • Metrics used: Recall, Precision, and Accuracy
  • What is the meaning behind the threshold?: The threshold is the probability above which we classify a transaction as fraudulent. If the probability of fraud returned by our XGBoost Classifier is greater than 10.5%, the transaction is classified as fraudulent. The threshold was chosen by trying a range of candidate probabilities and picking the one that returned the highest F1 score, accuracy, recall, and precision. Remember that we want a balance between correctly identifying fraudulent transactions and not misclassifying normal transactions as fraud, since that misidentification could cost a company a lot of money. In terms of metrics, this means we care most about recall, precision, and accuracy. Recall is the share of actual fraudulent transactions that the model correctly flags. Precision is the number of correctly flagged fraudulent transactions divided by the total number of transactions flagged as fraudulent, including the ones that were not. Accuracy is the share of all transactions that the model classifies correctly.
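The threshold selection described above can be sketched in a few lines. This is a minimal illustration, not the notebook's actual code: the function names, the toy labels and probabilities, and the choice to rank candidates by F1 score are all assumptions. The 20-point evenly spaced grid is also an assumption, although the reported thresholds 0.105263 and 0.315789 do equal 2/19 and 6/19, which would be consistent with such a grid.

```python
def classify(probabilities, threshold):
    """Label a transaction as fraud (1) when its probability exceeds the threshold."""
    return [1 if p > threshold else 0 for p in probabilities]


def compute_metrics(y_true, y_pred):
    """Return recall, precision, accuracy and F1 for binary labels (1 = fraud)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    accuracy = (tp + tn) / len(y_true)
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"recall": recall, "precision": precision,
            "accuracy": accuracy, "f1": f1}


def best_threshold(y_true, probabilities, n_candidates=20):
    """Scan an evenly spaced grid on [0, 1] and keep the threshold with the best F1.

    Assumption: ranking by F1 stands in for "highest metrics" in the README.
    """
    candidates = [i / (n_candidates - 1) for i in range(n_candidates)]
    return max(
        candidates,
        key=lambda t: compute_metrics(y_true, classify(probabilities, t))["f1"],
    )


# Toy data, made up for illustration only (not taken from the project).
y_true = [0, 0, 0, 1, 1, 0, 1, 0]
probabilities = [0.02, 0.05, 0.20, 0.12, 0.90, 0.08, 0.60, 0.01]

chosen = best_threshold(y_true, probabilities)
scores = compute_metrics(y_true, classify(probabilities, chosen))
print(f"threshold={chosen:.6f}", scores)
```

Lowering the threshold raises recall at the cost of precision, which is the trade-off the bullet describes. The same F1 formula also reproduces the Logistic Regression row of the table: with precision 82.5% and recall 76.67%, 2PR/(P+R) comes out at roughly 79.48%.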

Lessons Learned and Recommendation

  • In this project I learned how to leverage feature importance from our Random Forest and Gradient Boosting models to determine what influences our response the most. It's important to note that a feature can have either a negative or a positive influence on the response variable. Since the goal of this project is to increase satisfaction levels, we want to identify not only the most important features but also the ones that have a positive influence. For example, Cleanliness is one of the given features, and logic suggests that as cleanliness decreases, so do satisfaction levels. Conversely, if we increase cleanliness we would expect customers to be more satisfied with their experience, so this feature has a positive influence. A feature has a negative influence if increasing it decreases the satisfaction level, or vice versa.

Limitation and what can be Improved

  • Note: Based on the feature importance plot, the ratio to median purchase price had one of the greatest influences in predicting a fraudulent transaction. The ratio is computed by dividing the purchase price by the median purchase price on the card. We must therefore know the price of the transaction the fraudster performed, which limits the model's ability to stop a transaction before it occurs. We also need a median purchase price for the card, but what if there is no history on the card? What would we use as the median purchase price?
  • To improve our model we would need more attributes in our data. We are only using 7 predictors to determine our target/response, a fraudulent transaction. We could collect more information on the card and its owner to capture more spending habits, or to better estimate the probability that the transaction was in fact made by the owner.
  • Implementing a system that stops a transaction before it goes through whenever it is believed to be fraudulent would require a whole new transactional system across all stores. Currently, when you make a purchase or receive a refund from a store, it may seem instant to you, but in the backend the process takes days. This is a problem because we want to stop the transaction before it goes through, so we would need a much faster transactional system. The U.S. government recently approved such a system, which has been in testing since 2019; I believe it will go live this month, July 2023.
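The ratio feature from the first bullet, and the missing-history problem it raises, can be made concrete with a short sketch. The function name and the decision to return `None` when no ratio can be computed are assumptions for illustration; the README leaves open what value should be used for a card with no history.

```python
from statistics import median
from typing import Optional, Sequence


def ratio_to_median_purchase_price(
    purchase_price: float, past_purchases: Sequence[float]
) -> Optional[float]:
    """Divide the purchase price by the median of the card's past purchases.

    Returns None when the card has no history (or a zero median), because
    the ratio is undefined in that case. How to fill that gap is the open
    question raised in the limitations above.
    """
    if not past_purchases:
        return None
    median_price = median(past_purchases)
    if median_price == 0:
        return None
    return purchase_price / median_price


# A 150.00 purchase on a card whose past purchases have a median of 30.00.
print(ratio_to_median_purchase_price(150.0, [20.0, 50.0, 30.0]))
# A brand-new card has no history, so no ratio can be computed.
print(ratio_to_median_purchase_price(150.0, []))
```

Returning `None` keeps the missing case explicit, so the caller has to choose a fallback deliberately instead of silently scoring a new card with a made-up ratio.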

Run Locally

First, open your command line or terminal and navigate to the directory where you want to save the project.

Initialize git

          
          git init
          
        

Clone the Project

          
          git clone https://github.com/luisosorio3214/Credit-Card-Fraud-Detection-.git
          
        

Head to project directory

          
          cd Credit-Card-Fraud-Detection-
          
        

Create a virtual environment using venv

          
          python -m venv "env_name"
          
        

Activate virtual environment

          For Windows users
          
            env_name\Scripts\activate
          
          For macOS or Linux users
          
            source env_name/bin/activate
          
        

Install required dependencies from requirements.txt file

          
          pip install -r requirements.txt
          
        

Run the Python app

          
          python app.py
          
        

A local URL will be printed in your terminal. Open it in your browser to use the app.

If you are having issues with Gradio, please follow the documentation here.
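For orientation, a Gradio app of this kind is often structured roughly as follows. This is a hedged sketch and not the repository's actual app.py: `label_transaction`, `build_demo`, and `score_fn` are hypothetical names, `score_fn` stands in for the trained model's probability output, and the single input field is a simplification of the 7 predictors mentioned above. Only the 0.105263 threshold comes from the results table.

```python
# Threshold reported for the XGBoost model in the results table.
FRAUD_THRESHOLD = 0.105263


def label_transaction(fraud_probability, threshold=FRAUD_THRESHOLD):
    """Turn a fraud probability into the label shown to the user."""
    return "Fraudulent" if fraud_probability > threshold else "Normal"


def build_demo(score_fn):
    """Wrap a scoring function in a Gradio interface.

    score_fn is a hypothetical callable that maps the input feature to a
    fraud probability; in the real app the trained model plays this role.
    """
    # Imported lazily so label_transaction works without Gradio installed.
    import gradio as gr

    def predict(ratio_to_median_purchase_price):
        return label_transaction(score_fn(ratio_to_median_purchase_price))

    return gr.Interface(
        fn=predict,
        inputs=gr.Number(label="Ratio to median purchase price"),
        outputs="text",
        title="Credit Card Fraud Detection",
    )
```

Calling `build_demo(my_score_fn).launch()` starts a local server and prints its URL in the terminal, which is the link referred to above.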

Explore the Jupyter Notebook

To explore the notebook click here.

Deployment on Gradio

To deploy this project as a Gradio app, we will use Hugging Face Spaces. Create a Hugging Face account; you then have three ways to deploy your Gradio app to Hugging Face Spaces:

  1. From the terminal: run gradio deploy in your app directory. The CLI will gather some basic metadata and then launch your app. To update your Space, you can re-run this command or enable the GitHub Actions option to automatically update the Space on git push.
  2. From your browser: drag and drop a folder containing your Gradio model and all related files here.
  3. Connect Spaces with your Git repository, and Spaces will pull the Gradio app from there. See this guide on how to host on Hugging Face Spaces for more information.

App deployed on Gradio

Video to gif tool

Contribution

Pull requests are welcome! For major changes, please open an issue first to discuss what you would like to change or contribute.

License

MIT License

Copyright (c) 2022 Stern Semasuka

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Learn more about MIT license