# Credit Card Fraud Detection 

This gnn-xgboost-credit-card-fraud-detection folder contains notebooks for data preprocessing, training a GNN-based XGBoost model, and running inference for the credit card fraud detection use case.

Original source - https://github.com/nv-morpheus/morpheus-experimental/tree/branch-24.10/ai-credit-fraud-workflow


## Background

Losses from transaction fraud are expected to exceed [$43B by 2026](https://nilsonreport.com/articles/card-fraud-losses-worldwide/), which poses a significant challenge for financial institutions that must detect and prevent increasingly sophisticated fraudulent activity. Traditionally, financial institutions have relied on rule-based techniques, which are reactive in nature and result in more false positives and lower fraud detection accuracy. As data volumes have grown and attacks have become more sophisticated, accelerated machine learning and graph learning techniques have become necessary and offer a more proactive approach. AI for fraud detection uses multiple machine learning models to detect anomalies in customer behaviors and connections, as well as patterns of accounts and behaviors that fit fraudulent characteristics.

Traditional data science pipelines lack the acceleration needed to handle the data volumes involved in fraud detection. The resulting slow processing limits real-time data analysis and fraud detection. To efficiently manage large-scale datasets and deliver real-time performance for AI in production, financial institutions must shift from legacy infrastructure to accelerated computing.

This Fraud Detection AI workflow offers enterprises an end-to-end solution using the NVIDIA accelerated computing platform for GPU-accelerated data processing and AI deployment, enabling real-time analysis and detection of fraudulent credit card transactions.


## Key Features
- Data Preprocessing – Clean, preprocess, and prepare the datasets required for training the GNN and XGBoost models.
- Train GNN-based XGBoost Model – Train the GNN model to extract transaction node embeddings and use those embeddings to train the XGBoost model for classification.
- Inference for GNN-based XGBoost Model with Triton – Deploy the trained models to the NVIDIA Triton Inference Server for scalable deployment in production.
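
To make the two-stage idea in the feature list concrete, here is a minimal, CPU-only sketch in plain Python. It is not the notebooks' implementation: a single round of mean neighbor aggregation stands in for the trained GNN, and the function names, the toy graph, and the XGBoost hyperparameters are all assumptions chosen for illustration.

```python
from collections import defaultdict


def mean_neighbor_embeddings(features, edges):
    """Stand-in for a GNN layer: concatenate each node's own features
    with the mean of its neighbors' features (undirected graph).

    Assumes every node that appears in `edges` also has an entry in `features`.
    """
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    dim = len(next(iter(features.values())))
    embeddings = {}
    for node, own in features.items():
        nbrs = neighbors.get(node, ())
        if nbrs:
            agg = [sum(features[n][i] for n in nbrs) / len(nbrs) for i in range(dim)]
        else:
            agg = [0.0] * dim  # isolated node: nothing to aggregate
        embeddings[node] = list(own) + agg
    return embeddings


def train_classifier(embeddings, labels):
    """Second stage: fit an XGBoost classifier on the node embeddings.

    Imports are local so the embedding step above runs without xgboost installed.
    The hyperparameters are illustrative, not the workflow's settings.
    """
    import numpy as np
    import xgboost as xgb

    node_ids = sorted(embeddings)
    X = np.array([embeddings[n] for n in node_ids], dtype=np.float32)
    y = np.array([labels[n] for n in node_ids])
    model = xgb.XGBClassifier(n_estimators=50, max_depth=3)
    model.fit(X, y)
    return model


# Toy transaction graph (hypothetical data): t1 is linked to t2 and t3.
toy_features = {"t1": [1.0, 0.0], "t2": [3.0, 2.0], "t3": [5.0, 4.0]}
toy_edges = [("t1", "t2"), ("t1", "t3")]
toy_embeddings = mean_neighbor_embeddings(toy_features, toy_edges)
```

The point of the split is that the graph model only has to produce a fixed-length vector per transaction node; the tree-based classifier then treats those vectors like any other tabular features.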

## Architectural Diagram

### Training Pipeline

![training_pipeline](https://gitlab-master.nvidia.com/sdp/ps-service-packages/-/blob/credit-card-fraud-detection/rag-jumpstart-nim-cybersecurity/images/credit_fraud_training_pipeline.png)

### Inference Pipeline

![inference_pipeline](https://gitlab-master.nvidia.com/sdp/ps-service-packages/-/blob/credit-card-fraud-detection/rag-jumpstart-nim-cybersecurity/images/credit_fraud_inference_pipeline.png)


## Technology Stack

- [cuDF for data processing](https://github.com/rapidsai/cudf)
    - cuDF is designed for data processing on a single GPU. If you want to scale, you can try Dask cuDF for distributed data processing.
- [cuML for ML algorithms](https://github.com/rapidsai/cuml) 
- [cuGraph-pyg for graph analysis (creating graph, neighbor sampling)](https://github.com/rapidsai/cugraph)
    - Sampling with cuGraph-pyg is around 3x to 6x faster than state-of-the-art approaches.
- py-xgboost-gpu 
- [NVIDIA Triton Inference Server](https://github.com/triton-inference-server)
    - Explore more configuration options for inference [here](https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/user_guide/model_configuration.html#default-max-batch-size-and-dynamic-batcher), such as dynamic batching, ModelInstanceGroup to specify the number of model instances on each GPU, and the rate limiter for resource management.
    - You can also create ensemble pipelines for more complex workflows.
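
As a rough illustration of the Triton options mentioned above, the following Python helper renders the dynamic-batching and instance-group portion of a model's `config.pbtxt`. The helper name and all numeric values are assumptions for illustration. The emitted field names (`max_batch_size`, `dynamic_batching`, `preferred_batch_size`, `max_queue_delay_microseconds`, `instance_group`, `count`, `kind`) follow the Triton model configuration documentation. This is only a fragment: a real configuration also needs the model's backend, inputs, and outputs.

```python
def render_batching_config(max_batch_size, preferred_batch_sizes,
                           max_queue_delay_us, instances_per_gpu):
    """Render a config.pbtxt fragment for dynamic batching and instance groups."""
    if any(size > max_batch_size for size in preferred_batch_sizes):
        raise ValueError("preferred batch sizes must not exceed max_batch_size")

    preferred = ", ".join(str(size) for size in preferred_batch_sizes)
    lines = [
        f"max_batch_size: {max_batch_size}",
        "dynamic_batching {",
        f"  preferred_batch_size: [ {preferred} ]",
        f"  max_queue_delay_microseconds: {max_queue_delay_us}",
        "}",
        "instance_group [",
        "  {",
        f"    count: {instances_per_gpu}",
        "    kind: KIND_GPU",
        "  }",
        "]",
    ]
    return "\n".join(lines) + "\n"


# Illustrative values only; tune them against your own latency and throughput targets.
fragment = render_batching_config(
    max_batch_size=1024,
    preferred_batch_sizes=[256, 512],
    max_queue_delay_us=100,
    instances_per_gpu=2,
)
print(fragment)
```

A longer queue delay lets Triton assemble larger batches at the cost of per-request latency, so these two settings are usually tuned together.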


## How to Run the Use Case 
- Clone the repository (gnn-xgboost-credit-card-fraud-detection) and make sure your shell's working directory is gnn-xgboost-credit-fraud-workflow before running the following command:
    - ```mamba env create -f conda/fraud_conda_env.yaml```
- After the installation finishes, activate the conda environment using the following command:
    - ```conda activate fraud_conda_env```
- Go to the notebooks folder, run ```jupyter notebook```, and start executing the notebooks (.ipynb files) in the following order, step by step:
    - Data Preprocessing
    - Train GNN-based XGBoost Model
    - Inference for GNN-based XGBoost Model with Triton
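
If you prefer to run the notebooks non-interactively, the same order can be scripted. The sketch below builds `jupyter nbconvert --to notebook --execute --inplace` commands and runs them one after another, stopping at the first failure. The notebook file names in the usage comment are hypothetical placeholders, so substitute the actual `.ipynb` names from the notebooks folder. It also assumes the conda environment is already active.

```python
import subprocess


def nbconvert_commands(notebook_paths):
    """Build one `jupyter nbconvert` command per notebook, preserving order."""
    return [
        ["jupyter", "nbconvert", "--to", "notebook", "--execute", "--inplace", str(path)]
        for path in notebook_paths
    ]


def run_in_order(notebook_paths):
    """Execute the notebooks sequentially; check=True aborts on the first error."""
    for command in nbconvert_commands(notebook_paths):
        subprocess.run(command, check=True)


# Usage (hypothetical file names, replace with the real ones):
# run_in_order([
#     "notebooks/preprocessing.ipynb",
#     "notebooks/training.ipynb",
#     "notebooks/inference.ipynb",
# ])
```

Running the notebooks interactively remains the better choice for a first pass, since it lets you inspect each step's intermediate output.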


## Upcoming Features
The NIM-based GNN will be released early next year.


## Projects

- Explore the ai-credit-fraud-workflow and other cybersecurity-related use cases in the morpheus-experimental repo [here](https://github.com/nv-morpheus/morpheus-experimental/tree/branch-25.02).
- For more information on scaling data processing, check out [cuDF](https://github.com/rapidsai/cudf) and [Dask cuDF](https://docs.rapids.ai/api/dask-cudf/stable/)
- To learn more about machine learning algorithms and GNNs, look into [cuML](https://github.com/rapidsai/cuml) and [cuGraph-GNN](https://github.com/rapidsai/cugraph-gnn), respectively
- If you're aiming for high-throughput and low-latency inference deployments, explore the [NVIDIA Triton Inference Server](https://github.com/triton-inference-server) and adjust the configuration according to your use case in the inference notebook. 