IBM/Multi-GNN

Multi-GNN

This repository contains the models and adaptations needed to run Multi-GNN for Anti-Money Laundering. It provides four Graph Neural Network model classes (GIN, GAT, PNA, RGCN) and the model adaptations described below, as used for financial crime detection in Egressy et al. Note that the repository focuses solely on the Anti-Money Laundering use case. It was created for the experiments in "Provably Powerful Graph Neural Networks for Directed Multigraphs" [AAAI 2024] and "Realistic Synthetic Financial Transactions for Anti-Money Laundering Models" [NeurIPS 2023].

Setup

To use the repository, you first need to install the conda environment via

conda env create -f env.yml

The data needed for the experiments is available on Kaggle. To use it with the provided training scripts, first run a pre-processing step on the downloaded transaction files (e.g. HI-Small_Trans.csv):

python format_kaggle_files.py /path/to/kaggle-files/HI-Small_Trans.csv

Make sure to change the filepaths in the data_config.json file. The aml_data path should be changed to wherever you stored the formatted_transactions.csv file generated by the pre-processing step.
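
Editing the file by hand is enough. If you would rather script the change, the following is a minimal sketch. The layout of data_config.json is not described here, so the helper searches the parsed JSON for the key at any nesting level; the function names set_config_value and update_config_file are hypothetical and not part of this repository.

```python
import json


def set_config_value(config, key, value):
    """Set `key` to `value` wherever it appears in nested dicts/lists.

    Returns the number of occurrences that were updated.
    """
    updated = 0
    if isinstance(config, dict):
        for k in config:
            if k == key:
                config[k] = value  # overwrite the existing entry in place
                updated += 1
            else:
                updated += set_config_value(config[k], key, value)
    elif isinstance(config, list):
        for item in config:
            updated += set_config_value(item, key, value)
    return updated


def update_config_file(config_path, key, value):
    """Rewrite a JSON config file in place with `key` set to `value`."""
    with open(config_path) as f:
        config = json.load(f)
    if set_config_value(config, key, value) == 0:
        # Assumption: the key already exists in the file; fail loudly if not.
        raise KeyError(f"{key!r} not found in {config_path}")
    with open(config_path, "w") as f:
        json.dump(config, f, indent=2)
```

For example, update_config_file("data_config.json", "aml_data", "/path/to/formatted-data") would point aml_data at the directory that holds formatted_transactions.csv.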

Usage

To run the experiments, run the main.py script and specify the arguments you want to use. There are two required arguments: --data and --model. For the --data argument, store each dataset in its own folder and pass the folder name, e.g. --data Small_HI. The --model argument must be set to one of the available model classes, i.e. --model [gin, gat, rgcn, pna]. For example, to run a standard GNN:

python main.py --data Small_HI --model gin

Then you can add different adaptations to the models by selecting the respective arguments from:

Argument        Description
--emlps         Edge updates via MLPs
--reverse_mp    Reverse message passing
--ego           Ego IDs for the center nodes
--ports         Port numberings for edges

Thus, to run Multi-GIN with edge updates (all four adaptations enabled), you would run the following command:

python main.py --data Small_HI --model gin --emlps --reverse_mp --ego --ports
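
When sweeping over models or adaptation combinations, it can help to assemble these commands programmatically. The sketch below is one way to do that using only the flags documented above; the helper build_command is hypothetical and not part of this repository, and it assumes main.py is invoked from the repository root with python on the PATH.

```python
# Values taken from the usage notes above.
MODELS = ("gin", "gat", "rgcn", "pna")
ADAPTATION_FLAGS = ("emlps", "reverse_mp", "ego", "ports")


def build_command(data, model, adaptations=()):
    """Return the argument list for one main.py run.

    `adaptations` is any iterable of names from ADAPTATION_FLAGS.
    """
    if model not in MODELS:
        raise ValueError(f"unknown model {model!r}; expected one of {MODELS}")
    unknown = set(adaptations) - set(ADAPTATION_FLAGS)
    if unknown:
        raise ValueError(f"unknown adaptations: {sorted(unknown)}")

    cmd = ["python", "main.py", "--data", data, "--model", model]
    # Emit flags in a fixed order so identical runs yield identical commands.
    cmd += [f"--{name}" for name in ADAPTATION_FLAGS if name in adaptations]
    return cmd
```

The returned list can be passed to subprocess.run(cmd, check=True) to launch a run, or joined with spaces for logging.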

Additional functionalities

Several further arguments enable additional functionality:

Argument        Description
--tqdm          Displays a progress bar during training and inference.
--save_model    Saves the best model to the model_to_save path specified in the data_config.json file. Requires the argument --unique_name to be specified.
--finetune      Loads a previously trained model (with the name given by --unique_name, stored in the model_to_load path in data_config.json) to be finetuned.
--inference     Loads a previously trained model (with the name given by --unique_name, stored in the model_to_load path in data_config.json) to do inference only.
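
The dependency between these flags and --unique_name can be checked before a long run starts. The following is a rough sketch of such a check, not the repository's actual argument parser. It assumes the four flags are boolean switches and that --unique_name takes a string value. Treating --finetune and --inference as also needing --unique_name is inferred from their descriptions. The names make_parser and check_args are hypothetical.

```python
import argparse


def make_parser():
    """Build a parser for the flags documented above (illustrative subset)."""
    parser = argparse.ArgumentParser(description="Illustrative flag check")
    for flag in ("--tqdm", "--save_model", "--finetune", "--inference"):
        parser.add_argument(flag, action="store_true")
    # Assumption: --unique_name takes a free-form string naming the model.
    parser.add_argument("--unique_name", default=None)
    return parser


def check_args(args):
    """Raise ValueError if a flag that relies on --unique_name lacks it."""
    dependent = [name for name in ("save_model", "finetune", "inference")
                 if getattr(args, name)]
    if dependent and not args.unique_name:
        flags = ", ".join(f"--{name}" for name in dependent)
        raise ValueError(f"{flags} requires --unique_name to be set")
    return args
```

For instance, check_args(make_parser().parse_args(["--save_model"])) raises a ValueError, while adding --unique_name with a value passes the check.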

Licence

Apache License, Version 2.0, January 2004