### Installing packages

The packages specified in the paper's GitHub repository are:

- pytorch 1.9.0
- dgl 0.8.1
- sympy
- argparse
- scikit-learn

PyTorch 1.9.0 requires an older version of Python, so we install Python 3.8.

In [None]:
!apt-get install python3.8 python3.8-dev python3-pip -y


Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following additional packages will be installed:
  libpython3.8 libpython3.8-dev libpython3.8-minimal libpython3.8-stdlib mailcap mime-support
  python3-setuptools python3-wheel python3.8-minimal
Suggested packages:
  python-setuptools-doc python3.8-venv binfmt-support
The following NEW packages will be installed:
  libpython3.8 libpython3.8-dev libpython3.8-minimal libpython3.8-stdlib mailcap mime-support
  python3-pip python3-setuptools python3-wheel python3.8 python3.8-dev python3.8-minimal
0 upgraded, 12 newly installed, 0 to remove and 49 not upgraded.
Need to get 13.5 MB of archives.
After this operation, 53.0 MB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu jammy/main amd64 mailcap all 3.70+nmu1ubuntu1 [23.8 kB]
Get:2 http://archive.ubuntu.com/ubuntu jammy/main amd64 mime-support all 3.66 [3,696 B]
Get:3 http://archive.ubuntu.com/ubuntu jammy-updat

Installing PyTorch 1.9.0 caused many problems, so we increased the version step by step until we found one (1.11.0) that installed cleanly and let the script run without issues.

In [None]:

!pip install torch==1.11.0+cu102  -f https://download.pytorch.org/whl/torch_stable.html

# Install DGL 0.8.1, the version listed in the paper's repository
!pip install dgl==0.8.1 -f https://data.dgl.ai/wheels/repo.html

# Install other dependencies
!pip install sympy scikit-learn


Looking in links: https://download.pytorch.org/whl/torch_stable.html
Collecting torch==1.11.0+cu102
  Downloading https://download.pytorch.org/whl/cu102/torch-1.11.0%2Bcu102-cp310-cp310-linux_x86_64.whl (750.6 MB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m750.6/750.6 MB[0m [31m2.5 MB/s[0m eta [36m0:00:00[0m
Installing collected packages: torch
  Attempting uninstall: torch
    Found existing installation: torch 2.5.1+cu121
    Uninstalling torch-2.5.1+cu121:
      Successfully uninstalled torch-2.5.1+cu121
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
peft 0.13.2 requires torch>=1.13.0, but you have torch 1.11.0+cu102 which is incompatible.
torchaudio 2.5.1+cu121 requires torch==2.5.1, but you have torch 1.11.0+cu102 which is incompatible.
torchvision 0.20.1+cu121 requires torch==2.5.1, but you have torch 1.11.0+cu102 which is 
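
Because the install log reports dependency conflicts, it can help to confirm which versions actually ended up in the running kernel. The `cp310` tag in the wheel name above also suggests that pip targeted the kernel's Python 3.10 rather than the newly installed Python 3.8, although the log does not show this conclusively. The following is a minimal sketch of such a check; `installed_versions` and `PACKAGES` are illustrative names and are not part of the paper's repository.

```python
import sys
from importlib import metadata

# Packages named in this section (the choice of what to check is an assumption).
PACKAGES = ["torch", "dgl", "sympy", "scikit-learn"]


def installed_versions(packages):
    """Return {package: version string or None} for the running interpreter."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return versions


# Report the interpreter version first, then each package.
print("Python", sys.version.split()[0])
for name, version in installed_versions(PACKAGES).items():
    print(f"{name}: {version or 'not installed'}")
```

If the reported `torch` version differs from the one just installed, restarting the runtime is usually needed before importing it.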

### Importing the dataset

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


The `Dataset` class below was defined by the authors of the paper; we only use it as a wrapper for the **tfinance** dataset.

In [None]:
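# Only needed for the 'yelp' and 'amazon' branches below, which are not used here;
# uncomment this import before calling Dataset('yelp') or Dataset('amazon').
#from dgl.data import FraudYelpDataset, FraudAmazonDataset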
from dgl.data.utils import load_graphs, save_graphs
import dgl
import numpy as np
import torch


class Dataset:
    def __init__(self, name='tfinance', homo=True, anomaly_alpha=None, anomaly_std=None):
        self.name = name
        graph = None
        if name == 'tfinance':
            graph, label_dict = load_graphs('/content/drive/My Drive/tfinance')
            graph = graph[0]
            graph.ndata['label'] = graph.ndata['label'].argmax(1)

            if anomaly_std:
                graph, label_dict = load_graphs('/content/drive/My Drive/tfinance')
                graph = graph[0]
                feat = graph.ndata['feature'].numpy()
                anomaly_id = graph.ndata['label'][:,1].nonzero().squeeze(1)
                feat = (feat-np.average(feat,0)) / np.std(feat,0)
                feat[anomaly_id] = anomaly_std * feat[anomaly_id]
                graph.ndata['feature'] = torch.tensor(feat)
                graph.ndata['label'] = graph.ndata['label'].argmax(1)

            if anomaly_alpha:
                graph, label_dict = load_graphs('/content/drive/My Drive/tfinance')
                graph = graph[0]
                feat = graph.ndata['feature'].numpy()
                anomaly_id = list(graph.ndata['label'][:, 1].nonzero().squeeze(1))
                normal_id = list(graph.ndata['label'][:, 0].nonzero().squeeze(1))
                label = graph.ndata['label'].argmax(1)
                diff = anomaly_alpha * len(label) - len(anomaly_id)
                import random
                new_id = random.sample(normal_id, int(diff))
                # new_id = random.sample(anomaly_id, int(diff))
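                for idx in new_id:
                    aid = random.choice(anomaly_id)
                    # aid = random.choice(normal_id)
                    feat[idx] = feat[aid]
                    label[idx] = 1  # 0
                # Write the modified data back; otherwise the relabelled nodes
                # are lost and graph.ndata['label'] stays one-hot in this branch.
                graph.ndata['feature'] = torch.tensor(feat)
                graph.ndata['label'] = label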

        elif name == 'tsocial':
            graph, label_dict = load_graphs('dataset/tsocial')
            graph = graph[0]

        elif name == 'yelp':
            dataset = FraudYelpDataset()
            graph = dataset[0]
            if homo:
                graph = dgl.to_homogeneous(dataset[0], ndata=['feature', 'label', 'train_mask', 'val_mask', 'test_mask'])
                graph = dgl.add_self_loop(graph)
        elif name == 'amazon':
            dataset = FraudAmazonDataset()
            graph = dataset[0]
            if homo:
                graph = dgl.to_homogeneous(dataset[0], ndata=['feature', 'label', 'train_mask', 'val_mask', 'test_mask'])
                graph = dgl.add_self_loop(graph)
        else:
            print('no such dataset')
            exit(1)

        graph.ndata['label'] = graph.ndata['label'].long().squeeze(-1)
        graph.ndata['feature'] = graph.ndata['feature'].float()
        print(graph)

        self.graph = graph

Setting the default backend to "pytorch". You can change it in the ~/.dgl/config.json file or export the DGLBACKEND environment variable.  Valid options are: pytorch, mxnet, tensorflow (all lowercase)


DGL backend not selected or invalid.  Assuming PyTorch for now.
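
To make the `anomaly_std` branch of the class easier to follow, here is a minimal NumPy-only sketch of the same transformation on toy data: features are standardized per column, then the rows belonging to anomalous nodes are scaled by `anomaly_std`. The toy arrays and the value `2.0` are assumptions chosen for illustration; they do not come from the tfinance dataset.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for graph.ndata['feature'] and the one-hot graph.ndata['label'].
feat = rng.normal(loc=5.0, scale=2.0, size=(6, 3))
onehot = np.array([[1, 0], [1, 0], [0, 1], [1, 0], [0, 1], [1, 0]])

anomaly_std = 2.0  # hypothetical scaling factor

# Indices of nodes whose second one-hot column is set (the anomalies).
anomaly_id = onehot[:, 1].nonzero()[0]

# Standardize every feature column to zero mean and unit variance.
standardized = (feat - np.average(feat, 0)) / np.std(feat, 0)

# Scale only the anomalous rows, which widens their spread around the mean.
scaled = standardized.copy()
scaled[anomaly_id] = anomaly_std * scaled[anomaly_id]

# Convert one-hot labels to class indices, as the class does with argmax(1).
labels = onehot.argmax(1)

print("anomalous nodes:", anomaly_id.tolist())
print("labels:", labels.tolist())
```

Normal rows keep their standardized values, so with `anomaly_std` greater than 1 the anomalies end up farther from the feature mean.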


We instantiate the *Dataset* object with the **tfinance** dataset.

In [None]:
data = Dataset()

Graph(num_nodes=39357, num_edges=42445086,
      ndata_schemes={'label': Scheme(shape=(), dtype=torch.int64), 'feature': Scheme(shape=(10,), dtype=torch.float32)}
      edata_schemes={})


## Preliminary analysis

First, we assess what data is available.

The graph appears to have only node features and no edge features.

In [None]:
if data.graph.ndata:
    print("Node features:", data.graph.ndata.keys())

# Check edge features
if data.graph.edata:
    print("Edge features:", data.graph.edata.keys())

Node features: dict_keys(['label', 'feature'])
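
A natural next step is to look at the shape and dtype of each field and at the class balance of the labels. The sketch below does this for any mapping of array-like fields and is demonstrated on toy NumPy arrays. `summarize_fields` and `label_counts` are hypothetical helper names, and applying them to `data.graph.ndata` assumes its values are CPU tensors that `np.asarray` can convert.

```python
import numpy as np


def summarize_fields(fields):
    """Return {name: (shape, dtype)} for a mapping of array-like fields."""
    return {
        name: (tuple(fields[name].shape), str(fields[name].dtype))
        for name in fields.keys()
    }


def label_counts(labels):
    """Return {class index: number of nodes} for an integer label array."""
    values, counts = np.unique(np.asarray(labels), return_counts=True)
    return {int(v): int(c) for v, c in zip(values, counts)}


# Toy stand-in for graph.ndata: 4 nodes with 10 features each.
toy = {
    "label": np.array([0, 0, 1, 0]),
    "feature": np.zeros((4, 10), dtype=np.float32),
}

print(summarize_fields(toy))
print(label_counts(toy["label"]))
```

On the real graph, the calls would presumably be `summarize_fields(data.graph.ndata)` and `label_counts(data.graph.ndata['label'])`.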
