<a href="https://colab.research.google.com/github/realfolkcode/GraphDiffusionAnomaly/blob/main/notebooks/gda_benchmark.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Anomaly Detection in Networks via Score-Based Generative Models

[Paper](https://arxiv.org/abs/2306.15324) | [GitHub](https://github.com/realfolkcode/GraphDiffusionAnomaly)

This notebook demonstrates how to reproduce the results of our paper. First, GDSS models are trained with randomly chosen hyperparameters. Anomaly scores are then calculated using *matrix distances* as the dissimilarity measure.
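
As a rough illustration of how a matrix distance can serve as a per-node anomaly score, the sketch below compares a matrix with a reconstruction of it, row by row. The row-wise L2 norm, the helper name `rowwise_distance`, and the toy data are assumptions made for illustration only; the distances implemented in the repository may differ.

```python
import numpy as np

def rowwise_distance(original, reconstructed):
    """Hypothetical per-node dissimilarity: L2 norm of each row of the difference."""
    return np.linalg.norm(original - reconstructed, axis=1)

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 3))   # toy feature matrix: 5 nodes, 3 features
x_hat = x.copy()              # pretend reconstruction
x_hat[2] += 1.0               # node 2 is reconstructed poorly

toy_scores = rowwise_distance(x, x_hat)
print(toy_scores)             # only node 2 has a non-zero score
```

A node whose row is reconstructed poorly receives a large distance and therefore a high anomaly score.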

**Remark:** We recommend running this notebook in Google Colab. To run it locally, skip the repository cloning step in the Setup section and modify or append the import paths so that our modules can be found.

## Setup

In [None]:
!pip install dgl -f https://data.dgl.ai/wheels/cu117/repo.html
!pip install dglgo -f https://data.dgl.ai/wheels-test/repo.html

In [None]:
!pip install torch_geometric
!pip install pyg_lib torch_scatter torch_sparse torch_cluster torch_spline_conv -f https://data.pyg.org/whl/torch-2.0.0+cu118.html

In [None]:
!git clone https://github.com/realfolkcode/GraphDiffusionAnomaly

In [5]:
%cd GraphDiffusionAnomaly/

/content/GraphDiffusionAnomaly


In [None]:
!python -m pip install -r requirements.txt

## Training and Inference

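Restart the runtime so that the newly installed packages are picked up, then change back into the repository directory.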

In [1]:
%cd GraphDiffusionAnomaly

/content/GraphDiffusionAnomaly


Let us run the benchmark on the Disney dataset. To do this, we choose the corresponding config file and name our experiment (`exp_name`) after the dataset (checkpoints will be named accordingly).

In [None]:
!python run_benchmark.py --config GDSS/config/disney.yaml --exp_name disney --trajectory_sample 4 --num_sample 3 --radius 1

## Evaluation

In [8]:
import numpy as np
import torch
import pygod
from pygod.utils import load_data
from pygod.metric.metric import (eval_roc_auc, eval_average_precision,
                                 eval_recall_at_k)

import matplotlib.pyplot as plt

from utils import calculate_snr
from GDSS.parsers.config import get_config

In [9]:
dataset_name = 'disney'

y = load_data(dataset_name).y.bool()
# Number of ground-truth anomalies; used as k in recall@k.
k = int(y.sum())
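
Setting `k` to the number of labelled anomalies means recall@k asks: of all true anomalies, what fraction appears among the `k` highest-scoring nodes? The plain NumPy sketch below illustrates that definition on made-up data; `recall_at_k` is a hypothetical helper written for this example and is not PyGOD's implementation.

```python
import numpy as np

def recall_at_k(labels, scores, k):
    """Fraction of true anomalies found among the k highest-scoring nodes."""
    top_k = np.argsort(scores)[::-1][:k]   # indices of the k largest scores
    return labels[top_k].sum() / labels.sum()

labels_toy = np.array([0, 1, 0, 0, 1, 0], dtype=bool)     # two anomalies
scores_toy = np.array([0.1, 0.9, 0.8, 0.2, 0.3, 0.05])    # made-up scores
k_toy = int(labels_toy.sum())

recall_toy = recall_at_k(labels_toy, scores_toy, k_toy)
print(recall_toy)   # 0.5: one of the two anomalies is in the top 2
```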

In [10]:
exp_name = f'{dataset_name}'

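Unweighted anomaly scores: the per-timestep scores are summed with equal weight, and the feature and adjacency scores are blended with a mixing coefficient `alpha`.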

In [13]:
auc, ap, rec = [], [], []

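# Loop over the 20 score files produced by the benchmark.
for i in range(20):
    with open(f'{exp_name}_{i}_final_scores.npy', 'rb') as f:
        # Two arrays are read back to back: feature scores, then adjacency scores.
        x_scores = np.load(f).sum(axis=-1)
        adj_scores = np.load(f).sum(axis=-1)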

    for alpha in [0.2, 0.5, 0.8]:
        score = (1 - alpha) * x_scores + alpha * adj_scores
        score = torch.from_numpy(np.nan_to_num(score))
        auc.append(eval_roc_auc(y, score))
        ap.append(eval_average_precision(y, score))
        rec.append(eval_recall_at_k(y, score, k))
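
The loop above reads two arrays from one file handle and blends them with `alpha`. The self-contained sketch below mimics this on toy data. It assumes each array has shape `(num_nodes, num_timesteps)` and that the two arrays were written one after the other with `np.save`; the values and the in-memory buffer are made up for illustration.

```python
import io
import numpy as np

# Toy per-node, per-timestep scores: 3 nodes x 4 timesteps (made-up values).
x_toy = np.array([[0.1, 0.1, 0.0, 0.0],
                  [0.5, 0.4, 0.3, 0.3],
                  [0.0, 0.1, 0.0, 0.0]])
adj_toy = np.array([[0.1, 0.1, 0.1, 0.0],
                    [0.1, 0.1, 0.1, 0.1],
                    [0.5, 0.5, 0.5, 0.5]])

# Write both arrays into one buffer, as consecutive np.save calls would.
buffer = io.BytesIO()
np.save(buffer, x_toy)
np.save(buffer, adj_toy)
buffer.seek(0)

# Consecutive np.load calls return the arrays in the order they were saved.
x_sum = np.load(buffer).sum(axis=-1)     # [0.2, 1.5, 0.1]
adj_sum = np.load(buffer).sum(axis=-1)   # [0.3, 0.4, 2.0]

top_node = {}
for alpha in [0.2, 0.5, 0.8]:
    blended = (1 - alpha) * x_sum + alpha * adj_sum
    top_node[alpha] = int(np.argmax(blended))

print(top_node)   # {0.2: 1, 0.5: 2, 0.8: 2}
```

A small `alpha` favours nodes with unusual features (node 1 here), while a large `alpha` favours nodes with unusual connectivity (node 2).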

In [14]:
print("AUC: {:.4f}±{:.4f} ({:.4f})\t"
      "AP: {:.4f}±{:.4f} ({:.4f})\t"
      "Recall: {:.4f}±{:.4f} ({:.4f})".format(np.mean(auc), np.std(auc),
                                              np.max(auc), np.mean(ap),
                                              np.std(ap), np.max(ap),
                                              np.mean(rec), np.std(rec),
                                              np.max(rec)))

AUC: 0.6514±0.1124 (0.7980)	AP: 0.1445±0.0686 (0.3075)	Recall: 0.1361±0.1198 (0.3333)
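
Each metric is reported as mean ± standard deviation, with the maximum in parentheses, over all collected values (every score file combined with every `alpha`). A compact way to produce one such entry is sketched below; `summarize` is a hypothetical helper, and the input values are made up.

```python
import numpy as np

def summarize(name, values):
    """Format a list of metric values as 'name: mean±std (max)'."""
    values = np.asarray(values, dtype=float)
    # np.std defaults to the population standard deviation (ddof=0),
    # the same default used by the print cells in this notebook.
    return "{}: {:.4f}±{:.4f} ({:.4f})".format(
        name, values.mean(), values.std(), values.max())

summary_line = summarize("AUC", [0.5, 0.7, 0.9])
print(summary_line)   # AUC: 0.7000±0.1633 (0.9000)
```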


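SNR-weighted anomaly scores: each timestep's score is multiplied by the square root of the signal-to-noise ratio (SNR) at that timestep before the scores are summed.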

In [15]:
auc, ap, rec = [], [], []

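# Same value as --trajectory_sample passed to run_benchmark.py above.
trajectory_sample = 4
# Evenly spaced interior time points of [0, 1], excluding both endpoints.
T_lst = np.linspace(0, 1, trajectory_sample + 2, endpoint=True)[1:-1]
config = get_config(f'GDSS/config/{dataset_name}.yaml', 0)
# Per-timestep weights: square root of the SNR at each time point.
time_penalties = np.sqrt(calculate_snr(T_lst, config.sde.x))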
print(T_lst)
print(time_penalties)

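# Loop over the 20 score files produced by the benchmark.
for i in range(20):
    with open(f'{exp_name}_{i}_final_scores.npy', 'rb') as f:
        x_scores = np.load(f)
        adj_scores = np.load(f)
        # Weighted sum over timesteps instead of a plain sum.
        x_scores = np.dot(x_scores, time_penalties)
        adj_scores = np.dot(adj_scores, time_penalties)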

    for alpha in [0.2, 0.5, 0.8]:
        score = (1 - alpha) * x_scores + alpha * adj_scores
        score = torch.from_numpy(np.nan_to_num(score))
        auc.append(eval_roc_auc(y, score))
        ap.append(eval_average_precision(y, score))
        rec.append(eval_recall_at_k(y, score, k))

[0.2 0.4 0.6 0.8]
[5.08122887 2.90480548 2.00573859 1.49932927]
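
To see what the weighting changes, the sketch below applies the penalties printed above to made-up per-timestep scores and compares the result with the plain sum. The score matrix is an assumption for illustration; only the time grid and the penalty values come from this notebook's output.

```python
import numpy as np

# Time grid as constructed above: interior points of [0, 1].
t_toy = np.linspace(0, 1, 4 + 2, endpoint=True)[1:-1]   # [0.2, 0.4, 0.6, 0.8]

# Penalties copied from the printed output above.
penalties_toy = np.array([5.08122887, 2.90480548, 2.00573859, 1.49932927])

# Made-up scores: 3 nodes x 4 timesteps.
step_scores = np.array([[1.0, 0.0, 0.0, 0.0],    # deviates only at t = 0.2
                        [0.0, 0.0, 0.0, 1.0],    # deviates only at t = 0.8
                        [0.5, 0.5, 0.5, 0.5]])   # deviates at every timestep

unweighted = step_scores.sum(axis=-1)            # nodes 0 and 1 tie
weighted = np.dot(step_scores, penalties_toy)    # node 0 now outranks node 1

print(unweighted)
print(weighted)
```

Because the penalties decrease with time, a deviation at an early (low-noise) timestep contributes more to the final score than an equally large deviation at a late timestep.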


In [16]:
print("AUC: {:.4f}±{:.4f} ({:.4f})\t"
      "AP: {:.4f}±{:.4f} ({:.4f})\t"
      "Recall: {:.4f}±{:.4f} ({:.4f})".format(np.mean(auc), np.std(auc),
                                              np.max(auc), np.mean(ap),
                                              np.std(ap), np.max(ap),
                                              np.mean(rec), np.std(rec),
                                              np.max(rec)))

AUC: 0.6316±0.1194 (0.7811)	AP: 0.1385±0.0650 (0.3355)	Recall: 0.1306±0.1101 (0.3333)
