# PyGOD Demo on TigerGraph ML Workbench
This notebook demonstrates how to run the Python Graph Outlier Detection (PyGOD) package on data stored in a TigerGraph database, using the TigerGraph ML Workbench. Before starting, install the TigerGraph server (https://docs.tigergraph.com/tigergraph-server/current/intro/) on your local machine or a remote server, read TigerGraph's data ingestion tutorial (https://github.com/TigerGraph-DevLabs/mlworkbench-docs/tree/main/tutorials/basics), and download the necessary data files.
The demo uses the Cora dataset.

In [3]:
# Install the following packages in your environment (uncomment the lines below to run them)
#!pip install torch
#!pip install torch-scatter torch-sparse torch-cluster torch-spline-conv torch-geometric -f https://data.pyg.org/whl/torch-1.11.0+cpu.html
#!pip install pyTigerGraph[gds]
#!pip install pygod

## Data Ingestion


In [3]:
import pyTigerGraph as tg

conn = tg.TigerGraphConnection(
    host="http://127.0.0.1", # Change the address to your database server's
    username="tigergraph",
    password="tigergraph"
)

In [8]:
print(conn.gsql("CREATE GRAPH Cora()"))

conn.graphname = "Cora"
# Create and run schema change job
print(conn.gsql(open("./mlworkbench-docs-main/tutorials/basics/data/cora/gsql/schema.gsql", "r").read()))

# Create loading job
print(conn.gsql(open("./mlworkbench-docs-main/tutorials/basics/data/cora/gsql/load.gsql", "r").read()))

Stopping GPE GSE RESTPP
Successfully stopped GPE GSE RESTPP in 1.533 seconds
Starting GPE GSE RESTPP
Successfully started GPE GSE RESTPP in 0.090 seconds
The graph Cora is created.
Using graph 'Cora'
Successfully created schema change jobs: [Cora_job].
Kick off schema change job Cora_job
Doing schema change on graph 'Cora' (current version: 0)
Trying to add local vertex 'Paper' to the graph 'Cora'.
Trying to add local edge 'Cite' to the graph 'Cora'.

Graph Cora updated to new version 1
The job Cora_job completes in 5.404 seconds!
Using graph 'Cora'
Successfully created loading jobs: [load_cora_data].


In [9]:
# Load data
conn.runLoadingJobWithFile("./mlworkbench-docs-main/tutorials/basics/data/cora/nodes.csv", "node_csv", "load_cora_data")
conn.runLoadingJobWithFile("./mlworkbench-docs-main/tutorials/basics/data/cora/edges.csv", "edge_csv", "load_cora_data")

[{'sourceFileName': 'Online_POST',
  'statistics': {'validLine': 10556,
   'rejectLine': 0,
   'failedConditionLine': 0,
   'notEnoughToken': 0,
   'invalidJson': 0,
   'oversizeToken': 0,
   'vertex': [],
   'edge': [{'typeName': 'Cite',
     'validObject': 10556,
     'noIdFound': 0,
     'invalidAttribute': 0,
     'invalidVertexType': 0,
     'invalidPrimaryId': 0,
     'invalidSecondaryId': 0,
     'incorrectFixedBinaryLength': 0}],
   'deleteVertex': [],
   'deleteEdge': []}}]

## Connect to TigerGraph Database

In [10]:
conn = tg.TigerGraphConnection(
    host="http://127.0.0.1", # Change the address to your database server's
    graphname="Cora",
    username="tigergraph",
    password="tigergraph",
    useCert=False
)

## Install UDF

In [11]:
ExprFunctions="https://tg-mlworkbench.s3.us-west-1.amazonaws.com/udf/1.0/ExprFunctions.hpp"  # For enterprise users, please use the link you received.
ExprUtil=""  # For enterprise users, please use the link you received.
conn.installUDF(ExprFunctions, ExprUtil)

ExprFunctions installed succesfully


In [12]:
conn.getVertexCount('*')

{'Paper': 2708}

In [13]:
conn.getEdgeCount()

{'Cite': 10556}

## Load the PyG Graph object from TigerGraph DB

In [14]:
graph_loader = conn.gds.graphLoader(
    v_in_feats=["x"],
    v_out_labels=["y"],
    v_extra_feats=["train_mask", "val_mask", "test_mask"],
    num_batches=1,
    output_format="PyG",
    shuffle=False
)

Installing and optimizing queries. It might take a minute if this is the first time you use this loader.
Query installation finished.


In [15]:
data = graph_loader.data

data

Data(edge_index=[2, 10556], x=[2708, 1433], y=[2708], train_mask=[2708], val_mask=[2708], test_mask=[2708])
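
The shapes in the printed `Data` object follow PyTorch Geometric's conventions: `edge_index=[2, 10556]` is a coordinate-format edge list with one column per edge, and `x=[2708, 1433]` holds one feature row per node. As a minimal sketch of that layout, the toy example below uses plain Python lists instead of tensors and a made-up three-node graph (not Cora); `out_degrees` is a hypothetical helper introduced only for illustration.

```python
# Toy illustration of the 2 x E edge list layout (not the Cora data).
# Row 0 holds source node ids, row 1 holds target node ids.
edge_index = [
    [0, 1, 1, 2],  # sources
    [1, 0, 2, 1],  # targets
]
num_nodes = 3


def out_degrees(edge_index, num_nodes):
    """Count outgoing edges per node from a 2 x E edge list."""
    degrees = [0] * num_nodes
    for src in edge_index[0]:
        degrees[src] += 1
    return degrees


degrees = out_degrees(edge_index, num_nodes)
print(degrees)
```

Each column `(edge_index[0][j], edge_index[1][j])` is one directed edge, so an undirected edge is typically stored as two columns, one per direction.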

In [16]:
import torch
data.x = data.x.to(torch.float)

## Import PyGOD and Inject Outliers

In [17]:
# Inject synthetic outliers; y_outlier marks the injected nodes (the DOMINANT detector is trained in a later cell)

from pygod.generator import gen_structural_outliers, gen_contextual_outliers

# data, y_outlier = gen_structural_outliers(data, 20, 5)
data, y_outlier = gen_contextual_outliers(data, 100, 50)
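
To make the injection step less opaque, here is a pure-Python sketch of the idea behind contextual outlier injection. It assumes the two numeric arguments mean "number of outlier nodes" and "number of candidate nodes sampled per outlier", with each chosen node receiving the attributes of its farthest candidate in Euclidean distance. This is an illustration of the technique, not PyGOD's implementation; `inject_contextual_outliers` and the toy feature matrix are hypothetical.

```python
import random


def inject_contextual_outliers(features, n, k, seed=0):
    """Hypothetical helper: return (new_features, labels).

    For each of n randomly chosen nodes, sample k other nodes and copy
    the attributes of the candidate farthest away in Euclidean distance.
    """
    rng = random.Random(seed)
    num_nodes = len(features)
    new_features = [row[:] for row in features]  # leave the input untouched
    labels = [0] * num_nodes

    for node in rng.sample(range(num_nodes), n):
        others = [i for i in range(num_nodes) if i != node]
        candidates = rng.sample(others, k)

        def sq_dist(other, node=node):
            # Squared Euclidean distance between the original attribute rows.
            return sum((a - b) ** 2 for a, b in zip(features[node], features[other]))

        farthest = max(candidates, key=sq_dist)
        new_features[node] = features[farthest][:]
        labels[node] = 1  # mark this node as an injected outlier

    return new_features, labels


# Toy attribute matrix: 6 nodes, 2 features each, all rows distinct.
features = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0], [5.0, 6.0]]
new_features, labels = inject_contextual_outliers(features, n=2, k=3)
print(labels)
```

The graph structure is left unchanged in this sketch; only node attributes are overwritten, which is what distinguishes contextual outliers from structural ones.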

## Initialize the model

In [18]:
from pygod.models import DOMINANT
model = DOMINANT(num_layers=4, epoch=10, verbose=True)  # hyperparameters can be set here



## Fit the data

In [19]:
model.fit(data, y_outlier)  # data is a PyTorch Geometric Data object; y_outlier holds the injected-outlier labels

Epoch 0000: Loss 4.4854 | AUC 0.5460
Epoch 0001: Loss 11.2645 | AUC 0.5892
Epoch 0002: Loss 2.8404 | AUC 0.6133
Epoch 0003: Loss 2.7395 | AUC 0.6118
Epoch 0004: Loss 2.9122 | AUC 0.6065
Epoch 0005: Loss 2.7683 | AUC 0.6094
Epoch 0006: Loss 2.6530 | AUC 0.6135
Epoch 0007: Loss 2.5622 | AUC 0.6078
Epoch 0008: Loss 2.5093 | AUC 0.6065
Epoch 0009: Loss 2.4759 | AUC 0.6038


DOMINANT(act=<function relu at 0x10c860a60>, alpha=tensor(0.2518),
     batch_size=2708, contamination=0.1, dropout=0.3, epoch=10, gpu=None,
     hid_dim=64, lr=0.005, num_layers=4, num_neigh=-1, verbose=True,
     weight_decay=0.0)
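
The per-epoch AUC in the log above appears to be available because the injected-outlier labels were passed to `fit`. Assuming it is the usual ROC AUC, it can be read as the probability that a randomly chosen outlier receives a higher score than a randomly chosen normal node. The sketch below computes that quantity directly by pair counting on made-up labels and scores; `roc_auc` is a hypothetical helper, not a PyGOD function.

```python
def roc_auc(labels, scores):
    """ROC AUC by pair counting: fraction of (outlier, inlier) pairs in which
    the outlier has the higher score; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one outlier and one inlier")

    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up labels (1 = outlier) and scores, for illustration only.
toy_labels = [0, 0, 1, 0, 1]
toy_scores = [0.1, 0.4, 0.35, 0.2, 0.8]
auc = roc_auc(toy_labels, toy_scores)
print(round(auc, 4))
```

Pair counting is quadratic in the number of nodes, so it is only suitable for small examples; under this reading, an AUC of 0.5 corresponds to random ranking.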

## Get outlier scores on the input data

In [20]:
outlier_scores = model.decision_scores_ # raw outlier scores on the input data
print(outlier_scores)

[2.09913087 2.01937318 2.58339119 ... 1.40805268 1.57680655 3.41978168]
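
Raw scores only rank nodes; turning them into outlier/normal decisions requires a cutoff. The model summary above shows `contamination=0.1`, and one common way to use such a rate is to flag the top fraction of scores. The sketch below illustrates that approach on made-up scores. Whether PyGOD derives its threshold in exactly this way is an assumption not confirmed by this notebook, and `threshold_by_contamination` is a hypothetical helper.

```python
import math


def threshold_by_contamination(scores, contamination=0.1):
    """Hypothetical helper: flag roughly the top `contamination` fraction."""
    if not 0.0 < contamination <= 0.5:
        raise ValueError("contamination must be in (0, 0.5]")
    n_outliers = max(1, math.ceil(contamination * len(scores)))
    ranked = sorted(scores, reverse=True)
    threshold = ranked[n_outliers - 1]  # lowest score that is still flagged
    flags = [1 if s >= threshold else 0 for s in scores]
    return threshold, flags


# Made-up scores, for illustration only.
toy_scores = [2.10, 2.02, 2.58, 1.41, 1.58, 3.42, 1.90, 2.20, 1.75, 2.05]
threshold, flags = threshold_by_contamination(toy_scores, contamination=0.2)
print(threshold, flags)
```

With ties at the threshold, slightly more than the requested fraction can be flagged, because every score equal to the cutoff is included.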


## Get outlier scores on the new data

In [21]:
outlier_scores = model.decision_function(data)  # predict raw outlier scores; the same graph is reused here in place of new data
print(outlier_scores)

[2.06214285 1.99151468 2.56358337 ... 1.38927293 1.55375409 3.40630579]
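
Once scores are available, a common follow-up is to list the highest-scoring nodes for inspection. The sketch below shows one way to do that with plain Python on made-up scores; `top_k_outliers` is a hypothetical helper, and the indices it returns are positions in the score array, which would still need to be mapped back to vertex ids in the database.

```python
def top_k_outliers(scores, k):
    """Return (index, score) pairs for the k highest scores, highest first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(i, scores[i]) for i in order[:k]]


# Made-up scores, for illustration only.
toy_scores = [2.06, 1.99, 2.56, 1.39, 1.55, 3.41]
top = top_k_outliers(toy_scores, 3)
print(top)
```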
