The arXiv version of the BAGEL benchmark paper is available at arXiv:2206.13983.
For node classification, we measure RDT-Fidelity, Sparsity, and Correctness.
### The PyPI package is still in the test phase; we will release it soon!
pip install bagel-benchmark
from bagel_benchmark import metrics
from bagel_benchmark.node_classification import utils
from bagel_benchmark.explainers.grad_explainer_node import grad_node_explanation
1. Load the dataset and train the GNN model.
We run all our experiments on a server with an Intel Xeon Silver 4210 CPU and an NVIDIA A100 GPU.
Hyperparameter settings for all datasets
GNN layers | epochs | optimizer | lr | weight decay |
---|---|---|---|---|
2 | 200 | Adam | 0.01 | 5e-4 |
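For reference, the settings in the table correspond to an optimizer configured roughly as shown below. This is only a sketch: `utils.train_model` (used in the demo that follows) sets up training internally, and `model` refers to the GNN created in the next snippet.

```python
import torch

# Adam with lr=0.01 and weight decay 5e-4, matching the table above
# (illustrative only; utils.train_model configures this internally).
optimizer = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=5e-4)
```

The demo itself relies on the provided utilities: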
import torch
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
data_set="Cora"
dataset, data, results_path = utils.load_dataset(data_set)
data.to(device)
# We train 2-layer GNN models for 200 epochs.
# We use the Adam optimizer with a weight decay of 5e-4 and a learning rate of 0.01.
model = utils.GCNNet(dataset)
model.to(device)
#### Train the GNN model
accuracy = utils.train_model(model,data)
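For readers who want to see what a 2-layer GCN of this kind looks like, here is a minimal PyTorch Geometric sketch. The hidden size of 64 is an assumption, and `utils.GCNNet` may be implemented differently.

```python
import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv

class TwoLayerGCN(torch.nn.Module):
    """Minimal 2-layer GCN sketch (the hidden size is an assumption;
    utils.GCNNet may differ in its details)."""
    def __init__(self, dataset, hidden=64):
        super().__init__()
        self.conv1 = GCNConv(dataset.num_node_features, hidden)
        self.conv2 = GCNConv(hidden, dataset.num_classes)

    def forward(self, x, edge_index):
        x = F.relu(self.conv1(x, edge_index))
        x = F.dropout(x, p=0.5, training=self.training)
        return self.conv2(x, edge_index)
```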
2. Generate the explanation.
node = 10
feature_mask, node_mask = grad_node_explanation(model,node,data.x, data.edge_index)
print(feature_mask)
print(node_mask)
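The explainer returns an importance score per input feature and per node. As a quick sanity check (assuming the masks come back as NumPy arrays, which the `torch.from_numpy` conversion below also suggests), you can list the highest-scoring nodes:

```python
import numpy as np

# Show the nodes the explanation considers most important
# (assumes node_mask is a 1-D NumPy array of importance scores).
top_nodes = np.argsort(node_mask)[::-1][:5]
print("Top-5 most important nodes:", top_nodes)
```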
3. Finally, we evaluate the explanation.
#### Calculate Sparsity
feature_sparsity = False
node_sparsity = True
sparsity = metrics.sparsity(feature_sparsity, node_sparsity, feature_mask, node_mask)
print(sparsity)
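Intuitively, a good explanation is sparse: its mass is concentrated on a few nodes or features. As a rough illustration only, and not necessarily the definition implemented in `metrics.sparsity`, one can look at the fraction of near-zero entries in a mask:

```python
import numpy as np

def fraction_near_zero(mask, eps=1e-6):
    """Illustrative sparsity proxy: the share of mask entries that are (almost) zero.
    This is not necessarily the definition used by metrics.sparsity."""
    mask = np.asarray(mask, dtype=float)
    return float((np.abs(mask) < eps).mean())

print("Fraction of near-zero node scores:", fraction_near_zero(node_mask))
```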
#### Calculate RDT-Fidelity
feature_mask = torch.from_numpy(feature_mask).reshape(1,-1)
fidelity = metrics.fidelity(model, node, data.x,data.edge_index, feature_mask=feature_mask)
print(fidelity)
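RDT-Fidelity measures how stable the model's prediction for the node is when the parts of the input outside the explanation are randomized. The sketch below illustrates the idea only; `metrics.fidelity` handles the exact perturbation distribution and sampling, and the sketch assumes the model is called as `model(x, edge_index)`, as in typical PyTorch Geometric models.

```python
import torch

# Minimal sketch of the RDT-Fidelity idea: keep masked-in features, resample
# the rest, and check how often the prediction stays the same.
# Illustrative only; use metrics.fidelity for the benchmark numbers.
# Assumes the model is called as model(x, edge_index).
def rdt_fidelity_sketch(model, node, x, edge_index, feature_mask, n_samples=100):
    model.eval()
    mask = feature_mask.float().to(x.device)  # shape (1, num_features)
    with torch.no_grad():
        original = model(x, edge_index).argmax(dim=-1)[node]
        agree = 0
        for _ in range(n_samples):
            # Replace masked-out features with values from randomly permuted nodes.
            noise = x[torch.randperm(x.size(0), device=x.device)]
            x_pert = mask * x + (1 - mask) * noise
            agree += int(model(x_pert, edge_index).argmax(dim=-1)[node] == original)
    return agree / n_samples
```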
For graph classification, we measure Faithfulness (comprehensiveness and sufficiency), Plausibility, and RDT-Fidelity.
We show a demo for the Movie Reviews dataset. The raw Movie Reviews text data is stored in this folder.
1. Create the graphs from the text (a sketch of one possible construction follows the example below).
For example, the following review text is converted into a graph: "' romeo and juliet ' , and ' the twelfth night ' . it is easier for me to believe that he had a wet dream and that 's how all his plays develop , but please spare me all of this unnecessary melodrama."
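One common way to turn text into a graph is to treat tokens as nodes, connect words that co-occur within a small sliding window, and use pretrained word vectors (here 300-dimensional, matching the feature size used later) as node features. The sketch below illustrates this idea only; the repository's construction script may differ, and `embed` is a hypothetical token-to-vector lookup.

```python
import torch
from torch_geometric.data import Data

# Illustrative sketch: tokens become nodes, and words co-occurring within a
# sliding window are connected by (bidirectional) edges. The actual BAGEL
# construction may differ; `embed` is a hypothetical token-to-vector lookup.
def text_to_graph(tokens, embed, window=3):
    edges = set()
    for i in range(len(tokens)):
        for j in range(i + 1, min(i + window, len(tokens))):
            edges.add((i, j))
            edges.add((j, i))
    edge_index = torch.tensor(sorted(edges), dtype=torch.long).t().contiguous()
    x = torch.stack([embed(t) for t in tokens])  # e.g., 300-d word vectors
    return Data(x=x, edge_index=edge_index)
```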
2. We train the GNN.
Hyperparameter settings
Dataset | GNN layers | epochs | optimizer | lr | weight decay | pooling |
---|---|---|---|---|---|---|
MUTAG | 2 | 200 | Adam | 0.01 | 0.0 | mean |
PROTEINS | 2 | 200 | Adam | 0.01 | 0.0 | mean |
Movie Reviews | 2 | 200 | Adam | 0.01 | 0.0 | mean |
ENZYMES | 2 | 200 | Adam | 0.001 | 0.0 | mean |
Details of GNNs
GNN | hidden units |
---|---|
GCN | 64 |
GAT | 64 |
GIN | 32 |
APPNP | 64 |
from bagel_benchmark.graph_classification import models
## The molecule datasets can be loaded as follows:
data_set = "ENZYMES"  ### Similarly, MUTAG or PROTEINS can be loaded by setting data_set="MUTAG" or data_set="PROTEINS"
dataset = models.load_dataset(data_set)
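Assuming `models.load_dataset` returns a standard PyTorch Geometric dataset object, it can be shuffled, split, and batched in the usual way. This is only a sketch; the split and batch size used for the benchmark may differ.

```python
from torch_geometric.loader import DataLoader

# Illustrative 80/20 split and batching (assumes a PyTorch Geometric dataset).
dataset = dataset.shuffle()
split = int(0.8 * len(dataset))
train_loader = DataLoader(dataset[:split], batch_size=32, shuffle=True)
test_loader = DataLoader(dataset[split:], batch_size=32)
```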
#### Load the Movie Reviews dataset and train the GNN model
from bagel_benchmark.graph_classification.utils_movie_reviews import load_dataset, train_gnn
from bagel_benchmark.metrics import suff_and_comp
from bagel_benchmark.explainers.grad_explainer_graph import grad_weights
train_loader, test_loader, test_dataset = load_dataset()
dataset_dim = [300, 2]  ### the feature size is 300 and there are 2 class labels
model = models.GCN(dataset_dim)
train_gnn(model, train_loader, test_loader)
3. Generate the explanation.
# let idx be the index of a graph in test_dataset, for example:
idx = 0
data = test_dataset[idx]
data.batch = torch.zeros(data.x.shape[0], device=device).long()
explanation = grad_weights(model, data)
4. Finally, we evaluate the explanation.
suff, comp = suff_and_comp(idx, model, explanation, test_dataset)
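Comprehensiveness and sufficiency follow the usual faithfulness intuition: comprehensiveness measures how much the predicted class probability drops when the explanation is removed from the graph, while sufficiency measures how much it drops when only the explanation is kept. The block below is an illustrative summary, not the library's code; `suff_and_comp` performs the actual graph edits and forward passes.

```python
# Illustrative summary: with p_c(.) the model's probability for the predicted
# class c of graph G, and R the explanation subgraph,
#
#   comprehensiveness = p_c(G) - p_c(G \ R)   # large drop  => R was important
#   sufficiency       = p_c(G) - p_c(R)       # small drop  => R alone suffices
#
print("sufficiency =", suff, "comprehensiveness =", comp)
```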
If you find this benchmark useful in your research, please consider citing our paper:
@article{rathee2022bagel,
  title={BAGEL: A Benchmark for Assessing Graph Neural Network Explanations},
  author={Rathee, Mandeep and Funke, Thorben and Anand, Avishek and Khosla, Megha},
  journal={arXiv preprint arXiv:2206.13983},
  year={2022}
}