This is an external contribution to OGB. Please refer to the NeurIPS paper for the detailed setting.
ogb>=1.3.4
torch>=1.7.0
torch-geometric==master (pip install git+https://github.com/rusty1s/pytorch_geometric.git)
This repository includes the following example scripts:
- MLP: Full-batch MLP training based on Node2Vec features. This script requires node embeddings be saved in `embedding.pt`. To generate them, please run `python node2vec.py` [requires `torch-geometric>=1.5.0`].
- GNN: Full-batch GNN training using either the GCN or GraphSAGE operator (`--use_sage`) [requires `torch-geometric>=1.6.0`].
- Matrix Factorization: Full-batch Matrix Factorization training.
To select the hyperparameter values, we ran a grid search using wandb.ai sweeps (cf. the YAML configuration files). Because the dataset is very large, tuning hyperparameters on the full graph is challenging. We therefore subsample a region of the mouse brain to create a small graph. To check that this subsampling does not introduce bias, we measured the KL divergence between the small graph and the whole brain and confirmed that the small graph is representative of the whole brain in its distribution of vasculature. We selected the best set of hyperparameters on the small graph and used it on the actual graph, with small modifications where needed.
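The representativeness check described above can be illustrated with a minimal sketch. It assumes the comparison is made on a histogram of a single scalar feature (here a hypothetical per-vessel radius); the feature choice, the bin edges, the add-one smoothing, and the function names `histogram` and `kl_divergence` are all assumptions for illustration and are not taken from this repository or the paper.

```python
import math


def histogram(values, bin_edges):
    """Return smoothed, normalized bin frequencies for `values`.

    Add-one (Laplace) smoothing keeps every bin non-zero so that the
    KL divergence below never divides by zero. Values outside the bin
    range are ignored.
    """
    num_bins = len(bin_edges) - 1
    counts = [0] * num_bins
    for v in values:
        for i in range(num_bins):
            lo, hi = bin_edges[i], bin_edges[i + 1]
            is_last = i == num_bins - 1
            # Bins are half-open [lo, hi), except the last one, which is closed.
            if lo <= v < hi or (is_last and v == hi):
                counts[i] += 1
                break
    smoothed = [c + 1 for c in counts]
    total = sum(smoothed)
    return [c / total for c in smoothed]


def kl_divergence(p, q):
    """Compute KL(p || q) in nats for two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Toy data only: hypothetical vessel radii for the subsampled and full graph.
small_graph_radii = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5]
whole_graph_radii = [1.0, 1.2, 1.8, 2.0, 2.4, 2.9, 3.1, 3.6, 3.9]
bin_edges = [0.0, 1.0, 2.0, 3.0, 4.0]

p = histogram(small_graph_radii, bin_edges)
q = histogram(whole_graph_radii, bin_edges)
print(f"KL(small || whole) = {kl_divergence(p, q):.4f}")
```

A value close to zero indicates that the subsampled region follows a similar distribution to the whole graph for the chosen feature. What counts as "close enough" is a judgment call that this sketch does not settle.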
# Run with default config
python gnn.py
# Run with custom config
python gnn.py --hidden_channels=128
# Use Node2Vec embeddings
python gnn.py --hidden_channels=128 --use_node_embedding
Following the NeurIPS paper, we propose a spatial sampling strategy for negative edges, which constitutes a more challenging and realistic scenario than random sampling. Random sampling ultimately creates unrealistic edges, i.e. vessels connecting totally different brain regions, or otherwise unrealistic structures. These are very easy for a classifier to identify, leading to overly optimistic scores on the link prediction task. In contrast, spatial sampling (as proposed in our paper) creates more challenging edges and biologically more realistic structures, which are harder to differentiate from the actual edges in the whole brain graph. With the spatial sampling criterion, we force the link predictor to learn more meaningful representations, which will lead to biologically accurate results when the link prediction algorithm is used for missing link prediction or graph completion. We kindly ask you to use the spatial sampling structure for negative edges by making use of the presampled negative edges.
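To make the contrast with random sampling concrete, the following is a minimal sketch of one possible spatial criterion: a node pair is accepted as a negative edge only if it is not an existing edge and its endpoints lie within a distance threshold. This is an illustration under stated assumptions, not the procedure used to create the presampled negative edges; the function name `sample_spatial_negatives`, the `max_dist` threshold, and the rejection-sampling loop are hypothetical. For benchmarking, use the presampled negative edges.

```python
import math
import random


def sample_spatial_negatives(coords, edges, num_samples, max_dist,
                             seed=0, max_tries=100_000):
    """Rejection-sample non-edges whose endpoints are spatially close.

    coords: list of (x, y, z) node positions (hypothetical input format).
    edges: iterable of (u, v) index pairs for existing, undirected edges.
    Returns up to `num_samples` sorted (u, v) pairs with u < v.
    """
    rng = random.Random(seed)
    existing = {frozenset(e) for e in edges}
    negatives = set()
    num_nodes = len(coords)
    tries = 0
    while len(negatives) < num_samples and tries < max_tries:
        tries += 1
        u, v = rng.randrange(num_nodes), rng.randrange(num_nodes)
        if u == v:
            continue  # no self-loops
        pair = frozenset((u, v))
        if pair in existing or pair in negatives:
            continue  # skip true edges and duplicates
        # Spatial criterion: only accept pairs of nearby nodes.
        if math.dist(coords[u], coords[v]) <= max_dist:
            negatives.add(pair)
    return sorted(tuple(sorted(p)) for p in negatives)


# Toy example: four nodes on a line, one existing edge.
coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
edges = [(0, 1)]
print(sample_spatial_negatives(coords, edges, num_samples=1, max_dist=1.5))
```

In this toy graph a purely random sampler could return the pair `(0, 3)`, whose endpoints are far apart and therefore trivially separable from real edges. The distance threshold rules such pairs out.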
Model | Highest Valid Accuracy (%) | Final Test Accuracy (%) | #Params | Hardware |
---|---|---|---|---|
MF | 49.99 ± 0.06 | 49.97 ± 0.05 | 8641 | GeForce Quadro RTX 8000 Ti (48GB GPU) |
MLP | 48.01 ± 1.32 | 47.94 ± 1.33 | 1037577 | GeForce Quadro RTX 8000 Ti (48GB GPU) |
GCN | 43.49 ± 9.61 | 43.53 ± 9.61 | 396289 | GeForce Quadro RTX 8000 Ti (48GB GPU) |
GraphSAGE | 49.93 ± 6.76 | 49.89 ± 6.78 | 396289 | GeForce Quadro RTX 8000 Ti (48GB GPU) |
GCN + Node2Vec | 49.60 ± 0.61 | 49.54 ± 0.57 | 226744513 | GeForce Quadro RTX 8000 Ti (48GB GPU) |
GraphSAGE + Node2Vec | 47.36 ± 1.36 | 47.35 ± 1.36 | 226892737 | GeForce Quadro RTX 8000 Ti (48GB GPU) |
Please consider citing this work if any of our code or datasets is helpful for your research. For the specific graphs and baseline models, please also cite the respective original articles as described in the preprint.
@misc{paetzold2021brain,
title={Whole Brain Vessel Graphs: A Dataset and Benchmark for Graph Learning and Neuroscience (VesselGraph)},
author={Johannes C. Paetzold and Julian McGinnis and Suprosanna Shit and Ivan Ezhov and Paul Büschl and Chinmay Prabhakar and Mihail I. Todorov and Anjany Sekuboyina and Georgios Kaissis and Ali Ertürk and Stephan Günnemann and Bjoern H. Menze},
year={2021},
eprint={2108.13233},
archivePrefix={arXiv},
primaryClass={cs.LG}
}