HeaRT

Official code for the NeurIPS'23 paper "Evaluating Graph Neural Networks for Link Prediction: Current Pitfalls and New Benchmarking" and the ICLR'24 paper "Revisiting Link Prediction: A Data Perspective".

Installation

Please see installation.md for instructions on installing the required dependencies.

Download Data

All data can be downloaded by running the download_data.sh script:

cd HeaRT  # Must be in the root directory
bash download_data.sh

This includes the negative samples generated by HeaRT and the splits for Cora, Citeseer, and Pubmed. The OGB datasets are downloaded automatically by the ogb package.

Reproduce Results

The commands needed to reproduce all the results, with the appropriate hyperparameters, can be found in the scripts/hyparameters directory. It contains one file per dataset with the command to train and evaluate each method.

For example, to reproduce the results on ogbl-collab under the existing evaluation setting, the command for each method can be found in ogbl-collab.sh in the scripts/hyparameters/existing_setting_ogb/ directory.
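Since each per-dataset file holds one command per method, it can be handy to pull out just the command for the method you care about. The helper below, `commands_for_model`, is hypothetical and not part of the repository; it assumes each command sits on a single `python ...` line and that the method is selected with `--gnn_model <name>`, as in the GCN examples in this README. Methods selected some other way will not be matched.

```python
import shlex
from typing import List


def commands_for_model(script_text: str, model: str) -> List[str]:
    """Return the `python ...` lines whose --gnn_model argument equals `model`.

    Hypothetical helper: assumes one command per line and that the method is
    chosen with `--gnn_model <name>`.
    """
    matches = []
    for raw in script_text.splitlines():
        line = raw.strip()
        if not line.startswith("python"):
            continue  # skip comments, blank lines, `cd` commands, etc.
        tokens = shlex.split(line, comments=True)  # drops trailing `# ...` comments
        if "--gnn_model" in tokens:
            i = tokens.index("--gnn_model")
            if i + 1 < len(tokens) and tokens[i + 1] == model:
                matches.append(line)
    return matches
```

To use it, pass the text of a script (for example ogbl-collab.sh, read with `pathlib.Path(...).read_text()`) together with a model name such as `"GCN"`.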

To run the code, first change into the directory for the appropriate setting. The options are:

benchmarking/exist_setting_small: Run models on Cora, Citeseer, and Pubmed under the existing setting.
benchmarking/exist_setting_ogb: Run models on ogbl-collab, ogbl-ppa, and ogbl-citation2 under the existing setting.
benchmarking/exist_setting_ddi: Run models on ogbl-ddi under the existing setting.
benchmarking/HeaRT_small: Run models on Cora, Citeseer, and Pubmed under HeaRT.
benchmarking/HeaRT_ogb: Run models on ogbl-collab, ogbl-ppa, and ogbl-citation2 under HeaRT.
benchmarking/HeaRT_ddi: Run models on ogbl-ddi under HeaRT.
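The six directories above vary along two axes: the evaluation setting (existing or HeaRT) and the dataset group (small Planetoid graphs, the larger OGB graphs, and ogbl-ddi). A small lookup makes that mapping explicit; the function name `setting_dir` and the setting labels `"existing"`/`"heart"` are illustrative and not part of the repository.

```python
SMALL = {"cora", "citeseer", "pubmed"}
OGB = {"ogbl-collab", "ogbl-ppa", "ogbl-citation2"}
DDI = {"ogbl-ddi"}


def setting_dir(data_name: str, setting: str) -> str:
    """Map a dataset and a setting ("existing" or "heart") to the directory to cd into."""
    name = data_name.lower()
    if name in SMALL:
        group = "small"
    elif name in OGB:
        group = "ogb"
    elif name in DDI:
        group = "ddi"  # ogbl-ddi has its own directory in both settings
    else:
        raise ValueError(f"unknown dataset: {data_name}")
    prefixes = {"existing": "exist_setting", "heart": "HeaRT"}
    if setting not in prefixes:
        raise ValueError(f"setting must be one of {sorted(prefixes)}")
    return f"benchmarking/{prefixes[setting]}_{group}"
```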

Below we give examples of running GCN on the different groups of datasets under both settings:

Cora/Citeseer/Pubmed under the existing setting:

cd benchmarking/exist_setting_small/
python  main_gnn_CoraCiteseerPubmed.py  --data_name cora  --gnn_model GCN --lr 0.01 --dropout 0.3 --l2 1e-4 --num_layers 1  --num_layers_predictor 3 --hidden_channels 128 --epochs 9999 --kill_cnt 10 --eval_steps 5  --batch_size 1024
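The commands set `--epochs 9999`, which presumably acts only as a loose cap, with training length governed by `--eval_steps` and `--kill_cnt`. We assume (this is not verified against the scripts) that the model is evaluated every `--eval_steps` epochs and that training stops once the validation metric has failed to improve for `--kill_cnt` consecutive evaluations. The loop below sketches that assumed behaviour with stand-in `train_one_epoch` and `evaluate` callables; the exact comparison and counter reset in the real scripts may differ.

```python
from typing import Callable, Tuple


def train_with_early_stopping(
    train_one_epoch: Callable[[int], None],
    evaluate: Callable[[], float],
    epochs: int = 9999,
    eval_steps: int = 5,
    kill_cnt: int = 10,
) -> Tuple[float, int]:
    """Patience-based early stopping (assumed semantics of the CLI flags).

    Returns (best validation score, epoch at which it was reached).
    """
    best_score, best_epoch, bad_evals = float("-inf"), 0, 0
    for epoch in range(1, epochs + 1):
        train_one_epoch(epoch)
        if epoch % eval_steps != 0:
            continue  # only evaluate every `eval_steps` epochs
        score = evaluate()
        if score > best_score:
            best_score, best_epoch, bad_evals = score, epoch, 0
        else:
            bad_evals += 1
            if bad_evals >= kill_cnt:
                break  # no improvement for `kill_cnt` evaluations in a row
    return best_score, best_epoch
```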

ogbl-collab under the existing setting (similar for ogbl-ppa and ogbl-citation2):

cd benchmarking/exist_setting_ogb/
python main_gnn_ogb.py  --use_valedges_as_input  --data_name ogbl-collab  --gnn_model GCN --hidden_channels 256 --lr 0.001 --dropout 0.  --num_layers 3 --num_layers_predictor 3 --epochs 9999 --kill_cnt 100  --batch_size 65536
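The `--use_valedges_as_input` flag shares its name with a flag in the OGB ogbl-collab example scripts, where validation edges are added to the message-passing graph when scoring the test split. Assuming it behaves the same way here, its effect on the input graph can be sketched as follows; `message_passing_edges` and its `for_test` argument are hypothetical, and whether validation edges are also used during training in this repository is not covered by the sketch.

```python
from typing import Iterable, Set, Tuple

Edge = Tuple[int, int]


def message_passing_edges(
    train_edges: Iterable[Edge],
    valid_edges: Iterable[Edge],
    use_valedges_as_input: bool,
    for_test: bool,
) -> Set[Edge]:
    """Edges the encoder may propagate over (assumed semantics of the flag).

    In this sketch validation edges are added only when the flag is set and
    the test split is being scored; scoring the validation split itself uses
    training edges only, so its targets never appear in the input graph.
    """
    sources = list(train_edges)
    if use_valedges_as_input and for_test:
        sources += list(valid_edges)
    edges: Set[Edge] = set()
    for u, v in sources:
        edges.add((u, v))
        edges.add((v, u))  # store both directions for an undirected graph
    return edges
```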

ogbl-ddi under the existing setting:

cd benchmarking/exist_setting_ddi/
python main_gnn_ddi.py --data_name ogbl-ddi --gnn_model GCN  --lr 0.01 --dropout 0.5  --num_layers 3 --num_layers_predictor 3  --hidden_channels 256 --epochs 9999 --eval_steps 1 --kill_cnt 100 --batch_size 65536

Cora/Citeseer/Pubmed under HeaRT:

cd benchmarking/HeaRT_small/
python main_gnn_CoraCiteseerPubmed.py  --data_name cora  --gnn_model GCN  --lr 0.001 --dropout 0.5 --l2 0 --num_layers 1 --hidden_channels 256  --num_layers_predictor 3  --epochs 9999 --kill_cnt 10 --eval_steps 5  --batch_size 1024

ogbl-collab under HeaRT (similar for ogbl-ppa and ogbl-citation2):

cd benchmarking/HeaRT_ogb/
python main_gnn_ogb.py  --data_name ogbl-collab  --use_valedges_as_input --gnn_model GCN  --lr 0.001 --dropout 0.3 --num_layers 3 --hidden_channels 256  --num_layers_predictor 3 --epochs 9999 --kill_cnt 100 --eval_steps 1  --batch_size 65536

ogbl-ddi under HeaRT:

cd benchmarking/HeaRT_ddi/
python main_gnn_ddi.py  --data_name ogbl-ddi   --gnn_model GCN --lr 0.01 --dropout 0 --num_layers 3 --hidden_channels 256  --num_layers_predictor 3 --epochs 9999 --kill_cnt 100 --eval_steps 1  --batch_size 65536

Generate Negative Samples using HeaRT

The HeaRT negative samples used in the study can be reproduced with the scripts in the scripts/HeaRT/ directory.

A custom set of negative samples can be produced by running the heart_negatives/create_heart_negatives.py script, which offers several options for customizing the negatives. These include:

The CN metric used. Can be either CN or RA (default is RA). Specified via the --cn-metric argument.
The aggregation function used. Can be either min or mean (default is min). Specified via the --agg argument.
The number of negatives generated per positive sample. Specified via the --num-samples argument (default is 500).
The PPR parameters: the tolerance used when approximating PPR (--eps argument) and the teleportation probability (--alpha argument). We fix alpha at 0.15 for all datasets. For the tolerance eps, we recommend following the settings in scripts/HeaRT.
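For reference, the two CN-style metrics have standard definitions: CN counts the common neighbors of a node pair, while RA (resource allocation) sums 1/deg(w) over those common neighbors, down-weighting high-degree hubs. The sketch below computes both on an adjacency-set graph. As an assumption about how `--agg` might be applied (not taken from create_heart_negatives.py), `pick_hard_negatives` ranks each candidate corruption under every heuristic, combines those ranks with min or mean, and keeps the `num_samples` best-ranked candidates. The real script's candidate set, scoring, and tie-breaking may differ.

```python
from typing import Dict, List, Sequence, Set

# Undirected graph as adjacency sets: node -> set of neighbors.
Graph = Dict[int, Set[int]]


def cn_score(g: Graph, u: int, v: int) -> float:
    """Common Neighbors: |N(u) & N(v)|."""
    return float(len(g[u] & g[v]))


def ra_score(g: Graph, u: int, v: int) -> float:
    """Resource Allocation: sum of 1/deg(w) over common neighbors w."""
    return sum(1.0 / len(g[w]) for w in g[u] & g[v])


def ranks_desc(scores: Sequence[float]) -> List[int]:
    """Rank 0 = highest score; ties keep candidate order (an arbitrary choice)."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0] * len(scores)
    for rank, i in enumerate(order):
        ranks[i] = rank
    return ranks


def pick_hard_negatives(
    per_heuristic_scores: List[Sequence[float]],
    candidates: Sequence[int],
    num_samples: int,
    agg: str = "min",
) -> List[int]:
    """Hypothetical: combine per-heuristic ranks with min or mean, keep the best."""
    if agg not in ("min", "mean"):
        raise ValueError("agg must be 'min' or 'mean'")
    all_ranks = [ranks_desc(s) for s in per_heuristic_scores]
    combined = []
    for j in range(len(candidates)):
        r = [ranks[j] for ranks in all_ranks]
        combined.append(min(r) if agg == "min" else sum(r) / len(r))
    best = sorted(range(len(candidates)), key=lambda j: combined[j])[:num_samples]
    return [candidates[j] for j in best]
```

With `min`, a candidate only needs to rank highly under one heuristic to be selected; with `mean`, it has to rank reasonably well under all of them.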
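The `--alpha` and `--eps` arguments correspond to the two knobs of the standard push-based approximation of personalized PageRank (in the style of Andersen, Chung and Lang): `alpha` is the teleport probability and `eps` is the per-degree threshold below which residual mass is no longer pushed. Assuming the script uses an approximation from this family (its exact variant has not been checked), a minimal version looks like the following; `approx_ppr` is a hypothetical name.

```python
from collections import deque
from typing import Dict, Set

Graph = Dict[int, Set[int]]


def approx_ppr(g: Graph, source: int, alpha: float = 0.15, eps: float = 1e-4) -> Dict[int, float]:
    """Push-based approximate personalized PageRank from `source`.

    alpha: teleport (restart) probability; eps: push threshold per unit degree.
    On return, every node with neighbors has leftover residual below eps * deg(u).
    """
    p: Dict[int, float] = {}
    r: Dict[int, float] = {source: 1.0}
    queue = deque([source])
    queued = {source}
    while queue:
        u = queue.popleft()
        queued.discard(u)
        ru, deg = r.get(u, 0.0), len(g[u])
        if deg == 0:
            # Isolated source: nowhere to push, so keep all of its mass.
            p[u] = p.get(u, 0.0) + ru
            r[u] = 0.0
            continue
        if ru < eps * deg:
            continue  # residual too small to be worth pushing
        p[u] = p.get(u, 0.0) + alpha * ru  # settle an alpha fraction at u
        r[u] = 0.0
        share = (1.0 - alpha) * ru / deg  # spread the rest evenly over neighbors
        for v in g[u]:
            r[v] = r.get(v, 0.0) + share
            if v not in queued and r[v] >= eps * len(g[v]):
                queue.append(v)
                queued.add(v)
    return p
```

A smaller `eps` pushes more residual mass and therefore gives a closer estimate at a higher cost, which is why the tolerance is worth tuning per dataset.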

Updates

November 3rd, 2023

Modified the negative samples for ogbl-collab to allow train/valid positive samples to be negatives. Please see Appendix I in the paper for our rationale.

February 17th, 2024

Uploaded the implementation of the decoupled SEAL from the ICLR 2024 paper "Revisiting Link Prediction: A Data Perspective". The commands are available in the scripts/hyparameters directory under the existing setting.

Cite

@inproceedings{li2023evaluating,
  title={Evaluating Graph Neural Networks for Link Prediction: Current Pitfalls and New Benchmarking},
  author={Li, Juanhui and Shomer, Harry and Mao, Haitao and Zeng, Shenglai and Ma, Yao and Shah, Neil and Tang, Jiliang and Yin, Dawei},
  booktitle={Neural Information Processing Systems {NeurIPS}, Datasets and Benchmarks Track},
  year={2023}
}

@inproceedings{mao2023revisiting,
  title={Revisiting link prediction: A data perspective},
  author={Mao, Haitao and Li, Juanhui and Shomer, Harry and Li, Bingheng and Fan, Wenqi and Ma, Yao and Zhao, Tong and Shah, Neil and Tang, Jiliang},
  booktitle={The Twelfth International Conference on Learning Representations},
  year={2024}
}
