All in One: Multi-task Prompting for Graph Neural Networks, KDD 2023.

hezheqiao2022/ProG

 
 

| Quick Start | Website | Paper | Video | Media Coverage | Call For Contribution |

Big News!

  • We are happy to announce that we have finished most of the update work from ProG to ProG++ (the main branch of this repository). If you are looking for the original ProG package, see the ori branch.
  • From v0.2, the term "ProG" means ProG++ by default!

🌟ProG++🌟: A Unified Python Library for Graph Prompting

ProG++ (the main branch of this repository) is an extended library of the original ProG (see the ori branch of this repository) that supports more graph prompt models. Some of the implemented models are listed below (we are implementing more related models and will keep integrating them into ProG++):

  • [All in One] X. Sun, H. Cheng, J. Li, B. Liu, and J. Guan, “All in One: Multi-Task Prompting for Graph Neural Networks,” KDD, 2023
  • [GPF Plus] T. Fang, Y. Zhang, Y. Yang, C. Wang, and L. Chen, “Universal Prompt Tuning for Graph Neural Networks,” NeurIPS, 2023.
  • [GraphPrompt] Z. Liu, X. Yu, Y. Fang, et al., “GraphPrompt: Unifying Pre-Training and Downstream Tasks for Graph Neural Networks,” The Web Conference, 2023.
  • [GPPT] M. Sun, K. Zhou, X. He, Y. Wang, and X. Wang, “GPPT: Graph Pre-Training and Prompt Tuning to Generalize Graph Neural Networks,” KDD, 2022
  • [GPF] T. Fang, Y. Zhang, Y. Yang, and C. Wang, “Prompt tuning for graph neural networks,” arXiv preprint, 2022.

We have released a comprehensive survey on graph prompting!

Xiangguo Sun, Jiawen Zhang, Xixi Wu, Hong Cheng, Yun Xiong, Jia Li.

Graph Prompt Learning: A Comprehensive Survey and Beyond

in arXiv https://arxiv.org/abs/2311.16534

(under review in TKDE)

In this survey, we present more details of ProG++ and also release a repository🦀 for a comprehensive collection of research papers, benchmark datasets, and readily accessible code implementations.

The Architecture of ProG++


🌹Please cite our work if you find it helpful:

@inproceedings{sun2023all,
  title={All in One: Multi-Task Prompting for Graph Neural Networks},
  author={Sun, Xiangguo and Cheng, Hong and Li, Jia and Liu, Bo and Guan, Jihong},
  booktitle={Proceedings of the 26th ACM SIGKDD international conference on knowledge discovery \& data mining (KDD'23)},
  year={2023},
  pages = {2120--2131},
  location = {Long Beach, CA, USA},
  isbn = {9798400701030},
  url = {https://doi.org/10.1145/3580305.3599256},
  doi = {10.1145/3580305.3599256}
}

@article{sun2023graph,
  title = {Graph Prompt Learning: A Comprehensive Survey and Beyond},
  author = {Sun, Xiangguo and Zhang, Jiawen and Wu, Xixi and Cheng, Hong and Xiong, Yun and Li, Jia},
  year = {2023},
  journal = {arXiv:2311.16534},
  eprint = {2311.16534},
  archiveprefix = {arxiv}
}


@article{zhao2024all,
      title={All in One and One for All: A Simple yet Effective Method towards Cross-domain Graph Pretraining}, 
      author={Haihong Zhao and Aochuan Chen and Xiangguo Sun and Hong Cheng and Jia Li},
      year={2024},
      eprint={2402.09834},
      archivePrefix={arXiv}
}


@inproceedings{gao2024protein,
  title={Protein Multimer Structure Prediction via {PPI}-guided Prompt Learning},
  author={Ziqi Gao and Xiangguo Sun and Zijing Liu and Yu Li and Hong Cheng and Jia Li},
  booktitle={The Twelfth International Conference on Learning Representations (ICLR)},
  year={2024},
  url={https://openreview.net/forum?id=OHpvivXrQr}
}


@article{chen2024prompt,
      title={Prompt Learning on Temporal Interaction Graphs}, 
      author={Xi Chen and Siwei Zhang and Yun Xiong and Xixi Wu and Jiawei Zhang and Xiangguo Sun and Yao Zhang and Yinglong Zhao and Yulin Kang},
      year={2024},
      eprint={2402.06326},
      archivePrefix={arXiv},
      journal = {arXiv:2402.06326}
}

@article{li2024survey,
      title={A Survey of Graph Meets Large Language Model: Progress and Future Directions}, 
      author={Yuhan Li and Zhixun Li and Peisong Wang and Jia Li and Xiangguo Sun and Hong Cheng and Jeffrey Xu Yu},
      year={2024},
      eprint={2311.12399},
      archivePrefix={arXiv},
      journal = {arXiv:2311.12399}
}


Quick Start

Pre-train your GNN model

The ProG.pretrain module provides four pre-training classes: Edgepred_GPPT, Edgepred_Gprompt, GraphCL, and SimGRACE. To pre-train a model, run pre_train.py with the parameters you want.

from ProG.utils import mkdir, load_data4pretrain
from ProG import PreTrain
from ProG.pretrain import Edgepred_GPPT, Edgepred_Gprompt, GraphCL, SimGRACE

# Create the directory that the downstream examples load checkpoints from.
mkdir('./pre_trained_gnn/')

# Choose one pre-training strategy by uncommenting the matching line.
# pt = Edgepred_GPPT(dataset_name='Cora', gnn_type='GCN', hid_dim=128, gln=3, num_epoch=100)
# pt = Edgepred_GPPT(dataset_name='MUTAG', gnn_type='GCN', hid_dim=128, gln=3, num_epoch=100)
# pt = Edgepred_Gprompt(dataset_name='Cora', gnn_type='GCN', hid_dim=128, gln=3, num_epoch=100)
pt = GraphCL(dataset_name='ENZYMES', gnn_type='GCN', hid_dim=128, gln=3, num_epoch=50)
# pt = SimGRACE(dataset_name='MUTAG', gnn_type='GCN', hid_dim=128, gln=3, num_epoch=50)

# Run pre-training with the selected strategy.
pt.pretrain()
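
The downstream examples in this README load checkpoints with names such as Cora.Edgepred_Gprompt.GCN.pth and MUTAG.SimGRACE.GCN.128hidden_dim.pth. The following is a minimal sketch, not part of ProG, for listing the checkpoints that match a dataset, pre-training method, and GNN type. The helper name find_checkpoints and the assumption that file names start with `<dataset>.<method>.<gnn>.` are ours, inferred only from those two example paths.

```python
from pathlib import Path


def find_checkpoints(directory, dataset_name, method, gnn_type):
    """Return sorted .pth files whose names start with '<dataset>.<method>.<gnn>.'.

    Hypothetical helper: the naming pattern is inferred from the example
    paths in this README, not from the ProG source code.
    """
    root = Path(directory)
    if not root.is_dir():
        return []
    prefix = f"{dataset_name}.{method}.{gnn_type}."
    return sorted(p for p in root.glob("*.pth") if p.name.startswith(prefix))


if __name__ == "__main__":
    # Lists nothing if the directory does not exist yet.
    for path in find_checkpoints("./pre_trained_gnn", "ENZYMES", "GraphCL", "GCN"):
        print(path)
```

Passing one of the returned paths as pre_train_model_path avoids typing the file name by hand.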

Run the downstream task

downstreamtask.py defines three tasks: node classification, edge prediction, and graph classification. Here are some examples:

from ProG.tasker import NodeTask, LinkTask, GraphTask
from ProG.prompt import GPF, GPF_plus, GPPTPrompt, GPrompt, LightPrompt

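# Node classification without a pre-trained checkpoint, using the GPF prompt.
tasker = NodeTask(pre_train_model_path='None',
                  dataset_name='Cora', num_layer=3, gnn_type='GCN', prompt_type='gpf', shot_num=5)

# Link prediction from a pre-trained checkpoint, without a prompt.
# gnn_type is set to match the architecture named in the checkpoint file.
# tasker = LinkTask(pre_train_model_path='./pre_trained_gnn/Cora.Edgepred_Gprompt.GCN.pth',
#                   dataset_name='Cora', gnn_type='GCN', prompt_type='None')

# Graph classification from a SimGRACE checkpoint, using the GPF prompt.
# tasker = GraphTask(pre_train_model_path='./pre_trained_gnn/MUTAG.SimGRACE.GCN.128hidden_dim.pth',
#                    dataset_name='MUTAG', gnn_type='GCN', prompt_type='gpf', shot_num=50)

# Graph classification without a pre-trained checkpoint, using the ProG prompt.
# tasker = GraphTask(pre_train_model_path='None',
#                    dataset_name='MUTAG', gnn_type='GCN', prompt_type='ProG', shot_num=20)

# Graph classification without a pre-trained checkpoint and without a prompt.
# tasker = GraphTask(pre_train_model_path='None',
#                    dataset_name='ENZYMES', gnn_type='GCN', prompt_type='None', shot_num=50)

tasker.run()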
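
The examples above pass a shot_num argument, which this README does not define. Assuming it follows the usual few-shot convention (the number of labeled examples drawn per class), the selection can be sketched as below. This is a generic illustration in plain Python, not ProG's internal sampling routine; the function name sample_k_shot and the seed parameter are hypothetical.

```python
import random
from collections import defaultdict


def sample_k_shot(labels, shot_num, seed=0):
    """Pick up to `shot_num` indices per class from a list of class labels.

    Assumption: `shot_num` means "labeled examples per class". Classes with
    fewer than `shot_num` members contribute all of their members.
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)  # fixed seed keeps the selection reproducible
    chosen = []
    for label in sorted(by_class):
        indices = by_class[label]
        chosen.extend(rng.sample(indices, min(shot_num, len(indices))))
    return sorted(chosen)


if __name__ == "__main__":
    toy_labels = [0, 0, 0, 1, 1, 2]  # made-up labels for illustration only
    print(sample_k_shot(toy_labels, shot_num=2))
```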

Please note that all methods in a comparison should use the same pre-trained .pth checkpoint. Absolute performance values mean little on their own, because the final results vary with the pre-training state. Relative performance against other training paradigms is more informative.
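
One way to report relative rather than absolute performance is the relative gain of a prompting method over a baseline trained from the same checkpoint. The sketch below is our own illustration; the function name relative_gain is hypothetical, and the scores in the usage example are placeholders, not measured results.

```python
from statistics import mean


def relative_gain(baseline_scores, method_scores):
    """Relative improvement of the mean method score over the mean baseline score.

    Both lists should come from runs that start from the same pre-trained
    checkpoint, so that the comparison isolates the training paradigm.
    """
    base = mean(baseline_scores)
    if base == 0:
        raise ValueError("baseline mean is zero; relative gain is undefined")
    return (mean(method_scores) - base) / base


if __name__ == "__main__":
    # Placeholder accuracies for illustration only.
    gain = relative_gain([0.50, 0.50], [0.60, 0.60])
    print(f"relative gain: {gain:.1%}")
```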

Contact

  • For more information or further discussion, contact us via: Website
  • Email: xiangguosun at cuhk dot edu dot hk

Media Coverage

Media Reports

Online Discussion

Other research papers released by us

Call for Contributors!

Once you are invited as a contributor, please follow these steps:

  • Step 1. Create a temp branch (e.g. xgTemp) from the main branch (the latest branch).
  • Step 2. Fetch origin/xgTemp to your local xgTemp, and make your changes in PyCharm or another editor.
  • Step 3. Push your changes from local xgTemp to your GitHub remote branch: origin/xgTemp.
  • Step 4. Open a pull request to merge your branch into main.

Once you have completed these steps, I will receive a notification and approve merging your branch into main. After the merge, I will delete your branch; for your next contribution, repeat the steps above.

Once the main branch has been widely tested, it will be merged into the stable branch, and a new version will be released from the stable branch.

TODO List

Note: Current experimental datasets: Node/Edge: Cora/Citeseer/Pubmed; Graph: MUTAG

  • Write comprehensive usage documentation (refer to PyG).
  • Dataset: support more graph-level datasets (PROTEINS, IMDB-BINARY, REDDIT-BINARY, ENZYMES); add node-level datasets.
  • Write a tutorial and polish the data code so that readers can work with their own data more easily. That is: (1) provide a demo/tutorial showing how to handle data; (2) polish the data code to make it more robust, reliable, and readable.
  • Pre_train: implement DGI (Deep Graph Infomax), InfoGraph, ContextPred, AttrMasking, GraphMAE, GraphLoG, JOAO.
  • Add prompt: PRODIGY (NeurIPS 2023 Spotlight).
  • Induced graph: (1) find a better way to generate induced graphs; (2) simplify the three generation functions.
  • Add a prompt type table (prompt_type, prompt paradigm, loss function, task_type).
  • Add a pre_train type table.
  • Support deep GNN layers by adding the DeepGCNLayer feature.

Dataset

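| Graphs | Graph classes | Avg. nodes | Avg. edges | Node features | Node classes | Task (N/E/G) |
|---|---|---|---|---|---|---|
| Cora | 1 | 2,708 | 5,429 | 1,433 | 7 | N |
| Pubmed | 1 | 19,717 | 88,648 | 500 | 3 | N |
| CiteSeer | 1 | 3,327 | 9,104 | 3,703 | 6 | N |
| Mutag | 188 | 17.9 | 39.6 | ? | 7 | N |
| Reddit | 1 | 232,965 | 23,213,838 | 602 | 41 | N |
| Amazon | 1 | 13,752 | 491,722 | 767 | 10 | N |
| Flickr | 1 | 89,250 | 899,756 | 500 | 7 | N |
| PROTEINS | 1,113 | 39.06 | 72.82 | 1 | 3 | N, G |
| ENZYMES | 600 | 32.63 | 62.14 | 18 | 3 | N, G |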

Prompt Class

| Prompt class | Task (N/E/G) |
|---|---|
| GPF | N, G |
| GPPTPrompt | N |
| GPrompt | N, E, G |
| ProGPrompt | N, G |
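
The table above can be encoded as a small lookup to check, before launching a run, whether a prompt class is listed for a given task level. This is a sketch of ours rather than part of the ProG API; the names SUPPORTED_TASKS and supports are hypothetical, and the keys are the class names from the table, not the prompt_type strings used in the Quick Start examples.

```python
# Task codes follow the table: N = node, E = edge, G = graph.
VALID_TASKS = {"N", "E", "G"}

# Hypothetical lookup transcribed from the "Prompt Class" table.
SUPPORTED_TASKS = {
    "GPF": {"N", "G"},
    "GPPTPrompt": {"N"},
    "GPrompt": {"N", "E", "G"},
    "ProGPrompt": {"N", "G"},
}


def supports(prompt_class, task):
    """Return True if the table lists `task` for `prompt_class`."""
    if task not in VALID_TASKS:
        raise ValueError(f"unknown task code: {task!r}")
    if prompt_class not in SUPPORTED_TASKS:
        raise KeyError(f"unknown prompt class: {prompt_class!r}")
    return task in SUPPORTED_TASKS[prompt_class]


if __name__ == "__main__":
    for name in SUPPORTED_TASKS:
        print(name, "supports edge-level tasks:", supports(name, "E"))
```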

Environment Setup

  • Python 3.9.17
  • PyTorch 2.0.1
  • torch-geometric 2.3.1

Installation for PyG (quick start)

pip install torch_geometric

pip install pyg_lib torch_scatter torch_sparse torch_cluster torch_spline_conv -f https://data.pyg.org/whl/torch-2.1.0+cu118.html # Optional dependencies; use the wheel index that matches your installed PyTorch and CUDA versions

or run this command

conda install pyg -c pyg
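
After installation, the versions listed above can be compared against what is actually installed. The snippet below is a minimal sketch using only the standard library; the helper names installed_versions and matches_expected are hypothetical, and the distribution names "torch" and "torch-geometric" are assumed to be the names under which pip registered the packages.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions taken from the "Environment Setup" list in this README.
EXPECTED = {"torch": "2.0.1", "torch-geometric": "2.3.1"}


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


def matches_expected(found, expected):
    """Compare versions while ignoring a local build tag such as '+cu118'."""
    return found is not None and found.split("+")[0] == expected


if __name__ == "__main__":
    for name, found in installed_versions(EXPECTED).items():
        status = "ok" if matches_expected(found, EXPECTED[name]) else "differs"
        print(f"{name}: installed={found}, expected={EXPECTED[name]} ({status})")
```

A mismatch is not necessarily a problem; it only signals that the environment differs from the one listed here.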
