PHE: Structure and Semantic Enhanced Pre-training of Graph Neural Networks for Large-Scale Heterogeneous Graphs

This is the PyTorch implementation for our TKDD'25 paper: PHE: Structure and Semantic Enhanced Pre-training of Graph Neural Networks for Large-Scale Heterogeneous Graphs.

Abstract

In recent years, graph neural networks (GNNs) have facilitated the development of graph data mining. However, training GNNs requires sufficient labeled task-specific data, which is expensive and sometimes unavailable. To be less dependent on labeled data, recent studies propose to pre-train GNNs in a self-supervised manner and then apply the pre-trained GNNs to downstream tasks with limited labeled data. However, most existing methods are designed solely for homogeneous graphs (real-world graphs are mostly heterogeneous) and do not consider semantic mismatch (the semantic difference between the original data and the ideal data containing more transferable semantic information). In this paper, we propose an effective framework to pre-train GNNs on large-scale heterogeneous graphs. We first design a structure-aware pre-training task, which aims to capture structural properties in heterogeneous graphs. Then, we design a semantic-aware pre-training task to tackle the semantic mismatch. Specifically, we construct a perturbation subspace composed of semantic neighbors to help deal with the semantic mismatch. Semantic neighbors make the model focus more on the general knowledge in the semantic space, which in turn assists the model in learning knowledge with better transferability. Finally, extensive experiments are conducted on real-world large-scale heterogeneous graphs to demonstrate the superiority of the proposed method over state-of-the-art baselines. The overall framework is as follows:

Framework
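As a rough illustration of the semantic-neighbor idea described in the abstract (this is an illustrative sketch only, not the paper's actual implementation; all function names here are hypothetical), one can think of the perturbation subspace as being spanned by the nearest neighbors of a node in the semantic (embedding) space, with a perturbed view obtained by mixing a node's embedding toward its neighbors:

```python
# Illustrative sketch only -- NOT the PHE implementation.
# Idea: for a node embedding, find its k nearest neighbors in the
# semantic (embedding) space; these neighbors form a perturbation
# subspace, and a perturbed view mixes the embedding toward them.

def squared_distance(u, v):
    """Squared Euclidean distance between two embedding vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def semantic_neighbors(target, candidates, k):
    """Return the k candidate embeddings closest to `target`."""
    ranked = sorted(candidates, key=lambda c: squared_distance(target, c))
    return ranked[:k]

def perturbed_view(target, candidates, k, alpha=0.5):
    """Mix `target` toward the mean of its k semantic neighbors.

    `alpha` (hypothetical hyper-parameter) controls how far the
    perturbed view moves from the original embedding.
    """
    neighbors = semantic_neighbors(target, candidates, k)
    mean = [sum(dim) / len(neighbors) for dim in zip(*neighbors)]
    return [(1 - alpha) * t + alpha * m for t, m in zip(target, mean)]
```

In the actual framework such views would be produced for GNN node embeddings during pre-training; see the paper and `./code` for the real pre-training objectives.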

Environment Requirement

# More details can be seen in ./code/requirements.txt.
torch==1.8.1+cu111 
torch-cluster==1.5.9  
torch-scatter==2.0.6  
torch-sparse==0.6.10  
torch-spline-conv==1.2.1  
torch-geometric==1.4.3
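A typical setup, assuming a CUDA 11.1 machine (the wheel index URL below is the standard PyTorch one; adjust the torch build to match your CUDA version, and see `./code/requirements.txt` for the full pinned list):

```shell
# Install the pinned torch build first (cu111 wheels), then the rest.
pip install torch==1.8.1+cu111 -f https://download.pytorch.org/whl/torch_stable.html
pip install -r ./code/requirements.txt
```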

Dataset

You can download the datasets from the link (extraction password: joha) and put them in the directory ./code/OAG_dataset.

Run the Code

cd code && bash scripts.sh

Acknowledgment of Open-Source Code Contributions

The code is based on the open-source repositories HGT and GPT-GNN; many thanks to the authors!

You are welcome to cite our paper:

@article{SunMa25,
  author = {Sun, Shengyin and Ma, Chen and Chen, Jiehao},
  title = {PHE: Structure and Semantic Enhanced Pre-Training of Graph Neural Networks for Large-Scale Heterogeneous Graphs},
  year = {2025},
  journal = {ACM Transactions on Knowledge Discovery from Data},
  pages = {1--26}
}
