PHE: Structure and Semantic Enhanced Pre-training of Graph Neural Networks for Large-Scale Heterogeneous Graphs

This is the PyTorch implementation for our TKDD'25 paper: PHE: Structure and Semantic Enhanced Pre-training of Graph Neural Networks for Large-Scale Heterogeneous Graphs.

Abstract

In recent years, graph neural networks (GNNs) have facilitated the development of graph data mining. However, training GNNs requires sufficient labeled task-specific data, which is expensive and sometimes unavailable. To be less dependent on labeled data, recent studies propose to pre-train GNNs in a self-supervised manner and then apply the pre-trained GNNs to downstream tasks with limited labeled data. However, most existing methods are designed solely for homogeneous graphs (real-world graphs are mostly heterogeneous) and do not consider semantic mismatch (the semantic difference between the original data and the ideal data containing more transferable semantic information). In this paper, we propose an effective framework to pre-train GNNs on large-scale heterogeneous graphs. We first design a structure-aware pre-training task, which aims to capture structural properties in heterogeneous graphs. Then, we design a semantic-aware pre-training task to tackle the semantic mismatch. Specifically, we construct a perturbation subspace composed of semantic neighbors to help deal with the semantic mismatch. Semantic neighbors make the model focus more on the general knowledge in the semantic space, which in turn assists the model in learning knowledge with better transferability. Finally, extensive experiments are conducted on real-world large-scale heterogeneous graphs to demonstrate the superiority of the proposed method over state-of-the-art baselines. The overall framework is as follows:

Framework
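As a rough illustration of the semantic-neighbor idea described in the abstract (this is an illustrative sketch only, not the paper's actual implementation; all function names here are hypothetical), one can think of the perturbation subspace as being spanned by the nearest neighbors of a node in the semantic (embedding) space, with a perturbed view obtained by mixing a node's embedding toward its neighbors:

```python
# Illustrative sketch only -- NOT the PHE implementation.
# Idea: for a node embedding, find its k nearest neighbors in the
# semantic (embedding) space; these neighbors form a perturbation
# subspace, and a perturbed view mixes the embedding toward them.

def squared_distance(u, v):
    """Squared Euclidean distance between two embedding vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def semantic_neighbors(target, candidates, k):
    """Return the k candidate embeddings closest to `target`."""
    ranked = sorted(candidates, key=lambda c: squared_distance(target, c))
    return ranked[:k]

def perturbed_view(target, candidates, k, alpha=0.5):
    """Mix `target` toward the mean of its k semantic neighbors.

    `alpha` (hypothetical hyper-parameter) controls how far the
    perturbed view moves from the original embedding.
    """
    neighbors = semantic_neighbors(target, candidates, k)
    mean = [sum(dim) / len(neighbors) for dim in zip(*neighbors)]
    return [(1 - alpha) * t + alpha * m for t, m in zip(target, mean)]
```

In the actual framework such views would be produced for GNN node embeddings during pre-training; see the paper and `./code` for the real pre-training objectives.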

Environment Requirement

# More details can be seen in ./code/requirements.txt.
torch==1.8.1+cu111 
torch-cluster==1.5.9  
torch-scatter==2.0.6  
torch-sparse==0.6.10  
torch-spline-conv==1.2.1  
torch-geometric==1.4.3
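A typical setup, assuming a CUDA 11.1 machine (the wheel index URL below is the standard PyTorch one; adjust the torch build to match your CUDA version, and see `./code/requirements.txt` for the full pinned list):

```shell
# Install the pinned torch build first (cu111 wheels), then the rest.
pip install torch==1.8.1+cu111 -f https://download.pytorch.org/whl/torch_stable.html
pip install -r ./code/requirements.txt
```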

Dataset

You can download the datasets from the link (extraction password: joha) and put them in the directory ./code/OAG_dataset.

Run the Code

cd code && bash scripts.sh

Acknowledgment of Open-Source Code Contributions

The code is based on the open-source repositories HGT and GPT-GNN; many thanks to the authors!

You are welcome to cite our paper:

@article{SunMa25,
  author = {Sun, Shengyin and Ma, Chen and Chen, Jiehao},
  title = {PHE: Structure and Semantic Enhanced Pre-Training of Graph Neural Networks for Large-Scale Heterogeneous Graphs},
  year = {2025},
  journal = {ACM Transactions on Knowledge Discovery from Data},
  pages = {1--26}
}
