This is the implementation of PMGT, described in the paper "Pre-training Graph Transformer with Multimodal Side Information for Recommendation" (ACM MM 2021).
To run the code, you need the following dependencies:
- pre-training:
  - Python 3
  - TensorFlow-gpu 1.13.1
  - graphlearn 1.0.1
- downstream:
  - Python 3
  - PyTorch 1.8.1
The repository is organized as follows:
- config_file/ : hyper-parameters;
- data/ : data pre-processing scripts and pre-processed files;
- down_rec/ : implementation of the downstream tasks;
- utils/ : optimization, BERT modules, and other utilities.
The pre-trained item representations for the video game dataset can be downloaded from here. Move the unzipped files to the folder 'data/video/', and then run the downstream task code directly.
Top-N recommendation using the pre-trained item representations:
$ python run_rec.py --data_type video --pretrain 1 --lr 0.001 --l2_re 0
Top-N recommendation using randomly initialized item representations:
$ python run_rec.py --data_type video --pretrain 0 --lr 0.001 --l2_re 0
CTR prediction using the pre-trained item representations:
$ python run_ctr.py --data_type video --pretrain 1 --lr 0.001 --l2_re 0.0001
CTR prediction using randomly initialized item representations:
$ python run_ctr.py --data_type video --pretrain 0 --lr 0.001 --l2_re 0.0001
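To compare pre-trained and randomly initialized representations on both downstream tasks in one go, the four commands can be generated programmatically. The following is only a sketch: `build_commands` is a hypothetical helper that is not part of this repository, and it assumes both scripts accept the flag as `--data_type video`. The flag values are copied from the commands above.

```python
import shlex


def build_commands(data_type="video", lr=0.001):
    """Return the four downstream commands as argument lists.

    Hypothetical helper, not part of the repository.
    """
    # Per-script L2 regularization values, copied from the commands above.
    l2_by_script = {"run_rec.py": "0", "run_ctr.py": "0.0001"}
    commands = []
    for script, l2_re in l2_by_script.items():
        # pretrain=1 uses pre-trained item representations, 0 uses random init.
        for pretrain in (1, 0):
            commands.append([
                "python", script,
                "--data_type", data_type,
                "--pretrain", str(pretrain),
                "--lr", str(lr),
                "--l2_re", l2_re,
            ])
    return commands


# Print the commands instead of running them, so the sketch has no side effects.
for cmd in build_commands():
    print(shlex.join(cmd))
```

Each argument list can be passed to `subprocess.run(cmd, check=True)` from the directory that contains the scripts.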
The experimental datasets are collected from the Amazon Review Datasets.
- Video Games
- Toys and Games
- Tools and Home Improvement
Use the original data to build the pre-training graph dataset and the downstream task dataset:
$ python data_process.py
Note that the experimental datasets used in the original paper were processed with internal APIs. As a result, the statistics below differ somewhat from those reported in the original paper.
(# Users, # Items, # Interact.: data for downstream tasks; # Nodes, # Edges: item graph.)
Datasets | # Users | # Items | # Interact. | # Nodes | # Edges | Threshold |
VG | 27,988 | 6,551 | 98,278 | 7,252 | 88,606 | 3 |
TG | 118,153 | 6,238 | 294,507 | 6,451 | 15,363 | 4 |
THI | 164,717 | 5,751 | 431,455 | 5,982 | 12,927 | 3 |
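As a rough reading aid for these statistics, the sketch below derives two quantities from the table: the density of the user-item interaction matrix and the average node degree of the item graph. The numbers are copied from the table. The function names are illustrative, and the degree formula assumes each edge is undirected and counted once, which the table does not state.

```python
# (users, items, interactions, nodes, edges) per dataset, copied from the table.
stats = {
    "VG": (27988, 6551, 98278, 7252, 88606),
    "TG": (118153, 6238, 294507, 6451, 15363),
    "THI": (164717, 5751, 431455, 5982, 12927),
}


def interaction_density(users, items, interactions):
    """Fraction of user-item pairs that have an observed interaction."""
    return interactions / (users * items)


def average_degree(nodes, edges):
    """Average node degree, ASSUMING undirected edges counted once."""
    return 2 * edges / nodes


for name, (users, items, inter, nodes, edges) in stats.items():
    density = interaction_density(users, items, inter)
    degree = average_degree(nodes, edges)
    print(f"{name}: density={density:.6f}, avg_degree={degree:.2f}")
```

Under that assumption, the VG item graph is considerably denser than the TG and THI graphs, even though all three have a similar number of nodes.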
$ python main.py --data_type video --is_train 1
$ python main.py --data_type video --is_train 0
See the details in Quick Validation.
Top-N Recommendation:
Datasets | Methods | REC-R@10 | REC-R@20 | REC-N@10 | REC-N@20 |
VG | NCF | 0.1698 | 0.2510 | 0.0970 | 0.1192 |
VG | NCF-PMGT | 0.2588 | 0.3518 | 0.1688 | 0.1945 |
TG | NCF | 0.2598 | 0.3295 | 0.1942 | 0.2129 |
TG | NCF-PMGT | 0.2926 | 0.3682 | 0.2194 | 0.2397 |
THI | NCF | 0.2687 | 0.3188 | 0.2232 | 0.2367 |
THI | NCF-PMGT | 0.2909 | 0.3509 | 0.2390 | 0.2552 |
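To make the table easier to compare across datasets, the relative improvement of NCF-PMGT over NCF can be computed as (NCF-PMGT - NCF) / NCF. The sketch below does this for every metric. The values are copied from the table, and the helper name `relative_gain` is illustrative.

```python
metrics = ["REC-R@10", "REC-R@20", "REC-N@10", "REC-N@20"]

# (NCF, NCF-PMGT) scores per dataset, copied from the table above.
results = {
    "VG": ([0.1698, 0.2510, 0.0970, 0.1192], [0.2588, 0.3518, 0.1688, 0.1945]),
    "TG": ([0.2598, 0.3295, 0.1942, 0.2129], [0.2926, 0.3682, 0.2194, 0.2397]),
    "THI": ([0.2687, 0.3188, 0.2232, 0.2367], [0.2909, 0.3509, 0.2390, 0.2552]),
}


def relative_gain(baseline, improved):
    """Relative improvement of `improved` over `baseline`, in percent."""
    return (improved - baseline) / baseline * 100.0


for dataset, (ncf, pmgt) in results.items():
    gains = [relative_gain(b, p) for b, p in zip(ncf, pmgt)]
    line = ", ".join(f"{m}: +{g:.1f}%" for m, g in zip(metrics, gains))
    print(f"{dataset}: {line}")
```

For example, on VG the REC-R@10 gain is (0.2588 - 0.1698) / 0.1698, roughly 52%.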